data_juicer.ops.mapper.detect_character_attributes_mapper module

class data_juicer.ops.mapper.detect_character_attributes_mapper.DetectCharacterAttributesMapper(*args, **kwargs)

Bases: Mapper

Takes an image, a caption, and main character names as input to extract the characters’ attributes.

Extracts and classifies attributes of main characters in an image using a combination of object detection, image-text matching, and language model inference. It first locates the main characters in the image using YOLOE and then uses a Hugging Face tokenizer and a LLaMA-based model to classify each character into categories like ‘object’, ‘animal’, ‘person’, ‘text’, or ‘other’. The operator also extracts detailed features such as color, material, and action for each character. The final output includes bounding boxes and a list of characteristics for each main character. The results are stored in the ‘main_character_attributes_list’ field under the ‘meta’ key.

__init__(detect_character_locations_mapper_args: Dict | None = {}, *args, **kwargs)

Initialization method.

Parameters: detect_character_locations_mapper_args – Arguments passed to detect_character_locations_mapper, which control the thresholds used to locate the main characters. The default empty dict uses fixed values: default mllm_mapper_args, default image_text_matching_filter_args, yoloe_path="yoloe-11l-seg.pt", iou_threshold=0.7, matching_score_threshold=0.4.
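As a rough illustration of the defaulting behavior described above, the sketch below overlays user-supplied arguments on the three scalar fixed values quoted in the parameter description. The names `FIXED_LOCATION_ARGS` and `resolve_location_args` are hypothetical and exist only for this example; how the operator merges arguments internally is an assumption, and the nested mllm_mapper_args and image_text_matching_filter_args defaults are left out because their values are not listed on this page.

```python
from typing import Any, Dict, Optional

# Fixed values quoted in the parameter description (nested defaults omitted).
FIXED_LOCATION_ARGS: Dict[str, Any] = {
    "yoloe_path": "yoloe-11l-seg.pt",
    "iou_threshold": 0.7,
    "matching_score_threshold": 0.4,
}


def resolve_location_args(user_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: overlay user overrides on the fixed values."""
    resolved = dict(FIXED_LOCATION_ARGS)  # copy so the defaults stay untouched
    resolved.update(user_args or {})      # None and {} both mean "no overrides"
    return resolved
```

Under these assumptions, passing `{"iou_threshold": 0.5}` changes only the IoU threshold and keeps the other two fixed values.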

process_single(samples, rank=None)

Processes a single sample (sample -> sample).

Parameters: samples – the sample to process
Returns: the processed sample
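Once a sample has been processed, the stored attributes can be read back from it. The sketch below assumes the processed sample behaves like a plain dict and that the meta field sits under the literal key "meta", as the class description states; in a real pipeline the key may be a library constant, so it is a parameter here. The helper name `get_character_attributes` is hypothetical, and each list entry is treated as opaque because its exact structure is not specified on this page.

```python
from typing import Any, Dict, List


def get_character_attributes(sample: Dict[str, Any], meta_key: str = "meta") -> List[Any]:
    """Hypothetical helper: return the stored attribute list, or [] if absent."""
    meta = sample.get(meta_key) or {}
    # Field name taken from the class description above.
    return list(meta.get("main_character_attributes_list") or [])
```

Returning an empty list for a missing field lets downstream code iterate without first checking whether the operator has run on the sample.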
