data_juicer.ops.mapper.detect_character_locations_mapper module
- class data_juicer.ops.mapper.detect_character_locations_mapper.DetectCharacterLocationsMapper(mllm_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, yoloe_path='yoloe-11l-seg.pt', iou_threshold=0.7, matching_score_threshold=0.4, *args, **kwargs)
Bases: Mapper
Given an image and a list of main character names, extract the bounding boxes for each present character.
This operator detects and extracts bounding boxes for the main characters in an image. It first uses a YOLOE model to detect the presence of these characters, then generates and refines a bounding box for each detected character using a multimodal language model and an image-text matching filter. The final bounding boxes are stored in the metadata under ‘main_character_locations_list’. Two bounding boxes are considered overlapping if their Intersection over Union (IoU) score exceeds iou_threshold, and a cropped image region is considered to match a character’s name if their matching score exceeds matching_score_threshold. Image-text matching uses a Hugging Face tokenizer and a BLIP model.
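The two threshold tests described above can be illustrated with a small, self-contained sketch. It assumes boxes in corner format (x1, y1, x2, y2) and treats "exceeds" as a strict greater-than comparison; the helper names `iou`, `boxes_overlap`, and `region_matches` are hypothetical and are not part of the Data-Juicer API.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def boxes_overlap(a: Box, b: Box, iou_threshold: float = 0.7) -> bool:
    """Hypothetical helper: overlap means IoU strictly above the threshold."""
    return iou(a, b) > iou_threshold


def region_matches(score: float, matching_score_threshold: float = 0.4) -> bool:
    """Hypothetical helper: a crop matches a name if its score exceeds the threshold."""
    return score > matching_score_threshold


# Half-shifted boxes: intersection 50, union 150, so IoU is about 0.33 (not overlapping).
print(boxes_overlap((0, 0, 10, 10), (5, 0, 15, 10)))
# Nearly identical boxes: intersection 90, union 100, so IoU is 0.9 (overlapping).
print(boxes_overlap((0, 0, 10, 10), (1, 0, 10, 10)))
```

With the default values (0.7 and 0.4), a detection and a generated box must cover nearly the same region to count as the same character, while the matching-score bar is comparatively permissive.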
- __init__(mllm_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, yoloe_path='yoloe-11l-seg.pt', iou_threshold=0.7, matching_score_threshold=0.4, *args, **kwargs)
Initialization method.
- Parameters:
mllm_mapper_args – Arguments for the multimodal language model mapper, which controls caption generation for bounding box regions. The default empty dict uses fixed values: max_new_tokens=256, temperature=0.2, top_p=None, num_beams=1, hf_model="llava-hf/llava-v1.6-vicuna-7b-hf".
image_text_matching_filter_args – Arguments for the image-text matching filter, which controls matching between cropped image regions and text descriptions. The default empty dict uses fixed values: min_score=0.1, max_score=1.0, hf_blip="Salesforce/blip-itm-base-coco", num_proc=1.
yoloe_path – The path to the YOLOE model.
iou_threshold – Two bounding boxes from different models are considered overlapping when their IoU score is higher than iou_threshold.
matching_score_threshold – If the matching score between the cropped image and the character’s name exceeds the matching_score_threshold, they are considered a match.
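For reference, the documented default values can be written out as explicit keyword arguments. This is a sketch: passing these dicts explicitly should correspond to the documented defaults, but whether a partially filled dict is merged with the remaining defaults is not stated here, so supplying every key is the safer assumption. The constructor call is left commented out because it requires Data-Juicer and the referenced models; the variable names `MLLM_MAPPER_DEFAULTS`, `IMAGE_TEXT_MATCHING_DEFAULTS`, and `op_kwargs` are illustrative only.

```python
# Documented defaults for the nested argument dicts, written out explicitly.
MLLM_MAPPER_DEFAULTS = {
    "max_new_tokens": 256,
    "temperature": 0.2,
    "top_p": None,
    "num_beams": 1,
    "hf_model": "llava-hf/llava-v1.6-vicuna-7b-hf",
}

IMAGE_TEXT_MATCHING_DEFAULTS = {
    "min_score": 0.1,
    "max_score": 1.0,
    "hf_blip": "Salesforce/blip-itm-base-coco",
    "num_proc": 1,
}

# Keyword arguments mirroring the documented constructor defaults.
# Copies are made so later edits do not mutate the shared default dicts.
op_kwargs = dict(
    mllm_mapper_args=dict(MLLM_MAPPER_DEFAULTS),
    image_text_matching_filter_args=dict(IMAGE_TEXT_MATCHING_DEFAULTS),
    yoloe_path="yoloe-11l-seg.pt",
    iou_threshold=0.7,
    matching_score_threshold=0.4,
)

# With Data-Juicer installed and the models available, the operator could be built as:
# from data_juicer.ops.mapper.detect_character_locations_mapper import DetectCharacterLocationsMapper
# op = DetectCharacterLocationsMapper(**op_kwargs)
```

Starting from an explicit copy like this makes it easy to change a single value, for example a stricter matching_score_threshold, without losing track of the other settings.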