data_juicer.ops.mapper.detect_main_character_mapper module#
- class data_juicer.ops.mapper.detect_main_character_mapper.DetectMainCharacterMapper(mllm_mapper_args: Dict | None = {}, filter_min_character_num: int = 0, *args, **kwargs)[source]#
Bases: Mapper

Extract all main character names based on the given image and its caption.
This operator uses a multimodal language model to generate a description of the main characters in the given image. It then parses the generated JSON to extract the list of main characters. The operator filters out samples where the number of main characters is less than the specified threshold. The default arguments for the multimodal language model include using a Hugging Face model with specific generation parameters. The key metric, main_character_list, is stored in the sample’s metadata.
- __init__(mllm_mapper_args: Dict | None = {}, filter_min_character_num: int = 0, *args, **kwargs)[source]#
Initialization.
- Parameters:
mllm_mapper_args – Arguments for the multimodal language model mapper, which generates the main-character description for the image. The default empty dict falls back to fixed values: max_new_tokens=256, temperature=0.2, top_p=None, num_beams=1, hf_model="llava-hf/llava-v1.6-vicuna-7b-hf".
filter_min_character_num – Minimum number of main characters required; samples whose image contains fewer main characters than this threshold are filtered out.
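The core parse-and-filter step described above can be sketched as follows. This is an illustrative, self-contained approximation, not the operator's actual implementation: the helper name `extract_main_characters` and the exact JSON shape of the model's response are assumptions, and the real operator additionally invokes the multimodal model and writes the result into the sample's metadata under `main_character_list`.

```python
import json


def extract_main_characters(mllm_output: str,
                            filter_min_character_num: int = 0):
    """Parse a model response assumed to be a JSON list of character
    names, then apply the minimum-count filter.

    Returns the character list, or None if the sample should be
    filtered out. Hypothetical helper for illustration only.
    """
    try:
        characters = json.loads(mllm_output)
    except json.JSONDecodeError:
        characters = []  # unparseable output yields an empty list
    if not isinstance(characters, list):
        characters = []
    if len(characters) < filter_min_character_num:
        return None  # too few main characters: drop the sample
    return characters


# A response naming two characters passes a threshold of 1:
print(extract_main_characters('["Alice", "Bob"]',
                              filter_min_character_num=1))
# A single-character response fails a threshold of 2:
print(extract_main_characters('["Alice"]',
                              filter_min_character_num=2))
```

Note that a malformed or non-list response is treated as an empty character list, so it is filtered out whenever the threshold is greater than zero.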