data_juicer.ops.mapper.imgdiff_difference_caption_generator_mapper module

class data_juicer.ops.mapper.imgdiff_difference_caption_generator_mapper.Difference_Caption_Generator_Mapper(mllm_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, text_pair_similarity_filter_args: Dict | None = {}, *args, **kwargs)

Bases: Mapper

A fused operator that runs several OPs sequentially on the same batch, allowing fine-grained control over data processing.
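The fused-operator idea can be illustrated with a minimal, library-free sketch: a list of stages is applied in order, each consuming the previous stage's output batch. The three stages are loosely named after the constructor's `mllm_mapper_args`, `image_text_matching_filter_args` and `text_pair_similarity_filter_args`, but `fused_run`, the stage functions, and the `caption` and `pair_score` fields are all hypothetical stand-ins, not the operator's actual internals.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]
Stage = Callable[[List[Sample]], List[Sample]]


def fused_run(stages: List[Stage], batch: List[Sample]) -> List[Sample]:
    """Apply each stage in order, feeding it the previous stage's output."""
    for stage in stages:
        batch = stage(batch)
    return batch


def caption_stage(batch: List[Sample]) -> List[Sample]:
    # Hypothetical stand-in for an MLLM mapper: attach a placeholder caption
    # describing the image pair stored under "images".
    return [
        {**s, "caption": f"difference between {s['images'][0]} and {s['images'][1]}"}
        for s in batch
    ]


def matching_stage(batch: List[Sample]) -> List[Sample]:
    # Hypothetical stand-in for an image-text matching filter:
    # keep only samples whose caption is non-empty.
    return [s for s in batch if s.get("caption")]


def similarity_stage(batch: List[Sample], min_score: float = 0.5) -> List[Sample]:
    # Hypothetical stand-in for a text-pair similarity filter: keep samples
    # whose precomputed "pair_score" field reaches min_score.
    return [s for s in batch if s.get("pair_score", 0.0) >= min_score]


batch = [
    {"images": ["a.jpg", "b.jpg"], "pair_score": 0.9},
    {"images": ["c.jpg", "d.jpg"], "pair_score": 0.2},
]
result = fused_run([caption_stage, matching_stage, similarity_stage], batch)
```

Running the stages back to back on one batch means later filters see the fields that earlier mappers added, which is the kind of fine-grained control the description refers to.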

__init__(mllm_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, text_pair_similarity_filter_args: Dict | None = {}, *args, **kwargs)

Base class that conducts data editing.

Parameters:

text_key – the key name of the field that stores sample texts to be processed.
image_key – the key name of the field that stores the sample image list to be processed.
audio_key – the key name of the field that stores the sample audio list to be processed.
video_key – the key name of the field that stores the sample video list to be processed.
image_bytes_key – the key name of the field that stores the sample image bytes list to be processed.
query_key – the key name of the field that stores sample queries.
response_key – the key name of the field that stores responses.
history_key – the key name of the field that stores the history of queries and responses.
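Because every field is addressed through a configurable key name, the same operator can read datasets whose columns are named differently. The sketch below shows that lookup pattern; `FieldKeys`, `read_text` and `read_images` are hypothetical helpers, and the default key strings are assumptions for illustration that may not match the defaults in your installed version.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FieldKeys:
    # Hypothetical defaults for illustration only; the real defaults
    # may differ in your installed version of data_juicer.
    text_key: str = "text"
    image_key: str = "images"
    audio_key: str = "audios"
    video_key: str = "videos"
    image_bytes_key: str = "image_bytes"
    query_key: str = "query"
    response_key: str = "response"
    history_key: str = "history"


def read_text(sample: Dict[str, Any], keys: FieldKeys) -> str:
    # Look the text up under whatever key name was configured.
    return sample.get(keys.text_key, "")


def read_images(sample: Dict[str, Any], keys: FieldKeys) -> List[str]:
    # Image fields hold a list; a missing field is treated as an empty list.
    return list(sample.get(keys.image_key, []))
```

For example, `read_images({"pics": ["x.png", "y.png"]}, FieldKeys(image_key="pics"))` returns `["x.png", "y.png"]` without renaming any dataset column.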

process_single(samples, rank=None)

Sample-level processing: maps one sample to one processed sample (sample -> sample).

Parameters: samples – the sample to process
Returns: the processed sample
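The sample -> sample contract means a per-sample method receives one sample dict and returns one sample dict. A minimal sketch of that contract follows; the whitespace-collapsing edit and the `"text"` field name are hypothetical and stand in for whatever the real operator does, and `rank` is accepted only to mirror the documented signature.

```python
from typing import Any, Dict, Optional

Sample = Dict[str, Any]


def process_single(sample: Sample, rank: Optional[int] = None) -> Sample:
    """Hypothetical sample-level edit: collapse runs of whitespace in "text".

    rank mirrors the documented signature but is unused in this sketch.
    """
    out = dict(sample)  # shallow copy so the caller's sample is left unchanged
    out["text"] = " ".join(str(sample.get("text", "")).split())
    return out


sample = {"text": "  two   images  ", "images": ["a.jpg", "b.jpg"]}
processed = process_single(sample)
```

Returning a new dict instead of mutating the input keeps the function safe to call on shared data; other fields such as `"images"` pass through untouched.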

