data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper module

data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.is_noun(word)[source]
data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.compare_text_index(text1, text2)[source]
data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.iou_filter(samples, iou_thresh)[source]
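
This page gives no description for iou_filter. Its name and the iou_thresh parameter suggest filtering by intersection-over-union (IoU) of bounding boxes. The sketch below illustrates that general technique only and is not the library's implementation: the helper names box_iou and keep_non_overlapping, the (x1, y1, x2, y2) box format, and the "drop a box if it overlaps an already kept one" rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two axis-aligned boxes (hypothetical helper)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def keep_non_overlapping(boxes: List[Box], iou_thresh: float) -> List[Box]:
    """Keep a box only if its IoU with every previously kept box is <= iou_thresh."""
    kept: List[Box] = []
    for box in boxes:
        if all(box_iou(box, other) <= iou_thresh for other in kept):
            kept.append(box)
    return kept


boxes = [(0, 0, 2, 2), (0, 0, 2, 2), (5, 5, 6, 6)]
filtered = keep_non_overlapping(boxes, iou_thresh=0.5)
```

Here the duplicate of the first box has an IoU of 1.0 with it and is dropped, while the disjoint third box is kept.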
class data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.Difference_Area_Generator_Mapper(image_pair_similarity_filter_args: Dict | None = {}, image_segment_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, *args, **kwargs)[source]

Bases: Mapper

A fused operator that runs a sequence of OPs on the same batch, allowing fine-grained control over data processing.
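
The idea of running sequential steps on one batch can be sketched in plain Python. This is a minimal illustration under stated assumptions, not this class's code: the batch layout (a dict of equal-length columns) and the names strip_text, drop_short, and run_fused are hypothetical and do not come from data_juicer.

```python
from typing import Callable, Dict, List

# A batch is modeled here as a dict of equal-length columns (an assumption).
Batch = Dict[str, List]
Op = Callable[[Batch], Batch]


def strip_text(batch: Batch) -> Batch:
    """Hypothetical mapper-style step: trim whitespace from every text."""
    return {**batch, "text": [t.strip() for t in batch["text"]]}


def drop_short(batch: Batch) -> Batch:
    """Hypothetical filter-style step: keep rows whose text has 3+ characters."""
    keep = [i for i, t in enumerate(batch["text"]) if len(t) >= 3]
    return {key: [values[i] for i in keep] for key, values in batch.items()}


def run_fused(batch: Batch, ops: List[Op]) -> Batch:
    """Apply each step in order to the same batch, passing the result along."""
    for op in ops:
        batch = op(batch)
    return batch


batch = {"text": ["  cat on mat ", " a ", "dog"], "id": [1, 2, 3]}
result = run_fused(batch, [strip_text, drop_short])
```

Because every step receives the output of the previous one, the batch is handed from step to step in memory instead of being written out and reloaded between operators.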

__init__(image_pair_similarity_filter_args: Dict | None = {}, image_segment_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, *args, **kwargs)[source]

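Base class that conducts data editing. The parameters listed below do not appear by name in the signature above, so they are presumably accepted through **kwargs.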

Parameters:
  • text_key -- the key name of the field that stores the sample texts to be processed.

  • image_key -- the key name of the field that stores the sample image list to be processed.

  • audio_key -- the key name of the field that stores the sample audio list to be processed.

  • video_key -- the key name of the field that stores the sample video list to be processed.

  • image_bytes_key -- the key name of the field that stores the sample image bytes list to be processed.

  • query_key -- the key name of the field that stores the sample queries.

  • response_key -- the key name of the field that stores the responses.

  • history_key -- the key name of the field that stores the history of queries and responses.

process_single(samples, rank=None)[source]

Sample-level processing: sample --> sample.

Parameters:

samples -- the sample to process

Returns:

the processed sample