data_juicer.ops.op_fusion module
- data_juicer.ops.op_fusion.fuse_operators(ops, probe_res=None)
Fuse the input ops list and return the fused ops list.
- Parameters:
ops – the corresponding list of op objects.
probe_res – the probed speed for each OP from Monitor.
- Returns:
a list of fused op objects.
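A minimal sketch of the kind of pass fuse_operators performs, assuming that fusion applies to runs of consecutive filters and that non-filter OPs pass through unchanged. The Op dataclass, the fuse_group helper and the ordering rule (faster probed filters first) are hypothetical illustrations, not Data-Juicer's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Op:
    """Hypothetical stand-in for an OP object; not a Data-Juicer class."""
    name: str
    is_filter: bool = False


def fuse_group(group: List[Op]) -> List[Op]:
    """Hypothetical helper: collapse a run of 2+ filters into one op."""
    if len(group) < 2:
        return group
    names = ",".join(op.name for op in group)
    return [Op(name=f"fused({names})", is_filter=True)]


def fuse_operators_sketch(ops: List[Op],
                          probe_res: Optional[Dict[str, float]] = None
                          ) -> List[Op]:
    """Fuse each run of consecutive filters; other ops pass through."""
    speeds = probe_res or {}

    def flush(run: List[Op]) -> List[Op]:
        # Assumption: faster filters (higher probed speed) run first.
        # sorted() is stable, so unprobed filters keep their order.
        ordered = sorted(run, key=lambda op: speeds.get(op.name, 0.0),
                         reverse=True)
        return fuse_group(ordered)

    fused: List[Op] = []
    run: List[Op] = []
    for op in ops:
        if op.is_filter:
            run.append(op)
        else:
            fused.extend(flush(run))  # a non-filter ends the current run
            run = []
            fused.append(op)
    fused.extend(flush(run))  # flush a trailing run of filters
    return fused
```

Because a mapper or deduplicator can change the data that later filters see, the sketch never fuses across a non-filter OP; only filters that sit next to each other are merged.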
- data_juicer.ops.op_fusion.fuse_filter_group(original_filter_group)
Fuse single filter group and return the fused filter group.
- Parameters:
original_filter_group – the original filter group, including op definitions and objects.
- Returns:
the fused definitions and objects of the input filter group.
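Fusing a single filter group can be pictured as bucketing its filters by the intermediate result they share, so that result is computed once per sample. The sketch below assumes each filter needs at most one shared intermediate variable; the (name, inter_var) pairs and the labels "words" and "lines" are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def fuse_filter_group_sketch(
        filters: List[Tuple[str, Optional[str]]]) -> List[List[str]]:
    """Group filters by the intermediate variable they share.

    Each filter is a (name, inter_var) pair, where inter_var is a
    hypothetical label for an intermediate result (e.g. "words") or
    None if the filter shares nothing. Groups with two or more members
    are the candidates for fusion; every other filter stays alone.
    """
    shared: Dict[str, List[str]] = {}  # dicts keep insertion order
    standalone: List[List[str]] = []
    for name, inter_var in filters:
        if inter_var is None:
            standalone.append([name])
        else:
            shared.setdefault(inter_var, []).append(name)
    return list(shared.values()) + standalone
```

Note that this grouping reorders filters; that is only safe because, within one filter group, each filter's decision is assumed not to depend on the others.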
- class data_juicer.ops.op_fusion.FusedFilter(name: str, fused_filters: List)
Bases: Filter
A fused operator for filters.
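The idea behind a fused filter can be sketched as follows: all member filters first compute their statistics against a shared per-sample context, and the sample is kept only if every member keeps it. The two member filters, the context argument and all class names here are hypothetical and only loosely follow the compute-stats-then-decide shape of a filter.

```python
from typing import Any, Dict, List

Sample = Dict[str, Any]


class WordsNumFilter:
    """Hypothetical filter: keep samples with at least min_num words."""

    def __init__(self, min_num: int):
        self.min_num = min_num

    def compute_stats(self, sample: Sample, context: Dict[str, Any]) -> Sample:
        if "words" not in context:  # tokenize once, share via context
            context["words"] = sample["text"].split()
        sample["stats"]["num_words"] = len(context["words"])
        return sample

    def process(self, sample: Sample) -> bool:
        return sample["stats"]["num_words"] >= self.min_num


class AvgWordLenFilter:
    """Hypothetical filter: keep samples whose mean word length <= max_len."""

    def __init__(self, max_len: float):
        self.max_len = max_len

    def compute_stats(self, sample: Sample, context: Dict[str, Any]) -> Sample:
        if "words" not in context:  # reuse the words if already computed
            context["words"] = sample["text"].split()
        words = context["words"]
        sample["stats"]["avg_word_len"] = (
            sum(len(w) for w in words) / len(words) if words else 0.0)
        return sample

    def process(self, sample: Sample) -> bool:
        return sample["stats"]["avg_word_len"] <= self.max_len


class FusedFilterSketch:
    """Run several filters as one filter over a shared context."""

    def __init__(self, name: str, fused_filters: List[Any]):
        self.name = name
        self.fused_filters = fused_filters

    def compute_stats(self, sample: Sample) -> Sample:
        context: Dict[str, Any] = {}  # per-sample intermediate results
        for f in self.fused_filters:
            sample = f.compute_stats(sample, context)
        return sample

    def process(self, sample: Sample) -> bool:
        # A sample survives only if every member filter keeps it.
        return all(f.process(sample) for f in self.fused_filters)
```

In this sketch the text is split into words once per sample, even though two filters use the words.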
- class data_juicer.ops.op_fusion.GeneralFusedOP(batch_size: int = 1, fused_op_list: List = None, *args, **kwargs)
Bases: Mapper
An explicitly fused operator designed to execute multiple sequential operations (OPs) on the same batch, enabling fine-grained control over data processing.
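A rough sketch of "multiple sequential OPs on the same batch", assuming each OP can be modeled as a plain function from a columnar batch (field name to list of values) to a new batch. The step functions, the GeneralFusedOPSketch class and its process_batched method are hypothetical; batch_size is not modeled.

```python
from typing import Any, Callable, Dict, List

Batch = Dict[str, List[Any]]  # columnar batch: field name -> values
BatchOp = Callable[[Batch], Batch]


def strip_text(batch: Batch) -> Batch:
    """Hypothetical mapper step: strip whitespace from every text."""
    return {**batch, "text": [t.strip() for t in batch["text"]]}


def lowercase_text(batch: Batch) -> Batch:
    """Hypothetical mapper step: lowercase every text."""
    return {**batch, "text": [t.lower() for t in batch["text"]]}


def drop_empty(batch: Batch) -> Batch:
    """Hypothetical filter step: drop rows whose text is empty."""
    keep = [i for i, t in enumerate(batch["text"]) if t]
    return {k: [v[i] for i in keep] for k, v in batch.items()}


class GeneralFusedOPSketch:
    """Apply a list of batch-level steps, in order, to the same batch."""

    def __init__(self, fused_op_list: List[BatchOp]):
        self.fused_ops = list(fused_op_list)

    def process_batched(self, batch: Batch) -> Batch:
        for op in self.fused_ops:
            batch = op(batch)  # each step sees the previous step's output
        return batch
```

Unlike automatic fusion, the order here is exactly the order given in fused_op_list, which is what gives the caller fine-grained control.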
- __init__(batch_size: int = 1, fused_op_list: List = None, *args, **kwargs)
Base class that conducts data editing (docstring inherited from Mapper).
- Parameters:
text_key – the key name of the field that stores sample texts to be processed.
image_key – the key name of the field that stores sample image lists to be processed.
audio_key – the key name of the field that stores sample audio lists to be processed.
video_key – the key name of the field that stores sample video lists to be processed.
image_bytes_key – the key name of the field that stores sample image byte lists to be processed.
query_key – the key name of the field that stores sample queries.
response_key – the key name of the field that stores responses.
history_key – the key name of the field that stores the history of queries and responses.
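The *_key parameters let the same OP work on datasets whose fields have different names. A minimal sketch of that pattern, with hypothetical class names and illustrative default values (not necessarily the library's defaults); only text_key and image_key are shown, and the other keys would follow the same pattern.

```python
from typing import Any, Dict

Sample = Dict[str, Any]


class KeyedOpSketch:
    """Hypothetical base class storing configurable field names."""

    def __init__(self, text_key: str = "text", image_key: str = "images"):
        # Illustrative defaults; other *_key parameters work the same way.
        self.text_key = text_key
        self.image_key = image_key


class UpperCaseMapperSketch(KeyedOpSketch):
    """Hypothetical mapper that edits whatever field text_key names."""

    def process(self, sample: Sample) -> Sample:
        sample[self.text_key] = sample[self.text_key].upper()
        return sample
```

For example, UpperCaseMapperSketch(text_key="content") edits the "content" field and leaves any "text" field untouched.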