data_juicer.ops.filter.specified_field_filter module
- class data_juicer.ops.filter.specified_field_filter.SpecifiedFieldFilter(field_key: str = '', target_value: List = [], *args, **kwargs)
Bases: Filter
Filter based on specified field information.
If the value of the specified field in a sample is not among the specified target values, the sample is filtered out.
- __init__(field_key: str = '', target_value: List = [], *args, **kwargs)
Initialization method.
- Parameters:
field_key – The key of the field to filter on. Samples are filtered by the value stored under this key. For multi-level (nested) fields, separate the levels of the key with '.'.
target_value – The list of field values for which samples are retained.
args – extra args
kwargs – extra args
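The retention rule described above can be illustrated with a minimal standalone sketch. It does not call Data-Juicer and is not the library's implementation. The helper names `resolve_field` and `keep_sample` are hypothetical. Two behaviors are assumptions for illustration only: an empty `field_key` or `target_value` keeps every sample, and a list-valued field must have all of its elements in `target_value`.

```python
from typing import Any, List


def resolve_field(sample: dict, field_key: str) -> Any:
    """Walk a nested dict using a '.'-separated key such as 'meta.lang'."""
    value = sample
    for key in field_key.split("."):
        value = value[key]  # raises KeyError if a level is missing
    return value


def keep_sample(sample: dict, field_key: str, target_value: List) -> bool:
    """Return True if the sample should be retained."""
    # Assumption: with no key or no targets configured, nothing is filtered.
    if not field_key or not target_value:
        return True
    value = resolve_field(sample, field_key)
    # Assumption: treat a scalar as a one-element list so both cases share a path.
    values = value if isinstance(value, (list, tuple)) else [value]
    return all(v in target_value for v in values)


samples = [
    {"text": "a", "meta": {"lang": "en"}},
    {"text": "b", "meta": {"lang": "fr"}},
]
# Keep only samples whose nested field meta.lang is one of the target values.
kept = [s for s in samples if keep_sample(s, "meta.lang", ["en", "zh"])]
```

Here `kept` contains only the first sample, because `"fr"` is not among the target values.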
- compute_stats_single(sample)
Compute stats for the sample, which are used as metrics to decide whether to filter this sample.
- Parameters:
sample – input sample.
context – whether to store context information of intermediate vars in the sample temporarily.
- Returns:
sample with computed stats