data_juicer.ops.filter.text_action_filter module
- class data_juicer.ops.filter.text_action_filter.TextActionFilter(lang: str = 'en', min_action_num: int = 1, *args, **kwargs)
Bases: Filter
Filter that keeps texts containing actions.
- __init__(lang: str = 'en', min_action_num: int = 1, *args, **kwargs)
Initialization method.
- Parameters:
lang – language of the text in the samples: 'en' to detect actions in English, 'zh' to detect actions in Chinese.
min_action_num – minimum number of actions required. Samples whose text contains fewer actions than this value are filtered out.
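The threshold rule described for `min_action_num` can be sketched as a plain predicate. This is a minimal illustration, not the operator's actual code: `keeps_sample` is a hypothetical helper, and it assumes the action count has already been computed for the sample.

```python
def keeps_sample(num_actions: int, min_action_num: int = 1) -> bool:
    """Hypothetical helper: return True when the sample survives the filter.

    A sample is dropped when its action count is below min_action_num,
    so it is kept when the count is greater than or equal to it.
    """
    return num_actions >= min_action_num


# With the default threshold of 1, a text with no actions is dropped.
print(keeps_sample(0))                    # False
print(keeps_sample(1))                    # True
print(keeps_sample(1, min_action_num=2))  # False
```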
- compute_stats_single(sample, context=False)
Compute stats for the sample which is used as a metric to decide whether to filter this sample.
- Parameters:
sample – input sample.
context – whether to temporarily store context information of intermediate variables in the sample.
- Returns:
sample with computed stats
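The general shape of a stats-computing step (read the text, count actions, store the count on the sample, return the sample) might look like the sketch below. Everything in it is an assumption made for illustration: the function name, the `"text"` and `"stats"` keys, the `"num_action"` stat name, and the toy word list that stands in for real language-aware action detection. The `context` argument is omitted.

```python
# Hypothetical stand-in for a real language-aware tagger: a tiny lexicon
# of action words. The real operator does not use this list.
TOY_ACTION_WORDS = {"run", "runs", "jump", "jumps", "open", "opens", "eat", "eats"}


def compute_stats_single_sketch(sample: dict, text_key: str = "text") -> dict:
    """Illustrative only: attach an action count to the sample and return it."""
    stats = sample.setdefault("stats", {})  # assumed location for stats
    if "num_action" in stats:               # already computed, nothing to do
        return sample
    tokens = sample[text_key].lower().split()
    # Count tokens that match the toy lexicon after trimming punctuation.
    stats["num_action"] = sum(
        token.strip(".,!?") in TOY_ACTION_WORDS for token in tokens
    )
    return sample


sample = compute_stats_single_sketch({"text": "The dog runs and jumps."})
print(sample["stats"]["num_action"])  # 2
```

The stored count is the metric that a later filtering step would compare against `min_action_num`.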