data_juicer.ops.filter.perplexity_filter module
- class data_juicer.ops.filter.perplexity_filter.PerplexityFilter(lang: str = 'en', max_ppl: float = 1500, *args, **kwargs)[source]
Bases: Filter
Filter to keep samples whose perplexity score is below a specified maximum value.
- __init__(lang: str = 'en', max_ppl: float = 1500, *args, **kwargs)[source]
Initialization method.
- Parameters:
lang -- The language of the samples for which perplexity is computed.
max_ppl -- The maximum perplexity allowed by this op; samples whose perplexity exceeds this value are filtered out.
args -- extra positional arguments
kwargs -- extra keyword arguments
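The thresholding behaviour described by max_ppl can be illustrated with a minimal, self-contained sketch. It does not use Data-Juicer itself: the helpers `perplexity` and `keep_sample` and the per-token log-probabilities are hypothetical, chosen only to show how a perplexity score relates to the cutoff. Keeping samples whose score equals max_ppl exactly is an assumption; the description above only says that samples exceeding the value are filtered.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities (hypothetical helper)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Perplexity is the exponential of the average negative log-likelihood.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def keep_sample(ppl, max_ppl=1500):
    """Mirror the documented rule: drop a sample only if its perplexity exceeds max_ppl.

    Keeping the sample at exact equality is an assumption of this sketch.
    """
    return ppl <= max_ppl


# Made-up language-model scores for two samples (illustration only).
fluent_log_probs = [math.log(0.25)] * 4    # each token fairly likely
noisy_log_probs = [math.log(0.0001)] * 4   # each token very unlikely

fluent_ppl = perplexity(fluent_log_probs)
noisy_ppl = perplexity(noisy_log_probs)

print(keep_sample(fluent_ppl))  # low perplexity, kept
print(keep_sample(noisy_ppl))   # perplexity above 1500, filtered out
```

With the real operator, the equivalent cutoff would be set through the documented constructor arguments, for example `PerplexityFilter(lang='en', max_ppl=1500)`.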