data_juicer.ops.filter.text_length_filter module
- class data_juicer.ops.filter.text_length_filter.TextLengthFilter(min_len: int = 10, max_len: int = 9223372036854775807, *args, **kwargs)
Bases: Filter
Filter to keep samples whose total text length is within a specific range.
- __init__(min_len: int = 10, max_len: int = 9223372036854775807, *args, **kwargs)
Initialization method.
- Parameters:
min_len – The minimum text length for the filter. Samples are filtered out if their text length is below this value.
max_len – The maximum text length for the filter. Samples are filtered out if their text length exceeds this value.
args – extra positional args
kwargs – extra keyword args
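
The parameter descriptions above imply an inclusive range check: a sample is kept when its text length is at least min_len and at most max_len. The following is a minimal standalone sketch of that rule, not the library's implementation. The helper name keep_by_text_length and the sample strings are hypothetical, and measuring length with len() (character count) is an assumption. The default max_len shown in the signature, 9223372036854775807, equals sys.maxsize on 64-bit platforms, so the sketch uses sys.maxsize as its default.

```python
import sys


# Hypothetical stand-in for the keep/drop rule described above;
# this is NOT the TextLengthFilter source code.
def keep_by_text_length(text: str, min_len: int = 10,
                        max_len: int = sys.maxsize) -> bool:
    """Return True when len(text) lies in the inclusive range [min_len, max_len]."""
    # Assumption: "text length" means the number of characters, i.e. len(text).
    return min_len <= len(text) <= max_len


# Illustrative samples (made up for this sketch).
samples = ["short", "exactly 10", "a considerably longer piece of text"]

# Keep only samples whose length is between 10 and 20 characters.
kept = [s for s in samples if keep_by_text_length(s, min_len=10, max_len=20)]
print(kept)
```

With min_len=10 and max_len=20, the five-character sample falls below the lower bound and the long sample exceeds the upper bound, so only the ten-character sample is kept.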