data_juicer.ops.filter.special_characters_filter module
- class data_juicer.ops.filter.special_characters_filter.SpecialCharactersFilter(min_ratio: float = 0.0, max_ratio: float = 0.25, *args, **kwargs)
Bases: Filter
Filter to keep samples whose special-char ratio is within a specific range.
- __init__(min_ratio: float = 0.0, max_ratio: float = 0.25, *args, **kwargs)
Initialization method.
- Parameters:
min_ratio – The minimum special-char ratio for this op. Samples are filtered out if their special-char ratio is below this value.
max_ratio – The maximum special-char ratio for this op. Samples are filtered out if their special-char ratio exceeds this value.
args – extra positional args
kwargs – extra keyword args
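The range check described by min_ratio and max_ratio can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper names `special_char_ratio` and `keep_sample` are hypothetical, and the definition of a "special" character used here (anything that is neither alphanumeric nor whitespace) is an assumption that may differ from the character set the op actually uses.

```python
def special_char_ratio(text: str) -> float:
    """Return the fraction of characters in `text` counted as special.

    Assumption for this sketch: a character is "special" if it is
    neither alphanumeric nor whitespace. The real op may use a
    different character set.
    """
    if not text:
        return 0.0  # avoid division by zero on empty samples
    special = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return special / len(text)


def keep_sample(text: str, min_ratio: float = 0.0, max_ratio: float = 0.25) -> bool:
    """Keep the sample only if its ratio lies in [min_ratio, max_ratio]."""
    return min_ratio <= special_char_ratio(text) <= max_ratio


if __name__ == "__main__":
    for sample in ["plain words only", "ab!!", "$$$ ### !!!"]:
        print(repr(sample), round(special_char_ratio(sample), 3), keep_sample(sample))
```

With the default bounds, a sample is dropped only when more than a quarter of its characters are special under the assumed definition. Raising min_ratio above 0.0 additionally drops samples with too few special characters.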