data_juicer.ops.mapper.remove_non_chinese_character_mapper module
- class data_juicer.ops.mapper.remove_non_chinese_character_mapper.RemoveNonChineseCharacterlMapper(keep_alphabet: bool = True, keep_number: bool = True, keep_punc: bool = True, *args, **kwargs)
Bases: Mapper
Mapper to remove non-Chinese characters in text samples.
- __init__(keep_alphabet: bool = True, keep_number: bool = True, keep_punc: bool = True, *args, **kwargs)
Initialization method.
- Parameters:
keep_alphabet – whether to keep alphabetic characters
keep_number – whether to keep digits
keep_punc – whether to keep punctuation
args – extra positional args
kwargs – extra keyword args
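
The effect of the three flags can be illustrated with a small standalone sketch. This is not the library's implementation: the function name `remove_non_chinese`, the punctuation set `_PUNCT`, and the choice of the `\u4e00-\u9fa5` range for Chinese characters are all assumptions made for illustration, and the mapper's actual character sets may differ.

```python
import re

# Hypothetical punctuation set, chosen only for this illustration.
_PUNCT = ".,!?;:，。！？；：、"


def remove_non_chinese(text, keep_alphabet=True, keep_number=True, keep_punc=True):
    """Drop every character that is not Chinese or explicitly kept.

    Assumption: "Chinese" means the common CJK range U+4E00..U+9FA5.
    """
    allowed = "\u4e00-\u9fa5"
    if keep_alphabet:
        allowed += "A-Za-z"  # keep ASCII letters
    if keep_number:
        allowed += "0-9"  # keep ASCII digits
    if keep_punc:
        allowed += re.escape(_PUNCT)  # keep the assumed punctuation set
    # Remove every run of characters outside the allowed class.
    return re.sub(f"[^{allowed}]+", "", text)


if __name__ == "__main__":
    print(remove_non_chinese("你好 world 123!"))
    print(remove_non_chinese("你好 world 123!", keep_alphabet=False,
                             keep_number=False, keep_punc=False))
```

In this sketch whitespace is never in the allowed set, so it is always removed; whether the real mapper treats spaces the same way should be checked against its source.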