data_juicer.ops.mapper.remove_comments_mapper module

class data_juicer.ops.mapper.remove_comments_mapper.RemoveCommentsMapper(doc_type: str | List[str] = 'tex', inline: bool = True, multiline: bool = True, *args, **kwargs)

Bases: Mapper

Mapper to remove comments in different kinds of documents.

Only 'tex' is supported for now.

__init__(doc_type: str | List[str] = 'tex', inline: bool = True, multiline: bool = True, *args, **kwargs)

Initialization method.

Parameters:
  • doc_type – Type of document, or list of types, to remove comments from.

  • inline – Whether to remove inline comments.

  • multiline – Whether to remove multiline comments.

  • args – Extra positional arguments.

  • kwargs – Extra keyword arguments.
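
To make the inline and multiline flags concrete, here is a minimal, self-contained sketch of what removing TeX comments could look like. It is not the library's implementation: the helper name remove_tex_comments and both regular expressions are assumptions chosen for illustration. It also assumes that "inline" means a `%` comment trailing other content on a line, and that "multiline" means lines that consist entirely of a comment.

```python
import re


def remove_tex_comments(text: str, inline: bool = True, multiline: bool = True) -> str:
    """Hypothetical helper illustrating the two flags; not data_juicer code."""
    if inline:
        # Remove an unescaped '%' and the rest of the line, but only when the
        # '%' follows other content on that line (so '\%' and lines that
        # start with '%' are left alone by this step).
        text = re.sub(r"(?<=[^\\\n])%.*$", "", text, flags=re.MULTILINE)
    if multiline:
        # Remove whole lines that begin with '%', including their newline,
        # which also collapses runs of consecutive comment lines.
        text = re.sub(r"^%.*\n?", "", text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    sample = "x = 1 % inline note\n% whole-line comment\n50\\% kept\n"
    print(remove_tex_comments(sample))
    print(remove_tex_comments(sample, inline=False))
```

Note that this simple pattern treats any `%` preceded by a backslash as escaped, so a comment directly after a literal `\\` would not be stripped. The sketch also leaves behind any whitespace that preceded an inline comment.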

process_batched(samples)