Data Juicer

data_juicer.ops.mapper.clean_links_mapper module

class data_juicer.ops.mapper.clean_links_mapper.CleanLinksMapper(pattern: str | None = None, repl: str = '', *args, **kwargs)

Bases: Mapper

Mapper that cleans links (such as http, https, and ftp URLs) in text samples.

__init__(pattern: str | None = None, repl: str = '', *args, **kwargs)

Initialization method.

Parameters:
  • pattern -- regular expression pattern to search for within the text. Defaults to None.

  • repl -- replacement string. Defaults to an empty string.

  • args -- extra positional arguments

  • kwargs -- extra keyword arguments
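
As a rough illustration of how `pattern` and `repl` interact, the sketch below applies Python's `re.sub` with a simplified URL pattern. It does not call Data-Juicer itself. `SIMPLE_LINK_PATTERN` and `clean_links` are hypothetical names introduced here, and the pattern the operator falls back to when `pattern` is None is presumably broader than this one.

```python
import re

# Hypothetical, simplified pattern (an assumption, not the library default):
# matches links that start with http://, https:// or ftp://.
SIMPLE_LINK_PATTERN = r"(?:https?|ftp)://\S+"


def clean_links(text: str, pattern: str = SIMPLE_LINK_PATTERN, repl: str = "") -> str:
    """Replace every match of `pattern` in `text` with `repl`."""
    return re.sub(pattern, repl, text)


# With the default empty `repl`, the link is simply dropped.
print(clean_links("See https://example.com/docs for details."))

# A non-empty `repl` leaves a placeholder where the link used to be.
print(clean_links("Mirror: ftp://files.example.org/data.zip", repl="<LINK>"))
```

Note that removing a link with an empty `repl` leaves the surrounding whitespace untouched, so a separate whitespace-normalization step may be wanted afterwards.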

process_batched(samples)
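
This page gives no description for `process_batched`. One plausible reading is sketched below, assuming that a batch is a dict mapping a column name to a list of values and that the text lives under a `"text"` key. Both are assumptions not confirmed on this page, and `process_batched_sketch` is a hypothetical stand-in rather than the library's method.

```python
import re


def process_batched_sketch(samples: dict, pattern: str, repl: str = "",
                           text_key: str = "text") -> dict:
    """Apply the link replacement to every text in a batch (illustrative only)."""
    compiled = re.compile(pattern)
    # Rewrite each text in the batch column; other columns are left untouched.
    samples[text_key] = [compiled.sub(repl, text) for text in samples[text_key]]
    return samples


# Assumed batch layout: one key per column, one list entry per sample.
batch = {"text": ["Visit http://a.example now", "no links here"], "id": [1, 2]}
result = process_batched_sketch(batch, r"(?:https?|ftp)://\S+")
print(result["text"])
```

Processing a whole column at once lets the compiled pattern be reused across samples instead of being rebuilt for each one.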

© Copyright 2024, Data-Juicer Team.

Created using Sphinx 9.0.4.

Built with the PyData Sphinx Theme 0.16.1.