Data Juicer

data_juicer.tools.op_search module

Operator Searcher - A tool for filtering Data-Juicer operators by tags

class data_juicer.tools.op_search.OPRecord(op_type: str, name: str, desc: str, tags: List[str], sig: Signature, param_desc: str)

Bases: object

A record class for storing operator metadata

__init__(op_type: str, name: str, desc: str, tags: List[str], sig: Signature, param_desc: str)
to_dict()
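OPRecord is a plain metadata container. The sketch below uses a hypothetical stand-in, OPRecordSketch (not part of Data-Juicer), that mirrors the constructor fields and shows one plausible shape for to_dict(): the inspect.Signature is rendered as a string so the result stays JSON-serializable. The dictionary key names and the example tags are assumptions, not the real to_dict() output.

```python
import inspect
from dataclasses import dataclass
from typing import List


@dataclass
class OPRecordSketch:
    """Hypothetical stand-in mirroring OPRecord's constructor fields."""
    op_type: str
    name: str
    desc: str
    tags: List[str]
    sig: inspect.Signature
    param_desc: str

    def to_dict(self) -> dict:
        # Signature objects are not JSON-serializable, so render them as text.
        # Key names here are assumptions for illustration.
        return {
            "type": self.op_type,
            "name": self.name,
            "desc": self.desc,
            "tags": list(self.tags),
            "sig": str(self.sig),
            "param_desc": self.param_desc,
        }


def example_init(self, lang: str = "en", min_score: float = 0.8):
    """Placeholder __init__ whose signature gets recorded."""


rec = OPRecordSketch(
    op_type="filter",
    name="language_id_score_filter",
    desc="Keep samples in the specified language.",
    tags=["cpu", "text"],  # illustrative tags only
    sig=inspect.signature(example_init),
    param_desc="lang: target language",
)
print(rec.to_dict()["sig"])  # prints (self, lang: str = 'en', min_score: float = 0.8)
```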
data_juicer.tools.op_search.analyze_modality_tag(code, op_prefix)

Analyze the modality tag for the given code content string. The returned tag should be one of the “Modality Tags” in tagging_mappings.json. The choice is made by finding usages of the {modality}_key attributes and by the prefix of the OP name. If multiple modality keys are used, the ‘multimodal’ tag is returned instead.
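A minimal sketch of the modality rule, assuming the modality names text, image, audio and video, and assuming that the OP name prefix counts as one more modality signal alongside the {modality}_key usages. The function guess_modality_tag is hypothetical; the real tag vocabulary comes from tagging_mappings.json.

```python
import re
from typing import Optional

# Assumed modality names; the real list lives in tagging_mappings.json.
MODALITIES = ("text", "image", "audio", "video")


def guess_modality_tag(code: str, op_prefix: str) -> Optional[str]:
    """Hypothetical re-implementation of the rule described above."""
    # Modalities whose `<modality>_key` attribute appears in the source.
    used = {m for m in MODALITIES if re.search(rf"\b{m}_key\b", code)}
    # The OP name prefix (e.g. "image" in "image_aspect_ratio_filter") also counts.
    if op_prefix in MODALITIES:
        used.add(op_prefix)
    # More than one modality in play means the OP is multimodal.
    if len(used) > 1:
        return "multimodal"
    return used.pop() if used else None
```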

data_juicer.tools.op_search.analyze_resource_tag(code)

Analyze the resource tag for the given code content string. The returned tag should be one of the “Resource Tags” in tagging_mappings.json. The choice is made according to the value assigned to the _accelerator attribute.
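The resource rule can be approximated with Python's ast module. This hypothetical guess_resource_tag assumes that assigning the string 'cuda' to _accelerator marks a GPU operator and that the tag names are 'gpu' and 'cpu'; the real names are defined in tagging_mappings.json.

```python
import ast


def guess_resource_tag(code: str) -> str:
    """Hypothetical sketch: inspect assignments to `_accelerator`."""
    tag = "cpu"  # assumed default when no GPU accelerator is assigned
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            # Matches both `_accelerator = ...` and `self._accelerator = ...`.
            if isinstance(target, ast.Attribute):
                name = target.attr
            else:
                name = getattr(target, "id", None)
            if (
                name == "_accelerator"
                and isinstance(node.value, ast.Constant)
                and node.value.value == "cuda"
            ):
                tag = "gpu"
    return tag
```

Parsing with ast rather than a regex means string literals and comments that merely mention _accelerator are not mistaken for assignments.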

data_juicer.tools.op_search.analyze_model_tags(code)

Analyze the model tags for the given code content string. The returned tags should be among the “Model Tags” in tagging_mappings.json. The choice is made by finding the model_type argument in prepare_model calls.
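A similar ast-based sketch for model tags: the hypothetical find_model_types collects string literals passed as the model_type keyword to any call named prepare_model. Mapping those raw strings onto the “Model Tags” vocabulary is omitted, since that mapping lives in tagging_mappings.json.

```python
import ast
from typing import List


def find_model_types(code: str) -> List[str]:
    """Hypothetical sketch: collect `model_type` values passed to prepare_model."""
    found = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both `prepare_model(...)` and `some_module.prepare_model(...)`.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name != "prepare_model":
            continue
        # Only keyword arguments with literal values are recognized here.
        for kw in node.keywords:
            if kw.arg == "model_type" and isinstance(kw.value, ast.Constant):
                found.append(kw.value.value)
    return found
```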

data_juicer.tools.op_search.analyze_tag_with_inheritance(op_cls, analyze_func, default_tags=[], other_parm={})

Generic tag analysis along the inheritance chain of op_cls, using the given analyze_func.
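One plausible reading of the inheritance-chain analysis, sketched under assumptions: walk the class's MRO from the class itself toward its bases and return the first non-empty result, falling back to the default tags. The function tags_with_inheritance is hypothetical, and here analyze_func receives a class rather than source code so the example stays self-contained; the real function's traversal order and stopping rule may differ.

```python
from typing import Callable, List, Optional


def tags_with_inheritance(
    op_cls: type,
    analyze_func: Callable[[type], List[str]],
    default_tags: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical sketch: walk the MRO until some class yields tags."""
    for klass in op_cls.__mro__:
        if klass is object:
            continue
        tags = analyze_func(klass)
        if tags:
            # The closest class that produces tags wins.
            return tags
    # Build a fresh list so callers never share one mutable default.
    return list(default_tags or [])


class Base:
    TAGS = ["gpu"]


class Child(Base):
    pass


class Plain:
    pass


def own_tags(klass: type) -> List[str]:
    # Look only at attributes defined directly on this class.
    return list(vars(klass).get("TAGS", []))
```

The sketch uses None instead of a mutable [] default, which avoids sharing one list object across calls.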

data_juicer.tools.op_search.analyze_tag_from_cls(op_cls, op_name)

Analyze the tags for the OP from the given class.

data_juicer.tools.op_search.extract_param_docstring(docstring)

Extract parameter descriptions from __init__ method docstring.
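A minimal sketch of parameter extraction, assuming reST-style `:param name: description` fields with indented continuation lines. The function parse_param_lines is hypothetical and returns a dict for clarity; since OPRecord stores param_desc as a str, the real helper may return formatted text instead.

```python
import re
from typing import Dict


def parse_param_lines(docstring: str) -> Dict[str, str]:
    """Hypothetical sketch: map `:param name: text` fields to their text."""
    params: Dict[str, str] = {}
    current = None
    for raw in docstring.splitlines():
        line = raw.strip()
        match = re.match(r":param\s+(\w+):\s*(.*)", line)
        if match:
            current = match.group(1)
            params[current] = match.group(2)
        elif line.startswith(":"):
            # Another field such as :return: ends the current parameter.
            current = None
        elif current and line:
            # Indented continuation lines belong to the previous parameter.
            params[current] += " " + line
    return params
```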

class data_juicer.tools.op_search.OPSearcher(specified_op_list: List[str] | None = None, include_formatter: bool = False)

Bases: object

Operator search engine

__init__(specified_op_list: List[str] | None = None, include_formatter: bool = False)
search(tags: List[str] | None = None, op_type: str | None = None, match_all: bool = True) → List[Dict]

Search operators by criteria.

Parameters:
  • tags: List of tags to match
  • op_type: Operator type (mapper/filter/etc.)
  • match_all: True requires matching all tags; False matches any tag

Returns: List of matched operator records
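The matching semantics of search() can be illustrated with a standalone filter over plain dictionaries. The function filter_records and the record keys "type" and "tags" are hypothetical; the point is the match_all switch, which acts as a subset test when True and a non-empty intersection test when False.

```python
from typing import Dict, List, Optional


def filter_records(
    records: List[Dict],
    tags: Optional[List[str]] = None,
    op_type: Optional[str] = None,
    match_all: bool = True,
) -> List[Dict]:
    """Hypothetical sketch of the filtering rule search() documents."""
    results = []
    for rec in records:
        if op_type is not None and rec["type"] != op_type:
            continue
        if tags:
            wanted = set(tags)
            have = set(rec["tags"])
            # match_all=True: every requested tag must be present (subset test);
            # match_all=False: at least one requested tag must be present.
            ok = wanted <= have if match_all else bool(wanted & have)
            if not ok:
                continue
        results.append(rec)
    return results


records = [
    {"name": "a_mapper", "type": "mapper", "tags": ["text", "cpu"]},
    {"name": "b_filter", "type": "filter", "tags": ["image", "gpu"]},
    {"name": "c_filter", "type": "filter", "tags": ["text", "gpu"]},
]
```

Against the documented API, the corresponding call would be OPSearcher().search(tags=["gpu"], op_type="filter", match_all=True); which tag strings are valid depends on tagging_mappings.json.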

property records_map
data_juicer.tools.op_search.main(tags, op_type)

© Copyright 2024, Data-Juicer Team.