data_juicer_agents.tools.retrieve.retrieve_operators

retrieve_operators tool package.

class data_juicer_agents.tools.retrieve.retrieve_operators.GenericOutput(*, ok: bool = True)

Bases: BaseModel

ok: bool
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class data_juicer_agents.tools.retrieve.retrieve_operators.RetrieveOperatorsInput(*, intent: str, top_k: Annotated[int, Ge(ge=1)] = 10, mode: str = 'auto', dataset_path: str = '')

Bases: BaseModel

intent: str
top_k: int
mode: str
dataset_path: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

data_juicer_agents.tools.retrieve.retrieve_operators.extract_candidate_names(payload: Dict[str, Any]) -> List[str]

data_juicer_agents.tools.retrieve.retrieve_operators.get_available_operator_names() -> Set[str]

Return installed Data-Juicer operator names.

Empty set means metadata is currently unavailable.
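
Because an empty set signals "unknown" rather than "nothing installed", a caller may want to skip validation in that case instead of rejecting every name. The sketch below illustrates one way to do that. `filter_installed` is a hypothetical helper and not part of this package, and the set of available names is passed in explicitly so the example runs without Data-Juicer installed.

```python
from typing import Iterable, List, Set


def filter_installed(candidates: Iterable[str], available: Set[str]) -> List[str]:
    """Keep only installed names, unless operator metadata is unavailable.

    Hypothetical helper: `available` stands in for the result of
    get_available_operator_names().
    """
    names = list(candidates)
    if not available:
        # Empty set: metadata is unavailable, so nothing can be ruled out.
        return names
    return [name for name in names if name in available]
```

With `available = set()` the candidates pass through unchanged; with a non-empty set, only names present in it are kept.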

data_juicer_agents.tools.retrieve.retrieve_operators.resolve_operator_name(name: str, available_ops: Iterable[str] | None = None) -> str

Resolve a model-produced operator name to its installed canonical name.

Resolution strategy is generic (not workflow-specific):

  1. Exact match.
  2. Case-insensitive match.
  3. Alnum-normalized match (e.g. DocumentMinHashDeduplicator -> document_minhash_deduplicator).
  4. Closest normalized match with a strict similarity cutoff.
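
The four resolution steps above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation: `resolve_name_sketch`, the `0.9` cutoff, the use of `difflib.get_close_matches` for the final step, and returning the input unchanged when nothing matches are all assumptions made for illustration.

```python
import difflib
import re
from typing import Iterable


def _normalize(name: str) -> str:
    """Lowercase the name and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def resolve_name_sketch(name: str, available_ops: Iterable[str], cutoff: float = 0.9) -> str:
    """Illustrative four-step resolution; cutoff value is an assumption."""
    ops = list(available_ops)
    raw = name.strip()

    # 1. Exact match.
    if raw in ops:
        return raw

    # 2. Case-insensitive match.
    by_lower = {op.lower(): op for op in ops}
    if raw.lower() in by_lower:
        return by_lower[raw.lower()]

    # 3. Alnum-normalized match (CamelCase vs. snake_case, stray punctuation).
    by_normalized = {_normalize(op): op for op in ops}
    key = _normalize(raw)
    if key in by_normalized:
        return by_normalized[key]

    # 4. Closest normalized match above a strict similarity cutoff.
    close = difflib.get_close_matches(key, list(by_normalized), n=1, cutoff=cutoff)
    if close:
        return by_normalized[close[0]]

    # Assumption: fall back to the input when no candidate is close enough.
    return raw
```

Ordering the steps from strictest to loosest means a fuzzy match is only attempted once every deterministic comparison has failed, which keeps near-miss names from shadowing an exact one.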

data_juicer_agents.tools.retrieve.retrieve_operators.retrieve_operator_candidates(intent: str, top_k: int = 10, mode: str = 'auto', dataset_path: str | None = None) -> Dict[str, Any]

Retrieve operators and return a structured payload for CLI/agent usage.