data_juicer_agents.tools.retrieve.retrieve_operators
retrieve_operators tool package.
- class data_juicer_agents.tools.retrieve.retrieve_operators.GenericOutput(*, ok: bool = True)
Bases: BaseModel
- ok: bool
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class data_juicer_agents.tools.retrieve.retrieve_operators.RetrieveOperatorsInput(*, intent: str, top_k: Annotated[int, Ge(ge=1)] = 10, mode: str = 'auto', dataset_path: str = '')
Bases: BaseModel
- intent: str
- top_k: int
- mode: str
- dataset_path: str
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
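The sketch below mirrors RetrieveOperatorsInput with a standard-library dataclass so it runs without pydantic. The class name RetrieveOperatorsInputSketch is hypothetical. It reproduces only the documented defaults and the top_k >= 1 constraint; the real class is a pydantic BaseModel with keyword-only fields that reports violations as a validation error. Accepted values for mode other than the default 'auto' are not documented here, so the sketch does not check them.

```python
from dataclasses import dataclass


@dataclass
class RetrieveOperatorsInputSketch:
    """Hypothetical stdlib stand-in mirroring the documented fields and defaults."""

    intent: str
    top_k: int = 10
    mode: str = "auto"  # other valid modes are not documented; not validated here
    dataset_path: str = ""

    def __post_init__(self) -> None:
        # Mirrors the Annotated[int, Ge(ge=1)] constraint on top_k.
        if self.top_k < 1:
            raise ValueError(f"top_k must be >= 1, got {self.top_k}")


req = RetrieveOperatorsInputSketch(intent="remove near-duplicate documents")
print(req.top_k, req.mode, repr(req.dataset_path))  # 10 auto ''
```

Only intent is required; every other field falls back to its documented default.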
- data_juicer_agents.tools.retrieve.retrieve_operators.extract_candidate_names(payload: Dict[str, Any]) -> List[str]
- data_juicer_agents.tools.retrieve.retrieve_operators.get_available_operator_names() -> Set[str]
Return the names of the installed Data-Juicer operators.
An empty set means the operator metadata is currently unavailable.
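Because an empty result signals missing metadata rather than "no operators installed", a caller may want to treat it as "unknown" instead of rejecting every candidate. The helper filter_installed below is hypothetical and not part of the package; it is a minimal sketch assuming the caller already holds the set returned by get_available_operator_names().

```python
from typing import Iterable, List, Set


def filter_installed(candidates: Iterable[str], available: Set[str]) -> List[str]:
    """Hypothetical helper: keep candidates that are known to be installed."""
    # An empty set means metadata is unavailable, so validity cannot be checked;
    # pass every candidate through instead of dropping them all.
    if not available:
        return list(candidates)
    return [name for name in candidates if name in available]


print(filter_installed(["clean_html_mapper", "made_up_op"], set()))
# ['clean_html_mapper', 'made_up_op']
print(filter_installed(["clean_html_mapper", "made_up_op"], {"clean_html_mapper"}))
# ['clean_html_mapper']
```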
- data_juicer_agents.tools.retrieve.retrieve_operators.resolve_operator_name(name: str, available_ops: Iterable[str] | None = None) -> str
Resolve a model-produced operator name to the installed canonical name.
Resolution strategy is generic (not workflow-specific):
1) Exact match.
2) Case-insensitive match.
3) Alnum-normalized match (e.g. DocumentMinHashDeduplicator -> document_minhash_deduplicator).
4) Closest normalized match with a strict similarity cutoff.