data_juicer_agents.tools.retrieve.retrieve_operators.operator_registry module

Installed-operator lookup utilities for retrieve tools.

data_juicer_agents.tools.retrieve.retrieve_operators.operator_registry.get_available_operator_names() -> Set[str]

Return installed Data-Juicer operator names.

An empty set means operator metadata is currently unavailable.

data_juicer_agents.tools.retrieve.retrieve_operators.operator_registry.resolve_operator_name(name: str, available_ops: Iterable[str] | None = None) -> str

Resolve a model-produced operator name to its installed canonical name.

Resolution strategy is generic (not workflow-specific):

  1. Exact match.
  2. Case-insensitive match.
  3. Alnum-normalized match (e.g. DocumentMinHashDeduplicator -> document_minhash_deduplicator).
  4. Closest normalized match with a strict similarity cutoff.
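
The four resolution steps can be illustrated with a minimal standalone sketch. This is not the module's actual implementation: the names `sketch_resolve` and `_normalize`, the use of `difflib` for the closest-match step, the `0.9` cutoff, and the choice to return the input unchanged when nothing matches are all assumptions made for illustration.

```python
import difflib
import re
from typing import Iterable


def _normalize(name: str) -> str:
    """Hypothetical helper: lowercase and drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def sketch_resolve(name: str, available_ops: Iterable[str], cutoff: float = 0.9) -> str:
    """Illustrative four-step resolution; the cutoff value is an assumption."""
    ops = sorted(set(available_ops))

    # 1) Exact match.
    if name in ops:
        return name

    # 2) Case-insensitive match.
    by_lower = {op.lower(): op for op in ops}
    if name.lower() in by_lower:
        return by_lower[name.lower()]

    # 3) Alnum-normalized match, e.g. "DocumentMinHashDeduplicator"
    #    and "document_minhash_deduplicator" share the same key.
    by_norm = {_normalize(op): op for op in ops}
    key = _normalize(name)
    if key in by_norm:
        return by_norm[key]

    # 4) Closest normalized match, accepted only above a strict cutoff.
    close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
    if close:
        return by_norm[close[0]]

    # Assumption: fall back to the input unchanged when nothing matches.
    return name


installed = {"document_minhash_deduplicator", "text_length_filter"}
print(sketch_resolve("DocumentMinHashDeduplicator", installed))
# prints "document_minhash_deduplicator"
```

Ordering the steps from strictest to loosest means an exact or near-exact name never gets replaced by a fuzzy guess, and the high cutoff keeps step 4 from mapping an unrelated name onto an installed operator.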