data_juicer_agents.tools.retrieve.retrieve_operators.operator_registry module
Installed-operator lookup utilities for retrieve tools.
- data_juicer_agents.tools.retrieve.retrieve_operators.operator_registry.get_available_operator_names() -> Set[str]
Return installed Data-Juicer operator names.
An empty set means operator metadata is currently unavailable.
- data_juicer_agents.tools.retrieve.retrieve_operators.operator_registry.resolve_operator_name(name: str, available_ops: Iterable[str] | None = None) -> str
Resolve a model-produced operator name to its installed canonical name.
Resolution strategy is generic (not workflow-specific):
1) Exact match.
2) Case-insensitive match.
3) Alnum-normalized match (e.g. DocumentMinHashDeduplicator -> document_minhash_deduplicator).
4) Closest normalized match with a strict similarity cutoff.
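The four-step strategy above can be illustrated with a minimal, self-contained sketch. This is not the module's actual implementation: the function name resolve_name_sketch, the helper _alnum_key, the use of difflib for the closest-match step, the 0.9 cutoff, and the choice to return the input unchanged when nothing matches are all assumptions made for illustration.

```python
import difflib
import re
from typing import Iterable


def _alnum_key(name: str) -> str:
    """Hypothetical helper: lowercase and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def resolve_name_sketch(
    name: str,
    available_ops: Iterable[str],
    cutoff: float = 0.9,  # assumed value; the real cutoff is not documented here
) -> str:
    """Illustrative resolver following the documented four-step order."""
    ops = sorted(available_ops)

    # 1) Exact match.
    if name in ops:
        return name

    # 2) Case-insensitive match.
    lowered = {op.lower(): op for op in ops}
    if name.lower() in lowered:
        return lowered[name.lower()]

    # 3) Alnum-normalized match, e.g. "DocumentMinHashDeduplicator"
    #    and "document_minhash_deduplicator" share the same key.
    normalized = {_alnum_key(op): op for op in ops}
    key = _alnum_key(name)
    if key in normalized:
        return normalized[key]

    # 4) Closest normalized match with a strict similarity cutoff.
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    if close:
        return normalized[close[0]]

    # Assumption: an unresolved name is returned unchanged.
    return name
```

With available_ops = {"document_minhash_deduplicator", "text_length_filter"}, the sketch maps "DocumentMinHashDeduplicator" to "document_minhash_deduplicator" at step 3 and the misspelling "text_lenght_filter" to "text_length_filter" at step 4, while an unrelated name such as "image_blur_filter" falls through every step.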