data_juicer_agents.tools.retrieve
Operator retrieval tools.
- class data_juicer_agents.tools.retrieve.RetrieveOperatorsInput(*, intent: str, top_k: Annotated[int, Ge(ge=1)] = 10, mode: str = 'auto', dataset_path: str = '')
Bases: BaseModel
- intent: str
- top_k: int
- mode: str
- dataset_path: str
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- data_juicer_agents.tools.retrieve.get_available_operator_names() -> Set[str]
Return installed Data-Juicer operator names.
An empty set means operator metadata is currently unavailable.
- data_juicer_agents.tools.retrieve.resolve_operator_name(name: str, available_ops: Iterable[str] | None = None) -> str
Resolve a model-produced operator name to its installed canonical name.
Resolution strategy is generic (not workflow-specific):
1) Exact match.
2) Case-insensitive match.
3) Alnum-normalized match (e.g. DocumentMinHashDeduplicator -> document_minhash_deduplicator).
4) Closest normalized match with a strict similarity cutoff.
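The resolution order described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `resolve_operator_name_sketch`, the helper `_normalize`, the use of `difflib.get_close_matches` for the closest-match step, the cutoff value of 0.9, and the `None` return when nothing matches are all assumptions made for illustration. Unlike the documented function, the sketch requires `available_ops` to be passed explicitly.

```python
import difflib
import re
from typing import Iterable, Optional


def _normalize(name: str) -> str:
    """Hypothetical helper: lowercase and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def resolve_operator_name_sketch(
    name: str,
    available_ops: Iterable[str],
    cutoff: float = 0.9,  # assumed value; the real cutoff is not documented here
) -> Optional[str]:
    """Illustrative resolver following the four documented steps in order."""
    ops = sorted(set(available_ops))

    # 1) Exact match.
    if name in ops:
        return name

    # 2) Case-insensitive match.
    lowered = {op.lower(): op for op in ops}
    if name.lower() in lowered:
        return lowered[name.lower()]

    # 3) Alnum-normalized match, so CamelCase and snake_case compare equal.
    normalized = {_normalize(op): op for op in ops}
    key = _normalize(name)
    if key in normalized:
        return normalized[key]

    # 4) Closest normalized match, accepted only above the similarity cutoff.
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    # Returning None on failure is an assumption of this sketch.
    return normalized[close[0]] if close else None


ops = {"document_minhash_deduplicator", "text_length_filter"}
print(resolve_operator_name_sketch("DocumentMinHashDeduplicator", ops))
```

Running the cheap, unambiguous checks first means the fuzzy step only sees names that failed every exact comparison, and a high cutoff keeps it from mapping an unrelated name onto an installed operator.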