data_juicer.ops.mapper.agent_skill_insight_mapper module#

class data_juicer.ops.mapper.agent_skill_insight_mapper.AgentSkillInsightMapper(*args, **kwargs)[源代码]#

基类:Mapper

Summarize agent_tool_types and agent_skill_types into insights via LLM.

Reads meta[agent_tool_types] and meta[agent_skill_types] (from agent_dialog_normalize_mapper), calls the API for 3–5 concrete capability phrases (about 10 Chinese characters or ~4–8 English words each; avoid vague 'read/write / processing'), and stores them in meta[agent_skill_insights]. Run after normalize. Override system_prompt for locale-specific label style.

__init__(api_model: str = 'gpt-4o', *, tool_types_key: str = 'agent_tool_types', skill_types_key: str = 'agent_skill_types', insights_key: str = 'agent_skill_insights', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 2, model_params: Dict = {}, sampling_params: Dict = {}, preferred_output_lang: str = 'en', **kwargs)[源代码]#

Base class that conducts data editing.

参数:
  • text_key -- the key name of field that stores sample texts to be processed.

  • image_key -- the key name of field that stores sample image list to be processed

  • audio_key -- the key name of field that stores sample audio list to be processed

  • video_key -- the key name of field that stores sample video list to be processed

  • image_bytes_key -- the key name of field that stores sample image bytes list to be processed

  • query_key -- the key name of field that stores sample queries

  • response_key -- the key name of field that stores responses

  • history_key -- the key name of field that stores history of queries and responses

process_single(sample, rank=None)[源代码]#

For sample level, sample --> sample

参数:

sample -- sample to process

返回:

processed sample