data_juicer.ops.mapper.agent_skill_insight_mapper module

class data_juicer.ops.mapper.agent_skill_insight_mapper.AgentSkillInsightMapper(*args, **kwargs)

Bases: Mapper

Summarize agent_tool_types and agent_skill_types into insights via an LLM.

Reads meta[agent_tool_types] and meta[agent_skill_types] (produced by agent_dialog_normalize_mapper), calls the API to obtain 3–5 concrete capability phrases, and stores them in meta[agent_skill_insights]. Each phrase should be about 10 Chinese characters or roughly 4–8 English words, and should avoid vague labels such as 'read/write' or 'processing'. Run this operator after agent_dialog_normalize_mapper. Override system_prompt to get a locale-specific label style.
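The read, call, and store flow described above can be sketched roughly as follows. This is an illustrative sketch, not the operator's actual implementation: the function `summarize_skills`, the `call_llm` callback, the plain `"meta"` key, the prompt wording, the line-based parsing, and the behavior on empty input are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def summarize_skills(
    sample: Dict,
    call_llm: Callable[[str], str],  # hypothetical stand-in for the real API call
    tool_types_key: str = "agent_tool_types",
    skill_types_key: str = "agent_skill_types",
    insights_key: str = "agent_skill_insights",
    try_num: int = 2,
) -> Dict:
    """Hypothetical sketch: read type lists from meta, ask an LLM, store phrases."""
    # Assumption: meta lives under a plain "meta" key; the real field name may differ.
    meta = sample.setdefault("meta", {})
    tools: List[str] = list(meta.get(tool_types_key) or [])
    skills: List[str] = list(meta.get(skill_types_key) or [])

    insights: List[str] = []
    if tools or skills:
        # Assumed prompt wording; the real operator uses its own system_prompt.
        prompt = (
            "Summarize these into 3-5 concrete capability phrases, one per line.\n"
            f"Tool types: {', '.join(tools)}\n"
            f"Skill types: {', '.join(skills)}"
        )
        # Retry up to try_num times, mirroring the documented try_num parameter.
        for _ in range(try_num):
            try:
                raw = call_llm(prompt)
            except Exception:
                continue
            # Assumed parsing: one phrase per line, leading bullet dashes removed.
            insights = [ln.strip(" -\t") for ln in raw.splitlines() if ln.strip(" -\t")]
            if insights:
                break

    # Keep at most five phrases, matching the documented 3-5 range.
    meta[insights_key] = insights[:5]
    return sample


def fake_llm(prompt: str) -> str:
    """Canned response so the sketch runs without network access."""
    return (
        "- Query SQL databases for reports\n"
        "- Parse PDF invoices into tables\n"
        "- Schedule calendar meetings via API"
    )


sample = {
    "text": "",
    "meta": {
        "agent_tool_types": ["sql", "pdf_reader"],
        "agent_skill_types": ["data_analysis"],
    },
}
result = summarize_skills(sample, fake_llm)
print(result["meta"]["agent_skill_insights"])
```

Passing the LLM call in as a callback keeps the sketch testable offline; the real operator talks to the configured api_model instead.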

__init__(api_model: str = 'gpt-4o', *, tool_types_key: str = 'agent_tool_types', skill_types_key: str = 'agent_skill_types', insights_key: str = 'agent_skill_insights', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 2, model_params: Dict = {}, sampling_params: Dict = {}, preferred_output_lang: str = 'en', **kwargs)

Base class that conducts data editing.

Parameters:

text_key -- the key name of field that stores sample texts to be processed.
image_key -- the key name of field that stores sample image list to be processed
audio_key -- the key name of field that stores sample audio list to be processed
video_key -- the key name of field that stores sample video list to be processed
image_bytes_key -- the key name of field that stores sample image bytes list to be processed
query_key -- the key name of field that stores sample queries
response_key -- the key name of field that stores responses
history_key -- the key name of field that stores history of queries and responses
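As a hedged illustration of the constructor signature above, the snippet below collects keyword arguments using only parameter names that appear in that signature. The specific values (the system_prompt text, the retry count, the `temperature` entry in sampling_params) are hypothetical, and the helpers `check_kwargs` and `build_op` are invented for this example. `build_op` is defined but not called, since instantiating the operator requires data_juicer to be installed and presumably API credentials to be configured.

```python
from typing import Any, Dict

# Keyword names copied from the documented __init__ signature.
DOCUMENTED_KWARGS = {
    "api_model", "tool_types_key", "skill_types_key", "insights_key",
    "api_endpoint", "response_path", "system_prompt", "try_num",
    "model_params", "sampling_params", "preferred_output_lang",
}

# Hypothetical values chosen for illustration only.
op_kwargs: Dict[str, Any] = {
    "api_model": "gpt-4o",
    "tool_types_key": "agent_tool_types",
    "skill_types_key": "agent_skill_types",
    "insights_key": "agent_skill_insights",
    # Assumed prompt text showing a locale-specific label style override.
    "system_prompt": "Return 3-5 short capability phrases in English, one per line.",
    "try_num": 3,  # must be greater than 0 per the signature's Gt(gt=0)
    "sampling_params": {"temperature": 0.2},  # assumption: forwarded to the API
    "preferred_output_lang": "en",
}


def check_kwargs(kwargs: Dict[str, Any]) -> None:
    """Hypothetical helper: reject unknown names and a non-positive try_num."""
    unknown = set(kwargs) - DOCUMENTED_KWARGS
    if unknown:
        raise ValueError(f"Unknown arguments: {sorted(unknown)}")
    if kwargs.get("try_num", 2) <= 0:
        raise ValueError("try_num must be greater than 0")


def build_op(kwargs: Dict[str, Any]):
    """Construct the operator; needs data_juicer installed (not run here)."""
    from data_juicer.ops.mapper.agent_skill_insight_mapper import (
        AgentSkillInsightMapper,
    )

    check_kwargs(kwargs)
    return AgentSkillInsightMapper(**kwargs)


check_kwargs(op_kwargs)
```

Keeping the three key names at their defaults means the operator reads what agent_dialog_normalize_mapper wrote without further configuration; they only need changing if the upstream fields were renamed.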

process_single(sample, rank=None)

For sample level, sample --> sample

Parameters: sample -- sample to process
Returns: processed sample
