data_juicer.ops.mapper.agent_insight_llm_mapper module
- class data_juicer.ops.mapper.agent_insight_llm_mapper.AgentInsightLLMMapper(*args, **kwargs)
Bases: Mapper
Synthesize stats + LLM eval text into meta.agent_insight_llm (JSON).
Intended to run after filters/mappers that populate stats and agent_bad_case_signal_mapper. Use run_for_tiers to limit API cost.
Output is best-effort JSON; raw model text is stored in meta.agent_insight_llm_raw if parsing fails.
- __init__(api_model: str = 'gpt-4o', *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_key: str = 'query', response_key: str = 'response', query_preview_max_chars: int = 500, response_preview_max_chars: int = 500, run_for_tiers: List[str] | None = None, try_num: Annotated[int, Gt(gt=0)] = 2, model_params: Dict = {}, sampling_params: Dict = {}, preferred_output_lang: str = 'en', **kwargs)
Base class that conducts data editing.
- Parameters:
text_key – the key name of the field that stores sample texts to be processed.
image_key – the key name of the field that stores the sample image list to be processed.
audio_key – the key name of the field that stores the sample audio list to be processed.
video_key – the key name of the field that stores the sample video list to be processed.
image_bytes_key – the key name of the field that stores the sample image bytes list to be processed.
query_key – the key name of the field that stores sample queries.
response_key – the key name of the field that stores responses.
history_key – the key name of the field that stores the history of queries and responses.
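The best-effort JSON behavior described in the class summary can be illustrated with a minimal, standalone sketch. This is not the operator's actual implementation: the helper name store_insight and the brace-slicing heuristic are assumptions made for illustration, and only the two meta key names come from the description above.

```python
import json
from typing import Any, Dict

# Meta keys named in the class description.
INSIGHT_KEY = "agent_insight_llm"
RAW_KEY = "agent_insight_llm_raw"


def store_insight(meta: Dict[str, Any], model_text: str) -> Dict[str, Any]:
    """Hypothetical helper: parse model output as JSON, keep raw text on failure."""
    # Assumed heuristic: take the span from the first "{" to the last "}",
    # since models often surround JSON with extra prose.
    start, end = model_text.find("{"), model_text.rfind("}")
    try:
        if start == -1 or end <= start:
            raise ValueError("no JSON object found")
        meta[INSIGHT_KEY] = json.loads(model_text[start:end + 1])
    except ValueError:
        # json.JSONDecodeError is a ValueError subclass, so both the
        # "no object" and "malformed object" cases land here.
        meta[RAW_KEY] = model_text
    return meta


parsed = store_insight({}, 'Here is the result: {"summary": "fine", "score": 3}')
fallback = store_insight({}, "not json")
```

In this sketch a downstream step can check for the presence of agent_insight_llm_raw to find samples whose model output needs manual review or a retry.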