data_juicer.ops.mapper.agent_insight_llm_mapper module

class data_juicer.ops.mapper.agent_insight_llm_mapper.AgentInsightLLMMapper(*args, **kwargs)

Bases: Mapper

Synthesizes sample stats and LLM evaluation text into a JSON insight stored in meta.agent_insight_llm.

Intended to run after the filters and mappers that populate stats, and after agent_bad_case_signal_mapper. Set run_for_tiers to limit API cost.
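
As an illustration of how run_for_tiers could limit API cost, the sketch below skips the LLM call for samples whose tier is not listed. The helper should_run_for_tier, the tier names, and the treatment of None as "no restriction" are assumptions made for this example, not the mapper's actual implementation.

```python
from typing import List, Optional


def should_run_for_tier(tier: Optional[str], run_for_tiers: Optional[List[str]]) -> bool:
    """Return True when the LLM call should be made for a sample (hypothetical helper)."""
    # Assumption: run_for_tiers=None means no restriction, so every sample is processed.
    if run_for_tiers is None:
        return True
    # Otherwise only samples whose tier is explicitly listed trigger an API call.
    return tier in run_for_tiers


# Hypothetical tier names, used only for illustration.
print(should_run_for_tier("tier_a", ["tier_a"]))
print(should_run_for_tier("tier_b", ["tier_a"]))
```

With a gate like this, samples outside the selected tiers pass through unchanged and incur no API cost.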

Output is best-effort JSON; raw model text is stored in meta.agent_insight_llm_raw if parsing fails.
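
The best-effort behaviour described above can be sketched roughly as follows. The function store_insight is hypothetical, and storing the parsed result as a dict (and treating non-object JSON as a parse failure) are assumptions; only the two meta field names come from this page.

```python
import json
from typing import Any, Dict


def store_insight(meta: Dict[str, Any], raw_text: str) -> Dict[str, Any]:
    """Try to parse the model output as JSON; keep the raw text on failure (hypothetical helper)."""
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        parsed = None
    # Assumption: only a JSON object counts as a successful parse.
    if isinstance(parsed, dict):
        meta["agent_insight_llm"] = parsed
    else:
        # Fallback: preserve the unparsed model text for later inspection.
        meta["agent_insight_llm_raw"] = raw_text
    return meta


print(store_insight({}, '{"summary": "ok"}'))
print(store_insight({}, "not json at all"))
```

Downstream code can then check for meta.agent_insight_llm_raw to find samples whose model output needs manual review.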

__init__(api_model: str = 'gpt-4o', *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_key: str = 'query', response_key: str = 'response', query_preview_max_chars: int = 500, response_preview_max_chars: int = 500, run_for_tiers: List[str] | None = None, try_num: Annotated[int, Gt(gt=0)] = 2, model_params: Dict = {}, sampling_params: Dict = {}, preferred_output_lang: str = 'en', **kwargs)

Initialization method. The parameters documented below are those of the base Mapper class.

Parameters:
  • text_key – name of the field that stores the sample texts to be processed.

  • image_key – name of the field that stores the sample image list to be processed.

  • audio_key – name of the field that stores the sample audio list to be processed.

  • video_key – name of the field that stores the sample video list to be processed.

  • image_bytes_key – name of the field that stores the sample image bytes list to be processed.

  • query_key – name of the field that stores the sample queries.

  • response_key – name of the field that stores the responses.

  • history_key – name of the field that stores the history of queries and responses.
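
The signature also exposes query_preview_max_chars and response_preview_max_chars (both 500 by default), which this page does not describe. Judging by the names alone, they presumably cap how much of the query and response text is included in the prompt. A minimal sketch under that assumption, with a hypothetical helper preview:

```python
def preview(text: str, max_chars: int = 500) -> str:
    """Return at most max_chars characters of text (hypothetical helper)."""
    # Assumption: truncation is a plain prefix cut with no ellipsis marker appended.
    if len(text) <= max_chars:
        return text
    return text[:max_chars]


long_query = "x" * 600
print(len(preview(long_query)))
```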

process_single(sample, rank=None)

Processes one sample at the sample level: sample -> sample.

Parameters:

sample – sample to process

Returns:

processed sample