data_juicer.ops.mapper.agent_insight_llm_mapper module

class data_juicer.ops.mapper.agent_insight_llm_mapper.AgentInsightLLMMapper(*args, **kwargs)[source]

Bases: Mapper

Synthesizes a sample's stats and LLM evaluation text into a JSON insight stored in meta.agent_insight_llm.

Intended to run after the filters and mappers that populate stats, and after agent_bad_case_signal_mapper. Set run_for_tiers to restrict which tiers trigger an LLM call and so limit API cost.

Output is best-effort JSON; raw model text is stored in meta.agent_insight_llm_raw if parsing fails.
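The fallback described above can be pictured with a minimal sketch. It is not the mapper's actual code: the helper name store_insight and the "first brace to last brace" extraction strategy are assumptions for illustration. Only the two meta keys come from this page.

```python
import json


def store_insight(meta: dict, model_text: str) -> dict:
    """Hypothetical helper: keep parsed JSON if possible, else the raw text."""
    # Assumed strategy: take the span from the first "{" to the last "}".
    start, end = model_text.find("{"), model_text.rfind("}")
    parsed = None
    if start != -1 and end > start:
        try:
            parsed = json.loads(model_text[start:end + 1])
        except json.JSONDecodeError:
            parsed = None

    if isinstance(parsed, dict):
        meta["agent_insight_llm"] = parsed
    else:
        # Parsing failed: preserve the unmodified model output for inspection.
        meta["agent_insight_llm_raw"] = model_text
    return meta
```

Downstream code that follows this pattern should check for agent_insight_llm first and treat agent_insight_llm_raw as a debugging aid rather than structured data.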

__init__(api_model: str = 'gpt-4o', *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_key: str = 'query', response_key: str = 'response', query_preview_max_chars: int = 500, response_preview_max_chars: int = 500, run_for_tiers: List[str] | None = None, try_num: Annotated[int, Gt(gt=0)] = 2, model_params: Dict = {}, sampling_params: Dict = {}, preferred_output_lang: str = 'en', **kwargs)[source]

Base class that conducts data editing.

Parameters:
  • text_key -- name of the field that stores the sample texts to be processed.

  • image_key -- name of the field that stores the sample image list to be processed.

  • audio_key -- name of the field that stores the sample audio list to be processed.

  • video_key -- name of the field that stores the sample video list to be processed.

  • image_bytes_key -- name of the field that stores the sample image bytes list to be processed.

  • query_key -- name of the field that stores the sample queries.

  • response_key -- name of the field that stores the responses.

  • history_key -- name of the field that stores the history of queries and responses.
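As a rough illustration of the signature above, the keyword arguments could be assembled as below. The defaults are copied from the signature; the tier name "tier_a" is a placeholder, since the valid tier labels are not listed on this page. The import and constructor call are left commented out because they require data_juicer to be installed and, presumably, API credentials to be configured.

```python
# Keyword arguments mirroring the documented __init__ signature.
init_kwargs = {
    "api_model": "gpt-4o",              # documented default
    "query_key": "query",               # documented default
    "response_key": "response",         # documented default
    "query_preview_max_chars": 500,     # documented default
    "response_preview_max_chars": 500,  # documented default
    "run_for_tiers": ["tier_a"],        # placeholder tier name (assumption)
    "try_num": 2,                       # must be greater than 0
    "preferred_output_lang": "en",      # documented default
}

# from data_juicer.ops.mapper.agent_insight_llm_mapper import AgentInsightLLMMapper
# op = AgentInsightLLMMapper(**init_kwargs)
# sample = op.process_single(sample)
```

Leaving run_for_tiers as None (the default) is the alternative to passing an explicit list.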

process_single(sample, rank=None)[source]

Sample-level processing: takes one sample and returns the processed sample (sample --> sample).

Parameters:

sample -- sample to process

Returns:

processed sample