data_juicer.ops.mapper.agent_dialog_normalize_mapper module
- class data_juicer.ops.mapper.agent_dialog_normalize_mapper.AgentDialogNormalizeMapper(*args, **kwargs)[source]
Bases: Mapper
Normalize agent format (messages + choices) to DJ fields.
Outputs: text, dialog_history, query, response; optionally meta tags agent_tool_types, agent_skill_types, agent_turn_count. When copy_lineage_fields is True, also copies request_model, pt, total_cost_time, and (when copy_request_id is set) the first non-empty id among request_id_keys from the sample root into meta for cohort analysis and stable drill-down links. Always records last user/assistant message indices (in the raw messages list) when present. Supports multi-format tool_calls (e.g. tool_calls[].function.name as in OpenAI / demos/local/demo-agent-data-content.json) and configurable user/assistant labels. Optional history_*_max_chars caps keep head+tail with an explicit middle-omitted marker so dialog_history, flattened text, and last query/response stay aligned; meta.agent_dialog_history_compressed is set when any cap fires.
- __init__(messages_key: str = 'messages', choices_key: str = 'choices', text_key: str = 'text', history_key: str = 'dialog_history', query_key: str = 'query', response_key: str = 'response', extract_tool_skill_tags: bool = True, include_system_in_first_user: bool = False, user_label: str = 'User', assistant_label: str = 'Assistant', copy_lineage_fields: bool = True, copy_request_id: bool = True, request_id_keys: List[str] = ['request_id', 'trace_id', 'id'], history_tool_result_max_chars: int = 10000, history_max_assistant_trace_chars: int = 0, history_max_user_chars: int = 0, history_compress_head_ratio: float = 0.62, **kwargs)[source]
Base class that conducts data editing.
- Parameters:
text_key -- the key name of field that stores sample texts to be processed.
image_key -- the key name of field that stores sample image list to be processed
audio_key -- the key name of field that stores sample audio list to be processed
video_key -- the key name of field that stores sample video list to be processed
image_bytes_key -- the key name of field that stores sample image bytes list to be processed
query_key -- the key name of field that stores sample queries
response_key -- the key name of field that stores responses
history_key -- the key name of field that stores history of queries and responses
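
The head+tail capping described for the history_*_max_chars options can be illustrated with a minimal sketch. This is not the mapper's actual implementation: the helper name compress_head_tail, the marker string, and the treatment of a cap of 0 as "no limit" are all assumptions made for illustration; only the idea of keeping a head and a tail around an explicit middle-omitted marker, and the 0.62 head ratio default, come from the description above.

```python
from typing import Tuple


def compress_head_tail(
    text: str,
    max_chars: int,
    head_ratio: float = 0.62,
    marker: str = "\n...[middle omitted]...\n",  # hypothetical marker text
) -> Tuple[str, bool]:
    """Return (possibly shortened text, whether the cap fired).

    Hypothetical helper: keeps the start and end of `text` and replaces
    the middle with `marker` so the result is at most `max_chars` long.
    """
    # Assumption: a cap of 0 or less means "no limit".
    if max_chars <= 0 or len(text) <= max_chars:
        return text, False

    # Characters left for real content once the marker is accounted for.
    budget = max_chars - len(marker)
    if budget <= 0:
        # The cap is too small to fit the marker; fall back to a plain cut.
        return text[:max_chars], True

    head_len = int(budget * head_ratio)
    tail_len = budget - head_len
    head = text[:head_len]
    tail = text[len(text) - tail_len:] if tail_len > 0 else ""
    return head + marker + tail, True


if __name__ == "__main__":
    long_tool_result = "a" * 50 + "b" * 50
    shortened, fired = compress_head_tail(long_tool_result, 40, marker="<...>")
    print(shortened, fired)
```

Returning the boolean alongside the text mirrors how a flag such as meta.agent_dialog_history_compressed could be set once any cap fires, without re-comparing lengths afterwards.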