data_juicer.ops.mapper.dialog_llm_input_utils module#

Helpers for dialog LLM mappers (intent / topic / sentiment / intensity).

data_juicer.ops.mapper.dialog_llm_input_utils.build_dialog_turns_for_prompt(sample: dict, *, history_key: str, query_key: str, response_key: str) List[Tuple[str, str]][source]#

Build (user, assistant) turns for dialog LLM mappers.

Does not mutate sample. The merge rules match dialog_quality_llm_utils._normalize_dialog_tail: after normalization, the final turn lives in both dialog_history[-1] and the query/response fields, so those fields must not be appended again, which would duplicate the final exchange. (Older code that mutated dialog_history in place corrupted downstream rows.)
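The implementation is not reproduced here; a minimal sketch consistent with the signature and the merge rule above (the turn list is built from copies, and query/response is appended only when normalization has not already folded it into the last history turn) might look like:

```python
from typing import Any, Dict, List, Tuple


def build_dialog_turns_for_prompt(
    sample: Dict[str, Any],
    *,
    history_key: str,
    query_key: str,
    response_key: str,
) -> List[Tuple[str, str]]:
    """Sketch: build (user, assistant) turns without mutating `sample`."""
    history = sample.get(history_key) or []
    # Copy every turn into new tuples so callers never see `sample` mutated.
    turns: List[Tuple[str, str]] = [(str(u), str(a)) for u, a in history]
    q = sample.get(query_key)
    r = sample.get(response_key)
    if q is not None:
        tail = (str(q), "" if r is None else str(r))
        # After _normalize_dialog_tail the final exchange already sits in
        # dialog_history[-1]; appending it again would duplicate it.
        if not turns or turns[-1] != tail:
            turns.append(tail)
    return turns
```

Usage: a normalized sample whose last history turn equals (query, response) yields exactly the history turns; a sample with a fresh query/response pair gets it appended once.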

data_juicer.ops.mapper.dialog_llm_input_utils.clip_text_for_dialog_prompt(text: str, max_chars: int, note: str = 'truncated') str[source]#

Truncate long text for API prompts when max_chars > 0.

Agent traces often concatenate tool outputs into the response, and formatter limits applied elsewhere do not cover these mappers’ history_key payloads, so clipping must happen here.
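The exact truncation marker is not specified here; a plausible sketch matching the signature (clipping is a no-op when max_chars <= 0, and the note is appended where the cut happens) might be:

```python
def clip_text_for_dialog_prompt(text: str, max_chars: int, note: str = "truncated") -> str:
    """Sketch: truncate `text` to `max_chars`, marking the cut with `note`."""
    # max_chars <= 0 disables clipping entirely; short text passes through.
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    # Keep the head of the text and flag that a tail was removed.
    # The "... [note]" marker format is an assumption, not the library's.
    return text[:max_chars] + f"... [{note}]"
```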

data_juicer.ops.mapper.dialog_llm_input_utils.clip_query_response_pair(q: object, r: object, max_query_chars: int, max_response_chars: int) Tuple[str, str][source]#
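This helper has no docstring above; judging only from the signature, it presumably coerces q and r to strings and applies per-field character limits. A hypothetical sketch (the coercion of None to "" and the truncation marker are assumptions):

```python
from typing import Tuple


def clip_query_response_pair(
    q: object,
    r: object,
    max_query_chars: int,
    max_response_chars: int,
) -> Tuple[str, str]:
    """Sketch: stringify and clip a query/response pair independently."""

    def clip(value: object, max_chars: int) -> str:
        # Assumption: None becomes the empty string; other values are str()'d.
        s = "" if value is None else str(value)
        if max_chars <= 0 or len(s) <= max_chars:
            return s
        return s[:max_chars] + "... [truncated]"

    return clip(q, max_query_chars), clip(r, max_response_chars)
```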