data_juicer.utils.agent_output_locale module

Preferred output language helpers for agent / dialog LLM operators.

YAML configs and op kwargs set preferred_output_lang (e.g. zh, en, zh-CN). The value is normalized to zh or en for prompt selection. JSON keys stay in English where parsing requires it; free-text fields follow this locale.

data_juicer.utils.agent_output_locale.normalize_preferred_output_lang(value: str | None) → str

Return zh or en (default en if missing/unknown).
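The normalization described above can be illustrated with a minimal sketch. This is not the library's implementation: `normalize_lang_sketch` is a hypothetical stand-in, and the primary-subtag matching rule is an assumption inferred from the zh-CN example and the documented default of en.

```python
def normalize_lang_sketch(value):
    """Hypothetical stand-in for normalize_preferred_output_lang (assumed logic)."""
    if not value:
        # Missing value (None or empty string) falls back to the default.
        return "en"
    # Keep only the primary subtag: "zh-CN" or "zh_TW" becomes "zh".
    primary = value.strip().lower().replace("_", "-").split("-")[0]
    # Anything other than zh or en is treated as unknown and mapped to en.
    return primary if primary in ("zh", "en") else "en"


print(normalize_lang_sketch("zh-CN"))
print(normalize_lang_sketch(None))
```

Under these assumptions, region-tagged values collapse to their base language and unsupported languages fall back to en, which matches the documented contract of returning only zh or en.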

data_juicer.utils.agent_output_locale.dialog_score_json_instruction(lang: str) → str

Instruction block asking for JSON with a 1–5 score plus a reason (dialog / trace quality mappers).

data_juicer.utils.agent_output_locale.rubric_reason_language_clause(lang: str) → str

Clause to append to the system prompt: the rubric may be in English, but the reason follows the locale.

data_juicer.utils.agent_output_locale.llm_filter_free_text_language_appendix(lang: str | None) → str

Append to LLMAnalysisFilter system_prompt for rationale / tags language.

data_juicer.utils.agent_output_locale.agent_insight_system_prompt(lang: str) → str

System prompt for agent_insight_llm_mapper.

data_juicer.utils.agent_output_locale.dialog_detection_output_language_note(lang: str, mode: str) → str

Append to dialog intent/topic/sentiment/intensity system prompts.

Chinese prefixes (意图分析:, 话题类别:, …) stay for regex parsers; body follows locale.
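To show why the fixed Chinese prefixes matter, here is a small sketch of a prefix-anchored parser. The pattern and the helper `extract_intent_sketch` are hypothetical and are not taken from the library; only the 意图分析 prefix comes from the note above. Accepting both the ASCII and the full-width colon is an assumption.

```python
import re

# Assumed pattern: the fixed prefix, then an ASCII or full-width colon,
# then the free-text body on the same line.
INTENT_LINE = re.compile(r"^意图分析[ \t]*[:：][ \t]*(.+)$", re.MULTILINE)


def extract_intent_sketch(llm_output):
    """Return the text after the 意图分析 prefix, or None if the prefix is absent."""
    match = INTENT_LINE.search(llm_output)
    return match.group(1).strip() if match else None


# The prefix stays Chinese while the body follows the en locale.
print(extract_intent_sketch("意图分析:The user asks for a refund.\n话题类别:billing"))
```

Because the parser keys on the prefix alone, the body can be written in either language without changing the regex.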

data_juicer.utils.agent_output_locale.agent_skill_insight_system_prompt(lang: str) → str