data_juicer.ops.mapper.agent_bad_case_signal_mapper module#

class data_juicer.ops.mapper.agent_bad_case_signal_mapper.AgentBadCaseSignalMapper(*args, **kwargs)[source]#

Bases: Mapper

Attach structured bad-case signals and a conservative tier to each sample.

Design goal: precision over recall for the high_precision tier.

Upstream coverage (when present in the pipeline):

  • meta: tool_*, usage tokens, primary_tool_type, dominant_tool_types, dialog_intent_labels, dialog_topic_labels, dialog_sentiment_labels, agent_turn_count, lineage keys.

  • stats: llm_analysis_*, llm_quality_*, llm_difficulty_*, text_len, num_words, perplexity, lang_score.

  • meta: optional dialog_* / agent_trace_coherence / agent_tool_relevance records (1–5 scores from lightweight LLM mappers).

Each signal group can be toggled via constructor flags. High-weight signals feed the high_precision tier (subject to configuration); medium-weight signals feed the watchlist only.
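The tiering rule above can be sketched as follows. This is an illustrative simplification, not the actual data-juicer implementation; the function and signal names are hypothetical.

```python
# Hypothetical sketch of the tier assignment described above: any
# high-weight signal promotes a sample to the high_precision tier, while
# medium-weight signals only reach the watchlist once enough accumulate
# (min_medium_signals_for_watchlist in the constructor).
def assign_tier(high_signals, medium_signals, min_medium_for_watchlist=2):
    if high_signals:
        return "high_precision"
    if len(medium_signals) >= min_medium_for_watchlist:
        return "watchlist"
    return "none"

print(assign_tier(["tool_fail"], []))            # high-weight signal present
print(assign_tier([], ["low_quality", "slow"]))  # enough medium signals
print(assign_tier([], ["low_quality"]))          # below the watchlist bar
```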

For tool-heavy agent runs, raise min_tool_fail_count_for_signal so that a single exploratory tool error (common before recovery) is not treated as strong bad-case evidence.
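The gating behavior can be sketched like this (an illustration only; the predicate name is hypothetical):

```python
# Illustrative sketch: the tool-fail signal fires only once the number of
# failed tool calls reaches min_tool_fail_count_for_signal, so raising the
# threshold above 1 ignores a single exploratory error the agent may
# recover from.
def tool_fail_signal(tool_fail_count, min_tool_fail_count_for_signal=1):
    return tool_fail_count >= min_tool_fail_count_for_signal

print(tool_fail_signal(1, min_tool_fail_count_for_signal=2))  # False
print(tool_fail_signal(2, min_tool_fail_count_for_signal=2))  # True
```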

Percentile calibration (optional): set auto_calibrate_thresholds and point calibration_json_path at a JSON file produced by demos/agent/scripts/compute_percentile_thresholds.py --write-calibration. Per-sample thresholds merge the default section with the by_request_model entry selected via meta.agent_request_model. When calibration_manual_overrides_auto is true (the default), explicit max_total_tokens / max_latency_ms / perplexity settings in the YAML override the file; set it to false to prefer the calibration values.
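The per-sample threshold resolution described above can be sketched as a plain merge. This is a hypothetical illustration; the calibration JSON layout (keys "default" and "by_request_model") follows the prose description, and the exact file schema is an assumption.

```python
# Hypothetical sketch of per-sample threshold resolution: start from the
# "default" section, overlay the per-model entry, then apply manual YAML
# settings according to calibration_manual_overrides_auto.
def resolve_thresholds(calibration, request_model, manual=None,
                       manual_overrides_auto=True):
    merged = dict(calibration.get("default", {}))
    merged.update(calibration.get("by_request_model", {}).get(request_model, {}))
    if manual:
        if manual_overrides_auto:
            merged.update(manual)          # explicit YAML settings win
        else:
            merged = {**manual, **merged}  # calibration values win
    return merged

calib = {
    "default": {"max_total_tokens": 8000, "max_latency_ms": 30000},
    "by_request_model": {"model-x": {"max_total_tokens": 12000}},
}
print(resolve_thresholds(calib, "model-x", manual={"max_latency_ms": 20000}))
```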

__init__(query_key: str = 'query', response_key: str = 'response', signal_on_tool_fail: bool = True, min_tool_fail_count_for_signal: int = 1, signal_on_low_tool_success_ratio: bool = True, tool_success_ratio_max_for_signal: float = 0.499, min_tool_rounds_for_ratio_signal: int = 2, signal_on_suspect_empty_response: bool = True, min_query_len_for_empty_check: int = 80, max_response_len_for_empty_check: int = 20, max_total_tokens: int | None = None, max_latency_ms: int | None = None, calibration_json_path: str | None = None, auto_calibrate_thresholds: bool = False, calibration_manual_overrides_auto: bool = True, auto_enable_perplexity_from_calibration: bool = True, signal_on_llm_analysis_low: bool = True, llm_analysis_score_max_for_bad: float = 0.28, llm_analysis_discard_must_be_strict: bool = True, high_precision_llm_analysis_discard_threshold: float = 0.24, signal_on_llm_text_quality_low: bool = True, llm_text_quality_score_max_for_bad: float = 0.28, llm_text_quality_discard_must_be_strict: bool = True, high_precision_llm_text_quality_discard_threshold: float = 0.24, signal_on_negative_sentiment_hint: bool = False, negative_sentiment_substrings: List[str] | None = None, signal_on_high_perplexity: bool = False, perplexity_high_threshold: float = 800.0, signal_hard_query_poor_reply: bool = False, hard_query_difficulty_min: float = 0.72, poor_reply_quality_max: float = 0.36, high_precision_on_tool_fail_alone: bool = True, min_medium_signals_for_watchlist: int = 2, signal_on_low_dialog_quality_meta: bool = True, dialog_quality_low_score_threshold: float = 2.0, min_dialog_quality_low_axes_for_signal: int = 1, **kwargs)[source]#

Base class that conducts data editing.

Parameters:
  • text_key – the key name of the field that stores sample texts to be processed.

  • image_key – the key name of the field that stores the sample image list to be processed.

  • audio_key – the key name of the field that stores the sample audio list to be processed.

  • video_key – the key name of the field that stores the sample video list to be processed.

  • image_bytes_key – the key name of the field that stores the sample image bytes list to be processed.

  • query_key – the key name of the field that stores sample queries.

  • response_key – the key name of the field that stores responses.

  • history_key – the key name of the field that stores the history of queries and responses.
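A hedged configuration example for enabling this op in a data-juicer YAML pipeline (parameter values below are illustrative, not recommendations; only keys from the constructor signature are used):

```yaml
# Illustrative data-juicer config snippet (values are examples only)
process:
  - agent_bad_case_signal_mapper:
      signal_on_tool_fail: true
      min_tool_fail_count_for_signal: 2
      signal_on_low_tool_success_ratio: true
      tool_success_ratio_max_for_signal: 0.499
      auto_calibrate_thresholds: true
      calibration_json_path: /path/to/calibration.json
      min_medium_signals_for_watchlist: 2
```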

process_single(sample: dict) dict[source]#

Sample-level processing: maps one input sample (dict) to one processed sample (dict).

Parameters:

sample – sample to process

Returns:

processed sample