data_juicer.ops.mapper.agent_bad_case_signal_mapper module
- class data_juicer.ops.mapper.agent_bad_case_signal_mapper.AgentBadCaseSignalMapper(*args, **kwargs)
Bases: Mapper

Attach structured bad-case signals and a conservative tier to each sample.

Design goal: precision over recall for the high_precision tier.

Upstream coverage (when present in the pipeline):
  - meta: tool_*, usage tokens, primary_tool_type, dominant_tool_types, dialog_intent_labels, dialog_topic_labels, dialog_sentiment_labels, agent_turn_count, lineage keys.
  - stats: llm_analysis_*, llm_quality_*, llm_difficulty_*, text_len, num_words, perplexity, lang_score.
  - meta: optional dialog_* / agent_trace_coherence / agent_tool_relevance records (1–5 scores from lightweight LLM mappers).
Each signal group can be toggled via constructor flags. high weight feeds the high_precision tier (with config); medium feeds watchlist only.

Tool-heavy agent runs: use min_tool_fail_count_for_signal to avoid treating a single exploratory tool error (common before recovery) as strong bad-case evidence.

P-percentile calibration (optional): set auto_calibrate_thresholds and calibration_json_path to a JSON file produced by demos/agent/scripts/compute_percentile_thresholds.py --write-calibration. Per-sample thresholds merge default with by_request_model using meta.agent_request_model. When calibration_manual_overrides_auto is true (default), explicit max_total_tokens / max_latency_ms / perplexity settings in YAML override the file; set it false to prefer calibration.

- __init__(query_key: str = 'query', response_key: str = 'response', signal_on_tool_fail: bool = True, min_tool_fail_count_for_signal: int = 1, signal_on_low_tool_success_ratio: bool = True, tool_success_ratio_max_for_signal: float = 0.499, min_tool_rounds_for_ratio_signal: int = 2, signal_on_suspect_empty_response: bool = True, min_query_len_for_empty_check: int = 80, max_response_len_for_empty_check: int = 20, max_total_tokens: int | None = None, max_latency_ms: int | None = None, calibration_json_path: str | None = None, auto_calibrate_thresholds: bool = False, calibration_manual_overrides_auto: bool = True, auto_enable_perplexity_from_calibration: bool = True, signal_on_llm_analysis_low: bool = True, llm_analysis_score_max_for_bad: float = 0.28, llm_analysis_discard_must_be_strict: bool = True, high_precision_llm_analysis_discard_threshold: float = 0.24, signal_on_llm_text_quality_low: bool = True, llm_text_quality_score_max_for_bad: float = 0.28, llm_text_quality_discard_must_be_strict: bool = True, high_precision_llm_text_quality_discard_threshold: float = 0.24, signal_on_negative_sentiment_hint: bool = False, negative_sentiment_substrings: List[str] | None = None, signal_on_high_perplexity: bool = False, perplexity_high_threshold: float = 800.0, signal_hard_query_poor_reply: bool = False, hard_query_difficulty_min: float = 0.72, poor_reply_quality_max: float = 0.36, high_precision_on_tool_fail_alone: bool = True, min_medium_signals_for_watchlist: int = 2, signal_on_low_dialog_quality_meta: bool = True, dialog_quality_low_score_threshold: float = 2.0, min_dialog_quality_low_axes_for_signal: int = 1, **kwargs)
Base class that conducts data editing.
- Parameters:
text_key – the key name of field that stores sample texts to be processed.
image_key – the key name of field that stores sample image list to be processed
audio_key – the key name of field that stores sample audio list to be processed
video_key – the key name of field that stores sample video list to be processed
image_bytes_key – the key name of field that stores sample image bytes list to be processed
query_key – the key name of field that stores sample queries
response_key – the key name of field that stores responses
history_key – the key name of field that stores history of queries and responses
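
The threshold precedence described for P-percentile calibration can be illustrated with a minimal sketch. This is not the mapper's actual code: resolve_thresholds is a hypothetical helper, the layout of the calibration dict beyond the default and by_request_model keys is an assumption, the numeric values and the model name are made up, and the rule that manual values fill only missing keys when calibration is preferred is one plausible reading of "prefer calibration".

```python
from typing import Any, Dict, Optional


def resolve_thresholds(
    calibration: Dict[str, Any],
    request_model: Optional[str],
    manual: Dict[str, Optional[float]],
    manual_overrides_auto: bool = True,
) -> Dict[str, float]:
    """Hypothetical helper: combine calibrated and manual thresholds for one sample."""
    # Start from the file-wide defaults, then layer the per-model entry on top.
    merged: Dict[str, float] = dict(calibration.get("default", {}))
    per_model = calibration.get("by_request_model", {})
    if request_model is not None:
        merged.update(per_model.get(request_model, {}))

    # Manual settings left as None are treated as "not set".
    explicit = {k: v for k, v in manual.items() if v is not None}
    if manual_overrides_auto:
        # Explicit values win over calibrated ones.
        merged.update(explicit)
    else:
        # Assumption: calibration wins; manual values only fill missing keys.
        for key, value in explicit.items():
            merged.setdefault(key, value)
    return merged


# Illustrative calibration content (assumed layout and made-up numbers).
calibration = {
    "default": {"max_total_tokens": 12000, "max_latency_ms": 30000},
    "by_request_model": {"model-a": {"max_total_tokens": 20000}},
}
# Stand-in for explicit YAML settings; None means the option was not set.
manual = {"max_total_tokens": 8000, "max_latency_ms": None}

manual_first = resolve_thresholds(calibration, "model-a", manual, True)
calibration_first = resolve_thresholds(calibration, "model-a", manual, False)
print(manual_first)
print(calibration_first)
```

In this sketch the request model would come from meta.agent_request_model, so samples from different models can receive different limits while sharing the same default block.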