data_juicer.ops.mapper.usage_counter_mapper module
- class data_juicer.ops.mapper.usage_counter_mapper.UsageCounterMapper(*args, **kwargs)
Bases: Mapper
Write token usage to meta from choices/usage (OpenAI/Anthropic-style).
Collects every non-empty usage dict it finds (the top-level usage_key, response_metadata, each choices[] entry, and nested message usage). By default, identical usage snapshots are deduplicated before summing: snapshots with the same (prompt_tokens, completion_tokens, total_tokens or prompt+completion) are counted only once (typical when response_usage mirrors choices[0].usage). Set dedupe_identical_usage: false to restore the legacy double-counting.
- __init__(choices_key: str = 'choices', usage_key: str = 'usage', response_metadata_key: str = 'response_metadata', dedupe_identical_usage: bool = True, **kwargs)
Base class that conducts data editing.
- Parameters:
text_key – the key name of the field that stores sample texts to be processed.
image_key – the key name of the field that stores the sample image list to be processed.
audio_key – the key name of the field that stores the sample audio list to be processed.
video_key – the key name of the field that stores the sample video list to be processed.
image_bytes_key – the key name of the field that stores the sample image bytes list to be processed.
query_key – the key name of the field that stores sample queries.
response_key – the key name of the field that stores responses.
history_key – the key name of the field that stores the history of queries and responses.
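The collection step described for UsageCounterMapper (gathering every non-empty usage dict from several places in a sample) can be sketched as a plain-Python helper. This is a hypothetical illustration, not the library's code: the name `collect_usage_dicts` is made up, and the exact locations it checks (usage nested under the same `usage_key` inside `response_metadata`, and under `message` inside each choice) are assumptions inferred from the class description.

```python
from typing import Any, Dict, List


def collect_usage_dicts(
    sample: Dict[str, Any],
    choices_key: str = "choices",
    usage_key: str = "usage",
    response_metadata_key: str = "response_metadata",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: gather every non-empty usage dict in a sample.

    Assumed locations: sample[usage_key], sample[response_metadata_key][usage_key],
    choice[usage_key] for each choice, and choice["message"][usage_key].
    """
    found: List[Dict[str, Any]] = []

    def add(candidate: Any) -> None:
        # Keep only non-empty dicts; None, {} and non-dict values are ignored.
        if isinstance(candidate, dict) and candidate:
            found.append(candidate)

    # Top-level usage.
    add(sample.get(usage_key))

    # Assumption: usage inside response_metadata sits under the same key.
    metadata = sample.get(response_metadata_key)
    if isinstance(metadata, dict):
        add(metadata.get(usage_key))

    # Each choices[] entry, plus the nested message usage.
    for choice in sample.get(choices_key) or []:
        if not isinstance(choice, dict):
            continue
        add(choice.get(usage_key))
        message = choice.get("message")
        if isinstance(message, dict):
            add(message.get(usage_key))

    return found


sample = {
    "usage": {"prompt_tokens": 10, "completion_tokens": 5},
    "response_metadata": {"usage": {"prompt_tokens": 3, "completion_tokens": 1}},
    "choices": [
        # The empty nested message usage is skipped.
        {"usage": {"prompt_tokens": 10, "completion_tokens": 5}, "message": {"usage": {}}},
    ],
}
print(len(collect_usage_dicts(sample)))  # 3
```

Note that the top-level usage and the choice usage in this sample are identical; whether such a pair is counted once or twice is decided by the deduplication step, not by collection.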