data_juicer.ops.mapper.usage_counter_mapper module
- class data_juicer.ops.mapper.usage_counter_mapper.UsageCounterMapper(*args, **kwargs)[source]
Bases: Mapper
Write token usage to meta from choices/usage (OpenAI/Anthropic-style).
Collects every non-empty usage dict found (top-level usage_key, response_metadata, each choices[] entry, nested message usage). By default, identical usage snapshots are deduplicated before summing: snapshots with the same (prompt_tokens, completion_tokens, total_tokens or prompt+completion) are counted only once (typical when response_usage mirrors choices[0].usage). Set dedupe_identical_usage: false to restore the legacy double-counting behavior.
- __init__(choices_key: str = 'choices', usage_key: str = 'usage', response_metadata_key: str = 'response_metadata', dedupe_identical_usage: bool = True, **kwargs)[source]
Base class that conducts data editing.
- Parameters:
text_key -- the key name of the field that stores the sample texts to be processed.
image_key -- the key name of the field that stores the sample image list to be processed.
audio_key -- the key name of the field that stores the sample audio list to be processed.
video_key -- the key name of the field that stores the sample video list to be processed.
image_bytes_key -- the key name of the field that stores the sample image bytes list to be processed.
query_key -- the key name of the field that stores the sample queries.
response_key -- the key name of the field that stores the responses.
history_key -- the key name of the field that stores the history of queries and responses.
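The deduplication rule in the class description can be illustrated with a small standalone sketch. It is not the operator's actual implementation: the helper names usage_signature and sum_usage are hypothetical, and the sketch assumes OpenAI-style field names (prompt_tokens, completion_tokens, total_tokens) and that the usage dicts have already been collected into a list.

```python
from typing import Any, Dict, Iterable, Tuple

KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def usage_signature(usage: Dict[str, Any]) -> Tuple[int, int, int]:
    """Hypothetical helper: reduce a usage dict to a comparable triple."""
    prompt = int(usage.get("prompt_tokens") or 0)
    completion = int(usage.get("completion_tokens") or 0)
    total = usage.get("total_tokens")
    # Fall back to prompt + completion when total_tokens is absent.
    total = int(total) if total is not None else prompt + completion
    return prompt, completion, total


def sum_usage(usages: Iterable[Dict[str, Any]], dedupe: bool = True) -> Dict[str, int]:
    """Hypothetical helper: sum usage dicts, optionally skipping identical snapshots."""
    seen = set()
    totals = {key: 0 for key in KEYS}
    for usage in usages:
        if not usage:  # ignore empty or missing usage dicts
            continue
        signature = usage_signature(usage)
        if dedupe and signature in seen:
            continue  # identical snapshot already counted
        seen.add(signature)
        for key, value in zip(KEYS, signature):
            totals[key] += value
    return totals


# A top-level usage dict mirrored by a per-choice usage dict without total_tokens.
top_level = {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
mirrored = {"prompt_tokens": 10, "completion_tokens": 5}

deduped = sum_usage([top_level, mirrored])               # mirrored copy is skipped
legacy = sum_usage([top_level, mirrored], dedupe=False)  # both copies are summed
print(deduped)
print(legacy)
```

With dedupe=True the mirrored snapshot contributes nothing, which corresponds to the default dedupe_identical_usage=True. Passing dedupe=False reproduces the double-counting that the description calls legacy behavior.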