data_juicer.ops.mapper.usage_counter_mapper module

class data_juicer.ops.mapper.usage_counter_mapper.UsageCounterMapper(*args, **kwargs)[source]

Bases: Mapper

Writes token usage to the sample's meta, read from the choices/usage fields of OpenAI/Anthropic-style responses.

Collects every non-empty usage dict found: the top-level usage_key, response_metadata, each choices[] entry, and nested message usage. By default, identical usage snapshots are deduplicated before summing: a given (prompt_tokens, completion_tokens, total_tokens) triple, where total_tokens falls back to prompt_tokens + completion_tokens, is counted only once. Duplicates typically arise when response_usage mirrors choices[0].usage. Set dedupe_identical_usage: false to restore the legacy behavior, which double-counts such snapshots.
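The deduplication rule above can be illustrated with a minimal, standalone sketch. It is not the operator's actual implementation: the helper names usage_signature and sum_usage are hypothetical, and the sketch assumes that every usage dict uses the prompt_tokens / completion_tokens / total_tokens field names and that a missing total_tokens is replaced by the sum of the other two.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def usage_signature(usage: Dict[str, Any]) -> Tuple[int, int, int]:
    """Hypothetical helper: reduce a usage dict to a comparable triple."""
    prompt = int(usage.get("prompt_tokens") or 0)
    completion = int(usage.get("completion_tokens") or 0)
    total = usage.get("total_tokens")
    if total is None:
        # Assumption: fall back to prompt + completion when total is absent.
        total = prompt + completion
    return prompt, completion, int(total)


def sum_usage(
    usages: Iterable[Optional[Dict[str, Any]]],
    dedupe_identical_usage: bool = True,
) -> Dict[str, int]:
    """Hypothetical helper: sum usage dicts, optionally skipping duplicates."""
    seen = set()
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in usages:
        if not usage:
            # Empty or missing usage dicts are ignored.
            continue
        signature = usage_signature(usage)
        if dedupe_identical_usage:
            if signature in seen:
                # An identical snapshot was already counted.
                continue
            seen.add(signature)
        prompt, completion, total = signature
        totals["prompt_tokens"] += prompt
        totals["completion_tokens"] += completion
        totals["total_tokens"] += total
    return totals


# A top-level usage dict mirrored by choices[0].usage (without total_tokens).
top_level = {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
choice_level = {"prompt_tokens": 10, "completion_tokens": 5}

deduped = sum_usage([top_level, choice_level, {}])
legacy = sum_usage([top_level, choice_level, {}], dedupe_identical_usage=False)
```

With deduplication enabled, the mirrored snapshot is counted once; with it disabled, both copies are summed, which is the double-counting that the default setting is meant to avoid.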

__init__(choices_key: str = 'choices', usage_key: str = 'usage', response_metadata_key: str = 'response_metadata', dedupe_identical_usage: bool = True, **kwargs)[source]

Base class that conducts data editing.

Parameters:

text_key -- the key name of the field that stores the sample texts to be processed.
image_key -- the key name of the field that stores the sample image list to be processed.
audio_key -- the key name of the field that stores the sample audio list to be processed.
video_key -- the key name of the field that stores the sample video list to be processed.
image_bytes_key -- the key name of the field that stores the sample image bytes list to be processed.
query_key -- the key name of the field that stores the sample queries.
response_key -- the key name of the field that stores the responses.
history_key -- the key name of the field that stores the history of queries and responses.

process_single(sample)[source]

Sample-level processing: sample --> sample.

Parameters: sample -- the sample to process
Returns: the processed sample
