data_juicer.ops.mapper.usage_counter_mapper module

class data_juicer.ops.mapper.usage_counter_mapper.UsageCounterMapper(*args, **kwargs)

Bases: Mapper

Write token usage to meta from choices/usage (OpenAI/Anthropic-style).

Collects every non-empty usage dict it finds: the top-level usage_key, response_metadata, each choices[] entry, and usage nested inside message objects. By default, identical usage snapshots are deduplicated before summing: usage dicts with the same (prompt_tokens, completion_tokens, total_tokens) triple, where total_tokens falls back to prompt_tokens + completion_tokens when absent, are counted only once. This is the typical case when response_usage mirrors choices[0].usage. Set dedupe_identical_usage: false to restore the legacy behavior, which double-counts such duplicates.
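The deduplicate-then-sum rule can be sketched in Python as below. This is a minimal illustration, not the operator's actual code: the helper names usage_fingerprint and sum_usage are hypothetical, the sketch only reads OpenAI-style field names (prompt_tokens, completion_tokens, total_tokens), and it assumes missing counts are treated as zero.

```python
from typing import Dict, Iterable, Optional, Tuple


def usage_fingerprint(usage: Dict[str, Optional[int]]) -> Tuple[int, int, int]:
    """Hypothetical helper: the (prompt, completion, total) triple used for dedupe."""
    prompt = int(usage.get("prompt_tokens") or 0)
    completion = int(usage.get("completion_tokens") or 0)
    total = usage.get("total_tokens")
    # Fall back to prompt + completion when total_tokens is absent.
    total = int(total) if total is not None else prompt + completion
    return prompt, completion, total


def sum_usage(
    usages: Iterable[Optional[Dict[str, Optional[int]]]],
    dedupe_identical_usage: bool = True,
) -> Dict[str, int]:
    """Hypothetical sketch: sum usage dicts, optionally skipping identical snapshots."""
    seen = set()
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in usages:
        if not usage:  # skip None and empty dicts
            continue
        key = usage_fingerprint(usage)
        if dedupe_identical_usage:
            if key in seen:
                continue  # an identical snapshot was already counted
            seen.add(key)
        totals["prompt_tokens"] += key[0]
        totals["completion_tokens"] += key[1]
        totals["total_tokens"] += key[2]
    return totals
```

With the default setting, a top-level usage dict that mirrors choices[0].usage contributes once; with dedupe_identical_usage=False both copies are added, which reproduces the double-counting described above.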

__init__(choices_key: str = 'choices', usage_key: str = 'usage', response_metadata_key: str = 'response_metadata', dedupe_identical_usage: bool = True, **kwargs)

Base class that conducts data editing.

Parameters:
  • text_key -- the key name of field that stores sample texts to be processed.

  • image_key -- the key name of field that stores sample image list to be processed.

  • audio_key -- the key name of field that stores sample audio list to be processed.

  • video_key -- the key name of field that stores sample video list to be processed.

  • image_bytes_key -- the key name of field that stores sample image bytes list to be processed.

  • query_key -- the key name of field that stores sample queries

  • response_key -- the key name of field that stores responses

  • history_key -- the key name of field that stores history of queries and responses
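The places where choices_key, usage_key, and response_metadata_key are looked up can be illustrated with the sketch below. It is hypothetical: collect_usage_dicts is not part of Data-Juicer, and the exact nesting it checks (usage_key inside the response_metadata dict, and a "message" field inside each choice) is an assumption based on the class description, not the operator's confirmed behavior.

```python
from typing import Any, Dict, List


def collect_usage_dicts(
    sample: Dict[str, Any],
    choices_key: str = "choices",
    usage_key: str = "usage",
    response_metadata_key: str = "response_metadata",
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: gather every non-empty usage dict from a sample."""
    found: List[Dict[str, Any]] = []

    def add(candidate: Any) -> None:
        # Keep only non-empty dicts; None, {} and non-dict values are ignored.
        if isinstance(candidate, dict) and candidate:
            found.append(candidate)

    # 1. Top-level usage field.
    add(sample.get(usage_key))
    # 2. Usage inside response metadata (assumed to live under usage_key).
    metadata = sample.get(response_metadata_key)
    if isinstance(metadata, dict):
        add(metadata.get(usage_key))
    # 3. Each choices[] entry, plus usage nested in its message (assumed layout).
    for choice in sample.get(choices_key) or []:
        if not isinstance(choice, dict):
            continue
        add(choice.get(usage_key))
        message = choice.get("message")
        if isinstance(message, dict):
            add(message.get(usage_key))
    return found
```

The returned list is the raw input to the summing step; duplicates are deliberately kept here so that the dedupe decision stays in one place.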

process_single(sample)

For sample level, sample --> sample

Parameters:

sample -- sample to process

Returns:

processed sample