data_juicer.ops.grouper
- class data_juicer.ops.grouper.KeyValueGrouper(group_by_keys: List[str] | None = None, *args, **kwargs)
Bases: Grouper
Groups samples into batched samples according to the values of the given keys.
- __init__(group_by_keys: List[str] | None = None, *args, **kwargs)
Initialization method.
- Parameters:
group_by_keys – the keys whose values determine the grouping. Nested keys such as "__dj__stats__.text_len" are supported. Defaults to [self.text_key].
args – extra positional arguments
kwargs – extra keyword arguments
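The grouping behavior described above can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: the helper names `get_nested` and `group_by_key_values`, the dotted-path lookup, and the dict-of-lists shape of a batched sample are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def get_nested(sample: Dict[str, Any], key: str) -> Any:
    """Resolve a dotted key such as 'outer.inner' against nested dicts (hypothetical helper)."""
    value: Any = sample
    for part in key.split("."):
        value = value[part]
    return value


def group_by_key_values(
    samples: List[Dict[str, Any]], group_by_keys: List[str]
) -> List[Dict[str, List[Any]]]:
    """Group samples that share the same values for all group_by_keys.

    Each group is returned as one batched sample: a dict mapping every
    field name to the list of that field's values within the group.
    """
    groups: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for sample in samples:
        # The tuple of key values identifies the group this sample belongs to.
        group_id = tuple(get_nested(sample, key) for key in group_by_keys)
        groups[group_id].append(sample)

    batched = []
    for members in groups.values():
        # Assumes every sample in a group has the same top-level fields.
        batched.append({field: [m[field] for m in members] for field in members[0]})
    return batched


samples = [
    {"text": "a", "__dj__stats__": {"text_len": 1}},
    {"text": "bb", "__dj__stats__": {"text_len": 2}},
    {"text": "c", "__dj__stats__": {"text_len": 1}},
]
result = group_by_key_values(samples, ["__dj__stats__.text_len"])
```

Here the two samples with `text_len` equal to 1 end up in one batched sample and the remaining sample forms a group of its own.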
- class data_juicer.ops.grouper.NaiveGrouper(*args, **kwargs)
Bases: Grouper
Groups all samples into one batched sample.
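As a rough illustration of what "one batched sample" can mean, the sketch below collapses a list of per-sample dicts into a single dict of lists. The function name `to_batched_sample` and the exact output shape are assumptions for illustration, not the operator's confirmed behavior.

```python
from typing import Any, Dict, List


def to_batched_sample(samples: List[Dict[str, Any]]) -> List[Dict[str, List[Any]]]:
    """Merge all samples into a single batched sample (hypothetical helper).

    Returns a one-element list so the result still looks like a dataset;
    an empty input yields an empty list.
    """
    if not samples:
        return []
    # Assumes every sample has the same fields as the first one.
    fields = list(samples[0].keys())
    return [{field: [s[field] for s in samples] for field in fields}]


batched = to_batched_sample([
    {"text": "hello", "id": 0},
    {"text": "world", "id": 1},
])
```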
- class data_juicer.ops.grouper.NaiveReverseGrouper(batch_meta_export_path=None, *args, **kwargs)
Bases: Grouper
Splits batched samples back into individual samples.
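The inverse operation can be sketched in the same spirit: a batched sample, assumed here to be a dict of equal-length lists, is split back into one dict per sample. The name `split_batched_sample` is hypothetical, and the sketch does not model `batch_meta_export_path`.

```python
from typing import Any, Dict, List


def split_batched_sample(batched: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Split one batched sample (dict of lists) into per-sample dicts (hypothetical helper)."""
    fields = list(batched.keys())
    if not fields:
        return []
    # Assumes all field lists have the same length as the first one.
    size = len(batched[fields[0]])
    return [{field: batched[field][i] for field in fields} for i in range(size)]


restored = split_batched_sample({"text": ["hello", "world"], "id": [0, 1]})
```

Under these assumptions, splitting the output of a naive grouping step recovers the original list of samples.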