data_juicer.utils.fingerprint_utils module

class data_juicer.utils.fingerprint_utils.Hasher

Bases: object

Hasher that accepts Python objects as inputs.

dispatch: Dict = {}
__init__()
classmethod hash_bytes(value: bytes | List[bytes]) → str
classmethod hash_default(value: Any) → str

Use dill to serialize objects to avoid serialization failures.

classmethod hash(value: Any) → str
update(value: Any) → None
hexdigest() → str
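
To illustrate how the methods listed for Hasher could fit together, here is a minimal sketch. It is not the library's implementation. It assumes `hashlib.sha256` as the digest and the standard-library `pickle` as a stand-in for dill, so that it runs without extra dependencies. The class name `SketchHasher`, the way `dispatch` is consulted, and the type-name prefix in `update` are all hypothetical.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, List, Union


class SketchHasher:
    """Hypothetical stand-in for Hasher, for illustration only."""

    # Assumption: maps a type to a custom function that returns a hex digest.
    dispatch: Dict[type, Callable[[Any], str]] = {}

    def __init__(self) -> None:
        # Running digest: update() feeds it, hexdigest() reads it.
        self._m = hashlib.sha256()

    @classmethod
    def hash_bytes(cls, value: Union[bytes, List[bytes]]) -> str:
        # Accept either one bytes object or a list of chunks.
        chunks = [value] if isinstance(value, bytes) else value
        m = hashlib.sha256()
        for chunk in chunks:
            m.update(chunk)
        return m.hexdigest()

    @classmethod
    def hash_default(cls, value: Any) -> str:
        # Assumption: pickle stands in for dill, which handles more object kinds.
        return cls.hash_bytes(pickle.dumps(value))

    @classmethod
    def hash(cls, value: Any) -> str:
        # Use a handler registered for this exact type, else the default path.
        handler = cls.dispatch.get(type(value))
        return handler(value) if handler is not None else cls.hash_default(value)

    def update(self, value: Any) -> None:
        # Mix in the type name along with the hash of the value itself.
        self._m.update(type(value).__name__.encode("utf-8"))
        self._m.update(self.hash(value).encode("utf-8"))

    def hexdigest(self) -> str:
        return self._m.hexdigest()


hasher = SketchHasher()
hasher.update("lowercase_mapper")  # hypothetical operator name
hasher.update({"min_len": 3})      # hypothetical operator arguments
digest = hasher.hexdigest()
```

In this sketch, feeding the same values in the same order gives the same digest, which is the property a fingerprint needs in order to act as a cache key.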
data_juicer.utils.fingerprint_utils.update_fingerprint(fingerprint, transform, transform_args)

Combine various objects to update the fingerprint.
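
One way to picture this combination is sketched below, under stated assumptions. The old fingerprint, the transform, and each argument are fed into a single digest in a fixed order. The function name `sketch_update_fingerprint` is hypothetical, `hashlib` and `pickle` stand in for whatever hashing and serialization the library actually uses, and iterating over the argument keys in sorted order is an assumption made so that the argument order does not affect the result.

```python
import hashlib
import pickle
from typing import Any, Dict


def sketch_update_fingerprint(fingerprint: str, transform: Any,
                              transform_args: Dict[str, Any]) -> str:
    """Hypothetical sketch: derive a new fingerprint from the old one."""
    m = hashlib.sha256()
    # Start from the previous fingerprint so the history is chained.
    m.update(fingerprint.encode("utf-8"))
    # Assumption: the transform is any picklable object (here, a name).
    m.update(pickle.dumps(transform))
    # Sorted keys make the result independent of dict insertion order.
    for key in sorted(transform_args):
        m.update(key.encode("utf-8"))
        m.update(pickle.dumps(transform_args[key]))
    return m.hexdigest()


old_fp = "0" * 16  # placeholder for an existing fingerprint
new_fp = sketch_update_fingerprint(
    old_fp, "lowercase_mapper", {"lang": "en", "min_len": 3})
```

Because the previous fingerprint is part of the input, applying the same transform to two datasets with different histories gives different results in this sketch.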

data_juicer.utils.fingerprint_utils.generate_fingerprint(ds, *args, **kwargs)

Generate new fingerprints by using various kwargs of the dataset.
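
The description leaves open which dataset properties are used, so the sketch below is only a guess at the general shape. It assumes the fingerprint is built from the dataset object's attributes plus the extra positional and keyword arguments. The name `sketch_generate_fingerprint` is hypothetical, a `SimpleNamespace` stands in for the dataset, and `hashlib` and `pickle` stand in for the library's hashing and serialization.

```python
import hashlib
import pickle
from types import SimpleNamespace
from typing import Any


def sketch_generate_fingerprint(ds: Any, *args: Any, **kwargs: Any) -> str:
    """Hypothetical sketch: fingerprint a dataset-like object and arguments."""
    m = hashlib.sha256()
    # Assumption: the dataset's state is its instance attributes.
    state = vars(ds)
    for key in sorted(state):
        m.update(key.encode("utf-8"))
        m.update(pickle.dumps(state[key]))
    # Positional arguments are order-sensitive, so keep their order.
    for arg in args:
        m.update(pickle.dumps(arg))
    # Keyword arguments are hashed in sorted key order.
    for key in sorted(kwargs):
        m.update(key.encode("utf-8"))
        m.update(pickle.dumps(kwargs[key]))
    return m.hexdigest()


toy_ds = SimpleNamespace(split="train", num_rows=3)  # stand-in dataset
fp = sketch_generate_fingerprint(toy_ds, "dedup", seed=1)
```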