data_juicer.utils.encryption_utils module
- data_juicer.utils.encryption_utils.load_fernet_key(key_path=None)
Load a Fernet key from a file or an environment variable.
Priority order:
1. The key_path file (if provided and it exists).
2. The environment variable DJ_ENCRYPTION_KEY.
- Parameters:
key_path -- path to a file containing the Fernet key as a URL-safe base64 string. Pass None to fall back to the environment variable.
- Returns:
a cryptography.fernet.Fernet instance ready for encryption / decryption.
- Raises:
ValueError -- if no key can be found or the key is invalid.
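The priority order documented for load_fernet_key can be sketched roughly as follows. This is a minimal stand-in, not the library's implementation: `load_key_sketch` is a hypothetical name, and details such as stripping whitespace from the key file are assumptions. Only `cryptography.fernet.Fernet` and the `DJ_ENCRYPTION_KEY` variable come from the documentation.

```python
import os
import tempfile

from cryptography.fernet import Fernet


def load_key_sketch(key_path=None, env_var="DJ_ENCRYPTION_KEY"):
    """Hypothetical stand-in that mirrors the documented priority order."""
    key = None
    if key_path is not None and os.path.isfile(key_path):
        # 1. The key file wins when it is provided and exists.
        with open(key_path, "rb") as f:
            key = f.read().strip()  # assumption: surrounding whitespace is ignored
    elif os.environ.get(env_var):
        # 2. Otherwise fall back to the environment variable.
        key = os.environ[env_var].encode("ascii")
    if not key:
        raise ValueError("no Fernet key found in key file or environment")
    # Fernet() itself raises ValueError for a malformed key.
    return Fernet(key)


# Usage: write a freshly generated key to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    key_file = os.path.join(tmp, "fernet.key")
    with open(key_file, "wb") as f:
        f.write(Fernet.generate_key())
    fernet = load_key_sketch(key_file)
    token = fernet.encrypt(b"hello")
    roundtrip = fernet.decrypt(token)
```

A key in the expected format can be produced with `Fernet.generate_key()`, which returns 32 random bytes encoded as URL-safe base64.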
- data_juicer.utils.encryption_utils.encrypt_file(src_path, dst_path, fernet)
Encrypt a file with Fernet and write the ciphertext to dst_path.
When src_path == dst_path the file is encrypted in-place: the plaintext is read into memory, the file is overwritten with ciphertext, and the original plaintext is never written back to disk.
- Parameters:
src_path -- path to the plaintext source file.
dst_path -- path where the encrypted file will be written. May be the same as src_path for in-place encryption.
fernet -- a cryptography.fernet.Fernet instance.
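The in-place behaviour can be illustrated with a small stand-in. `encrypt_file_sketch` is a hypothetical name, and the body is an assumption about how such a helper might look. The actual function may differ, for example in how it handles errors.

```python
import os
import tempfile

from cryptography.fernet import Fernet


def encrypt_file_sketch(src_path, dst_path, fernet):
    """Hypothetical stand-in: read all plaintext, then write ciphertext."""
    with open(src_path, "rb") as f:
        plaintext = f.read()  # the whole file is held in memory
    ciphertext = fernet.encrypt(plaintext)
    # Opening dst_path with "wb" truncates it, so when dst_path == src_path
    # the plaintext on disk is replaced by ciphertext.
    with open(dst_path, "wb") as f:
        f.write(ciphertext)


fernet = Fernet(Fernet.generate_key())
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"secret row\n")
    encrypt_file_sketch(path, path, fernet)  # in-place: src_path == dst_path
    with open(path, "rb") as f:
        on_disk = f.read()
```

Because the plaintext is read into memory in one piece, the approach presumably suits files that fit comfortably in RAM.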
- data_juicer.utils.encryption_utils.decrypt_file_to_bytes(src_path, fernet)
Decrypt an encrypted file and return the plaintext as bytes. The plaintext is never written to disk; it is only returned in memory.
- Parameters:
src_path -- path to the Fernet-encrypted file.
fernet -- a cryptography.fernet.Fernet instance.
- Returns:
the decrypted plaintext as bytes.
- Raises:
cryptography.fernet.InvalidToken -- if the file cannot be decrypted with the provided key.
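A caller will usually want to handle the InvalidToken case explicitly. The sketch below uses a hypothetical stand-in, `decrypt_to_bytes_sketch`, whose body is an assumption rather than the library's code. It shows a successful decryption and the exception raised when the wrong key is used.

```python
import os
import tempfile

from cryptography.fernet import Fernet, InvalidToken


def decrypt_to_bytes_sketch(src_path, fernet):
    """Hypothetical stand-in: decrypt in memory, never write plaintext."""
    with open(src_path, "rb") as f:
        return fernet.decrypt(f.read())


right_key = Fernet(Fernet.generate_key())
wrong_key = Fernet(Fernet.generate_key())

with tempfile.TemporaryDirectory() as tmp:
    enc_path = os.path.join(tmp, "sample.enc")
    with open(enc_path, "wb") as f:
        f.write(right_key.encrypt(b"secret row\n"))

    # Correct key: the plaintext comes back as bytes.
    plaintext = decrypt_to_bytes_sketch(enc_path, right_key)

    # Wrong key: Fernet refuses the token instead of returning garbage.
    try:
        decrypt_to_bytes_sketch(enc_path, wrong_key)
        wrong_key_rejected = False
    except InvalidToken:
        wrong_key_rejected = True
```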
- data_juicer.utils.encryption_utils.get_secure_tmpdir()
Return the best available temporary directory for plaintext data.
Priority:
1. /dev/shm: Linux in-memory tmpfs, so plaintext never touches disk.
2. System default (/tmp or TMPDIR): plaintext exists briefly on disk until the caller removes the file.
- Returns:
path string to use as the dir argument of tempfile.NamedTemporaryFile().
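The selection logic and the intended use with tempfile.NamedTemporaryFile() might look like the sketch below. `get_secure_tmpdir_sketch` is a hypothetical stand-in, and the writability check is an assumption not stated in the documentation.

```python
import os
import tempfile


def get_secure_tmpdir_sketch():
    """Hypothetical stand-in for the documented priority order."""
    shm = "/dev/shm"
    # Assumption: the directory must exist and be writable to be usable.
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return shm
    # tempfile.gettempdir() honours TMPDIR and falls back to /tmp on Unix.
    return tempfile.gettempdir()


tmpdir = get_secure_tmpdir_sketch()

# The file is deleted automatically when the context manager exits.
with tempfile.NamedTemporaryFile(dir=tmpdir, suffix=".txt") as tmp:
    tmp.write(b"decrypted plaintext")
    tmp.flush()
    tmp_name = tmp.name
    tmp.seek(0)
    read_back = tmp.read()
```

Keeping the temporary file inside a `with` block limits how long plaintext exists when the fallback directory is on disk.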
- data_juicer.utils.encryption_utils.decrypt_file_to_bytesio(src_path, fernet)
Decrypt an encrypted file and return an io.BytesIO buffer.
Convenience wrapper around decrypt_file_to_bytes() that wraps the result in a seekable in-memory buffer, ready to be passed directly to HuggingFace load_dataset or PDF/DOCX parsers.
- Parameters:
src_path -- path to the Fernet-encrypted file.
fernet -- a cryptography.fernet.Fernet instance.
- Returns:
io.BytesIO positioned at offset 0.
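The sketch below illustrates why a seekable buffer is convenient for file-like consumers. `decrypt_to_bytesio_sketch` is a hypothetical stand-in, and `json.load` is used only as a stdlib substitute for a real parser; it is not one of the consumers named in the documentation.

```python
import io
import json
import os
import tempfile

from cryptography.fernet import Fernet


def decrypt_to_bytesio_sketch(src_path, fernet):
    """Hypothetical stand-in: decrypt and wrap the bytes in a buffer."""
    with open(src_path, "rb") as f:
        return io.BytesIO(fernet.decrypt(f.read()))


fernet = Fernet(Fernet.generate_key())
with tempfile.TemporaryDirectory() as tmp:
    enc_path = os.path.join(tmp, "records.json.enc")
    payload = json.dumps({"text": "hello"}).encode("utf-8")
    with open(enc_path, "wb") as f:
        f.write(fernet.encrypt(payload))
    buf = decrypt_to_bytesio_sketch(enc_path, fernet)

start_offset = buf.tell()   # a fresh BytesIO starts at offset 0
record = json.load(buf)     # any reader that accepts a binary file object
buf.seek(0)                 # seekable, so it can be rewound and reused
raw = buf.read()
```

The buffer remains usable after the encrypted file is gone, since the plaintext lives only in memory.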