data_juicer.utils.encryption_utils module
- data_juicer.utils.encryption_utils.load_fernet_key(key_path=None)
Load a Fernet key from a file or environment variable.
Priority order:
1. The key_path file (if provided and it exists).
2. The environment variable DJ_ENCRYPTION_KEY.
- Parameters:
key_path: path to a file containing the Fernet key as a base64 URL-safe string. Pass None to fall back to the environment variable.
- Returns:
a cryptography.fernet.Fernet instance ready for encryption/decryption.
- Raises:
ValueError: if no key can be found or the key is invalid.
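The documented lookup order can be sketched with the standard library alone. This is a hypothetical reimplementation (resolve_fernet_key is not part of data_juicer): it returns the raw key bytes after checking that they decode to the 32 bytes Fernet requires, whereas the real function returns a Fernet instance. Stripping a trailing newline from the key file, and falling back to the environment variable when the file is empty, are assumptions.

```python
import base64
import binascii
import os
from typing import Optional

# Environment variable named in the load_fernet_key docs.
ENV_VAR = "DJ_ENCRYPTION_KEY"


def resolve_fernet_key(key_path: Optional[str] = None) -> bytes:
    """Hypothetical sketch of load_fernet_key's documented lookup order.

    1. The file at key_path, if one is given and it exists.
    2. The DJ_ENCRYPTION_KEY environment variable.
    Returns the validated key bytes (the real function returns a Fernet).
    """
    raw = b""
    if key_path is not None and os.path.isfile(key_path):
        with open(key_path, "rb") as fh:
            # Assumption: surrounding whitespace/newlines in the file are ignored.
            raw = fh.read().strip()
    if not raw:
        # No key read from a file, so fall back to the environment variable.
        raw = os.environ.get(ENV_VAR, "").strip().encode("utf-8")
    if not raw:
        raise ValueError(f"no Fernet key in key_path or ${ENV_VAR}")
    try:
        decoded = base64.urlsafe_b64decode(raw)
    except binascii.Error as exc:
        raise ValueError("Fernet key is not valid url-safe base64") from exc
    # Fernet keys are 32 random bytes, url-safe base64 encoded.
    if len(decoded) != 32:
        raise ValueError("Fernet key must decode to exactly 32 bytes")
    return raw
```

With the cryptography package installed, the returned bytes can be passed to Fernet(key), and Fernet.generate_key() produces a key in exactly this format.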
- data_juicer.utils.encryption_utils.encrypt_file(src_path, dst_path, fernet)
Encrypt a file with Fernet and write the ciphertext to dst_path.
When src_path == dst_path, the file is encrypted in place: the plaintext is read into memory, the file is overwritten with ciphertext, and the original plaintext is never written back to disk.
- Parameters:
src_path: path to the plaintext source file.
dst_path: path where the encrypted file will be written. May be the same as src_path for in-place encryption.
fernet: a cryptography.fernet.Fernet instance.
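The in-place behaviour described above follows if the whole plaintext is read into memory before the destination is opened for writing. A minimal sketch, assuming only that fernet has Fernet's encrypt(bytes) -> bytes method; encrypt_file_sketch and the Encryptor protocol are hypothetical names, not data_juicer API.

```python
from typing import Protocol


class Encryptor(Protocol):
    """Structural stand-in for cryptography.fernet.Fernet's encrypt method."""

    def encrypt(self, data: bytes) -> bytes: ...


def encrypt_file_sketch(src_path: str, dst_path: str, fernet: Encryptor) -> None:
    # Read the whole plaintext before dst_path is opened, so that
    # src_path == dst_path is safe: "wb" truncation happens only afterwards.
    with open(src_path, "rb") as fh:
        plaintext = fh.read()
    ciphertext = fernet.encrypt(plaintext)
    # Only ciphertext is ever written to dst_path.
    with open(dst_path, "wb") as fh:
        fh.write(ciphertext)
```

In practice fernet would be the object returned by load_fernet_key() or Fernet(Fernet.generate_key()). Because "wb" truncates before writing, a crash mid-write could lose the file; writing to a temporary file and calling os.replace() is a common way to make this step atomic.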
- data_juicer.utils.encryption_utils.decrypt_file_to_bytes(src_path, fernet)
Decrypt an encrypted file and return the plaintext as bytes.
The plaintext is never written to disk; it is only returned in memory.
- Parameters:
src_path: path to the Fernet-encrypted file.
fernet: a cryptography.fernet.Fernet instance.
- Returns:
the decrypted plaintext as bytes.
- Raises:
cryptography.fernet.InvalidToken: if the file cannot be decrypted with the provided key.
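A minimal sketch of the read-then-decrypt flow, assuming only that fernet has Fernet's decrypt(bytes) -> bytes method. decrypt_file_to_bytes_sketch and the Decryptor protocol are hypothetical; the sketch does not catch anything, so with a real Fernet the documented InvalidToken reaches the caller unchanged.

```python
from typing import Protocol


class Decryptor(Protocol):
    """Structural stand-in for cryptography.fernet.Fernet's decrypt method."""

    def decrypt(self, token: bytes) -> bytes: ...


def decrypt_file_to_bytes_sketch(src_path: str, fernet: Decryptor) -> bytes:
    # Read the Fernet token (the whole encrypted file) from disk.
    with open(src_path, "rb") as fh:
        token = fh.read()
    # Decrypt in memory only; nothing is written back. With a real Fernet,
    # a wrong key or a tampered file raises cryptography.fernet.InvalidToken,
    # which this sketch lets propagate to the caller.
    return fernet.decrypt(token)
```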
- data_juicer.utils.encryption_utils.get_secure_tmpdir()
Return the best available temporary directory for plaintext data.
Priority:
1. /dev/shm: the Linux in-memory tmpfs, so plaintext never touches disk.
2. The system default (/tmp or TMPDIR): plaintext exists briefly on disk until the caller removes the file.
- Returns:
a path string to use as the dir argument of tempfile.NamedTemporaryFile().
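The priority list above can be sketched with os and tempfile. secure_tmpdir_sketch and plaintext_size_via_tmpfile are hypothetical names; checking that /dev/shm is writable is an assumption about what "best available" means, not confirmed behaviour of get_secure_tmpdir. The second function shows the documented use as the dir argument of NamedTemporaryFile.

```python
import os
import tempfile

SHM_DIR = "/dev/shm"  # Linux RAM-backed tmpfs, as named in the docs above


def secure_tmpdir_sketch() -> str:
    # Prefer /dev/shm when present. Assumption: we also require write and
    # search permission so that temp files can actually be created there.
    if os.path.isdir(SHM_DIR) and os.access(SHM_DIR, os.W_OK | os.X_OK):
        return SHM_DIR
    # Fallback: tempfile's default, which honours TMPDIR and otherwise
    # picks a platform directory such as /tmp.
    return tempfile.gettempdir()


def plaintext_size_via_tmpfile(plaintext: bytes) -> int:
    # Hypothetical caller: stage plaintext in a named temp file (e.g. for a
    # tool that needs a real path). The file is deleted when the with-block
    # exits, which bounds how long plaintext sits on disk in the fallback case.
    with tempfile.NamedTemporaryFile(dir=secure_tmpdir_sketch()) as tmp:
        tmp.write(plaintext)
        tmp.flush()
        return os.path.getsize(tmp.name)
```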
- data_juicer.utils.encryption_utils.decrypt_file_to_bytesio(src_path, fernet)
Decrypt an encrypted file and return an io.BytesIO buffer.
Convenience wrapper around decrypt_file_to_bytes() that wraps the result in a seekable in-memory buffer, ready to be passed directly to HuggingFace load_dataset or PDF/DOCX parsers.
- Parameters:
src_path: path to the Fernet-encrypted file.
fernet: a cryptography.fernet.Fernet instance.
- Returns:
an io.BytesIO buffer positioned at offset 0.
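A minimal sketch of the wrapper, assuming only that fernet has Fernet's decrypt method. decrypt_file_to_bytesio_sketch, read_jsonl and the Decryptor protocol are hypothetical names; the decryption step is inlined rather than calling decrypt_file_to_bytes() so the block is self-contained. read_jsonl illustrates one consumer that reads the buffer without any plaintext touching disk.

```python
import io
import json
from typing import Protocol


class Decryptor(Protocol):
    """Structural stand-in for cryptography.fernet.Fernet's decrypt method."""

    def decrypt(self, token: bytes) -> bytes: ...


def decrypt_file_to_bytesio_sketch(src_path: str, fernet: Decryptor) -> io.BytesIO:
    # Equivalent to wrapping decrypt_file_to_bytes(); inlined here so the
    # sketch is self-contained.
    with open(src_path, "rb") as fh:
        plaintext = fernet.decrypt(fh.read())
    # BytesIO(initial_bytes) starts at offset 0, ready for read()/seek().
    return io.BytesIO(plaintext)


def read_jsonl(buf: io.BytesIO) -> list:
    # Example consumer: decode JSON Lines straight from the in-memory buffer.
    text = io.TextIOWrapper(buf, encoding="utf-8")
    return [json.loads(line) for line in text if line.strip()]
```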