data_juicer.core.ray_exporter module
- class data_juicer.core.ray_exporter.RayExporter(export_path, export_type=None, export_shard_size=0, keep_stats_in_res_ds=True, keep_hashes_in_res_ds=False, encrypt_before_export=False, encryption_key_path=None, **kwargs)
Bases: object
The Exporter class is used to export a Ray dataset to files of a specific format.
- __init__(export_path, export_type=None, export_shard_size=0, keep_stats_in_res_ds=True, keep_hashes_in_res_ds=False, encrypt_before_export=False, encryption_key_path=None, **kwargs)
Initialization method.
- Parameters:
export_path – the path to export datasets to.
export_type – the format type of the exported datasets.
export_shard_size – the approximate size of each shard of the exported dataset. By default it is 0, which means the dataset is exported using Ray's default settings.
keep_stats_in_res_ds – whether to keep stats in the result dataset.
keep_hashes_in_res_ds – whether to keep hashes in the result dataset.
encrypt_before_export – whether to encrypt each exported file in place after Ray has finished writing. All files inside the export directory will be encrypted. S3 paths are skipped. Default: False.
encryption_key_path – path to a file containing the Fernet key. Falls back to the DJ_ENCRYPTION_KEY environment variable when None. Only used when encrypt_before_export is True.
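The encryption parameters above follow a fixed precedence: the key file first, then the DJ_ENCRYPTION_KEY environment variable, and only when encrypt_before_export is True. The following is a minimal plain-Python sketch of those documented rules, not Data-Juicer's implementation. resolve_key and files_to_encrypt are hypothetical helpers; it assumes the key is stored as text, that S3 paths are recognised by an "s3://" prefix, and that a missing key raises an error. The actual Fernet encryption step is left out.

```python
import os
from typing import List, Optional

# Environment variable named in the parameter docs above.
ENV_VAR = "DJ_ENCRYPTION_KEY"


def resolve_key(encrypt_before_export: bool,
                encryption_key_path: Optional[str]) -> Optional[bytes]:
    """Hypothetical helper: return the key bytes, or None if encryption is off."""
    if not encrypt_before_export:
        # The key path is ignored unless encryption is enabled.
        return None
    if encryption_key_path is not None:
        # A key file takes precedence over the environment variable.
        with open(encryption_key_path, "rb") as f:
            return f.read().strip()
    value = os.environ.get(ENV_VAR)
    if value is None:
        # Assumption: failing loudly when no key source is available.
        raise ValueError(f"no key file given and {ENV_VAR} is not set")
    return value.strip().encode()


def files_to_encrypt(export_path: str) -> List[str]:
    """Hypothetical helper: list every file under a local export path."""
    if export_path.startswith("s3://"):
        # Assumption: S3 exports are detected by prefix and skipped.
        return []
    if os.path.isfile(export_path):
        return [export_path]
    found: List[str] = []
    for root, _dirs, names in os.walk(export_path):
        for name in names:
            found.append(os.path.join(root, name))
    return sorted(found)
```

Walking the whole directory reflects the documented behaviour that all files inside the export directory are encrypted, not only the ones a particular writer produced.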
- export(dataset, columns=None)
Export method for a dataset.
- Parameters:
dataset – the dataset to export.
columns – the columns to export.
- Returns:
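To make the interplay of columns, keep_stats_in_res_ds and keep_hashes_in_res_ds concrete, here is a hypothetical sketch over a list of dicts rather than a Ray dataset. The column names __stats__, __hash__ and __minhash__ are placeholders, not Data-Juicer's actual field names, and applying the keep flags after column selection is an assumption about precedence.

```python
from typing import Dict, Iterable, List, Optional

# Placeholder column names, chosen for illustration only.
STATS_COL = "__stats__"
HASH_COLS = {"__hash__", "__minhash__"}


def select_columns(rows: Iterable[Dict],
                   columns: Optional[List[str]] = None,
                   keep_stats: bool = True,
                   keep_hashes: bool = False) -> List[Dict]:
    """Hypothetical helper: pick export columns, then drop stats/hash columns."""
    out: List[Dict] = []
    for row in rows:
        # columns=None means "export every column present in the row".
        keys = list(row) if columns is None else [c for c in columns if c in row]
        if not keep_stats:
            keys = [k for k in keys if k != STATS_COL]
        if not keep_hashes:
            keys = [k for k in keys if k not in HASH_COLS]
        out.append({k: row[k] for k in keys})
    return out
```

With the constructor defaults (keep stats, drop hashes), a row carrying both a stats column and a hash column keeps only the former.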
- static write_json(dataset, export_path, **kwargs)
Export method for json/jsonl target files.
- Parameters:
dataset – the dataset to export.
export_path – the path to store the exported dataset.
kwargs – extra arguments.
- Returns:
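JSON Lines (jsonl) output stores one JSON object per line. As a stdlib-only illustration of that layout, and not Ray's writer, the hypothetical write_jsonl_shards below splits records across several part-NNNNN.jsonl files, loosely mirroring how a distributed writer produces multiple output files. The rows_per_file parameter and the file naming are illustrative assumptions and are unrelated to export_shard_size.

```python
import json
import os
from typing import Dict, Iterable, List


def write_jsonl_shards(records: Iterable[Dict], export_dir: str,
                       rows_per_file: int = 1000) -> List[str]:
    """Hypothetical helper: write records as JSON Lines, split into shard files."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    os.makedirs(export_dir, exist_ok=True)
    rows = list(records)
    paths: List[str] = []
    for start in range(0, len(rows), rows_per_file):
        path = os.path.join(export_dir, f"part-{len(paths):05d}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for row in rows[start:start + rows_per_file]:
                # ensure_ascii=False keeps non-ASCII text readable on disk.
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths
```

Because every line is an independent JSON document, shard files can be concatenated or read back line by line without parsing a surrounding array.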