data_juicer.ops.mapper.download_file_mapper module
- class data_juicer.ops.mapper.download_file_mapper.DownloadFileMapper(download_field: str = None, save_dir: str = None, save_field: str = None, resume_download: bool = False, timeout: int = 30, max_concurrent: int = 10, *args, **kwargs)
Bases: Mapper
Mapper to download files from URLs to local files or load them into memory.
- __init__(download_field: str = None, save_dir: str = None, save_field: str = None, resume_download: bool = False, timeout: int = 30, max_concurrent: int = 10, *args, **kwargs)
Initialization method.
- Parameters:
save_dir – The directory to save downloaded files.
download_field – The field name to get the URL to download.
save_field – The field name to save the downloaded file content.
resume_download – Whether to resume download. If True, skip the sample if it exists.
max_concurrent – Maximum number of concurrent downloads.
args – extra args
kwargs – extra args
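As a rough usage sketch, the constructor arguments above might be assembled as follows. The field name `url` and the directory `./downloads` are illustrative assumptions, not values taken from the library; only the parameter names and the `timeout` and `max_concurrent` defaults come from the signature shown above. The helper `build_mapper` is hypothetical and imports the class lazily, so the snippet can be loaded even where `data_juicer` is not installed.

```python
# Keyword arguments mirroring the documented __init__ signature.
# Values are illustrative assumptions; timeout=30 and max_concurrent=10
# simply repeat the defaults shown in the signature.
mapper_kwargs = {
    "download_field": "url",     # hypothetical field holding each sample's URL
    "save_dir": "./downloads",   # hypothetical local directory for saved files
    "resume_download": True,     # skip samples that already exist
    "timeout": 30,
    "max_concurrent": 10,
}


def build_mapper(**overrides):
    """Construct a DownloadFileMapper (hypothetical helper).

    Requires data_juicer to be installed; the import is done lazily so that
    defining this function does not fail without the package.
    """
    from data_juicer.ops.mapper.download_file_mapper import DownloadFileMapper

    # Later keys win, so overrides replace the illustrative defaults above.
    return DownloadFileMapper(**{**mapper_kwargs, **overrides})
```

Passing `save_field` through `build_mapper` would, per the parameter description, name the field that receives the downloaded file content, which appears to correspond to the "load them into memory" mode from the class summary.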