s3_download_file_mapper
Mapper to download files from S3 to local files or load them into memory.
This operator downloads files from S3 URLs (s3://...) or handles local files. It supports:
- Downloading multiple files concurrently
- Saving files to a specified directory or loading content into memory
- Resuming downloads
- S3 authentication with access keys
- Custom S3 endpoints (for S3-compatible services like MinIO)
The operator processes nested lists of URLs/paths, maintaining the original structure in the output.
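
The structure-preserving behavior described above can be illustrated with a small recursive helper. This is only a sketch of the idea, not the operator's actual implementation: `map_nested`, `split_s3_url`, `is_s3_url`, and `describe` are hypothetical names, the bucket and paths are made up, and `describe` merely stands in for the real download step.

```python
from typing import Any, Callable, Tuple
from urllib.parse import urlparse


def is_s3_url(path: str) -> bool:
    """Return True for strings of the form s3://bucket/key."""
    return path.startswith("s3://")


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split s3://bucket/some/key into ("bucket", "some/key")."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")


def map_nested(value: Any, fn: Callable[[str], Any]) -> Any:
    """Apply fn to every leaf while keeping the list nesting intact."""
    if isinstance(value, list):
        return [map_nested(item, fn) for item in value]
    return fn(value)


def describe(path: str) -> str:
    # Hypothetical stand-in for the real download step.
    if is_s3_url(path):
        bucket, key = split_s3_url(path)
        return f"download {key} from {bucket}"
    return f"use local file {path}"


# A nested input: one top-level URL and one inner list mixing S3 and local paths.
urls = ["s3://my-bucket/a.jpg", ["s3://my-bucket/dir/b.jpg", "/tmp/c.jpg"]]
result = map_nested(urls, describe)
print(result)
```

The output list has the same shape as the input, so each result can be matched back to the URL or path at the same position.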
Type: mapper
Tags: cpu
🔧 Parameter Configuration
| name | type | default | desc |
|---|---|---|---|
| | `<class 'str'>` | | The field name to get the URL/path to download. |
| | `<class 'str'>` | | The directory to save downloaded files. |
| | `<class 'str'>` | | The field name to save the downloaded file content. |
| | `<class 'bool'>` | | Whether to resume download. If True, skip the sample if it exists. |
| | `<class 'int'>` | | (Deprecated) Kept for backward compatibility, not used for S3 downloads. |
| | `<class 'int'>` | | Maximum concurrent downloads. |
| | `<class 'str'>` | | AWS access key ID for S3. |
| | `<class 'str'>` | | AWS secret access key for S3. |
| | `<class 'str'>` | | AWS session token for S3 (optional). |
| | `<class 'str'>` | | AWS region for S3. |
| | `<class 'str'>` | | Custom S3 endpoint URL (for S3-compatible services). |
| | | | extra args |
| | | | extra args |
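
How the resume flag and the concurrency limit might interact can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the operator's code: `download_all` and its parameters (`save_dir`, `fetch`, `resume`, `max_workers`) are hypothetical names, the resume check is assumed to be a simple "target file already exists" test, and `fetch` is an injected callable so the sketch runs without network access or S3 credentials.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def download_all(
    urls: List[str],
    save_dir: str,
    fetch: Callable[[str], bytes],
    resume: bool = True,
    max_workers: int = 4,
) -> List[str]:
    """Fetch each URL into save_dir and return local paths in input order."""
    os.makedirs(save_dir, exist_ok=True)

    def download_one(url: str) -> str:
        target = os.path.join(save_dir, os.path.basename(url))
        # Assumed resume rule: an existing file counts as already downloaded.
        if resume and os.path.exists(target):
            return target
        data = fetch(url)
        with open(target, "wb") as f:
            f.write(data)
        return target

    # The pool size caps how many downloads run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download_one, urls))
```

In a real pipeline `fetch` would wrap an S3 client call configured with the access key, secret key, optional session token, region, and endpoint URL listed in the table. Calling `download_all` a second time with `resume=True` returns the same paths without invoking `fetch` again.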