data_juicer.core.executor.ray_executor module
- class data_juicer.core.executor.ray_executor.RayExecutor(cfg: Namespace | None = None)
Bases: ExecutorBase
Executor based on Ray.
Runs Data-Juicer data processing on a distributed Ray cluster.
Currently supports only Filter, Mapper, and exact Deduplicator operators.
Only .json files can be loaded.
Advanced features such as checkpointing and the tracer are not supported.
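Because these limitations only surface once a job runs, it can help to check a configuration up front. The following is a minimal, hypothetical sketch and not part of Data-Juicer. It assumes the config is a plain dict whose keys (dataset_path, process, use_checkpoint, open_tracer) are modeled on Data-Juicer's YAML config, and that operator names end in _mapper or _filter. An operator's name alone does not reveal whether a deduplicator is exact, so the caller passes the names of exact deduplicators explicitly.

```python
from pathlib import Path

# Assumed naming convention: Mapper and Filter operator names end with these suffixes.
SUPPORTED_SUFFIXES = ("_mapper", "_filter")


def check_ray_compat(cfg, exact_dedup_ops=frozenset()):
    """Hypothetical pre-flight check against the documented RayExecutor limits.

    cfg is a plain dict; its keys are assumptions modeled on Data-Juicer's
    YAML config. exact_dedup_ops names deduplicators the caller knows to be
    exact. Returns a list of problems; an empty list means none were found.
    """
    problems = []

    # Only .json input is documented as loadable.
    path = cfg.get("dataset_path", "")
    if Path(path).suffix != ".json":
        problems.append(f"unsupported dataset file: {path!r} (only .json)")

    # Only Filter, Mapper, and exact Deduplicator operators are documented as supported.
    for op in cfg.get("process", []):
        # Each process entry is assumed to be a one-key mapping: {op_name: {op_args}}.
        (name,) = op.keys()
        if name.endswith(SUPPORTED_SUFFIXES) or name in exact_dedup_ops:
            continue
        problems.append(f"unsupported operator: {name}")

    # Checkpointing and the tracer are documented as unsupported.
    for flag in ("use_checkpoint", "open_tracer"):
        if cfg.get(flag):
            problems.append(f"{flag} is not supported by RayExecutor")

    return problems
```

For example, a config that points at a .parquet file, uses a near-duplicate deduplicator, and enables checkpointing yields three problems. A .json config that uses only Mapper and Filter operators yields none.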