data_juicer.ops.pipeline.ray_repartition_pipeline module
- class data_juicer.ops.pipeline.ray_repartition_pipeline.RayRepartitionPipeline(*args, **kwargs)
Bases: Pipeline
Repartition a Ray Dataset into a target number of blocks.
This operator performs dataset-level block repartitioning through Ray Dataset's repartition API. It is intended only for Ray executor pipelines, because local datasets do not expose Ray Dataset blocks.
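To make the effect of block repartitioning concrete, the following is a minimal pure-Python sketch. It is not Ray's implementation and not this operator's code: `repartition_blocks` is a hypothetical helper that models a dataset as a list of row lists and redistributes the rows into a target number of near-equal blocks. In Ray itself, the corresponding operation is `Dataset.repartition(num_blocks)`, whose actual block sizes and row placement may differ from this sketch.

```python
from typing import Any, List


def repartition_blocks(blocks: List[List[Any]], num_blocks: int) -> List[List[Any]]:
    """Redistribute rows into exactly `num_blocks` blocks of near-equal size.

    Hypothetical illustration only; Ray Data performs the real work
    through Dataset.repartition(num_blocks).
    """
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")

    # Flatten all existing blocks into one ordered list of rows.
    rows = [row for block in blocks for row in block]

    # Every block gets `base` rows; the first `extra` blocks get one more.
    base, extra = divmod(len(rows), num_blocks)

    out: List[List[Any]] = []
    start = 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)
        out.append(rows[start:start + size])
        start += size
    return out


# Three uneven blocks (7, 1, and 2 rows) become four balanced blocks.
uneven = [[0, 1, 2, 3, 4, 5, 6], [7], [8, 9]]
even = repartition_blocks(uneven, 4)
print([len(b) for b in even])  # [3, 3, 2, 2]
```

The sketch shows why the operation only makes sense where a dataset is physically divided into blocks: the rows are unchanged, and only their grouping changes.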