data_juicer_sandbox.pipelines module
- class data_juicer_sandbox.pipelines.SandBoxWatcher(sandbox_cfg)
Bases: object
Basic watcher class that tracks the results of interest and manages the experiment within the sandbox, based on the WandB UI and its utilities.
- class data_juicer_sandbox.pipelines.Target(iter_target_str: str = None, key: str = None, op: str = None, tgt_val: float = None)
Bases: object
- SUPPORT_OPS = ['==', '>=', '<=', '>', '<']
- __init__(iter_target_str: str = None, key: str = None, op: str = None, tgt_val: float = None)
- key: str
- op: str
- tgt_val: float
- check_target(context_infos: ContextInfos)
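The docstring does not say how check_target evaluates a target. As a hedged sketch, assuming a target reduces to a metric key, one of the SUPPORT_OPS strings, and a numeric threshold, the comparison can be expressed with Python's operator module. The `<key> <op> <value>` string format, the plain dict `metrics` standing in for ContextInfos, and the helper names parse_target and check_target below are hypothetical, not the library's API.

```python
import operator
from typing import Dict, Tuple

# Operator strings in the same order as Target.SUPPORT_OPS; the two-character
# operators come first, so ">=" is matched before the bare ">".
SUPPORT_OPS = ["==", ">=", "<=", ">", "<"]

# Hypothetical mapping from operator strings to comparison functions.
_COMPARATORS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_target(target_str: str) -> Tuple[str, str, float]:
    """Split a hypothetical '<key> <op> <value>' string into its parts."""
    for op in SUPPORT_OPS:
        if op in target_str:
            key, value = target_str.split(op, 1)
            return key.strip(), op, float(value)
    raise ValueError(f"no supported operator in {target_str!r}")


def check_target(metrics: Dict[str, float], key: str, op: str,
                 tgt_val: float) -> bool:
    """Return True if metrics[key] <op> tgt_val holds; a missing key fails."""
    if op not in _COMPARATORS:
        raise ValueError(f"unsupported operator {op!r}")
    if key not in metrics:
        return False
    return _COMPARATORS[op](metrics[key], tgt_val)


key, op, tgt_val = parse_target("eval.accuracy >= 0.8")
print(check_target({"eval.accuracy": 0.83}, key, op, tgt_val))  # True
```

Trying the operators in the listed order matters when parsing: if ">" were tried before ">=", the string "eval.accuracy >= 0.8" would be split at the bare ">" and the value "= 0.8" would fail to parse. Whether the real method treats a missing metric as a failure or as an error is not documented; this sketch simply returns False.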
- class data_juicer_sandbox.pipelines.SandboxPipeline(pipeline_name='anonymous', pipeline_cfg=None, watcher=None)
Bases: object
- __init__(pipeline_name='anonymous', pipeline_cfg=None, watcher=None)
Initialization method.
- run(context_infos: ContextInfos)
Runs the sandbox pipeline either once or in HPO (hyperparameter optimization) style.
- one_trial(context_infos: ContextInfos)
- Runs the sandbox pipeline once.
- Users can flexibly run selected steps of the whole sandbox pipeline according to their own needs and configuration. The watcher automatically tracks the results for the data, the model, and the specified evaluation metrics.
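As a rough illustration of running only some steps of a pipeline while tracking each result, the sketch below runs a user-selected subset of steps in order and hands every result to a watcher. The step names ("data", "train", "evaluate"), the ListWatcher class, and its watch method are hypothetical stand-ins; the real SandBoxWatcher is built on WandB, and the real step layout comes from the pipeline config.

```python
from typing import Callable, Dict, List

# A step reads the shared context and returns a dict of results.
Step = Callable[[dict], dict]


class ListWatcher:
    """Hypothetical in-memory stand-in for SandBoxWatcher."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def watch(self, step_name: str, results: dict) -> None:
        # Record which step produced which results.
        self.records.append({"step": step_name, **results})


def one_trial(steps: Dict[str, Step], enabled: List[str], context: dict,
              watcher: ListWatcher) -> dict:
    """Run only the enabled steps, in the given order, tracking each result."""
    for name in enabled:
        if name not in steps:
            raise KeyError(f"unknown step {name!r}")
        results = steps[name](context)
        context[name] = results  # later steps can read earlier results
        watcher.watch(name, results)
    return context


steps = {
    "data": lambda ctx: {"num_samples": 1000},
    "train": lambda ctx: {"loss": 0.42, "seen": ctx["data"]["num_samples"]},
    "evaluate": lambda ctx: {"accuracy": 0.8},
}
watcher = ListWatcher()
context = one_trial(steps, ["data", "train"], {}, watcher)
print([r["step"] for r in watcher.records])  # ['data', 'train']
```

Passing the shared context to each step lets a later step (here "train") read what an earlier one produced, while the watcher sees every step's output as it happens; skipped steps ("evaluate") neither run nor get logged.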
- execute_hpo_wandb(context_infos)
- Runs the sandbox pipeline in HPO style.
- Users can flexibly run selected steps of the whole sandbox pipeline according to their own needs and configuration. The watcher automatically tracks the results for the data, the model, and the specified evaluation metrics.
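The method name suggests that HPO runs are driven through WandB, but that machinery is not documented here. To show the run-once versus HPO-style split in isolation, the sketch below swaps in a plain exhaustive grid search instead of WandB. The trial signature, the search_space dict, and the assumption that a higher score is better are illustrative choices, not the library's interface.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

# A trial takes the context and one hyperparameter assignment, returns a score.
Trial = Callable[[dict, dict], float]


def run(trial: Trial, context: dict,
        search_space: Optional[Dict[str, List]] = None) -> Tuple[dict, float]:
    """Run once without a search space; otherwise grid-search it (higher is better)."""
    if not search_space:
        # "At once": a single trial with no hyperparameter overrides.
        return {}, trial(dict(context), {})
    names = list(search_space)
    best_params: dict = {}
    best_score = float("-inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        params = dict(zip(names, values))
        # Copy the context so every trial starts from the same state.
        score = trial(dict(context), params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_trial(ctx: dict, params: dict) -> float:
    # Pretend the score peaks at lr = 0.1.
    return 1.0 - (params.get("lr", 0.5) - 0.1) ** 2


print(run(toy_trial, {}, {"lr": [0.01, 0.1, 1.0]}))  # ({'lr': 0.1}, 1.0)
```

A grid search is the simplest deterministic stand-in; a WandB sweep would typically sample or search the space adaptively rather than enumerate it.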
- class data_juicer_sandbox.pipelines.SandBoxExecutor(cfg=None)
Bases: object
This SandBoxExecutor class provides a sandbox environment for exploring data-model co-designs in a one-stop manner, with fast feedback, tiny model sizes, small data sizes, and high efficiency.
It acts as middleware that maintains Data-Juicer's data executor, a model processor (training and inference), and an auto-evaluator; the latter two usually come from third-party libraries.
- parse_pipelines(cfg)
Parses the pipeline configs.
- Parameters:
cfg – the original config
- Returns:
a list of SandboxPipeline objects.
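The layout of the pipeline config is not documented here. Assuming a hypothetical `pipelines` key that maps pipeline names to per-pipeline configs, parsing could look like the sketch below. PipelineStub is a minimal hypothetical stand-in for SandboxPipeline, and the fallback to a single pipeline named 'anonymous' only mirrors that class's default pipeline_name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PipelineStub:
    """Hypothetical stand-in for SandboxPipeline: just a name and a config."""
    pipeline_name: str = "anonymous"
    pipeline_cfg: Optional[Dict[str, Any]] = None


def parse_pipelines(cfg: Dict[str, Any]) -> List[PipelineStub]:
    """Build one pipeline per entry under the hypothetical 'pipelines' key."""
    pipelines_cfg = cfg.get("pipelines")
    if not pipelines_cfg:
        # No named pipelines: treat the whole config as one anonymous pipeline.
        return [PipelineStub(pipeline_cfg=cfg)]
    # Dicts keep insertion order, so pipelines come out in config order.
    return [PipelineStub(name, sub_cfg) for name, sub_cfg in pipelines_cfg.items()]


cfg = {"pipelines": {"probe": {"steps": ["data"]}, "train": {"steps": ["train"]}}}
print([p.pipeline_name for p in parse_pipelines(cfg)])  # ['probe', 'train']
```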
- iterative_update_pipelines(current_pipelines: List[SandboxPipeline], last_context_infos: ContextInfos)