data_juicer_sandbox.pipelines module#

class data_juicer_sandbox.pipelines.SandBoxWatcher(sandbox_cfg)[source]#

Bases: object

Basic Watcher class that manages the results of interest and the experiment within the sandbox, based on the WandB UI and its utilities.

__init__(sandbox_cfg)[source]#: Initialize the watcher with a reference to an executor instance.

query(meta_name: str)[source]#: Query the result from the logged_res.

watch(res, meta_name: str = '')[source]#: Flatten the result in dot structure and log it into WandB.
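To make "flatten the result in dot structure" concrete, here is a minimal sketch of how a nested result dict could be turned into dot-joined keys before logging. The helper name `flatten_to_dot` and the sample data are hypothetical; the actual `watch` implementation may differ.

```python
def flatten_to_dot(res, prefix=""):
    """Flatten a nested dict into a single-level dict with dot-joined keys."""
    flat = {}
    for key, value in res.items():
        # Prepend the prefix (e.g. a meta name) when one is given.
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the accumulated key path.
            flat.update(flatten_to_dot(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested result, as an evaluation step might produce.
nested = {"eval": {"acc": 0.91, "loss": 0.3}, "num_samples": 1000}
flat = flatten_to_dot(nested, prefix="trial_1")
# flat is {"trial_1.eval.acc": 0.91, "trial_1.eval.loss": 0.3, "trial_1.num_samples": 1000}
```

A flat dict of this shape is the kind of payload that metric loggers such as WandB accept directly.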

setup_sweep(hpo_config: dict = None, project_name: str = None)[source]#: Set up and start a new WandB sweep.
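As a rough illustration of what an `hpo_config` might contain, the sketch below builds a dictionary in the sweep configuration format that WandB documents (`method`, `metric`, `parameters`). Whether `setup_sweep` forwards `hpo_config` to WandB unchanged is an assumption, and the metric name, parameter names, and project name are hypothetical.

```python
# Hypothetical sweep configuration in WandB's documented sweep format.
# The metric and parameter names are placeholders, not Data-Juicer settings.
hpo_config = {
    "method": "bayes",  # WandB also accepts "grid" and "random"
    "metric": {"name": "eval.acc", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},
        "batch_size": {"values": [16, 32, 64]},
    },
}

# With wandb installed and an authenticated account, a sweep could be created with:
#   import wandb
#   sweep_id = wandb.sweep(sweep=hpo_config, project="my-sandbox-project")
```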

watch_cfgs(cfgs: List[tuple] = None)[source]#: Watch the configuration of the experiment.

class data_juicer_sandbox.pipelines.Target(iter_target_str: str = None, key: str = None, op: str = None, tgt_val: float = None)[source]#

Bases: object

SUPPORT_OPS = ['==', '>=', '<=', '>', '<']#

__init__(iter_target_str: str = None, key: str = None, op: str = None, tgt_val: float = None)[source]#

key: str#

op: str#

tgt_val: float#

parse_iter_targets(iter_target_str)[source]#

check_target(context_infos: ContextInfos)[source]#
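The `key`, `op`, and `tgt_val` attributes together with `SUPPORT_OPS` suggest a target of the form "metric, comparison operator, threshold". The sketch below shows one way such a target could be parsed and checked. The string format `"<key> <op> <value>"`, the helper names, and the plain-dict metrics are assumptions for illustration; the real class works on a `ContextInfos` object whose layout is not documented here.

```python
import operator

SUPPORT_OPS = ["==", ">=", "<=", ">", "<"]

# Map each supported operator string to its comparison function.
_OP_FUNCS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_target(target_str):
    """Split a string like 'eval.acc >= 0.9' into (key, op, tgt_val)."""
    # SUPPORT_OPS lists two-character operators first, so ">=" is matched
    # before ">" and is never misread as ">" followed by "=".
    for op in SUPPORT_OPS:
        if op in target_str:
            key, val = target_str.split(op, 1)
            return key.strip(), op, float(val)
    raise ValueError(f"No supported operator found in {target_str!r}")


def meets_target(metrics, key, op, tgt_val):
    """Return True if metrics[key] satisfies the comparison against tgt_val."""
    return _OP_FUNCS[op](metrics[key], tgt_val)


key, op, tgt_val = parse_target("eval.acc >= 0.9")
reached = meets_target({"eval.acc": 0.93}, key, op, tgt_val)
```

Keeping the operators in a lookup table avoids `eval` on user-supplied strings and makes the supported set explicit.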

class data_juicer_sandbox.pipelines.SandboxPipeline(pipeline_name='anonymous', pipeline_cfg=None, watcher=None)[source]#

Bases: object

__init__(pipeline_name='anonymous', pipeline_cfg=None, watcher=None)[source]#: Initialization method.

register_jobs()[source]#

run(context_infos: ContextInfos)[source]#: Run the sandbox pipeline in a single trial or in HPO (hyperparameter optimization) style.

one_trial(context_infos: ContextInfos)[source]#

Run the sandbox pipeline in a single trial.

Users can flexibly run only some steps of the whole sandbox pipeline, according to their own needs and configuration. The watcher automatically tracks the results in terms of data, model, and the specified evaluation metrics.

execute_hpo_wandb(context_infos)[source]#

Run the sandbox pipeline in HPO style.

Users can flexibly run only some steps of the whole sandbox pipeline, according to their own needs and configuration. The watcher automatically tracks the results in terms of data, model, and the specified evaluation metrics.

class data_juicer_sandbox.pipelines.SandBoxExecutor(cfg=None)[source]#

Bases: object

The SandBoxExecutor class provides a sandbox environment for exploring data-model co-designs in a one-stop manner, with fast feedback, tiny model size, small data size, and high efficiency.

It acts as middleware that maintains data-juicer’s data executor, a model processor (training and inference), and an auto-evaluator; the latter two usually come from third-party libraries.

__init__(cfg=None)[source]#

Initialization method.

Parameters: cfg – configuration of the sandbox.

parse_pipelines(cfg)[source]#

Parse the pipeline configs.

Parameters: cfg – the original config.
Returns: a list of SandboxPipeline objects.

iterative_update_pipelines(current_pipelines: List[SandboxPipeline], last_context_infos: ContextInfos)[source]#

specify_job_configs(ori_config)[source]#

specify_jobs_configs(cfg)[source]#

Specify job configs by their dict objects or config file path strings.

Parameters: cfg – the original config.
Returns: a dict of different configs.
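The "dict objects or config file path strings" behaviour can be pictured with the sketch below, which accepts either form and always returns a dict. The helper name `load_job_config` is hypothetical, and JSON is used only to keep the example within the standard library; the format the library actually reads is not stated here.

```python
import json
import os
import tempfile


def load_job_config(cfg_or_path):
    """Return a config dict from either a dict or a path to a JSON file."""
    if isinstance(cfg_or_path, dict):
        # Copy so later edits do not mutate the caller's object.
        return dict(cfg_or_path)
    if isinstance(cfg_or_path, str):
        with open(cfg_or_path, "r", encoding="utf-8") as f:
            return json.load(f)
    raise TypeError(f"Unsupported config type: {type(cfg_or_path).__name__}")


# Usage: the same job config given inline and via a temporary file.
inline_cfg = load_job_config({"type": "train", "epochs": 1})

with tempfile.TemporaryDirectory() as tmp_dir:
    cfg_path = os.path.join(tmp_dir, "train_job.json")
    with open(cfg_path, "w", encoding="utf-8") as f:
        json.dump({"type": "train", "epochs": 1}, f)
    file_cfg = load_job_config(cfg_path)
```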

run()[source]#
