data_juicer_sandbox.pipelines module#

class data_juicer_sandbox.pipelines.SandBoxWatcher(sandbox_cfg)[source]#

Bases: object

Basic watcher class that tracks results of interest and manages the experiment within the sandbox, based on the WandB UI and its utilities.

__init__(sandbox_cfg)[source]#: Initialize the watcher with a reference to an executor instance.

query(meta_name: str)[source]#: Query the result from the logged_res.

watch(res, meta_name: str = '')[source]#: Flatten the result into a dot-separated structure and log it to WandB.
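The flattening step described for watch can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `flatten_dot`, the use of `meta_name` as a key prefix, and the decision to treat only dicts as nested containers are all assumptions made for illustration, and the actual WandB logging call is left out.

```python
def flatten_dot(res, prefix=""):
    """Flatten a nested dict into a single-level dict with dot-joined keys.

    Hypothetical helper: only dict values are descended into; every other
    value (numbers, strings, lists) is kept as a leaf.
    """
    flat = {}
    if isinstance(res, dict):
        for key, value in res.items():
            # Join the parent path and the current key with a dot.
            name = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_dot(value, name))
    else:
        flat[prefix] = res
    return flat


nested = {"eval": {"acc": 0.9, "loss": {"train": 0.1}}}
flat = flatten_dot(nested, "run1")
# flat == {"run1.eval.acc": 0.9, "run1.eval.loss.train": 0.1}
```

A flat mapping of this shape is convenient for metric loggers, which generally expect one scalar per named key.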

setup_sweep(hpo_config: dict = None, project_name: str = None)[source]#: Set up and start a new WandB sweep.

watch_cfgs(cfgs: List[tuple] = None)[source]#: Watch the configuration of the experiment.

class data_juicer_sandbox.pipelines.Target(iter_target_str: str = None, key: str = None, op: str = None, tgt_val: float = None)[source]#

Bases: object

SUPPORT_OPS = ['==', '>=', '<=', '>', '<']#

__init__(iter_target_str: str = None, key: str = None, op: str = None, tgt_val: float = None)[source]#

key: str#

op: str#

tgt_val: float#

parse_iter_targets(iter_target_str)[source]#

check_target(context_infos: ContextInfos)[source]#
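The page does not document the format of iter_target_str or how check_target reads ContextInfos, so the following is only a sketch of how a key, one of the SUPPORT_OPS operators, and a float target value could fit together. The string format "key op value", the helper names `parse_target` and `target_reached`, and the plain dict standing in for ContextInfos are all assumptions.

```python
import operator
import re

# Same operator set as Target.SUPPORT_OPS in the listing above.
OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character operators come first in the alternation so that ">=" is
# matched as a whole rather than as ">" followed by "=".
_PATTERN = re.compile(r"^\s*(.+?)\s*(==|>=|<=|>|<)\s*(.+?)\s*$")


def parse_target(target_str):
    """Split an assumed 'key op value' string into (key, op, float value)."""
    match = _PATTERN.match(target_str)
    if match is None:
        raise ValueError(f"unsupported target string: {target_str!r}")
    key, op, raw_value = match.groups()
    return key, op, float(raw_value)


def target_reached(target_str, results):
    """Compare results[key] with the target value using the parsed operator.

    `results` is a plain dict used here as a stand-in for ContextInfos.
    """
    key, op, tgt_val = parse_target(target_str)
    return OPS[op](results[key], tgt_val)


reached = target_reached("eval.acc >= 0.8", {"eval.acc": 0.85})
```

Mapping each operator string to a function from the standard `operator` module avoids a chain of if/elif comparisons and keeps the supported set in one place.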

class data_juicer_sandbox.pipelines.SandboxPipeline(pipeline_name='anonymous', pipeline_cfg=None, watcher=None)[source]#

Bases: object

__init__(pipeline_name='anonymous', pipeline_cfg=None, watcher=None)[source]#: Initialization method.

register_jobs()[source]#

run(context_infos: ContextInfos)[source]#: Run the sandbox pipeline as a single trial or in HPO (hyperparameter optimization) style.

one_trial(context_infos: ContextInfos)[source]#

Run the sandbox pipeline as a single trial.

Users can flexibly run only some steps of the whole sandbox pipeline, according to their own needs and configuration. The watcher automatically tracks the results in terms of data, model, and the specified evaluation metrics.

execute_hpo_wandb(context_infos)[source]#

Run the sandbox pipeline in HPO style.

Users can flexibly run only some steps of the whole sandbox pipeline, according to their own needs and configuration. The watcher automatically tracks the results in terms of data, model, and the specified evaluation metrics.

class data_juicer_sandbox.pipelines.SandBoxExecutor(cfg=None)[source]#

Bases: object

The SandBoxExecutor class provides a sandbox environment for exploring data-model co-designs in a one-stop manner, with fast feedback, tiny model size, small data size, and high efficiency.

It acts as middleware that maintains Data-Juicer's data executor, a model processor (training and inference), and an auto-evaluator; the latter two usually come from third-party libraries.

__init__(cfg=None)[source]#

Initialization method.

Parameters: cfg -- configuration of the sandbox.

parse_pipelines(cfg)[source]#

Parse the pipeline configs.

Parameters: cfg -- the original config.
Returns: a list of SandboxPipeline objects.

iterative_update_pipelines(current_pipelines: List[SandboxPipeline], last_context_infos: ContextInfos)[source]#

specify_job_configs(ori_config)[source]#

specify_jobs_configs(cfg)[source]#

Specify job configs by their dict objects or config file path strings.

Parameters: cfg -- the original config.
Returns: a dict of different configs.
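Accepting a job config either as a dict object or as a config file path string can be sketched as below. This is an illustration of the pattern only, not the sandbox's code: the helper name `resolve_job_config` is hypothetical, and JSON is used purely so the example stays within the standard library; the file formats the sandbox actually reads are not stated on this page.

```python
import json
from pathlib import Path


def resolve_job_config(cfg):
    """Return a config dict from either a dict or a path to a config file.

    Hypothetical helper: the file is assumed to be JSON for this sketch.
    """
    if isinstance(cfg, dict):
        # Already a config object; use it as-is.
        return cfg
    if isinstance(cfg, (str, Path)):
        # Treat the value as a path and load the file it points to.
        with open(cfg, encoding="utf-8") as f:
            return json.load(f)
    raise TypeError(f"expected dict or path string, got {type(cfg).__name__}")
```

Normalizing both input forms to a dict at one entry point lets the rest of the code work with a single representation.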

run()[source]#
