data_juicer_sandbox.model_executors module#
- class data_juicer_sandbox.model_executors.BaseModelExecutor(model_config: dict, watcher=None)[source]#
Bases:
object

Base abstraction for a model executor within DataJuicer's sandbox.
- async run(run_type, run_obj=None, **kwargs)[source]#
Conduct model-related execution tasks given the specified run_type and run_obj.
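Since run() is a coroutine, callers drive it with an event loop. A minimal usage sketch with a concrete subclass documented below; the run_type value "infer" and the config values are hypothetical placeholders, since the accepted values are not listed on this page:

```python
import asyncio

from data_juicer_sandbox.model_executors import VLLMInferExecutor

# See the VLLMInferExecutor entry below for the documented config items;
# the values here are placeholders, not defaults from this module.
executor = VLLMInferExecutor({"type": "vllm", "model_path": "/path/to/model"})

# run() is async, so it must be awaited or driven by an event loop.
# "infer" is a hypothetical run_type value used only for illustration.
result = asyncio.run(executor.run("infer"))
```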
- class data_juicer_sandbox.model_executors.ModelScopeExecutor(model_config: dict, watcher=None)[source]#
Bases:
BaseModelExecutor
- class data_juicer_sandbox.model_executors.ModelscopeInferProbeExecutor(model_config: dict)[source]#
Bases:
ModelScopeExecutor
- class data_juicer_sandbox.model_executors.ModelscopeTrainExecutor(model_config, watcher=None)[source]#
Bases:
ModelScopeExecutor
- class data_juicer_sandbox.model_executors.LLMInferExecutor(model_config: dict, watcher=None)[source]#
Bases:
BaseModelExecutor

An inference executor for LLM inference. The model preparation method should be implemented by a subclass for each specific type of model.
The config file for this type of executor should include at least the following items:
1. type: the model type.
2. build_messages_func: the helper function used to build the input messages.
3. parse_output_func: the helper function used to parse the model output.
4. dataset_path: the input datasets or data pools used to construct the input messages for LLM inference. Only jsonl files are supported for now.
5. export_path: the output directory used to store the inference results.
6. infer_res_key: the key name used to store the inference results. It's "response" by default.
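Putting the items above together, a minimal sketch of such a config as a Python dict; the concrete values (paths, helper names) are hypothetical placeholders, not defaults defined by this module:

```python
# Hypothetical config sketch for an LLMInferExecutor subclass; paths and
# helper names are placeholders, not values defined by this module.
llm_infer_config = {
    "type": "vllm",                           # model type (see subclasses below)
    "build_messages_func": "build_messages",  # helper to build input messages
    "parse_output_func": "parse_output",      # helper to parse model outputs
    "dataset_path": "/path/to/inputs.jsonl",  # jsonl only for now
    "export_path": "/path/to/outputs/",       # where inference results are stored
    "infer_res_key": "response",              # key for results ("response" by default)
}
```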
- class data_juicer_sandbox.model_executors.HFTransformersInferExecutor(model_config: dict, watcher=None)[source]#
Bases:
LLMInferExecutor

An inference executor for model inference with Hugging Face Transformers.
The config file for this executor should include at least the following items:
1. type: must be "huggingface".
2. model_path: the path to the HF model.
3. model_params: extra parameters for the model.
4. sampling_params: extra sampling parameters for the model.
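A minimal sketch of such a config; the model path and parameter values are hypothetical placeholders:

```python
# Hypothetical config sketch for HFTransformersInferExecutor; the path and
# parameter values below are placeholders.
hf_config = {
    "type": "huggingface",
    "model_path": "/path/to/hf_model",
    "model_params": {"torch_dtype": "bfloat16"},  # extra model parameters
    "sampling_params": {"max_new_tokens": 512},   # extra sampling parameters
}
```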
- class data_juicer_sandbox.model_executors.VLLMInferExecutor(model_config: dict, watcher=None)[source]#
Bases:
LLMInferExecutor

An inference executor for model inference with vLLM.
The config file for this executor should include at least the following items:
1. type: must be "vllm".
2. model_path: the path to the vLLM model.
3. model_params: extra parameters for the model.
4. sampling_params: extra sampling parameters for the model.
5. For other parameters, refer to the LLMInferExecutor class.
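A minimal sketch of such a config; all values are hypothetical placeholders, and the shared LLMInferExecutor items also apply:

```python
# Hypothetical config sketch for VLLMInferExecutor; values are placeholders.
vllm_config = {
    "type": "vllm",
    "model_path": "/path/to/model",
    "model_params": {"tensor_parallel_size": 2},             # extra model parameters
    "sampling_params": {"temperature": 0.7, "top_p": 0.9},   # extra sampling parameters
    # Shared LLMInferExecutor items such as dataset_path and export_path
    # also apply here (see the LLMInferExecutor entry above).
    "dataset_path": "/path/to/inputs.jsonl",
    "export_path": "/path/to/outputs/",
}
```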
- class data_juicer_sandbox.model_executors.APIModelInferExecutor(model_config: dict, watcher=None)[source]#
Bases:
LLMInferExecutor

An inference executor for model inference with the OpenAI API.
The config file for this executor should include at least the following items:
1. type: must be "api".
2. model: the API model used for inference.
3. model_params: extra parameters for the model.
4. sampling_params: extra sampling parameters for the model.
5. api_endpoint: the URL endpoint for the API.
6. response_path: the path used to extract content from the API response. Defaults to "choices.0.message.content".
7. max_retry_num: the maximum number of retries when an API request fails.
8. For other parameters, refer to the LLMInferExecutor class.
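A minimal sketch of such a config; the model name, endpoint, and parameter values are hypothetical placeholders (only the response_path default comes from the docs above):

```python
# Hypothetical config sketch for APIModelInferExecutor; the model name and
# endpoint are placeholders.
api_config = {
    "type": "api",
    "model": "gpt-4o",                             # placeholder model name
    "api_endpoint": "https://api.example.com/v1/chat/completions",
    "response_path": "choices.0.message.content",  # documented default
    "max_retry_num": 3,                            # retries on failed requests
    "model_params": {},
    "sampling_params": {"temperature": 0.2},
    "dataset_path": "/path/to/inputs.jsonl",       # shared LLMInferExecutor item
    "export_path": "/path/to/outputs/",
}
```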
- class data_juicer_sandbox.model_executors.LLaVAExecutor(model_config: dict)[source]#
Bases:
BaseModelExecutor
- class data_juicer_sandbox.model_executors.LLaMAFactoryExecutor(model_config: dict)[source]#
Bases:
BaseModelExecutor
- class data_juicer_sandbox.model_executors.MegatronExecutor(model_config: dict)[source]#
Bases:
BaseModelExecutor