data_juicer_sandbox.model_executors module#
- class data_juicer_sandbox.model_executors.BaseModelExecutor(model_config: dict, watcher=None)[source]#
Bases: object

Base abstraction for model executors within DataJuicer's sandbox.
- async run(run_type, run_obj=None, **kwargs)[source]#
- Conduct model-related execution tasks given the specified run_type and run_obj.
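The run coroutine above can be sketched with a standalone stand-in class; EchoExecutor and its echo behavior are hypothetical illustrations, not part of the library, and only mirror the signature shown above.

```python
import asyncio


class BaseModelExecutor:
    """Standalone stand-in mirroring the abstraction above (illustration only)."""

    def __init__(self, model_config: dict, watcher=None):
        self.model_config = model_config
        self.watcher = watcher

    async def run(self, run_type, run_obj=None, **kwargs):
        raise NotImplementedError


class EchoExecutor(BaseModelExecutor):
    """Hypothetical subclass: echoes its inputs instead of running a model."""

    async def run(self, run_type, run_obj=None, **kwargs):
        return {"run_type": run_type, "run_obj": run_obj}


result = asyncio.run(EchoExecutor({"type": "echo"}).run("infer", run_obj=[1, 2]))
print(result)  # {'run_type': 'infer', 'run_obj': [1, 2]}
```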
- class data_juicer_sandbox.model_executors.ModelScopeExecutor(model_config: dict, watcher=None)[source]#
- class data_juicer_sandbox.model_executors.LLMInferExecutor(model_config: dict, watcher=None)[source]#
-
An inference executor for LLM inference. The model preparation method should be implemented by the subclass for each specific type of model.
The config file for this type of executor should include at least the following items:
1. type: the model type.
2. build_messages_func: the helper function that builds the input messages.
3. parse_output_func: the helper function that parses the model output.
4. dataset_path: the input datasets or data pools used to construct the input messages for LLM inference. Only jsonl files are supported for now.
5. export_path: the output directory in which to store the inference results.
6. infer_res_key: the key name under which the inference results are stored. Defaults to "response".
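As a sketch, the items above could be assembled into a model_config dict like the following; the type value, helper-function names, and paths are hypothetical placeholders, not shipped defaults.

```python
# Hypothetical model_config for an LLMInferExecutor subclass; the function
# names and paths below are placeholders for illustration only.
llm_infer_config = {
    "type": "my_llm",                         # 1. model type (placeholder)
    "build_messages_func": "build_messages",  # 2. builds the input messages
    "parse_output_func": "parse_output",      # 3. parses the model output
    "dataset_path": "data/pool.jsonl",        # 4. jsonl input only, for now
    "export_path": "outputs/infer",           # 5. output dir for results
    "infer_res_key": "response",              # 6. result key ("response" by default)
}
print(sorted(llm_infer_config))
```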
- class data_juicer_sandbox.model_executors.HFTransformersInferExecutor(model_config: dict, watcher=None)[source]#
-
An inference executor for model inference with Huggingface Transformers.
The config file for this executor should include at least the following items:
1. type: must be "huggingface".
2. model_path: the path to the HF model.
3. model_params: extra parameters for the model.
4. sampling_params: extra sampling parameters for the model.
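A minimal config sketch for the items above; the model path and parameter values are hypothetical placeholders.

```python
# Hypothetical model_config for HFTransformersInferExecutor; values are
# placeholders, not recommended settings.
hf_config = {
    "type": "huggingface",                       # must be "huggingface"
    "model_path": "org/model-name",              # path or hub id of the HF model
    "model_params": {"torch_dtype": "auto"},     # extra model parameters
    "sampling_params": {"max_new_tokens": 256},  # extra sampling parameters
}
print(hf_config["type"])
```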
- class data_juicer_sandbox.model_executors.VLLMInferExecutor(model_config: dict, watcher=None)[source]#
-
An inference executor for model inference with vLLM.
The config file for this executor should include at least the following items:
1. type: must be "vllm".
2. model_path: the path to the vLLM model.
3. model_params: extra parameters for the model.
4. sampling_params: extra sampling parameters for the model.
5. Refer to the class LLMInferExecutor for the other parameters.
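A minimal config sketch for the vLLM variant; values are placeholders, and the fields shared with LLMInferExecutor (dataset_path, export_path, etc.) are omitted here.

```python
# Hypothetical model_config for VLLMInferExecutor; values are placeholders.
vllm_config = {
    "type": "vllm",                                      # must be "vllm"
    "model_path": "org/model-name",                      # path to the vLLM model
    "model_params": {"tensor_parallel_size": 1},         # extra model parameters
    "sampling_params": {"temperature": 0.7, "max_tokens": 256},  # sampling
}
print(vllm_config["type"])
```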
- class data_juicer_sandbox.model_executors.APIModelInferExecutor(model_config: dict, watcher=None)[source]#
-
An inference executor for model inference with an OpenAI-compatible API.
The config file for this executor should include at least the following items:
1. type: must be "api".
2. model: the API model used for inference.
3. model_params: extra parameters for the model.
4. sampling_params: extra sampling parameters for the model.
5. api_endpoint: URL endpoint for the API.
6. response_path: path to extract content from the API response. Defaults to 'choices.0.message.content'.
7. max_retry_num: the max number of retries when an API request fails.
8. Refer to the class LLMInferExecutor for the other parameters.