vlm_ray_vllm_engine_pipeline

Pipeline that generates responses using the vLLM engine on Ray. It leverages vLLM for efficient inference with large vision-language models. More details about the Ray vLLM engine can be found at: https://docs.ray.io/en/latest/data/working-with-llms.html


Type: pipeline

Tags: gpu, image

🔧 Parameter Configuration

| name | type | default | description |
| --- | --- | --- | --- |
| `api_or_hf_model` | `str` | `'Qwen/Qwen2.5-7B-Instruct'` | API or Hugging Face model name. |
| `is_hf_model` | `bool` | `True` | Whether `api_or_hf_model` refers to a Hugging Face model rather than an API endpoint. |
| `system_prompt` | `Optional[str]` | `None` | System prompt for guiding the generation task. |
| `accelerator_type` | `Optional[str]` | `None` | The type of accelerator to use (e.g., "V100", "A100"). Defaults to None, meaning that only the CPU will be used. |
| `sampling_params` | `Optional[Dict]` | `None` | Sampling parameters for text generation (e.g., `{'temperature': 0.9, 'top_p': 0.95}`). |
| `engine_kwargs` | `Optional[Dict]` | `None` | The kwargs to pass to the vLLM engine. See the documentation for details: https://docs.vllm.ai/en/latest/api/vllm/engine/arg_utils/#vllm.engine.arg_utils.AsyncEngineArgs |
| `kwargs` | | `''` | Extra keyword arguments. |
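As a sketch of how these parameters fit together, the following shows a hypothetical YAML process-list entry enabling this pipeline. All values other than the documented default model name are illustrative, not defaults; `max_model_len` is one example of a vLLM `AsyncEngineArgs` field accepted via `engine_kwargs`:

```yaml
# Illustrative configuration sketch for this operator; values are examples only.
process:
  - vlm_ray_vllm_engine_pipeline:
      api_or_hf_model: 'Qwen/Qwen2.5-7B-Instruct'   # documented default
      is_hf_model: true                             # treat the name as a Hugging Face model
      system_prompt: 'Describe the image in detail.' # illustrative system prompt
      accelerator_type: 'A100'                      # request A100 GPUs; null uses CPU only
      sampling_params:                              # forwarded as vLLM sampling parameters
        temperature: 0.9
        top_p: 0.95
      engine_kwargs:                                # forwarded to the vLLM engine
        max_model_len: 4096                         # illustrative engine argument
```

Keys under `sampling_params` and `engine_kwargs` are passed through, so the valid names are those accepted by vLLM's sampling parameters and `AsyncEngineArgs` respectively.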