video_object_segmenting_mapper

Text-guided semantic segmentation of valid objects throughout the video (YOLOE + SAM2).


Type: mapper

Tags: gpu, hf, video

🔧 Parameter Configuration

sam2_hf_model (str, default: 'facebook/sam2.1-hiera-tiny')
The Hugging Face model ID of the SAM2 model.

yoloe_path (str, default: 'yoloe-11l-seg.pt')
The path to the YOLOE model.

yoloe_conf (float, default: 0.5)
Confidence threshold for YOLOE object detection.

torch_dtype (str, default: 'bf16')
The floating-point type used for model inference: one of 'fp32', 'fp16', or 'bf16'.

if_binarize (bool, default: True)
Whether to binarize the final masks. If if_save_visualization is True, if_binarize is automatically set to True.

if_save_visualization (bool, default: False)
Whether to save visualization results.

save_visualization_dir (str, default: DATA_JUICER_ASSETS_CACHE)
The directory in which visualization results are saved.

args (default: '')
Extra positional arguments.

kwargs (default: '')
Extra keyword arguments.
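The documented constraints on these parameters can be sketched in Python as below. This is a minimal, hypothetical sketch, not the operator's actual implementation: SegmentingConfig and binarize are illustrative names, the [0, 1] range check on yoloe_conf and the 0.0 binarization threshold are assumptions, and save_visualization_dir is omitted. Only the allowed torch_dtype values and the rule that if_save_visualization=True forces if_binarize=True come from the parameter list above.

```python
from dataclasses import dataclass
from typing import List

# Allowed torch_dtype values, as listed in the parameter documentation.
ALLOWED_DTYPES = ("fp32", "fp16", "bf16")


@dataclass
class SegmentingConfig:
    """Hypothetical mirror of the documented parameters (not the real operator class)."""

    sam2_hf_model: str = "facebook/sam2.1-hiera-tiny"
    yoloe_path: str = "yoloe-11l-seg.pt"
    yoloe_conf: float = 0.5
    torch_dtype: str = "bf16"
    if_binarize: bool = True
    if_save_visualization: bool = False

    def __post_init__(self) -> None:
        if self.torch_dtype not in ALLOWED_DTYPES:
            raise ValueError(
                f"torch_dtype must be one of {ALLOWED_DTYPES}, got {self.torch_dtype!r}"
            )
        # Assumption: the confidence threshold is a probability in [0, 1].
        if not 0.0 <= self.yoloe_conf <= 1.0:
            raise ValueError(f"yoloe_conf must be in [0, 1], got {self.yoloe_conf}")
        # Documented rule: saving visualizations forces mask binarization.
        if self.if_save_visualization:
            self.if_binarize = True


def binarize(mask: List[List[float]], threshold: float = 0.0) -> List[List[bool]]:
    """Hypothetical helper: mark each pixel True if its score exceeds the threshold."""
    return [[value > threshold for value in row] for row in mask]
```

For example, SegmentingConfig(if_binarize=False, if_save_visualization=True) ends up with if_binarize set to True, while SegmentingConfig(torch_dtype='int8') raises ValueError.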

📊 Effect demonstration

test

VideoObjectSegmentingMapper(sam2_hf_model='facebook/sam2.1-hiera-tiny', yoloe_path='yoloe-11l-seg.pt', yoloe_conf=0.2, torch_dtype='bf16', if_binarize=True, if_save_visualization=False)
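The test above sets yoloe_conf=0.2, below the default of 0.5, so YOLOE detections with lower confidence can be kept. The effect of such a threshold can be sketched as follows, assuming a hypothetical (label, score) detection format and that only scores strictly above the threshold are kept. This is an illustration, not YOLOE's or Data-Juicer's code, and the scores are made up.

```python
from typing import List, Tuple

# Hypothetical detection format: (class label, confidence score in [0, 1]).
Detection = Tuple[str, float]


def filter_by_confidence(detections: List[Detection], conf: float) -> List[Detection]:
    """Keep only detections whose score is strictly above the threshold conf."""
    return [(label, score) for label, score in detections if score > conf]


# Made-up detections for illustration only; not real YOLOE output.
detections = [("cat", 0.91), ("dog", 0.35), ("bird", 0.18)]

print(filter_by_confidence(detections, 0.5))  # default threshold -> [('cat', 0.91)]
print(filter_by_confidence(detections, 0.2))  # test threshold -> [('cat', 0.91), ('dog', 0.35)]
```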

📥 Input data

Sample 1: 1 video (video4.mp4)
main_character_list: ['glasses', 'a woman', 'a window']

Sample 2: 1 video (video3.mp4)
main_character_list: ['a laptop']

📤 Output data

Sample 1: empty
segment_data: [673, 3, 1, 360, 480]
cls_id_dict: 3
object_cls_list: [3]
yoloe_conf_list: [3]

Sample 2: empty
segment_data: [1190, 1, 1, 640, 362]
cls_id_dict: 1
object_cls_list: [1]
yoloe_conf_list: [1]
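The bracketed values in the output appear to be shapes or lengths rather than raw data: in each sample, the second entry of segment_data matches the number of prompts in main_character_list (3 and 1). Assuming a (frames, objects, 1, height, width) layout, which this page does not confirm, a shape could be unpacked and checked with a hypothetical helper such as describe_segment_shape below.

```python
from typing import Dict, Sequence

# ASSUMED axis layout of segment_data (not confirmed by the documentation).
AXES = ("frames", "objects", "channels", "height", "width")


def describe_segment_shape(shape: Sequence[int], num_objects: int) -> Dict[str, int]:
    """Name each axis of a segment_data shape and check the object count.

    num_objects is the length of a per-object list such as object_cls_list.
    """
    if len(shape) != len(AXES):
        raise ValueError(f"expected {len(AXES)} dimensions, got {len(shape)}")
    dims = dict(zip(AXES, shape))
    # Assumption: one mask track per detected object.
    if dims["objects"] != num_objects:
        raise ValueError(f"{dims['objects']} mask tracks but {num_objects} objects listed")
    return dims


# Prints {'frames': 673, 'objects': 3, 'channels': 1, 'height': 360, 'width': 480}
print(describe_segment_shape([673, 3, 1, 360, 480], num_objects=3))
```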