video_whole_body_pose_estimation_mapper

Input a video containing people, and use the DWPose model to extract the body, hand, foot, and face keypoints of the human subjects in the video, i.e., 2D whole-body pose estimation.


Type: mapper

Tags: gpu, video
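
For orientation, here is a minimal usage sketch. The import path and the process_single() call follow Data-Juicer's common mapper conventions and are assumptions to verify against your installed version; the sample dict is hypothetical.

```python
# Minimal usage sketch; the import path and the process_single() call are
# assumptions based on Data-Juicer's common mapper conventions.
from data_juicer.ops.mapper import VideoWholeBodyPoseEstimationMapper

op = VideoWholeBodyPoseEstimationMapper(
    onnx_det_model='yolox_l.onnx',           # YOLOX person detector weights
    onnx_pose_model='dw-ll_ucoco_384.onnx',  # DWPose whole-body pose model
    frame_num=3,  # frames sampled uniformly per video (or per segment)
    duration=0,   # 0: sample across the entire video
)

sample = {'videos': ['video3.mp4'], 'text': ''}  # hypothetical input sample
result = op.process_single(sample)  # keypoint tags land in the sample's meta
```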

🔧 Parameter Configuration

| name | type | default | desc |
|------|------|---------|------|
| onnx_det_model | str | 'yolox_l.onnx' | The path to 'yolox_l.onnx'. |
| onnx_pose_model | str | 'dw-ll_ucoco_384.onnx' | The path to 'dw-ll_ucoco_384.onnx'. |
| frame_num | Annotated[int, Gt(gt=0)] | 3 | The number of frames to extract uniformly from the video. If 1, only the middle frame is extracted. If 2, only the first and last frames are extracted. If larger than 2, the first and last frames are extracted, and the remaining frames are spaced uniformly over the video duration. If duration > 0, frame_num is the number of frames per segment (see the sampling sketch after this table). |
| duration | float | 0 | The duration of each segment in seconds. If 0, frames are extracted from the entire video. If duration > 0, the video is split into segments of this length and frames are extracted from each segment. |
| tag_field_name | str | 'pose_estimation_tags' | The meta field name in which the tags are stored. Defaults to 'pose_estimation_tags'. |
| frame_dir | str | DATA_JUICER_ASSETS_CACHE | Output directory for saving the extracted frames. |
| if_save_visualization | bool | False | Whether to save visualization results. |
| save_visualization_dir | str | DATA_JUICER_ASSETS_CACHE | Directory for saving visualization results. |
| args | | '' | extra args |
| kwargs | | '' | extra args |
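
The frame_num and duration rules combine as follows: with duration = 0 the whole video is one segment, otherwise the video is cut into segments of the given length and frame_num frames are sampled from each. The sketch below only illustrates the documented sampling rule, not the operator's actual implementation; the ceiling-based segment count is an assumption.

```python
import math

def sample_frame_indices(total_frames: int, frame_num: int) -> list[int]:
    """Uniform sampling rule described for frame_num (illustration only)."""
    if frame_num == 1:
        return [total_frames // 2]    # middle frame only
    if frame_num == 2:
        return [0, total_frames - 1]  # first and last frames only
    # First and last frames, plus frame_num - 2 frames spread uniformly.
    step = (total_frames - 1) / (frame_num - 1)
    return [round(i * step) for i in range(frame_num)]

def num_segments(video_seconds: float, duration: float) -> int:
    """Segment count when duration > 0 (the ceiling is an assumption)."""
    return 1 if duration <= 0 else math.ceil(video_seconds / duration)

print(sample_frame_indices(100, 3))  # -> [0, 50, 99]
print(num_segments(10.0, 1.0))       # -> 10 (frame_num frames per segment)
```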

📊 Effect demonstration

test

VideoWholeBodyPoseEstimationMapper(onnx_det_model='yolox_l.onnx', onnx_pose_model='dw-ll_ucoco_384.onnx', frame_num=1, duration=1, tag_field_name=MetaKeys.pose_estimation_tags, frame_dir=DATA_JUICER_ASSETS_CACHE, if_save_visualization=True, save_visualization_dir=DATA_JUICER_ASSETS_CACHE)

📥 input data

Sample 1: 1 video
video3.mp4
Sample 2: 1 video
video4.mp4

📤 output data

Sample 1: empty
body_keypoints_shape: [2, 18, 2]
foot_keypoints_shape: [2, 6, 2]
faces_keypoints_shape: [2, 68, 2]
hands_keypoints_shape: [4, 21, 2]
bbox_results_list_length: 49
bbox_shape: [2, 4]
Sample 2: empty
body_keypoints_shape: [2, 18, 2]
foot_keypoints_shape: [2, 6, 2]
faces_keypoints_shape: [2, 68, 2]
hands_keypoints_shape: [4, 21, 2]
bbox_results_list_length: 22
bbox_shape: [2, 4]
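
These shape lists read as [num_instances, num_keypoints, 2], with the trailing 2 holding (x, y) coordinates: [2, 18, 2] is 2 detected persons with 18 body keypoints each, and [4, 21, 2] is 4 hands (2 per person) with 21 keypoints each. Below is a small sketch of unpacking such an array; that the keypoints come back as a NumPy array is an assumption, and the body_keypoints values here are fabricated for illustration.

```python
import numpy as np

# Hypothetical array matching the reported body_keypoints_shape [2, 18, 2]:
# 2 detected persons, 18 body keypoints each, (x, y) per keypoint.
body_keypoints = np.random.rand(2, 18, 2) * 640  # fake pixel coordinates

for person_idx, person in enumerate(body_keypoints):
    xs, ys = person[:, 0], person[:, 1]
    print(f'person {person_idx}: {len(person)} joints, '
          f'x in [{xs.min():.1f}, {xs.max():.1f}]')
```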

test_mul_proc

VideoWholeBodyPoseEstimationMapper(onnx_det_model='yolox_l.onnx', onnx_pose_model='dw-ll_ucoco_384.onnx', frame_num=1, duration=1, tag_field_name=MetaKeys.pose_estimation_tags, frame_dir=DATA_JUICER_ASSETS_CACHE, if_save_visualization=True, save_visualization_dir=DATA_JUICER_ASSETS_CACHE)

📥 input data

Sample 1: 1 video
video3.mp4
Sample 2: 1 video
video4.mp4

📤 output data

Sample 1: empty
body_keypoints_shape: [2, 18, 2]
foot_keypoints_shape: [2, 6, 2]
faces_keypoints_shape: [2, 68, 2]
hands_keypoints_shape: [4, 21, 2]
bbox_results_list_length: 49
bbox_shape: [2, 4]
Sample 2: empty
body_keypoints_shape: [2, 18, 2]
foot_keypoints_shape: [2, 6, 2]
faces_keypoints_shape: [2, 68, 2]
hands_keypoints_shape: [4, 21, 2]
bbox_results_list_length: 22
bbox_shape: [2, 4]