data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper module

class data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper.VideoHandReconstructionHaworMapper(*args, **kwargs)

Bases: Mapper

Use HaWoR and MoGe-2 for hand reconstruction.

__init__(hawor_model_path: str = 'hawor.ckpt', hawor_config_path: str = 'model_config.yaml', hawor_detector_path: str = 'detector.pt', mano_right_path: str = 'path_to_mano_right_pkl', mano_left_path: str = 'path_to_mano_left_pkl', frame_field: str = 'video_frames', camera_calibration_field: str = 'camera_calibration', tag_field_name: str = 'hand_reconstruction_hawor_tags', thresh: float = 0.2, *args, **kwargs)

Initialization method.

Parameters:
  • hawor_model_path – The path to ‘hawor.ckpt’ for the HaWoR model.

  • hawor_config_path – The path to ‘model_config.yaml’ for the HaWoR model.

  • hawor_detector_path – The path to ‘detector.pt’ for the HaWoR model.

  • mano_right_path – The path to ‘MANO_RIGHT.pkl’. Users need to download this file from https://mano.is.tue.mpg.de/ and comply with the MANO license.

  • mano_left_path – The path to ‘MANO_LEFT.pkl’. Users need to download this file from https://mano.is.tue.mpg.de/ and comply with the MANO license. Used for accurate left-hand wrist offset computation (with shapedirs bug-fix).

  • frame_field – The field name where the video frames are stored.

  • camera_calibration_field – The field name where the camera calibration info is stored.

  • tag_field_name – The field name under which the tags are stored. Defaults to “hand_reconstruction_hawor_tags”.

  • thresh – The confidence threshold for hand detection. Default is 0.2.

  • args – Extra positional arguments.

  • kwargs – Extra keyword arguments.
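
A minimal sketch of how the constructor arguments above could be assembled before creating the mapper. The helper `build_hawor_kwargs` and the `/models/...` directories are hypothetical; only the parameter names, file names, and the default `thresh` of 0.2 come from the signature above. The range check on `thresh` is an assumption based on it being a detection confidence.

```python
from pathlib import PurePosixPath


def build_hawor_kwargs(model_dir: str, mano_dir: str, thresh: float = 0.2) -> dict:
    """Collect keyword arguments named after the documented __init__ parameters.

    Hypothetical helper: the directory layout is an assumption made for
    illustration, not something the mapper requires.
    """
    # Assumption: thresh is a detection confidence, so it should lie in [0, 1].
    if not 0.0 <= thresh <= 1.0:
        raise ValueError("thresh should lie in [0, 1]")

    model_root = PurePosixPath(model_dir)
    mano_root = PurePosixPath(mano_dir)
    return {
        "hawor_model_path": str(model_root / "hawor.ckpt"),
        "hawor_config_path": str(model_root / "model_config.yaml"),
        "hawor_detector_path": str(model_root / "detector.pt"),
        # The MANO files must be downloaded separately under the MANO license.
        "mano_right_path": str(mano_root / "MANO_RIGHT.pkl"),
        "mano_left_path": str(mano_root / "MANO_LEFT.pkl"),
        "thresh": thresh,
    }


# Example directories are placeholders, not real locations.
kwargs = build_hawor_kwargs("/models/hawor", "/models/mano")
```

Once the checkpoint and MANO files are in place, the resulting dictionary could be passed on as `VideoHandReconstructionHaworMapper(**kwargs)`; the remaining parameters keep their documented defaults.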

detect_track(imgfiles: list, hand_det_model, thresh: float = 0.5) -> tuple

Detects and tracks hands across a sequence of images using YOLO.

Parameters:
  • imgfiles (list) – List of image frames.

  • hand_det_model (YOLO) – The initialized YOLO hand detection model.

  • thresh (float) – Confidence threshold for detection. Defaults to 0.5.

Returns:

A tuple (boxes, tracks): a list of boxes (unused in the original logic) and a dict of tracks.

Return type:

tuple
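
To illustrate the shape of the tracks dictionary, the sketch below groups per-frame detections by track ID and drops those below the confidence threshold. It does not call YOLO, and it is not the method's actual implementation: the `Detection` record and its fields (`frame`, `track_id`, `conf`, `bbox`) are hypothetical stand-ins for whatever detection objects the method stores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record; the field names are assumptions."""
    frame: int       # index of the frame the detection came from
    track_id: int    # identifier assigned by the tracker
    conf: float      # detection confidence
    bbox: tuple      # (x1, y1, x2, y2) in pixels


def group_tracks(detections, thresh=0.5):
    """Keep detections with conf >= thresh and group them by track ID."""
    tracks = defaultdict(list)
    for det in detections:
        if det.conf >= thresh:
            tracks[det.track_id].append(det)
    return dict(tracks)


detections = [
    Detection(frame=0, track_id=1, conf=0.9, bbox=(10, 10, 50, 50)),
    Detection(frame=0, track_id=2, conf=0.3, bbox=(80, 20, 120, 60)),
    Detection(frame=1, track_id=1, conf=0.7, bbox=(12, 11, 52, 51)),
    Detection(frame=1, track_id=2, conf=0.6, bbox=(82, 22, 122, 62)),
]
tracks = group_tracks(detections, thresh=0.5)
```

With these inputs, track 1 keeps both of its detections, while track 2 keeps only the one from frame 1, since its frame 0 detection falls below the threshold.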

hawor_motion_estimation(imgfiles: list, tracks: dict, model, img_focal: float, frame_file_paths: list, single_image: bool = False) -> dict

Performs HaWoR 3D hand reconstruction on detected and tracked hand regions.

Parameters:
  • imgfiles (list) – List of decoded image frames (numpy arrays).

  • tracks (dict) – Dictionary mapping track ID to a list of detection objects.

  • model (HAWOR) – The initialized HaWoR model.

  • img_focal (float) – Camera focal length.

  • frame_file_paths (list) – List of file paths readable by HaWoR (pre-materialized on disk if input was bytes).

  • single_image (bool) – Flag for single-image processing mode.

Returns:

Reconstructed parameters (‘left’ and ‘right’ hand results).

Return type:

dict

process_single(sample=None, rank=None)

Processes a single sample (sample -> sample).

Parameters:

sample – sample to process

Returns:

processed sample
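
The sample-in, sample-out contract can be sketched as follows. This is an illustration only, not the mapper's implementation: the function `process_single_sketch`, the `"meta"` key, and the `num_frames` placeholder tag are all hypothetical, while the default field names are taken from the `__init__` signature above.

```python
def process_single_sketch(
    sample: dict,
    frame_field: str = "video_frames",
    tag_field_name: str = "hand_reconstruction_hawor_tags",
) -> dict:
    """Read frames from the sample, attach tags, and return the same sample."""
    frames = sample.get(frame_field, [])

    # Placeholder for the real reconstruction output; a frame count stands in
    # for the hand parameters the actual mapper would compute.
    tags = {"num_frames": len(frames)}

    # Assumption: tags live in a metadata dict under tag_field_name.
    sample.setdefault("meta", {})[tag_field_name] = tags
    return sample


sample = {"video_frames": ["frame_0000.jpg", "frame_0001.jpg"]}
processed = process_single_sketch(sample)
```

The sketch mutates and returns the input dictionary, so fields it does not touch are carried through unchanged.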