data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper module#

class data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper.VideoHandReconstructionHaworMapper(*args, **kwargs)[source]#

Bases: Mapper

Use HaWoR and MoGe-2 for hand reconstruction.

__init__(hawor_model_path: str = 'hawor.ckpt', hawor_config_path: str = 'model_config.yaml', hawor_detector_path: str = 'detector.pt', moge_model_path: str = 'Ruicheng/moge-2-vitl', mano_right_path: str = 'path_to_mano_right_pkl', frame_num: Annotated[int, Gt(gt=0)] = 3, duration: float = 0, thresh: float = 0.2, tag_field_name: str = 'hand_reconstruction_hawor_tags', frame_dir: str = '/home/runner/.cache/data_juicer/assets', if_output_moge_info: bool = False, moge_output_info_dir: str = '/home/runner/.cache/data_juicer/assets', *args, **kwargs)[source]#

Initialization method.

Parameters:
  • hawor_model_path – The path to ‘hawor.ckpt’ for the HaWoR model.

  • hawor_config_path – The path to ‘model_config.yaml’ for the HaWoR model.

  • hawor_detector_path – The path to ‘detector.pt’ for the HaWoR model.

  • moge_model_path – The path to the MoGe-2 model.

  • mano_right_path – The path to ‘MANO_RIGHT.pkl’. Users need to download this file from https://mano.is.tue.mpg.de/ and comply with the MANO license.

  • frame_num – The number of frames to be extracted uniformly from the video. If it’s 1, only the middle frame will be extracted. If it’s 2, only the first and the last frames will be extracted. If it’s larger than 2, in addition to the first and the last frames, other frames will be extracted uniformly within the video duration. If “duration” > 0, frame_num is the number of frames per segment.

  • duration – The duration of each segment in seconds. If 0, frames are extracted from the entire video. If duration > 0, the video is segmented into multiple segments based on duration, and frames are extracted from each segment.

  • thresh – Confidence threshold for hand detection.

  • tag_field_name – The field name to store the tags. It is “hand_reconstruction_hawor_tags” by default.

  • frame_dir – Output directory to save extracted frames.

  • if_output_moge_info – Whether to save the results from MoGe-2 to a JSON file.

  • moge_output_info_dir – Output directory for saving camera parameters.

  • args – extra args

  • kwargs – extra args
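
A minimal usage sketch based on the parameters documented above (not a canonical configuration): all weight paths are placeholders that must point to files you have downloaded yourself, including ‘MANO_RIGHT.pkl’ obtained from https://mano.is.tue.mpg.de/ under the MANO license.

# Sketch: instantiate the operator with locally available model assets.
from data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper import (
    VideoHandReconstructionHaworMapper,
)

op = VideoHandReconstructionHaworMapper(
    hawor_model_path='weights/hawor.ckpt',          # HaWoR checkpoint (placeholder path)
    hawor_config_path='weights/model_config.yaml',  # HaWoR model config
    hawor_detector_path='weights/detector.pt',      # YOLO hand detector weights
    moge_model_path='Ruicheng/moge-2-vitl',         # MoGe-2 model id
    mano_right_path='weights/MANO_RIGHT.pkl',       # MANO right-hand model file
    frame_num=3,    # frames per video (or per segment when duration > 0)
    duration=0,     # 0 => frames are sampled across the whole video
    thresh=0.2,     # hand-detection confidence threshold
)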

detect_track(imgfiles: list, hand_det_model, thresh: float = 0.5) tuple[source]#

Detects and tracks hands across a sequence of images using YOLO.

Parameters:
  • imgfiles (list) – List of image frames.

  • hand_det_model (YOLO) – The initialized YOLO hand detection model.

  • thresh (float) – Confidence threshold for detection.

Returns:

(list of detection boxes, dict mapping track IDs to detections); the box list is unused by the downstream logic

Return type:

tuple
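
A hedged call sketch for detect_track(), continuing from the instantiation above; loading frames with OpenCV and constructing the detector via ultralytics YOLO are assumptions inferred from the documented parameter types, not code taken from the operator itself.

import cv2
from ultralytics import YOLO  # assumed source of the YOLO detector class

frame_paths = ['frames/0001.jpg', 'frames/0002.jpg', 'frames/0003.jpg']  # placeholder frames
imgfiles = [cv2.imread(p) for p in frame_paths]   # "list of image frames" (assumed as arrays)
hand_det_model = YOLO('weights/detector.pt')      # HaWoR hand detector weights

boxes, tracks = op.detect_track(imgfiles, hand_det_model, thresh=0.5)
# 'tracks' maps each track ID to its per-frame detections; 'boxes' is
# documented as unused by the downstream logic.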

hawor_motion_estimation(imgfiles: list, tracks: dict, model, img_focal: float, img_paths: list, single_image: bool = False) dict[source]#

Performs HaWoR 3D hand reconstruction on detected and tracked hand regions.

Parameters:
  • imgfiles (list) – List of image frames.

  • tracks (dict) – Dictionary mapping track ID to a list of detection objects.

  • model (HAWOR) – The initialized HAWOR model.

  • img_focal (float) – Camera focal length.

  • img_paths (list) – List of image paths.

  • single_image (bool) – Flag for single-image processing mode.

Returns:

Reconstructed hand parameters, keyed by ‘left’ and ‘right’.

Return type:

dict
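
Continuing the sketch above, hawor_motion_estimation() consumes the tracks produced by detect_track(). The focal length value is a placeholder, and ‘hawor_model’ stands for an already initialized HAWOR model; model loading is handled internally by the operator and is not shown here.

results = op.hawor_motion_estimation(
    imgfiles=imgfiles,        # same frames passed to detect_track()
    tracks=tracks,            # track ID -> per-frame detection objects
    model=hawor_model,        # an initialized HAWOR model (loading not shown)
    img_focal=600.0,          # camera focal length in pixels (placeholder)
    img_paths=frame_paths,    # paths of the extracted frames
    single_image=False,       # multi-frame tracking mode
)
left_hands, right_hands = results.get('left'), results.get('right')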

process_single(sample=None, rank=None)[source]#

For sample level, sample --> sample

Parameters:

sample – sample to process

Returns:

processed sample
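
A minimal end-to-end sketch for process_single(); the sample layout with a ‘videos’ key follows the common data_juicer convention and is an assumption here, and the exact location of the stored tags (top level vs. a meta field) may differ between versions.

sample = {
    'text': 'a person assembling a chair',   # placeholder text
    'videos': ['videos/assembly.mp4'],       # placeholder video path
}
processed = op.process_single(sample)
# Reconstruction results are stored under the configured tag field,
# "hand_reconstruction_hawor_tags" by default.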