data_juicer.ops.mapper.video_clip_reassembly_mapper module

class data_juicer.ops.mapper.video_clip_reassembly_mapper.VideoClipReassemblyMapper(*args, **kwargs)

Bases: Mapper

Reassemble hand-action results from overlapping video clips.

When long videos are chopped into overlapping clips (e.g. 5 s with 2 s overlap via VideoSplitByDurationMapper), each clip is processed independently through the 3-D motion labelling pipeline. This operator merges the per-clip results back into one unified result per original video, including:

  • hand_action_tags: states, actions, valid_frame_ids, joints

  • video_camera_pose_tags: cam_c2w array

  • hand_reconstruction_hawor_tags: frame_ids converted from clip-local to global frame indices

  • video_frames: per-clip frame path lists merged into one global list

  • camera_calibration_moge_tags: per-clip depth/intrinsics merged

  • clips: replaced with the original video path
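The frame-list merge and the clip-local to global frame-id conversion described in the list can be pictured with the minimal sketch below. It is an illustration, not the operator's implementation: the helpers merge_frame_lists and to_global_ids are hypothetical, the clip offsets (global index of each clip's first frame) are assumed to be known already, and keeping the earlier clip's frame wherever clips overlap is an assumed policy.

```python
from typing import Dict, List, Sequence


def merge_frame_lists(clip_frames: Sequence[List[str]],
                      offsets: Sequence[int]) -> List[str]:
    """Merge per-clip frame path lists into one global list (hypothetical helper).

    offsets[i] is the global index of clip i's first frame. Where clips
    overlap, the frame from the earlier clip is kept (assumed policy).
    """
    merged: Dict[int, str] = {}
    for frames, offset in zip(clip_frames, offsets):
        for local_idx, path in enumerate(frames):
            # setdefault keeps the first (earlier-clip) entry for an overlapping index
            merged.setdefault(offset + local_idx, path)
    # emit paths in global frame order
    return [merged[i] for i in sorted(merged)]


def to_global_ids(local_ids: Sequence[int], offset: int) -> List[int]:
    """Convert clip-local frame ids to global ids by adding the clip offset."""
    return [offset + i for i in local_ids]
```

With two five-frame clips at offsets 0 and 3, the merged list holds eight frames: all five frames of the first clip followed by the last three frames of the second, and a clip-local id 2 in the second clip becomes global id 5.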

Clip global offsets are determined automatically by pixel-matching overlapping frames between consecutive clips, rather than by assuming an ideal step size. This accounts for ffmpeg keyframe-alignment drift, which makes the actual clip boundaries deviate from the nominal step of (split_duration - overlap_duration) * fps frames.
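The offset search can be pictured with the minimal sketch below. It assumes each clip is a NumPy array of decoded frames and that the first frame of every clip also appears in the previous clip, which holds as long as the drift stays smaller than the overlap. The function names and the match score (mean absolute pixel difference) are hypothetical, not the operator's documented API. For the 5 s / 2 s example above at an assumed 30 fps, the nominal step is (5 - 2) * 30 = 90 frames; the matched offset replaces that ideal value.

```python
from typing import List, Sequence

import numpy as np


def nominal_step(split_duration: float, overlap_duration: float, fps: float) -> int:
    """Ideal frame distance between consecutive clip starts."""
    return round((split_duration - overlap_duration) * fps)


def match_offset(prev_frames: np.ndarray, next_first: np.ndarray) -> int:
    """Index of the frame in prev_frames that best matches next_first.

    prev_frames has shape (T, H, W[, C]); next_first has shape (H, W[, C]).
    Match score (an assumption): mean absolute pixel difference.
    """
    # broadcasting subtracts next_first from every frame of prev_frames
    diffs = np.abs(prev_frames.astype(np.float32) - next_first.astype(np.float32))
    scores = diffs.reshape(len(prev_frames), -1).mean(axis=1)
    return int(np.argmin(scores))


def clip_offsets(clips: Sequence[np.ndarray]) -> List[int]:
    """Global start frame of each clip, chaining pairwise matches."""
    offsets = [0]
    for prev, nxt in zip(clips, clips[1:]):
        # the next clip's first frame should reappear inside the previous clip
        offsets.append(offsets[-1] + match_offset(prev, nxt[0]))
    return offsets
```

On footage with static or repeated frames several positions can score equally well, so a practical version would likely need a tie-break, for example preferring the candidate closest to the nominal step; that refinement is omitted here.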

Reference (paper §3.1):

“To enhance efficiency, we chop long videos into overlapping 20-second clips in this stage and recompose their results.”

__init__(hand_action_field: str = 'hand_action_tags', camera_pose_field: str = 'video_camera_pose_tags', hand_reconstruction_field: str = 'hand_reconstruction_hawor_tags', frame_field: str = 'video_frames', moge_field: str = 'camera_calibration_moge_tags', clip_field: str = 'clips', video_key: str = 'videos', split_duration: float = None, overlap_duration: float = None, fps: float = None, *args, **kwargs)

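Base class that conducts data editing. This description and the parameter list below are inherited from the base Mapper class and cover its generic keys; of these, only video_key appears in the signature above. The operator's own arguments hand_action_field, camera_pose_field, hand_reconstruction_field, frame_field, moge_field and clip_field default to the sample fields listed in the class description, and split_duration, overlap_duration and fps are the nominal split settings that enter the (split_duration - overlap_duration) * fps step.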

Parameters:
  • text_key – the name of the field that stores sample texts to be processed.

  • image_key – the name of the field that stores the sample image list to be processed.

  • audio_key – the name of the field that stores the sample audio list to be processed.

  • video_key – the name of the field that stores the sample video list to be processed.

  • image_bytes_key – the name of the field that stores the sample image bytes list to be processed.

  • query_key – the name of the field that stores sample queries.

  • response_key – the name of the field that stores responses.

  • history_key – the name of the field that stores the history of queries and responses.

process_single(sample=None, rank=None)

Process a single sample (sample-level mapping: sample -> sample).

Parameters:

sample – the sample to process

Returns:

processed sample
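The sample -> sample contract means one sample dict goes in and one comes out. The toy sketch below shows only the last step from the class description, replacing the clips field with the original video path. The helper name is hypothetical, and it assumes sample[video_key] is a list whose first entry is the original, unsplit video path and that clips becomes a one-element list; neither layout is confirmed by this page.

```python
from typing import Any, Dict


def replace_clips_with_video(sample: Dict[str, Any],
                             clip_field: str = "clips",
                             video_key: str = "videos") -> Dict[str, Any]:
    """Toy sample -> sample step (hypothetical helper, not the operator's code)."""
    out = dict(sample)  # shallow copy: the caller's dict is not modified
    # assumed layout: first entry of video_key holds the original video path
    out[clip_field] = [sample[video_key][0]]
    return out
```

Field names default to the same values as clip_field and video_key in __init__, so a sample produced by the clip-splitting stage could be passed through unchanged apart from the clips entry.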