data_juicer.ops.mapper.video_atomic_action_segment_mapper module
- class data_juicer.ops.mapper.video_atomic_action_segment_mapper.VideoAtomicActionSegmentMapper(*args, **kwargs)
Bases: Mapper
Segment a unified hand trajectory into atomic action clips.
Implements the algorithm from the paper https://arxiv.org/pdf/2510.21571:
"we detect speed minima of the 3D hand wrists in the world space and use them as cutting points. We smooth the hand trajectory and select points that are local speed minima within a fixed window centered on each point."
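The quoted rule can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the operator's actual code: `find_cut_points` and `moving_average` are hypothetical helpers, the smoothing is a plain centered moving average applied to the speed signal, and ties on a flat stretch are resolved by keeping only the first minimum in the window.

```python
import math


def moving_average(values, window):
    """Centered moving average; the window shrinks near both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def find_cut_points(wrist_xyz, smooth_window=5, min_window=15):
    """Return frame indices whose smoothed speed is a local minimum.

    wrist_xyz: list of (x, y, z) world-space wrist positions, one per frame.
    Hypothetical helper; the smoothing choice is an assumption.
    """
    # Speed at frame t is the displacement from frame t-1.
    speed = [math.dist(a, b) for a, b in zip(wrist_xyz, wrist_xyz[1:])]
    if not speed:
        return []
    speed = [speed[0]] + speed  # frame 0 reuses the first displacement
    speed = moving_average(speed, smooth_window)

    cuts = []
    # Only frames with a full window on both sides are considered.
    for t in range(min_window, len(speed) - min_window):
        window = speed[t - min_window : t + min_window + 1]
        # Keep t only if it is the first minimum inside its own window.
        if window.index(min(window)) == min_window:
            cuts.append(t)
    return cuts
```

In this sketch frames closer than `min_window` to either end of the trajectory are never selected, because their window would be truncated; the real operator may treat the borders differently.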
The operator reads the merged hand_action_tags (output of VideoClipReassemblyMapper) and produces a list of segments. Each segment contains the start and end frame indices, plus the sliced states / actions / joints for that segment.

Segmentation is applied independently to the left and right hands. A frame is a cutting point if it is a local speed minimum within a window of min_window frames on each side.

Output field (segment_field) structure:

    [
        {
            "hand_type": "right",
            "segment_id": 0,
            "start_frame": 10,
            "end_frame": 45,
            "states": [...],
            "actions": [...],
            "valid_frame_ids": [...],
            "joints_world": [...],
        },
        ...
    ]
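As a small illustration of how the documented output list might be consumed downstream, the sketch below groups segments by hand and lists their frame spans. `summarize_segments` is a hypothetical helper, not part of Data-Juicer, and it relies only on the keys shown in the structure above.

```python
from collections import defaultdict


def summarize_segments(segments):
    """Group segments by hand and list (segment_id, start_frame, end_frame) spans."""
    by_hand = defaultdict(list)
    for seg in segments:
        by_hand[seg["hand_type"]].append(
            (seg["segment_id"], seg["start_frame"], seg["end_frame"])
        )
    # Sort each hand's spans by start frame so they read in temporal order.
    return {hand: sorted(spans, key=lambda s: s[1]) for hand, spans in by_hand.items()}


example = [
    {"hand_type": "right", "segment_id": 1, "start_frame": 45, "end_frame": 80},
    {"hand_type": "right", "segment_id": 0, "start_frame": 10, "end_frame": 45},
    {"hand_type": "left", "segment_id": 0, "start_frame": 12, "end_frame": 60},
]
summary = summarize_segments(example)
```

Whether end_frame is inclusive or exclusive is not stated here, so the sketch passes the indices through unchanged rather than computing segment lengths.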
- __init__(hand_action_field: str = 'hand_action_tags', segment_field: str = 'atomic_action_segments', speed_smooth_window: int = 5, min_window: int = 15, min_segment_frames: int = 8, max_segment_frames: int = 300, hand_type: str = 'both', *args, **kwargs)
Initialization method.
- Parameters:
hand_action_field -- Meta field storing merged hand action results (output of VideoClipReassemblyMapper).
segment_field -- Output meta field for atomic segments.
speed_smooth_window -- Window size for Savitzky-Golay smoothing of the speed signal before minima detection. Must be odd.
min_window -- Half-window size for local minima detection. A frame is a local minimum only if it is the minimum within [t - min_window, t + min_window]. Larger values yield fewer, longer segments.
min_segment_frames -- Minimum frames per segment. Segments shorter than this are merged with neighbors.
max_segment_frames -- Maximum frames per segment. Segments longer than this are forcibly split at the deepest speed minimum.
hand_type -- Which hand(s) to segment: 'left', 'right', or 'both'.
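The interaction of min_segment_frames and max_segment_frames can be sketched as a post-processing pass over the detected cut points. This is an illustration under assumptions, not the operator's implementation: `build_segments` is a hypothetical helper, spans are (start, end) with an exclusive end, a short segment is merged into its right neighbor (a short tail into its left neighbor), and a long segment is split at its lowest-speed interior frame while keeping a margin from both ends.

```python
def build_segments(cuts, speed, min_segment_frames=8, max_segment_frames=300):
    """Turn cut points into (start, end) spans; end is exclusive (an assumption)."""
    n = len(speed)
    if n == 0:
        return []
    bounds = [0] + sorted(c for c in set(cuts) if 0 < c < n) + [n]

    # Merge pass: skip any cut that would close a too-short segment,
    # so that segment is absorbed by its right neighbor.
    merged = [0]
    for b in bounds[1:-1]:
        if b - merged[-1] >= min_segment_frames:
            merged.append(b)
    # A too-short tail is absorbed by the previous segment instead.
    if len(merged) > 1 and n - merged[-1] < min_segment_frames:
        merged.pop()
    merged.append(n)

    # Split pass: cut over-long spans at their deepest speed minimum.
    spans = []
    stack = list(zip(merged, merged[1:]))[::-1]
    while stack:
        start, end = stack.pop()
        length = end - start
        if length <= max_segment_frames or length < 2:
            spans.append((start, end))
            continue
        # Stay away from the ends so neither half becomes trivially short.
        margin = max(1, min(min_segment_frames, length // 2))
        split = min(range(start + margin, end - margin + 1), key=lambda t: speed[t])
        stack.append((split, end))    # right half, processed second
        stack.append((start, split))  # left half, processed first
    return spans
```

For example, with 100 frames, cuts at 3, 40 and 95, min_segment_frames=8 and max_segment_frames=50, the cuts at 3 and 95 are dropped by the merge pass, and the remaining 60-frame span is split wherever the speed is lowest inside it.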