data_juicer.ops.mapper.video_atomic_action_segment_mapper module

class data_juicer.ops.mapper.video_atomic_action_segment_mapper.VideoAtomicActionSegmentMapper(*args, **kwargs)

Bases: Mapper

Segment a unified hand trajectory into atomic action clips.

Implements the algorithm from the paper https://arxiv.org/pdf/2510.21571:

“we detect speed minima of the 3D hand wrists in the world space and use them as cutting points. We smooth the hand trajectory and select points that are local speed minima within a fixed window centered on each point.”

The operator reads the merged hand_action_tags (output of VideoClipReassemblyMapper) and produces a list of segments. Each segment contains the start and end frame indices, plus sliced states / actions / joints for that segment.

Segmentation is applied independently to the left and right hands. A frame is a cutting point if its speed is a local minimum within a window extending min_window frames on each side.
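The cutting-point rule can be illustrated with a minimal, standard-library sketch. It is not the operator's implementation: the helper names are hypothetical, a centered moving average stands in for the Savitzky-Golay smoothing the operator documents, and the boundary handling (frames closer than min_window to either end are skipped) and tie-breaking are assumptions.

```python
import math


def wrist_speed(positions):
    """Per-frame speed (distance per frame) from a list of 3D wrist positions."""
    if len(positions) < 2:
        return [0.0] * len(positions)
    speeds = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    # Repeat the first value so the speed list has one entry per frame.
    return speeds[:1] + speeds


def moving_average(values, window):
    """Centered moving average (assumption: stands in for Savitzky-Golay)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        # The window shrinks near the boundaries instead of padding.
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def local_minima(speed, min_window):
    """Frames whose speed is the minimum within [t - min_window, t + min_window]."""
    cuts = []
    # Assumption: frames without a full window on both sides are never cuts.
    for t in range(min_window, len(speed) - min_window):
        window = speed[t - min_window: t + min_window + 1]
        # On ties, only the earliest frame in the window counts as the minimum.
        if speed[t] == min(window) and window.index(speed[t]) == min_window:
            cuts.append(t)
    return cuts
```

Under these assumptions, calling `local_minima(moving_average(wrist_speed(positions), 5), 15)` on one hand's wrist positions yields the candidate cutting frames for that hand; the same call would be repeated for the other hand.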

Output field (segment_field) structure:

[
    {
        "hand_type": "right",
        "segment_id": 0,
        "start_frame": 10,
        "end_frame": 45,
        "states": [...],
        "actions": [...],
        "valid_frame_ids": [...],
        "joints_world": [...],
    },
    ...
]
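A short sketch of how downstream code might consume this list, assuming only the keys shown in the structure above. The helper name and the sample data are hypothetical, and no assumption is made about whether end_frame is inclusive.

```python
from collections import defaultdict


def group_segments_by_hand(segments):
    """Map each hand type to its (segment_id, start_frame, end_frame) tuples."""
    grouped = defaultdict(list)
    for seg in segments:
        grouped[seg["hand_type"]].append(
            (seg["segment_id"], seg["start_frame"], seg["end_frame"])
        )
    # Sort each hand's segments by start frame for easier inspection.
    return {
        hand: sorted(items, key=lambda item: item[1])
        for hand, items in grouped.items()
    }


# Hypothetical data mirroring the documented structure (payload lists left empty).
example_segments = [
    {"hand_type": "right", "segment_id": 1, "start_frame": 45, "end_frame": 90,
     "states": [], "actions": [], "valid_frame_ids": [], "joints_world": []},
    {"hand_type": "right", "segment_id": 0, "start_frame": 10, "end_frame": 45,
     "states": [], "actions": [], "valid_frame_ids": [], "joints_world": []},
    {"hand_type": "left", "segment_id": 0, "start_frame": 5, "end_frame": 60,
     "states": [], "actions": [], "valid_frame_ids": [], "joints_world": []},
]

by_hand = group_segments_by_hand(example_segments)
```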
__init__(hand_action_field: str = 'hand_action_tags', segment_field: str = 'atomic_action_segments', speed_smooth_window: int = 5, min_window: int = 15, min_segment_frames: int = 8, max_segment_frames: int = 300, hand_type: str = 'both', *args, **kwargs)

Initialization method.

Parameters:
  • hand_action_field – Meta field storing merged hand action results (output of VideoClipReassemblyMapper).

  • segment_field – Output meta field for atomic segments.

  • speed_smooth_window – Window size for Savitzky-Golay smoothing of the speed signal before minima detection. Must be odd.

  • min_window – Half-window size for local minima detection. A frame is a local minimum only if it is the minimum within [t - min_window, t + min_window]. Larger values → fewer, longer segments.

  • min_segment_frames – Minimum frames per segment. Segments shorter than this are merged with neighbors.

  • max_segment_frames – Maximum frames per segment. Segments longer than this are forcibly split at the deepest speed minimum.

  • hand_type – Which hand(s) to segment: 'left', 'right', or 'both'.
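The roles of min_segment_frames and max_segment_frames can be sketched as a post-processing pass over the cutting points. This is an illustration under stated assumptions, not the operator's code: the function names are hypothetical, neighboring segments are assumed to share their boundary frame, segment length is measured as end minus start, and a short segment is merged into the preceding neighbor (or the following one when it comes first).

```python
def cuts_to_segments(cuts, n_frames):
    """Turn sorted cut points into (start, end) pairs covering all frames."""
    bounds = [0] + [c for c in cuts if 0 < c < n_frames - 1] + [n_frames - 1]
    return list(zip(bounds, bounds[1:]))


def merge_short(segments, min_frames):
    """Merge segments shorter than min_frames into a neighboring segment."""
    merged = []
    for start, end in segments:
        if merged and end - start < min_frames:
            # Extend the previous segment to absorb the short one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    # A short leading segment has no predecessor, so fold it into its successor.
    if len(merged) > 1 and merged[0][1] - merged[0][0] < min_frames:
        merged[1] = (merged[0][0], merged[1][1])
        merged.pop(0)
    return merged


def split_long(segment, speed, max_frames):
    """Recursively split a segment at its slowest interior frame."""
    start, end = segment
    if end - start <= max_frames or end - start < 2:
        return [segment]
    # The "deepest" minimum is taken here as the lowest interior speed value.
    cut = min(range(start + 1, end), key=lambda t: speed[t])
    return (split_long((start, cut), speed, max_frames)
            + split_long((cut, end), speed, max_frames))
```

In this sketch, merging runs before splitting, so a forced split can still produce a segment shorter than min_segment_frames; how the operator orders or reconciles the two constraints is not stated in this page.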

process_single(sample=None, rank=None)

For sample level, sample --> sample

Parameters:

sample – sample to process

Returns:

processed sample