data_juicer.ops.mapper.video_split_by_scene_mapper module

data_juicer.ops.mapper.video_split_by_scene_mapper.replace_func(match, scene_counts_iter)
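
The page documents only the signature of replace_func. The (match, scene_counts_iter) pair suggests a callback for re.sub that consumes one scene count per matched video token, so that the sample text keeps one token per resulting clip. The sketch below illustrates that pattern under this assumption; VIDEO_TOKEN and expand_video_tokens are hypothetical names, and the body is not taken from the Data-Juicer source.

```python
import re
from functools import partial

# Hypothetical placeholder; the real video token is defined by Data-Juicer.
VIDEO_TOKEN = "<video>"


def replace_func(match, scene_counts_iter):
    """Return one token per scene for the next video (assumed behavior)."""
    try:
        scene_count = next(scene_counts_iter)
    except StopIteration:
        # More tokens in the text than videos: leave this token unchanged.
        return match.group(0)
    return VIDEO_TOKEN * scene_count


def expand_video_tokens(text, scene_counts):
    """Replace the i-th video token with scene_counts[i] copies of it."""
    callback = partial(replace_func, scene_counts_iter=iter(scene_counts))
    return re.sub(re.escape(VIDEO_TOKEN), callback, text)


expanded = expand_video_tokens("<video> a cat <video> a dog", [2, 3])
print(expanded)
```

Binding the iterator with functools.partial keeps the callback compatible with re.sub, which passes only the match object.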
class data_juicer.ops.mapper.video_split_by_scene_mapper.VideoSplitBySceneMapper(detector: str = 'ContentDetector', threshold: Annotated[float, Ge(ge=0)] = 27.0, min_scene_len: Annotated[int, Ge(ge=0)] = 15, show_progress: bool = False, *args, **kwargs)

Bases: Mapper

Mapper to cut videos into scene clips.

avaliable_detectors = {'AdaptiveDetector': ['window_width', 'min_content_val', 'weights', 'luma_only', 'kernel_size', 'video_manager', 'min_delta_hsv'], 'ContentDetector': ['weights', 'luma_only', 'kernel_size'], 'ThresholdDetector': ['fade_bias', 'add_final_scene', 'method', 'block_size']}
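
The table above maps each detector name to the extra option names it recognizes. One plausible use, shown here as a minimal sketch and not as the class's actual code, is to validate the detector name and keep only the keyword arguments listed for it. The helper select_detector_kwargs is a hypothetical name introduced for illustration.

```python
# Copied from the class attribute above (identifier spelling kept as-is).
avaliable_detectors = {
    "AdaptiveDetector": [
        "window_width", "min_content_val", "weights", "luma_only",
        "kernel_size", "video_manager", "min_delta_hsv",
    ],
    "ContentDetector": ["weights", "luma_only", "kernel_size"],
    "ThresholdDetector": [
        "fade_bias", "add_final_scene", "method", "block_size",
    ],
}


def select_detector_kwargs(detector, **kwargs):
    """Hypothetical helper: keep only the options listed for `detector`."""
    if detector not in avaliable_detectors:
        raise ValueError(
            f"Unknown detector {detector!r}; expected one of "
            f"{sorted(avaliable_detectors)}"
        )
    allowed = avaliable_detectors[detector]
    return {name: kwargs[name] for name in allowed if name in kwargs}


# 'fade_bias' is not listed for ContentDetector, so it is dropped.
selected = select_detector_kwargs(
    "ContentDetector", luma_only=True, fade_bias=0.1
)
print(selected)
```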
__init__(detector: str = 'ContentDetector', threshold: Annotated[float, Ge(ge=0)] = 27.0, min_scene_len: Annotated[int, Ge(ge=0)] = 15, show_progress: bool = False, *args, **kwargs)

Initialization method.

Parameters:
  • detector – Algorithm from scenedetect.detectors. Should be one of [‘ContentDetector’, ‘ThresholdDetector’, ‘AdaptiveDetector’].

  • threshold – Threshold passed to the detector.

  • min_scene_len – Minimum length of any scene, in frames.

  • show_progress – Whether to show progress from scenedetect.

  • args – extra positional arguments

  • kwargs – extra keyword arguments
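
To make the roles of threshold and min_scene_len concrete, here is a simplified illustration of how the two values can interact: a cut is accepted only when a per-frame change score reaches the threshold and at least min_scene_len frames have passed since the previous cut. This is a sketch under those assumptions, not the scenedetect implementation; find_cuts and the frame_scores input are hypothetical.

```python
def find_cuts(frame_scores, threshold=27.0, min_scene_len=15):
    """Return frame indices where a new scene starts (simplified model).

    frame_scores[i] is a hypothetical measure of how much frame i differs
    from frame i - 1; real detectors compute this from pixel data.
    """
    cuts = []
    last_cut = 0  # the first scene starts at frame 0
    for frame_idx, score in enumerate(frame_scores):
        long_enough = frame_idx - last_cut >= min_scene_len
        if score >= threshold and long_enough:
            cuts.append(frame_idx)
            last_cut = frame_idx
    return cuts


# 40 frames with four large changes; two arrive too soon after a scene start.
scores = [0.0] * 40
for idx in (10, 20, 25, 38):
    scores[idx] = 50.0

cuts = find_cuts(scores, threshold=27.0, min_scene_len=15)
print(cuts)
```

In this model, raising threshold makes the detector less sensitive, while raising min_scene_len suppresses cuts that would produce very short clips.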

process_single(sample, context=False)

Processes a single sample (sample –> sample).

Parameters:

sample – sample to process

Returns:

processed sample