data_juicer.ops.mapper.video_split_by_duration_mapper module

data_juicer.ops.mapper.video_split_by_duration_mapper.create_replacer(replacements)

class data_juicer.ops.mapper.video_split_by_duration_mapper.VideoSplitByDurationMapper(split_duration: float = 10, min_last_split_duration: float = 0, keep_original_sample: bool = True, *args, **kwargs)

Bases: Mapper

Mapper to split video by duration.

__init__(split_duration: float = 10, min_last_split_duration: float = 0, keep_original_sample: bool = True, *args, **kwargs)

Initialization method.

Parameters:
  • split_duration – Duration of each video split, in seconds.

  • min_last_split_duration – Minimum allowable duration, in seconds, of the last video split. If the last split is shorter than this value, it is discarded.

  • keep_original_sample – Whether to keep the original sample. If set to False, the final dataset contains only the cut samples and the original sample is removed. Defaults to True.

  • args – Extra positional arguments.

  • kwargs – Extra keyword arguments.
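The interaction between split_duration and min_last_split_duration can be illustrated with a small plain-Python sketch. The helper name split_points is hypothetical and is not part of data_juicer; the mapper itself works on video containers and may compute its boundaries differently. The sketch only mirrors the behaviour described in the parameter list above.

```python
def split_points(video_duration, split_duration=10.0, min_last_split_duration=0.0):
    """Return a list of (start, end) pairs in seconds.

    Hypothetical helper for illustration only; not a data_juicer function.
    """
    video_duration = float(video_duration)
    split_duration = float(split_duration)
    if split_duration <= 0:
        raise ValueError("split_duration must be positive")

    segments = []
    start = 0.0
    # Emit full-length splits while more than one split_duration remains.
    while start + split_duration < video_duration:
        segments.append((start, start + split_duration))
        start += split_duration

    # Whatever is left is the last split; keep it only if it is long enough.
    remaining = video_duration - start
    if remaining > 0 and remaining >= min_last_split_duration:
        segments.append((start, video_duration))
    return segments


if __name__ == "__main__":
    # A 25-second video with the default 10-second splits.
    print(split_points(25))
    # Same video, but a trailing split shorter than 6 seconds is discarded.
    print(split_points(25, min_last_split_duration=6))
```

Under these assumptions, a 25-second video yields two full 10-second splits plus a 5-second remainder, and that remainder is dropped once min_last_split_duration exceeds 5.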

split_videos_by_duration(video_key, container)

process_batched(samples)