data_juicer.utils.video_utils module#
- class data_juicer.utils.video_utils.VideoMetadata(height: int | None = None, width: int | None = None, fps: float | None = None, num_frames: int | None = None, duration: float | None = None)[source]#
Bases: object
Metadata for video content.
This class stores essential video properties such as resolution, frame rate, and duration.
- height: int | None = None#
- width: int | None = None#
- fps: float | None = None#
- num_frames: int | None = None#
- duration: float | None = None#
- __init__(height: int | None = None, width: int | None = None, fps: float | None = None, num_frames: int | None = None, duration: float | None = None) None#
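Because every field is optional, a caller may receive metadata with some values missing. The following is a minimal sketch, not the library's code, of how a missing `duration` could be derived from `num_frames` and `fps`. `VideoMetadataSketch` and `fill_duration` are hypothetical names; the stand-in class only mirrors the fields documented above and assumes a constant frame rate.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class VideoMetadataSketch:
    """Local stand-in mirroring the documented VideoMetadata fields (hypothetical)."""

    height: Optional[int] = None
    width: Optional[int] = None
    fps: Optional[float] = None
    num_frames: Optional[int] = None
    duration: Optional[float] = None


def fill_duration(meta: VideoMetadataSketch) -> VideoMetadataSketch:
    """Return metadata whose duration is num_frames / fps when it was missing."""
    if meta.duration is not None:
        return meta  # already known, nothing to derive
    if meta.fps and meta.num_frames is not None:
        # Assumption: constant frame rate, so duration = frame count / fps.
        return replace(meta, duration=meta.num_frames / meta.fps)
    return meta  # not enough information to derive a duration


meta = fill_duration(VideoMetadataSketch(height=720, width=1280, fps=25.0, num_frames=250))
print(meta.duration)  # 10.0
```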
- class data_juicer.utils.video_utils.Frames(frames: List[ndarray[tuple[Any, ...], dtype[uint8]]], indices: List[int] | None = None, pts_time: List[float] | None = None)[source]#
Bases: object
- frames: List[ndarray[tuple[Any, ...], dtype[uint8]]]#
- indices: List[int] | None#
- pts_time: List[float] | None#
- __init__(frames: List[ndarray[tuple[Any, ...], dtype[uint8]]], indices: List[int] | None = None, pts_time: List[float] | None = None) None#
Method generated by attrs for class Frames.
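The page does not state how `indices` and `pts_time` relate. As an illustration only, the sketch below assumes that `pts_time` holds timestamps in seconds and that the video has a constant frame rate, in which case one list can be derived from the other. The helper names are hypothetical.

```python
from typing import List


def indices_to_pts(indices: List[int], fps: float) -> List[float]:
    """Map frame indices to timestamps in seconds (constant frame rate assumed)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [i / fps for i in indices]


def pts_to_indices(pts_time: List[float], fps: float) -> List[int]:
    """Map timestamps in seconds back to the nearest frame index."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [round(t * fps) for t in pts_time]


indices = [0, 12, 24]
pts = indices_to_pts(indices, fps=24.0)
print(pts)  # [0.0, 0.5, 1.0]
print(pts_to_indices(pts, fps=24.0))  # [0, 12, 24]
```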
- class data_juicer.utils.video_utils.Clip(source_video: str, span: tuple[float, float], id: str | None = None, path: str | None = None, encoded_data: bytes | None = None, frames: List[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None)[source]#
Bases: object
Container for video clip data including metadata, frames, and processing results.
This class stores information about a video segment: its source video, time span, optional identifier and output path, and the clip content as encoded bytes or decoded frames.
- source_video: str#
- span: tuple[float, float]#
- id: str | None#
- path: str | None#
- encoded_data: bytes | None#
- frames: List[ndarray[tuple[Any, ...], dtype[uint8]]] | None#
- __init__(source_video: str, span: tuple[float, float], id: str | None = None, path: str | None = None, encoded_data: bytes | None = None, frames: List[ndarray[tuple[Any, ...], dtype[uint8]]] | None = None) None#
Method generated by attrs for class Clip.
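A `Clip` can carry its content in several optional fields, so downstream code usually has to check which ones are populated. The sketch below uses a local stand-in (`ClipSketch`, a stdlib dataclass rather than the attrs class documented above) and two hypothetical helpers. It assumes `span` is a `(start, end)` pair in seconds.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class ClipSketch:
    """Local stand-in mirroring the documented Clip fields (hypothetical)."""

    source_video: str
    span: Tuple[float, float]
    id: Optional[str] = None
    path: Optional[str] = None
    encoded_data: Optional[bytes] = None
    frames: Optional[List[Any]] = None


def clip_duration(clip: ClipSketch) -> float:
    """Length of the clip in seconds, assuming span == (start, end)."""
    start, end = clip.span
    return end - start


def payload_kinds(clip: ClipSketch) -> List[str]:
    """Names of the fields that actually hold clip content."""
    kinds = []
    if clip.frames is not None:
        kinds.append("frames")
    if clip.encoded_data is not None:
        kinds.append("encoded_data")
    if clip.path is not None:
        kinds.append("path")
    return kinds


clip = ClipSketch(source_video="input.mp4", span=(1.5, 4.0), encoded_data=b"\x00\x01")
print(clip_duration(clip))  # 2.5
print(payload_kinds(clip))  # ['encoded_data']
```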
- class data_juicer.utils.video_utils.VideoReader(video_source: str | Path | bytes | IO[bytes])[source]#
Bases: ABC
Abstract class for video processing.
This class provides an interface for video processing tasks such as extracting frames, key frames, and clipping.
- __init__(video_source: str | Path | bytes | IO[bytes])[source]#
Initialize video reader.
- Parameters:
video_source – Path, URL, bytes, or file-like object.
- property metadata#
- abstractmethod get_metadata() VideoMetadata[source]#
Get video metadata.
- abstractmethod extract_frames(start_time: float = 0, end_time: float | None = None) Iterator[ndarray][source]#
Yield frames between [start_time, end_time) as numpy arrays.
- Parameters:
start_time – Start time in seconds (inclusive)
end_time – End time in seconds (exclusive). If None, extract to end of video.
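The half-open interval `[start_time, end_time)` can be made concrete with a small calculation. The sketch below is not taken from any reader implementation. It assumes a constant frame rate and that frame `i` has timestamp `i / fps`, and the function name is hypothetical.

```python
import math
from typing import Optional


def frame_index_range(
    start_time: float,
    end_time: Optional[float],
    fps: float,
    num_frames: int,
) -> range:
    """Indices i whose timestamp i / fps lies in [start_time, end_time)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # i / fps >= start_time  <=>  i >= start_time * fps
    first = max(0, math.ceil(start_time * fps))
    if end_time is None:
        stop = num_frames  # extract to the end of the video
    else:
        # i / fps < end_time  <=>  i < end_time * fps (stop is exclusive)
        stop = min(num_frames, math.ceil(end_time * fps))
    return range(first, max(first, stop))


print(list(frame_index_range(1.0, 2.0, fps=10.0, num_frames=100)))
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(list(frame_index_range(9.5, None, fps=10.0, num_frames=100)))
# [95, 96, 97, 98, 99]
```

The frame at exactly `end_time` (index 20 in the first call) is excluded, which is what lets consecutive ranges be concatenated without duplicating a frame.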
- abstractmethod extract_keyframes(start_time: float = 0, end_time: float | None = None) Frames[source]#
Extract keyframes and return them in a Frames object.
- Parameters:
start_time – Start time in seconds (inclusive)
end_time – End time in seconds (exclusive). If None, extract to end of video.
- abstractmethod extract_clip(start_time: float = 0, end_time: float | None = None, output_path: str = None, to_numpy: bool = True) Clip | None[source]#
Extract a subclip.
- Parameters:
start_time – Start time in seconds
end_time – End time in seconds. If None, extract to end of video.
output_path – The path to save the output video clip. If provided, the clip is saved to a file.
to_numpy – Whether to return frames as a list of numpy arrays.
- Returns:
A Clip object on success, or None on failure.
- class data_juicer.utils.video_utils.AVReader(video_source: str | Path | bytes | IO[bytes], video_stream_index: int = 0, frame_format: str = 'rgb24')[source]#
Bases: VideoReader
Video reader using the AV library.
- __init__(video_source: str | Path | bytes | IO[bytes], video_stream_index: int = 0, frame_format: str = 'rgb24')[source]#
Initialize AVReader.
- Parameters:
video_source – Path, URL, bytes, or file-like object.
video_stream_index – Video stream index to decode, default set to 0.
frame_format – Frame format to decode, default set to 'rgb24'.
- get_metadata() VideoMetadata[source]#
Get video metadata.
- extract_frames(start_time: float | None = 0.0, end_time: float | None = None) Iterator[ndarray][source]#
Get the video's frames from the container within a specified time range.
- Parameters:
start_time – Start time in seconds (default: 0.0).
end_time – End time in seconds (exclusive). If None, decode until end.
- Returns:
Iterator of numpy arrays, one per frame within the specified time range.
- extract_keyframes(start_time: float = 0, end_time: float | None = None)[source]#
Extract key frames.
- Parameters:
start_time – Start time in seconds (default: 0.0).
end_time – End time in seconds (exclusive). If None, decode until end.
- Returns:
A Frames object containing the keyframes within the specified time range.
- extract_clip(start_time, end_time, output_path: str = None, to_numpy: bool = True)[source]#
Extract a clip from the video based on the start and end time.
- Parameters:
start_time – the start time in seconds.
end_time – the end time in seconds. If it's None, this function will cut the video from start_time to the end of the video.
output_path – the path to output video.
to_numpy – whether to return clip data as numpy array and save to Clip.frames.
- Returns:
A Clip object. If output_path is not None, the clip is also saved to output_path. If to_numpy is True, the clip data is returned as numpy arrays in Clip.frames; if to_numpy is False, it is returned as bytes in Clip.encoded_data.
- class data_juicer.utils.video_utils.FFmpegReader(video_source: str | Path | bytes | IO[bytes], video_stream_index: int = 0, frame_format: str = 'rgb24')[source]#
Bases: VideoReader
Video reader using FFmpeg.
- __init__(video_source: str | Path | bytes | IO[bytes], video_stream_index: int = 0, frame_format: str = 'rgb24')[source]#
Initialize FFmpegReader.
- Parameters:
video_source – Path, URL, bytes, or file-like object.
video_stream_index – Video stream index to decode, default set to 0.
frame_format – Frame format, default set to 'rgb24'.
- get_metadata() VideoMetadata[source]#
Get video metadata.
- extract_frames(start_time: float | None = 0.0, end_time: float | None = None) Iterator[ndarray][source]#
Get the video's frames within a specified time range.
- Parameters:
start_time – Start time in seconds (default: 0.0).
end_time – End time in seconds (exclusive). If None, decode until end.
duration – Duration from start_time. Mutually exclusive with end_time. (This parameter does not appear in the signature shown for this method.)
- Returns:
Iterator of numpy arrays, one per frame within the specified time range.
- extract_keyframes(start_time: float = 0, end_time: float | None = None)[source]#
Extract only true keyframes (I-frames) from video.
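To illustrate what "true keyframes (I-frames)" means, here is a toy sketch that selects I-frames from a sequence of picture types. It does not show how `FFmpegReader` queries FFmpeg. The function name, the picture-type string, and the constant-frame-rate timestamp formula are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def select_keyframes(picture_types: Sequence[str], fps: float) -> Tuple[List[int], List[float]]:
    """Return (indices, timestamps in seconds) of frames whose type is 'I'."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Only intra-coded frames count; P and B frames depend on other frames.
    indices = [i for i, kind in enumerate(picture_types) if kind == "I"]
    pts_time = [i / fps for i in indices]
    return indices, pts_time


# Two groups of pictures, 12 frames each, each starting with an I-frame.
gop = "IBBPBBPBBPBB" * 2
indices, pts_time = select_keyframes(gop, fps=24.0)
print(indices)   # [0, 12]
print(pts_time)  # [0.0, 0.5]
```

The two returned lists have the same shape as the `indices` and `pts_time` fields of `Frames`, though whether the reader fills them this way is not stated on this page.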
- extract_clip(start_time, end_time, output_path: str = None, to_numpy=True, **kwargs)[source]#
Extract a clip from the video based on the start and end time.
- Parameters:
start_time – the start time in seconds.
end_time – the end time in seconds. If it's None, this function will cut the video from start_time to the end of the video.
output_path – the path to output video.
to_numpy – whether to return clip data as numpy array and save to Clip.frames.
- Returns:
A Clip object. If output_path is not None, the clip is also saved to output_path. If to_numpy is True, the clip data is returned as numpy arrays in Clip.frames; if to_numpy is False, it is returned as bytes in Clip.encoded_data.
- class data_juicer.utils.video_utils.DecordReader(video_source: str | Path | bytes | IO[bytes])[source]#
Bases: VideoReader
Video reader using Decord.
- __init__(video_source: str | Path | bytes | IO[bytes])[source]#
Initialize the video reader.
- Parameters:
video_source – Path, URL, bytes, or file-like object.
- get_metadata() VideoMetadata[source]#
Get video metadata.
- extract_frames(start_time: float | None = 0.0, end_time: float | None = None) Iterator[ndarray][source]#
Get the video's frames within a specified time range using decord.
- Parameters:
start_time – Start time in seconds (default: 0.0).
end_time – End time in seconds (exclusive). If None, decode until end.
- Returns:
Numpy array of frames in shape (num_frames, height, width, channels).
- extract_keyframes(start_time: float = 0, end_time: float | None = None)[source]#
Extract keyframes and return them in a Frames object.
- Parameters:
start_time – Start time in seconds (inclusive)
end_time – End time in seconds (exclusive). If None, extract to end of video.
- extract_clip(start_time, end_time, output_path: str = None, to_numpy=True)[source]#
Extract a clip from the video based on the start and end time.
- Parameters:
start_time – the start time in seconds.
end_time – the end time in seconds. If it's None, this function will cut the video from start_time to the end of the video.
output_path – the path to output video.
to_numpy – whether to return clip data as numpy array and save to Clip.frames.
- Returns:
Clip object.
- data_juicer.utils.video_utils.create_video_reader(video_source: str, backend: str = 'auto', **kwargs) VideoReader[source]#
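No description accompanies `create_video_reader` on this page, so the following is only a usage sketch based on the signatures listed above. `first_frames` is a hypothetical helper that builds a reader and pulls the first `n` frames from `extract_frames`. The factory is injectable so the sketch can be exercised with a stand-in reader when `data_juicer` is not installed. How `backend='auto'` chooses a reader is not documented here and is not assumed.

```python
from itertools import islice
from typing import Any, Callable, Iterator, List, Optional


def first_frames(
    video_source: str,
    n: int,
    backend: str = "auto",
    reader_factory: Optional[Callable[..., Any]] = None,
) -> List[Any]:
    """Return up to n frames from the start of the video."""
    if reader_factory is None:
        # Lazy import: only needed when the real factory is used.
        from data_juicer.utils.video_utils import create_video_reader

        reader_factory = create_video_reader
    reader = reader_factory(video_source, backend=backend)
    # extract_frames is documented as an iterator, so islice stops decoding early.
    return list(islice(reader.extract_frames(0.0, None), n))


class FakeReader:
    """Stand-in reader for demonstration; yields integers instead of arrays."""

    def __init__(self, video_source: str, backend: str = "auto") -> None:
        self.video_source = video_source
        self.backend = backend

    def extract_frames(self, start_time: float = 0.0, end_time: Optional[float] = None) -> Iterator[int]:
        yield from range(5)


print(first_frames("demo.mp4", 3, reader_factory=FakeReader))  # [0, 1, 2]
```

With the library installed, calling `first_frames("path/to/video.mp4", 3)` would go through `create_video_reader` instead of the stand-in.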