data_juicer.ops.mapper.video_captioning_from_summarizer_mapper module#

class data_juicer.ops.mapper.video_captioning_from_summarizer_mapper.VideoCaptioningFromSummarizerMapper(hf_summarizer: str = None, trust_remote_code: bool = False, consider_video_caption_from_video: bool = True, consider_video_caption_from_audio: bool = True, consider_video_caption_from_frames: bool = True, consider_video_tags_from_audio: bool = True, consider_video_tags_from_frames: bool = True, vid_cap_from_vid_args: Dict | None = None, vid_cap_from_frm_args: Dict | None = None, vid_tag_from_aud_args: Dict | None = None, vid_tag_from_frm_args: Dict | None = None, keep_tag_num: Annotated[int, Gt(gt=0)] = 5, keep_original_sample: bool = True, *args, **kwargs)[source]#

Bases: Mapper

Mapper to generate video captions by summarizing several kinds of generated texts (captions from video/audio/frames, tags from audio/frames, etc.).

__init__(hf_summarizer: str = None, trust_remote_code: bool = False, consider_video_caption_from_video: bool = True, consider_video_caption_from_audio: bool = True, consider_video_caption_from_frames: bool = True, consider_video_tags_from_audio: bool = True, consider_video_tags_from_frames: bool = True, vid_cap_from_vid_args: Dict | None = None, vid_cap_from_frm_args: Dict | None = None, vid_tag_from_aud_args: Dict | None = None, vid_tag_from_frm_args: Dict | None = None, keep_tag_num: Annotated[int, Gt(gt=0)] = 5, keep_original_sample: bool = True, *args, **kwargs)[source]#

Initialization method. A usage sketch follows the parameter list below.

Parameters:
  • hf_summarizer – the summarizer model used to summarize texts generated by other methods.

  • trust_remote_code – whether to trust the remote code of HF models.

  • consider_video_caption_from_video – whether to consider the video caption generated directly from the video in the summarization process. Default: True.

  • consider_video_caption_from_audio – whether to consider the video caption generated from the video's audio streams in the summarization process. Default: True.

  • consider_video_caption_from_frames – whether to consider the video caption generated from frames sampled from the video in the summarization process. Default: True.

  • consider_video_tags_from_audio – whether to consider the video tags generated from the video's audio streams in the summarization process. Default: True.

  • consider_video_tags_from_frames – whether to consider the video tags generated from frames sampled from the video in the summarization process. Default: True.

  • vid_cap_from_vid_args – the argument dict for video captioning directly from the video, where keys are argument names and values are argument values. Default: None.

  • vid_cap_from_frm_args – the argument dict for video captioning from frames sampled from the video, where keys are argument names and values are argument values. Default: None.

  • vid_tag_from_aud_args – the argument dict for video tagging from the video's audio streams, where keys are argument names and values are argument values. Default: None.

  • vid_tag_from_frm_args – the argument dict for video tagging from frames sampled from the video, where keys are argument names and values are argument values. Default: None.

  • keep_tag_num – the maximum number N of frame-based tags to keep. Too many tags can degrade the summarized text, so only the N most frequent tags are kept. Default: 5.

  • keep_original_sample – whether to keep the original sample. If set to False, the final dataset will contain only the summarized captions, and the original captions will be removed. Default: True.

  • args – extra positional arguments.

  • kwargs – extra keyword arguments.
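
A minimal instantiation sketch, assuming only what the parameter list above documents; the model ID 'facebook/bart-large-cnn' is just an illustrative Hugging Face summarizer, not a requirement of this op:

   from data_juicer.ops.mapper.video_captioning_from_summarizer_mapper import VideoCaptioningFromSummarizerMapper

   # Build the op; each keyword mirrors a parameter documented above.
   op = VideoCaptioningFromSummarizerMapper(
       hf_summarizer='facebook/bart-large-cnn',  # illustrative HF model ID
       consider_video_tags_from_audio=False,     # skip audio-based tagging
       keep_tag_num=5,                           # keep the 5 most frequent frame tags
       keep_original_sample=True,                # keep originals alongside summaries
   )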

process_batched(samples, rank=None)[source]#
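
A hypothetical call sketch for process_batched. The batched sample layout (a dict of column lists with 'text' and 'videos' fields) and the '<__dj__video>' placeholder token follow Data-Juicer's general conventions and are assumptions here, not taken from this page:

   # One sample in batched (columnar) form; field names are assumed.
   samples = {
       'text': ['<__dj__video> a cat chases a ball across the floor'],
       'videos': [['/path/to/cat_video.mp4']],
   }

   out = op.process_batched(samples, rank=0)
   # With keep_original_sample=True, `out` should contain the original sample
   # plus a new one whose text is the summarized caption.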