data_juicer.ops.mapper.image_diffusion_mapper module

class data_juicer.ops.mapper.image_diffusion_mapper.ImageDiffusionMapper(hf_diffusion: str = 'CompVis/stable-diffusion-v1-4', trust_remote_code: bool = False, torch_dtype: str = 'fp32', revision: str = 'main', strength: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 0.8, guidance_scale: float = 7.5, aug_num: Annotated[int, Gt(gt=0)] = 1, keep_original_sample: bool = True, caption_key: str | None = None, hf_img2seq: str = 'Salesforce/blip2-opt-2.7b', save_dir: str = None, *args, **kwargs)

Bases: Mapper

Generate images with a diffusion model.

__init__(hf_diffusion: str = 'CompVis/stable-diffusion-v1-4', trust_remote_code: bool = False, torch_dtype: str = 'fp32', revision: str = 'main', strength: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 0.8, guidance_scale: float = 7.5, aug_num: Annotated[int, Gt(gt=0)] = 1, keep_original_sample: bool = True, caption_key: str | None = None, hf_img2seq: str = 'Salesforce/blip2-opt-2.7b', save_dir: str = None, *args, **kwargs)

Initialization method.

Parameters:
  • hf_diffusion -- name of the Hugging Face diffusion model used to generate the images.

  • torch_dtype -- the floating-point type used to load the diffusion model. Can be one of ['fp32', 'fp16', 'bf16'].

  • revision -- the specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.

  • strength -- how much to transform the reference image; must be between 0 and 1. The image is used as the starting point, and the higher the strength, the more noise is added to it. The number of denoising steps depends on the amount of noise initially added: when strength is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified by the pipeline's num_inference_steps. A value of 1 therefore essentially ignores the reference image.

  • guidance_scale -- a higher value pushes the model to generate images more closely tied to the text prompt, at the cost of lower image quality. Guidance is enabled only when guidance_scale > 1.

  • aug_num -- the number of images the diffusion model generates for each input sample.

  • keep_candidate_mode -- the strategy for retaining the generated caption_num candidates:

    'random_any': retain one randomly chosen generated caption.

    'similar_one_simhash': retain the generated caption that is most similar to the original caption.

    'all': retain all generated captions by concatenation.
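The strength and guidance_scale descriptions can be made concrete with a small arithmetic sketch. It assumes the step schedule and the classifier-free guidance formula used by the Hugging Face diffusers img2img pipeline; the helper names effective_denoising_steps and guided_noise are hypothetical and not part of data_juicer, and a real pipeline applies the guidance formula to whole noise-prediction tensors rather than single floats.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Hypothetical helper: denoising steps an img2img run executes.

    Assumes diffusers-style scheduling: the first t_start steps of the
    schedule are skipped, so only int(num_inference_steps * strength)
    steps run. strength=1.0 runs the full schedule, strength=0.0 runs none.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)  # steps skipped
    return num_inference_steps - t_start


def guided_noise(noise_uncond: float, noise_text: float, guidance_scale: float) -> float:
    """Hypothetical helper: classifier-free guidance on one prediction value."""
    if guidance_scale <= 1.0:
        # Guidance disabled: only the text-conditioned prediction is used.
        return noise_text
    # Move the prediction further along the (text - unconditional) direction.
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)
```

With the defaults strength=0.8 and, say, 50 inference steps, only the last 40 steps run; raising guidance_scale amplifies the difference between the text-conditioned and unconditional predictions.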

Note

This is a batched OP whose input and output types are both lists. Suppose there are $N$ lists of input samples, each with batch size $b$, and denote caption_num as $M$. For the 'random_any' and 'similar_one_simhash' modes, the total number of samples after generation is $2Nb$ when keep_original_sample is True and $Nb$ when it is False. For the 'all' mode, it is $(1+M)Nb$ when keep_original_sample is True and $MNb$ when it is False.
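The sample counts stated in this note can be checked with a short helper. expected_total_samples is a hypothetical function written only for illustration; it encodes the counts as the note states them and is not part of data_juicer.

```python
def expected_total_samples(n: int, b: int, m: int, mode: str,
                           keep_original_sample: bool = True) -> int:
    """Hypothetical helper: total output samples for N lists of batch size b."""
    if mode in ("random_any", "similar_one_simhash"):
        per_sample = 1  # exactly one generated candidate survives
    elif mode == "all":
        per_sample = m  # all M generated candidates survive
    else:
        raise ValueError(f"unknown mode: {mode}")
    if keep_original_sample:
        per_sample += 1  # the original sample is kept alongside
    return per_sample * n * b
```

For example, with $N=2$, $b=4$ and $M=3$, 'all' mode yields $(1+3) \cdot 2 \cdot 4 = 32$ samples when the originals are kept.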

Parameters:
  • caption_key -- the key of the field in each sample that stores the captions of its images. The field value can be a string if each sample contains only one image; otherwise, it should be a list. If caption_key is None, ImageDiffusionMapper generates a caption for each image.

  • hf_img2seq -- name of the Hugging Face model used to generate captions when caption_key is None.

  • save_dir -- The directory where generated image files will be stored. If not specified, outputs will be saved in the same directory as their corresponding input files. This path can alternatively be defined by setting the DJ_PRODUCED_DATA_DIR environment variable.
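The caption_key behavior can be sketched as follows, assuming the caption field holds a string for a single image or a list with one caption per image. resolve_captions, the image_key argument, and the captioner callable (a stand-in for the hf_img2seq model) are all hypothetical names used for illustration, not the mapper's actual internals.

```python
from typing import Callable, List, Optional


def resolve_captions(sample: dict,
                     caption_key: Optional[str],
                     image_key: str,
                     captioner: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: return one caption per image in `sample`."""
    images = sample[image_key]
    if caption_key is None:
        # No stored captions: caption every image with the img2seq stand-in.
        return [captioner(path) for path in images]
    captions = sample[caption_key]
    if isinstance(captions, str):
        # A single string is treated as the caption of a single image.
        captions = [captions]
    if len(captions) != len(images):
        raise ValueError("expected one caption per image")
    return list(captions)
```

Treating a lone string as a one-element list lets single-image and multi-image samples share the same downstream code path.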

process_batched(samples, rank=None, context=False)

Note

This is a batched OP whose input and output types are both lists. Suppose there are $N$ lists of input samples, each with batch size $b$, and denote aug_num as $M$. The total number of samples after generation is $(1+M)Nb$ when keep_original_sample is True, and $MNb$ when it is False.
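A minimal sketch of this batched flow, assuming the batch arrives as a dict of lists (one list per field). process_batched_sketch and fake_generate are hypothetical; fake_generate merely stands in for the diffusion model by emitting aug_num renamed copies of each sample.

```python
from typing import Callable, Dict, List


def process_batched_sketch(samples: Dict[str, list],
                           generate: Callable[[dict], List[dict]],
                           keep_original_sample: bool = True) -> Dict[str, list]:
    """Hypothetical sketch: dict of lists in, dict of lists out."""
    keys = list(samples.keys())
    batch_size = len(samples[keys[0]])
    # "dict of lists" -> "list of dicts", one dict per sample in the batch
    rows = [{k: samples[k][i] for k in keys} for i in range(batch_size)]

    out_rows: List[dict] = []
    for row in rows:
        if keep_original_sample:
            out_rows.append(row)
        out_rows.extend(generate(row))  # M generated samples per input sample

    # "list of dicts" -> "dict of lists"
    return {k: [r[k] for r in out_rows] for k in keys}


def fake_generate(row: dict, aug_num: int = 2) -> List[dict]:
    # Stand-in for the diffusion model: copy the row with a new image path.
    return [{**row, "images": [f"{row['images'][0]}.gen{j}"]} for j in range(aug_num)]


batch = {"text": ["a", "b"], "images": [["x.jpg"], ["y.jpg"]]}
out = process_batched_sketch(batch, fake_generate)
```

Here $N=1$, $b=2$ and $M=2$, so out holds $(1+2) \cdot 1 \cdot 2 = 6$ samples, each original followed by its two generated copies.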

Parameters:

  • samples -- the batch of input samples.

Returns:

the batch of samples after generation.