data_juicer.ops.mapper.image_diffusion_mapper module
- class data_juicer.ops.mapper.image_diffusion_mapper.ImageDiffusionMapper(hf_diffusion: str = 'CompVis/stable-diffusion-v1-4', trust_remote_code: bool = False, torch_dtype: str = 'fp32', revision: str = 'main', strength: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 0.8, guidance_scale: float = 7.5, aug_num: Annotated[int, Gt(gt=0)] = 1, keep_original_sample: bool = True, caption_key: str | None = None, hf_img2seq: str = 'Salesforce/blip2-opt-2.7b', *args, **kwargs)
Bases: Mapper
Generate images with a diffusion model.
- __init__(hf_diffusion: str = 'CompVis/stable-diffusion-v1-4', trust_remote_code: bool = False, torch_dtype: str = 'fp32', revision: str = 'main', strength: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 0.8, guidance_scale: float = 7.5, aug_num: Annotated[int, Gt(gt=0)] = 1, keep_original_sample: bool = True, caption_key: str | None = None, hf_img2seq: str = 'Salesforce/blip2-opt-2.7b', *args, **kwargs)
Initialization method.
- Parameters:
hf_diffusion – name of the Hugging Face diffusion model used to generate the image.
torch_dtype – the floating point type used to load the diffusion model. Can be one of [‘fp32’, ‘fp16’, ‘bf16’].
revision – The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.
strength – The extent to which the reference image is transformed. Must be between 0 and 1. The reference image is used as a starting point, and more noise is added the higher the strength. The number of denoising steps depends on the amount of noise initially added. When strength is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in num_inference_steps. A value of 1 essentially ignores the reference image.
guidance_scale – A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
aug_num – The number of images to be produced by the stable-diffusion model.
keep_candidate_mode – Retain strategy for the generated $caption_num$ candidates.
- ’random_any’: Retain a random one of the generated captions.
- ’similar_one_simhash’: Retain the generated caption that is most similar to the original caption.
- ’all’: Retain all generated captions by concatenation.
Note
This is a batched_OP whose input and output types are both list. Suppose there are $N$ lists of input samples, each with batch size $b$, and denote caption_num as $M$. For ‘random_any’ and ‘similar_one_simhash’ mode, the total number of samples after generation is $2Nb$ when keep_original_sample is True and $Nb$ when keep_original_sample is False. For ‘all’ mode, it is $(1+M)Nb$ when keep_original_sample is True and $MNb$ when keep_original_sample is False.
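The counts in this note can be checked with a few lines of arithmetic. The helper below is a hypothetical sketch, not part of data_juicer; it assumes the reading in which ‘random_any’ and ‘similar_one_simhash’ each keep one generated candidate per input sample, while ‘all’ keeps all $M$.

```python
def total_samples_after_generation(n_lists, batch_size, caption_num,
                                   keep_candidate_mode,
                                   keep_original_sample=True):
    """Hypothetical helper: count samples after generation per the note.

    n_lists is N, batch_size is b, caption_num is M.
    """
    base = n_lists * batch_size  # N * b input samples in total
    if keep_candidate_mode in ("random_any", "similar_one_simhash"):
        generated_per_sample = 1  # a single candidate is retained
    elif keep_candidate_mode == "all":
        generated_per_sample = caption_num  # all M candidates are retained
    else:
        raise ValueError(f"unknown keep_candidate_mode: {keep_candidate_mode!r}")
    originals_per_sample = 1 if keep_original_sample else 0
    return (originals_per_sample + generated_per_sample) * base


# N = 3 lists, b = 4 samples each, M = 5 candidates
print(total_samples_after_generation(3, 4, 5, "random_any"))          # 2Nb
print(total_samples_after_generation(3, 4, 5, "all"))                 # (1+M)Nb
print(total_samples_after_generation(3, 4, 5, "all",
                                     keep_original_sample=False))     # MNb
```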
- Parameters:
caption_key – the key name of the field in samples that stores the caption for each image. It can be a string if there is only one image in each sample. Otherwise, it should be a list. If it is None, ImageDiffusionMapper will produce a caption for each image.
hf_img2seq – name of the Hugging Face model used to generate captions if caption_key is None.
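As a rough illustration of the constraints listed above, the following sketch collects the constructor defaults from the signature and checks the documented ranges before any model is loaded. The names `DEFAULTS`, `ALLOWED_DTYPES` and `check_image_diffusion_kwargs` are hypothetical and exist only for this example; the actual class performs its own validation through the annotated types.

```python
# Defaults copied from the documented __init__ signature.
DEFAULTS = {
    "hf_diffusion": "CompVis/stable-diffusion-v1-4",
    "trust_remote_code": False,
    "torch_dtype": "fp32",
    "revision": "main",
    "strength": 0.8,
    "guidance_scale": 7.5,
    "aug_num": 1,
    "keep_original_sample": True,
    "caption_key": None,
    "hf_img2seq": "Salesforce/blip2-opt-2.7b",
}
ALLOWED_DTYPES = ("fp32", "fp16", "bf16")


def check_image_diffusion_kwargs(**overrides):
    """Hypothetical pre-check mirroring the documented parameter ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    cfg = {**DEFAULTS, **overrides}
    if cfg["torch_dtype"] not in ALLOWED_DTYPES:
        raise ValueError(f"torch_dtype must be one of {ALLOWED_DTYPES}")
    if not 0 <= cfg["strength"] <= 1:  # Ge(ge=0), Le(le=1)
        raise ValueError("strength must be between 0 and 1")
    if not (isinstance(cfg["aug_num"], int) and cfg["aug_num"] > 0):  # Gt(gt=0)
        raise ValueError("aug_num must be a positive integer")
    return cfg


cfg = check_image_diffusion_kwargs(strength=0.6, aug_num=2, torch_dtype="fp16")

# With data_juicer installed, the checked values could then be passed on:
# from data_juicer.ops.mapper.image_diffusion_mapper import ImageDiffusionMapper
# op = ImageDiffusionMapper(**cfg)
```

The instantiation is left commented out because it requires data_juicer and loads the diffusion model named in hf_diffusion.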
- process_batched(samples, rank=None, context=False)
Note
This is a batched_OP whose input and output types are both list. Suppose there are $N$ input sample lists, each with batch size $b$, and denote aug_num as $M$. The total number of samples after generation is $(1+M)Nb$.
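The $(1+M)Nb$ count can be illustrated with a toy stand-in that performs no image generation. In this sketch, `toy_process_batched` and the `image` field are hypothetical, each batch is modeled as a plain list of sample dicts, and every original sample is assumed to be kept alongside its $M$ generated variants.

```python
def toy_process_batched(samples, aug_num):
    """Toy stand-in for a batched op: a list of samples in, a list out."""
    out = []
    for sample in samples:
        out.append(sample)  # the original sample is kept
        for j in range(aug_num):
            # A real op would generate a new image here; the toy only
            # tags the file name so the variants are distinguishable.
            out.append({**sample, "image": f"{sample['image']}.aug{j}"})
    return out


# N = 3 input lists, each with b = 4 samples, and M = aug_num = 2
batches = [[{"image": f"img_{n}_{i}.jpg"} for i in range(4)] for n in range(3)]
total = sum(len(toy_process_batched(batch, aug_num=2)) for batch in batches)
print(total)  # (1 + M) * N * b = 36
```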
- Parameters:
samples
- Returns: