data_juicer.ops.filter.image_nsfw_filter module
- class data_juicer.ops.filter.image_nsfw_filter.ImageNSFWFilter(hf_nsfw_model: str = 'Falconsai/nsfw_image_detection', trust_remote_code: bool = False, max_score: float = 0.5, any_or_all: str = 'any', *args, **kwargs)
Bases: Filter
Filter to keep samples whose images have low NSFW scores.
- __init__(hf_nsfw_model: str = 'Falconsai/nsfw_image_detection', trust_remote_code: bool = False, max_score: float = 0.5, any_or_all: str = 'any', *args, **kwargs)
Initialization method.
- Parameters:
hf_nsfw_model – name of the NSFW detection model on Hugging Face.
max_score – the NSFW score threshold for samples, in the range 0 to 1. Samples with an NSFW score less than this threshold are kept.
any_or_all – whether to keep the sample using the 'any' or 'all' strategy across its images. 'any': keep the sample if any image meets the condition. 'all': keep the sample only if all images meet the condition.
args – extra positional arguments
kwargs – extra keyword arguments
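The interaction between max_score and any_or_all can be sketched in plain Python. This is a minimal illustration of the rule as documented above, not the library's implementation: keep_sample is a hypothetical helper, and keeping samples that contain no images is an assumption.

```python
from typing import Sequence


def keep_sample(nsfw_scores: Sequence[float],
                max_score: float = 0.5,
                any_or_all: str = "any") -> bool:
    """Hypothetical helper mirroring the documented keep rule."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")

    # Assumption: a sample with no images has nothing to reject, so keep it.
    if len(nsfw_scores) == 0:
        return True

    # An image meets the condition when its score is below the threshold.
    meets_condition = [score < max_score for score in nsfw_scores]

    if any_or_all == "any":
        return any(meets_condition)
    return all(meets_condition)


# One low-scoring and one high-scoring image in the same sample.
scores = [0.1, 0.9]
kept_with_any = keep_sample(scores, max_score=0.5, any_or_all="any")
kept_with_all = keep_sample(scores, max_score=0.5, any_or_all="all")
```

Under 'any', a single low-scoring image is enough to keep the whole sample, so 'all' is the stricter choice when every image must pass.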
- compute_stats_single(sample, rank=None, context=False)
Compute stats for the sample, which are used as metrics to decide whether to filter it out.
- Parameters:
sample – input sample.
context – whether to temporarily store context information of intermediate variables in the sample.
- Returns:
sample with computed stats
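The compute-then-return pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the field names (STATS_FIELD, SCORE_KEY, IMAGE_FIELD), the compute_stats_sketch function, and the injected score_image callable are all hypothetical, and skipping recomputation when stats already exist is an assumed behavior.

```python
from typing import Any, Callable, Dict

# Hypothetical field names, chosen only for this sketch.
STATS_FIELD = "stats"
SCORE_KEY = "image_nsfw_scores"
IMAGE_FIELD = "images"


def compute_stats_sketch(sample: Dict[str, Any],
                         score_image: Callable[[str], float]) -> Dict[str, Any]:
    """Attach one NSFW score per image to the sample and return it."""
    stats = sample.setdefault(STATS_FIELD, {})

    # Assumption: scores that were already computed are not recomputed.
    if SCORE_KEY in stats:
        return sample

    # score_image stands in for the model call; one score per image.
    stats[SCORE_KEY] = [score_image(path) for path in sample.get(IMAGE_FIELD, [])]
    return sample


# A stub scorer replaces the real model so the sketch runs offline.
fake_scores = {"cat.jpg": 0.02, "beach.jpg": 0.71}
sample = compute_stats_sketch({"images": ["cat.jpg", "beach.jpg"]},
                              fake_scores.__getitem__)
```

Keeping the scores on the sample separates the expensive model pass from the cheap threshold decision, so the threshold can be revisited without scoring the images again.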