data_juicer.ops.filter.image_nsfw_filter module

class data_juicer.ops.filter.image_nsfw_filter.ImageNSFWFilter(hf_nsfw_model: str = 'Falconsai/nsfw_image_detection', trust_remote_code: bool = False, max_score: float = 0.5, any_or_all: str = 'any', *args, **kwargs)

Bases: Filter

Filter to keep samples whose images have low NSFW scores.

__init__(hf_nsfw_model: str = 'Falconsai/nsfw_image_detection', trust_remote_code: bool = False, max_score: float = 0.5, any_or_all: str = 'any', *args, **kwargs)

Initialization method.

Parameters:

hf_nsfw_model – name of the NSFW detection model on Hugging Face.
trust_remote_code – whether to trust remote code when loading the model from Hugging Face.
max_score – the NSFW score threshold, in the range 0 to 1. An image meets the condition when its NSFW score is less than this threshold.
any_or_all – strategy for combining the results of all images in a sample. 'any': keep the sample if any image meets the condition. 'all': keep the sample only if all images meet the condition.
args – extra positional arguments.
kwargs – extra keyword arguments.
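
The interaction between max_score and any_or_all can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper name keep_sample is hypothetical, and keeping a sample that has no images is an assumed convention.

```python
from typing import Sequence


def keep_sample(nsfw_scores: Sequence[float],
                max_score: float = 0.5,
                any_or_all: str = "any") -> bool:
    """Hypothetical helper: decide whether a sample would be kept."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")
    # Assumption: a sample without images has nothing to reject, so keep it.
    if len(nsfw_scores) == 0:
        return True
    # An image meets the condition when its score is below the threshold.
    passes = [score < max_score for score in nsfw_scores]
    return any(passes) if any_or_all == "any" else all(passes)


scores = [0.1, 0.9]  # one low-scoring image, one high-scoring image
print(keep_sample(scores, any_or_all="any"))  # True
print(keep_sample(scores, any_or_all="all"))  # False
```

With 'any', a single low-scoring image is enough to keep a sample even if other images score high, so 'all' is the stricter choice when every image must pass.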

compute_stats_single(sample, rank=None, context=False)

Compute the stats for the sample, which are used as the metric to decide whether to filter it.

Parameters:

sample – input sample.
context – whether to store context information of intermediate vars in the sample temporarily.

Returns:

sample with computed stats
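
As a rough sketch of what this step might look like, the snippet below turns per-image classifier logits into NSFW probabilities with a softmax and stores them on the sample. Everything here is an assumption made for illustration: the function add_nsfw_stats, the "stats" field, the "image_nsfw_score" key, and the convention that class index 1 is the NSFW class. The hard-coded logits stand in for real model output.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one image's class logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def add_nsfw_stats(sample: Dict,
                   logits_per_image: Sequence[Sequence[float]],
                   nsfw_index: int = 1) -> Dict:
    """Hypothetical stats step: store one NSFW probability per image."""
    # Assumption: the NSFW class sits at position `nsfw_index` in the logits.
    scores = [softmax(logits)[nsfw_index] for logits in logits_per_image]
    # Assumption: stats live under a "stats" dict with this key name.
    sample.setdefault("stats", {})["image_nsfw_score"] = scores
    return sample


# Stand-in logits for two images; a real run would obtain them from the model.
sample = add_nsfw_stats({"images": ["a.jpg", "b.jpg"]},
                        [[2.0, 0.0], [0.0, 2.0]])
print([round(s, 3) for s in sample["stats"]["image_nsfw_score"]])
```

Keeping the score computation separate from the keep decision means the threshold can be changed later without scoring the images again.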

process_single(sample, rank=None)

Sample-level decision: maps a sample to a Boolean.

Parameters:

sample – the sample to decide whether to filter.

Returns:

True to keep the sample, False to filter it out.
