data_juicer.ops.mapper.mllm_mapper module#

class data_juicer.ops.mapper.mllm_mapper.MllmMapper(hf_model: str = 'llava-hf/llava-v1.6-vicuna-7b-hf', max_new_tokens=256, temperature=0.2, top_p=None, num_beams=1, *args, **kwargs)[source]#

Bases: Mapper

Mapper to use MLLMs for visual question answering tasks. This operator uses a Hugging Face model to generate answers based on input text and images. It supports models like llava-hf/llava-v1.6-vicuna-7b-hf and Qwen/Qwen2-VL-7B-Instruct. The operator processes each sample, loading and processing images, and generating responses using the specified model. The generated responses are appended to the sample's text field. The key parameters include the model ID, maximum new tokens, temperature, top-p sampling, and beam search size, which control the generation process.

__init__(hf_model: str = 'llava-hf/llava-v1.6-vicuna-7b-hf', max_new_tokens=256, temperature=0.2, top_p=None, num_beams=1, *args, **kwargs)[source]#

Initialization method.

Parameters:
  • hf_model – Hugging Face model ID.

  • max_new_tokens – the maximum number of new tokens generated by the model.
  • temperature – controls the randomness of the generated text. The higher the temperature, the more random and creative the generated text will be.

  • top_p – randomly select the next word from the smallest group of words whose cumulative probability reaches p.

  • num_beams – beam search size; larger beams generally yield higher-quality text at a higher computational cost.

  • args – extra positional args.

  • kwargs – extra keyword args.
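To clarify what the top_p parameter controls, here is a minimal sketch of nucleus (top-p) filtering over a toy next-token distribution. This is an illustration of the sampling technique, not code from this operator; the actual filtering is handled inside the Hugging Face generation pipeline.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize. `probs` maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


# With top_p=0.8, only 'cat' (0.5) and 'dog' (0.3) survive; the next
# token is then sampled from the renormalized pair.
probs = {'cat': 0.5, 'dog': 0.3, 'fox': 0.15, 'owl': 0.05}
filtered = top_p_filter(probs, 0.8)
```

Lower top_p values restrict sampling to the most probable tokens, while top_p=1.0 leaves the full distribution intact.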

process_single(sample=None, rank=None)[source]#

For sample-level processing: sample –> sample

Parameters:

sample โ€“ sample to process

Returns:

processed sample
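A minimal sketch of the expected input/output shape of process_single, using a stub in place of the model. The key names ('text', 'images') follow the class description above but are assumptions here; the real operator loads the Hugging Face checkpoint and appends each generated answer to the sample's text field.

```python
def mock_process_single(sample, generate=lambda text, image: f'[answer about {image}]'):
    """Stub mirroring the documented behavior: generate one response per
    image and append it to the sample's text field."""
    for image in sample['images']:
        sample['text'] += '\n' + generate(sample['text'], image)
    return sample


sample = {'text': 'What is in this picture?', 'images': ['cat.jpg']}
out = mock_process_single(sample)
# out['text'] now contains the original question followed by one
# (stubbed) answer per image.
```

In real use, the question goes in the sample's text field, the image paths in its image list, and the operator is typically invoked through a Data-Juicer processing pipeline rather than called directly.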