Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao, Min Zhang
The paper introduces Entity-centric Multimodal Preference Optimization (EMPO) to reduce hallucinations in Large Vision-Language Models by improving modality alignment and utilizing automatically constructed high-quality preference data.
Large Vision-Language Models (LVLMs) are powerful tools for a variety of tasks involving both images and text. However, they sometimes produce "hallucinations," outputs that are not grounded in the visual input, caused by misalignment between the visual and textual modalities. This paper presents a new method, Entity-centric Multimodal Preference Optimization (EMPO), that improves modality alignment to reduce such hallucinations. By training on automatically constructed high-quality preference data, EMPO significantly decreases the rate of hallucinations in LVLMs.