
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization

Source: arXiv

Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao, Min Zhang

cs.AI | Jun 4, 2025

One-line Summary

The paper introduces Entity-centric Multimodal Preference Optimization (EMPO) to reduce hallucinations in Large Vision-Language Models by improving modality alignment and utilizing automatically constructed high-quality preference data.

Plain-language Overview

Large Vision-Language Models (LVLMs) are powerful tools that can perform a variety of tasks involving both images and text. However, they sometimes produce 'hallucinations,' or outputs that contradict the visual input, caused by misalignment between the visual and textual modalities. This paper presents a new method called Entity-centric Multimodal Preference Optimization (EMPO) to better align these modalities and reduce hallucinations. By using automatically constructed high-quality preference data, EMPO significantly decreases the occurrence of hallucinations in LVLMs.
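To make the "preference optimization" idea concrete, below is a minimal sketch of a standard DPO-style preference loss on a single chosen/rejected response pair. This is not the paper's EMPO objective (which the summary does not spell out); it only illustrates the general family of preference losses such methods build on, and all names here are illustrative.

```python
import math

def preference_loss(logp_chosen, logp_rejected,
                    ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss for one preference pair (illustrative, not EMPO itself).

    The policy is rewarded for widening the log-probability margin of the
    preferred (chosen) response over the rejected one, measured relative
    to a frozen reference model's log-probabilities.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # Loss is -log(sigmoid(beta * margin)): small when the policy favors
    # the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that favors the chosen response (positive margin) incurs a
# lower loss than one that favors the rejected response (negative margin).
low = preference_loss(-2.0, -6.0, -3.0, -3.0)   # margin = +4
high = preference_loss(-6.0, -2.0, -3.0, -3.0)  # margin = -4
print(low < high)  # True
```

In a preference-based hallucination-mitigation setup, the "chosen" response would be one grounded in the image and the "rejected" one a hallucinated alternative, so minimizing such a loss pushes the model toward image-consistent outputs.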

Technical Details