
Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Source: arXiv

Mengqi He, Xinyu Tian, Xin Shen, Jinhong Ni, Shu Zou, Zhaoyuan Yang, Jing Zhang

cs.CV, cs.LG | Dec 26, 2025

One-line Summary

The study shows that attacking only the high-entropy tokens in a vision-language model's output degrades performance and exposes safety vulnerabilities as effectively as broader attacks, at a fraction of the cost.

Plain-language Overview

Vision-language models (VLMs) are good at understanding images and generating text about them, but they can still be tricked by adversarial attacks. This research found that only a small portion of the words in a generated sentence, roughly the 20% that are most uncertain or unpredictable, are key to changing the model's output. By concentrating on these critical words, attackers can disrupt the model just as effectively as attacks that target every word, but with far fewer resources. The attack also transfers across different types of VLMs, suggesting these models share a common weak spot.
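
To make the core idea concrete, here is a minimal sketch of entropy-guided token selection: compute the entropy of the model's next-token distribution at each position, keep the top ~20% highest-entropy positions, and restrict the adversarial loss to those positions. This is an illustration of the idea described above, not the authors' code; the function names, the 20% fraction as a default, and the masked cross-entropy objective are assumptions drawn from the summary.

```python
import torch
import torch.nn.functional as F


def entropy_mask(logits: torch.Tensor, frac: float = 0.2) -> torch.Tensor:
    """Mark the `frac` highest-entropy token positions.

    logits: (seq_len, vocab_size) next-token logits from a VLM decoder.
    Returns a boolean mask of shape (seq_len,).
    """
    probs = F.softmax(logits, dim=-1)
    # Shannon entropy per position; clamp avoids log(0).
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    k = max(1, int(frac * logits.size(0)))
    top = torch.topk(entropy, k).indices
    mask = torch.zeros(logits.size(0), dtype=torch.bool)
    mask[top] = True
    return mask


def masked_attack_loss(logits: torch.Tensor, targets: torch.Tensor,
                       mask: torch.Tensor) -> torch.Tensor:
    """Cross-entropy restricted to the selected high-entropy positions.

    Maximizing this loss w.r.t. an input-image perturbation (e.g. with a
    PGD-style loop) concentrates the attack budget on the few tokens
    that matter most, rather than spreading it over the whole sequence.
    """
    per_token = F.cross_entropy(logits, targets, reduction="none")
    return per_token[mask].mean()


# Toy demonstration with random logits standing in for a real VLM decoder.
seq_len, vocab = 32, 1000
logits = torch.randn(seq_len, vocab)
targets = torch.randint(0, vocab, (seq_len,))
mask = entropy_mask(logits, frac=0.2)
loss = masked_attack_loss(logits, targets, mask)
print(f"attacking {mask.sum().item()}/{seq_len} tokens, loss={loss.item():.3f}")
```

In a real attack, `logits` would come from the target VLM and the loss would be backpropagated to the input image; the sketch only shows the selection and masking step that makes the attack cheap.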

Technical Details