PaperPulse logo
FeedTopicsAI Researcher FeedBlogPodcastAccount

Stay Updated

Get the latest research delivered to your inbox

Platform

  • Home
  • About Us
  • Search Papers
  • Research Topics
  • Researcher Feed

Resources

  • Newsletter
  • Blog
  • Podcast
PaperPulse•

AI-powered research discovery platform

© 2024 PaperPulse. All rights reserved.

CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs

ArXivSource

Yuxuan Liu, Yuntian Shi, Kun Wang, Haoting Shen, Kun Yang

cs.AI
|
Feb 3, 2026
18 views

One-line Summary

CSR-Bench is a benchmark for evaluating the cross-modal safety and reliability of multimodal large language models, revealing systematic alignment gaps and trade-offs between over-rejection and safety.

Plain-language Overview

Researchers have developed a new benchmark called CSR-Bench to test how well multimodal large language models (MLLMs) handle both text and images together. These models often fail to understand joint intents across modalities, leading to issues like bias, hallucination, and safety failures. By testing 16 different MLLMs, the study found that these models struggle with aligning text and image inputs, often defaulting to text dominance and showing safety weaknesses. The study also highlights a trade-off where models that avoid rejecting inputs too much might compromise on safe and fair behavior.

Technical Details