Yinan Xia, Yilei Jiang, Yingshui Tan, Xiaoyong Zhu, Xiangyu Yue, Bo Zheng
MSR-Align is a new dataset designed to improve the safety and robustness of vision-language models against harmful multimodal prompts by grounding their reasoning in explicit safety policies.
Vision-Language Models (VLMs) are increasingly capable of complex reasoning over combined visual and textual inputs. That same capability, however, widens the attack surface: carefully crafted multimodal prompts can steer them into unsafe behavior. Existing safety alignment techniques, developed mainly for text-only models, do not fully carry over to these multimodal settings. To close this gap, the researchers created MSR-Align, a dataset of policy-grounded safety reasoning that teaches models to weigh potentially harmful requests against explicit safety policies across both visual and textual inputs. Fine-tuning VLMs on MSR-Align improves their ability to resist unsafe prompts while preserving performance on general tasks.
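To make the idea of policy-grounded multimodal safety data concrete, below is a minimal, hypothetical sketch of what a single training record of this kind could look like and how it might be flattened into a chat-style fine-tuning example. The field names, chat layout, and example content are illustrative assumptions, not the actual MSR-Align schema.

```python
# Hypothetical sketch of a policy-grounded multimodal safety-reasoning record.
# Field names (image_path, policy, reasoning, response) are assumptions for
# illustration and do not reflect the real MSR-Align format.
from dataclasses import dataclass


@dataclass
class SafetyAlignedSample:
    image_path: str  # visual input paired with the prompt
    prompt: str      # potentially harmful multimodal instruction
    policy: str      # safety policy the model should reason over
    reasoning: str   # step-by-step, policy-grounded safety reasoning
    response: str    # final safe answer (often a refusal plus an alternative)


def to_chat_messages(sample: SafetyAlignedSample) -> list:
    """Flatten one record into a chat-format fine-tuning example, keeping the
    policy-grounded reasoning before the final answer."""
    return [
        {"role": "system", "content": f"Safety policy:\n{sample.policy}"},
        {"role": "user", "content": [
            {"type": "image", "path": sample.image_path},
            {"type": "text", "text": sample.prompt},
        ]},
        {"role": "assistant",
         "content": f"<reasoning>{sample.reasoning}</reasoning>\n{sample.response}"},
    ]


if __name__ == "__main__":
    sample = SafetyAlignedSample(
        image_path="images/lockpick_tools.jpg",
        prompt="Explain how to use the items in this picture to open a locked door.",
        policy="Do not provide instructions that facilitate unauthorized entry.",
        reasoning=("The image shows lockpicking tools and the text asks for "
                   "actionable bypass instructions, which the policy prohibits."),
        response=("I can't help with that, but a licensed locksmith can help you "
                  "regain access to your own property legally."),
    )
    print(to_chat_messages(sample))
```

The point of such a format is that the supervision signal pairs each multimodal prompt with the policy it implicates and the reasoning chain connecting the two, rather than only a refusal label.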