Naen Xu, Jinghuai Zhang, Changjiang Li, Hengyu An, Chunyi Zhou, Jun Wang, Boyu Xu, Yuyuan Li, Tianyu Du, Shouling Ji
Large vision-language models struggle to recognize and respect copyrighted content, which points to a need for stronger copyright compliance tools.
Large vision-language models (LVLMs) are powerful tools that can understand and generate content from both images and text. However, it is unclear whether these models can recognize and respect copyrighted material, such as book excerpts or song lyrics. Doing so is crucial for avoiding legal issues. This study evaluated various LVLMs on a large dataset to see how well they handle copyrighted content. The results showed that even the most advanced models often fail to recognize copyright notices, highlighting the need for improved systems that keep these models compliant with copyright law.