Léo Labat, Etienne Ollion, François Yvon
This study examines how multilingual large language models (LLMs) respond to value-laden multiple-choice questions posed in different languages, revealing variable consistency across models and language-specific response patterns.
Researchers investigated whether multilingual AI models, such as chatbots, answer value-based questions consistently across languages. To test this, they built a dedicated dataset of survey questions translated into eight European languages. Larger, better-trained models were generally more consistent, but consistency varied by question: some questions elicited agreement among models while others did not, and certain questions triggered language-specific responses, suggesting that how models are fine-tuned shapes their answers.
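To make the notion of cross-lingual consistency concrete, here is a minimal sketch of one plausible scoring rule: the fraction of languages whose answer matches the modal answer for a given question. The data format, language set, and the `consistency_score` function are illustrative assumptions, not the authors' actual methodology.

```python
from collections import Counter

def consistency_score(answers_by_language: dict[str, str]) -> float:
    """Fraction of languages whose answer matches the most common one.

    `answers_by_language` maps a language code to the option the model
    picked for one survey question (hypothetical data format).
    """
    counts = Counter(answers_by_language.values())
    modal_count = counts.most_common(1)[0][1]
    return modal_count / len(answers_by_language)

# Hypothetical answers for one value-laden question in eight languages.
answers = {
    "en": "agree", "fr": "agree", "de": "agree", "es": "agree",
    "it": "agree", "pl": "disagree", "pt": "agree", "nl": "agree",
}
print(consistency_score(answers))  # 7 of 8 languages agree -> 0.875
```

A score of 1.0 would mean the model gives the same answer in every language; lower scores flag questions where translation alone changes the model's stated position.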