Hussein S. Al-Olimat, Ahmad Alshareef
ALPS is a diagnostic challenge set designed to test deep semantic and pragmatic understanding in Arabic. It reveals that current models remain limited in handling morpho-syntactic dependencies despite high fluency scores.
The ALPS challenge set evaluates how well AI models understand Arabic beyond surface-level fluency. Unlike benchmarks built from translated or synthetic data, ALPS was created by experts in Arabic linguistics to ensure cultural and linguistic authenticity. It consists of 531 questions that test deep understanding across a range of linguistic tasks. The study finds that although some top commercial AI models perform well, they still struggle with the intricacies of Arabic grammar and syntax and fall short of human performance.