Süha Kağan Köse, Mehmet Can Baytekin, Burak Aktaş, Bilge Kaan Görür, Evren Ayberk Munis, Deniz Yılmaz, Muhammed Yusuf Kartal, Çağrı Toraman
The study builds a Turkish-specific Retrieval-Augmented Generation (RAG) dataset and benchmarks several RAG methods on it, finding that more complex methods such as HyDE significantly improve accuracy over simpler baselines.
This research focuses on improving how AI systems generate factual information in Turkish, a language with complex word forms. The team created a new dataset from Turkish Wikipedia and CulturaX to test different methods for enhancing AI-generated answers. They found that advanced techniques can greatly increase accuracy, but also discovered that simpler, cost-effective methods can perform nearly as well. The study highlights the importance of adapting AI techniques to specific languages, especially those with rich morphological structures like Turkish.
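To make the HyDE idea mentioned above concrete, here is a minimal sketch, not the authors' implementation. HyDE (Hypothetical Document Embeddings) first asks a language model to write a hypothetical answer, then retrieves real passages that are close to that answer rather than to the raw question. Everything in the sketch is an assumption for illustration: `fake_llm` is a hypothetical stub standing in for a real model, `embed` is a toy character-trigram counter standing in for a real encoder, and the three-sentence corpus is invented.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy embedding (assumption): counts of character n-grams.

    A real system would use a neural sentence encoder instead.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(question, corpus, generate, top_k=1):
    """Retrieve passages by similarity to a generated hypothetical answer."""
    hypothetical = generate(question)      # step 1: draft a plausible answer
    query_vec = embed(hypothetical)        # step 2: embed the draft, not the question
    ranked = sorted(corpus,
                    key=lambda doc: cosine(query_vec, embed(doc)),
                    reverse=True)          # step 3: rank real passages
    return ranked[:top_k]


def fake_llm(question):
    """Hypothetical stub: a real pipeline would call a language model here."""
    return "Türkiye'nin başkenti Ankara'dır."


# Invented three-passage corpus, for illustration only.
corpus = [
    "Ankara, Türkiye'nin başkentidir.",
    "İstanbul Boğazı Karadeniz'i Marmara'ya bağlar.",
    "Van Gölü Türkiye'nin en büyük gölüdür.",
]

top = hyde_retrieve("Türkiye'nin başkenti neresidir?", corpus, fake_llm)
print(top[0])
```

The extra generation step is what makes HyDE more expensive than plain question-to-passage retrieval, which is the kind of cost trade-off the summary refers to. Character n-grams are used here only because they tolerate Turkish suffixation reasonably well in a toy setting; they are not a claim about the encoders evaluated in the study.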