Tiansheng Hu, Yilun Zhao, Canyu Zhang, Arman Cohan, Chen Zhao
The SAGE benchmark reveals that traditional BM25 outperforms LLM-based retrievers for scientific literature retrieval, and that augmenting documents with LLM-generated metadata and keywords narrows the gap.
Researchers are exploring how well systems based on large language models (LLMs) can retrieve scientific papers for answering complex questions. They created a benchmark called SAGE to test different retrieval systems and found that a traditional keyword-based method (BM25) was more effective than the newer LLM-based retrievers. However, enhancing documents with additional metadata and keywords generated by LLMs improved retrieval performance. This suggests that while LLMs have potential in this setting, they currently need further refinement to compete with traditional retrieval methods.
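To make the comparison concrete, below is a minimal sketch of Okapi BM25, the traditional keyword-based baseline the summary refers to. This is a generic textbook implementation, not the paper's code; the toy corpus, query, and the idea of appending LLM-generated keywords to a document are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25.

    `query` is a list of tokens; `corpus` is a list of token lists.
    Returns one score per document (higher = more relevant).
    """
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with length normalization.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            )
            score += idf * norm
        scores.append(score)
    return scores

# Toy example: the third document is the second one "augmented" with
# hypothetical LLM-generated keywords, which lets BM25 match "folding".
corpus = [
    "graph neural networks for molecules".split(),
    "protein structure prediction survey".split(),
    "protein structure prediction survey keywords: folding alphafold".split(),
]
query = "protein folding prediction".split()
print(bm25_scores(query, corpus))
```

On this toy data the augmented document outranks its unaugmented counterpart because the appended keywords let an exact-match scorer find "folding" at all, which is the intuition behind the document-augmentation result described above.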