Bogdan Zagribelnyy, Ivan Ilin, Maksim Kuznetsov, Nikita Bondarev, Roman Schutski, Thomas MacDougall, Rim Shayakhmetov, Zulfat Miftakhutdinov, Mikolaj Mizera, Vladimir Aladinskiy, Alex Aliper, Alex Zhavoronkov
This paper introduces a new benchmarking framework for retrosynthesis using large language models (LLMs), emphasizing chemical plausibility over exact matches and presenting a novel dataset, CREED, to improve LLM performance.
In drug discovery, large language models (LLMs) are increasingly used to plan chemical syntheses, but current evaluation methods are limited: they reward matching a single reference answer rather than recognizing the many chemically plausible routes to a target. This work proposes assessing retrosynthesis models by the chemical plausibility of their proposed solutions instead of exact-match accuracy against one 'correct' answer. The authors also introduce CREED, a dataset of millions of validated chemical reactions, and show that training on this data improves LLM performance in planning chemical syntheses.