Koyena Pal, David Bau, Chandan Singh
The study finds that explanations from large reasoning models (LRMs) often generalize across models: sharing one model's explanation makes different models' behavior more consistent, and that consistency tracks human judgments of explanation quality.
The researchers investigated whether the explanations LRMs generate, natural-language chains of thought, describe a problem in a model-general way rather than being specific to the model that produced them. They found that providing these explanations often makes different models behave more consistently, that this cross-model consistency correlates with how humans rank the quality of the explanations, and that it can be strengthened further with additional techniques. The study recommends using such explanations with care and provides a framework for evaluating how well they generalize.
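A minimal sketch of how cross-model consistency of an explanation might be measured: the same question is sent to several models with and without the explanation prepended, and agreement is computed over all model pairs. The function name `consistency_score`, the prompt format, and the stand-in model callables are illustrative assumptions, not the paper's actual evaluation framework.

```python
from typing import Callable, List, Optional


def consistency_score(
    models: List[Callable[[str], str]],
    question: str,
    explanation: Optional[str] = None,
) -> float:
    """Fraction of model pairs giving the same answer to `question`,
    optionally after being shown `explanation` first."""
    prompt = question if explanation is None else f"{explanation}\n\n{question}"
    answers = [model(prompt) for model in models]
    n = len(answers)
    # Count agreeing pairs among all distinct pairs of models.
    agree = sum(
        answers[i] == answers[j] for i in range(n) for j in range(i + 1, n)
    )
    pairs = n * (n - 1) // 2
    return agree / pairs if pairs else 1.0


if __name__ == "__main__":
    # Stand-in "models": trivial callables mapping a prompt to an answer.
    # In practice these would wrap calls to different LLM backends.
    model_a = lambda prompt: "A" if "explanation" in prompt.lower() else "B"
    model_b = lambda prompt: "A"

    question = "Which option is correct, A or B?"
    cot = "Explanation: option A satisfies both constraints, so A is correct."

    baseline = consistency_score([model_a, model_b], question)
    with_cot = consistency_score([model_a, model_b], question, explanation=cot)
    print(f"agreement without explanation: {baseline:.2f}")
    print(f"agreement with explanation:    {with_cot:.2f}")
```

In this toy setup, the explanation raises pairwise agreement from 0.0 to 1.0; the gap between the two scores is one simple way to quantify how much an explanation generalizes beyond the model that wrote it.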