Yunseok Han, Yejoon Lee, Jaeyoung Do
RFEval benchmarks reasoning faithfulness in large reasoning models. It finds that nearly half of model outputs are unfaithful, especially on math and code tasks, and that accuracy is not a reliable indicator of faithfulness.
Large reasoning models often generate explanations that seem logical but do not reflect the reasoning that actually produced their answers, which can erode trust. The study introduces a framework for evaluating the faithfulness of these models by checking whether their reasoning is consistent and causally linked to their final answers. Using this framework, the authors built a benchmark, RFEval, and found that almost half of model outputs are unfaithful, particularly in math and coding. The results show that accuracy alone does not imply faithful reasoning. Models need to deliver correct answers and also show sound reasoning processes.