Zachary Berger, Daniel Prakah-Asante, John Guttag, Collin M. Stultz
Current ECG benchmarking practices are flawed; benchmarks must be broadened to include wider clinical evaluations for reliable progress in the field.
This paper highlights problems with how machine learning models that interpret ECG data are currently evaluated. The standard benchmarks focus heavily on a narrow set of heart rhythm abnormalities, ignoring other important clinical information that ECGs can provide. The authors suggest expanding evaluations to cover more aspects of heart health and future patient outcomes. They also found that a simple model can perform as well as more complex ones on the standard benchmarks, further suggesting that current evaluation methods need rethinking.