Fernando de Meer Pardo, Branka Hadji Misheva, Martin Braschler, Kurt Stockinger
TransClean improves entity matching accuracy by detecting false positives using transitive consistency, achieving significant F1 score improvements in multi-source datasets.
TransClean is a new method for improving the accuracy of entity matching algorithms, which identify when different data records refer to the same real-world entity. It targets challenging real-world settings in which data comes from multiple sources, is noisy, and lacks labels. TransClean relies on transitive consistency between matches: if record A is matched to B and B is matched to C, then A and C should also refer to the same entity, so a violation of this property points to an incorrect match. Using this signal, TransClean detects and removes false positives without extensive manual labeling. This improves overall matching performance, as shown by significant F1 score gains across several test datasets.
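The transitive-consistency idea can be illustrated with a small Python sketch. This is not the authors' TransClean implementation. The function names, the `score` callback (a stand-in for any pairwise matcher), and the 0.5 threshold are hypothetical assumptions. The sketch clusters predicted matches with union-find, then queries the matcher on every pair that transitivity places in the same cluster but that was never matched directly. A low score on such a pair signals that some edge in the cluster is likely a false positive.

```python
from itertools import combinations
from typing import Callable, Iterable

Record = str  # hypothetical: records identified by string IDs


class UnionFind:
    """Disjoint-set structure that groups records into transitive clusters."""

    def __init__(self) -> None:
        self.parent: dict[Record, Record] = {}

    def find(self, x: Record) -> Record:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Record, b: Record) -> None:
        self.parent[self.find(a)] = self.find(b)


def transitive_inconsistencies(
    matches: Iterable[tuple[Record, Record]],
    score: Callable[[Record, Record], float],  # hypothetical pairwise matcher
    threshold: float = 0.5,  # hypothetical decision threshold
) -> list[tuple[Record, Record]]:
    """Return pairs that transitivity forces together but the matcher rejects."""
    matches = list(matches)
    direct = {frozenset(p) for p in matches}

    # Build clusters: every record reachable through predicted matches.
    uf = UnionFind()
    for a, b in matches:
        uf.union(a, b)
    clusters: dict[Record, list[Record]] = {}
    for r in list(uf.parent):
        clusters.setdefault(uf.find(r), []).append(r)

    # Check pairs implied only by transitivity against the matcher's score.
    flagged = []
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) in direct:
                continue
            if score(a, b) < threshold:
                flagged.append((a, b))
    return flagged


if __name__ == "__main__":
    # a1-b1 and b1-c1 were matched, so a1 and c1 must be the same entity,
    # but the (hypothetical) matcher scores that pair low.
    scores = {frozenset(("a1", "c1")): 0.1}
    matcher = lambda a, b: scores.get(frozenset((a, b)), 1.0)
    print(transitive_inconsistencies([("a1", "b1"), ("b1", "c1"), ("a2", "b2")], matcher))
    # → [('a1', 'c1')]
```

The sketch only surfaces inconsistencies; deciding which of the cluster's direct matches to remove once one is found is a separate design choice that it deliberately leaves open.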