Adrián Girón, Pablo Miralles, Javier Huertas-Tato, Sergio D'Antonio, David Camacho
xList-Hate is a new framework for hate speech detection that takes a checklist-based approach to improve interpretability and robustness across datasets and conditions.
Hate speech detection is often treated as a binary task of labeling content as hateful or not, which overlooks how complex the definition of hate speech is. The xList-Hate framework instead decomposes the decision into a checklist of explicit questions that a language model answers, producing a detailed profile of the content. A small, human-readable decision tree then combines these answers into a final verdict on whether the content is hateful. The result is a more transparent decision process that also holds up better across different datasets and conditions, which makes it a promising tool for content moderation.
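The two-stage flow described above (checklist answers first, a small tree on top) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the four checklist questions, the placeholder tokens `GROUP_X` and `SLUR_Y`, the keyword-based `keyword_answerer` that stands in for the language model, and the hand-written rules in `decide` that stand in for the decision tree. None of these are taken from the xList-Hate paper.

```python
from typing import Callable, Dict, List

# Hypothetical checklist questions; the framework's real checklist is not given here.
CHECKLIST: List[str] = [
    "targets_protected_group",
    "contains_dehumanizing_language",
    "incites_violence",
    "is_quoted_or_reported_speech",
]


def keyword_answerer(text: str) -> Dict[str, bool]:
    """Stand-in for the language model: answers each question with a keyword rule.

    GROUP_X and SLUR_Y are placeholder tokens, not real vocabulary.
    """
    lowered = text.lower()
    return {
        "targets_protected_group": "group_x" in lowered,
        "contains_dehumanizing_language": "slur_y" in lowered,
        "incites_violence": "attack" in lowered,
        "is_quoted_or_reported_speech": '"' in text,
    }


def decide(profile: Dict[str, bool]) -> bool:
    """Hand-written shallow tree over the checklist answers (assumed rules)."""
    if not profile["targets_protected_group"]:
        return False
    if profile["is_quoted_or_reported_speech"]:
        return False
    return profile["contains_dehumanizing_language"] or profile["incites_violence"]


def classify(
    text: str,
    answerer: Callable[[str], Dict[str, bool]] = keyword_answerer,
) -> Dict[str, object]:
    """Return both the checklist profile and the final decision."""
    profile = answerer(text)
    return {"profile": profile, "hateful": decide(profile)}


if __name__ == "__main__":
    print(classify("GROUP_X are SLUR_Y"))
    print(classify('He said "GROUP_X are SLUR_Y" and was fired'))
```

Because `classify` returns the full profile alongside the verdict, a moderator can see which checklist answers drove the decision. That is the interpretability property the framework aims for, shown here only in toy form.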