Adnan Al Ali, Jindřich Helcl, Jindřich Libovický
The study finds no systematic bias against non-native speakers of Czech in LLM-based detectors of generated text and shows that modern detectors do not rely on perplexity to identify generated text.
Researchers have been concerned that tools which use language models to detect AI-generated text might unfairly flag writing by non-native speakers as AI-generated. This study revisits these concerns for the Czech language. False flags were previously attributed to the lower perplexity of non-native writing, that is, text that a language model finds more predictable. The study finds that texts written by non-native speakers of Czech do not have lower perplexity than texts written by native speakers. It also finds no systematic bias against non-native speakers in current detection tools, which no longer depend on perplexity as a key signal.
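For readers unfamiliar with perplexity, the following is a minimal sketch of the standard definition: the exponential of the negative mean log-probability that a language model assigns to each token. It is not the authors' code, and the log-probability values are hypothetical numbers invented for illustration; in practice they would come from a language model scoring the text.

```python
import math

def perplexity(token_logprobs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probabilities (assumed values, not real data).
predictable_text = [math.log(0.5)] * 4   # model finds each token fairly likely
surprising_text = [math.log(0.1)] * 4    # model finds each token unlikely

print(perplexity(predictable_text))
print(perplexity(surprising_text))
```

Lower perplexity means the text is more predictable to the model. A detector that relied on perplexity alone would tend to flag the first text as generated, which is the mechanism earlier work blamed for false flags on non-native writing.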