Peter Balogh
Certain attention heads in transformer language models act as membership testers, flagging repeated tokens with high precision, much like Bloom filters.
Researchers have found that some attention heads in transformer-based language models function as membership testers: they determine whether a token has already appeared earlier in the given text. This behavior parallels a Bloom filter, a probabilistic data structure used in computing to test whether an element belongs to a set. Across the models examined, these heads identified repeated tokens with high precision, in certain cases even outperforming traditional Bloom filters. The finding suggests that such heads play a significant role in processing repeated tokens and contribute to the model's overall understanding of the text.
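To make the analogy concrete, here is a minimal, illustrative Python sketch of a Bloom filter used to flag repeated tokens in a context. It is not the paper's code; the `BloomFilter` class, its parameters, and the toy token sequence are hypothetical and only show the membership-testing behavior the attention heads are compared against.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash functions.

    Membership queries may return false positives but never false negatives,
    which is the property the attention heads are compared against.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly seen before"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


# Token-repetition check over a toy context (hypothetical example):
# each token is tested against the filter, then inserted into it.
bf = BloomFilter()
seen_flags = []
for token in ["the", "cat", "sat", "on", "the", "mat"]:
    seen_flags.append(bf.might_contain(token))
    bf.add(token)
print(seen_flags)  # [False, False, False, False, True, False]
```

In this sketch, only the second occurrence of "the" is flagged as previously seen, mirroring the repeated-token detection attributed to the studied attention heads; unlike an exact set, a Bloom filter trades a small false-positive rate for compact storage.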