Alexander Kozachinskiy, Tomasz Steifer, Przemysław Wałęga
This paper constructs a single-layer transformer with practical features that solves the PARITY problem, and proves a lower bound showing that a single-layer, single-head transformer cannot solve PARITY.
Transformers are a powerful neural network architecture, but their capabilities are still not fully understood. One specific problem, the PARITY problem, has been difficult to solve with simple transformer models, and previous solutions required complex or impractical setups. This research introduces a new, practical way to solve PARITY with a single-layer transformer, and also proves that a single-layer transformer with just one head cannot solve it.
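For readers unfamiliar with the task, PARITY asks whether a binary string contains an odd or an even number of 1s. The minimal Python sketch below states the target function only; it is not the paper's transformer construction. The function name `parity` and the convention of returning 1 for an odd count are assumptions made here for illustration.

```python
def parity(bits: str) -> int:
    """Return 1 if `bits` contains an odd number of '1' characters, else 0.

    Assumed convention: the input is a string over the alphabet {0, 1},
    and "odd" maps to 1. Other presentations use the opposite labelling.
    """
    if any(ch not in "01" for ch in bits):
        raise ValueError("expected a string over the alphabet {0, 1}")
    return bits.count("1") % 2


if __name__ == "__main__":
    print(parity("0110"))  # two 1s, so the count is even
    print(parity("0111"))  # three 1s, so the count is odd
```

The function is trivial to compute sequentially, yet flipping any single input bit flips the answer, so the output depends on every position of the input at once.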