Parity, Sensitivity, and Transformers
Alexander Kozachinskiy, Tomasz Steifer et al.
TLDR: This paper presents a new transformer construction that solves the PARITY problem using a single layer with practical features, and proves a lower bound: no single-layer, single-head transformer can solve PARITY.
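
For readers unfamiliar with the task: PARITY asks whether a binary string contains an odd number of ones. The sketch below is illustrative only and is not taken from the paper. The function names (`parity`, `sensitivity_at`, `max_sensitivity`) are hypothetical, and the notion of sensitivity used is the standard one for Boolean functions (the number of single-bit flips that change the output), which is assumed here to be the sense intended by the title.

```python
from itertools import product


def parity(bits):
    """Return 1 if `bits` contains an odd number of ones, else 0."""
    return sum(bits) % 2


def sensitivity_at(f, bits):
    """Count the positions where flipping a single bit changes f(bits)."""
    bits = list(bits)
    base = f(bits)
    count = 0
    for i in range(len(bits)):
        # Flip only bit i and compare against the original output.
        flipped = bits[:i] + [1 - bits[i]] + bits[i + 1:]
        if f(flipped) != base:
            count += 1
    return count


def max_sensitivity(f, n):
    """Maximum of sensitivity_at over all 2**n inputs of length n (brute force)."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))
```

Flipping any single bit changes the parity of the input, so `max_sensitivity(parity, n)` equals `n` for every `n` small enough to enumerate; this is what makes PARITY a maximally sensitive function.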