RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference
Siran Liu, Guoxia Wang et al.
TLDR: RRAttention is a dynamic block sparse attention mechanism that uses a per-head round-robin sampling strategy to cut the computational cost of long-context inference while maintaining model performance.
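
Below is a minimal sketch of one plausible reading of the idea: each head probes its query blocks at a round-robin-shifted offset, scores key blocks with cheap mean-pooled summaries, and then runs dense attention only over the top-scoring blocks. This is an illustrative interpretation, not the paper's algorithm; the `rr_block_sparse_attention` function, the `(head + block) % block_size` shift, the mean-pooled block summaries, and the top-k selection are all assumptions, and causal masking is omitted.

```python
import numpy as np

def rr_block_sparse_attention(q, k, v, block_size=64, topk=4):
    """Illustrative block-sparse attention with per-head round-robin probing.

    q, k, v: arrays of shape (heads, seq_len, head_dim); seq_len is assumed
    to be a multiple of block_size. Hypothetical sketch, not the paper's code.
    """
    H, T, D = q.shape
    nb = T // block_size
    scale = 1.0 / np.sqrt(D)
    out = np.zeros_like(q)

    # Cheap per-block key summaries via mean pooling (assumed heuristic).
    k_blocks = k.reshape(H, nb, block_size, D).mean(axis=2)        # (H, nb, D)

    for h in range(H):
        for qb in range(nb):
            # Round-robin shift: each head samples a different query offset
            # inside the block, rotating with the block index (assumed scheme).
            offset = (h + qb) % block_size
            q_probe = q[h, qb * block_size + offset]               # (D,)

            # Estimate key-block importance from the single probe query,
            # then keep the top-k blocks (dynamic sparsity pattern).
            block_scores = k_blocks[h] @ q_probe * scale           # (nb,)
            keep = np.argsort(block_scores)[-topk:]

            # Dense attention restricted to the selected key/value blocks.
            idx = np.concatenate(
                [np.arange(b * block_size, (b + 1) * block_size) for b in keep]
            )
            q_blk = q[h, qb * block_size:(qb + 1) * block_size]    # (block_size, D)
            scores = q_blk @ k[h, idx].T * scale
            scores -= scores.max(axis=-1, keepdims=True)
            w = np.exp(scores)
            w /= w.sum(axis=-1, keepdims=True)
            out[h, qb * block_size:(qb + 1) * block_size] = w @ v[h, idx]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, T, D = 4, 512, 32
    q, k, v = (rng.standard_normal((H, T, D), dtype=np.float32) for _ in range(3))
    print(rr_block_sparse_attention(q, k, v).shape)  # (4, 512, 32)
```

Because the probed offset rotates across heads and blocks, different heads sample complementary positions, which is one way a round-robin scheme could keep block-importance estimates cheap without fixing a static sparsity pattern.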