Jian Chen, Zhuoran Wang, Jiayu Qin, Ming Li, Meng Wang, Changyou Chen, Yin Chen, Qizhen Weng, Yirui Liu
KV-CoRE is a method for evaluating the data-dependent compressibility of KV caches in large language models, revealing patterns linked to model architecture and training data across multiple languages.
Large language models (LLMs) use KV caches to speed up text generation, but as context lengths grow, the memory needed to store these caches becomes a bottleneck. KV-CoRE is a new method that evaluates how well these caches can be compressed based on the specific data they handle, enabling memory savings with little loss in performance. Analyzing a range of models and datasets, the study finds that compressibility varies with model architecture and the language of the input. This research provides a framework for more efficient, data-aware cache management in LLMs.
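The abstract does not spell out KV-CoRE's metric. As a minimal sketch of what a data-dependent compressibility probe can look like, the snippet below measures how close to low-rank a single cached key/value matrix is via its singular-value spectrum; the function `spectral_compressibility`, the energy threshold, and the synthetic tensor are illustrative assumptions, not the paper's actual procedure.

```python
import torch

def spectral_compressibility(kv: torch.Tensor, energy: float = 0.99) -> float:
    """Fraction of singular values needed to retain `energy` of the spectral
    energy of a cached key/value matrix; lower means more compressible.
    (Illustrative proxy, not KV-CoRE's published metric.)"""
    s = torch.linalg.svdvals(kv.float())     # singular values, descending
    cum = torch.cumsum(s.square(), dim=0)
    cum = cum / cum[-1]                      # cumulative energy fraction
    # smallest rank whose cumulative energy reaches the threshold
    rank = int(torch.searchsorted(cum, torch.tensor(energy))) + 1
    return rank / min(kv.shape)

# Synthetic stand-in for one attention head's cached keys over a long context;
# with a real model, this matrix would come from the `past_key_values` returned
# by a transformers forward pass with use_cache=True.
seq_len, head_dim = 4096, 128
kv = torch.randn(seq_len, 16) @ torch.randn(16, head_dim)  # low-rank => compressible
print(f"effective-rank fraction at 99% energy: {spectral_compressibility(kv):.3f}")
```

Sweeping such a score per layer, per model, and per dataset (e.g., texts in different languages) would surface the kind of data- and architecture-dependent compressibility patterns the abstract describes.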