TurboQuant: Google's 100x KV Cache Breakthrough and What It Means for Long-Context AI
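Google's TurboQuant research reports roughly 100x lower KV cache memory overhead from a two-step algorithm. First, PolarQuant randomly rotates each cached vector and quantizes it in polar coordinates. Second, a Johnson-Lindenstrauss-based step compresses the leftover quantization error. If those savings hold in practice, 2M-token context models could become economically feasible for many more teams.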
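The rotate-then-compress idea can be illustrated with a minimal Python sketch. This is not Google's implementation, and it makes several assumptions. The rotation is a randomized Hadamard transform. A plain uniform scalar quantizer stands in for PolarQuant's polar-coordinate encoding. The second step keeps only the signs of Gaussian projections of the residual, in the style of a quantized Johnson-Lindenstrauss sketch, and uses them to estimate attention dot products. All function names are hypothetical.

```python
import math
import random


def fwht(x):
    """Orthonormal Walsh-Hadamard transform (length must be a power of two)."""
    x = list(x)
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def rotate(x, signs):
    """Assumed rotation: random sign flips, then the orthonormal Hadamard transform."""
    return fwht([s * v for s, v in zip(signs, x)])


def quantize(x, bits):
    """Uniform symmetric scalar quantizer (a stand-in, not PolarQuant)."""
    levels = 2 ** bits - 1
    max_abs = max(abs(v) for v in x) or 1.0
    step = 2 * max_abs / levels
    codes = [round((v + max_abs) / step) for v in x]
    return codes, max_abs, step


def dequantize(codes, max_abs, step):
    return [c * step - max_abs for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sign_sketch(r, projections):
    """1-bit JL-style sketch: one sign per Gaussian projection, plus ||r||."""
    bits = [1 if dot(p, r) >= 0 else -1 for p in projections]
    return bits, math.sqrt(dot(r, r))


def estimate_residual_dot(q, bits, r_norm, projections):
    """Unbiased estimate of <q, r>, using E[(s.q) sign(s.r)] = sqrt(2/pi) <q, r> / ||r||."""
    acc = sum(b * dot(p, q) for b, p in zip(bits, projections))
    return math.sqrt(math.pi / 2) * r_norm * acc / len(projections)


def compress_key(k, signs, bits, projections):
    k_rot = rotate(k, signs)
    codes, max_abs, step = quantize(k_rot, bits)
    residual = [a - b for a, b in zip(k_rot, dequantize(codes, max_abs, step))]
    sk_bits, r_norm = sign_sketch(residual, projections)
    return codes, max_abs, step, sk_bits, r_norm


def estimate_dot(q, compressed, signs, projections):
    """Approximate <q, k>; the rotation is orthogonal, so <q, k> = <Rq, Rk>."""
    codes, max_abs, step, sk_bits, r_norm = compressed
    q_rot = rotate(q, signs)  # the query stays at full precision
    coarse = dot(q_rot, dequantize(codes, max_abs, step))
    return coarse + estimate_residual_dot(q_rot, sk_bits, r_norm, projections)


rng = random.Random(0)
d, bits = 64, 4
signs = [rng.choice((-1, 1)) for _ in range(d)]
projections = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # m = d sign bits
q = [rng.gauss(0, 1) for _ in range(d)]
k = [rng.gauss(0, 1) for _ in range(d)]
approx = estimate_dot(q, compress_key(k, signs, bits, projections), signs, projections)
print(f"exact={dot(q, k):.3f} approx={approx:.3f}")
```

Only keys are compressed here. Each attention score is a dot product between a fresh full-precision query and a stored key. The rotation spreads outlier coordinates across the vector before quantization. The sign sketch then corrects the systematic error the coarse codes would otherwise leave in those dot products. With `m = d` projections, the sketch adds one bit per coordinate on top of the 4-bit codes. These bit budgets are illustrative and are not the settings from the research.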