1. Google's TurboQuant AI Compression Slashes LLM Memory Use 6x, Easing GPU Crunch
Google Research has unveiled TurboQuant, a new compression algorithm that directly targets one of generative AI's most critical bottlenecks: memory. The technique promises to reduce the memory footprint of large language models (LLMs) by up to six times while simultaneously boosting inference speed and maintaining model quality.
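To put the "up to six times" figure in perspective, the sketch below shows generic low-bit weight quantization arithmetic. It is not TurboQuant's actual algorithm (the article does not describe its internals), and the helper names are hypothetical; it simply illustrates how storing 16-bit weights in roughly 2-3 bits each, plus small per-group scale factors, yields a compression ratio in that ballpark.

```python
# Illustrative sketch only: generic per-group low-bit quantization, NOT
# Google's TurboQuant method. It shows why ~5-6x memory reduction roughly
# corresponds to replacing 16-bit weights with ~2-3 bit integers plus a
# small amount of per-group scale overhead.

import numpy as np

def quantize_per_group(weights, bits=3, group_size=128):
    """Symmetric per-group quantization of a 1-D float array (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 3 for signed 3-bit
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                     # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q, scales):
    """Reconstruct approximate float weights from quantized values and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024 * 1024).astype(np.float16)   # stand-in weight matrix

q, s = quantize_per_group(w.astype(np.float32), bits=3, group_size=128)

fp16_bytes = w.size * 2                                    # original: 2 bytes per weight
packed_bytes = w.size * 3 / 8 + s.size * 2                 # 3 bits per weight + fp16 scales
print(f"compression ratio: {fp16_bytes / packed_bytes:.1f}x")   # ~5x with this config
print(f"mean abs error: {np.abs(dequantize(q, s) - w).mean():.4f}")
```

With 3-bit weights and 128-element groups the ratio lands near 5x; finer-grained bit allocation or lower effective bit widths are what push a scheme toward the 6x figure the article cites.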