Critical 'Bleeding Llama' Vulnerability in Ollama Exposes AI Servers to Remote Memory and Data Leaks
Security researchers have disclosed a critical out-of-bounds read vulnerability in Ollama, the widely deployed open-source AI inference engine, that enables remote attackers to extract sensitive data, including process memory contents, API keys, conversation prompts, and user information, from exposed servers. The flaw, tracked as CVE-2024-XXXX and dubbed "Bleeding Llama," marks a significant escalation in the threats facing AI infrastructure, according to findings published by cloud security firm Wiz Research.
The vulnerability allows remote attackers to trigger the memory leak by sending malformed requests to Ollama's API endpoints, even on default configurations; exploitation requires neither authentication nor user interaction, the researchers noted. The research team demonstrated that exposed Ollama instances, particularly those with ports reachable from the internet, could be scanned and compromised within minutes using automated tooling. In addition to the memory-leak vector, the researchers identified separate vulnerabilities in Ollama's Windows implementation that could enable persistent code execution on compromised endpoints.
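For defenders assessing their own exposure, a minimal check (not an exploit reproduction) can confirm whether an Ollama instance answers API requests from an untrusted network position. The sketch below is illustrative only: it assumes the default port 11434 and uses the documented /api/version and /api/tags endpoints; the target host is a placeholder supplied by the operator.

```python
# exposure_check.py -- minimal sketch: does an Ollama instance answer
# unauthenticated API calls from this network position?
# Assumptions: default port 11434; target host is one you own or are
# authorized to test.
import json
import sys
import urllib.error
import urllib.request

def check_exposure(host: str, port: int = 11434, timeout: float = 5.0) -> None:
    base = f"http://{host}:{port}"
    for path in ("/api/version", "/api/tags"):
        url = base + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = json.loads(resp.read().decode("utf-8", errors="replace"))
                print(f"[!] {url} answered without authentication: {body}")
        except (urllib.error.URLError, OSError, ValueError) as exc:
            print(f"[ ] {url} not reachable or not parseable: {exc}")

if __name__ == "__main__":
    # Usage: python exposure_check.py <host-you-own>
    check_exposure(sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1")
```

A host that answers these calls from outside a trusted network matches the exposure profile the researchers describe and should be firewalled or placed behind an authenticating proxy.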
The disclosure underscores accelerating security risks in AI deployment pipelines, where inference servers often process sensitive data and store authentication credentials. Ollama has released patches addressing the vulnerabilities, and organizations running self-hosted AI servers are urged to restrict network exposure, implement authentication where available, and apply updates immediately. The researchers emphasized that any Ollama instance accessible from the internet should be considered potentially compromised pending investigation. The findings add to a growing list of critical vulnerabilities in AI tooling, highlighting the gap between rapid AI adoption and corresponding security hardening in production environments.
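As a complement to patching, operators can verify that a local instance reports a version at or above the fixed release. The sketch below is a hypothetical helper: the minimum patched version is a placeholder that must be filled in from Ollama's release notes, and only the documented /api/version endpoint is assumed.

```python
# version_gate.py -- hedged sketch: compare a local Ollama's reported version
# against an operator-supplied minimum patched version.
# Assumption: MIN_PATCHED is a placeholder; take the real value from
# Ollama's release notes for the "Bleeding Llama" fix.
import json
import urllib.request

MIN_PATCHED = (0, 0, 0)  # placeholder -- replace with the patched release

def parse_version(text: str) -> tuple:
    # "0.1.34" -> (0, 1, 34); non-numeric suffixes are dropped
    parts = []
    for piece in text.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def is_patched(base_url: str = "http://127.0.0.1:11434") -> bool:
    with urllib.request.urlopen(base_url + "/api/version", timeout=5) as resp:
        reported = json.loads(resp.read())["version"]
    print(f"Ollama reports version {reported}")
    return parse_version(reported) >= MIN_PATCHED

if __name__ == "__main__":
    print("patched" if is_patched() else "update required")
```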