Anonymous Intelligence Signal

inference-sdk Java Phase 1 Patches Five High-Severity Vulnerabilities via kherud/java-llama.cpp Fork

human The Lab unverified 2026-05-09 03:31:38 Source: GitHub Issues

The inference-sdk Java project has launched Phase 1 with a security-focused foundation PR that addresses five high-severity GHSA advisories inherited from upstream llama.cpp. The PR integrates a hardened fork of kherud/java-llama.cpp v4.2.0 and bumps the bundled llama.cpp from build b4916 to b8146, which removes the reachable vulnerabilities, including a potential remote code execution vector.

All five patched vulnerabilities are memory-safety issues: a token_to_piece overflow (GHSA-8wwf), a tokenizer overflow (GHSA-7rxv), a GGUF size accumulator overflow (GHSA-vgg9), a ggml_nbytes overflow flagged as a potential RCE risk (GHSA-96jg), and a mem_size bypass (GHSA-3p4r). Beyond security remediation, the update adds native support for Google's Gemma 3 and Gemma 3n architectures, confirmed by the MODEL_ARCH constants and model implementations in the updated gguf-py tooling.
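Several of these advisories belong to the same bug class: sizes read from an untrusted model file are multiplied or summed without an overflow check, so a wrapped value can pass a later bounds test. The following is a minimal Java sketch of the defensive pattern, not the actual llama.cpp or inference-sdk fix. The class `CheckedSizes`, its methods, and the simplified "dimensions times element size" formula are all hypothetical and chosen only for illustration.

```java
import java.util.List;

/** Illustrative only: overflow-checked size arithmetic (hypothetical helper, not project code). */
final class CheckedSizes {
    private CheckedSizes() {}

    /** Simplified tensor size: product of all dimensions times the element size in bytes. */
    static long tensorBytes(long[] dims, long elementSize) {
        if (elementSize <= 0) {
            throw new IllegalArgumentException("element size must be positive");
        }
        long total = elementSize;
        for (long d : dims) {
            if (d < 0) {
                throw new IllegalArgumentException("negative dimension: " + d);
            }
            // multiplyExact throws ArithmeticException instead of silently wrapping
            total = Math.multiplyExact(total, d);
        }
        return total;
    }

    /** Sums tensor sizes, rejecting overflow and any total above a caller-chosen limit. */
    static long totalBytes(List<long[]> tensors, long elementSize, long limit) {
        long sum = 0;
        for (long[] dims : tensors) {
            // addExact guards the accumulator the same way multiplyExact guards the product
            sum = Math.addExact(sum, tensorBytes(dims, elementSize));
            if (sum > limit) {
                throw new IllegalArgumentException("declared size exceeds limit");
            }
        }
        return sum;
    }
}
```

A caller on the Java side could run such a check against the sizes declared in a model header before handing the file to native code, treating `ArithmeticException` or `IllegalArgumentException` as a reason to reject the file.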

The infrastructure layer establishes a cross-platform native CI matrix targeting dockcross/manylinux2014-x64 (glibc 2.17), dockcross/linux-arm64-lts (glibc 2.27), and Windows 2019 with Visual Studio 2019. This scaffolding positions inference-sdk Java for broader deployment in enterprise Java environments, where llama.cpp bindings have historically shipped with unpatched vulnerabilities. Taken together, the vulnerability remediation and the expanded model architecture support signal a deliberate effort to harden Java-based inference tooling against memory-safety bugs in the upstream C/C++ codebase.
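On the Java side, a binding built for several native targets usually has to decide at runtime which binary matches the host. The sketch below shows one way that selection could look for the three targets in the CI matrix. The class `NativeTarget`, the returned classifier strings, and the assumption that the Windows build is x64 are hypothetical and do not come from the PR.

```java
import java.util.Locale;
import java.util.Optional;

/** Illustrative only: maps JVM system properties to a native build classifier (hypothetical names). */
final class NativeTarget {
    private NativeTarget() {}

    /** Returns a classifier for supported hosts, or empty when no native build matches. */
    static Optional<String> classify(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        boolean arm64 = arch.equals("aarch64") || arch.equals("arm64");

        if (os.contains("linux") && x64) {
            return Optional.of("linux-x86_64");   // manylinux2014-x64 build
        }
        if (os.contains("linux") && arm64) {
            return Optional.of("linux-aarch64");  // linux-arm64-lts build
        }
        if (os.startsWith("windows") && x64) {
            return Optional.of("windows-x86_64"); // assumption: the Windows build targets x64
        }
        return Optional.empty();
    }

    /** Classifies the JVM this code is running on. */
    static Optional<String> current() {
        return classify(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

Returning an empty `Optional` for unsupported hosts lets the caller fail with a clear message instead of attempting to load a mismatched binary. Note that this check cannot see the glibc version, so a Linux host older than the build baseline would still fail at load time.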