1. Cactus Releases 26M Needle Model: Distilled Gemini Tool Calling for Budget Devices
Cactus has open-sourced Needle, a 26-million-parameter function-calling model distilled from the tool-calling behavior of Google's Gemini, targeting a gap in mobile and wearable AI deployment. The model reaches 6,000 tokens per second on prefill and 1,200 tokens per second on decode on consumer hardware, performan...