AI Betting Blind Spot: Major Models Lose Money on Premier League Predictions, xAI's Grok Worst Performer
A stark new benchmark reveals a critical weakness in today's most advanced AI models: they are terrible at making money by predicting real-world events over time. In a simulated betting exercise across an entire Premier League season, AI models from Google, OpenAI, and Anthropic all ended up with negative returns. The study, conducted by London-based AI startup General Reasoning, exposes a significant gap between AI's prowess in narrow tasks like coding and its failure to navigate the complex, probabilistic dynamics of sports outcomes.
The "KellyBench" report tested eight leading AI systems, including xAI's Grok, which performed the worst. Researchers provided each model with extensive historical data and statistics from the 2023–24 Premier League season, instructing them to build predictive models to maximize financial returns while managing risk. Despite this rich information diet, the AIs consistently failed to translate data into profitable betting strategies. This wasn't a simple quiz; it was a prolonged simulation of real-world decision-making under uncertainty, and the models fell short.
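The benchmark's name presumably nods to the Kelly criterion, the classic formula for sizing bets to maximize long-run bankroll growth when you trust your own probability estimates. As a rough illustration of the kind of decision the models faced each match, here is a minimal sketch of Kelly staking; the probabilities and odds are invented for the example, and this is not code from the report itself:

```python
def kelly_fraction(p, decimal_odds):
    """Fraction of bankroll to stake on a bet won with probability p
    at the given decimal odds (total payout per unit staked).

    Kelly criterion: f* = (b*p - q) / b, where b = decimal_odds - 1
    is the net profit per unit and q = 1 - p is the loss probability.
    A non-positive result means the bet carries no edge: stake nothing.
    """
    b = decimal_odds - 1.0
    q = 1.0 - p
    f = (b * p - q) / b
    return max(f, 0.0)

# Hypothetical example: a model believes the home side wins 55% of
# the time, while the bookmaker offers decimal odds of 2.10.
print(round(kelly_fraction(0.55, 2.10), 4))  # → 0.1409

# If the model's estimate offers no edge, Kelly says to sit out.
print(kelly_fraction(0.40, 2.00))  # → 0.0
```

The catch, and plausibly where the models lost money, is that Kelly staking amplifies errors in the win-probability estimate: overconfident probabilities lead to oversized stakes and steady losses even when the formula is applied correctly.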
The findings signal a fundamental challenge for AI deployment in finance, logistics, and strategic planning—any domain requiring long-term, probabilistic reasoning about messy real-world systems. While AI excels in bounded environments, its inability to reliably analyze and bet on soccer matches suggests current systems lack a deeper, causal understanding of how events unfold. For companies betting on AI for forecasting and risk assessment, this benchmark serves as a concrete warning: the most hyped models may still be poor substitutes for human judgment in complex, dynamic scenarios.