1. N-Day-Bench: Frontier LLMs Face Live Test Against Real GitHub Vulnerabilities
A new benchmark tests whether frontier large language models can find real, known security vulnerabilities in live, high-profile codebases before the patch is applied. N-Day-Bench addresses the critical flaw in static AI security tests, namely data contamination and memorization, by constructing a fresh,...