New AI benchmarks are emerging to measure useful work and agentic AI: GDPval, GAIA2, and SWE-Bench Pro. AI models are getting faster, cheaper, better, and more agentic by scaling RL post-training.
Better AI Models, New AI Evals, and Scaling…