The nonprofit Center for AI Safety and Scale AI have released a challenging new benchmark for frontier AI systems.
ByteDance released Doubao-1.5-pro, an upgrade to its flagship AI model, which it claims outperforms OpenAI's o1 on the AIME math benchmark.
A new set of much more challenging evals has emerged in response, created by companies, nonprofits, and governments. Yet even on the most advanced evals, AI systems are making astonishing progress.