News
So far, the new test, called ARC-AGI-2, has stumped most models. “Reasoning” AI models like OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the Arc ...