Anthropic's new flagship model, Claude Opus 4.7, beat every benchmark we threw at it and eats tokens like a hungry teenager.
Progress in testing often comes from trying new ideas, even when they fail. Learn more on this podcast with Bart Knaack.