Claude 3.7 Sonnet Unleashed
Anthropic releases a powerful Claude 3.7 Sonnet that’s great for coding, adds extended thinking, and releases Claude Code agent to further support AI coding.

Anthropic Releases Claude 3.7 Sonnet
Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model on the market. - Anthropic
Anthropic released their latest and greatest model, Claude 3.7 Sonnet, a SOTA frontier AI model that boasts significant improvements over its predecessor and competitors. Claude 3.7 Sonnet excels particularly in coding, reasoning, and agentic tool use.
Key features, which we will cover further below:
Claude 3.7 Sonnet is a hybrid reasoning model that can answer queries either as a base model or using Extended Thinking for AI reasoning.
Claude 3.7 Sonnet excels in coding, achieving over 70% in SWE-bench.
Anthropic released a new command-line interface called Claude Code, Anthropic’s first agentic coding tool, in a limited research preview.
Benchmarks and The Base Model
Claude 3.7 Sonnet’s strongest area of performance is coding. Since its release last year, Claude 3.5 Sonnet has been the favorite AI model of many coders, used both for standalone code generation and in Cursor or other AI coding environments. Recent AI model releases such as o3-mini called that into question, but now Claude 3.7 Sonnet reclaimed that “Best Coding AI Model” title.
Claude 3.7 Sonnet shows a significant improvement over previous versions and other models on various benchmarks, including a 20% increase on SWE-bench, obtaining 70.3% score on SWE-bench with custom scaffolding.

The ‘vibe check’ of initial reviews have been very positive, earning Claude 3.7 hype video titles like “CLINE + Claude Sonnet 3.7 Is Completely INSANE” and Claude 3.7 just dropped and it's insane (best code model ever), while All About AI tested Claude 3.7 on various coding, creative writing, and puzzle-solving tasks and found it impressive, expressing excitement about the new model.
The Claude 3.7 Sonnet has already been integrated in AI coding tools such as Cursor, representing a big step-up in capabilities. They also have improved the Claude.AI coding experience:
Our GitHub integration is now available on all Claude plans—enabling developers to connect their code repositories directly to Claude.
Another area where Claude 3.7 Sonnet excels is agentic tool use, outperforming others in real-world tasks involving API interaction and performing highly on TAU-bench, which tracks agentic use cases.
Claude 3.7 Sonnet is a solid AI model all around. Anthropic said they focused on making Claude 3.7 better on real-world coding than benchmarks, with some benchmarks only modestly improving over Claude 3.5, while other results are stellar.
Extended Thinking
Every competitive frontier AI model in 2025 will have some form of AI reasoning and Claude 3.7 is no exception. Claude’s extended thinking provides chain-of-thought reasoning for improved accuracy and performance, using scaling of test-time compute.
The Claude extended thinking process is not hidden from users, but visible in raw form. Anthropic explained the benefits of this: Trust, alignment, and interest.
In the Claude.ai interface, there is a switch to turn on extended thinking. When using the API, users or developers can budget the amount of thinking used in a query by specifying a token limit.
How well does extended thinking perform? As with other evaluations, the more thinking tokens that are used, the more it improves results. In the GPQA benchmark, Claude 3.7 Sonnet obtained 68% without extended thinking, and improved it to a better-than-human 84% with extended thinking.

Claude’s extended thinking and agent training are a powerful combination in virtual tasks, which is shown by performing better on evaluations like OSWorld. They also give it a major boost in any interesting area: Video gameplay. They got Claude 3.7 Sonnet to play Pokémon, and it’s actually quite good.
Claude Code
Claude Code is an agentic coding tool that lives in your terminal and integrates directly with your development environment. The Claude Code environment allows you to efficiently generate and modify code through the command-line. As an interface, Claude Code reminds me of Aider (which is a great AI coding assistant):
Claude Code is an active collaborator that can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command line tools—keeping you in the loop at every step.
Claude Coder was released to a limited beta preview, and there is a waitlist to get on. It works on macOS, Ubuntu/Linux, or Windows WSL operating systems, so pretty much any developer could adopt it.
Claude AI models are currently used in most AI coding environments, such as Replit, Cursor, and Vercel; it’s an interesting decision for them to build their own interface and compete with AI coding assistants directly.
Pricing and Access
Access: Claude 3.7 Sonnet is now available on all Claude plans, including Free tier, but you need the Pro tier or other paid plans to access their Extended Thinking mode.
Pricing: Claude 3.7 via the API is not cheap, with pricing for Claude 3.7 at $3 / million token input and $15 / million token output. This is the same as Claude 3.5, and considerable higher than alternatives such as Gemini 2.0 Pro or even o3-mini.

Limitations: One feature Claude 3.7 currently lacks is access to live information from the web. It cannot compete on use cases requiring web access or knowledge of current events. Sonnet has a knowledge cut-off of October 2024.
Safety: Anthropic published a detailed system card for Claude 3.7 Sonnet, covering their safety evaluation results in several categories, showing progress in alignment, utility and safety. For example:
Claude 3.7 Sonnet makes more nuanced distinctions between harmful and benign requests, reducing unnecessary refusals by 45% compared to its predecessor.
Conclusion - Gen 3 Models
Overall, Claude 3.7 Sonnet is an excellent frontier AI model, state-of-the-art for reasoning, coding, and AI agent applications. Anthropic is likely focused on coding applications because that’s their bread-and-butter; their work on the Anthropic Economic Index shows that 37% of Claude conversations are computer programming-related.
With Grok-3 and Claude 3.7 Sonnet released, what are the next shoes to drop? Rumors abound that GPT-4.5 will be released this week, and Llama 4 will release soon.
Why are these major AI releases happening now? Multiple major AI labs spent 2024 building new AI compute capacity on the latest Nvidia GPUs and training AI models at a scale that was impossible before. Their efforts are now bearing fruit.
Ethan Mollick is calling these newly released AI models “Gen 3 models”, suggesting pre-training scaling beyond 10^26 FLOPs is yielding the next generation of base AI models beyond GPT-4. While we don’t know the dataset size or exact FLOPS used to train Grok-3 or Claude 3.7, they confirm one thing: Scaling still matters.