Claude Capybara vs Opus: What Actually Changed
Claude Capybara is not Opus with better scores. It is a new tier entirely — Anthropic created the Capybara tier specifically because the gap between this model and Opus 4.6 was too large for a version number. Leaked internal documents describe “dramatically higher scores” in coding, reasoning, and cybersecurity, plus capabilities that Opus does not have at all. We cover this further in our GPT-5 comparison article.

This comparison breaks down every dimension where Capybara and Opus differ: benchmarks, capabilities, pricing, availability, and which model you should actually use.
The Fundamental Difference: Tier vs Version
Understanding why Anthropic created a new tier instead of releasing “Opus 5” matters for interpreting every comparison that follows.
Why Not Just Opus 5?
When Anthropic improved from Opus 4 to Opus 4.5, and from 4.5 to 4.6, each version brought incremental improvements within the same capability class. The pricing stayed the same. The model felt smarter but fundamentally similar.
Capybara breaks this pattern. The leaked documents describe the improvement as a “step change” — not incremental but qualitative. The model reportedly creates “deep connective tissue between ideas and knowledge” that represents an architectural innovation, not just scaling. Anthropic decided that calling this Opus 5 would understate the difference.
What “Step Change” Means Practically
A version bump (Opus 4.5 → 4.6) means the same tasks get done slightly better. A tier change (Opus → Capybara) means new tasks become possible. Capybara reportedly handles problems that Opus cannot solve at all — not problems it solves poorly, but problems where Opus fails entirely.
The most concrete example is proactive vulnerability discovery. Opus can scan code for known vulnerability patterns. Capybara reportedly discovers unknown vulnerabilities — zero-days that have never been documented. This is not a better version of what Opus does; it is a different capability.
Benchmark Comparison
No official Capybara benchmarks have been published. But the leaked documents provide enough information to establish the relative positioning.
What We Know About Opus 4.6
| Benchmark | Opus 4.6 Score | Status |
|---|---|---|
| SWE-Bench Verified | 80.8% | Leads most public models |
| GPQA Diamond | 91.31% | Leads all public models |
| Terminal-Bench 2.0 | 65.4% | Strong performance |
| ARC-AGI-2 | Not published | — |
Opus 4.6 is already a frontier model. It leads on academic reasoning (GPQA Diamond) and performs strongly on coding benchmarks. It is not a model with obvious weaknesses — making Capybara’s claimed improvements all the more significant.
What Leaked Documents Say About Capybara
The leaked drafts use specific language for different capability areas. “Dramatically higher scores” in software coding — the strongest improvement claim. “Significantly improved” in academic reasoning. “Far ahead of any other AI model” in cybersecurity — not just ahead of Opus, but ahead of every model from every lab.
If “dramatically higher” in coding means even a 10-point improvement over Opus’s 80.8% SWE-Bench score, Capybara would be in the 90%+ range — territory no public model has reached.
Capability-by-Capability Comparison
Coding
Opus 4.6 handles multi-file editing, architectural reasoning, debugging, and code generation across major languages. It powers Claude Code effectively and handles most development workflows.
Capybara reportedly delivers a “step change” in coding. Specific improvements include better handling of large codebases, fewer incorrect suggestions, more reliable autonomous code generation, and multi-language support extending to systems languages and domain-specific frameworks. The gap is described as qualitative — not just faster or more accurate, but able to handle complexity levels where Opus fails.
Reasoning
Opus 4.6 leads on GPQA Diamond (91.31%), handles complex logical chains, mathematical proofs, and scientific analysis effectively.
Capybara is “significantly improved” in academic reasoning. More importantly, it adds cross-domain reasoning — synthesizing insights across different fields. Where Opus reasons well within a single domain, Capybara reportedly connects ideas from biology to software architecture, from economics to system design.
Cybersecurity
Opus 4.6 can analyze code for known vulnerability patterns, assist with security audits, and help with threat modeling when prompted.
Capybara is “far ahead of any other AI model in cyber capabilities.” This is Anthropic’s strongest single claim. Capybara proactively discovers unknown vulnerabilities, identifies zero-days at scale, and performs attack surface analysis. This capability does not exist in Opus — it is entirely new.
Agent Workflows
Opus 4.6 supports multi-step autonomous tasks through Claude Code and the API. It handles tool use, code execution, and chained operations, but sometimes loses context or makes poor decisions about when to proceed versus ask for input.
Capybara has “greater consistency in autonomous multi-step task execution” with “fewer failures in long chains” and “better judgment about when to pause for human input.” These improvements address specific, documented pain points in Opus agent behavior.
Pricing Comparison
Current Opus Pricing
| Input | Output | |
|---|---|---|
| Standard | $5/MTok | $25/MTok |
| Batch | $2.50/MTok | $12.50/MTok |
| Fast Mode | $30/MTok | $150/MTok |
Expected Capybara Pricing
No official pricing. Community estimates range from 2x to 5x Opus — meaning $10-25/MTok input, $50-125/MTok output. Leaked documents confirm the model is “expensive to run.”
Cost Per Task vs Cost Per Token
The pricing gap narrows when you consider effective cost. If Capybara completes a complex task in one attempt that takes Opus three attempts, the per-task cost may be comparable despite higher per-token rates. For simple tasks where Opus succeeds easily, Capybara is pure overspend.
Availability Comparison
| Opus 4.6 | Capybara | |
|---|---|---|
| API access | Available now | Restricted early access |
| Claude.com | Pro and Team plans | Not available |
| AWS Bedrock | Available | Not available |
| Google Vertex AI | Available | Not available |
| Microsoft Foundry | Available | Not available |
| Batch API | Available | TBD |
| Prompt Caching | Available | TBD |
Opus 4.6 is fully available across all platforms and plans. Capybara is available to nobody outside Anthropic’s selected early access group.
When to Use Each Model
Use Opus When
Most tasks should use Opus. It is already a frontier model that handles the vast majority of development, reasoning, and analysis tasks effectively. Use Opus for standard coding assistance, content generation, data analysis, code review, architectural discussions, and anything where reliability and availability matter.
Opus is also the right choice whenever cost matters. At $5/MTok input versus Capybara’s estimated $10-25/MTok, Opus delivers excellent performance at a fraction of the cost.
Use Capybara When
Reserve Capybara for tasks where Opus fails or where the stakes justify the premium. Security vulnerability scanning where you need proactive zero-day discovery. Ultra-complex refactoring across massive codebases where Opus loses consistency. Multi-step autonomous workflows that require perfect judgment over long chains. Cross-domain analysis that requires synthesizing insights from multiple fields.
Capybara is also the clear choice for any task where cybersecurity analysis is primary. No other model — from any lab — matches its reported capabilities in this domain.
The Routing Strategy
Most development teams will use both models with intelligent routing. Simple queries go to Sonnet. Standard complex tasks go to Opus. The hardest problems — and anything security-critical — go to Capybara. The Claude API makes this routing trivial since only the model parameter changes between calls.
Questions About Claude Capybara vs Opus
Is Capybara better than Opus at everything?
Yes, based on leaked documents. But “better” comes with “more expensive” and “not yet available.” For most tasks, Opus 4.6 is more than sufficient and available right now.
Will Opus be discontinued when Capybara launches?
No. Anthropic maintains all tiers simultaneously. Haiku, Sonnet, and Opus continue to serve their respective use cases. Capybara adds a new tier above Opus; it does not replace it.
How much more expensive is Capybara than Opus?
Community estimates range from 2x to 5x Opus pricing. At minimum, expect $10/MTok input and $50/MTok output. Leaked documents confirm the model is “expensive to run.”
Should I wait for Capybara instead of using Opus?
No. Build with Opus now. The API is identical, so switching to Capybara later requires changing one parameter. Waiting means months of lost productivity for a model upgrade that many tasks will not need.
Can Opus do anything Capybara cannot?
Not in terms of capability. But Opus is available now, costs less, and has well-documented behavior. Capybara’s advantage is in capability ceiling; Opus’s advantage is in accessibility and cost.
