Claude Capybara and ASL-4: Understanding AI’s Highest Safety Level

Claude Capybara is expected to be the first AI model classified at ASL-4 — the highest safety level in Anthropic’s Responsible Scaling Policy (RSP). This classification is not just a label. It triggers specific containment protocols, external audits, and deployment restrictions that no previous model has required.

The reason is straightforward: Anthropic’s own internal assessment describes Capybara as posing “unprecedented cybersecurity risks” and being “far ahead of any other AI model in cyber capabilities.” When a company builds something it considers this dangerous, ASL-4 is the framework for handling it responsibly.

What Are AI Safety Levels

Anthropic created the AI Safety Level (ASL) framework as part of its Responsible Scaling Policy. The system defines escalating safety requirements based on model capabilities — the more powerful the model, the more stringent the controls.

The ASL Framework

ASL-1: Models with no meaningful risk of catastrophic harm. Basic language models, narrow AI systems. Minimal safety requirements beyond standard testing.

ASL-2: Models that could be misused but do not significantly increase risk beyond what is already accessible through existing tools. Current Claude models (Haiku, Sonnet, Opus) operate at ASL-2. Safety requirements include standard red-teaming, content filtering, and usage monitoring.

ASL-3: Models that could meaningfully increase risk in domains like cybersecurity, biology, or persuasion. ASL-3 requires enhanced safety testing, additional deployment restrictions, and more rigorous monitoring. No publicly available model has been officially classified at ASL-3, though some models approach this threshold. Our danger assessment explores the specific risks in detail.

ASL-4: Models that pose catastrophic risk potential in specific domains. ASL-4 requires the most stringent safety measures — external security audits, government notification, capability-specific containment protocols, and restricted deployment. Capybara is expected to be the first model at this level.

Why Levels Matter

The ASL system serves two purposes. For Anthropic, it creates binding commitments — once a model is classified at a given level, the company must implement the corresponding safety measures before deployment. For the public, it provides transparency about how seriously Anthropic takes specific risks.

The key word is binding. Anthropic has committed not to deploy a model whose capabilities exceed what its safety infrastructure can handle. If Capybara triggers ASL-4 but ASL-4 safety measures are not fully implemented, the model cannot be released, even if competitive pressure mounts.
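The binding rule can be pictured as a simple deployment gate. The Python sketch below is purely illustrative: the names `ASL`, `REQUIRED_MEASURES`, `missing_measures`, and `can_deploy` are hypothetical, the measure lists are paraphrased from the level descriptions in this article, and nothing here reflects how Anthropic actually tracks or enforces these commitments. It also assumes each level is checked only against its own list, since the article does not say whether requirements are cumulative.

```python
from enum import IntEnum


class ASL(IntEnum):
    """Safety levels as described in this article (hypothetical encoding)."""
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3
    ASL_4 = 4


# Measures per level, paraphrased from the article's level descriptions.
# Assumption: each level is checked against its own list only.
REQUIRED_MEASURES = {
    ASL.ASL_1: {"standard testing"},
    ASL.ASL_2: {"red-teaming", "content filtering", "usage monitoring"},
    ASL.ASL_3: {
        "enhanced safety testing",
        "deployment restrictions",
        "rigorous monitoring",
    },
    ASL.ASL_4: {
        "external security audit",
        "government notification",
        "capability-specific containment",
        "restricted deployment",
    },
}


def missing_measures(level, implemented):
    """Return the required measures for `level` that are not yet in place."""
    return REQUIRED_MEASURES[level] - set(implemented)


def can_deploy(level, implemented):
    """A model may ship only when no required measure is missing."""
    return not missing_measures(level, implemented)


if __name__ == "__main__":
    in_place = {"external security audit", "government notification"}
    print(can_deploy(ASL.ASL_4, in_place))
    print(sorted(missing_measures(ASL.ASL_4, in_place)))
```

In this sketch, a model assessed at ASL-4 with only two of the four measures in place fails the gate, which mirrors the point above: the classification itself does not block release, the unmet requirements do.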

Why Capybara Triggers ASL-4

The specific capabilities described in leaked documents align with ASL-4 criteria across multiple dimensions.

Cybersecurity Capability Threshold

ASL-4 is triggered when a model can “meaningfully assist in creating novel cyberweapons or discovering novel zero-day vulnerabilities at a scale or speed far exceeding human capabilities.” Capybara’s leaked description — proactive vulnerability discovery, zero-day identification, being “far ahead of any other AI model in cyber capabilities” — maps directly to this threshold.

The critical word is “novel.” ASL-2 and ASL-3 models can identify known vulnerability patterns. ASL-4 is triggered when a model discovers vulnerabilities that humans have not yet found — creating new knowledge rather than applying existing knowledge.

The Scale Factor

Individual vulnerability discoveries are concerning. Vulnerability discovery at scale — scanning entire codebases, networks, or software ecosystems simultaneously — is what elevates the risk to ASL-4. Capybara reportedly performs attack surface analysis across complex systems, identifying multiple attack vectors simultaneously rather than examining one potential vulnerability at a time.

Autonomy Concerns

Capybara’s enhanced agent workflow capabilities add another dimension. A model that can discover vulnerabilities is dangerous. A model that can discover vulnerabilities and autonomously chain together exploitation steps is substantially more dangerous. The leaked description of “greater consistency in autonomous multi-step task execution” suggests Capybara can operate with less human oversight — exactly the type of capability that ASL-4 is designed to address.

What ASL-4 Requires

ASL-4 classification triggers specific safety requirements that go far beyond what any previous model has needed.

External Security Audits

ASL-4 requires independent external audits of the model’s capabilities before deployment. These are not Anthropic’s internal evaluations — they are conducted by third-party security researchers and organizations with the authority to assess whether safety measures are adequate.

For Capybara, this likely means cybersecurity firms, government agencies, and academic researchers independently evaluating what the model can do and whether the containment measures are sufficient.

Government Notification

ASL-4 includes provisions for notifying relevant government bodies about the model’s capabilities. For a model with unprecedented cybersecurity capabilities, this likely means briefing national security agencies and cybersecurity authorities in countries where the model will operate.

This is particularly relevant given the geopolitical dimensions of AI-powered cyber capabilities. A model that can find zero-days at nation-state speed is a national security concern, not just a technology product.

Capability-Specific Containment

ASL-4 does not apply blanket restrictions. It requires capability-specific containment — targeted measures for the specific risks a model poses. For Capybara, this means containment protocols specifically designed to prevent misuse of cybersecurity capabilities while allowing legitimate use.

This is where the dual-use problem becomes acute. The same proactive vulnerability scanning that makes Capybara dangerous is exactly what defensive security teams need. Capability-specific containment must thread this needle — restricting offensive use without preventing defensive use.

Deployment Restrictions

ASL-4 models cannot be deployed through the same channels as lower-level models. The restricted early access to cybersecurity defense organizations is likely a direct result of ASL-4 requirements — Anthropic cannot offer Capybara through the public API until ASL-4 safety measures are fully validated.

How ASL-4 Affects the Release Timeline

The safety requirements of ASL-4 are a primary reason Capybara remains restricted.

The Safety Infrastructure Gap

Anthropic has committed not to deploy models whose capabilities exceed what its safety infrastructure can handle. If ASL-4 safety measures are still being developed or validated, Capybara cannot be released, regardless of competitive pressure or business needs.

This creates a paradox: the better the model performs, the harder it is to release. Every capability improvement that makes Capybara more valuable also makes the safety bar higher.

External Audit Timeline

External audits take time. Independent security researchers need to evaluate the model, test containment measures, and document their findings. This process cannot be rushed without compromising its integrity. Weeks to months of evaluation may be required before ASL-4 measures are validated.

Government Engagement

Government notification and potential regulatory engagement add further complexity. Briefing national security agencies across multiple countries, addressing questions, and potentially accommodating regulatory requirements all take time and are not fully within Anthropic’s control.

What ASL-4 Means for the AI Industry

Capybara’s ASL-4 classification has implications beyond Anthropic.

Setting a Precedent

Capybara would be the first model publicly classified at ASL-4. How Anthropic implements and communicates these safety measures will set expectations for the entire industry. Other labs developing models with similar capabilities (see our full capabilities overview) will face pressure to adopt comparable safety frameworks.

The Regulatory Signal

ASL-4 classification sends a signal to regulators that some AI models require safety measures beyond what the industry has previously implemented. This could accelerate regulatory frameworks like the EU AI Act and inform U.S. policy discussions about frontier model governance.

Competitive Dynamics

Safety requirements create an asymmetry in competitive dynamics. Labs that implement ASL-4-level safety measures face higher costs and longer timelines than labs that do not. If one lab releases a Capybara-class model without equivalent safety measures, the responsible lab is disadvantaged — creating pressure to lower safety standards.

Anthropic’s response to this dynamic has been to argue that the industry should adopt shared safety standards rather than racing to the bottom on safety in pursuit of competitive advantage.

Questions About Claude Capybara ASL-4 Safety

What is ASL-4?

ASL-4 (AI Safety Level 4) is the highest level in Anthropic’s Responsible Scaling Policy. It applies to models that pose catastrophic risk potential and requires external security audits, government notification, capability-specific containment, and restricted deployment.

Is Capybara officially classified as ASL-4?

Not officially confirmed by Anthropic. Based on the capabilities described in leaked documents — particularly “unprecedented cybersecurity risks” and proactive zero-day discovery — Capybara aligns with ASL-4 criteria. The restricted release strategy is consistent with ASL-4 requirements.

Has any model been ASL-4 before?

No. All publicly available Claude models operate at ASL-2. No model from any lab has been publicly classified at ASL-4 or its equivalent. Capybara would be the first.

Does ASL-4 mean Capybara will never be released?

No. ASL-4 defines the safety measures required for deployment, not a prohibition on deployment. Once external audits validate that safety measures are adequate, Capybara can be released — initially through restricted channels, then potentially through broader access.

Do other AI companies have safety levels like ASL?

Not at the same specificity. OpenAI has its Preparedness Framework, and Google has safety evaluation processes, but Anthropic’s ASL system is the most detailed public framework with binding deployment commitments tied to specific capability thresholds.