The SLA problem nobody is talking about
Before we get into solutions, it's worth understanding just how exposed most organisations are right now. Anthropic's uptime commitments vary dramatically by plan:
| Plan | SLA Commitment | Max downtime/month | Reality (90-day actual) |
|---|---|---|---|
| Standard (API default) | None | Unlimited | 98.97% = ~7.5hrs |
| Priority Tier | 99.5% | ~3.6 hrs | 98.97% — already breached |
| Enterprise (~$50K/yr) | 99.99% | ~52 mins | Not publicly disclosed |
The uncomfortable reality: Most organisations building on the Claude API are on the Standard tier — which means they have no contractual uptime protection whatsoever. When Claude goes down, they have no recourse and no compensation. The only answer is to build your own resilience.
The good news: building AI redundancy is entirely achievable and does not require a massive engineering effort. The key is understanding the architectural patterns available and choosing the right approach for your organisation.
What is an AI proxy server?
Before diving into the tools, it helps to see the problem visually. Here's the difference between a non-redundant and redundant AI architecture:
An AI proxy server (also called an AI gateway) sits between your application and the AI providers it calls. Instead of your application talking directly to Claude's API, it talks to the proxy — which then routes the request to the appropriate AI provider based on availability, cost, or performance.
Think of it like a load balancer, but for AI models. When your primary provider goes down, the proxy automatically routes requests to your fallback provider — transparently, without your application code needing to change.
Key insight: An AI proxy decouples your application from any specific AI provider. This is the single most important architectural decision you can make to protect your organisation from AI downtime.
A proxy typically handles:
- Routing — directing requests to the right provider based on rules you define
- Failover — automatically switching to a backup provider when the primary fails
- Load balancing — distributing requests across providers to avoid rate limits
- Caching — storing responses for repeated queries to reduce cost and latency
- Logging and monitoring — tracking usage, costs, and performance across all providers
- Authentication — managing API keys centrally rather than scattered across applications
Open source and commercial AI proxy tools
Rather than building a proxy from scratch, the vast majority of organisations are better served by an existing open-source or commercial AI gateway. These tools are production-hardened, actively maintained, and can be deployed in days rather than months. A good proxy will handle the key capabilities you need:
- Automatic failover — switching to a backup provider when the primary fails, without your application noticing
- Health checks — proactively detecting provider issues before your users do
- Request queuing — holding requests during outages and processing them when service resumes
- Response caching — serving cached responses for repeated queries to reduce cost and latency
- Unified logging — tracking usage, cost, and performance across all providers in one place
- Circuit breaker pattern — stopping requests to a failed provider during a cool-down period rather than hammering it repeatedly
- API key management — centralising credentials rather than scattering them across applications
Here are the leading options available today:
LiteLLM
The most widely adopted open-source AI proxy. Supports 100+ LLM providers through a single unified API interface. Drop-in replacement for OpenAI's SDK — switch providers by changing a single parameter. Includes load balancing, fallbacks, spend tracking, and a management UI. Can be self-hosted or used as a managed service.
Portkey
Production-grade AI gateway with automatic failover, semantic caching, and detailed observability. Strong on governance features — per-team rate limits, audit logs, and policy enforcement. Particularly well-suited for regulated industries needing full control over what goes in and out of their AI systems.
AWS Bedrock
Amazon's managed AI service gives you access to Claude, Llama, Mistral, and others through a single AWS API. Native AWS integration means IAM, CloudWatch, and VPC all work out of the box. If you're already on AWS, Bedrock eliminates the need for a separate proxy layer entirely — failover is built in.
Azure AI Foundry
Microsoft's equivalent to Bedrock — access to GPT-4o, Claude, Llama, and others through a single Azure endpoint. Tight integration with Microsoft 365 and Azure Active Directory. Strong compliance posture for regulated industries. Well-suited for organisations already standardised on Microsoft's cloud.
Google Vertex AI
Google's AI platform providing access to Gemini models alongside third-party models including Claude. Native integration with Google Cloud's IAM, logging, and monitoring. Good choice for organisations standardised on GCP or using Google Workspace extensively.
Requesty / Bifrost
Purpose-built AI gateways designed specifically for production reliability. Automatic failover in milliseconds, semantic caching, and detailed cost analytics. Lower operational overhead than self-hosting LiteLLM. Good choice for teams without dedicated platform engineering resource.
Choosing the right redundancy strategy
The right approach depends on your organisation's size, technical capability, and risk tolerance. Here's a framework for choosing:
| Scenario | Recommended approach | Effort |
|---|---|---|
| Already on AWS | AWS Bedrock with Cross-Region Inference enabled | Low |
| Already on Azure | Azure AI Foundry | Low |
| Already on GCP | Google Vertex AI | Low |
| Multi-cloud or cloud-agnostic | LiteLLM self-hosted or Portkey | Medium |
| No platform engineering resource | Requesty or Portkey managed | Low |
| Regulated industry (finance, health) | Self-hosted LiteLLM or Portkey with full audit logging | Medium |
| Maximum control required | LiteLLM + direct API fallback bypassing cloud platform | High |
| Truly mission-critical AI | Cross-cloud: Bedrock (primary) + Azure AI Foundry (secondary) + direct API (tertiary) | Very High |
This isn't just an Anthropic problem — it's industry-wide
To be clear: AI provider outages are not unique to Anthropic. Every major LLM provider has experienced significant downtime in 2026. This is a structural characteristic of the AI industry at its current maturity level — not a failing of any single company.
| Provider | Recent incident | Impact |
|---|---|---|
| Claude (Anthropic) | Multiple incidents May 19, 2026. API uptime 98.97% over 90 days | Priority SLA of 99.5% already breached |
| ChatGPT (OpenAI) | Major outage Feb 3, 2026 — over 15,000 user reports, all services affected. Further outage April 20, 2026 affecting ChatGPT and Codex globally | 52 incidents in 90 days, median duration 1hr 47mins |
| Gemini (Google) | Elevated error rates Feb 18, 2026 — chat history lost for users. Gemini API degraded performance April 17-18, 2026 for over 34 hours combined | Multiple incidents tracked since September 2025 |
The pattern is clear: No AI provider has achieved the reliability of traditional cloud infrastructure. OpenAI had 52 incidents in 90 days. Google's Gemini API was in a degraded state for nearly 34 hours across two consecutive days in April. Claude's API has not met its own Priority tier SLA commitment. The risk is not provider-specific — it is inherent to the current state of the industry. The only rational response is to build redundancy into your architecture.
This is precisely why the choice of which provider to use as your primary is less important than ensuring you have a tested fallback to a different provider on different infrastructure. When outages are this frequent across the entire industry, single-provider dependency is simply not a defensible architectural decision for any business-critical AI workload.
This is a question every architect should ask — and most don't. Routing all your AI traffic through AWS Bedrock, Azure AI Foundry, or Google Vertex AI solves the single-provider problem, but potentially introduces a new one: single-cloud platform dependency.
The good news is that AWS Bedrock is not simply a single-region service. AWS offers a feature called Cross-Region Inference — which automatically routes Bedrock requests to an available region if your primary region is degraded or unavailable. This provides meaningful resilience against the most common failure mode: a single AWS region going down.
Bedrock with Cross-Region Inference is significantly more resilient than a standard single-region setup and is sufficient for the vast majority of enterprise use cases. If you are using Bedrock, enabling Cross-Region Inference should be a baseline requirement, not an optional extra.
However, Cross-Region Inference has limits. It routes across AWS regions — it does not protect against a broader AWS-wide incident. These are rare, but they do happen. In December 2021, a major AWS us-east-1 outage cascaded and affected services across multiple regions simultaneously. In those scenarios, all Bedrock endpoints would be affected regardless of cross-region routing.
The cascading failure scenario: Your application calls AWS Bedrock → Cross-Region Inference tries alternative regions → a broader AWS network incident affects all regions simultaneously → everything fails at once. Your application, your gateway, and your underlying models are all on the same infrastructure.
Matching your redundancy to your risk appetite
The right level of redundancy depends on how critical AI is to your operations:
- Important but not critical — Bedrock with Cross-Region Inference is sufficient. The probability of an AWS-wide incident is very low.
- Critical — Bedrock with Cross-Region Inference as primary, plus a direct API fallback to a provider bypassing Bedrock, giving you a path that doesn't depend on AWS at all.
- Mission-critical — Full cross-cloud diversity across two or more cloud platforms, each with independent AI gateways.
True cross-cloud redundancy
For organisations where AI is genuinely mission-critical, the architecture that provides the highest level of protection looks like this:
| Layer | Primary | Secondary | Tertiary |
|---|---|---|---|
| Cloud Platform | AWS | Azure | GCP |
| AI Gateway | AWS Bedrock | Azure AI Foundry | Vertex AI |
| Model | Claude via Bedrock | GPT-4o via Azure | Gemini via Vertex |
Each layer sits on completely independent infrastructure, so a single cloud outage cannot cascade through the entire stack.
The pragmatic middle ground for most organisations
Full cross-cloud redundancy is complex and expensive to build and maintain — realistic only for large enterprises with significant engineering resource. For most organisations, the pragmatic approach that balances resilience with manageability is:
- Primary: AWS Bedrock with Cross-Region Inference enabled — provides strong resilience against regional failures, which are the most common cause of outages
- Secondary: Direct API calls to one or two providers, bypassing the cloud platform entirely — so a broader AWS incident doesn't take your fallback with it
- Abstraction layer: LiteLLM or similar sits in front of both, so switching between Bedrock and a direct API call is a configuration change, not a code change
The key principle: Your primary and secondary paths must not share a single point of failure. If both paths go through the same cloud platform, the same region, or the same network provider — you don't have redundancy, you have an illusion of it.
Building a multi-provider failover strategy
Regardless of which tools you choose, a sound multi-provider failover strategy follows these principles:
Define your provider hierarchy
Choose a primary provider, one or more secondary providers, and optionally a tertiary. For most organisations: Claude (primary) → GPT-4o (secondary) → Gemini (tertiary). Document this hierarchy and the criteria for switching.
Normalise your prompts
Different models respond differently to the same prompt. Test your prompts against all providers in your hierarchy and adjust so outputs are acceptable from any of them. Avoid provider-specific features in your critical path.
Set failure thresholds
Define what constitutes a failure — e.g. three consecutive 5xx errors, or error rate above 10% over 60 seconds. Don't switch providers on a single failed request, but don't wait too long either.
Implement graceful degradation
If all providers fail, your application should degrade gracefully — queuing requests, showing a helpful message, or falling back to a non-AI alternative — rather than returning an error to the user.
Monitor independently
Don't rely on provider status pages — they often lag behind actual incidents. Set up independent health checks that ping your providers directly every 30 seconds. Tools like Better Stack or Checkly make this straightforward.
Test your failover regularly
An untested failover is not a failover. Run quarterly "failover drills" where you deliberately disable your primary provider and verify that traffic routes correctly to the secondary. Time the switchover and document it.
Cost considerations
Multi-provider redundancy does add cost — but less than you might think, and far less than the cost of downtime.
- Dual API keys — maintaining accounts with two providers costs nothing if you're only paying per token used
- Proxy infrastructure — LiteLLM self-hosted runs comfortably on a small VM costing ~$20-50/month
- Managed gateways — Portkey and similar services typically cost $50-200/month for most usage levels
- Cloud platforms — AWS Bedrock, Azure AI Foundry, and Vertex AI add a small markup (typically 10-20%) on top of base model costs in exchange for the managed reliability
The maths: If your business generates £10,000/day in AI-assisted revenue and experiences 5 hours of downtime per month at 98.97% uptime, that's approximately £2,000/month in lost productivity. A £100/month proxy that eliminates that downtime pays for itself 20 times over.
What about open-source models as a fallback?
For organisations that want the ultimate fallback — one that doesn't depend on any external provider — running an open-source model locally or on your own infrastructure is worth considering.
Models like Llama 3.1, Mistral, and Qwen can run on commodity GPU hardware and provide a genuine zero-dependency fallback. They won't match frontier models on complex tasks, but for many operational workflows they're entirely sufficient.
This approach makes particular sense for:
- Organisations with strict data residency requirements
- Financial services and healthcare where external API calls raise compliance concerns
- High-volume workloads where the economics of per-token pricing become significant
- Organisations that classify AI as genuinely mission-critical and need maximum resilience
The governance layer
Redundancy isn't just a technical problem — it's a governance one. Your proxy or gateway is also the right place to enforce organisational AI policies:
- Data classification — block requests containing PII or confidential data from being sent to external providers
- Content filtering — enforce acceptable use policies at the gateway layer rather than relying on each application to implement them
- Cost controls — set spend limits per team, per application, or per time period
- Audit logging — log every request and response for compliance and incident investigation
- Access control — manage which teams and applications can call which models
A well-configured AI gateway isn't just a reliability tool — it's a central point of control for your entire AI estate. This is what mature AI governance looks like in practice.
Getting started today
If you're currently calling an AI provider's API directly with no proxy or fallback in place, here's the minimum viable action plan:
- This week — sign up for a second AI provider (OpenAI if you're using Claude, or vice versa). Cost: zero until you use it.
- This month — implement a basic failover wrapper around your most critical AI calls. Even a simple try/except that switches providers is infinitely better than nothing.
- This quarter — evaluate LiteLLM or a managed gateway for your use case and migrate your AI calls through it. Add independent monitoring.
- Ongoing — run quarterly failover drills. Review your provider hierarchy as the model landscape evolves.
Need help building your AI resilience strategy?
AI Bods helps organisations design and implement AI architectures that are built to last — with redundancy, governance, and human oversight at their core. Get in touch to discuss your situation.
Talk to AI Bods →Also download our free guide — AI-First Without the Risk — for the complete 8-point framework for AI business continuity, including how to classify your AI dependencies and build your incident response plan.