How to Build Redundancy for AI Models: A Practical Guide

Claude's API recorded 98.97% uptime over the past 90 days, which is below the 99.5% that Anthropic commits to on its own Priority tier SLA. Most organisations have no contractual protection at all. This guide explains how to build AI redundancy so your business keeps running when your AI provider doesn't.

The SLA problem nobody is talking about

Before we get into solutions, it's worth understanding just how exposed most organisations are right now. Anthropic's uptime commitments vary dramatically by plan:

Plan | SLA commitment | Max downtime/month | Reality (90-day actual)
Standard (API default) | None | Unlimited | 98.97% = ~7.5 hrs/month
Priority Tier | 99.5% | ~3.6 hrs | 98.97% (already breached)
Enterprise (~$50K/yr) | 99.99% | ~4 mins (~52 mins/year) | Not publicly disclosed

The uncomfortable reality: Most organisations building on the Claude API are on the Standard tier — which means they have no contractual uptime protection whatsoever. When Claude goes down, they have no recourse and no compensation. The only answer is to build your own resilience.
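The downtime figures in the table follow from simple arithmetic. As a rough sketch, assuming a 30-day month (an assumption of this example, not a term of any SLA), an uptime percentage converts to a downtime allowance like this:

```python
# Convert an uptime percentage into the downtime it allows over a period.
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_hours


for label, uptime in [("90-day actual", 98.97), ("Priority tier", 99.5), ("Enterprise", 99.99)]:
    hours = downtime_hours(uptime)
    print(f"{label}: {uptime}% uptime = {hours:.2f} h/month ({hours * 60:.0f} min)")
```

On these assumptions, 98.97% works out to about 7.4 hours a month, 99.5% to 3.6 hours, and 99.99% to roughly 4 minutes a month (about 52 minutes over a full year).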

The good news: building AI redundancy is entirely achievable and does not require a massive engineering effort. The key is understanding the architectural patterns available and choosing the right approach for your organisation.

What is an AI proxy server?

Before diving into the tools, it helps to see the problem visually. Here's the difference between a non-redundant and redundant AI architecture:

Diagram 1: Current non-redundant setup (single point of failure). Your application (customer workflows, pipelines, tools) calls the Claude API directly, with no alternative provider configured. During an outage, customer support, processing, and team access all stop.

Diagram 2: Redundant setup with an AI proxy (automatic failover). Your application makes one provider-agnostic call to an AI proxy or gateway (LiteLLM, Portkey, AWS Bedrock) that handles health checks, auto-failover, caching, and logging. The proxy routes to Claude (claude-sonnet-4-6) as primary, GPT-4o (gpt-4o) as failover, and Gemini (gemini-1.5-pro) as last resort, so users see no interruption when one provider is down.

Diagram 3: Cross-cloud architecture (maximum resilience for mission-critical AI). A LiteLLM abstraction layer sits in front of AWS Bedrock with Cross-Region Inference (primary: Claude via Bedrock, with Llama and Mistral as fallback), Azure AI Foundry (secondary: GPT-4o via Azure), and Google Vertex AI (tertiary: Gemini via Vertex). Key principle: no two paths share the same infrastructure, so an AWS-wide outage takes down the primary but leaves the Azure and GCP paths operational. Pragmatic middle ground: Bedrock (primary) plus a direct Claude or OpenAI API call (secondary) that bypasses AWS entirely.

An AI proxy server (also called an AI gateway) sits between your application and the AI providers it calls. Instead of your application talking directly to Claude's API, it talks to the proxy — which then routes the request to the appropriate AI provider based on availability, cost, or performance.

Think of it like a load balancer, but for AI models. When your primary provider goes down, the proxy automatically routes requests to your fallback provider — transparently, without your application code needing to change.

Key insight: An AI proxy decouples your application from any specific AI provider. This is the single most important architectural decision you can make to protect your organisation from AI downtime.
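To make the routing idea concrete, here is a minimal sketch of one possible policy: pick the cheapest provider that is currently healthy. The `Provider` class, the health-check callables, and the cost figures are all hypothetical stand-ins for illustration; a real gateway would use its own configuration and real health probes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Provider:
    name: str                        # hypothetical label for the provider
    cost_per_1k_tokens: float        # illustrative figure, not real pricing
    is_healthy: Callable[[], bool]   # stand-in for a real health check


def choose_provider(providers: list[Provider]) -> Optional[Provider]:
    """Return the cheapest provider whose health check passes, or None."""
    healthy = [p for p in providers if p.is_healthy()]
    return min(healthy, key=lambda p: p.cost_per_1k_tokens, default=None)


providers = [
    Provider("claude", 3.0, lambda: False),   # simulate an outage
    Provider("gpt-4o", 2.5, lambda: True),
    Provider("gemini", 1.25, lambda: True),
]
chosen = choose_provider(providers)
print(chosen.name if chosen else "no provider available")
```

The application only ever calls `choose_provider`, so swapping the policy (for example, strict priority order instead of lowest cost) does not touch application code.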

A proxy typically handles health checks, automatic failover, caching, and logging.

Open source and commercial AI proxy tools

Rather than building a proxy from scratch, the vast majority of organisations are better served by an existing open-source or commercial AI gateway. These tools are production-hardened, actively maintained, and can be deployed in days rather than months.

Here are the leading options available today:

Open Source

LiteLLM

The most widely adopted open-source AI proxy. Supports 100+ LLM providers through a single unified API interface. Drop-in replacement for OpenAI's SDK — switch providers by changing a single parameter. Includes load balancing, fallbacks, spend tracking, and a management UI. Can be self-hosted or used as a managed service.

Open Source

Portkey

Production-grade AI gateway with automatic failover, semantic caching, and detailed observability. Strong on governance features — per-team rate limits, audit logs, and policy enforcement. Particularly well-suited for regulated industries needing full control over what goes in and out of their AI systems.

Managed Service

AWS Bedrock

Amazon's managed AI service gives you access to Claude, Llama, Mistral, and others through a single AWS API. Native AWS integration means IAM, CloudWatch, and VPC all work out of the box. If you're already on AWS, Bedrock can remove the need for a separate proxy layer, because region-level failover is available through Cross-Region Inference.

Managed Service

Azure AI Foundry

Microsoft's equivalent to Bedrock — access to GPT-4o, Claude, Llama, and others through a single Azure endpoint. Tight integration with Microsoft 365 and Azure Active Directory. Strong compliance posture for regulated industries. Well-suited for organisations already standardised on Microsoft's cloud.

Managed Service

Google Vertex AI

Google's AI platform providing access to Gemini models alongside third-party models including Claude. Native integration with Google Cloud's IAM, logging, and monitoring. Good choice for organisations standardised on GCP or using Google Workspace extensively.

Commercial SaaS

Requesty / Bifrost

Purpose-built AI gateways designed specifically for production reliability. Automatic failover in milliseconds, semantic caching, and detailed cost analytics. Lower operational overhead than self-hosting LiteLLM. Good choice for teams without dedicated platform engineering resource.

Choosing the right redundancy strategy

The right approach depends on your organisation's size, technical capability, and risk tolerance. Here's a framework for choosing:

Scenario | Recommended approach | Effort
Already on AWS | AWS Bedrock with Cross-Region Inference enabled | Low
Already on Azure | Azure AI Foundry | Low
Already on GCP | Google Vertex AI | Low
Multi-cloud or cloud-agnostic | LiteLLM self-hosted or Portkey | Medium
No platform engineering resource | Requesty or Portkey managed | Low
Regulated industry (finance, health) | Self-hosted LiteLLM or Portkey with full audit logging | Medium
Maximum control required | LiteLLM + direct API fallback bypassing cloud platform | High
Truly mission-critical AI | Cross-cloud: Bedrock (primary) + Azure AI Foundry (secondary) + direct API (tertiary) | Very High

This isn't just an Anthropic problem — it's industry-wide

To be clear: AI provider outages are not unique to Anthropic. Every major LLM provider has experienced significant downtime in 2026. This is a structural characteristic of the AI industry at its current maturity level — not a failing of any single company.

Provider | Recent incident | Impact
Claude (Anthropic) | Multiple incidents May 19, 2026. API uptime 98.97% over 90 days | Priority SLA of 99.5% already breached
ChatGPT (OpenAI) | Major outage Feb 3, 2026: over 15,000 user reports, all services affected. Further outage April 20, 2026 affecting ChatGPT and Codex globally | 52 incidents in 90 days, median duration 1hr 47mins
Gemini (Google) | Elevated error rates Feb 18, 2026: chat history lost for users. Gemini API degraded performance April 17-18, 2026 for over 34 hours combined | Multiple incidents tracked since September 2025

The pattern is clear: No AI provider has achieved the reliability of traditional cloud infrastructure. OpenAI had 52 incidents in 90 days. Google's Gemini API was in a degraded state for about 34 hours across two consecutive days in April. Claude's API has fallen short of its own Priority tier SLA commitment. The risk is not provider-specific; it is inherent to the current state of the industry. The only rational response is to build redundancy into your architecture.

This is precisely why the choice of which provider to use as your primary is less important than ensuring you have a tested fallback to a different provider on different infrastructure. When outages are this frequent across the entire industry, single-provider dependency is simply not a defensible architectural decision for any business-critical AI workload.

What happens if the cloud platform itself goes down? This is a question every architect should ask, and most don't. Routing all your AI traffic through AWS Bedrock, Azure AI Foundry, or Google Vertex AI solves the single-provider problem, but potentially introduces a new one: single-cloud platform dependency.

The good news is that AWS Bedrock is not simply a single-region service. AWS offers a feature called Cross-Region Inference — which automatically routes Bedrock requests to an available region if your primary region is degraded or unavailable. This provides meaningful resilience against the most common failure mode: a single AWS region going down.

Bedrock with Cross-Region Inference is significantly more resilient than a standard single-region setup and is sufficient for the vast majority of enterprise use cases. If you are using Bedrock, enabling Cross-Region Inference should be a baseline requirement, not an optional extra.

However, Cross-Region Inference has limits. It routes across AWS regions; it does not protect against a broader AWS-wide incident. These are rare, but they do happen. In December 2021, a major AWS us-east-1 outage cascaded and disrupted services beyond that single region. In a scenario like that, Bedrock endpoints in several regions could be affected regardless of cross-region routing.

The cascading failure scenario: Your application calls AWS Bedrock → Cross-Region Inference tries alternative regions → a broader AWS network incident affects all regions simultaneously → everything fails at once. Your application, your gateway, and your underlying models are all on the same infrastructure.

Matching your redundancy to your risk appetite

The right level of redundancy depends on how critical AI is to your operations. The two options below run from full cross-cloud redundancy to a pragmatic middle ground.

True cross-cloud redundancy

For organisations where AI is genuinely mission-critical, the architecture that provides the highest level of protection looks like this:

Layer | Primary | Secondary | Tertiary
Cloud Platform | AWS | Azure | GCP
AI Gateway | AWS Bedrock | Azure AI Foundry | Vertex AI
Model | Claude via Bedrock | GPT-4o via Azure | Gemini via Vertex

Each layer sits on completely independent infrastructure, so a single cloud outage cannot cascade through the entire stack.

The pragmatic middle ground for most organisations

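Full cross-cloud redundancy is complex and expensive to build and maintain, and is realistic only for large enterprises with significant engineering resource. For most organisations, the pragmatic approach that balances resilience with manageability is Bedrock as the primary path plus a direct Claude or OpenAI API call as the secondary path, which bypasses AWS entirely.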

The key principle: Your primary and secondary paths must not share a single point of failure. If both paths go through the same cloud platform, the same region, or the same network provider — you don't have redundancy, you have an illusion of it.
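One way to check this principle before an outage does it for you is to write down what each path depends on and compare. The sketch below is illustrative only: the path names, regions, and dependency labels are made-up examples, not a description of any real deployment.

```python
# Each path lists the infrastructure it depends on. All values are illustrative.
paths = {
    "primary": {"cloud": "aws", "region": "us-east-1", "gateway": "bedrock"},
    "secondary": {"cloud": "aws", "region": "eu-west-1", "gateway": "bedrock"},
    "direct": {"cloud": "provider-hosted", "region": "n/a", "gateway": "direct-api"},
}


def shared_dependencies(a: dict[str, str], b: dict[str, str]) -> dict[str, str]:
    """Return the dependencies that two paths have in common."""
    return {key: value for key, value in a.items() if b.get(key) == value}


# Two Bedrock regions still share a cloud and a gateway.
print(shared_dependencies(paths["primary"], paths["secondary"]))
# Bedrock plus a direct API call share nothing in this model.
print(shared_dependencies(paths["primary"], paths["direct"]))
```

An empty result is what you want between your primary and secondary paths. Anything else is a candidate single point of failure.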

Building a multi-provider failover strategy

Regardless of which tools you choose, a sound multi-provider failover strategy follows these principles:

1. Define your provider hierarchy

Choose a primary provider, one or more secondary providers, and optionally a tertiary. For most organisations: Claude (primary) → GPT-4o (secondary) → Gemini (tertiary). Document this hierarchy and the criteria for switching.
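A documented hierarchy can be as simple as a small data structure kept in version control. This is a sketch under assumptions: the `Tier` class and the `switch_when` wording are hypothetical, and the model identifiers are the ones shown in this guide's diagrams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    role: str          # "primary", "secondary", or "tertiary"
    provider: str      # vendor name
    model: str         # model identifier passed to that vendor
    switch_when: str   # documented criterion for moving to the next tier


HIERARCHY = [
    Tier("primary", "Anthropic", "claude-sonnet-4-6", "three consecutive 5xx errors"),
    Tier("secondary", "OpenAI", "gpt-4o", "three consecutive 5xx errors"),
    Tier("tertiary", "Google", "gemini-1.5-pro", "last resort, degrade gracefully after this"),
]


def validate(hierarchy: list[Tier]) -> None:
    """Reject hierarchies that reuse a provider, which would defeat the fallback."""
    vendors = [tier.provider for tier in hierarchy]
    if len(set(vendors)) != len(vendors):
        raise ValueError("each tier must use a different provider")


validate(HIERARCHY)
print(" -> ".join(f"{tier.model} ({tier.role})" for tier in HIERARCHY))
```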

2. Normalise your prompts

Different models respond differently to the same prompt. Test your prompts against all providers in your hierarchy and adjust so outputs are acceptable from any of them. Avoid provider-specific features in your critical path.
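A small test harness makes this check repeatable. The sketch below runs one prompt against every provider and applies the same acceptance rule to each output. The three `fake_*` functions are stand-ins for real provider calls, and the JSON rule is just an example of an acceptance criterion.

```python
import json
from typing import Callable


# Stand-ins for real provider calls; each takes a prompt and returns text.
def fake_claude(prompt: str) -> str:
    return '{"sentiment": "positive"}'


def fake_gpt(prompt: str) -> str:
    return '{"sentiment": "positive"}'


def fake_gemini(prompt: str) -> str:
    return "The sentiment is positive."   # not JSON, so it fails the check below


def is_acceptable(output: str) -> bool:
    """Example acceptance rule: valid JSON object with a 'sentiment' key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "sentiment" in data


def check_prompt(prompt: str, providers: dict[str, Callable[[str], str]]) -> dict[str, bool]:
    """Run the same prompt through every provider and record pass or fail."""
    return {name: is_acceptable(call(prompt)) for name, call in providers.items()}


results = check_prompt(
    "Classify the sentiment of 'Great service'. Reply as JSON.",
    {"claude": fake_claude, "gpt-4o": fake_gpt, "gemini": fake_gemini},
)
print(results)
```

Any provider that fails is a signal to adjust the prompt (or the parsing) before that provider is trusted as a fallback.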

3. Set failure thresholds

Define what constitutes a failure — e.g. three consecutive 5xx errors, or error rate above 10% over 60 seconds. Don't switch providers on a single failed request, but don't wait too long either.
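The two example thresholds above can be sketched as a small detector. The class below is illustrative, not a library API. The `min_samples` guard is an added assumption so that a single failure on a quiet system does not count as a 100% error rate.

```python
from collections import deque


class FailureDetector:
    """Trips on N consecutive failures, or on an error rate above a limit within a time window."""

    def __init__(self, max_consecutive: int = 3, max_error_rate: float = 0.10,
                 window_seconds: float = 60.0, min_samples: int = 10):
        self.max_consecutive = max_consecutive
        self.max_error_rate = max_error_rate
        self.window_seconds = window_seconds
        self.min_samples = min_samples      # assumption: ignore the rate on very low traffic
        self.events = deque()               # (timestamp, succeeded) pairs
        self.consecutive_failures = 0

    def record(self, succeeded: bool, now: float) -> None:
        self.events.append((now, succeeded))
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()

    def should_fail_over(self) -> bool:
        if self.consecutive_failures >= self.max_consecutive:
            return True
        if len(self.events) < self.min_samples:
            return False
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events) > self.max_error_rate


detector = FailureDetector()
detector.record(False, now=0.0)
print(detector.should_fail_over())   # False: one failed request is not enough
detector.record(False, now=1.0)
detector.record(False, now=2.0)
print(detector.should_fail_over())   # True: three failures in a row
```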

4. Implement graceful degradation

If all providers fail, your application should degrade gracefully — queuing requests, showing a helpful message, or falling back to a non-AI alternative — rather than returning an error to the user.
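As a rough illustration, the handler below queues the request and returns a helpful message when every provider has failed. `AllProvidersDown` and `call_ai` are hypothetical names standing in for your own routing layer, and the in-memory queue would be a durable queue in practice.

```python
import queue

pending = queue.Queue()   # requests to replay once a provider recovers


class AllProvidersDown(Exception):
    """Raised by the (hypothetical) routing layer when every provider has failed."""


def call_ai(prompt: str) -> str:
    # Stand-in for the real routing layer. It always fails here to show the fallback.
    raise AllProvidersDown


def handle_request(prompt: str) -> dict:
    try:
        return {"status": "ok", "answer": call_ai(prompt)}
    except AllProvidersDown:
        pending.put(prompt)   # keep the request so it can be processed later
        return {
            "status": "queued",
            "message": "Our assistant is temporarily unavailable. "
                       "Your request has been saved and will be processed shortly.",
        }


response = handle_request("Summarise ticket 123")
print(response["status"], pending.qsize())
```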

5. Monitor independently

Don't rely on provider status pages — they often lag behind actual incidents. Set up independent health checks that ping your providers directly every 30 seconds. Tools like Better Stack or Checkly make this straightforward.
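If you want to see the shape of such a check before adopting a monitoring tool, here is a minimal sketch. Each probe is a function you supply that returns True when the provider answers correctly. The probes below are fakes, and `rounds` and the injectable `sleep` exist only so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def safe_probe(probe: Callable[[], bool]) -> bool:
    """Run one probe; a probe that raises counts as a failed check."""
    try:
        return bool(probe())
    except Exception:
        return False


def run_health_checks(probes: dict[str, Callable[[], bool]], interval_seconds: float = 30.0,
                      rounds: int = 1, sleep: Callable[[float], None] = time.sleep) -> list[dict[str, bool]]:
    """Probe every provider once per round, pausing between rounds."""
    history = []
    for i in range(rounds):
        history.append({name: safe_probe(probe) for name, probe in probes.items()})
        if i < rounds - 1:
            sleep(interval_seconds)
    return history


def failing_probe() -> bool:
    raise TimeoutError("no response")   # simulate a provider that does not answer


probes = {"claude": failing_probe, "gpt-4o": lambda: True}
print(run_health_checks(probes, rounds=2, sleep=lambda seconds: None))
```

In a real setup each probe would send a tiny request to the provider and check the response, and the results would feed your alerting rather than a print statement.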

6. Test your failover regularly

An untested failover is not a failover. Run quarterly "failover drills" where you deliberately disable your primary provider and verify that traffic routes correctly to the secondary. Time the switchover and document it.
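A drill can be scripted so that it is cheap to repeat. The sketch below simulates one: it disables a provider, sends a test request, and records which provider served it and how long that took. The provider functions and the `route` helper are hypothetical. In a real drill you would disable the primary in your gateway configuration rather than in code.

```python
import time


def primary(prompt: str) -> str:
    return "primary: " + prompt


def secondary(prompt: str) -> str:
    return "secondary: " + prompt


def route(prompt, providers, disabled=frozenset()):
    """Try providers in order, skipping any that are disabled for the drill."""
    for name, call in providers:
        if name in disabled:
            continue
        return name, call(prompt)
    raise RuntimeError("no provider available")


def run_drill(providers, disable: str) -> dict:
    """Disable one provider, send a test request, and report the outcome."""
    started = time.perf_counter()
    served_by, _ = route("drill ping", providers, disabled={disable})
    return {"disabled": disable, "served_by": served_by,
            "seconds": time.perf_counter() - started}


providers = [("primary", primary), ("secondary", secondary)]
report = run_drill(providers, disable="primary")
print(report["served_by"])
```

Keeping each drill report gives you a record of switchover times to compare from quarter to quarter.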

Cost considerations

Multi-provider redundancy does add cost — but less than you might think, and far less than the cost of downtime.

The maths: If your business generates £10,000/day in AI-assisted revenue and experiences 5 hours of downtime per month (98.97% uptime implies closer to 7.4 hours), that's approximately £2,000/month in lost revenue, assuming revenue accrues evenly around the clock. A £100/month proxy that prevents that downtime pays for itself 20 times over.
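The arithmetic is easy to rerun with your own numbers. This sketch assumes revenue accrues evenly over 24 hours and a 30-day month, both simplifications. It shows the 5-hour case and, for comparison, the downtime that 98.97% uptime implies.

```python
def downtime_cost(daily_revenue: float, downtime_hours: float) -> float:
    """Revenue lost to downtime, assuming revenue accrues evenly over 24 hours."""
    return daily_revenue / 24 * downtime_hours


def implied_downtime_hours(uptime_percent: float, days: int = 30) -> float:
    """Hours of downtime per month implied by an uptime percentage."""
    return (1 - uptime_percent / 100) * 24 * days


PROXY_COST = 100.0   # £ per month, the illustrative figure used above

for label, hours in [("5 hours/month", 5.0),
                     ("98.97% uptime", implied_downtime_hours(98.97))]:
    loss = downtime_cost(10_000, hours)
    print(f"{label}: {hours:.1f} h, £{loss:,.0f} lost, {loss / PROXY_COST:.1f}x the proxy cost")
```

With these inputs the 5-hour case comes to roughly £2,083 a month, and the 98.97% case (about 7.4 hours) to roughly £3,090.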

What about open-source models as a fallback?

For organisations that want the ultimate fallback — one that doesn't depend on any external provider — running an open-source model locally or on your own infrastructure is worth considering.

Models like Llama 3.1, Mistral, and Qwen can run on commodity GPU hardware and provide a genuine zero-dependency fallback. They won't match frontier models on complex tasks, but for many operational workflows they're entirely sufficient.

This approach makes particular sense for:

The governance layer

Redundancy isn't just a technical problem; it's a governance one. Your proxy or gateway is also the right place to enforce organisational AI policies such as per-team rate limits, audit logs, and control over what goes in and out of your AI systems.

A well-configured AI gateway isn't just a reliability tool — it's a central point of control for your entire AI estate. This is what mature AI governance looks like in practice.
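To illustrate what enforcing policy at the gateway can mean, here is a toy sketch of two governance features mentioned earlier in this guide: per-team rate limits and an audit log. The `Gateway` class, team name, and limits are hypothetical. Real gateways expose this through their own configuration.

```python
from collections import defaultdict, deque


class Gateway:
    """Toy policy layer: per-team request limits plus an audit log."""

    def __init__(self, limits: dict[str, int], window_seconds: float = 60.0):
        self.limits = limits                 # max requests per team per window
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)     # team -> timestamps of allowed requests
        self.audit_log = []                  # (timestamp, team, decision) tuples

    def allow(self, team: str, now: float) -> bool:
        stamps = self.recent[team]
        # Forget requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        allowed = len(stamps) < self.limits.get(team, 0)   # unknown teams are denied
        if allowed:
            stamps.append(now)
        self.audit_log.append((now, team, "allowed" if allowed else "denied"))
        return allowed


gateway = Gateway({"support": 2})
decisions = [gateway.allow("support", now=t) for t in (0.0, 1.0, 2.0, 61.0)]
print(decisions)   # [True, True, False, True]
```

Every decision, allowed or denied, lands in `audit_log`, which is the property an auditor will care about.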

Getting started today

If you're currently calling an AI provider's API directly with no proxy or fallback in place, here's the minimum viable action plan:

  1. This week — sign up for a second AI provider (OpenAI if you're using Claude, or vice versa). Cost: zero until you use it.
  2. This month — implement a basic failover wrapper around your most critical AI calls. Even a simple try/except that switches providers is infinitely better than nothing.
  3. This quarter — evaluate LiteLLM or a managed gateway for your use case and migrate your AI calls through it. Add independent monitoring.
  4. Ongoing — run quarterly failover drills. Review your provider hierarchy as the model landscape evolves.
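For step 2, the simplest possible wrapper looks something like the sketch below. `call_primary`, `call_secondary`, and `ProviderError` are placeholders for your own provider calls and whatever exceptions your SDKs raise. The primary is hard-coded to fail here so the fallback path is visible.

```python
class ProviderError(Exception):
    """Stand-in for whatever exception your provider SDKs raise on failure."""


def call_primary(prompt: str) -> str:
    raise ProviderError("simulated outage")   # pretend the primary is down


def call_secondary(prompt: str) -> str:
    return f"[secondary] answer to: {prompt}"


def ask(prompt: str) -> str:
    """Try the primary provider first and fall back to the secondary on failure."""
    try:
        return call_primary(prompt)
    except ProviderError:
        return call_secondary(prompt)


print(ask("Draft a reply to this customer"))
```

Catching a specific exception type, rather than everything, keeps genuine bugs in your own code from being silently rerouted to the fallback.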

Need help building your AI resilience strategy?

AI Bods helps organisations design and implement AI architectures that are built to last — with redundancy, governance, and human oversight at their core. Get in touch to discuss your situation.

Also download our free guide — AI-First Without the Risk — for the complete 8-point framework for AI business continuity, including how to classify your AI dependencies and build your incident response plan.

Sources & disclaimer: All uptime figures and incident data referenced in this article are sourced from publicly available status pages (status.claude.com, status.openai.com), independent third-party monitoring services (IsDown, StatusGator, Downdetector), and published SLA documentation. This article is intended for general informational purposes only and does not constitute professional legal, compliance, or business continuity advice. SLA terms and uptime figures are subject to change — always verify current commitments directly with your AI provider. AI Bods is an independent consultancy and is not affiliated with, endorsed by, or sponsored by Anthropic, OpenAI, Google, or any other AI provider mentioned in this article.

© 2026 AI Bods. All rights reserved.  |  ai-bods.com  |  Human+ AI Insight