tutorial

OpenAI-compatible API gateway with LLM routing: build vs buy

If you're routing requests across multiple LLM providers (OpenAI, Anthropic, Google, open-source), here's the build-vs-buy analysis with concrete numbers, code samples, and tradeoffs.

You started with OpenAI. Then Anthropic Claude was better at code. Then DeepSeek V3 came out at 1/15 the price. Now you have if/else chains in production deciding which provider gets which request. The right abstraction is an LLM gateway — a single OpenAI-compatible endpoint that routes internally.

This post compares the main options (LiteLLM self-hosted, OpenRouter, Tokia) with real numbers, including when each one makes sense.

What "OpenAI-compatible" actually means

It means the gateway exposes /v1/chat/completions, /v1/embeddings, /v1/images/generations etc. with the same request/response shape OpenAI uses. Result: any OpenAI SDK (Python, Node, Go, Ruby) works unmodified — you just change base_url.

# Before
client = OpenAI(api_key="sk-...")

# After (any gateway)
client = OpenAI(api_key="...", base_url="https://gateway/v1")

The non-OpenAI providers (Anthropic, Google, Cohere) are translated by the gateway internally — their native APIs are different but the OpenAI envelope hides that.

Option 1: Build it yourself with LiteLLM

LiteLLM is open-source, self-hosted. You run it as a proxy in your infra.

Pros:

  • Free (just infra cost)
  • Full control over routing rules
  • Your data never leaves your network

Cons:

  • You manage uptime, scaling, auth, logs, billing reconciliation
  • You handle provider API keys (5+ accounts to monitor balance/limits)
  • Updates require redeploys
  • No UI for non-engineers

Cost: ~$50-200/mo infra (depending on traffic) + 3-5h/week eng maintenance.

Verdict: makes sense if you have a dedicated platform team and >50k req/day.

Option 2: OpenRouter

OpenRouter is a hosted gateway. Pay-as-you-go, no infra to manage.

Pros:

  • Setup in 2 minutes
  • 100+ models cataloged
  • One API key, one bill
  • Falls back automatically if a provider is down

Cons:

  • Adds 5-10% markup on top of upstream prices
  • USD only — Brazilian users still hit IOF + spread on credit cards
  • English support
  • No NF-e, no PIX

Verdict: great default for non-BR users at any scale.

Option 3: Tokia (BR-focused)

Tokia is OpenRouter-equivalent for Brazilian companies. Same gateway concept, BR-specific add-ons.

Pros over OpenRouter for BR users:

  • PIX top-up (instant, no IOF)
  • NF-e (Brazilian electronic invoice) auto-issued
  • BRL pricing (no spread)
  • 5% volume discount above R$ 500/mo, escalating
  • PT-BR support via WhatsApp

Cons:

  • 1.5-3x markup on raw provider cost (vs OpenRouter's 5-10%)
  • Currently 27 models (vs OpenRouter's 100+)
  • BR-focused; English support is OK but not primary

Verdict: makes sense if you're a Brazilian PME doing < R$ 10k/mo and value PT-BR support + NF-e more than absolute lowest cost.

The routing logic itself

Regardless of which option, the smart routing pattern is the same. Here's what we use at Tokia internally and recommend for your own gateway:

# Routing rules (pseudo-code)
def route(messages, latency_budget_ms, quality_tier):
    if quality_tier == "premium":
        # Hard problems: legal, financial, complex code
        return "claude-sonnet-4-6"
    elif quality_tier == "balanced":
        # Most production traffic
        return "gpt-4o-mini" if latency_budget_ms < 2000 else "deepseek-v3"
    elif quality_tier == "fast":
        # Real-time chat, autocomplete
        return "gemini-2-flash"
    else:
        # Free/test
        return "llama-3-3-70b"

Plus circuit breaker per provider — if OpenAI is returning 5xx, fallback to Anthropic automatically.

Concrete cost comparison

A SaaS with 1M chat completions/mo, avg 500 tokens in / 200 tokens out:

| Setup | Monthly cost | |---|---| | OpenAI direct gpt-4o | $1,800 USD (~R$ 9,360 + IOF + spread = R$10,560) | | OpenAI direct gpt-4o-mini | $100 USD (~R$ 585 with surcharges) | | Self-hosted LiteLLM routing 50/50 mini/sonnet | $90 + $50 infra | | OpenRouter routing same | $110 | | Tokia gpt-4o-mini | R$ 1,560 (BRL pricing locked) | | Tokia deepseek-v3 (alternative) | R$ 120 BRL |

The biggest win isn't gateway markup vs no markup — it's picking the right model. A gateway just makes that picking trivial because you can A/B test without changing infra.

Recommended path

  1. Start with OpenRouter if non-BR (or Tokia if BR)
  2. Hit ~$5k+/mo, evaluate LiteLLM self-hosted for cost
  3. Hit $50k+/mo, negotiate direct enterprise contracts with top 2-3 providers + keep gateway as failover

Don't build a custom gateway from scratch. The routing/fallback/auth/billing problems are already solved by these tools — your engineering time has better ROI elsewhere.

#gateway#llm-routing#litellm#openrouter#tokia

Quer testar Tokia com R$ 10 via PIX?

Criar conta grátis →