v3.2 — 218 models live

One API.
Every LLM.

Route requests across Claude, GPT-4, Gemini, Llama, and Mistral through a single endpoint. Smart routing picks the best model per request based on cost, latency, and capability.

218
Models
12ms
Routing overhead
99.97%
Uptime (90d)
$ npm install @nexus/sdk
request.sh
# One endpoint, any model
curl https://api.nexus.dev/v1/chat/completions \
  -H "Authorization: Bearer $NEXUS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "messages": [{"role": "user", "content": "Explain TCP"}],
    "stream": true
  }'
# Response streams back — same format, any model
HTTP/2 200
x-model: claude-opus-4-6
x-latency: 340ms TTFT
x-cost: $0.0042
x-tokens: 85 tok/s
Model Registry

Compare 218 models in one place

Real-time pricing, latency, and throughput. Smart routing uses these metrics to pick the optimal model for each request.

Model            Provider   Input     Output    TTFT   Speed      Context
claude-opus-4-6  Anthropic  $15.00/M  $75.00/M  340ms  85 tok/s   200K  SMARTEST
gpt-4-turbo      OpenAI     $10.00/M  $30.00/M  280ms  92 tok/s   128K  FAST
gemini-2.0-pro   Google     $7.00/M   $21.00/M  195ms  110 tok/s  1M    FAST
llama-3.1-405b   Meta       $3.00/M   $3.00/M   420ms  48 tok/s   128K  VALUE
mistral-large    Mistral    $4.00/M   $12.00/M  220ms  78 tok/s   128K
command-r-plus   Cohere     $3.00/M   $15.00/M  310ms  65 tok/s   128K  VALUE
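As an illustration, routing over these registry metrics can be sketched in a few lines of Python. The scoring below is an assumption for illustration only, not Nexus's actual routing algorithm; `pick_model` and its strategy names are hypothetical.

```python
# Illustrative cost/latency routing over the registry metrics above.
# The scoring is an assumption, not Nexus's actual algorithm.

MODELS = {
    # model: (input $/M tok, output $/M tok, TTFT ms, speed tok/s)
    "claude-opus-4-6": (15.00, 75.00, 340, 85),
    "gpt-4-turbo":     (10.00, 30.00, 280, 92),
    "gemini-2.0-pro":  (7.00, 21.00, 195, 110),
    "llama-3.1-405b":  (3.00, 3.00, 420, 48),
}

def pick_model(strategy="cost-optimized"):
    if strategy == "cost-optimized":
        # Cheapest blended price (input rate + output rate)
        return min(MODELS, key=lambda m: MODELS[m][0] + MODELS[m][1])
    if strategy == "latency-optimized":
        # Lowest time-to-first-token
        return min(MODELS, key=lambda m: MODELS[m][2])
    raise ValueError(f"unknown strategy: {strategy}")

print(pick_model())                     # llama-3.1-405b
print(pick_model("latency-optimized"))  # gemini-2.0-pro
```

Real routing also weighs capability and live provider health; this sketch only shows why per-request metrics make the choice mechanical.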
Live Dashboard

Every request, traced

Model, latency, cost, tokens. Every API call is logged with full observability. Debug in seconds, not hours.
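The same trace fields are available client-side via the `x-*` response headers shown in the example above. A minimal sketch, assuming those header names; `trace_from_headers` is a hypothetical helper, not part of the SDK:

```python
# Pull per-request trace fields from Nexus response headers.
# Header names are taken from the example response above.

def trace_from_headers(headers: dict) -> dict:
    return {
        "model":   headers.get("x-model"),
        "latency": headers.get("x-latency"),
        "cost":    headers.get("x-cost"),
        "tokens":  headers.get("x-tokens"),
    }

trace = trace_from_headers({
    "x-model": "claude-opus-4-6",
    "x-latency": "340ms TTFT",
    "x-cost": "$0.0042",
    "x-tokens": "85 tok/s",
})
print(trace["model"], trace["cost"])  # claude-opus-4-6 $0.0042
```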

Request Log
1,247 requests today
09:42:31 claude-opus-4-6 200 OK 342ms $0.0048
09:42:28 gpt-4-turbo 200 OK 218ms $0.0031
09:42:25 gemini-2.0-pro 200 OK 186ms $0.0019
09:42:22 llama-3.1-405b 429 RATE $0.00
09:42:19 claude-opus-4-6 200 OK 395ms $0.0063
09:42:16 mistral-large 200 OK 215ms $0.0022
09:42:14 gpt-4-turbo 200 OK 253ms $0.0037
09:42:11 gemini-2.0-pro 200 OK 172ms $0.0015
Today's spend
$47.82
+12% vs yesterday
Avg latency (p50)
186ms
-8ms vs last week
Requests (24h)
14,892
+340 vs yesterday
Integration

Five lines to switch providers

Drop-in replacement for OpenAI SDK. Change one string to switch models. No code rewrite.

Python MOST POPULAR
from nexus import Nexus

client = Nexus(api_key="nx-...")

response = client.chat.completions.create(
    model="claude-opus-4-6",  # or "gpt-4-turbo"
    messages=[{"role": "user", "content": "..."}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content)
TypeScript
import { Nexus } from '@nexus/sdk'

const nexus = new Nexus({ apiKey: 'nx-...' })

const stream = await nexus.chat.completions.create({
  model: 'claude-opus-4-6',
  messages: [{ role: 'user', content: '...' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}
Go
import (
    "log"

    "github.com/nexus-ai/go-sdk"
)

client := nexus.NewClient("nx-...")

stream, err := client.Chat.Create(ctx, &nexus.ChatRequest{
    Model:    "claude-opus-4-6",
    Messages: []nexus.Message{{
        Role: "user", Content: "...",
    }},
    Stream: true,
})
if err != nil {
    log.Fatal(err)
}
cURL
# Works with any OpenAI-compatible client
curl https://api.nexus.dev/v1/chat/completions \
  -H "Authorization: Bearer nx-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "..."}],
    "route": { "strategy": "cost-optimized" }
  }'
Transparent Pricing

Pay for tokens, not seats

Provider prices passed through at cost, plus a thin margin. No per-seat fees, no minimums, no surprises.
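Per-request cost is simple arithmetic over the registry rates. A worked example at the claude-opus-4-6 rates from the table above ($15/M input, $75/M output); `request_cost` is an illustrative helper, and any routing margin is ignored here:

```python
# Worked example of per-token pricing at the registry rates above.
# Rates are dollars per million tokens; margin is omitted.

def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost in dollars for one request at $/M-token rates."""
    return input_tokens * in_rate / 1e6 + output_tokens * out_rate / 1e6

# claude-opus-4-6: $15.00/M input, $75.00/M output
cost = request_cost(100, 50, 15.00, 75.00)
print(f"${cost:.5f}")  # $0.00525
```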

Developer
Free to start
$5 credit included. No credit card needed.
  • 218 models, full access
  • 100 RPM rate limit
  • 7-day log retention
  • Community support
Enterprise
Custom
SOC 2, SSO, dedicated endpoints, SLA.
  • Dedicated capacity
  • Custom model hosting
  • Unlimited retention
  • 99.99% SLA
  • Dedicated support engineer
Infrastructure

Global edge, measured in milliseconds

Requests route through the nearest edge node. Provider failover happens in under 50ms.
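Nexus runs this failover server-side, but the pattern is easy to picture as a client-side sketch. The `complete_with_failover` helper, the fallback chain, and the simulated provider below are all hypothetical:

```python
# Sketch of provider failover: try models in order, fall through
# on provider errors (e.g. the 429 RATE entries in the log above).

FALLBACK_CHAIN = ["claude-opus-4-6", "gpt-4-turbo", "gemini-2.0-pro"]

def complete_with_failover(call, chain=FALLBACK_CHAIN):
    last_err = None
    for model in chain:
        try:
            return call(model)
        except Exception as err:  # 429 / 5xx from a provider
            last_err = err
    raise last_err

# Simulated provider that rate-limits the first model:
def fake_call(model):
    if model == "claude-opus-4-6":
        raise RuntimeError("429 RATE")
    return f"ok from {model}"

print(complete_with_failover(fake_call))  # ok from gpt-4-turbo
```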

Edge Latency by Region
US East (Virginia)
8ms
US West (Oregon)
12ms
EU West (Frankfurt)
18ms
APAC (Tokyo)
32ms
APAC (Sydney)
45ms
South America (Sao Paulo)
68ms
Uptime — Last 90 Days
99.97% uptime · 2 incidents (both <15min)
Provider Status
Anthropic Operational
OpenAI Operational
Google Degraded
Meta (via Together) Operational
Mistral Operational