Route requests across Claude, GPT-4, Gemini, Llama, and Mistral through a single endpoint. Smart routing picks the best model per request based on cost, latency, and capability.
Real-time pricing, latency, and throughput. Smart routing uses these metrics to pick the optimal model for each request.
| Model | Provider | Input | Output | TTFT | Speed | Context | Tier |
|---|---|---|---|---|---|---|---|
| claude-opus-4-6 | Anthropic | $15.00/M | $75.00/M | 340ms | 85 tok/s | 200K | SMARTEST |
| gpt-4-turbo | OpenAI | $10.00/M | $30.00/M | 280ms | 92 tok/s | 128K | FAST |
| gemini-2.0-pro | Google | $7.00/M | $21.00/M | 195ms | 110 tok/s | 1M | FAST |
| llama-3.1-405b | Meta | $3.00/M | $3.00/M | 420ms | 48 tok/s | 128K | VALUE |
| mistral-large | Mistral | $4.00/M | $12.00/M | 220ms | 78 tok/s | 128K | |
| command-r-plus | Cohere | $3.00/M | $15.00/M | 310ms | 65 tok/s | 128K | VALUE |

Prices are per million tokens. TTFT is time to first token.
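To make the cost dimension of routing concrete, here is a minimal sketch of cost-optimized selection using the prices in the table above. The `PRICES` dict, `estimate_cost`, and `cheapest` are illustrative helpers, not part of the Nexus SDK.

```python
# Illustrative sketch of cost-optimized model selection.
# Prices are USD per million tokens (input, output), taken from the table above.
PRICES = {
    "claude-opus-4-6": (15.00, 75.00),
    "gpt-4-turbo": (10.00, 30.00),
    "gemini-2.0-pro": (7.00, 21.00),
    "llama-3.1-405b": (3.00, 3.00),
    "mistral-large": (4.00, 12.00),
    "command-r-plus": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD for a given token split."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Model with the lowest estimated cost for this request shape."""
    return min(PRICES, key=lambda m: estimate_cost(m, input_tokens, output_tokens))

print(cheapest(2000, 500))  # → llama-3.1-405b
```

A real router would weigh latency and capability alongside cost (as the `cost-optimized` strategy name in the curl example suggests), but the per-request arithmetic is exactly this.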
Every API call is logged with full observability: model, latency, cost, and token counts. Debug in seconds, not hours.
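The headline metrics above (TTFT, tok/s) fall straight out of per-request timestamps. A sketch with a hypothetical log record; the field names are illustrative, not the actual Nexus log schema:

```python
# Hypothetical log record for one request; field names are illustrative.
record = {
    "model": "gpt-4-turbo",
    "started_at": 1700000000.000,      # request sent (unix seconds)
    "first_token_at": 1700000000.280,  # first streamed token received
    "finished_at": 1700000002.500,     # stream complete
    "output_tokens": 204,
}

# Time to first token, in milliseconds.
ttft_ms = (record["first_token_at"] - record["started_at"]) * 1000

# Generation throughput: tokens emitted after the first token arrived.
gen_seconds = record["finished_at"] - record["first_token_at"]
tok_per_s = record["output_tokens"] / gen_seconds

print(f"TTFT: {ttft_ms:.0f}ms, throughput: {tok_per_s:.0f} tok/s")
```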
Drop-in replacement for OpenAI SDK. Change one string to switch models. No code rewrite.
```python
from nexus import Nexus

client = Nexus(api_key="nx-...")

response = client.chat.completions.create(
    model="claude-opus-4-6",  # or "gpt-4-turbo"
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content)
```
```typescript
import { Nexus } from '@nexus/sdk'

const nexus = new Nexus({ apiKey: 'nx-...' })

const stream = await nexus.chat.completions.create({
  model: 'claude-opus-4-6',
  messages: [{ role: 'user', content: '...' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}
```
```go
import "github.com/nexus-ai/go-sdk"

client := nexus.NewClient("nx-...")

stream, err := client.Chat.Create(ctx, &nexus.ChatRequest{
    Model: "claude-opus-4-6",
    Messages: []nexus.Message{{
        Role:    "user",
        Content: "...",
    }},
    Stream: true,
})
if err != nil {
    // handle provider or network errors here
}
```
```shell
# Works with any OpenAI-compatible client
curl https://api.nexus.dev/v1/chat/completions \
  -H "Authorization: Bearer nx-..." \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "..."}],
    "route": { "strategy": "cost-optimized" }
  }'
```
Pass-through pricing at cost plus a thin margin. No per-seat fees, no minimums, no surprises.
Requests route through the nearest edge node. Provider failover happens in under 50ms.
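Provider failover of that sort amounts to ordered retry: try providers by priority, fall through on error. A minimal sketch; the provider names and callables here are stand-ins, not the Nexus implementation:

```python
# Minimal failover sketch: try providers in priority order, fall through on error.
# Provider names and callables are illustrative stand-ins.
def call_with_failover(providers, request):
    last_err = None
    for name, call in providers:
        try:
            return name, call(request)  # first healthy provider wins
        except Exception as err:        # provider error or timeout
            last_err = err              # remember it, then try the next one
    raise RuntimeError("all providers failed") from last_err

def flaky_primary(req):
    raise TimeoutError("primary provider timed out")

def healthy_fallback(req):
    return {"content": "ok"}

providers = [("anthropic", flaky_primary), ("openai", healthy_fallback)]
used, resp = call_with_failover(providers, {"messages": []})
print(used, resp["content"])  # → openai ok
```

Keeping the sub-50ms budget is then a matter of tight per-provider timeouts rather than any extra logic in the retry loop itself.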