Route requests across Claude, GPT-4, Gemini, Llama, and Mistral through a single endpoint. Smart routing picks the best model per request based on cost, latency, and capability.
Real-time pricing, latency, and throughput. Smart routing uses these metrics to pick the optimal model for each request.
| Model | Provider | Input | Output | TTFT | Speed | Context | Tier |
|---|---|---|---|---|---|---|---|
| claude-opus-4-6 | Anthropic | $15.00/M | $75.00/M | 340ms | 85 tok/s | 200K | SMARTEST |
| gpt-4-turbo | OpenAI | $10.00/M | $30.00/M | 280ms | 92 tok/s | 128K | FAST |
| gemini-2.0-pro | Google | $7.00/M | $21.00/M | 195ms | 110 tok/s | 1M | FAST |
| llama-3.1-405b | Meta | $3.00/M | $3.00/M | 420ms | 48 tok/s | 128K | VALUE |
| mistral-large | Mistral | $4.00/M | $12.00/M | 220ms | 78 tok/s | 128K | |
| command-r-plus | Cohere | $3.00/M | $15.00/M | 310ms | 65 tok/s | 128K | VALUE |

Prices are per million tokens. TTFT is time to first token.
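To make the cost dimension of routing concrete, here is a minimal sketch of cost-optimized selection using the prices in the table above. The `PRICES` dict, `estimate_cost`, and `cheapest` are illustrative helpers, not part of the Nexus SDK.

```python
# Illustrative sketch of cost-optimized model selection.
# Prices are USD per million tokens (input, output), taken from the table above.
PRICES = {
    "claude-opus-4-6": (15.00, 75.00),
    "gpt-4-turbo": (10.00, 30.00),
    "gemini-2.0-pro": (7.00, 21.00),
    "llama-3.1-405b": (3.00, 3.00),
    "mistral-large": (4.00, 12.00),
    "command-r-plus": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD for a given token split."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Model with the lowest estimated cost for this request shape."""
    return min(PRICES, key=lambda m: estimate_cost(m, input_tokens, output_tokens))

print(cheapest(2000, 500))  # → llama-3.1-405b
```

A real router would weigh latency and capability alongside cost (as the `cost-optimized` strategy name in the curl example suggests), but the per-request arithmetic is exactly this.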
Every API call is logged with full observability: model, latency, cost, and token counts. Debug in seconds, not hours.
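The headline metrics above (TTFT, tok/s) fall straight out of per-request timestamps. A sketch with a hypothetical log record; the field names are illustrative, not the actual Nexus log schema:

```python
# Hypothetical log record for one request; field names are illustrative.
record = {
    "model": "gpt-4-turbo",
    "started_at": 1700000000.000,      # request sent (unix seconds)
    "first_token_at": 1700000000.280,  # first streamed token received
    "finished_at": 1700000002.500,     # stream complete
    "output_tokens": 204,
}

# Time to first token, in milliseconds.
ttft_ms = (record["first_token_at"] - record["started_at"]) * 1000

# Generation throughput: tokens emitted after the first token arrived.
gen_seconds = record["finished_at"] - record["first_token_at"]
tok_per_s = record["output_tokens"] / gen_seconds

print(f"TTFT: {ttft_ms:.0f}ms, throughput: {tok_per_s:.0f} tok/s")
```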
Drop-in replacement for OpenAI SDK. Change one string to switch models. No code rewrite.
```python
from nexus import Nexus

client = Nexus(api_key="nx-...")

response = client.chat.completions.create(
    model="claude-opus-4-6",  # or "gpt-4-turbo"
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content)
```
```typescript
import { Nexus } from '@nexus/sdk'

const nexus = new Nexus({ apiKey: 'nx-...' })

const stream = await nexus.chat.completions.create({
  model: 'claude-opus-4-6',
  messages: [{ role: 'user', content: '...' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}
```
```go
import "github.com/nexus-ai/go-sdk"

client := nexus.NewClient("nx-...")

stream, err := client.Chat.Create(ctx, &nexus.ChatRequest{
    Model: "claude-opus-4-6",
    Messages: []nexus.Message{{
        Role:    "user",
        Content: "...",
    }},
    Stream: true,
})
if err != nil {
    // handle provider or network errors here
}
```
```shell
# Works with any OpenAI-compatible client
curl https://api.nexus.dev/v1/chat/completions \
  -H "Authorization: Bearer nx-..." \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "..."}],
    "route": { "strategy": "cost-optimized" }
  }'
```
Pass-through pricing at cost plus a thin margin. No per-seat fees, no minimums, no surprises.
Requests route through the nearest edge node. Provider failover happens in under 50ms.
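Provider failover of that sort amounts to ordered retry: try providers by priority, fall through on error. A minimal sketch; the provider names and callables here are stand-ins, not the Nexus implementation:

```python
# Minimal failover sketch: try providers in priority order, fall through on error.
# Provider names and callables are illustrative stand-ins.
def call_with_failover(providers, request):
    last_err = None
    for name, call in providers:
        try:
            return name, call(request)  # first healthy provider wins
        except Exception as err:        # provider error or timeout
            last_err = err              # remember it, then try the next one
    raise RuntimeError("all providers failed") from last_err

def flaky_primary(req):
    raise TimeoutError("primary provider timed out")

def healthy_fallback(req):
    return {"content": "ok"}

providers = [("anthropic", flaky_primary), ("openai", healthy_fallback)]
used, resp = call_with_failover(providers, {"messages": []})
print(used, resp["content"])  # → openai ok
```

Keeping the sub-50ms budget is then a matter of tight per-provider timeouts rather than any extra logic in the retry loop itself.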