TL;DR
A tiny, working proof of concept for a quota‑aware request router:
- Caps spend, gates risky requests, escalates to a human when either trips.
- Clean interface, simple telemetry, and a minimal Python harness you can run.
Interface (v0)
route(request) -> result
Inputs
- prompt: string
- budget_cfg: { max_cost: float, cost_per_call: float }
- policy: { verifier_threshold: float }
Outputs
- status: "ok" | "escalated"
- reason: "verifier" | "budget" | null
- cost: float
- budget_remaining: float
- trace_len: int
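The interface above can be written down as typed structures. This is only a sketch, assuming Python dataclasses; the class names `BudgetConfig`, `Policy`, `Request`, and `Result` are illustrative and do not appear in the harness.

```python
from dataclasses import dataclass, asdict
from typing import Literal, Optional


@dataclass(frozen=True)
class BudgetConfig:
    max_cost: float       # total spend allowed
    cost_per_call: float  # assumed flat price per provider call


@dataclass(frozen=True)
class Policy:
    verifier_threshold: float  # risk scores above this value escalate


@dataclass(frozen=True)
class Request:
    prompt: str
    budget_cfg: BudgetConfig
    policy: Policy


@dataclass(frozen=True)
class Result:
    status: Literal["ok", "escalated"]
    reason: Optional[Literal["verifier", "budget"]]  # None when status is "ok"
    cost: float
    budget_remaining: float
    trace_len: int


# Hypothetical example values, shaped like a budget escalation.
example = Result(status="escalated", reason="budget", cost=0.0,
                 budget_remaining=0.05, trace_len=3)
print(asdict(example))
```

Frozen dataclasses keep requests and results immutable, which makes them safe to store in a trace.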
Behavior
- Pre‑call budget check. If insufficient funds for next call → escalate(reason=budget).
- If enough budget, invoke provider and record latency/cost.
- Verifier gates risky requests: if risk_score > verifier_threshold → escalate(reason=verifier). The call's cost is logged in the trace but, in v0, not deducted from the budget.
- If verifier passes, accept result, deduct cost, update budget.
- Log a compact trace (costs, latency, decision, budget_remaining).
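The steps above can be condensed into a pure decision function, which is easier to unit-test than the stateful router. This is a sketch only; the name `decide` and its tuple return shape are assumptions, and the provider call itself is left out.

```python
from typing import Optional, Tuple


def decide(budget_remaining: float, cost_per_call: float,
           risk_score: float, verifier_threshold: float
           ) -> Tuple[str, Optional[str], float]:
    """Return (status, reason, new_budget_remaining) for one request."""
    # Pre-call budget check: escalate before spending anything.
    if budget_remaining < cost_per_call:
        return "escalated", "budget", budget_remaining
    # (The provider call would happen here.)
    # Verifier gate: risky requests escalate; v0 does not deduct their cost.
    if risk_score > verifier_threshold:
        return "escalated", "verifier", budget_remaining
    # Accepted: deduct the cost of the call.
    return "ok", None, budget_remaining - cost_per_call


print(decide(0.15, 0.10, 0.9, 0.3))  # ('escalated', 'verifier', 0.15)
print(decide(0.05, 0.10, 0.1, 0.3))  # ('escalated', 'budget', 0.05)
status, reason, remaining = decide(0.15, 0.10, 0.1, 0.3)
print(status, reason, round(remaining, 2))  # ok None 0.05
```

The remaining budget is rounded only for display, because repeated float subtraction accumulates small errors.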
Telemetry (per request)
- request_id (implicit), t_start, latency_ms, cost, risk_score, decision, budget_remaining.
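One possible shape for a per-request telemetry record, serialized as a single JSON line. This is a sketch: `TelemetryRecord`, `make_record`, and `to_json_line` are hypothetical names, and generating `request_id` with `uuid4` is an assumption, since the text only says the id is implicit.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict


@dataclass
class TelemetryRecord:
    request_id: str
    t_start: float
    latency_ms: float
    cost: float
    risk_score: float
    decision: str
    budget_remaining: float


def make_record(latency_ms: float, cost: float, risk_score: float,
                decision: str, budget_remaining: float) -> TelemetryRecord:
    # Assumption: the id is generated here rather than supplied by the caller.
    return TelemetryRecord(
        request_id=uuid.uuid4().hex,
        t_start=time.time(),
        latency_ms=latency_ms,
        cost=cost,
        risk_score=risk_score,
        decision=decision,
        budget_remaining=budget_remaining,
    )


def to_json_line(record: TelemetryRecord) -> str:
    # One JSON object per line, with stable key order for easy diffing.
    return json.dumps(asdict(record), sort_keys=True)


print(to_json_line(make_record(5, 0.10, 0.1, "ok", 0.05)))
```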
ASCII Flow
[request]
    ↓
[budget check] ─no→ [escalate: budget]
    ↓ yes
[model call] → [verifier] ─gate→ [escalate: verifier]
                   ↓ pass
                 [ok]
Minimal Harness (Python)
import json
import time


class Router:
    """Quota-aware router: budget check, model call, verifier gate, escalation."""

    def __init__(self, budget_dollars=0.15, cost_per_call=0.10, verifier_threshold=0.3):
        self.budget_dollars = budget_dollars
        self.cost_per_call = cost_per_call
        self.verifier_threshold = verifier_threshold
        self.budget_remaining = budget_dollars
        self.trace = []
        self.accepted_calls = 0
        self.total_cost = 0.0
        self.any_verifier_triggered = False
        self.any_budget_block = False
        self.escalations = []

    def risk_score(self, prompt: str) -> float:
        # Toy heuristic: prompts containing "risky" score high.
        return 0.9 if 'risky' in prompt.lower() else 0.1

    def verifier(self, output: str, prompt: str) -> bool:
        # True means the gate tripped (risk above threshold); output is unused in v0.
        return self.risk_score(prompt) > self.verifier_threshold

    def model_call(self, prompt: str):
        # Stubbed provider with fixed latency and cost.
        time.sleep(0.005)
        return {'output': f'out:{prompt[:20]}', 'latency_ms': 5, 'cost': self.cost_per_call}

    def attempt(self, prompt: str):
        event = {'t': time.time(), 'prompt': prompt}

        # Pre-call budget check: escalate if the next call is unaffordable.
        if self.budget_remaining < self.cost_per_call:
            self.any_budget_block = True
            event['action'] = 'escalate_budget'
            event['reason'] = 'insufficient_budget'
            self.escalations.append({'type': 'budget', 'prompt': prompt})
            self.trace.append(event)
            return {'status': 'escalated', 'reason': 'budget'}

        resp = self.model_call(prompt)
        event['action'] = 'call'
        event['latency_ms'] = resp['latency_ms']
        event['cost'] = resp['cost']

        # Verifier gate: escalate risky requests; cost is logged but not deducted.
        if self.verifier(resp['output'], prompt):
            self.any_verifier_triggered = True
            event['verifier'] = 'triggered'
            event['action'] = 'escalate_verifier'
            self.escalations.append({'type': 'verifier', 'prompt': prompt})
            self.trace.append(event)
            return {'status': 'escalated', 'reason': 'verifier'}

        # Accept the result: deduct the cost and update the budget.
        self.accepted_calls += 1
        self.total_cost += resp['cost']
        self.budget_remaining -= resp['cost']
        event['verifier'] = 'clean'
        event['accepted'] = True
        event['budget_remaining'] = round(self.budget_remaining, 4)
        self.trace.append(event)
        return {'status': 'ok', 'reason': None}


router = Router(budget_dollars=0.15, cost_per_call=0.10, verifier_threshold=0.3)

# Sequence: risky → safe → safe
router.attempt('risky_prompt with sensitive intent')  # verifier escalation
router.attempt('safe_prompt')                         # accepted, spends 0.10
router.attempt('another safe_prompt')                 # budget cap escalation

result = {
    'budget': router.budget_dollars,
    'cost_per_call': router.cost_per_call,
    'accepted_calls': router.accepted_calls,
    'total_cost': round(router.total_cost, 2),
    'budget_remaining': round(router.budget_remaining, 2),
    'cost_capped': router.any_budget_block,
    'verifier_triggered': router.any_verifier_triggered,
    'escalated': 'human' if (router.any_budget_block or router.any_verifier_triggered) else 'none',
    'trace_len': len(router.trace),
}
print(json.dumps(result, indent=2))
Sample Output
{
  "budget": 0.15,
  "cost_per_call": 0.1,
  "accepted_calls": 1,
  "total_cost": 0.1,
  "budget_remaining": 0.05,
  "cost_capped": true,
  "verifier_triggered": true,
  "escalated": "human",
  "trace_len": 3
}
Notes
- v0 focuses on the outer‑loop behaviors (budget check, verifier gate, escalation).
- Possible next steps: multiple providers/modes, per‑provider budgets, richer verifiers, and a telemetry sink.
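As a rough illustration of the per‑provider budget idea, the sketch below tracks a separate budget per provider and picks the first one that can still afford a call. Everything here is hypothetical: `ProviderBudget`, `pick_provider`, the provider names, and the prices are assumptions, not part of v0.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProviderBudget:
    cost_per_call: float
    remaining: float

    def can_afford(self) -> bool:
        return self.remaining >= self.cost_per_call

    def charge(self) -> None:
        self.remaining -= self.cost_per_call


def pick_provider(budgets: Dict[str, ProviderBudget]) -> Optional[str]:
    """Return the first provider (in insertion order) that can afford a call."""
    for name, budget in budgets.items():
        if budget.can_afford():
            return name
    return None  # every provider is capped: the caller would escalate


budgets = {
    "primary": ProviderBudget(cost_per_call=0.10, remaining=0.15),
    "fallback": ProviderBudget(cost_per_call=0.02, remaining=0.03),
}

picks = []
for _ in range(4):
    name = pick_provider(budgets)
    picks.append(name)
    if name is not None:
        budgets[name].charge()

print(picks)  # ['primary', 'fallback', None, None]
```

A `None` pick corresponds to the budget escalation path in v0, now triggered only once all providers are exhausted.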