Production eval engine for Ultravox

Turn live voice calls into a sharper agent.

Trelx watches real calls, catches exact failures, generates safer prompt fixes, and synthesizes stronger blueprints from what broke in production.

Calls analyzed: 500+ (in production)
Errors detected: 60+ (transcript-backed)
Agents monitored: 4 (Ultravox agents)
Fix latency: <2 min (end-to-end)

Live control room

Real-time agent watchlist

Agent               Calls  Analyzed  Errors  Rate
Cold Outreach AI       76        68      12  17.6%
Edifice Properties     20        18       3  16.7%
Ramco Gas              14        12       1   8.3%
Debt Collector          8         7       0   0%
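
The Rate column appears to be errors divided by analyzed calls, not total calls. That definition is an assumption inferred from the figures, not something the page states. A minimal Python sketch that reproduces the watchlist numbers under that assumption (the names `WATCHLIST` and `error_rate` are illustrative only):

```python
# Watchlist figures copied from above: (agent, calls, analyzed, errors).
WATCHLIST = [
    ("Cold Outreach AI", 76, 68, 12),
    ("Edifice Properties", 20, 18, 3),
    ("Ramco Gas", 14, 12, 1),
    ("Debt Collector", 8, 7, 0),
]


def error_rate(errors: int, analyzed: int) -> float:
    """Percentage of analyzed calls with an error (assumed definition of Rate)."""
    if analyzed == 0:
        return 0.0  # no analyzed calls yet, so report zero instead of dividing by zero
    return 100.0 * errors / analyzed


# Round to one decimal place, matching the precision shown in the watchlist.
rates = {
    agent: round(error_rate(errors, analyzed), 1)
    for agent, _calls, analyzed, errors in WATCHLIST
}
```

Dividing by analyzed calls keeps calls that have not been evaluated yet from lowering an agent's rate.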
Why Trelx

Built for production.

Transcript-backed

Exact failure evidence

Every miss ties back to the agent turn, the stage, and the quote that caused the problem. No vague metrics.

Simulation-first

Prove fixes before touching prod

Generate the patch, replay it against past calls, and show before/after impact before a human decides.

Blueprint mode

Synthesize what actually works

Trelx turns repeated failure patterns into hardened system prompts built from real production drift.
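
To make the before/after idea concrete, here is a minimal Python sketch of how replay results could be summarized for a human reviewer. It is not Trelx's implementation: the `ReplayResult` record, its fields, and `summarize_impact` are hypothetical, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayResult:
    """Outcome of replaying one past call (hypothetical record shape)."""
    call_id: str
    failed_before: bool  # the original call had a flagged failure
    failed_after: bool   # the replay with the patched prompt still fails


def summarize_impact(results: list[ReplayResult]) -> dict[str, int]:
    """Count fixed, still-failing, regressed, and unaffected calls."""
    summary = {"fixed": 0, "still_failing": 0, "regressed": 0, "unchanged_ok": 0}
    for r in results:
        if r.failed_before and not r.failed_after:
            summary["fixed"] += 1
        elif r.failed_before and r.failed_after:
            summary["still_failing"] += 1
        elif r.failed_after:
            summary["regressed"] += 1  # the patch broke a call that used to pass
        else:
            summary["unchanged_ok"] += 1
    return summary


# Made-up sample data, for illustration only.
sample = [
    ReplayResult("call-1", True, False),
    ReplayResult("call-2", True, True),
    ReplayResult("call-3", False, False),
    ReplayResult("call-4", False, True),
]
impact = summarize_impact(sample)
```

Counting regressions separately matters here: a patch that fixes one failure but breaks a previously clean call is not a safe change, and a single net number would hide that.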

Workflow

One loop. Real evidence. Cleaner agents.

No vanity analytics. No fake QA. Just a dense, production-first loop from live transcript to measurable improvement.

01

Ingest

Pull ended Ultravox calls, transcripts, summaries, and tool traces into Supabase.

02

Evaluate

GPT-4o flags each wrong turn and records its severity, stage, and quote-backed reasoning.

03

Fix

Generate patch suggestions, simulate against history, and surface the safest path.

04

Synthesize

Roll repeated failures into a stronger blueprint for the next agent version.
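
As a rough illustration of the Evaluate and Synthesize steps, the Python sketch below models a quote-backed finding and groups findings into repeated patterns. Everything in it is an assumption made for illustration: the `Finding` fields, the `repeated_patterns` helper, the grouping key, and the threshold are hypothetical, and the sample findings are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One flagged agent turn (hypothetical shape, not the actual Trelx schema)."""
    agent: str
    stage: str
    severity: str
    quote: str  # the transcript quote that backs the finding


def repeated_patterns(findings, min_count=2):
    """Return (agent, stage) pairs that failed at least min_count times."""
    counts = Counter((f.agent, f.stage) for f in findings)
    # most_common() sorts by count, highest first.
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Invented findings, for illustration only.
findings = [
    Finding("agent-a", "pricing", "high", "That plan is free forever."),
    Finding("agent-a", "pricing", "medium", "I can't say what it costs."),
    Finding("agent-a", "greeting", "low", "Hello? Hello?"),
    Finding("agent-b", "closing", "high", "I'll hang up now."),
]
patterns = repeated_patterns(findings)
```

Only the pairs that repeat would feed a blueprint in this sketch. One-off failures stay as individual patch candidates.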

Get started

Your agents are failing right now.

Trelx is already watching your Ultravox calls. Open the dashboard to see what's broken.