
See what each agent costs and how it performs
Monitor cost per day, inspect where spend came from, compare model mix, and use evals plus operator review to keep each specialist improving over time.
Break down daily spend across tokens, voice, and web search so teams can see what the role actually costs to run.
Track which models handled the work and where a different route can improve cost, speed, or quality.
Measure pass rates, catch regressions, and keep people in the loop on the outputs that still need judgment.
Think operating dashboard, not black box.
The feedback loop behind every agent
Monitoring & Improvement makes cost, quality, model usage, and approvals visible in one place so operators can tune the role with evidence instead of guesswork.
Cost and usage
Track spend by day, role, and workflow, then inspect which parts came from tokens, voice, or web search.
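As a rough illustration of that breakdown, the sketch below groups usage records by any combination of day, role, workflow, and cost source. The `UsageRecord` shape, the `spend_by` helper, and the sample values are assumptions made up for this example; they are not the product's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical usage record; field names are assumptions."""
    day: str        # ISO date, e.g. "2025-01-01"
    role: str       # the agent role that did the work
    workflow: str   # the workflow the work belonged to
    source: str     # "tokens", "voice", or "web_search"
    cost_usd: float


def spend_by(records, *keys):
    """Sum cost_usd over records, grouped by the given field names."""
    totals = defaultdict(float)
    for rec in records:
        group = tuple(getattr(rec, key) for key in keys)
        totals[group] += rec.cost_usd
    return dict(totals)


# Made-up sample values, for illustration only.
SAMPLE = [
    UsageRecord("2025-01-01", "support", "triage", "tokens", 0.25),
    UsageRecord("2025-01-01", "support", "triage", "voice", 0.5),
    UsageRecord("2025-01-01", "support", "refunds", "web_search", 0.125),
    UsageRecord("2025-01-02", "support", "triage", "tokens", 0.75),
]

if __name__ == "__main__":
    print(spend_by(SAMPLE, "day"))            # spend per day
    print(spend_by(SAMPLE, "day", "source"))  # per day, split by source
```

Grouping by a variable list of keys keeps the same helper usable for a per-day total, a per-role view, or a per-workflow split by source.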
Models and routing
See which models handled the work, how traffic is split, and where a lower-cost or higher-quality route makes sense.
Evals and quality
Measure pass rates, catch regressions, and compare quality signals across the work the agent is actually doing.
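One way to picture the pass-rate and regression check is the minimal sketch below. The function names, the per-suite dictionaries, and the 2-point tolerance are all assumptions chosen for illustration, not the product's actual eval logic.

```python
def pass_rate(results):
    """Return the percentage of passing results in a list of booleans."""
    if not results:
        raise ValueError("pass_rate needs at least one result")
    return 100.0 * sum(results) / len(results)


def find_regressions(baseline, current, tolerance_pts=2.0):
    """Return {suite: drop} for suites whose pass rate fell by more than
    tolerance_pts percentage points relative to the baseline.

    baseline and current map a suite name to a pass rate in percent.
    Suites missing from current are skipped rather than flagged.
    """
    regressions = {}
    for suite, base_rate in baseline.items():
        current_rate = current.get(suite)
        if current_rate is None:
            continue
        drop = base_rate - current_rate
        if drop > tolerance_pts:
            regressions[suite] = drop
    return regressions


if __name__ == "__main__":
    # Hypothetical suite names and rates.
    before = {"triage": 80.0, "refunds": 90.0}
    after = {"triage": 70.0, "refunds": 89.0}
    print(find_regressions(before, after))
```

The tolerance exists so small run-to-run noise does not raise a flag; the right threshold depends on how many cases each suite contains.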
Review and approvals
Keep operators in the loop on sensitive outputs, failed checks, and the decisions that should never auto-run.
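A review gate along these lines could be sketched as follows. Everything here is hypothetical: the `AgentOutput` fields, the action names in `NEVER_AUTO_RUN`, and the `review_reasons` helper are assumptions used to show the three conditions named above, not a description of how the product implements approvals.

```python
from dataclasses import dataclass, field

# Hypothetical action names that should always wait for an operator.
NEVER_AUTO_RUN = frozenset({"issue_refund", "delete_account"})


@dataclass
class AgentOutput:
    """Hypothetical agent output; field names are assumptions."""
    action: str
    sensitive: bool = False
    failed_checks: list = field(default_factory=list)


def review_reasons(output, never_auto_run=NEVER_AUTO_RUN):
    """List every reason this output must go to an operator."""
    reasons = []
    if output.sensitive:
        reasons.append("sensitive output")
    if output.failed_checks:
        reasons.append("failed checks: " + ", ".join(output.failed_checks))
    if output.action in never_auto_run:
        reasons.append("action never auto-runs")
    return reasons


def can_auto_run(output):
    """An output may auto-run only when no review reason applies."""
    return not review_reasons(output)


if __name__ == "__main__":
    draft = AgentOutput("issue_refund", failed_checks=["amount_limit"])
    print(can_auto_run(draft), review_reasons(draft))
```

Returning the reasons rather than a bare boolean gives the operator the context for why an output was held.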
Compare cost, quality, and pass rates in one operating view
Use evals and model comparison together so teams can see which routes are cheaper, which pass more evals, and where a change actually improves the work.
| Metric | GPT-5 mini | GPT-5 nano | GPT-OSS 120B Fireworks | GPT-OSS 20B Fireworks |
|---|---|---|---|---|
| Total Cost | $0.659436 | $0.225682 | $0.542208 | $0.390457 |
| Avg Request Duration | 14.020s | 11.986s | 10.805s | 10.908s |
| Total Tokens | 1,374,832 | 1,654,896 | 3,304,304 | 5,248,592 |
| Total Errors | 0 | 0 | 0 | 0 |
| Total Calls | 376 | 392 | 536 | 616 |
| Percentage Passed | 81.12% | 73.72% | 71.27% | 55.84% |
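To read cost and quality together, one option is to derive cost per call and cost per passing call from the table. The sketch below uses only the Total Cost, Total Calls, and Percentage Passed rows; it assumes the pass percentage is measured over Total Calls, which the table does not state. The `ModelRun` class and `rank_by_cost_per_pass` helper are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    name: str
    total_cost_usd: float
    total_calls: int
    pass_rate_pct: float

    @property
    def passed_calls(self):
        # Assumption: the pass percentage is taken over total calls.
        return round(self.total_calls * self.pass_rate_pct / 100)

    @property
    def cost_per_call(self):
        return self.total_cost_usd / self.total_calls

    @property
    def cost_per_pass(self):
        return self.total_cost_usd / self.passed_calls


# Values copied from the comparison table.
RUNS = [
    ModelRun("GPT-5 mini", 0.659436, 376, 81.12),
    ModelRun("GPT-5 nano", 0.225682, 392, 73.72),
    ModelRun("GPT-OSS 120B Fireworks", 0.542208, 536, 71.27),
    ModelRun("GPT-OSS 20B Fireworks", 0.390457, 616, 55.84),
]


def rank_by_cost_per_pass(runs):
    """Sort runs from cheapest to most expensive per passing call."""
    return sorted(runs, key=lambda run: run.cost_per_pass)


if __name__ == "__main__":
    for run in rank_by_cost_per_pass(RUNS):
        print(f"{run.name}: ${run.cost_per_call:.6f} per call, "
              f"${run.cost_per_pass:.6f} per passing call")
```

Under that assumption, GPT-5 nano is the cheapest per passing call at roughly $0.0008 and GPT-5 mini the most expensive at roughly $0.0022, even though GPT-5 mini has the highest pass rate. Cost per pass is only one lens; it says nothing about how costly a failed output is for a given workflow.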
Explore the current monitoring surfaces
These pages connect monitoring and improvement back to measurable outcomes and rollout planning.