Agent Snapshot
Samantha
Customer Support Specialist
Today
$33.08
92 resolved tickets
Pass Rate
94%
Across live eval checks
Review Queue
4 items
2 policy, 2 knowledge edits
Cost Per Day
7 Day Spend
Day   Total    Tokens   Voice    Web search
Mon   $31.84   $10.23   $21.17   $0.44
Tue   $28.62   $9.08    $19.15   $0.39
Wed   $36.41   $11.62   $24.27   $0.52
Thu   $42.18   $13.20   $28.50   $0.48
Fri   $38.95   $12.81   $25.59   $0.55
Sat   $26.47   $8.12    $17.99   $0.36
Sun   $33.08   $9.26    $23.24   $0.58
Models Used
OpenAI 5.3 mini
Primary ticket resolution and email replies
52%
Anthropic Sonnet 2.6
Escalation handling and approval drafts
31%
Kimi 2.5
Knowledge lookups and research-heavy flows
17%
Live Evals
Answer relevancy: 94%
Resolution accuracy: 91%
Escalation correctness: 97%
Policy adherence: 89%
Voice
70% of today's spend
Tokens
28% of today's spend
Web Search
$0.58 today
Monitoring & Improvement

See what each agent costs and how it performs

Monitor cost per day, inspect where spend came from, compare model mix, and use evals plus operator review to keep each specialist improving over time.

What Monitoring & Improvement Covers
Cost visibility

Break down daily spend across tokens, voice, and web search so teams can see what the role actually costs to run.

Model mix and routing

Track which models handled the work and where a different route can improve cost, speed, or quality.

Evals and operator review

Measure pass rates, catch regressions, and keep people attached to the outputs that still need judgment.

Think operating dashboard, not black box.
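
As a rough illustration of the cost breakdown described above, the sketch below takes one day's spend per category and works out each category's share of the total. The figures are Sunday's values from the 7 Day Spend panel; the dictionary keys and the function name `spend_shares` are assumptions made for this example, not part of any product API.

```python
# Sunday's spend per category in US dollars, copied from the 7 Day Spend panel.
# The key names are illustrative only.
sunday = {"tokens": 9.26, "voice": 23.24, "web_search": 0.58}


def spend_shares(costs):
    """Return each category's share of total spend as a whole-number percentage."""
    total = sum(costs.values())
    return {name: round(100 * amount / total) for name, amount in costs.items()}


# Round the total to cents to avoid floating-point noise in the sum.
total = round(sum(sunday.values()), 2)
shares = spend_shares(sunday)

print(f"Total: ${total:.2f}")
for name, pct in shares.items():
    print(f"{name}: {pct}% of the day's spend")
```

With these inputs the voice share comes out at 70% and the token share at 28%, matching the Voice and Tokens tiles in the snapshot.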

What Gets Monitored

The feedback loop behind every agent

Monitoring & Improvement makes cost, quality, model usage, and approvals visible in one place so operators can tune the role with evidence instead of guesswork.

Cost and usage

Track spend by day, role, and workflow, then inspect which parts came from tokens, voice, or web search.

Models and routing

See which models handled the work, how traffic is split, and where a lower-cost or higher-quality route makes sense.

Evals and quality

Measure pass rates, catch regressions, and compare quality signals across the work the agent is actually doing.

Review and approvals

Keep operators in the loop on sensitive outputs, failed checks, and the decisions that should never auto-run.
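
Catching a regression, as mentioned under Evals and quality, can be pictured as comparing current pass rates against an earlier baseline. The sketch below is only one way to do that. The current rates are the Live Evals values from the snapshot, while the baseline figures, the 2-point tolerance, and the function name `find_regressions` are hypothetical and chosen purely for illustration.

```python
def find_regressions(baseline, current, tolerance=2.0):
    """Return checks whose pass rate dropped by more than `tolerance` points.

    Each value in the result is a (baseline_rate, current_rate) pair.
    """
    return {
        name: (baseline[name], rate)
        for name, rate in current.items()
        if name in baseline and baseline[name] - rate > tolerance
    }


# Current pass rates (percent), taken from the Live Evals panel.
current = {
    "answer_relevancy": 94.0,
    "resolution_accuracy": 91.0,
    "escalation_correctness": 97.0,
    "policy_adherence": 89.0,
}

# HYPOTHETICAL baseline, invented for this example only.
baseline = {
    "answer_relevancy": 95.0,
    "resolution_accuracy": 90.0,
    "escalation_correctness": 97.0,
    "policy_adherence": 93.0,
}

regressions = find_regressions(baseline, current)
for name, (before, after) in regressions.items():
    print(f"{name}: {before}% -> {after}% (send to operator review)")
```

Under these assumed numbers only policy adherence is flagged, since it is the only check that falls by more than the tolerance.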

Models And Evals

Compare cost, quality, and pass rates in one operating view

Use evals and model comparison together so teams can see which routes are cheaper, which are sharper, and where a change actually improves the work.

Metric                 GPT-5 mini   GPT-5 nano   GPT-OSS 120B Fireworks   GPT-OSS 20B Fireworks
Total Cost             $0.659436    $0.225682    $0.542208                $0.390457
Avg Request Duration   14.020s      11.986s      10.805s                  10.908s
Total Tokens           1,374,832    1,654,896    3,304,304                5,248,592
Total Error            0            0            0                        0
Total Calls            376          392          536                      616
Percentage Passed      81.12%       73.72%       71.27%                   55.84%
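
One way to read the comparison table above is to combine cost and pass rate into a single figure, such as cost per passing call. The sketch below does that with the table's numbers. The "cost per pass" metric, the `ModelRun` class, and the assumption that Percentage Passed applies to Total Calls are illustrative choices for this example, not something the page defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    name: str
    total_cost: float  # US dollars, from the Total Cost row
    calls: int         # from the Total Calls row
    pass_rate: float   # percent, from the Percentage Passed row

    @property
    def passed_calls(self) -> int:
        # Assumes the pass rate is measured over all calls.
        return round(self.calls * self.pass_rate / 100)

    @property
    def cost_per_pass(self) -> float:
        return self.total_cost / self.passed_calls


runs = [
    ModelRun("GPT-5 mini", 0.659436, 376, 81.12),
    ModelRun("GPT-5 nano", 0.225682, 392, 73.72),
    ModelRun("GPT-OSS 120B Fireworks", 0.542208, 536, 71.27),
    ModelRun("GPT-OSS 20B Fireworks", 0.390457, 616, 55.84),
]

cheapest = min(runs, key=lambda r: r.cost_per_pass)
most_accurate = max(runs, key=lambda r: r.pass_rate)

for run in runs:
    print(f"{run.name}: {run.passed_calls} passed, ${run.cost_per_pass:.6f} per pass")
print(f"Lowest cost per pass: {cheapest.name}")
print(f"Highest pass rate: {most_accurate.name}")
```

On these figures the highest pass rate and the lowest cost per passing call belong to different models, which is the kind of trade-off the combined view is meant to surface.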
[Charts: pass/fail breakdown for the answer relevancy and Citation Check evals, shown per model: GPT-5 mini, GPT-5 nano, GPT-OSS 120B Fireworks, GPT-OSS 20B Fireworks]
Monitoring & Improvement - BotDojo