Agent Snapshot

Samantha

Customer Support Specialist

Today

$33.08

92 resolved tickets

Pass Rate

94%

Across live eval checks

Review Queue

4 items

2 policy, 2 knowledge edits

Cost Per Day

7-Day Spend

Day	Total	Tokens	Voice	Web search
Mon	$31.84	$10.23	$21.17	$0.44
Tue	$28.62	$9.08	$19.15	$0.39
Wed	$36.41	$11.62	$24.27	$0.52
Thu	$42.18	$13.20	$28.50	$0.48
Fri	$38.95	$12.81	$25.59	$0.55
Sat	$26.47	$8.12	$17.99	$0.36
Sun	$33.08	$9.26	$23.24	$0.58

Models Used

OpenAI 5.3 mini

Primary ticket resolution and email replies

52%

Anthropic Sonnet 2.6

Escalation handling and approval drafts

31%

Kimi 2.5

Knowledge lookups and research-heavy flows

17%

Live Evals

Answer relevancy: 94%

Resolution accuracy: 91%

Escalation correctness: 97%

Policy adherence: 89%

Voice

70% of today's spend

Tokens

28% of today's spend

Web Search

$0.58 today

Monitoring & Improvement

See what each agent costs and how it performs

Monitor cost per day, inspect where spend came from, compare model mix, and use evals plus operator review to keep each specialist improving over time.

What Monitoring & Improvement Covers

Cost visibility

Break down daily spend across tokens, voice, and web search so teams can see what the role actually costs to run.

Model mix and routing

Track which models handled the work and where a different route can improve cost, speed, or quality.

Evals and operator review

Measure pass rates, catch regressions, and keep people attached to the outputs that still need judgment.
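
As a rough illustration of the cost breakdown described above, the sketch below computes each category's share of one day's spend. The dollar figures are Sunday's values from the Agent Snapshot; the function name `spend_shares` and the dictionary layout are assumptions made for this example, not part of any product API.

```python
def spend_shares(costs):
    """Return each category's share of total spend, as a percentage."""
    total = sum(costs.values())
    if total == 0:
        # Avoid dividing by zero on a day with no recorded spend.
        return {name: 0.0 for name in costs}
    return {name: 100 * amount / total for name, amount in costs.items()}


# Sunday's spend from the snapshot, in dollars.
sunday = {"tokens": 9.26, "voice": 23.24, "web_search": 0.58}

shares = spend_shares(sunday)
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%")
```

Rounded to whole percentages, this reproduces the snapshot's figures: voice at 70% and tokens at 28% of the day's spend.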

Think operating dashboard, not black box.


What Gets Monitored

The feedback loop behind every agent

Monitoring & Improvement makes cost, quality, model usage, and approvals visible in one place so operators can tune the role with evidence instead of guesswork.

Cost and usage

Track spend by day, role, and workflow, then inspect which parts came from tokens, voice, or web search.

Models and routing

See which models handled the work, how traffic is split, and where a lower-cost or higher-quality route makes sense.

Evals and quality

Measure pass rates, catch regressions, and compare quality signals across the work the agent is actually doing.

Review and approvals

Keep operators in the loop on sensitive outputs, failed checks, and the decisions that should never auto-run.
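
One way to picture the eval and review loop above is a simple threshold check that routes weak eval checks to an operator. This is a minimal sketch, not the product's actual logic: the pass rates are the Live Evals values from the Agent Snapshot, while `flag_for_review` and the 90% cutoff are hypothetical names and values chosen for illustration.

```python
def flag_for_review(pass_rates, threshold):
    """Return the checks whose pass rate is below the threshold, lowest first."""
    failing = {name: rate for name, rate in pass_rates.items() if rate < threshold}
    return sorted(failing, key=failing.get)


# Live eval pass rates from the snapshot, in percent.
live_evals = {
    "answer_relevancy": 94,
    "resolution_accuracy": 91,
    "escalation_correctness": 97,
    "policy_adherence": 89,
}

# Hypothetical cutoff: anything below 90% goes to an operator.
REVIEW_THRESHOLD = 90

needs_review = flag_for_review(live_evals, REVIEW_THRESHOLD)
print(needs_review)
```

With these numbers only policy adherence (89%) falls below the assumed cutoff. A real deployment would more likely compare each check against its own baseline, which the source does not publish.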

Models And Evals

Compare cost, quality, and pass rates in one operating view

Use evals and model comparison together so teams can see which routes are cheaper, which are sharper, and where a change actually improves the work.

Metric	GPT-5 mini	GPT-5 nano	GPT-OSS 120B Fireworks	GPT-OSS 20B Fireworks
Total Cost	$0.659436	$0.225682	$0.542208	$0.390457
Avg Request Duration	14.020s	11.986s	10.805s	10.908s
Total Tokens	1,374,832	1,654,896	3,304,304	5,248,592
Total Error	0	0	0	0
Total Calls	376	392	536	616
Percentage Passed	81.12%	73.72%	71.27%	55.84%
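
To compare routes on cost and quality together, one option is to derive cost per call and cost per passed call from the table above. The sketch below does that under a stated assumption: that Percentage Passed is measured over Total Calls, which the table does not confirm. The `ModelRun` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    name: str
    total_cost: float  # dollars
    total_calls: int
    pct_passed: float  # 0 to 100

    @property
    def cost_per_call(self):
        return self.total_cost / self.total_calls

    @property
    def cost_per_pass(self):
        # Assumption: pct_passed is the share of total_calls that passed.
        passed = self.total_calls * self.pct_passed / 100
        return self.total_cost / passed


# Values copied from the comparison table.
runs = [
    ModelRun("GPT-5 mini", 0.659436, 376, 81.12),
    ModelRun("GPT-5 nano", 0.225682, 392, 73.72),
    ModelRun("GPT-OSS 120B Fireworks", 0.542208, 536, 71.27),
    ModelRun("GPT-OSS 20B Fireworks", 0.390457, 616, 55.84),
]

# List the routes from cheapest to most expensive per passed call.
for run in sorted(runs, key=lambda r: r.cost_per_pass):
    print(f"{run.name}: ${run.cost_per_call:.6f}/call, ${run.cost_per_pass:.6f}/pass")

cheapest_per_pass = min(runs, key=lambda r: r.cost_per_pass).name
highest_pass_rate = max(runs, key=lambda r: r.pct_passed).name
```

On these figures GPT-5 nano is the cheapest per passed call while GPT-5 mini has the highest pass rate, which is the kind of trade-off the comparison view is meant to surface.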

[Charts: pass and fail results per model for the answer relevancy and citation check evals, covering GPT-5 mini, GPT-5 nano, GPT-OSS 120B Fireworks, and GPT-OSS 20B Fireworks]

Explore the current monitoring surfaces

These pages connect monitoring and improvement back to measurable outcomes and rollout planning.

Customer Stories

See how teams turn cost visibility, QA coverage, and measurable ROI into production proof.

Pricing

See how platform access, included usage, and rollout support are packaged for ongoing improvement.