Use run analytics on an experiment run detail page to compare how candidates performed inside that specific run. Run analytics are calculated from stored experiment result cells. They do not re-run prompts or agents, and they do not read the current source prompt or agent configuration.

Scope

Run analytics are scoped to one experiment run. Use them when you want to compare candidates on the same dataset version and run conditions.

Metrics

| Metric | Meaning |
| --- | --- |
| Success rate | Completed result cells divided by all result cells in the selected scope |
| Total cost | Stored cost for all selected result cells |
| Average cost | Total cost divided by selected result cell count |
| Total tokens | Stored token usage for all selected result cells |
| Input tokens | Stored input token usage |
| Output tokens | Stored output token usage |
| Average tokens | Total tokens divided by selected result cell count |
| Average duration | Average duration for completed result cells |
| p95 duration | 95th percentile duration for completed result cells |

Failed result cells still count toward status totals. If a failed cell recorded cost or tokens before failing, those values remain included in cost and token totals. Duration averages and percentiles only use completed result cells.
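The metric definitions above can be sketched in a few lines of Python. This is an illustration only, not Fetch Hive's implementation: the `ResultCell` shape and its field names are hypothetical, total tokens is assumed to be input plus output tokens, and the nearest-rank percentile is one common method chosen here because the page does not state which one the product uses.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultCell:
    # Hypothetical shape of a stored result cell; field names are assumptions.
    candidate: str
    status: str  # "completed" or "failed"
    cost: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    duration_ms: Optional[float] = None


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed method, not a documented one)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(cells):
    """Compute run metrics for the selected result cells."""
    total = len(cells)
    if total == 0:
        return None
    completed = [c for c in cells if c.status == "completed"]
    # Duration metrics only use completed cells.
    durations = [c.duration_ms for c in completed if c.duration_ms is not None]
    # Cost and token totals include failed cells that recorded usage.
    total_cost = sum(c.cost for c in cells)
    input_tokens = sum(c.input_tokens for c in cells)
    output_tokens = sum(c.output_tokens for c in cells)
    total_tokens = input_tokens + output_tokens  # assumption: total = input + output
    return {
        "success_rate": len(completed) / total,
        "total_cost": total_cost,
        "average_cost": total_cost / total,
        "total_tokens": total_tokens,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "average_tokens": total_tokens / total,
        "average_duration_ms": sum(durations) / len(durations) if durations else None,
        "p95_duration_ms": percentile_nearest_rank(durations, 95) if durations else None,
    }


cells = [
    ResultCell("a", "completed", 0.02, 100, 50, 800.0),
    ResultCell("a", "completed", 0.04, 200, 100, 1200.0),
    ResultCell("a", "failed", 0.01, 80, 0, None),  # failed, but usage was recorded
    ResultCell("a", "completed", 0.03, 120, 30, 1000.0),
]
summary = summarize(cells)
print(summary)
```

In this sample the failed cell lowers the success rate and still contributes its cost and tokens, while the duration metrics are computed from the three completed cells only.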

Candidate charts

Candidate charts group results by experiment candidate. Use these charts to identify:
  • the cheapest candidate
  • the fastest candidate
  • the slowest candidate
  • high-token candidates
  • candidates with more failures
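The grouping behind these comparisons can be sketched as follows. The dictionary keys (`candidate`, `status`, `cost`, `tokens`, `duration_ms`) and the candidate names are hypothetical stand-ins for stored result cell fields, and the sketch assumes every cell is either completed or failed. It is not the product's code.

```python
from collections import defaultdict


def per_candidate(cells):
    """Group result cells by candidate and compute comparison stats."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell["candidate"]].append(cell)

    stats = {}
    for name, group in groups.items():
        completed = [c for c in group if c["status"] == "completed"]
        stats[name] = {
            "total_cost": sum(c["cost"] for c in group),
            "total_tokens": sum(c["tokens"] for c in group),
            "failures": len(group) - len(completed),
            # Durations only use completed cells.
            "avg_duration_ms": (
                sum(c["duration_ms"] for c in completed) / len(completed)
                if completed
                else None
            ),
        }
    return stats


cells = [
    {"candidate": "prompt-v1", "status": "completed", "cost": 0.02, "tokens": 150, "duration_ms": 900},
    {"candidate": "prompt-v1", "status": "failed", "cost": 0.01, "tokens": 40, "duration_ms": None},
    {"candidate": "prompt-v2", "status": "completed", "cost": 0.05, "tokens": 400, "duration_ms": 600},
    {"candidate": "prompt-v2", "status": "completed", "cost": 0.04, "tokens": 380, "duration_ms": 700},
]

stats = per_candidate(cells)
cheapest = min(stats, key=lambda n: stats[n]["total_cost"])
# Skip candidates with no completed cells when ranking by speed.
timed = {n: s for n, s in stats.items() if s["avg_duration_ms"] is not None}
fastest = min(timed, key=lambda n: timed[n]["avg_duration_ms"])
most_failures = max(stats, key=lambda n: stats[n]["failures"])
print(cheapest, fastest, most_failures)
```

Here the cheapest candidate is also the one with a failure, which is the kind of trade-off the charts are meant to surface.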

Metadata filters

Analytics can be filtered by dataset row metadata. For example, if your CSV includes metadata.topic, you can filter analytics to a specific topic and compare candidates only for those rows. Metadata filters use dataset row metadata from the experiment run, not request metadata.
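As a rough illustration of that filter, the sketch below keeps only result cells whose dataset row metadata matches a topic and then compares success rates per candidate. The `row_metadata` and `request_metadata` keys are hypothetical names, and mapping a `metadata.topic` CSV column to a `topic` key is an assumption made for the example.

```python
def filter_by_metadata(cells, key, value):
    """Keep cells whose dataset row metadata has key == value.

    Request metadata is deliberately ignored, mirroring the rule above.
    """
    return [c for c in cells if c.get("row_metadata", {}).get(key) == value]


def success_rate_by_candidate(cells):
    """Completed cells divided by all cells, per candidate."""
    totals, completed = {}, {}
    for c in cells:
        name = c["candidate"]
        totals[name] = totals.get(name, 0) + 1
        if c["status"] == "completed":
            completed[name] = completed.get(name, 0) + 1
    return {name: completed.get(name, 0) / totals[name] for name in totals}


cells = [
    # The request metadata says "shipping", but only row metadata is consulted.
    {"candidate": "A", "status": "completed",
     "row_metadata": {"topic": "billing"}, "request_metadata": {"topic": "shipping"}},
    {"candidate": "A", "status": "failed", "row_metadata": {"topic": "billing"}},
    {"candidate": "B", "status": "completed", "row_metadata": {"topic": "billing"}},
    {"candidate": "B", "status": "completed", "row_metadata": {"topic": "shipping"}},
]

billing_cells = filter_by_metadata(cells, "topic", "billing")
rates = success_rate_by_candidate(billing_cells)
print(rates)
```

Filtering first and aggregating second keeps the comparison limited to the same rows for every candidate.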

TTFT

Time to first token is not shown in experiment analytics yet. Fetch Hive currently shows reliable stored duration, cost, token, and status metrics for experiment results. TTFT will be added after first-token timing is captured consistently across dashboard prompts, deployed prompt invokes, and agents.

See also: Review results, Run an experiment, and Datasets