Use run results to compare candidate behavior across dataset rows. The default view is a result-cell table. Each row represents one dataset row run against one candidate.

Result columns

Result tables can include:
| Column | Meaning |
| --- | --- |
| Dataset row | The row position from the dataset version |
| Candidate | The prompt or agent candidate |
| Status | Pending, running, completed, failed, cancelled, or another run state |
| Output preview | Short preview of the generated output |
| Duration | Time spent on the result cell |
| Tokens | Token usage when available |
| Cost | Cost recorded for the result cell |
| Request ID | The linked request created by the normal prompt or agent execution flow |

How do I filter results?

Use the filters at the top of the result view. You can filter by candidate, status, dataset row, or text search. Use filters when you want to focus on failures, compare one candidate at a time, or inspect a specific case.

How do I export results?

Click Export CSV from the run results table. The export includes every result that matches the current filters, not only the visible page. CSV exports include dataset inputs, expected output, metadata, candidate details, output or error text, request IDs, timing, token usage, cost, and timestamps.
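
Because the export carries expected output, output text, status, and cost for every matching result, it can be summarized offline. The sketch below is one way to do that in Python, not an official Fetch Hive tool. The column names (`candidate`, `status`, `output`, `expected_output`, `cost`) are assumptions for illustration; check the header row of a real export and adjust the constants. The exact-match count is a crude manual check, not a scoring method.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: confirm against the header row of a real export.
COL_CANDIDATE = "candidate"
COL_STATUS = "status"
COL_OUTPUT = "output"
COL_EXPECTED = "expected_output"
COL_COST = "cost"


def summarize(csv_file):
    """Return per-candidate row counts, failures, total cost, and exact matches."""
    summary = defaultdict(
        lambda: {"rows": 0, "failed": 0, "cost": 0.0, "exact_matches": 0}
    )
    for row in csv.DictReader(csv_file):
        stats = summary[row[COL_CANDIDATE]]
        stats["rows"] += 1
        if (row.get(COL_STATUS) or "").strip().lower() == "failed":
            stats["failed"] += 1
        # Cost may be blank when it was not recorded for a result.
        cost = (row.get(COL_COST) or "").strip()
        if cost:
            stats["cost"] += float(cost)
        # Count a match only when the row actually has an expected output.
        expected = (row.get(COL_EXPECTED) or "").strip()
        if expected and (row.get(COL_OUTPUT) or "").strip() == expected:
            stats["exact_matches"] += 1
    return dict(summary)


if __name__ == "__main__":
    # In-memory sample standing in for an exported file.
    sample = io.StringIO(
        "candidate,status,output,expected_output,cost\n"
        "A,completed,4,4,0.01\n"
        "A,failed,,4,\n"
        "B,completed,5,4,0.02\n"
    )
    for name, stats in summarize(sample).items():
        print(name, stats)
```

To run it on a real export, pass an open file instead of the sample, for example `summarize(open("export.csv", newline="", encoding="utf-8"))`, where `export.csv` is a placeholder for the downloaded file.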

How do I inspect a result?

Click a result row. Fetch Hive opens the same request detail sheet used by the normal logs area. This keeps experiment review aligned with live prompt, workflow, and agent request inspection instead of using a separate experiment-only detail surface. Use request details when you need provider settings, inputs, metadata, trace availability, cost, timing information, completions, workflow runs, or agent run context. If a result failed before a request was created, the row remains visible with the failure status and any stored output or error summary.

How expected output appears

If the dataset row includes expected_output, use it as the reference answer while reviewing results. Evaluator execution is not enabled yet. This means Fetch Hive does not currently mark a result correct or incorrect automatically.

See also: Datasets, Run an experiment, and Log history