Use Experiments when you want to compare prompts or agents against the same set of inputs. An experiment lets you:
  • upload or select a dataset
  • add prompt and agent candidates
  • run every dataset row against every candidate
  • review outputs, usage, cost, and failures in one place
Experiments are useful when you want to test quality before publishing a change, compare models or prompt approaches, or run the same cases against an agent and a prompt.

What you’ll find here

  • Datasets — Upload CSV files, map columns, and understand dataset format
  • Add candidates — Add dashboard prompts, deployed prompts, and agents
  • Build an experiment — Create an experiment and prepare it for a run
  • Run an experiment — Start, track, and cancel experiment runs
  • Review results — Compare outputs, open request details, and inspect failures
  • Run analytics — Compare run cost, tokens, latency, and success rate
  • Evaluators — Understand current evaluator status and planned evaluator types

How experiments work

An experiment combines a dataset with one or more candidates.

A dataset is a set of rows. Each row contains input values, optional expected output, and optional metadata.

A candidate is the prompt or agent you want to test. Fetch Hive captures a snapshot when you add the candidate so later edits to the source prompt or agent do not change that candidate inside the experiment.

A run executes the dataset against the candidates. If you have 100 dataset rows and three candidates, the run has 300 result cells. Your current plan limits how many result cells a new run can create. Existing experiments and past runs remain available if your plan changes, but new runs must fit your current plan.

Each result cell stores the candidate output, status, duration, usage, cost, and links to request details when available.
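
The result cell arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not Fetch Hive code: `DatasetRow`, `result_cell_count`, `fits_plan`, and the `max_cells` limit are hypothetical names and values chosen for the example, and your real plan limit may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRow:
    """Hypothetical shape of one dataset row, for illustration only."""
    inputs: dict[str, str]
    expected_output: Optional[str] = None            # optional expected output
    metadata: dict[str, str] = field(default_factory=dict)  # optional metadata


def result_cell_count(row_count: int, candidate_count: int) -> int:
    """Every row runs against every candidate, so cells = rows * candidates."""
    return row_count * candidate_count


def fits_plan(row_count: int, candidate_count: int, max_cells: int) -> bool:
    """Check a planned run against an assumed per-run cell limit."""
    return result_cell_count(row_count, candidate_count) <= max_cells


# 100 rows and three candidates, matching the example in the text.
rows = [DatasetRow(inputs={"question": f"case {i}"}) for i in range(100)]
candidates = ["prompt-a", "prompt-b", "agent-c"]  # placeholder candidate names

cells = result_cell_count(len(rows), len(candidates))
print(cells)  # 300
print(fits_plan(len(rows), len(candidates), max_cells=250))  # False
```

Checking the count before you start a run shows whether you need to trim the dataset or drop a candidate to stay within your plan.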

Current scope

Experiments currently target prompts and agents. Dataset upload supports CSV files in the dashboard. Server-side imports, evaluator execution, workflow candidates, experiment-local model overrides, and custom evaluator code are planned future additions.

See also: Prompts, Agents, and Log history