Skip to main content
Auto Summarize keeps long agent conversations working by compressing prior history when it approaches the model’s context limit.

What it does

Every agent has this tool enabled by default. Before each turn, Fetch Hive checks whether the accumulated conversation history is approaching the model’s context window. If it is, the prior turns are automatically summarised into a single compact message, and the agent continues with that summary as its starting context instead of the raw history. The agent retains full awareness of what was discussed — it just works from a condensed version of the earlier turns rather than every token verbatim.

How it fires

Auto Summarize is not a tool the agent calls. It runs as a server-side check before the model ever sees the conversation. The agent and the LLM are unaware of it — from the model’s perspective, it simply receives a well-formed conversation history that fits its context window. When summarization fires during a streaming run, a summary event arrives at the start of the stream before any response tokens:
{
  "type": "summary",
  "summary_text": "The conversation covered AI infrastructure trends. The user asked about evals...",
  "original_token_count": 15234,
  "context_limit": 200000,
  "model": "gpt-4.1",
  "provider": "openai"
}
In the Chat panel inside the agent editor, a Chat summarized accordion appears in the conversation at the point where summarization occurred. Click it to expand and read the full summary text and token counts. See Run with API for how to handle this event in your own integration.

Enabling and disabling

The tool node appears on every agent canvas with a System badge. To disable it for a specific agent:
  1. Select the Auto Summarize node in the editor.
  2. In the settings sheet, switch the toggle to Disabled.
Disabling it means the agent will send the full raw history on every turn. If the conversation grows beyond the model’s context limit, the oldest messages will be truncated by the model provider.

Configuration

There are no per-agent configuration options for this tool. The summarization threshold and the model used to write summaries are set at the platform level by your workspace operator.

Use cases

  • Long support or research conversations that span many turns without losing earlier context.
  • Agents running in thread_id mode where conversations persist across multiple sessions.
  • Any use case where you want the agent to stay coherent over a long interaction without manual history management.

Notes

  • Auto Summarize only fires on persistent threads (calls that include a thread_id). Single-shot calls and stateless history passed via the messages field are not affected.
  • The summarization call is made by Fetch Hive — it does not count against your token usage for that turn.
  • If the summarization service is unavailable for any reason, the agent run proceeds normally with the full history. The feature is fail-open.
  • To test it, use Chat in the agent editor — the Chat summarized accordion appears when the threshold is crossed.