Optimize the Context Window
Use this when you need to control how the runtime manages conversation history to stay within model token limits.
Prerequisites
awakencrate added toCargo.toml- An agent configured with
AgentSpec
ContextWindowPolicy
Every agent has a ContextWindowPolicy that controls how conversation history is managed. Set it on your AgentSpec:
use awaken::ContextWindowPolicy;
let policy = ContextWindowPolicy {
max_context_tokens: 200_000,
max_output_tokens: 16_384,
min_recent_messages: 10,
enable_prompt_cache: true,
autocompact_threshold: Some(100_000),
compaction_mode: ContextCompactionMode::KeepRecentRawSuffix,
compaction_raw_suffix_messages: 2,
};
Fields
| Field | Type | Default | Description |
|---|---|---|---|
max_context_tokens | usize | 200_000 | Model’s total context window size in tokens |
max_output_tokens | usize | 16_384 | Tokens reserved for model output |
min_recent_messages | usize | 10 | Minimum number of recent messages to always preserve, even if over budget |
enable_prompt_cache | bool | true | Whether to enable prompt caching |
autocompact_threshold | Option<usize> | None | Token count that triggers auto-compaction. None disables auto-compaction |
compaction_mode | ContextCompactionMode | KeepRecentRawSuffix | Strategy used when auto-compaction fires |
compaction_raw_suffix_messages | usize | 2 | Number of recent raw messages to preserve in suffix compaction mode |
Truncation
When the conversation exceeds the available token budget, the runtime automatically drops the oldest messages to fit. The budget is calculated as:
available = max_context_tokens - max_output_tokens - tool_schema_tokens
What truncation preserves
- System messages are never truncated. All system messages at the start of the history survive regardless of budget.
- Recent messages – at least
min_recent_messageshistory messages are kept, even if they exceed the budget. - Tool call/result pairs – the split point is adjusted so that an assistant message with tool calls is never separated from its corresponding tool result messages.
- Dangling tool calls – after truncation, any orphaned tool calls (whose results were dropped) are patched to prevent invalid message sequences.
Artifact compaction
Before truncation runs, oversized tool results are compacted automatically. A tool result whose text exceeds ARTIFACT_COMPACT_THRESHOLD_TOKENS (2048 tokens, estimated at ~8192 characters) is truncated to a preview of at most 1600 characters or 24 lines, whichever is shorter. The preview includes a compaction indicator showing the original size.
Non-tool messages (system, user, assistant) are never subject to artifact compaction.
Compaction
Compaction summarizes older conversation history into a condensed summary message, reducing token usage while preserving context. Unlike truncation (which drops messages), compaction replaces them with a summary.
Enabling auto-compaction
Set autocompact_threshold to trigger compaction when total message tokens exceed that value:
let policy = ContextWindowPolicy {
autocompact_threshold: Some(100_000),
compaction_mode: ContextCompactionMode::KeepRecentRawSuffix,
compaction_raw_suffix_messages: 4,
..Default::default()
};
ContextCompactionMode
Two strategies are available:
KeepRecentRawSuffix(default) – keeps the most recentcompaction_raw_suffix_messagesmessages as raw history. Everything before the compaction boundary is summarized.CompactToSafeFrontier– compacts all messages up to the safe frontier (the latest point where all tool call/result pairs are complete).
The compaction boundary is chosen so that no tool call is separated from its result. The boundary finder walks the message history and only places boundaries where all open tool calls have been resolved.
CompactionConfig
The compaction subsystem is configured through CompactionConfig, stored in the agent spec’s sections["compaction"] and read via CompactionConfigKey:
use awaken::CompactionConfig;
let config = CompactionConfig {
summarizer_system_prompt: "You are a conversation summarizer. \
Preserve all key facts, decisions, tool results, and action items. \
Be concise but complete.".into(),
summarizer_user_prompt: "Summarize the following conversation:\n\n{messages}".into(),
summary_max_tokens: Some(1024),
summary_model: Some("claude-3-haiku".into()),
min_savings_ratio: 0.3,
};
| Field | Type | Default | Description |
|---|---|---|---|
summarizer_system_prompt | String | Conversation summarizer prompt | System prompt for the summarizer LLM call |
summarizer_user_prompt | String | "Summarize...\n\n{messages}" | User prompt template; {messages} is replaced with the conversation transcript |
summary_max_tokens | Option<u32> | None | Maximum tokens for the summary response |
summary_model | Option<String> | None | Model for summarization (defaults to the agent’s model) |
min_savings_ratio | f64 | 0.3 | Minimum token savings ratio (0.0-1.0) to accept a compaction |
The compaction pass only runs when the expected savings ratio exceeds min_savings_ratio. A minimum gain of 1024 tokens (MIN_COMPACTION_GAIN_TOKENS) is also required to justify the summarization LLM call.
DefaultSummarizer
The built-in DefaultSummarizer reads prompts from CompactionConfig and supports cumulative summarization. When a previous summary exists, it asks the LLM to update the existing summary with new conversation rather than re-summarizing everything from scratch.
The transcript renderer filters out Visibility::Internal messages before sending to the summarizer, since system-injected context is re-injected each turn and should not be included in summaries.
Summary storage
Compaction summaries are stored as <conversation-summary> tagged internal system messages. On load, trim_to_compaction_boundary drops all messages before the latest summary message, so already-summarized history is never re-loaded into the context window.
Compaction boundaries are tracked durably via CompactionState, recording the summary text, pre/post token counts, and timestamp for each compaction event.
Truncation recovery
When the LLM stops due to MaxTokens with incomplete tool calls (argument JSON was truncated mid-generation), the runtime can automatically retry by injecting a continuation prompt asking the model to break its work into smaller pieces and continue. The retry count is tracked by TruncationState and bounded by a configurable maximum.
Key Files
crates/awaken-contract/src/contract/inference.rs–ContextWindowPolicy,ContextCompactionModecrates/awaken-runtime/src/context/transform/mod.rs–ContextTransform(truncation)crates/awaken-runtime/src/context/transform/compaction.rs– artifact compactioncrates/awaken-runtime/src/context/compaction.rs– boundary finding, load-time trimmingcrates/awaken-runtime/src/context/summarizer.rs–ContextSummarizer,DefaultSummarizercrates/awaken-runtime/src/context/plugin.rs–CompactionPlugin,CompactionConfig,CompactionStatecrates/awaken-runtime/src/context/truncation.rs–TruncationState, continuation prompts