Context Management
Overview
Long conversations can exceed the model's context window. fm.cr provides tools to monitor context usage, export and restore conversation state, and automatically compact sessions when needed.
Transcripts
Export the current conversation as JSON for persistence or analysis:
# Save conversation state
json = session.transcript_json
# Restore later
restored = Fm::Session.from_transcript(model, json)
response = restored.respond("Continue our conversation.")
# Restore with instructions and tools
restored = Fm::Session.from_transcript(model, json,
  instructions: "You are a helpful assistant.",
  tools: [my_tool]
)
Convert a transcript to readable text:
text = Fm.transcript_to_text(session.transcript_json)
puts text
Context Usage Estimation
Monitor how much of the context window is consumed:
limit = Fm::ContextLimit.default_on_device # 4096 tokens
usage = Fm.context_usage_from_transcript(session.transcript_json, limit)
puts "Estimated tokens: #{usage.estimated_tokens}"
puts "Available tokens: #{usage.available_tokens}"
puts "Utilization: #{(usage.utilization * 100).round(1)}%"
puts "Over limit: #{usage.over_limit?}"
ContextLimit
Configure the context window parameters:
# Default on-device limit (4096 tokens)
limit = Fm::ContextLimit.default_on_device
# Custom limit
limit = Fm::ContextLimit.new(
  max_tokens: 4096,
  reserved_response_tokens: 512,
  chars_per_token: 4
)
ContextUsage
The ContextUsage struct provides these fields:
| Field | Type | Description |
|---|---|---|
| estimated_tokens | Int32 | Estimated tokens consumed by transcript |
| max_tokens | Int32 | Maximum tokens configured |
| reserved_response_tokens | Int32 | Tokens reserved for next response |
| available_tokens | Int32 | Estimated tokens remaining |
| utilization | Float32 | Usage ratio (0.0 - 1.0+) |
| over_limit? | Bool | Whether estimate exceeds budget |
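The relationship between these fields can be sketched with plain arithmetic. This is an illustration only, not fm.cr's implementation. It assumes the budget is max_tokens minus reserved_response_tokens, that available_tokens is the budget minus the estimate, and that utilization is the estimate divided by the budget. The helper name sketch_usage is hypothetical.

```crystal
# Hypothetical helper, not part of fm.cr.
# Assumption: budget = max_tokens - reserved_response_tokens.
def sketch_usage(estimated_tokens : Int32, max_tokens : Int32, reserved_response_tokens : Int32)
  budget = max_tokens - reserved_response_tokens
  {
    available_tokens: budget - estimated_tokens,
    utilization:      estimated_tokens.to_f / budget,
    over_limit:       estimated_tokens > budget,
  }
end

usage = sketch_usage(3000, 4096, 512)
puts usage[:available_tokens] # 4096 - 512 - 3000 = 584
puts usage[:utilization].round(3)
puts usage[:over_limit] # false, since 3000 does not exceed the 3584-token budget
```

Under these assumptions, a utilization above 1.0 and a true over_limit value describe the same condition: the estimate no longer fits once room for the response is set aside.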
Automatic Compaction
When a conversation gets too long, compact it by summarizing earlier messages and starting a fresh session:
limit = Fm::ContextLimit.default_on_device
if result = Fm.compact_session_if_needed(model, session, limit, base_instructions: "Be helpful.")
  session = result.session
  puts "Compacted. Summary: #{result.summary}"
end
The method returns nil if compaction is not needed (context is within limits).
CompactionConfig
Customize the compaction behavior:
config = Fm::CompactionConfig.new(
  chunk_tokens: 800,
  max_summary_tokens: 400,
  instructions: "Summarize the conversation concisely.",
  summary_options: Fm::GenerationOptions.new(
    temperature: 0.2,
    max_response_tokens: 256_u32
  )
)
result = Fm.compact_session_if_needed(model, session, limit, config: config)
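The chunk_tokens setting suggests that the transcript is summarized in pieces, but how fm.cr actually splits it is not described here. As a rough illustration only, the sketch below cuts text into pieces of at most chunk_tokens estimated tokens, assuming one token is about chars_per_token characters. The helper sketch_chunks is hypothetical, and a real splitter may well respect message boundaries instead of cutting at fixed character counts.

```crystal
# Hypothetical helper, not part of fm.cr.
# Assumption: a chunk holds at most chunk_tokens * chars_per_token characters.
def sketch_chunks(text : String, chunk_tokens : Int32 = 800, chars_per_token : Int32 = 4) : Array(String)
  max_chars = chunk_tokens * chars_per_token
  # Group characters into fixed-size slices and join each slice back into a String.
  text.chars.each_slice(max_chars).map(&.join).to_a
end

chunks = sketch_chunks("x" * 7000)
puts chunks.size      # 7000 chars at 3200 chars per chunk gives 3 chunks
puts chunks.last.size # the final chunk holds the remaining 600 chars
```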
Manual Compaction
Compact a transcript directly without checking limits:
summary = Fm.compact_transcript(model, session.transcript_json)
puts summary
Create a new session from a summary:
new_session = Fm.session_from_summary(model, "Be helpful.", summary)
Token Estimation
Estimate tokens for arbitrary text:
tokens = Fm.estimate_tokens("Hello, world!", chars_per_token: 4)
puts tokens # => 4
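The result above is consistent with a simple characters-per-token heuristic that rounds up: "Hello, world!" has 13 characters, and 13 / 4 rounds up to 4. The sketch below reproduces that arithmetic. It is an assumption about the heuristic, not fm.cr's actual code, and sketch_estimate_tokens is a hypothetical name.

```crystal
# Hypothetical stand-in for the heuristic, not fm.cr's implementation.
# Assumption: tokens = ceil(character count / chars_per_token).
def sketch_estimate_tokens(text : String, chars_per_token : Int32 = 4) : Int32
  (text.size.to_f / chars_per_token).ceil.to_i
end

puts sketch_estimate_tokens("Hello, world!")                     # 13 chars / 4 = 3.25, rounded up to 4
puts sketch_estimate_tokens("Hello, world!", chars_per_token: 2) # 13 chars / 2 = 6.5, rounded up to 7
```

An estimate like this is only approximate, since real tokenizers do not split text at fixed character counts, so it is best treated as a budget guide and not an exact count.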