Context Management

Overview

Long conversations can exceed the model's context window. fm.cr provides tools to monitor context usage, export and restore conversation state, and automatically compact sessions when needed.

Transcripts

Export the current conversation as JSON for persistence or analysis:

# Save conversation state
json = session.transcript_json

# Restore later
restored = Fm::Session.from_transcript(model, json)
response = restored.respond("Continue our conversation.")

# Restore with instructions and tools
restored = Fm::Session.from_transcript(model, json,
  instructions: "You are a helpful assistant.",
  tools: [my_tool]
)

Convert a transcript to readable text:

text = Fm.transcript_to_text(session.transcript_json)
puts text

Context Usage Estimation

Monitor how much of the context window is consumed:

limit = Fm::ContextLimit.default_on_device  # 4096 tokens
usage = Fm.context_usage_from_transcript(session.transcript_json, limit)

puts "Estimated tokens: #{usage.estimated_tokens}"
puts "Available tokens: #{usage.available_tokens}"
puts "Utilization: #{(usage.utilization * 100).round(1)}%"
puts "Over limit: #{usage.over_limit?}"

ContextLimit

Configure the context window parameters:

# Default on-device limit (4096 tokens)
limit = Fm::ContextLimit.default_on_device

# Custom limit
limit = Fm::ContextLimit.new(
  max_tokens: 4096,
  reserved_response_tokens: 512,
  chars_per_token: 4
)

ContextUsage

The ContextUsage struct provides these fields:

Field                     Type     Description
estimated_tokens          Int32    Estimated tokens consumed by transcript
max_tokens                Int32    Maximum tokens configured
reserved_response_tokens  Int32    Tokens reserved for next response
available_tokens          Int32    Estimated tokens remaining
utilization               Float32  Usage ratio (0.0 - 1.0+)
over_limit?               Bool     Whether estimate exceeds budget
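The fields are related by simple budget arithmetic. The sketch below (in Ruby, which reads close to Crystal) shows one plausible way the numbers fit together under a character-count heuristic; it is an illustration, not fm.cr's actual implementation, and the exact rounding and utilization formula may differ:

```ruby
# Illustrative token-budget arithmetic (hypothetical, not fm.cr's code).
# Assumes tokens are estimated as chars / chars_per_token, rounded up.
max_tokens               = 4096
reserved_response_tokens = 512
chars_per_token          = 4

transcript_chars = 6000
estimated_tokens = (transcript_chars.to_f / chars_per_token).ceil  # 1500

budget           = max_tokens - reserved_response_tokens           # 3584
available_tokens = budget - estimated_tokens                       # 2084
utilization      = estimated_tokens.to_f / budget                  # ~0.419
over_limit       = available_tokens < 0                            # false
```

With these numbers, roughly 42% of the usable budget is consumed, so compaction would not yet be triggered.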

Automatic Compaction

When a conversation gets too long, compact it by summarizing earlier messages and starting a fresh session:

limit = Fm::ContextLimit.default_on_device

if result = Fm.compact_session_if_needed(model, session, limit, base_instructions: "Be helpful.")
  session = result.session
  puts "Compacted. Summary: #{result.summary}"
end

The method returns nil if compaction is not needed (context is within limits).
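Conceptually, the compact-if-needed flow checks the token estimate against the budget, summarizes older messages when it is exceeded, and seeds a shorter conversation from the summary. The Ruby sketch below illustrates only that control flow, using toy stand-ins (an array of strings and a trivial summarizer); the names and logic are hypothetical and are not how fm.cr implements compaction internally:

```ruby
# Toy sketch of compact-if-needed control flow (hypothetical names).
CHARS_PER_TOKEN = 4
MAX_TOKENS      = 20  # tiny budget so this example triggers compaction

def estimate_tokens(messages)
  (messages.sum(&:length).to_f / CHARS_PER_TOKEN).ceil
end

# Returns nil when the conversation fits; otherwise returns a new,
# shorter conversation: a summary of earlier messages plus the latest one.
def compact_if_needed(messages)
  return nil if estimate_tokens(messages) <= MAX_TOKENS
  older  = messages[0...-1]
  recent = messages[-1]
  summary = "Summary of #{older.size} earlier messages"
  [summary, recent]
end

messages = [
  "User asked about the weather in considerable detail.",
  "Assistant replied with a long multi-paragraph forecast.",
  "User: and tomorrow?"
]
compacted = compact_if_needed(messages)
```

As with Fm.compact_session_if_needed, the sketch returns nil when nothing needs to be done, so callers can use the result directly in a conditional.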

CompactionConfig

Customize the compaction behavior:

config = Fm::CompactionConfig.new(
  chunk_tokens: 800,
  max_summary_tokens: 400,
  instructions: "Summarize the conversation concisely.",
  summary_options: Fm::GenerationOptions.new(
    temperature: 0.2,
    max_response_tokens: 256_u32
  )
)

result = Fm.compact_session_if_needed(model, session, limit, config: config)

Manual Compaction

Compact a transcript directly without checking limits:

summary = Fm.compact_transcript(model, session.transcript_json)
puts summary

Create a new session from a summary:

new_session = Fm.session_from_summary(model, "Be helpful.", summary)

Token Estimation

Estimate tokens for arbitrary text:

tokens = Fm.estimate_tokens("Hello, world!", chars_per_token: 4)
puts tokens  # => 4  (13 characters at ~4 chars per token)