Fractal Glyph Tape

Agent Memory OS: Dense, fractal, cross-lingual phrase memory.

Intelligent memory retrieval for AI agents. Fractal Glyph Tape (FGT) clusters phrases into families, assigns each family a glyph code, and uses foveated memory allocation to deliver the right context at the right time: a +46.7pp accuracy gain at a 256-token budget on synthetic multi-turn dialogs.

# Encode text to glyph representation
$ echo "Can you send me that file?" | fgt encode
谷阜
# Decode glyph back to phrase family
$ echo "谷阜" | fgt decode
Phrase family #1247: File-sharing request
• "Can you send me that file?" (en)
• "Mind emailing the document?" (en)
• "你能发给我那个文件吗?" (zh)
• "¿Puedes enviarme ese archivo?" (es)

What's in this repo?

A complete research prototype for phrase-level semantic compression and cross-lingual LLMs

Agent Memory Service

Production-ready REST API for intelligent memory retrieval. Foveated allocation delivers +46.7pp accuracy gain at a 256-token budget on synthetic multi-turn dialogs.

Semantic Compression

Smaller corpora and logs with reconstructable meaning. 50-70% compression on our test corpora while preserving semantic content.

Effective Context Extension

More usable signal per token under fixed context windows. Fit 2.5-4x more semantic content in the same token budget on our internal benchmarks.

Cross-Lingual Bridging

Shared glyph IDs for phrase families spanning multiple languages. 90-95% cross-lingual precision on EN↔ES↔ZH retrieval experiments.

All metrics are from internal experiments; see README and docs/PHASE-5-RESULTS.md for setup and limitations.

Implementation Includes:

  • Memory Service API – REST endpoints for agent memory read/write
  • Foveated retrieval – 3-zone allocation (early/relevant/recent)
  • Memory Console UI – interactive chat with context visualization
  • Multilingual embeddings & clustering – phrase families with metadata
  • Glyph encoding system – integer glyph IDs → Mandarin glyph strings
  • Fractal tape builder – 2D projection + recursive triangular addressing
  • Hybrid tokenizer – wraps base tokenizer with glyph-aware spans
  • Benchmark suite – Phase 5 validation with +46.7pp accuracy gains

Why It Matters

Three core capabilities that transform how LLMs handle language

Intelligent Memory Retrieval

  • Foveated allocation strategy: 30% early context, 30% relevant, 40% recent (sketched below)
  • Delivers the right memories at the right time for agent decision-making
  • +46.7pp accuracy improvement over naive truncation under tight budgets
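
A minimal sketch of the 3-zone split, assuming each stored turn carries a token count and a relevance score against the current query; the Turn layout and helper below are illustrative, not the service's actual internals:

from dataclasses import dataclass

@dataclass
class Turn:
    text: str
    tokens: int       # token count of this turn
    relevance: float  # similarity to the current query

def foveated_select(turns, budget, early=0.30, relevant=0.30, recent=0.40):
    """Spend a token budget across early, relevant, and recent zones."""
    taken = set()

    def take(candidates, zone_budget):
        used = 0
        for i, turn in candidates:
            if i not in taken and used + turn.tokens <= zone_budget:
                taken.add(i)
                used += turn.tokens

    indexed = list(enumerate(turns))
    take(indexed, budget * early)                                    # oldest turns first
    take(sorted(indexed, key=lambda p: -p[1].relevance), budget * relevant)
    take(list(reversed(indexed)), budget * recent)                   # newest turns first
    return [turns[i].text for i in sorted(taken)]                    # chronological order

Zone shares mirror the 30/30/40 split above; in this sketch, budget left unspent in one zone is simply left on the table.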

Semantic Compression

  • Replace repeated patterns with short glyph codes (see the toy example below)
  • Store one shared phrase-family table instead of millions of near-duplicates
  • 50-70% compression while preserving semantic content
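
As a toy illustration of the substitution step (the table below is a hypothetical fragment; the real mapping lives in the shared phrase-family table built by the pipeline):

# Hypothetical fragment of the phrase-to-glyph mapping.
PHRASE_TO_GLYPH = {
    "Can you send me that file?": "谷阜",
    "Mind emailing the document?": "谷阜",
}

def compress(text: str) -> str:
    # Replace known phrases with their glyph codes, longest match first
    # so overlapping phrases don't clobber each other.
    for phrase in sorted(PHRASE_TO_GLYPH, key=len, reverse=True):
        text = text.replace(phrase, PHRASE_TO_GLYPH[phrase])
    return text

# compress("Quick question: Can you send me that file?") -> "Quick question: 谷阜"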

Cross-Lingual by Design

  • English, Spanish, Chinese, and other languages sharing the same intent cluster together
  • Glyph IDs act as language-agnostic anchors for retrieval and analysis (illustrated below)
  • 90-95% precision across language pairs
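
Because every member of a family shares one glyph ID, a retrieval index can be keyed by glyph rather than by surface string; a toy example with hypothetical data:

# Documents indexed by the glyph IDs of the phrase families they contain
# (hypothetical data; family #1247 is the file-sharing request shown above).
GLYPH_INDEX = {
    1247: ["ticket-001 (en)", "ticket-042 (es)", "ticket-113 (zh)"],
}

def retrieve(glyph_id: int) -> list[str]:
    # An English query that encodes to family #1247 also retrieves the
    # Spanish and Chinese documents, with no translation step.
    return GLYPH_INDEX.get(glyph_id, [])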

How It Works

Three steps to a navigable phrase memory

Step 1

Cluster

We embed and cluster phrases into phrase families, keeping examples, statistics, and language labels.
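
A minimal sketch of this step with off-the-shelf pieces; the encoder name and cluster count are placeholders, and the actual pipeline's choices live in its configs:

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

phrases = [
    "Can you send me that file?",
    "¿Puedes enviarme ese archivo?",   # Spanish variant of the same request
    "What's the weather like today?",
]

# Any multilingual sentence encoder works; this one is a common default.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
embeddings = model.encode(phrases, normalize_embeddings=True)

labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)
# Phrases sharing a label form one phrase family, regardless of language;
# the per-family examples, counts, and language tags become its metadata.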

Step 2

Glyph & Fractal

Each family gets a glyph code and a coordinate on a fractal tape—a recursive triangular map of phrase space.
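
One way to read "recursive triangular addressing": split a triangle into four children at its edge midpoints and record which child contains the point at each level. The sketch below illustrates that idea; it is not the repo's tape builder:

def barycentric(p, a, b, c):
    # Barycentric weights of point p with respect to triangle (a, b, c).
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    d = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / d
    v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / d
    return u, v, 1.0 - u - v

def mid(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def triangle_address(point, tri, depth=8):
    """Map a 2D point to a base-4 digit string by recursive subdivision."""
    a, b, c = tri
    digits = []
    for _ in range(depth):
        u, v, w = barycentric(point, a, b, c)
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        if u > 0.5:    a, b, c, digit = a, ab, ca, "0"   # corner nearest a
        elif v > 0.5:  a, b, c, digit = ab, b, bc, "1"   # corner nearest b
        elif w > 0.5:  a, b, c, digit = ca, bc, c, "2"   # corner nearest c
        else:          a, b, c, digit = ab, bc, ca, "3"  # central child
        digits.append(digit)
    return "".join(digits)

# e.g. triangle_address((0.2, 0.1), ((0.0, 0.0), (1.0, 0.0), (0.5, 1.0)))
# yields an 8-digit base-4 address for that point.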

Step 3

Integrate

A hybrid tokenizer and LLM adapter let existing models consume glyph-coded text and learn to expand glyphs into natural language.
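
A toy version of the glyph-aware wrapping; the glyph set and base tokenizer below stand in for the real ones:

GLYPHS = {"谷阜"}  # glyph codes treated as atomic tokens (illustrative)

def hybrid_tokenize(text, base_tokenize):
    """Emit glyph codes as single tokens; defer other spans to the base tokenizer."""
    tokens, buffer = [], []

    def flush():
        if buffer:
            tokens.extend(base_tokenize("".join(buffer)))
            buffer.clear()

    i = 0
    while i < len(text):
        glyph = next((g for g in GLYPHS if text.startswith(g, i)), None)
        if glyph:
            flush()
            tokens.append(glyph)
            i += len(glyph)
        else:
            buffer.append(text[i])
            i += 1
    flush()
    return tokens

# hybrid_tokenize("please 谷阜 asap", str.split) -> ['please', '谷阜', 'asap']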

Full Pipeline

Corpus → Embeddings → Clusters → Glyph IDs → Fractal Tape → Tokenizer → LLM

Quickstart: Agent Memory API

# Start the memory service
python -m src.memory.service

# Write to agent memory
curl -X POST http://localhost:8000/api/memory/write \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "my-agent", "turn": {...}}'

# Read with foveated retrieval
curl -X POST http://localhost:8000/api/memory/read \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "my-agent", "token_budget": 256}'

# Try the Memory Console
# Visit http://localhost:3000/memory-console
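
The same calls from Python, if you'd rather not shell out to curl; note the turn payload below is a hypothetical shape, and the service defines the real schema:

import requests

BASE = "http://localhost:8000"

# Write one dialog turn to the agent's memory.
requests.post(f"{BASE}/api/memory/write", json={
    "agent_id": "my-agent",
    "turn": {"text": "Can you send me that file?"},  # hypothetical turn schema
})

# Read back a foveated context under a 256-token budget.
context = requests.post(f"{BASE}/api/memory/read", json={
    "agent_id": "my-agent",
    "token_budget": 256,
}).json()
print(context)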

Build Your Own Tape

# 1) Create environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2) Build a demo tape
python scripts/run_full_build.py --config configs/demo.yaml

# 3) Try the CLI
echo "Can you send me that file?" | fgt encode
echo "谷阜" | fgt decode

# 4) Launch the visualizer
uvicorn fgt.viz.app:app --reload

For Researchers and Builders

If you care about:

  • Tokenization and representation learning
  • Semantic compression and storage
  • Cross-lingual alignment
  • Long-context LLMs

…then FGT is designed to be picked apart, extended, and argued with.

FGT is research software

We invite feedback, experiments, and extensions. If you're working on tokenization, compression, or cross-lingual LLMs, this is for you.