Building LEANN: Giving Your Agent a 3,628-Document Memory

· 5 min read
Frederico Santana
Founder & Technical Writer, DPO2U

An AI agent without long-term memory is a stateless function call — powerful in isolation, useless for continuity. When I started building DPO2U's six-agent ecosystem, every agent had the same problem: it couldn't remember what happened yesterday. LEANN changed that, but getting there meant debugging chunk sizes, fixing duplicate indexes, and learning why 256-token chunks turn a 2-hour build into a 16-hour one.

The problem

DPO2U's knowledge base spans 3,628 documents: 2,055 Zettelkasten permanent notes, 136 Maps of Content, 42 interdisciplinary bridges, 50+ concept maps, 30+ milestones, and dozens of strategic documents and agent configurations. When the compliance-expert agent needs to answer "what's our position on LGPD Article 41?", it needs to find the relevant permanent notes, cross-reference the bridge between compliance and blockchain domains, and synthesize a grounded answer — not hallucinate one from training data.

Vector search solves this. LEANN (Lightweight Embedding-Augmented Neural Network) is the vector database that gives every agent access to the full knowledge base through semantic search.

The architecture

The pipeline:

  1. Every .md file in /root/DPO2U/ (excluding obsidian-vault/ and leann-env/) is split into chunks
  2. Each chunk is embedded using all-MiniLM-L6-v2 — a sentence-transformer model that runs locally, no API calls
  3. Embeddings are stored in an HNSW (Hierarchical Navigable Small World) index with compact + recompute mode
  4. The index is exposed as an MCP Server tool (leann_search) that any agent can call
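The pipeline above can be sketched end to end. This is an illustrative toy, not LEANN's implementation: `embed` stands in for all-MiniLM-L6-v2 (which produces 384-dim sentence embeddings), and `ToyIndex` does brute-force cosine search where the real system uses an HNSW index. All class and function names here are hypothetical.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for all-MiniLM-L6-v2; a toy bag-of-letters vector,
    # L2-normalized so dot product equals cosine similarity.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class ToyIndex:
    """Brute-force stand-in for the HNSW index LEANN builds."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

index = ToyIndex()
for chunk in ["LGPD Article 41 requires a DPO", "HNSW index build notes"]:
    index.add(chunk)
print(index.search("data protection officer", k=1))
```

The real pipeline swaps `embed` for the sentence-transformer model and `ToyIndex` for HNSW, but the contract is the same: chunks in, ranked chunks out.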

Total resource footprint: ~80MB RAM, ~38.4MB on disk, and a build time of roughly 2 hours.

The bugs that cost me a day

Bug #1: Chunk size matters more than you think

The initial configuration used 256-token chunks with 50-token overlap. Mathematically reasonable. Practically disastrous. The small chunk size produced 5,989 batches, and the embedding pass projected to take ~16 hours on the VPS CPU.

Increasing the chunk size to 512 tokens (keeping 50-token overlap) reduced batches to 702 — a 2-hour build. The quality tradeoff is minimal: most Zettelkasten notes are atomic by design (one idea per note), so a 512-token chunk typically captures the entire note. Larger chunks lose precision on long documents, but our longest notes top out at ~2000 tokens.
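The chunking scheme can be sketched as a sliding window: each chunk carries the last 50 tokens of its predecessor as context. This is a hypothetical helper, not LEANN's actual chunker.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split a token list into windows of `size`, each overlapping the previous by `overlap`."""
    step = size - overlap  # 462 new tokens per chunk at the 512/50 setting
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

A 1,000-token document yields three chunks at 512/50 but five at 256/50, which is how halving the chunk size roughly doubled the chunk count and blew up the batch total.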

| Chunk size | Chunks | Batches | Build time |
| --- | --- | --- | --- |
| 256 tokens | ~35,000 | 5,989 | ~16 hours |
| 512 tokens | 22,458 | 702 | ~2 hours |

Bug #2: Duplicate indexing from subdirectories

The initial build script passed 73 subdirectories individually to the indexer. The result: 10,324 documents indexed — nearly 3x the actual count. Files in nested paths were being indexed multiple times (once from the parent, once from the subdirectory, once from a symlink).

The fix was embarrassingly simple: pass only the root directory /root/DPO2U/ and let the indexer walk the tree. Combined with explicit exclusions for obsidian-vault/ and leann-env/, this produced the correct 3,628 document count.

Bug #3: LLM provider instability

The consolidation step (episodic memory → semantic notes) initially used Z.ai's glm-4-flash model. Two problems: the model didn't actually exist in Z.ai's API, and the account had zero balance. The migration path: Z.ai → Gemini 3.0 Flash via a local Antigravity proxy → eventually Gemini 2.0 Flash via the OpenAI-compatible endpoint.

The lesson: always have a fallback LLM provider configured. LEANN's indexing is provider-independent (local embeddings), but the consolidation step depends on an LLM for summarization.
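A fallback chain is simple to wire up. This sketch assumes each provider is a plain callable; the names and the function itself are illustrative, not part of LEANN.

```python
def consolidate_with_fallback(prompt: str, providers: list[tuple[str, callable]]) -> str:
    """Try each (name, call) provider in order; raise only if all of them fail."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # missing model, zero balance, network error...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM providers failed: " + "; ".join(errors))
```

The key property is that a provider failure is loud in the logs but silent to the pipeline, as long as at least one provider in the list still works.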

The mandatory search protocol

LEANN isn't optional for agents — it's mandatory. Every agent follows a "search-before-responding" protocol: before answering any question about the project, infrastructure, past decisions, or vault knowledge, the agent must execute leann_search and use the results as context.

This protocol exists because of a specific failure mode: without it, agents confidently generate plausible-sounding answers that have no basis in the actual knowledge base. A compliance-expert might cite a regulation correctly but miss that DPO2U already has a permanent note with a nuanced interpretation of that regulation. Search-before-respond eliminates this class of errors.
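The protocol itself is a thin wrapper: retrieve first, generate second, and refuse to answer from thin air. `search` and `generate` below are placeholders for the `leann_search` MCP tool and the agent's generation step, not real APIs.

```python
def respond(question: str, search, generate) -> str:
    """Search-before-respond: ground the answer in retrieved chunks, or say so."""
    hits = search(question, top_k=5)
    if not hits:
        return "No grounding found in the knowledge base for this question."
    context = "\n---\n".join(hits)
    return generate(question, context)
```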

Reindexing and maintenance

LEANN reindexes automatically every 15 minutes via cron. The trigger is a flag file (06-Memory/.leann-reindex-needed) that gets created whenever an agent edits MEMORY.md — a PostToolUse hook handles this. The flag ensures that reindexing only runs when the knowledge base has actually changed.

Weekly consolidation (Sundays at 02:00) uses the LLM to transform raw episodic memory logs into structured semantic notes. These notes then enter the LEANN index at the next reindex cycle, closing the loop: agents create knowledge → knowledge is indexed → agents search knowledge → agents create better knowledge.

What I'd do differently

If I were building LEANN from scratch:

  • Start with 512-token chunks — don't experiment with smaller sizes on a CPU-only machine
  • Index from root only — never pass subdirectories individually
  • Set up the fallback LLM provider on day one — consolidation failing silently is worse than failing loudly

An agent without memory is a suggestion engine. An agent with 3,628 searchable documents is a colleague who actually read the docs.

For the full framework documentation, see LEANN Framework. For how agents are structured, see One Person Unicorn.