
0DevDutt0/MemoryMesh


What is MemoryMesh?

Most AI agents are amnesiac by default — every conversation starts from zero. MemoryMesh solves this by providing a production-grade memory layer that any agent can plug into via MCP or REST API.

It stores memories across four cognitive types, retrieves them with a hybrid semantic + recency + importance ranking, compresses old memories using LLMs (Groq for single-memory summarisation, Mistral for cluster synthesis), and models forgetting using the Ebbinghaus retention curve so memories fade realistically over time.

Built from scratch in Python 3.13 · 35 files · 51 tests · 0 TODOs


Architecture

graph TB
    subgraph Clients["🖥️ Clients"]
        CC["Claude Code / Cursor"]
        AG["Custom Agents"]
        DB["Streamlit Dashboard"]
    end

    subgraph Transport["🔌 Transport Layer"]
        MCP["MCP stdio server<br/>(7 tools)"]
        REST["FastAPI REST API<br/>(14 endpoints)"]
    end

    subgraph Core["⚙️ MemoryMesh Core"]
        STORE["MemoryStore<br/>CRUD + Embeddings"]
        RET["Retriever<br/>Hybrid Scoring"]
        COMP["Compressor<br/>Tier-1 Groq · Tier-2 Mistral"]
        DECAY["DecayEngine<br/>Ebbinghaus Curve"]
        EMB["Embedder<br/>bge-large-en-v1.5 · 1024-dim"]
    end

    subgraph Storage["💾 Persistence"]
        SQL["SQLite WAL<br/>memories · embeddings<br/>compression_log · access_log"]
        FAISS["FAISS IndexFlatIP<br/>(numpy fallback)"]
    end

    CC -->|JSON-RPC stdio| MCP
    AG -->|HTTP| REST
    DB -->|HTTP| REST
    MCP --> STORE
    MCP --> RET
    REST --> STORE
    REST --> RET
    REST --> COMP
    STORE --> SQL
    STORE --> EMB
    RET --> FAISS
    RET --> EMB
    COMP -->|Groq llama-3.1-8b-instant| STORE
    COMP -->|Mistral mistral-small| STORE
    DECAY -->|asyncio background task| STORE

The Four Memory Types

| Type | Icon | Description | Real-world analogy | Decay speed (stability multiplier) |
| --- | --- | --- | --- | --- |
| episodic | 🕐 | Time-stamped events | "I talked to Alice about X yesterday" | Fast (×1.0) |
| semantic | 📚 | Durable facts | "Alice is a Python engineer at Google" | Medium-slow (×2.0) |
| procedural | ⚙️ | Skills & how-tos | "To deploy FastAPI: uvicorn main:app..." | Slowest (×3.0) |
| preference | ❤️ | User patterns | "Alice prefers concise bullet-point answers" | Slow (×2.5) |

Each type has a tuned stability multiplier in the Ebbinghaus forgetting curve. A larger multiplier means slower decay, so skills outlast preferences, preferences outlast facts, and facts outlast events, loosely mirroring human memory.


Hybrid Retrieval Pipeline

flowchart LR
    Q["🔍 Query"] --> EMB["Embed with\nbge-large-en-v1.5"]
    EMB --> FAISS["FAISS / numpy\ncosine similarity"]
    FAISS --> SEM["Semantic\nScore (×0.5)"]

    Q --> TIME["Time since\nlast access"]
    TIME --> REC["Recency\nScore (×0.3)"]

    Q --> IMP["importance field\n+ access_count boost"]
    IMP --> IMPS["Importance\nScore (×0.2)"]

    SEM --> BLEND["Weighted\nBlend"]
    REC --> BLEND
    IMPS --> BLEND
    BLEND --> RANK["Re-rank & Return\nTop-K Results"]
    RANK --> LOG["Increment\naccess_count"]

Final score formula:

score = 0.5 × cosine_sim  +  0.3 × exp(-λ·days)  +  0.2 × (importance + min(0.02·accesses, 0.3))

Weights are configurable per-query via semantic_weight, recency_weight, importance_weight.
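
The formula above can be sketched in a few lines of Python. This is an illustration only, not the project's Retriever: the function name `hybrid_score` is hypothetical, and the sketch assumes λ equals the `DECAY_RATE` setting (0.1) and that `days` means days since last access.

```python
import math


def hybrid_score(
    cosine_sim: float,
    days_since_access: float,
    importance: float,
    access_count: int,
    *,
    decay_rate: float = 0.1,        # λ; assumed to match DECAY_RATE
    semantic_weight: float = 0.5,
    recency_weight: float = 0.3,
    importance_weight: float = 0.2,
) -> float:
    """Blend semantic, recency and importance terms as in the README formula."""
    recency = math.exp(-decay_rate * days_since_access)
    # The access boost is capped at 0.3, i.e. it saturates after 15 accesses.
    boosted_importance = importance + min(0.02 * access_count, 0.3)
    return (
        semantic_weight * cosine_sim
        + recency_weight * recency
        + importance_weight * boosted_importance
    )


# Same semantic match, scored right after access and again 30 days later.
fresh = hybrid_score(0.70, days_since_access=0, importance=0.8, access_count=0)
stale = hybrid_score(0.70, days_since_access=30, importance=0.8, access_count=0)
print(f"fresh={fresh:.3f} stale={stale:.3f}")
```

With the default weights, recency alone can move a result by up to 0.3, so an older memory needs a noticeably better semantic match to outrank a recently used one.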


Hierarchical Compression

Memories are automatically compressed on a nightly schedule to keep the store lean and token-efficient:

Day 0-7     [Fresh memories: full content stored]
              │
              ▼  Tier 1 (age > 7 days, not recently accessed)
Day 7+      [llama-3.1-8b-instant via Groq]
              "Compress to 2-3 sentences preserving key facts"
              → original preserved in compression_log
              → is_compressed = True
              │
              ▼  Tier 2 (cluster merge, ≥ 3 episodic memories)
              [mistral-small-latest via Mistral AI]
              "Synthesize N episodic memories → 1 semantic memory"
              → source memories marked is_compressed
              → new semantic memory created with importance = 0.8

The compression_log table records every compression event with original content, timestamp, and model used — making compression fully auditable and reversible.
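
To make "auditable and reversible" concrete, here is a minimal sketch of how an audit table can support undoing a compression. The schema, column names, and the `compress` / `restore` helpers are assumptions for illustration; the real MemoryMesh tables and API may differ. It uses the standard-library `sqlite3` module rather than aiosqlite to stay self-contained.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical two-table schema; the real MemoryMesh tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (
    id TEXT PRIMARY KEY, content TEXT, is_compressed INTEGER DEFAULT 0
);
CREATE TABLE compression_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    memory_id TEXT, original_content TEXT, model TEXT, compressed_at TEXT
);
""")


def compress(memory_id: str, summary: str, model: str) -> None:
    """Swap in the summary, keeping the original in the audit log."""
    (original,) = conn.execute(
        "SELECT content FROM memories WHERE id = ?", (memory_id,)
    ).fetchone()
    with conn:  # one transaction: log the original, then overwrite it
        conn.execute(
            "INSERT INTO compression_log "
            "(memory_id, original_content, model, compressed_at) VALUES (?, ?, ?, ?)",
            (memory_id, original, model, datetime.now(timezone.utc).isoformat()),
        )
        conn.execute(
            "UPDATE memories SET content = ?, is_compressed = 1 WHERE id = ?",
            (summary, memory_id),
        )


def restore(memory_id: str) -> None:
    """Undo the most recent compression of a memory."""
    row = conn.execute(
        "SELECT original_content FROM compression_log "
        "WHERE memory_id = ? ORDER BY id DESC LIMIT 1",
        (memory_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no compression recorded for {memory_id}")
    with conn:
        conn.execute(
            "UPDATE memories SET content = ?, is_compressed = 0 WHERE id = ?",
            (row[0], memory_id),
        )


conn.execute(
    "INSERT INTO memories (id, content) VALUES (?, ?)",
    ("m1", "A long, detailed account of a meeting."),
)
compress("m1", "Short meeting summary.", model="llama-3.1-8b-instant")
restore("m1")
print(conn.execute("SELECT content, is_compressed FROM memories").fetchone())
```

Writing the log row and the overwrite in one transaction means a crash cannot leave a compressed memory without its recorded original.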


The Ebbinghaus Forgetting Curve

Each memory has a decay_score updated every 6 hours by the background DecayEngine:

$$R = e^{-t/S}$$

Where:

  • R = retention score ∈ [0, 1]
  • t = days since last access
  • S = stability = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)

Retention
   1.0 ┤
   0.9 ┤·····  ← procedural (skills, multiplier=3.0)
   0.8 ┤    ·····  ← preference (patterns, multiplier=2.5)
   0.7 ┤        ·····  ← semantic (facts, multiplier=2.0)
   0.5 ┤              ·····
   0.3 ┤                   ·····  ← episodic (events, multiplier=1.0)
   0.1 ┤                         ·····
   0.0 ┤──────────────────────────────── days
       0    7    14   30   60   90

High-importance + frequently-accessed memories gain extra stability — your "important things" stick around.
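
The stability and retention formulas translate directly into code. The sketch below is illustrative rather than the DecayEngine itself: the function names are hypothetical, `log` is assumed to be the natural logarithm, and λ is assumed to be the `DECAY_RATE` default of 0.1.

```python
import math

# Stability multipliers from the memory-type table.
TYPE_MULTIPLIER = {
    "episodic": 1.0,
    "semantic": 2.0,
    "preference": 2.5,
    "procedural": 3.0,
}


def stability(memory_type: str, importance: float, accesses: int,
              decay_rate: float = 0.1) -> float:
    """S = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)."""
    return (
        (1.0 / decay_rate)
        * TYPE_MULTIPLIER[memory_type]
        * (1.0 + importance)
        * (1.0 + math.log(1 + accesses) * 0.5)  # natural log is an assumption
    )


def retention(days_since_access: float, memory_type: str, importance: float,
              accesses: int, decay_rate: float = 0.1) -> float:
    """R = exp(-t / S), a value in (0, 1] for t >= 0."""
    return math.exp(-days_since_access / stability(memory_type, importance, accesses, decay_rate))


# Compare the four types at 30 days, with equal importance and no accesses.
for memory_type in TYPE_MULTIPLIER:
    print(f"{memory_type:<10} {retention(30, memory_type, importance=0.5, accesses=0):.3f}")
```

Because every factor in S is multiplicative, doubling the type multiplier has the same effect on retention as halving the elapsed time.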


Quick Start

# 1. Clone & install
git clone https://github.com/0DevDutt0/MemoryMesh.git
cd MemoryMesh
pip install -e ".[dev]"

# 2. Configure
cp .env.example .env
# Add your GROQ_API_KEY and MISTRAL_API_KEY

# 3. Start the REST API
uvicorn memorymesh.api.main:app --reload
# → http://localhost:8000/docs (interactive Swagger UI)

# 4. Start the MCP server (for Claude Code / Cursor)
python -m memorymesh.mcp.server

# 5. Launch the dashboard
streamlit run dashboard/app.py
# → http://localhost:8501

Store & Retrieve in 10 Lines

import httpx, asyncio

API = "http://localhost:8000/v1"

async def main():
    async with httpx.AsyncClient() as c:
        # Store a fact about the user
        await c.post(f"{API}/memories/", json={
            "content": "User prefers FastAPI over Flask for async workloads.",
            "agent_id": "my-agent",
            "memory_type": "preference",
            "importance": 0.9,
        })

        # Retrieve it semantically
        results = (await c.post(f"{API}/memories/search", json={
            "query": "what web framework does the user prefer?",
            "agent_id": "my-agent",
            "k": 3,
        })).json()

        for r in results:
            print(f"[{r['rank']}] {r['score']:.3f} — {r['memory']['content']}")

asyncio.run(main())
[1] 0.874 — User prefers FastAPI over Flask for async workloads.

Demo — See It In Action


Start the API server, then run the end-to-end demo in one command:

# Terminal 1 — start the REST API
uvicorn memorymesh.api.main:app --reload

# Terminal 2 — full lifecycle demo (store · search · graph · update · stats)
python Demo/quickstart.py

Quickstart Terminal Output

┌────────────────────┐
│  1. Health Check   │
└────────────────────┘
{"status": "ok", "ts": "2026-06-15T14:00:00.123456"}

┌──────────────────────────┐
│  2. Storing 7 Memories   │
└──────────────────────────┘
  ✓ [semantic     ] id=a1b2c3d4…  importance=0.95
  ✓ [episodic     ] id=3f8a1c2d…  importance=0.90
  ✓ [procedural   ] id=b2c3d4e5…  importance=0.85
  ✓ [preference   ] id=c3d4e5f6…  importance=0.80
  ✓ [semantic     ] id=d4e5f6a7…  importance=0.75
  ✓ [episodic     ] id=e5f6a7b8…  importance=0.70
  ✓ [procedural   ] id=f6a7b8c9…  importance=0.80

┌──────────────────────────────────────────────────────┐
│  4. Semantic Search — 'user communication style'     │
└──────────────────────────────────────────────────────┘
  #1 score=0.8214  [preference]
     The user prefers concise answers, dark mode UI, Python over JavaScript…
  #2 score=0.7123  [semantic]
     Devdutt S is a software engineer from Kochi, Kerala. GitHub: 0DevDutt0…

┌──────────────────────────────────────────────┐
│  6. Memory Graph (threshold=0.7)             │
└──────────────────────────────────────────────┘
  Nodes: 7  |  Edges: 3
  Edge  weight=0.8123  a1b2c3d4… ↔ 3f8a1c2d…
  Edge  weight=0.7891  b2c3d4e5… ↔ f6a7b8c9…
  Edge  weight=0.7234  d4e5f6a7… ↔ a1b2c3d4…

┌──────────────────────┐
│  9. Final Stats      │
└──────────────────────┘
  Total memories : 7
  Compressed     : 0
  Avg decay score: 0.9981
  By type        : {semantic: 2, episodic: 2, procedural: 2, preference: 1}

✅  Demo complete — all operations succeeded.
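
The memory graph in step 6 links memories whose embeddings are similar enough. A minimal sketch of that idea follows, using pure Python instead of FAISS or numpy. The function names, the `>=` comparison against the threshold, and the toy 3-dimensional vectors are all assumptions for illustration.

```python
import math
from itertools import combinations


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_graph(
    embeddings: dict[str, list[float]], threshold: float = 0.7
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, weight) edges for every pair at or above the threshold."""
    unit = {memory_id: normalize(vec) for memory_id, vec in embeddings.items()}
    edges = []
    for a, b in combinations(unit, 2):
        weight = sum(x * y for x, y in zip(unit[a], unit[b]))
        if weight >= threshold:
            edges.append((a, b, round(weight, 4)))
    # Strongest edges first, as in the demo output.
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Toy 3-dimensional "embeddings"; real ones would be 1024-dimensional.
toy = {
    "m1": [1.0, 0.0, 0.0],
    "m2": [0.9, 0.1, 0.0],
    "m3": [0.0, 1.0, 0.0],
}
edges = similarity_graph(toy)
print(edges)
```

Comparing every pair is quadratic in the number of memories, which is acceptable for a per-agent visualisation but would need an index for large stores.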

Store → Search JSON Round-Trip

The core pattern: store a memory, retrieve it semantically — no keyword overlap required.

1. Store a preference:

curl -X POST http://localhost:8000/v1/memories/ \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
    "agent_id": "demo-agent",
    "memory_type": "preference",
    "importance": 0.8
  }'
{
  "id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
  "agent_id": "demo-agent",
  "memory_type": "preference",
  "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
  "importance": 0.8,
  "access_count": 0,
  "decay_score": 1.0,
  "is_compressed": false,
  "created_at": "2026-06-15T14:04:00.000000"
}

2. Search with a semantically different query (almost no keyword overlap):

curl -X POST http://localhost:8000/v1/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what coding language does the user enjoy and how do they like responses formatted?",
    "agent_id": "demo-agent",
    "k": 3
  }'
[
  {
    "memory": {
      "id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
      "memory_type": "preference",
      "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
      "importance": 0.8,
      "access_count": 1,
      "decay_score": 0.9981
    },
    "score": 0.821406,
    "rank": 1
  }
]

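The query "what coding language does the user enjoy…" matched with score 0.821 even though it shares almost no keywords with the stored memory (only "the user"). The match is driven by cosine similarity in the 1024-dimensional bge-large embedding space; recency and importance contribute the rest of the blended score.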
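
As a rough check, the hybrid score formula can be inverted to estimate how much of that 0.821 came from the embedding match. The sketch below assumes the default weights (0.5 / 0.3 / 0.2), a recency term of about 1.0 because the search ran right after the store, and no access boost because `access_count` was still 0 when the score was computed. None of these assumptions is confirmed by the response itself, so treat the result as an estimate.

```python
# Values taken from the search response above.
score = 0.821406
importance = 0.8

# Assumptions: default weights, fresh memory, no prior accesses.
semantic_weight, recency_weight, importance_weight = 0.5, 0.3, 0.2
recency = 1.0
access_boost = min(0.02 * 0, 0.3)

# Rearranged from: score = w_s * cos + w_r * recency + w_i * (importance + boost)
cosine_sim = (
    score
    - recency_weight * recency
    - importance_weight * (importance + access_boost)
) / semantic_weight

print(f"estimated cosine similarity: {cosine_sim:.3f}")
```

Under these assumptions the raw cosine similarity is closer to 0.72 than to 0.82; the remainder of the score comes from the memory being fresh and marked as important.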


Ebbinghaus Decay In Action

The same memory stored under different types and access patterns decays at very different rates:

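| Days since access | Memory type | Importance | Accesses | Retention |
| --- | --- | --- | --- | --- |
| 0 | any | any | any | 1.000 |
| 7 | episodic | 0.5 | 0 | 0.703 |
| 7 | semantic | 0.5 | 0 | 0.837 |
| 30 | episodic | 0.5 | 0 | 0.135 |
| 30 | procedural | 0.9 | 10 | 0.912 |
| 90 | semantic | 0.9 | 20 | 0.831 |
| 90 | episodic | 0.3 | 1 | 0.003 |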

A frequently-accessed skill (procedural, importance=0.9, 10 accesses) retains 91% after 30 days.
A one-off low-importance event (episodic, 1 access) fades to 0.3% after 90 days — modelling human forgetting mathematically.
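
As a worked example, the row for a 30-day-old episodic memory (importance 0.5, no accesses) can be derived by hand from the stability formula. This assumes λ = 0.1 (the `DECAY_RATE` default) and a natural logarithm; the half-life line is an extra consequence of $R = e^{-t/S}$, not a figure stated by the project.

$$
\begin{aligned}
S &= \tfrac{1}{\lambda} \times \text{type\_multiplier} \times (1 + \text{importance}) \times \bigl(1 + 0.5\,\ln(1 + \text{accesses})\bigr) \\
  &= \tfrac{1}{0.1} \times 1.0 \times (1 + 0.5) \times (1 + 0.5 \ln 1) = 15 \text{ days} \\
R(30) &= e^{-30/15} = e^{-2} \approx 0.135 \\
t_{1/2} &= S \ln 2 \approx 10.4 \text{ days}
\end{aligned}
$$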


Demo Files

| File | What's inside |
| --- | --- |
| Demo/quickstart.py | End-to-end Python script: stores 7 memories across all four types, semantic search, type-filtered search, memory graph, update, list, and stats |
| Demo/SAMPLE_INPUTS.md | 15 annotated curl examples: health check, all 4 memory types, batch store, semantic search, type-filtered search, get/update/delete, graph, stats, tier-1 & tier-2 compression, MCP tool calls |
| Demo/sample_outputs.json | Canonical JSON responses for every operation, useful as an API contract reference or test fixture |

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/memories/ | Store a single memory |
| POST | /v1/memories/search | Hybrid semantic search |
| POST | /v1/memories/batch | Batch store (up to 100) |
| GET | /v1/memories/{id} | Get memory by ID |
| PATCH | /v1/memories/{id} | Update content / importance / metadata |
| DELETE | /v1/memories/{id} | Delete permanently |
| GET | /v1/memories/agent/{id} | List all memories for an agent |
| POST | /v1/memories/agent/{id}/graph | Semantic similarity graph (for viz) |
| GET | /v1/stats | Global memory statistics |
| POST | /v1/compress/trigger | Run auto-compression now |
| POST | /v1/compress/memory/{id}/tier1 | Compress single memory (Groq) |
| POST | /v1/compress/agent/{id}/tier2 | Cluster merge (Mistral) |
| GET | /v1/compress/log | Compression audit history |
| GET | /health | Liveness probe |

Full interactive docs at http://localhost:8000/docs (Swagger UI) and /redoc (ReDoc).


MCP Integration

Add to your Claude Code / Cursor MCP config:

{
  "mcpServers": {
    "memorymesh": {
      "command": "python",
      "args": ["-m", "memorymesh.mcp.server"],
      "cwd": "/path/to/MemoryMesh"
    }
  }
}

7 tools exposed to the LLM:

| Tool | What it does |
| --- | --- |
| store_memory | Save a new memory (type + importance + metadata) |
| retrieve_memories | Semantic search with optional type filter |
| delete_memory | Hard delete by ID |
| update_memory | Edit content / importance / metadata in-place |
| list_memories | Browse all memories for an agent |
| get_memory_stats | Token-efficient stats snapshot |
| compress_agent_memories | Trigger cluster merge for an agent |
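
For readers curious what a tool invocation looks like on the wire, the sketch below builds the JSON-RPC 2.0 request that an MCP client sends for `tools/call`. The helper name `tool_call` is hypothetical, and the argument names for `store_memory` are assumed to mirror the REST fields; check the server's tool schema for the real ones. In practice an MCP client such as Claude Code or the `mcp` SDK builds these messages and performs the initialization handshake for you.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Argument names are assumed to match the REST API request body.
msg = tool_call("store_memory", {
    "content": "User prefers FastAPI over Flask for async workloads.",
    "agent_id": "my-agent",
    "memory_type": "preference",
    "importance": 0.9,
})
print(msg)
```

The request is serialized onto a single line because the stdio transport delimits messages with newlines.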

Streamlit Dashboard

Four interactive pages accessible at http://localhost:8501:

| Page | What you see |
| --- | --- |
| 🔍 Search Explorer | Live hybrid retrieval with weight sliders · Store new memories |
| 🕸️ Memory Graph | pyvis semantic network · colour-coded by type · edge weight = cosine similarity |
| 🗜️ Compression Monitor | Timeline of compressions · Token savings · Manual trigger |
| 📉 Decay Visualizer | Interactive Plotly retention curves · Adjust λ, importance, access count |

Tech Stack

| Layer | Technology | Why |
| --- | --- | --- |
| Language | Python 3.13 | Async-native, type hints, StrEnum |
| API | FastAPI + uvicorn | Auto-docs, async, Pydantic v2 validation |
| Database | aiosqlite (SQLite WAL) | Zero-dependency, async, ACID, BLOB storage |
| Embeddings | BAAI/bge-large-en-v1.5 | Strong open-source English embedding model (1024-dim, high MTEB ranking at release) |
| Vector Search | FAISS IndexFlatIP | Exact inner-product search (equals cosine similarity on normalized vectors), optional numpy fallback |
| LLM Tier-1 | Groq llama-3.1-8b-instant | Sub-second summarisation, free tier |
| LLM Tier-2 | Mistral mistral-small | High-quality cluster synthesis |
| MCP | mcp SDK 1.27 | stdio JSON-RPC, works with Claude Code / Cursor |
| Dashboard | Streamlit + pyvis + Plotly | Interactive memory exploration |
| Testing | pytest-asyncio, MagicMock | 51 tests, in-memory SQLite, no GPU in CI |
| CI | GitHub Actions | lint (ruff) + test matrix on ubuntu |

Testing

pytest tests/ -v
✓ test_compressor.py   8 tests  — LLM compression (mocked Groq + Mistral clients)
✓ test_decay.py       13 tests  — Ebbinghaus formula + DecayEngine lifecycle
✓ test_retriever.py   14 tests  — Search, filters, agent isolation, graph
✓ test_store.py       16 tests  — CRUD, access tracking, embedding round-trip
─────────────────────────────────
51 passed in 0.63s

Key design decisions:

  • No GPU required in CI: Embedder is mocked with a deterministic hash-seeded numpy vector
  • No real LLM calls in tests: AsyncGroq and Mistral clients are patched via unittest.mock
  • Isolated databases: every test fixture uses aiosqlite :memory:, with automatic teardown
  • asyncio_mode = "auto": all async def tests run automatically without @pytest.mark.asyncio
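
The first decision above, a deterministic stand-in for the embedder, can be sketched as follows. This is not the project's test fixture: the name `fake_embed` is hypothetical, and the sketch swaps numpy for the standard-library `random` module so it runs with no dependencies.

```python
import hashlib
import math
import random


def fake_embed(text: str, dim: int = 1024) -> list[float]:
    """Return a unit-length pseudo-embedding that depends only on the text."""
    # hashlib gives a stable seed; the built-in hash() is salted per process.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


a = fake_embed("User prefers FastAPI over Flask.")
b = fake_embed("User prefers FastAPI over Flask.")
c = fake_embed("Something unrelated.")
print(a == b, a == c, len(a))
```

Identical text always maps to the same vector, so store and retrieve round-trips are reproducible, while unrelated texts get nearly orthogonal vectors. The mock carries no semantic meaning, so it can test plumbing and ranking logic but not retrieval quality.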

Configuration

Copy .env.example to .env and fill in your keys:

# Required for compression
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...

# Tunable parameters
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5   # swap for lighter all-MiniLM-L6-v2
DECAY_RATE=0.1                            # λ in the forgetting curve
COMPRESS_AGE_DAYS=7.0                     # memories older than this get tier-1 compressed
COMPRESS_CLUSTER_SIZE=20                  # episodic memories per tier-2 merge
DECAY_RUN_INTERVAL_HOURS=6.0             # background decay update frequency
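
One way such tunables can be read into typed settings is sketched below. The `Settings` class and `from_env` method are hypothetical, not the project's actual config module; only the variable names and defaults come from the listing above. The sketch assumes the variables are already present in the process environment (for example, exported by whatever loads the .env file).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the values shown in the .env listing.
    embedding_model: str = "BAAI/bge-large-en-v1.5"
    decay_rate: float = 0.1
    compress_age_days: float = 7.0
    compress_cluster_size: int = 20
    decay_run_interval_hours: float = 6.0

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings from environment variables, falling back to defaults."""
        env = os.environ if env is None else env
        settings = cls(
            embedding_model=env.get("EMBEDDING_MODEL", cls.embedding_model),
            decay_rate=float(env.get("DECAY_RATE", cls.decay_rate)),
            compress_age_days=float(env.get("COMPRESS_AGE_DAYS", cls.compress_age_days)),
            compress_cluster_size=int(env.get("COMPRESS_CLUSTER_SIZE", cls.compress_cluster_size)),
            decay_run_interval_hours=float(
                env.get("DECAY_RUN_INTERVAL_HOURS", cls.decay_run_interval_hours)
            ),
        )
        if settings.decay_rate <= 0:
            # λ appears as 1/λ in the stability formula, so it must be positive.
            raise ValueError("DECAY_RATE must be greater than 0")
        return settings


settings = Settings.from_env({"DECAY_RATE": "0.05", "EMBEDDING_MODEL": "all-MiniLM-L6-v2"})
print(settings)
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.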

Project Layout

memorymesh/
├── core/           config · logging · aiosqlite database
├── memory/         types · embedder · store · retriever · decay · compressor
├── schemas/        Pydantic request / response models
├── api/            FastAPI app · 3 routers (memories, compress, health)
└── mcp/            MCP stdio server (7 tools)
dashboard/          Streamlit 4-page UI
tests/              51 async tests, mock embedder, in-memory DB
Demo/               Runnable quickstart · curl examples · sample JSON
.github/workflows/  CI: lint (ruff) + pytest matrix

Why MemoryMesh?

Feature MemoryMesh mem0 Zep ChromaDB alone
Four semantic memory types
MCP-native (Claude / Cursor)
Ebbinghaus forgetting curve
Hierarchical LLM compression Partial Partial
REST API + dashboard
Zero infrastructure (SQLite)
Open source, no usage fees Partial Partial

Built with curiosity and production instincts by Devdutt S · Kochi, India


