GitHub - 0DevDutt0/MemoryMesh: Production-grade persistent memory server for AI agents. MCP-compatible (Claude Code, Cursor). Four memory types (episodic/semantic/procedural/preference), hybrid vector retrieval (FAISS + bge-large), hierarchical LLM compression (Groq + Mistral), Ebbinghaus forgetting curve, FastAPI + Streamlit.

What is MemoryMesh?

Most AI agents are amnesiac by default — every conversation starts from zero. MemoryMesh solves this by providing a production-grade memory layer that any agent can plug into via MCP or REST API.

It stores memories across four cognitive types and retrieves them with a hybrid ranking that blends semantic similarity, recency, and importance. Old memories are compressed by LLMs (Groq for single-memory summarisation, Mistral for cluster synthesis), and an Ebbinghaus retention curve models forgetting so that unused memories fade over time.

Built from scratch in Python 3.13 · 35 files · 51 tests · 0 TODOs

Architecture

graph TB
    subgraph Clients["🖥️ Clients"]
        CC["Claude Code / Cursor"]
        AG["Custom Agents"]
        DB["Streamlit Dashboard"]
    end

    subgraph Transport["🔌 Transport Layer"]
        MCP["MCP stdio server<br/>(7 tools)"]
        REST["FastAPI REST API<br/>(14 endpoints)"]
    end

    subgraph Core["⚙️ MemoryMesh Core"]
        STORE["MemoryStore<br/>CRUD + Embeddings"]
        RET["Retriever<br/>Hybrid Scoring"]
        COMP["Compressor<br/>Tier-1 Groq · Tier-2 Mistral"]
        DECAY["DecayEngine<br/>Ebbinghaus Curve"]
        EMB["Embedder<br/>bge-large-en-v1.5 · 1024-dim"]
    end

    subgraph Storage["💾 Persistence"]
        SQL["SQLite WAL<br/>memories · embeddings<br/>compression_log · access_log"]
        FAISS["FAISS IndexFlatIP<br/>(numpy fallback)"]
    end

    CC -->|JSON-RPC stdio| MCP
    AG -->|HTTP| REST
    DB -->|HTTP| REST
    MCP --> STORE
    MCP --> RET
    REST --> STORE
    REST --> RET
    REST --> COMP
    STORE --> SQL
    STORE --> EMB
    RET --> FAISS
    RET --> EMB
    COMP -->|Groq llama-3.1-8b-instant| STORE
    COMP -->|Mistral mistral-small| STORE
    DECAY -->|asyncio background task| STORE

The Four Memory Types

Type	Icon	Description	Real-world Analogy	Decay Speed (stability multiplier)
`episodic`	🕐	Time-stamped events	"I talked to Alice about X yesterday"	Fast (×1.0)
`semantic`	📚	Long-lived facts	"Alice is a Python engineer at Google"	Slow (×2.0)
`procedural`	⚙️	Skills & how-tos	"To deploy FastAPI: uvicorn main:app..."	Slowest (×3.0)
`preference`	❤️	User patterns	"Alice prefers concise bullet-point answers"	Slower (×2.5)

Each type has a tuned stability multiplier in the Ebbinghaus forgetting curve. A higher multiplier means slower decay, so skills outlast preferences and facts, and all three outlast one-off events, much like human memory.

Hybrid Retrieval Pipeline

flowchart LR
    Q["🔍 Query"] --> EMB["Embed with\nbge-large-en-v1.5"]
    EMB --> FAISS["FAISS / numpy\ncosine similarity"]
    FAISS --> SEM["Semantic\nScore (×0.5)"]

    Q --> TIME["Time since\nlast access"]
    TIME --> REC["Recency\nScore (×0.3)"]

    Q --> IMP["importance field\n+ access_count boost"]
    IMP --> IMPS["Importance\nScore (×0.2)"]

    SEM --> BLEND["Weighted\nBlend"]
    REC --> BLEND
    IMPS --> BLEND
    BLEND --> RANK["Re-rank & Return\nTop-K Results"]
    RANK --> LOG["Increment\naccess_count"]

Final score formula:

score = 0.5 × cosine_sim  +  0.3 × exp(-λ·days)  +  0.2 × (importance + min(0.02·accesses, 0.3))

Weights are configurable per-query via semantic_weight, recency_weight, importance_weight.
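As a minimal sketch of the formula above (not the project's actual Retriever code), the blend can be written as one function. The keyword names `semantic_weight`, `recency_weight`, and `importance_weight` come from this README; the `decay_rate` parameter and its default of 0.1 are assumptions, on the guess that λ here is the same value as `DECAY_RATE`.

```python
import math


def hybrid_score(
    cosine_sim: float,
    days_since_access: float,
    importance: float,
    accesses: int,
    *,
    decay_rate: float = 0.1,  # λ; assumed to match DECAY_RATE
    semantic_weight: float = 0.5,
    recency_weight: float = 0.3,
    importance_weight: float = 0.2,
) -> float:
    """Blend semantic, recency and importance signals into one ranking score."""
    recency = math.exp(-decay_rate * days_since_access)
    # The access boost adds 0.02 per access and is capped at 0.3.
    boosted_importance = importance + min(0.02 * accesses, 0.3)
    return (
        semantic_weight * cosine_sim
        + recency_weight * recency
        + importance_weight * boosted_importance
    )


fresh = hybrid_score(0.8, days_since_access=0, importance=0.5, accesses=0)
stale = hybrid_score(0.8, days_since_access=30, importance=0.5, accesses=0)
print(f"fresh={fresh:.3f} stale={stale:.3f}")  # → fresh=0.800 stale=0.515
```

With identical semantic similarity, the 30-day-old memory loses almost all of its recency contribution, which is what lets a fresher but slightly less similar memory outrank it.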

Hierarchical Compression

Memories are automatically compressed on a nightly schedule to keep the store lean and token-efficient:

Day 0-7     [Fresh memories: full content stored]
              │
              ▼  Tier 1 (age > 7 days, not recently accessed)
Day 7+      [llama-3.1-8b-instant via Groq]
              "Compress to 2-3 sentences preserving key facts"
              → original preserved in compression_log
              → is_compressed = True
              │
              ▼  Tier 2 (cluster merge, ≥ 3 episodic memories)
              [mistral-small-latest via Mistral AI]
              "Synthesize N episodic memories → 1 semantic memory"
              → source memories marked is_compressed
              → new semantic memory created with importance = 0.8

The compression_log table records every compression event with original content, timestamp, and model used — making compression fully auditable and reversible.
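The tier-1 selection rule (older than 7 days, not recently accessed, not already compressed) could be sketched as a simple filter. The `Memory` dataclass below is a hypothetical minimal record, and treating "recently accessed" as "accessed within the same 7-day window" is an assumption; the README does not define that threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    # Hypothetical minimal record; the real schema has more fields.
    id: str
    created_at: datetime
    last_accessed: datetime
    is_compressed: bool = False


def tier1_candidates(memories, now, age_days=7.0):
    """Return memories old enough and idle enough for tier-1 summarisation."""
    cutoff = now - timedelta(days=age_days)
    return [
        m for m in memories
        if not m.is_compressed
        and m.created_at < cutoff      # older than COMPRESS_AGE_DAYS
        and m.last_accessed < cutoff   # assumption: "recently" uses the same window
    ]


now = datetime(2026, 6, 15)
memories = [
    Memory("old-idle", now - timedelta(days=10), now - timedelta(days=9)),
    Memory("old-active", now - timedelta(days=10), now - timedelta(days=1)),
    Memory("fresh", now - timedelta(days=2), now - timedelta(days=2)),
    Memory("done", now - timedelta(days=30), now - timedelta(days=30), is_compressed=True),
]
print([m.id for m in tier1_candidates(memories, now)])  # → ['old-idle']
```

Only the memory that is both old and idle is selected; the actively used one keeps its full content.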

The Ebbinghaus Forgetting Curve

Each memory has a decay_score updated every 6 hours by the background DecayEngine:

$$R = e^{-t/S}$$

Where:

R = retention score ∈ [0, 1]
t = days since last access
S = stability = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)

Retention
   1.0 ┤
   0.9 ┤·····  ← procedural (skills, multiplier=3.0)
   0.8 ┤    ·····  ← preference (patterns, multiplier=2.5)
   0.7 ┤        ·····  ← semantic (facts, multiplier=2.0)
   0.5 ┤              ·····
   0.3 ┤                   ·····  ← episodic (events, multiplier=1.0)
   0.1 ┤                         ·····
   0.0 ┤──────────────────────────────── days
       0    7    14   30   60   90

High-importance + frequently-accessed memories gain extra stability — your "important things" stick around.
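The retention formula can be checked numerically with a few lines of Python. This is a sketch of the equation as written above, not the DecayEngine itself; it assumes λ = 0.1 (the `DECAY_RATE` value shown under Configuration) and that `log` means the natural logarithm.

```python
import math

# Stability multipliers from the memory-type table.
TYPE_MULTIPLIER = {"episodic": 1.0, "semantic": 2.0, "preference": 2.5, "procedural": 3.0}


def retention(days, memory_type, importance, accesses, decay_rate=0.1):
    """R = exp(-t / S), with S built from type, importance and access count."""
    stability = (
        (1.0 / decay_rate)
        * TYPE_MULTIPLIER[memory_type]
        * (1.0 + importance)
        * (1.0 + math.log(1 + accesses) * 0.5)  # natural log is an assumption
    )
    return math.exp(-days / stability)


# Same age, importance and access count; only the memory type differs.
for memory_type in TYPE_MULTIPLIER:
    r = retention(30, memory_type, importance=0.5, accesses=0)
    print(f"{memory_type:<11} {r:.3f}")
# → episodic    0.135
# → semantic    0.368
# → preference  0.449
# → procedural  0.513
```

Under these assumptions, a never-accessed episodic memory with importance 0.5 has stability S = 15 days, so after 30 days its retention is e^-2, while the same memory stored as procedural retains more than half.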

Quick Start

# 1. Clone & install
git clone https://github.com/0DevDutt0/MemoryMesh.git
cd MemoryMesh
pip install -e ".[dev]"

# 2. Configure
cp .env.example .env
# Add your GROQ_API_KEY and MISTRAL_API_KEY

# 3. Start the REST API
uvicorn memorymesh.api.main:app --reload
# → http://localhost:8000/docs (interactive Swagger UI)

# 4. Start the MCP server (for Claude Code / Cursor)
python -m memorymesh.mcp.server

# 5. Launch the dashboard
streamlit run dashboard/app.py
# → http://localhost:8501

Store & Retrieve in 10 Lines

import asyncio

import httpx

API = "http://localhost:8000/v1"

async def main():
    async with httpx.AsyncClient() as c:
        # Store a fact about the user
        await c.post(f"{API}/memories/", json={
            "content": "User prefers FastAPI over Flask for async workloads.",
            "agent_id": "my-agent",
            "memory_type": "preference",
            "importance": 0.9,
        })

        # Retrieve it semantically
        results = (await c.post(f"{API}/memories/search", json={
            "query": "what web framework does the user prefer?",
            "agent_id": "my-agent",
            "k": 3,
        })).json()

        for r in results:
            print(f"[{r['rank']}] {r['score']:.3f} — {r['memory']['content']}")

asyncio.run(main())

[1] 0.874 — User prefers FastAPI over Flask for async workloads.

Demo — See It In Action

Start the API server, then run the end-to-end demo in one command:

# Terminal 1 — start the REST API
uvicorn memorymesh.api.main:app --reload

# Terminal 2 — full lifecycle demo (store · search · graph · update · stats)
python Demo/quickstart.py

Quickstart Terminal Output

┌────────────────────┐
│  1. Health Check   │
└────────────────────┘
{"status": "ok", "ts": "2026-06-15T14:00:00.123456"}

┌──────────────────────────┐
│  2. Storing 7 Memories   │
└──────────────────────────┘
  ✓ [semantic     ] id=a1b2c3d4…  importance=0.95
  ✓ [episodic     ] id=3f8a1c2d…  importance=0.90
  ✓ [procedural   ] id=b2c3d4e5…  importance=0.85
  ✓ [preference   ] id=c3d4e5f6…  importance=0.80
  ✓ [semantic     ] id=d4e5f6a7…  importance=0.75
  ✓ [episodic     ] id=e5f6a7b8…  importance=0.70
  ✓ [procedural   ] id=f6a7b8c9…  importance=0.80

┌──────────────────────────────────────────────────────┐
│  4. Semantic Search — 'user communication style'     │
└──────────────────────────────────────────────────────┘
  #1 score=0.8214  [preference]
     The user prefers concise answers, dark mode UI, Python over JavaScript…
  #2 score=0.7123  [semantic]
     Devdutt S is a software engineer from Kochi, Kerala. GitHub: 0DevDutt0…

┌──────────────────────────────────────────────┐
│  6. Memory Graph (threshold=0.7)             │
└──────────────────────────────────────────────┘
  Nodes: 7  |  Edges: 3
  Edge  weight=0.8123  a1b2c3d4… ↔ 3f8a1c2d…
  Edge  weight=0.7891  b2c3d4e5… ↔ f6a7b8c9…
  Edge  weight=0.7234  d4e5f6a7… ↔ a1b2c3d4…

┌──────────────────────┐
│  9. Final Stats      │
└──────────────────────┘
  Total memories : 7
  Compressed     : 0
  Avg decay score: 0.9981
  By type        : {semantic: 2, episodic: 2, procedural: 2, preference: 1}

✅  Demo complete — all operations succeeded.

Store → Search JSON Round-Trip

The core pattern: store a memory, retrieve it semantically — no keyword overlap required.

1. Store a preference:

curl -X POST http://localhost:8000/v1/memories/ \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
    "agent_id": "demo-agent",
    "memory_type": "preference",
    "importance": 0.8
  }'

{
  "id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
  "agent_id": "demo-agent",
  "memory_type": "preference",
  "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
  "importance": 0.8,
  "access_count": 0,
  "decay_score": 1.0,
  "is_compressed": false,
  "created_at": "2026-06-15T14:04:00.000000"
}

2. Search with a differently worded query (no shared content keywords):

curl -X POST http://localhost:8000/v1/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what coding language does the user enjoy and how do they like responses formatted?",
    "agent_id": "demo-agent",
    "k": 3
  }'

[
  {
    "memory": {
      "id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
      "memory_type": "preference",
      "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
      "importance": 0.8,
      "access_count": 1,
      "decay_score": 0.9981
    },
    "score": 0.821406,
    "rank": 1
  }
]

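The query "what coding language does the user enjoy…" ranked the stored memory first with a score of 0.821, even though the two share no content words beyond "the user". The semantic component of that score is cosine similarity in the 1024-dimensional bge-large embedding space; recency and importance supply the rest of the blend.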
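To make the similarity step concrete, here is a dependency-free sketch of flat (exhaustive) cosine search. It is illustrative only: the project uses FAISS or numpy for this, and the three-dimensional vectors and memory IDs below are made up for the example.

```python
import math


def normalise(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, corpus, k=3):
    """Exhaustively score every stored vector against the query (flat search)."""
    q = normalise(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalise(vec))), memory_id)
        for memory_id, vec in corpus.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(memory_id, round(score, 4)) for score, memory_id in scored[:k]]


# Toy 3-dimensional "embeddings"; real bge-large vectors have 1024 dimensions.
corpus = {
    "pref-python": [0.9, 0.1, 0.0],
    "deploy-howto": [0.0, 0.2, 0.9],
    "meeting-note": [0.1, 0.9, 0.1],
}
print(top_k([1.0, 0.2, 0.0], corpus, k=2))
```

The query vector points in nearly the same direction as `pref-python`, so that memory ranks first regardless of which words either text contains.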

Ebbinghaus Decay In Action

The same memory stored under different types and access patterns decays at very different rates. Illustrative values (exact numbers depend on λ and the stability terms defined above):

Days since access	Memory Type	Importance	Accesses	Retention
0	any	any	any	1.000
7	episodic	0.5	0	0.703
7	semantic	0.5	0	0.837
30	episodic	0.5	0	0.135
30	procedural	0.9	10	0.912
90	semantic	0.9	20	0.831
90	episodic	0.3	1	0.003

A frequently-accessed skill (procedural, importance=0.9, 10 accesses) retains 91% after 30 days.
A one-off low-importance event (episodic, 1 access) fades to 0.3% after 90 days — modelling human forgetting mathematically.

Demo Files

File	What's inside
`Demo/quickstart.py`	End-to-end Python script — stores 7 memories across all four types, semantic search, type-filtered search, memory graph, update, list, and stats
`Demo/SAMPLE_INPUTS.md`	15 annotated curl examples: health check, all 4 memory types, batch store, semantic search, type-filtered search, get/update/delete, graph, stats, tier-1 & tier-2 compression, MCP tool calls
`Demo/sample_outputs.json`	Canonical JSON responses for every operation — useful as an API contract reference or test fixture

API Reference

Method	Endpoint	Description
`POST`	`/v1/memories/`	Store a single memory
`POST`	`/v1/memories/search`	Hybrid semantic search
`POST`	`/v1/memories/batch`	Batch store (up to 100)
`GET`	`/v1/memories/{id}`	Get memory by ID
`PATCH`	`/v1/memories/{id}`	Update content / importance / metadata
`DELETE`	`/v1/memories/{id}`	Delete permanently
`GET`	`/v1/memories/agent/{id}`	List all memories for an agent
`POST`	`/v1/memories/agent/{id}/graph`	Semantic similarity graph (for viz)
`GET`	`/v1/stats`	Global memory statistics
`POST`	`/v1/compress/trigger`	Run auto-compression now
`POST`	`/v1/compress/memory/{id}/tier1`	Compress single memory (Groq)
`POST`	`/v1/compress/agent/{id}/tier2`	Cluster merge (Mistral)
`GET`	`/v1/compress/log`	Compression audit history
`GET`	`/health`	Liveness probe

Full interactive docs at http://localhost:8000/docs (Swagger UI) and /redoc (ReDoc).
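Because the batch endpoint accepts at most 100 memories per call, a client with a larger backlog needs to split it first. The sketch below shows one way to do that with the standard library. The `{"memories": [...]}` request body is an assumption (check the Swagger UI for the real schema), and `post_batches` is a hypothetical helper that is defined but not called here.

```python
import json
import urllib.request

BATCH_LIMIT = 100  # the batch endpoint accepts up to 100 memories per call


def chunked(items, size=BATCH_LIMIT):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_batches(memories, base_url="http://localhost:8000/v1"):
    """POST each chunk to the batch endpoint and return the decoded responses."""
    responses = []
    for chunk in chunked(memories):
        request = urllib.request.Request(
            f"{base_url}/memories/batch",
            # Assumption: the body is a {"memories": [...]} wrapper object.
            data=json.dumps({"memories": chunk}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            responses.append(json.load(response))
    return responses


memories = [
    {"content": f"note {i}", "agent_id": "my-agent", "memory_type": "episodic", "importance": 0.5}
    for i in range(250)
]
print([len(chunk) for chunk in chunked(memories)])  # → [100, 100, 50]
```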

MCP Integration

Add to your Claude Code / Cursor MCP config:

{
  "mcpServers": {
    "memorymesh": {
      "command": "python",
      "args": ["-m", "memorymesh.mcp.server"],
      "cwd": "/path/to/MemoryMesh"
    }
  }
}

7 tools exposed to the LLM:

Tool	What it does
`store_memory`	Save a new memory (type + importance + metadata)
`retrieve_memories`	Semantic search with optional type filter
`delete_memory`	Hard delete by ID
`update_memory`	Edit content / importance / metadata in-place
`list_memories`	Browse all memories for an agent
`get_memory_stats`	Token-efficient stats snapshot
`compress_agent_memories`	Trigger cluster merge for an agent
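MCP clients such as Claude Code build the wire messages themselves, but it can help to see what a tool invocation looks like. MCP uses JSON-RPC 2.0, and tools are invoked with the `tools/call` method. In this sketch the argument names passed to `store_memory` are assumed to mirror the REST request body; the server's tool schema is the authority.

```python
import json


def tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


message = tool_call(
    1,
    "store_memory",
    {
        # Assumed argument names, mirroring the REST request body.
        "content": "User prefers FastAPI over Flask for async workloads.",
        "agent_id": "my-agent",
        "memory_type": "preference",
        "importance": 0.9,
    },
)
print(json.dumps(message, indent=2))
```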

Streamlit Dashboard

Four interactive pages accessible at http://localhost:8501:

Page	What you see
🔍 Search Explorer	Live hybrid retrieval with weight sliders · Store new memories
🕸️ Memory Graph	pyvis semantic network · colour-coded by type · edge weight = cosine similarity
🗜️ Compression Monitor	Timeline of compressions · Token savings · Manual trigger
📉 Decay Visualizer	Interactive Plotly retention curves · Adjust λ, importance, access count

Tech Stack

Layer	Technology	Why
Language	Python 3.13	Async-native, type hints, StrEnum
API	FastAPI + uvicorn	Auto-docs, async, Pydantic v2 validation
Database	aiosqlite (SQLite WAL)	Zero-dependency, async, ACID, BLOB storage
Embeddings	BAAI/bge-large-en-v1.5	Strong open-source English embedding model (1024-dim, near the top of MTEB at release)
Vector Search	FAISS IndexFlatIP	Exact inner-product search (equals cosine on L2-normalised vectors), optional numpy fallback
LLM Tier-1	Groq llama-3.1-8b-instant	Sub-second summarisation, free tier
LLM Tier-2	Mistral mistral-small	High-quality cluster synthesis
MCP	mcp SDK 1.27	stdio JSON-RPC, works with Claude Code / Cursor
Dashboard	Streamlit + pyvis + Plotly	Interactive memory exploration
Testing	pytest-asyncio, MagicMock	51 tests, in-memory SQLite, no GPU in CI
CI	GitHub Actions	lint (ruff) + test matrix on ubuntu

Testing

pytest tests/ -v

✓ test_compressor.py   8 tests  — LLM compression (mocked Groq + Mistral clients)
✓ test_decay.py       13 tests  — Ebbinghaus formula + DecayEngine lifecycle
✓ test_retriever.py   14 tests  — Search, filters, agent isolation, graph
✓ test_store.py       16 tests  — CRUD, access tracking, embedding round-trip
─────────────────────────────────
51 passed in 0.63s

Key design decisions:

No GPU required in CI — Embedder is mocked with a deterministic hash-seeded numpy vector
No real LLM calls in tests — AsyncGroq and Mistral clients are patched via unittest.mock
Isolated databases — every test fixture uses aiosqlite :memory:, with automatic teardown
asyncio_mode = "auto" — all async def tests run automatically without @pytest.mark.asyncio
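The idea behind the mocked Embedder can be sketched as follows: hash the text, seed a private random generator with the hash, and draw a fixed-length unit vector. This version uses only the standard library (`random.Random` in place of numpy), and the class name `MockEmbedder` is hypothetical, so treat it as an illustration of the technique rather than the test suite's fixture.

```python
import hashlib
import math
import random


class MockEmbedder:
    """Deterministic stand-in for a sentence-embedding model (no GPU, no download)."""

    def __init__(self, dim=1024):
        self.dim = dim

    def embed(self, text):
        # Seed a private RNG from a stable hash of the text.
        seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
        rng = random.Random(seed)
        vec = [rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]  # unit length, like a normalised embedding


embedder = MockEmbedder(dim=8)
a = embedder.embed("hello")
b = embedder.embed("hello")
c = embedder.embed("goodbye")
print(a == b, a == c)  # → True False
```

The same text always maps to the same vector, so assertions on search results stay stable across runs, while different texts get unrelated vectors.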

Configuration

Copy .env.example → .env and fill in your keys:

# Required for compression
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...

# Tunable parameters
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5   # swap for lighter all-MiniLM-L6-v2
DECAY_RATE=0.1                            # λ in the forgetting curve
COMPRESS_AGE_DAYS=7.0                     # memories older than this get tier-1 compressed
COMPRESS_CLUSTER_SIZE=20                  # episodic memories per tier-2 merge
DECAY_RUN_INTERVAL_HOURS=6.0             # background decay update frequency
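One way such settings could be read at startup is shown below, using only the standard library. The `Settings` class is hypothetical, and using the values from the listing above as fallbacks is an assumption; the project's own config module may load and validate them differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    embedding_model: str
    decay_rate: float
    compress_age_days: float
    compress_cluster_size: int
    decay_run_interval_hours: float

    @classmethod
    def from_env(cls, env=os.environ):
        """Read tunables from a mapping, falling back to the values shown above."""
        return cls(
            embedding_model=env.get("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5"),
            decay_rate=float(env.get("DECAY_RATE", "0.1")),
            compress_age_days=float(env.get("COMPRESS_AGE_DAYS", "7.0")),
            compress_cluster_size=int(env.get("COMPRESS_CLUSTER_SIZE", "20")),
            decay_run_interval_hours=float(env.get("DECAY_RUN_INTERVAL_HOURS", "6.0")),
        )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = Settings.from_env({"DECAY_RATE": "0.05"})
print(settings.decay_rate, settings.compress_cluster_size)  # → 0.05 20
```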

Project Layout

memorymesh/
├── core/           config · logging · aiosqlite database
├── memory/         types · embedder · store · retriever · decay · compressor
├── schemas/        Pydantic request / response models
├── api/            FastAPI app · 3 routers (memories, compress, health)
└── mcp/            MCP stdio server (7 tools)
dashboard/          Streamlit 4-page UI
tests/              51 async tests, mock embedder, in-memory DB
Demo/               Runnable quickstart · curl examples · sample JSON
.github/workflows/  CI: lint (ruff) + pytest matrix

Why MemoryMesh?

Feature	MemoryMesh	mem0	Zep	ChromaDB alone
Four cognitive memory types	✅	❌	❌	❌
MCP-native (Claude / Cursor)	✅	❌	❌	❌
Ebbinghaus forgetting curve	✅	❌	❌	❌
Hierarchical LLM compression	✅	Partial	Partial	❌
REST API + dashboard	✅	✅	✅	❌
Zero infrastructure (SQLite)	✅	❌	❌	✅
Open source, no usage fees	✅	Partial	Partial	✅

Built with curiosity and production instincts by Devdutt S · Kochi, India
