Most AI agents are amnesiac by default — every conversation starts from zero. MemoryMesh solves this by providing a production-grade memory layer that any agent can plug into via MCP or REST API.
It stores memories across four cognitive types, retrieves them with a hybrid semantic + recency + importance ranking, compresses old memories using LLMs (Groq for single-memory summarisation, Mistral for cluster synthesis), and models forgetting using the Ebbinghaus retention curve so memories fade realistically over time.
Built from scratch in Python 3.13 · 35 files · 51 tests · 0 TODOs
graph TB
subgraph Clients["🖥️ Clients"]
CC["Claude Code / Cursor"]
AG["Custom Agents"]
DB["Streamlit Dashboard"]
end
subgraph Transport["🔌 Transport Layer"]
MCP["MCP stdio server<br/>(7 tools)"]
REST["FastAPI REST API<br/>(12 endpoints)"]
end
subgraph Core["⚙️ MemoryMesh Core"]
STORE["MemoryStore<br/>CRUD + Embeddings"]
RET["Retriever<br/>Hybrid Scoring"]
COMP["Compressor<br/>Tier-1 Groq · Tier-2 Mistral"]
DECAY["DecayEngine<br/>Ebbinghaus Curve"]
EMB["Embedder<br/>bge-large-en-v1.5 · 1024-dim"]
end
subgraph Storage["💾 Persistence"]
SQL["SQLite WAL<br/>memories · embeddings<br/>compression_log · access_log"]
FAISS["FAISS IndexFlatIP<br/>(numpy fallback)"]
end
CC -->|JSON-RPC stdio| MCP
AG -->|HTTP| REST
DB -->|HTTP| REST
MCP --> STORE
MCP --> RET
REST --> STORE
REST --> RET
REST --> COMP
STORE --> SQL
STORE --> EMB
RET --> FAISS
RET --> EMB
COMP -->|Groq llama-3.1-8b-instant| STORE
COMP -->|Mistral mistral-small| STORE
DECAY -->|asyncio background task| STORE
| Type | Icon | Description | Real-world Analogy | Decay Speed |
|---|---|---|---|---|
| `episodic` | 🕐 | Time-stamped events | "I talked to Alice about X yesterday" | Fast (×1.0) |
| `semantic` | 📚 | Permanent facts | "Alice is a Python engineer at Google" | Slow (×2.0) |
| `procedural` | ⚙️ | Skills & how-tos | "To deploy FastAPI: uvicorn main:app..." | Slowest (×3.0) |
| `preference` | ❤️ | User patterns | "Alice prefers concise bullet-point answers" | Medium-slow (×2.5) |
Each type has a tuned stability multiplier in the Ebbinghaus forgetting curve, so skills outlast events, and facts outlast episodes — just like human memory.
flowchart LR
Q["🔍 Query"] --> EMB["Embed with\nbge-large-en-v1.5"]
EMB --> FAISS["FAISS / numpy\ncosine similarity"]
FAISS --> SEM["Semantic\nScore (×0.5)"]
Q --> TIME["Time since\nlast access"]
TIME --> REC["Recency\nScore (×0.3)"]
Q --> IMP["importance field\n+ access_count boost"]
IMP --> IMPS["Importance\nScore (×0.2)"]
SEM --> BLEND["Weighted\nBlend"]
REC --> BLEND
IMPS --> BLEND
BLEND --> RANK["Re-rank & Return\nTop-K Results"]
RANK --> LOG["Increment\naccess_count"]
Final score formula:
score = 0.5 × cosine_sim + 0.3 × exp(-λ·days) + 0.2 × (importance + min(0.02·accesses, 0.3))
Weights are configurable per-query via semantic_weight, recency_weight, importance_weight.
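As a rough illustration of the blend, the sketch below evaluates the formula for a fresh and a stale memory. The function and class names are hypothetical, and it assumes λ is the same `DECAY_RATE` value (0.1) used by the forgetting curve; the real retriever may differ in detail.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreWeights:
    """Per-query blend weights; defaults mirror the formula above."""
    semantic_weight: float = 0.5
    recency_weight: float = 0.3
    importance_weight: float = 0.2


def hybrid_score(cosine_sim, days_since_access, importance, accesses,
                 weights=ScoreWeights(), decay_rate=0.1):
    """Blend semantic, recency, and importance signals into one score."""
    # Recency decays exponentially with days since last access (assumed λ = decay_rate).
    recency = math.exp(-decay_rate * days_since_access)
    # The access-count boost is capped at 0.3, as in the formula.
    importance_term = importance + min(0.02 * accesses, 0.3)
    return (weights.semantic_weight * cosine_sim
            + weights.recency_weight * recency
            + weights.importance_weight * importance_term)


fresh = hybrid_score(cosine_sim=0.72, days_since_access=0, importance=0.8, accesses=1)
stale = hybrid_score(cosine_sim=0.72, days_since_access=30, importance=0.8, accesses=1)
print(f"fresh={fresh:.3f} stale={stale:.3f}")  # fresh=0.824 stale=0.539
```

With identical semantic similarity, the 30-day-old memory loses almost the entire recency term, which is why a recently touched memory can outrank a slightly closer but stale one.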
Memories are automatically compressed on a nightly schedule to keep the store lean and token-efficient:
Day 1-7 [Fresh memories: full content stored]
│
▼ Tier 1 (age > 7 days, not recently accessed)
Week 1+ [llama-3.1-8b-instant via Groq]
"Compress to 2-3 sentences preserving key facts"
→ original preserved in compression_log
→ is_compressed = True
│
▼ Tier 2 (cluster merge, ≥ 3 episodic memories)
[mistral-small-latest via Mistral AI]
"Synthesize N episodic memories → 1 semantic memory"
→ source memories marked is_compressed
→ new semantic memory created with importance = 0.8
The compression_log table records every compression event with original content, timestamp, and model used — making compression fully auditable and reversible.
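To make the audit-and-restore idea concrete, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The table columns and the `compress` / `restore` helpers are assumptions for illustration, not the project's actual schema or API, and the summary is passed in directly rather than produced by an LLM call.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE memories (
        id TEXT PRIMARY KEY,
        content TEXT NOT NULL,
        is_compressed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE compression_log (
        memory_id TEXT NOT NULL,
        original_content TEXT NOT NULL,
        model TEXT NOT NULL,
        compressed_at TEXT NOT NULL
    );
""")


def compress(memory_id, summary, model):
    """Swap in the summary, keeping the original in the audit log."""
    row = conn.execute(
        "SELECT content FROM memories WHERE id = ?", (memory_id,)).fetchone()
    if row is None:
        raise KeyError(memory_id)
    with conn:  # one transaction: the log entry and the update commit together
        conn.execute(
            "INSERT INTO compression_log VALUES (?, ?, ?, ?)",
            (memory_id, row[0], model, datetime.now(timezone.utc).isoformat()))
        conn.execute(
            "UPDATE memories SET content = ?, is_compressed = 1 WHERE id = ?",
            (summary, memory_id))


def restore(memory_id):
    """Undo compression using the earliest logged original, if any."""
    row = conn.execute(
        "SELECT original_content FROM compression_log "
        "WHERE memory_id = ? ORDER BY rowid LIMIT 1", (memory_id,)).fetchone()
    if row is None:
        return False
    with conn:
        conn.execute(
            "UPDATE memories SET content = ?, is_compressed = 0 WHERE id = ?",
            (row[0], memory_id))
    return True


original_text = "Long transcript of a debugging session about connection pooling."
with conn:
    conn.execute("INSERT INTO memories (id, content) VALUES (?, ?)",
                 ("m1", original_text))
compress("m1", "Debugged connection pooling.", model="llama-3.1-8b-instant")
compressed = conn.execute("SELECT content, is_compressed FROM memories").fetchone()
restored_ok = restore("m1")
restored = conn.execute("SELECT content, is_compressed FROM memories").fetchone()
print(compressed, restored)
```

Writing the log row and the update in one transaction means a crash cannot leave a memory compressed without its original on record.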
Each memory has a decay_score, updated every 6 hours by the background DecayEngine using the Ebbinghaus retention curve:
R = exp(-t / S)
Where:
- R = retention score ∈ [0, 1]
- t = days since last access
- S = stability = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)
Retention curves (retention from 0.0 to 1.0 against days since last access, 0 to 90): procedural (skills, multiplier=3.0) decays slowest, followed by preference (patterns, multiplier=2.5), semantic (facts, multiplier=2.0), and episodic (events, multiplier=1.0).
High-importance + frequently-accessed memories gain extra stability — your "important things" stick around.
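The stability formula can be sketched in a few lines. This assumes the standard Ebbinghaus form R = exp(-t / S), a natural logarithm for the access term, and λ = 0.1 (the `DECAY_RATE` default); the function names are illustrative rather than the DecayEngine's actual interface, so its exact numbers may differ.

```python
import math

# Type multipliers as listed in the memory-type table.
TYPE_MULTIPLIER = {
    "episodic": 1.0,
    "semantic": 2.0,
    "preference": 2.5,
    "procedural": 3.0,
}


def stability(memory_type, importance, accesses, decay_rate=0.1):
    """S = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)."""
    return ((1.0 / decay_rate)
            * TYPE_MULTIPLIER[memory_type]
            * (1.0 + importance)
            * (1.0 + math.log(1 + accesses) * 0.5))


def retention(days_since_access, memory_type, importance, accesses, decay_rate=0.1):
    """R = exp(-t / S), assumed Ebbinghaus form; returns a value in (0, 1]."""
    s = stability(memory_type, importance, accesses, decay_rate)
    return math.exp(-days_since_access / s)


# An untouched episodic memory of medium importance after 30 days:
# S = 10 × 1.0 × 1.5 × 1.0 = 15, so R = exp(-2).
print(round(retention(30, "episodic", 0.5, 0), 3))  # 0.135

# Same age and importance, different types: higher multipliers retain more.
for memory_type in TYPE_MULTIPLIER:
    print(memory_type, round(retention(30, memory_type, 0.5, 0), 3))
```

Because importance and access count only ever increase S, they slow the decay without changing the shape of the curve.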
# 1. Clone & install
git clone https://github.com/0DevDutt0/MemoryMesh.git
cd MemoryMesh
pip install -e ".[dev]"
# 2. Configure
cp .env.example .env
# Add your GROQ_API_KEY and MISTRAL_API_KEY
# 3. Start the REST API
uvicorn memorymesh.api.main:app --reload
# → http://localhost:8000/docs (interactive Swagger UI)
# 4. Start the MCP server (for Claude Code / Cursor)
python -m memorymesh.mcp.server
# 5. Launch the dashboard
streamlit run dashboard/app.py
# → http://localhost:8501

import asyncio

import httpx

API = "http://localhost:8000/v1"


async def main():
    async with httpx.AsyncClient() as c:
        # Store a fact about the user
        await c.post(f"{API}/memories/", json={
            "content": "User prefers FastAPI over Flask for async workloads.",
            "agent_id": "my-agent",
            "memory_type": "preference",
            "importance": 0.9,
        })

        # Retrieve it semantically
        results = (await c.post(f"{API}/memories/search", json={
            "query": "what web framework does the user prefer?",
            "agent_id": "my-agent",
            "k": 3,
        })).json()

        for r in results:
            print(f"[{r['rank']}] {r['score']:.3f} — {r['memory']['content']}")


asyncio.run(main())

[1] 0.874 — User prefers FastAPI over Flask for async workloads.
Start the API server, then run the end-to-end demo in one command:
# Terminal 1 — start the REST API
uvicorn memorymesh.api.main:app --reload
# Terminal 2 — full lifecycle demo (store · search · graph · update · stats)
python Demo/quickstart.py

┌────────────────────┐
│ 1. Health Check │
└────────────────────┘
{"status": "ok", "ts": "2026-06-15T14:00:00.123456"}
┌──────────────────────────┐
│ 2. Storing 7 Memories │
└──────────────────────────┘
✓ [semantic ] id=a1b2c3d4… importance=0.95
✓ [episodic ] id=3f8a1c2d… importance=0.90
✓ [procedural ] id=b2c3d4e5… importance=0.85
✓ [preference ] id=c3d4e5f6… importance=0.80
✓ [semantic ] id=d4e5f6a7… importance=0.75
✓ [episodic ] id=e5f6a7b8… importance=0.70
✓ [procedural ] id=f6a7b8c9… importance=0.80
┌──────────────────────────────────────────────────────┐
│ 4. Semantic Search — 'user communication style' │
└──────────────────────────────────────────────────────┘
#1 score=0.8214 [preference]
The user prefers concise answers, dark mode UI, Python over JavaScript…
#2 score=0.7123 [semantic]
Devdutt S is a software engineer from Kochi, Kerala. GitHub: 0DevDutt0…
┌──────────────────────────────────────────────┐
│ 6. Memory Graph (threshold=0.7) │
└──────────────────────────────────────────────┘
Nodes: 7 | Edges: 3
Edge weight=0.8123 a1b2c3d4… ↔ 3f8a1c2d…
Edge weight=0.7891 b2c3d4e5… ↔ f6a7b8c9…
Edge weight=0.7234 d4e5f6a7… ↔ a1b2c3d4…
┌──────────────────────┐
│ 9. Final Stats │
└──────────────────────┘
Total memories : 7
Compressed : 0
Avg decay score: 0.9981
By type : {semantic: 2, episodic: 2, procedural: 2, preference: 1}
✅ Demo complete — all operations succeeded.
The core pattern: store a memory, retrieve it semantically — no keyword overlap required.
1. Store a preference:
curl -X POST http://localhost:8000/v1/memories/ \
-H "Content-Type: application/json" \
-d '{
"content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
"agent_id": "demo-agent",
"memory_type": "preference",
"importance": 0.8
}'

{
"id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
"agent_id": "demo-agent",
"memory_type": "preference",
"content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
"importance": 0.8,
"access_count": 0,
"decay_score": 1.0,
"is_compressed": false,
"created_at": "2026-06-15T14:04:00.000000"
}

2. Search with a semantically different query (zero keyword overlap):
curl -X POST http://localhost:8000/v1/memories/search \
-H "Content-Type: application/json" \
-d '{
"query": "what coding language does the user enjoy and how do they like responses formatted?",
"agent_id": "demo-agent",
"k": 3
}'

[
{
"memory": {
"id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
"memory_type": "preference",
"content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
"importance": 0.8,
"access_count": 1,
"decay_score": 0.9981
},
"score": 0.821406,
"rank": 1
}
]

The query "what coding language does the user enjoy…" matched with score 0.821 despite sharing zero keywords with the stored memory: the semantic match comes from cosine similarity in the 1024-dimensional bge-large embedding space.
The same memory stored under different types and access patterns decays at very different rates:
| Days since access | Memory Type | Importance | Accesses | Retention |
|---|---|---|---|---|
| 0 | any | any | any | 1.000 |
| 7 | episodic | 0.5 | 0 | 0.703 |
| 7 | semantic | 0.5 | 0 | 0.837 |
| 30 | episodic | 0.5 | 0 | 0.135 |
| 30 | procedural | 0.9 | 10 | 0.912 |
| 90 | semantic | 0.9 | 20 | 0.831 |
| 90 | episodic | 0.3 | 1 | 0.003 |
A frequently-accessed skill (procedural, importance=0.9, 10 accesses) retains 91% after 30 days.
A one-off low-importance event (episodic, 1 access) fades to 0.3% after 90 days — modelling human forgetting mathematically.
| File | What's inside |
|---|---|
| `Demo/quickstart.py` | End-to-end Python script: stores 7 memories across all four types, semantic search, type-filtered search, memory graph, update, list, and stats |
| `Demo/SAMPLE_INPUTS.md` | 15 annotated curl examples: health check, all 4 memory types, batch store, semantic search, type-filtered search, get/update/delete, graph, stats, tier-1 & tier-2 compression, MCP tool calls |
| `Demo/sample_outputs.json` | Canonical JSON responses for every operation, useful as an API contract reference or test fixture |
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/memories/` | Store a single memory |
| `POST` | `/v1/memories/search` | Hybrid semantic search |
| `POST` | `/v1/memories/batch` | Batch store (up to 100) |
| `GET` | `/v1/memories/{id}` | Get memory by ID |
| `PATCH` | `/v1/memories/{id}` | Update content / importance / metadata |
| `DELETE` | `/v1/memories/{id}` | Delete permanently |
| `GET` | `/v1/memories/agent/{id}` | List all memories for an agent |
| `POST` | `/v1/memories/agent/{id}/graph` | Semantic similarity graph (for viz) |
| `GET` | `/v1/stats` | Global memory statistics |
| `POST` | `/v1/compress/trigger` | Run auto-compression now |
| `POST` | `/v1/compress/memory/{id}/tier1` | Compress single memory (Groq) |
| `POST` | `/v1/compress/agent/{id}/tier2` | Cluster merge (Mistral) |
| `GET` | `/v1/compress/log` | Compression audit history |
| `GET` | `/health` | Liveness probe |
Full interactive docs at http://localhost:8000/docs (Swagger UI) and /redoc (ReDoc).
Add to your Claude Code / Cursor MCP config:
{
"mcpServers": {
"memorymesh": {
"command": "python",
"args": ["-m", "memorymesh.mcp.server"],
"cwd": "/path/to/MemoryMesh"
}
}
}

7 tools exposed to the LLM:
| Tool | What it does |
|---|---|
| `store_memory` | Save a new memory (type + importance + metadata) |
| `retrieve_memories` | Semantic search with optional type filter |
| `delete_memory` | Hard delete by ID |
| `update_memory` | Edit content / importance / metadata in-place |
| `list_memories` | Browse all memories for an agent |
| `get_memory_stats` | Token-efficient stats snapshot |
| `compress_agent_memories` | Trigger cluster merge for an agent |
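For readers curious what a client actually sends when the LLM picks one of these tools, the sketch below builds an MCP `tools/call` request as a JSON-RPC 2.0 message. The `tool_call` helper is hypothetical, and the argument names are an assumption that mirrors the REST request fields; the tool's real input schema may differ.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids within a session


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request that invokes one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call("store_memory", {
    # Assumed argument names, copied from the REST payload shown earlier.
    "content": "User prefers FastAPI over Flask for async workloads.",
    "agent_id": "my-agent",
    "memory_type": "preference",
    "importance": 0.9,
})

# On the stdio transport each message is a single line of JSON.
wire_message = json.dumps(request) + "\n"
print(wire_message, end="")
```

In practice Claude Code or Cursor handles this framing, along with the `initialize` handshake that must precede any tool call, so the sketch is mainly useful for debugging the server by hand.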
Four interactive pages accessible at http://localhost:8501:
| Page | What you see |
|---|---|
| 🔍 Search Explorer | Live hybrid retrieval with weight sliders · Store new memories |
| 🕸️ Memory Graph | pyvis semantic network · colour-coded by type · edge weight = cosine similarity |
| 🗜️ Compression Monitor | Timeline of compressions · Token savings · Manual trigger |
| 📉 Decay Visualizer | Interactive Plotly retention curves · Adjust λ, importance, access count |
| Layer | Technology | Why |
|---|---|---|
| Language | Python 3.13 | Async-native, type hints, StrEnum |
| API | FastAPI + uvicorn | Auto-docs, async, Pydantic v2 validation |
| Database | aiosqlite (SQLite WAL) | Zero-dependency, async, ACID, BLOB storage |
| Embeddings | BAAI/bge-large-en-v1.5 | Best open-source embedding (1024-dim, MTEB top-5) |
| Vector Search | FAISS IndexFlatIP | Exact cosine search, optional numpy fallback |
| LLM Tier-1 | Groq llama-3.1-8b-instant | Sub-second summarisation, free tier |
| LLM Tier-2 | Mistral mistral-small | High-quality cluster synthesis |
| MCP | mcp SDK 1.27 | stdio JSON-RPC, works with Claude Code / Cursor |
| Dashboard | Streamlit + pyvis + Plotly | Interactive memory exploration |
| Testing | pytest-asyncio, MagicMock | 51 tests, in-memory SQLite, no GPU in CI |
| CI | GitHub Actions | lint (ruff) + test matrix on ubuntu |
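The vector-search row pairs an inner-product index with cosine search, which works because the inner product of two L2-normalised vectors equals their cosine similarity. The sketch below shows that brute-force idea in plain Python on toy 3-dimensional vectors; it stands in for the FAISS / numpy path, assumes embeddings are normalised before scoring, and uses hypothetical names throughout.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, corpus, k=3):
    """Exact (brute-force) search: score every vector, return the k best."""
    q = normalize(query)
    scored = []
    for memory_id, vec in corpus.items():
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), memory_id))
    scored.sort(reverse=True)
    return [(memory_id, score) for score, memory_id in scored[:k]]


# Toy "embeddings"; real ones would be 1024-dimensional.
corpus = {
    "prefs": [0.9, 0.1, 0.0],
    "deploy": [0.0, 0.2, 0.9],
    "bio": [0.5, 0.5, 0.1],
}
results = top_k([1.0, 0.2, 0.0], corpus, k=2)
for memory_id, score in results:
    print(f"{memory_id}: {score:.3f}")
```

A flat index scores every stored vector, so results are exact but query cost grows linearly with the number of memories.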
pytest tests/ -v

✓ test_compressor.py 8 tests — LLM compression (mocked Groq + Mistral clients)
✓ test_decay.py 13 tests — Ebbinghaus formula + DecayEngine lifecycle
✓ test_retriever.py 14 tests — Search, filters, agent isolation, graph
✓ test_store.py 16 tests — CRUD, access tracking, embedding round-trip
─────────────────────────────────
51 passed in 0.63s
Key design decisions:
- No GPU required in CI: `Embedder` is mocked with a deterministic hash-seeded numpy vector
- No real LLM calls in tests: `AsyncGroq` and `Mistral` clients are patched via `unittest.mock`
- Isolated databases: every test fixture uses `aiosqlite :memory:`, with automatic teardown
- `asyncio_mode = "auto"`: all `async def` tests run automatically without `@pytest.mark.asyncio`
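The first decision is worth illustrating, since a deterministic stand-in is what lets search tests run without loading a model. The sketch below is one possible shape for such a mock, not the project's actual fixture: the `FakeEmbedder` name is hypothetical, and it swaps numpy for the standard-library `random` module to stay dependency-free.

```python
import hashlib
import math
import random


class FakeEmbedder:
    """Test double: maps text to a reproducible unit-length pseudo-embedding."""

    def __init__(self, dim=1024):
        self.dim = dim

    def embed(self, text):
        # Seed from a content hash (not Python's per-process hash()) so the
        # same text yields the same vector on every run and every machine.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        vec = [rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]


embedder = FakeEmbedder()
first = embedder.embed("User prefers FastAPI over Flask.")
second = embedder.embed("User prefers FastAPI over Flask.")
other = embedder.embed("Deploy with uvicorn.")
print(first == second, first == other, len(first))  # True False 1024
```

Such vectors carry no meaning, so tests built on them can check plumbing (storage round-trips, ranking order for identical text, agent isolation) but not semantic quality.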
Copy .env.example → .env and fill in your keys:
# Required for compression
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...
# Tunable parameters
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5 # swap for lighter all-MiniLM-L6-v2
DECAY_RATE=0.1 # λ in the forgetting curve
COMPRESS_AGE_DAYS=7.0 # memories older than this get tier-1 compressed
COMPRESS_CLUSTER_SIZE=20 # episodic memories per tier-2 merge
DECAY_RUN_INTERVAL_HOURS=6.0 # background decay update frequency

memorymesh/
├── core/ config · logging · aiosqlite database
├── memory/ types · embedder · store · retriever · decay · compressor
├── schemas/ Pydantic request / response models
├── api/ FastAPI app · 3 routers (memories, compress, health)
└── mcp/ MCP stdio server (7 tools)
dashboard/ Streamlit 4-page UI
tests/ 51 async tests, mock embedder, in-memory DB
Demo/ Runnable quickstart · curl examples · sample JSON
.github/workflows/ CI: lint (ruff) + pytest matrix
| Feature | MemoryMesh | mem0 | Zep | ChromaDB alone |
|---|---|---|---|---|
| Four semantic memory types | ✅ | ❌ | ❌ | ❌ |
| MCP-native (Claude / Cursor) | ✅ | ❌ | ❌ | ❌ |
| Ebbinghaus forgetting curve | ✅ | ❌ | ❌ | ❌ |
| Hierarchical LLM compression | ✅ | Partial | Partial | ❌ |
| REST API + dashboard | ✅ | ✅ | ✅ | ❌ |
| Zero infrastructure (SQLite) | ✅ | ❌ | ❌ | ✅ |
| Open source, no usage fees | ✅ | Partial | Partial | ✅ |