# Personal RAG Knowledge Base — PRD
Size S · P0 · Foundation
Status: ✅ M1 done (2026-04-29) — see Implementation for build details
Originally planned: 1 weekend / Actual: ~2 days of concentrated work
## 1. Problem
Personal knowledge is fragmented across many sources: wiki/documentation pages, ticketing systems, chat threads, meeting transcripts, email, markdown notes, and AI conversation transcripts. When trying to recall “what did I read about X?”, finding the answer takes 15–30 minutes of manual hunting across separate tools. OS-level search (Spotlight) is keyword-only and doesn’t understand semantic intent. Per-source MCP search is slow and bloats the context window with raw chunks.
Pain: ~90% of consumed knowledge is not retrievable on demand.
Why now: this is a foundation pattern that 6+ downstream AI projects can reuse (recipe extractor, research agent, support bot, finance advisor, etc.). Build once, reuse many times.
## 2. Goal & Success Metrics
Goal: Ask a Claude client “any thoughts on vector DB X for ~100K vectors?” → get top-5 relevant chunks + source links in under 3 seconds.
Metrics — actual achieved:
| Metric | Target M1 | Achieved | Note |
|---|---|---|---|
| Hit@5 on test queries | ≥60% | ~85% (subjective on 4 queries) | Verified with real-world queries scoring 0.85+ |
| Latency p95 | <5s | 1.16s | Embed query + ADB vector search + tunnel |
| Sources ingested | 100 docs | 5,000+ sources / 47K chunks | Full bulk migrate Day 3 |
| Touchpoint | Telegram | MCP: Claude Desktop + Claude.ai web + iOS app | Pivoted Day 1, validated Day 8 |
## 3. User journey (revised)
Pivot from original: Dropped a custom Telegram bot in favor of MCP — Claude clients can call `kb_search` / `kb_ingest` natively.
- A sync agent (e.g. chat exporter, meeting transcriber, AI session hook, ticket crawler) writes a `.md` file into the local KB folder.
- A post-write hook automatically calls the `kb_ingest` MCP tool → embed + store in ADB.
- User asks Claude: “Anything new on topic X this week?”
- Claude calls `kb_search` over MCP → returns top-5 chunks with source paths.
- Claude synthesizes an answer with citations.
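The retrieval step in this journey can be sketched as a brute-force cosine top-k over stored chunks. This is an illustrative stand-in, not the production path: the real server delegates similarity search to ADB 23ai's native VECTOR type with an HNSW index, and the vectors are 384-dim e5 embeddings rather than the toy 3-dim ones used here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def kb_search(query_vec, chunks, k=5):
    """Return the top-k chunks by cosine similarity to the query vector.

    `chunks` is a list of (source_uri, chunk_index, text, vector) tuples,
    mirroring the citation fields (source URI + chunk index) the tool returns.
    """
    scored = [
        {"source": uri, "chunk": idx, "text": text, "score": cosine(query_vec, vec)}
        for uri, idx, text, vec in chunks
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:k]

# Toy 3-dim stand-ins for the real 384-dim e5 embeddings.
chunks = [
    ("notes/a.md", 0, "vector DB comparison", [1.0, 0.0, 0.0]),
    ("notes/b.md", 1, "meeting recap",        [0.0, 1.0, 0.0]),
    ("notes/c.md", 0, "HNSW index tuning",    [0.9, 0.1, 0.0]),
]
results = kb_search([1.0, 0.0, 0.0], chunks, k=2)
```

The brute-force scan is O(n) per query; at 47K chunks the HNSW index is what keeps p95 latency near 1 second.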
## 4. Scope (MoSCoW) — final
Must — DONE:
- ✅ Ingest markdown files (URL/PDF supported via downstream sync agents)
- ✅ Chunk + embed + store in ADB 23ai with `VECTOR(384, FLOAT32)`
- ✅ MCP tools: `kb_health`, `kb_ingest`, `kb_search`, `kb_stats`
- ✅ Citations: search results include source URI and chunk index
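The chunk step of the pipeline can be illustrated with a minimal fixed-size sliding window. The window and overlap sizes below are assumptions for illustration; this PRD does not specify the production chunking parameters.

```python
def chunk_markdown(text, size=800, overlap=100):
    """Split text into overlapping fixed-size character windows.

    `size` and `overlap` are illustrative defaults, not the values
    used by the real ingest pipeline.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # slide forward, keeping `overlap` chars of context
    return chunks

doc = "x" * 2000
parts = chunk_markdown(doc, size=800, overlap=100)
```

Each chunk is then embedded and stored as one `VECTOR(384, FLOAT32)` row; the overlap keeps sentences that straddle a boundary retrievable from either side.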
Should — DONE:
- ✅ Idempotent ingest via server-side hash check
- ✅ Re-ingest replaces chunks if content hash changed
- ✅ Auto-tag from folder hierarchy (16 path-based rules)
- ⏸️ Hybrid search (BM25 + vector) — semantic alone proved sufficient
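The idempotent-ingest behavior above can be sketched as a server-side content-hash comparison. The function name mirrors the `kb_ingest` tool, but storage, chunking, and embedding are stubbed out, and SHA-256 is an assumed hash choice.

```python
import hashlib

# In-memory stand-in for the ADB-side (source_uri -> content_hash) lookup.
_store = {}

def kb_ingest(source_uri, content):
    """Idempotent ingest sketch: skip when the content hash is unchanged,
    replace the source's chunks when it differs."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if _store.get(source_uri) == digest:
        return "skipped"            # identical content already ingested
    action = "replaced" if source_uri in _store else "ingested"
    _store[source_uri] = digest     # real server re-chunks + re-embeds here
    return action
```

This is what makes the post-write hook safe to fire on every file save: unchanged files cost one hash lookup, not a re-embed.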
Could — partial:
- ⏸️ Apple Notes import — out of scope, low ROI for actual usage
- ⏸️ Kindle highlights import — same reason
- ✅ Wiki/docs crawler integration — handled by downstream scripts
- ❌ Custom web UI — replaced by Claude clients themselves
Won’t (M1–M3) — kept:
- Multi-user support (single-user system by design)
- Real-time sync — manual triggers + nightly scheduled task is sufficient
- Native mobile app — Claude iOS app inherits access via OAuth
## 5. Architecture (final)
Pivoted from “RAG + Telegram Bot” → MCP Server + Multi-source ingest. See Architecture for diagrams.
## 6. Tech Stack — final choices
| Layer | Original spec | Implemented | Reason for change |
|---|---|---|---|
| LLM serving | Local Llama 3.2 3B | External (caller’s Claude client) | MCP delegates LLM to caller; server only does retrieval |
| Embedder | BGE-small-en-v1.5 | multilingual-e5-small | English-only embedder underperformed on mixed VN/EN queries |
| Vector DB | ADB 23ai | ADB 23ai ✓ | Free 20 GB, native vector type with HNSW index |
| Reranker | BGE-reranker-base (M2) | ❌ skipped | Single-stage e5 alone scored 0.85+ on real queries |
| Bot framework | python-telegram-bot | MCP Streamable HTTP | Native Claude integration, multi-client for free |
| HTTP framework | — | FastMCP + Starlette + uvicorn | MCP SDK provides this out-of-box |
| Tunnel | Public IP + nginx | Cloudflare named tunnel | Persistent URL, no inbound port open |
| Auth | “Telegram only” | OAuth 2.0 (PKCE+DCR) + legacy bearer | Supports mobile/web/desktop clients |
| Deploy | systemd | systemd ✓ | — |
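The PKCE side of the auth row can be sketched with the standard library: the client generates a `code_verifier` and derives its S256 `code_challenge` per RFC 7636. This shows only the key derivation, not the full authorization-code exchange or DCR registration.

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate an RFC 7636 code_verifier and its S256 code_challenge.

    43-char URL-safe verifier from 32 random bytes; the challenge is the
    base64url-encoded SHA-256 of the verifier, with padding stripped.
    """
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

verifier, challenge = make_pkce_pair()
```

The verifier stays on the client; only the challenge travels in the initial authorize request, which is why PKCE works for public clients like the Claude iOS app.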
Cost posture: Designed to fit within cloud free tiers for ADB, Object Storage, and Cloudflare. Compute uses a small commodity VM that can run on any provider; the architecture intentionally avoids vendor-specific lock-in (no managed services for the hot path).
## 7. Milestones — actual
| Day | What shipped |
|---|---|
| Day 1 | MCP server scaffold (FastMCP), tunnel, bearer auth |
| Day 2 | kb_ingest/search/stats tools, embed bench, schema applied |
| Day 3 | Bulk migrate 5,000+ sources / 45K chunks (~95 min wall) |
| Day 3b | Embed model pivot BGE-en → e5-small multilingual (re-embed 131 min) |
| Day 4-5 | Refactor 4 sync sources to call MCP kb_ingest (filesystem-first rule) |
| Day 6 | Persistent named tunnel via custom domain |
| Day 7 | Weekly Object Storage backup via write-only PAR + restore script |
| Day 8 | OAuth 2.0 + DCR + PKCE → Claude.ai web + iOS app access |
M1 DoD passed:
- ✅ Ingest 5,000+ sources (target 10)
- ✅ 5/5 real queries answered correctly (target 3/5)
- ✅ Latency 1.16s (target <5s)
## 8. Cost & Quota
| Item | Free tier? | Actual usage |
|---|---|---|
| ADB 23ai (1 × 20 GB) | ✅ | 47K chunks ≈ ~250 MB stored (~1.3% of quota) |
| Object Storage backup | ✅ <20 GB | ~86 MB/week × ~52 weeks = ~4.5 GB/year (~22% of quota) |
| Cloudflare Tunnel | ✅ | <1 MB/day data |
The serving VM is the only line item that may need migration once a free trial credit ends. The architecture is provider-agnostic — moving compute is a re-deploy, not a re-build.
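A back-of-envelope check on the ~250 MB ADB figure: the raw vector payload alone accounts for roughly 72 MB, with the remainder presumably chunk text, metadata, and HNSW index overhead (an assumed breakdown, not a measured one).

```python
# Raw vector payload for 47K chunks at 384 FLOAT32 dims,
# before chunk text, metadata, and index overhead.
chunks = 47_000
vector_bytes = chunks * 384 * 4        # FLOAT32 = 4 bytes per dimension
vector_mb = vector_bytes / 1_000_000   # ~72 MB of the ~250 MB total
```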
## 9. Risks & open questions — outcomes
Original risks:
- ADB auto-stop after 7 days idle → mitigated by daily health pings via normal usage
- Embedding quality on Vietnamese text with English model → resolved Day 3b (multilingual model swap)
- Local LLM crash recovery → N/A (LLM not used in final architecture)
New risks (M2/M3 backlog):
- Single-password OAuth (no 2FA) — adequate for personal use; layer SSO if shared
- Cloudflare bot management blocking the default `urllib` User-Agent → fixed with a custom UA header
- Token rotation relies on operator memory — push to a password manager
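The `urllib` fix above can be sketched as follows; the helper name, URL, and UA string are illustrative, not the actual values used.

```python
import urllib.request

def make_request(url, token):
    """Build a request with an explicit User-Agent, since Cloudflare's bot
    management rejects the default "Python-urllib/x.y" identifier.
    All header values here are illustrative."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": "kb-sync/1.0",
            "Authorization": f"Bearer {token}",
        },
    )

req = make_request("https://kb.example.com/mcp", "dummy-token")
```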
Original open Qs:
- Q1: Web UI from M2? → ❌ dropped, Claude clients are sufficient
- Q2: Privacy of sensitive content in ADB? → ✅ accepted, single-tenant deployment
- Q3: Reranker latency on CPU? → N/A (reranker skipped)
## 10. Definition of Done
M1 Done: ✅ 2026-04-29 — 5K+ sources ingested, OAuth flow live, multilingual search working, infra includes DR backup.
M3 Done (production-ready):
- ⏳ TOTP 2FA or SSO wrap on `/login`
- ⏳ Reranker eval on top-20 candidates → measure Hit@5 lift
- ⏳ Daily-driver criterion: 2 consecutive weeks of natural usage without stack pivot
## See also
- Implementation — technical deep-dive (deploy, code structure, perf numbers)
- Architecture — component diagrams, data flow, security model
- Notes — chronological decision log