
Product spec, scope, milestones, and success metrics for personal-rag-kb.

Personal RAG Knowledge Base — PRD

Size S · P0 · Foundation
Status: ✅ M1 done (2026-04-29) — see Implementation for build details.
Originally planned: 1 weekend · Actual: ~2 days of concentrated work

1. Problem

Personal knowledge is fragmented across many sources: wiki/documentation pages, ticketing systems, chat threads, meeting transcripts, email, markdown notes, and AI conversation transcripts. Answering “what did I read about X?” takes 15–30 minutes of manual hunting across separate tools. OS-level search (Spotlight) is keyword-only and doesn’t understand semantic intent. Per-source MCP search is slow and bloats the context window with raw chunks.

Pain: ~90% of consumed knowledge is not retrievable on demand.

Why now: this is a foundation pattern that 6+ downstream AI projects can reuse (recipe extractor, research agent, support bot, finance advisor, etc.). Build once, reuse many times.

2. Goal & Success Metrics

Goal: Ask a Claude client “any thoughts on vector DB X for ~100K vectors?” → get top-5 relevant chunks + source links in under 3 seconds.

Metrics — actual achieved:

| Metric | Target (M1) | Achieved | Note |
| --- | --- | --- | --- |
| Hit@5 on test queries | ≥60% | ~85% (subjective, on 4 queries) | Verified with real-world queries scoring 0.85+ |
| Latency p95 | <5 s | 1.16 s | Embed query + ADB vector search + tunnel |
| Sources ingested | 100 docs | 5,000+ sources / 47K chunks | Full bulk migrate on Day 3 |
| Touchpoint | Telegram | MCP: Claude Desktop + Claude.ai web + iOS app | Pivoted Day 1, validated Day 8 |

3. User journey (revised)

Pivot from original: Dropped a custom Telegram bot in favor of MCP — Claude clients can call kb_search / kb_ingest natively.

  1. A sync agent (e.g. chat exporter, meeting transcriber, AI session hook, ticket crawler) writes a .md file into the local KB folder.
  2. A post-write hook automatically calls kb_ingest MCP → embed + store in ADB.
  3. User asks Claude: “Anything new on topic X this week?”
  4. Claude calls kb_search over MCP → returns top-5 chunks with source paths.
  5. Claude synthesizes an answer with citations.

4. Scope (MoSCoW) — final

Must — DONE:

  • ✅ Ingest markdown files (URL/PDF supported via downstream sync agents)
  • ✅ Chunk + embed + store in ADB 23ai with VECTOR(384, FLOAT32)
  • ✅ MCP tools: kb_health, kb_ingest, kb_search, kb_stats
  • ✅ Citations: search results include source URI and chunk index
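The chunk-and-cite contract in the Must list can be illustrated with a minimal sketch. The 1,000-character window and 200-character overlap are assumptions for illustration; the real chunker may differ:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character windows with overlap so context isn't cut mid-thought."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def to_rows(source_uri: str, text: str) -> list[dict]:
    """One row per chunk; (source_uri, chunk_index) is the citation key
    that search results carry back to the caller."""
    return [
        {"source_uri": source_uri, "chunk_index": i, "chunk_text": c}
        for i, c in enumerate(chunk_text(text))
    ]
```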

Should — done (one deferred):

  • ✅ Idempotent ingest via server-side hash check
  • ✅ Re-ingest replaces chunks if content hash changed
  • ✅ Auto-tag from folder hierarchy (16 path-based rules)
  • ⏸️ Hybrid search (BM25 + vector) — semantic alone proved sufficient
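The hash-based idempotency rule above can be sketched with an in-memory stand-in for the ADB tables (class name and return values are illustrative):

```python
import hashlib

class KbStore:
    """In-memory stand-in for the server-side source/chunk tables."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}          # source_uri -> content hash
        self.chunks: dict[str, list[str]] = {}    # source_uri -> chunk texts

    def ingest(self, source_uri: str, text: str) -> str:
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(source_uri) == h:
            return "skipped"                      # unchanged content: no-op
        replaced = source_uri in self.hashes
        self.hashes[source_uri] = h
        # Changed content: old chunks are dropped and rebuilt wholesale.
        self.chunks[source_uri] = [text[i:i + 1000] for i in range(0, len(text), 1000)]
        return "replaced" if replaced else "inserted"
```

Replacing a source's chunks wholesale (rather than diffing) keeps re-ingest simple and makes sync agents safe to re-run at any time.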

Could — partial:

  • ⏸️ Apple Notes import — out of scope, low ROI for actual usage
  • ⏸️ Kindle highlights import — same reason
  • ✅ Wiki/docs crawler integration — handled by downstream scripts
  • ❌ Custom web UI — replaced by Claude clients themselves

Won’t (M1–M3) — kept:

  • Multi-user support (single-user system by design)
  • Real-time sync — manual triggers + nightly scheduled task is sufficient
  • Native mobile app — Claude iOS app inherits via OAuth

5. Architecture (final)

Pivoted from “RAG + Telegram Bot” → MCP Server + Multi-source ingest. See Architecture for diagrams.

6. Tech Stack — final choices

| Layer | Original spec | Implemented | Reason for change |
| --- | --- | --- | --- |
| LLM serving | Local Llama 3.2 3B | External (caller’s Claude client) | MCP delegates the LLM to the caller; the server only does retrieval |
| Embedder | BGE-small-en-v1.5 | multilingual-e5-small | English-only embedder underperformed on mixed VN/EN queries |
| Vector DB | ADB 23ai | ADB 23ai ✓ | Free 20 GB, native vector type with HNSW index |
| Reranker | BGE-reranker-base (M2) | ❌ skipped | Single-stage e5 alone scored 0.85+ on real queries |
| Bot framework | python-telegram-bot | MCP Streamable HTTP | Native Claude integration, multi-client for free |
| HTTP framework | (not specified) | FastMCP + Starlette + uvicorn | MCP SDK provides this out of the box |
| Tunnel | Public IP + nginx | Cloudflare named tunnel | Persistent URL, no inbound port open |
| Auth | “Telegram only” | OAuth 2.0 (PKCE + DCR) + legacy bearer | Supports mobile/web/desktop clients |
| Deploy | systemd | systemd | (unchanged) |
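One operational detail behind the embedder swap: the e5 model family expects role prefixes on its inputs (per the e5 model cards), and ranking comes from a vector similarity over the stored embeddings. A stdlib sketch; the production scoring runs inside ADB's vector index, and cosine here is an illustrative choice of metric:

```python
import math

def e5_inputs(query: str, passages: list[str]) -> list[str]:
    """e5-family models are trained with role prefixes:
    'query: ' for the search string, 'passage: ' for indexed chunks."""
    return [f"query: {query}"] + [f"passage: {p}" for p in passages]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (stand-in for
    ADB's native vector distance over the VECTOR(384, FLOAT32) column)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
```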

Cost posture: Designed to fit within cloud free tiers for ADB, Object Storage, and Cloudflare. Compute uses a small commodity VM that can run on any provider; the architecture intentionally avoids vendor-specific lock-in (no managed services for the hot path).

7. Milestones — actual

| Day | What shipped |
| --- | --- |
| Day 1 | MCP server scaffold (FastMCP), tunnel, bearer auth |
| Day 2 | kb_ingest/search/stats tools, embedding benchmark, schema applied |
| Day 3 | Bulk migrate 5,000+ sources / 45K chunks (~95 min wall-clock) |
| Day 3b | Embedding model pivot BGE-en → multilingual-e5-small (re-embed: 131 min) |
| Day 4–5 | Refactor 4 sync sources to call MCP kb_ingest (filesystem-first rule) |
| Day 6 | Persistent named tunnel via custom domain |
| Day 7 | Weekly Object Storage backup via write-only PAR + restore script |
| Day 8 | OAuth 2.0 + DCR + PKCE → Claude.ai web + iOS app access |
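The Day 7 backup path relies on one property of write-only PARs: an upload is a plain HTTP PUT of the object bytes to the PAR URL plus object name, with no other credentials. A sketch; the URL and naming scheme are placeholders:

```python
import datetime
import urllib.request

def build_backup_request(par_url: str, archive: bytes) -> urllib.request.Request:
    """PUT one date-stamped backup object through a write-only
    pre-authenticated request (PAR) URL."""
    name = f"kb-backup-{datetime.date.today().isoformat()}.tar.gz"
    return urllib.request.Request(
        par_url.rstrip("/") + "/" + name,
        data=archive,
        method="PUT",
        headers={"Content-Type": "application/gzip"},
    )
```

Because the PAR is write-only, a compromised backup host can add objects but never read or delete existing ones.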

M1 DoD passed:

  • ✅ Ingest 5,000+ sources (target 10)
  • ✅ 5/5 real queries answered correctly (target 3/5)
  • ✅ Latency 1.16s (target <5s)

8. Cost & Quota

| Item | Free tier? | Actual usage |
| --- | --- | --- |
| ADB 23ai (1 × 20 GB) | ✅ | 47K chunks ≈ ~250 MB stored (~1.3% of quota) |
| Object Storage backup | ✅ (<20 GB) | ~86 MB/week × ~52 weeks ≈ 4.5 GB/year (~22% of quota) |
| Cloudflare Tunnel | ✅ | <1 MB/day of data |
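The Object Storage line reduces to simple arithmetic, reproduced here as a sanity check on the quoted quota share:

```python
backup_mb_per_week = 86
yearly_gb = backup_mb_per_week * 52 / 1000   # ~4.5 GB accumulated per year
quota_pct = 100 * yearly_gb / 20             # share of the 20 GB free tier, ~22%
```

At this rate the backup bucket takes years to threaten the quota; pruning old archives is an optimization, not a requirement.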

The serving VM is the only line item that may need migration once a free trial credit ends. The architecture is provider-agnostic — moving compute is a re-deploy, not a re-build.

9. Risks & open questions — outcomes

Original risks:

  • ADB auto-stop after 7 days idle → mitigated by daily health pings generated by normal usage
  • Embedding quality on Vietnamese text with English model → resolved Day 3b (multilingual model swap)
  • Local LLM crash recovery → N/A (LLM not used in final architecture)

New risks (M2/M3 backlog):

  • Single-password OAuth (no 2FA) — adequate for personal use; layer SSO on top if shared
  • Cloudflare bot management blocking default urllib User-Agent → fixed with custom UA header
  • Token rotation reliance on operator memory — push to password manager
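The User-Agent fix noted above amounts to never letting urllib send its default identifier; a sketch, with the UA string and bearer scheme as illustrative values:

```python
import urllib.request

def kb_request(url: str, token: str) -> urllib.request.Request:
    """Request with an explicit User-Agent, since Cloudflare bot rules can
    block urllib's default 'Python-urllib/3.x' identifier."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": "personal-rag-kb-sync/1.0",
            "Authorization": f"Bearer {token}",
        },
    )
```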

Original open Qs:

  • Q1: Web UI from M2? → ❌ dropped, Claude clients are sufficient
  • Q2: Privacy of sensitive content in ADB? → ✅ accepted, single-tenant deployment
  • Q3: Reranker latency on CPU? → N/A (reranker skipped)

10. Definition of Done

M1 Done: ✅ 2026-04-29 — 5K+ sources ingested, OAuth flow live, multilingual search working, infra includes DR backup.

M3 Done (production-ready):

  • ⏳ TOTP 2FA or SSO wrap on /login
  • ⏳ Reranker eval on top-20 candidates → measure Hit@5 lift
  • ⏳ Daily-driver criterion: 2 consecutive weeks of natural usage without stack pivot

See also

  • Implementation — technical deep-dive (deploy, code structure, perf numbers)
  • Architecture — component diagrams, data flow, security model
  • Notes — chronological decision log