Why Your AI Memory Lies, and How I Built the 3-Layer audit-knowledge to Catch It

TL;DR: Yesterday my AI told me I was hosting my personal infrastructure on an Oracle Cloud A1 VM (free-tier ARM). It was confident, complete with an architecture diagram and a citation from my workspace CLAUDE.md. The problem: I never registered an A1 VM. It does not exist. So I built audit-knowledge, a 3-layer audit that runs on a weekly schedule and caught more than 25 stale facts in my 79-file corpus within one day of deployment. This post explains why every persistent-memory setup needs an audit layer.

AI memory drift is the hidden problem of 2026

Claude Code, ChatGPT custom GPTs, Cursor, Continue: all of them are shipping “persistent memory”:

Files in ~/.claude/projects/.../memory/
Workspace CLAUDE.md instructions
.cursorrules per project
A behind-the-scenes vector DB for session history

The promise: the AI knows your context, preferences, and projects.

The reality: the AI confidently asserts a fact that was true 3 months ago but no longer is.

The typical pattern:

Day 1: You write “I’m using Postgres 14 for my side project” into a memory file.
Day 60: You upgrade to Postgres 16 and forget to update the memory file.
Day 120: The AI confidently suggests a Postgres-14-only feature that is broken on 16.

Multiply that by 50+ memory files, 12 side projects, 5+ workspace CLAUDE.md files, and infrastructure that changes weekly. Without active validation, drift is inevitable.

Worse, an LLM does not know what it does not know. It treats memory as ground truth.

My specific hallucination

I had written “OCI Foundation (VM A1 + ADB 23ai + Object Storage + Block Volume + Email Delivery + Functions). Set up once, shared by 12 projects. Prerequisite for personal-rag-kb (P0)” in my workspace CLAUDE.md, as a plan dating from February.

Six months later, the AI assistant read that line and treated the infrastructure as already deployed. The memory file mail_watcher_project.md mentions “I haven’t been able to register A1 yet”, but the AI did not cross-reference it.

When I asked it to “deploy a new cron job”, the AI replied: “I’ll deploy to the Oracle A1 VM cron-host and use ADB 23ai for persistence.” I believed it and started writing code against infrastructure that did not exist.

Then the audit tool I had just built caught it and cross-flagged a conflict with confidence 95: the workspace file says A1 is live, while the memory file says it was never registered. The AI’s hallucination had been compounding for 6 months.

Software engineering solved this problem long ago

For code, we have:

Linters that catch broken references at build time
Type checkers that flag inconsistent types
CI/CD that runs tests on every commit
Observability that alerts when prod drifts from spec

For AI memory? Nothing.

You write notes → the AI reads them → the AI tells you facts → you act. There is no verification step. Nothing says “this fact is 90 days old and the server it references no longer responds”. Nothing says “wait, this fact conflicts with another file”.

I built audit-knowledge to fill that gap.

3 layers, escalating cost

The core insight: different stale-fact patterns need different detection methods.

Layer 1 — Static check (free)
   ↓
Layer 2 — LLM cross-source ($0.20/run)
   ↓
Layer 3 — Live probe (free)

Layer 1: Static checks

Free and fast; catches the obvious:

Cited path does not exist (os.path.exists)
IP format invalid (octets > 255)
Port out of range
URL malformed

Catches: “I cite /Users/old/path/foo.py in 5 places, but it has moved to /Users/new/path/.”

Limit: it only catches syntactic problems, not semantic ones.
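The four checks above can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, and the URL check assumes only http(s) links are cited.

```python
import ipaddress
import os
from urllib.parse import urlparse


def check_path(path: str) -> bool:
    """A cited path passes only if it exists on disk right now."""
    return os.path.exists(path)


def check_ipv4(value: str) -> bool:
    """Rejects malformed addresses, including octets above 255."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def check_port(value: str) -> bool:
    """Valid ports are integers in the range 1..65535."""
    try:
        return 1 <= int(value) <= 65535
    except ValueError:
        return False


def check_url(value: str) -> bool:
    """Requires an http(s) scheme and a non-empty host (assumed policy)."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

None of these checks touches the network, which is what keeps this layer free and fast.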

Layer 2: LLM cross-source contradiction

The most expensive layer, but also the most powerful. Bundle ALL audited files into a single Sonnet 4.6 prompt:

## CATEGORY: memory
### file_a.md ...
### file_b.md ...

## CATEGORY: workspace_claude_md
### CLAUDE.md ...

## CATEGORY: project_notes
### NOTES.md ...

Find facts that contradict each other or look stale.
Output JSON with confidence + evidence chain.

The LLM reads everything in one pass and spots:

File A says “X is true”, file B says “X is false” → cross_source_mismatch
File C says “current” but its date is 6 months old → stale_fact
File D claims the existence of something file E describes as “planned” → broken_assumption

This is the layer that caught the Oracle A1 hallucination.

Cost: ~$0.20 per weekly run. Worth every cent.
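Assembling that bundled prompt could look roughly like the sketch below. It only builds the text; the API call and the parsing of the JSON reply are left out. The build_prompt name and the shape of the sources mapping are assumptions made for illustration.

```python
from pathlib import Path

INSTRUCTIONS = (
    "Find facts that contradict each other or look stale.\n"
    "Output JSON with confidence + evidence chain."
)


def build_prompt(sources: dict[str, list[Path]]) -> str:
    """Concatenate every audited file under a heading for its category."""
    parts = []
    for category, paths in sources.items():
        parts.append(f"## CATEGORY: {category}")
        for path in paths:
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"### {path.name}\n{text.strip()}")
        parts.append("")  # blank line between categories
    parts.append(INSTRUCTIONS)
    return "\n".join(parts)
```

Keeping the category headings in the prompt is what lets the model attribute each side of a contradiction to a specific source.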

Layer 3: Live probe

Free; it executes shell commands to verify a claim:

{
  "claim_type": "port",
  "claim_value": "8080",
  "auto_probe": "lsof -ti :8080 -sTCP:LISTEN | head -1"
}

If memory says “rag-kb daemon listens on port 8080” but lsof finds nothing, that is a finding.

Probe templates per claim type:

path → test -e
port → lsof -ti
url → curl -sI -m 5 -o /dev/null -w '%{http_code}'
version → product-specific (python3 --version, gcloud --version…)
hostname → host

Plus a manual registry (probes.json) with 23 entries for project-specific facts (GCP machine type, vault decrypt, daemon health, …).
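A probe runner for templates like these might be sketched as follows. The PROBE_TEMPLATES dict, the function names, and the result shape are hypothetical. One detail worth noting: a pipeline ending in head exits 0 even when lsof prints nothing, so the port check below looks at stdout rather than the exit code.

```python
import shlex
import subprocess

# Hypothetical templates mirroring the list above; {value} is shell-quoted first.
PROBE_TEMPLATES = {
    "path": "test -e {value}",
    "port": "lsof -ti :{value} -sTCP:LISTEN | head -1",
    "hostname": "host {value}",
}


def build_probe(claim_type: str, claim_value: str) -> str:
    """Fill a template with a safely quoted claim value."""
    return PROBE_TEMPLATES[claim_type].format(value=shlex.quote(claim_value))


def run_probe(command: str, timeout: float = 10.0) -> dict:
    """Run one probe in a shell and capture its exit code and stdout."""
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"command": command, "exit_code": None, "stdout": ""}
    return {"command": command, "exit_code": proc.returncode, "stdout": proc.stdout.strip()}


def claim_holds(claim_type: str, result: dict) -> bool:
    """Port probes are judged by output; everything else by exit code."""
    if result["exit_code"] is None:
        return False  # timed out, treat as unverified
    if claim_type == "port":
        return result["stdout"] != ""
    return result["exit_code"] == 0
```

A claim that fails claim_holds would then be reported as a finding alongside the command that was run.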

The brittleness trap I almost fell into

My first version had a clever idea: hash-based suppression. When the LLM flags something that is actually fine, store the finding’s fingerprint in suppressions.json with an expiry. Future runs skip it.

I added 14 suppressions for legitimate-but-flagged-anyway findings.

The next run produced 7 NEW findings, all logically equivalent to the suppressed ones but with different fingerprints. The reason? The LLM rephrases its findings on every run: slightly different wording and different example quotes yield a different fingerprint hash.

I was playing whack-a-mole.
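The failure mode is easy to reproduce. The sketch below assumes a fingerprint computed as a hash of the finding's normalized wording; the real fingerprint scheme may differ, and both finding texts are invented for illustration.

```python
import hashlib


def fingerprint(finding_text: str) -> str:
    """Hash of the finding's normalized wording (assumed scheme)."""
    normalized = " ".join(finding_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


# Suppression stored after the first run.
suppressed = {fingerprint("Path under marcng looks wrong: current user is marcmax")}

# The same logical finding, reworded by the LLM on the next run.
rephrased = "The marcng folder path appears stale since the username is now marcmax"

# The new wording hashes to a different fingerprint, so the suppression misses it.
is_suppressed = fingerprint(rephrased) in suppressed
```

Normalizing case and whitespace helps only with trivial variation; any change of wording still produces a new hash.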

The fix: add explanatory context in the source file.

Instead of suppressing “marcng path looks wrong” by hash, I added a comment:

<!-- NOTE: `-Users-marcng-Documents-Personal-Assistant` is the Claude Code project
folder encoding for a legacy session from when the user had the username `marcng`.
The current system uses `marcmax`, but Claude Code preserves the old folder name.
The folder REALLY DOES EXIST at /Users/marcmax/.claude/... (intentional legacy preservation). -->

Now the LLM reads the file, sees the explanation, and stops flagging it. Durable, and independent of any hash.

This is the deeper lesson: bad suppression hides the problem; good context teaches the auditor.

Catch-up when the Mac is off

I wanted a daily run at 3 AM. cron on macOS does not catch up on missed runs if the Mac is off at that time. launchd can, with RunAtLoad set to true plus a state file to dedupe.

<key>StartCalendarInterval</key>
<dict>
    <key>Hour</key><integer>3</integer>
    <key>Minute</key><integer>0</integer>
</dict>
<key>RunAtLoad</key><true/>

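Plus state-file logic in audit.py:

import sys
import time

# Skip-if-recent guard; last_run is assumed to be the Unix timestamp read from the state file
hours_since = (time.time() - last_run) / 3600
if hours_since < 23:
    print(f"SKIP: tier={tier} ran {hours_since:.1f}h ago")
    sys.exit(0)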

Mac off all day → boots at 9 AM → RunAtLoad fires the audit → state file shows the last run was 2 days ago → the audit runs. Mac is restarted later in the morning after a successful 3 AM run → RunAtLoad fires again → state file shows a run 6h ago → skip.
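The state-file side of that flow could be sketched like this. The JSON layout (one timestamp per tier) and the function names are assumptions, not the tool's actual format; the 23-hour threshold comes from the guard shown earlier.

```python
import json
from pathlib import Path

MIN_INTERVAL_HOURS = 23  # threshold taken from the skip-if-recent guard


def hours_since_last_run(state_file: Path, tier: str, now: float) -> float:
    """Hours since the given tier last ran, or infinity if it never ran."""
    try:
        state = json.loads(state_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return float("inf")
    last_run = state.get(tier)
    return float("inf") if last_run is None else (now - last_run) / 3600


def should_run(state_file: Path, tier: str, now: float) -> bool:
    """True when the last run is old enough, or was never recorded."""
    return hours_since_last_run(state_file, tier, now) >= MIN_INTERVAL_HOURS


def record_run(state_file: Path, tier: str, now: float) -> None:
    """Store the completion time so later triggers on the same day are skipped."""
    try:
        state = json.loads(state_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}
    state[tier] = now
    state_file.write_text(json.dumps(state))
```

A caller would pass time.time() as now, call should_run before auditing, and call record_run only after the audit completes, so a crashed run does not block the next attempt.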

Three catch-up layers, defense in depth.

PII redaction is non-negotiable

Layer 2 sends file content through the Anthropic API. Even with zero retention, defense in depth says: redact secrets first.

import re

REDACTIONS = [
    (re.compile(r"sk-ant-[a-zA-Z0-9_\-]{20,}"), "[REDACTED:anthropic_key]"),
    (re.compile(r"\bsk-(?!ant-)[A-Za-z0-9]{20,}"), "[REDACTED:openai_key]"),
    (re.compile(r"ghp_[A-Za-z0-9]{36,}"), "[REDACTED:github_pat]"),
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED:aws_access_key]"),
    (re.compile(r"-----BEGIN[A-Z ]+PRIVATE KEY-----[\s\S]+?-----END[A-Z ]+PRIVATE KEY-----"),
     "[REDACTED:private_key]"),
    # ... 10+ patterns
]

For credentials_vault.md and other high-risk files, emails are also partially redacted (alice@example.com → [user]@example.com, keeping the domain for cross-source diffs).

Tested with 6 unit cases. 100% pass.
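Applying the patterns, plus the partial email redaction, might look like the sketch below. It repeats two of the patterns so it runs on its own; the EMAIL regex and the high_risk flag are simplified assumptions rather than the tool's actual implementation.

```python
import re

# Two patterns from the list above, repeated so this sketch is self-contained.
REDACTIONS = [
    (re.compile(r"ghp_[A-Za-z0-9]{36,}"), "[REDACTED:github_pat]"),
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED:aws_access_key]"),
]

# Simplified email matcher (an assumption); it captures only the domain part.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def redact(text: str, high_risk: bool = False) -> str:
    """Replace known secret formats; for high-risk files also mask email local parts."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    if high_risk:
        text = EMAIL.sub(r"[user]@\1", text)
    return text
```

Redaction runs before the prompt is assembled, so nothing matching a pattern leaves the machine.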

Production results: 7 audit runs in 1 day

Run	RED	YELLOW	What changed
1 (initial)	10	15	baseline scan
2 (sau 13 patches)	6	17	apply LLM-suggested fixes
3 (worktree fix)	2	21	bulk-fix 56 worktree CLAUDE.md
4 (probe fix)	0	26	fix `lsof -ti` template bug
5 (suppressions)	2	7	new findings surface
6 (memory updates)	2	7	duplicate fingerprints
7 (root-cause fixes)	0	10	clean ✅

Total fixes shipped in 1 day:

16 memory file edits (closing open questions: stay LL through probation, hybrid Claude for Jarvis, defer AI cofounder Q3, M2 cold-start benchmark verified)
56 workspace CLAUDE.md updates
14 fingerprint suppressions
5 root-cause prompt-context fixes
1 probe bug fix
23 manual probes registered

Why this matters RIGHT NOW

Three things are happening at once:

Persistent AI memory is becoming the default: Claude Projects, ChatGPT Custom GPTs, Cursor .cursorrules, Continue .continue/, every IDE.
Personal LLM stacks are growing. I have 12 side projects, 50+ memory files, and 5+ workspace instruction files. Multiply that by millions of developers.
Nobody audits. People discover drift after acting on stale information (“wait, that VM doesn’t exist anymore?”), not before.

The cost is invisible until it hits production. By then, trust has already eroded.

Build it yourself

Architecture summary (about 30 Python files in total, open source):

3 verification layer (static / LLM / live probe)
4-tier cadence (daily probes / weekly full / monthly exhaustive / event-driven hooks)
launchd agents for catch-up after the Mac was off or asleep
PII redaction before anything is sent to the LLM
Fingerprint-based suppression with expiry
Source registry cover memory + CLAUDE.md (global + workspace) + project NOTES + settings + KB sample
23 manual probes + auto-probe templates per claim type
Cost: ~$3/month for daily/weekly/monthly + event-driven runs

Total build time: 8 hours, 1 session. 14 hardening features, full test corpus.

The hardest part is not the code. It is trusting the audit results enough to fix what they find: many were things I had “always known” were minor inconsistencies but kept ignoring. The audit forces a confrontation: either justify it (add explanatory context) or fix it (update the file).

Coming next

Directions I am building toward:

Reverse audit: scan the codebase for facts that should be in memory but are not yet (proactive bootstrap).
Drift detection: trend the RED count over time and alert when it exceeds the moving average + 2 standard deviations.
Auto-fix Tier A: high-confidence, reversible findings get their patch applied automatically.
Adversarial test corpus: synthetic stale facts with known answers, to measure precision/recall on a benchmark.
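The drift-detection rule in the list above can be sketched in a few lines. This illustrates the stated threshold (moving average + 2 standard deviations) rather than shipped code; the function name and the default window of 7 runs are assumptions.

```python
from statistics import mean, stdev


def is_drift_alert(red_counts: list[int], window: int = 7) -> bool:
    """True when the newest RED count exceeds the moving average plus
    2 standard deviations of the preceding `window` runs (window >= 2)."""
    if len(red_counts) < window + 1:
        return False  # not enough history yet
    history = red_counts[-(window + 1):-1]
    threshold = mean(history) + 2 * stdev(history)
    return red_counts[-1] > threshold
```

The newest run is compared against history that excludes it, so a sudden spike cannot inflate its own threshold.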

Bigger picture: I think every personal AI setup needs an audit layer. Some people will use mine. Most will build their own. A few teams will productize it. All of them will be glad to have it the first time their AI confidently asserts something stale.

Until AI can no longer lie to you, you are flying blind.

Built and shipped on 2026-05-08 in 8 hours. Source: ~/.claude/hooks/audit-knowledge/. Inspired by the realization that I had quietly trusted AI memory for 6 months without validating it once.