5 Innate Biases of Cloud LLMs & 5 Levers to Work Around Them

Scope of this post: cloud-hosted LLMs (Claude, GPT, Gemini, hosted Mistral, etc.), meaning you only have API access and cannot fine-tune the weights. If you run a local LLM (self-hosted Llama base, Qwen base), you also have the option to fine-tune, which the main body of this post does not cover. See the final section on local LLMs.

Background: 12 errors in one session

This week I had Claude Code draft a workflow doc for a client: a 6-column Google Sheet (columns include Module · Flow · Current workflow · Workflow V2 · Key rules). Sources: a Krisp meeting transcript, email, Confluence, and workshop notes.

110 turns later, I had caught 12 errors in the cell content. These were not typos. They were structural errors:

Taking what the internal team said while demoing in the meeting and recording it as the client describing their current workflow
Converting a client requirement (“avoid X”) into a description of the current state (“X is happening”)
Auto-filling a state machine, approval levels, and pre-conditions even though the sources said nothing about them
Inventing a “Pending Review status” for a feature that the meeting notes never mention
Getting one entity ↔ vendor mapping wrong, which then propagated across many other cells

I asked Claude: “Why so many mistakes?” The answer was so candid that I wrote this post to share it.

5 Innate Biases of Cloud LLMs

These are not accidents. They are default behaviors baked into the weights when the vendor trains the model (especially through RLHF / Constitutional AI). A cloud LLM here means the Claude / GPT / Gemini you use through an API; the bias profiles of these models are broadly similar because they share the same training paradigm.

1. Smooth-prose bias

LLMs are trained on data full of polished prose. “Nice-looking” output = reward. So the model automatically converts the raw quote "Yeah, we are using it." into "The feature is actively used today across all sites". Smoothing → dropped nuance → claiming more than the source (the raw quote only confirms “in use”; it says nothing about “across all sites” or “actively”).

The trade-off made silently: aesthetics > accuracy.

2. Fill-the-gap bias

When the LLM sees a gap in the information, it fills it with “plausible” content. Example:

Source says: “1 batch per month, value date is day X”
LLM adds: “send to bank a few days before day X” (NOT in the source)

The LLM treats that detail as plausible because the industry-standard pattern usually looks like that. Plausible ≠ true. This is the hallucination tendency: filling gaps with common knowledge instead of admitting “the source does not say”.

3. Pattern-complete bias

LLMs are used to templates (a PRD has Pre-conditions / Approval flow / State machine / Error codes). When writing a workflow doc, the model auto-fills template fields even when the source is silent. The meeting never mentions a “Pending Review status”, yet the LLM adds one because PRD templates usually have it.

4. Confident-assertion bias

LLMs are trained to answer confidently rather than say “I don’t know”. Given the choice, the model behaves like this:

❌ Avoided: “Cell empty, no info” (admitting the gap)
✅ Preferred: “Per workflow…” (asserting without a source)

An empty cell feels like work not done, so the LLM is in effect pushed toward filling it.

5. Please-the-user bias (sycophancy)

The LLM wants to deliver complete output that the user can use right away. Empty cells = “I have not finished” → over-deliver → over-claim.

Summary: a 3-step pipeline that compounds errors

The LLM's default behavior:

extract → smooth → assert

Each step adds error:

Extract from multiple sources: mixes up speakers / templates / interpretation (source blending)
Smooth raw quotes into prose: drops nuance, paraphrases incorrectly
Assert with confidence: claims more than the source, fills gaps, completes patterns

→ Errors compound across the 3 steps. 12 errors per session is a predictable consequence, not an accident.
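To make the compounding concrete, here is a small arithmetic sketch. The per-step fidelity values and the cell count are purely illustrative assumptions, not measurements from the session; the only point is that per-step fidelities multiply if the steps fail independently.

```python
from math import prod

def end_to_end_fidelity(step_fidelities):
    """Probability that a claim survives every step intact,
    assuming each step fails independently of the others."""
    return prod(step_fidelities)

# Hypothetical per-step fidelities for extract -> smooth -> assert.
steps = {"extract": 0.95, "smooth": 0.90, "assert": 0.90}
fidelity = end_to_end_fidelity(steps.values())

cells = 60  # hypothetical number of cells in the sheet
expected_flawed = cells * (1 - fidelity)

print(f"end-to-end fidelity: {fidelity:.2f}")
print(f"expected flawed cells out of {cells}: {expected_flawed:.1f}")
```

Even when every individual step looks fairly reliable, the product drops quickly, which is why removing a whole step from the LLM's hands tends to pay off more than nudging one step slightly.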

The cold truth: you cannot “fix” a cloud LLM

With a cloud LLM (Claude, GPT, Gemini), you only have API access. The weights sit in the vendor's data center, so you cannot:

Re-train on your own data
Adjust the RLHF reward (the bias is already baked in from training)
Strip out the “smooth-prose” preference

Some vendors offer a fine-tuning API (OpenAI for GPT-3.5/4; Anthropic's Claude fine-tuning is still rolling out), but:

It is expensive (a few hundred to a few thousand USD per run)
Its scope is limited (instruction tuning, not a full retrain)
The core biases (smoothing, sycophancy) are hard to remove because they were baked in during the vendor's original training

→ For nearly all cloud LLM users, the bias is inherent and immutable. All you can do is box it in, dilute it, or bypass it.

5 Levers to Work Around Them (strongest first)

Lever 1: Architectural constraint (strongest, mechanical)

Instead of asking the LLM to get it right, force it into a tool-driven pipeline:

Hard-coded pipeline (Python script):
1. grep transcript → extract quotes by speaker
2. Filter [CLIENT] only
3. Pass to the LLM for ONE task only: "categorize this quote into module X / Y / Z"
4. Output: structured JSON

The LLM only performs a narrow task (categorizing). Extraction and filtering are done in code, not by the LLM. The biases cannot touch those steps because the code is deterministic: the LLM can still mislabel a quote, but it cannot invent one.
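The four steps above can be sketched roughly as follows. This is a minimal illustration, not the script used in the session: the `[SPEAKER] text` transcript format, the sample lines, the `Payments` label, and `fake_categorize` (a stand-in for the real LLM call) are all assumptions.

```python
import json
import re
from typing import Callable

# Assumed transcript format: one utterance per line, "[SPEAKER] text".
LINE_RE = re.compile(r"^\[(?P<speaker>[A-Z_]+)\]\s*(?P<text>.+)$")

def extract_quotes(transcript: str) -> list[dict]:
    """Step 1: pull (speaker, text) pairs out of the transcript with a regex."""
    quotes = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            quotes.append(match.groupdict())
    return quotes

def run_pipeline(transcript: str, categorize: Callable[[str], str]) -> str:
    """Steps 2-4: filter to CLIENT in code, let the LLM label only, emit JSON."""
    client_quotes = [q for q in extract_quotes(transcript) if q["speaker"] == "CLIENT"]
    rows = [{"module": categorize(q["text"]), "quote": q["text"]} for q in client_quotes]
    return json.dumps(rows, ensure_ascii=False, indent=2)

def fake_categorize(quote: str) -> str:
    """Stand-in for the LLM call; a real version would send the quote to an API."""
    return "Payments" if "batch" in quote.lower() else "Unsorted"

# Made-up sample transcript for illustration only.
transcript = """\
[INTERNAL] In the demo we route everything through Pending Review.
[CLIENT] Yeah, we are using it.
[CLIENT] One batch per month, value date is day X.
"""

print(run_pipeline(transcript, fake_categorize))
```

The quote text in the output is copied by code, so the model never gets a chance to rephrase it; the only thing it contributes is the `module` label.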

Lever 2: Multi-agent verification (strong, expensive)

Two separate LLM instances: a writer and a critic.

The writer drafts a cell. The critic receives the draft plus the source, checks “is every claim verbatim from source?”, flags mismatches, and returns them to the writer. Loop until the critic passes the draft.

The writer and the critic are driven by different prompts and different roles, so their biases do not fully overlap, and the critic catches errors the writer misses. It costs about 2x the tokens but catches an estimated 30-50% more errors.
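A minimal sketch of the writer/critic loop, assuming both roles are exposed as plain callables. The `Writer` and `Critic` signatures, the round limit, and the stub functions are hypothetical; in practice each would wrap a separate LLM call with its own prompt.

```python
from typing import Callable, Optional

Writer = Callable[[str, list[str]], str]  # (source, critic feedback) -> draft
Critic = Callable[[str, str], list[str]]  # (draft, source) -> flagged claims

def write_with_critic(source: str, writer: Writer, critic: Critic,
                      max_rounds: int = 3) -> Optional[str]:
    """Loop writer -> critic until the critic flags nothing.
    Returns None if no draft passes, so the cell stays empty for a human."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        draft = writer(source, feedback)
        feedback = critic(draft, source)
        if not feedback:
            return draft
    return None

def stub_writer(source: str, feedback: list[str]) -> str:
    """Stand-in writer: over-claims first, quotes verbatim once criticized."""
    if feedback:
        return "Yeah, we are using it."
    return "The feature is actively used today across all sites."

def stub_critic(draft: str, source: str) -> list[str]:
    """Stand-in critic: flags the draft unless it appears verbatim in the source."""
    return [] if draft in source else [draft]

source = "[CLIENT] Yeah, we are using it."
print(write_with_critic(source, stub_writer, stub_critic))
```

Capping the rounds and returning `None` is a deliberate choice in this sketch: if the two roles cannot converge, an empty cell is safer than the last rejected draft.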

Lever 3: Mode lock (medium, prompt engineering)

A single explicit mode prompt LOCKS the LLM into one behavior:

SYSTEM: You are in EXTRACTION MODE.
- You may ONLY paste verbatim strings from source files.
- You may NOT generate any new prose.
- Penalty for any new prose: stop processing, return error.

The bias is suppressed within the scope of the mode. It can still leak when the prompt is ambiguous, so the output needs monitoring.
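One way to do that monitoring is a mechanical check that every output line really is a verbatim substring of the source. The sketch below is an assumption about how such a check could look; the function names and the whitespace normalization are illustrative choices.

```python
import re

def _normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not cause false alarms."""
    return re.sub(r"\s+", " ", text).strip()

def non_verbatim_lines(output: str, source: str) -> list[str]:
    """Return every non-empty output line that is not a substring of the source."""
    haystack = _normalize(source)
    leaks = []
    for line in output.splitlines():
        needle = _normalize(line)
        if needle and needle not in haystack:
            leaks.append(line)
    return leaks

source = "Client: Yeah, we are using it.\nClient: One batch per month."
output = "Yeah, we are using it.\nThe feature is actively used across all sites."

print(non_verbatim_lines(output, source))
# → ['The feature is actively used across all sites.']
```

Any non-empty result means the mode leaked, and the reply can be rejected or retried before it reaches the sheet.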

Lever 4: Extended thinking + self-critique (medium)

Before producing output, the LLM must:

1. Generate a draft
2. Self-critique: “Is each claim from source? Where exactly?”
3. Revise based on the self-critique
4. Output

The LLM critiquing itself catches roughly 30-50% of errors. Not 100%, because the same biases also apply to the critique step.
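The draft → critique → revise sequence can be chained in a few lines. This is a sketch under assumptions: `LLM` is a hypothetical prompt-in, text-out callable, the prompt wording and the `UNSUPPORTED` marker are made up for illustration, and `scripted_llm` returns canned replies so the example runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out

def draft_critique_revise(source: str, task: str, llm: LLM) -> str:
    """Draft, ask the same model to cite support for each claim, then revise."""
    draft = llm(f"SOURCE:\n{source}\n\nTASK: {task}\nWrite a draft.")
    critique = llm(
        f"SOURCE:\n{source}\n\nDRAFT:\n{draft}\n\n"
        "For each claim in the draft, quote the exact source text that "
        "supports it, or mark it UNSUPPORTED."
    )
    if "UNSUPPORTED" not in critique:
        return draft  # nothing flagged, keep the draft as is
    return llm(
        f"SOURCE:\n{source}\n\nDRAFT:\n{draft}\n\nCRITIQUE:\n{critique}\n\n"
        "Rewrite the draft, removing every claim marked UNSUPPORTED."
    )

def scripted_llm(prompt: str) -> str:
    """Canned replies standing in for a real model."""
    if "Rewrite the draft" in prompt:
        return "Yeah, we are using it."
    if "DRAFT:" in prompt:
        return '"actively used across all sites": UNSUPPORTED'
    return "The feature is actively used across all sites."

result = draft_critique_revise(
    "[CLIENT] Yeah, we are using it.", "Describe current usage.", scripted_llm
)
print(result)
```

Because all three calls go to the same model, a bias that slipped into the draft can also slip past the critique, which is why this lever ranks below an independent critic.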

Lever 5: Slow down + reduce surface (weak but cumulative)

The biases activate most strongly with:

An ambiguous prompt (the LLM fills gaps with assumptions)
Long context (the LLM loses track of what is source and what is interpretation)
High creative freedom (the LLM smooths the prose)
Time pressure (the LLM skips verification)

Counter: keep the prompt narrow and specific. Pass the source file inline (not retrieved through RAG). Constrain the output format (template, JSON schema).

→ The biases become less active, but they do not vanish.
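As an illustration of an output format constraint, the sketch below validates one model reply against a tiny hand-rolled schema using only the standard library. The field names (`module`, `quote`, `source_ref`) and the sample replies are hypothetical; the rule enforced is that a cell either carries a quote with a source reference or is explicitly empty.

```python
import json

REQUIRED_KEYS = {"module", "quote", "source_ref"}  # hypothetical cell schema

def validate_cell(raw: str) -> dict:
    """Parse one model reply and enforce a quote-or-empty shape."""
    cell = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(cell, dict) or set(cell) != REQUIRED_KEYS:
        raise ValueError("reply must be an object with exactly: module, quote, source_ref")
    if cell["quote"] is None:
        return cell  # an explicit empty cell is a valid answer
    for key in ("quote", "source_ref"):
        value = cell[key]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} must be a non-empty string when a quote is given")
    return cell

good = '{"module": "Payments", "quote": "One batch per month.", "source_ref": "transcript line 3"}'
empty = '{"module": "Payments", "quote": null, "source_ref": null}'
bad = '{"module": "Payments", "quote": "Sent a few days early.", "source_ref": null}'

for reply in (good, empty, bad):
    try:
        validate_cell(reply)
        print("accepted")
    except ValueError as err:
        print(f"rejected: {err}")
```

A schema like this does not prove the quote is genuine, but it removes the free-text surface where smoothing and gap-filling usually creep in, and it makes "empty" a first-class, acceptable answer.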

Key insight: the bias is a feature in roughly 80% of cases

Smoothed prose is useful for blog posts, marketing copy, and stylized email. It is harmful in a client workflow doc that needs precision.

Do not try to “fix” the LLM globally. Detect the contexts where the bias becomes a bug, and switch modes:

Default: smooth + helpful (bias OK)
Workflow doc / PRD draft: extract-only mode (bias suppressed)
Final published doc: human polish, no LLM
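The three modes above can be wired into a small lookup so the choice is made by code rather than remembered per session. The task labels and prompt texts here are hypothetical placeholders; `None` stands for "no LLM, a human does this step".

```python
from typing import Optional

MODE_PROMPTS = {
    "default": "You are a helpful writing assistant.",
    "extract_only": (
        "You are in EXTRACTION MODE. Paste verbatim strings from the source only. "
        "Do not generate new prose."
    ),
}

# Hypothetical task labels; adapt them to your own kinds of work.
TASK_MODES = {
    "blog": "default",
    "marketing": "default",
    "workflow_doc": "extract_only",
    "prd_draft": "extract_only",
    "final_published_doc": None,  # human polish, no LLM
}

def system_prompt_for(task_type: str) -> Optional[str]:
    """Return the system prompt for a task, or None when no LLM should be used.
    Unlisted task types fall back to the default (smooth + helpful) mode."""
    mode = TASK_MODES.get(task_type, "default")
    return None if mode is None else MODE_PROMPTS[mode]

print(system_prompt_for("workflow_doc"))
print(system_prompt_for("final_published_doc"))
```

If most of your unlisted tasks are accuracy-critical, flipping the fallback to `extract_only` would be the more cautious choice.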

Implications for AI assistant users

Constraint > training/feedback. You tell the LLM “do not interpret”, and in the next session it interprets anyway. A mechanical constraint (a Quote-or-Empty rule: every cell is either a verbatim quote or left empty) works better than a pep talk.
Architectural guardrails > behavioral instructions. A Python script that extracts verbatim quotes removes one risk surface. A CLAUDE.md rule saying “be careful” does not.
Multi-agent > single-agent for high-stakes docs. Pay 2x the tokens, save N hours of rework.
Mode switching is a real capability. Treat the AI assistant as a multi-mode tool, not a general-purpose intern. Pick the right mode for the task.

How do local LLMs differ?

If you self-host a local LLM (Llama 3 base, Qwen base, Mistral base on Ollama / vLLM / llama.cpp), this post applies in part, but you also get a Lever 0 that cloud users do not have:

Lever 0: Fine-tune directly (local LLM only)

You can:

LoRA fine-tune on an extract-only dataset (only verbatim quotes from sources), so the model learns the habit of “not synthesizing” instead of needing an external constraint
Run DPO / ORPO on preference data that prefers “I don’t know” over “fill in something plausible”, which reduces the hallucination tendency
Self-host a Constitutional AI style process: train the model on your own principles (e.g. “always cite source”)

Effort: moderate if you have a GPU (RTX 4090 or better) or rent cloud GPUs. Expect roughly 4-12 hours of training for a LoRA run on a dataset of 5-10K examples, depending on model size and hardware.
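To show what the preference data for the DPO / ORPO option might look like, here is a small sketch that builds one record and serializes it as JSONL. The `prompt` / `chosen` / `rejected` field names follow a layout commonly used for preference pairs; treat them as an assumption and check what your training library expects. The sample text reuses the batch example from earlier in this post.

```python
import json

def preference_record(source: str, question: str, grounded: str, plausible: str) -> dict:
    """One preference pair: prefer the grounded answer over the plausible guess."""
    prompt = f"SOURCE:\n{source}\n\nQUESTION: {question}"
    return {"prompt": prompt, "chosen": grounded, "rejected": plausible}

def to_jsonl(rows: list[dict]) -> str:
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)

records = [
    preference_record(
        source="One batch per month, value date is day X.",
        question="When is the batch sent to the bank?",
        grounded="The source does not say.",
        plausible="It is sent to the bank a few days before day X.",
    ),
]

print(to_jsonl(records))
```

The rejected answers do not need to be invented by hand: errors you have already caught in real sessions are natural candidates for the `rejected` side.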

How the local LLM bias profile differs from cloud

Base model (not instruction-tuned): weaker smoothing, comparable gap-filling, weak sycophancy, because it has not been through heavy RLHF
Instruction-tuned (Llama-3-Instruct, Qwen-Chat): bias profile approaches that of cloud models, because they also use RLHF/DPO
Reasoning models (DeepSeek-R1, QwQ): expose an explicit “thinking” trace, which makes self-contradictions easier to catch and reduces gap-filling somewhat

Local vs cloud trade-offs for accuracy-critical tasks

Aspect	Cloud LLM	Local LLM
Default bias	Strong (RLHF heavy)	Weaker in base models / similar in instruct models
Fine-tuning	Expensive + limited	No vendor fee (your own hardware) + full control
Capability ceiling	High (Claude Opus, GPT-4)	Lower (Llama 70B ≈ GPT-3.5+)
For workflow doc tasks	Needs strong guardrails	Can be fine-tuned to reduce bias

→ If you do accuracy-critical tasks regularly and have the hardware, a fine-tuned local LLM is worth the investment. If your tasks vary widely and capability matters, cloud LLM + guardrails is the pragmatic choice.

Conclusion

Cloud LLM bias is inherent. You cannot change the weights. But you can design a system around it (pipeline, multi-agent, mode lock) so the bias does no harm.

12 errors per session is a sign that you are letting the LLM run unconstrained. Add mechanical constraints. Test for two weeks. Measure the error rate. Iterate.

That is what a mature way of working with an AI assistant looks like: neither “AI is amazing, will do everything” nor “AI hallucinates, useless”. It is a tool with inherent biases: know the biases, design around them, ship work.

This post is drawn from a real working session with Claude Code (Opus 4.7, Cloud) while drafting a workflow doc for a client. The patterns and insights apply to any cloud LLM assistant (Claude, GPT, Gemini). Local LLM users: see the section on local LLMs for Lever 0 (fine-tuning).