---
type: concept
created: 2026-04-06
updated: 2026-04-06
sources:
  - wiki/concepts/wiki-system
tags: [infrastructure, wiki, search, ai, openrouter]
aliases: [Wiki Search, AI Search, Smart Search]
---
# Wiki AI Search

## How It Works
### 1. Query Expansion (LLM pre-pass)
When the user searches, the system first calls Gemini 2.0 Flash to expand the query into 8-10 related search terms. This turns "units" into ["apartment units", "unit management", "occupancy", "lease", "tenant", "vacancy", "floor plan", "unit detail"].
- Model: `google/gemini-2.0-flash-001` via wiki/entities/openrouter
- Timeout: 5 seconds (falls back to basic keyword extraction)
- Max tokens: 200
- Temperature: 0.3
- Cost: ~$0.0001 per query
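A minimal sketch of this pre-pass, using the parameters listed above. The function names, prompt wording, and fallback regex are illustrative assumptions, not the actual wiki server code; the request shape follows OpenRouter's chat-completions format.

```javascript
// Hypothetical expansion pre-pass sketch. Prompt wording is an assumption.
const EXPANSION_PROMPT = (query) =>
  `Expand the wiki search query "${query}" into 8-10 related search terms. ` +
  `Return a JSON array of strings only.`;

// Request body for OpenRouter's chat completions endpoint, using the
// model, max_tokens, and temperature documented above.
function buildExpansionRequest(query) {
  return {
    model: 'google/gemini-2.0-flash-001',
    max_tokens: 200,
    temperature: 0.3,
    messages: [{ role: 'user', content: EXPANSION_PROMPT(query) }],
  };
}

// Fallback used when the LLM call exceeds the 5 s timeout:
// basic keyword extraction from the raw query.
function fallbackKeywords(query) {
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 2);
}
```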
### 2. Multi-Term Merged Search
Each expanded term is searched independently via qmd search (BM25 keyword matching). Results are merged by file path — highest score wins per unique page. Up to 15 results returned.
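The merge step above can be sketched as follows; the result object shape (`path`, `score`) is an assumption about what qmd returns, not a documented interface.

```javascript
// Merge per-term qmd result lists by file path: the highest score wins
// per unique page, sorted descending, capped at 15 results.
function mergeResults(resultLists, limit = 15) {
  const byPath = new Map();
  for (const results of resultLists) {
    for (const r of results) {
      const existing = byPath.get(r.path);
      if (!existing || r.score > existing.score) byPath.set(r.path, r);
    }
  }
  return [...byPath.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```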
### 3. Result Validation & Snippets
Every result is validated against the actual filesystem (case-sensitive Linux paths). Invalid results are dropped. Each valid result gets:
- Title (from frontmatter or filename)
- Type badge (concept/entity/summary/analysis)
- 160-char snippet (markdown stripped: headings, callouts, bold, links, wikilinks, tables)
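A plausible snippet extractor covering the constructs listed above. The regexes are illustrative, not the server's exact implementation.

```javascript
// Strip markdown syntax (headings, callouts, blockquotes, bold, wikilinks,
// links, tables), collapse whitespace, and truncate to 160 chars.
function extractSnippet(markdown, maxLen = 160) {
  const text = markdown
    .replace(/^#{1,6}\s+/gm, '')            // headings
    .replace(/^>\s?\[![^\]]*\]\s*/gm, '')   // callout markers, e.g. > [!note]
    .replace(/^>\s?/gm, '')                 // blockquote prefixes
    .replace(/\*\*([^*]+)\*\*/g, '$1')      // bold
    .replace(/\[\[([^\]|]+)(\|([^\]]+))?\]\]/g, (_, page, __, label) => label || page) // wikilinks
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // markdown links
    .replace(/^\|.*\|$/gm, '')              // table rows
    .replace(/\s+/g, ' ')
    .trim();
  return text.slice(0, maxLen);
}
```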
### 4. AI Answer (streaming SSE)
Top 6 pages are read in full (truncated to 3,000 chars each). Sent as context to Gemini 2.0 Flash with the user's question. Answer streams via Server-Sent Events (SSE).
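Two small sketches of this step: assembling the context (top 6 pages, 3,000 chars each) and writing named SSE events. The function names and page object shape are assumptions; the SSE wire format (`event:` and `data:` lines terminated by a blank line) is standard.

```javascript
// Context assembly: top 6 pages, each body truncated to 3,000 chars.
// The page shape ({ title, body }) is an assumption.
function buildContext(pages) {
  return pages
    .slice(0, 6)
    .map((p) => `## ${p.title}\n${p.body.slice(0, 3000)}`)
    .join('\n\n');
}

// Write one named SSE event with a JSON payload to an Express response.
function sseWrite(res, event, data) {
  res.write(`event: ${event}\n`);
  res.write(`data: ${JSON.stringify(data)}\n\n`);
}
```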
### 5. Follow-Up Questions
Conversation history is maintained client-side (up to 3 exchanges). The same search field handles follow-ups — type again to ask more about the same topic.
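The client-side history cap could look like this; the exchange structure is an assumption.

```javascript
// Keep at most 3 question/answer exchanges, dropping the oldest
// when a new one is appended.
const MAX_EXCHANGES = 3;

function pushExchange(history, question, answer) {
  return [...history, { question, answer }].slice(-MAX_EXCHANGES);
}
```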
## Architecture
```
User types query
        ↓
[LLM] Expand query → 8-10 related terms (~1s)
        ↓
[qmd] Search each term → merge results by file path
        ↓
[fs] Validate each result exists, extract snippet
        ↓
[SSE] Send related_pages event → frontend renders immediately
        ↓
[fs] Read top 6 pages in full for context
        ↓
[LLM] Stream answer with citations → SSE tokens
        ↓
Frontend: AI answer on top, related pages below
```
## Ranking & Demotion
- Title boost: Pages whose filename matches search terms get +0.05 per match
- Overview demotion: Catch-all pages (unitcycle, projectbrief, activecontext, etc.) get -0.15 unless directly searched — prevents platform overview pages from polluting every query
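A sketch of the two ranking adjustments above. The `OVERVIEW_PAGES` list mirrors the examples given; the result shape and the exact matching rules are assumptions.

```javascript
// Catch-all pages demoted unless directly searched (examples from above).
const OVERVIEW_PAGES = ['unitcycle', 'projectbrief', 'activecontext'];

function adjustScore(result, terms) {
  const name = result.path.split('/').pop().replace(/\.md$/, '').toLowerCase();
  let score = result.score;
  for (const term of terms) {
    if (name.includes(term.toLowerCase())) score += 0.05; // title boost per match
  }
  // Demote overview pages unless the filename itself was searched.
  const directlySearched = terms.some((t) => t.toLowerCase() === name);
  if (OVERVIEW_PAGES.includes(name) && !directlySearched) score -= 0.15;
  return score;
}
```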
## Infrastructure
- Server: `wiki/server/index.js` — Express.js, port 3080, PM2 process `uc-wiki`
- Endpoint: `POST /api/ask` (SSE streaming)
- Search engine: `qmd` (BM25 keyword search, local)
- LLM: OpenRouter → Gemini 2.0 Flash
- Nginx: `proxy_buffering off` required for SSE streaming (configured in `/etc/nginx/sites-available/uc-wiki.xdvu.com`)
- URL: https://uc-wiki.xdvu.com/search
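The SSE-relevant part of the nginx site config might look like the fragment below. Only `proxy_buffering off` and the port are stated above; the other directives are common companions for long-lived SSE connections and are assumptions.

```nginx
# Sketch of /etc/nginx/sites-available/uc-wiki.xdvu.com (SSE-relevant part).
location /api/ask {
    proxy_pass http://127.0.0.1:3080;
    proxy_buffering off;            # required: don't buffer SSE chunks
    proxy_http_version 1.1;         # assumption: keep-alive for streaming
    proxy_set_header Connection '';
    proxy_read_timeout 1h;          # assumption: keep long streams open
}
```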
## SSE Event Types
| Event | When | Data |
|---|---|---|
| `status` | During search | `{ message: "Expanding search..." }` |
| `related_pages` | After search completes | `{ pages: [{ title, path, type, snippet }] }` |
| `sources` | Before LLM answer | `{ sources: [{ title, path, type }] }` |
| `token` | During LLM streaming | `{ content: "..." }` |
| `error` | On failure | `{ message: "..." }` |
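Because `/api/ask` is a POST, the browser's `EventSource` cannot be used; the frontend presumably parses the `fetch()` body stream itself. A minimal parser for buffered SSE frames (chunk re-assembly across network boundaries is omitted for brevity):

```javascript
// Split a buffered SSE chunk into { event, data } frames.
// Frames are separated by a blank line; data payloads are JSON.
function parseSseFrames(text) {
  return text
    .split('\n\n')
    .filter((f) => f.trim())
    .map((frame) => {
      const event = frame.match(/^event: (.*)$/m)?.[1] ?? 'message';
      const data = frame.match(/^data: (.*)$/m)?.[1] ?? '';
      return { event, data: JSON.parse(data) };
    });
}
```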
## Key Design Decisions
- Single request, not two — Frontend sends one POST to `/api/ask` which returns both related pages AND the AI answer. No separate `/api/search` call.
- LLM expansion before search, not after — Expanding the query first produces much better BM25 results than searching with raw user input.
- File validation — Every result is checked against the filesystem before being shown. Prevents 404s from case-sensitivity mismatches or a stale qmd index.
- Snippet extraction — Strips all markdown syntax (headings, callouts, bold, links, wikilinks, tables) for clean plain-text previews.
## Related
- wiki/concepts/wiki-system — Overall wiki architecture and hooks
- wiki/concepts/tech-stack — Project infrastructure
- wiki/entities/openrouter — LLM API gateway used for search