---
type: concept
created: 2026-04-06
updated: 2026-04-06
sources:
  - wiki/concepts/wiki-system
tags: [infrastructure, wiki, search, ai, openrouter]
aliases: [Wiki Search, AI Search, Smart Search]
---
# Wiki AI Search

## How It Works
### 1. Query Expansion (LLM pre-pass)
When the user searches, the system first calls Gemini 2.0 Flash to expand the query into 8-10 related search terms. This turns "units" into ["apartment units", "unit management", "occupancy", "lease", "tenant", "vacancy", "floor plan", "unit detail"].
- Model: `google/gemini-2.0-flash-001` via wiki/entities/openrouter
- Timeout: 5 seconds (falls back to basic keyword extraction)
- Max tokens: 200
- Temperature: 0.3
- Cost: ~$0.0001 per query
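A minimal sketch of this pre-pass, using the parameters listed above. The function names, prompt wording, and fallback regex are illustrative assumptions, not the actual wiki server code; the request shape follows OpenRouter's chat-completions format.

```javascript
// Hypothetical expansion pre-pass sketch. Prompt wording is an assumption.
const EXPANSION_PROMPT = (query) =>
  `Expand the wiki search query "${query}" into 8-10 related search terms. ` +
  `Return a JSON array of strings only.`;

// Request body for OpenRouter's chat completions endpoint, using the
// model, max_tokens, and temperature documented above.
function buildExpansionRequest(query) {
  return {
    model: 'google/gemini-2.0-flash-001',
    max_tokens: 200,
    temperature: 0.3,
    messages: [{ role: 'user', content: EXPANSION_PROMPT(query) }],
  };
}

// Fallback used when the LLM call exceeds the 5 s timeout:
// basic keyword extraction from the raw query.
function fallbackKeywords(query) {
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 2);
}
```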
### 2. Multi-Term Merged Search
Each expanded term is searched independently via qmd search (BM25 keyword matching). Results are merged by file path — highest score wins per unique page. Up to 15 results returned.
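The merge step above can be sketched as follows; the result object shape (`path`, `score`) is an assumption about what qmd returns, not a documented interface.

```javascript
// Merge per-term qmd result lists by file path: the highest score wins
// per unique page, sorted descending, capped at 15 results.
function mergeResults(resultLists, limit = 15) {
  const byPath = new Map();
  for (const results of resultLists) {
    for (const r of results) {
      const existing = byPath.get(r.path);
      if (!existing || r.score > existing.score) byPath.set(r.path, r);
    }
  }
  return [...byPath.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```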
### 3. Result Validation & Snippets
Every result is validated against the actual filesystem (case-sensitive Linux paths). Invalid results are dropped. Each valid result gets:
- Title (from frontmatter or filename)
- Type badge (concept/entity/summary/analysis)
- 160-char snippet (markdown stripped: headings, callouts, bold, links, wikilinks, tables)
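A plausible snippet extractor covering the constructs listed above. The regexes are illustrative, not the server's exact implementation.

```javascript
// Strip markdown syntax (headings, callouts, blockquotes, bold, wikilinks,
// links, tables), collapse whitespace, and truncate to 160 chars.
function extractSnippet(markdown, maxLen = 160) {
  const text = markdown
    .replace(/^#{1,6}\s+/gm, '')            // headings
    .replace(/^>\s?\[![^\]]*\]\s*/gm, '')   // callout markers, e.g. > [!note]
    .replace(/^>\s?/gm, '')                 // blockquote prefixes
    .replace(/\*\*([^*]+)\*\*/g, '$1')      // bold
    .replace(/\[\[([^\]|]+)(\|([^\]]+))?\]\]/g, (_, page, __, label) => label || page) // wikilinks
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // markdown links
    .replace(/^\|.*\|$/gm, '')              // table rows
    .replace(/\s+/g, ' ')
    .trim();
  return text.slice(0, maxLen);
}
```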
### 4. AI Answer (streaming SSE)
Top 6 pages are read in full (truncated to 3,000 chars each). Sent as context to Gemini 2.0 Flash with the user's question. Answer streams via Server-Sent Events (SSE).
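Two small sketches of this step: assembling the context (top 6 pages, 3,000 chars each) and writing named SSE events. The function names and page object shape are assumptions; the SSE wire format (`event:` and `data:` lines terminated by a blank line) is standard.

```javascript
// Context assembly: top 6 pages, each body truncated to 3,000 chars.
// The page shape ({ title, body }) is an assumption.
function buildContext(pages) {
  return pages
    .slice(0, 6)
    .map((p) => `## ${p.title}\n${p.body.slice(0, 3000)}`)
    .join('\n\n');
}

// Write one named SSE event with a JSON payload to an Express response.
function sseWrite(res, event, data) {
  res.write(`event: ${event}\n`);
  res.write(`data: ${JSON.stringify(data)}\n\n`);
}
```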
### 5. Follow-Up Questions
Conversation history is maintained client-side (up to 3 exchanges). The same search field handles follow-ups — type again to ask more about the same topic.
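The client-side history cap could look like this; the exchange structure is an assumption.

```javascript
// Keep at most 3 question/answer exchanges, dropping the oldest
// when a new one is appended.
const MAX_EXCHANGES = 3;

function pushExchange(history, question, answer) {
  return [...history, { question, answer }].slice(-MAX_EXCHANGES);
}
```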
## Architecture
```
User types query
        ↓
[LLM] Expand query → 8-10 related terms (~1s)
        ↓
[qmd] Search each term → merge results by file path
        ↓
[fs] Validate each result exists, extract snippet
        ↓
[SSE] Send related_pages event → frontend renders immediately
        ↓
[fs] Read top 6 pages in full for context
        ↓
[LLM] Stream answer with citations → SSE tokens
        ↓
Frontend: AI answer on top, related pages below
```
## Ranking & Demotion
- Title boost: Pages whose filename matches search terms get +0.05 per match
- Overview demotion: Catch-all pages (unitcycle, projectbrief, activecontext, etc.) get -0.15 unless directly searched — prevents platform overview pages from polluting every query
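A sketch of the two ranking adjustments above. The `OVERVIEW_PAGES` list mirrors the examples given; the result shape and the exact matching rules are assumptions.

```javascript
// Catch-all pages demoted unless directly searched (examples from above).
const OVERVIEW_PAGES = ['unitcycle', 'projectbrief', 'activecontext'];

function adjustScore(result, terms) {
  const name = result.path.split('/').pop().replace(/\.md$/, '').toLowerCase();
  let score = result.score;
  for (const term of terms) {
    if (name.includes(term.toLowerCase())) score += 0.05; // title boost per match
  }
  // Demote overview pages unless the filename itself was searched.
  const directlySearched = terms.some((t) => t.toLowerCase() === name);
  if (OVERVIEW_PAGES.includes(name) && !directlySearched) score -= 0.15;
  return score;
}
```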
## Infrastructure
- Server: `wiki/server/index.js` — Express.js, port 3080, PM2 process `uc-wiki`
- Endpoint: `POST /api/ask` (SSE streaming)
- Search engine: `qmd` (BM25 keyword search, local)
- LLM: OpenRouter → Gemini 2.0 Flash
- Nginx: `proxy_buffering off` required for SSE streaming (configured in `/etc/nginx/sites-available/uc-wiki.xdvu.com`)
- URL: https://uc-wiki.xdvu.com/search
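The SSE-relevant part of the nginx site config might look like the fragment below. Only `proxy_buffering off` and the port are stated above; the other directives are common companions for long-lived SSE connections and are assumptions.

```nginx
# Sketch of /etc/nginx/sites-available/uc-wiki.xdvu.com (SSE-relevant part).
location /api/ask {
    proxy_pass http://127.0.0.1:3080;
    proxy_buffering off;            # required: don't buffer SSE chunks
    proxy_http_version 1.1;         # assumption: keep-alive for streaming
    proxy_set_header Connection '';
    proxy_read_timeout 1h;          # assumption: keep long streams open
}
```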
## SSE Event Types
| Event | When | Data |
|---|---|---|
| `status` | During search | `{ message: "Expanding search..." }` |
| `related_pages` | After search completes | `{ pages: [{ title, path, type, snippet }] }` |
| `sources` | Before LLM answer | `{ sources: [{ title, path, type }] }` |
| `token` | During LLM streaming | `{ content: "..." }` |
| `error` | On failure | `{ message: "..." }` |
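Because `/api/ask` is a POST, the browser's `EventSource` cannot be used; the frontend presumably parses the `fetch()` body stream itself. A minimal parser for buffered SSE frames (chunk re-assembly across network boundaries is omitted for brevity):

```javascript
// Split a buffered SSE chunk into { event, data } frames.
// Frames are separated by a blank line; data payloads are JSON.
function parseSseFrames(text) {
  return text
    .split('\n\n')
    .filter((f) => f.trim())
    .map((frame) => {
      const event = frame.match(/^event: (.*)$/m)?.[1] ?? 'message';
      const data = frame.match(/^data: (.*)$/m)?.[1] ?? '';
      return { event, data: JSON.parse(data) };
    });
}
```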
## Key Design Decisions
- Single request, not two — Frontend sends one POST to `/api/ask` which returns both related pages AND the AI answer. No separate `/api/search` call.
- LLM expansion before search, not after — Expanding the query first produces much better BM25 results than searching with raw user input.
- File validation — Every result is checked against the filesystem before being shown. Prevents 404s from case-sensitivity mismatches or a stale qmd index.
- Snippet extraction — Strips all markdown syntax (headings, callouts, bold, links, wikilinks, tables) for clean plain-text previews.
## Related
- wiki/concepts/wiki-system — Overall wiki architecture and hooks
- wiki/concepts/tech-stack — Project infrastructure
- wiki/entities/openrouter — LLM API gateway used for search