type: concept
created: 2026-04-06
updated: 2026-04-06
sources: wiki/concepts/wiki-system
tags: infrastructure wiki search ai openrouter
aliases: Wiki Search, AI Search, Smart Search

Wiki AI Search

abstract
LLM-powered search for the UnitCycle Wiki. Single input field — AI answer streams on top (like Google + Gemini), related pages with snippets below. Uses Gemini 2.0 Flash via OpenRouter for query expansion and answer generation.

How It Works

1. Query Expansion (LLM pre-pass)

When the user searches, the system first calls Gemini 2.0 Flash to expand the query into 8-10 related search terms. This turns "units" into ["apartment units", "unit management", "occupancy", "lease", "tenant", "vacancy", "floor plan", "unit detail"].
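The expansion step could be sketched as below. This is an illustrative sketch, not the actual implementation: it assumes OpenRouter's chat completions endpoint, an assumed model slug, and a hypothetical `parseExpandedTerms` helper that tolerates both JSON-array and plain-list replies from the model.

```typescript
// Sketch: expand a raw query into related search terms via OpenRouter's
// chat completions API. URL and model slug are assumptions.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Parse the model's reply into at most `max` clean terms. We ask for a
// JSON array, but fall back to splitting on newlines/commas and stripping
// list markers, since LLM output is not guaranteed to be valid JSON.
function parseExpandedTerms(raw: string, max = 10): string[] {
  let terms: string[];
  try {
    const parsed = JSON.parse(raw);
    terms = Array.isArray(parsed) ? parsed.map(String) : [];
  } catch {
    terms = raw.split(/[\n,]+/);
  }
  return terms
    .map((t) => t.trim().replace(/^["'\-\d.\s]+|["']+$/g, ""))
    .filter((t) => t.length > 0)
    .slice(0, max);
}

async function expandQuery(query: string, apiKey: string): Promise<string[]> {
  const res = await fetch(OPENROUTER_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "google/gemini-2.0-flash-001", // assumed model slug
      messages: [{
        role: "user",
        content: `Return a JSON array of 8-10 search terms related to: ${query}`,
      }],
    }),
  });
  const data = await res.json();
  return parseExpandedTerms(data.choices[0].message.content);
}
```

The fallback parsing matters in practice: even with a "return JSON" instruction, models occasionally reply with a numbered list instead.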

2. Multi-Term Merged Search

Each expanded term is searched independently via qmd search (BM25 keyword matching). Results are merged by file path — highest score wins per unique page. Up to 15 results returned.
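The merge can be sketched as a pure function. The `SearchHit` shape is an assumption for illustration, not the actual `qmd` output format:

```typescript
interface SearchHit { path: string; score: number; title?: string }

// Merge hits from several per-term searches: keep one entry per unique
// file path (highest BM25 score wins), sort by score, cap at `limit`.
function mergeByPath(hitLists: SearchHit[][], limit = 15): SearchHit[] {
  const best = new Map<string, SearchHit>();
  for (const hits of hitLists) {
    for (const hit of hits) {
      const current = best.get(hit.path);
      if (!current || hit.score > current.score) best.set(hit.path, hit);
    }
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Keeping the maximum (rather than summing) means a page that matches many expanded terms weakly does not outrank a page that matches one term strongly.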

3. Result Validation & Snippets

Every result is validated against the actual filesystem (case-sensitive Linux paths); invalid results are dropped. Each valid result gets a plain-text snippet extracted from the page (markdown syntax stripped).

4. AI Answer (streaming SSE)

The top 6 pages are read (each truncated to 3,000 characters) and sent as context to Gemini 2.0 Flash along with the user's question. The answer streams back via Server-Sent Events (SSE).
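Context assembly for this step might look like the sketch below; the `PageDoc` shape and the per-page header format are assumptions:

```typescript
interface PageDoc { title: string; path: string; content: string }

// Build the LLM context from the top-ranked pages. Each page's content
// is truncated to `maxChars` so six pages stay within the prompt budget.
function buildContext(pages: PageDoc[], maxPages = 6, maxChars = 3000): string {
  return pages
    .slice(0, maxPages)
    .map((p) => `## ${p.title} (${p.path})\n${p.content.slice(0, maxChars)}`)
    .join("\n\n");
}
```

Including the path in each header gives the model something concrete to cite in its answer.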

5. Follow-Up Questions

Conversation history is maintained client-side (up to 3 exchanges). The same search field handles follow-ups — type again to ask more about the same topic.
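A minimal sketch of the client-side history cap (the `Exchange` shape and function name are illustrative):

```typescript
interface Exchange { question: string; answer: string }

// Append a completed exchange, keeping only the most recent `max` so
// follow-up prompts stay small.
function pushExchange(history: Exchange[], ex: Exchange, max = 3): Exchange[] {
  return [...history, ex].slice(-max);
}
```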

Architecture

User types query
    ↓
[LLM] Expand query → 8-10 related terms (~1s)
    ↓
[qmd] Search each term → merge results by file path
    ↓
[fs] Validate each result exists, extract snippet
    ↓
[SSE] Send related_pages event → frontend renders immediately
    ↓
[fs] Read top 6 pages in full for context
    ↓
[LLM] Stream answer with citations → SSE tokens
    ↓
Frontend: AI answer on top, related pages below

Ranking & Demotion

Infrastructure

SSE Event Types

| Event         | When                   | Data                                        |
|---------------|------------------------|---------------------------------------------|
| status        | During search          | { message: "Expanding search..." }          |
| related_pages | After search completes | { pages: [{ title, path, type, snippet }] } |
| sources       | Before LLM answer      | { sources: [{ title, path, type }] }        |
| token         | During LLM streaming   | { content: "..." }                          |
| error         | On failure             | { message: "..." }                          |
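On the frontend, a minimal parser for these frames might look like this. It is a sketch: a production parser must also buffer frames that arrive split across network chunks.

```typescript
// Parse an SSE chunk into { event, data } pairs. Frames are separated by
// a blank line; each frame carries "event:" and "data:" lines, with the
// data payload JSON-encoded as in the table above.
function parseSseFrames(chunk: string): { event: string; data: unknown }[] {
  return chunk
    .split("\n\n")
    .filter((f) => f.trim().length > 0)
    .map((frame) => {
      let event = "message";
      let data = "";
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data += line.slice(5).trim();
      }
      return { event, data: JSON.parse(data) };
    });
}
```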

Key Design Decisions

  1. Single request, not two — Frontend sends one POST to /api/ask which returns both related pages AND the AI answer. No separate /api/search call.
  2. LLM expansion before search, not after — Expanding the query first produces much better BM25 results than searching with raw user input.
  3. File validation — Every result is checked against the filesystem before being shown. Prevents 404s from case-sensitivity mismatches or stale qmd index.
  4. Snippet extraction — Strips all markdown syntax (headings, callouts, bold, links, wikilinks, tables) for clean plain-text previews.
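The snippet extraction in point 4 could be sketched as below; the exact regexes are assumptions covering the syntax listed above, not the actual implementation:

```typescript
// Strip common markdown syntax so snippets render as plain text.
function stripMarkdown(md: string): string {
  return md
    .replace(/^#{1,6}\s+/gm, "")                 // headings
    .replace(/^>\s?\[![\w-]+\]\s*/gm, "")        // callout markers
    .replace(/^>\s?/gm, "")                      // blockquotes
    .replace(/\*\*([^*]+)\*\*/g, "$1")           // bold
    .replace(/\[\[([^\]|]+)(\|([^\]]+))?\]\]/g,  // wikilinks (keep alias if set)
      (_, target, __, alias) => alias ?? target)
    .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1")     // links (keep label)
    .replace(/^\|.*\|$/gm, "")                   // table rows
    .replace(/\n{2,}/g, "\n")                    // collapse blank lines
    .trim();
}
```

Order matters here: wikilinks must be handled before regular links, and callout markers before plain blockquotes, or the earlier pattern eats part of the later one.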

Related