F3.2 Invoice Processing — Smart Lean AI Pipeline

CRITICAL BUG FIX: Upload 400 Error

Root cause: server.js line 1342 uses let body = ''; to accumulate request body as a string. This corrupts binary multipart/form-data (PDF uploads). Django receives mangled data and says "No PDF file provided".

Fix: Change the proxy to use Buffer for binary-safe forwarding:

// Line 1342-1358 in server.js — replace string body with Buffer array
const chunks = [];
req.on('data', chunk => chunks.push(chunk));
req.on('end', () => {
  const body = Buffer.concat(chunks);
  const proxyReq = http.request({
    hostname: '127.0.0.1', port: 3001,
    path: proxyUrl, method: req.method,
    headers: { ...req.headers, host: '127.0.0.1:3001' }
  }, (proxyRes) => {
    res.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(res);
  });
  proxyReq.on('error', (err) => {
    console.error('Django proxy error:', err.message);
    res.writeHead(502, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'Backend unavailable' }));
  });
  if (body.length > 0) proxyReq.write(body);
  proxyReq.end();
});

File: /home/claude/projects/unitcycle-demo/server.js lines 1340-1361 Verify: curl -X POST -F "pdf_file=@backend/media/invoices/1abb4cad698a_test_invoice.pdf" https://demo.unitcycle.com/api/invoices/upload/

Context

UnitCycle's invoice feature (F3.2) has the frontend and basic backend built, but the pipeline dead-ends at approve/reject. Invoices go nowhere after approval — no GL coding, no payment tracking, no AI matching. Competitors like Yardi Breeze and Entrata offer 94% AI accuracy with full AP workflows. This plan transforms the existing skeleton into a Smart Lean pipeline where AI handles the tedious work (vendor/property/GL matching, anomaly detection, duplicate flagging) and the PM just reviews and approves.

Design approved by Rafael: Smart Lean workflow (C), 2-panel layout, inline AI badges per field, light mode.

Architecture: Smart Lean Pipeline

Upload PDF → LlamaParse OCR (Cost Effective, 3 credits) → AI Extraction (Gemini Flash via OpenRouter)
    → Auto-match vendor/property/GL → Flag anomalies/dupes/WO matches
    → PM Reviews (inline AI badges, accept/override) → Approved → Scheduled → Paid

6-Stage State Machine

UPLOADED → AI_PROCESSING → PENDING_REVIEW → APPROVED → SCHEDULED → PAID
                                ↓
                          REJECTED / ON_HOLD

API Keys & Configuration

Environment Setup

Create /home/claude/projects/unitcycle-demo/backend/.env:

LLAMAPARSE_API_KEY=llx-TNRhGRWbPYPukOOnn2xo0nevpsGuLek4yC8Nmbt5f57TWadS
OPENROUTER_API_KEY=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

LlamaParse Config

Mode: Cost Effective (gpt4o_mode=true) — 3 credits/page, 3,333 free/month
Current code fix: backend/invoices/llamaparse.py line 63: change "premium_mode": "true" → "gpt4o_mode": "true"
API: https://api.cloud.llamaindex.ai/api/v1/parsing (keep v1 for now, simpler)

OpenRouter Config

Primary model: google/gemini-2.0-flash-001 — $0.10/$0.40 per M tokens
Fallback: qwen/qwen3-32b — $0.08/$0.24 per M tokens
API: https://openrouter.ai/api/v1/chat/completions (OpenAI-compatible)
Use for: Invoice field extraction enhancement, vendor/property/GL matching, anomaly detection, AI summaries, confidence scoring
Cost: ~$0.0004 per invoice, ~$40 per 100K invoices/year

Django Settings Update

Install python-dotenv, load .env in backend/config/settings.py
Add: OPENROUTER_API_KEY, OPENROUTER_MODEL (default: google/gemini-2.0-flash-001)
Update: LLAMAPARSE_API_KEY to read from env instead of hardcoded

Design Spec

Invoice Detail Page (Redesign)

Layout: 2-panel (PDF left, details right) — keep existing structure

New components in right panel (top to bottom):

Workflow Stepper — horizontal 6-stage bar at page top
- Uploaded → AI Extracted → Review → Approved → Scheduled → Paid
- Gold filled dot with pulse animation on active step
- Gold checkmark on completed steps, muted outline on future
Glass Summary Card — hero card with gold top border
- Total amount (24px Manrope bold), AI confidence bar + percentage
- Due date, payment terms
- No glassmorphism blur (light mode), just subtle shadow + gold accent bar
AI Alert Cards — only shown when relevant
- Price anomaly (red): "34% above avg for this vendor"
- Work order match (purple): links to matching WO
- Duplicate warning (amber): fingerprint match detected
- Each alert has icon, title, detail text, and action link
Vendor & Property Card — inline AI badges per field
- Each field: label | value | AI badge (check 95%, eye 78%, warning 45%) | "change" link on hover
- Confidence tiers: High (>=85% gold), Medium (60-84% amber), Low (<60% red)
- Low-confidence fields show alternatives ("Also considered: Oakmere Trace 61%")
- "change" link opens dropdown with search for manual override
Line Items Table — with GL code chips
- Columns: #, Description, GL Code, Qty, Amount
- GL code as clickable chip with inline confidence badge
- Uncertain GL codes have amber border
- Total row with Manrope bold, tabular-nums
Action Bar — pinned at bottom of line items card
- Approve (navy bg, white text, primary), Hold (neutral), Reject (red outline)
- Approve triggers: accept all AI suggestions → move to APPROVED status

Invoice List Page (Minor updates)

Add "Scheduled" and "Paid" status filter tabs
Add "Hold" status tab
Stats cards: add total scheduled, total paid amounts

Color System (Light Mode, oklch())

Surfaces: white cards on oklch(0.97 0.005 260) background
Gold accent: oklch(0.68 0.16 75) — all positive indicators
Navy: oklch(0.25 0.06 260) — approve button, headings
Confidence high: oklch(0.58 0.16 75) on oklch(0.96 0.04 80) bg
Confidence medium: oklch(0.62 0.15 55) on oklch(0.95 0.04 60) bg
Danger: oklch(0.55 0.22 25) — anomalies, reject only
Purple: oklch(0.52 0.18 300) — WO match alerts

Implementation Plan

Phase 1: Foundation (Backend)

Step 1: Environment & config setup

Create backend/.env with API keys
Install python-dotenv
Update backend/config/settings.py to load .env, add OPENROUTER settings
Update LLAMAPARSE_API_KEY to read from env
Files: backend/.env, backend/config/settings.py, backend/requirements.txt

Step 2: Fix LlamaParse mode (1-line change)

backend/invoices/llamaparse.py line 63: "premium_mode": "true" → "gpt4o_mode": "true"
Update the new API key
Test: upload a real PDF, verify extraction still works

Step 3: Create OpenRouter service

New file: backend/invoices/openrouter_service.py
OpenAI-compatible client pointing at openrouter.ai/api/v1
extract_invoice_fields(markdown_text) — structured JSON output with schema
match_vendor(extracted_name, vendor_list) — fuzzy match + confidence
match_property(extracted_text, property_list) — context-based matching
suggest_gl_codes(line_items, vendor_category) — per-line GL suggestion
detect_anomalies(invoice_data, vendor_history) — price anomaly detection
generate_summary(invoice_data) — human-readable AI summary
All methods return confidence scores

Step 4: Update Invoice model status choices

Add to status choices: 'on_hold', 'scheduled'
Add DB columns (raw SQL since managed=False):
- approved_by VARCHAR(100)
- approved_at TIMESTAMP
- scheduled_at TIMESTAMP
- scheduled_by VARCHAR(100)
- paid_at TIMESTAMP
- hold_reason TEXT
- gl_code_confirmed BOOLEAN DEFAULT FALSE
- vendor_confirmed BOOLEAN DEFAULT FALSE
- property_confirmed BOOLEAN DEFAULT FALSE
Add gl_code column to invoice_line_items table
Files: backend/invoices/models.py, raw SQL migration script

Step 5: AI matching pipeline

New file: backend/invoices/ai_pipeline.py
process_invoice(invoice_id) — orchestrates the full pipeline:
1. Get LlamaParse markdown → extract JSON via OpenRouter
2. Match vendor against vendors table (fuzzy + AI)
3. Match property against properties table (context clues)
4. Suggest GL codes per line item (based on vendor category + description)
5. Check for duplicates (multi-field fingerprint: vendor + amount + date + invoice#)
6. Detect price anomalies (compare to vendor's avg invoice amount)
7. Find matching work orders (description similarity)
8. Calculate per-field confidence scores
9. Save all results to Invoice + InvoiceLineItem records
Files: backend/invoices/ai_pipeline.py

Step 6: New API endpoints

POST /api/invoices/<id>/hold/ — set status to on_hold with reason
POST /api/invoices/<id>/schedule/ — move approved → scheduled
POST /api/invoices/<id>/mark-paid/ — move scheduled → paid
POST /api/invoices/<id>/confirm-field/ — confirm AI suggestion (vendor/property/GL)
POST /api/invoices/<id>/override-field/ — override AI suggestion
GET /api/invoices/<id>/vendor-history/ — vendor's past invoices for anomaly context
GET /api/invoices/aging/ — AP aging report data
Update existing InvoiceDetailSerializer to include all new fields
Files: backend/invoices/views.py, backend/invoices/urls.py, backend/invoices/serializers.py

Phase 2: Frontend Redesign

Step 7: Update types & service

Add new fields to Invoice interface (hold_reason, scheduled_at, paid_at, etc.)
Add InvoiceFieldConfidence type (per-field confidence + alternatives)
Add new service methods: hold, schedule, markPaid, confirmField, overrideField, getVendorHistory, getAging
Files: src/app/features/invoices/invoice.types.ts, src/app/features/invoices/invoice.service.ts

Step 8: Workflow stepper component

New: src/app/features/invoices/components/workflow-stepper.component.ts
Input: current status → maps to active step
6 dots with connectors, gold pulse on active, animations

Step 9: Redesign invoice-detail.component.ts

Replace existing right panel with new sections (per design spec):
- Glass summary card with gold top border
- AI alert cards (anomaly, WO match, duplicate) — conditional
- Vendor & Property card with inline AI badges + "change" dropdowns
- Line items table with GL code chips
- Action bar (Approve/Hold/Reject)
Keep left panel (PDF preview) as-is
Add workflow stepper above panels
Files: src/app/features/invoices/invoice-detail.component.ts

Step 10: Update invoice list

Add Scheduled, Paid, On Hold filter tabs
Update stats cards with new status counts
Files: src/app/features/invoices/invoices.component.ts

Phase 3: AI Integration & Polish

Step 11: Wire up upload → AI pipeline

On PDF upload: LlamaParse OCR → OpenRouter extraction → AI matching pipeline
Polling: frontend checks status while AI_PROCESSING
When done: reload detail view with all AI suggestions populated
Files: backend/invoices/views.py (update invoice_upload)

Step 12: Field confirmation UX

Click "Accept" on AI badge → POST confirm-field → badge turns solid gold check
Click "change" → dropdown with search → POST override-field → update badge
GL code chip click → dropdown of GL codes with AI-ranked suggestions
All confirmations tracked for audit trail

Step 13: Duplicate detection

On upload: compute fingerprint (vendor_name + amount + invoice_date + invoice_number hash)
Query existing invoices for fingerprint match
If match found: show amber alert card with link to existing invoice
Field: is_duplicate already exists in model

Phase 4: Testing & Verification

Step 14: Playwright tests

Invoice list: filters, search, pagination, stats
Invoice detail: view, PDF preview, field display
Upload flow: drag-drop, progress, scanning animation, result
AI review: badge display, accept, override, alerts
Approve/reject/hold workflows
Status transitions through full pipeline
Files: tests/invoices/

Step 15: Manual verification

Upload a real vendor invoice PDF
Verify LlamaParse extracts correctly with Cost Effective mode
Verify OpenRouter/Gemini Flash returns accurate structured JSON
Verify vendor/property/GL matching produces reasonable suggestions
Verify anomaly detection flags correctly
Verify full workflow: Upload → AI Processing → Review → Approve → Schedule → Paid
Check responsive layout on mobile

Key Files to Modify

File	Changes
`backend/.env`	CREATE — API keys
`backend/config/settings.py`	Load .env, add OPENROUTER settings
`backend/invoices/llamaparse.py`	Fix mode to gpt4o, use new API key
`backend/invoices/openrouter_service.py`	CREATE — OpenRouter client
`backend/invoices/ai_pipeline.py`	CREATE — AI matching orchestrator
`backend/invoices/models.py`	Add new status choices, new fields
`backend/invoices/views.py`	New endpoints, update upload flow
`backend/invoices/serializers.py`	Expand detail serializer
`backend/invoices/urls.py`	New routes
`backend/requirements.txt`	Add python-dotenv, openai
`src/app/features/invoices/invoice.types.ts`	New fields, confidence types
`src/app/features/invoices/invoice.service.ts`	New methods
`src/app/features/invoices/invoice-detail.component.ts`	Full redesign per spec
`src/app/features/invoices/invoices.component.ts`	New filter tabs
`src/app/features/invoices/components/workflow-stepper.component.ts`	CREATE
Raw SQL script	Add columns to invoices + invoice_line_items

Existing Code to Reuse

backend/invoices/llamaparse.py — keep upload/poll/extract flow, just change mode
backend/invoices/models.py — Invoice model already has 107 columns including vendor_match_confidence, property_match_confidence, is_duplicate, etc.
src/app/features/invoices/invoice-detail.component.ts — keep left PDF panel entirely, redesign right panel only
src/app/features/invoices/invoice.service.ts — extend, don't replace
Vendor matching fields already exist in Invoice model (probable_vendor_name, vendor_matched, vendor_match_confidence, vendor_match_strategy)
Property matching fields already exist (probable_property_name, property_matched, property_match_confidence)
All CSS custom properties in src/styles.css — reuse for consistency

Verification

python manage.py check — Django validation
ng build — Angular compilation
Upload test PDF → verify LlamaParse extraction (Cost Effective mode)
Verify OpenRouter API call returns valid JSON
Verify AI matching populates vendor/property/GL suggestions
Walk through full UI flow: list → upload → scanning → review → approve
Playwright test suite passes
pm2 restart unitcycle && pm2 save — deploy
Verify at https://demo.unitcycle.com/invoices