OpenRouter LLM Research: Cheapest Effective Models for Invoice Extraction & General AI Tasks
Executive Summary
After pulling live pricing from the OpenRouter API (312+ models) and cross-referencing web benchmarks, here are the ranked recommendations for UnitCycle's two use cases.
Use Case 1: Invoice Field Extraction (Structured JSON Output)
Requirements: After OCR, extract vendor name, amounts, dates, line items, GL codes as structured JSON. Must reliably produce valid JSON every time.
Top Picks — Ranked by Value (Quality-per-Dollar)
| Rank | Model | Input/M | Output/M | Context | Structured Output | Notes |
|---|---|---|---|---|---|---|
| 1 | Google Gemini 2.0 Flash (google/gemini-2.0-flash-001) | $0.10 | $0.40 | 1M | Yes (native json_schema) | Best value. 97.1% quality in 38-task benchmark at lowest cost. Native JSON schema enforcement. Google's workhorse. |
| 2 | Google Gemini 2.5 Flash Lite (google/gemini-2.5-flash-lite) | $0.10 | $0.40 | 1M | Yes (native json_schema) | Newer Lite variant. Same price as 2.0 Flash. Designed for high-volume extraction/classification. |
| 3 | DeepSeek V3.1 (deepseek/deepseek-chat-v3.1) | $0.15 | $0.75 | 32K | Yes (native json_schema) | 671B MoE (37B active). Excellent at structured extraction ("high compliance with structured outputs"). Very literal and predictable, ideal for pipelines. |
| 4 | Qwen3-32B (qwen/qwen3-32b) | $0.08 | $0.24 | 40K | Yes (native json_schema) | Cheapest input of any quality model. Strong at structured tasks. May need explicit formatting instructions for long contexts. |
| 5 | GPT-4.1 Nano (openai/gpt-4.1-nano) | $0.10 | $0.40 | 1M | Yes (native json_schema) | OpenAI's cheapest. Best-in-class structured output compliance (OpenAI pioneered this). Multimodal. |
| 6 | GPT-5 Nano (openai/gpt-5-nano) | $0.05 | $0.40 | 400K | Yes (native json_schema) | Even cheaper input than 4.1 Nano. New model; verify extraction quality before committing. |
| 7 | Mistral Small 3.1 24B (mistralai/mistral-small-3.1-24b-instruct) | $0.03 | $0.11 | 131K | Yes (native json_schema) | Ultra-cheap. Works out of the box with invoice extraction libraries. May be less reliable on complex multi-line-item invoices. |
| 8 | GPT-4o Mini (openai/gpt-4o-mini) | $0.15 | $0.60 | 128K | Yes (native json_schema) | Battle-tested, with known excellent JSON compliance. Slightly more expensive but very reliable. |
| 9 | Llama 4 Scout (meta-llama/llama-4-scout) | $0.08 | $0.30 | 327K | Yes (native json_schema) | Open-source, multimodal, huge context. Good structured output support via OpenRouter. |
Cost Estimate for Invoice Processing
Assuming a typical invoice = ~2,000 tokens input (OCR markdown + system prompt + schema), ~500 tokens output (extracted JSON):
| Model | Cost per Invoice | Cost per 1,000 Invoices | Cost per 10,000 Invoices |
|---|---|---|---|
| Gemini 2.0 Flash | $0.0004 | $0.40 | $4.00 |
| Gemini 2.5 Flash Lite | $0.0004 | $0.40 | $4.00 |
| Qwen3-32B | $0.0003 | $0.28 | $2.80 |
| GPT-5 Nano | $0.0003 | $0.30 | $3.00 |
| DeepSeek V3.1 | $0.0007 | $0.68 | $6.75 |
| Mistral Small 3.1 | $0.0001 | $0.12 | $1.15 |
| GPT-4.1 Nano | $0.0004 | $0.40 | $4.00 |
| GPT-4o Mini | $0.0006 | $0.60 | $6.00 |
At these prices, even 100,000 invoices/year costs about $40 with Gemini 2.0 Flash.
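The cost math behind the table is simple: a minimal sketch, assuming the token counts above (2,000 in / 500 out) and the per-million-token prices from the model table.

```python
# Per-invoice cost = (input_tokens * input_price + output_tokens * output_price) / 1M.
# Prices are $ per million tokens, copied from the ranking table above.
PRICES = {  # model id -> (input $/M, output $/M)
    "google/gemini-2.0-flash-001": (0.10, 0.40),
    "qwen/qwen3-32b": (0.08, 0.24),
    "mistralai/mistral-small-3.1-24b-instruct": (0.03, 0.11),
}

def invoice_cost(model: str, input_tokens: int = 2000, output_tokens: int = 500) -> float:
    """Estimated dollar cost for a single invoice."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Cost per 1,000 invoices with Gemini 2.0 Flash:
print(round(invoice_cost("google/gemini-2.0-flash-001") * 1000, 2))  # → 0.4
```

Swapping in any other model's prices reproduces the corresponding table row.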
Use Case 2: General AI Tasks (Matching, Scoring, Anomaly Detection, Summaries)
Requirements: Property/vendor matching, confidence scoring, anomaly detection in rent rolls, AI summaries. Light reasoning, not as format-critical.
Top Picks — Ranked by Value
| Rank | Model | Input/M | Output/M | Why |
|---|---|---|---|---|
| 1 | Qwen3-235B-A22B (2507) (qwen/qwen3-235b-a22b-2507) | $0.071 | $0.10 | Huge MoE model at an exceptionally low price: 235B params, 22B active. Supports reasoning mode. Best reasoning-per-dollar available. |
| 2 | Gemini 2.0 Flash (google/gemini-2.0-flash-001) | $0.10 | $0.40 | Same model as Use Case 1. Good enough for both tasks, which simplifies your stack to one model. |
| 3 | DeepSeek V3.1 (deepseek/deepseek-chat-v3.1) | $0.15 | $0.75 | Strong reasoning. Hybrid thinking/non-thinking modes. Excels at data analysis. |
| 4 | Llama 4 Scout (meta-llama/llama-4-scout) | $0.08 | $0.30 | 327K context, multimodal, good general intelligence for the price. |
| 5 | GPT-4o Mini (openai/gpt-4o-mini) | $0.15 | $0.60 | Reliable all-rounder if you want one model for everything and trust OpenAI. |
RECOMMENDATION: Best Strategy for UnitCycle
Primary Model: Google Gemini 2.0 Flash (google/gemini-2.0-flash-001)
- $0.10 input / $0.40 output per million tokens
- Use for BOTH invoice extraction AND general AI tasks
- Native structured output (json_schema mode) via OpenRouter
- 1M token context window (handles any document)
- 97.1% quality on diverse benchmarks at lowest cost tier
- Multimodal (can process images too if needed later)
- Battle-tested, production-stable from Google
Fallback / Cost-Optimized: Qwen3-32B (qwen/qwen3-32b)
- $0.08 input / $0.24 output — even cheaper
- Use as fallback if Gemini has rate limits or outages
- Native structured output support
- Open-source model, multiple providers on OpenRouter
Heavy Reasoning Tasks: Qwen3-235B-A22B-2507 (qwen/qwen3-235b-a22b-2507)
- $0.071 input / $0.10 output — cheapest "frontier-class" reasoning
- Use for anomaly detection, complex pattern matching, AI summaries requiring deeper analysis
- Has reasoning/thinking mode for chain-of-thought
NOT Recommended:
- Claude 3.5 Haiku ($0.80/$4.00) — 8x more expensive than Gemini Flash for marginal quality gain on extraction tasks
- Claude Haiku 4.5 ($1.00/$5.00) — Even more expensive. Save Claude for tasks that truly need it.
- DeepSeek R1 ($0.70/$2.50) — Overkill reasoning model for these tasks. Use V3.1 instead.
- Free models — Rate-limited (20 req/min, 200 req/day). Not viable for production.
Implementation Notes
OpenRouter API Call (Invoice Extraction Example)
```python
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/gemini-2.0-flash-001",
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "invoice_extraction",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "vendor_name": {"type": "string"},
                        "invoice_number": {"type": "string"},
                        "invoice_date": {"type": "string"},
                        "due_date": {"type": "string"},
                        "total_amount": {"type": "number"},
                        "line_items": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "description": {"type": "string"},
                                    "quantity": {"type": "number"},
                                    "unit_price": {"type": "number"},
                                    "amount": {"type": "number"},
                                    "gl_code": {"type": "string"}
                                }
                            }
                        }
                    },
                    "required": ["vendor_name", "total_amount", "line_items"]
                }
            }
        },
        "messages": [
            {"role": "system", "content": "Extract invoice data from the OCR text. Return structured JSON."},
            {"role": "user", "content": "<OCR markdown text here>"}
        ]
    }
)
```
Key OpenRouter Features to Use:
- response_format: json_schema: forces valid JSON matching your exact Pydantic schema
- Provider routing: OpenRouter auto-routes to the cheapest/fastest provider for each model
- Response healing plugin: auto-fixes malformed JSON responses (a safety net)
- Streaming: supported for structured outputs too
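Even with strict json_schema mode, the extracted fields arrive as a JSON string inside the message content of the standard chat-completions envelope and must be decoded. A minimal sketch; the mocked response body below is an assumption illustrating the OpenAI-compatible shape OpenRouter returns:

```python
import json

def parse_extraction(response_json: dict) -> dict:
    """Pull the model's message content out of the chat-completions envelope
    and decode it; with strict json_schema mode the content should already
    be valid JSON matching the requested schema."""
    content = response_json["choices"][0]["message"]["content"]
    return json.loads(content)

# Mocked response body for illustration (not real API output):
mock = {"choices": [{"message": {"content":
    '{"vendor_name": "Acme Plumbing", "total_amount": 1250.00, "line_items": []}'}}]}

data = parse_extraction(mock)
print(data["vendor_name"])  # → Acme Plumbing
```

In production you would call parse_extraction(response.json()) on the requests response from the example above, and validate the result against your Pydantic model.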
Tiered Strategy for Production:
- Tier 1 (default): Gemini 2.0 Flash — handles 95% of invoices
- Tier 2 (complex): If extraction confidence < 80%, retry with DeepSeek V3.1 or Qwen3-235B
- Tier 3 (manual): If both fail, flag for human review
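The three tiers above reduce to a simple retry loop. A sketch under stated assumptions: extract_invoice and confidence are hypothetical stand-ins for your extraction call and confidence scorer, not real library functions.

```python
# Tier order mirrors the strategy above: cheap default first, stronger retry second.
TIERS = [
    "google/gemini-2.0-flash-001",  # Tier 1: default
    "deepseek/deepseek-chat-v3.1",  # Tier 2: retry for complex invoices
]

def extract_with_fallback(ocr_text, extract_invoice, confidence, threshold=0.80):
    """Return (result, model) from the first tier whose extraction clears the
    confidence threshold; (None, None) means flag for human review (Tier 3)."""
    for model in TIERS:
        result = extract_invoice(ocr_text, model)
        if result is not None and confidence(result) >= threshold:
            return result, model
    return None, None

# Usage with stubbed helpers (hypothetical; replace with real calls):
stub_extract = lambda text, model: {"vendor_name": "Acme", "total_amount": 10.0}
stub_confidence = lambda result: 0.95
result, model = extract_with_fallback("<ocr text>", stub_extract, stub_confidence)
print(model)  # → google/gemini-2.0-flash-001
```

Because Tier 2 only fires on low-confidence extractions, the blended cost stays close to the Tier 1 price even if a few percent of invoices escalate.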
Sources
- OpenRouter Structured Outputs Documentation
- Claude vs GPT vs Gemini: Which AI Wins at Invoice Extraction?
- I Tested 15 LLMs: Claude, GPT, Gemini on 38 Tasks
- LLMs for Structured Data Extraction from PDFs in 2026
- DeepSeek V3.2 Prompting Techniques
- Gemini API Pricing 2026
- Top AI Models on OpenRouter March 2026
- Best Qwen Models in 2026
- OpenRouter Pricing Calculator
- Gemini 2.5 Flash Lite on OpenRouter
- Structured outputs with OpenRouter via Instructor