OpenRouter LLM Research: Cheapest Effective Models for Invoice Extraction & General AI Tasks

Executive Summary

After pulling live pricing from the OpenRouter API (312+ models) and cross-referencing web benchmarks, here are the ranked recommendations for UnitCycle's two use cases.


Use Case 1: Invoice Field Extraction (Structured JSON Output)

Requirements: After OCR, extract vendor name, amounts, dates, line items, GL codes as structured JSON. Must reliably produce valid JSON every time.

Top Picks — Ranked by Value (Quality-per-Dollar)

| Rank | Model | Input ($/M tokens) | Output ($/M tokens) | Context | Structured Output | Notes |
|---|---|---|---|---|---|---|
| 1 | Google Gemini 2.0 Flash (google/gemini-2.0-flash-001) | $0.10 | $0.40 | 1M | Yes (native json_schema) | Best value. 97.1% quality in 38-task benchmark at lowest cost. Native JSON schema enforcement. Google's workhorse. |
| 2 | Google Gemini 2.5 Flash Lite (google/gemini-2.5-flash-lite) | $0.10 | $0.40 | 1M | Yes (native json_schema) | Newer Lite variant. Same price as 2.0 Flash. Designed for high-volume extraction/classification. |
| 3 | DeepSeek V3.1 (deepseek/deepseek-chat-v3.1) | $0.15 | $0.75 | 32K | Yes (native json_schema) | 671B MoE (37B active). Excellent at structured extraction: "high compliance with structured outputs." Very literal/predictable, ideal for pipelines. |
| 4 | Qwen3-32B (qwen/qwen3-32b) | $0.08 | $0.24 | 40K | Yes (native json_schema) | Among the lowest input prices in this list. Strong at structured tasks. May need explicit formatting instructions for long contexts. |
| 5 | GPT-4.1 Nano (openai/gpt-4.1-nano) | $0.10 | $0.40 | 1M | Yes (native json_schema) | OpenAI's cheapest. Best-in-class structured output compliance (OpenAI pioneered this). Multimodal. |
| 6 | GPT-5 Nano (openai/gpt-5-nano) | $0.05 | $0.40 | 400K | Yes (native json_schema) | Even cheaper input than 4.1 Nano. New model, so verify extraction quality before committing. |
| 7 | Mistral Small 3.1 24B (mistralai/mistral-small-3.1-24b-instruct) | $0.03 | $0.11 | 131K | Yes (native json_schema) | Ultra-cheap. Works out-of-box with invoice extraction libraries. May be less reliable on complex multi-line-item invoices. |
| 8 | GPT-4o Mini (openai/gpt-4o-mini) | $0.15 | $0.60 | 128K | Yes (native json_schema) | Battle-tested. Known excellent JSON compliance. Slightly more expensive but very reliable. |
| 9 | Llama 4 Scout (meta-llama/llama-4-scout) | $0.08 | $0.30 | 327K | Yes (native json_schema) | Open-source, multimodal, huge context. Good structured output support via OpenRouter. |

Cost Estimate for Invoice Processing

Assuming a typical invoice = ~2,000 tokens input (OCR markdown + system prompt + schema), ~500 tokens output (extracted JSON):

| Model | Cost per Invoice | Cost per 1,000 Invoices | Cost per 10,000 Invoices |
|---|---|---|---|
| Gemini 2.0 Flash | $0.0004 | $0.40 | $4.00 |
| Gemini 2.5 Flash Lite | $0.0004 | $0.40 | $4.00 |
| Qwen3-32B | $0.0003 | $0.28 | $2.80 |
| GPT-5 Nano | $0.0003 | $0.30 | $3.00 |
| DeepSeek V3.1 | $0.0007 | $0.68 | $6.75 |
| Mistral Small 3.1 | $0.0001 | $0.12 | $1.15 |
| GPT-4.1 Nano | $0.0004 | $0.40 | $4.00 |
| GPT-4o Mini | $0.0006 | $0.60 | $6.00 |

At these prices, even 100,000 invoices/year costs about $40 with Gemini Flash.
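
The per-invoice figures follow directly from the per-million-token prices. A minimal sketch of the arithmetic, assuming the 2,000-input / 500-output token profile stated above; the function name and the price dictionary are illustrative, with prices copied from the model table in this report:

```python
# Prices in USD per million tokens as (input, output), copied from this report.
PRICES_PER_MILLION = {
    "google/gemini-2.0-flash-001": (0.10, 0.40),
    "qwen/qwen3-32b": (0.08, 0.24),
    "deepseek/deepseek-chat-v3.1": (0.15, 0.75),
    "mistralai/mistral-small-3.1-24b-instruct": (0.03, 0.11),
}


def cost_per_invoice(model: str, input_tokens: int = 2000, output_tokens: int = 500) -> float:
    """Return the estimated USD cost of one extraction call (illustrative helper)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    for model in PRICES_PER_MILLION:
        per_invoice = cost_per_invoice(model)
        print(f"{model}: ${per_invoice:.6f} per invoice, ${per_invoice * 10_000:.2f} per 10,000")
```

Real invoices vary in length, so it may be worth re-running the estimate with token counts measured from a sample of actual OCR output.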


Use Case 2: General AI Tasks (Matching, Scoring, Anomaly Detection, Summaries)

Requirements: Property/vendor matching, confidence scoring, anomaly detection in rent rolls, AI summaries. Light reasoning, not as format-critical.

Top Picks — Ranked by Value

| Rank | Model | Input ($/M tokens) | Output ($/M tokens) | Why |
|---|---|---|---|---|
| 1 | Qwen3-235B-A22B (2507) (qwen/qwen3-235b-a22b-2507) | $0.071 | $0.10 | Monster MoE model at absurdly low price. 235B params, 22B active. Supports reasoning mode. Best reasoning-per-dollar available. |
| 2 | Gemini 2.0 Flash (google/gemini-2.0-flash-001) | $0.10 | $0.40 | Same model as Use Case 1. Good enough for both tasks; simplifies your stack to one model. |
| 3 | DeepSeek V3.1 (deepseek/deepseek-chat-v3.1) | $0.15 | $0.75 | Strong reasoning. Hybrid thinking/non-thinking modes. Excels at data analysis. |
| 4 | Llama 4 Scout (meta-llama/llama-4-scout) | $0.08 | $0.30 | 327K context, multimodal, good general intelligence for the price. |
| 5 | GPT-4o Mini (openai/gpt-4o-mini) | $0.15 | $0.60 | Reliable all-rounder. If you want one model for everything and trust OpenAI. |

RECOMMENDATION: Best Strategy for UnitCycle

Primary Model: Google Gemini 2.0 Flash (google/gemini-2.0-flash-001)

Fallback / Cost-Optimized: Qwen3-32B (qwen/qwen3-32b)

Heavy Reasoning Tasks: Qwen3-235B-A22B-2507 (qwen/qwen3-235b-a22b-2507)

NOT Recommended:


Implementation Notes

OpenRouter API Call (Invoice Extraction Example)

import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/gemini-2.0-flash-001",
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "invoice_extraction",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "vendor_name": {"type": "string"},
                        "invoice_number": {"type": "string"},
                        "invoice_date": {"type": "string"},
                        "due_date": {"type": "string"},
                        "total_amount": {"type": "number"},
                        "line_items": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "description": {"type": "string"},
                                    "quantity": {"type": "number"},
                                    "unit_price": {"type": "number"},
                                    "amount": {"type": "number"},
                                    "gl_code": {"type": "string"}
                                }
                            }
                        }
                    },
                    "required": ["vendor_name", "total_amount", "line_items"]
                }
            }
        },
        "messages": [
            {"role": "system", "content": "Extract invoice data from the OCR text. Return structured JSON."},
            {"role": "user", "content": "<OCR markdown text here>"}
        ]
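    },
    timeout=60,  # fail fast instead of hanging on a stalled connection
)
response.raise_for_status()  # surface HTTP errors such as an invalid key or rate limit

# The extracted invoice is returned as a JSON string in the first choice's message content.
invoice_json = response.json()["choices"][0]["message"]["content"]
print(invoice_json)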

Key OpenRouter Features to Use:

  1. response_format with type json_schema: forces valid JSON matching the JSON Schema you supply (the schema can be generated from a Pydantic model)
  2. Provider routing: OpenRouter auto-routes to the cheapest/fastest provider for each model
  3. Response healing plugin: auto-fixes malformed JSON responses (safety net)
  4. Streaming: supported for structured outputs too
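
Even with schema enforcement, it can be worth checking the parsed response locally before it reaches downstream systems. A minimal stdlib-only sketch, assuming the field names from the example schema above; `validate_invoice` and its line-item total check are illustrative helpers, not OpenRouter features:

```python
import json

REQUIRED_FIELDS = ("vendor_name", "total_amount", "line_items")


def validate_invoice(raw_content: str, tolerance: float = 0.01) -> list[str]:
    """Return a list of problems found in a model response; empty means it passed."""
    try:
        invoice = json.loads(raw_content)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(invoice, dict):
        return ["top-level value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in invoice]
    if problems:
        return problems

    total = invoice["total_amount"]
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        return ["total_amount is not a number"]
    line_items = invoice["line_items"]
    if not isinstance(line_items, list):
        return ["line_items is not an array"]

    # Assumption: total_amount should equal the sum of line-item amounts.
    # Real invoices may add tax or shipping outside the line items.
    amounts = [item.get("amount") for item in line_items if isinstance(item, dict)]
    if amounts and all(isinstance(a, (int, float)) for a in amounts):
        if abs(sum(amounts) - total) > tolerance:
            problems.append("line item amounts do not sum to total_amount")
    return problems
```

A non-empty result could feed into the decision to retry with a stronger model or route the invoice to a person.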

Tiered Strategy for Production:

  1. Tier 1 (default): Gemini 2.0 Flash, expected to handle roughly 95% of invoices
  2. Tier 2 (complex): If extraction confidence < 80%, retry with DeepSeek V3.1 or Qwen3-235B
  3. Tier 3 (manual): If both fail, flag for human review
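
The tiers above can be sketched as a simple escalation loop. This is one possible shape, not a definitive implementation: the report does not specify how extraction confidence is computed, so each extractor here is a hypothetical callable that wraps an API call and returns its own confidence score between 0 and 1.

```python
from typing import Callable, Optional

# Hypothetical signature: takes OCR text, returns (parsed invoice, confidence in [0, 1]).
Extractor = Callable[[str], tuple[dict, float]]


def extract_with_fallback(
    ocr_text: str,
    tiers: list[tuple[str, Extractor]],
    min_confidence: float = 0.80,
) -> tuple[Optional[dict], Optional[str]]:
    """Try each tier in order; return (invoice, model_id), or (None, None) for human review."""
    for model_id, extract in tiers:
        try:
            invoice, confidence = extract(ocr_text)
        except Exception:
            # Treat any failure (HTTP error, invalid JSON) as a miss and escalate.
            continue
        if confidence >= min_confidence:
            return invoice, model_id
    return None, None  # Tier 3: no model was confident enough, flag for manual review


if __name__ == "__main__":
    # Stub extractors standing in for real OpenRouter calls.
    def tier1(_text: str) -> tuple[dict, float]:
        return {"vendor_name": "Acme"}, 0.55

    def tier2(_text: str) -> tuple[dict, float]:
        return {"vendor_name": "Acme Corp"}, 0.93

    tiers = [("google/gemini-2.0-flash-001", tier1), ("deepseek/deepseek-chat-v3.1", tier2)]
    print(extract_with_fallback("<OCR markdown text here>", tiers))
```

Passing the tiers as a list keeps the escalation order configurable, so a model can be swapped or a tier added without touching the loop.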

Sources