Case Study
One of the world's biggest token consumers improved quality by removing context bloat
Processing 193 billion tokens per month, Pax Historia set out to cut costs with bear-1.1 compression. Instead, they discovered it also improved user preference — and lifted purchase amounts by 5%.
The story
Pax Historia is a conversational AI platform where users interact with historical figures. As one of the biggest token consumers on OpenRouter — processing 193 billion tokens per month — they came to us with a straightforward goal: reduce their LLM costs. Long conversation histories were driving up token counts, and they wanted to compress the context before sending it to their models.
Pax Historia
193B tokens/month on OpenRouter
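The integration point is simple: compress the accumulated conversation history before it reaches the model. Here is a minimal sketch of that pipeline shape — `compress_context` is a toy stand-in (it only collapses whitespace), not bear-1.1's real interface, and the commented-out call marks where the actual OpenRouter request would go:

```python
import re

def compress_context(messages: list[dict], ratio: float) -> list[dict]:
    # Toy stand-in for the compressor: collapses runs of whitespace.
    # bear-1.1 itself is a learned compressor; `ratio` (e.g. 0.2) would
    # control how aggressively low-information tokens are removed.
    return [
        {**m, "content": re.sub(r"\s+", " ", m["content"]).strip()}
        for m in messages
    ]

def chat_with_compression(messages: list[dict], model: str, ratio: float = 0.2):
    compact = compress_context(messages, ratio)
    # In production, forward `compact` to the model, e.g.:
    # return client.chat.completions.create(model=model, messages=compact)
    return compact  # returned directly so the sketch runs standalone

history = [
    {"role": "system", "content": "You are  Napoleon   Bonaparte."},
    {"role": "user", "content": "Tell me about   Austerlitz.\n\n"},
]
print(chat_with_compression(history, model="anthropic/claude-sonnet-4.5"))
```

The key design property is that compression is a drop-in preprocessing step: the rest of the serving stack is unchanged.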
We integrated bear-1.1 compression into their pipeline. The cost savings were immediate and expected. What we didn't expect was what happened next.
Pax Historia runs a model arena — a blind comparison where users vote on which response they prefer across different models. After accumulating over 268,000 votes, the results showed something remarkable: models running with bear-1.1 compression were consistently preferred over their uncompressed counterparts.
Arena results
The arena tested 27 models head-to-head with 268,327 total votes. Two models were tested with bear-1.1 compression: Claude Sonnet 4.5 at 0.2 compression and Gemini 3 Flash at 0.05 compression. Both ranked higher than their uncompressed versions.
| Rank | Model | Score | Wins | Votes |
|---|---|---|---|---|
| #1 | Claude Sonnet 4.6 | +0.45 | 9,178 | 15,819 |
| #2 | Claude Sonnet 4.5 @ 0.2 bear-1.1 | +0.43 | 12,848 | 22,179 |
| #3 | Claude Sonnet 4.5 | +0.41 | 13,013 | 22,744 |
| #4 | Claude Opus 4.5 | +0.36 | 11,561 | 20,416 |
| #5 | GLM 5 | +0.36 | 8,415 | 14,872 |
| #6 | Claude Opus 4.6 | +0.35 | 11,463 | 20,263 |
| #7 | Gemini 3 Flash @ 0.05 bear-1.1 | +0.23 | 12,250 | 22,792 |
| #8 | Gemini 3 Flash | +0.20 | 12,122 | 22,819 |
| #9 | Qwen3.5 Plus | +0.18 | 8,050 | 15,424 |
| #10 | Gemini 3.1 Pro Preview | +0.15 | 7,634 | 14,611 |
| #11 | Gemini 3 Pro Preview | +0.15 | 11,847 | 22,699 |
| #12 | Gemini 2.5 Flash | +0.09 | 11,703 | 22,909 |
Compressed vs uncompressed
The most direct comparison: the same base model, with and without bear-1.1 compression. In both cases, the compressed version scored higher and ranked higher.
Claude Sonnet 4.5 went from #3 uncompressed (+0.41) to #2 with 0.2 compression (+0.43). That puts it above Claude Opus 4.5 and just 0.02 points behind the top model.
Gemini 3 Flash moved from #8 uncompressed (+0.20) to #7 with 0.05 compression (+0.23).
Compression improved user preference, not just efficiency
In blind comparisons across 268,327 votes, users consistently preferred responses from compressed models. Removing context bloat appears to help models generate more focused, higher-quality responses.
Win rates
Looking at raw win rates across all arena matchups reinforces the finding. Compressed versions won a slightly higher proportion of their matches than their uncompressed counterparts.
Business impact
The arena results motivated Pax Historia to run A/B tests on their production traffic, comparing compressed and uncompressed pipelines on actual user behavior — not just preference votes.
The result: purchase amounts went up 5% when using compressed context. Users who interacted with compressed-context responses spent more. The quality improvement measured in the arena translated directly to a business metric.
+5%
Purchase amount lift
A/B tested on production traffic
4.7–22%
Saved per request
Original goal: achieved alongside quality gains
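At 193 billion tokens per month, even the low end of that savings range is substantial. A back-of-envelope estimate, assuming the per-request percentage applies uniformly to token volume:

```python
monthly_tokens = 193_000_000_000   # Pax Historia's monthly OpenRouter volume
low, high = 0.047, 0.22            # per-request savings range above

print(f"{monthly_tokens * low / 1e9:.1f}B to "
      f"{monthly_tokens * high / 1e9:.1f}B tokens saved per month")
```

That works out to roughly 9 to 42 billion tokens per month that never need to be processed at all.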
Why does compression improve quality?
This isn't the first time we've seen compression improve output quality. Our FinanceBench evaluation showed a 2.7 percentage point accuracy improvement on financial QA when using bear-1.2 compression.
The mechanism is the same: LLM attention is a finite resource. When the context window is filled with boilerplate, formatting artifacts, and low-information tokens, the model spreads its attention across irrelevant content. Compression removes this noise, letting the model focus on what matters.
In Pax Historia's case, conversation histories accumulate system prompts, formatting, and repetitive phrasing. bear-1.1 compression strips these while preserving the semantic content that drives good responses. The result is a model that appears more coherent and focused to users — because its context actually is more coherent and focused.
Key findings
Compression improves user preference
In a 268K-vote blind arena, the bear-1.1-compressed versions of both Claude Sonnet 4.5 and Gemini 3 Flash ranked higher than their uncompressed counterparts.
Quality gains translate to business metrics
A/B testing on production traffic showed a 5% increase in purchase amounts when using compressed context. Better responses drive better outcomes.
Cost reduction and quality improvement are not a tradeoff
Pax Historia achieved both goals simultaneously: lower costs per request from fewer tokens, and higher quality from cleaner context. Compression is not a compromise — it's an optimization on both axes.
Ready to try it?
Create an account to get your API key and start compressing.