
Supercharge LLM performance
by removing context bloat

The bear-1.1 compression model removes context bloat from your prompts before they hit your LLM. Drop-in API middleware that integrates in minutes. (we measured it 🙂)

66% fewer tokens
3x cost reduction
+1.1% accuracy gain
<0.1s per 10K tokens

Intelligent semantic processing

The bear-1 and bear-1.1 models process tokens based on context and semantic intent. bear-1.1 is the latest version with improved accuracy.

In its most fundamental sense, compression is the process of encoding
information using fewer bits or resources than the original representation
by identifying and eliminating statistical redundancies or irrelevant data
within a dataset. Whether applied to digital media, text, or the high-
dimensional vector spaces of Large Language Models, compression relies on
the principle that most raw information contains noise or repeating patterns
that do not contribute new meaning. By applying an algorithm—or in your
case, an ML-based model—to map the input data into a more compact form,
you essentially distil the signal from the noise. In the context of ML
inputs, this means transforming long-form text into a dense, mathematically
efficient representation that preserves the original semantic intent and
logical relationships while significantly reducing the physical token count,
thereby allowing a system to process more information within the same fixed
computational window or budget.

One API call

Send text in, get compressed text back. Drop it in before your LLM call. That's the entire integration.

POST api.thetokencompany.com/v1/compress

{
  "model": "bear-1.1",
  "input": "Your long text to compress..."
}

Response:

{
  "output": "Compressed text...",
  "original_input_tokens": 1284,
  "output_tokens": 436
}

$0.05 per 1M compressed tokens · Available models: bear-1, bear-1.1
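The request/response pair above can be wrapped in a few lines of client code. The sketch below is a minimal, illustrative example, not official SDK code: the endpoint URL and JSON field names come from the snippet above, while the Bearer authorization header is an assumption you should verify against your account settings.

```python
import json
import urllib.request

API_URL = "https://api.thetokencompany.com/v1/compress"

def build_payload(text: str, model: str = "bear-1.1") -> dict:
    # Request body mirrors the example above: a model name and the raw input text.
    return {"model": model, "input": text}

def compress(text: str, api_key: str, model: str = "bear-1.1") -> str:
    # Bearer auth is an assumption for illustration; check your API credentials setup.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, model)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The response carries "output", "original_input_tokens", and "output_tokens";
    # here we return only the compressed text to drop into the downstream LLM call.
    return body["output"]
```

Call `compress()` on your prompt or retrieved context just before the LLM request, and send the returned string in place of the original text.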

Use cases

Chat applications

Fit 3x more conversation history into the same context window, and strip bloat from incoming messages to keep context quality high.

Document processing

Process web scrapes, PDFs, and large documents without bloated inputs.

Backed by the founders and operators of

Silo
Wolt
Y Combinator
Supercell
Hugging Face
SVA

Ready to compress?

Access the compression API.