Backed by Y Combinator
Supercharge LLM performance
by removing context bloat
The bear-1.1 compression model removes context bloat from your prompts before they hit your LLM. Drop-in API middleware that integrates in minutes. (we measured it 🙂)
66% fewer tokens
3x cost reduction
+1.1% accuracy gain
<0.1s per 10K tokens
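The headline numbers are mutually consistent: if input cost scales with token count, a 66% token reduction implies roughly a 3x cost reduction. A quick sanity check:

```python
# Sanity check: a 66% token reduction implies ~3x fewer input tokens billed.
original_tokens = 10_000
reduction = 0.66  # 66% fewer tokens (from the stats above)
compressed_tokens = original_tokens * (1 - reduction)

cost_multiplier = original_tokens / compressed_tokens
print(f"{cost_multiplier:.2f}x cost reduction")  # ~2.94x, i.e. roughly 3x
```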
Intelligent semantic processing
The bear-1 and bear-1.1 models process tokens based on context and semantic intent. bear-1.1 is the latest version with improved accuracy.
In its most fundamental sense, compression is the process of encoding
information using fewer bits or resources than the original representation
by identifying and eliminating statistical redundancies or irrelevant data
within a dataset. Whether applied to digital media, text, or the high-
dimensional vector spaces of Large Language Models, compression relies on
the principle that most raw information contains noise or repeating patterns
that do not contribute new meaning. By applying an algorithm—or in your
case, an ML-based model—to map the input data into a more compact form,
you essentially distil the signal from the noise. In the context of ML
inputs, this means transforming long-form text into a dense, mathematically
efficient representation that preserves the original semantic intent and
logical relationships while significantly reducing the physical token count,
thereby allowing a system to process more information within the same fixed
computational window or budget.
One API call
Send text in, get compressed text back. Drop it in before your LLM call. That's the entire integration.
POST api.thetokencompany.com/v1/compress
{
  "model": "bear-1.1",
  "input": "Your long text to compress..."
}
response
{
  "output": "Compressed text...",
  "original_input_tokens": 1284,
  "output_tokens": 436
}
$0.05 per 1M compressed tokens · Available models: bear-1, bear-1.1
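The request above can be sketched as a minimal Python client. The endpoint and body fields come from the example; the bearer-token Authorization header is an assumption, since the page doesn't show an auth scheme:

```python
import json
import urllib.request

API_URL = "https://api.thetokencompany.com/v1/compress"

def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request shown above. The Authorization header
    is an assumed auth scheme, not confirmed by the docs on this page."""
    payload = json.dumps({"model": "bear-1.1", "input": text}).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption
        },
        method="POST",
    )

# Usage (performs a real network call):
# with urllib.request.urlopen(build_request(long_text, api_key)) as resp:
#     compressed = json.load(resp)["output"]
```

The compressed `output` then replaces the original text in your LLM call, which is the "drop it in before your LLM call" integration described above.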
Use cases
Chat applications
Fit roughly 3x more conversation history in the same context window. Compress each turn before it reaches the model to keep context quality high as history grows.
Document processing
Process web scrapes, PDFs, and large documents without bloated inputs.
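For either use case, compression pays for itself whenever the downstream model's input price exceeds the compression fee. A back-of-the-envelope check, using the $0.05/1M rate and 66% reduction from above, with a hypothetical LLM input price of $3 per 1M tokens (illustrative only):

```python
# Back-of-the-envelope: savings from compressing 1M input tokens.
llm_price = 3.00        # $ per 1M LLM input tokens -- hypothetical figure
compress_price = 0.05   # $ per 1M compressed tokens (pricing line above)
reduction = 0.66        # 66% fewer tokens (stats above)

tokens_millions = 1.0
compressed = tokens_millions * (1 - reduction)

baseline_cost = tokens_millions * llm_price
with_compression = compressed * llm_price + compressed * compress_price
savings = baseline_cost - with_compression
print(f"${savings:.2f} saved per 1M input tokens")  # roughly $1.96
```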
Backed by
the founders and operators of
Ready to compress?
Access the compression API.