

Installation

Install the SDK using pip:
pip install tokenc

Quick Start

Get started with just a few lines of code:
from tokenc import TokenClient

client = TokenClient(api_key="your-api-key")

response = client.compress_input(
    input="Your text that needs compression for optimal token usage.",
    aggressiveness=0.1
)

print(f"Compressed text: {response.output}")
print(f"Original tokens: {response.original_input_tokens}")
print(f"Compressed tokens: {response.output_tokens}")
print(f"Tokens saved: {response.tokens_saved}")
print(f"Compression ratio: {response.compression_ratio:.2f}x")

API Reference

TokenClient

Constructor:
TokenClient(api_key: str, base_url: str = ..., timeout: int = 30)

Parameter   Type   Default    Description
api_key     str    required   Your API key for authentication.
base_url    str    -          Base URL for the API.
timeout     int    30         Request timeout in seconds.

compress_input()

Compress text input for optimized LLM inference.
Parameter             Type                        Default     Description
input                 str                         required    The text to compress.
model                 str                         "bear-1.2"  Model to use: bear-1.2 (recommended), bear-1.1, or bear-1.
aggressiveness        float                       0.5         Compression intensity, 0.0-1.0.
max_output_tokens     int | None                  -           Maximum token count for output.
min_output_tokens     int | None                  -           Minimum token count for output.
protect_json          bool                        False       Prevents compressing JSON objects.
compression_settings  CompressionSettings | None  -           Custom settings object (alternative to individual parameters).

Returns: CompressResponse

CompressionSettings

Dataclass for compression configuration.
Attribute          Type        Default   Description
aggressiveness     float       -         Compression intensity, 0.0-1.0.
max_output_tokens  int | None  -         Optional maximum output tokens.
min_output_tokens  int | None  -         Optional minimum output tokens.
protect_json       bool        False     Prevents compressing JSON objects.

CompressResponse

Dataclass for compression results with built-in metrics.
# CompressResponse attributes:
response.output                  # str: The compressed text
response.output_tokens           # int: Token count of compressed output
response.original_input_tokens   # int: Token count of original input
response.compression_time        # float: Time taken to compress (seconds)

# Computed properties:
response.tokens_saved            # int: Number of tokens saved
response.compression_ratio       # float: Ratio of original to compressed tokens
response.compression_percentage  # float: Percentage reduction in tokens
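The three computed properties are simple arithmetic over the raw token counts. A minimal sketch of that arithmetic (a hypothetical stand-in class, not the SDK's actual CompressResponse):

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    # Mirrors the two raw CompressResponse token counts
    output_tokens: int
    original_input_tokens: int

    @property
    def tokens_saved(self) -> int:
        # How many tokens compression removed
        return self.original_input_tokens - self.output_tokens

    @property
    def compression_ratio(self) -> float:
        # Original size relative to compressed size (2.0 means half the tokens)
        return self.original_input_tokens / self.output_tokens

    @property
    def compression_percentage(self) -> float:
        # Reduction expressed as a percentage of the original
        return 100.0 * self.tokens_saved / self.original_input_tokens


m = Metrics(output_tokens=60, original_input_tokens=120)
print(m.tokens_saved, m.compression_ratio, m.compression_percentage)  # 60 2.0 50.0
```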

Examples

With OpenAI

Compress prompts before sending to OpenAI to reduce costs:
from tokenc import TokenClient
from openai import OpenAI

# Initialize clients
tc = TokenClient(api_key="your-ttc-api-key")
openai = OpenAI(api_key="your-openai-api-key")

# Your prompt
prompt = """
Please explain the process of photosynthesis in detail,
including the light-dependent and light-independent reactions,
the role of chlorophyll, and how plants convert CO2 and water
into glucose and oxygen. Thank you very much for your help!
"""

# Compress the prompt
compressed = tc.compress_input(
    input=prompt,
    aggressiveness=0.6
)

print(f"Compressed from {compressed.original_input_tokens} to {compressed.output_tokens} tokens")
print(f"Compression: {compressed.compression_percentage:.1f}%")

# Use compressed prompt with OpenAI
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": compressed.output}]
)

print(response.choices[0].message.content)

Using CompressionSettings

Use a settings object for more control:
from tokenc import TokenClient, CompressionSettings

client = TokenClient(api_key="your-api-key")

# Create custom compression settings
settings = CompressionSettings(
    aggressiveness=0.7,
    max_output_tokens=100,
    min_output_tokens=50,
    protect_json=False
)

response = client.compress_input(
    input="Your text here...",
    compression_settings=settings
)

print(f"Compression percentage: {response.compression_percentage:.1f}%")

Context Manager

Use as a context manager for automatic cleanup:
from tokenc import TokenClient

with TokenClient(api_key="your-api-key") as client:
    response = client.compress_input(
        input="Your text here...",
        aggressiveness=0.6
    )
    print(response.output)
# Session automatically closed
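Context-manager support of this shape typically just calls close() from __exit__. A generic sketch of the pattern (illustrative only, not the SDK's actual implementation):

```python
class SessionHolder:
    """Toy client that owns a resource and releases it on exit."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # propagate any exception raised inside the with-block


with SessionHolder() as holder:
    pass
print(holder.closed)  # True: close() ran when the block exited
```

Because __exit__ runs even when the block raises, the session is released on both the success and error paths.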

Compression Levels

Compare different compression levels:
from tokenc import TokenClient

client = TokenClient(api_key="your-api-key")

text = "Your long text here..."

# Light compression - preserve most content
light = client.compress_input(input=text, aggressiveness=0.2)

# Moderate compression - balanced approach
moderate = client.compress_input(input=text, aggressiveness=0.5)

# Aggressive compression - maximum savings
aggressive = client.compress_input(input=text, aggressiveness=0.8)

Error Handling

The SDK provides specific exception types for different error conditions:
from tokenc import (
    TokenClient,
    AuthenticationError,
    InvalidRequestError,
    RateLimitError,
    APIError
)

client = TokenClient(api_key="your-api-key")

try:
    response = client.compress_input(input="Your text...")
except AuthenticationError:
    print("Invalid API key")
except InvalidRequestError as e:
    print(f"Invalid request: {e}")
except RateLimitError:
    print("Rate limit exceeded, please wait")
except APIError as e:
    print(f"API error: {e}")

Exception            Description
AuthenticationError  Invalid API key
InvalidRequestError  Invalid request parameters
RateLimitError       Rate limit exceeded
APIError             Other API errors
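RateLimitError is usually transient, so a common pattern is to retry with exponential backoff. A small self-contained sketch (the RateLimitError stub stands in for the SDK's exception, and the with_backoff helper is hypothetical, not part of the SDK):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for tokenc.RateLimitError."""


def with_backoff(fn, retries=5, base_delay=1.0):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


# Demo: a call that is rate-limited twice, then succeeds
calls = {"n": 0}

def flaky_compress():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("slow down")
    return "compressed text"

print(with_backoff(flaky_compress, retries=5, base_delay=0.0))  # compressed text
```

In real use, fn would wrap the client.compress_input(...) call, and base_delay should stay at a second or more to respect the server's limits.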

Aggressiveness Guide

Recommended: Start with 0.1 for most use cases. Increase gradually if you need more savings.
Range    Level       Description
0.1–0.3  Light       Removes only obvious filler; safe for all use cases
0.4–0.6  Moderate    Good balance of compression and quality
0.7–0.9  Aggressive  Significant savings; best for cost-sensitive workloads