Cache Module

Cache management utilities for LLM applications.

This module provides comprehensive caching for LLM workflows:

Core Components:
- CacheEntry - Single cache entry with metadata
- CacheStats - Cache statistics and metrics
- BaseCache - Base class for all cache implementations
- cached - Decorator to cache function results

Cache Backends:
- MemoryCache - Fast in-memory LRU cache with TTL support
- DiskCache - Persistent disk-based cache with serialization
- TieredCache - Two-tier cache: a fast memory tier backed by a persistent disk tier
- LLMCache - High-level cache wrapper for LLM applications

Key Strategies:
- generate_cache_key - Generate a cache key from arguments
- generate_prompt_key - Generate a key for LLM prompts
- generate_embedding_key - Generate a key for embeddings

Utilities:
- create_memory_cache - Create an in-memory cache
- create_disk_cache - Create a disk-based cache
- create_tiered_cache - Create a two-tier cache
- create_llm_cache - Create an LLM-specific cache
- invalidate_expired_entries - Remove expired cache entries
- export_cache_stats - Export cache statistics
- estimate_cache_size - Estimate cache size in various units

The cache system provides:
- LLM response caching by prompt hash
- Embedding cache management
- Automatic cache invalidation (TTL)
- LRU eviction for memory management
- Cost and time savings tracking
- Multiple backend options (memory, disk, tiered)
- Persistent storage for cross-session caching

Example

>>> from kerb.cache import cached, MemoryCache
>>> from kerb.cache.backends import LLMCache
>>> from kerb.cache.strategies import generate_prompt_key
>>>
>>> # Use decorator
>>> @cached(ttl=3600)
... def expensive_function(x):
...     return x * 2
>>>
>>> # Use cache directly
>>> cache = MemoryCache(max_size=100)
>>> cache.set("key", "value")
>>> cache.get("key")
'value'

class kerb.cache.CacheEntry(key, value, created_at, last_accessed, access_count=0, ttl=None, metadata=<factory>)

Bases: object

Represents a single cache entry with metadata.

key: str

value: Any

created_at: float

last_accessed: float

access_count: int = 0

ttl: float | None = None

metadata: Dict[str, Any]

is_expired()

Check if cache entry has expired.

Return type:: bool
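The docs do not spell out the expiry rule. A minimal sketch, assuming (not confirmed above) that an entry expires once `ttl` seconds have passed since `created_at` and that `ttl=None` means it never expires. `SketchEntry` is a hypothetical stand-in that mirrors the documented fields, not kerb's `CacheEntry`.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchEntry:
    """Hypothetical stand-in for CacheEntry (not the kerb class)."""
    key: str
    value: Any
    created_at: float
    last_accessed: float
    access_count: int = 0
    ttl: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def is_expired(self, now: Optional[float] = None) -> bool:
        # Assumption: TTL is measured from creation time; None means "never expires".
        if self.ttl is None:
            return False
        now = time.time() if now is None else now
        return now - self.created_at > self.ttl


entry = SketchEntry("k", "v", created_at=100.0, last_accessed=100.0, ttl=60)
print(entry.is_expired(now=150.0))  # False: 50 s elapsed, TTL is 60 s
print(entry.is_expired(now=161.0))  # True: 61 s elapsed
```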

to_dict()

Convert the cache entry to a dictionary of its metadata fields; the cached value itself is excluded.

Return type:: Dict[str, Any]

__init__(key, value, created_at, last_accessed, access_count=0, ttl=None, metadata=<factory>)

class kerb.cache.CacheStats(hits=0, misses=0, evictions=0, size=0, total_requests=0, estimated_cost_saved=0.0, estimated_time_saved=0.0)

Bases: object

Cache statistics and metrics.

hits: int = 0

misses: int = 0

evictions: int = 0

size: int = 0

total_requests: int = 0

estimated_cost_saved: float = 0.0

estimated_time_saved: float = 0.0

property hit_rate: float: Calculate cache hit rate.

property miss_rate: float: Calculate cache miss rate.

to_dict()

Convert stats to dictionary.

Return type:: Dict[str, Any]

__init__(hits=0, misses=0, evictions=0, size=0, total_requests=0, estimated_cost_saved=0.0, estimated_time_saved=0.0)
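The formulas behind `hit_rate` and `miss_rate` are not given above. A plausible reading, sketched here as an assumption, is hits / total_requests and misses / total_requests, returning 0.0 before any request has been made. `SketchStats` is hypothetical and keeps only the fields the formulas need.

```python
from dataclasses import dataclass


@dataclass
class SketchStats:
    """Hypothetical stand-in for CacheStats; the formulas below are assumptions."""
    hits: int = 0
    misses: int = 0
    total_requests: int = 0

    @property
    def hit_rate(self) -> float:
        # Guard against division by zero before any request has been made.
        return self.hits / self.total_requests if self.total_requests else 0.0

    @property
    def miss_rate(self) -> float:
        return self.misses / self.total_requests if self.total_requests else 0.0


stats = SketchStats(hits=30, misses=10, total_requests=40)
print(stats.hit_rate, stats.miss_rate)  # 0.75 0.25
```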

class kerb.cache.BaseCache(max_size=None, default_ttl=None)

Bases: object

Base cache interface with common operations.

__init__(max_size=None, default_ttl=None)

Initialize cache.

Parameters:

max_size (Optional[int]) – Maximum number of entries (None for unlimited)
default_ttl (Optional[float]) – Default time-to-live in seconds (None for no expiration)

get(key)

Get value from cache.

Return type:: Optional[Any]

set(key, value, ttl=None, metadata=None)

Set value in cache.

Return type:: None

delete(key)

Delete key from cache. Returns True if key existed.

Return type:: bool

clear()

Clear all cache entries.

Return type:: None

exists(key)

Check if key exists in cache.

Return type:: bool

size()

Get current cache size.

Return type:: int

keys()

Get all cache keys.

Return type:: List[str]

class kerb.cache.MemoryCache(max_size=1000, default_ttl=None)

Bases: BaseCache

In-memory cache with LRU eviction and TTL support.

__init__(max_size=1000, default_ttl=None)

Initialize memory cache.

Parameters:

max_size (Optional[int]) – Maximum number of entries
default_ttl (Optional[float]) – Default TTL in seconds

Example

>>> cache = MemoryCache(max_size=100, default_ttl=3600)
>>> cache.set("key", "value")
>>> cache.get("key")
'value'

get(key)

Get value from cache.

Return type:: Optional[Any]

set(key, value, ttl=None, metadata=None)

Set value in cache.

Return type:: None

delete(key)

Delete key from cache.

Return type:: bool

clear()

Clear all cache entries.

Return type:: None

exists(key)

Check if key exists and is not expired.

Return type:: bool

size()

Get current cache size.

Return type:: int

keys()

Get all cache keys.

Return type:: List[str]

get_entry(key)

Get full cache entry with metadata.

Return type:: Optional[CacheEntry]

get_stats()

Get cache statistics.

Return type:: CacheStats

reset_stats()

Reset cache statistics.

Return type:: None
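MemoryCache is described as an in-memory cache with LRU eviction and TTL support. The sketch below illustrates that combination with `collections.OrderedDict`: every write or hit moves the key to the end, and eviction pops from the front. It is not kerb's implementation; `SketchLRUCache`, the lazy expiry-on-read behaviour, and the use of `time.monotonic` are assumptions chosen for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Optional


class SketchLRUCache:
    """Hypothetical LRU + TTL cache illustrating the idea; not kerb's MemoryCache."""

    def __init__(self, max_size: int = 1000, default_ttl: Optional[float] = None):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self._data = OrderedDict()  # key -> (value, expires_at or None), oldest first
        self.evictions = 0

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        ttl = self.default_ttl if ttl is None else ttl
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)  # most recently used goes last
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries on access
            return None
        self._data.move_to_end(key)  # a hit refreshes recency
        return value


cache = SketchLRUCache(max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.set("c", 3)  # evicts "b", the least recently used key
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```

Expiring lazily on read keeps `get` O(1); expired entries that are never read again linger until evicted, which is one reason a separate sweep such as `invalidate_expired_entries` can be useful.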

class kerb.cache.DiskCache(cache_dir='.cache', max_size=None, default_ttl=None, serializer='pickle')

Bases: BaseCache

Persistent disk-based cache.

__init__(cache_dir='.cache', max_size=None, default_ttl=None, serializer='pickle')

Initialize disk cache.

Parameters:

cache_dir (str) – Directory to store cache files
max_size (Optional[int]) – Maximum number of entries
default_ttl (Optional[float]) – Default TTL in seconds
serializer (str) – Serialization format (‘pickle’ or ‘json’)

Example

>>> cache = DiskCache(cache_dir=".cache/llm")
>>> cache.set("key", {"data": "value"})
>>> cache.get("key")
{'data': 'value'}

get(key)

Get value from cache.

Return type:: Optional[Any]

set(key, value, ttl=None, metadata=None)

Set value in cache.

Return type:: None

delete(key)

Delete key from cache.

Return type:: bool

clear()

Clear all cache entries.

Return type:: None

exists(key)

Check if key exists and is not expired.

Return type:: bool

size()

Get current cache size.

Return type:: int

keys()

Get all cache keys.

Return type:: List[str]

get_stats()

Get cache statistics.

Return type:: CacheStats

reset_stats()

Reset cache statistics.

Return type:: None
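One way a disk cache with a pluggable `'pickle'`/`'json'` serializer could persist entries, sketched under assumptions the docs do not confirm: one file per key, named by the SHA-256 hash of the key so that arbitrary key strings are safe as file names. `SketchDiskCache` is hypothetical and omits TTL, size limits, and statistics.

```python
import hashlib
import json
import os
import pickle
import tempfile
from typing import Any, Optional


class SketchDiskCache:
    """Hypothetical file-per-key cache; not kerb's DiskCache."""

    def __init__(self, cache_dir: str = ".cache", serializer: str = "pickle") -> None:
        if serializer not in ("pickle", "json"):
            raise ValueError("serializer must be 'pickle' or 'json'")
        self.cache_dir = cache_dir
        self.serializer = serializer
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so any string becomes a safe, fixed-length file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, f"{digest}.{self.serializer}")

    def set(self, key: str, value: Any) -> None:
        if self.serializer == "json":
            # JSON handles only dicts, lists, strings, numbers, booleans and None.
            with open(self._path(key), "w", encoding="utf-8") as f:
                json.dump(value, f)
        else:
            with open(self._path(key), "wb") as f:
                pickle.dump(value, f)

    def get(self, key: str) -> Optional[Any]:
        path = self._path(key)
        if not os.path.exists(path):
            return None  # miss
        if self.serializer == "json":
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f)
        with open(path, "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    SketchDiskCache(cache_dir=tmp, serializer="json").set("key", {"data": "value"})
    # A fresh instance reads the same directory, as a later session would.
    restored = SketchDiskCache(cache_dir=tmp, serializer="json").get("key")
print(restored)  # {'data': 'value'}
```

The serializer choice is a trade-off: `pickle` round-trips almost any Python object but can execute arbitrary code when loading an untrusted file, while `json` is safe to load but limited to basic types.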

class kerb.cache.TieredCache(memory_max_size=100, disk_cache_dir='.cache', disk_max_size=None, default_ttl=None)

Bases: BaseCache

Two-tier cache: fast memory cache backed by persistent disk cache.

__init__(memory_max_size=100, disk_cache_dir='.cache', disk_max_size=None, default_ttl=None)

Initialize tiered cache.

Parameters:

memory_max_size (int) – Maximum entries in memory cache
disk_cache_dir (str) – Directory for disk cache
disk_max_size (Optional[int]) – Maximum entries in disk cache
default_ttl (Optional[float]) – Default TTL in seconds

Example

>>> cache = TieredCache(memory_max_size=50, disk_cache_dir=".cache")
>>> cache.set("key", "value")
>>> cache.get("key")  # Fast memory access
'value'

get(key)

Get value from cache (memory first, then disk).

Return type:: Optional[Any]

set(key, value, ttl=None, metadata=None)

Set value in both caches.

Return type:: None

delete(key)

Delete key from both caches.

Return type:: bool

clear()

Clear both caches.

Return type:: None

exists(key)

Check if key exists in either cache.

Return type:: bool

size()

Get the number of unique keys across both caches.

Return type:: int

keys()

Get all unique cache keys.

Return type:: List[str]

get_stats()

Get statistics for both caches.

Return type:: Dict[str, CacheStats]

reset_stats()

Reset statistics for both caches.

Return type:: None
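The TieredCache read path (memory first, then disk) and its write-to-both-caches behaviour can be sketched with two dictionaries standing in for the tiers. Promoting a disk hit into the memory tier is an assumption about the design, not something the docs confirm; `SketchTieredCache` is hypothetical and leaves out eviction and TTL.

```python
from typing import Any, Dict, Optional


class SketchTieredCache:
    """Hypothetical two-tier lookup; dicts stand in for the memory and disk tiers."""

    def __init__(self) -> None:
        self.memory: Dict[str, Any] = {}
        self.disk: Dict[str, Any] = {}  # stand-in for a persistent store

    def set(self, key: str, value: Any) -> None:
        # Write-through: both tiers receive the value.
        self.memory[key] = value
        self.disk[key] = value

    def get(self, key: str) -> Optional[Any]:
        if key in self.memory:
            return self.memory[key]  # fast path
        if key in self.disk:
            value = self.disk[key]
            self.memory[key] = value  # assumption: promote disk hits into memory
            return value
        return None

    def delete(self, key: str) -> bool:
        existed = key in self.memory or key in self.disk
        self.memory.pop(key, None)
        self.disk.pop(key, None)
        return existed

    def size(self) -> int:
        # Count unique keys, since most keys live in both tiers.
        return len(self.memory.keys() | self.disk.keys())


tiered = SketchTieredCache()
tiered.set("key", "value")
tiered.memory.clear()          # simulate a restart: memory is empty, disk survives
print(tiered.get("key"))       # value  (served from disk, then promoted)
print("key" in tiered.memory)  # True
```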

class kerb.cache.LLMCache(backend=None, cost_per_token=1e-05, avg_tokens_per_request=1000, avg_response_time=2.0)

Bases: object

High-level cache wrapper for LLM applications.

__init__(backend=None, cost_per_token=1e-05, avg_tokens_per_request=1000, avg_response_time=2.0)

Initialize LLM cache.

Parameters:

backend (Optional[BaseCache]) – Cache backend to use (defaults to MemoryCache)
cost_per_token (float) – Cost per token for cost tracking
avg_tokens_per_request (int) – Average tokens per request
avg_response_time (float) – Average response time in seconds

Example

>>> cache = LLMCache()
>>> response = cache.get_or_compute(
...     key="prompt:123",
...     compute_fn=lambda: call_llm("What is AI?"),
...     cost=0.001
... )

cache_prompt(prompt, response, model=None, temperature=None, max_tokens=None, ttl=None, cost=None, **kwargs)

Cache an LLM prompt and response.

Parameters:

prompt (str) – The prompt text
response (str) – The LLM response
model (Optional[str]) – Model name
temperature (Optional[float]) – Temperature setting
max_tokens (Optional[int]) – Max tokens setting
ttl (Optional[float]) – Time to live in seconds
cost (Optional[float]) – Actual cost of the request
**kwargs – Additional parameters

Returns:

The cache key

Return type:

str

Example

>>> key = cache.cache_prompt(
...     prompt="What is AI?",
...     response="AI is...",
...     model="gpt-4o",
...     cost=0.001
... )

get_cached_prompt(prompt, model=None, temperature=None, max_tokens=None, **kwargs)

Get cached LLM response for a prompt.

Parameters:

prompt (str) – The prompt text
model (Optional[str]) – Model name
temperature (Optional[float]) – Temperature setting
max_tokens (Optional[int]) – Max tokens setting
**kwargs – Additional parameters

Returns:

Cached response or None

Return type:

Optional[str]

Example

>>> response = cache.get_cached_prompt(
...     prompt="What is AI?",
...     model="gpt-4o"
... )

cache_embedding(text, embedding, model=None, ttl=None, cost=None, **kwargs)

Cache an embedding.

Parameters:

text (str) – The text that was embedded
embedding (List[float]) – The embedding vector
model (Optional[str]) – Model name
ttl (Optional[float]) – Time to live in seconds
cost (Optional[float]) – Actual cost of the request
**kwargs – Additional parameters

Returns:

The cache key

Return type:

str

Example

>>> key = cache.cache_embedding(
...     text="Hello world",
...     embedding=[0.1, 0.2, ...],
...     model="text-embedding-3-small",
...     cost=0.00001
... )

get_cached_embedding(text, model=None, **kwargs)

Get cached embedding for text.

Parameters:

text (str) – The text to get embedding for
model (Optional[str]) – Model name
**kwargs – Additional parameters

Returns:

Cached embedding or None

Return type:

Optional[List[float]]

Example

>>> embedding = cache.get_cached_embedding(
...     text="Hello world",
...     model="text-embedding-3-small"
... )

get_or_compute(key, compute_fn, ttl=None, cost=None, metadata=None)

Get from cache or compute if not found.

Parameters:

key (str) – Cache key
compute_fn (Callable[[], Any]) – Function to compute value if not cached
ttl (Optional[float]) – Time to live in seconds
cost (Optional[float]) – Cost of computing the value
metadata (Optional[Dict[str, Any]]) – Additional metadata

Returns:

Cached or computed value

Return type:

Any

Example

>>> result = cache.get_or_compute(
...     key="expensive:computation",
...     compute_fn=lambda: expensive_api_call(),
...     ttl=3600,
...     cost=0.01
... )
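A sketch of the get-or-compute pattern combined with savings tracking. How LLMCache actually attributes savings is not documented here, so this sketch assumes that each hit credits the cost recorded for that key (falling back to cost_per_token × avg_tokens_per_request, which with the defaults is 1e-05 × 1000 = 0.01 per request) plus `avg_response_time` seconds. `SketchLLMCache` and `fake_llm` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional


class SketchLLMCache:
    """Hypothetical get-or-compute cache with savings tracking; not kerb's LLMCache."""

    def __init__(self, cost_per_token: float = 1e-05,
                 avg_tokens_per_request: int = 1000,
                 avg_response_time: float = 2.0) -> None:
        self._store: Dict[str, Any] = {}
        self._costs: Dict[str, float] = {}
        self.default_cost = cost_per_token * avg_tokens_per_request
        self.avg_response_time = avg_response_time
        self.hits = 0
        self.misses = 0
        self.cost_saved = 0.0
        self.time_saved = 0.0

    def get_or_compute(self, key: str, compute_fn: Callable[[], Any],
                       cost: Optional[float] = None) -> Any:
        if key in self._store:
            # Hit: credit the cost and latency the cached answer avoided.
            self.hits += 1
            self.cost_saved += self._costs[key]
            self.time_saved += self.avg_response_time
            return self._store[key]
        # Miss: run the expensive function once and remember what it cost.
        self.misses += 1
        value = compute_fn()
        self._store[key] = value
        self._costs[key] = self.default_cost if cost is None else cost
        return value


calls: List[int] = []


def fake_llm() -> str:
    """Hypothetical stand-in for a real API call."""
    calls.append(1)
    return "AI is..."


cache = SketchLLMCache()
for _ in range(3):
    cache.get_or_compute("prompt:123", fake_llm)
print(len(calls), cache.hits, cache.misses)  # 1 2 1
```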

get_stats()

Get cache statistics.

Return type:: CacheStats

clear()

Clear all cache entries.

Return type:: None

invalidate_by_prefix(prefix)

Invalidate all keys with a given prefix.

Parameters:: prefix (str) – Key prefix to invalidate
Returns:: Number of keys invalidated
Return type:: int

Example

>>> cache.invalidate_by_prefix("prompt:")
42

invalidate_by_pattern(pattern)

Invalidate keys matching a pattern function.

Parameters:: pattern (Callable[[str], bool]) – Function that returns True for keys to invalidate
Returns:: Number of keys invalidated
Return type:: int

Example

>>> cache.invalidate_by_pattern(lambda k: "gpt-4o" in k)
15
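Prefix invalidation is a special case of pattern invalidation. The sketch below shows both over a plain dict standing in for a backend; `invalidate_matching` and `invalidate_prefix` are hypothetical names. Matching keys are collected into a list before any deletion, because deleting from a dict while iterating over it raises a `RuntimeError`.

```python
from typing import Any, Callable, Dict


def invalidate_matching(store: Dict[str, Any], pattern: Callable[[str], bool]) -> int:
    """Delete every key for which pattern(key) is True; return how many were removed."""
    doomed = [k for k in store if pattern(k)]  # snapshot first, then mutate
    for k in doomed:
        del store[k]
    return len(doomed)


def invalidate_prefix(store: Dict[str, Any], prefix: str) -> int:
    # A prefix is just a pattern built from str.startswith.
    return invalidate_matching(store, lambda k: k.startswith(prefix))


store = {"prompt:a": 1, "prompt:b": 2, "embedding:a": 3, "gpt-4o:x": 4}
print(invalidate_prefix(store, "prompt:"))                  # 2
print(invalidate_matching(store, lambda k: "gpt-4o" in k))  # 1
print(sorted(store))                                        # ['embedding:a']
```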

kerb.cache.generate_cache_key(*args, prefix='', hash_algorithm='sha256', **kwargs)

Generate a cache key from arguments.

Parameters:

*args – Positional arguments to include in key
prefix (str) – Optional prefix for the key
hash_algorithm (str) – Hash algorithm to use (sha256, md5, sha1)
**kwargs – Keyword arguments to include in key

Returns:

Generated cache key

Return type:

str

Example

>>> key = generate_cache_key("prompt text", model="gpt-4o", temp=0.7)
>>> key = generate_cache_key(prompt, prefix="llm", model=model)
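The key format is not shown above. A common approach, sketched here as an assumption rather than kerb's method, is to serialize the arguments canonically (JSON with sorted keys, so keyword order does not matter), hash the result with `hashlib`, and prepend the optional prefix. `sketch_cache_key` is hypothetical and only supports JSON-serializable arguments.

```python
import hashlib
import json
from typing import Any


def sketch_cache_key(*args: Any, prefix: str = "", hash_algorithm: str = "sha256",
                     **kwargs: Any) -> str:
    """Hypothetical key builder; not kerb's generate_cache_key."""
    if hash_algorithm not in ("sha256", "md5", "sha1"):
        raise ValueError(f"unsupported hash algorithm: {hash_algorithm}")
    # sort_keys makes the payload independent of keyword-argument order.
    payload = json.dumps({"args": list(args), "kwargs": kwargs}, sort_keys=True)
    digest = hashlib.new(hash_algorithm, payload.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}" if prefix else digest


k1 = sketch_cache_key("prompt text", model="gpt-4o", temp=0.7)
k2 = sketch_cache_key("prompt text", temp=0.7, model="gpt-4o")
k3 = sketch_cache_key("prompt text", model="gpt-4o", temp=0.2)
print(k1 == k2, k1 == k3)  # True False
```

Hashing keeps keys short and uniform regardless of prompt length; the canonical serialization is what guarantees that logically identical calls map to the same key.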

kerb.cache.generate_prompt_key(prompt, model=None, temperature=None, max_tokens=None, **kwargs)

Generate a cache key specifically for LLM prompts.

Parameters:

prompt (str) – The prompt text
model (Optional[str]) – Model name
temperature (Optional[float]) – Temperature setting
max_tokens (Optional[int]) – Max tokens setting
**kwargs – Additional parameters

Returns:

Cache key for the prompt

Return type:

str

Example

>>> key = generate_prompt_key("What is AI?", model="gpt-4o", temperature=0.7)
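Including the generation settings in a prompt key matters because the same prompt sent to a different model or at a different temperature can legitimately produce a different answer. A sketch under assumptions: the `prompt:` namespace is a guess suggested by the `invalidate_by_prefix("prompt:")` example, and settings left as None are dropped so they do not affect the key. `sketch_prompt_key` is hypothetical.

```python
import hashlib
import json
from typing import Any, Optional


def sketch_prompt_key(prompt: str, model: Optional[str] = None,
                      temperature: Optional[float] = None,
                      max_tokens: Optional[int] = None, **kwargs: Any) -> str:
    """Hypothetical prompt key; not kerb's generate_prompt_key."""
    params = {"model": model, "temperature": temperature,
              "max_tokens": max_tokens, **kwargs}
    # Drop unset settings so omitting a value and passing None give the same key.
    params = {k: v for k, v in params.items() if v is not None}
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return "prompt:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = sketch_prompt_key("What is AI?", model="gpt-4o", temperature=0.7)
b = sketch_prompt_key("What is AI?", model="gpt-4o", temperature=0.0)
print(a == b)                   # False: temperature is part of the key
print(a.startswith("prompt:"))  # True
```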

kerb.cache.generate_embedding_key(text, model=None, **kwargs)

Generate a cache key specifically for embeddings.

Parameters:

text (str) – The text to embed
model (Optional[str]) – Model name
**kwargs – Additional parameters

Returns:

Cache key for the embedding

Return type:

str

Example

>>> key = generate_embedding_key("Hello world", model="text-embedding-3-small")

kerb.cache.cached(cache=None, ttl=None, key_fn=None, cost=None)

Decorator to cache function results.

Parameters:

cache (Optional[BaseCache]) – Cache instance to use (creates LLMCache if None)
ttl (Optional[float]) – Time to live in seconds
key_fn (Optional[Callable[..., str]]) – Function to generate cache key from args/kwargs
cost (Optional[float]) – Cost of computing the function

Example

>>> @cached(ttl=3600)
... def expensive_computation(x, y):
...     return x + y

>>> from kerb.cache.strategies import generate_prompt_key
>>> @cached(key_fn=lambda prompt, **kw: generate_prompt_key(prompt, **kw))
... def call_llm(prompt, model="gpt-4o", **kwargs):
...     return make_api_call(prompt, model, **kwargs)
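A minimal sketch of how a caching decorator with `ttl` and `key_fn` can work, using a plain dict and `functools.wraps`. It is not kerb's implementation: the default key (function name plus the `repr` of the arguments), the `(value, expires_at)` storage, and the omission of the `cache` and `cost` parameters are assumptions, and `sketch_cached` is hypothetical.

```python
import functools
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


def sketch_cached(ttl: Optional[float] = None,
                  key_fn: Optional[Callable[..., str]] = None) -> Callable:
    """Hypothetical memoizing decorator; not kerb's cached()."""
    store: Dict[str, Tuple[Any, Optional[float]]] = {}

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Default key: qualified function name plus the repr of its arguments.
            key = key_fn(*args, **kwargs) if key_fn else (
                f"{func.__qualname__}:{args!r}:{sorted(kwargs.items())!r}")
            hit = store.get(key)
            if hit is not None:
                value, expires_at = hit
                if expires_at is None or time.monotonic() < expires_at:
                    return value  # fresh cached result
            value = func(*args, **kwargs)
            expires_at = time.monotonic() + ttl if ttl is not None else None
            store[key] = (value, expires_at)
            return value
        return wrapper
    return decorator


calls: List[int] = []


@sketch_cached(ttl=3600)
def expensive_computation(x: int, y: int) -> int:
    calls.append(1)  # record each real execution
    return x + y


print(expensive_computation(1, 2), expensive_computation(1, 2), len(calls))  # 3 3 1
```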

kerb.cache.create_memory_cache(max_size=1000, default_ttl=None)

Create a new memory cache.

Parameters:

max_size (int) – Maximum number of entries
default_ttl (Optional[float]) – Default TTL in seconds

Returns:

New memory cache instance

Return type:

MemoryCache

Example

>>> cache = create_memory_cache(max_size=100, default_ttl=3600)

kerb.cache.create_disk_cache(cache_dir='.cache', max_size=None, default_ttl=None, serializer='pickle')

Create a new disk cache.

Parameters:

cache_dir (str) – Directory to store cache files
max_size (Optional[int]) – Maximum number of entries
default_ttl (Optional[float]) – Default TTL in seconds
serializer (str) – Serialization format (‘pickle’ or ‘json’)

Returns:

New disk cache instance

Return type:

DiskCache

Example

>>> cache = create_disk_cache(cache_dir=".cache/llm", serializer="json")

kerb.cache.create_tiered_cache(memory_max_size=100, disk_cache_dir='.cache', disk_max_size=None, default_ttl=None)

Create a new tiered cache (memory + disk).

Parameters:

memory_max_size (int) – Maximum entries in memory cache
disk_cache_dir (str) – Directory for disk cache
disk_max_size (Optional[int]) – Maximum entries in disk cache
default_ttl (Optional[float]) – Default TTL in seconds

Returns:

New tiered cache instance

Return type:

TieredCache

Example

>>> cache = create_tiered_cache(
...     memory_max_size=50,
...     disk_cache_dir=".cache/llm"
... )

kerb.cache.create_llm_cache(backend=None, cost_per_token=1e-05, avg_tokens_per_request=1000, avg_response_time=2.0)

Create a new LLM-specific cache.

Parameters:

backend (Optional[BaseCache]) – Cache backend to use
cost_per_token (float) – Cost per token for tracking
avg_tokens_per_request (int) – Average tokens per request
avg_response_time (float) – Average response time in seconds

Returns:

New LLM cache instance

Return type:

LLMCache

Example

>>> cache = create_llm_cache(
...     backend=create_tiered_cache(),
...     cost_per_token=0.00002
... )

kerb.cache.invalidate_expired_entries(cache)

Manually invalidate all expired entries in a cache.

Parameters:: cache (BaseCache) – Cache to clean
Returns:: Number of entries invalidated
Return type:: int

Example

>>> count = invalidate_expired_entries(cache)
>>> print(f"Removed {count} expired entries")

kerb.cache.export_cache_stats(cache, format='dict')

Export cache statistics in various formats.

Parameters:

cache (Union[BaseCache, LLMCache]) – Cache to export stats from
format (Union[ExportFormat, str]) – Output format (ExportFormat enum or string: ‘dict’, ‘json’, ‘csv’, ‘table’)

Returns:

Statistics in requested format

Return type:

Union[Dict, str]

Examples

>>> from kerb.core.enums import ExportFormat
>>> stats = export_cache_stats(cache, format=ExportFormat.JSON)
>>> print(stats)
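The layouts of the 'json' and 'csv' outputs are not specified above. As an illustration only, the sketch below renders a stats dictionary (field names taken from CacheStats) as JSON and as a two-row CSV using the standard library; `stats_to_format` is hypothetical and omits the 'table' format.

```python
import csv
import io
import json
from typing import Any, Dict, Union


def stats_to_format(stats: Dict[str, Any], fmt: str = "dict") -> Union[Dict[str, Any], str]:
    """Hypothetical exporter; not kerb's export_cache_stats."""
    if fmt == "dict":
        return dict(stats)
    if fmt == "json":
        return json.dumps(stats, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(stats))
        writer.writeheader()    # header row: the stat names
        writer.writerow(stats)  # one data row: the values
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


stats = {"hits": 30, "misses": 10, "total_requests": 40, "hit_rate": 0.75}
print(stats_to_format(stats, "csv"))
```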

kerb.cache.estimate_cache_size(cache, unit='entries')

Estimate cache size in various units.

Parameters:

cache (BaseCache) – Cache to measure
unit (Union[SizeUnit, str]) – Unit to measure in (SizeUnit enum or string: ‘entries’, ‘bytes’, ‘kb’, ‘mb’, ‘gb’)

Returns:

Size in requested unit

Return type:

Union[int, str]

Examples

>>> from kerb.core.enums import SizeUnit
>>> size = estimate_cache_size(cache, unit=SizeUnit.MB)
>>> print(f"Cache size: {size}")
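Python objects have no single "true" in-memory size, so any byte estimate is an approximation. This sketch, an assumption rather than kerb's method, sums the pickled length of each key and value and converts using binary units (1 KB = 1024 bytes); whether kerb uses 1000 or 1024 is not stated. `sketch_estimate_size` is hypothetical.

```python
import pickle
from typing import Any, Dict, Union

_DIVISORS = {"bytes": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def sketch_estimate_size(store: Dict[str, Any], unit: str = "entries") -> Union[int, float]:
    """Hypothetical size estimate; not kerb's estimate_cache_size."""
    if unit == "entries":
        return len(store)
    if unit not in _DIVISORS:
        raise ValueError(f"unknown unit: {unit}")
    # Pickled length is a rough proxy for the space each entry needs.
    total = sum(len(pickle.dumps(k)) + len(pickle.dumps(v)) for k, v in store.items())
    return total / _DIVISORS[unit]


store = {f"prompt:{i}": "x" * 1000 for i in range(2048)}
print(sketch_estimate_size(store, "entries"))  # 2048
print(round(sketch_estimate_size(store, "mb"), 2))
```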
