Cache Module
Cache management utilities for LLM applications.
This module provides comprehensive caching for LLM workflows:
- Core Components:
    CacheEntry - Single cache entry with metadata
    CacheStats - Cache statistics and metrics
    BaseCache - Base class for all cache implementations
    cached - Decorator to cache function results
- Cache Backends:
    MemoryCache - Fast in-memory LRU cache with TTL support
    DiskCache - Persistent disk-based cache with serialization
    TieredCache - Two-tier cache (memory + disk) for best performance
    LLMCache - High-level cache wrapper for LLM applications
- Key Strategies:
    generate_cache_key - Generate cache key from arguments
    generate_prompt_key - Generate key for LLM prompts
    generate_embedding_key - Generate key for embeddings
- Utilities:
    create_memory_cache - Create in-memory cache
    create_disk_cache - Create disk-based cache
    create_tiered_cache - Create two-tier cache
    create_llm_cache - Create LLM-specific cache
    invalidate_expired_entries - Clean up expired cache entries
    export_cache_stats - Export cache statistics
    estimate_cache_size - Estimate cache size in various units
The cache system provides:
- LLM response caching by prompt hash
- Embedding cache management
- Automatic cache invalidation (TTL)
- LRU eviction for memory management
- Cost and time savings tracking
- Multiple backend options (memory, disk, tiered)
- Persistent storage for cross-session caching
Example
>>> from kerb.cache import cached, MemoryCache
>>> from kerb.cache.backends import LLMCache
>>> from kerb.cache.strategies import generate_prompt_key
>>>
>>> # Use decorator
>>> @cached(ttl=3600)
... def expensive_function(x):
...     return x * 2
>>>
>>> # Use cache directly
>>> cache = MemoryCache(max_size=100)
>>> cache.set("key", "value")
>>> cache.get("key")
- class kerb.cache.CacheEntry(key, value, created_at, last_accessed, access_count=0, ttl=None, metadata=<factory>)[source]
Bases: object
Represents a single cache entry with metadata.
- __init__(key, value, created_at, last_accessed, access_count=0, ttl=None, metadata=<factory>)
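The `metadata=<factory>` default in the signature suggests a dataclass-style record. The following is a minimal sketch of how such an entry could decide whether it has expired. `EntrySketch` and `is_expired` are hypothetical names for illustration and are not part of kerb; the expiry rule (age measured from `created_at`) is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EntrySketch:
    """Illustrative stand-in for a cache entry; not kerb's CacheEntry."""
    key: str
    value: Any
    created_at: float
    last_accessed: float
    access_count: int = 0
    ttl: Optional[float] = None  # seconds; None means "never expires"
    metadata: Dict[str, Any] = field(default_factory=dict)

    def is_expired(self, now: Optional[float] = None) -> bool:
        # Assumed rule: expired once `ttl` seconds have passed since creation.
        if self.ttl is None:
            return False
        now = time.time() if now is None else now
        return now - self.created_at >= self.ttl


entry = EntrySketch(key="k", value="v", created_at=100.0, last_accessed=100.0, ttl=60)
fresh = entry.is_expired(now=130.0)   # 30 s old, inside the 60 s TTL
stale = entry.is_expired(now=160.0)   # exactly at the TTL boundary
```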
- class kerb.cache.CacheStats(hits=0, misses=0, evictions=0, size=0, total_requests=0, estimated_cost_saved=0.0, estimated_time_saved=0.0)[source]
Bases: object
Cache statistics and metrics.
- __init__(hits=0, misses=0, evictions=0, size=0, total_requests=0, estimated_cost_saved=0.0, estimated_time_saved=0.0)
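To illustrate how counters like `hits`, `misses`, and `estimated_cost_saved` might be maintained, here is a small sketch. `StatsSketch`, `record`, and `hit_rate` are hypothetical; the source does not say how kerb updates or derives these values.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Illustrative counters; not kerb's CacheStats."""
    hits: int = 0
    misses: int = 0
    estimated_cost_saved: float = 0.0

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def record(self, hit: bool, cost: float = 0.0) -> None:
        # Assumption: a hit saves the cost the request would otherwise incur.
        if hit:
            self.hits += 1
            self.estimated_cost_saved += cost
        else:
            self.misses += 1


stats = StatsSketch()
stats.record(hit=False)
stats.record(hit=True, cost=0.5)
stats.record(hit=True, cost=0.25)
stats.record(hit=True, cost=0.25)
```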
- class kerb.cache.BaseCache(max_size=None, default_ttl=None)[source]
Bases: object
Base cache interface with common operations.
- class kerb.cache.MemoryCache(max_size=1000, default_ttl=None)[source]
Bases: BaseCache
In-memory cache with LRU eviction and TTL support.
- __init__(max_size=1000, default_ttl=None)[source]
Initialize memory cache.
- Parameters:
Example
>>> cache = MemoryCache(max_size=100, default_ttl=3600)
>>> cache.set("key", "value")
>>> cache.get("key")
'value'
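For readers unfamiliar with how LRU eviction and TTL expiry combine, the sketch below shows one common approach built on `collections.OrderedDict`. It is an illustration under assumptions, not kerb's implementation: `LRUTTLSketch` and its injectable `clock` parameter are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class LRUTTLSketch:
    """Illustrative LRU cache with optional TTL; not kerb's MemoryCache."""

    def __init__(self, max_size: int = 1000, default_ttl: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.default_ttl = default_ttl
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._data: OrderedDict = OrderedDict()

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        ttl = self.default_ttl if ttl is None else ttl
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)            # newest write is most recently used
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)     # evict the least recently used entry

    def get(self, key: str, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]                # lazily drop expired entries on read
            return default
        self._data.move_to_end(key)            # a hit refreshes recency
        return value


now = [0.0]
cache = LRUTTLSketch(max_size=2, default_ttl=10, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # touch "a" so "b" becomes least recently used
cache.set("c", 3)    # exceeds max_size, so "b" is evicted
```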
- class kerb.cache.DiskCache(cache_dir='.cache', max_size=None, default_ttl=None, serializer='pickle')[source]
Bases: BaseCache
Persistent disk-based cache.
- __init__(cache_dir='.cache', max_size=None, default_ttl=None, serializer='pickle')[source]
Initialize disk cache.
- Parameters:
Example
>>> cache = DiskCache(cache_dir=".cache/llm")
>>> cache.set("key", {"data": "value"})
>>> cache.get("key")
{'data': 'value'}
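The `serializer` argument implies that values are encoded before being written to `cache_dir`. The sketch below shows one way a file-per-key store could work with either `pickle` or `json`. The on-disk layout, `DiskSketch`, and `_path` are assumptions for illustration; the source does not describe how kerb lays out its files.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


class DiskSketch:
    """Illustrative file-per-key store; not kerb's DiskCache."""

    def __init__(self, cache_dir, serializer="pickle"):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.serializer = serializer

    def _path(self, key):
        # Hash the key so arbitrary strings map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.dir / f"{digest}.{self.serializer}"

    def set(self, key, value):
        if self.serializer == "json":
            self._path(key).write_text(json.dumps(value), encoding="utf-8")
        else:
            self._path(key).write_bytes(pickle.dumps(value))

    def get(self, key, default=None):
        path = self._path(key)
        if not path.exists():
            return default
        if self.serializer == "json":
            return json.loads(path.read_text(encoding="utf-8"))
        return pickle.loads(path.read_bytes())


cache_dir = tempfile.mkdtemp()
DiskSketch(cache_dir).set("key", {"data": "value"})
reloaded = DiskSketch(cache_dir).get("key")  # a fresh instance reads the same file
```

As a general Python caveat, `pickle.loads` can execute arbitrary code, so a pickle-backed cache directory should only contain files the application wrote itself; JSON is safer but limited to JSON-compatible values.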
- class kerb.cache.TieredCache(memory_max_size=100, disk_cache_dir='.cache', disk_max_size=None, default_ttl=None)[source]
Bases: BaseCache
Two-tier cache: fast memory cache backed by persistent disk cache.
- __init__(memory_max_size=100, disk_cache_dir='.cache', disk_max_size=None, default_ttl=None)[source]
Initialize tiered cache.
- Parameters:
Example
>>> cache = TieredCache(memory_max_size=50, disk_cache_dir=".cache")
>>> cache.set("key", "value")
>>> cache.get("key")  # Fast memory access
'value'
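A two-tier lookup is commonly implemented as write-through plus promotion on a slow-tier hit. The sketch below illustrates that pattern with plain dictionaries standing in for both tiers. `TieredSketch` is hypothetical, and whether kerb writes through or promotes in this way is an assumption.

```python
class TieredSketch:
    """Illustrative two-tier lookup; not kerb's TieredCache."""

    def __init__(self, memory_max_size=100):
        self.memory_max_size = memory_max_size
        self.memory = {}  # fast tier; dicts preserve insertion order
        self.disk = {}    # stand-in for a persistent tier

    def _remember(self, key, value):
        self.memory.pop(key, None)
        self.memory[key] = value
        while len(self.memory) > self.memory_max_size:
            oldest = next(iter(self.memory))
            del self.memory[oldest]       # drop the oldest memory entry

    def set(self, key, value):
        self.disk[key] = value            # write through to the slow tier
        self._remember(key, value)

    def get(self, key, default=None):
        if key in self.memory:
            return self.memory[key]
        if key in self.disk:
            value = self.disk[key]
            self._remember(key, value)    # promote so the next read is fast
            return value
        return default


tiered = TieredSketch(memory_max_size=1)
tiered.set("a", 1)
tiered.set("b", 2)        # "a" falls out of memory but stays on disk
value = tiered.get("a")   # served from the disk tier and promoted
```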
- class kerb.cache.LLMCache(backend=None, cost_per_token=1e-05, avg_tokens_per_request=1000, avg_response_time=2.0)[source]
Bases: object
High-level cache wrapper for LLM applications.
- __init__(backend=None, cost_per_token=1e-05, avg_tokens_per_request=1000, avg_response_time=2.0)[source]
Initialize LLM cache.
- Parameters:
Example
>>> cache = LLMCache()
>>> response = cache.get_or_compute(
...     key="prompt:123",
...     compute_fn=lambda: call_llm("What is AI?"),
...     cost=0.001
... )
- cache_prompt(prompt, response, model=None, temperature=None, max_tokens=None, ttl=None, cost=None, **kwargs)[source]
Cache an LLM prompt and response.
- Parameters:
- Returns:
The cache key
- Return type:
Example
>>> key = cache.cache_prompt(
...     prompt="What is AI?",
...     response="AI is...",
...     model="gpt-4o",
...     cost=0.001
... )
- get_cached_prompt(prompt, model=None, temperature=None, max_tokens=None, **kwargs)[source]
Get cached LLM response for a prompt.
- Parameters:
- Returns:
Cached response or None
- Return type:
Example
>>> response = cache.get_cached_prompt(
...     prompt="What is AI?",
...     model="gpt-4o"
... )
- cache_embedding(text, embedding, model=None, ttl=None, cost=None, **kwargs)[source]
Cache an embedding.
- Parameters:
- Returns:
The cache key
- Return type:
Example
>>> key = cache.cache_embedding(
...     text="Hello world",
...     embedding=[0.1, 0.2, ...],
...     model="text-embedding-3-small",
...     cost=0.00001
... )
- get_cached_embedding(text, model=None, **kwargs)[source]
Get cached embedding for text.
- Parameters:
- Returns:
Cached embedding or None
- Return type:
Example
>>> embedding = cache.get_cached_embedding(
...     text="Hello world",
...     model="text-embedding-3-small"
... )
- get_or_compute(key, compute_fn, ttl=None, cost=None, metadata=None)[source]
Get from cache or compute if not found.
- Parameters:
- Returns:
Cached or computed value
- Return type:
Example
>>> result = cache.get_or_compute(
...     key="expensive:computation",
...     compute_fn=lambda: expensive_api_call(),
...     ttl=3600,
...     cost=0.01
... )
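The get-or-compute pattern can be sketched in a few lines over a plain dictionary. This is an illustration of the general technique, not kerb's code; the standalone `get_or_compute` function, `_MISSING`, and `slow_answer` are hypothetical.

```python
_MISSING = object()  # sentinel so that a cached None still counts as a hit


def get_or_compute(store, key, compute_fn):
    """Return store[key] if present; otherwise compute, store, and return it."""
    value = store.get(key, _MISSING)
    if value is _MISSING:
        value = compute_fn()   # only runs on a miss
        store[key] = value
    return value


calls = []


def slow_answer():
    calls.append(1)            # count how often the expensive path runs
    return "AI is..."


store = {}
first = get_or_compute(store, "prompt:123", slow_answer)
second = get_or_compute(store, "prompt:123", slow_answer)
```

The second call returns the stored value without invoking `slow_answer` again, which is where the cost and latency savings come from.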
- invalidate_by_prefix(prefix)[source]
Invalidate all keys with a given prefix.
- Parameters:
prefix (str) – Key prefix to invalidate
- Returns:
Number of keys invalidated
- Return type:
Example
>>> cache.invalidate_by_prefix("prompt:")
42
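Prefix invalidation amounts to collecting matching keys first and deleting them afterwards, so the mapping is not mutated while it is being iterated. A minimal sketch over a plain dictionary follows; the standalone function here is hypothetical and not kerb's method.

```python
def invalidate_by_prefix(store: dict, prefix: str) -> int:
    """Delete every key starting with `prefix`; return how many were removed."""
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    return len(doomed)


store = {"prompt:1": "a", "prompt:2": "b", "embedding:1": [0.1, 0.2]}
removed = invalidate_by_prefix(store, "prompt:")
```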
- kerb.cache.generate_cache_key(*args, prefix='', hash_algorithm='sha256', **kwargs)[source]
Generate a cache key from arguments.
- Parameters:
- Returns:
Generated cache key
- Return type:
Example
>>> key = generate_cache_key("prompt text", model="gpt-4o", temp=0.7)
>>> key = generate_cache_key(prompt, prefix="llm", model=model)
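One way to derive a stable key from arbitrary arguments is to serialize them canonically and hash the result. The sketch below assumes JSON with sorted keys as the canonical form and a `prefix:digest` layout; both are assumptions, and `make_key` is a hypothetical name rather than kerb's function.

```python
import hashlib
import json


def make_key(*args, prefix="", hash_algorithm="sha256", **kwargs):
    """Hash positional and keyword arguments into a deterministic key."""
    # sort_keys makes the result independent of keyword order;
    # default=str is a fallback for values JSON cannot encode directly.
    payload = json.dumps({"args": args, "kwargs": kwargs},
                         sort_keys=True, default=str)
    digest = hashlib.new(hash_algorithm, payload.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}" if prefix else digest


k1 = make_key("prompt text", model="gpt-4o", temp=0.7)
k2 = make_key("prompt text", temp=0.7, model="gpt-4o")   # same args, reordered
k3 = make_key("prompt text", model="gpt-4o", temp=0.8)   # different temperature
```

Including sampling parameters such as temperature in the key keeps responses generated under different settings from being served for one another.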
- kerb.cache.generate_prompt_key(prompt, model=None, temperature=None, max_tokens=None, **kwargs)[source]
Generate a cache key specifically for LLM prompts.
- Parameters:
- Returns:
Cache key for the prompt
- Return type:
Example
>>> key = generate_prompt_key("What is AI?", model="gpt-4o", temperature=0.7)
- kerb.cache.generate_embedding_key(text, model=None, **kwargs)[source]
Generate a cache key specifically for embeddings.
- Parameters:
- Returns:
Cache key for the embedding
- Return type:
Example
>>> key = generate_embedding_key("Hello world", model="text-embedding-3-small")
- kerb.cache.cached(cache=None, ttl=None, key_fn=None, cost=None)[source]
Decorator to cache function results.
- Parameters:
Example
>>> @cached(ttl=3600)
... def expensive_computation(x, y):
...     return x + y

>>> from kerb.cache.strategies import generate_prompt_key
>>> @cached(key_fn=lambda prompt, **kw: generate_prompt_key(prompt, **kw))
... def call_llm(prompt, model="gpt-4o", **kwargs):
...     return make_api_call(prompt, model, **kwargs)
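To show what a caching decorator with a custom `key_fn` does behind the scenes, here is a minimal sketch. `cached_sketch` is hypothetical and omits TTL and cost tracking; it is not kerb's `cached`, and the default key (positional arguments plus sorted keyword items) is an assumption.

```python
import functools


def cached_sketch(key_fn=None):
    """Memoize a function; `key_fn` optionally maps call arguments to a key."""
    def decorator(fn):
        store = {}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if key_fn is not None:
                key = key_fn(*args, **kwargs)
            else:
                key = (args, tuple(sorted(kwargs.items())))
            if key not in store:
                store[key] = fn(*args, **kwargs)   # compute only on a miss
            return store[key]

        wrapper.store = store   # exposed for inspection in this sketch
        return wrapper
    return decorator


calls = []


@cached_sketch()
def add(x, y):
    calls.append((x, y))
    return x + y


results = [add(1, 2), add(1, 2), add(2, 3)]
```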
- kerb.cache.create_memory_cache(max_size=1000, default_ttl=None)[source]
Create a new memory cache.
- Parameters:
- Returns:
New memory cache instance
- Return type:
Example
>>> cache = create_memory_cache(max_size=100, default_ttl=3600)
- kerb.cache.create_disk_cache(cache_dir='.cache', max_size=None, default_ttl=None, serializer='pickle')[source]
Create a new disk cache.
- Parameters:
- Returns:
New disk cache instance
- Return type:
Example
>>> cache = create_disk_cache(cache_dir=".cache/llm", serializer="json")
- kerb.cache.create_tiered_cache(memory_max_size=100, disk_cache_dir='.cache', disk_max_size=None, default_ttl=None)[source]
Create a new tiered cache (memory + disk).
- Parameters:
- Returns:
New tiered cache instance
- Return type:
Example
>>> cache = create_tiered_cache(
...     memory_max_size=50,
...     disk_cache_dir=".cache/llm"
... )
- kerb.cache.create_llm_cache(backend=None, cost_per_token=1e-05, avg_tokens_per_request=1000, avg_response_time=2.0)[source]
Create a new LLM-specific cache.
- Parameters:
- Returns:
New LLM cache instance
- Return type:
Example
>>> cache = create_llm_cache(
...     backend=create_tiered_cache(),
...     cost_per_token=0.00002
... )
- kerb.cache.invalidate_expired_entries(cache)[source]
Manually invalidate all expired entries in a cache.
- Parameters:
cache (BaseCache) – Cache to clean
- Returns:
Number of entries invalidated
- Return type:
Example
>>> count = invalidate_expired_entries(cache)
>>> print(f"Removed {count} expired entries")
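An explicit sweep differs from lazy expiry in that it removes stale entries that are never read again. The sketch below assumes entries are stored as `(value, expires_at)` pairs, which is an illustrative layout and not kerb's internal format; `sweep_expired` is a hypothetical name.

```python
from typing import Any, Dict, Optional, Tuple


def sweep_expired(entries: Dict[str, Tuple[Any, Optional[float]]], now: float) -> int:
    """Remove entries whose expiry time has passed; return the number removed."""
    expired = [key for key, (_, expires_at) in entries.items()
               if expires_at is not None and expires_at <= now]
    for key in expired:
        del entries[key]
    return len(expired)


entries = {
    "old": ("x", 50.0),      # expired at t=50
    "new": ("y", 500.0),     # still valid at t=100
    "forever": ("z", None),  # no TTL, never expires
}
count = sweep_expired(entries, now=100.0)
```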
- kerb.cache.export_cache_stats(cache, format='dict')[source]
Export cache statistics in various formats.
- Parameters:
- Returns:
Statistics in requested format
- Return type:
Examples
>>> from kerb.core.enums import ExportFormat
>>> stats = export_cache_stats(cache, format=ExportFormat.JSON)
>>> print(stats)
- kerb.cache.estimate_cache_size(cache, unit='entries')[source]
Estimate cache size in various units.
- Parameters:
- Returns:
Size in requested unit
- Return type:
Examples
>>> from kerb.core.enums import SizeUnit
>>> size = estimate_cache_size(cache, unit=SizeUnit.MB)
>>> print(f"Cache size: {size}")
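Size estimation in byte-based units needs some measure of how large each value is. The sketch below uses the length of each value's pickled form as a rough proxy. The unit strings (`"bytes"`, `"kb"`, `"mb"`), the pickling approach, and the name `estimate_size` are all assumptions for illustration; the source does not state how kerb measures size.

```python
import pickle


def estimate_size(store: dict, unit: str = "entries") -> float:
    """Estimate the size of `store` as an entry count or in byte-based units."""
    if unit == "entries":
        return len(store)
    # Serialized length is an approximation, not true in-memory footprint.
    total_bytes = sum(len(pickle.dumps(value)) for value in store.values())
    divisors = {"bytes": 1, "kb": 1024, "mb": 1024 ** 2}
    return total_bytes / divisors[unit]


store = {"a": "x" * 2048, "b": [1, 2, 3]}
n_entries = estimate_size(store)
n_bytes = estimate_size(store, "bytes")
n_kb = estimate_size(store, "kb")
```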
Response and embedding caching to reduce costs and latency.