Generation Module
LLM generation utilities for unified API access and response management.
This module provides comprehensive tools for LLM generation across multiple providers.
- Usage:
# Common from kerb.generation import generate, generate_stream, Generator
# Providers from kerb.generation.providers import (
OpenAIGenerator, AnthropicGenerator, GoogleGenerator,
)
# Utilities from kerb.generation.utils import retry_with_exponential_backoff, batch_generate
- kerb.generation.generate(messages, model=None, config=None, api_key=None, provider=None, use_cache=True, cost_tracker=None, track_cost=False, rate_limiter=None, max_retries=3, **kwargs)[source]
Universal generator function - generate responses from any LLM provider.
This is the main generation function that routes to the appropriate provider based on the model and provider parameters.
- Parameters:
messages (
Union[List[Message],List[Dict[str,str]],str]) – Input messages (can be string, list of dicts, or list of Message objects)model (
Union[str,ModelName,None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.config (
Optional[GenerationConfig]) – Generation configurationapi_key (
Optional[str]) – API key (if not provided, uses environment variable)provider (
Optional[LLMProvider]) – LLMProvider enum specifying which API to useuse_cache (
bool) – Whether to use response cachingcost_tracker (
Optional[CostTracker]) – Optional cost tracker instancetrack_cost (
bool) – Whether to track costs in global trackerrate_limiter (
Optional[RateLimiter]) – Optional rate limiter instancemax_retries (
int) – Maximum retry attempts for failed requests**kwargs – Additional config parameters
- Returns:
The generated response
- Return type:
Examples
>>> # Using ModelName enum >>> response = generate("Hello", model=ModelName.GPT_4O_MINI, provider=LLMProvider.OPENAI)
>>> # Using custom model name >>> response = generate("Hello", model="my-custom-gpt", provider=LLMProvider.OPENAI)
>>> # Different providers >>> response = generate("Hello", model=ModelName.CLAUDE_35_HAIKU, provider=LLMProvider.ANTHROPIC)
- kerb.generation.generate_stream(messages, model=None, config=None, api_key=None, provider=None, callback=None, **kwargs)[source]
Generate streaming response from any LLM provider.
- Parameters:
messages (
Union[List[Message],List[Dict[str,str]],str]) – Input messages (can be string, list of dicts, or list of Message objects)model (
Union[str,ModelName,None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.config (
Optional[GenerationConfig]) – Generation configurationapi_key (
Optional[str]) – API key (if not provided, uses environment variable)provider (
Optional[LLMProvider]) – LLMProvider enum specifying which API to usecallback (
Optional[Callable[[StreamChunk],None]]) – Optional callback function for each chunk**kwargs – Additional config parameters
- Yields:
StreamChunk – Chunks of the generated response
- kerb.generation.generate_batch(prompts, model=None, config=None, api_key=None, provider=None, max_concurrent=5, show_progress=False, **kwargs)[source]
Generate batch responses.
- Parameters:
prompts (
List[Union[str,List[Message]]]) – List of prompts to processmodel (
Union[str,ModelName,None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.config (
Optional[GenerationConfig]) – Generation configurationapi_key (
Optional[str]) – API key (if not provided, uses environment variable)provider (
Optional[LLMProvider]) – LLMProvider enum specifying which API to usemax_concurrent (
int) – Maximum concurrent requestsshow_progress (
bool) – Whether to show progress**kwargs – Additional config parameters
- Returns:
List of generated responses
- Return type:
- async kerb.generation.generate_async(messages, model=None, config=None, api_key=None, provider=None, use_cache=True, cost_tracker=None, track_cost=False, max_retries=3, **kwargs)[source]
Async generation.
- Parameters:
messages (
Union[List[Message],List[Dict[str,str]],str]) – Input messages (can be string, list of dicts, or list of Message objects)model (
Union[str,ModelName,None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.config (
Optional[GenerationConfig]) – Generation configurationapi_key (
Optional[str]) – API key (if not provided, uses environment variable)provider (
Optional[LLMProvider]) – LLMProvider enum specifying which API to useuse_cache (
bool) – Whether to use response cachingcost_tracker (
Optional[CostTracker]) – Optional cost tracker instancetrack_cost (
bool) – Whether to track costs in global trackermax_retries (
int) – Maximum retry attempts for failed requests**kwargs – Additional config parameters
- Returns:
The generated response
- Return type:
- class kerb.generation.Generator(model, api_key=None, provider=None, cost_tracker=None, **default_config)[source]
Bases:
objectUniversal LLM generator - easily switch between models and providers.
This class provides a convenient stateful interface for LLM generation with support for both enum-based and string-based model specification. It makes it easy to switch between different models and providers without changing your code structure.
Examples
>>> # Using ModelName enum >>> gen = Generator(model=ModelName.GPT_4O_MINI, provider=LLMProvider.OPENAI) >>> response = gen.generate("Hello!")
>>> # Using custom model name >>> gen = Generator(model="my-custom-model", provider=LLMProvider.OPENAI) >>> response = gen.generate("Hello!")
>>> # Easy model switching >>> gen_gpt = Generator(model=ModelName.GPT_4O_MINI, provider=LLMProvider.OPENAI, temperature=0.7) >>> gen_claude = Generator(model=ModelName.CLAUDE_35_HAIKU, provider=LLMProvider.ANTHROPIC, temperature=0.7)
- __init__(model, api_key=None, provider=None, cost_tracker=None, **default_config)[source]
Initialize the universal Generator.
- Parameters:
model (
Union[str,ModelName]) – Model to use (ModelName enum or string for custom models)api_key (
Optional[str]) – API key (if not provided, uses environment variable)provider (
Optional[LLMProvider]) – LLMProvider enum specifying which API to usecost_tracker (
Optional[CostTracker]) – Optional cost tracker instance**default_config – Default configuration parameters (temperature, max_tokens, etc.)
- class kerb.generation.GenerationConfig(model, temperature=0.7, max_tokens=None, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, stop_sequences=None, stream=False, n=1, logprobs=None, seed=None, response_format=None, tools=None, tool_choice=None, reasoning_level=None, reasoning_budget=None, enable_grounding=None, grounding_config=None)[source]
Bases:
objectConfiguration for LLM generation.
- __init__(model, temperature=0.7, max_tokens=None, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, stop_sequences=None, stream=False, n=1, logprobs=None, seed=None, response_format=None, tools=None, tool_choice=None, reasoning_level=None, reasoning_budget=None, enable_grounding=None, grounding_config=None)
- class kerb.generation.GenerationResponse(content, model, provider, usage, finish_reason=None, latency=0.0, cost=0.0, cached=False, metadata=<factory>, raw_response=None)[source]
Bases:
objectResponse from LLM generation.
Note: LLMProvider is imported lazily to avoid circular imports.
- __init__(content, model, provider, usage, finish_reason=None, latency=0.0, cost=0.0, cached=False, metadata=<factory>, raw_response=None)
- class kerb.generation.StreamChunk(content, finish_reason=None, model=None, metadata=<factory>)[source]
Bases:
objectRepresents a chunk from streaming generation.
- __init__(content, finish_reason=None, model=None, metadata=<factory>)
- class kerb.generation.Usage(prompt_tokens=0, completion_tokens=0, total_tokens=0)[source]
Bases:
objectToken usage information.
- __init__(prompt_tokens=0, completion_tokens=0, total_tokens=0)
- class kerb.generation.LLMProvider(*values)[source]
Bases:
EnumSupported LLM providers.
- OPENAI = 'openai'
- ANTHROPIC = 'anthropic'
- GOOGLE = 'google'
- LOCAL = 'local'
- class kerb.generation.ModelName(*values)[source]
Bases:
EnumSupported model names with their actual string identifiers.
- GPT_5_4 = 'gpt-5.4'
- GPT_5_4_PRO = 'gpt-5.4-pro'
- GPT_5_3 = 'gpt-5.3'
- GPT_5_2 = 'gpt-5.2'
- GPT_5 = 'gpt-5'
- GPT_5_MINI = 'gpt-5-mini'
- GPT_5_NANO = 'gpt-5-nano'
- GPT_4_1 = 'gpt-4.1'
- GPT_4_1_MINI = 'gpt-4.1-mini'
- GPT_4_1_NANO = 'gpt-4.1-nano'
- GPT_4O = 'gpt-4o'
- GPT_4O_MINI = 'gpt-4o-mini'
- O3_PRO = 'o3-pro'
- O3 = 'o3'
- O3_MINI = 'o3-mini'
- O1 = 'o1'
- O1_PRO = 'o1-pro'
- O1_MINI = 'o1-mini'
- O4_MINI = 'o4-mini'
- CLAUDE_OPUS_4_6 = 'claude-opus-4-6'
- CLAUDE_SONNET_4_6 = 'claude-sonnet-4-6'
- CLAUDE_HAIKU_4_5 = 'claude-haiku-4-5'
- CLAUDE_45_OPUS = 'claude-4.5-opus-20260212'
- CLAUDE_45_SONNET = 'claude-4.5-sonnet-20260212'
- CLAUDE_45_HAIKU = 'claude-4.5-haiku-20260212'
- CLAUDE_OPUS_4 = 'claude-opus-4'
- CLAUDE_SONNET_4 = 'claude-sonnet-4'
- CLAUDE_35_SONNET = 'claude-3-5-sonnet-20241022'
- CLAUDE_35_HAIKU = 'claude-3-5-haiku-20241022'
- GEMINI_3_1_PRO = 'gemini-3.1-pro-preview'
- GEMINI_3_1_FLASH_LITE = 'gemini-3.1-flash-lite-preview'
- GEMINI_3_PRO = 'gemini-3-pro'
- GEMINI_3_FLASH = 'gemini-3-flash'
- GEMINI_2_5_PRO = 'gemini-2.5-pro'
- GEMINI_2_5_FLASH = 'gemini-2.5-flash'
- GEMINI_2_5_FLASH_LITE = 'gemini-2.5-flash-lite'
- class kerb.generation.MessageRole(*values)[source]
Bases:
EnumMessage roles in conversations.
- SYSTEM = 'system'
- USER = 'user'
- ASSISTANT = 'assistant'
- FUNCTION = 'function'
- TOOL = 'tool'
- class kerb.generation.Message(role, content, timestamp=None, metadata=<factory>, name=None, function_call=None, tool_calls=None)[source]
Bases:
objectUniversal message representation for conversations.
Consolidates the Message classes from generation/ and memory/ packages to provide a single, consistent message representation.
- role
The role of the message sender (system, user, assistant, etc.)
- content
The message content
- timestamp
Optional ISO format timestamp (auto-generated if not provided)
- metadata
Additional metadata about the message
- name
Optional name for the message sender (used in function calling)
- function_call
Optional function call information (legacy)
- tool_calls
Optional list of tool calls
Examples
>>> # Simple user message >>> msg = Message(role="user", content="Hello!")
>>> # System message with enum role >>> msg = Message( ... role=MessageRole.SYSTEM, ... content="You are a helpful assistant" ... )
>>> # Message with metadata >>> msg = Message( ... role="assistant", ... content="Here's the answer", ... metadata={"model": "gpt-4o", "tokens": 150} ... )
- role: MessageRole | str
- __init__(role, content, timestamp=None, metadata=<factory>, name=None, function_call=None, tool_calls=None)
- kerb.generation.retry_with_exponential_backoff(func, max_retries=3, initial_delay=1.0, exponential_base=2.0, jitter=True, retryable_exceptions=(<class 'Exception'>, ))[source]
Retry a function with exponential backoff.
- Parameters:
- Return type:
- Returns:
Result from function
- Raises:
Last exception if all retries fail –
- async kerb.generation.async_retry_with_exponential_backoff(func, max_retries=3, initial_delay=1.0, exponential_base=2.0, jitter=True, retryable_exceptions=(<class 'Exception'>, ))[source]
Async version of retry with exponential backoff.
- Parameters:
- Return type:
- Returns:
Result from function
- Raises:
Last exception if all retries fail –
- kerb.generation.parse_json_response(response)[source]
Parse JSON from LLM response.
Handles markdown code blocks and other formatting.
- Parameters:
response (
Union[GenerationResponse,str]) – GenerationResponse or content string- Returns:
Parsed JSON
- Return type:
- Raises:
ValueError – If JSON cannot be parsed
Example
>>> response = generate("Return JSON", model="gpt-4o-mini", ... provider=LLMProvider.OPENAI, ... response_format={"type": "json_object"}) >>> data = parse_json_response(response)
- kerb.generation.validate_response(response, min_length=None, max_length=None, must_contain=None, must_not_contain=None, pattern=None)[source]
Validate LLM response against criteria.
- Parameters:
- Returns:
True if valid, False otherwise
- Return type:
Example
>>> response = generate("List 3 programming languages", ... model="gpt-4o-mini", provider=LLMProvider.OPENAI) >>> is_valid = validate_response(response, min_length=20, must_contain=["Python"])
- kerb.generation.format_messages(system=None, user=None, assistant=None, history=None)[source]
Format messages for generation.
- Parameters:
- Returns:
Formatted messages
- Return type:
Example
>>> messages = format_messages(system="You are helpful", user="What is Python?") >>> response = generate(messages, model="gpt-4o-mini", provider=LLMProvider.OPENAI)
- class kerb.generation.RateLimiter(requests_per_minute=60, tokens_per_minute=None)[source]
Bases:
objectSimple rate limiter for API requests.
- class kerb.generation.ResponseCache(max_size=1000, ttl=3600)[source]
Bases:
objectSimple in-memory cache for LLM responses.
- kerb.generation.get_cost_summary(cost_tracker=None)[source]
Get cost tracking summary.
- Parameters:
cost_tracker (
Optional[CostTracker]) – CostTracker instance. If None, uses global tracker.- Returns:
Cost summary with totals and per-model breakdown
- Return type:
Example
>>> generate("Hello", model="gpt-4o-mini", provider=LLMProvider.OPENAI, track_cost=True) >>> summary = get_cost_summary() >>> print(f"Total cost: ${summary['total_cost']}")
- kerb.generation.reset_cost_tracking(cost_tracker=None)[source]
Reset cost tracking.
- Parameters:
cost_tracker (
Optional[CostTracker]) – CostTracker instance. If None, resets global tracker.- Return type:
Example
>>> reset_cost_tracking()
- class kerb.generation.BaseProvider(api_key=None, **kwargs)[source]
Bases:
ABCBase class for LLM providers.
Custom providers should inherit from this class and implement the required methods.
- abstractmethod generate(messages, config)[source]
Generate a response.
- Parameters:
config (
GenerationConfig) – Generation configuration
- Return type:
- Returns:
GenerationResponse
- abstractmethod generate_stream(messages, config)[source]
Generate a streaming response.
- Parameters:
config (
GenerationConfig) – Generation configuration
- Yields:
StreamChunk
- abstractmethod async generate_async(messages, config)[source]
Generate a response asynchronously.
- Parameters:
config (
GenerationConfig) – Generation configuration
- Return type:
- Returns:
GenerationResponse
- validate_config(config)[source]
Validate configuration for this provider.
- Parameters:
config (
GenerationConfig) – Generation configuration- Returns:
True if valid
- Return type:
- Raises:
ValueError – If configuration is invalid
- kerb.generation.register_provider(name, provider)[source]
Register a custom provider.
- Parameters:
name (
str) – Provider name (used in model strings like “custom::model-name”)provider (
BaseProvider) – Provider instance
- Return type:
Examples
>>> from kerb.generation.base import register_provider >>> provider = MyCustomProvider(api_key="...") >>> register_provider("mycustom", provider) >>> # Now can use: generate(messages, model="mycustom::my-model")
- kerb.generation.get_provider(name)[source]
Get a registered provider by name.
- Parameters:
name (
str) – Provider name- Return type:
- Returns:
Provider instance or None if not found
- class kerb.generation.OpenAIGenerator(api_key=None, **kwargs)[source]
Bases:
objectOpenAI generator with simplified interface.
This is a convenience class for OpenAI-specific generation.
- generate(messages, model='gpt-4o-mini', **kwargs)[source]
Generate using OpenAI API.
- Parameters:
- Return type:
- Returns:
GenerationResponse
- class kerb.generation.AnthropicGenerator(api_key=None, **kwargs)[source]
Bases:
objectAnthropic generator with simplified interface.
This is a convenience class for Anthropic-specific generation.
- generate(messages, model='claude-3-5-haiku-20241022', **kwargs)[source]
Generate using Anthropic API.
- Parameters:
- Return type:
- Returns:
GenerationResponse
- class kerb.generation.GoogleGenerator(api_key=None, **kwargs)[source]
Bases:
objectGoogle Gemini generator with simplified interface.
This is a convenience class for Google Gemini-specific generation.
- generate(messages, model='gemini-2.5-flash', **kwargs)[source]
Generate using Google Gemini API.
- Parameters:
- Return type:
- Returns:
GenerationResponse
Unified LLM generation with multi-provider support.