Generation Module

LLM generation utilities for unified API access and response management.

This module provides comprehensive tools for LLM generation across multiple providers.

Usage:

# Common from kerb.generation import generate, generate_stream, Generator

# Providers from kerb.generation.providers import (

OpenAIGenerator, AnthropicGenerator, GoogleGenerator,

)

# Utilities from kerb.generation.utils import retry_with_exponential_backoff, batch_generate

kerb.generation.generate(messages, model=None, config=None, api_key=None, provider=None, use_cache=True, cost_tracker=None, track_cost=False, rate_limiter=None, max_retries=3, **kwargs)[source]

Universal generator function - generate responses from any LLM provider.

This is the main generation function that routes to the appropriate provider based on the model and provider parameters.

Parameters:
  • messages (Union[List[Message], List[Dict[str, str]], str]) – Input messages (can be string, list of dicts, or list of Message objects)

  • model (Union[str, ModelName, None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.

  • config (Optional[GenerationConfig]) – Generation configuration

  • api_key (Optional[str]) – API key (if not provided, uses environment variable)

  • provider (Optional[LLMProvider]) – LLMProvider enum specifying which API to use

  • use_cache (bool) – Whether to use response caching

  • cost_tracker (Optional[CostTracker]) – Optional cost tracker instance

  • track_cost (bool) – Whether to track costs in global tracker

  • rate_limiter (Optional[RateLimiter]) – Optional rate limiter instance

  • max_retries (int) – Maximum retry attempts for failed requests

  • **kwargs – Additional config parameters

Returns:

The generated response

Return type:

GenerationResponse

Examples

>>> # Using ModelName enum
>>> response = generate("Hello", model=ModelName.GPT_4O_MINI, provider=LLMProvider.OPENAI)
>>> # Using custom model name
>>> response = generate("Hello", model="my-custom-gpt", provider=LLMProvider.OPENAI)
>>> # Different providers
>>> response = generate("Hello", model=ModelName.CLAUDE_35_HAIKU, provider=LLMProvider.ANTHROPIC)
kerb.generation.generate_stream(messages, model=None, config=None, api_key=None, provider=None, callback=None, **kwargs)[source]

Generate streaming response from any LLM provider.

Parameters:
  • messages (Union[List[Message], List[Dict[str, str]], str]) – Input messages (can be string, list of dicts, or list of Message objects)

  • model (Union[str, ModelName, None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.

  • config (Optional[GenerationConfig]) – Generation configuration

  • api_key (Optional[str]) – API key (if not provided, uses environment variable)

  • provider (Optional[LLMProvider]) – LLMProvider enum specifying which API to use

  • callback (Optional[Callable[[StreamChunk], None]]) – Optional callback function for each chunk

  • **kwargs – Additional config parameters

Yields:

StreamChunk – Chunks of the generated response

kerb.generation.generate_batch(prompts, model=None, config=None, api_key=None, provider=None, max_concurrent=5, show_progress=False, **kwargs)[source]

Generate batch responses.

Parameters:
  • prompts (List[Union[str, List[Message]]]) – List of prompts to process

  • model (Union[str, ModelName, None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.

  • config (Optional[GenerationConfig]) – Generation configuration

  • api_key (Optional[str]) – API key (if not provided, uses environment variable)

  • provider (Optional[LLMProvider]) – LLMProvider enum specifying which API to use

  • max_concurrent (int) – Maximum concurrent requests

  • show_progress (bool) – Whether to show progress

  • **kwargs – Additional config parameters

Returns:

List of generated responses

Return type:

List[GenerationResponse]

async kerb.generation.generate_async(messages, model=None, config=None, api_key=None, provider=None, use_cache=True, cost_tracker=None, track_cost=False, max_retries=3, **kwargs)[source]

Async generation.

Parameters:
  • messages (Union[List[Message], List[Dict[str, str]], str]) – Input messages (can be string, list of dicts, or list of Message objects)

  • model (Union[str, ModelName, None]) – Model to use (ModelName enum or string for custom models). If not provided, must be specified in config.

  • config (Optional[GenerationConfig]) – Generation configuration

  • api_key (Optional[str]) – API key (if not provided, uses environment variable)

  • provider (Optional[LLMProvider]) – LLMProvider enum specifying which API to use

  • use_cache (bool) – Whether to use response caching

  • cost_tracker (Optional[CostTracker]) – Optional cost tracker instance

  • track_cost (bool) – Whether to track costs in global tracker

  • max_retries (int) – Maximum retry attempts for failed requests

  • **kwargs – Additional config parameters

Returns:

The generated response

Return type:

GenerationResponse

class kerb.generation.Generator(model, api_key=None, provider=None, cost_tracker=None, **default_config)[source]

Bases: object

Universal LLM generator - easily switch between models and providers.

This class provides a convenient stateful interface for LLM generation with support for both enum-based and string-based model specification. It makes it easy to switch between different models and providers without changing your code structure.

Examples

>>> # Using ModelName enum
>>> gen = Generator(model=ModelName.GPT_4O_MINI, provider=LLMProvider.OPENAI)
>>> response = gen.generate("Hello!")
>>> # Using custom model name
>>> gen = Generator(model="my-custom-model", provider=LLMProvider.OPENAI)
>>> response = gen.generate("Hello!")
>>> # Easy model switching
>>> gen_gpt = Generator(model=ModelName.GPT_4O_MINI, provider=LLMProvider.OPENAI, temperature=0.7)
>>> gen_claude = Generator(model=ModelName.CLAUDE_35_HAIKU, provider=LLMProvider.ANTHROPIC, temperature=0.7)
__init__(model, api_key=None, provider=None, cost_tracker=None, **default_config)[source]

Initialize the universal Generator.

Parameters:
  • model (Union[str, ModelName]) – Model to use (ModelName enum or string for custom models)

  • api_key (Optional[str]) – API key (if not provided, uses environment variable)

  • provider (Optional[LLMProvider]) – LLMProvider enum specifying which API to use

  • cost_tracker (Optional[CostTracker]) – Optional cost tracker instance

  • **default_config – Default configuration parameters (temperature, max_tokens, etc.)

generate(messages, **kwargs)[source]

Generate a response.

Parameters:
Returns:

The generated response

Return type:

GenerationResponse

stream(messages, **kwargs)[source]

Generate a streaming response.

Parameters:
Yields:

StreamChunk – Chunks of the generated response

batch(prompts, **kwargs)[source]

Generate batch responses.

Parameters:
  • prompts (List[Union[str, List[Message]]]) – List of prompts to process

  • **kwargs – Override default config parameters

Returns:

List of generated responses

Return type:

List[GenerationResponse]

class kerb.generation.GenerationConfig(model, temperature=0.7, max_tokens=None, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, stop_sequences=None, stream=False, n=1, logprobs=None, seed=None, response_format=None, tools=None, tool_choice=None, reasoning_level=None, reasoning_budget=None, enable_grounding=None, grounding_config=None)[source]

Bases: object

Configuration for LLM generation.

model: str
temperature: float = 0.7
max_tokens: int | None = None
top_p: float = 1.0
frequency_penalty: float = 0.0
presence_penalty: float = 0.0
stop_sequences: List[str] | None = None
stream: bool = False
n: int = 1
logprobs: int | None = None
seed: int | None = None
response_format: Dict[str, Any] | None = None
tools: List[Dict[str, Any]] | None = None
tool_choice: str | Dict[str, Any] | None = None
reasoning_level: str | ReasoningLevel | None = None
reasoning_budget: int | None = None
enable_grounding: bool | None = None
grounding_config: Dict[str, Any] | None = None
__init__(model, temperature=0.7, max_tokens=None, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, stop_sequences=None, stream=False, n=1, logprobs=None, seed=None, response_format=None, tools=None, tool_choice=None, reasoning_level=None, reasoning_budget=None, enable_grounding=None, grounding_config=None)
class kerb.generation.GenerationResponse(content, model, provider, usage, finish_reason=None, latency=0.0, cost=0.0, cached=False, metadata=<factory>, raw_response=None)[source]

Bases: object

Response from LLM generation.

Note: LLMProvider is imported lazily to avoid circular imports.

content: str
model: str
provider: Any
usage: Usage
finish_reason: str | None = None
latency: float = 0.0
cost: float = 0.0
cached: bool = False
metadata: Dict[str, Any]
raw_response: Any | None = None
to_dict()[source]

Convert to dictionary format.

Return type:

Dict[str, Any]

__init__(content, model, provider, usage, finish_reason=None, latency=0.0, cost=0.0, cached=False, metadata=<factory>, raw_response=None)
class kerb.generation.StreamChunk(content, finish_reason=None, model=None, metadata=<factory>)[source]

Bases: object

Represents a chunk from streaming generation.

content: str
finish_reason: str | None = None
model: str | None = None
metadata: Dict[str, Any]
__init__(content, finish_reason=None, model=None, metadata=<factory>)
class kerb.generation.Usage(prompt_tokens=0, completion_tokens=0, total_tokens=0)[source]

Bases: object

Token usage information.

prompt_tokens: int = 0
completion_tokens: int = 0
total_tokens: int = 0
property cost: float

Calculate cost based on token usage (requires model pricing).

__init__(prompt_tokens=0, completion_tokens=0, total_tokens=0)
class kerb.generation.LLMProvider(*values)[source]

Bases: Enum

Supported LLM providers.

OPENAI = 'openai'
ANTHROPIC = 'anthropic'
GOOGLE = 'google'
LOCAL = 'local'
class kerb.generation.ModelName(*values)[source]

Bases: Enum

Supported model names with their actual string identifiers.

GPT_5_4 = 'gpt-5.4'
GPT_5_4_PRO = 'gpt-5.4-pro'
GPT_5_3 = 'gpt-5.3'
GPT_5_2 = 'gpt-5.2'
GPT_5 = 'gpt-5'
GPT_5_MINI = 'gpt-5-mini'
GPT_5_NANO = 'gpt-5-nano'
GPT_4_1 = 'gpt-4.1'
GPT_4_1_MINI = 'gpt-4.1-mini'
GPT_4_1_NANO = 'gpt-4.1-nano'
GPT_4O = 'gpt-4o'
GPT_4O_MINI = 'gpt-4o-mini'
O3_PRO = 'o3-pro'
O3 = 'o3'
O3_MINI = 'o3-mini'
O1 = 'o1'
O1_PRO = 'o1-pro'
O1_MINI = 'o1-mini'
O4_MINI = 'o4-mini'
CLAUDE_OPUS_4_6 = 'claude-opus-4-6'
CLAUDE_SONNET_4_6 = 'claude-sonnet-4-6'
CLAUDE_HAIKU_4_5 = 'claude-haiku-4-5'
CLAUDE_45_OPUS = 'claude-4.5-opus-20260212'
CLAUDE_45_SONNET = 'claude-4.5-sonnet-20260212'
CLAUDE_45_HAIKU = 'claude-4.5-haiku-20260212'
CLAUDE_OPUS_4 = 'claude-opus-4'
CLAUDE_SONNET_4 = 'claude-sonnet-4'
CLAUDE_35_SONNET = 'claude-3-5-sonnet-20241022'
CLAUDE_35_HAIKU = 'claude-3-5-haiku-20241022'
GEMINI_3_1_PRO = 'gemini-3.1-pro-preview'
GEMINI_3_1_FLASH_LITE = 'gemini-3.1-flash-lite-preview'
GEMINI_3_PRO = 'gemini-3-pro'
GEMINI_3_FLASH = 'gemini-3-flash'
GEMINI_2_5_PRO = 'gemini-2.5-pro'
GEMINI_2_5_FLASH = 'gemini-2.5-flash'
GEMINI_2_5_FLASH_LITE = 'gemini-2.5-flash-lite'
class kerb.generation.MessageRole(*values)[source]

Bases: Enum

Message roles in conversations.

SYSTEM = 'system'
USER = 'user'
ASSISTANT = 'assistant'
FUNCTION = 'function'
TOOL = 'tool'
class kerb.generation.Message(role, content, timestamp=None, metadata=<factory>, name=None, function_call=None, tool_calls=None)[source]

Bases: object

Universal message representation for conversations.

Consolidates the Message classes from generation/ and memory/ packages to provide a single, consistent message representation.

role

The role of the message sender (system, user, assistant, etc.)

content

The message content

timestamp

Optional ISO format timestamp (auto-generated if not provided)

metadata

Additional metadata about the message

name

Optional name for the message sender (used in function calling)

function_call

Optional function call information (legacy)

tool_calls

Optional list of tool calls

Examples

>>> # Simple user message
>>> msg = Message(role="user", content="Hello!")
>>> # System message with enum role
>>> msg = Message(
...     role=MessageRole.SYSTEM,
...     content="You are a helpful assistant"
... )
>>> # Message with metadata
>>> msg = Message(
...     role="assistant",
...     content="Here's the answer",
...     metadata={"model": "gpt-4o", "tokens": 150}
... )
role: MessageRole | str
content: str
timestamp: str | None = None
metadata: Dict[str, Any]
name: str | None = None
function_call: Dict[str, Any] | None = None
tool_calls: List[Dict[str, Any]] | None = None
__post_init__()[source]

Auto-generate timestamp if not provided.

to_dict()[source]

Convert message to dictionary format.

Return type:

Dict[str, Any]

Returns:

Dictionary representation suitable for API calls

classmethod from_dict(data)[source]

Create message from dictionary.

Parameters:

data (Dict[str, Any]) – Dictionary with message data

Return type:

Message

Returns:

New Message instance

__repr__()[source]

String representation of the message.

Return type:

str

__init__(role, content, timestamp=None, metadata=<factory>, name=None, function_call=None, tool_calls=None)
kerb.generation.retry_with_exponential_backoff(func, max_retries=3, initial_delay=1.0, exponential_base=2.0, jitter=True, retryable_exceptions=(<class 'Exception'>, ))[source]

Retry a function with exponential backoff.

Parameters:
  • func (Callable) – Function to retry

  • max_retries (int) – Maximum number of retries

  • initial_delay (float) – Initial delay in seconds

  • exponential_base (float) – Base for exponential backoff

  • jitter (bool) – Whether to add random jitter

  • retryable_exceptions (tuple) – Exceptions that trigger retry

Return type:

Any

Returns:

Result from function

Raises:

Last exception if all retries fail

async kerb.generation.async_retry_with_exponential_backoff(func, max_retries=3, initial_delay=1.0, exponential_base=2.0, jitter=True, retryable_exceptions=(<class 'Exception'>, ))[source]

Async version of retry with exponential backoff.

Parameters:
  • func (Callable) – Async function to retry

  • max_retries (int) – Maximum number of retries

  • initial_delay (float) – Initial delay in seconds

  • exponential_base (float) – Base for exponential backoff

  • jitter (bool) – Whether to add random jitter

  • retryable_exceptions (tuple) – Exceptions that trigger retry

Return type:

Any

Returns:

Result from function

Raises:

Last exception if all retries fail

kerb.generation.parse_json_response(response)[source]

Parse JSON from LLM response.

Handles markdown code blocks and other formatting.

Parameters:

response (Union[GenerationResponse, str]) – GenerationResponse or content string

Returns:

Parsed JSON

Return type:

Dict[str, Any]

Raises:

ValueError – If JSON cannot be parsed

Example

>>> response = generate("Return JSON", model="gpt-4o-mini",
...                     provider=LLMProvider.OPENAI,
...                     response_format={"type": "json_object"})
>>> data = parse_json_response(response)
kerb.generation.validate_response(response, min_length=None, max_length=None, must_contain=None, must_not_contain=None, pattern=None)[source]

Validate LLM response against criteria.

Parameters:
Returns:

True if valid, False otherwise

Return type:

bool

Example

>>> response = generate("List 3 programming languages",
...                     model="gpt-4o-mini", provider=LLMProvider.OPENAI)
>>> is_valid = validate_response(response, min_length=20, must_contain=["Python"])
kerb.generation.format_messages(system=None, user=None, assistant=None, history=None)[source]

Format messages for generation.

Parameters:
  • system (Optional[str]) – System message

  • user (Optional[str]) – User message

  • assistant (Optional[str]) – Assistant message (for few-shot examples)

  • history (Optional[List[Dict[str, str]]]) – Conversation history as list of {“role”: “…”, “content”: “…”}

Returns:

Formatted messages

Return type:

List[Message]

Example

>>> messages = format_messages(system="You are helpful", user="What is Python?")
>>> response = generate(messages, model="gpt-4o-mini", provider=LLMProvider.OPENAI)
kerb.generation.calculate_cost(model, usage)[source]

Calculate cost for a request.

Parameters:
  • model (Union[str, ModelName]) – Model name (as string or ModelName enum)

  • usage (Usage) – Token usage

Returns:

Cost in USD

Return type:

float

class kerb.generation.RateLimiter(requests_per_minute=60, tokens_per_minute=None)[source]

Bases: object

Simple rate limiter for API requests.

__init__(requests_per_minute=60, tokens_per_minute=None)[source]

Initialize rate limiter.

Parameters:
  • requests_per_minute (int) – Maximum requests per minute

  • tokens_per_minute (Optional[int]) – Maximum tokens per minute (optional)

wait_if_needed(estimated_tokens=0)[source]

Wait if rate limit would be exceeded.

Parameters:

estimated_tokens (int) – Estimated token count for this request

Return type:

None

class kerb.generation.ResponseCache(max_size=1000, ttl=3600)[source]

Bases: object

Simple in-memory cache for LLM responses.

__init__(max_size=1000, ttl=3600)[source]

Initialize response cache.

Parameters:
  • max_size (int) – Maximum number of cached responses

  • ttl (int) – Time to live in seconds

get(messages, config)[source]

Get cached response if available and not expired.

Return type:

Optional[GenerationResponse]

set(messages, config, response)[source]

Cache a response.

Return type:

None

class kerb.generation.CostTracker[source]

Bases: object

Track costs across LLM API calls.

__init__()[source]

Initialize cost tracker.

add_request(model, usage, cost)[source]

Record a request.

Return type:

None

get_summary()[source]

Get cost tracking summary.

Return type:

Dict[str, Any]

reset()[source]

Reset all tracking.

Return type:

None

kerb.generation.get_cost_summary(cost_tracker=None)[source]

Get cost tracking summary.

Parameters:

cost_tracker (Optional[CostTracker]) – CostTracker instance. If None, uses global tracker.

Returns:

Cost summary with totals and per-model breakdown

Return type:

Dict[str, Any]

Example

>>> generate("Hello", model="gpt-4o-mini", provider=LLMProvider.OPENAI, track_cost=True)
>>> summary = get_cost_summary()
>>> print(f"Total cost: ${summary['total_cost']}")
kerb.generation.reset_cost_tracking(cost_tracker=None)[source]

Reset cost tracking.

Parameters:

cost_tracker (Optional[CostTracker]) – CostTracker instance. If None, resets global tracker.

Return type:

None

Example

>>> reset_cost_tracking()
class kerb.generation.BaseProvider(api_key=None, **kwargs)[source]

Bases: ABC

Base class for LLM providers.

Custom providers should inherit from this class and implement the required methods.

__init__(api_key=None, **kwargs)[source]

Initialize provider.

Parameters:
  • api_key (Optional[str]) – API key (if None, will try to get from environment)

  • **kwargs – Provider-specific configuration

abstractmethod generate(messages, config)[source]

Generate a response.

Parameters:
Return type:

GenerationResponse

Returns:

GenerationResponse

abstractmethod generate_stream(messages, config)[source]

Generate a streaming response.

Parameters:
Yields:

StreamChunk

abstractmethod async generate_async(messages, config)[source]

Generate a response asynchronously.

Parameters:
Return type:

GenerationResponse

Returns:

GenerationResponse

validate_config(config)[source]

Validate configuration for this provider.

Parameters:

config (GenerationConfig) – Generation configuration

Returns:

True if valid

Return type:

bool

Raises:

ValueError – If configuration is invalid

kerb.generation.register_provider(name, provider)[source]

Register a custom provider.

Parameters:
  • name (str) – Provider name (used in model strings like “custom::model-name”)

  • provider (BaseProvider) – Provider instance

Return type:

None

Examples

>>> from kerb.generation.base import register_provider
>>> provider = MyCustomProvider(api_key="...")
>>> register_provider("mycustom", provider)
>>> # Now can use: generate(messages, model="mycustom::my-model")
kerb.generation.get_provider(name)[source]

Get a registered provider by name.

Parameters:

name (str) – Provider name

Return type:

Optional[BaseProvider]

Returns:

Provider instance or None if not found

kerb.generation.list_providers()[source]

List all registered provider names.

Return type:

List[str]

class kerb.generation.OpenAIGenerator(api_key=None, **kwargs)[source]

Bases: object

OpenAI generator with simplified interface.

This is a convenience class for OpenAI-specific generation.

__init__(api_key=None, **kwargs)[source]

Initialize OpenAI generator.

Parameters:
  • api_key (Optional[str]) – OpenAI API key (if None, uses OPENAI_API_KEY env var)

  • **kwargs – Additional configuration

generate(messages, model='gpt-4o-mini', **kwargs)[source]

Generate using OpenAI API.

Parameters:
  • messages (List[Message]) – Conversation messages

  • model (str) – Model name

  • **kwargs – Additional generation parameters

Return type:

GenerationResponse

Returns:

GenerationResponse

stream(messages, model='gpt-4o-mini', callback=None, **kwargs)[source]

Stream from OpenAI API.

Parameters:
Return type:

Iterator[StreamChunk]

Returns:

Iterator of StreamChunks

class kerb.generation.AnthropicGenerator(api_key=None, **kwargs)[source]

Bases: object

Anthropic generator with simplified interface.

This is a convenience class for Anthropic-specific generation.

__init__(api_key=None, **kwargs)[source]

Initialize Anthropic generator.

Parameters:
  • api_key (Optional[str]) – Anthropic API key (if None, uses ANTHROPIC_API_KEY env var)

  • **kwargs – Additional configuration

generate(messages, model='claude-3-5-haiku-20241022', **kwargs)[source]

Generate using Anthropic API.

Parameters:
  • messages (List[Message]) – Conversation messages

  • model (str) – Model name

  • **kwargs – Additional generation parameters

Return type:

GenerationResponse

Returns:

GenerationResponse

stream(messages, model='claude-3-5-haiku-20241022', callback=None, **kwargs)[source]

Stream from Anthropic API.

Parameters:
Return type:

Iterator[StreamChunk]

Returns:

Iterator of StreamChunks

class kerb.generation.GoogleGenerator(api_key=None, **kwargs)[source]

Bases: object

Google Gemini generator with simplified interface.

This is a convenience class for Google Gemini-specific generation.

__init__(api_key=None, **kwargs)[source]

Initialize Google Gemini generator.

Parameters:
  • api_key (Optional[str]) – Google API key (if None, uses GOOGLE_API_KEY env var)

  • **kwargs – Additional configuration

generate(messages, model='gemini-2.5-flash', **kwargs)[source]

Generate using Google Gemini API.

Parameters:
  • messages (List[Message]) – Conversation messages

  • model (str) – Model name

  • **kwargs – Additional generation parameters

Return type:

GenerationResponse

Returns:

GenerationResponse

stream(messages, model='gemini-2.5-flash', callback=None, **kwargs)[source]

Stream from Google Gemini API.

Parameters:
Return type:

Iterator[StreamChunk]

Returns:

Iterator of StreamChunks

Unified LLM generation with multi-provider support.