TokenBatcher

Token-aware batcher for LLM inference workloads. A thin convenience wrapper around DynamicBatcher that accepts token-specific parameter names (inference_fn, token_estimator, target_batch_tokens, etc.) and maps them to the base class. Also checks the TokenEstimator protocol (estimate_tokens()) in addition to CostEstimator (estimate_cost()).
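The aliasing and protocol check described above can be illustrated with a small, self-contained sketch. It does not use the library. `resolve_alias`, `record_cost`, and `Prompt` are hypothetical names, and both the precedence (token-specific argument first, then the base argument) and the check order are assumptions, not documented behavior.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TokenEstimator(Protocol):
    def estimate_tokens(self) -> int: ...


@runtime_checkable
class CostEstimator(Protocol):
    def estimate_cost(self) -> int: ...


def resolve_alias(token_value: Any, base_value: Any, default: Any = None) -> Any:
    """Pick the token-specific argument if given, else the base one (assumed order)."""
    if token_value is not None:
        return token_value
    if base_value is not None:
        return base_value
    return default


def record_cost(record: Any, default_cost: int = 1) -> int:
    """Ask the record for its own estimate, trying estimate_tokens() first."""
    if isinstance(record, TokenEstimator):
        return record.estimate_tokens()
    if isinstance(record, CostEstimator):
        return record.estimate_cost()
    return default_cost


class Prompt:
    """Hypothetical record type that satisfies the TokenEstimator protocol."""

    def __init__(self, text: str) -> None:
        self.text = text

    def estimate_tokens(self) -> int:
        # Crude whitespace count, for illustration only.
        return len(self.text.split())
```

In this sketch a `Prompt` reports its own size, while a plain string falls back to `default_cost`.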

Constructor

Signature

def TokenBatcher(
    inference_fn: ProcessFn[RecordT, ResultT] | None = None,
    process_fn: ProcessFn[RecordT, ResultT] | None = None,
    token_estimator: CostEstimatorFn[RecordT] | None = None,
    cost_estimator: CostEstimatorFn[RecordT] | None = None,
    target_batch_tokens: int | None = None,
    target_batch_cost: int = 32000,
    default_token_estimate: int | None = None,
    default_cost: int = 1,
    max_batch_size: int = 256,
    min_batch_size: int = 1,
    batch_timeout_s: float = 0.05,
    max_queue_size: int = 5000,
    prefetch_batches: int = 2
) -> None

Parameters

Name	Type	Description
inference_fn	`ProcessFn[RecordT, ResultT] | None` = None	
process_fn	`ProcessFn[RecordT, ResultT] | None` = None	
token_estimator	`CostEstimatorFn[RecordT] | None` = None	
cost_estimator	`CostEstimatorFn[RecordT] | None` = None	
target_batch_tokens	`int | None` = None	
target_batch_cost	`int` = 32000	The target cost for a batch (alternative to target_batch_tokens).
default_token_estimate	`int | None` = None	
default_cost	`int` = 1	The default cost if no estimator is provided (alternative to default_token_estimate).
max_batch_size	`int` = 256	The maximum number of records in a batch.
min_batch_size	`int` = 1	The minimum number of records in a batch.
batch_timeout_s	`float` = 0.05	The maximum time to wait for a batch to fill, in seconds.
max_queue_size	`int` = 5000	The maximum number of records that can be queued for processing.
prefetch_batches	`int` = 2	The number of batches to prefetch.
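To give a feel for how `target_batch_cost` and `max_batch_size` could bound a batch, here is a greedy grouping sketch. It is an assumption about the general technique, not the library's actual scheduling logic. `plan_batches` is a hypothetical helper, and it ignores `min_batch_size`, `batch_timeout_s`, and queueing.

```python
def plan_batches(
    costs: list[int],
    target_batch_cost: int = 32000,
    max_batch_size: int = 256,
) -> list[list[int]]:
    """Group per-record costs greedily, closing a batch at either limit."""
    batches: list[list[int]] = []
    current: list[int] = []
    current_cost = 0
    for cost in costs:
        over_budget = bool(current) and current_cost + cost > target_batch_cost
        full = len(current) >= max_batch_size
        if over_budget or full:
            # Close the current batch before adding this record.
            batches.append(current)
            current, current_cost = [], 0
        # A single record larger than the budget still gets its own batch.
        current.append(cost)
        current_cost += cost
    if current:
        batches.append(current)
    return batches
```

With a budget of 50, the costs `[10, 20, 30, 40]` would be grouped as `[[10, 20], [30], [40]]` under this scheme.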

Methods

`submit()`

@classmethod
def submit(
    record: RecordT,
    estimated_tokens: int | None = None,
    estimated_cost: int | None = None
) -> asyncio.Future[ResultT]

Submit a single record for batched inference. Accepts either estimated_tokens or estimated_cost.

Parameters

Name	Type	Description
record	`RecordT`	The input record.
estimated_tokens	`int | None` = None	
estimated_cost	`int | None` = None	

Returns

Type	Description
`asyncio.Future[ResultT]`	A future whose result is the corresponding entry from the list returned by the inference function.
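The relationship between submitted records and their futures can be shown with a toy stand-in. `ToyBatcher`, `flush`, and `fake_inference` are hypothetical and exist only for this sketch. It also assumes that the inference function is an async callable taking a list of records and returning a list of results in the same order.

```python
import asyncio


async def fake_inference(batch: list[str]) -> list[str]:
    # Stand-in for a model call: one result per input, in input order.
    return [text.upper() for text in batch]


class ToyBatcher:
    """Minimal illustration only; not the TokenBatcher implementation."""

    def __init__(self, inference_fn):
        self._inference_fn = inference_fn
        self._pending = []

    def submit(self, record: str) -> asyncio.Future:
        # Each record gets its own future, resolved when its batch completes.
        future = asyncio.get_running_loop().create_future()
        self._pending.append((record, future))
        return future

    async def flush(self) -> None:
        pending, self._pending = self._pending, []
        if not pending:
            return
        results = await self._inference_fn([record for record, _ in pending])
        # Entry i of the result list resolves the future of record i.
        for (_, future), result in zip(pending, results):
            future.set_result(result)


async def main() -> list[str]:
    batcher = ToyBatcher(fake_inference)
    futures = [batcher.submit(text) for text in ["a", "b", "c"]]
    await batcher.flush()
    return await asyncio.gather(*futures)


results = asyncio.run(main())
```

Here the three futures resolve from a single call to the inference function, and each caller receives only its own entry.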
