TokenBatcher
Token-aware batcher for LLM inference workloads. A thin convenience wrapper around DynamicBatcher that accepts token-specific parameter names (inference_fn, token_estimator, target_batch_tokens, etc.) and maps them to the base class. Also checks the TokenEstimator protocol (estimate_tokens()) in addition to CostEstimator (estimate_cost()).
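As an illustration of the protocol check described above, here is a minimal sketch of how a per-record cost might be resolved. It assumes that records themselves implement `estimate_tokens()` or `estimate_cost()`, and that an explicit estimator function takes precedence over both. The helper `resolve_cost`, the `Prompt` class, and the precedence order are hypothetical and are not taken from the library.

```python
from typing import Any, Callable, Optional, Protocol, runtime_checkable


@runtime_checkable
class TokenEstimator(Protocol):
    def estimate_tokens(self) -> int: ...


@runtime_checkable
class CostEstimator(Protocol):
    def estimate_cost(self) -> int: ...


def resolve_cost(
    record: Any,
    estimator_fn: Optional[Callable[[Any], int]] = None,
    default_cost: int = 1,
) -> int:
    """Hypothetical resolution order; the library's actual order may differ."""
    if estimator_fn is not None:
        # Assumption: an explicit estimator function wins over protocol methods.
        return estimator_fn(record)
    if isinstance(record, TokenEstimator):
        return record.estimate_tokens()
    if isinstance(record, CostEstimator):
        return record.estimate_cost()
    # No estimator and no protocol method: fall back to the default.
    return default_cost


class Prompt:
    """Example record type that satisfies the TokenEstimator protocol."""

    def __init__(self, text: str) -> None:
        self.text = text

    def estimate_tokens(self) -> int:
        # Illustration only: count whitespace-separated words as tokens.
        return len(self.text.split())
```

With this sketch, `resolve_cost(Prompt("a b c"))` uses the record's own `estimate_tokens()`, while a plain string with no estimator falls back to `default_cost`.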
Constructor
Signature
def TokenBatcher(
    inference_fn: ProcessFn[RecordT, ResultT] | None = None,
    process_fn: ProcessFn[RecordT, ResultT] | None = None,
    token_estimator: CostEstimatorFn[RecordT] | None = None,
    cost_estimator: CostEstimatorFn[RecordT] | None = None,
    target_batch_tokens: int | None = None,
    target_batch_cost: int = 32000,
    default_token_estimate: int | None = None,
    default_cost: int = 1,
    max_batch_size: int = 256,
    min_batch_size: int = 1,
    batch_timeout_s: float = 0.05,
    max_queue_size: int = 5000,
    prefetch_batches: int = 2,
) -> None
Parameters
| Name | Type | Description |
|---|---|---|
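| inference_fn | `ProcessFn[RecordT, ResultT] \| None` = None | The function that runs inference on a batch of records and returns a list of results (token-specific name for process_fn). |
| process_fn | `ProcessFn[RecordT, ResultT] \| None` = None | The batch-processing function under its base-class name (alternative to inference_fn). |
| token_estimator | `CostEstimatorFn[RecordT] \| None` = None | A function that estimates the token count of a record (token-specific name for cost_estimator). |
| cost_estimator | `CostEstimatorFn[RecordT] \| None` = None | A function that estimates the cost of a record (alternative to token_estimator). |
| target_batch_tokens | `int \| None` = None | The target number of tokens for a batch (token-specific name for target_batch_cost). |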
| target_batch_cost | int = 32000 | The target cost for a batch (alternative to target_batch_tokens). |
| default_token_estimate | `int \| None` = None | The default token estimate if no estimator is provided (token-specific name for default_cost). |
| default_cost | int = 1 | The default cost if no estimator is provided (alternative to default_token_estimate). |
| max_batch_size | int = 256 | The maximum number of records in a batch. |
| min_batch_size | int = 1 | The minimum number of records in a batch. |
| batch_timeout_s | float = 0.05 | The maximum time to wait for a batch to fill, in seconds. |
| max_queue_size | int = 5000 | The maximum number of records that can be queued for processing. |
| prefetch_batches | int = 2 | The number of batches to prefetch. |
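To make the interaction between the token target and `max_batch_size` concrete, the following sketch greedily groups records so that a batch is closed once adding another record would exceed either limit. It is an illustration only: the function `pack_batches` is hypothetical, its default of 32000 simply mirrors the documented `target_batch_cost` default, and the library's actual flush policy (including `min_batch_size`, `batch_timeout_s`, and `prefetch_batches`) is not modeled here.

```python
from typing import List, Sequence


def pack_batches(
    token_counts: Sequence[int],
    target_batch_tokens: int = 32000,
    max_batch_size: int = 256,
) -> List[List[int]]:
    """Greedily group record indices under a token budget and a size cap."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for index, tokens in enumerate(token_counts):
        over_budget = current_tokens + tokens > target_batch_tokens
        full = len(current) >= max_batch_size
        # Close the current batch before it would break either limit.
        if current and (over_budget or full):
            batches.append(current)
            current, current_tokens = [], 0
        # A single oversized record still gets a batch of its own.
        current.append(index)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```

For example, with `token_counts=[10, 20, 30, 40]` and `target_batch_tokens=50`, the first two records share a batch and the remaining two are each placed alone, because adding either to an existing batch would exceed the budget.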
Methods
submit()
@classmethod
def submit(
    record: RecordT,
    estimated_tokens: int | None = None,
    estimated_cost: int | None = None,
) -> asyncio.Future[ResultT]
Submit a single record for batched inference. Accepts either estimated_tokens or estimated_cost.
Parameters
| Name | Type | Description |
|---|---|---|
| record | RecordT | The input record. |
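| estimated_tokens | `int \| None` = None | The estimated token count for the record (alternative to estimated_cost). |
| estimated_cost | `int \| None` = None | The estimated cost for the record (alternative to estimated_tokens). |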
Returns
| Type | Description |
|---|---|
| asyncio.Future[ResultT] | A future whose result is the corresponding entry from the list returned by the inference function. |
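The relationship between submitted records and their futures can be sketched with a small stand-in class. `ToyBatcher`, its explicit `flush()` method, and the async `fake_inference` function are all hypothetical and exist only for this illustration; the real class presumably forms and dispatches batches on its own, and whether its inference function is async is an assumption here.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


class ToyBatcher:
    """Hypothetical stand-in, not the library class."""

    def __init__(
        self, inference_fn: Callable[[List[str]], Awaitable[List[str]]]
    ) -> None:
        self._inference_fn = inference_fn
        self._pending: List[Tuple[str, asyncio.Future]] = []

    def submit(self, record: str) -> "asyncio.Future[str]":
        # Each record gets its own future, resolved when its batch completes.
        future = asyncio.get_running_loop().create_future()
        self._pending.append((record, future))
        return future

    async def flush(self) -> None:
        pending, self._pending = self._pending, []
        if not pending:
            return
        results = await self._inference_fn([record for record, _ in pending])
        # The i-th result in the returned list resolves the i-th future.
        for (_, future), result in zip(pending, results):
            future.set_result(result)


async def fake_inference(batch: List[str]) -> List[str]:
    # Placeholder for a model call: returns one result per input record.
    return [text.upper() for text in batch]


async def demo() -> List[str]:
    batcher = ToyBatcher(fake_inference)
    futures = [batcher.submit(text) for text in ["a", "b", "c"]]
    await batcher.flush()
    return list(await asyncio.gather(*futures))
```

Running `asyncio.run(demo())` resolves each future with the entry at the same position in the list returned by the inference function, which is the contract described in the Returns table.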