VLLMShardArgs
Arguments for sharding a model using vLLM.
Attributes
| Attribute | Type | Description |
|---|---|---|
| tensor_parallel_size | int = 1 | Number of tensor parallel workers. |
| dtype | str = "auto" | Data type for model weights. |
| trust_remote_code | bool = True | Whether to trust remote code from Hugging Face. |
| max_model_len | `int \| None` = None | Maximum model context length passed to vLLM. |
| file_pattern | `str \| None` = DEFAULT_SHARD_PATTERN | File name pattern for the sharded files. |
| max_file_size | int = 5 * 1024**3 | Maximum size of each sharded file, in bytes (default 5 GiB). |
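
The `max_file_size` default can be read as 5 GiB if the value is in bytes, which is an assumption here. The sketch below works through that arithmetic and estimates a lower bound on the number of files for a given total weight size. The helper `min_shard_files` is hypothetical and not part of this API. The real file count may be higher, since files are unlikely to split in the middle of a tensor.

```python
# Documented default for max_file_size; the unit (bytes) is an assumption.
MAX_FILE_SIZE = 5 * 1024**3


def min_shard_files(total_bytes: int, max_file_size: int = MAX_FILE_SIZE) -> int:
    """Hypothetical helper: lower bound on the number of shard files."""
    if max_file_size <= 0:
        raise ValueError("max_file_size must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total_bytes // max_file_size)


default_in_gib = MAX_FILE_SIZE / 1024**3
files_for_14_gib = min_shard_files(14 * 1024**3)
```

With the default limit, 14 GiB of weights needs at least three files under these assumptions.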
Methods
get_vllm_args()
@classmethod
def get_vllm_args(
model_path: str
) -> dict[str, Any]
Get the arguments dict for the vLLM `LLM` constructor.
Parameters
| Name | Type | Description |
|---|---|---|
| model_path | str | The path to the model to be loaded by vLLM. |
Returns
| Type | Description |
|---|---|
| dict[str, Any] | The dictionary of arguments suitable for the vLLM LLM constructor. |
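
The following is a minimal sketch of how such a dictionary might be assembled. It is not the library's implementation. `ShardArgsSketch` and `build_vllm_kwargs` are hypothetical stand-ins that mirror the attributes documented above, and the `DEFAULT_SHARD_PATTERN` value is a placeholder because the real value is not shown here. The sketch assumes that only the attributes matching vLLM `LLM` constructor arguments are forwarded, and that `max_model_len` is omitted when it is `None`.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Placeholder only; the real DEFAULT_SHARD_PATTERN is not given in this reference.
DEFAULT_SHARD_PATTERN = "<placeholder-pattern>"


@dataclass
class ShardArgsSketch:
    """Hypothetical stand-in mirroring the documented attributes."""

    tensor_parallel_size: int = 1
    dtype: str = "auto"
    trust_remote_code: bool = True
    max_model_len: Optional[int] = None
    file_pattern: Optional[str] = DEFAULT_SHARD_PATTERN
    max_file_size: int = 5 * 1024**3


def build_vllm_kwargs(args: ShardArgsSketch, model_path: str) -> dict[str, Any]:
    """Collect the attributes that map to vLLM LLM constructor arguments."""
    kwargs: dict[str, Any] = {
        "model": model_path,
        "tensor_parallel_size": args.tensor_parallel_size,
        "dtype": args.dtype,
        "trust_remote_code": args.trust_remote_code,
    }
    # Assumption: leave max_model_len out so vLLM can derive it from the model config.
    if args.max_model_len is not None:
        kwargs["max_model_len"] = args.max_model_len
    return kwargs


kwargs = build_vllm_kwargs(ShardArgsSketch(tensor_parallel_size=2), "/path/to/model")
```

The resulting dictionary could then be unpacked into the constructor, for example `LLM(**kwargs)`. In this sketch `file_pattern` and `max_file_size` are left out, on the assumption that they apply when the shards are written rather than when the model is loaded.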