Advanced Python Types: Tuples, TypedDicts, and Enums
Flyte-sdk leverages modern Python typing to handle complex data structures like fixed-size sequences, structured dictionaries, and categorical values. By using standard Python types, flyte-sdk ensures type safety, IDE autocompletion, and automatic serialization via Pydantic.
Typed Tuples
When you need to pass or return a fixed-size sequence of values with specific types, use Python's tuple[T1, T2, ...] syntax. flyte-sdk requires explicit typing for tuples; bare tuple (untyped) is not supported.
import flyte
env = flyte.TaskEnvironment(name="tuple_example")
@env.task
async def create_tuple() -> tuple[int, str, float]:
"""Create and return a typed tuple."""
return (42, "hello", 3.14)
@env.task
async def process_tuple(data: tuple[int, str, float]) -> str:
"""Process a typed tuple input."""
num, text, decimal = data
return f"Number: {num}, Text: {text}, Decimal: {decimal}"
As seen in examples/basics/types/tuple_types.py, you can also nest tuples or include dataclasses within them:
@env.task
async def nested_tuple() -> tuple[tuple[int, int], str]:
return ((1, 2), "nested")
NamedTuples for Readable Outputs
For tasks that return multiple values, NamedTuple is often preferred over plain tuples because it provides named access to fields and better IDE support.
from typing import NamedTuple
import flyte
env = flyte.TaskEnvironment(name="namedtuple_example")
class ModelMetrics(NamedTuple):
accuracy: float
precision: float
recall: float
f1_score: float = 0.0 # Supports default values
@env.task
async def calculate_metrics() -> ModelMetrics:
return ModelMetrics(accuracy=0.95, precision=0.92, recall=0.93, f1_score=0.94)
@env.task
async def report_metrics(metrics: ModelMetrics) -> str:
# Access fields by name
return f"Accuracy: {metrics.accuracy:.2%}, F1: {metrics.f1_score:.2%}"
Structured Data with TypedDict
TypedDict allows you to define the structure of dictionaries used in your workflows. This is particularly useful for JSON-like data or configurations.
Optional Fields with NotRequired
In flyte-sdk, fields marked as NotRequired are truly absent from the dictionary if not provided, rather than being set to None. This behavior is critical for logic that uses the in operator to check for field existence.
from typing import List, TypedDict
from typing_extensions import NotRequired
import flyte
class ToolCall(TypedDict):
name: str
args: dict
class AIResponse(TypedDict):
content: str
role: str
tool_calls: NotRequired[List[ToolCall]]
@env.task
async def process_ai_response(response: AIResponse) -> bool:
# This check works correctly because NotRequired fields are absent if not provided
has_tool_calls = "tool_calls" in response
return has_tool_calls
Recursive and Complex TypedDicts
flyte-sdk supports self-referential (recursive) TypedDict structures, which are useful for representing trees or nested hierarchies.
class TreeNode(TypedDict):
value: str
children: NotRequired[List[TreeNode]]
@env.task
async def create_tree() -> TreeNode:
return TreeNode(
value="root",
children=[
TreeNode(value="child1"),
TreeNode(value="child2", children=[TreeNode(value="grandchild")]),
],
)
You can also nest complex Flyte types like File, Dir, or DataFrame inside a TypedDict, as demonstrated in examples/basics/types/typeddict_types_complex.py:
from flyte.io import DataFrame, File
class DatasetPackage(TypedDict):
metadata_file: File
data: DataFrame
version: str
Categorical Inputs with Enums
Use Python's enum.Enum to define a fixed set of valid values for a task input. flyte-sdk handles the serialization of these values, typically using their string representation.
import enum
import flyte
class Status(enum.Enum):
PENDING = "pending"
COMPLETED = "completed"
FAILED = "failed"
@env.task
def update_status(status: Status) -> str:
# Access both .name and .value
return f"Processing status: {status.value} ({status.name})"
Troubleshooting and Gotchas
Bare Tuples
If you use a bare tuple without type parameters (e.g., def task(data: tuple):), flyte-sdk will fail to serialize the data. Always provide explicit types like tuple[int, ...] or use a List[int] if the size is dynamic.
Cache Persistence for Recursive Types
When using self-referential TypedDicts (like the TreeNode example), flyte-sdk implements internal fixes to ensure that these recursive structures are correctly handled during cache persistence. If you encounter issues with deep nesting in cached tasks, ensure you are using the latest version of flyte-sdk.
NotRequired vs Optional
In a TypedDict, Optional[T] means the key must exist but its value can be None. NotRequired[T] means the key may be missing entirely. flyte-sdk respects this distinction, which is important for downstream tasks that validate dictionary keys.