Flyte SDK Internal Component Architecture

The Flyte SDK is modular: it separates the user-facing CLI and remote client from the internal execution engine and type system.

Key Components

  • flyte.cli: The primary entry point for users, providing commands for running, deploying, and managing Flyte entities. It leverages the remote client and configuration system.
  • flyte.remote: A high-level client-side representation of Flyte entities (Apps, Runs, Tasks). It manages communication with the Flyte backend via a specialized ClientSet and handles async/sync bridging using syncify.
  • flyte.models: Defines the core data models and serialization contexts used throughout the SDK. It acts as the "lingua franca" for all other components.
  • flyte.types: Contains the TypeEngine and TypeTransformer system, which is responsible for marshalling between native Python types and Flyte's internal literal representation.
  • flyte._internal.runtime: The execution engine that handles task lifecycle, including input loading, execution, and output uploading. It is used both locally and within the cluster.
  • Distributed Compute Plugins: A collection of extensions that provide specialized task types (e.g., Spark, Dask, Ray) and type transformers. They integrate with the SDK via the flyte.extend layer.
  • ImageBuilder: Provides logic for building and caching Docker images, supporting both local and remote build backends.
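The TypeEngine and TypeTransformer relationship described above can be illustrated with a small registry that dispatches on the Python type. This is only a sketch of the general pattern, not Flyte's actual implementation: the class names (`Transformer`, `IntTransformer`, `StrTransformer`, `Engine`), the method names, and the dictionary-shaped "literal" are all hypothetical and chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, TypeVar

T = TypeVar("T")


class Transformer(ABC, Generic[T]):
    """Hypothetical stand-in for a type transformer (not Flyte's real class)."""

    @abstractmethod
    def to_literal(self, value: T) -> Dict[str, Any]:
        """Convert a native Python value into a literal-like dict."""

    @abstractmethod
    def to_python(self, literal: Dict[str, Any]) -> T:
        """Convert a literal-like dict back into a native Python value."""


class IntTransformer(Transformer[int]):
    def to_literal(self, value: int) -> Dict[str, Any]:
        return {"scalar": {"integer": value}}

    def to_python(self, literal: Dict[str, Any]) -> int:
        return int(literal["scalar"]["integer"])


class StrTransformer(Transformer[str]):
    def to_literal(self, value: str) -> Dict[str, Any]:
        return {"scalar": {"string": value}}

    def to_python(self, literal: Dict[str, Any]) -> str:
        return str(literal["scalar"]["string"])


class Engine:
    """Hypothetical registry that picks a transformer by exact Python type."""

    def __init__(self) -> None:
        self._registry: Dict[type, Transformer] = {}

    def register(self, python_type: type, transformer: Transformer) -> None:
        self._registry[python_type] = transformer

    def _lookup(self, python_type: type) -> Transformer:
        try:
            return self._registry[python_type]
        except KeyError:
            raise TypeError(f"no transformer registered for {python_type!r}")

    def to_literal(self, value: Any) -> Dict[str, Any]:
        return self._lookup(type(value)).to_literal(value)

    def to_python(self, literal: Dict[str, Any], expected_type: type) -> Any:
        return self._lookup(expected_type).to_python(literal)


engine = Engine()
engine.register(int, IntTransformer())
engine.register(str, StrTransformer())

literal = engine.to_literal(7)            # native -> literal
restored = engine.to_python(literal, int)  # literal -> native
```

In this sketch a plugin would add support for a new type by calling `engine.register` with its own transformer, which mirrors how the component list describes plugins contributing type transformers.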

Relationships and Flow

  1. User Interaction: Users interact with the CLI, which uses the Remote Client to trigger actions on the Flyte backend.
  2. Data Marshalling: Both the Remote Client and the Execution Engine rely on the Type System to convert data between native Python types and Flyte's internal literal representation.
  3. Plugin Integration: Plugins extend the SDK by implementing templates provided by the Extension Layer (flyte.extend), which in turn interacts with the Execution Engine.
  4. Infrastructure: The Image Builder is used during deployment to package code and dependencies into container images, often coordinated by the Remote Client.
  5. Storage: The Storage abstraction provides a uniform interface for the Execution Engine to read and write data to various blob stores (S3, GCS, etc.).
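The interaction between the Execution Engine and the Storage abstraction (load inputs, execute, upload outputs) can be sketched as follows. This is an illustrative model under stated assumptions, not the SDK's code: `BlobStore`, `InMemoryStore`, `run_task`, the `mem://` paths, the JSON encoding, and the `o0` output key are all hypothetical names invented for the example.

```python
import json
from typing import Any, Callable, Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical uniform interface over blob stores (S3, GCS, ...)."""

    def read(self, path: str) -> bytes: ...

    def write(self, path: str, data: bytes) -> None: ...


class InMemoryStore:
    """Toy backend used here in place of a real blob store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._blobs[path]

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data


def run_task(
    fn: Callable[..., Any],
    store: BlobStore,
    inputs_path: str,
    outputs_path: str,
) -> None:
    """Hypothetical task lifecycle: load inputs, execute, upload outputs."""
    inputs = json.loads(store.read(inputs_path))   # 1. load inputs
    result = fn(**inputs)                          # 2. execute user code
    payload = json.dumps({"o0": result}).encode()  # 3. serialize the output
    store.write(outputs_path, payload)             # 4. upload outputs


def add(a: int, b: int) -> int:
    return a + b


store = InMemoryStore()
store.write("mem://inputs.json", json.dumps({"a": 2, "b": 3}).encode())
run_task(add, store, "mem://inputs.json", "mem://outputs.json")
outputs = json.loads(store.read("mem://outputs.json"))
```

Because `run_task` depends only on the `BlobStore` interface, the same lifecycle code could run locally or in a cluster with a different backend, which is the property the Storage item describes.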

Key Architectural Findings

  • The SDK uses a 'syncify' layer to bridge async gRPC/ConnectRPC calls with synchronous user code.
  • 'flyte.models' is a central dependency used for data representation across all layers.
  • The 'TypeEngine' in 'flyte.types' is the core of Flyte's data marshalling, used by both the runtime and plugins.
  • 'flyte.remote' acts as a high-level facade over the lower-level 'ClientSet', which manages multiple service-specific gRPC clients.
  • Plugins are decoupled from the core SDK but integrate through a formal 'flyte.extend' registry.
  • Image building is an internal utility ('flyte._internal.imagebuild') that supports both local Docker builds and remote builds via the Flyte backend.
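The async/sync bridging mentioned in the first finding can be illustrated with a decorator that runs coroutines on a background event loop and blocks the caller until the result is ready. This is a generic sketch of the technique, not the SDK's syncify implementation: the names `to_sync`, `_LoopThread`, and `fetch_run_name`, and the `.aio` attribute used here to reach the original coroutine function, are assumptions made for the example.

```python
import asyncio
import functools
import threading
from typing import Any, Callable, Coroutine


class _LoopThread:
    """Owns one event loop that runs forever on a daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        thread.start()


_background = _LoopThread()


def to_sync(async_fn: Callable[..., Coroutine[Any, Any, Any]]) -> Callable[..., Any]:
    """Wrap an async function so synchronous code can call it directly."""

    @functools.wraps(async_fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Schedule the coroutine on the background loop and wait for it.
        future = asyncio.run_coroutine_threadsafe(
            async_fn(*args, **kwargs), _background.loop
        )
        return future.result()

    # Keep the original coroutine function reachable for async callers
    # (the attribute name is an assumption of this sketch).
    wrapper.aio = async_fn
    return wrapper


@to_sync
async def fetch_run_name(run_id: str) -> str:
    """Hypothetical stand-in for an async RPC call to the backend."""
    await asyncio.sleep(0)
    return f"run-{run_id}"


sync_result = fetch_run_name("42")                    # called from sync code
async_result = asyncio.run(fetch_run_name.aio("7"))   # awaited from async code
```

One limitation of this approach is that calling the synchronous wrapper from code already running on the background loop would block that loop and deadlock, so async callers should await the original coroutine function instead.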