Configuring Remote Storage Providers

flyte-sdk provides a unified storage interface for interacting with S3, Google Cloud Storage (GCS), and Azure Blob Storage (ABFS). It uses obstore for high-performance parallel IO and fsspec for broad compatibility.

Configuring S3 Storage

You can configure S3 storage by providing static credentials, using AWS profiles, or relying on IAM roles. The S3 class in flyte.storage manages these settings.

Using Environment Variables

The simplest way to initialize S3 is to call the .auto() method, which reads its settings from environment variables such as FLYTE_AWS_ACCESS_KEY_ID and FLYTE_AWS_SECRET_ACCESS_KEY.

import flyte
from flyte.storage import S3

# Automatically loads from environment variables
s3_storage = S3.auto(region="us-east-2")

flyte.init(
    endpoint="dns:///localhost:8080",
    storage=s3_storage,
)
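
Before calling .auto(), it can help to confirm that the expected variables are actually set. The following is a minimal sketch using only the standard library. The helper missing_s3_env_vars is hypothetical (it is not part of flyte-sdk), and it checks only the two variable names mentioned above.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the text above; the helper itself is hypothetical.
EXPECTED_VARS = ("FLYTE_AWS_ACCESS_KEY_ID", "FLYTE_AWS_SECRET_ACCESS_KEY")


def missing_s3_env_vars(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the expected variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_s3_env_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("Static S3 credentials found in the environment.")
```

A non-empty result is not necessarily an error: when static credentials are absent, the provider may still authenticate through an AWS profile or an IAM role.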

Local Development (Minio)

For local development using the Flyte sandbox (which uses Minio), use the .for_sandbox() helper. It defaults to http://localhost:4566 with standard sandbox credentials.

import flyte
from flyte.storage import S3

flyte.init(
    endpoint="dns:///localhost:8090",
    insecure=True,
    storage=S3.for_sandbox(),
)

Authentication Resolution

The S3 provider resolves credentials in the following order:

  1. Static Credentials: Provided via access_key_id and secret_access_key parameters.
  2. AWS Profile: If AWS_PROFILE and AWS_CONFIG_FILE are set, it uses boto3 to resolve the profile.
  3. IAM/Workload Identity: Falls back to the default AWS credential chain (e.g., IAM roles attached to the service account).
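
The resolution order above can be sketched as a small decision function. This is an illustration of the documented precedence only, not the SDK's actual implementation: credential_source and its return labels are hypothetical, and no AWS calls are made.

```python
from typing import Mapping, Optional


def credential_source(
    access_key_id: Optional[str],
    secret_access_key: Optional[str],
    env: Mapping[str, str],
) -> str:
    """Name the credential source selected by the documented order."""
    # 1. Static credentials win when both halves are provided.
    if access_key_id and secret_access_key:
        return "static"
    # 2. A profile is considered only when both variables are set.
    if env.get("AWS_PROFILE") and env.get("AWS_CONFIG_FILE"):
        return "profile"
    # 3. Otherwise defer to the default AWS credential chain (e.g. IAM roles).
    return "default-chain"
```

For example, passing static keys together with AWS_PROFILE still yields "static", because the earlier rule takes precedence.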

Configuring GCS Storage

GCS configuration typically relies on standard Google Application Credentials.

import flyte
from flyte.storage import GCS

# Uses GOOGLE_APPLICATION_CREDENTIALS environment variable
flyte.init(
    endpoint="dns:///flyte.example.com",
    storage=GCS.auto(),
)

Configuring Azure Blob Storage (ABFS)

The ABFS class supports both account key and service principal authentication.

import flyte
from flyte.storage import ABFS

# Using Account Name and Key
abfs_storage = ABFS(
    account_name="myaccount",
    account_key="my-secret-key",
)

# Or using Service Principal
abfs_sp = ABFS(
    account_name="myaccount",
    tenant_id="...",
    client_id="...",
    client_secret="...",
)

flyte.init(storage=abfs_storage)

Global Storage Settings

The base Storage class provides parameters that apply to all providers, such as retry logic and debug logging.

import datetime

from flyte.storage import S3

storage = S3(
    retries=5,
    backoff=datetime.timedelta(seconds=10),
    enable_debug=True,
)

Parameter       Environment Variable             Default
retries         FLYTE_STORAGE_RETRIES            3
backoff         FLYTE_STORAGE_BACKOFF_SECONDS    5
enable_debug    FLYTE_STORAGE_DEBUG              False
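
To make the mapping in the table concrete, here is a rough sketch of how such settings could be derived from the environment with the listed defaults. StorageSettings and settings_from_env are hypothetical names, and the boolean parsing rule is an assumption; the SDK's own parsing may differ.

```python
import datetime
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    retries: int
    backoff: datetime.timedelta
    enable_debug: bool


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> StorageSettings:
    """Build settings from environment variables, using the table's defaults."""
    env = os.environ if env is None else env
    # Assumption: "1", "true", and "yes" (case-insensitive) count as enabled.
    debug_raw = env.get("FLYTE_STORAGE_DEBUG", "false").strip().lower()
    return StorageSettings(
        retries=int(env.get("FLYTE_STORAGE_RETRIES", "3")),
        backoff=datetime.timedelta(
            seconds=int(env.get("FLYTE_STORAGE_BACKOFF_SECONDS", "5"))
        ),
        enable_debug=debug_raw in {"1", "true", "yes"},
    )
```

With an empty environment this returns 3 retries, a 5-second backoff, and debug logging disabled, matching the Default column.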

Interacting with Remote Storage

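Once initialized, use the high-level async functions in flyte.storage to interact with data. These functions automatically use the configured provider and benefit from obstore performance optimizations. Because they are coroutines, the snippets below must run inside an async function (or an environment that supports top-level await, such as a notebook).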

Reading and Writing Files

Use get and put for transferring files between local and remote storage.

import flyte.storage as storage

# Download a file
local_path = await storage.get("s3://my-bucket/data.txt", to_path="/tmp/data.txt")

# Upload a directory recursively
remote_path = await storage.put("/my/local/dir", to_path="s3://my-bucket/remote-dir", recursive=True)

Streaming Data

For large files, use get_stream and put_stream to handle data without loading it entirely into memory.

import flyte.storage as storage

# Stream download
async for chunk in storage.get_stream("gs://my-bucket/large-file.bin"):
    process(chunk)  # process() stands in for your own chunk handler

# Stream upload
async def data_gen():
    yield b"part 1"
    yield b"part 2"

await storage.put_stream(data_gen(), to_path="abfs://container/stream.txt")
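
The benefit of streaming is that only one chunk needs to be held in memory at a time. The sketch below shows that pattern with the standard library alone: it hashes a stream incrementally. fake_stream is a hypothetical stand-in for a remote download stream, so the example runs without any storage backend; in practice the async iterator would come from a call such as get_stream.

```python
import asyncio
import hashlib
from typing import AsyncIterator


async def fake_stream() -> AsyncIterator[bytes]:
    """Stand-in for a remote download stream (hypothetical data)."""
    for chunk in (b"part 1", b"part 2"):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield chunk


async def sha256_of_stream(chunks: AsyncIterator[bytes]) -> str:
    """Hash a stream chunk by chunk without buffering the whole payload."""
    digest = hashlib.sha256()
    async for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(asyncio.run(sha256_of_stream(fake_stream())))
```

The same consumer works with any async iterator of bytes, which keeps the processing logic independent of the storage provider.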

Direct File Access

The open function returns an async file handle compatible with obstore or fsspec.

import flyte.storage as storage

async with await storage.open("s3://my-bucket/config.yaml", mode="rb") as f:
    content = await f.read()

Troubleshooting

Anonymous Access

If you need to access public buckets without credentials, you can pass anonymous=True to the storage utilities. This sets the skip_signature flag internally.

import flyte.storage as storage

# Check if a public file exists
exists = await storage.exists("s3://public-bucket/data.csv", anonymous=True)

S3 Addressing Style

If your S3-compatible storage requires virtual-hosted style requests, set the FLYTE_AWS_S3_ADDRESSING_STYLE environment variable to virtual. If the variable is unset, the provider's default applies (usually path-style for custom endpoints such as Minio).
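
The two addressing styles differ only in where the bucket name appears in the request URL. The following sketch illustrates the difference; s3_object_url is a hypothetical helper written for this explanation and does not reflect how flyte-sdk or obstore build requests internally.

```python
def s3_object_url(bucket: str, key: str, endpoint_host: str, style: str = "path") -> str:
    """Build an object URL in either S3 addressing style."""
    if style == "virtual":
        # Virtual-hosted style: the bucket is part of the hostname.
        return f"https://{bucket}.{endpoint_host}/{key}"
    if style == "path":
        # Path style: the bucket is the first path segment.
        return f"https://{endpoint_host}/{bucket}/{key}"
    raise ValueError(f"unknown addressing style: {style!r}")
```

Virtual-hosted style requires DNS to resolve a per-bucket hostname, which is one reason local S3-compatible endpoints commonly use path style instead.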

Obstore Performance

flyte-sdk bypasses standard fsspec implementations for S3, GCS, and ABFS to use obstore. If you encounter issues with custom fsspec plugins, ensure they do not conflict with the obstore protocols (s3, gs, abfs, abfss).
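
When debugging a suspected plugin conflict, a quick first step is to check whether a given URI uses one of the protocols listed above. A minimal sketch, assuming only the four protocol names from the text; handled_by_obstore is a hypothetical helper, not an SDK function.

```python
from urllib.parse import urlparse

# Protocol names taken from the text above.
OBSTORE_PROTOCOLS = frozenset({"s3", "gs", "abfs", "abfss"})


def handled_by_obstore(uri: str) -> bool:
    """Return True if the URI scheme is one of the obstore-backed protocols."""
    return urlparse(uri).scheme.lower() in OBSTORE_PROTOCOLS
```

URIs with any other scheme, and plain local paths with no scheme, return False and would be served by whichever fsspec implementation is registered for them.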