Configuring Remote Storage Providers
flyte-sdk provides a unified storage interface for interacting with S3, Google Cloud Storage (GCS), and Azure Blob Storage (ABFS). It uses obstore for high-performance parallel IO and fsspec for broad compatibility.
Configuring S3 Storage
You can configure S3 storage by providing static credentials, using AWS profiles, or relying on IAM roles. The S3 class in flyte.storage manages these settings.
Using Environment Variables
The simplest way to initialize S3 is using the .auto() method, which automatically reads from environment variables like FLYTE_AWS_ACCESS_KEY_ID and FLYTE_AWS_SECRET_ACCESS_KEY.
import flyte
from flyte.storage import S3
# Automatically loads from environment variables
s3_storage = S3.auto(region="us-east-2")
flyte.init(
    endpoint="dns:///localhost:8080",
    storage=s3_storage,
)
Local Development (MinIO)
For local development against the Flyte sandbox (which uses MinIO), use the .for_sandbox() helper. It defaults to the endpoint http://localhost:4566 and the standard sandbox credentials.
import flyte
from flyte.storage import S3
flyte.init(
    endpoint="dns:///localhost:8090",
    insecure=True,
    storage=S3.for_sandbox(),
)
Authentication Resolution
The S3 provider resolves credentials in the following order:
- Static Credentials: Provided via the access_key_id and secret_access_key parameters.
- AWS Profile: If AWS_PROFILE and AWS_CONFIG_FILE are set, it uses boto3 to resolve the profile.
- IAM/Workload Identity: Falls back to the default AWS credential chain (e.g., IAM roles attached to the service account).
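The resolution order above can be sketched as a small decision function. This is only an illustration of the documented precedence, not the SDK's internal code; the function name resolve_credential_source and its return labels are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_credential_source(
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return which credential source would win under the documented order.

    Hypothetical helper: it mirrors the order described in the text and
    does not call into flyte-sdk or boto3.
    """
    env = os.environ if env is None else env

    # 1. Static credentials win when both halves are supplied.
    if access_key_id and secret_access_key:
        return "static"

    # 2. A named profile applies only when both variables are set.
    if env.get("AWS_PROFILE") and env.get("AWS_CONFIG_FILE"):
        return "profile"

    # 3. Otherwise the default AWS credential chain is used.
    return "default-chain"
```

A helper like this can be handy in a startup log line to confirm which path a deployment is expected to take.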
Configuring GCS Storage
GCS configuration typically relies on standard Google Application Credentials.
import flyte
from flyte.storage import GCS
# Uses GOOGLE_APPLICATION_CREDENTIALS environment variable
flyte.init(
    endpoint="dns:///flyte.example.com",
    storage=GCS.auto(),
)
Configuring Azure Blob Storage (ABFS)
The ABFS class supports both account key and service principal authentication.
import flyte
from flyte.storage import ABFS
# Using Account Name and Key
abfs_storage = ABFS(
    account_name="myaccount",
    account_key="my-secret-key",
)
# Or using Service Principal
abfs_sp = ABFS(
    account_name="myaccount",
    tenant_id="...",
    client_id="...",
    client_secret="...",
)
flyte.init(storage=abfs_storage)
Global Storage Settings
The base Storage class provides parameters that apply to all providers, such as retry logic and debug logging.
import datetime
from flyte.storage import S3
storage = S3(
    retries=5,
    backoff=datetime.timedelta(seconds=10),
    enable_debug=True,
)
| Parameter | Environment Variable | Default |
|---|---|---|
| retries | FLYTE_STORAGE_RETRIES | 3 |
| backoff | FLYTE_STORAGE_BACKOFF_SECONDS | 5 |
| enable_debug | FLYTE_STORAGE_DEBUG | False |
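To make the mapping in the table concrete, here is a minimal sketch of how these variables could be read with the listed defaults. StorageSettings and settings_from_env are hypothetical names, and the set of strings treated as "true" is an assumption; the SDK's own parsing may differ.

```python
import datetime
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    """Hypothetical container mirroring the three global parameters."""

    retries: int = 3
    backoff: datetime.timedelta = datetime.timedelta(seconds=5)
    enable_debug: bool = False


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> StorageSettings:
    """Build settings from the environment, falling back to the table defaults."""
    env = os.environ if env is None else env
    # Assumption: these strings count as "true"; the SDK may accept others.
    truthy = {"1", "true", "yes", "on"}
    return StorageSettings(
        retries=int(env.get("FLYTE_STORAGE_RETRIES", "3")),
        backoff=datetime.timedelta(
            seconds=float(env.get("FLYTE_STORAGE_BACKOFF_SECONDS", "5"))
        ),
        enable_debug=env.get("FLYTE_STORAGE_DEBUG", "false").strip().lower() in truthy,
    )
```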
Interacting with Remote Storage
Once initialized, use the high-level async functions in flyte.storage to interact with data. These functions automatically use the configured provider and benefit from obstore performance optimizations.
Reading and Writing Files
Use get and put for transferring files between local and remote storage.
import flyte.storage as storage
# Download a file
local_path = await storage.get("s3://my-bucket/data.txt", to_path="/tmp/data.txt")
# Upload a directory recursively
remote_path = await storage.put("/my/local/dir", to_path="s3://my-bucket/remote-dir", recursive=True)
Streaming Data
For large files, use get_stream and put_stream to handle data without loading it entirely into memory.
import flyte.storage as storage
# Stream download
async for chunk in storage.get_stream("gs://my-bucket/large-file.bin"):
    process(chunk)
# Stream upload
async def data_gen():
    yield b"part 1"
    yield b"part 2"
await storage.put_stream(data_gen(), to_path="abfs://container/stream.txt")
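In practice the upload source is often a local file rather than a hard-coded generator. The following sketch shows an async generator that yields a file in fixed-size chunks. file_chunks and demo are hypothetical helpers, and the sketch assumes put_stream accepts any async iterator of bytes, as the data_gen example suggests.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator, List


async def file_chunks(path: str, chunk_size: int = 1024 * 1024) -> AsyncIterator[bytes]:
    """Yield the contents of a local file in chunks of at most chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            # Run the blocking read in a worker thread so the event loop stays free.
            chunk = await asyncio.to_thread(f.read, chunk_size)
            if not chunk:
                break
            yield chunk


def demo() -> List[bytes]:
    """Write a 10-byte temp file and return it split into 4-byte chunks."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
    try:
        async def collect() -> List[bytes]:
            return [c async for c in file_chunks(tmp.name, chunk_size=4)]

        return asyncio.run(collect())
    finally:
        os.unlink(tmp.name)
```

Under that assumption, an upload would look like await storage.put_stream(file_chunks("/tmp/large-file.bin"), to_path="s3://my-bucket/large-file.bin"). asyncio.to_thread requires Python 3.9 or later.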
Direct File Access
The open function returns an async file handle compatible with obstore or fsspec.
import flyte.storage as storage
async with await storage.open("s3://my-bucket/config.yaml", mode="rb") as f:
    content = await f.read()
Troubleshooting
Anonymous Access
If you need to access public buckets without credentials, you can pass anonymous=True to the storage utilities. This sets the skip_signature flag internally.
import flyte.storage as storage
# Check if a public file exists
exists = await storage.exists("s3://public-bucket/data.csv", anonymous=True)
S3 Addressing Style
If your S3-compatible storage requires virtual-hosted-style requests, set the FLYTE_AWS_S3_ADDRESSING_STYLE environment variable to virtual. When the variable is unset, the provider's default applies (usually path-style for custom endpoints such as MinIO).
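The difference between the two styles is where the bucket name appears in the request URL. The sketch below builds both forms for comparison. s3_object_url is a hypothetical helper for illustration only and is not part of flyte-sdk.

```python
def s3_object_url(
    endpoint_host: str,
    bucket: str,
    key: str,
    style: str = "path",
    scheme: str = "https",
) -> str:
    """Build the URL an S3 client would request for a given addressing style."""
    if style == "virtual":
        # Virtual-hosted style: the bucket is part of the host name.
        return f"{scheme}://{bucket}.{endpoint_host}/{key}"
    if style == "path":
        # Path style: the bucket is the first path segment.
        return f"{scheme}://{endpoint_host}/{bucket}/{key}"
    raise ValueError(f"unknown addressing style: {style!r}")
```

Path style is common for local or self-hosted endpoints because it does not require per-bucket DNS entries.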
Obstore Performance
flyte-sdk bypasses standard fsspec implementations for S3, GCS, and ABFS to use obstore. If you encounter issues with custom fsspec plugins, ensure they do not conflict with the obstore protocols (s3, gs, abfs, abfss).
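When debugging a plugin conflict, it can help to check which URIs fall under the protocols listed above. The sketch below does that with the standard library only. OBSTORE_PROTOCOLS and handled_by_obstore are hypothetical names; the protocol list is taken from the paragraph above.

```python
from urllib.parse import urlparse

# Protocols the text says are routed through obstore.
OBSTORE_PROTOCOLS = frozenset({"s3", "gs", "abfs", "abfss"})


def handled_by_obstore(uri: str) -> bool:
    """Return True if the URI's scheme is one of the obstore-backed protocols."""
    return urlparse(uri).scheme.lower() in OBSTORE_PROTOCOLS
```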