
Configuring Validation Error Handling

When a Pandera schema validation fails in a Flyte task, the default behavior is to raise an exception and fail the task. In flyte-sdk, you can use the ValidationConfig class to change this behavior, allowing tasks to log a warning and continue execution even when data does not strictly match the defined schema.

Configuring Validation Warnings

To prevent a task from failing on validation errors, apply ValidationConfig(on_error="warn") to your type hints using typing.Annotated. This is useful for debugging or for pipelines where data quality issues should be reported but not stop the entire workflow.

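```python
from typing import Annotated

import flyte
import pandera.pandas as pa
import pandera.typing.pandas as pt
from flyteplugins.pandera import ValidationConfig


# Illustrative schemas; substitute your own pandera DataFrameModel classes.
class EmployeeSchema(pa.DataFrameModel):
    employee_id: pt.Series[int]
    name: pt.Series[str]


class EmployeeSchemaWithStatus(EmployeeSchema):
    status: pt.Series[str]


@flyte.task
async def process_data(
    # Logs a warning instead of failing if df does not match EmployeeSchema
    df: Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="warn")],
) -> Annotated[pt.DataFrame[EmployeeSchemaWithStatus], ValidationConfig(on_error="warn")]:
    # If validation failed, df still contains the original, unvalidated data
    return df.assign(status="active")
```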

Configuration Options

The ValidationConfig class supports the on_error parameter with the following options:

  • "raise" (Default): The task fails immediately with a validation error if the data does not match the schema.
  • "warn": The validation error is logged as a warning, and the task continues execution with the unvalidated data. If the task has reports enabled, the Pandera validation report is still attached to the Flyte UI.
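The difference between the two modes can be sketched in plain Python. This is a minimal illustration of the semantics described above, not the flyteplugins implementation: `apply_on_error`, `SchemaError`, and `require_status_column` are hypothetical names, and the check function stands in for a Pandera schema.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("validation_sketch")


class SchemaError(Exception):
    """Hypothetical stand-in for a Pandera validation error."""


def apply_on_error(data: T, check: Callable[[T], None], on_error: str = "raise") -> T:
    """Run `check` on `data` and handle a failure according to `on_error`.

    - "raise": re-raise the error, so the calling task fails.
    - "warn": log the error and return the original, unvalidated data.
    """
    if on_error not in ("raise", "warn"):
        raise ValueError(f"on_error must be 'raise' or 'warn', got {on_error!r}")
    try:
        check(data)
    except SchemaError as err:
        if on_error == "raise":
            raise
        logger.warning("Validation failed; continuing with unvalidated data: %s", err)
    return data


def require_status_column(rows: list[dict]) -> None:
    # Toy check (hypothetical): every row must have a "status" key.
    missing = [i for i, row in enumerate(rows) if "status" not in row]
    if missing:
        raise SchemaError(f"rows missing 'status': {missing}")
```

With `on_error="warn"`, a call such as `apply_on_error([{"name": "a"}], require_status_column, on_error="warn")` logs a warning and hands back the same list, while the default `"raise"` propagates the `SchemaError`.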

Applying Configuration to Inputs and Outputs

You can configure validation behavior independently for task inputs and return values. This allows you to be strict about what your task produces while being lenient about what it accepts (or vice versa).

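```python
# InputSchema, OutputSchema, and transform() are placeholders for your own
# pandera DataFrameModel classes and transformation logic; imports as above.
@flyte.task
async def flexible_input_strict_output(
    # Warn on input errors
    df: Annotated[pt.DataFrame[InputSchema], ValidationConfig(on_error="warn")],
) -> Annotated[pt.DataFrame[OutputSchema], ValidationConfig(on_error="raise")]:
    # Task proceeds if the input is invalid, but fails if the return value is invalid
    return transform(df)
```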
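Independent input and output modes can be illustrated with a small decorator. This sketch assumes a hypothetical `validated` decorator and plain-Python check functions in place of Pandera schemas, and it wraps synchronous functions for brevity; it only mirrors the behavior of `flexible_input_strict_output` (warn on a bad input, raise on a bad return value) and is not how flyte-sdk applies `ValidationConfig` internally.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger("io_validation_sketch")


class SchemaError(Exception):
    """Hypothetical stand-in for a Pandera validation error."""


def _handle(value: Any, check: Callable[[Any], None], on_error: str, where: str) -> Any:
    # Raise or warn depending on the mode configured for this side (input/output).
    try:
        check(value)
    except SchemaError as err:
        if on_error == "raise":
            raise
        logger.warning("%s validation failed (continuing): %s", where, err)
    return value


def validated(input_check, output_check, input_on_error="raise", output_on_error="raise"):
    """Hypothetical decorator: validate the first argument and the return value,
    each with its own on_error mode."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(data, *args, **kwargs):
            data = _handle(data, input_check, input_on_error, "input")
            result = fn(data, *args, **kwargs)
            return _handle(result, output_check, output_on_error, "output")
        return wrapper
    return decorator


def non_empty(rows: list) -> None:
    if not rows:
        raise SchemaError("expected at least one row")


def all_have_status(rows: list) -> None:
    if any("status" not in row for row in rows):
        raise SchemaError("every row needs a 'status' field")


@validated(non_empty, all_have_status, input_on_error="warn", output_on_error="raise")
def flexible_input_strict_output(rows: list) -> list:
    # Adds a status field; an empty input only warns and yields an empty, valid output.
    return [{**row, "status": "active"} for row in rows]
```

Calling `flexible_input_strict_output([])` logs an input warning and returns `[]`, whereas a function decorated the same way that returns rows without `status` raises `SchemaError` on its output.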

Behavior in the Flyte UI

When on_error="warn" is used and a validation failure occurs:

  1. Task Status: The task completes in the SUCCEEDED state (provided no other errors occur).
  2. Logs: A warning message containing the validation error details is emitted to the task logs.
  3. Reports: If the task is configured with report=True (e.g., @flyte.task(report=True)), flyte-sdk still generates and attaches the Pandera validation report to the Flyte UI. This allows you to inspect exactly which rows or columns failed validation without stopping the pipeline.
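The three outcomes above can be modeled in a short plain-Python sketch. Everything here is hypothetical: `run_with_warn` and `TaskOutcome` are illustrative names, the "report" is a simple dict of failing row indices rather than Pandera's HTML report, and `"SUCCEEDED"` is just a string showing that no exception escaped the task body.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("warn_outcome_sketch")


@dataclass
class TaskOutcome:
    status: str                    # "SUCCEEDED" when no exception escaped
    output: list
    report: Optional[dict] = None  # only populated when report=True


def run_with_warn(rows: list, row_is_valid: Callable[[dict], bool], report: bool = False) -> TaskOutcome:
    """Hypothetical runner for a task whose input uses on_error="warn"."""
    failed = [i for i, row in enumerate(rows) if not row_is_valid(row)]
    summary = None
    if failed:
        # Logs: emit a warning with the failure details instead of raising.
        logger.warning("Validation failed for rows %s", failed)
        # Reports: only build a report when the task opts in.
        if report:
            summary = {"failed_rows": failed, "total_rows": len(rows)}
    # Task status: the task body still runs on the unvalidated data.
    output = [dict(row, status="active") for row in rows]
    return TaskOutcome(status="SUCCEEDED", output=output, report=summary)
```

For example, with rows `[{"age": 30}, {"age": -1}]` and a check `lambda r: r["age"] >= 0`, the run still succeeds, logs one warning, and (with `report=True`) records row `1` as failing.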

Default Behavior

If a type hint is not wrapped in Annotated with a ValidationConfig, flyte-sdk defaults to "raise". The following two definitions are functionally equivalent:

```python
# MySchema is a placeholder for any pandera DataFrameModel; imports as above.

# Implicit default
@flyte.task
async def task_a(df: pt.DataFrame[MySchema]) -> pt.DataFrame[MySchema]:
    ...


# Explicit default
@flyte.task
async def task_b(
    df: Annotated[pt.DataFrame[MySchema], ValidationConfig(on_error="raise")],
) -> Annotated[pt.DataFrame[MySchema], ValidationConfig(on_error="raise")]:
    ...
```
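The equivalence comes down to a default: when no config is attached to a hint, the mode resolves to "raise". The sketch below shows one way such a lookup can work, assuming configs are found by scanning `Annotated` metadata with the standard `typing` helpers; the real plugin's lookup may differ, and `ValidationConfigSketch`, `resolve_on_error`, and `task_modes` are hypothetical names used in place of the real `ValidationConfig`.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class ValidationConfigSketch:
    """Stand-in for flyteplugins.pandera.ValidationConfig (illustration only)."""
    on_error: str = "raise"


def resolve_on_error(hint) -> str:
    """Return the on_error mode for a type hint, defaulting to "raise"."""
    if get_origin(hint) is Annotated:
        # get_args(Annotated[T, m1, m2]) returns (T, m1, m2); skip T.
        for meta in get_args(hint)[1:]:
            if isinstance(meta, ValidationConfigSketch):
                return meta.on_error
    return "raise"


def task_modes(fn) -> dict:
    """Map each parameter (and "return") to its resolved on_error mode."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(fn, include_extras=True)
    return {name: resolve_on_error(hint) for name, hint in hints.items()}
```

Applied to two functions shaped like `task_a` and `task_b`, `task_modes` returns `{"df": "raise", "return": "raise"}` for both, which is the sense in which the implicit and explicit forms match.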