Implementing Pydantic for energy trade validation

When a day-ahead confirmation arrives with price serialized as the string "31.1000000000004", a naive spring-forward timestamp, or a delivery_hour of 26, a hand-rolled if-check gate tends to coerce it silently, and the break surfaces a week later as an untraceable counterparty invoice dispute. This page builds a typed Pydantic v2 contract designed to reject malformed records like these at ingress. It is a concrete step in Schema Validation Frameworks, the gatekeeper stage of the Trade Ingestion & Matching Workflows domain, which admits only records satisfying an explicit typed contract and diverts everything else to a dead-letter queue.

The entity diagram below shows the validated model composition built in this guide: a top-level EnergyTradePayload that embeds one TradeHeader and a list of IntervalPricing records, with the key fields enforced on each.

Prerequisites

Python packages: pydantic>=2.5 (the Rust-cored pydantic-core build) and the standard-library decimal, datetime, zoneinfo, and logging modules. No third-party numeric stack is required for the contract itself — the vectorized transformation that consumes accepted records belongs to Pandas for Trade Data Processing.
Data dependencies: raw trade payloads as dict objects decoded from JSON, XML, or a CSV row — whatever ETRM API Integration Patterns hands over. Financial fields must arrive as strings (not pre-parsed floats) so the Decimal cast is loss-free; a payload that already lost precision upstream cannot be repaired here.
Reference data: the market operator’s open settlement calendar and interval granularities (documented under ISO/RTO Data Format Standards), plus the LEI registry your counterparty_lei field is checked against.

Implementation

A production-grade contract begins with a base configuration that disables silent type coercion and forbids unexpected fields, then embeds a before validator that converts timestamps to UTC at the boundary. ISO market files routinely deliver timestamps in mixed shapes (2024-03-15T14:00:00Z, 2024-03-15 14:00:00-05:00, or a naive string), and a settlement engine handed values without explicit UTC alignment either rejects the payload or misaligns intervals during netting. The complete TradeHeader model below embeds the validator directly so the class runs as a standalone unit.

from pydantic import BaseModel, Field, field_validator, model_validator, ConfigDict
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional, List
import logging

logger = logging.getLogger("trade_validation")

class TradeHeader(BaseModel):
    model_config = ConfigDict(strict=True, extra="forbid")  # no silent coercion, no unknown fields
    trade_id: str = Field(..., min_length=16, max_length=36, pattern=r"^[A-Za-z0-9\-]+$")
    counterparty_lei: str = Field(..., pattern=r"^[A-Z0-9]{20}$")
    execution_timestamp: datetime
    product_type: str = Field(..., pattern=r"^(Physical|Financial|FTR|CRR)$")
    settlement_currency: str = Field(..., pattern=r"^[A-Z]{3}$")

    @field_validator("execution_timestamp", mode="before")
    @classmethod
    def normalize_utc(cls, v):
        # Accept Z-suffixed, offset-aware, and naive timestamps; emit tz-aware UTC.
        if isinstance(v, str):
            # datetime.fromisoformat() only parses a trailing "Z" from Python 3.11 on.
            if v.endswith("Z"):
                v = v[:-1] + "+00:00"
            v = datetime.fromisoformat(v)  # malformed strings raise ValueError
        if isinstance(v, datetime):
            if v.tzinfo is None:
                # Naive values are assumed to be UTC already. Calling astimezone()
                # on them would silently apply the host machine's local zone instead.
                return v.replace(tzinfo=timezone.utc)
            return v.astimezone(timezone.utc)
        return v  # anything else is left for strict datetime validation to reject

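The strict=True directive turns off Pydantic's lax conversions: a numeric string such as "42" is no longer parsed into an int or float, and for Python (dict) input a string is no longer parsed into a datetime or Decimal, so a value that arrives in the wrong type fails loudly instead of being reshaped under you. This is why the timestamp before validator above does its own explicit parsing. One conversion survives strict mode: an int is still accepted for a float field. This aligns with FERC and REMIT reporting requirements, where data lineage must remain verifiable.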
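To make the difference concrete, the sketch below runs the same input through a lax and a strict model. The two classes, the volume_mw field, and the error_types helper are hypothetical names chosen for illustration; the error type codes are the ones Pydantic v2 reports.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxVolume(BaseModel):
    # Default (lax) mode: a numeric string is parsed into an int.
    volume_mw: int


class StrictVolume(BaseModel):
    # Same field, with the configuration used throughout this guide.
    model_config = ConfigDict(strict=True, extra="forbid")
    volume_mw: int


def error_types(model_cls, **fields):
    """Return the Pydantic error type codes raised for the given input."""
    try:
        model_cls(**fields)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


lax_value = LaxVolume(volume_mw="42").volume_mw          # silently becomes 42
strict_str = error_types(StrictVolume, volume_mw="42")    # string is rejected
strict_extra = error_types(StrictVolume, volume_mw=42, surprise_column="x")

print(lax_value, strict_str, strict_extra)
```

The type codes are stable identifiers, so they are a reasonable key for routing rejections into dead-letter categories.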

Enforce financial precision in Decimal space

Financial precision drift is a critical failure vector. Python’s native float type introduces binary rounding artifacts that violate settlement tolerances, so every monetary and volumetric field is typed as Decimal with explicit bounds, and a before validator builds each Decimal from the value's string form rather than from a binary float. Refer to the official Python decimal module documentation for context-aware rounding strategies in financial applications.

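class IntervalPricing(BaseModel):
    model_config = ConfigDict(strict=True, extra="forbid")
    interval_start: datetime
    interval_end: datetime
    lmp_price: Decimal = Field(..., ge=Decimal("-1000.00"), le=Decimal("10000.00"))
    scheduled_mw: Decimal = Field(..., ge=Decimal("-500.000"), le=Decimal("5000.000"))
    loss_factor: Decimal = Field(default=Decimal("1.0000"), ge=Decimal("0.8000"), le=Decimal("1.2000"))

    @field_validator("interval_start", "interval_end", mode="before")
    @classmethod
    def normalize_interval_utc(cls, v):
        # Strict mode does not parse ISO strings from dict input, so convert them here.
        if isinstance(v, str):
            v = datetime.fromisoformat(v[:-1] + "+00:00" if v.endswith("Z") else v)
        if isinstance(v, datetime):
            # Naive values are assumed to be UTC already.
            return v.replace(tzinfo=timezone.utc) if v.tzinfo is None else v.astimezone(timezone.utc)
        return v

    @field_validator("lmp_price", "scheduled_mw", "loss_factor", mode="before")
    @classmethod
    def quantize_decimals(cls, v):
        # Strict mode accepts only Decimal instances from dict input, so build one here.
        # Going through str() avoids importing a float's binary expansion.
        if isinstance(v, bool):
            return v  # bool is an int subclass; leave it for strict validation to reject
        if isinstance(v, (str, int, float)):
            try:
                return Decimal(str(v))
            except ArithmeticError:  # decimal.InvalidOperation on unparseable text
                raise ValueError(f"not a valid decimal number: {v!r}")
        return v

    @model_validator(mode="after")
    def validate_interval_bounds(self):
        if self.interval_end <= self.interval_start:
            raise ValueError("interval_end must be strictly after interval_start")
        if (self.interval_end - self.interval_start).total_seconds() not in (900, 1800, 3600):
            raise ValueError("interval must align with 15m, 30m, or 1h market granularity")
        return self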

The lmp_price band deliberately admits negative values: congestion and oversupply routinely drive locational marginal prices below zero, and LMP itself decomposes as \( LMP_n = \lambda + \mu_n + \nu_n \) — the sum of energy, congestion, and loss components, any of which can push the nodal price negative. The band bounds only the physically absurd, not the legitimately negative.
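As a worked instance of the decomposition, the sketch below sums three component values in Decimal. The numbers and the nodal_lmp helper are made up for illustration and do not come from any market; only the band limits are taken from the lmp_price field above.

```python
from decimal import Decimal


def nodal_lmp(energy: Decimal, congestion: Decimal, loss: Decimal) -> Decimal:
    """LMP_n = lambda + mu_n + nu_n, kept in Decimal so the sum stays exact."""
    return energy + congestion + loss


# Hypothetical hour: heavy congestion outweighs the energy component.
lmp = nodal_lmp(Decimal("25.00"), Decimal("-40.00"), Decimal("1.50"))

# The same band the lmp_price field enforces.
in_band = Decimal("-1000.00") <= lmp <= Decimal("10000.00")

print(lmp, in_band)
```

A negative result like this one is legitimate and sits comfortably inside the band, which is why the lower bound is not zero.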

Cross-field validation for settlement integrity

Energy trades require relational validation across nested objects. The top-level payload composes the header and the pricing schedule, then enforces the settlement-currency rule at the payload level, where checks that span both parts belong.

class EnergyTradePayload(BaseModel):
    # Pydantic v2 already serializes Decimal to str in JSON mode, so the
    # deprecated json_encoders setting is not needed here.
    model_config = ConfigDict(strict=True, extra="forbid")
    header: TradeHeader
    pricing_schedule: List[IntervalPricing]
    settlement_id: Optional[str] = None

    @model_validator(mode="after")
    def validate_settlement_currency(self):
        if self.header.settlement_currency != "USD":
            raise ValueError("non-USD settlement requires explicit FX routing configuration")
        return self

Route the verdict with structured audit logging

Validated models do not exist in isolation; they are the foundational contract for downstream reconciliation. Wrap model_validate() in a handler that captures the field path, the violation code, and the raw payload for compliance reporting, so a rejection lands in the dead-letter store as a structured, machine-readable reason rather than a stack trace.

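from pydantic import ValidationError

def ingest_trade_payload(raw_json: dict, source_system: str) -> EnergyTradePayload:
    try:
        return EnergyTradePayload.model_validate(raw_json)
    except ValidationError as exc:
        # One entry per violation: "loc" is the field path, "type" the violation code.
        violations = [
            {"loc": ".".join(map(str, e["loc"])), "type": e["type"], "msg": e["msg"]}
            for e in exc.errors(include_url=False)
        ]
        logger.error(
            "trade_validation_failed | source=%s | violations=%s | raw_payload=%s",
            source_system, violations, raw_json,
        )
        raise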

When payloads pass these checks they are serialized into immutable audit records before entering the matching stage, ensuring that matching algorithms operate on structurally sound data, reducing false-positive breaks, and accelerating exception resolution. The transport and pagination concerns that deliver these payloads at scale belong to Async Batch Processing Pipelines, whose workers can replay a batch safely because validation is stateless and idempotent.

Verification steps

Confirm the contract behaves before any record reaches the ledger:

Round-trip a clean payload. EnergyTradePayload.model_validate(good_dict) returns a model whose pricing_schedule length equals the number of input intervals and whose lmp_price values are Decimal, not float. model_dump(mode="json") must round-trip every monetary field back to a string, never a lossy float.
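Pin each rejection. Assert that a malformed timestamp, an out-of-band LMP, an unknown extra field, a misaligned interval, and a non-USD currency each raise, so a refactor cannot quietly regress them. A naive spring-forward wall-clock time can only raise if it is resolved against the source zone before the UTC conversion, so pin that case wherever that resolution happens.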
Prove Decimal exactness. The classic float artifact must not survive the boundary.

import pytest
from decimal import Decimal
from pydantic import ValidationError

def test_price_stays_exact_decimal():
    row = IntervalPricing(
        interval_start="2026-03-08T05:00:00+00:00",
        interval_end="2026-03-08T06:00:00+00:00",
        lmp_price="31.10", scheduled_mw="5.000", loss_factor="1.0000",
    )
    assert row.lmp_price == Decimal("31.10")   # not 31.1000000000004

def test_unknown_field_is_schema_drift():
    with pytest.raises(ValidationError):
        TradeHeader(
            trade_id="TRADE-0000000000001", counterparty_lei="ABCDEFGHIJ1234567890",
            execution_timestamp="2026-03-08T05:00:00Z", product_type="Physical",
            settlement_currency="USD", surprise_column="x",  # extra=forbid must reject
        )

def test_misaligned_interval_is_rejected():
    with pytest.raises(ValidationError):
        IntervalPricing(
            interval_start="2026-03-08T05:00:00+00:00",
            interval_end="2026-03-08T05:07:00+00:00",   # 7 minutes: not a market granularity
            lmp_price="20.00", scheduled_mw="1.000",
        )

For a scaled workload, stream records through a generator rather than a list comprehension, so that pydantic-core, which the Pydantic project describes as roughly 5–50x faster than Pydantic v1, stays hot without materializing every record at once. Assert record conservation per batch (ingested == accepted + dead_lettered) so a swallowed exception surfaces immediately.
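The conservation check can be sketched as a small batch runner. The run_batch function and the stand-in require_positive_mw validator are hypothetical; in the real pipeline the validate argument would be something like EnergyTradePayload.model_validate, and since Pydantic v2's ValidationError subclasses ValueError, the same except clause should cover it.

```python
from typing import Any, Callable, Iterable


def run_batch(records: Iterable[Any], validate: Callable[[Any], Any]):
    """Validate records one at a time and enforce record conservation."""
    accepted, dead_lettered, ingested = [], [], 0
    for record in records:          # works for generators: nothing is pre-materialized
        ingested += 1
        try:
            accepted.append(validate(record))
        except ValueError as exc:
            dead_lettered.append({"record": record, "reason": str(exc)})
    # Every ingested record must land in exactly one of the two outputs.
    assert ingested == len(accepted) + len(dead_lettered), "record conservation violated"
    return accepted, dead_lettered


def require_positive_mw(record: dict) -> dict:
    # Stand-in validator for illustration only.
    if record["mw"] <= 0:
        raise ValueError("mw must be positive")
    return record


batch = ({"mw": mw} for mw in (5, -1, 12))
accepted, dead_lettered = run_batch(batch, require_positive_mw)
print(len(accepted), len(dead_lettered))
```

Any exception other than ValueError propagates out of the loop on purpose, so an unexpected failure stops the batch instead of being miscounted.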

Compliance note

This contract must be validated against the field-level obligations that make a record settleable, not merely well-formed. NAESB WEQ Business Practice Standards fix the transaction codes and quantity/price conventions the model must accept as canonical; REMIT / MiFID II RTS 22 attach LEI, ISO 8601 UTC timestamp, and standardized product-code obligations that the counterparty_lei pattern and the normalize_utc validator satisfy at the ingestion boundary rather than deferring to reporting time; and FERC recordkeeping under 18 CFR Part 125 requires every ingested transaction to be retained and reproducible. Satisfy the last by hashing and audit-logging each payload before transformation, so the retained record is provably the one received. Because a record that clears validation but fails a downstream regulatory field check is already a reporting breach, those checks belong here, embedded in the same contract that enforces structure — the ingestion boundary is also the compliance boundary.
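One way to hash and log each payload before transformation is sketched below, using only the standard library. The audit_record function and its field names are assumptions for illustration, and the payload is assumed to be JSON-serializable (financial fields as strings, as the prerequisites require). If the original bytes are still available, hashing those instead of a re-serialized form is the stricter option.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(raw_payload: dict, source_system: str) -> dict:
    """Build an audit entry whose hash is stable across key ordering."""
    # Canonical form: sorted keys and fixed separators, so equal payloads hash equally.
    canonical = json.dumps(
        raw_payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "sha256": digest,
        "source_system": source_system,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "payload": canonical,
    }


first = audit_record({"lmp_price": "31.10", "trade_id": "T-1"}, "etrm-a")
second = audit_record({"trade_id": "T-1", "lmp_price": "31.10"}, "etrm-a")
print(first["sha256"] == second["sha256"])
```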

Frequently Asked Questions

Why use Pydantic instead of plain JSON Schema for energy trade validation?

JSON Schema is a reliable baseline for structural and type checking, but on its own it cannot express Decimal-exact parsing, timestamp normalization, or cross-field business rules such as interval_end following interval_start. A Pydantic model encodes strict typing, Decimal-precise financial parsing, and relational checks (settlement currency, interval-granularity alignment) in one contract that parses and validates in a single call, and emits structured errors that route straight to the dead-letter store.

How do I stop Pydantic from silently coercing trade data types?

Set model_config = ConfigDict(strict=True, extra="forbid"). strict=True turns off lax conversions, so a numeric string is no longer parsed into an int or float and, for dict input, a string is no longer parsed into a datetime or Decimal; a mistyped value fails loudly, and any string conversion you do want must be written as an explicit before validator. extra="forbid" catches schema drift, such as an added or renamed upstream column, at ingress instead of letting a silent null corrupt reconciliation.

Why validate financial fields as Decimal rather than float?

IEEE-754 floats cannot represent most decimal fractions exactly, so a price like 31.10 is stored only approximately, and arithmetic on such values surfaces artifacts (0.1 + 0.2 evaluates to 0.30000000000000004) that accumulate across thousands of summed intervals and breach settlement tolerances. Typing lmp_price, scheduled_mw, and loss_factor as Decimal and routing incoming numbers through Decimal(str(v)) keeps every financial figure exact and audit-reproducible.
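The drift is easy to reproduce with the standard library alone. The sketch below sums the same value in float and in Decimal, then rounds a hypothetical price-times-volume product to cents with an explicit rounding mode; the figures are illustrative, not market data.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Ten intervals of 0.1 summed in binary floating point drift off 1.0.
float_total = sum([0.1] * 10)
float_is_exact = float_total == 1.0

# The same sum in Decimal space, built from strings, is exact.
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))
decimal_is_exact = decimal_total == Decimal("1.0")

# Rounding is a deliberate, named step: price * volume quantized to cents.
settled = (Decimal("31.105") * Decimal("5.000")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_EVEN
)

print(float_is_exact, decimal_is_exact, settled)
```

Which rounding mode applies is a settlement-rule question; ROUND_HALF_EVEN is used here only as an example of stating it explicitly.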

How should the contract handle daylight-saving timestamps?

Normalize every timestamp to a timezone-aware UTC value at the validation boundary with a before validator, then let interval arithmetic run entirely in UTC. Offset-aware input converts unambiguously, including both passes through the fall-back hour. A naive wall-clock time carries no offset, so a nonexistent spring-forward time can only be rejected explicitly, rather than silently folded into the wrong settlement interval, if it is first resolved against the source market's zone.
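A minimal sketch of that resolution step follows, using the standard-library zoneinfo module and a round-trip check to detect wall-clock times that do not exist. The local_to_utc helper and the America/Chicago zone are assumptions for illustration; the right zone depends on the market operator, and the host needs IANA time zone data installed.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc(naive_local: datetime, zone_name: str) -> datetime:
    """Resolve a naive wall-clock time in zone_name to UTC, rejecting DST gaps."""
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive wall-clock time")
    zone = ZoneInfo(zone_name)
    as_utc = naive_local.replace(tzinfo=zone).astimezone(timezone.utc)
    # A time inside the spring-forward gap does not survive the round trip.
    if as_utc.astimezone(zone).replace(tzinfo=None) != naive_local:
        raise ValueError(
            f"{naive_local.isoformat()} does not exist in {zone_name} (DST gap)"
        )
    return as_utc


# 01:30 on the US spring-forward date exists; 02:30 on the same date does not.
ok = local_to_utc(datetime(2026, 3, 8, 1, 30), "America/Chicago")
try:
    local_to_utc(datetime(2026, 3, 8, 2, 30), "America/Chicago")
    gap_rejected = False
except ValueError:
    gap_rejected = True

print(ok.isoformat(), gap_rejected)
```

An ambiguous fall-back time resolves to its first occurrence here, because datetime defaults to fold=0; a source that can distinguish the two passes should set fold explicitly before calling the helper.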
