ETRM API Integration Patterns

A vendor gateway that returns a 503 for ninety seconds at 07:00 market open should not be able to strand a day’s trade captures outside the reconciliation ledger — but a naive requests.get() in a for loop does exactly that, silently dropping confirmations while the settlement window ticks toward its cutoff. That is the failure mode this component exists to eliminate: transient connectivity and schema drift on the ETRM boundary leaking into downstream settlement math as missing legs, double-booked postings, or REMIT/EMIR reporting gaps. Within the Trade Ingestion & Matching Workflows domain, ETRM API integration owns the transport contract — how trade tickets, forward-curve adjustments, and settlement batches move off a vendor platform and into a canonical internal model with zero data loss and a reproducible audit trail. Done correctly, it treats every endpoint as an unreliable distributed system: guarded by a circuit breaker, retried with bounded backoff, and validated against a strict data contract before a single row reaches the matching engine.

The sequence below shows how the async ingestion client guards each fetch with a circuit breaker, retries transient failures with exponential backoff, and quarantines records that fail schema validation before they reach the DataFrame.

This component sits between the platform-level view described in ETRM System Architecture and the throughput layer owned by Async Batch Processing Pipelines. It is responsible for the connection itself — credentials, pagination, retry semantics, and the hand-off of raw payloads to the Schema Validation Frameworks that gate the reconciliation ledger.

Specification & Standards Reference

ETRM connectivity is not a free-form integration; the transport and the payload both answer to published standards, and conformance is what makes a sync auditable. Vendor REST surfaces authenticate with the OAuth 2.0 client-credentials grant or short-lived JWTs, page results with cursor or offset semantics, and signal load state through 429 Too Many Requests responses (RFC 6585), the Retry-After header (RFC 9110), and, on some platforms, the RateLimit-* headers specified in an IETF draft. On the data side, physical and financial trades must map to the field taxonomy that regulators expect: NAESB WGQ/WEQ business practice standards for gas and power transaction codes, FERC EQR quarterly reporting fields for wholesale sales, and the ISO/RTO market-specific record layouts covered in ISO/RTO Data Format Standards. The table below maps the transports a typical vendor exposes to the concern each one imposes on the ingestion client.

Transport	Typical use	Ordering guarantee	Primary failure mode	Client concern
Paginated REST (`GET /v2/trades`)	Trade capture, settlement batches	Cursor-ordered per page	`429`/`503` at market open	Backoff + cursor durability
WebSocket stream	Real-time confirmations	Sequence number per message	Silent disconnect, gap in seq	Heartbeat + gap detection
Batch SFTP drop	End-of-day statements	File-level, not record-level	Partial/late file arrival	Manifest + checksum verify
Webhook callback	Lifecycle events	None (at-least-once)	Duplicate delivery	Idempotency key dedupe
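For the WebSocket row, gap detection can be as simple as tracking a high-water mark. A minimal sketch, assuming the vendor stamps each message with a monotonically increasing integer sequence number and that messages are checked in small buffered windows; `find_sequence_gaps` is a hypothetical helper, and how a missing range is backfilled (for example from the paginated REST endpoint) is left to the caller.

```python
from typing import Iterable, List, Tuple


def find_sequence_gaps(
    seqs: Iterable[int], last_seen: int
) -> Tuple[List[Tuple[int, int]], int]:
    """Return (missing ranges, new high-water mark) for a buffered window.

    Each gap is an inclusive (first_missing, last_missing) range to backfill
    before the stream is trusted again. Duplicates and numbers at or below the
    current mark are ignored, since they were already processed.
    """
    gaps: List[Tuple[int, int]] = []
    mark = last_seen
    for seq in sorted(set(seqs)):
        if seq <= mark:
            continue  # duplicate or already-processed message
        if seq > mark + 1:
            gaps.append((mark + 1, seq - 1))
        mark = seq
    return gaps, mark


# Out-of-order arrival within the window is tolerated; 103, 106 and 107 are missing.
print(find_sequence_gaps([101, 102, 105, 104, 108], last_seen=100))
# ([(103, 103), (106, 107)], 108)
```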

Regardless of transport, the ingestion layer normalizes every payload into one canonical internal model before matching logic runs. The credential and network-perimeter concerns underneath these transports — token rotation, mTLS, scope enforcement — belong to Building secure API gateways for ETRM sync; this page owns what happens once an authenticated connection is open.
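A minimal sketch of that normalization step, assuming vendors differ mainly in field naming. The vendor labels and alias tables in `FIELD_ALIASES` are hypothetical; values pass through unchanged so the data contract, not the mapper, owns parsing, and unmapped keys are kept aside for drift review rather than silently dropped.

```python
from typing import Dict, Mapping

# Hypothetical per-vendor aliases: vendor field name -> canonical field name.
FIELD_ALIASES: Dict[str, Dict[str, str]] = {
    "vendor_a": {"dealId": "trade_id", "cptyLei": "counterparty_lei",
                 "qty": "volume_mwh", "px": "price_usd"},
    "vendor_b": {"TRADE_REF": "trade_id", "LEI": "counterparty_lei",
                 "VOLUME": "volume_mwh", "PRICE": "price_usd"},
}


def to_canonical(vendor: str, payload: Mapping[str, object]) -> Dict[str, object]:
    """Rename vendor keys to canonical names; unmapped keys go under 'extras'."""
    aliases = FIELD_ALIASES[vendor]
    canonical: Dict[str, object] = {}
    extras: Dict[str, object] = {}
    for key, value in payload.items():
        if key in aliases:
            canonical[aliases[key]] = value  # value untouched; the contract parses it
        else:
            extras[key] = value
    canonical["extras"] = extras  # surfaced for drift review
    return canonical
```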

Deterministic Data Contracts and Decimal-Safe Normalization

The contract is the gatekeeper. Before any transformation runs, each record is validated against a typed model that enforces mandatory fields — trade ID, counterparty LEI, product code, delivery period, volume, and pricing formula — and, critically, parses monetary and volumetric quantities as Decimal rather than binary float. Settlement arithmetic that runs on float64 accumulates rounding error that surfaces as sub-cent invoice mismatches at month-end close; the decimal module is mandatory for every quantity that feeds a financial posting. When a payload deviates from the contract, the record is quarantined to a dead-letter queue with a structured exception payload rather than allowed to corrupt the ledger. The deeper modeling patterns — nested composition, cross-field dependency checks, decimal precision by product class — are detailed in Schema Validation Frameworks.

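from datetime import datetime
from decimal import Decimal, InvalidOperation

import pydantic


class TradeRecord(pydantic.BaseModel):
    trade_id: str
    counterparty_lei: str = pydantic.Field(min_length=20, max_length=20)
    product_code: str
    delivery_start: datetime
    delivery_end: datetime
    volume_mwh: Decimal          # parsed exactly, no binary-float drift
    price_usd: Decimal
    status: str

    @pydantic.field_validator("volume_mwh", "price_usd", mode="before")
    @classmethod
    def _as_decimal(cls, v):
        # Coerce vendor strings/numbers to Decimal via str() so we never route
        # a float through the binary mantissa on the way in.
        try:
            d = Decimal(str(v))
        except InvalidOperation as exc:
            # Decimal raises InvalidOperation (not ValueError) for inputs such as
            # None or "n/a"; pydantic only wraps ValueError/AssertionError into a
            # ValidationError, so convert it or the record escapes quarantine.
            raise ValueError(f"not a decimal quantity: {v!r}") from exc
        if not d.is_finite():
            raise ValueError(f"non-finite quantity: {v!r}")  # NaN / Infinity
        return d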
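The float-versus-Decimal claim is easy to check with the standard library alone. Summing ten 0.1 MWh intervals shows the drift, and constructing a Decimal directly from a float shows why the validator above goes through `str()` first.

```python
from decimal import Decimal

# Ten 0.1 MWh intervals summed as binary floats vs. as Decimals.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))

print(float_total)    # 0.9999999999999999
print(decimal_total)  # 1.0

# Decimal(float) captures the float's exact binary value, drift included;
# Decimal(str(float)) starts from the shortest decimal repr instead.
print(Decimal(0.1))       # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(str(0.1)))  # 0.1
```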

Step-by-Step Implementation

The client is assembled as five composable stages. Each stage is independently testable and fails closed, so a fault in one never silently propagates a bad record into settlement.

Step 1 — Guard the endpoint with a circuit breaker

Blind retries against a degraded gateway cascade into system-wide latency. A circuit breaker fast-fails after a configurable error threshold, then probes the endpoint in a half-open state once the cooldown expires.

from datetime import datetime, timezone
from typing import Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 300.0):
        self.failure_count = 0
        self.last_failure_time: Optional[datetime] = None
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout

    def is_open(self) -> bool:
        if self.failure_count < self.failure_threshold:
            return False  # closed, or half-open with a probe allowed through
        if self.last_failure_time and (
            datetime.now(timezone.utc) - self.last_failure_time
        ).total_seconds() > self.recovery_timeout:
            # Transition to half-open: admit a probe, but keep the count one
            # short of the threshold so a single failed probe re-opens the
            # circuit immediately instead of after another full threshold.
            self.failure_count = self.failure_threshold - 1
            return False
        return True

    def record_failure(self) -> None:
        self.failure_count += 1
        self.last_failure_time = datetime.now(timezone.utc)

    def record_success(self) -> None:
        self.failure_count = 0  # any success closes the circuit
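The breaker above lets every caller through once the cooldown expires. If only a single probe should reach the endpoint, an explicit three-state version is a small step further. This is a sketch, not a drop-in replacement: `ProbingCircuitBreaker`, its `allow_request` method, and the injectable `clock` parameter are illustrative names, and the single-probe guarantee assumes one event loop with no thread-level locking.

```python
import time
from typing import Callable


class ProbingCircuitBreaker:
    """Closed -> open -> half-open breaker that admits exactly one probe.

    `clock` is injectable so tests can advance time without sleeping; it
    defaults to time.monotonic, which is immune to wall-clock adjustments.
    """

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open" and self.clock() - self.opened_at >= self.recovery_timeout:
            self.state = "half_open"  # this caller becomes the single probe
            return True
        return False  # still cooling down, or a probe is already in flight

    def record_success(self) -> None:
        self.state, self.failures = "closed", 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "open", self.clock()  # (re)open
```

A fake clock makes the transitions testable in microseconds: trip the breaker, advance the clock past `recovery_timeout`, and confirm that exactly one `allow_request()` returns True until the probe's outcome is recorded.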

Step 2 — Fetch a page with bounded backoff

The fetcher respects the vendor’s Retry-After header, retries only idempotent reads, and applies exponential delay with jitter so concurrent workers do not synchronize into a thundering herd. The delay for retry attempt $ n $ is

$$t_n = \min\bigl(t_{\max},\; t_{\text{base}} \cdot 2^{n}\bigr) + U(0, j)$$

where $ t_{\text{base}} $ is the base interval, $ t_{\max} $ the ceiling, and $ U(0, j) $ a uniform jitter term that de-correlates retries across the pool.
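Plugging the Step 2 parameters into the formula makes the schedule concrete. A minimal sketch, assuming $ t_{\text{base}} = 2 $ s and $ t_{\max} = 30 $ s to mirror the `wait_exponential_jitter(initial=2, max=30)` call below; `backoff_delay` is a hypothetical helper. As we understand it, tenacity's own implementation applies the ceiling after adding jitter, so its delays can differ slightly near the cap.

```python
import random
from typing import Optional


def backoff_delay(n: int, base: float = 2.0, cap: float = 30.0, jitter: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """t_n = min(t_max, t_base * 2**n) + U(0, j) for retry attempt n (0-based)."""
    rng = rng or random.Random()
    return min(cap, base * 2 ** n) + rng.uniform(0, jitter)


# With jitter disabled the schedule is deterministic: the exponential term
# doubles until the 30 s ceiling clips it.
print([backoff_delay(n, jitter=0.0) for n in range(5)])  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

Passing a seeded `random.Random` keeps jittered delays reproducible in tests while still de-correlating workers in production.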

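import asyncio
import logging
from typing import Dict, List

import aiohttp
from tenacity import (
    RetryCallState,
    before_sleep_log,
    retry,
    retry_if_exception,
    stop_after_attempt,
    wait_exponential_jitter,
)

logger = logging.getLogger("etrm.ingestion")
breaker = CircuitBreaker()

TRANSIENT_STATUSES = {429, 502, 503, 504}
_backoff = wait_exponential_jitter(initial=2, max=30)


def _is_transient(exc: BaseException) -> bool:
    # Throttling and availability statuses are retried; a 400/401/404 is a
    # request or credential defect and must fail fast instead.
    if isinstance(exc, aiohttp.ClientResponseError):
        return exc.status in TRANSIENT_STATUSES
    return isinstance(exc, (aiohttp.ClientError, asyncio.TimeoutError))


def _retry_after_or_backoff(retry_state: RetryCallState) -> float:
    # Honor a delta-seconds Retry-After header when the vendor sends one,
    # capped at the backoff ceiling; otherwise use exponential backoff + jitter.
    exc = retry_state.outcome.exception()
    retry_after = (getattr(exc, "headers", None) or {}).get("Retry-After", "")
    if retry_after.isdigit():
        return min(float(retry_after), 30.0)
    return _backoff(retry_state)


@retry(
    stop=stop_after_attempt(4),
    wait=_retry_after_or_backoff,
    retry=retry_if_exception(_is_transient),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,
)
async def fetch_etrm_batch(
    session: aiohttp.ClientSession, url: str, params: Dict
) -> List[Dict]:
    if breaker.is_open():
        raise RuntimeError("ETRM endpoint circuit is open. Deferring sync.")
    try:
        async with session.get(
            url, params=params, timeout=aiohttp.ClientTimeout(total=15)
        ) as resp:
            resp.raise_for_status()
            data = await resp.json()
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        # Only transient faults count toward tripping the breaker; re-raise so
        # tenacity decides whether to back off and retry.
        if _is_transient(exc):
            breaker.record_failure()
        raise
    breaker.record_success()
    return data.get("trades", [])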

Token lifecycle, connection pooling, and the paginated sync loop that drives this fetcher are covered in depth in Automating ETRM sync with Python requests.
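The table's "Backoff + cursor durability" concern comes down to checkpoint ordering. The full sync loop is documented on that page; as a minimal synchronous sketch, with `fetch_page`, `load_cursor`, `save_cursor`, and `process` as hypothetical callables, the cursor is committed only after a page has been validated and persisted, so a crash resumes at the first unprocessed page instead of skipping one.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A page fetcher returns (records, next_cursor); next_cursor is None on the last page.
PageFetcher = Callable[[Optional[str]], Tuple[List[Dict], Optional[str]]]


def drain_pages(fetch_page: PageFetcher,
                load_cursor: Callable[[], Optional[str]],
                save_cursor: Callable[[str], None],
                process: Callable[[List[Dict]], None]) -> int:
    """Walk a cursor-paginated endpoint; return the number of pages processed."""
    cursor = load_cursor()
    pages = 0
    while True:
        records, next_cursor = fetch_page(cursor)
        process(records)              # validate + persist before advancing
        pages += 1
        if next_cursor is None:
            # Keep the last stored cursor: a rerun re-reads this page, which
            # the idempotent upsert turns into a no-op.
            return pages
        save_cursor(next_cursor)      # checkpoint only after the page is safe
        cursor = next_cursor
```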

Step 3 — Validate and quarantine

Every raw record is coerced through the TradeRecord contract. Valid rows advance; structural defects divert to the dead-letter queue with the original payload preserved for remediation.

def validate_batch(raw_trades: List[Dict]) -> tuple[list[dict], list[dict]]:
    valid, quarantined = [], []
    for record in raw_trades:
        try:
            valid.append(TradeRecord(**record).model_dump())
        except pydantic.ValidationError as exc:
            logger.error("Quarantined malformed trade: %s", exc)
            quarantined.append({"payload": record, "errors": exc.errors()})
    return valid, quarantined  # ship `quarantined` to the DLQ topic
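The quarantined entries still have to be serialized for the DLQ topic. One way to shape that structured exception payload with the standard library: the envelope fields (`source`, `quarantined_at`, `payload_sha256`) are assumptions rather than a fixed schema, and `default=str` is there because pydantic v2 error dictionaries can carry non-JSON values, such as the original exception under `ctx`.

```python
import hashlib
import json
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any, Dict, List


def dlq_envelope(payload: Dict[str, Any], errors: List[Dict[str, Any]], source: str) -> str:
    """Serialize a quarantined record as a JSON dead-letter message.

    The payload is embedded verbatim and fingerprinted with SHA-256 over a
    canonical (sorted-key, compact) encoding, so remediation can confirm that
    a replayed record matches the one that failed.
    """
    canonical = json.dumps(payload, sort_keys=True, default=str, separators=(",", ":"))
    envelope = {
        "source": source,
        "quarantined_at": datetime.now(timezone.utc).isoformat(),
        "payload_sha256": hashlib.sha256(canonical.encode()).hexdigest(),
        "payload": payload,
        "errors": errors,
    }
    # default=str keeps Decimal/datetime values and exception objects
    # serializable instead of crashing the quarantine path itself.
    return json.dumps(envelope, default=str)
```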

Step 4 — Normalize with vectorized, decimal-preserving pandas

Validated records enter transformation. Vectorized operations across tens of thousands of rows replace row-by-row iteration; the detailed performance techniques — categorical encodings, downcasting, chunked I/O — live in Pandas for Trade Data Processing. Decimal columns are held as object dtype so exact settlement arithmetic survives the DataFrame round-trip.

import pandas as pd


def normalize(validated: list[dict]) -> pd.DataFrame:
    lmp_df = pd.DataFrame(validated)
    if lmp_df.empty:
        return lmp_df

    lmp_df["product_code"] = lmp_df["product_code"].astype("category")

    # Coerce to a tz-aware column once, then derive views from it. model_dump()
    # yields native datetimes (object dtype), so .dt is unavailable until
    # pd.to_datetime runs.
    lmp_df["delivery_start"] = pd.to_datetime(lmp_df["delivery_start"], utc=True)
    lmp_df["delivery_window"] = lmp_df["delivery_start"].dt.tz_convert("US/Eastern")
    lmp_df["settlement_cycle"] = lmp_df["delivery_start"].dt.strftime("%Y-%m")

    # volume_mwh / price_usd stay Decimal (object dtype) — no downcast to float.
    return lmp_df.sort_values(["settlement_cycle", "trade_id"])
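Converting to UTC first matters most on DST transition days. The standard-library check below assumes Python 3.9+ and an IANA time zone database available to `zoneinfo`; it uses `America/New_York`, the canonical zone that `US/Eastern` aliases, and shows two distinct delivery intervals on the 2025 US fall-back date colliding on a local wall-clock key while their UTC keys stay distinct.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")

# 2025-11-02 is the US fall-back date: 01:00-02:00 local time occurs twice.
first = datetime(2025, 11, 2, 5, 30, tzinfo=timezone.utc)   # 01:30 EDT
second = first + timedelta(hours=1)                          # 01:30 EST

local_keys = {dt.astimezone(EASTERN).strftime("%Y-%m-%d %H:%M") for dt in (first, second)}
utc_keys = {dt.strftime("%Y-%m-%dT%H:%MZ") for dt in (first, second)}

print(len(local_keys))  # 1 -> two distinct intervals collide on a local key
print(len(utc_keys))    # 2 -> UTC keeps them apart
```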

Step 5 — Persist with an idempotent upsert and audit hash

Network retries and overlapping polls will re-deliver the same trade. The persistence layer upserts on a composite key and stamps each record with a content hash so a replay is a no-op rather than a phantom posting.

import hashlib


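def audit_key(record: dict) -> str:
    # Deterministic content hash over every field the upsert may overwrite. A
    # re-delivered trade produces the same hash, so the upsert is a no-op; an
    # amended price or volume changes it, so the WHERE guard lets the update in.
    # (Hashing volume but not price would silently drop a price-only amendment.)
    basis = "|".join(
        str(record.get(field))
        for field in ("trade_id", "delivery_start", "volume_mwh", "price_usd")
    )
    return hashlib.sha256(basis.encode()).hexdigest()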


UPSERT_SQL = """
INSERT INTO settlement_ledger (trade_id, delivery_start, volume_mwh, price_usd, audit_hash)
VALUES (%(trade_id)s, %(delivery_start)s, %(volume_mwh)s, %(price_usd)s, %(audit_hash)s)
ON CONFLICT (trade_id, delivery_start) DO UPDATE
    SET volume_mwh = EXCLUDED.volume_mwh,
        price_usd  = EXCLUDED.price_usd,
        audit_hash = EXCLUDED.audit_hash
    WHERE settlement_ledger.audit_hash <> EXCLUDED.audit_hash;
"""

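The replay-is-a-no-op property can be exercised end to end without a database server. A minimal sketch, assuming the production ledger speaks PostgreSQL-style `ON CONFLICT`; SQLite (3.24+ for upsert support) stands in here so the block runs on the standard library, the table layout is inferred from the `UPSERT_SQL` column list, and Decimals are stored as TEXT to stay exact. `upsert` and `_hash` are hypothetical helpers.

```python
import hashlib
import sqlite3
from decimal import Decimal

DDL = """
CREATE TABLE settlement_ledger (
    trade_id TEXT NOT NULL,
    delivery_start TEXT NOT NULL,
    volume_mwh TEXT NOT NULL,   -- Decimal stored as text to stay exact
    price_usd TEXT NOT NULL,
    audit_hash TEXT NOT NULL,
    PRIMARY KEY (trade_id, delivery_start)
)
"""

UPSERT = """
INSERT INTO settlement_ledger (trade_id, delivery_start, volume_mwh, price_usd, audit_hash)
VALUES (:trade_id, :delivery_start, :volume_mwh, :price_usd, :audit_hash)
ON CONFLICT (trade_id, delivery_start) DO UPDATE
    SET volume_mwh = excluded.volume_mwh,
        price_usd  = excluded.price_usd,
        audit_hash = excluded.audit_hash
    WHERE settlement_ledger.audit_hash <> excluded.audit_hash
"""


def _hash(rec: dict) -> str:
    fields = ("trade_id", "delivery_start", "volume_mwh", "price_usd")
    return hashlib.sha256("|".join(str(rec[f]) for f in fields).encode()).hexdigest()


def upsert(conn: sqlite3.Connection, rec: dict) -> int:
    """Apply one record; return how many rows actually changed (0 on a replay)."""
    before = conn.total_changes
    row = {k: str(v) for k, v in rec.items()}
    row["audit_hash"] = _hash(rec)
    conn.execute(UPSERT, row)
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
rec = {"trade_id": "T1", "delivery_start": "2026-07-03T00:00:00Z",
       "volume_mwh": Decimal("10"), "price_usd": Decimal("42.10")}
print(upsert(conn, rec))                                    # first delivery: inserted
print(upsert(conn, rec))                                    # replay: no-op
print(upsert(conn, {**rec, "price_usd": Decimal("41.95")})) # price amendment: updated
```

When the WHERE guard is false, the conflicting insert behaves like `DO NOTHING`, which is why a replay leaves the change counter untouched.
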
Edge Cases & Failure Modes

The failure modes below are the ones that turn a “successful” sync into a silent settlement discrepancy. Each needs explicit handling code, not a bare try/except.

Negative prices. A negative locational marginal price is a legitimate outcome of congestion and oversupply, not a validation error. Bound only physically impossible values; never clamp a negative LMP to zero, or curtailment settlements vanish.
DST boundaries. The spring-forward gap and fall-back overlap mean a naive local-time key collides or goes missing. Always normalize to UTC on ingest (Step 4) and derive local delivery windows from the tz-aware column, never the reverse.
Zero-volume intervals. A 0 MWh leg is valid (a scheduled-but-not-delivered position) and must survive to reconciliation; a null volume is a defect and routes to the DLQ. Distinguish them explicitly.
Stale telemetry. A 200 OK that returns a cursor pointing at yesterday’s page indicates a lagging replica. Track the max delivery timestamp seen and alert when a batch regresses.
Schema drift. A vendor silently adds or renames a field on a minor release. The contract must reject drift in critical fields loudly rather than coerce it; version the model and pin the expected schema hash.

from decimal import Decimal


def classify_volume(value) -> str:
    if value is None:
        return "quarantine"          # missing volume is a defect
    if Decimal(str(value)) == Decimal("0"):
        return "accept_zero_leg"     # scheduled, not delivered — keep it
    return "accept"


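def is_price_plausible(price: Decimal) -> bool:
    # Admit negative LMPs; reject only non-finite / physically impossible values.
    # is_finite() must come first: ordering comparisons against Decimal NaN raise
    # InvalidOperation under the default context instead of returning False.
    floor, ceiling = Decimal("-1000"), Decimal("100000")  # $/MWh
    return price.is_finite() and floor <= price <= ceiling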
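The schema-drift item needs its own guard. Inside the pydantic model, `model_config = pydantic.ConfigDict(extra="forbid")` makes unknown keys a validation error; the standard-library sketch below instead pins a fingerprint of the payload's key set so drift is classified before validation. `CONTRACT_V3_KEYS`, `SchemaDriftError`, and `check_drift` are hypothetical, and treating added keys as reviewable while missing keys fail the batch is one reasonable policy, not the only one.

```python
import hashlib
from typing import Iterable, List, Mapping

# Hypothetical contract v3: the exact key set the model was built against.
CONTRACT_V3_KEYS = frozenset({
    "trade_id", "counterparty_lei", "product_code", "delivery_start",
    "delivery_end", "volume_mwh", "price_usd", "status",
})


def shape_hash(keys: Iterable[str]) -> str:
    """Fingerprint a payload's shape (sorted key names), ignoring values."""
    return hashlib.sha256("|".join(sorted(keys)).encode()).hexdigest()


PINNED_SHAPE_HASH = shape_hash(CONTRACT_V3_KEYS)  # recorded when v3 shipped


class SchemaDriftError(Exception):
    """Raised when a payload drops or renames a contract field."""


def check_drift(record: Mapping[str, object]) -> List[str]:
    """Return unexpected extra keys; raise if any contract key is missing."""
    if shape_hash(record) == PINNED_SHAPE_HASH:
        return []  # fast path: shape matches the pinned contract exactly
    keys = set(record)
    missing = CONTRACT_V3_KEYS - keys
    if missing:
        # A rename shows up as one missing key plus one unexpected key.
        raise SchemaDriftError(f"contract fields missing: {sorted(missing)}")
    return sorted(keys - CONTRACT_V3_KEYS)  # additive drift: log and review
```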

Threshold & Alerting Configuration

The client’s behavior under stress is governed by a handful of tunable parameters, and each degradation tier routes to a distinct escalation path so a transient blip does not page the on-call desk while a genuine outage does. These thresholds share their tuning philosophy with Threshold Tuning & Alerts on the settlement side.

Signal	Warning tier	Critical tier	Escalation route
Consecutive fetch failures	≥ 3	≥ 5 (breaker opens)	Slack `#ingestion` → PagerDuty
DLQ quarantine rate	> 1% of batch	> 5% of batch	Data-quality on-call
Batch delivery lag	> 15 min	> 60 min (miss window)	Settlement operations lead
`429` rate-limit ratio	> 10% of requests	> 25% of requests	Vendor liaison + throttle down

QUARANTINE_WARN, QUARANTINE_CRIT = 0.01, 0.05


def quarantine_alert_tier(valid: list, quarantined: list) -> str:
    total = len(valid) + len(quarantined)
    if total == 0:
        return "ok"
    rate = len(quarantined) / total
    if rate > QUARANTINE_CRIT:
        return "critical"
    if rate > QUARANTINE_WARN:
        return "warning"
    return "ok"
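The four rows of the table can be driven from one configuration map rather than one function per signal, as `quarantine_alert_tier` does for the DLQ rate. A sketch under stated assumptions: the signal keys and the `Threshold` tuple are hypothetical, the "≥" row uses an inclusive comparison and the ">" rows a strict one, and each route is kept as a label rather than wired to Slack or PagerDuty.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Threshold(NamedTuple):
    warn: float
    crit: float
    inclusive: bool  # True for ">=" tiers, False for strict ">" tiers
    route: str


# Limits transcribed from the table above; the signal keys are hypothetical.
THRESHOLDS: Dict[str, Threshold] = {
    "consecutive_fetch_failures": Threshold(3, 5, True, "Slack #ingestion -> PagerDuty"),
    "dlq_quarantine_rate": Threshold(0.01, 0.05, False, "Data-quality on-call"),
    "batch_delivery_lag_min": Threshold(15, 60, False, "Settlement operations lead"),
    "rate_limit_429_ratio": Threshold(0.10, 0.25, False, "Vendor liaison + throttle down"),
}


def _breached(value: float, limit: float, inclusive: bool) -> bool:
    return value >= limit if inclusive else value > limit


def evaluate(signal: str, value: float) -> Tuple[str, Optional[str]]:
    """Return (tier, escalation route) for one observed signal value."""
    t = THRESHOLDS[signal]
    if _breached(value, t.crit, t.inclusive):
        return "critical", t.route
    if _breached(value, t.warn, t.inclusive):
        return "warning", t.route
    return "ok", None
```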

Testing & Reconciliation Verification

Correctness is verified two ways: unit tests that pin each edge case, and a shadow calculation that re-derives settlement totals from the raw payloads independently of the production path, then diffs the two. A non-zero diff is a blocking gate before the ledger is published.

from decimal import Decimal


def test_negative_lmp_is_accepted():
    assert is_price_plausible(Decimal("-45.20")) is True


def test_null_volume_quarantined_zero_volume_kept():
    assert classify_volume(None) == "quarantine"
    assert classify_volume("0") == "accept_zero_leg"


def test_replay_is_idempotent():
    rec = {"trade_id": "T1", "delivery_start": "2026-07-03T00:00:00Z", "volume_mwh": "10"}
    assert audit_key(rec) == audit_key(dict(rec))  # same key ⇒ upsert no-op


def test_shadow_total_matches_ledger():
    # Re-sum volumes straight from validated payloads and compare against the
    # persisted ledger total (the literal below stands in for that query);
    # Decimal makes the equality exact.
    validated = [{"volume_mwh": Decimal("10.5")}, {"volume_mwh": Decimal("4.25")}]
    shadow = sum((r["volume_mwh"] for r in validated), Decimal("0"))
    assert shadow == Decimal("14.75")
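The test above pins only the arithmetic. A minimal sketch of the blocking gate itself, assuming raw payloads carry `delivery_start` as an ISO-8601 UTC string plus `volume_mwh` and `price_usd`, and that the ledger can be summarized as a per-cycle notional total; `shadow_totals`, `reconcile_gate`, and `ReconciliationBreak` are hypothetical names.

```python
from decimal import Decimal
from typing import Dict, Iterable, Mapping


class ReconciliationBreak(Exception):
    """Blocks ledger publication when shadow and production totals disagree."""


def shadow_totals(raw_payloads: Iterable[Mapping[str, object]]) -> Dict[str, Decimal]:
    """Re-derive notional per settlement cycle straight from raw payloads.

    Deliberately independent of the production normalize/persist path; the
    cycle is the "YYYY-MM" prefix of the delivery_start string.
    """
    totals: Dict[str, Decimal] = {}
    for p in raw_payloads:
        cycle = str(p["delivery_start"])[:7]
        notional = Decimal(str(p["volume_mwh"])) * Decimal(str(p["price_usd"]))
        totals[cycle] = totals.get(cycle, Decimal("0")) + notional
    return totals


def reconcile_gate(raw_payloads, ledger_totals: Mapping[str, Decimal]) -> None:
    """Raise if any cycle's shadow total differs from the ledger by any amount."""
    shadow = shadow_totals(raw_payloads)
    diffs = {
        cycle: shadow.get(cycle, Decimal("0")) - ledger_totals.get(cycle, Decimal("0"))
        for cycle in set(shadow) | set(ledger_totals)
    }
    breaks = {cycle: d for cycle, d in diffs.items() if d != 0}
    if breaks:
        raise ReconciliationBreak(f"shadow vs ledger diffs: {breaks}")
```

Because both sides are Decimal, `d != 0` is an exact test; with float totals the gate would need a tolerance, which could mask a real break.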

Regulatory Alignment and Operational Readiness

ETRM API integration is a compliance surface as much as an engineering one. The pipeline must retain immutable audit logs, preserve original payload hashes, and expose traceable lineage from trade capture to final settlement — the evidence a REMIT, EMIR, or FERC EQR examiner expects, and the data-integrity controls NERC CIP requires. Treating every endpoint as an unreliable distributed system — designing for partial failure, enforcing strict decimal-safe contracts, and stamping each record with a reproducible hash — is what lets a desk close deterministically even through a vendor outage. Downstream, the same guarantees feed the Settlement Calculation & Validation Engines, and the ledger keys align with Settlement Cycle Mapping so a synced batch lands in the correct operating-day bucket.

Frequently Asked Questions

Why guard fetches with a circuit breaker instead of just retrying harder?

Unbounded retries against a degraded gateway amplify the outage: every worker piles more load onto the failing endpoint and starves healthy ones. A circuit breaker fast-fails once a failure threshold is crossed, sheds load during the vendor’s recovery window, and probes with a single half-open request before resuming — protecting downstream settlement engines from cascading timeouts.

Should monetary and volume fields be parsed as float or Decimal?

Always Decimal. Binary float cannot represent most decimal fractions exactly, so float64 settlement arithmetic accumulates rounding error that surfaces as sub-cent invoice mismatches at month-end. Coerce vendor values via Decimal(str(v)) on ingest and keep those columns as object dtype through the pandas stage.

How does the pipeline avoid double-booking a re-delivered trade?

Each record carries a content hash over its normalized fields, and persistence upserts on the composite (trade_id, delivery_start) key. A replayed page or a retried request hits the same key with the same hash, so the write is a no-op instead of a phantom leg or a false tolerance breach.

Is a negative LMP a validation error?

No. Negative locational marginal prices are legitimate results of congestion and oversupply; rejecting them silently drops real curtailment settlements. Validation should admit negative prices and reject only non-finite or physically impossible magnitudes.
