ISO-NE vs CAISO reporting schema differences

Reconciling settlements across North American wholesale power markets requires deterministic parsers that explicitly account for schema differences between independent system operators (ISOs) and regional transmission organizations (RTOs). ISO New England (an RTO) and the California Independent System Operator (an ISO) both publish locational marginal prices (LMPs), congestion components, and metered generation data, but their reporting schemas diverge in timestamp conventions, interval granularity, node hierarchies, and delivery formats. These differences routinely cause silent misalignments in financial settlement calculations, particularly when Python automation pipelines assume uniform column naming, implicit timezone handling, or homogeneous interval boundaries. A fault-tolerant ingestion layer requires strict adherence to ISO/RTO Data Format Standards so that it survives API version shifts, market rule changes, and regulatory audits.

The diagram below maps the four key schema divergences between the two markets and how each side normalizes into a single canonical settlement model.

flowchart LR
    subgraph NE["ISO-NE"]
        NE1["Eastern Time<br/>file-level metadata"]
        NE2["Hourly DA, 5-min RT"]
        NE3["Load Zones, Hubs"]
        NE4["Energy, Congestion, Loss"]
    end
    subgraph CA["CAISO"]
        CA1["Pacific Time<br/>Start plus End columns"]
        CA2["15-min FMM, 5-min RT"]
        CA3["PNode, APNode, Hubs"]
        CA4["MEC, MCC, MCL"]
    end
    NE1 --> CANON["Canonical model<br/>UTC anchored, volume-weighted"]
    CA1 --> CANON
    NE2 --> CANON
    CA2 --> CANON
    NE3 --> CANON
    CA3 --> CANON
    NE4 --> CANON
    CA4 --> CANON

Temporal Conventions and Interval Alignment Failures

ISO-NE and CAISO handle temporal metadata differently, which directly affects pandas resampling logic and merge_asof reconciliation workflows. ISO-NE Day-Ahead (DA) and Real-Time (RT) LMP files typically publish Eastern Time intervals with an explicit Interval Start column formatted as YYYY-MM-DD HH:MM:SS. The timezone is rarely embedded in the string itself; it is instead governed by file-level metadata or FTP directory conventions. CAISO publishes Pacific Time intervals and routinely includes both Interval Start and Interval End columns. More critically, CAISO's Fifteen-Minute Market (FMM) publishes 15-minute intervals alongside 5-minute Real-Time intervals, so the same hour is covered by overlapping timestamp windows that break naive hourly aggregation.

The most frequent reconciliation failure occurs when analysts call pd.to_datetime() without anchoring the result to a timezone. Across a Daylight Saving Time transition, this can silently shift settlement boundaries by one hour, violating FERC audit requirements. Because the published strings are naive local wall-clock times, the production-safe pattern is to localize them explicitly to the market's own timezone, handle ambiguous and nonexistent DST timestamps deliberately, and convert to UTC only afterwards when joining across markets, all before any temporal resampling. The pandas Time Series Documentation describes the underlying tz_localize and tz_convert behavior.

import pandas as pd
from zoneinfo import ZoneInfo
from typing import Literal

def normalize_interval_timestamps(
    df: pd.DataFrame,
    tz_str: Literal["US/Eastern", "US/Pacific"]
) -> pd.DataFrame:
    """
    Safely normalizes ISO-NE/CAISO interval timestamps to explicit regional timezones.
    Prevents DST-induced settlement boundary drift.
    """
    df = df.copy()
    cols = ["Interval Start", "Interval End"]
    for col in cols:
        if col in df.columns:
            # The source strings are naive local wall-clock times, so localize
            # them to the market timezone instead of parsing them as UTC.
            # Ambiguous (fall-back) and nonexistent (spring-forward) stamps
            # become NaT so they are resolved explicitly rather than guessed.
            df[col] = pd.to_datetime(df[col]).dt.tz_localize(
                ZoneInfo(tz_str), ambiguous="NaT", nonexistent="NaT"
            )
    return df

def align_to_fifteen_min_intervals(df: pd.DataFrame) -> pd.DataFrame:
    """
    Aggregates 5-minute RT data into 15-minute FMM-aligned buckets using
    MWh-weighted prices. Settlement value, not a simple price average, is the
    correct basis for netting, so each price component is weighted by interval MWh.
    Closing on the left boundary avoids look-ahead bias.
    """
    df = df.copy()
    price_cols = ["LMP", "Energy", "Congestion", "Loss"]
    # Weight each price component by interval volume so the bucket average
    # reflects settlement dollars rather than an unweighted price mean.
    for col in price_cols:
        df[f"_{col}_x_mw"] = df[col] * df["MW"]

    grouped = df.groupby(
        pd.Grouper(key="Interval Start", freq="15min", closed="left", label="left")
    )
    agg = grouped[[f"_{c}_x_mw" for c in price_cols] + ["MW"]].sum()

    out = pd.DataFrame(index=agg.index)
    for col in price_cols:
        # Guard against divide-by-zero in zero-MW (e.g. fully curtailed) buckets.
        out[col] = (agg[f"_{col}_x_mw"] / agg["MW"]).where(agg["MW"] != 0)
    out["MW"] = agg["MW"]
    return out.reset_index()
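
Returning to the timezone handling discussed earlier, the short sketch below shows what explicit localization does across a fall-back transition. The sample timestamps and the America/New_York zone name are illustrative assumptions, and ambiguous="infer" only works when a single series is sorted in publication order; real feeds may need a different disambiguation rule.

```python
import pandas as pd

# Hypothetical hourly wall-clock stamps across the 2024-11-03 fall-back in
# US Eastern time: 01:00 appears twice (first in EDT, then in EST).
naive = pd.Series(pd.to_datetime([
    "2024-11-03 00:00:00",
    "2024-11-03 01:00:00",
    "2024-11-03 01:00:00",
    "2024-11-03 02:00:00",
]))

# ambiguous="infer" relies on row order to decide which 01:00 is daylight
# time and which is standard time.
local = naive.dt.tz_localize("America/New_York", ambiguous="infer")
utc = local.dt.tz_convert("UTC")

# The naive series contains a duplicate key; the UTC series does not, and
# consecutive intervals are exactly one hour apart.
has_naive_duplicates = bool(naive.duplicated().any())
has_utc_duplicates = bool(utc.duplicated().any())
gaps = utc.diff().dropna()
```

Joining on the UTC column rather than the naive one keeps the repeated local hour from collapsing two distinct settlement intervals into one.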

When reconciling ISO-NE hourly DA prices against CAISO 15-minute FMM data, never apply resample('15min').mean() directly to raw LMPs: an unweighted mean ignores the volume behind each interval. Settlement rules call for volume-weighted averaging or an explicit interval mapping based on official market clearing timestamps. Always validate that the resulting temporal index matches the published market run schedule before proceeding to financial netting.
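
One possible form of the explicit interval mapping mentioned above is to attach each hourly DA price to the four 15-minute intervals it covers, joining on a shared UTC key. The sketch below is only an illustration: the column names (interval_start_utc, da_lmp, fmm_lmp) and the prices are hypothetical, and it does not encode either operator's settlement rules.

```python
import pandas as pd

def map_hourly_to_fifteen_min(da: pd.DataFrame, fmm: pd.DataFrame) -> pd.DataFrame:
    """Attach the hourly DA price covering each 15-minute FMM interval."""
    fmm = fmm.copy()
    # Floor each 15-minute start to its containing hour. Doing this in UTC
    # avoids DST jumps in the join key.
    fmm["hour_start_utc"] = fmm["interval_start_utc"].dt.floor("60min")
    da = da.rename(columns={"interval_start_utc": "hour_start_utc"})
    # validate="m:1" raises if the hourly side contains duplicate hours,
    # which would otherwise silently duplicate FMM rows.
    return fmm.merge(da, on="hour_start_utc", how="left", validate="m:1")

# Made-up sample data: two DA hours and the eight FMM intervals they cover.
da = pd.DataFrame({
    "interval_start_utc": pd.to_datetime(
        ["2024-06-01 17:00", "2024-06-01 18:00"], utc=True
    ),
    "da_lmp": [40.0, 55.0],
})
fmm = pd.DataFrame({
    "interval_start_utc": pd.date_range(
        "2024-06-01 17:00", periods=8, freq="15min", tz="UTC"
    ),
    "fmm_lmp": [38.0, 41.0, 43.0, 39.0, 52.0, 57.0, 60.0, 54.0],
})
mapped = map_hourly_to_fifteen_min(da, fmm)
```

A left join keeps every FMM interval, so a missing DA hour surfaces as a null da_lmp instead of a dropped row.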

Node Topology and Pricing Point Mapping

Pricing point hierarchies represent the most complex schema divergence between the two markets. ISO-NE structures settlement around Load Zones, Interface Points, and specific Hub/Load nodes, typically identified via a Node or Location column. CAISO utilizes a highly granular PNode/APNode architecture, where physical nodes are aggregated into Trading Hubs and Congestion Zones for financial settlement. Column naming conventions are rarely consistent across market runs, and late-binding node reclassifications frequently break static lookup tables.

A robust reconciliation pipeline must decouple raw node identifiers from financial settlement zones through a deterministic mapping layer. This requires maintaining a version-controlled taxonomy registry that tracks node lifecycle events, retirements, and zone boundary adjustments. Implementing this mapping strategy aligns directly with the architectural principles outlined in Core Architecture & Market Taxonomy for Energy Settlements.

from pydantic import BaseModel, Field, ValidationError
from typing import Optional

class SettlementNodeSchema(BaseModel):
    """Strict schema validation for cross-market node mapping."""
    raw_node_id: str
    market: str = Field(pattern="^(ISO-NE|CAISO)$")
    pricing_point_type: str = Field(pattern="^(Hub|Zone|Interface|PNode|APNode)$")
    settlement_zone: str
    effective_date: str
    is_active: bool = True

def validate_and_map_nodes(raw_df: pd.DataFrame, mapping_df: pd.DataFrame) -> pd.DataFrame:
    """
    Validates raw node IDs against the active taxonomy registry.
    Rows whose node is unmapped or retired are excluded from the result.
    Callers should compare row counts (or log the excluded IDs) so that
    the exclusion itself never becomes silent settlement leakage.
    """
    # Only active registry entries are eligible for mapping.
    active = mapping_df[mapping_df["is_active"]]
    merged = raw_df.merge(
        active,
        left_on="Node",
        right_on="raw_node_id",
        how="inner"
    )
    return merged.drop(columns=["raw_node_id"])
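
An inner join discards unmatched rows without a trace, so it can help to surface them for review. The variant below is a minimal sketch under stated assumptions: the function name split_mapped_and_unmapped, the node IDs, and the zone labels are all hypothetical. It uses the indicator option of DataFrame.merge to separate mapped rows from unmapped or retired ones.

```python
import pandas as pd

def split_mapped_and_unmapped(
    raw_df: pd.DataFrame, mapping_df: pd.DataFrame
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (mapped rows, rows with no active registry entry)."""
    active = mapping_df[mapping_df["is_active"]]
    merged = raw_df.merge(
        active, left_on="Node", right_on="raw_node_id",
        how="left", indicator=True,
    )
    # "left_only" marks raw rows that found no active mapping.
    unmapped = merged.loc[merged["_merge"] == "left_only", raw_df.columns]
    mapped = merged.loc[merged["_merge"] == "both"].drop(
        columns=["raw_node_id", "_merge"]
    )
    return mapped, unmapped

# Made-up sample data: NODE_B is retired and NODE_X is absent from the registry.
raw = pd.DataFrame({"Node": ["NODE_A", "NODE_B", "NODE_X"], "LMP": [30.0, 31.0, 32.0]})
registry = pd.DataFrame({
    "raw_node_id": ["NODE_A", "NODE_B"],
    "settlement_zone": ["ZONE_1", "ZONE_2"],
    "is_active": [True, False],
})
mapped, unmapped = split_mapped_and_unmapped(raw, registry)
```

The unmapped frame can then be written to an exceptions log, which keeps the excluded volume visible to settlement analysts.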

Component Decomposition and Column Semantics

LMP component decomposition follows divergent naming and null-value conventions. ISO-NE typically publishes Energy, Congestion, and Loss components as explicit decimal columns. CAISO uses the MEC (Marginal Energy Component), MCC (Marginal Congestion Component), and MCL (Marginal Loss Component) nomenclature. Furthermore, missing or suppressed values are handled differently: ISO-NE often uses empty strings or 0.0, while CAISO may publish -999, NaN, or omit rows entirely for constrained intervals.

Automated reconciliation pipelines must normalize these semantic differences before performing component summation checks (LMP == Energy + Congestion + Loss). A defensive parsing strategy casts each component to a nullable float dtype (pandas Float64), replaces sentinel values with pd.NA, and enforces component integrity checks before any financial aggregation.

def normalize_lmp_components(df: pd.DataFrame, market: str) -> pd.DataFrame:
    """
    Standardizes LMP component columns and enforces additive integrity.
    """
    df = df.copy()
    if market == "CAISO":
        df = df.rename(columns={"MEC": "Energy", "MCC": "Congestion", "MCL": "Loss"})

    component_cols = ["Energy", "Congestion", "Loss"]
    # Replace both string and numeric sentinels with proper NA. Cast to numeric
    # first so a numeric -999 (CAISO) is caught as well as string forms, then
    # use nullable Float64 so missing intervals propagate as NA rather than 0.
    for col in component_cols + ["LMP"]:
        df[col] = pd.to_numeric(
            df[col].replace(["NULL", ""], pd.NA), errors="coerce"
        ).astype("Float64")
    df[component_cols] = df[component_cols].mask(df[component_cols] == -999, pd.NA)

    # Verify LMP decomposition: LMP == Energy + Congestion + Loss.
    df["Reconstructed_LMP"] = df["Energy"] + df["Congestion"] + df["Loss"]
    tolerance = 0.001
    df["Component_Variance"] = (df["LMP"] - df["Reconstructed_LMP"]).abs()

    # Flag rows exceeding tolerance for manual settlement review. NA variances
    # (from suppressed components) are treated as failures, not silently passed.
    df["Audit_Flag"] = ~(df["Component_Variance"] <= tolerance).fillna(False)
    return df

Delivery Mechanisms and Pipeline Resilience

Data delivery architectures further complicate cross-market automation. ISO-NE historically distributes settlement files via SFTP in compressed CSV or fixed-width formats, with explicit revision markers appended to filenames (e.g., _REV1, _FINAL). CAISO relies on the CAISO Market Data & OASIS Portal, exposing data through RESTful endpoints, XML/CSV payloads, and increasingly structured JSON schemas. Late data injections, provisional-to-final settlement updates, and market run reschedules are common in both jurisdictions.
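
Revision markers such as _REV1 and _FINAL can be resolved programmatically before ingestion. The sketch below assumes a hypothetical filename layout of <base>_REV<n>.csv or <base>_FINAL.csv and treats FINAL as outranking any numbered revision; actual naming conventions should be confirmed against the operator's documentation.

```python
import re
from typing import Optional

# Hypothetical layout: <base>_REV<n>.csv or <base>_FINAL.csv
_REVISION_RE = re.compile(r"^(?P<base>.+?)_(?P<rev>REV(?P<n>\d+)|FINAL)\.csv$")

def revision_rank(filename: str) -> Optional[tuple[str, float]]:
    """Return (base name, rank), or None if the name does not match."""
    match = _REVISION_RE.match(filename)
    if match is None:
        return None
    # Assumption: FINAL supersedes every numbered revision.
    if match.group("rev") == "FINAL":
        rank = float("inf")
    else:
        rank = float(match.group("n"))
    return match.group("base"), rank

def latest_revisions(filenames: list[str]) -> dict[str, str]:
    """Pick the highest-ranked file for each base name."""
    best: dict[str, tuple[float, str]] = {}
    for name in filenames:
        parsed = revision_rank(name)
        if parsed is None:
            continue  # unrecognized names are skipped, not guessed at
        base, rank = parsed
        if base not in best or rank > best[base][0]:
            best[base] = (rank, name)
    return {base: name for base, (_, name) in best.items()}

# Made-up filenames for illustration.
files = [
    "da_lmp_20240601_REV1.csv",
    "da_lmp_20240601_REV2.csv",
    "da_lmp_20240602_REV1.csv",
    "da_lmp_20240602_FINAL.csv",
    "readme.txt",
]
selected = latest_revisions(files)
```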

Production-grade pipelines must implement idempotent ingestion, versioned data partitioning, and explicit audit trails. Every file drop should be hashed, logged, and stored in a raw landing zone before transformation. Settlement analysts require deterministic replay capabilities to validate financial positions against historical market runs. By enforcing strict schema contracts, explicit timezone localization, and volume-weighted temporal alignment, automation builders can eliminate silent reconciliation drift and maintain strict regulatory compliance across multi-market portfolios.
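
The hash-and-land step described above might look like the following minimal sketch. The function names, the JSON manifest format, and the landing directory layout are assumptions made for illustration; a production pipeline would more likely record hashes in a database or object-store metadata.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large settlement files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def land_file(src: Path, landing_dir: Path, manifest_path: Path) -> bool:
    """Copy src into the raw landing zone once; return False on a repeat drop."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    file_hash = sha256_of_file(src)
    if file_hash in manifest:
        return False  # identical content already landed: idempotent no-op
    landing_dir.mkdir(parents=True, exist_ok=True)
    dest = landing_dir / f"{file_hash[:12]}_{src.name}"
    dest.write_bytes(src.read_bytes())
    # The manifest doubles as a simple audit trail keyed by content hash.
    manifest[file_hash] = {"source_name": src.name, "landed_as": dest.name}
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return True

# Demonstration with a throwaway directory and a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sample = root / "sample_REV1.csv"
    sample.write_text("Interval Start,LMP\n2024-06-01 00:00:00,41.5\n")
    first = land_file(sample, root / "landing", root / "manifest.json")
    second = land_file(sample, root / "landing", root / "manifest.json")
    landed_count = len(list((root / "landing").iterdir()))
```

Keying the landing zone by content hash means a re-delivered file is recognized even if its name changes, while a revised file with new content lands as a separate, replayable object.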