ISO-NE vs CAISO Reporting Schema Differences

A cross-market portfolio nets to zero on paper, but the T+4 statement disagrees by a few dollars per node. The cause: an ingestion job aligned CAISO’s 5-minute Real-Time prices into 15-minute buckets with a plain mean() while ISO-NE’s hourly Day-Ahead prices stayed untouched. Two different interval grids were collapsed with two different rules, and the settlement value drifted. Within the ISO/RTO Data Format Standards component of the Core Architecture & Market Taxonomy for Energy Settlements framework, this page isolates the four schema divergences that break naive cross-market reconciliation between ISO New England (an RTO) and the California Independent System Operator (an ISO), and gives deterministic Python that folds both feeds into a single canonical settlement model: UTC-anchored, volume-weighted, and safe to post.

Both markets clear locational marginal prices (LMPs), congestion components, and metered generation, but their reporting schemas diverge in timestamp conventions, interval granularity, node hierarchies, and delivery formats. Those structural gaps trigger silent misalignments whenever a pipeline assumes uniform column naming, implicit timezone handling, or homogeneous interval boundaries.

The diagram below maps the four key schema divergences between the two markets and how each side normalizes into a single canonical settlement model.

The four divergences, side by side:

| Schema dimension | ISO-NE | CAISO |
| --- | --- | --- |
| Timezone encoding | Eastern Time, tz in file-level metadata / FTP path | Pacific Time, often with explicit `Interval Start` + `Interval End` |
| Interval granularity | Hourly Day-Ahead, 5-minute Real-Time | 15-minute FMM coexisting with 5-minute Real-Time |
| Pricing point model | Load Zones, Interface Points, Hub/Load nodes | PNode / APNode aggregated into Trading Hubs, Congestion Zones |
| Component labels | `Energy` / `Congestion` / `Loss` | `MEC` / `MCC` / `MCL` |
| Suppressed values | empty string or `0.0` | `-999`, `NaN`, or row omitted |
| Delivery | SFTP, compressed CSV / fixed-width, `_REV1` / `_FINAL` markers | OASIS REST, XML / CSV / JSON payloads |

Prerequisites

Python 3.11+; zoneinfo has shipped in the standard library since Python 3.9, so no pytz dependency is needed.
pandas 2.x for Grouper, merge_asof, and nullable Float64.
pydantic 2.x for the node mapping contract.
Data dependencies: an ISO-NE DA/RT LMP export (SFTP pull) and a CAISO OASIS LMP report, plus a version-controlled node taxonomy registry keyed by market and effective date.
Access: ISO-NE SMD/FTP credentials for the settlement file drop, and a CAISO OASIS query URL (no key required for public LMP reports). The node registry is maintained by the Schema Validation Frameworks component, which owns the contract every raw row is checked against before this normalization runs.

Divergence 1 — Temporal Conventions and Interval Alignment

ISO-NE and CAISO encode interval time differently, and the difference directly affects pandas resampling logic and merge_asof reconciliation. ISO-NE Day-Ahead (DA) and Real-Time (RT) LMP files publish Eastern Time intervals with an explicit Interval Start column formatted as YYYY-MM-DD HH:MM:SS; the timezone is rarely embedded in the string and is instead governed by file-level metadata or FTP directory conventions. CAISO publishes Pacific Time intervals and routinely includes both Interval Start and Interval End columns. More critically, CAISO’s Fifteen-Minute Market (FMM) introduces 15-minute granularity that coexists with 5-minute Real-Time intervals, creating overlapping timestamp windows that break naive hourly aggregation.

The most frequent reconciliation failure occurs when analysts apply pd.to_datetime() without explicit timezone anchoring. Daylight Saving Time transitions can then silently shift settlement boundaries by one hour, violating FERC audit requirements. The production-safe pattern parses as UTC first, then converts to the regional zone before any temporal resampling, as detailed in the pandas Time Series Documentation. This holds when the source value is a UTC instant or carries an explicit offset; with utc=True, pandas reads an offset-free string as UTC, so a naive local wall-clock string must instead be localized to its zone with explicit handling of the ambiguous fall-back hour. When collapsing 5-minute Real-Time prices into 15-minute FMM-aligned buckets, the bucket value is the MWh-weighted mean of each price component, not an unweighted average:

$$\bar{P}_{b} = \frac{\sum_{i \in b} P_i \cdot MW_i}{\sum_{i \in b} MW_i}$$

where $b$ is the 15-minute bucket, $P_i$ is a price component in interval $i$, and $MW_i$ is that interval’s dispatched volume.

import pandas as pd
from zoneinfo import ZoneInfo
from typing import Literal


def normalize_interval_timestamps(
    lmp_df: pd.DataFrame,
    tz_str: Literal["US/Eastern", "US/Pacific"],
) -> pd.DataFrame:
    """
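    Normalize ISO-NE / CAISO interval timestamps to an explicit regional zone.
    Parsing as UTC first avoids ambiguous DST folding on the fall-back hour and
    prevents DST-induced settlement boundary drift.

    Assumes the source values are UTC instants or carry an explicit offset:
    with utc=True, pandas treats offset-free strings as UTC, not local time.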
    """
    lmp_df = lmp_df.copy()
    for col in ("Interval Start", "Interval End"):
        if col in lmp_df.columns:
            lmp_df[col] = pd.to_datetime(lmp_df[col], utc=True).dt.tz_convert(
                ZoneInfo(tz_str)
            )
    return lmp_df


def align_to_fifteen_min_intervals(lmp_df: pd.DataFrame) -> pd.DataFrame:
    """
    Aggregate 5-minute RT data into 15-minute FMM-aligned buckets using
    MWh-weighted prices. Settlement value, not a simple price average, is the
    correct basis for netting, so each price component is weighted by interval
    MWh. Closing on the left boundary avoids look-ahead bias.
    """
    lmp_df = lmp_df.copy()
    price_cols = ["LMP", "Energy", "Congestion", "Loss"]
    # Weight each price component by interval volume so the bucket mean reflects
    # settlement dollars rather than an unweighted price average.
    for col in price_cols:
        lmp_df[f"_{col}_x_mw"] = lmp_df[col] * lmp_df["MW"]

    grouped = lmp_df.groupby(
        pd.Grouper(key="Interval Start", freq="15min", closed="left", label="left")
    )
    agg = grouped[[f"_{c}_x_mw" for c in price_cols] + ["MW"]].sum()

    settlement_interval = pd.DataFrame(index=agg.index)
    for col in price_cols:
        # Guard against divide-by-zero in zero-MW (fully curtailed) buckets.
        settlement_interval[col] = (
            agg[f"_{col}_x_mw"] / agg["MW"]
        ).where(agg["MW"] != 0)
    settlement_interval["MW"] = agg["MW"]
    return settlement_interval.reset_index()

When reconciling ISO-NE hourly DA against CAISO 15-minute FMM data, never apply resample('15min').mean() directly to raw LMPs. Settlement rules mandate volume-weighted averaging or explicit interval mapping against official market clearing timestamps. The delivery_date and interval boundaries this alignment produces are the same grid the Settlement Cycle Mapping engine keys on, so a misaligned index here corrupts every downstream cycle.
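
The UTC-first pattern above relies on the source value being a UTC instant or carrying an offset. As a minimal sketch, assuming instead that a feed delivers naive local wall-clock strings, pandas' `tz_localize` can attach the regional zone while refusing to guess at the fall-back hour. The `localize_naive_wall_clock` helper and the sample timestamps are hypothetical, not part of either market's schema.

```python
import pandas as pd


def localize_naive_wall_clock(stamps: pd.Series, tz_str: str) -> pd.Series:
    """Attach a zone to naive local timestamps, then express them in UTC.

    ambiguous="raise" makes the repeated fall-back hour fail loudly instead of
    being guessed; nonexistent="raise" does the same for the skipped
    spring-forward hour.
    """
    local = pd.to_datetime(stamps).dt.tz_localize(
        tz_str, ambiguous="raise", nonexistent="raise"
    )
    return local.dt.tz_convert("UTC")


# Hypothetical naive Eastern strings on an ordinary winter day (UTC-5).
winter = pd.Series(["2024-01-15 00:00:00", "2024-01-15 01:00:00"])
utc = localize_naive_wall_clock(winter, "US/Eastern")

# 01:30 occurs twice on the 2024-11-03 fall-back day, so it cannot be placed
# without extra information from the feed (for example a DST flag column).
fall_back = pd.Series(["2024-11-03 01:30:00"])
try:
    localize_naive_wall_clock(fall_back, "US/Eastern")
    ambiguous_detected = False
except Exception:
    # The exception class differs between pandas versions, so this sketch
    # catches broadly rather than naming one.
    ambiguous_detected = True
```

Where a feed does supply a daylight-time indicator, `ambiguous` also accepts a boolean array marking which rows fall in daylight time, which resolves the repeated hour without guessing.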

Divergence 2 — Node Topology and Pricing Point Mapping

Pricing point hierarchies represent the most complex schema divergence between the two markets. ISO-NE structures settlement around Load Zones, Interface Points, and specific Hub/Load nodes, identified via a Node or Location column. CAISO uses a highly granular PNode/APNode architecture, where physical nodes are aggregated into Trading Hubs and Congestion Zones for financial settlement. Column naming conventions are rarely consistent across market runs, and late-binding node reclassifications frequently break static lookup tables.

A robust reconciliation pipeline decouples raw node identifiers from financial settlement zones through a deterministic mapping layer. This requires a version-controlled taxonomy registry that tracks node lifecycle events, retirements, and zone boundary adjustments — the same discipline the Loss Factor Mapping Strategies engine relies on to attach a loss factor to each metered node.

from pathlib import Path

import pandas as pd
from pydantic import BaseModel, Field


class SettlementNodeSchema(BaseModel):
    """Strict schema validation for cross-market node mapping."""

    raw_node_id: str
    market: str = Field(pattern="^(ISO-NE|CAISO)$")
    pricing_point_type: str = Field(pattern="^(Hub|Zone|Interface|PNode|APNode)$")
    settlement_zone: str
    effective_date: str
    is_active: bool = True


def validate_and_map_nodes(
    raw_df: pd.DataFrame, mapping_df: pd.DataFrame
) -> pd.DataFrame:
    """
    Validate raw node IDs against the active taxonomy registry.
    Drops unmapped or retired nodes to prevent silent settlement leakage;
    the dropped set is logged so an auditor can see what was excluded and why.
    """
    active = mapping_df[mapping_df["is_active"] == True]
    merged = raw_df.merge(
        active,
        left_on="Node",
        right_on="raw_node_id",
        how="inner",
    )
    dropped = raw_df[~raw_df["Node"].isin(active["raw_node_id"])]
    if not dropped.empty:
        audit_path = Path("audit/unmapped_nodes.parquet")
        # to_parquet does not create missing directories, so make sure the
        # audit folder exists before writing.
        audit_path.parent.mkdir(parents=True, exist_ok=True)
        dropped.to_parquet(audit_path, index=False)
    return merged.drop(columns=["raw_node_id"])
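
Because the registry is keyed by market and effective date, a lookup should pick the mapping in force on each row's delivery date rather than the latest one. The following is a minimal sketch of that point-in-time join using `pandas.merge_asof`. The `map_nodes_as_of` helper, the `delivery_date` column, the node ID, and the zone names are assumptions for illustration, not values from either market.

```python
import pandas as pd


def map_nodes_as_of(raw_df: pd.DataFrame, registry_df: pd.DataFrame) -> pd.DataFrame:
    """Attach the settlement zone that was effective on each delivery date."""
    # merge_asof requires both frames to be sorted on their time keys.
    raw = raw_df.sort_values("delivery_date")
    registry = registry_df.sort_values("effective_date")
    # direction="backward" selects, per node, the most recent registry row
    # whose effective_date is on or before the delivery date.
    return pd.merge_asof(
        raw,
        registry,
        left_on="delivery_date",
        right_on="effective_date",
        left_by="Node",
        right_by="raw_node_id",
        direction="backward",
    )


# Hypothetical registry: NODE_A moves from ZONE_1 to ZONE_2 on 2024-06-01.
registry_df = pd.DataFrame(
    {
        "raw_node_id": ["NODE_A", "NODE_A"],
        "settlement_zone": ["ZONE_1", "ZONE_2"],
        "effective_date": pd.to_datetime(["2024-01-01", "2024-06-01"]),
    }
)
raw_df = pd.DataFrame(
    {
        "Node": ["NODE_A", "NODE_A"],
        "delivery_date": pd.to_datetime(["2024-05-31", "2024-06-01"]),
        "LMP": [31.2, 44.7],
    }
)
mapped = map_nodes_as_of(raw_df, registry_df)
```

Rows dated before a node's first effective date come back with a missing `settlement_zone`, so they can be routed to the same audit output as unmapped nodes instead of inheriting a later mapping.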

Divergence 3 — Component Decomposition and Column Semantics

LMP component decomposition follows divergent naming and null-value conventions. ISO-NE publishes Energy, Congestion, and Loss as explicit decimal columns. CAISO uses MEC (Marginal Energy Component), MCC (Marginal Congestion Component), and MCL (Marginal Loss Component). Both markets satisfy the same identity — the nodal price is the sum of its three components — but the labels and the way missing data is encoded differ sharply:

$$LMP_n = \text{MEC}_n + \text{MCC}_n + \text{MCL}_n = E_n + C_n + L_n$$

| Component | ISO-NE label | CAISO label | Typical suppressed encoding |
| --- | --- | --- | --- |
| Energy | `Energy` | `MEC` | ISO-NE: `0.0` / empty · CAISO: `-999` |
| Congestion | `Congestion` | `MCC` | ISO-NE: `0.0` / empty · CAISO: `NaN` |
| Loss | `Loss` | `MCL` | ISO-NE: `0.0` / empty · CAISO: row omitted |

Automated pipelines must normalize these semantics before performing the additive integrity check LMP == Energy + Congestion + Loss. A defensive parser casts to numeric, replaces sentinel values with pd.NA, and enforces component integrity prior to financial aggregation — the identical decomposition the Pricing Logic Implementation engine validates before a charge is priced.

import pandas as pd


def normalize_lmp_components(lmp_df: pd.DataFrame, market: str) -> pd.DataFrame:
    """
    Standardize LMP component columns across ISO-NE and CAISO and enforce
    additive integrity (LMP == Energy + Congestion + Loss).
    """
    lmp_df = lmp_df.copy()
    if market == "CAISO":
        lmp_df = lmp_df.rename(
            columns={"MEC": "Energy", "MCC": "Congestion", "MCL": "Loss"}
        )

    component_cols = ["Energy", "Congestion", "Loss"]
    # Cast to numeric first so a numeric -999 (CAISO) is caught alongside string
    # forms, then use nullable Float64 so missing intervals propagate as NA
    # rather than collapsing to 0.
    for col in component_cols + ["LMP"]:
        lmp_df[col] = pd.to_numeric(
            lmp_df[col].replace(["NULL", ""], pd.NA), errors="coerce"
        ).astype("Float64")
    lmp_df[component_cols] = lmp_df[component_cols].mask(
        lmp_df[component_cols] == -999, pd.NA
    )

    reconstructed_lmp = lmp_df[component_cols].sum(axis=1, skipna=False)
    tolerance = 0.001
    component_variance = (lmp_df["LMP"] - reconstructed_lmp).abs()

    # Flag rows exceeding tolerance for manual settlement review. NA variances
    # (from suppressed components) are treated as failures, never silently passed.
    lmp_df["Reconstructed_LMP"] = reconstructed_lmp
    lmp_df["Component_Variance"] = component_variance
    lmp_df["Audit_Flag"] = (component_variance <= tolerance).fillna(False).eq(False)
    return lmp_df
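
The third suppressed encoding, an omitted row, leaves nothing for a sentinel replacement to catch. One way to surface it, sketched here under the assumption of a single node on a regular 5-minute grid, is to compare the received interval starts against the expected grid. The `find_missing_intervals` helper and the sample timestamps are hypothetical.

```python
import pandas as pd


def find_missing_intervals(
    starts: pd.Series,
    window_start: pd.Timestamp,
    window_end: pd.Timestamp,
    freq: str = "5min",
) -> pd.DatetimeIndex:
    """Return expected interval starts in [window_start, window_end) that never arrived."""
    expected = pd.date_range(window_start, window_end, freq=freq, inclusive="left")
    return expected.difference(pd.DatetimeIndex(starts))


# Hypothetical UTC interval starts for a 20-minute window, with 00:10 omitted.
received = pd.Series(
    pd.to_datetime(
        ["2024-01-15 00:00", "2024-01-15 00:05", "2024-01-15 00:15"], utc=True
    )
)
missing = find_missing_intervals(
    received,
    pd.Timestamp("2024-01-15 00:00", tz="UTC"),
    pd.Timestamp("2024-01-15 00:20", tz="UTC"),
)
```

Running the comparison on UTC timestamps keeps the expected grid regular across a DST transition; any interval it returns can be flagged for review alongside rows whose `Audit_Flag` is set.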

Divergence 4 — Delivery Mechanisms and Pipeline Resilience

Data delivery architectures further complicate cross-market automation. ISO-NE historically distributes settlement files via SFTP in compressed CSV or fixed-width formats, with explicit revision markers appended to filenames (for example _REV1, _FINAL). CAISO relies on the CAISO Market Data & OASIS Portal, exposing data through RESTful endpoints, XML/CSV payloads, and increasingly structured JSON. Late data injections, provisional-to-final settlement updates, and market run reschedules are common in both jurisdictions.

Production-grade pipelines implement idempotent ingestion, versioned data partitioning, and explicit audit trails. Every file drop is hashed, logged, and stored in a raw landing zone before transformation, so settlement analysts retain deterministic replay against historical market runs.

import hashlib
from pathlib import Path


def land_raw_file(payload: bytes, source: str, run_ts: str, landing: str) -> dict:
    """
    Idempotently land a raw ISO-NE or CAISO file: content-hash it, write it once
    to a partitioned raw zone, and emit an audit record. A re-sent REV/FINAL drop
    with identical bytes is a no-op; changed bytes land as a new version.
    """
    digest = hashlib.sha256(payload).hexdigest()
    target = Path(landing) / source / run_ts / f"{digest}.raw"
    if target.exists():
        return {"source": source, "sha256": digest, "landed": False}
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return {"source": source, "sha256": digest, "landed": True, "path": str(target)}
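
The function above returns an audit record but leaves persisting it to the caller. As a minimal sketch, assuming an append-only JSON Lines file is an acceptable audit log, each record could be written as follows. The `append_audit_record` helper, the log path, and the sample hash are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(record: dict, log_path: Path) -> dict:
    """Append one landing record to a JSON Lines audit log and return what was written."""
    entry = {**record, "logged_at_utc": datetime.now(timezone.utc).isoformat()}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append-only: earlier entries are never rewritten, so the log preserves
    # the order in which drops were seen.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


# Hypothetical records shaped like the dicts land_raw_file returns.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "audit" / "landing_log.jsonl"
    append_audit_record({"source": "ISO-NE", "sha256": "abc123", "landed": True}, log_path)
    append_audit_record({"source": "ISO-NE", "sha256": "abc123", "landed": False}, log_path)
    logged = [
        json.loads(line)
        for line in log_path.read_text(encoding="utf-8").splitlines()
    ]
```

Logging the no-op case as well (`landed` is False) records that a re-send was received and deduplicated, which is part of the replay trail.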

Verification Steps

Confirm the normalization is correct before any figure reaches settlement:

Timezone anchoring. After normalize_interval_timestamps, assert the Interval Start column is tz-aware (assert ne_df["Interval Start"].dt.tz is not None) and that a spring-forward day yields 23 hourly intervals and a fall-back day 25, never a duplicated or collapsed hour.
Interval count. For one trading day without a DST transition, align_to_fifteen_min_intervals on CAISO RT must emit exactly 96 buckets (assert len(fmm_df) == 96); expect 92 on the spring-forward day and 100 on the fall-back day. Any other count means an interval fell outside its bucket.
Component integrity. After normalize_lmp_components, assert not lmp_df["Audit_Flag"].any() — any flagged row is a decomposition that does not sum to the nodal price and must be quarantined, not posted.
Node coverage. Check that the run wrote no rows to audit/unmapped_nodes.parquet; validate_and_map_nodes writes the file only when nodes were dropped, so a freshly written file means the taxonomy registry lags the market’s latest node reclassification.
Idempotent landing. Re-run land_raw_file on the same payload and assert landed is False — proof the raw zone deduplicates identical _REV/_FINAL re-sends.
Reconciliation diff. Volume-weight both normalized frames to the same hourly grid and diff the canonical LMP per node; the residual must be zero to the settlement tolerance, not merely small.
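
The interval-count check depends on how long the trading day actually is. A minimal sketch using only the standard library, assuming the trading day runs from local midnight to the next local midnight (the `expected_bucket_count` helper is hypothetical):

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def expected_bucket_count(trading_day: date, tz_str: str, minutes: int = 15) -> int:
    """Count the intervals between local midnight and the next local midnight."""
    zone = ZoneInfo(tz_str)
    midnight = datetime.min.time()
    start = datetime.combine(trading_day, midnight, tzinfo=zone)
    end = datetime.combine(trading_day + timedelta(days=1), midnight, tzinfo=zone)
    # Convert to UTC before subtracting: Python subtracts two datetimes that
    # share a tzinfo as wall-clock values, which would always give 24 hours.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed // timedelta(minutes=minutes)


standard = expected_bucket_count(date(2024, 6, 1), "US/Pacific")
spring_forward = expected_bucket_count(date(2024, 3, 10), "US/Pacific")
fall_back = expected_bucket_count(date(2024, 11, 3), "US/Pacific")
```

Comparing `len(fmm_df)` against this count, rather than a fixed number, lets the same assertion run on DST transition days.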

Compliance Note

FERC requires that settlement figures be reproducible and traceable across a Daylight Saving Time boundary, which is precisely why timestamps are anchored to UTC before any resampling — a boundary drift of one hour is an audit finding, not a rounding nuisance. On the ISO-NE side the governing references are Market Rule 1 and Manual M-28; on the CAISO side, the Business Practice Manual for Settlements and Statements together with NAESB WEQ-002 define the OASIS component labels and interval conventions. Because both operators version their data dictionaries against a tariff effective date, the node taxonomy registry and component mapping in this pipeline must be pinned to the schema version in force for the settlement date being reconciled, and any figure whose Audit_Flag is set must route to manual review before posting rather than settling on an unverified decomposition.

Frequently Asked Questions

Why does the CAISO-to-15-minute alignment use a volume-weighted mean instead of a simple average?

Because the settled quantity is dollars, not price. A 5-minute interval that dispatched 200 MWh at $40 and one that dispatched 5 MWh at $120 do not average to $80 in settlement — the high-volume interval dominates the money. A plain mean() overweights low-volume intervals and manufactures a per-node dollar drift against ISO-NE’s hourly Day-Ahead figures. Weighting each price component by interval MWh reproduces the settlement value exactly.
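
The arithmetic behind that example can be checked directly. This is a small sketch using the two intervals quoted above; the variable names are illustrative only.

```python
# Two 5-minute intervals from the example: (price in $/MWh, volume in MWh).
intervals = [(40.0, 200.0), (120.0, 5.0)]

simple_mean = sum(price for price, _ in intervals) / len(intervals)
total_dollars = sum(price * mwh for price, mwh in intervals)
total_mwh = sum(mwh for _, mwh in intervals)
weighted_mean = total_dollars / total_mwh

# Valuing the bucket's full volume at the simple mean overstates the
# settlement relative to the dollars actually cleared.
drift = simple_mean * total_mwh - total_dollars
```

The weighted price comes out near $41.95/MWh against a simple mean of $80, so applying the simple mean to the bucket's 205 MWh would misstate its value by $7,800.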

How do I stop Daylight Saving Time from shifting ISO-NE and CAISO settlement boundaries?

Parse every timestamp as UTC first with pd.to_datetime(col, utc=True), then convert to US/Eastern or US/Pacific with dt.tz_convert. This requires the source value to be a UTC instant or to carry an explicit offset, because utc=True reads an offset-free string as UTC. A naive Eastern or Pacific string has an ambiguous fall-back hour (pandas cannot tell the first 1:30 a.m. from the second), and resolving it by guesswork can silently move a settlement interval by an hour, which is a FERC audit finding. UTC anchoring makes the fold unambiguous and keeps the interval grid one-to-one across the transition.

What is the CAISO equivalent of ISO-NE’s Energy, Congestion, and Loss columns?

CAISO’s MEC (Marginal Energy Component), MCC (Marginal Congestion Component), and MCL (Marginal Loss Component) map one-to-one to ISO-NE’s Energy, Congestion, and Loss. The economics are identical — each triplet sums to the nodal LMP — but the labels differ, and CAISO encodes suppressed values as -999, NaN, or an omitted row where ISO-NE uses 0.0 or an empty string. Rename the columns and normalize the sentinels before running the additive integrity check.
