Montana Climate Office

GHCNd Station Drought Monitoring Pipeline: Methods

Zachary H. Hoylman — Montana Climate Office, University of Montana

Updated: Version 1.0

Abstract The Montana Climate Office (MCO) GHCNd Station Drought Monitoring Pipeline is an operational system that computes standardized drought indices on a daily basis from individual weather station observations across the contiguous United States (CONUS). The pipeline ingests daily station data from the Global Historical Climatology Network – Daily (GHCNd) [1], a quality-controlled archive of in situ observations from over 100,000 stations worldwide, updated daily by NOAA’s National Centers for Environmental Information (NCEI).

From the core GHCNd variables — precipitation (PRCP), maximum temperature (TMAX), and minimum temperature (TMIN) — the pipeline derives three standardized drought index families: the Standardized Precipitation Index (SPI), the Standardized Precipitation-Evapotranspiration Index (SPEI), and the Evaporative Demand Drought Index (EDDI). Reference evapotranspiration (ET0) is estimated using the Hargreaves-Samani method [2] with extraterrestrial radiation computed from FAO-56 equations [3]. Ancillary products include precipitation accumulations (mm), percent of normal, departure from normal, and empirical percentile ranks. Each metric is computed across multiple aggregation timescales ranging from 15 to 730 days, as well as water-year and year-to-date accumulations.

Unlike the companion MCO CONUS Drought Monitoring Pipeline, which operates on the ~4 km gridMET dataset, this pipeline produces point-based station estimates. Station-level drought indices offer several advantages: they reflect actual observed conditions rather than interpolated surfaces, they are available globally (not limited to CONUS gridded products), and they provide independent validation points for gridded analyses. Only stations that are currently reporting and have at least 30 years of historical data are included, ensuring robust climatological fitting for operational drought assessment.

2. Data

2.1 GHCNd Station Data

All meteorological inputs are obtained from the Global Historical Climatology Network – Daily (GHCNd) [1], an integrated database of daily climate summaries from land surface stations maintained by NOAA NCEI. GHCNd is subject to a suite of automated quality assurance procedures including duplicate checks, climatological outlier detection, spatial consistency tests, and temporal gap checks [4]. Observations that fail any quality check are flagged and excluded from this pipeline.

The following GHCNd elements are used:

Variable GHCNd element Native units Converted units Used for
PrecipitationPRCPtenths of mmmmSPI, SPEI water balance, precip accumulations
Max temperatureTMAXtenths of °C°CHargreaves-Samani ET0
Min temperatureTMINtenths of °C°CHargreaves-Samani ET0

Data Access

GHCNd data are obtained from the NCEI public archive as by-year CSV files (YYYY.csv.gz) from ncei.noaa.gov/pub/data/ghcn/daily/by_year/. Each file contains all station observations for a single calendar year (~170 MB compressed for recent years). The pipeline downloads the full historical archive on first run (cold start) and subsequently re-downloads only the current year file on each operational run (warm start), since historical years are immutable.

Quality Control

Each GHCNd observation carries a quality flag (QFLAG) from NCEI’s automated QC suite. Any observation with a non-blank quality flag is excluded prior to analysis. This removes values that failed duplicate checks, climatological range tests, spatial consistency checks, or other QC procedures described in Durre et al. (2010) [4].

2.2 Station Selection Criteria

The pipeline targets operational drought assessment and therefore restricts analysis to stations that are both currently reporting and have sufficient historical depth for robust climatological fitting. Station eligibility is determined from the ghcnd-inventory.txt metadata file, which records the first and last year of data for each station-element combination.

Filtering Criteria

  • CONUS bounding box — Only stations with US country prefix and coordinates within 24.5–49.5°N, 125–66.5°W are included, excluding Alaska, Hawaii, and US territories.
  • Recency — The station must have reported PRCP within the last calendar year (last year in inventory ≥ current year − 1).
  • Record length — The station must have at least 30 years of PRCP data (last year − first year + 1 ≥ 30).
  • SPEI/EDDI eligibility — Stations additionally require TMAX and TMIN with the same recency and record length thresholds. Approximately 75–80% of PRCP-eligible stations also report temperature.
  • Reporting latency — At computation time, stations whose most recent observation is older than the configured latency threshold (default: 3 days) are excluded from drought index computation for that run.
Current station counts (CONUS, April 2026): Approximately 5,277 stations are eligible for SPI (precipitation only), and approximately 4,096 are additionally eligible for SPEI and EDDI (precipitation + temperature).

3. Climatological Reference Periods

Scientific Motivation

Standardized drought indices measure anomalies relative to a baseline climatology. The choice of baseline profoundly affects index values: the same precipitation total can indicate moderate drought against a 1950–1980 baseline but near-normal conditions against a 1991–2020 WMO normal. As the climate warms, the statistical properties of temperature, PET, and water balance are trending systematically, violating the stationarity assumption embedded in fixed-period normals. Hoylman et al. (2022) [5] demonstrate that non-stationarity in baseline climate substantially alters drought frequency and severity assessments, motivating the use of flexible, contemporary reference periods for operational monitoring.

Supported Modes

Spec syntax Mode Description
rolling:30 Rolling trailing window Trailing 30-year window ending at the current year; tracks contemporary climatology (default)
fixed:1991:2020 Fixed interval WMO 1991–2020 standard normal
full Full archive All years from START_YEAR to present; maximizes sample size

For each timescale and reference period combination, a vector of one aggregated value per year is assembled by extracting the rolling window sum (or mean) ending on the same calendar month-day as the most recent observation. This vector serves as input to the distribution fitting procedure for SPI, SPEI, and EDDI.

Minimum data requirement: At least 3 valid years must be present in the climatological vector for a distribution fit to proceed. Stations or timescales that do not meet this threshold receive NA. With MIN_OBS_FRACTION=1, any year with a single missing day in the aggregation window is excluded from the fitting sample.

4. Reference Evapotranspiration (ET0) Estimation

SPEI and EDDI require an estimate of atmospheric evaporative demand. GHCNd provides only temperature data (TMAX, TMIN) — wind speed, humidity, and solar radiation are not available from the standard GHCNd archive. The Penman-Monteith method is therefore not feasible. The Thornthwaite method was rejected because it operates only at monthly timesteps and performs poorly in arid, semiarid, and high-elevation climates [6].

4.1 Hargreaves-Samani Method

Daily reference evapotranspiration is estimated using the Hargreaves-Samani (HS) equation [2], which requires only daily maximum and minimum temperature plus extraterrestrial radiation (computed from latitude and day of year):

ET0 = 0.0023 ⋅ Ra ⋅ (Tmean + 17.8) ⋅ (Tmax − Tmin)0.5

where:

  • ET0 is daily reference evapotranspiration (mm/day)
  • Ra is extraterrestrial radiation converted to equivalent evaporation (mm/day)
  • Tmean = (Tmax + Tmin) / 2 (°C)
  • Tmax − Tmin is the diurnal temperature range (°C), clamped ≥ 0
  • ET0 is clamped ≥ 0 (negative values are not physically meaningful)

The HS method has been shown to provide reasonable estimates of ET0 across a wide range of climates when compared to Penman-Monteith, particularly for drought monitoring applications where relative anomalies matter more than absolute accuracy [7].

4.2 Extraterrestrial Radiation (FAO-56)

Extraterrestrial radiation (Ra) is computed from station latitude and day of year using the standard FAO-56 equations [3]:

  1. Solar declination (Eq. 24):
    δ = 0.409 ⋅ sin(2π/365 ⋅ J − 1.39)
    where J is the day of year (1–366).
  2. Inverse relative Earth-Sun distance (Eq. 23):
    dr = 1 + 0.033 ⋅ cos(2π/365 ⋅ J)
  3. Sunset hour angle (Eq. 25):
    ωs = arccos[ −tan(φ) ⋅ tan(δ) ]
    where φ is station latitude in radians. For polar conditions where |tan(φ) ⋅ tan(δ)| ≥ 1, ωs is set to π (24-hour daylight) or 0 (no daylight).
  4. Extraterrestrial radiation (Eq. 21):
    Ra = (24 ⋅ 60 / π) ⋅ Gsc ⋅ dr ⋅ [ ωs sin(φ) sin(δ) + cos(φ) cos(δ) sin(ωs) ]
    where Gsc = 0.0820 MJ m−2 min−1 is the solar constant.
  5. Conversion to equivalent evaporation (Eq. 20):
    Ra (mm/day) = 0.408 ⋅ Ra (MJ m−2 day−1)
    where 0.408 = 1/λ and λ = 2.45 MJ kg−1 is the latent heat of vaporization.

5. Drought Index Methods

All indices are computed independently at each station. For a given aggregation window of W days, a rolling sum of the input variable is first computed over the station’s daily record. One value per climatological reference year is extracted by anchoring the window on the same calendar month-day as the most recent observation. A probability distribution (or empirical CDF) is then fitted to the resulting climatological sample, and the current (most recent) value is transformed to a standardized anomaly.

Aggregation Timescales

5.1 Standardized Precipitation Index (SPI)

Motivation

The SPI [8] expresses precipitation anomalies in units of standard deviations relative to a fitted climatological distribution, making it comparable across locations and timescales with differing precipitation regimes. It requires only precipitation data, making it computable for all stations in the network.

Input

Daily precipitation (PRCP), in millimeters. Rolling window sums.

Statistical Method

A two-parameter Gamma distribution is fitted to the climatological sample using L-moment / probability-weighted moment (PWM) estimation [9], as implemented in the R package lmomco:

  1. Zero handling (Stagge et al., 2015) — Following Stagge et al. (2015) [10], zero precipitation values are handled via a mixed distribution approach using the Weibull plotting position:
    p0 = m / (n + 1)
    0 = (m + 1) / [ 2(n + 1) ]
    where m is the number of zero values and n is the total sample size. The Gamma distribution is fitted only to positive values:
    H(x) = p0 + (1 − p0) ⋅ G(x; α, β)
  2. PWM estimation — Unbiased probability-weighted moments computed from positive values via lmomco::pwm.ub().
  3. L-moment conversion and parameter estimation — PWMs converted to L-moments, then lmomco::pargam() returns Gamma parameters (α, β).
  4. Normal quantile transform — SPI = Φ−1(p).
SPI = Φ−1[ FΓ(α,β)( XW ) ]

Interpretation

Category Description Percentile Range SPI / SPEI Values
Normal or wet conditionsx > 30x > −0.524
D0Abnormally Dry30 ≥ x > 20−0.524 ≥ x > −0.842
D1Moderate Drought20 ≥ x > 10−0.842 ≥ x > −1.281
D2Severe Drought10 ≥ x > 5−1.281 ≥ x > −1.644
D3Extreme Drought5 ≥ x > 2−1.644 ≥ x > −2.054
D4Exceptional Droughtx < 2x < −2.054

5.2 Ancillary Precipitation Metrics

Metric Formula Units Description
Percent of Normal (PON) ( XW / μclim ) × 100 % Current accumulation as percentage of climatological mean
Departure XW − μclim mm Absolute anomaly relative to climatological mean
Percentile n( XW ) 0–1 Empirical ECDF rank within climatological sample
Accumulation XW mm Raw rolling precipitation sum

5.3 Standardized Precipitation-Evapotranspiration Index (SPEI)

Motivation

The SPEI [11] extends SPI by incorporating atmospheric evaporative demand, making it sensitive to warming-driven increases in ET0 even when precipitation is unchanged. In a warming climate, SPEI captures hydrological drought stress that SPI alone would miss.

Input

Daily water balance = precipitation (PRCP) − reference evapotranspiration (ET0 from Hargreaves-Samani), in mm/day. Negative values indicate moisture deficit. Rolling window sums.

Statistical Method

The water balance variable is unbounded below and has heavier distributional tails than precipitation alone. The Generalized Logistic (GLO) distribution is therefore used:

  1. L-moments estimated from the climatological sample via unbiased PWMs.
  2. lmomco::parglo() fits the three-parameter GLO (ξ, α, k).
  3. GLO CDF evaluated at current observation → probability p.
  4. SPEI = Φ−1(p).
SPEI = Φ−1[ FGLO(ξ,α,k)( P − ET0 )W ]

Interpretation

SPEI values follow the same classification thresholds as SPI. Only stations with both PRCP and temperature data (TMAX + TMIN) can produce SPEI, as Hargreaves-Samani ET0 is required for the water balance computation.

5.4 Evaporative Demand Drought Index (EDDI)

Motivation

EDDI [12] isolates the atmospheric demand component of the hydrological cycle. Because it is derived solely from ET0 — without precipitation — it captures early warning signals of drought stress driven by high temperature and diurnal range before soil moisture depletion has occurred.

Input

Hargreaves-Samani ET0 (mm/day). Rolling window sums (cumulative evaporative demand).

Statistical Method

EDDI uses a nonparametric rank-based approach following Hobbins et al. (2016) [12]:

  1. Rank — Observations within the climatological sample are ranked from highest (rank 1) to lowest.
  2. Tukey plotting position
    pi = (i − 1/3) / (n + 1/3)
    where i is the rank (1 = maximum ET0).
  3. Inverse normal approximation — EDDI is derived via the rational approximation of Abramowitz and Stegun (1965) [13]. For p ≤ 0.5, W = √(−2 ln(p)); for p > 0.5, p is replaced with 1 − p and the sign is reversed.
  4. Sign convention — Positive EDDI = above-normal atmospheric demand (drought). Negative = wet anomaly.
EDDI = Φ−1[ pTukey( ET0,W ) ]

6. Processing Architecture

Station-Based Parallelism

Unlike the companion gridded pipeline, which decomposes the CONUS domain into spatial tiles, this pipeline parallelizes across stations. Each station’s time series is processed independently via pbmcapply::pbmclapply() (fork-based parallel apply). This is naturally embarrassingly parallel — stations share no state.

Cold Start vs. Warm Start

On first run (cold start), all 48 yearly CSV files are downloaded from NCEI and parsed into per-station RDS files. Each station’s RDS contains a complete daily time series with columns for date, PRCP, TMAX, and TMIN. On subsequent operational runs (warm start), only the current year and prior year CSV files are re-downloaded and merged into existing per-station RDS files, reducing I/O by approximately 200×.

S3 Caching (AWS Fargate)

When deployed on AWS Fargate, all per-station RDS files, raw CSVs, and derived outputs are synced to an S3 bucket at the end of each run. The next Fargate run restores this cache before processing, avoiding a full cold start on ephemeral compute infrastructure. Local development runs bypass S3 sync entirely.

Pipeline Steps

Step 0: Filter active CONUS stations from GHCNd inventory
Step 1: Download/parse GHCNd by-year CSVs → per-station RDS files
Step 2: Compute Hargreaves-Samani daily ET₀ for SPEI-eligible stations
Step 3: Compute SPI (all precip stations)
Step 4: Compute SPEI (stations with PET)
Step 5: Compute EDDI (stations with PET)
Step 6: Compute precipitation accumulations + ancillary metrics
Step 7: Assemble outputs (GeoJSON, CSV, per-station JSON, manifest)

7. Output Products

File Format Description
GHCNd_drought_current.geojson.gz GeoJSON (gzipped) All stations with all drought indices as Feature properties; ~2 MB compressed
all_stations.csv CSV Wide-format table: one row per station, all metrics as columns
stations/{ID}.json JSON Per-station detail files with all indices, accumulations, and metadata
station_catalog.csv CSV Station metadata, record dates, and available index flags
manifest.csv CSV Per-dataset date manifest for tracking data freshness

Index Summary

Index Stations Units / Range Distribution
SPIAll with PRCP (~5,277)Dimensionless z-scoreGamma (L-moments)
SPEIPRCP + TMAX + TMIN (~4,096)Dimensionless z-scoreGeneralized Logistic (L-moments)
EDDITMAX + TMIN (~4,096)Dimensionless z-scoreNonparametric (Tukey)
Precip accumulationAll with PRCPmmRaw sum
Percent of NormalAll with PRCP%Climatological mean
DepartureAll with PRCPmmClimatological mean
PercentileAll with PRCP0–1Empirical ECDF

8. Software & Reproducibility

Runtime Environment

The pipeline runs inside a Docker container based on rocker/r-ver:4.4.1 (R 4.4.1). The container includes the AWS CLI for S3 synchronization and minimal spatial libraries (GDAL/GEOS/PROJ for GeoJSON output via sf).

Key R Packages

PackageRole
data.tableFast CSV parsing and in-memory data manipulation
lmomcoL-moment parameter estimation; Gamma, GLO distributions
sfGeoJSON output generation
jsonlitePer-station JSON and compact GeoJSON output
pbmcapplyFork-based parallel apply with progress bar
purrrFunctional programming utilities

Source Code Structure

mco-GHCNd/
├── R/
│   ├── drought-functions.R        # Hargreaves-Samani, SPI, SPEI, EDDI fitting
│   ├── pipeline-common.R          # Shared infrastructure
│   ├── 0_ghcnd-station-filter.R   # Station selection
│   ├── 1_ghcnd-cache.R            # Data download and parsing
│   ├── 2_compute-pet.R            # Hargreaves-Samani ET₀
│   ├── 3_metrics-spi.R            # SPI computation
│   ├── 4_metrics-spei.R           # SPEI computation
│   ├── 5_metrics-eddi.R           # EDDI computation
│   ├── 6_metrics-precip-accum.R   # Precipitation accumulations
│   └── 7_output-assembly.R        # Output generation
├── pipeline/
│   ├── run_once.sh                # Orchestration script
│   └── run_test.sh                # Test runner
├── Dockerfile
├── docker-compose.yml
└── terraform/                     # AWS infrastructure (Fargate, S3, ECR)

Deployment

The pipeline is deployed on AWS Fargate via Amazon ECS, triggered nightly at 10:00 PM Mountain Time by an EventBridge Scheduler rule. Outputs are written to a public S3 bucket with CORS enabled for direct web consumption. Infrastructure is managed via Terraform.

9. References

  1. Menne MJ, Durre I, Vose RS, Gleason BE, Houston TG. An overview of the Global Historical Climatology Network-Daily database. Journal of Atmospheric and Oceanic Technology. 2012;29(7):897–910. doi:10.1175/JTECH-D-11-00103.1
  2. Hargreaves GH, Samani ZA. Reference crop evapotranspiration from temperature. Applied Engineering in Agriculture. 1985;1(2):96–99. doi:10.13031/2013.26773
  3. Allen RG, Pereira LS, Raes D, Smith M. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. Rome: Food and Agriculture Organization of the United Nations; 1998.
  4. Durre I, Menne MJ, Gleason BE, Houston TG, Vose RS. Comprehensive automated quality assurance of daily surface observations. Journal of Applied Meteorology and Climatology. 2010;49(8):1615–1633. doi:10.1175/2010JAMC2375.1
  5. Hoylman ZH, Bocinsky RK, Jencso KG. Drought assessment has been outpaced by climate change: empirical arguments for a paradigm shift. Nature Communications. 2022;13:2715. doi:10.1038/s41467-022-30316-5
  6. Droogers P, Allen RG. Estimating reference evapotranspiration under inaccurate data conditions. Irrigation and Drainage Systems. 2002;16(1):33–45. doi:10.1023/A:1015508322413
  7. Begueria S, Vicente-Serrano SM, Angulo-Martinez M. A multiscalar global drought dataset: the SPEIbase. Bulletin of the American Meteorological Society. 2010;91(10):1351–1356. doi:10.1175/2010BAMS2988.1
  8. McKee TB, Doesken NJ, Kleist J. The relationship of drought frequency and duration to time scales. Proceedings of the 8th Conference on Applied Climatology. 1993;17(22):179–183. American Meteorological Society.
  9. Hosking JRM. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B. 1990;52(1):105–124. doi:10.1111/j.2517-6161.1990.tb01775.x
  10. Stagge JH, Tallaksen LM, Gudmundsson L, Van Loon AF, Stahl K. Candidate distributions for climatological drought indices (SPI and SPEI). International Journal of Climatology. 2015;35(13):4027–4040. doi:10.1002/joc.4267
  11. Begueria S, Vicente-Serrano SM, Reig F, Latorre B. Standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and related applications. International Journal of Climatology. 2014;34(10):3001–3023. doi:10.1002/joc.3887
  12. Hobbins MT, Wood A, McEvoy DJ, Huntington JL, Morton C, Anderson M, Hain C. The Evaporative Demand Drought Index. Part I: Linking drought evolution to variations in evaporative demand. Journal of Hydrometeorology. 2016;17(6):1745–1761. doi:10.1175/JHM-D-15-0121.1
  13. Abramowitz M, Stegun IA, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series 55. Washington, DC: U.S. Government Printing Office; 1965.