GHCNd Station Drought Monitoring Pipeline: Methods

Zachary H. Hoylman — Montana Climate Office, University of Montana

Updated: 2026-04-11 Version 1.0

Abstract The Montana Climate Office (MCO) GHCNd Station Drought Monitoring Pipeline is an operational system that computes standardized drought indices on a daily basis from individual weather station observations across the contiguous United States (CONUS). The pipeline ingests daily station data from the Global Historical Climatology Network – Daily (GHCNd) ^[1], a quality-controlled archive of in situ observations from over 100,000 stations worldwide, updated daily by NOAA’s National Centers for Environmental Information (NCEI).

From the core GHCNd variables — precipitation (PRCP), maximum temperature (TMAX), and minimum temperature (TMIN) — the pipeline derives three standardized drought index families: the Standardized Precipitation Index (SPI), the Standardized Precipitation-Evapotranspiration Index (SPEI), and the Evaporative Demand Drought Index (EDDI). Reference evapotranspiration (ET₀) is estimated using the Hargreaves-Samani method ^[2] with extraterrestrial radiation computed from FAO-56 equations ^[3]. Ancillary products include precipitation accumulations (mm), percent of normal, departure from normal, and empirical percentile ranks. Each metric is computed across multiple aggregation timescales ranging from 15 to 730 days, as well as water-year and year-to-date accumulations.

Unlike the companion MCO CONUS Drought Monitoring Pipeline, which operates on the ~4 km gridMET dataset, this pipeline produces point-based station estimates. Station-level drought indices offer several advantages: they reflect actual observed conditions rather than interpolated surfaces, they are available globally (not limited to CONUS gridded products), and they provide independent validation points for gridded analyses. Only stations that are currently reporting and have at least 30 years of historical data are included, ensuring robust climatological fitting for operational drought assessment.

2. Data

2.1 GHCNd Station Data

All meteorological inputs are obtained from the Global Historical Climatology Network – Daily (GHCNd) ^[1], an integrated database of daily climate summaries from land surface stations maintained by NOAA NCEI. GHCNd is subject to a suite of automated quality assurance procedures including duplicate checks, climatological outlier detection, spatial consistency tests, and temporal gap checks ^[4]. Observations that fail any quality check are flagged and excluded from this pipeline.

The following GHCNd elements are used:

Variable	GHCNd element	Native units	Converted units	Used for
Precipitation	`PRCP`	tenths of mm	mm	SPI, SPEI water balance, precip accumulations
Max temperature	`TMAX`	tenths of °C	°C	Hargreaves-Samani ET₀
Min temperature	`TMIN`	tenths of °C	°C	Hargreaves-Samani ET₀

Data Access

GHCNd data are obtained from the NCEI public archive as by-year CSV files (YYYY.csv.gz) from ncei.noaa.gov/pub/data/ghcn/daily/by_year/. Each file contains all station observations for a single calendar year (~170 MB compressed for recent years). The pipeline downloads the full historical archive on first run (cold start) and subsequently re-downloads only the current year file on each operational run (warm start), since historical years are immutable.

Quality Control

Each GHCNd observation carries a quality flag (QFLAG) from NCEI’s automated QC suite. Any observation with a non-blank quality flag is excluded prior to analysis. This removes values that failed duplicate checks, climatological range tests, spatial consistency checks, or other QC procedures described in Durre et al. (2010) ^[4].

2.2 Station Selection Criteria

The pipeline targets operational drought assessment and therefore restricts analysis to stations that are both currently reporting and have sufficient historical depth for robust climatological fitting. Station eligibility is determined from the ghcnd-inventory.txt metadata file, which records the first and last year of data for each station-element combination.

Filtering Criteria

CONUS bounding box — Only stations with US country prefix and coordinates within 24.5–49.5°N, 125–66.5°W are included, excluding Alaska, Hawaii, and US territories.
Recency — The station must have reported PRCP within the last calendar year (last year in inventory ≥ current year − 1).
Record length — The station must have at least 30 years of PRCP data (last year − first year + 1 ≥ 30).
SPEI/EDDI eligibility — Stations additionally require TMAX and TMIN with the same recency and record length thresholds. Approximately 75–80% of PRCP-eligible stations also report temperature.
Reporting latency — At computation time, stations whose most recent observation is older than the configured latency threshold (default: 3 days) are excluded from drought index computation for that run.

Current station counts (CONUS, April 2026): Approximately 5,277 stations are eligible for SPI (precipitation only), and approximately 4,096 are additionally eligible for SPEI and EDDI (precipitation + temperature).

3. Climatological Reference Periods

Scientific Motivation

Standardized drought indices measure anomalies relative to a baseline climatology. The choice of baseline profoundly affects index values: the same precipitation total can indicate moderate drought against a 1950–1980 baseline but near-normal conditions against a 1991–2020 WMO normal. As the climate warms, the statistical properties of temperature, PET, and water balance are trending systematically, violating the stationarity assumption embedded in fixed-period normals. Hoylman et al. (2022) ^[5] demonstrate that non-stationarity in baseline climate substantially alters drought frequency and severity assessments, motivating the use of flexible, contemporary reference periods for operational monitoring.

Supported Modes

Spec syntax	Mode	Description
`rolling:30`	Rolling trailing window	Trailing 30-year window ending at the current year; tracks contemporary climatology (default)
`fixed:1991:2020`	Fixed interval	WMO 1991–2020 standard normal
`full`	Full archive	All years from `START_YEAR` to present; maximizes sample size

For each timescale and reference period combination, a vector of one aggregated value per year is assembled by extracting the rolling window sum (or mean) ending on the same calendar month-day as the most recent observation. This vector serves as input to the distribution fitting procedure for SPI, SPEI, and EDDI.

Minimum data requirement: At least 3 valid years must be present in the climatological vector for a distribution fit to proceed. Stations or timescales that do not meet this threshold receive NA. With MIN_OBS_FRACTION=1, any year with a single missing day in the aggregation window is excluded from the fitting sample.

4. Reference Evapotranspiration (ET₀) Estimation

SPEI and EDDI require an estimate of atmospheric evaporative demand. GHCNd provides only temperature data (TMAX, TMIN) — wind speed, humidity, and solar radiation are not available from the standard GHCNd archive. The Penman-Monteith method is therefore not feasible. The Thornthwaite method was rejected because it operates only at monthly timesteps and performs poorly in arid, semiarid, and high-elevation climates ^[6].

4.1 Hargreaves-Samani Method

Daily reference evapotranspiration is estimated using the Hargreaves-Samani (HS) equation ^[2], which requires only daily maximum and minimum temperature plus extraterrestrial radiation (computed from latitude and day of year):

ET₀ = 0.0023 ⋅ R_a^* ⋅ (T_mean + 17.8) ⋅ (T_max − T_min)^0.5

where:

ET₀ is daily reference evapotranspiration (mm/day)
R_a^* is extraterrestrial radiation already converted to equivalent evaporation depth (mm/day). That is, R_a^* = 0.408 ⋅ R_a where R_a is in MJ m⁻² day⁻¹ (FAO-56 Eq. 20; see Section 4.2). The 0.408 factor is applied once when computing R_a^* and is not reapplied here.
T_mean = (T_max + T_min) / 2 (°C)
T_max − T_min is the diurnal temperature range (°C), clamped ≥ 0
ET₀ is clamped ≥ 0 (negative values are not physically meaningful)

The HS method has been shown to provide reasonable estimates of ET₀ across a wide range of climates when compared to Penman-Monteith, particularly for drought monitoring applications where relative anomalies matter more than absolute accuracy ^[7].

4.2 Extraterrestrial Radiation (FAO-56)

Extraterrestrial radiation (R_a) is computed from station latitude and day of year using the standard FAO-56 equations ^[3]:

Solar declination (Eq. 24):
δ = 0.409 ⋅ sin(2π/365 ⋅ J − 1.39)
where J is the day of year (1–366).
Inverse relative Earth-Sun distance (Eq. 23):
d_r = 1 + 0.033 ⋅ cos(2π/365 ⋅ J)
Sunset hour angle (Eq. 25):
ω_s = arccos[ −tan(φ) ⋅ tan(δ) ]
where φ is station latitude in radians. For polar conditions where |tan(φ) ⋅ tan(δ)| ≥ 1, ω_s is set to π (24-hour daylight) or 0 (no daylight).
Extraterrestrial radiation (Eq. 21):
R_a = (24 ⋅ 60 / π) ⋅ G_sc ⋅ d_r ⋅ [ ω_s sin(φ) sin(δ) + cos(φ) cos(δ) sin(ω_s) ]
where G_sc = 0.0820 MJ m⁻² min⁻¹ is the solar constant.
Conversion to equivalent evaporation (Eq. 20):
R_a (mm/day) = 0.408 ⋅ R_a (MJ m⁻² day⁻¹)
where 0.408 = 1/λ and λ = 2.45 MJ kg⁻¹ is the latent heat of vaporization.

5. Drought Index Methods

All indices are computed independently at each station. For a given aggregation window of W days, the pipeline computes two distinct quantities:

The current observation, X_W^(now) — the rolling window of W days ending at the most recent valid observation date. This value is independent of the climatological reference period.
The reference distribution, {X_W^(y)} — one aggregated value per reference year y, anchored on the same calendar month-day as the current observation. The set of years y is determined by CLIM_PERIODS.

A probability distribution (or empirical CDF) is then fitted to the reference distribution, and the current observation is evaluated against that fit to produce a standardized anomaly. For rolling:N and full modes the current year is naturally part of the reference distribution, so the current observation also appears as the last element of the reference set. For fixed:YYYY:YYYY modes where the current year is outside the fixed range, the current observation is external to the reference set and is evaluated against (but not included in) the fitted distribution. Treating the current value and the reference distribution as separate inputs ensures correct behavior in all modes.

Aggregation Timescales

Fixed windows: 15, 30, 45, 60, 90, 120, 180, 365, and 730 days
Water-year accumulation (Oct 1 – current date)
Year-to-date accumulation (Jan 1 – current date)

5.1 Standardized Precipitation Index (SPI)

Motivation

The SPI ^[8] expresses precipitation anomalies in units of standard deviations relative to a fitted climatological distribution, making it comparable across locations and timescales with differing precipitation regimes. It requires only precipitation data, making it computable for all stations in the network.

Input

Daily precipitation (PRCP), in millimeters. Rolling window sums.

Statistical Method

A two-parameter Gamma distribution is fitted to the climatological sample using L-moment / probability-weighted moment (PWM) estimation ^[9], as implemented in the R package lmomco:

Zero handling (Stagge et al., 2015) — Following Stagge et al. (2015) ^[10], zero precipitation values are handled via a mixed distribution approach using the Weibull plotting position:
p₀ = m / (n + 1)

p̄₀ = (m + 1) / [ 2(n + 1) ]
where m is the number of zero values and n is the total sample size. The Gamma distribution is fitted only to positive values:
H(x) = p₀ + (1 − p₀) ⋅ G(x; α, β)
PWM estimation — Unbiased probability-weighted moments computed from positive values via lmomco::pwm.ub().
L-moment conversion and parameter estimation — PWMs converted to L-moments, then lmomco::pargam() returns Gamma parameters (α, β).
CDF evaluation — The non-exceedance probability is taken from the mixed distribution, not the Gamma alone:
p = p̄₀ if X_W = 0, otherwise p₀ + (1 − p₀) ⋅ G(X_W; α, β)
where p̄₀ = (m + 1) / [2(n + 1)] is the center-of-probability-mass term for exact zeros (Stagge et al. 2015, Eq. 4).
Normal quantile transform — SPI = Φ⁻¹(p).

SPI = Φ⁻¹[ H( X_W ) ]

where H(·) is the mixed CDF defined above.

Interpretation

Category	Description	Percentile Range	SPI / SPEI Values
	Normal or wet conditions	x > 30	x > −0.524
D0	Abnormally Dry	30 ≥ x > 20	−0.524 ≥ x > −0.842
D1	Moderate Drought	20 ≥ x > 10	−0.842 ≥ x > −1.281
D2	Severe Drought	10 ≥ x > 5	−1.281 ≥ x > −1.644
D3	Extreme Drought	5 ≥ x > 2	−1.644 ≥ x > −2.054
D4	Exceptional Drought	x < 2	x < −2.054

5.2 Ancillary Precipitation Metrics

Metric	Formula	Units	Description
Percent of Normal (PON)	( X_W / μ_clim ) × 100	%	Current accumulation as percentage of climatological mean
Departure	X_W − μ_clim	mm	Absolute anomaly relative to climatological mean
Percentile	F̂_n( X_W )	0–1	Empirical ECDF rank within climatological sample
Accumulation	X_W	mm	Raw rolling precipitation sum

5.3 Standardized Precipitation-Evapotranspiration Index (SPEI)

Motivation

The SPEI ^[11] extends SPI by incorporating atmospheric evaporative demand, making it sensitive to warming-driven increases in ET₀ even when precipitation is unchanged. In a warming climate, SPEI captures hydrological drought stress that SPI alone would miss.

Input

Daily water balance = precipitation (PRCP) − reference evapotranspiration (ET₀ from Hargreaves-Samani), in mm/day. Negative values indicate moisture deficit. Rolling window sums.

Statistical Method

The water balance variable is unbounded below and has heavier distributional tails than precipitation alone. The Generalized Logistic (GLO) distribution is therefore used:

L-moments estimated from the climatological sample via unbiased PWMs.
lmomco::parglo() fits the three-parameter GLO (ξ, α, k).
GLO CDF evaluated at current observation → probability p.
SPEI = Φ⁻¹(p).

SPEI = Φ⁻¹[ F_GLO(ξ,α,k)( P − ET₀ )_W ]

Interpretation

SPEI values follow the same classification thresholds as SPI. Only stations with both PRCP and temperature data (TMAX + TMIN) can produce SPEI, as Hargreaves-Samani ET₀ is required for the water balance computation.

5.4 Evaporative Demand Drought Index (EDDI)

Motivation

EDDI ^[12] isolates the atmospheric demand component of the hydrological cycle. Because it is derived solely from ET₀ — without precipitation — it captures early warning signals of drought stress driven by high temperature and diurnal range before soil moisture depletion has occurred.

Input

Hargreaves-Samani ET₀ (mm/day). Rolling window sums (cumulative evaporative demand).

Statistical Method

EDDI uses a nonparametric rank-based approach following Hobbins et al. (2016) ^[12]:

Descending rank — Each accumulated ET_0,W in the climatological sample is assigned a rank i in descending order, so that i = 1 corresponds to the maximum (driest) value. When the current observation is part of the reference set (rolling/full modes), ranking is done over the n-element reference distribution. When the current observation is external to the reference set (fixed-period mode with current year outside the range), the current value is appended to the reference distribution before ranking, producing an (n+1)-element sample. This preserves the original rank-based math for rolling/full modes while correctly placing the external current value in the empirical distribution for fixed-period modes.
Tukey plotting position —
p_i = (i − 1/3) / (n + 1/3)
Because rank 1 is the driest value, small p_i corresponds to high evaporative demand.
Inverse normal via rational approximation — To map p_i to a standard normal quantile, the Abramowitz & Stegun (1965) ^[13] rational approximation is used. Define
W(p) = √{ −2 ln[ min(p, 1−p) ] }
and let
Z(W) = W − (C₀ + C₁W + C₂W²) / (1 + d₁W + d₂W² + d₃W³)
with constants C₀ = 2.515517, C₁ = 0.802853, C₂ = 0.010328, d₁ = 1.432788, d₂ = 0.189269, d₃ = 0.001308.
Sign assignment — Apply the sign convention so that high ET₀ (small p_i) yields positive EDDI:
EDDI = +Z(W) if p_i ≤ 0.5 (drought), otherwise −Z(W)

This is mathematically equivalent to EDDI = Φ⁻¹(1 − p_i), since rank 1 = maximum demand.

6. Processing Architecture

Station-Based Parallelism

Unlike the companion gridded pipeline, which decomposes the CONUS domain into spatial tiles, this pipeline parallelizes across stations. Each station’s time series is processed independently via pbmcapply::pbmclapply() (fork-based parallel apply). This is naturally embarrassingly parallel — stations share no state.

Cold Start vs. Warm Start

On first run (cold start), all 48 yearly CSV files are downloaded from NCEI and parsed into per-station RDS files. Each station’s RDS contains a complete daily time series with columns for date, PRCP, TMAX, and TMIN. On subsequent operational runs (warm start), only the current year and prior year CSV files are re-downloaded and merged into existing per-station RDS files, reducing I/O by approximately 200×.

S3 Caching (AWS Fargate)

When deployed on AWS Fargate, all per-station RDS files, raw CSVs, and derived outputs are synced to an S3 bucket at the end of each run. The next Fargate run restores this cache before processing, avoiding a full cold start on ephemeral compute infrastructure. Local development runs bypass S3 sync entirely.

Pipeline Steps

Step 0: Filter active CONUS stations from GHCNd inventory
Step 1: Download/parse GHCNd by-year CSVs → per-station RDS files
Step 2: Compute Hargreaves-Samani daily ET₀ for SPEI-eligible stations
Step 3: Compute SPI (all precip stations)
Step 4: Compute SPEI (stations with PET)
Step 5: Compute EDDI (stations with PET)
Step 6: Compute precipitation accumulations + ancillary metrics
Step 7: Assemble outputs (GeoJSON, CSV, per-station JSON, manifest)

7. Output Products

File	Format	Description
`GHCNd_drought_current.geojson.gz`	GeoJSON (gzipped)	All stations with all drought indices as Feature properties; ~2 MB compressed
`all_stations.csv`	CSV	Wide-format table: one row per station, all metrics as columns
`stations/{ID}.json`	JSON	Per-station detail files with all indices, accumulations, and metadata
`station_catalog.csv`	CSV	Station metadata, record dates, and available index flags
`manifest.csv`	CSV	Per-dataset date manifest for tracking data freshness

Index Summary

Index	Stations	Units / Range	Distribution
SPI	All with PRCP (~5,277)	Dimensionless z-score	Gamma (L-moments)
SPEI	PRCP + TMAX + TMIN (~4,096)	Dimensionless z-score	Generalized Logistic (L-moments)
EDDI	TMAX + TMIN (~4,096)	Dimensionless z-score	Nonparametric (Tukey)
Precip accumulation	All with PRCP	mm	Raw sum
Percent of Normal	All with PRCP	%	Climatological mean
Departure	All with PRCP	mm	Climatological mean
Percentile	All with PRCP	0–1	Empirical ECDF

8. Software & Reproducibility

Runtime Environment

The pipeline runs inside a Docker container based on rocker/r-ver:4.4.1 (R 4.4.1). The container includes the AWS CLI for S3 synchronization and minimal spatial libraries (GDAL/GEOS/PROJ for GeoJSON output via sf).

Key R Packages

Package	Role
`data.table`	Fast CSV parsing and in-memory data manipulation
`lmomco`	L-moment parameter estimation; Gamma, GLO distributions
`sf`	GeoJSON output generation
`jsonlite`	Per-station JSON and compact GeoJSON output
`pbmcapply`	Fork-based parallel apply with progress bar
`purrr`	Functional programming utilities

Source Code Structure

mco-GHCNd/
├── R/
│   ├── drought-functions.R        # Hargreaves-Samani, SPI, SPEI, EDDI fitting
│   ├── pipeline-common.R          # Shared infrastructure
│   ├── 0_ghcnd-station-filter.R   # Station selection
│   ├── 1_ghcnd-cache.R            # Data download and parsing
│   ├── 2_compute-pet.R            # Hargreaves-Samani ET₀
│   ├── 3_metrics-spi.R            # SPI computation
│   ├── 4_metrics-spei.R           # SPEI computation
│   ├── 5_metrics-eddi.R           # EDDI computation
│   ├── 6_metrics-precip-accum.R   # Precipitation accumulations
│   └── 7_output-assembly.R        # Output generation
├── pipeline/
│   ├── run_once.sh                # Orchestration script
│   └── run_test.sh                # Test runner
├── Dockerfile
├── docker-compose.yml
└── terraform/                     # AWS infrastructure (Fargate, S3, ECR)

Deployment

The pipeline is deployed on AWS Fargate via Amazon ECS, triggered nightly at 10:00 PM Mountain Time by an EventBridge Scheduler rule. Outputs are written to a public S3 bucket with CORS enabled for direct web consumption. Infrastructure is managed via Terraform.

9. References

Menne MJ, Durre I, Vose RS, Gleason BE, Houston TG. An overview of the Global Historical Climatology Network-Daily database. Journal of Atmospheric and Oceanic Technology. 2012;29(7):897–910. doi:10.1175/JTECH-D-11-00103.1
Hargreaves GH, Samani ZA. Reference crop evapotranspiration from temperature. Applied Engineering in Agriculture. 1985;1(2):96–99. doi:10.13031/2013.26773
Allen RG, Pereira LS, Raes D, Smith M. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. Rome: Food and Agriculture Organization of the United Nations; 1998.
Durre I, Menne MJ, Gleason BE, Houston TG, Vose RS. Comprehensive automated quality assurance of daily surface observations. Journal of Applied Meteorology and Climatology. 2010;49(8):1615–1633. doi:10.1175/2010JAMC2375.1
Hoylman ZH, Bocinsky RK, Jencso KG. Drought assessment has been outpaced by climate change: empirical arguments for a paradigm shift. Nature Communications. 2022;13:2715. doi:10.1038/s41467-022-30316-5
Droogers P, Allen RG. Estimating reference evapotranspiration under inaccurate data conditions. Irrigation and Drainage Systems. 2002;16(1):33–45. doi:10.1023/A:1015508322413
Begueria S, Vicente-Serrano SM, Angulo-Martinez M. A multiscalar global drought dataset: the SPEIbase. Bulletin of the American Meteorological Society. 2010;91(10):1351–1356. doi:10.1175/2010BAMS2988.1
McKee TB, Doesken NJ, Kleist J. The relationship of drought frequency and duration to time scales. Proceedings of the 8th Conference on Applied Climatology. 1993;17(22):179–183. American Meteorological Society.
Hosking JRM. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B. 1990;52(1):105–124. doi:10.1111/j.2517-6161.1990.tb01775.x
Stagge JH, Tallaksen LM, Gudmundsson L, Van Loon AF, Stahl K. Candidate distributions for climatological drought indices (SPI and SPEI). International Journal of Climatology. 2015;35(13):4027–4040. doi:10.1002/joc.4267
Begueria S, Vicente-Serrano SM, Reig F, Latorre B. Standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and related applications. International Journal of Climatology. 2014;34(10):3001–3023. doi:10.1002/joc.3887
Hobbins MT, Wood A, McEvoy DJ, Huntington JL, Morton C, Anderson M, Hain C. The Evaporative Demand Drought Index. Part I: Linking drought evolution to variations in evaporative demand. Journal of Hydrometeorology. 2016;17(6):1745–1761. doi:10.1175/JHM-D-15-0121.1
Abramowitz M, Stegun IA, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series 55. Washington, DC: U.S. Government Printing Office; 1965.

2. Data

2.1 GHCNd Station Data

Data Access

Quality Control

2.2 Station Selection Criteria

Filtering Criteria

3. Climatological Reference Periods

Scientific Motivation

Supported Modes

4. Reference Evapotranspiration (ET0) Estimation

4.1 Hargreaves-Samani Method

4.2 Extraterrestrial Radiation (FAO-56)

5. Drought Index Methods

Aggregation Timescales

5.1 Standardized Precipitation Index (SPI)

Motivation

Input

Statistical Method

Interpretation

5.2 Ancillary Precipitation Metrics

5.3 Standardized Precipitation-Evapotranspiration Index (SPEI)

Motivation

Input

Statistical Method

Interpretation

5.4 Evaporative Demand Drought Index (EDDI)

Motivation

Input

Statistical Method

6. Processing Architecture

Station-Based Parallelism

Cold Start vs. Warm Start

S3 Caching (AWS Fargate)

Pipeline Steps

7. Output Products

Index Summary

8. Software & Reproducibility

Runtime Environment

Key R Packages

Source Code Structure

Deployment

9. References

4. Reference Evapotranspiration (ET₀) Estimation