GHCNd Station Drought Monitoring Pipeline: Methods
Zachary H. Hoylman — Montana Climate Office, University of Montana
Abstract The Montana Climate Office (MCO) GHCNd Station Drought Monitoring Pipeline is an operational system that computes standardized drought indices on a daily basis from individual weather station observations across the contiguous United States (CONUS). The pipeline ingests daily station data from the Global Historical Climatology Network – Daily (GHCNd) [1], a quality-controlled archive of in situ observations from over 100,000 stations worldwide, updated daily by NOAA’s National Centers for Environmental Information (NCEI).
From the core GHCNd variables — precipitation (PRCP), maximum temperature (TMAX), and minimum temperature (TMIN) — the pipeline derives three standardized drought index families: the Standardized Precipitation Index (SPI), the Standardized Precipitation-Evapotranspiration Index (SPEI), and the Evaporative Demand Drought Index (EDDI). Reference evapotranspiration (ET0) is estimated using the Hargreaves-Samani method [2] with extraterrestrial radiation computed from FAO-56 equations [3]. Ancillary products include precipitation accumulations (mm), percent of normal, departure from normal, and empirical percentile ranks. Each metric is computed across multiple aggregation timescales ranging from 15 to 730 days, as well as water-year and year-to-date accumulations.
Unlike the companion MCO CONUS Drought Monitoring Pipeline, which operates on the ~4 km gridMET dataset, this pipeline produces point-based station estimates. Station-level drought indices offer several advantages: they reflect actual observed conditions rather than interpolated surfaces, they are available globally (not limited to CONUS gridded products), and they provide independent validation points for gridded analyses. Only stations that are currently reporting and have at least 30 years of historical data are included, ensuring robust climatological fitting for operational drought assessment.
2. Data
2.1 GHCNd Station Data
All meteorological inputs are obtained from the Global Historical Climatology Network – Daily (GHCNd) [1], an integrated database of daily climate summaries from land surface stations maintained by NOAA NCEI. GHCNd is subject to a suite of automated quality assurance procedures including duplicate checks, climatological outlier detection, spatial consistency tests, and temporal gap checks [4]. Observations that fail any quality check are flagged and excluded from this pipeline.
The following GHCNd elements are used:
| Variable | GHCNd element | Native units | Converted units | Used for |
|---|---|---|---|---|
| Precipitation | PRCP | tenths of mm | mm | SPI, SPEI water balance, precip accumulations |
| Max temperature | TMAX | tenths of °C | °C | Hargreaves-Samani ET0 |
| Min temperature | TMIN | tenths of °C | °C | Hargreaves-Samani ET0 |
Data Access
GHCNd data are obtained from the NCEI public archive as by-year CSV files
(YYYY.csv.gz) from
ncei.noaa.gov/pub/data/ghcn/daily/by_year/. Each file contains all station
observations for a single calendar year (~170 MB compressed for recent years).
The pipeline downloads the full historical archive on first run (cold start) and
subsequently re-downloads only the current year file on each operational run (warm start),
since historical years are immutable.
Quality Control
Each GHCNd observation carries a quality flag (QFLAG) from NCEI’s
automated QC suite. Any observation with a non-blank quality flag is excluded prior to
analysis. This removes values that failed duplicate checks, climatological range tests,
spatial consistency checks, or other QC procedures described in
Durre et al. (2010) [4].
2.2 Station Selection Criteria
The pipeline targets operational drought assessment and therefore restricts analysis
to stations that are both currently reporting and have sufficient historical depth
for robust climatological fitting. Station eligibility is determined from the
ghcnd-inventory.txt metadata file, which records the first and last
year of data for each station-element combination.
Filtering Criteria
- CONUS bounding box — Only stations with US country prefix and coordinates within 24.5–49.5°N, 125–66.5°W are included, excluding Alaska, Hawaii, and US territories.
- Recency — The station must have reported PRCP within the last calendar year (last year in inventory ≥ current year − 1).
- Record length — The station must have at least 30 years of PRCP data (last year − first year + 1 ≥ 30).
- SPEI/EDDI eligibility — Stations additionally require TMAX and TMIN with the same recency and record length thresholds. Approximately 75–80% of PRCP-eligible stations also report temperature.
- Reporting latency — At computation time, stations whose most recent observation is older than the configured latency threshold (default: 3 days) are excluded from drought index computation for that run.
3. Climatological Reference Periods
Scientific Motivation
Standardized drought indices measure anomalies relative to a baseline climatology. The choice of baseline profoundly affects index values: the same precipitation total can indicate moderate drought against a 1950–1980 baseline but near-normal conditions against a 1991–2020 WMO normal. As the climate warms, the statistical properties of temperature, PET, and water balance are trending systematically, violating the stationarity assumption embedded in fixed-period normals. Hoylman et al. (2022) [5] demonstrate that non-stationarity in baseline climate substantially alters drought frequency and severity assessments, motivating the use of flexible, contemporary reference periods for operational monitoring.
Supported Modes
| Spec syntax | Mode | Description |
|---|---|---|
rolling:30 |
Rolling trailing window | Trailing 30-year window ending at the current year; tracks contemporary climatology (default) |
fixed:1991:2020 |
Fixed interval | WMO 1991–2020 standard normal |
full |
Full archive | All years from START_YEAR to present; maximizes sample size |
For each timescale and reference period combination, a vector of one aggregated value per year is assembled by extracting the rolling window sum (or mean) ending on the same calendar month-day as the most recent observation. This vector serves as input to the distribution fitting procedure for SPI, SPEI, and EDDI.
NA. With
MIN_OBS_FRACTION=1, any year with a single missing day in the aggregation
window is excluded from the fitting sample.
4. Reference Evapotranspiration (ET0) Estimation
SPEI and EDDI require an estimate of atmospheric evaporative demand. GHCNd provides only temperature data (TMAX, TMIN) — wind speed, humidity, and solar radiation are not available from the standard GHCNd archive. The Penman-Monteith method is therefore not feasible. The Thornthwaite method was rejected because it operates only at monthly timesteps and performs poorly in arid, semiarid, and high-elevation climates [6].
4.1 Hargreaves-Samani Method
Daily reference evapotranspiration is estimated using the Hargreaves-Samani (HS) equation [2], which requires only daily maximum and minimum temperature plus extraterrestrial radiation (computed from latitude and day of year):
where:
- ET0 is daily reference evapotranspiration (mm/day)
- Ra is extraterrestrial radiation converted to equivalent evaporation (mm/day)
- Tmean = (Tmax + Tmin) / 2 (°C)
- Tmax − Tmin is the diurnal temperature range (°C), clamped ≥ 0
- ET0 is clamped ≥ 0 (negative values are not physically meaningful)
The HS method has been shown to provide reasonable estimates of ET0 across a wide range of climates when compared to Penman-Monteith, particularly for drought monitoring applications where relative anomalies matter more than absolute accuracy [7].
4.2 Extraterrestrial Radiation (FAO-56)
Extraterrestrial radiation (Ra) is computed from station latitude and day of year using the standard FAO-56 equations [3]:
-
Solar declination (Eq. 24):
δ = 0.409 ⋅ sin(2π/365 ⋅ J − 1.39)where J is the day of year (1–366).
-
Inverse relative Earth-Sun distance (Eq. 23):
dr = 1 + 0.033 ⋅ cos(2π/365 ⋅ J)
-
Sunset hour angle (Eq. 25):
ωs = arccos[ −tan(φ) ⋅ tan(δ) ]where φ is station latitude in radians. For polar conditions where |tan(φ) ⋅ tan(δ)| ≥ 1, ωs is set to π (24-hour daylight) or 0 (no daylight).
-
Extraterrestrial radiation (Eq. 21):
Ra = (24 ⋅ 60 / π) ⋅ Gsc ⋅ dr ⋅ [ ωs sin(φ) sin(δ) + cos(φ) cos(δ) sin(ωs) ]where Gsc = 0.0820 MJ m−2 min−1 is the solar constant.
-
Conversion to equivalent evaporation (Eq. 20):
Ra (mm/day) = 0.408 ⋅ Ra (MJ m−2 day−1)where 0.408 = 1/λ and λ = 2.45 MJ kg−1 is the latent heat of vaporization.
5. Drought Index Methods
All indices are computed independently at each station. For a given aggregation window of W days, a rolling sum of the input variable is first computed over the station’s daily record. One value per climatological reference year is extracted by anchoring the window on the same calendar month-day as the most recent observation. A probability distribution (or empirical CDF) is then fitted to the resulting climatological sample, and the current (most recent) value is transformed to a standardized anomaly.
Aggregation Timescales
- Fixed windows: 15, 30, 45, 60, 90, 120, 180, 365, and 730 days
- Water-year accumulation (Oct 1 – current date)
- Year-to-date accumulation (Jan 1 – current date)
5.1 Standardized Precipitation Index (SPI)
Motivation
The SPI [8] expresses precipitation anomalies in units of standard deviations relative to a fitted climatological distribution, making it comparable across locations and timescales with differing precipitation regimes. It requires only precipitation data, making it computable for all stations in the network.
Input
Daily precipitation (PRCP), in millimeters. Rolling window sums.
Statistical Method
A two-parameter Gamma distribution is fitted to the climatological sample using
L-moment / probability-weighted moment (PWM) estimation
[9], as implemented in the R package
lmomco:
-
Zero handling (Stagge et al., 2015) — Following
Stagge et al. (2015) [10], zero precipitation
values are handled via a mixed distribution approach using the Weibull plotting
position:
p0 = m / (n + 1)p̄0 = (m + 1) / [ 2(n + 1) ]where m is the number of zero values and n is the total sample size. The Gamma distribution is fitted only to positive values:H(x) = p0 + (1 − p0) ⋅ G(x; α, β)
-
PWM estimation — Unbiased probability-weighted moments
computed from positive values via
lmomco::pwm.ub(). -
L-moment conversion and parameter estimation — PWMs
converted to L-moments, then
lmomco::pargam()returns Gamma parameters (α, β). - Normal quantile transform — SPI = Φ−1(p).
Interpretation
| Category | Description | Percentile Range | SPI / SPEI Values |
|---|---|---|---|
| Normal or wet conditions | x > 30 | x > −0.524 | |
| D0 | Abnormally Dry | 30 ≥ x > 20 | −0.524 ≥ x > −0.842 |
| D1 | Moderate Drought | 20 ≥ x > 10 | −0.842 ≥ x > −1.281 |
| D2 | Severe Drought | 10 ≥ x > 5 | −1.281 ≥ x > −1.644 |
| D3 | Extreme Drought | 5 ≥ x > 2 | −1.644 ≥ x > −2.054 |
| D4 | Exceptional Drought | x < 2 | x < −2.054 |
5.2 Ancillary Precipitation Metrics
| Metric | Formula | Units | Description |
|---|---|---|---|
| Percent of Normal (PON) | ( XW / μclim ) × 100 | % | Current accumulation as percentage of climatological mean |
| Departure | XW − μclim | mm | Absolute anomaly relative to climatological mean |
| Percentile | F̂n( XW ) | 0–1 | Empirical ECDF rank within climatological sample |
| Accumulation | XW | mm | Raw rolling precipitation sum |
5.3 Standardized Precipitation-Evapotranspiration Index (SPEI)
Motivation
The SPEI [11] extends SPI by incorporating atmospheric evaporative demand, making it sensitive to warming-driven increases in ET0 even when precipitation is unchanged. In a warming climate, SPEI captures hydrological drought stress that SPI alone would miss.
Input
Daily water balance = precipitation (PRCP) − reference evapotranspiration (ET0 from Hargreaves-Samani), in mm/day. Negative values indicate moisture deficit. Rolling window sums.
Statistical Method
The water balance variable is unbounded below and has heavier distributional tails than precipitation alone. The Generalized Logistic (GLO) distribution is therefore used:
- L-moments estimated from the climatological sample via unbiased PWMs.
lmomco::parglo()fits the three-parameter GLO (ξ, α, k).- GLO CDF evaluated at current observation → probability p.
- SPEI = Φ−1(p).
Interpretation
SPEI values follow the same classification thresholds as SPI. Only stations with both PRCP and temperature data (TMAX + TMIN) can produce SPEI, as Hargreaves-Samani ET0 is required for the water balance computation.
5.4 Evaporative Demand Drought Index (EDDI)
Motivation
EDDI [12] isolates the atmospheric demand component of the hydrological cycle. Because it is derived solely from ET0 — without precipitation — it captures early warning signals of drought stress driven by high temperature and diurnal range before soil moisture depletion has occurred.
Input
Hargreaves-Samani ET0 (mm/day). Rolling window sums (cumulative evaporative demand).
Statistical Method
EDDI uses a nonparametric rank-based approach following Hobbins et al. (2016) [12]:
- Rank — Observations within the climatological sample are ranked from highest (rank 1) to lowest.
-
Tukey plotting position —
pi = (i − 1/3) / (n + 1/3)where i is the rank (1 = maximum ET0).
- Inverse normal approximation — EDDI is derived via the rational approximation of Abramowitz and Stegun (1965) [13]. For p ≤ 0.5, W = √(−2 ln(p)); for p > 0.5, p is replaced with 1 − p and the sign is reversed.
- Sign convention — Positive EDDI = above-normal atmospheric demand (drought). Negative = wet anomaly.
6. Processing Architecture
Station-Based Parallelism
Unlike the companion gridded pipeline, which decomposes the CONUS domain into spatial
tiles, this pipeline parallelizes across stations. Each station’s time series is
processed independently via pbmcapply::pbmclapply() (fork-based parallel
apply). This is naturally embarrassingly parallel — stations share no state.
Cold Start vs. Warm Start
On first run (cold start), all 48 yearly CSV files are downloaded from NCEI and parsed into per-station RDS files. Each station’s RDS contains a complete daily time series with columns for date, PRCP, TMAX, and TMIN. On subsequent operational runs (warm start), only the current year and prior year CSV files are re-downloaded and merged into existing per-station RDS files, reducing I/O by approximately 200×.
S3 Caching (AWS Fargate)
When deployed on AWS Fargate, all per-station RDS files, raw CSVs, and derived outputs are synced to an S3 bucket at the end of each run. The next Fargate run restores this cache before processing, avoiding a full cold start on ephemeral compute infrastructure. Local development runs bypass S3 sync entirely.
Pipeline Steps
Step 0: Filter active CONUS stations from GHCNd inventory
Step 1: Download/parse GHCNd by-year CSVs → per-station RDS files
Step 2: Compute Hargreaves-Samani daily ET₀ for SPEI-eligible stations
Step 3: Compute SPI (all precip stations)
Step 4: Compute SPEI (stations with PET)
Step 5: Compute EDDI (stations with PET)
Step 6: Compute precipitation accumulations + ancillary metrics
Step 7: Assemble outputs (GeoJSON, CSV, per-station JSON, manifest)
7. Output Products
| File | Format | Description |
|---|---|---|
GHCNd_drought_current.geojson.gz |
GeoJSON (gzipped) | All stations with all drought indices as Feature properties; ~2 MB compressed |
all_stations.csv |
CSV | Wide-format table: one row per station, all metrics as columns |
stations/{ID}.json |
JSON | Per-station detail files with all indices, accumulations, and metadata |
station_catalog.csv |
CSV | Station metadata, record dates, and available index flags |
manifest.csv |
CSV | Per-dataset date manifest for tracking data freshness |
Index Summary
| Index | Stations | Units / Range | Distribution |
|---|---|---|---|
| SPI | All with PRCP (~5,277) | Dimensionless z-score | Gamma (L-moments) |
| SPEI | PRCP + TMAX + TMIN (~4,096) | Dimensionless z-score | Generalized Logistic (L-moments) |
| EDDI | TMAX + TMIN (~4,096) | Dimensionless z-score | Nonparametric (Tukey) |
| Precip accumulation | All with PRCP | mm | Raw sum |
| Percent of Normal | All with PRCP | % | Climatological mean |
| Departure | All with PRCP | mm | Climatological mean |
| Percentile | All with PRCP | 0–1 | Empirical ECDF |
8. Software & Reproducibility
Runtime Environment
The pipeline runs inside a Docker container based on rocker/r-ver:4.4.1
(R 4.4.1). The container includes the AWS CLI for S3 synchronization and minimal
spatial libraries (GDAL/GEOS/PROJ for GeoJSON output via sf).
Key R Packages
| Package | Role |
|---|---|
data.table | Fast CSV parsing and in-memory data manipulation |
lmomco | L-moment parameter estimation; Gamma, GLO distributions |
sf | GeoJSON output generation |
jsonlite | Per-station JSON and compact GeoJSON output |
pbmcapply | Fork-based parallel apply with progress bar |
purrr | Functional programming utilities |
Source Code Structure
mco-GHCNd/
├── R/
│ ├── drought-functions.R # Hargreaves-Samani, SPI, SPEI, EDDI fitting
│ ├── pipeline-common.R # Shared infrastructure
│ ├── 0_ghcnd-station-filter.R # Station selection
│ ├── 1_ghcnd-cache.R # Data download and parsing
│ ├── 2_compute-pet.R # Hargreaves-Samani ET₀
│ ├── 3_metrics-spi.R # SPI computation
│ ├── 4_metrics-spei.R # SPEI computation
│ ├── 5_metrics-eddi.R # EDDI computation
│ ├── 6_metrics-precip-accum.R # Precipitation accumulations
│ └── 7_output-assembly.R # Output generation
├── pipeline/
│ ├── run_once.sh # Orchestration script
│ └── run_test.sh # Test runner
├── Dockerfile
├── docker-compose.yml
└── terraform/ # AWS infrastructure (Fargate, S3, ECR)
Deployment
The pipeline is deployed on AWS Fargate via Amazon ECS, triggered nightly at 10:00 PM Mountain Time by an EventBridge Scheduler rule. Outputs are written to a public S3 bucket with CORS enabled for direct web consumption. Infrastructure is managed via Terraform.
9. References
- Menne MJ, Durre I, Vose RS, Gleason BE, Houston TG. An overview of the Global Historical Climatology Network-Daily database. Journal of Atmospheric and Oceanic Technology. 2012;29(7):897–910. doi:10.1175/JTECH-D-11-00103.1
- Hargreaves GH, Samani ZA. Reference crop evapotranspiration from temperature. Applied Engineering in Agriculture. 1985;1(2):96–99. doi:10.13031/2013.26773
- Allen RG, Pereira LS, Raes D, Smith M. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. Rome: Food and Agriculture Organization of the United Nations; 1998.
- Durre I, Menne MJ, Gleason BE, Houston TG, Vose RS. Comprehensive automated quality assurance of daily surface observations. Journal of Applied Meteorology and Climatology. 2010;49(8):1615–1633. doi:10.1175/2010JAMC2375.1
- Hoylman ZH, Bocinsky RK, Jencso KG. Drought assessment has been outpaced by climate change: empirical arguments for a paradigm shift. Nature Communications. 2022;13:2715. doi:10.1038/s41467-022-30316-5
- Droogers P, Allen RG. Estimating reference evapotranspiration under inaccurate data conditions. Irrigation and Drainage Systems. 2002;16(1):33–45. doi:10.1023/A:1015508322413
- Begueria S, Vicente-Serrano SM, Angulo-Martinez M. A multiscalar global drought dataset: the SPEIbase. Bulletin of the American Meteorological Society. 2010;91(10):1351–1356. doi:10.1175/2010BAMS2988.1
- McKee TB, Doesken NJ, Kleist J. The relationship of drought frequency and duration to time scales. Proceedings of the 8th Conference on Applied Climatology. 1993;17(22):179–183. American Meteorological Society.
- Hosking JRM. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B. 1990;52(1):105–124. doi:10.1111/j.2517-6161.1990.tb01775.x
- Stagge JH, Tallaksen LM, Gudmundsson L, Van Loon AF, Stahl K. Candidate distributions for climatological drought indices (SPI and SPEI). International Journal of Climatology. 2015;35(13):4027–4040. doi:10.1002/joc.4267
- Begueria S, Vicente-Serrano SM, Reig F, Latorre B. Standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and related applications. International Journal of Climatology. 2014;34(10):3001–3023. doi:10.1002/joc.3887
- Hobbins MT, Wood A, McEvoy DJ, Huntington JL, Morton C, Anderson M, Hain C. The Evaporative Demand Drought Index. Part I: Linking drought evolution to variations in evaporative demand. Journal of Hydrometeorology. 2016;17(6):1745–1761. doi:10.1175/JHM-D-15-0121.1
- Abramowitz M, Stegun IA, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series 55. Washington, DC: U.S. Government Printing Office; 1965.