Explainable machine learning improves interpretability in the predictive modeling of biological stream conditions in the Chesapeake Bay Watershed, USA

Anthropogenic alterations have resulted in widespread degradation of stream conditions. To aid in stream restoration and management, baseline estimates of conditions and improved explanation of the factors driving their degradation are needed. We used random forests to model biological conditions using a benthic macroinvertebrate index of biotic integrity for small, non-tidal streams (upstream area ≤200 km²) in the Chesapeake Bay watershed (CBW) of the mid-Atlantic coast of North America. We used several global and local model interpretation tools to improve average and site-specific model inferences, respectively. The model was used to predict condition for 95,867 individual catchments for eight periods (2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019). Predicted conditions were classified as Poor, FairGood, or Uncertain to align with management needs, and individual reach lengths and catchment areas were summed by condition class for the CBW for each period. Global permutation and local Shapley importance values indicated that the percent of forest, development, and agriculture in upstream catchments had strong impacts on predictions. Development and agriculture negatively influenced stream condition at both the model-average level (partial dependence [PD] and accumulated local effect [ALE] plots) and the local level (individual conditional expectation and Shapley value plots). Friedman’s H-statistic indicated large overall interactions for these three land covers, and bivariate global plots (PD and ALE) supported interactions between agriculture and development. Total stream length and catchment area predicted in FairGood condition decreased and then increased over the 19 years (length/area: 66.6/65.4% in 2001, 66.3/65.2% in 2011, and 66.6/65.4% in 2019).
Examination of individual catchment predictions between 2001 and 2019 showed that catchments predicted to have the largest decreases in condition had large increases in development, whereas catchments predicted to exhibit the largest increases in condition showed moderate increases in forest cover. Using global and local interpretative methods together with watershed-wide and individual catchment predictions supports conservation practitioners who need to identify both widespread and localized patterns, especially given that management actions typically take place at individual-reach scales.
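
The global permutation importance mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `toy_model` function stands in for a fitted random forest, the two features and all data are hypothetical, and mean squared error is assumed as the loss. The idea is that shuffling a feature the model relies on should increase prediction error, while shuffling an unused feature should not.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    n_features = len(rows[0])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(predict, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical stand-in for a fitted model: the index score falls with
# development (feature 0) and ignores a noise feature (feature 1).
def toy_model(row):
    development, _noise = row
    return 1.0 - 0.8 * development

# Hypothetical catchments with two features scaled to [0, 1].
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets)
```

In this toy setup the development feature receives a positive importance and the noise feature receives exactly zero, because permuting it never changes a prediction.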

Find more information on the ScienceDirect page.

Linking Altered Flow Regimes to Biological Condition: an Example Using Benthic Macroinvertebrates in Small Streams of the Chesapeake Bay Watershed

Regionally scaled assessments of hydrologic alteration for small streams and its effects on freshwater taxa are often inhibited by a low number of stream gages. To overcome this limitation, we paired modeled estimates of hydrologic alteration with benthic macroinvertebrate index of biotic integrity data for 4522 stream reaches across the Chesapeake Bay watershed. Using separate random-forest models, we predicted flow status (inflated, diminished, or indeterminant) for 12 published hydrologic metrics (HMs) that characterize the main components of flow regimes. We used these models to predict the status of each HM for each stream reach in the watershed, and linked the predictions to macroinvertebrate condition samples collected from streams with drainage areas less than 200 km². Flow alteration was calculated as the number of HMs with inflated or diminished status and ranged from 0 (no HM inflated or diminished) to 12 (all 12 HMs inflated or diminished). When focused solely on the relationship between stream condition and flow alteration, degraded macroinvertebrate condition was, depending on the number of HMs used, 3.8–4.7 times more likely in a flow-altered site; this likelihood was over twofold higher in the urban-focused dataset (8.7–10.8), and was never significant in the agriculture-focused dataset. Logistic regression analysis using the entire dataset showed that for every unit increase in flow-alteration intensity, the odds of a degraded condition increased by 3.7%. Our results provide an indication of whether altered streamflow is a possible driver of degraded biological conditions, information that could help managers prioritize management actions and lead to more effective restoration efforts.
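
The arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the study's analysis: the function names, the example reach, and the 2x2 counts are hypothetical, and an odds ratio is assumed as one common way to express "times more likely" (the source may report a different statistic). The last function compounds the reported 3.7% per-unit increase in odds, which assumes the logistic relationship holds across the whole 0 to 12 range.

```python
def flow_alteration_score(statuses):
    """Count hydrologic metrics whose predicted status is inflated or diminished."""
    return sum(1 for s in statuses if s in ("inflated", "diminished"))

def odds_ratio(deg_alt, ok_alt, deg_unalt, ok_unalt):
    """Odds ratio of degraded condition, altered vs. unaltered sites (2x2 table)."""
    return (deg_alt / ok_alt) / (deg_unalt / ok_unalt)

def odds_multiplier(units, pct_per_unit=3.7):
    """Compound a per-unit percent increase in odds over `units` steps."""
    return (1 + pct_per_unit / 100) ** units

# Hypothetical reach: 12 HM statuses, 5 of them altered.
statuses = ["inflated"] * 3 + ["diminished"] * 2 + ["indeterminant"] * 7
score = flow_alteration_score(statuses)

# Hypothetical site counts, chosen only to illustrate the calculation.
example_or = odds_ratio(deg_alt=80, ok_alt=40, deg_unalt=30, ok_unalt=60)

# Odds multiplier implied by 3.7% per unit when all 12 HMs are altered.
max_multiplier = odds_multiplier(12)
```

Under these assumptions, a reach with all 12 HMs altered would have odds of degraded condition roughly 1.55 times those of a reach with none altered, since 1.037 raised to the 12th power is about 1.55.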

The report has been published in Environmental Management.

Creating a stream health baseline for the Chesapeake basin from monitoring and model data

This report describes how monitoring and model data were analyzed and combined to generate a preliminary estimate of acceptable stream health in the Chesapeake Bay basin for the 2006–2011 baseline period. Streams in about 73% of the basin’s 64,020 square miles of drainage area were evaluated with monitoring results, and output from a predictive model was used to estimate stream health in the remaining 27%. Stream health was measured with the bioregion, family-level version of the “Chessie BIBI,” a multi-metric index for stream macroinvertebrate communities. Index scores are normally expressed as one of five ratings: Excellent or Good (well-functioning), Fair (considered satisfactory), and Poor or Very Poor (stressed or poorly functioning). Four versions of the predictive model were developed and tested, and the selected version outputs results as three ratings: Excellent/Good, Fair, and Poor/Very Poor. The five ratings in the monitoring data were regrouped to match the three ratings of the selected predictive model. The monitoring- and model-based ratings were then area-weighted to reduce bias caused by uneven sample densities and aggregated to the Chesapeake basin scale, with monitoring results given preference. The combined results suggest that approximately 60% of the basin’s area had acceptable stream ratings (Excellent, Good, or Fair) during 2006–2011. This estimate is a preliminary baseline for the Chesapeake Bay Program’s stream health goal. A final baseline estimate will be produced after a higher-resolution stream layer becomes available and acceptable stream health can be estimated as a percent of the basin’s stream miles.
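
The regrouping and aggregation steps described above can be sketched as follows. This is a simplified illustration, not the report's procedure: the unit records, field names, and areas are hypothetical, and the report's actual area-weighting to correct for uneven sample densities is likely more involved than a single area-weighted sum.

```python
# Regroup the five monitoring ratings into the model's three ratings.
FIVE_TO_THREE = {
    "Excellent": "Excellent/Good",
    "Good": "Excellent/Good",
    "Fair": "Fair",
    "Poor": "Poor/Very Poor",
    "Very Poor": "Poor/Very Poor",
}
ACCEPTABLE = {"Excellent/Good", "Fair"}

def combined_rating(unit):
    """Prefer the monitoring rating; fall back to the model rating."""
    if unit.get("monitored") is not None:
        return FIVE_TO_THREE[unit["monitored"]]
    return unit["modeled"]

def percent_acceptable(units):
    """Area-weighted percent of total area with an acceptable rating."""
    total = sum(u["area"] for u in units)
    ok = sum(u["area"] for u in units if combined_rating(u) in ACCEPTABLE)
    return 100.0 * ok / total

# Hypothetical units (area in square miles); values are illustrative only.
units = [
    {"area": 50.0, "monitored": "Good", "modeled": "Fair"},
    {"area": 30.0, "monitored": "Very Poor", "modeled": "Fair"},
    {"area": 20.0, "monitored": None, "modeled": "Fair"},
]
result = percent_acceptable(units)
```

In this example the second unit counts as unacceptable even though the model rates it Fair, because its monitoring rating takes preference.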