Victoria A Prescott, Jack Marte, Reuben P Keller, Performance of alternative methods for generating species distribution models for invasive species in the Laurentian Great Lakes, Fisheries, 2025;, vuaf012, https://doi.org/10.1093/fshmag/vuaf012
ABSTRACT
The Risk Assessment Mapping Program (RAMP) is a user-friendly tool that uses climate data and known occurrences of nonnative species to predict where the species may be able to survive. We compared the performance of RAMP and two machine learning methods, boosted regression trees and maximum entropy, at estimating distributions of 30 aquatic species that are nonnative to the Laurentian Great Lakes Basin. For each species and method, we created models and tested them against subsets of known occurrences to calculate the true skill statistic (TSS). This measure ranges from –1 to 1, with values of zero or below indicating assessments no better than random and 1 indicating perfect assessment. Average TSS values were 0.81 ± 0.09 (boosted regression tree), 0.76 ± 0.12 (maximum entropy), and 0.09 ± 0.06 (RAMP). Despite having high TSS values, our machine learning models generally underestimate potential distributions across the Great Lakes Basin. RAMP forecasts much greater areas of the basin to be climatically appropriate for each species and may therefore be more suitable for conservative management decisions.

Rusty crayfish Orconectes rusticus. Photo credit: Ryan Hagerty, U.S. Fish & Wildlife Service.
INTRODUCTION
Freshwater ecosystems are under threat from a range of stressors, including pollution, habitat destruction, climate change, and invasive species (Best, 2019; Collen et al., 2014; Dudgeon, 2019). These factors each cause significant damage and, when combined, can cause particularly large impacts (Collen et al., 2014; Reid et al., 2019). On its own, the spread of invasive species is a top threat to freshwater ecosystems (Best, 2019; Dudgeon et al., 2006; Reid et al., 2019), resulting in loss of native species (Reid et al., 2019) and massive economic damage (Cuthbert et al., 2021). Climate change can exacerbate the threat of freshwater invasive species (Strayer, 2010) by expanding suitable habitat for invaders (Bellard et al., 2013; Collingsworth et al., 2017; Rahel & Olden, 2008), although the scale, type, and location of impacts are often difficult to predict (Hellmann et al., 2008; S. D. P. Smith et al., 2019).
The Laurentian Great Lakes Basin has been particularly affected by both invasive species and climate change, and climate change will continue to alter the ranges and impacts of species already present. Currently, there are 192 known nonindigenous aquatic species established in the Great Lakes Basin (GLANSIS, 2023). As climate change affects the region, the suite of potential invaders will change because new conditions make the ecosystem suitable for different species (Collingsworth et al., 2017; Lennox et al., 2020). This presents a challenge for management because accurate predictions of future impacts are often difficult to generate.
A variety of decision support tools, such as risk assessments and distribution modeling, are available for assessing which nonnative species are likely to invade the Great Lakes and how distributions may change over time. These tools can inform management and policy approaches (Epanchin-Niell, 2017; Keller et al., 2007; Springborn et al., 2011, 2015). In recent years, the number of decision support tools available has grown (e.g., Copp et al., 2016; Engelstad et al., 2022; Kinsley et al., 2022; Marcot et al., 2019), and most of these tools include a climate matching component. Many approaches are available for climate matching, but they have rarely been compared (Vanderhoeven et al., 2017). Explicit tests of the accuracy and precision of available tools would make it possible for agencies and managers to understand the benefits and drawbacks of each tool.
In 2014, the U.S. Fish and Wildlife Service (USFWS) released the Risk Assessment Mapping Program (RAMP; Sanders et al., 2014; Sanders & Castiglione, 2014; USFWS, 2019). This computer program generates predictive maps of regions in North America that contain suitable climate for a species based on its known range. RAMP is a user-friendly geographic information systems tool that uses the same algorithm as Climatch and can be applied to any species of interest (Crombie et al., 2008; Sanders & Castiglione, 2014; USFWS, 2019). It is used by the USFWS to rapidly estimate whether and in what regions nonnative species could become established in North America. Not only does RAMP compare the climate of a species’ current distribution to the climate of North America, but it can also be used to predict regions that will be suitable in 2050 and 2070 under three different climate change scenarios. Additionally, RAMP can produce outputs for the whole of North America or for the focused regions of Mexico, the Caribbean, and the Great Lakes Basin (Sanders et al., 2014; Sanders & Castiglione, 2014). It takes just minutes for RAMP to generate predictive maps. Furthermore, RAMP is included as a part of the USFWS rapid risk assessment to produce Ecological Risk Screening Summaries, and over 1,700 species have so far been assessed with this tool (USFWS, 2023).
Developing accurate invasive species distribution models can be challenging (Lee-Yaw et al., 2022; Liu et al., 2020; Nguyen & Leung, 2022). Predictions of the potential range of a nonnative species are affected by factors including assumptions about niche conservatism and niche-environment equilibrium (Araújo & Peterson, 2012; Barbet-Massin et al., 2018; Early & Sax, 2014; Elith et al., 2010; Elith & Leathwick, 2009; Formoso-Freire et al., 2023; Liu et al., 2020, 2022), stage of invasion (Václavík & Meentemeyer, 2012), and model development and the variables used (Elith et al., 2010; Formoso-Freire et al., 2023; Fourcade, 2021; Gaul et al., 2020; Jiménez-Valverde et al., 2011; Medley, 2010; Rodda et al., 2011). Models that are produced using only bioclimatic data are particularly prone to issues because species may have altered climate tolerances in native vs. nonnative ranges (Broennimann & Guisan, 2008). Despite these issues, species distribution models (SDMs) are widely used and may be able to predict potential areas of successful establishment (Broennimann et al., 2007).
In this study, we compared RAMP’s models to models generated by two other SDM approaches: maximum entropy (MaxEnt) and boosted regression trees (BRTs). While RAMP provides rapid predictions with limited input required from the user, MaxEnt and BRTs are machine learning algorithms that could be implemented in place of RAMP, albeit with more user expertise and time required. Our goal in this work was to compare the performance of RAMP to the other tools when those tools are implemented with similar data and effort as required for RAMP. Each of these three tools begins with known occurrence points for the species in question. These occurrence points are linked to climate data and used to estimate the range of climate conditions over which a species can survive. The three tools have different algorithms to link the known distribution of a species to the estimated climate conditions in which it can survive. In this work, we have quantitatively tested the performance of each model by reserving a random subset of the known occurrence data and using this to test a model created using the remaining occurrence data.
We generated models using each of the three tools for 30 aquatic species that are nonnative to the Great Lakes Basin (Table 1). This includes seven high-profile invaders already established in the Great Lakes Basin and for which distribution modeling could inform future management. The remaining 23 species were selected based on their prominence as invaders elsewhere in the world and/or concern about them becoming established in the Great Lakes. Species came from a variety of taxa including fishes, plants, amphibians, crayfishes, and mollusks. Although each tool can predict ranges under future climate change scenarios, we have only tested them under current climates so that we can assess performance against known species distributions.
Table 1. Species used in this study. Percent suitability is given as the percent of the Great Lakes Basin that has a climate score of 6 or above (Risk Assessment Mapping Program [RAMP]) or greater than the model threshold (boosted regression tree [BRT]/maximum entropy [MaxEnt]). True skill statistic (TSS) values for each model follow the percentages.

| Group | Scientific name | Common name | RAMP (%/TSS) | BRT (%/TSS) | MaxEnt (%/TSS) |
|---|---|---|---|---|---|
| Plants | Eichhornia crassipes | Water hyacinth | 76/0.07 | 00/0.69 | 00/0.63 |
| Plants | Trapa natans<sup>a</sup> | Water chestnut | 95/0.11 | 32/0.82 | 00/0.71 |
| Plants | Pistia stratiotes | Water lettuce | 76/0.06 | 00/0.72 | 00/0.62 |
| Plants | Egeria densa<sup>a</sup> | Brazilian elodea | 96/0.08 | 01/0.72 | 00/0.67 |
| Fishes | Carassius carassius | Crucian carp | 61/0.01 | 00/0.78 | 00/0.73 |
| Fishes | Channa argus | Northern snakehead | 71/0.15 | 00/0.98 | 00/0.97 |
| Fishes | Cobitis taenia | Spine loach | 47/0.06 | 00/0.89 | 00/0.88 |
| Fishes | Ctenopharyngodon idella<sup>a</sup> | Grass carp | 98/0.04 | 31/0.72 | 53/0.64 |
| Fishes | Hypophthalmichthys nobilis | Bighead carp | 70/0.14 | 04/0.85 | 45/0.82 |
| Fishes | H. molitrix | Silver carp | 66/0.14 | 01/0.87 | 49/0.84 |
| Fishes | Ictalurus furcatus | Blue catfish | 68/0.11 | 00/0.78 | 00/0.77 |
| Fishes | Silurus glanis | Danube catfish | 59/0.09 | 00/0.72 | 00/0.58 |
| Fishes | Leuciscus idus | Ide | 62/0.08 | 00/0.90 | 00/0.89 |
| Fishes | Pseudorasbora parva | Topmouth gudgeon | 54/0.24 | 00/0.87 | 00/0.89 |
| Fishes | Misgurnus fossilis | Mud loach | 61/0.06 | 00/0.79 | 00/0.72 |
| Fishes | Mylopharyngodon piceus | Black carp | 35/0.00 | 02/0.75 | 02/0.75 |
| Crayfishes | Austropotamobius torrentium | Stone crayfish | 46/0.11 | 00/0.93 | 00/0.90 |
| Crayfishes | Cherax destructor | Yabby crayfish | 00/0.02 | 73/0.77 | 95/0.75 |
| Crayfishes | Pacifastacus leniusculus | Signal crayfish | 53/0.10 | 00/0.82 | 00/0.77 |
| Crayfishes | C. quadricarinatus | Australian red claw crayfish | 00/0.10 | 00/0.62 | 00/0.52 |
| Crayfishes | Faxonius rusticus<sup>a</sup> | Rusty crayfish | 100/0.13 | 16/0.77 | 25/0.77 |
| Crayfishes | Procambarus clarkii<sup>a</sup> | Red swamp crayfish | 94/0.16 | 00/0.86 | 00/0.80 |
| Crayfishes | Euastacus armatus | Murray crayfish | 00/0.08 | 00/0.93 | 00/0.93 |
| Mollusks | Corbicula fluminea<sup>a</sup> | Asian clam | 100/NA | 04/0.82 | 01/0.69 |
| Mollusks | Dreissena bugensis<sup>a</sup> | Quagga mussel | 100/0.05 | 27/0.88 | 48/0.83 |
| Mollusks | Lissachatina fulica | Giant African snail | 00/0.13 | 00/0.73 | 00/0.66 |
| Mollusks | Physella acuta | European physa | 93/0.13 | 00/0.92 | 00/0.86 |
| Mollusks | Pomacea canaliculata | Golden applesnail | 06/0.21 | 00/0.74 | 10/0.57 |
| Amphibians | Rhinella marina | Cane toad | 00/0.08 | 00/0.84 | 00/0.78 |
| Amphibians | Osteopilus septentrionalis | Cuban tree frog | 27/0.00 | 00/0.95 | 00/0.90 |
<sup>a</sup>Nonnative species already established in the Great Lakes Basin.
METHODS
Bioclimatic data
All three methods use bioclimatic data from WorldClim (worldclim.org) version 1.4 (Hijmans et al., 2005; Sanders & Castiglione, 2014) to estimate climate similarity of each species’ current distribution to that of the Great Lakes Basin (Table 2). We used version 1.4 because that was the version RAMP used at the time of this study, and it is the version that has been used to produce the vast majority of the Ecological Risk Screening Summaries published by the USFWS (USFWS, 2023). RAMP uses 16 of the 19 available bioclimatic variables because those variables were used by Bomford (2008; USFWS, 2024). We used all 19 variables for MaxEnt and BRTs because this is how these tools are usually implemented (Table 2) and because it required no additional effort. Although RAMP uses WorldClim’s 2.5-arc minute resolution (Sanders & Castiglione, 2014), we chose to use the 30-arc second resolution for MaxEnt and BRT because this finer resolution is available, and resolution may impact model accuracy (Chauvier et al., 2022). We did not exactly mirror RAMP’s use of bioclimatic variables or resolution in our machine learning methods because we did not want to unnecessarily limit the capabilities of those methods. We reasoned that if machine learning SDMs were to replace RAMP, users would likely use the finest resolution possible and all available bioclimatic variables because there is no additional cost to doing so.
Table 2. Bioclimatic variables provided by worldclim.org that were used to model suitable climates in this study. “X” denotes that the Risk Assessment Mapping Program (RAMP) and/or the machine learning methods incorporated the variable. BRT = boosted regression tree; MaxEnt = maximum entropy.

| Bioclimatic variable | RAMP | MaxEnt/BRT |
|---|---|---|
| Annual mean temperature | X | X |
| Mean diurnal range (mean of monthly [max temp − min temp]) | | X |
| Isothermality (BIO2/BIO7) (×100) | | X |
| Temperature seasonality (SD × 100) | | X |
| Max temperature of warmest month | X | X |
| Min temperature of coldest month | X | X |
| Temperature annual range (BIO5 − BIO6) | X | X |
| Mean temperature of wettest quarter | X | X |
| Mean temperature of driest quarter | X | X |
| Mean temperature of warmest quarter | X | X |
| Mean temperature of coldest quarter | X | X |
| Annual precipitation<sup>a</sup> | X | X |
| Precipitation of wettest month | X | X |
| Precipitation of driest month | X | X |
| Precipitation seasonality (coefficient of variation)<sup>a</sup> | X | X |
| Precipitation of wettest quarter | X | X |
| Precipitation of driest quarter | X | X |
| Precipitation of warmest quarter | X | X |
| Precipitation of coldest quarter | X | X |
<sup>a</sup>Indicates that the variable was down-weighted (see text).
Species occurrence data
We used identical species occurrence data across all three modeling methods. Data were obtained from the Global Biodiversity Information Facility (GBIF; www.gbif.org/) via R (R Core Team, 2017) using the package “rgbif” version 0.9.8, which uses an application programming interface to obtain occurrence coordinates (Chamberlain, 2017). We compared each species’ distribution from the GBIF with distributions from two other scientific sources (one in the case of amphibians, see below) to minimize any errors in the GBIF data (Beck et al., 2014). For plants and crayfishes, we used the Invasive Species Compendium from the Centre for Agriculture and Biosciences International (CABI, 2017) and the Global Invasive Species Database (Invasive Species Specialist Group, 2015) to check distributions. We used FishBase (Froese & Pauly, 2018) and the Centre for Agriculture and Biosciences International to verify fish distributions. For mollusk distributions, we used MolluscaBase (MolluscaBase, 2018) and the Global Invasive Species Database, and for amphibians, we used AmphibiaWeb (AmphibiaWeb, 2018). Any GBIF occurrence points that seemed unlikely based on the distributions found in the other sources were removed.
Occurrence points from the GBIF are automatically collected by RAMP. To ensure that RAMP, MaxEnt, and BRTs used exactly the same occurrence points, we modified the Python code of RAMP to use coordinates from saved Excel files that we created from GBIF data. We used occurrence data from native and nonnative ranges because this approach has been shown to increase model accuracy compared to using only native or nonnative occurrence data (Barbet-Massin et al., 2018; Broennimann & Guisan, 2008; Ihlow et al., 2016; but see Webber et al., 2011).
We used ArcMap 10.4.1 for Desktop (Environmental Systems Research Institute, Redlands, California) to check that all occurrence points overlaid the bioclimatic rasters. Occurrence points that did not fall within the rasters (i.e., points that occurred in the ocean) were removed. The accuracy of predictive models declines with fewer than 100 occurrence points (Wisz et al., 2008). In this study, all species had more than 100 occurrence points, except for Black Carp Mylopharyngodon piceus, which had 41.
Background points
For MaxEnt and BRTs, we built our models using occurrence and background points. Species distribution models developed using presence and absence data have stronger predictive abilities (Brotons et al., 2004; Segurado & Araújo, 2004) but absence data are usually unavailable. We substituted true absence data with background points (Elith et al., 2006), which are randomly selected points from the same region as the occurrence points. We randomly selected background points from both native and nonnative distributions (Broennimann & Guisan, 2008). Supplementary material 1 (see online Supplementary Material) gives a full explanation of how we obtained background points.
Model building
Model evaluation with true skill statistics (TSS; see “Model evaluation” below) required global rather than North American models. Though RAMP presents models for North America, with simple modifications to the Python code and target points within ArcMap, we were able to extend those models to the global scale. Additionally, RAMP uses a grid of North American target points 15 km apart (Sanders & Castiglione, 2014). We used the “Create Fishnet” tool in ArcMap to create target points across the world, ensuring that the points within North America overlapped with RAMP’s original points. Importantly, these global-scale target points contain the same Worldclim data set as RAMP’s North American points. We then used the “Clip” function to remove any target points that did not overlap with the global layout defined in RAMP’s bioclimatic rasters (i.e., we eliminated all target points that did not have bioclimatic information, such as target points located within oceans). During model building, all target points are given a climate matching score between 0 and 10, and RAMP treats scores of 6 and above as suitable climate for the species in question (Sanders et al., 2014; USFWS, 2019, 2024).
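The 6-and-above rule can be illustrated with a short Python sketch. It assumes that percent suitability is simply the share of target points at or above the cutoff; the authors' exact procedure is in their Supplementary material 1 and may differ. The function name and the example scores are hypothetical.

```python
def percent_suitable(scores, threshold=6):
    """Percentage of target points whose climate score is at or above the threshold.

    Assumption: every target point carries equal weight. The cutoff of 6
    follows the RAMP convention described in the text.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n_suitable = sum(1 for score in scores if score >= threshold)
    return 100.0 * n_suitable / len(scores)


# Hypothetical climate matching scores (0-10) for eight target points.
example_scores = [2, 5, 6, 7, 9, 3, 6, 10]
print(percent_suitable(example_scores))  # 5 of 8 points qualify: 62.5
```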
We used the same training and testing data for all three SDM techniques. First, we created models using a random sample of 80% of the occurrence and background points before testing the ability of the model to determine whether the remaining 20% of occurrence and background points are within suitable climates. We used the testing background data from the BRT model to assess RAMP’s predictions. We followed standard methods for our use of each tool (see Supplementary material 1 for a full description).
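A random 80/20 partition of this kind might be sketched as follows in Python. The helper name, the seeds, and the placeholder point lists are assumptions for illustration only, and the sketch is not the authors' code.

```python
import random


def split_train_test(points, test_fraction=0.2, seed=0):
    """Randomly partition points into (training, testing) lists.

    The seed is fixed so that the same split can be reused for every
    modeling method, as the text requires.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder identifiers standing in for occurrence and background points.
occurrences = [("occ", i) for i in range(100)]
background = [("bg", i) for i in range(500)]

occ_train, occ_test = split_train_test(occurrences, seed=1)
bg_train, bg_test = split_train_test(background, seed=2)
print(len(occ_train), len(occ_test), len(bg_train), len(bg_test))
```

Splitting occurrence and background points separately keeps the 80/20 ratio within each class, which is one reasonable reading of the description above.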
Model evaluation
True skill statistic (TSS) scores quantify the match of the test output to the training output (Lobo et al., 2008) and are calculated as TSS = sensitivity + specificity − 1. Sensitivity is the ability of a model to accurately determine where a species will occur (true positive rate), and specificity is the ability of a model to accurately determine where a species will not occur (true negative rate; Allouche et al., 2006). TSS values range from −1 to 1, with values close to 1 indicating that the model produces very good determinations of where a species will and will not occur, and values of zero or below indicating that assessments are no better than random. For MaxEnt and BRT, we obtained specificity and sensitivity values at the threshold for each model that produced equal sensitivity and specificity. Values above the threshold were taken as a determination that the climate is suitable for the species and values below as a determination that the climate is not suitable. See Supplementary material 1 for explanations of how we obtained specificity and sensitivity scores for RAMP and how we calculated percent climate suitability for each output.
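The calculation can be made concrete with a minimal Python sketch. It assumes test occurrence points count as presences and test background points as absences, and it picks the observed score at which sensitivity and specificity are closest. All function names and the example scores are hypothetical, and the authors' own procedure (Supplementary material 1) may differ in detail.

```python
def sensitivity(tp, fn):
    """Share of test occurrence points that the model marks as suitable."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of test background points that the model marks as unsuitable."""
    return tn / (tn + fp)


def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    return sensitivity(tp, fn) + specificity(tn, fp) - 1.0


def confusion_counts(scores, is_presence, threshold):
    """Count outcomes when scores above the threshold are treated as suitable."""
    tp = fn = tn = fp = 0
    for score, present in zip(scores, is_presence):
        suitable = score > threshold
        if present and suitable:
            tp += 1
        elif present:
            fn += 1
        elif suitable:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def equal_sens_spec_threshold(scores, is_presence):
    """Observed score at which sensitivity and specificity are closest.

    Assumes both presences and absences are present in the test data.
    """
    def gap(threshold):
        tp, fn, tn, fp = confusion_counts(scores, is_presence, threshold)
        return abs(sensitivity(tp, fn) - specificity(tn, fp))
    return min(sorted(set(scores)), key=gap)


# Hypothetical suitability scores: four test occurrences, four background points.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
is_presence = [True, True, True, True, False, False, False, False]

threshold = equal_sens_spec_threshold(scores, is_presence)
print(threshold, true_skill_statistic(*confusion_counts(scores, is_presence, threshold)))
```

Under this definition, a model that marks every test point as suitable has sensitivity 1 and specificity 0, which gives a TSS of 0.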
RESULTS
Model specifications and scores
The number of occurrence points obtained from the GBIF for the 30 species (Table 1) ranged from 41 to 2,976 points (see Supplementary material 2). Summary values presented in the following are given as mean ± SD. Table 1 lists the percentage of pixels in the Great Lakes Basin with a score above the model thresholds for MaxEnt and BRTs and the percentage of grid points with a score of 6 or above for RAMP. The TSS scores for each model are also given.
Models created from BRTs had a mean percentage of pixels with scores above model threshold (i.e., percent of pixels with suitable climate) of 6.37% ± 15.58 and the highest mean TSS values (0.81 ± 0.09). Of the BRT models, the model created for the Australian red claw crayfish Cherax quadricarinatus had the lowest TSS (0.62), and the model for northern snakehead Channa argus had the highest TSS (0.98). We built BRT models using 550–5,300 trees. Threshold values for which specificity equals sensitivity ranged from 0.22 to 0.83, and at these thresholds, specificity and sensitivity values ranged from 0.81 to 0.99 (Supplementary material 2).
MaxEnt models had a similar mean percentage of pixels with scores above the model threshold of 10.93% ± 23.24 and also had high TSS values (0.76 ± 0.12). Similar to the BRT models, the model for Australian red claw crayfish had the lowest TSS (TSS = 0.52) and the model for northern snakehead had the highest value (TSS = 0.97). Threshold values for models created using MaxEnt ranged from 0.15 to 0.53, and specificity and sensitivity values at these thresholds ranged from 0.76 to 0.99.
Models produced by RAMP had a considerably higher mean percentage of scores at or above the threshold of 6 (57.15% ± 34.60) than BRTs or MaxEnt. TSS values for the RAMP models (0.09 ± 0.06) were lower than for the other models. Topmouth gudgeon Pseudorasbora parva had the highest TSS from RAMP at 0.24. Specificity values ranged from N/A to 0.26 (Supplementary material 2), and sensitivity values ranged from 0.90 to 1.0 (Supplementary material 2).
Analysis of variance showed a significant difference in TSS values among the three tools. A Tukey’s post hoc test showed that RAMP TSS scores were significantly lower than those of both BRTs and MaxEnt. There was no significant difference in TSS values between BRTs and MaxEnt.
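As a rough cross-check, the group means and a one-way ANOVA F statistic can be recomputed from the rounded TSS values in Table 1. The Python sketch below uses only the standard library; it omits the RAMP value reported as NA, does not perform the Tukey comparison, and because it starts from rounded values its output need not match the statistics the authors obtained.

```python
from statistics import mean, stdev

# TSS values transcribed from Table 1, in table row order.
tss = {
    "BRT": [0.69, 0.82, 0.72, 0.72, 0.78, 0.98, 0.89, 0.72, 0.85, 0.87,
            0.78, 0.72, 0.90, 0.87, 0.79, 0.75, 0.93, 0.77, 0.82, 0.62,
            0.77, 0.86, 0.93, 0.82, 0.88, 0.73, 0.92, 0.74, 0.84, 0.95],
    "MaxEnt": [0.63, 0.71, 0.62, 0.67, 0.73, 0.97, 0.88, 0.64, 0.82, 0.84,
               0.77, 0.58, 0.89, 0.89, 0.72, 0.75, 0.90, 0.75, 0.77, 0.52,
               0.77, 0.80, 0.93, 0.69, 0.83, 0.66, 0.86, 0.57, 0.78, 0.90],
    # Asian clam is omitted here because its RAMP TSS is reported as NA.
    "RAMP": [0.07, 0.11, 0.06, 0.08, 0.01, 0.15, 0.06, 0.04, 0.14, 0.14,
             0.11, 0.09, 0.08, 0.24, 0.06, 0.00, 0.11, 0.02, 0.10, 0.10,
             0.13, 0.16, 0.08, 0.05, 0.13, 0.13, 0.21, 0.08, 0.00],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((value - m) ** 2
                    for g, m in zip(groups, group_means) for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


for name, values in tss.items():
    print(f"{name}: mean TSS = {mean(values):.2f}, sample SD = {stdev(values):.2f}")

f_stat, df_between, df_within = one_way_anova_f(list(tss.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.1f}")
```

The recomputed means round to the averages reported in the abstract (0.81, 0.76, and 0.09).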
Model comparisons
Outputs for RAMP, in addition to having lower TSS values, consistently calculate a higher percentage of suitable climate across the Great Lakes Basin than either BRTs or MaxEnt. For example, RAMP finds that 27% of the Great Lakes Basin has suitable climate for the Cuban tree frog Osteopilus septentrionalis (Figure 1A; TSS = 0.00). In contrast, both BRT (TSS = 0.95) and MaxEnt (TSS = 0.90) models find that 0% of the Great Lakes Basin has a suitable climate for this species (Table 1; Figure 1B and 1C). In the example of water hyacinth Eichhornia crassipes (Figure 2A), RAMP’s assessment is also very different compared to MaxEnt and BRTs (Figure 2B and 2C). In this instance, RAMP (TSS = 0.07) finds suitable climate for water hyacinth across 76% of the Basin (Table 1) whereas BRTs (TSS = 0.69) and MaxEnt (TSS = 0.63) identify 0% of the Basin as having suitable climate for this species. See Supplementary material 3 for all maps developed in this study.
![Figure 1](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/fisheries/PAP/10.1093_fshmag_vuaf012/1/m_vuaf012f1.jpeg?Expires=1748883401&Signature=G4AYRZL90kJm4lAb4GYV30afYdcHb-JqbnsxzMhTa4~R0LSdwg47BSEXSBnkR7~q~HkxRSSJOCUG1qfXNK-jY7n~lcXK5-sj7vq-zCweU5LmmIHddd1sWpWvXxgA2LfX6AXVgyyDY8hbTDFUk~sVfKn0SSiUHx2gO0ZKyuvj~fG7odWtRaPp-7A4nBoBaWUUfuhmMsS7VlKXE4CCWNWaUVWexK~5QrUpV~PsNcVURbPRlYK6ScSg7ffV9f~JNonsY-~23UZFnWhbr-MWMRoEKzYK29dttWLFT6vVZjmFsB0eltbMgjcOh5HzIvPzZ5fKS7U2YYE2jhWrP~7wc-mUHw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 1. Climate suitability of the Great Lakes Basin for Cuban tree frog Osteopilus septentrionalis as assessed by (A) the Risk Assessment Mapping Program (RAMP; true skill statistic [TSS] = 0.00), (B) boosted regression trees (BRT; TSS = 0.95), and (C) maximum entropy (MaxEnt; TSS = 0.90). The color scale corresponds to BRT and MaxEnt climate suitability scores from 0–1. The color scale for RAMP is identical except RAMP scores are given from 0–10.
![Figure 2](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/fisheries/PAP/10.1093_fshmag_vuaf012/1/m_vuaf012f2.jpeg?Expires=1748883401&Signature=ZkALktGr5RSPwrM~7RxNREyDF23xvPTje490Hvg9LlzAX9XNPBvV68ULhFIfJDTpUXPUmV4QCIg9A9ZL4I~aczofgPVfV---l8W9CQdNlc2hQ9DaBP7WEer3TVXniX0qVXZwaaAnfBxGpAH1RstniUGAexNeR9l8gHvds6byiAxLSFNx47511GA4gxrsTIKRK-lCatuc14bUCHT-Vw1biPODO-kcABoFv-e93Q8CVghioe6hIqaFJxHXk8I-HbEAfg6wB6wIfYPKF8Lk4Yccpn2QJ8McX-~PEXDUdii9qVOJCaf7UsUlkhPM9y77u-3swmifA4LR7qrSADkUSG32ZQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 2. Climate suitability of the Great Lakes Basin for water hyacinth Eichhornia crassipes as assessed by (A) the Risk Assessment Mapping Program (RAMP; true skill statistic [TSS] = 0.07), (B) boosted regression trees (BRT; TSS = 0.69), and (C) maximum entropy (MaxEnt; TSS = 0.63). The color scale corresponds to BRT and MaxEnt climate suitability scores from 0–1. The color scale for RAMP is identical except RAMP scores are given from 0–10.
DISCUSSION
We compared the performance of RAMP to two machine learning methods, MaxEnt and BRTs, at assessing the climate suitability of the Great Lakes Basin for 30 established or potentially invasive aquatic species. The differences among these tools in terms of data required, algorithms, climate suitability thresholds, and outputs make it impossible to conduct exact comparisons. Instead, we took RAMP and implemented it as specified by the USFWS for their Ecological Risk Screening Summaries. We implemented MaxEnt and BRTs in the way we consider they would most likely be implemented if they were used as alternatives to RAMP. Specifically, where possible, we aimed for the effort required to use these tools to be similar to that required for RAMP. For this reason, we did not use advanced features of machine learning tools, we used the same species presence data as RAMP, and we used additional resolution and climatic data only when this was possible without extra effort on the part of the user.
A main result is that RAMP overwhelmingly determines that far larger areas of the Great Lakes Basin are viable habitat for species than either MaxEnt or BRTs. This corresponds to RAMP receiving TSS scores that are much lower than for the other tools. Outputs for red swamp crayfish Procambarus clarkii highlight the different results generated by the machine learning methods vs. RAMP and the benefits and drawbacks of the different approaches. The BRT and MaxEnt models found no suitable climate in the Great Lakes Basin for red swamp crayfish, whereas RAMP estimated 94% of the area is suitable. Red swamp crayfish are currently established in the Great Lakes Basin, including in Lakes Michigan and Erie, the Chicago River, Sandusky Bay, and in several inland ponds in southern Michigan, and they were previously established in ponds in southeastern Wisconsin (K. Smith et al., 2018; Nagy et al., 2021). Egly et al. (2019) predicted that suitable habitat for red swamp crayfish in the Great Lakes occurs in areas of Lakes Michigan, Erie, Huron, and Ontario. The machine learning methods used in this study, despite having high TSS values, underestimate the areas of the Great Lakes Basin that have suitable climate for this species. Indeed, the machine learning methods failed to identify any portion of the Great Lakes Basin as climatically appropriate even though the species is known to be established in the basin. This pattern is common across several of the species examined in this study.
One of the notable differences between RAMP’s methods and those of MaxEnt and BRTs was how the methods established climate suitability thresholds. Regardless of species or model development, RAMP uses a single climate suitability threshold value of 6 for all models (Sanders et al., 2014; USFWS, 2019, 2024), and this may limit RAMP’s ability to identify suitable habitat more accurately. Hill et al. (2024) examined output from Climatch and found that the preset threshold score of 6 did not reflect real invasion success, as the models produced high climate suitability outputs for nonnative freshwater fish species that failed to establish in Florida. The more recent version of RAMP attempts to address this by considering a percent climate suitability of 0.2% or higher to merit genuine concern for invasion (USFWS, 2024). In other words, if less than 0.2% of target points have a climate suitability score above 6, then the likelihood of invasion is low. Unlike RAMP, the machine learning methods derive their thresholds from the models being built and, thus, are not limited to a single value. For example, the BRT and MaxEnt model thresholds for grass carp Ctenopharyngodon idella were 0.58 and 0.50, respectively. Using these thresholds, percent climate suitability across the Great Lakes Basin for this species is 31% (BRT) and 53% (MaxEnt). However, RAMP estimates that 98% of the Great Lakes Basin has suitable climate when using a threshold score of 6. Using a threshold score of 8, for example, would provide a percent suitability of 62%, putting all three outputs more in agreement with each other (but see the rest of the Discussion). The TSS values for RAMP predictions may improve if habitat suitability thresholds for each species were defined based on the model built and not a predefined value of 6.
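The effect of the threshold on percent climate suitability, and the 0.2% screening rule, can be illustrated with a short sketch. The function names and the scores are hypothetical, and the code assumes that a score at or above the threshold counts as suitable, as in the Results; it is not RAMP's own code.

```python
def percent_suitable(scores, threshold):
    """Percentage of target points whose climate score is at or above the threshold."""
    return 100 * sum(s >= threshold for s in scores) / len(scores)


def merits_concern(scores, threshold=6, min_percent=0.2):
    """Apply the screening rule: concern if at least min_percent of points are suitable."""
    return percent_suitable(scores, threshold) >= min_percent


# Made-up 0-10 climate scores for ten target points (illustration only).
scores = [3, 5, 6, 6, 7, 8, 8, 9, 4, 10]

pct_at_6 = percent_suitable(scores, 6)  # fixed RAMP-style threshold
pct_at_8 = percent_suitable(scores, 8)  # stricter alternative threshold
print(pct_at_6, pct_at_8, merits_concern(scores))
```

Raising the threshold can only shrink the suitable area, which is why a stricter cutoff moves a fixed-threshold output toward the smaller areas found by model-specific thresholds.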
Machine learning SDMs are known to underpredict species distributions despite high model accuracy assessments (Broennimann et al., 2007; Liu et al., 2020; Medley, 2010; Rodríguez-Rey et al., 2019), and models built using climate data alone may overinflate the importance of climate on distributions (Early & Sax, 2014). This comes partly from the power of the algorithms to identify relatively small climate envelopes in the training data that do not encompass a species’ fundamental climatic niche (Wiens et al., 2009). It can also be driven by spatially autocorrelated training data (Bahn & McGill, 2013; Roberts et al., 2017; Santini et al., 2021). Specifically, in place of true absence data it is common to use—as we did—background points from regions nearby to species presence points. This gives relatively minor differences between the climates in the presence and absence data sets, and results in estimated climatic ranges that are very narrow (Thuiller et al., 2004; VanDerWal et al., 2009). Further, Webber et al. (2011) found that correlative niche models, such as MaxEnt and BRTs, were less conservative relative to mechanistic models, even when testing data are truly independent of training data. In our study, RAMP overestimated the potential distribution of most species. Despite their low TSS scores, RAMP models may give a more accurate assessment than the machine learning tools of whether species can become established in at least some parts of the Great Lakes region.
By using random subsets of the occurrence data for testing and training, we generated machine learning models that are likely to underestimate climate suitability due to spatial autocorrelation between the testing and training data (Bahn & McGill, 2013; Roberts et al., 2017). To build truly predictive machine learning models and to test RAMP’s ability to make accurate predictions, we recommend further work to test models against independent occurrence data that are not spatially autocorrelated (Bahn & McGill, 2013; Roberts et al., 2017; Santini et al., 2021; Webber et al., 2011; Wenger & Olden, 2012). One way to do this would be to develop models using presence/absence data from one region (e.g., a continent) and test the model in another. Such an approach would be time consuming and only possible for a limited range of species, but would give a more rigorous description of the performance of these three tools. We note that although we modified RAMP to give global outputs, it was developed only for North America.
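The region-based evaluation suggested above differs from a random split in that all test records come from a region the model never saw. A minimal sketch of such a split is shown below; the `Occurrence` record, the region labels, and the coordinates are hypothetical and serve only to illustrate the idea.

```python
from collections import namedtuple

# Hypothetical occurrence record: coordinates plus a region (e.g., continent) label.
Occurrence = namedtuple("Occurrence", ["lon", "lat", "region"])


def split_by_region(occurrences, test_region):
    """Hold out every record from test_region; train on all other regions."""
    train = [o for o in occurrences if o.region != test_region]
    test = [o for o in occurrences if o.region == test_region]
    if not train or not test:
        raise ValueError("Both training and testing sets must be non-empty.")
    return train, test


# Made-up records for illustration only.
records = [
    Occurrence(-87.6, 41.9, "North America"),
    Occurrence(-83.0, 42.3, "North America"),
    Occurrence(2.3, 48.9, "Europe"),
    Occurrence(13.4, 52.5, "Europe"),
    Occurrence(139.7, 35.7, "Asia"),
]

train, test = split_by_region(records, test_region="Europe")
print(len(train), len(test))
```

Holding out an entire region avoids testing on points that sit next to training points, at the cost of requiring species with records on more than one continent.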
Generally, RAMP made assessments of suitable habitat that were very large and that had very low TSS scores. RAMP uses the same Euclidean distance algorithm as Climatch (Sanders et al., 2014; USFWS, 2019), and this latter tool has been used frequently in predicting species distributions based on climate similarities between native and nonnative regions (e.g., Bomford, Darbyshire, et al., 2009; Bomford, Kraus, et al., 2009). Wearne et al. (2013) generated models from Climatch, BRTs, and MaxEnt using a variety of environmental variables to identify suitable habitat across Australia for the invasive semi-aquatic plant species West Indian marsh grass Hymenachne amplexicaulis. This study found that Climatch models produced higher sensitivity than specificity scores, a similar pattern to what we found across the 30 species modeled in this study. Even so, the mean Climatch TSS score in Wearne et al. (2013) was much higher than RAMP’s mean TSS score in this study. Hill et al. (2024) concluded that Climatch may overestimate climate similarities when occurrence data in the region of interest are used. It is possible that the algorithm used for Climatch and RAMP performs better for terrestrial than aquatic species, but we note that BRT and MaxEnt do not seem to suffer similarly.
Management implications
Accurate species distribution models can potentially provide useful information to agencies because they indicate how much effort to expend for controlling a given nonnative species. This requires that predictions are produced with a strong understanding of model assumptions, parameters, and model and data limitations that are ecologically meaningful, and without cognitive bias (Hui, 2023). This is especially true for large regions that encompass multiple jurisdictions that have a range of policy approaches (Lodge et al., 2016). Ideally, all jurisdictions with an interest in preventing and managing invasive species in the Great Lakes Basin would use the same modeling tool or combination of tools (e.g., predictive models and risk assessments) to make predictions about invasive species ranges in the Great Lakes Basin. This would support a consistent, standardized decision-making process and provide a uniform understanding of risks from species invasions. Different jurisdictions across the Great Lakes region currently use different risk assessment tools (Gantz et al., 2015), resulting in a range of predictions and different levels of concern about the same species (Nathan et al., 2014; Peters & Lodge, 2009). Consistent use of SDM methods and risk assessments across all jurisdictions would reduce the risk of this happening. Importantly, a common approach would also be efficient for all jurisdictions because assessments would only need to be conducted once (Gantz et al., 2015; Peters & Lodge, 2009).
The outputs given by RAMP and used in the USFWS Ecological Risk Screening Summaries are simpler to obtain, but greatly overestimate the region in which a species will find suitable climatic conditions. Although RAMP received lower TSS scores, in many cases it would likely give better guidance for managers. A species that is not established and that is assessed to have zero appropriate habitat will likely receive no management. A model that overestimates the extent of appropriate habitat may be wrong on extent but is likely to prompt the type of management that could prevent invasion. Given that the machine learning models require more expertise and time to generate and can be prone to overfitting and underpredicting future distributions, a tool such as RAMP may be more appropriate for application to rapid risk assessment.
Widespread adoption of RAMP would be appropriate if users explicitly acknowledge its overestimation and all parties who used the tool found this acceptable. For example, if the goal is to be highly risk averse when it comes to new nonnative species or the spread of existing invaders, then overestimation of the potential range is warranted and RAMP may be appropriate. Additionally, RAMP could provide rapid guidance as new threats emerge, and species considered high risk could be prioritized for more intensive study. The USFWS currently uses RAMP as part of its Ecological Risk Screening Summaries (USFWS, 2023), which are intended to be rapid species risk assessments, although it is not clear how they are being used in management and policy.
We recommend that RAMP be further assessed so that its outputs can be better interpreted for decision making. This should be done for a suite of species using truly independent presence data. For example, species that are established on multiple continents could have RAMP models developed based on species presences on one (or more) continent(s) and then applied to the other continent(s) for assessment of model performance (Elith & Burgman, 2002; Lee-Yaw et al., 2022). This would be a strong test that could be used to better understand and interpret RAMP outputs.
SUPPLEMENTARY MATERIAL
Supplementary material is available at Fisheries online.
DATA AVAILABILITY
The data underlying this article will be shared on reasonable request to the corresponding author.
ETHICS STATEMENT
Not applicable. No animal subjects were used in this research.
FUNDING
This work was funded by grants from the Illinois Department of Natural Resources (CAFWS-118) and the U.S. Fish and Wildlife Service (F16AP00241).
ACKNOWLEDGMENTS
The authors would like to thank Kevin Irons, Mike Hoff, and Erin O’Shaughnessey for many helpful conversations as we developed the ideas for this research. Comments from two anonymous reviewers helped us to improve the manuscript.
Victoria Prescott, Jack Marte, and Reuben Keller contributed to the study design. Victoria Prescott collected the data. Victoria Prescott and Jack Marte analyzed the data. Victoria Prescott and Jack Marte wrote the manuscript. Victoria Prescott, Jack Marte, and Reuben Keller edited the manuscript.
REFERENCES
Author notes
CONFLICTS OF INTEREST: The authors declare they have no conflicts of interest.