Chad M Topaz, Berenize Garcia Nueva, Patrick Izidro de Souza, Jesse Schumann, Leah Shvedova, Xizhen Cai, Shaoyang Ning, Race and Superfund site remediation, PNAS Nexus, Volume 3, Issue 9, September 2024, pgae364, https://doi.org/10.1093/pnasnexus/pgae364
Abstract
Superfund is a federal program established in 1980 to manage the cleanup of hazardous waste sites across the United States. Given the health and economic costs borne by people living near these sites, any demographic disparities within the Superfund program are issues of environmental justice. We investigate whether racial demographics local to a Superfund site are associated with its cleanup status, and if so, how. Our work addresses gaps in the literature by using detailed geospatial processing, comprehensive data, and a more complete set of racial/ethnic categorizations. We study 1,688 Superfund sites across the country. Under a wide variety of modeling scenarios, we consistently find that the proportion of the nearby population that is Asian is negatively associated with the probability of a Superfund site being cleaned up. This association has remained unidentified until now, possibly because earlier research on Superfund sites did not distinguish Asian populations as a separate group. Our result underscores the need for specific measurement and inclusion of diverse populations in environmental studies.
Motivated by issues of environmental fairness in the United States, we examine the relationship between the cleanup of certain hazardous waste sites, known as Superfund sites, and the racial demographics of nearby communities. We discover a correlation: areas with higher Asian populations tend to be associated with a lower probability of site cleanup. Our study highlights a previously underexplored pattern by specifically including Asian populations in the analysis. This result highlights the need for inclusive approaches to studying environmental issues.
Introduction
Superfund is a federal program established in 1980 to manage the cleanup of hazardous waste sites across the United States. Officially known as the Comprehensive Environmental Response, Compensation, and Liability Act, this program began as a response to growing public concern over environmental disasters such as the Love Canal landfill in New York and the Valley of the Drums waste dump in Kentucky. Superfund aimed to create a comprehensive framework for remediating sites contaminated with hazardous substances, providing a national strategy for emergency response, information gathering and analysis, accountability, and site cleanup. Although the politics and funding mechanisms surrounding Superfund have evolved over the past four decades, the program remains a cornerstone of the Environmental Protection Agency (EPA) (1).
Individuals residing near Superfund sites are subject to substantial health and economic risks. Living within 2 km of a Superfund site before its cleanup is associated with a 20 to 25% increase in the risk of congenital anomalies in infants (2). More starkly, living in an area with a Superfund site has been associated with a loss of life expectancy as large as 1.22 years (3). The escalating impact of climate change, marked by more frequent and severe natural disasters, poses a growing threat. This trend has the potential to destabilize toxic waste sites, thereby heightening the risk of exposure for surrounding communities (4). As for economics, heavily polluted sites are associated with lower property values (5), cascading health care costs, and lost workforce productivity (6).
The diminished health and economic outcomes linked to Superfund sites are a significant concern. When these outcomes disproportionately affect certain demographic groups, they raise important issues of environmental justice. Environmental justice, as defined by the EPA, means “the just treatment and meaningful involvement of all people, regardless of income, race, color, national origin, Tribal affiliation, or disability, in agency decision-making and other Federal activities that affect human health and the environment,” (7). In assessing Superfund sites from a perspective of justice, several basic questions arise: Are there inequities in the distribution of Superfund sites among different populations in the United States? Are there inequities in the benefits these communities receive from Superfund designation and cleanup efforts? And finally, are there inequities in the prioritization of Superfund sites for remediation?
In exploring the impact of the Superfund program, research yields differing insights on the first two questions. An early study suggested that wealthier, more educated counties benefited more from the program (8). This work was followed by research indicating a lower likelihood of Superfund designation in areas with higher Black populations (9). A detailed analysis of Florida census tracts revealed a correlation between high Black and Hispanic populations and proximity to Superfund sites (10). However, a later study in Portland, Oregon and Detroit, Michigan found that while heavily Black neighborhoods often contained Superfund sites, economic deprivation was a more significant factor than race in site location (11). See (12) for a recent and more extensive review of some of the conflicting literature.
The third question—regarding remediation prioritization—has sparse supporting literature. Early in the Superfund program’s history, sites in Black, urban, and less-educated neighborhoods may have faced discrimination, a trend that appears to have lessened over time (13). A study focusing on the southern region of the EPA found that Superfund sites with more low-income residents tended to involve the community more in remediation, whereas those with higher percentages of ethnic and racial minorities were less likely to do so (14). Significantly, a review paper highlights the ongoing limitations in research concerning the role of race in environmental cleanup (15).
Contributions of this study
From the reviewed literature on Superfund sites, several observations are noteworthy. First, the bulk of research concentrates on demographics near Superfund sites rather than the sites’ outcomes. Second, while many studies use convenient geographic units like counties or census tracts, this approach has limitations, as pollutant behavior and population dynamics may transcend these administrative boundaries. Third, much of the literature is somewhat dated, predominantly published pre-2000. Finally, the examination of racial demographics is limited: one study groups all minorities as nonwhite (8), while others focus on specific racial and ethnic groups: Black only (11), Black and Hispanic only (10), and Black, Hispanic, and Native American only (9).
Our present work addresses the following research question: are racial demographics local to a Superfund site associated with its cleanup status, and if so, how? In answering this question, we fill the aforementioned gaps in the literature. Specifically:
- We offer a thorough and up-to-date examination of Superfund site cleanups, encompassing the entire span of the program across the United States since its inception.
- We treat local populations in a detailed manner, tabulating them through geospatial processing, which moves beyond the limitations of county and census tract level studies.
- We provide a comprehensive exploration of racial and ethnic demographics, utilizing all primary categories recognized in the US Census.
Under a wide variety of modeling scenarios, we consistently find that the proportion of the nearby population that is Asian is negatively associated with the probability of a Superfund site being cleaned up. This association has not been identified until now. Essentially, it is impossible to detect patterns concerning Asian populations without specifically measuring and including them in analysis.
Data sources
Our study relies on two data sources. The first is the EPA, which provides information about Superfund sites, including their geographic locations. Our second source of data comprises demographic information drawn from the United States Census and the American Community Survey (ACS). To gain a nuanced understanding of the demographic landscape surrounding these Superfund sites, we perform geospatial processing. By doing so, we obtain estimates for demographic characteristics in the proximity of Superfund sites that are more refined than those available from raw census data alone. We perform all data acquisition and processing with R, primarily using the tidyverse, tidycensus, tigris, and sf packages. Our code and data are available in our online Open Science Framework repository (16).
National priorities list
The National Priorities List (NPL) identifies sites eligible for cleanup under the Superfund program, with sites added through a detailed EPA review process. A site’s deletion from the NPL indicates successful cleanup and confirmation by the EPA that no further action is required to protect health and the environment. In our study, deletion of a site means it has been thoroughly remediated.
For our research, we use the EPA’s publicly available data files: one listing active Superfund sites on the NPL and another for remediated sites on the Deleted NPL (17, 18). We acquired the data files in November 2022, at which time there were 1,334 records on the NPL and 452 records on the Deleted NPL, for a total of 1,786 records. To maintain geographic focus in our study, we remove a small number of records from US territories, retaining only those from the 50 states and the District of Columbia, which reduces the dataset to 1,753 records. We also remove one duplicate record, leaving 1,752 records.
The dataset contains 25 variables, providing basic information about each Superfund site, such as its name and location. Most of these variables, primarily administrative descriptors from the EPA’s raw data, play no role in our analysis and we do not retain them. The four variables we do employ are latitude, longitude, years listed, and the Hazard Ranking System (HRS) score.
Latitude and longitude are intermediate variables in our geospatial processing of census data (details provided later) but are excluded from our final dataset. Additionally, the dataset incorporates the date each site was added to the NPL. We convert these dates into the number of years since listing (2022 minus the listing year, as our calculations were done in 2022) for both active and deleted records. One record lacks a listing date. This site is marked as deleted on the same date as the initial batch of 363 sites added to the NPL on September 8, 1983. We deduce that the site with the missing date was likely remediated between its proposed and final listing on the NPL. Therefore, we impute its listing date as September 8, 1983, compute its years since listing accordingly, and maintain its deleted status in our analysis.
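The years-listed calculation can be illustrated with a short Python sketch (the authors work in R; Python is used here only for illustration). It takes 2022 minus the listing year and falls back to the imputed September 8, 1983 date for the record with no listing date. The function name and the example date are hypothetical.

```python
from datetime import date

REFERENCE_YEAR = 2022                      # calculations were done in 2022, per the text
IMPUTED_LISTING_DATE = date(1983, 9, 8)    # date of the initial batch of NPL listings


def years_listed(listing_date):
    """Years since NPL listing; a missing date uses the imputed listing date."""
    if listing_date is None:
        listing_date = IMPUTED_LISTING_DATE
    return REFERENCE_YEAR - listing_date.year


# Hypothetical site listed in 1990, and the one record with a missing date
print(years_listed(date(1990, 2, 21)))  # 32
print(years_listed(None))               # 39
```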
Another key variable that we retain is HRS score. Assigned by the EPA, the HRS score is a single numerical score created from a complex evaluation process that considers factors such as the toxicity and quantity of waste, the likelihood of a site releasing these hazardous substances, and the potential impact of such releases on human populations and sensitive environments. The HRS score specifically assesses four potential contamination pathways: groundwater migration; surface water migration; soil exposure and subsurface intrusion; and air migration. These pathways can impact drinking water, the human food chain, sensitive environments, and population centers. All these elements together inform the HRS score calculation. It is vital to note, however, that the EPA explicitly states that “HRS scores do not determine the priority in funding EPA remedial response actions, because the information collected to develop HRS scores is not sufficient to determine either the extent of contamination or the appropriate response for a particular site” (19). Despite this policy, our models aim to ascertain if there is an association between the HRS score and site deletion, and if so, the extent of this association.
Unfortunately, some records in our dataset are missing HRS scores, represented as zero in the data. This issue affects 24 out of the 441 deleted sites and 33 out of the 1,311 undeleted sites. We must exclude these 57 records from our analysis, leaving us with a total of 1,695 records (94.9% of the original 1,786 records).
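The record counts reported above can be checked with a few lines of arithmetic. The Python sketch below only restates the figures from the text; the variable names are illustrative, and the 94.9% figure is read as a share of the 1,786 raw records.

```python
# Record counts reported in the text
npl_active = 1334          # records on the NPL (November 2022)
npl_deleted = 452          # records on the Deleted NPL (November 2022)
total = npl_active + npl_deleted            # 1,786 raw records

after_territories = 1753                    # 50 states and DC only (reported)
after_dedup = after_territories - 1         # one duplicate removed -> 1,752

missing_hrs = 24 + 33                       # deleted + undeleted sites lacking an HRS score
retained = after_dedup - missing_hrs        # 1,695 records kept

share_of_raw = retained / total             # share of the 1,786 raw records
print(total, after_dedup, retained, f"{share_of_raw:.1%}")
```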
Census data
We augment the data above with detailed demographic information pertaining to the populations residing within various distances of the Superfund sites. The inclusion of these data is what will allow us to explore the potential correlations between these demographic characteristics and the deletion status of the sites. As detailed in Methodology, our analysis encompasses data snapshots from two distinct time points: the years 2000 and 2021.
We source data from the Decennial Census Summary Files 1 and 3 for the year 2000 (20, 21), and from the ACS 5-Year Estimates for 2021 (22). These data include information on race, income, and citizenship status. While race is our primary explanatory variable, we include income and citizenship status as control variables in some models to account for potential influences on environmental policy decisions and broader socioeconomic dynamics related to environmental justice. For example, income levels may be linked to a community’s ability to advocate for cleanup or its political influence, and citizenship status may reflect access to government resources or participation in public decision-making.
Regarding racial demographics, we employ the variable HISPANIC OR LATINO BY RACE from the 2000 Census Summary File 1 and HISPANIC OR LATINO ORIGIN BY RACE from the 2021 ACS. These variables provide population counts across different racial and ethnic groups within each census tract. Our constructed race variable is a composite of race and ethnicity, categorizing populations into eight distinct groups: Asian, Black, Indigenous (American Indian and Alaska Native in government data), Pacific Islander, White, Another Racial Identification (ARI, called Some Other Race in government data), and Multiracial (Two or More Races in government data)—all for non-Hispanic/Latinx ethnicity—as well as Hispanic/Latinx. While the Census and ACS datasets allow for ARI and multiracial identification, the publicly available summary data do not provide information on the specific identities contained within these categories. This limits our analysis, potentially obscuring important variations within these diverse groups and their potential associations with site cleanup.
We obtain income data from the variables HOUSEHOLD INCOME IN 1999 in the 2000 Census and HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) in the 2021 ACS, providing insight into the income distribution within each census tract. Citizenship information comes from the variables PLACE OF BIRTH BY CITIZENSHIP STATUS in the 2000 Census and PLACE OF BIRTH BY NATIVITY AND CITIZENSHIP STATUS in the 2021 ACS, offering proportions of native-born US citizens, naturalized citizens, and noncitizens within each census tract. As we describe later, we end up using income and citizenship information only from 2021.
Our decision to utilize 2021 ACS data rather than the temporally proximate 2020 Census is driven by practical considerations, namely, the absence of an application programming interface (API) for the latter. Despite the earlier 2000 Census having an API, the government has not provided one for the 2020 Census, nor has it stated a reason for the omission. This API’s absence hindered our geospatial processing pipeline (described below). The 2021 ACS, on the other hand, does offer an API. While minor discrepancies in tract definitions or variable collection methods can exist between surveys and years, a core objective of both the ACS and Census is to maintain comparability across datasets. This emphasis on consistency allows for meaningful comparisons. Additionally, our analysis does not attempt longitudinal comparisons. Instead, we evaluate the robustness of significant effects and their directions across modeling scenarios, including dataset selection. As we will demonstrate in Results, consistent findings across the 2000 Census and 2021 ACS data underscore the robustness and reliability of our conclusions.
Geospatial processing
While the demographic data described above form a foundational layer of our analysis, they are structured in terms of census tracts and lack any reference to the geographical positioning of Superfund sites. To address this challenge and align our demographic data with the spatial context of these sites, we perform geospatial processing. Our methodology involves constructing a buffer with a specified radius R centered on each Superfund site to capture the census demographics in that area.
The geographies of census tracts are complex and vary considerably. For any given Superfund site, a buffer may be entirely within a single census tract, or it may encompass multiple tracts, sometimes only partially. When a census tract is fully enclosed by the buffer, we include all of its demographic data. For tracts that intersect partially with the buffer, we employ a proportional approach. More specifically, we calculate a fraction of the population based on the ratio of the tract’s land area within the buffer to its total land area, excluding water features. We use the derived ratio as a multiplier to determine the portion of the tract’s population to be included in our analysis.
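The proportional approach can be sketched as follows. This is a minimal Python illustration rather than the authors' R pipeline: the class name, function name, and tract figures are hypothetical, and it assumes that a prior GIS step has already computed each tract's land area and the part of that land area lying inside the buffer.

```python
from dataclasses import dataclass


@dataclass
class TractOverlap:
    population: float           # tract population count for one group
    land_area: float            # tract land area, water excluded
    land_area_in_buffer: float  # portion of that land area inside the buffer


def buffer_population(overlaps):
    """Area-weighted population estimate for one buffer."""
    total = 0.0
    for tract in overlaps:
        if tract.land_area <= 0:
            continue  # no land area to apportion
        # A fully enclosed tract has fraction 1.0 and contributes all of its population
        fraction = min(tract.land_area_in_buffer / tract.land_area, 1.0)
        total += tract.population * fraction
    return total


# Hypothetical tracts: one fully inside the buffer, one with a quarter of its land inside
tracts = [
    TractOverlap(population=4000, land_area=2.0, land_area_in_buffer=2.0),
    TractOverlap(population=6000, land_area=8.0, land_area_in_buffer=2.0),
]
print(buffer_population(tracts))  # 5500.0
```

The weighting implicitly assumes that population is spread evenly over a tract's land area, which is the usual simplification behind area-based apportionment.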
Having established our method for integrating demographic data with the specific locations of Superfund sites, we then apply this approach to every site in our dataset for the years 2000 and 2021. As existing literature does not have consensus on a single distance that would represent “proximity” to pollution, we examine a range, namely, buffers of 1, 2, 3, and 4 miles from each Superfund site. We will see that our results are relatively insensitive to the choice of buffer across this range.
Final dataset
The census data provide raw population counts for race and citizenship and raw household counts for income. We convert these counts to population proportions and household proportions for analysis. Regarding race, the data are divided into eight groups which collectively account for 100% of the population. To avoid redundancy, we omit the White category. For similar reasons, we omit the highest income bracket (income >$200,000) and we exclude the native-born citizen category from our citizenship analysis. These exclusions lose no information, because each omitted proportion equals one minus the sum of the retained proportions in its group.
Our dataset encompasses 1,695 Superfund sites. To account for variations over time and area, we analyze each site across two different demographic data years (2000 and 2021) and four distinct geographic buffers (1, 2, 3, and 4 miles). For seven sites, the race, income, and citizenship percentages are not mathematically defined for all combinations of year and radius because there is no measured population. We remove these sites, leaving 1,688 sites (99.6% of the previous data). We arrange our data so each site appears eight times, once for each combination of demographic data year and geographic scale. This rearrangement of data produces 13,504 rows. There are 32 variables:
- Site EPA ID, a unique identifier for each site,
- Demographic Year, the year of the census/ACS data used for each record, which is either 2000 or 2021, used solely for data filtering,
- R, the geographic buffer around a Superfund site, with possible values being 1, 2, 3, or 4 miles, also used solely for filtering,
- HRS Score, the Hazard Ranking System score indicating the EPA’s assessment of the site’s environmental risk level,
- Region ID, the Superfund site’s region according to the EPA’s geographic coding scheme, with 10 distinct values,
- State, the state where the Superfund site is located (or District of Columbia),
- Race variables, of which there are seven, detailing demographic proportions for Asian, Black, Hispanic/Latinx, Indigenous, Pacific Islander, Another Racial Identification (ARI), and Multiracial groups,
- Income variables, of which there are 15, describing the proportion of households in different income brackets, ranging from under $10,000 to $150,000–$200,000,
- Citizenship variables, namely, the proportions of naturalized citizens and noncitizens,
- Years Listed, the number of years since the Superfund site was added to the NPL, and
- Deleted, a dichotomous outcome variable indicating whether the site has achieved deletion status.
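The layout described above, with one row per site, demographic year, and buffer, can be sketched in Python as a simple cross product. The site identifiers below are placeholders rather than real EPA IDs, and the dictionary of variable counts merely tallies the list above.

```python
from itertools import product

N_SITES = 1688
DEMOGRAPHIC_YEARS = (2000, 2021)
BUFFER_MILES = (1, 2, 3, 4)

# Placeholder identifiers, not real EPA IDs
site_ids = [f"SITE{i:04d}" for i in range(N_SITES)]

# One row per (site, demographic year, buffer radius) combination
rows = [
    {"site_id": site, "demographic_year": year, "R": radius}
    for site, year, radius in product(site_ids, DEMOGRAPHIC_YEARS, BUFFER_MILES)
]
print(len(rows))  # 13504

# Tally of the variables listed above
variable_counts = {
    "site_epa_id": 1, "demographic_year": 1, "R": 1, "hrs_score": 1,
    "region_id": 1, "state": 1, "race": 7, "income": 15,
    "citizenship": 2, "years_listed": 1, "deleted": 1,
}
print(sum(variable_counts.values()))  # 32
```

Filtering such a table on demographic year and R then yields the 1,688-row dataset used by any single model.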
Data exploration
Our dataset contains many descriptive variables for each combination of demographic year and geographic buffer. These data offer extensive control variables for our eventual detailed analyses. However, we begin with a focused exploration centered on the NPL variables (HRS score and years listed) alongside race proportions, which are central to our research question.
Figure 1 visualizes our exploration. It shows the relationship between Superfund site deletion status and local racial proportions, using 2021 ACS data with a geographic buffer of 4 miles. We segment data into high/low categories (top/bottom 50%) for HRS score and years listed. Consistent with prior findings (see Introduction), Black and Latinx populations have high mean proportions near Superfund sites. Additionally, we observe something new: the Asian population proportion is consistently lower for deleted sites than for not deleted sites across all NPL variable combinations. This pattern is unique to the Asian population. To investigate further, we turn to statistical modeling.

Figure 1. Data exploration: association between Superfund site deletion status and local racial demographics, stratified by NPL variables. Green and red bars represent the mean racial proportions for deleted and nondeleted Superfund sites, segmented into high/low categories (top/bottom 50%, respectively) for Hazard Ranking System (HRS) score and years listed on the NPL. Demographic data are from the 2021 ACS (4-mile geographic buffer). The mean proportion of individuals identified as Asian is consistently lower for deleted sites across all NPL variable combinations, a pattern unique to this racial group. Blue bars are constant across the four panels and show the overall proportion of each group within the US population.
Methodology
We employ logistic regression to model the probability of Superfund site deletion, utilizing 18 different modeling specifications. All specifications focus on racial demographics around the site while controlling for the site’s HRS score and duration on the NPL. We incorporate demographic data from 2000 and 2021 to examine the influence of both historical and contemporary demographics on our conclusions. One model weights demographic proportions by distance to the Superfund site. Additionally, our specifications explore factors like geography (using state or regional fixed/random effects), income, and citizenship. We also consider alternative models, such as a linear probability model (LPM) and one accounting for partial site cleanup.
It is important to acknowledge the limitations of our modeling framework. Principally, it does not directly address the censored nature of our data, an aspect which might be more effectively managed using survival analysis techniques. However, since our study does not focus on time-to-event analysis, we opt against this approach. Additionally, our models assume a constant effect of predictors throughout the entire period under study. While we attempt to mitigate this assumption by analyzing data from different time points, our strategy may not fully capture temporal dynamics. Finally, the logistic regression framework assumes independence of observations. This assumption might be unrealistic if decisions made about one Superfund site influence the fate of another site, potentially due to shared policies and/or limited resources.
We considered more complex modeling frameworks that could directly address censored data, time-varying explanatory variables, and interdependencies among sites. One such model integrates Joint Modeling of Longitudinal and Time-to-Event Data with a competing risks framework. The joint modeling approach combines the analysis of longitudinal data, like demographic changes over time, with survival analysis, addressing the censored nature of site deletions. Adding a competing risks component would allow us to explore how the deletion of one Superfund site might affect the chances of deletion for another. However, this model would bring its own set of challenges. It would be computationally intensive and rely on critical assumptions about the relationship between the longitudinal and event processes, as well as distributional assumptions. Deviations from these assumptions could lead to biased outcomes.
An alternative to logistic regression is the LPM, which expresses the probability of an event as a linear function of predictors. Despite this model’s ease of interpretation, it has some shortcomings. It predicts some probabilities outside the valid zero to one range, assumes constant effects across predictor ranges (missing potential nonlinearities), and is susceptible to heteroscedasticity, which can bias estimates. In contrast, logistic regression bounds predictions, captures nonlinear relationships, and is more robust to heteroscedasticity. We include an LPM with robust standard errors in our robustness checks, keeping in mind the aforementioned limitations.
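The difference in how the two models bound their predictions can be seen in a small Python sketch. The intercepts and slopes are invented for illustration and are not estimates from this study; the predictor is a hypothetical "years listed" value.

```python
import math


def lpm_probability(x, intercept, slope):
    """Linear probability model: the prediction is the linear predictor itself."""
    return intercept + slope * x


def logistic_probability(x, intercept, slope):
    """Logistic regression: the linear predictor is passed through the logistic function."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Hypothetical coefficients, chosen only to show the behavior at large predictor values
for years in (0, 20, 40):
    lpm = lpm_probability(years, intercept=0.05, slope=0.03)
    logit = logistic_probability(years, intercept=-3.0, slope=0.09)
    print(years, round(lpm, 3), round(logit, 3))
```

With these made-up numbers the linear model returns a value above 1 at 40 years, whereas the logistic prediction stays strictly between 0 and 1 for any input.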
In summary, to achieve a balance between simplicity, computational feasibility, and interpretability, while maintaining statistical advantages, we use logistic regression. We enhance its effectiveness by examining demographic data from two time points. While acknowledging its limitations, this methodology provides a focused and pragmatic approach for addressing the research questions in our study.
Results
In advance of fitting models, we set two criteria for model acceptance. First, to ensure we measure meaningful associations, our explanatory variables must not exhibit high correlation. We require each model’s variance inflation factors (VIF) to fall below the standard threshold (23). Second, although our models are for inference rather than classification, it is still essential that they possess adequate discriminatory power. Therefore, we require an in-sample area under the curve (AUC) of at least 0.7, a threshold commonly acknowledged as the minimum for acceptable discrimination (24).
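For readers less familiar with the discrimination criterion, the in-sample AUC can be computed as the probability that a randomly chosen deleted site receives a higher fitted probability than a randomly chosen nondeleted site. The Python sketch below implements that pairwise definition on made-up labels and scores. The 0.7 cutoff is an assumption here, suggested by the near-miss AUC values of 0.697 to 0.699 reported for Models I through IV.

```python
AUC_THRESHOLD = 0.7  # assumed cutoff, see the lead-in


def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up outcomes (1 = deleted) and fitted probabilities
labels = [1, 0, 1, 0, 0, 1]
scores = [0.8, 0.3, 0.6, 0.7, 0.2, 0.4]
value = auc(labels, scores)
print(round(value, 3), value >= AUC_THRESHOLD)
```

The double loop is quadratic in the number of sites, which is unproblematic at the scale of 1,688 observations.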
We first fit eight models, each incorporating racial demographics, HRS score, and years listed on the NPL as explanatory variables. Table 1 summarizes these models, defined by each combination of demographic data year (2000, 2021) and geographic buffer (R = 1, 2, 3, 4 miles). Models V through VIII satisfy our criteria for both VIF and AUC. Models I through IV narrowly miss acceptance based on the AUC criterion, albeit only in the third decimal place. The Pacific Islander population is significant (P < 0.05) in Model VIII only, suggesting that this result is not robust. In contrast, the Asian population, HRS score, and years listed on the NPL consistently emerge as statistically significant (P < 0.05). Since Model VIII has the lowest (most preferred) values of the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), we adopt it as our baseline model.
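To give a sense of scale for a logistic regression coefficient, the Python sketch below converts the Model VIII estimate for the Asian proportion into a multiplicative change in the odds of deletion. It takes the magnitude 3.211 from Table 1 and a negative sign from the negative association stated in the text, and it assumes that proportions are measured on a 0 to 1 scale. The function name is illustrative.

```python
import math

# Magnitude from Table 1 (Model VIII); sign taken from the negative association in the text
ASIAN_COEF_MODEL_VIII = -3.211


def odds_multiplier(coefficient, delta):
    """Factor by which the odds change when the predictor increases by delta."""
    return math.exp(coefficient * delta)


# A 10 percentage point higher Asian proportion (0.10 on the assumed 0-1 scale)
ten_point_shift = odds_multiplier(ASIAN_COEF_MODEL_VIII, 0.10)
print(round(ten_point_shift, 3))  # 0.725
```

Under these assumptions, a 10 percentage point higher Asian proportion corresponds to odds of deletion that are roughly 27% lower, holding the other predictors fixed.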
Census year 2000 | Model I: R = 1 | Model II: R = 2 | Model III: R = 3 | Model IV: R = 4 |
---|---|---|---|---|
Race | | | | |
Asian | -3.767 (1.553)c | -4.118 (1.610)c | -4.586 (1.699)b | -5.159 (1.785)b |
Black | 0.637 (0.359) | 0.689 (0.391) | 0.554 (0.419) | 0.394 (0.443) |
Indigenous | 0.298 (1.105) | 0.605 (1.195) | 0.698 (1.221) | 0.770 (1.252) |
Latinx | 0.274 (0.399) | 0.272 (0.411) | 0.216 (0.422) | 0.177 (0.436) |
Pacific | 3.694 (16.825) | 14.267 (20.109) | 16.831 (23.637) | 15.592 (26.789) |
ARI | 78.015 (48.012) | 49.167 (42.262) | 33.950 (37.050) | 31.668 (35.233) |
Multi | 5.545 (6.456) | 2.973 (7.275) | 2.554 (7.820) | 4.682 (8.400) |
NPL | | | | |
HRS score | 0.031 (0.007)a | 0.031 (0.007)a | 0.031 (0.007)a | 0.031 (0.007)a |
Years listed | 0.093 (0.010)a | 0.092 (0.010)a | 0.092 (0.010)a | 0.091 (0.010)a |
Max VIF | 2.223 | 2.338 | 2.480 | 2.819 |
AUC | 0.698 | 0.699 | 0.698 | 0.697 |
AIC | 1,714.117 | 1,714.421 | 1,716.045 | 1,716.225 |
BIC | 1,768.430 | 1,768.734 | 1,770.358 | 1,770.538 |
Log likelihood | -847.058 | -847.210 | -848.022 | -848.112 |
Deviance | 1,694.117 | 1,694.421 | 1,696.045 | 1,696.225 |
Num. obs. | 1,688 | 1,688 | 1,688 | 1,688 |
ACS year 2021 | Model V: R = 1 | Model VI: R = 2 | Model VII: R = 3 | Model VIII: R = 4 |
---|---|---|---|---|
Race | | | | |
Asian | -2.168 (0.840)b | -2.476 (0.915)b | -2.812 (0.965)b | -3.211 (1.021)b |
Black | 0.606 (0.369) | 0.718 (0.396) | 0.630 (0.421) | 0.538 (0.441) |
Indigenous | 0.561 (1.228) | 1.026 (1.371) | 1.126 (1.401) | 1.135 (1.417) |
Latinx | 0.218 (0.340) | 0.283 (0.354) | 0.204 (0.369) | 0.172 (0.380) |
Pacific | 10.575 (6.695) | 12.147 (8.019) | 19.387 (10.457) | 25.326 (12.674)c |
ARI | 10.187 (10.752) | 14.660 (13.250) | 13.485 (15.379) | 13.620 (16.859) |
Multi | 0.741 (2.635) | 0.433 (3.212) | 0.891 (3.710) | 1.585 (4.038) |
NPL | | | | |
HRS score | 0.031 (0.007)a | 0.031 (0.007)a | 0.031 (0.007)a | 0.032 (0.007)a |
Years listed | 0.092 (0.010)a | 0.092 (0.010)a | 0.092 (0.010)a | 0.092 (0.010)a |
Max VIF | 1.158 | 1.249 | 1.386 | 1.551 |
AUC | 0.702 | 0.701 | 0.701 | 0.702 |
AIC | 1,715.410 | 1,712.979 | 1,712.539 | 1,711.391 |
BIC | 1,769.723 | 1,767.292 | 1,766.852 | 1,765.704 |
Log likelihood | -847.705 | -846.490 | -846.270 | -845.695 |
Deviance | 1,695.410 | 1,692.979 | 1,692.539 | 1,691.391 |
Num. obs. | 1,688 | 1,688 | 1,688 | 1,688 |
Each model uses the same variables: seven racial demographic proportions, the site’s Hazard Ranking System (HRS) score, and years since NPL inclusion. The racial demographics differ by data year (2000 Decennial Census, 2021 ACS 5-Year Estimates) and geographic buffer around each site (1, 2, 3, or 4 miles). Parentheses indicate standard errors for each estimate. Each sub-table’s lower section details model performance metrics, including maximum Variance Inflation Factor (VIF), in-sample Area Under Curve (AUC), AIC, and BIC.
a: P < 0.001; b: P < 0.01; c: P < 0.05.
Prior literature in environmental justice suggests incorporating distance-weighted populations; see, e.g., (25). Although that literature focuses on health outcomes and we focus on cleanup status, we nonetheless explore this approach. Model IX employs an inverse-square weighting scheme to emphasize populations closer to Superfund sites. We divide each site’s 4-mile buffer into a central disc (radius 1 mile) and three concentric rings. We then divide the population in each zone by the squared distance of its outer edge from the site. Next, we sum these weighted zone populations to obtain a total adjusted population count for each race category around each Superfund site. Finally, we calculate the proportion of each race category at a site by dividing its adjusted population count by the overall adjusted population. We use these calculated proportions in our revised logistic regression model.
Results appear in Table 2. The significant effects and their directions are consistent across the two models, and the corresponding estimates are statistically indistinguishable. Likewise, the model metrics show no meaningful differences. From this, we conclude that results from our baseline model, Model VIII, are robust to distance weighting of the population.
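The weighting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weighted_proportions`, the outer radii of 1, 2, 3, and 4 miles for the disc and rings, and the population counts are all hypothetical and are not taken from the study's code or data.

```python
def weighted_proportions(zone_counts, outer_radii=(1.0, 2.0, 3.0, 4.0)):
    """Inverse-square distance weighting of zone populations.

    zone_counts maps a race category to a list of populations, one per zone,
    ordered from the central disc outward. outer_radii (miles) are assumed.
    """
    # Each zone's weight is 1 / (outer-edge distance squared).
    weights = [1.0 / r**2 for r in outer_radii]
    # Adjusted count per category: weighted sum over zones.
    adjusted = {
        race: sum(w * pop for w, pop in zip(weights, pops))
        for race, pops in zone_counts.items()
    }
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no population in any zone")
    # Proportion = category's adjusted count / overall adjusted population.
    return {race: value / total for race, value in adjusted.items()}


# Hypothetical counts for one site (disc, ring 1-2, ring 2-3, ring 3-4 miles).
example = {
    "Asian": [100, 200, 300, 400],
    "Black": [400, 300, 200, 100],
    "White": [500, 500, 500, 500],
}
props = weighted_proportions(example)
```

In this made-up example, the Asian and Black categories have equal unweighted totals (1,000 each), but the weighted Black proportion is larger because that population sits closer to the site.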
Table 2. Comparison of baseline Model VIII (unweighted demographics) and Model IX (inverse-square distance weighted demographics).
Variable | Model VIII (unweighted) | Model IX (weighted) |
---|---|---|
Race | ||
Asian | 3.211 (1.021)b | 2.934 (0.970)b |
Black | 0.538 (0.441) | 0.657 (0.415) |
Indigenous | 1.135 (1.417) | 1.025 (1.377) |
Latinx | 0.172 (0.380) | 0.195 (0.367) |
Pacific | 25.326 (12.674)c | 21.319 (9.873)c |
ARI | 13.620 (16.859) | 10.730 (15.172) |
Multi | 1.585 (4.038) | 1.714 (3.599) |
NPL | ||
HRS score | 0.032 (0.007)a | 0.031 (0.007)a |
Years listed | 0.092 (0.010)a | 0.092 (0.010)a |
Max VIF | 1.551 | 1.379 |
AUC | 0.702 | 0.703 |
AIC | 1,711.391 | 1,710.959 |
BIC | 1,765.704 | 1,765.272 |
Log likelihood | -845.695 | -845.480 |
Deviance | 1,691.391 | 1,690.959 |
Num. obs. | 1,688 | 1,688 |
Both models use 2021 ACS data within a 4-mile buffer. Model IX emphasizes the influence of populations closer to Superfund sites. The significant effects and their directions are identical across the two models. Parentheses indicate standard errors for each estimate. Model metrics are as in Table 1.
a: P < 0.001; b: P < 0.01; c: P < 0.05.
As another robustness check, and as mentioned earlier, we compare our baseline Model VIII with a linear probability model (LPM), Model X, that has the same explanatory variables. To address potential heteroscedasticity in this model, we use robust standard errors of type HC1 (26). Table 3 summarizes the comparison. Significant effects and their directions are consistent across the two models, except for the Pacific Islander population. Though this effect is significant in Model VIII, its nonsignificance in other model specifications in Table 1 suggests it may be less reliable.
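As a rough illustration of what an LPM with HC1 standard errors involves, the sketch below fits ordinary least squares to a binary outcome and applies the usual HC1 sandwich formula, which scales the heteroscedasticity-robust covariance by n / (n - k). It assumes NumPy is available; the function name `lpm_hc1` and the simulated data are hypothetical, and this is not the authors' R code.

```python
import numpy as np


def lpm_hc1(X, y):
    """OLS on a binary outcome with HC1 robust standard errors.

    X must already contain an intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: X' diag(resid^2) X.
    meat = X.T @ (X * resid[:, None] ** 2)
    # HC1 applies the degrees-of-freedom correction n / (n - k).
    cov = (n / (n - k)) * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated data (hypothetical): outcome probability falls as x rises.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 1, size=n)
y = (rng.uniform(size=n) < 0.7 - 0.4 * x).astype(float)
X = np.column_stack([np.ones(n), x])
beta, se = lpm_hc1(X, y)
```

Because an LPM coefficient is a change in probability per unit of the predictor, its magnitude is not directly comparable to a logistic coefficient, which is why the comparison in Table 3 rests on sign and significance rather than size.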
Table 3. Comparison of baseline Model VIII (logistic regression) with Model X (LPM) predicting Superfund site deletion.
Variable | Model VIII (logistic) | Model X (linear) |
---|---|---|
Race | ||
Asian | 3.211 (1.021)b | 0.526 (0.148)a |
Black | 0.538 (0.441) | 0.068 (0.072) |
Indigenous | 1.135 (1.417) | 0.108 (0.183) |
Latinx | 0.172 (0.380) | 0.016 (0.063) |
Pacific | 25.326 (12.674)c | 3.801 (2.227) |
ARI | 13.620 (16.859) | 2.039 (2.623) |
Multi | 1.585 (4.038) | 0.259 (0.670) |
NPL | ||
HRS score | 0.032 (0.007)a | 0.005 (0.001)a |
Years listed | 0.092 (0.010)a | 0.010 (0.001)a |
Num. obs. | 1,688 | 1,688 |
Significant effects and their directions are identical across the two models, with the exception of the Pacific Islander population in Model VIII, which may be less reliable due to its nonsignificance in other model specifications (see Table 1). Model X uses HC1-type robust standard errors to address heteroscedasticity. Both models use 2021 American Community Survey data with a 4-mile geographic buffer.
a: P < 0.001; b: P < 0.01; c: P < 0.05.
For further investigation, we compare our baseline Model VIII to four alternative specifications incorporating geographic effects. Models XI and XII treat EPA region (10 categories) as fixed and random effects, respectively. Models XIII and XIV repeat this approach for state (51 categories). We fit models with random effects using lmer from the lme4 package in R. In principle, a more granular approach using census tracts or counties could be desirable. However, with only 1,688 Superfund site observations, including these effects for the large number of unique counties (756) or census tracts (over 20,000) is not advisable.
Table 4 presents the results. Model XIV, which includes state random effects, is the only model with both lower AIC and lower BIC values than our baseline. Regardless, significant effects and their directions remain identical across all four alternative specifications, except (as before) for the Pacific Islander population.
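The AIC and BIC comparison can be checked by hand from the reported log-likelihoods, using AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L. The short sketch below does this arithmetic. Two things here are assumptions rather than statements from the text: the log-likelihoods are taken as negative (consistent with the reported deviances), and the parameter counts k are inferred from the reported AIC values.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


N = 1688  # number of Superfund sites

# (log-likelihood, inferred number of estimated parameters)
baseline = (-845.695, 10)   # Model VIII: intercept + 9 slopes
region_fe = (-833.510, 19)  # Model XI: adds 9 region indicators
state_fe = (-790.594, 60)   # Model XIII: adds 50 state indicators
state_re = (-838.346, 11)   # Model XIV: adds 1 random-effect variance

for name, (ll, k) in [("VIII", baseline), ("XI", region_fe),
                      ("XIII", state_fe), ("XIV", state_re)]:
    print(f"Model {name}: AIC={aic(ll, k):.3f}  BIC={bic(ll, k, N):.3f}")
```

Under these assumptions, the fixed-effects models improve the log-likelihood but pay a BIC penalty of about 7.43 per added parameter, which is why the state fixed-effects model has the highest BIC, while the state random-effects model adds only one parameter and improves on both criteria.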
Table 4. Comparison of baseline Model VIII with four alternative specifications incorporating geographic effects for Superfund site deletion.
Variable | Model VIII | Model XI | Model XII | Model XIII | Model XIV |
---|---|---|---|---|---|
Race | |||||
Asian | 3.211 (1.021)b | 2.381 (1.141)c | 2.867 (1.090)b | 2.591 (1.235)c | 2.772 (1.155)c |
Black | 0.538 (0.441) | 0.205 (0.530) | 0.381 (0.495) | 0.202 (0.630) | 0.375 (0.515) |
Indigenous | 1.135 (1.417) | 2.079 (1.533) | 1.637 (1.483) | 2.855 (1.907) | 1.596 (1.545) |
Latinx | 0.172 (0.380) | 0.257 (0.452) | 0.065 (0.424) | 0.171 (0.543) | 0.097 (0.458) |
Pacific | 25.326 (12.674)c | 20.487 (13.761) | 23.054 (12.903) | 0.440 (17.254) | 19.402 (13.862) |
ARI | 13.620 (16.859) | 5.503 (17.395) | 10.431 (16.417) | 12.919 (18.550) | 16.715 (18.066) |
Multi | 1.585 (4.038) | 3.514 (4.293) | 2.425 (4.143) | 7.699 (5.065) | 2.978 (4.401) |
NPL | |||||
HRS | 0.032 (0.007)a | 0.033 (0.007)a | 0.032 (0.007)a | 0.032 (0.007)a | 0.033 (0.007)a |
Years listed | 0.092 (0.010)a | 0.098 (0.011)a | 0.095 (0.010)a | 0.104 (0.011)a | 0.095 (0.011)a |
Geography | None | Region FE | Region RE | State FE | State RE |
Max VIF | 1.551 | 1.291 | | 1.654 | |
AUC | 0.702 | 0.718 | 0.714 | 0.766 | 0.748 |
AIC | 1,711.391 | 1,705.021 | 1,709.263 | 1,701.189 | 1,698.691 |
BIC | 1,765.704 | 1,808.216 | 1,769.007 | 2,027.066 | 1,758.436 |
Log likelihood | -845.695 | -833.510 | -843.631 | -790.594 | -838.346 |
Deviance | 1,691.391 | 1,667.021 | | 1,581.189 | |
Num. obs. | 1,688 | 1,688 | 1,688 | 1,688 | 1,688 |
Models XI and XII treat EPA region (10 categories) as fixed and random effects, respectively. Models XIII and XIV repeat this approach for state (51 categories). All models use 2021 American Community Survey data and a 4-mile buffer. Standard errors appear in parentheses. Significant effects and their directions are identical across the models, with the exception of the Pacific Islander coefficient in Model VIII, which may be less reliable due to its nonsignificance in other model specifications (see Table 1). Model metrics are as in Table 1. For models with categorical predictors, the reported VIF is a scaled Generalized Variance Inflation Factor (GVIF), namely $\mathrm{GVIF}^{1/(2\,\mathrm{df})}$, where df is degrees of freedom. R does not provide variance inflation factors and deviance for mixed-effects models.
a: P < 0.001; b: P < 0.01; c: P < 0.05.
Next, we examine whether socioeconomic controls impact our results. Model XV includes income, Model XVI includes citizenship, and Model XVII includes both; see Table 5. All these models exhibit maximum VIFs exceeding our established threshold of 5, thereby indicating multicollinearity and compromising their reliability. We exclude them from further analysis. However, despite these limitations, it is noteworthy that the estimates for HRS score, years listed, and Asian population remain stable across these models.
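To make the VIF screening concrete, the sketch below computes VIFs in the classical linear-model sense, where the VIF of predictor j equals the j-th diagonal entry of the inverse correlation matrix of the predictors (equivalently 1 / (1 - R_j^2)). This is an assumption about the definition: software may compute VIFs for logistic models somewhat differently, and the function name `max_vif` and the simulated predictors are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

VIF_THRESHOLD = 5.0  # threshold stated in the text


def max_vif(X):
    """Return (largest VIF, all VIFs) for the columns of X.

    Rows are observations; X should not contain an intercept column.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # Diagonal of the inverse correlation matrix gives 1 / (1 - R_j^2).
    vifs = np.diag(np.linalg.inv(corr))
    return float(vifs.max()), vifs


# Simulated predictors (hypothetical): c is nearly a copy of a.
rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.3 * rng.normal(size=n)

low, _ = max_vif(np.column_stack([a, b]))      # unrelated predictors
high, _ = max_vif(np.column_stack([a, b, c]))  # adds a collinear predictor
print(low < VIF_THRESHOLD, high > VIF_THRESHOLD)
```

Adding a predictor that largely duplicates an existing one pushes the maximum VIF past the threshold, which is the pattern the income and citizenship controls produce in Table 5.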
Table 5. Comparison of baseline Model VIII with three alternative specifications incorporating socioeconomic variables for Superfund site deletion.
Variable | Model VIII | Model XV | Model XVI | Model XVII |
---|---|---|---|---|
Race | ||||
Asian | 3.211 (1.021)b | 3.724 (1.345)b | 4.091 (1.484)b | 4.407 (1.639)b |
Black | 0.538 (0.441) | 0.498 (0.517) | 0.512 (0.447) | 0.398 (0.523) |
Indigenous | 1.135 (1.417) | 1.795 (1.591) | 1.167 (1.421) | 1.890 (1.602) |
Latinx | 0.172 (0.380) | 0.209 (0.395) | 0.026 (0.646) | 0.113 (0.670) |
Pacific | 25.326 (12.674)c | 28.077 (12.962)c | 25.685 (12.765)c | 28.329 (13.013)c |
ARI | 13.620 (16.859) | 14.965 (17.224) | 18.132 (17.428) | 18.242 (17.618) |
Multi | 1.585 (4.038) | 0.581 (4.163) | 1.001 (4.070) | 0.266 (4.183) |
NPL | ||||
HRS | 0.032 (0.007)a | 0.033 (0.007)a | 0.032 (0.007)a | 0.033 (0.007)a |
Years listed | 0.092 (0.010)a | 0.093 (0.010)a | 0.091 (0.010)a | 0.091 (0.010)a |
Income | No | Yes | No | Yes |
Citizenship | No | No | Yes | Yes |
Max VIF | 1.551 | 8.358 | 5.231 | 8.447 |
AUC | 0.702 | 0.717 | 0.706 | 0.718 |
AIC | 1,711.391 | 1,722.015 | 1,712.616 | 1,724.092 |
BIC | 1,765.704 | 1,857.797 | 1,777.792 | 1,870.737 |
Log likelihood | -845.695 | -836.007 | -844.308 | -835.046 |
Deviance | 1,691.391 | 1,672.015 | 1,688.616 | 1,670.092 |
Num. obs. | 1,688 | 1,688 | 1,688 | 1,688 |
Model XV includes household income, Model XVI includes citizenship status, and Model XVII includes both. All models use 2021 ACS data and a 4-mile buffer. Standard errors appear in parentheses. Significant effects and their directions are identical across the models. Model metrics are as in Table 1, with the reported VIF defined as in Table 4.
a: P < 0.001; b: P < 0.01; c: P < 0.05.
Up to this point, we have focused on full deletion from the NPL as our outcome. However, the NPL also includes “partial deletion,” indicating that parts of a site have been remediated, while others may still require monitoring, further treatment, or land-use restrictions. To investigate the impact of this distinction, we consider whether a different handling of the outcome affects our results. In Model XVIII, we count partial deletions as deletions, unlike in our previous models, where they were not counted as deletions. Table 6 compares Model XVIII to the baseline model. Crucially, the significant effects and their directions are stable across the two models, except for the Pacific Islander coefficient. However, Model XVIII has an AUC of 0.679, which is below our threshold of 0.7 for discriminatory power. Therefore, we exclude this model from further analysis.
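For readers unfamiliar with the AUC criterion used here, the sketch below computes an in-sample AUC as the fraction of (deleted, not deleted) site pairs in which the deleted site receives the higher predicted probability, counting ties as one half. The function name `auc`, the labels, and the scores are hypothetical illustrations; only the 0.7 threshold comes from the text.

```python
AUC_THRESHOLD = 0.7  # discriminatory-power threshold stated in the text


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = deleted) and predicted probabilities.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.35]

value = auc(labels, scores)
print(round(value, 3), value >= AUC_THRESHOLD)
```

In this toy example, 6 of the 9 pairs are ordered correctly, so the AUC is about 0.667 and the model would fall below the threshold, as Model XVIII does.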
Table 6. Comparison of baseline Model VIII (full deletion outcome) with Model XVIII (full or partial deletion outcome) for Superfund site cleanup.
Variable | Model VIII (full deletion) | Model XVIII (full or partial deletion) |
---|---|---|
Race | ||
Asian | 3.211 (1.021)b | 2.740 (0.904)b |
Black | 0.538 (0.441) | 0.783 (0.409) |
Indigenous | 1.135 (1.417) | 0.362 (1.158) |
Latinx | 0.172 (0.380) | 0.210 (0.356) |
Pacific | 25.326 (12.674)c | 22.426 (11.692) |
ARI | 13.620 (16.859) | 9.235 (15.194) |
Multi | 1.585 (4.038) | 1.883 (3.721) |
NPL | ||
HRS score | 0.032 (0.007)a | 0.022 (0.006)a |
Years listed | 0.092 (0.010)a | 0.081 (0.008)a |
Max VIF | 1.551 | 1.573 |
AUC | 0.702 | 0.679 |
AIC | 1,711.391 | 1,906.187 |
BIC | 1,765.704 | 1,960.500 |
Log likelihood | -845.695 | -943.094 |
Deviance | 1,691.391 | 1,886.187 |
Num. obs. | 1,688 | 1,688 |
Both models use 2021 ACS data and a 4-mile buffer. Significant effects and their directions are identical across the models, with the exception of the Pacific Islander coefficient in Model VIII, which may be less reliable due to its nonsignificance in other model specifications (see Table 1). SEs appear in parentheses. Model metrics are as in Table 1.
Significance markers: a, P < 0.001; b, P < 0.01; c, P < 0.05.
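The fit statistics in the table are linked by standard identities: for a binary logistic model, deviance is −2 times the log likelihood, AIC is 2k plus the deviance, and BIC is k ln(n) plus the deviance. The sketch below checks the reported values against these identities. It assumes k = 10 estimated parameters, a value inferred from the gap between the reported AIC and deviance (it would be consistent with the nine listed coefficients plus an intercept, though the source does not state this), and it takes the log likelihoods as negative, as they must be for such a model.

```python
import math

N_OBS = 1688      # number of observations reported in the table
N_PARAMS = 10     # assumption: inferred from AIC - deviance = 2k = 20


def deviance(log_lik):
    """Deviance of a binary logistic model: -2 * log likelihood."""
    return -2.0 * log_lik


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 * log likelihood."""
    return 2.0 * k + deviance(log_lik)


def bic(log_lik, k, n):
    """Bayesian information criterion: k * ln(n) - 2 * log likelihood."""
    return k * math.log(n) + deviance(log_lik)


# Reported values: (log likelihood, deviance, AIC, BIC) for each model.
reported = {
    "Model VIII": (-845.695, 1691.391, 1711.391, 1765.704),
    "Model XVIII": (-943.094, 1886.187, 1906.187, 1960.500),
}

# Largest absolute gap between recomputed and reported statistics, per model.
max_gap = {}
for name, (ll, dev, a, b) in reported.items():
    gaps = [
        abs(deviance(ll) - dev),
        abs(aic(ll, N_PARAMS) - a),
        abs(bic(ll, N_PARAMS, N_OBS) - b),
    ]
    max_gap[name] = max(gaps)
```

With these inputs, the recomputed statistics agree with the reported ones up to the rounding of the log likelihood to three decimals.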
Discussion
In our analysis, the admissible models, namely, Models V–IX and XI–XIV, consistently identify three statistically significant explanatory variables.
First, HRS score has an estimate of approximately −0.03, indicating a negative relationship with the probability of a site reaching deletion status, conditioned on other variables. The corresponding odds ratio is 0.97. This finding may initially appear counterintuitive, as one might expect sites with higher HRS scores, signifying greater hazards, to be prioritized for cleanup. However, the result aligns with the EPA’s own clarification that HRS score does not directly equate to cleanup priority. In fact, it is plausible that sites with higher HRS scores, requiring more extensive remediation efforts, could be deprioritized relative to those needing less resource-intensive interventions. This interpretation underscores the complexity of decision-making in environmental management and suggests a potential area for further investigation into how resource allocation decisions are made for site remediation.
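To make the link between the estimate and the odds ratio concrete, the short sketch below exponentiates a logistic regression coefficient. It uses the baseline Model VIII magnitude for HRS score from Table 6 (0.032) with the negative sign described in the text; the function name is a hypothetical helper, not part of the authors' code.

```python
import math


def odds_ratio(coefficient, increase=1.0):
    """Multiplicative change in the odds of deletion when the predictor
    rises by `increase` units, holding other variables fixed."""
    return math.exp(coefficient * increase)


# Baseline Model VIII estimate for HRS score, with the sign given in the text.
hrs_or = odds_ratio(-0.032)

# A 10-point difference in HRS score compounds multiplicatively.
hrs_or_10_points = odds_ratio(-0.032, increase=10.0)
```

An odds ratio below one means that each additional HRS point multiplies the odds of deletion by a factor slightly less than one, which rounds to the 0.97 quoted above.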
Second, we find a consistent positive coefficient for the years since a site was listed on the NPL. To one significant digit and with bankers’ rounding, the estimate takes one of two values depending on the model; in baseline Model VIII, the unrounded estimate is 0.092 (Table 6). This positive relationship suggests that the longer a site has been on the NPL, the more likely it is to achieve deletion status, conditioned on other variables. This trend aligns with intuitive expectations; newer sites typically require time to be assessed, prioritized, and allocated the necessary resources for cleanup.
The third key finding directly addresses our research question. Specifically, the estimates for the Asian population’s association with Superfund site deletion status vary across models but are negative and statistically significant in every admissible model; in baseline Model VIII, for instance, the estimate is −3.211 (SE 1.021, P < 0.01). This negative relationship suggests that the larger the Asian population local to a site, the less likely the site is to achieve deletion status, conditioned on other variables. Figure 2 visually represents these results, displaying the 95% CIs for the corresponding odds ratios for a 1% increase in the Asian population. A shaded band within the figure highlights the overlapping region of these intervals, illustrating the consistency across different models.

Effect of Asian population on site deletion status, estimated from all models meeting our admission criteria (our VIF threshold and our Area Under Curve threshold of 0.7). These models are Models V–IX and XI–XIV (Tables 1, 2, and 4). The vertical axis shows 95% CIs for the odds ratio associated with a 1% increase in the Asian population. The shaded band highlights the overlapping region, demonstrating consistency across model specifications. These specifications vary in demographic data time periods (2000 and 2021), geographic buffers (1–4 miles), population weighting, and the inclusion of geographic effects (EPA region or state, fixed/random effects, or no adjustment). All admitted models indicate that increased Asian population around Superfund sites correlates with decreased probability of deletion (remediation), as shown by an odds ratio lower than one.
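The intervals described in this caption can be approximated from a coefficient and its standard error. The sketch below is one plausible construction, not necessarily the authors' procedure: it assumes a Wald interval with z = 1.96 and assumes the race variables are coded as proportions between 0 and 1, so that a 1% increase corresponds to a change of 0.01. The inputs are the baseline Model VIII values for Asian population from Table 6, with the negative sign described in the text.

```python
import math

Z_95 = 1.96           # assumption: normal-approximation (Wald) 95% interval
ONE_PERCENT = 0.01    # assumption: predictors are proportions in [0, 1]


def odds_ratio_interval(coefficient, std_error, increase=ONE_PERCENT, z=Z_95):
    """Return (lower, point, upper) odds ratios for a given predictor increase."""
    lower = math.exp(increase * (coefficient - z * std_error))
    point = math.exp(increase * coefficient)
    upper = math.exp(increase * (coefficient + z * std_error))
    return lower, point, upper


# Baseline Model VIII: Asian population estimate and standard error.
asian_lower, asian_point, asian_upper = odds_ratio_interval(-3.211, 1.021)

# The effect is distinguishable from "no association" if 1 lies outside the CI.
excludes_one = asian_upper < 1.0 or asian_lower > 1.0
```

Under these assumptions, the whole interval for Model VIII lies below one, in line with the caption's statement that all admitted models show an odds ratio lower than one.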
Critically, the three significant effects we identify (HRS score, years listed, and Asian population) and their directions are robust across a wide range of modeling assumptions. Even in models that we excluded for narrowly missing our VIF or AUC acceptance criteria, these effects and their directions are consistent. This robustness holds despite variations in:
Data and time period: Census 2000 or ACS 2021,
Geographic buffer size: 1, 2, 3, or 4 miles,
Weighting: Whether or not population is weighted inversely by distance,
Model type: Logistic or LPM,
Geographic effects: Incorporating EPA region or state using fixed effects, random effects, or not at all,
Socioeconomic controls: Inclusion of income and citizenship variables, and
Outcome definition: Whether or not partial deletions are counted as deletions.
Finally, we address the magnitude of the estimates for the Indigenous and Pacific Islander racial groups. As Tables 1–6 illustrate, these coefficients are notably larger than those for other racial groups. This is likely attributable to the sparsity of data for these populations within our dataset, with median Indigenous and Pacific Islander proportions of 0.2% and 0.01%, respectively.
To assess the robustness of our findings in light of this data sparsity, we conduct a post hoc analysis similar to our baseline Model VIII, but collapsing the Black, Indigenous, Latinx, Pacific Islander, and multiracial population proportions into a single variable. While this approach is racially reductionist, it allows us to examine the association between Asian population and site deletion probability without the potential influence of small racial proportions for specific groups. The significant coefficients (and standard errors) are −2.973 (0.946) for Asian population, −0.031 (0.007) for HRS score, and 0.091 (0.010) for years listed. These estimates agree closely with our other models, further supporting the robustness of our primary findings.
Our analysis has revealed a significant association at the national level between race and Superfund site cleanup outcomes, specifically for the Asian population. To explore potential regional variations, we conducted additional logistic regression analyses similar to our baseline Model VIII, but stratified by EPA region. These analyses estimate one coefficient for each of the 7 racial groups within each of the 10 EPA regions, for a total of 70 coefficients. We then adjusted P-values using the Bonferroni correction for multiple comparisons. The stratified analyses did not yield statistically significant associations between race and cleanup outcomes after this adjustment. While we considered state-level stratification as well, this approach would have required estimating 357 coefficients from our dataset of 1,688 observations. Given the potential for reduced statistical power with such a high number of comparisons, we opted to focus on the regional analysis.
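For readers less familiar with the Bonferroni correction, the sketch below shows how it operates on a family of 70 tests. The significance level of 0.05 and the P-values are hypothetical and chosen only for illustration; they are not results from the stratified analysis.

```python
def bonferroni_adjust(p_values):
    """Multiply each P-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def per_test_alpha(alpha, n_tests):
    """Equivalent view: the threshold each raw P-value must fall below."""
    return alpha / n_tests


N_TESTS = 70   # 7 racial groups x 10 EPA regions, as described in the text
ALPHA = 0.05   # assumption: conventional family-wise significance level

# Hypothetical raw P-values: one nominally significant result among 70 tests.
raw_p = [0.01] + [0.5] * (N_TESTS - 1)
adjusted_p = bonferroni_adjust(raw_p)

threshold = per_test_alpha(ALPHA, N_TESTS)
any_significant = any(p < ALPHA for p in adjusted_p)
```

In this made-up example, a raw P-value of 0.01 becomes roughly 0.7 after adjustment and no longer counts as significant, which illustrates why stratified analyses with many coefficients need much stronger evidence per coefficient.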
Taken together, our significant national finding and lack of significant regional findings suggest that the national-level effect might reflect a broader trend influenced by factors operating across a wider scale. Stratifying the data reduces the sample size within each region, potentially decreasing the statistical power to detect significant effects, especially if the effect sizes within individual regions are smaller than the overall nationwide effect. Additionally, it is possible that the relationship between race and cleanup outcomes is heterogeneous across the United States, with distinct regional factors obscuring the pattern when the analysis is localized. The lack of significant findings within regions does not invalidate the nationwide result. It does, however, highlight the need to examine environmental justice issues at multiple spatial scales to gain a comprehensive understanding of complex disparities.
Conclusion
We have analyzed the relationship between localized racial demographics and the cleanup status of Superfund sites. Through a range of modeling scenarios (employing demographic data from two time periods, a range of geographic buffers, and various control variables), we consistently observed a negative association between the proportion of the local population that is Asian and the probability of a Superfund site being cleaned up. Our result appears to be the first identification of this association, likely because ours is the first study of Superfund sites that considers Asian populations as a distinct group. Our research has two primary implications.
First, additional work is necessary to uncover the root causes of the association we have observed. Such investigations could encompass qualitative social science inquiries into socio-economic dynamics, the degree of community engagement in environmental issues, and possible systemic biases influencing environmental policy decisions. Complementary statistical analyses, focused on determining causation, are equally essential, as our own analysis is not causal.
Second, our study underscores how Asian populations have sometimes been excluded from justice-focused research. It is essential to contextualize our findings within the broader landscape of Asian demographics in the United States. As we noted, much of the quantitative research on racial disparities in the Superfund program predates the year 2000, a period before the significant growth of the Asian population in the United States, which has more than doubled since then. Non-Hispanic Asian Americans, being the fastest-growing racial/ethnic group, are projected to become the largest minority group by 2055 (27). While the proportional representation of a minoritized group should not dictate how deserving it is of study, the oversight of Asian populations in justice-focused research nonetheless represents a significant gap.
We acknowledge the considerable diversity within the broad category of “Asian.” A thorough understanding of Asian populations demands an in-depth analysis of various subgroups. This need is evident in research concerning health disparities (28), economic outcomes (29), and social service utilization (30), among other areas. Our research represents a preliminary step towards more comprehensive analysis and understanding of Asian populations within the realm of environmental justice issues.
Acknowledgments
We are grateful to the following individuals for helpful conversations: Prof. Jayajit Chakraborty (University of Texas, El Paso), Prof. Phil Chodrow (Middlebury College), Prof. José Constantine (Williams College), Prof. Desiree Plata (MIT), and Lt. Commander Gary Riley (National Park Service).
Funding
The authors declare no funding.
Author Contributions
This work began as a student-led project designed by B.G.N., P.I.d.S., J.S., and L.S. in a class taught by C.M.T. at Williams College. B.G.N., P.I.d.S., J.S., L.S., and C.M.T. conceived the study and drafted the manuscript. B.G.N., P.I.d.S., J.S., and L.S. conducted a literature review, performed background research, and played a significant role in formulating the analytical strategy. C.M.T. was the primary contributor to data acquisition, data processing, coding, and statistical modeling efforts. X.C. and S.N. played a role in statistical modeling and editing of the manuscript.
Preprint
This manuscript was posted as a preprint at https://doi.org/10.31235/osf.io/93n78.
Data Availability
The data underlying this article are available in an Open Science Framework repository at https://osf.io/efu63 (16).
References
Author notes
Competing Interest: The authors declare no competing interest.