The importance of context in extreme value analysis with application to extreme temperatures in the U.S. and Greenland
Daniel Clarkson, Emma Eastoe, Amber Leeson
Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 72, Issue 4, August 2023, Pages 829–843, https://doi.org/10.1093/jrsssc/qlad020
Abstract
Statistical extreme value models allow estimation of the frequency, magnitude, and spatio-temporal extent of extreme temperature events in the presence of climate change. Unfortunately, the assumptions of many standard methods are not valid for complex environmental data sets, with a realistic statistical model requiring appropriate incorporation of scientific context. We examine two case studies in which the application of routine extreme value methods results in inappropriate models and inaccurate predictions. In the first scenario, incorporating random effects reflects shifts in unobserved climatic drivers that led to record-breaking US temperatures in 2021, permitting greater accuracy in return period prediction. In scenario two, a Gaussian mixture model fit to ice surface temperatures in Greenland improves fit and predictive abilities, especially in the poorly-defined upper tail around 0°C.
1 Introduction
The roots of the statistical modelling of extreme events lie in the theoretical work of Von Mises (1936) and Gnedenko (1943) who sought to understand the asymptotic behaviour of sample maxima. From the 1950s, this probability theory began to be translated into the statistical modelling framework now known as the block maxima method (Gumbel, 1958). Both theory and modelling framework were later extended to a Peaks over threshold (PoT) approach (Davison & Smith, 1990; Pickands, 1975). It was recognized early on that both block maxima and PoT frameworks could make major contributions to the study of natural hazards, leading to a long-standing relationship with the environmental sciences and civil engineering. Specific application areas include hydrology (Jonathan & Ewans, 2013; Katz et al., 2002), air pollution (Eastoe & Tawn, 2009; Gouldsborough et al., 2021; Reich et al., 2013), temperatures (Acero et al., 2014; Winter et al., 2016), precipitation (Katz, 1999), drought (Burke et al., 2010), wind speeds (Fawcett & Walshaw, 2006), statistical downscaling (Friederichs, 2010; Towe et al., 2017), space weather (Rogers et al., 2020; Thomson et al., 2011), and climate change problems (Cheng et al., 2014; Cooley, 2009; Sterl et al., 2008).
Both theory and models have evolved considerably since the mid-twentieth century. In particular, there have been massive advances in methodology for multivariate extreme events (Heffernan & Tawn, 2004; Ramos & Ledford, 2009; Rootzén et al., 2018) and the related areas of spatial (Cooley et al., 2012; Huser & Wadsworth, 2019), temporal (Ledford & Tawn, 2003; Winter & Tawn, 2017), and spatio-temporal (Economou et al., 2014; Simpson & Wadsworth, 2021) extreme value modelling. These developments are in part a response to the increasing availability of large spatio-temporal environmental data sets obtained, for example, from satellite images, high-resolution process-based models and online measurement tools.
Nevertheless, the continued development of methods for univariate extremes remains a crucial research topic, not least due to the absence of a unified best practice methodology for modelling univariate extremes in the presence of complex trends, such as those seen in the presence of climate change. There is also still no single standard to inform the quantification and communication of risk in the presence of trends, whether or not this is climate change driven. While such a standard may ultimately prove to be unobtainable, the commonly employed approach of reporting a single risk measure to cover all scenarios, regardless of long-term trends or other changes, can, and should, be improved upon. Lastly, specification of a model that describes the marginal distribution well is an essential first step in the very large number of copula-based multivariate extreme value models (Joe, 1994).
This paper examines extreme temperature data for two regions with contrasting climates and very different geophysical features. The first case study assesses the heatwave experienced in the western U.S. during the summer of 2021 to ascertain whether the record-breaking temperatures from this event are consistent with historical temperature extremes. This study was motivated by a recent analysis of ERA5 reanalysis model output (Philip et al., 2021) which used a statistical extreme value model fitted to pre-2021 temperature extremes to estimate the upper bound of the temperature distribution. They found that the 2021 observations exceeded this upper bound at multiple locations. We propose some adaptations to their analysis to better investigate if this is the case and, if so, to understand why. The second case study analyses spatio-temporal records of ice surface temperature for the Greenland ice sheet, motivated by the need to find an appropriate univariate model for the measurements in each cell as the first stage of a spatial extreme value model. Given the large number of cells (1139), the marginal model should ideally be designed to permit automatic fitting, however the characteristics of the upper tail of the temperature distribution differ considerably across the ice sheet and so careful consideration is required to achieve this objective.
A feature common to both case studies is that the tail behaviour of the univariate distribution of interest cannot be described using out-of-the-box extreme value methods. Instead, by applying a pragmatic, contextualized and data-driven approach, much improved models can be developed while avoiding the need to incorporate huge numbers of parameters or highly complex structures. This approach has several advantages. Firstly, since the models are computationally both cheap and stable, analysis of large data sets is entirely feasible. Secondly, simple models are less prone to errors in fitting and interpretation, making them particularly useful for those without statistical training. Lastly, a simple model results in straightforward estimates of risk which makes it easier for the end-user to put these estimates to use.
2 Extreme value analysis
This section provides background on univariate extreme value modelling and introduces the random effects model that is subsequently used in Section 3. Univariate extreme value methods encompass models for both block maxima and PoT data sets. Central to this paper is the PoT approach which characterizes the tail behaviour of the underlying process by modelling only those observations in the data set that lie above a preselected high threshold. In contrast, block maxima methods describe the distribution of the maxima of predefined and nonoverlapping blocks; for environmental studies this block is usually a year. Each method has a theoretically motivated probability distribution at its core: the generalized extreme value distribution (block maxima) and the generalized Pareto distribution (PoT). While theoretical justification of these distributions is beyond the scope of the paper, note that, as with the central limit theorem, the justification relies on asymptotic arguments. For finite sample sizes, the distributions therefore only approximate the tail behaviour and are not guaranteed to describe the data exactly.
2.1 Vanilla block maxima and PoT models
The theory on which the vanilla versions of both block maxima and PoT models are based starts with the assumption of a sequence of independent replicates $X_1, \ldots, X_n$ of a random variable $X$ with distribution function $F$. Under some nonrestrictive conditions on $F$, Von Mises (1936) and Gnedenko (1943) showed that, in the limit as $n \to \infty$, the distribution of the normalized block maximum $(\max\{X_1, \ldots, X_n\} - b_n)/a_n$ is one of the three types of max-stable distribution: Weibull, Fréchet, or Gumbel. While both the rate of convergence to the limiting distribution and the normalizing sequences $a_n > 0$ and $b_n$ are determined by the underlying distribution $F$, which in a modelling context is unknown, this result is nevertheless used to justify the use of the generalized extreme-value distribution (GEV) to model the maxima of large but finite blocks.
The GEV is a class of distributions combining the three types of max-stable distribution. The resulting cumulative distribution function is
$$G(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]_{+}^{-1/\xi}\right\}, \tag{1}$$
where $[y]_{+} = \max(y, 0)$ and the limit $\exp\{-\exp[-(x - \mu)/\sigma]\}$ is taken in the case $\xi = 0$. This distribution has location $\mu$, scale $\sigma > 0$ and shape $\xi$ parameters, with the Weibull, Fréchet, and Gumbel distributions corresponding to $\xi < 0$, $\xi > 0$, and $\xi = 0$ respectively. As a model the GEV has the advantage of increased flexibility over choosing one of the three types a priori.
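As a concrete illustration of block maxima fitting, the sketch below fits a GEV to simulated annual maxima with scipy; the data and parameter values are placeholders rather than anything from the case studies, and note that scipy's shape parameter has the opposite sign to the $\xi$ used here.

```python
# A minimal sketch of fitting the GEV to block maxima, using simulated
# stand-in data rather than the station records analysed in this paper.
# scipy parameterizes the shape as c = -xi relative to the convention here.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)
annual_maxima = genextreme.rvs(c=0.1, loc=35.0, scale=2.0, size=71, random_state=rng)

c_hat, loc_hat, scale_hat = genextreme.fit(annual_maxima)
xi_hat = -c_hat  # EVT convention: xi < 0 implies a finite upper end-point

# estimated 100-year return level: the (1 - 1/100) quantile of the fitted GEV
rl_100 = genextreme.ppf(1 - 1 / 100, c_hat, loc=loc_hat, scale=scale_hat)
print(f"xi_hat = {xi_hat:.3f}, 100-year return level = {rl_100:.1f}")
```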
The vanilla PoT model is motivated by a result of Pickands (1975) who showed that, under some nonrestrictive regularity conditions, as $u \to x_F$ (the upper end-point of $F$), for $x > u$,
$$\Pr(X > x) \approx \lambda_u\left[1 + \xi\left(\frac{x - u}{\sigma_u}\right)\right]_{+}^{-1/\xi} = \lambda_u \bar{H}(x - u; \sigma_u, \xi),$$
where $\lambda_u = \Pr(X > u)$, $\sigma_u > 0$, and $\bar{H}(\cdot; \sigma_u, \xi)$ is the survival function of the generalized Pareto distribution. The parameters $\lambda_u$, $\sigma_u$, and $\xi$ are known as the rate, scale and shape, respectively; the rate is sometimes called the exceedance probability. The value of the shape parameter is determined by the rate at which $\Pr(X > x) \to 0$ as $x \to x_F$; exponential decay corresponds to $\xi = 0$, and heavy (light) decay to $\xi > 0$ ($\xi < 0$). For $\xi \geq 0$, there is no finite end-point, while if $\xi < 0$ the distribution has a finite upper end-point at $u - \sigma_u/\xi$.
By assuming that this limit result is a suitable approximation to the behaviour of the exceedances of a finite, but high, level $u$, it can be used to motivate a model in which the magnitudes of the threshold exceedances are a random sample from the generalized Pareto distribution and the exceedance times a random sample from a homogeneous Poisson process. While not essential, an assumption of independence is often made about the exceedances. If there is evidence of serial dependence, a routine and theoretically justified approach is to identify all clusters of extremes and model only the frequency and magnitude of the cluster maxima (Davis et al., 2013; Drees, 2008; Laurini & Tawn, 2003; Smith & Weissman, 1994). In either case, the parameters are estimated with standard statistical inference procedures, e.g., maximum likelihood or Bayesian inference.
A crucial modelling consideration is the choice of threshold $u$, which requires a compromise between retaining a sufficiently large number of exceedances such that estimation of the model parameters is possible, while still ensuring that the assumption of the limiting approximation is plausible. Various tools are available to aid threshold selection. Most of these use two key threshold stability properties of the generalized Pareto distribution to identify the minimum plausible threshold above which the distribution is a suitable description of the data.
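One simple, widely used diagnostic is to refit the generalized Pareto distribution over a range of candidate thresholds and check that the shape estimates are roughly stable above the chosen level. The sketch below illustrates this on simulated stand-in data; it is a rough check only, not the full suite of threshold-selection tools.

```python
# A rough parameter-stability check for threshold selection: the GPD is
# refitted above a range of candidate thresholds and the shape estimates
# compared. Roughly constant estimates above some level u suggest that u
# is an adequate threshold. `temps` is simulated stand-in data.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(2)
temps = rng.gumbel(loc=25.0, scale=4.0, size=20000)

for q in [0.90, 0.95, 0.98, 0.99]:
    u = np.quantile(temps, q)
    excesses = temps[temps > u] - u
    xi_hat, _, sigma_hat = genpareto.fit(excesses, floc=0.0)  # fix loc at 0
    print(f"u = {u:.2f} (q = {q}): xi = {xi_hat:.3f}, "
          f"sigma = {sigma_hat:.3f}, n_exc = {excesses.size}")
```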
2.2 Random effects PoT model
The most common extension of the vanilla PoT model is to model one or more of the distribution parameters as a function of covariates. The motivation for this is that most physical processes display changes in their behaviour over time. Such changes may be cyclic, smooth or sudden and often arise in response to changes in one or more of climate, weather conditions or other geophysical processes. While many commonly used regression modelling frameworks, both parametric (Davison & Smith, 1990; Eastoe & Tawn, 2009; Smith, 1989) and semi-parametric (Hall & Tajvidi, 2000; Jones et al., 2016; Yee & Stephenson, 2007; Youngman, 2019), have been incorporated into the PoT framework, there are several barriers that prevent their successful implementation. Firstly, since most environmental processes are driven by multiple interacting factors, appropriate drivers will often be either unknown or unobserved, or both. Secondly, PoT data sets are small by definition and there is frequently insufficient information to identify the often subtle covariate relationships. Lastly, while the motivating extreme value theory provides a justification to extrapolate into the underlying tail distribution given values of the covariates, there is no similar justification to extrapolate the covariate-response relationship beyond the measurement period. This is a serious limitation for climate change forecasts.
A plausible alternative is the PoT random effects model, which overcomes the above limitations for cases where changes in extremal behaviour occur between, but not within, years. In this model, previously applied to hydrological data by Eastoe (2019) and Towe et al. (2020), inter-year variability in both the magnitude and frequency of the extremes is modelled by latent variables (or ‘random effects’) which vary between years. The advantage of this framework is that, unlike a parametric regression, it requires no prior specification of the functional form for the inter-year variability. In this sense, the model is a compromise between the rigid fully parametric regression model and the flexible nonparametric regression. It can be used to identify long-term trends, unusually extreme years, and groupings of years in which the extremal behaviour is similar. The model also has the capability to make predictions about extreme episodes over a period of time (e.g., a number of decades) but not to do so for a specific out-of-sample year. Although it is beyond the scope of the current paper, such predictions could be achieved by incorporating serial dependence in the random effects.
For generality, we define a threshold $u_t$ for year $t$. This allows for cases in which it is appropriate for the threshold, and hence the definition of an extreme observation, to vary between years. Let $Y_{t,i}$ be the $i$th observation in year $t$; then, for $y > u_t$,
$$\Pr(Y_{t,i} > y \mid \lambda_t, \sigma_t, \xi_t) = \lambda_t\left[1 + \xi_t\left(\frac{y - u_t}{\sigma_t}\right)\right]_{+}^{-1/\xi_t}, \tag{2}$$
where
$$\lambda_t \sim F_\lambda(\cdot; \theta_\lambda), \quad \sigma_t \sim F_\sigma(\cdot; \theta_\sigma), \quad \xi_t \sim F_\xi(\cdot; \theta_\xi), \tag{3}$$
and $F_\lambda$, $F_\sigma$, and $F_\xi$ are prespecified distributions with parameters $\theta_\lambda$, $\theta_\sigma$, and $\theta_\xi$, respectively. In hierarchical modelling terminology, equations (2) and (3) are the data and latent process models, respectively, and $(\lambda_t, \sigma_t, \xi_t)$ are the random effects for year $t$. We assume throughout that the responses are independent given the random effects and that the random effects are mutually independent and serially uncorrelated.
Let $\mathbf{y}$ denote the full vector of data, $T$ be the number of years in the observation period and $n_t$ be the number of observations in year $t$. Denote the vectors of random effects by $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_T)$, $\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_T)$, and $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_T)$. The starting point for inference is the joint likelihood function
$$L(\theta_\lambda, \theta_\sigma, \theta_\xi, \boldsymbol{\lambda}, \boldsymbol{\sigma}, \boldsymbol{\xi}; \mathbf{y}) = \left\{\prod_{t=1}^{T}\prod_{i=1}^{n_t} f(y_{t,i} \mid \lambda_t, \sigma_t, \xi_t)\right\} f_{\boldsymbol{\lambda}}(\boldsymbol{\lambda}; \theta_\lambda)\, f_{\boldsymbol{\sigma}}(\boldsymbol{\sigma}; \theta_\sigma)\, f_{\boldsymbol{\xi}}(\boldsymbol{\xi}; \theta_\xi), \tag{4}$$
where $f(\cdot \mid \lambda_t, \sigma_t, \xi_t)$ is the density implied by the data model (2), and $f_{\boldsymbol{\lambda}}$, $f_{\boldsymbol{\sigma}}$, and $f_{\boldsymbol{\xi}}$ are each the product of the relevant distribution taken from equation (3), for example $f_{\boldsymbol{\lambda}}(\boldsymbol{\lambda}; \theta_\lambda) = \prod_{t=1}^{T} f_\lambda(\lambda_t; \theta_\lambda)$ where $f_\lambda$ is the density associated with $F_\lambda$. Note that the dimension of the full parameter vector is $3T + \dim(\theta_\lambda) + \dim(\theta_\sigma) + \dim(\theta_\xi)$.
Since the random effects cannot be integrated out of the joint likelihood function, maximum likelihood parameter estimates must be obtained through numerical optimization of the very high-dimensional log-likelihood function. A more pragmatic approach is to use Bayesian inference; to do so, prior distributions $\pi_\lambda$, $\pi_\sigma$, and $\pi_\xi$ must now be specified for $\theta_\lambda$, $\theta_\sigma$, and $\theta_\xi$, respectively. We recommend that these are chosen to be as uninformative as possible, e.g., flat Gaussian distributions, in order to avoid undue impact on the analysis. Combining the prior distributions with the likelihood function (4), the joint posterior is
$$\pi(\theta_\lambda, \theta_\sigma, \theta_\xi, \boldsymbol{\lambda}, \boldsymbol{\sigma}, \boldsymbol{\xi} \mid \mathbf{y}) \propto L(\theta_\lambda, \theta_\sigma, \theta_\xi, \boldsymbol{\lambda}, \boldsymbol{\sigma}, \boldsymbol{\xi}; \mathbf{y})\, \pi_\lambda(\theta_\lambda)\, \pi_\sigma(\theta_\sigma)\, \pi_\xi(\theta_\xi).$$
Since closed forms cannot be found for the joint and marginal posterior distributions, the posterior distribution is estimated using a sampling algorithm. We use an adaptive Markov chain Monte Carlo (MCMC) algorithm to enable automatic tuning of the proposal step-sizes and achieve fast convergence. Our algorithm is based on the adaptive Metropolis-within-Gibbs scheme proposed by Roberts and Rosenthal (2009). Each parameter is updated separately via a Metropolis–Hastings random walk (MHRW) step, allowing separate evolution of the MHRW scaling parameter associated with each of the model parameters. Earlier work by Eastoe (2019) showed that, of a number of possible alternatives, this algorithm worked best for this particular model.
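To make the sampling scheme concrete, the sketch below shows a generic adaptive Metropolis-within-Gibbs loop in the spirit of Roberts and Rosenthal (2009): each parameter has its own random-walk step-size, adapted in batches towards a target acceptance rate. It is a schematic under a placeholder log-posterior `log_post`, not the authors' implementation, and the target rate and batch size are assumptions.

```python
# A minimal sketch of an adaptive Metropolis-within-Gibbs sampler: each
# parameter is updated by a random walk with its own scale, nudged towards
# a target acceptance rate in batches. `log_post` is a placeholder for the
# unnormalized log-posterior of the random effects model.
import numpy as np

def adaptive_mwg(log_post, theta0, n_iter=100_000, batch=50, target=0.44, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    log_scale = np.zeros(d)           # per-parameter log step-sizes
    accepts = np.zeros(d)
    samples = np.empty((n_iter, d))
    lp = log_post(theta)
    for it in range(n_iter):
        for j in range(d):            # update one parameter at a time
            prop = theta.copy()
            prop[j] += rng.normal(scale=np.exp(log_scale[j]))
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
                accepts[j] += 1
        samples[it] = theta
        if (it + 1) % batch == 0:     # adapt step-sizes towards the target rate
            delta = min(0.01, (it + 1) ** -0.5)
            log_scale += np.where(accepts / batch > target, delta, -delta)
            accepts[:] = 0
    return samples
```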
3 Case study 1: US air temperatures
This case study is motivated by the 2021 heatwave which affected three regions in the western United States and Canada. An analysis of climate model annual maxima by Philip et al. (2021) showed that the observed 2021 annual maxima were inconsistent with historical extremes. Further, this finding recurred across a number of stations and held despite adjustments made to the data to account for long-term trends in global temperature. Our objectives are to examine whether (a) an analysis of observational annual maxima is consistent with the ERA5 analysis of Philip et al. (2021), (b) an analysis of PoT data points to the same conclusions as an analysis of annual maxima, and (c) a more flexible model for long-term trends better captures the 2021 event.
The three regions most affected by the heatwave were Oregon, Washington, and British Columbia. For these regions, we obtain maximum daily temperatures from Version 3 of the Global Historical Climatology Network (Menne et al., 2012). Stations are included only if they came online in or before 1950, were still online at the time of the 2021 heatwave, and have at most 20% of observations missing. This leaves the majority of stations in Oregon (62) and Washington (61). While there are fewer in British Columbia (17), the majority are situated in the region most affected by the heatwave, i.e., the south of the province close to the Washington border.
The heatwave epicentre is clearly shown in Figure 1. Areas including Portland and The Dalles recorded some of the highest temperatures, with the highest observation across all stations recorded at The Dalles Municipal Airport (lat 45.62, long 121.16) on 28 June 2021. This date appears to correspond to the peak of the heatwave, having the highest station-wide average temperature. While temperatures at coastal stations are lower than at the epicentre, extremity is defined relative to local climate norms and so measurements from these stations are included in the analysis when the temperature exceeds the station-specific threshold. All temperatures are adjusted by subtracting the average of global mean surface temperature (GMST) over the previous 4 years (Lenssen et al., 2019). GMST represents temperature deviations from the 1951 to 1980 means, and is used to remove the effects of global climate change.

Figure 1. Maximum temperature observations for weather stations in Oregon (south), Washington (centre), and British Columbia (north) on 27 June 2021, 28 June 2021, and 29 June 2021. Stations with missing value(s) on these days are omitted.
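A small sketch of the GMST adjustment described above is given below, under the assumption that the anomaly averaged is that of the four years preceding each observation year; the variable and column names (`obs`, `gmst`, `year`, `tmax`) are placeholders rather than names from the data products used here.

```python
# A sketch of the GMST adjustment, assuming a pandas Series `gmst` of
# annual GMST anomalies (relative to the 1951-1980 means, indexed by year)
# and a DataFrame `obs` with columns 'year' and 'tmax'. Whether the current
# year is included in the 4-year window is an assumption.
import pandas as pd

def adjust_for_gmst(obs: pd.DataFrame, gmst: pd.Series) -> pd.Series:
    prev4 = gmst.sort_index().rolling(window=4).mean().shift(1)  # mean of previous 4 years
    return obs["tmax"] - obs["year"].map(prev4)
```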
3.1 Annual maxima analysis
We start by considering annual maxima. The GEV is fitted to the adjusted maxima using maximum-likelihood estimation. Following Philip et al. (2021), we fit to pre-2021 data only and use the resulting model to assess the likelihood of the 2021 temperatures being predicted prior to the heatwave.
From the GEV model fits, it is immediately clear why there is concern: a substantial proportion of the stations have a 2021 annual maximum which exceeds the finite upper end-point implied by the GEV model fitted to pre-2021 data, i.e., these 2021 observations are apparently infeasible. Even more concerning is that this occurs despite the adjustments for GMST to account for global climate variation; without this adjustment, many stations also exhibit the same behaviour. Note that these are relatively coarse statistics—the upper limit can only be estimated at locations for which the estimated shape parameter is negative, and the uncertainty in the estimated limit has not been taken into consideration—however they suggest that climate variation, at least as measured by GMST, does not fully explain the variation in annual maxima, and indeed may play only a small role. Further, annual maxima only capture the temperature on a single day and are therefore not capable of informing a broader understanding of shifts in the frequency and magnitude of extreme heatwaves. Such events are likely to be influenced by many more factors than simply the overall climate trends of the region. Capturing all such factors is made difficult by the complexity of meteorological and climate processes and incomplete recording of relevant physical variables.
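The end-point check can be made explicit: for a fitted GEV with $\xi < 0$ the implied upper end-point is $\mu - \sigma/\xi$, and an observed 2021 maximum above this value is infeasible under the fitted model. The sketch below uses simulated pre-2021 maxima and a hypothetical 2021 value, purely for illustration.

```python
# A sketch of the upper end-point check: for a GEV fitted to pre-2021
# maxima with shape xi < 0, the implied finite upper end-point is
# mu - sigma / xi. The data and the 2021 value here are hypothetical.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(3)
pre2021_maxima = genextreme.rvs(c=0.15, loc=35.0, scale=2.0, size=70, random_state=rng)
max_2021 = 46.0  # hypothetical adjusted 2021 annual maximum

c_hat, mu_hat, sigma_hat = genextreme.fit(pre2021_maxima)
xi_hat = -c_hat  # convert scipy's shape to the EVT sign convention
if xi_hat < 0:
    upper_end = mu_hat - sigma_hat / xi_hat
    print(f"implied upper end-point: {upper_end:.1f}, "
          f"2021 max exceeds it: {max_2021 > upper_end}")
else:
    print("no finite upper end-point implied (xi >= 0)")
```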
3.2 Peaks over threshold analysis
Progressing to the PoT model, GMST-adjusted data are used with a station-specific high quantile to give year- and station-specific thresholds. This results in between 419 and 639 pre-2021 threshold exceedances at each station—considerably more data than the 71 annual maxima. To assess the extremity of the 2021 event under the PoT model fitted to pre-2021 data, Figure 2 shows predicted return periods associated with the 2021 annual maxima. The estimated period is finite at all stations, with a maximum of 559 years (Estevan Point, British Columbia). The return period is greater than 5 years at many stations, and greater than 10 years at a smaller number. Thus the PoT analysis suggests that the 2021 maximum temperatures are extreme, but not as extreme as the GEV fit implies. However, while the return periods determine the rarity of a single observation, they do not quantify whether the heatwave itself was exceptionally extreme.

Figure 2. (a) Return periods of the 2021 maxima at each weather station using the fitted generalized Pareto distributions (GPDs), and (b) return periods of the 2021 maxima at each weather station using the fitted random effects PoT model. Sixteen stations are not included as their 2021 maxima are below their station-specific threshold quantile.
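For reference, a sketch of the return-period calculation under the vanilla PoT fit: with $n$ observations per year, exceedance probability $\lambda_u$, and a generalized Pareto distribution for the excesses, the expected waiting time in years for a value above $x$ is roughly $1/\{n \, \lambda_u \Pr(\text{excess} > x - u)\}$. All numbers in the snippet are hypothetical.

```python
# A sketch of the PoT return-period calculation; all parameter values are
# hypothetical and n_py is the assumed number of observations per year.
import numpy as np
from scipy.stats import genpareto

def return_period_years(x, u, lambda_u, sigma, xi, n_py=365):
    # probability that an arbitrary observation exceeds x under the fitted model
    p_exceed = lambda_u * genpareto.sf(x - u, xi, scale=sigma)
    return np.inf if p_exceed == 0 else 1.0 / (n_py * p_exceed)

print(return_period_years(x=46.0, u=38.0, lambda_u=0.02, sigma=2.5, xi=-0.2))
```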
The PoT random effects model uses the same thresholds as the vanilla PoT model. As discussed earlier, the model cannot predict the extremal behaviour in a specific year and therefore, in contrast to the vanilla fit, we include the 2021 data when estimating the parameters. A simplified version of the model described in equations (2) and (3) was used with only the scale parameter allowed to vary between years. The justification for a constant shape parameter is that estimation of a varying shape parameter requires so much information that it is common practice to model it as either constant or piece-wise constant. A varying rate parameter was found to be unnecessary due to the year-specific threshold. Each site was modelled separately.
To ensure that the scale parameter estimates remain within the parameter space, we use a log link function:
$$\sigma_t = \exp(\phi_t),$$
where $\phi_t$ is the year-$t$ random effect on the log scale. We used independent Normal priors for the shape parameter and for the hyperparameters of the random-effect distribution. The adaptive MCMC algorithm was run for 100,000 iterations at each site with a burn-in of 10,000. The output was thinned by retaining only every 50th point in each chain. Trace plots and posterior distributions at a random selection of sites were examined for mixing and convergence. An example of these plots is shown for the shape and scale parameter estimates for the Cedar Lake station in Figure A.1 of the appendix.
Figure 3 shows the estimated scale parameter random effects by station and by year, highlighting years with relatively high extreme temperatures. Note that the inter-year variability in the random effects does not show evidence of a long-term trend, suggesting that the unidentified drivers which cause this variability are unlikely to themselves have long-term trends that are substantially different to that seen in GMST. The random effects in a given year are reasonably consistent across stations, and 17 of the top 20 estimates occur in 2021.

Figure 3. Estimated scale parameter annual random effects for each weather station (thin black lines), the annual mean across all stations (central bold red line), and GMST (bold blue line, in degrees). Results cover the period 1950 to 2021 inclusive.
The random effects give a much clearer picture of the relative magnitude of the 2021 heatwave. Accounting for inter-year variability much reduces the return periods of the 2021 maxima at those stations that exceed the threshold (Figure 2b), and in particular, the cross-station mean of the random effects is highest in 2021, compared with the next highest in 1961. The highest variance in the random effects also occurs in 2021, perhaps due to different parts of the region experiencing the heatwave to varying degrees, with extremely high relative temperatures at some stations and more typical temperatures at others. While previous years have isolated random effects at individual stations that reach the same levels as 2021, the consistency of the high random effects in 2021 distinguishes it from previous warm years, e.g., 1961.
4 Case Study 2: Greenland ice surface temperatures
One impact of the rise in global temperatures caused by climate change has been an increased volume of melt water generated by the world’s ice sheets. This has led to a sea level rise that has already increased the risk of flooding in low-lying areas and will continue to do so while temperatures continue to rise. The two largest ice sheets, those of Greenland and Antarctica, have had an accelerating contribution to sea level rise since the early 1990s (Rignot et al., 2011), and are likely to continue to play a key role in the near future (Rahmstorf et al., 2017). Since melt occurs when the ice surface temperature exceeds 0°C, understanding the spatial extent, frequency, magnitude and trends of such events is vital to quantify melt risk both now and in the future, not least because the consequences of increased melt are likely to persist for years to come (Kulp & Strauss, 2019).
Spatial extreme value models provide an ideal tool to analyse the widespread melt events that are of most concern. This case study developed from what was initially intended as a routine first step in fitting such a model to a spatio-temporal Ice Surface Temperature (IST) data set from a multilayer Greenland Moderate Resolution Imaging Spectroradiometer (MODIS)-based product (Hall et al., 2018), the MODIS/Terra Sea Ice Extent 5-Min L2 Swath 1 km, version 6 (MOD29). The data are gridded at the product's nominal 1 km spatial resolution and cover the period 01 January 2001 to 31 December 2019 at a daily temporal resolution. The IST data product uses a cloud mask to filter observations that occur in cloudy conditions, since water vapour in the clouds can interfere with infrared radiation, leading to measurement inaccuracies. Since cloudy days are generally warmer than clear days (Koenig & Hall, 2010), the data are missing not at random. Consequently, we do not attempt imputation of missing values and instead limit the inferences from our model to clear days only. The data are unseasonalized throughout the analysis.
Since melt occurs when the IST exceeds 0°C, positive temperatures are of most concern. For the majority of cells, particularly inland and/or at higher altitudes, there are too few positive observations to take 0°C as the PoT modelling threshold. However, when taking a modelling threshold below 0°C, there are problems with the PoT fit, as seen in Figure 4. Many cells have a secondary mode corresponding to a large mass of observations close to 0°C and a much lower-weighted tail covering small positive temperature values. The most likely explanation of this is that once the temperature exceeds 0°C the ice melts. This change in state from ice to melt water, which is no longer considered the surface of the ice sheet, creates a soft upper limit close to 0°C on ISTs. We cannot determine from the data which of the positive observations are water temperatures and which are due to measurement or recording uncertainty (Hall et al., 2018). Regardless of the reason, the soft upper limit results in a distribution mode close to 0°C, rendering the generalized Pareto distribution unsuitable. Lastly, we note that the shape of the tail distribution, and especially the behaviour at or near 0°C, varies primarily in response to altitude and proximity to the coast.

Figure 4. Example mixture model fit for a single cell (82.47°N, 37.50°E). Histogram of the data with 3 ice component densities (dotted blue lines), melt component density (dotted red line), and the overall density (bold black line). The density function has been scaled to the maximum value of the histogram data.
The objective now is to build a model to reflect these insights and establish how the melt threshold impacts the distribution of ISTs. To facilitate ease of application at all locations, we propose a truncated Gaussian mixture model for the full distribution of ISTs. This circumvents the need to define a tail region—as discussed previously, for this particular data set such a definition is nontrivial due to the varying cross-site behaviour of the upper tail. Utilizing information on behaviour in sub-tail regions should help better identify behaviour close to the melt threshold. Furthermore, a mixture model allows joint modelling of the ice and melt observations; this is critical since we have no information on which category each observation belongs to.
The truncated Gaussian mixture model has separate model components for ice and melt temperatures. Observations are not preassigned to a component; rather, the fit provides an estimate of the probability with which each observation belongs to each component. For $K$ ice components and a single melt component, let $w_k$ be the weight associated with ice component $k \in \{1, \ldots, K\}$ and $w_m$ be the weight associated with the melt component, such that $\sum_{k=1}^{K} w_k + w_m = 1$. Let $f_k$ and $f_m$ be the probability density functions of the ice and melt components, respectively, each a Gaussian density with mean $\mu$ and standard deviation $\sigma$ truncated to lie between lower and upper truncation points $a$ and $b$. Then the mixture model probability density function of an IST $x$ is:
$$f(x) = \sum_{k=1}^{K} w_k f_k(x) + w_m f_m(x). \tag{5}$$
Ice temperature components are bounded above by 0°C, since ice temperatures should not exceed this value, and have no lower bound since the only physical lower limit is absolute zero, which is far below the range of the data. For the melt component, there is no upper bound as ISTs are not physically limited above on the ice sheet. The lower bound is set to the previously estimated melting point of saline ice, which acts as a conservative minimum estimate for the ice sheet. It follows that data between this lower bound and 0°C can be modelled by either the ice or melt components, capturing the uncertainty in the true nature of the ice sheet when these observations were measured. The probability of melt for a given IST $x$ is defined through the ratio of the melt component density to the overall mixture density, such that:
$$p_{\mathrm{melt}}(x) = \frac{w_m f_m(x)}{\sum_{k=1}^{K} w_k f_k(x) + w_m f_m(x)}. \tag{6}$$
As a result of the model truncation points, $p_{\mathrm{melt}}(x) = 0$ below the melt-component lower bound and $p_{\mathrm{melt}}(x) = 1$ above 0°C, with the melt probability varying smoothly for ISTs between these values. Small discontinuities in the melt probability may arise at the two truncation points due to the truncation of the mixture components there; the magnitude of the discontinuity depends on the severity of the truncation. To find the most appropriate number of ice components for the model and allow for the diversity of distributions across the ice sheet, we fit the model with $K$ ranging from 3 to 6 and compare the fits using the Bayesian information criterion (BIC). In contrast, a single melt component is used since, in theory, the melt temperatures should show much less cross-cell variability than the ice temperatures.
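A sketch of how the mixture density (5) and melt probability (6) can be evaluated with truncated Gaussians is given below; the component parameters and the saline-ice lower bound `MELT_LB` are illustrative placeholders, not values from this analysis.

```python
# A sketch of the truncated Gaussian mixture density and melt probability
# using scipy.stats.truncnorm. MELT_LB is a placeholder for the saline-ice
# melting point used as the melt-component lower bound; all parameter
# values below are illustrative only.
import numpy as np
from scipy.stats import truncnorm

MELT_LB = -1.8  # assumed lower truncation for the melt component (deg C)

def trunc_pdf(x, mu, sd, lower, upper):
    # truncnorm expects bounds standardized by the component's mu and sd
    a, b = (lower - mu) / sd, (upper - mu) / sd
    return truncnorm.pdf(x, a, b, loc=mu, scale=sd)

def mixture_parts(x, ice_params, melt_params):
    # ice_params: list of (weight, mu, sd); melt_params: (weight, mu, sd)
    ice = sum(w * trunc_pdf(x, mu, sd, -np.inf, 0.0) for w, mu, sd in ice_params)
    w_m, mu_m, sd_m = melt_params
    melt = w_m * trunc_pdf(x, mu_m, sd_m, MELT_LB, np.inf)
    return ice, melt

def melt_probability(x, ice_params, melt_params):
    ice, melt = mixture_parts(x, ice_params, melt_params)
    total = ice + melt
    return np.where(total > 0, melt / total, 0.0)

# illustrative parameter values for a single cell
ice_params = [(0.5, -25.0, 8.0), (0.35, -12.0, 5.0), (0.13, -3.0, 2.0)]
melt_params = (0.02, 0.3, 0.8)
x = np.linspace(-5.0, 2.0, 8)
print(melt_probability(x, ice_params, melt_params))
```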
We use the expectation-maximization (EM) algorithm to estimate the parameters $w$, $\mu$, and $\sigma$ for each model component. We found 800 iterations of the algorithm optimal in terms of both convergence and computational time. To improve the stability of the algorithm, particularly for cells with few or no observations above 0°C, we set additional restrictions on two of the melt-component parameters: a minimum value of 0.0005 for the weight $w_m$ and a minimum value for the standard deviation $\sigma_m$. This allows a melt component—however small—to be fitted even if there are no observations in the melt temperature range, and avoids melt components with $\sigma_m = 0$, which would result in a point-mass density for cells with only a single observation above 0°C. We apply the model to 1,139 cells across the ice sheet; this represents a subsample of 1 in every 50 available cells in both horizontal and vertical directions. The cells are equally spaced in distance and are taken from those cells identified as ice by the MODIS ice mask. The number and spread of the sub-sampled cells accurately reflects the variety in glaciological and climatological conditions across the ice sheet. A range of average temperatures, surface conditions, latitudes, longitudes, elevations, and distances to the coast are all observed within the sample.
The impact of the soft upper limit on ISTs can be seen from the number of ice components selected by the BIC to model the data at different cells. Figure 5 shows the number of ice components in the best-fitting model to be higher at cells close to the coastline than at those closer to the centre of the ice sheet. There are also trends within coastal areas, with cells on the south-west coast generally having the highest number of model components and areas on the north-west coast having fewer. Cells closer to the coast generally have higher average temperatures, experience more melt and therefore have more frequently truncated observations. Furthermore, melt estimates from the model agree well with empirical estimates obtained using previously established methods of identifying melt with a single threshold (Clarkson et al., 2022). This coastal temperature trend is consistent with local climate trends of higher temperatures occurring closer to the coast and demonstrates the impact of the upper limit arising from the truncated observations. It also demonstrates the difficulty, mentioned above, of finding a single distribution to describe the data at all locations of the ice sheet.

Figure 5. (a) Estimated weight of the melt component of each fitted mixture model; due to restrictions during the fitting process, the minimum possible value of this estimate is 0.0005. (b) The number of ice components in the best-fitting mixture model at each cell in our study area.
While there is value in examining how each parameter varies across the ice sheet, to understand melt it is most useful to consider variability and trends in the weight of the melt component. This gives an overview of melt observed across the ice sheet, and reflects broad climate trends and the coastal effect. Figure 5 shows coastal areas having more of their distribution associated with melt temperatures, with cells in the south having a slightly higher weight than those in the north. The melt probability (6) is estimated for each observation and used to calculate the expected number of melt observations at each cell. The cross-cell average is then used to estimate the number of melt observations across the ice sheet in each year. These values are likely to be under-estimates, since the expected melt would be higher at almost all cells if it were possible to also include cloudy days in the analysis. Figure 6 shows that the majority of cells have a nontrivial expectation of melt in an average year. Although melt is more common on the coasts, it is clear from the expected melt estimates that it is not exclusively coastal, with some inland cells also experiencing melt with relatively high frequency; the main factors that appear to determine the melt extent are distance to the coast and elevation.

Figure 6. Expected melt calculated from the fitted mixture models. The probability of each observation representing a melt temperature is calculated and then averaged to give the expected melt in a single year.
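The expected-melt aggregation itself is a short calculation: the per-observation melt probabilities from equation (6) are summed for a cell and divided by the number of observed years. A minimal, self-contained sketch with hypothetical inputs follows.

```python
# A sketch of the expected-melt aggregation for a single cell: summing the
# per-observation melt probabilities gives the expected number of melt
# observations, divided here by the number of observed years. Inputs are
# hypothetical per-observation melt probabilities and matching years.
import numpy as np

def expected_melt_per_year(melt_probs, years):
    melt_probs = np.asarray(melt_probs, dtype=float)
    return melt_probs.sum() / np.unique(np.asarray(years)).size

# toy usage: 3 observations in each of 2 years
print(expected_melt_per_year([0.0, 0.2, 0.9, 0.0, 0.05, 1.0],
                             [2018, 2018, 2018, 2019, 2019, 2019]))
```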
5 Discussion
Extreme value analysis has a long-established link with hazard modelling, risk management and climate change analysis. While historically focus has been on the tail behaviour of univariate responses, more recently considerable attention has been given to advancing extreme value theory and models for multivariate responses, due to (a) an increasing awareness of the potential for impact from multiple hazards and (b) an explosion in the availability of high-dimensional spatio-temporal data products, e.g., satellite images and numerical climate model output. Analysis of the extreme events of such data sets raises many challenges, from defining and identifying jointly extreme events to developing both statistical models and inference tools and the probability theory which underpins them. In this paper, we step back to assess ongoing challenges in univariate extreme value modelling. Such challenges are not restricted to model development, and it is our opinion that for them to be met successfully requires sustained dialogue between statisticians and fellow scientists. Evaluating impacts of climate change on natural hazards requires an understanding of the physical behaviour of both the hazard and any driving processes. Only once this is appropriately incorporated into the statistical model can attribution and prediction of the effects of climate change begin.
The two analyses presented in this paper demonstrate the necessity for novel, bespoke, context-driven models to address questions driven by climate change. While IST and air temperature data sets share some similarities—both measure the same physical variable on or near the Earth’s surface—the most suitable modelling approach for each is quite different due to differences in the physical behaviour of the processes and their drivers, the data collection methods and the underlying research questions. For ISTs, the physical properties of both the ice sheet and the melt process drive the model. Seemingly straightforward tasks—such as defining melt—are nontrivial and, despite representing the tail of the sample, the distribution of temperatures around the melt point does not comply with the assumptions necessary for an out-of-the-box extreme value analysis. While the proposed Gaussian mixture model is not consistent with more routine methods for modelling tail data, it does incorporate an understanding of the melt process that cannot be included in a PoT model. In contrast, the core challenge in the US air temperature study arises from the 2021 heatwave which, despite accounting for global temperature trends, cannot be explained through an extreme value analysis of historical data. To improve our understanding of the extremity of the 2021 event, a PoT random effects model is used to give a better indication of the relative size of the 2021 heatwave in the context of previous extreme temperatures, climate trends, and inter-year variability due to unobserved climate drivers.
All statistical models are a compromise between model parsimony and functionality. Further, the scope for generalization from any statistical model is entirely dependent on the quality and quantity of the data to which the model is fitted. With ever greater computing power at our disposal, it can be tempting to develop high-dimensional models that aim to mimic their process-based counterparts, regardless of the size and quality of the data set. It is our belief that parsimony is in fact to be preferred, and that a key advantage of statistical models is their comparative cheapness to fit and ease of interpretation. At the same time, some compromise may be desirable; where incorporation of additional model features significantly improves interpretation, predictive abilities or our understanding of the physical process, then such features should be included where the information in the data supports this. Ultimately the utility of the model should be assessed by its ability to describe the data set, with both the most suitable modelling strategy and the necessary level of complexity judged against this benchmark.
The models proposed here are no exception, having both advantages and limitations. The random effects model provides a more flexible description of the data than permitted under a parametric regression model. However, unlike a parametric regression model, it does not permit estimation of the extreme behaviour in a specific year. It further assumes that, conditional on the annual random effects, the observations are independent. In practice, there is likely to be some short-range serial dependence in the largest values. The model could be extended to incorporate serial dependence in either, or both, of the random effects and observations; the consequences of this would be an increased number of model parameters and a much more complex inference process. The mixture model could be criticized for diverging from the traditional approach of analysing only the tail data. Preliminary investigations using this approach provided such a poor fit to the data that it was immediately clear that an alternative approach—taking into account the unusual shape of the tail distribution—was required. The resulting model provides a much better description of the upper tail and, indeed, is a good fit to the full range of values. Neither the random effects nor the mixture model accounts for the inherent spatial structure in the physical processes and while this is beyond the scope of the current research it is the subject of ongoing research by the authors.
We end by discussing some broader open questions for the application of extreme value analysis in climate change scenarios. The first of these concerns guidance for scientists on the most appropriate way to incorporate change in either univariate or multivariate models. We conjecture that it is unlikely that there will ever be a definitive way to do this due to the diverse physical properties exhibited by different environmental hazards. Instead, statisticians should strive for an increased awareness of the appropriateness, advantages and limitations of the available methods through direct comparisons of these different strategies. Such comparisons should also support an increased understanding of the benefits of trialling multiple modelling approaches.
Secondly, extreme value models are asymptotically justified in the sense of extrapolation into the tail of an underlying distribution. Most existing approaches to incorporating change assume the model parameters vary in time and/or in response to driving physical processes. Such regression-based models, whether semi- or fully parametric, have no similar justification for the extrapolation of trends into the future. Further, the predictions of future events will be very sensitive to the form assumed for the change. Again, this is strong motivation to search for more reliable, robust methods.
Thirdly, most routine multivariate, spatio- and temporal extreme value models require a separate model for the univariate marginal distributions. It is therefore critical that these marginal models reflect as faithfully as possible both the empirical properties of the data and physical properties of the underlying processes. As in the case of the ice surface temperatures discussed in Section 4, this might mean reconsidering the use of an extreme value model for the marginal distributions.
Finally, much work remains to identify risk measures which still have an interpretation under climate change; return levels (periods) make less sense if there is a long-term trend—what is deemed extreme now may no longer be extreme in 100 years. Return levels and periods which condition on a time period or covariates (Eastoe & Tawn, 2009) are more realistic but harder to interpret. Related to this is the problem of engaging with fellow scientists and the general public on the topic of changes in the risk of extreme events. It is vital to emphasize that acknowledgement of such change should not imply a loss of credibility in the predictions, but rather a more accurate reflection of a changing world.
Acknowledgments
We thank Paul Young for an introduction to the US temperature data, and David Leslie and Rachael Duncan for helpful comments on the draft. D.C., E.E. and A.L. were supported by the EPSRC-funded Data Science of the Natural Environment project (EP/R01860X/1).
Appendix. MCMC trace plots

Figure A.1. (a) Trace plot of the shape parameter for weather station ‘Cedar Tree’ with the mean of the estimates (red horizontal line). (b) Histogram of the sampled shape parameters from the same MCMC chain as the trace plot, with the mean (bold red line) and upper and lower quantiles (normal red lines).
Author notes
*Read before The Royal Statistical Society at the first meeting on ‘Statistical aspects of climate change’ held at the Society's 2022 annual conference in Aberdeen on Wednesday, 14 September 2022, the President, Professor Sylvia Richardson, in the Chair.
Conflict of interest None declared.