-
PDF
- Split View
-
Views
-
Cite
Cite
Jiangshan Lai, Yi Zou, Shuang Zhang, Xiaoguang Zhang, Lingfeng Mao, glmm.hp: an R package for computing individual effect of predictors in generalized linear mixed models, Journal of Plant Ecology, Volume 15, Issue 6, December 2022, Pages 1302–1307, https://doi.org/10.1093/jpe/rtac096
- Share Icon Share
Abstract
Generalized linear mixed models (GLMMs) have been widely used in contemporary ecology studies. However, determination of the relative importance of collinear predictors (i.e. fixed effects) to response variables is one of the challenges in GLMMs. Here, we developed a novel R package, glmm.hp, to decompose marginal R2 explained by fixed effects in GLMMs. The algorithm of glmm.hp is based on the recently proposed approach ‘average shared variance’ i.e. used for multivariate analysis. We explained the principle and demonstrated the use of this package by simulated dataset. The output of glmm.hp shows individual marginal R2s that can be used to evaluate the relative importance of predictors, which sums up to the overall marginal R2. Overall, we believe the glmm.hp package will be helpful in the interpretation of GLMM outcomes.
摘要
广义线性混合效应模型(GLMMs)是当代生态学研究中广泛应用的数据分析模型。然而,确定GLMMs中共线性的预测变量(固定效应)对响应变量的相对重要性是个挑战。基于适用于多元分析的‘平均共享方差’的算法,我们开发一个新的R包glmm.hp来分解GLMMs中由固定效应解释的边际(marginal)R2。我们论述了该软件包的工作原理并通过模拟数据集演示了该软件包的使用。glmm.hp 包的输出结果为每个预测变量将获得一个独自的(individual)边际R2,且它们的总和刚好等于模型总的边际R2。总之,我们相信glmm.hp包将有助于解释GLMMs 的输出结果。
INTRODUCTION
Ecological data are commonly non-normal or hierarchical with nested structure, and therefore are often defy the statistical assumptions of classical linear models (Bolker et al. 2009). Generalized linear mixed models (GLMMs) provide a more flexible approach for analyzing these non-normal data when random effects are present to deal with hierarchical features, which have been widely used in analyzing ecology data (Harrison 2014; Harrison et al. 2018; Schielzeth et al. 2020). Although GLMMs are complex and relatively technical, modern software has made GLMMs accessible and relatively straightforward to non-experts, and have fit in widely applicable tools (Harrison et al. 2018; Lai et al. 2019), such as the lme4 (Bates et al. 2015) and nlme (Pinheiro et al. 2020) packages in the R language.
In a linear model context, the coefficient of determination, known as R2, is a useful and intuitive tool for assessing the goodness-of-fit of models. For models estimated by the least squares method, R2 can be expressed as 1 minus the proportion of residual variance to total variance (i.e. the variance of the dependent variable y):
Although the residual variance can easily be estimated for the normal linear model without random effects, this is a difficult task for GLMMs. Nakagawa and Schielzeth (2013) pointed out two difficulties in generalizing R2 statistic in GLMMs: (i) it is not straightforward to calculate the residual variance when the model contains non-normal error structures; and (ii) it is not clear which level should be used to calculate the unexplained variance due to the hierarchical data structure (see also in Harrison et al. 2018; Nakagawa et al. 2017). Nakagawa and Schielzeth (2013) then proposed two complementary R2 values: the marginal R2 encompassing variance explained by the fixed effects only, and the conditional R2 comprising variance explained by both fixed and random effects, i.e. the variance explained by the whole model. This simple method has been widely accepted and applied in GLMMs.
When there are multiple predictors in regression, researchers usually are interested in understanding the contribution of each predictor (Healy 1990), which can be reflected from their contribution proportions to the model’s goodness-of-fit, e.g. the R2 (Johnson and LeBreton 2004). If multiple fixed predictors are existed in the GLMMs, then the marginal R2 quantifies the variance explained by all predictors together. Recently, Stoffel et al. (2021) developed an R package: partR2, i.e. able to compute part R2 in GLMMs. The part R2, which is also known as the semi-partial coefficient of determination, enables us to understand the proportion of variance uniquely explained by individual predictors (e.g. fractions [a], [b] and [c] in Fig. 1).

Venn diagram representing the proportions of variance components in a mixed model. The large circle represents the variance in a response Y. The upper semicircle (YRE) represents the variance explained by random effects. Three lower small circles with covariances intersections represent variance explained by fixed effects. The dashed line symbolizes that the intersections are divided into equal components. This graph is redrawn after Stoffel et al. (2021).
Understanding the variance uniquely explained by single fixed predictor makes sense when predictors are independent. However, when predictors are correlated, understanding their importance based solely on their part R2 can be much less informative, as it ignores the importance of the shared variance, which is also true in the GLMMs (e.g. fractions [d], [e], [f] and [g] in Fig. 1). In ecology studies, predictors in are usually correlated to each other and hence it is not clear to which predictor this shared variance should be attributed (Graham 2003; Murray and Conner 2009).
Recently, Lai et al. (2022) proposed the ‘average shared variance’ method for predictor in multiple regression and canonical analyses. This method suggested that shared variance can be decomposed into equal components according to the number of involved predictors, so that the relative importance of each predictor can be simply estimated by its part R2 plus the sum of all allocated average shared R2 (Fig. 1). This method is identical to those methods independently described in the literature such as ‘averaging over orderings’ (Kruskal 1987; Lindeman et al. 1980), ‘hierarchical partitioning’ (Chevan and Sutherland 1991) and ‘dominance analysis’ (Budescu 1993) for multiple regression.
Here, we extend the ‘average shared variance’ algorithm to decompose the marginal R2 of GLMMs. We develop a new package glmm.hp to decompose the marginal R2 of GLMMs to evaluate the relative importance of each fixed predictor. We demonstrate the use of glmm.hp using a simulated dataset in partR2 package.
MATHEMATICAL REPRESENTATION
According to the concept of ‘average shared variance’ (Lai et al. 2022), the shared R2 can be divided into equal components by the number of involved predictors, and then allocated to these predictors. For illustration, we assume three correlated fixed predictors in a mixed model (X1, X2 and X3; Fig. 1). Fractions of [a], [b] and [c] correspond to the part R2 by fixed effect predictors X1, X2 and X3, respectively. Fractions [d], [e] and [f] are the shared R2 by two predictors, [g] is the shared R2 by all three predictors. The individual marginal R2 of each predictor is the sum of all its allocated shared R2 plus its part R2, which can be written as follows:
where
The advantage of the method is that the total marginal R2 can be decomposed into order-independent contributions from the individual predictors. The fractions from [a] to [g] can be calculated by the algorithm of commonality analysis (Nimon et al. 2008; Ray-Mukherjee et al. 2014).
The order of relative importance of fixed predictors in GLMMs is determined by their individual marginal R2. However, when the number of predictors (P) increases, the number of fractions including part and shared R2 increases (2P − 1 fractions) exponentially, which will result in an exponentially increased calculation volume with the increase in the number of predictors (Lai et al. 2022).
PACKAGE DESCRIPTION
The glmm.hp package is written in the language R (R Core Team 2021) and can be installed from the Comprehensive R Archive Network (http://cran.r-project.org/web/packages/glmm.hp/) or Github (https://github.com/laijiangshan/glmm.hp). The package contains one key homonymous function: glmm.hp() which computes the individual marginal R2 of each fixed predictor in GLMMs. There are four internal functions to support the glmm.hp function: creatbin(), genList(), GMI() and odd(). The manual of the internal functions can be found in its document. A S3method function plot.glmmhp() can be used to generate a bar chart of the each predictor based on the output of glmm.hp() via ggplot2 package (Wickham 2016).
The glmm.hp() function depends on r.squaredGLMM() in the MuMIn package (Bartoń 2022), which is used to calculate the marginal R2 of GLMMs. The glmm.hp() function only has one argument: mod, i.e. a fitted (generalized) linear mixed model, which comes from package the lme4 (Bates et al. 2015) and nlme (Pinheiro et al. 2020). The workflow of the glmm.hp() function is: (i) extracting the original dataset and formula from the mod; (ii) extracting names of predictors (i.e. fixed effect variables) from the formula and (iii) calculating the individual marginal R2 for each fixed predictor by unique (i.e. part R2) and the shared marginal R2 from the commonality analysis. The function glmm.hp() can work without constraining the data distribution once r.squaredGLMM() can correctly work with mod and return the marginal R2 of the model. For non-Gaussian models, r.squaredGLMM() can return different types of marginal R2 depending on the methods used to derive the variance at the observation level, e.g. ‘theoretical’ and ‘delta’ in the binomial distribution (Nakagawa et al. 2017). The function glmm.hp() reports the partitioning of all types of marginal R2 (see Supplementary Appendix). Handling data containing missing values can be problematic in glmm.hp and rows with missing values will be removed automatically. The function glmm.hp() returns a ‘glmmhp’ class object. Models with interaction terms are not allowed in glmm.hp(). However, if both of the interaction variables are continuous variables, one can use their product (i.e. their interaction term) as a new predictor and insert it into the original model.
A WORKING EXAMPLE
Here, we illustrate functionalities of glmm.hp() for the Gaussian distribution using a simulated dataset on grassland biodiversity and biomass that are available in the partR2 package (Stoffel et al. 2021). The dataset contains 200 cases and 11 variables, and we use the first six variables (Biomass, Year, Temperature, Precipitation, SpeciesDiversity and Population). The data simulate a scenario that 20 invertebrate species are sampled once a year for 10 consecutive years, with species diversity and biomass are calculated. Temperature and precipitation corresponding to each sample are recorded. A linear mixed effect model is used to evaluate the response of invertebrate biomass to invertebrate diversity, sampling year, sampling temperature and precipitation. As the relationship between response variables and predictors may differ for different invertebrate species, species is set as the random effect. Considering the potential correlation between explanatory variables (e.g. temperature and precipitation), hereafter we demonstrate how to use glmm.hp() to compute the individual marginal R2 taking the shared fractions into account.
We fit the linear mixed effects model using lmer() function from the lme4 package (Bates et al. 2015), while results are identical when using lme() function from nlme package (Pinheiro et al. 2020; see Supplementary Appendix). The biomass dataset are available in partR2 package and needs to be loaded using the data() function.

The function glmm.hp() returns a list containing two elements.
$r.squaredGLMM: This element shows the total marginal R2 and conditional R2 of model. In this example, they are 0.6005 and 0.6443.
$hierarchical.partitioning: This element is a matrix containing the unique, average shared and individual effect of each predictor, and individual contribution percentage in the ‘I.perc’ column. In this case, the individual effect of Year, Temperature, Precipitation and SpeciesDiversity are 0.0155, 0.1836, 0.2231 and 0.1783, respectively, which sum up to the overall marginal R2 (0.6005). Users can easily know how to calculate the individual effect, i.e. the sum of its unique effect and average shared effect.
When evaluating predictors importance based solely on their unique effect (i.e. part R2) as suggested by Stoffel et al. (2021), SpeciesDiversity is the most important predictor with the highest part R2 (0.1718), which is much higher than other three variables. The part R2s of Temperature (0.0381) and Precipitation (0.0872) are relatively low due to their collinearity (Pearson r = 0.69), and therefore are not informative. However, when evaluating predictors’ importance based on their individual effect, the most important predictor is Precipitation, followed by Temperature, Species Diversity and Year. As the glmm.hp() function considers both the unique and shared variance that sum to the overall marginal R2, it provides an ideal indicator to evaluate the relative importance of each fixed effect predictor.
The plot.glmmhp() function produces a bar chart (Fig. 2), while different plots based on the numerical output of glmm.hp() can be generated according to the users’ preferences. In this demo, we provide an example with Gaussian error distribution in GLMMs, other examples in with non-Gaussian distribution (e.g. binomial distribution) can be found in the Supplementary Appendix.

The relative importance of individual predictors in predicting biomass in the dataset by glmm.hp.
CONCLUSIONS
Here, we present an R package: glmm.hp that facilitates interpretation of the importance of each fixed predictor in GLMMs. The package allows us to quantify the importance of each predictor variable, particularly when these variables are correlated (i.e. with collinearity). Notably, the glmm.hp implements the R2 calculation based on Nakagawa and Schielzeth (2013) and Nakagawa et al. (2017), which provides an accessible description of the problems with, and solutions to, generalizing R2 metrics to GLMMs, but there are other estimation methods for R2 for GLMMs (e.g. Jaeger et al. 2017; Piepho 2019). Starting with commonality analysis that quantifies the components of unique and shared marginal R2, individual contribution of each predictor is calculated as the sum of its unique and its allocated average shared variance. The advantage of this method is to decompose overall marginal R2 into order-independent contributions of each predictor that automatically sum to the overall marginal R2. Overall, we believe the glmm.hp package will be helpful in interpreting the GLMM outcomes, which will make valuable contribution to ecology and other related fields.
Funding
This work was supported by the National Natural Science Foundation of China (32271551) and the Metasequoia funding of Nanjing Forestry University.
Conflict of interest statement. The authors declare that they have no conflict of interest.