glmm.hp: an R package for computing individual effect of predictors in generalized linear mixed models

Abstract

Generalized linear mixed models (GLMMs) have been widely used in contemporary ecology studies. However, determination of the relative importance of collinear predictors (i.e. fixed effects) to response variables is one of the challenges in GLMMs. Here, we developed a novel R package, glmm.hp, to decompose marginal R² explained by fixed effects in GLMMs. The algorithm of glmm.hp is based on the recently proposed approach ‘average shared variance’ i.e. used for multivariate analysis. We explained the principle and demonstrated the use of this package by simulated dataset. The output of glmm.hp shows individual marginal R²s that can be used to evaluate the relative importance of predictors, which sums up to the overall marginal R². Overall, we believe the glmm.hp package will be helpful in the interpretation of GLMM outcomes.

摘要

glmm.hp:一个计算广义线性混合模型中单个预测变量效应的R包

广义线性混合效应模型(GLMMs)是当代生态学研究中广泛应用的数据分析模型。然而，确定GLMMs中共线性的预测变量(固定效应)对响应变量的相对重要性是个挑战。基于适用于多元分析的‘平均共享方差’的算法，我们开发一个新的R包glmm.hp来分解GLMMs中由固定效应解释的边际(marginal)R²。我们论述了该软件包的工作原理并通过模拟数据集演示了该软件包的使用。glmm.hp 包的输出结果为每个预测变量将获得一个独自的(individual)边际R²，且它们的总和刚好等于模型总的边际R²。总之，我们相信glmm.hp包将有助于解释GLMMs 的输出结果。

coefficient of determination, commonality analysis, fixed effect, GLMM, hierarchical partitioning, marginal R², relative importance, variance partitioning

决定系数，, 共性分析，, 固定效应，, 广义线性混合效应模型，, 层次分割，, 边际R^2，, 相对重要性，, 方差分解

Issue Section:

Method

Handling Editor:

INTRODUCTION

Ecological data are commonly non-normal or hierarchical with nested structure, and therefore are often defy the statistical assumptions of classical linear models (Bolker et al. 2009). Generalized linear mixed models (GLMMs) provide a more flexible approach for analyzing these non-normal data when random effects are present to deal with hierarchical features, which have been widely used in analyzing ecology data (Harrison 2014; Harrison et al. 2018; Schielzeth et al. 2020). Although GLMMs are complex and relatively technical, modern software has made GLMMs accessible and relatively straightforward to non-experts, and have fit in widely applicable tools (Harrison et al. 2018; Lai et al. 2019), such as the lme4 (Bates et al. 2015) and nlme (Pinheiro et al. 2020) packages in the R language.

In a linear model context, the coefficient of determination, known as R², is a useful and intuitive tool for assessing the goodness-of-fit of models. For models estimated by the least squares method, R² can be expressed as 1 minus the proportion of residual variance to total variance (i.e. the variance of the dependent variable y):

\begin{array}{l} R^{2} = 1 - \frac{V a r (r e s i d u a l)}{V a r (y)} \end{array}

(1)

Although the residual variance can easily be estimated for the normal linear model without random effects, this is a difficult task for GLMMs. Nakagawa and Schielzeth (2013) pointed out two difficulties in generalizing R² statistic in GLMMs: (i) it is not straightforward to calculate the residual variance when the model contains non-normal error structures; and (ii) it is not clear which level should be used to calculate the unexplained variance due to the hierarchical data structure (see also in Harrison et al. 2018; Nakagawa et al. 2017). Nakagawa and Schielzeth (2013) then proposed two complementary R² values: the marginal R² encompassing variance explained by the fixed effects only, and the conditional R² comprising variance explained by both fixed and random effects, i.e. the variance explained by the whole model. This simple method has been widely accepted and applied in GLMMs.

When there are multiple predictors in regression, researchers usually are interested in understanding the contribution of each predictor (Healy 1990), which can be reflected from their contribution proportions to the model’s goodness-of-fit, e.g. the R² (Johnson and LeBreton 2004). If multiple fixed predictors are existed in the GLMMs, then the marginal R² quantifies the variance explained by all predictors together. Recently, Stoffel et al. (2021) developed an R package: partR2, i.e. able to compute part R² in GLMMs. The part R², which is also known as the semi-partial coefficient of determination, enables us to understand the proportion of variance uniquely explained by individual predictors (e.g. fractions [a], [b] and [c] in Fig. 1).

Figure 1:

Venn diagram representing the proportions of variance components in a mixed model. The large circle represents the variance in a response Y. The upper semicircle (Y_RE) represents the variance explained by random effects. Three lower small circles with covariances intersections represent variance explained by fixed effects. The dashed line symbolizes that the intersections are divided into equal components. This graph is redrawn after Stoffel et al. (2021).

Open in new tab Download slide

Understanding the variance uniquely explained by single fixed predictor makes sense when predictors are independent. However, when predictors are correlated, understanding their importance based solely on their part R² can be much less informative, as it ignores the importance of the shared variance, which is also true in the GLMMs (e.g. fractions [d], [e], [f] and [g] in Fig. 1). In ecology studies, predictors in are usually correlated to each other and hence it is not clear to which predictor this shared variance should be attributed (Graham 2003; Murray and Conner 2009).

Recently, Lai et al. (2022) proposed the ‘average shared variance’ method for predictor in multiple regression and canonical analyses. This method suggested that shared variance can be decomposed into equal components according to the number of involved predictors, so that the relative importance of each predictor can be simply estimated by its part R² plus the sum of all allocated average shared R² (Fig. 1). This method is identical to those methods independently described in the literature such as ‘averaging over orderings’ (Kruskal 1987; Lindeman et al. 1980), ‘hierarchical partitioning’ (Chevan and Sutherland 1991) and ‘dominance analysis’ (Budescu 1993) for multiple regression.

Here, we extend the ‘average shared variance’ algorithm to decompose the marginal R² of GLMMs. We develop a new package glmm.hp to decompose the marginal R² of GLMMs to evaluate the relative importance of each fixed predictor. We demonstrate the use of glmm.hp using a simulated dataset in partR2 package.

MATHEMATICAL REPRESENTATION

According to the concept of ‘average shared variance’ (Lai et al. 2022), the shared R² can be divided into equal components by the number of involved predictors, and then allocated to these predictors. For illustration, we assume three correlated fixed predictors in a mixed model (X₁, X₂ and X₃; Fig. 1). Fractions of [a], [b] and [c] correspond to the part R² by fixed effect predictors X₁, X₂ and X₃, respectively. Fractions [d], [e] and [f] are the shared R² by two predictors, [g] is the shared R² by all three predictors. The individual marginal R² of each predictor is the sum of all its allocated shared R² plus its part R², which can be written as follows:

\begin{array}{l} R_{X_{1}}^{2} = a + \frac{d}{2} + \frac{f}{2} + \frac{g}{3} \end{array}

(2)

\begin{array}{l} R_{X_{2}}^{2} = b + \frac{d}{2} + \frac{e}{2} + \frac{g}{3} \end{array}

(3)

\begin{array}{l} R_{X_{3}}^{2} = c + \frac{f}{2} + \frac{e}{2} + \frac{g}{3} \end{array}

(4)

where $R_{X_{1}}^{2}$ , $R_{X_{2}}^{2}$ and $R_{X_{3}}^{2}$ are individual marginal R² of predictor X₁, X₂ and X₃, with its sum up as the overall marginal $R^{2} (R_{m . f u l l}^{2})$ ⁠.

\begin{array}{l} R_{X_{1}}^{2} + R_{X_{2}}^{2} + R_{X_{3}}^{2} = R_{m . f u l l}^{2} \end{array}

(5)

The advantage of the method is that the total marginal R² can be decomposed into order-independent contributions from the individual predictors. The fractions from [a] to [g] can be calculated by the algorithm of commonality analysis (Nimon et al. 2008; Ray-Mukherjee et al. 2014).

The order of relative importance of fixed predictors in GLMMs is determined by their individual marginal R². However, when the number of predictors (P) increases, the number of fractions including part and shared R² increases (2^P − 1 fractions) exponentially, which will result in an exponentially increased calculation volume with the increase in the number of predictors (Lai et al. 2022).

PACKAGE DESCRIPTION

The glmm.hp package is written in the language R (R Core Team 2021) and can be installed from the Comprehensive R Archive Network (http://cran.r-project.org/web/packages/glmm.hp/) or Github (https://github.com/laijiangshan/glmm.hp). The package contains one key homonymous function: glmm.hp() which computes the individual marginal R² of each fixed predictor in GLMMs. There are four internal functions to support the glmm.hp function: creatbin(), genList(), GMI() and odd(). The manual of the internal functions can be found in its document. A S3method function plot.glmmhp() can be used to generate a bar chart of the each predictor based on the output of glmm.hp() via ggplot2 package (Wickham 2016).

The glmm.hp() function depends on r.squaredGLMM() in the MuMIn package (Bartoń 2022), which is used to calculate the marginal R² of GLMMs. The glmm.hp() function only has one argument: mod, i.e. a fitted (generalized) linear mixed model, which comes from package the lme4 (Bates et al. 2015) and nlme (Pinheiro et al. 2020). The workflow of the glmm.hp() function is: (i) extracting the original dataset and formula from the mod; (ii) extracting names of predictors (i.e. fixed effect variables) from the formula and (iii) calculating the individual marginal R² for each fixed predictor by unique (i.e. part R²) and the shared marginal R² from the commonality analysis. The function glmm.hp() can work without constraining the data distribution once r.squaredGLMM() can correctly work with mod and return the marginal R² of the model. For non-Gaussian models, r.squaredGLMM() can return different types of marginal R² depending on the methods used to derive the variance at the observation level, e.g. ‘theoretical’ and ‘delta’ in the binomial distribution (Nakagawa et al. 2017). The function glmm.hp() reports the partitioning of all types of marginal R² (see Supplementary Appendix). Handling data containing missing values can be problematic in glmm.hp and rows with missing values will be removed automatically. The function glmm.hp() returns a ‘glmmhp’ class object. Models with interaction terms are not allowed in glmm.hp(). However, if both of the interaction variables are continuous variables, one can use their product (i.e. their interaction term) as a new predictor and insert it into the original model.

A WORKING EXAMPLE

Here, we illustrate functionalities of glmm.hp() for the Gaussian distribution using a simulated dataset on grassland biodiversity and biomass that are available in the partR2 package (Stoffel et al. 2021). The dataset contains 200 cases and 11 variables, and we use the first six variables (Biomass, Year, Temperature, Precipitation, SpeciesDiversity and Population). The data simulate a scenario that 20 invertebrate species are sampled once a year for 10 consecutive years, with species diversity and biomass are calculated. Temperature and precipitation corresponding to each sample are recorded. A linear mixed effect model is used to evaluate the response of invertebrate biomass to invertebrate diversity, sampling year, sampling temperature and precipitation. As the relationship between response variables and predictors may differ for different invertebrate species, species is set as the random effect. Considering the potential correlation between explanatory variables (e.g. temperature and precipitation), hereafter we demonstrate how to use glmm.hp() to compute the individual marginal R² taking the shared fractions into account.

We fit the linear mixed effects model using lmer() function from the lme4 package (Bates et al. 2015), while results are identical when using lme() function from nlme package (Pinheiro et al. 2020; see Supplementary Appendix). The biomass dataset are available in partR2 package and needs to be loaded using the data() function.

The function glmm.hp() returns a list containing two elements.

$r.squaredGLMM: This element shows the total marginal R² and conditional R² of model. In this example, they are 0.6005 and 0.6443.
$hierarchical.partitioning: This element is a matrix containing the unique, average shared and individual effect of each predictor, and individual contribution percentage in the ‘I.perc’ column. In this case, the individual effect of Year, Temperature, Precipitation and SpeciesDiversity are 0.0155, 0.1836, 0.2231 and 0.1783, respectively, which sum up to the overall marginal R² (0.6005). Users can easily know how to calculate the individual effect, i.e. the sum of its unique effect and average shared effect.

When evaluating predictors importance based solely on their unique effect (i.e. part R²) as suggested by Stoffel et al. (2021), SpeciesDiversity is the most important predictor with the highest part R² (0.1718), which is much higher than other three variables. The part R²s of Temperature (0.0381) and Precipitation (0.0872) are relatively low due to their collinearity (Pearson r = 0.69), and therefore are not informative. However, when evaluating predictors’ importance based on their individual effect, the most important predictor is Precipitation, followed by Temperature, Species Diversity and Year. As the glmm.hp() function considers both the unique and shared variance that sum to the overall marginal R², it provides an ideal indicator to evaluate the relative importance of each fixed effect predictor.

The plot.glmmhp() function produces a bar chart (Fig. 2), while different plots based on the numerical output of glmm.hp() can be generated according to the users’ preferences. In this demo, we provide an example with Gaussian error distribution in GLMMs, other examples in with non-Gaussian distribution (e.g. binomial distribution) can be found in the Supplementary Appendix.

Figure 2:

The relative importance of individual predictors in predicting biomass in the dataset by glmm.hp.

Open in new tab Download slide

CONCLUSIONS

Here, we present an R package: glmm.hp that facilitates interpretation of the importance of each fixed predictor in GLMMs. The package allows us to quantify the importance of each predictor variable, particularly when these variables are correlated (i.e. with collinearity). Notably, the glmm.hp implements the R² calculation based on Nakagawa and Schielzeth (2013) and Nakagawa et al. (2017), which provides an accessible description of the problems with, and solutions to, generalizing R² metrics to GLMMs, but there are other estimation methods for R² for GLMMs (e.g. Jaeger et al. 2017; Piepho 2019). Starting with commonality analysis that quantifies the components of unique and shared marginal R², individual contribution of each predictor is calculated as the sum of its unique and its allocated average shared variance. The advantage of this method is to decompose overall marginal R² into order-independent contributions of each predictor that automatically sum to the overall marginal R². Overall, we believe the glmm.hp package will be helpful in interpreting the GLMM outcomes, which will make valuable contribution to ecology and other related fields.

Funding

This work was supported by the National Natural Science Foundation of China (32271551) and the Metasequoia funding of Nanjing Forestry University.

Conflict of interest statement. The authors declare that they have no conflict of interest.

REFERENCES

Bartoń

(

2022

)

MuMIn: Multi-Model Inference. R Package Version 1.46.0

. https://CRAN.R-project.org/package=MuMIn

Bates

Machler

Bolker

, et al. . (

2015

)

Fitting linear mixed-effects models using lme4

J Stat Softw

–

Month:	Total Views:
November 2022	99
December 2022	94
January 2023	372
February 2023	309
March 2023	664
April 2023	1,485
May 2023	1,165
June 2023	389
July 2023	450
August 2023	449
September 2023	364
October 2023	442
November 2023	604
December 2023	492
January 2024	570
February 2024	422
March 2024	525
April 2024	451
May 2024	477
June 2024	434
July 2024	408
August 2024	425
September 2024	433
October 2024	547
November 2024	569
December 2024	526
January 2025	345
February 2025	351
March 2025	554
April 2025	452
May 2025	88

Article Contents

glmm.hp: an R package for computing individual effect of predictors in generalized linear mixed models

Abstract

摘要

INTRODUCTION

MATHEMATICAL REPRESENTATION

PACKAGE DESCRIPTION

A WORKING EXAMPLE

CONCLUSIONS

Funding

REFERENCES

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only