Abstract

glmm.hp is an R package designed to evaluate the relative importance of collinear predictors within generalized linear mixed models (GLMMs). Since its initial release in January 2022, it has rapidly gained recognition and popularity among ecologists. However, the previous version of glmm.hp was limited to GLMMs fitted with the lme4 and nlme packages. The latest version extends its functions. It now accepts models fitted with the glmmTMB package, enabling it to handle zero-inflated generalized linear mixed models (ZIGLMMs). Furthermore, it introduces commonality analysis and hierarchical partitioning for multiple linear regression models, based on either unadjusted R2 or adjusted R2. This paper demonstrates these new functionalities to make them more accessible to users.

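Summary

Extension of the glmm.hp package to zero-inflated generalized linear mixed models and multiple regression

glmm.hp is an R package developed specifically to evaluate the relative importance of collinear predictors in generalized linear mixed models (GLMMs). Since its release in January 2022, it has rapidly gained recognition and popularity in the ecological community. However, the previous glmm.hp package was limited to GLMMs from the lme4 and nlme packages. The latest glmm.hp package adds new functions. First, it integrates results obtained from the glmmTMB package, enabling it to handle zero-inflated generalized linear mixed models effectively. In addition, the latest glmm.hp package adds commonality analysis and hierarchical partitioning for ordinary multiple regression, based on unadjusted R2 and adjusted R2. This paper demonstrates these new functions to make them more convenient for researchers to use.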

INTRODUCTION

Generalized linear mixed models (GLMMs) are widely used in modern ecological research because of their flexibility in handling non-normal distributions and hierarchically structured data (Bolker et al. 2009). However, a challenge in using GLMMs is assessing the relative importance of correlated predictors (the fixed effects) with respect to the response variable (Stoffel et al. 2017, 2021). To address this challenge, Lai et al. (2022a) introduced a specialized R package named ‘glmm.hp’, which enables researchers to quantify the individual contributions of predictors in GLMMs by decomposing the commonly used Nakagawa marginal R2 (Nakagawa and Schielzeth 2013; Nakagawa et al. 2017).

The glmm.hp package extends to GLMMs the ‘average shared variance’ methodology developed by Lai et al. (2022b) for canonical analyses. The core idea of this method is to allocate the shared variance caused by collinear explanatory variables equally among them. The individual R2 of each variable is then composed of its unique R2 plus its allocated share of the shared R2. In this way, the individual R2 values of all variables sum to the total R2. Notably, this method yields results similar to those of other established techniques documented in the literature, such as ‘averaging over orderings’ (Kruskal and Majors 1989; Lindeman et al. 1980), ‘hierarchical partitioning’ (Chevan and Sutherland 1991) and ‘dominance analysis’ (Budescu 1993), which are frequently used in multiple linear regression analysis (Bi 2012). However, compared with the complex derivation procedures of these techniques, the ‘average shared variance’ method is more intuitive and easier to comprehend (Lai et al. 2022b).
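To make the allocation rule concrete, the following base-R sketch applies it to two correlated predictors in an ordinary linear model. The use of the built-in mtcars data, of unadjusted R2 and of the object names is an illustrative assumption; this is not the internal code of glmm.hp.

```r
# Illustrative sketch of 'average shared variance' for two predictors,
# using unadjusted R2 from ordinary linear models (base R only).
r2 <- function(formula) summary(lm(formula, data = mtcars))$r.squared

r2_wt   <- r2(mpg ~ wt)
r2_cyl  <- r2(mpg ~ cyl)
r2_full <- r2(mpg ~ wt + cyl)

# Unique fractions: what each predictor adds on top of the other
unique_wt  <- r2_full - r2_cyl
unique_cyl <- r2_full - r2_wt

# Shared fraction: variation explained jointly by both predictors
# (algebraically equal to r2_wt + r2_cyl - r2_full)
shared <- r2_full - unique_wt - unique_cyl

# Individual R2 = unique fraction + an equal share of the shared fraction
individual <- c(wt  = unique_wt  + shared / 2,
                cyl = unique_cyl + shared / 2)

individual
sum(individual)  # equals r2_full by construction
```

With two predictors there is a single shared fraction, so each predictor receives half of it; with more predictors, each shared fraction would be divided equally among the predictors involved in it.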

The glmm.hp package was first released on the official R package repository (https://cran.r-project.org/web/packages/glmm.hp/index.html) in January 2022. At the time of writing, the package has accumulated more than 12 000 downloads, as reported on the R package monitoring website (www.datasciencemeta.com/rpackages). A search on Google Scholar shows that the package has been used in more than 30 research papers. These figures highlight the increasing recognition and adoption of the glmm.hp package within the community of ecologists.

An article introducing the principles and operational procedures of the glmm.hp package was published in the sixth issue of this journal in 2022 (Lai et al. 2022a). In the version available at that time (0.0-3), the glmm.hp package was designed only for GLMMs fitted with the lme4 package (Bates et al. 2015) and the nlme package (Pinheiro et al. 2020). Since the publication of that paper, we have continuously enhanced the package’s capabilities. Specifically, we added support for GLMMs fitted with the glmmTMB package (Brooks et al. 2017). Additionally, we added support for ordinary multiple linear regression models fitted with the lm() function in base R. In this article, we illustrate these new functionalities of the glmm.hp package (version 0.1-0) through case studies.

WORKING EXAMPLE

glmm.hp() working example for glmmTMB()

Zero-inflated generalized linear mixed models (ZIGLMMs) are an extension of GLMMs that address the issue of excessive zero values in count data (Zeileis et al. 2008). In many ecological cases, count data exhibit more zeros than expected under a standard Poisson or negative binomial distribution (Harrison 2014). ZIGLMMs account for this excess of zeros by considering two processes: one process for the excess zeros (zero-inflation) and another process for the remaining counts (Brooks et al. 2017).
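The two-process idea can be illustrated with a small base-R simulation. The parameter values below (a Poisson mean of 2 and a 30% probability of a structural zero) are arbitrary assumptions chosen only to show how zero-inflation produces more zeros than a plain Poisson distribution with the same mean would predict.

```r
# Illustrative simulation of a zero-inflated Poisson process (base R only).
set.seed(1)
n      <- 5000
lambda <- 2     # mean of the count process (assumed value)
p_zero <- 0.3   # probability of a structural zero (assumed value)

structural_zero <- rbinom(n, size = 1, prob = p_zero)  # process 1: excess zeros
counts          <- rpois(n, lambda)                    # process 2: ordinary counts
y <- ifelse(structural_zero == 1, 0, counts)

obs_zero     <- mean(y == 0)    # observed proportion of zeros
poisson_zero <- exp(-mean(y))   # proportion expected under a Poisson with the same mean

c(observed = obs_zero, poisson_expected = poisson_zero)
```

The observed proportion of zeros exceeds the Poisson expectation, which is the pattern that motivates fitting a separate zero-inflation component.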

When fitting ZIGLMMs with the glmmTMB package, researchers can use its flexible framework to model both the count portion and the zero-inflation portion of the data. The glmmTMB package allows different distributions (e.g. Poisson, negative binomial) and link functions to be specified for each part of the model. More details on its features are available in the help documentation provided with the glmmTMB package (https://cran.r-project.org/web/packages/glmmTMB/index.html).

We demonstrate the capabilities of the glmm.hp() function when applied to the output generated by the glmmTMB() function. To do so, we utilize a dataset containing information on the abundance of salamanders, which is readily available within the glmmTMB package. This dataset comprises count data representing the abundance of salamanders, recorded on four separate occasions across 23 different stream sites. Some of these sites have been affected by coal mining activities, and the observations encompass various salamander species and life stages (Price et al. 2016).

Here, we fit a ZIGLMM to evaluate the response of salamander abundance (count data) to coal mining (‘mined’ variable) and species (‘spp’ variable), with sampling site set as the random effect and a Poisson distribution for the counts. The aim is to compare the relative importance of coal mining (mined) and species (spp) for the abundance of salamanders.

The glmm.hp() function relies on the r.squaredGLMM() function from the MuMIn package (Bartoń 2022) to compute the marginal R2 of GLMMs. For Poisson-distributed GLMMs, the r.squaredGLMM() function provides three types of R2: ‘delta’, ‘lognormal’ and ‘trigamma’. The differences among them mainly result from variations in the denominator used in the calculation of R2 (Nakagawa and Schielzeth 2013; Nakagawa et al. 2017). For more details, see the help documentation of the r.squaredGLMM() function in the MuMIn package (Bartoń 2022). Users typically favor the highest R2. We therefore plot the decomposition for the highest R2 here, which is the ‘lognormal’ type in the second row of the output, by setting the argument ‘n = 2’ in the plot() generic function (Fig. 1). Under all three types of R2, coal mining (mined) has a greater impact on the abundance of salamanders than species (spp). Note that our objective here is only to illustrate how the glmm.hp() function works with the output of the glmmTMB package.
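The workflow described above can be sketched as follows. The random intercept for site, the Poisson family and the ‘n = 2’ plotting argument follow the text; the zero-inflation formula (~ mined) and the object names are assumptions made for illustration, since the exact model call is not given here.

```r
library(glmmTMB)  # provides glmmTMB() and the Salamanders data
library(glmm.hp)  # provides glmm.hp() and its plot() method
library(MuMIn)    # provides r.squaredGLMM()

data(Salamanders, package = "glmmTMB")

# Zero-inflated Poisson GLMM with site as a random intercept.
# The zero-inflation formula (~ mined) is an assumption for illustration.
fit_zip <- glmmTMB(count ~ mined + spp + (1 | site),
                   ziformula = ~ mined,
                   family = poisson,
                   data = Salamanders)

# Marginal and conditional R2; for Poisson models the rows correspond to
# the 'delta', 'lognormal' and 'trigamma' methods
r.squaredGLMM(fit_zip)

# Decompose the marginal R2 among the fixed effects
hp_zip <- glmm.hp(fit_zip)
hp_zip

# Plot the decomposition for the second R2 type ('lognormal')
plot(hp_zip, n = 2)
```

MuMIn may issue a warning for zero-inflated models, because the R2 calculation is based on the conditional (count) part of the model.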

Figure 1: The relative importance of individual predictors on the abundance of salamanders (count data) in the dataset by glmm.hp() for output of glmmTMB().

glmm.hp() working example for lm()

For ordinary multiple linear regression, several commonly used R packages perform R2 decomposition (including commonality analysis and hierarchical partitioning). For instance, the yhat package is dedicated to commonality analysis (Nimon et al. 2013), while the hier.part package (Walsh and Mac Nally 2013), the relaimpo package (Grömping 2006) and the dominanceanalysis package (Navarrete and Soares 2020) are used for hierarchical partitioning. However, all of these packages decompose only the unadjusted R2. In ecological research, it is standard practice to use the adjusted R2, since the unadjusted R2 is biased (Peres-Neto et al. 2006). To address this limitation, the current glmm.hp package has been expanded to perform both commonality analysis and hierarchical partitioning (selected through the ‘commonality’ argument) for ordinary multiple linear regression. It also offers both unadjusted R2 and adjusted R2 through the ‘type’ argument of the glmm.hp() function. These options allow us to explore the sources of the negative values that occasionally arise during the decomposition of adjusted R2. When adjusted R2 is decomposed, individual components may be negative, which can be attributed either to suppressor variables or to the adjustment itself (Nimon and Oswald 2013; Peres-Neto et al. 2006; Ray-Mukherjee et al. 2014). If negative values appear in the decomposition of adjusted R2 but disappear in the decomposition of unadjusted R2, it can be deduced that they result from the adjustment, as exemplified in the current case.

To illustrate the application of the glmm.hp() function to ordinary multiple linear regression (i.e. lm() in R), we use the built-in dataset ‘mtcars’ in R (R Core Team 2022). The data were sourced from the 1974 ‘Motor Trend US’ magazine and comprise fuel consumption along with 10 aspects of automobile design and performance for 32 automobiles. In this case, we investigate the relative importance of car weight (wt), number of carburetors (carb) and number of cylinders (cyl) for gasoline efficiency (miles per gallon, mpg).
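A minimal sketch of the corresponding calls is given below. The ‘type’ and ‘commonality’ arguments are those named in the text, with adjusted R2 as the default type; the object names are illustrative assumptions.

```r
library(glmm.hp)

# Ordinary multiple linear regression on the built-in mtcars data
fit_lm <- lm(mpg ~ wt + carb + cyl, data = mtcars)

# Commonality analysis based on adjusted R2 (the default type)
ca_adj <- glmm.hp(fit_lm, commonality = TRUE)

# Commonality analysis based on unadjusted R2
ca_raw <- glmm.hp(fit_lm, type = "R2", commonality = TRUE)

# Hierarchical partitioning based on adjusted R2
hp_adj <- glmm.hp(fit_lm)

ca_adj
ca_raw
hp_adj
```

Comparing ca_adj with ca_raw shows whether any negative fraction is tied to the adjustment of R2.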

Results from commonality analysis (Figs 2 and 3) and hierarchical partitioning (Fig. 4) indicate that car weight (wt) has the greatest impact on gasoline efficiency, followed by number of cylinders (cyl) and lastly number of carburetors (carb). Note that the built-in ‘mtcars’ dataset was used for convenience of demonstration, and this model may lack practical significance.
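The fraction shared by ‘wt’ and ‘carb’ alone can also be cross-checked by hand in base R, using the standard commonality formula for three predictors. This sketch assumes that each sub-model's adjusted R2 is the one reported by summary.lm(); whether glmm.hp() adjusts sub-models in exactly the same way is an assumption, so the values should be read as a plausibility check rather than a reproduction of the package output.

```r
# Common fraction shared by wt and carb only, in the model mpg ~ wt + carb + cyl:
#   C(wt, carb) = R2(wt, cyl) + R2(carb, cyl) - R2(cyl) - R2(wt, carb, cyl)
fit_stat <- function(formula, stat) summary(lm(formula, data = mtcars))[[stat]]

common_wt_carb <- function(stat) {
  fit_stat(mpg ~ wt + cyl, stat) +
    fit_stat(mpg ~ carb + cyl, stat) -
    fit_stat(mpg ~ cyl, stat) -
    fit_stat(mpg ~ wt + carb + cyl, stat)
}

common <- c(unadjusted = common_wt_carb("r.squared"),
            adjusted   = common_wt_carb("adj.r.squared"))
common
```

Because the adjustment penalizes each sub-model according to its number of predictors, a small positive fraction under unadjusted R2 can become negative under adjusted R2.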

Figure 2: Commonality analysis of three variables on gasoline efficiency based on adjusted R2 (default) by glmm.hp(). The common variance between ‘wt’ and ‘carb’ is negative (−0.002).

Figure 3: Commonality analysis of three variables on gasoline efficiency based on unadjusted R2 by glmm.hp() (setting argument type = ‘R2’). The common variance between ‘wt’ and ‘carb’ changes from a negative value (−0.002) under adjusted R2 to a positive value (0.002) under unadjusted R2; hence it can be inferred that the negative value is caused by the use of adjusted R2.

Figure 4: The relative importance of individual variables on gasoline efficiency based on adjusted R2 through hierarchical partitioning via glmm.hp().

DISCUSSION

The glmmTMB package provides a versatile platform for fitting complex models such as ZIGLMMs, which are particularly useful for analysing count data with excessive zeros (Brooks et al. 2017). Its flexibility in handling various response distributions and random-effect structures makes it a valuable tool for researchers in fields such as ecology and biology (Douma and Weedon 2019). The glmm.hp package now decomposes the output of glmmTMB(), which greatly expands the functionality of glmm.hp() and provides valuable insights for interpreting glmmTMB() output.

The current glmm.hp package can perform both commonality analysis and hierarchical partitioning for ordinary multiple regression models. Furthermore, it can decompose both the unadjusted R2 and the adjusted R2. This enhancement addresses a limitation of commonly used packages such as ‘yhat’ for commonality analysis (Nimon et al. 2013) and ‘hier.part’ (Walsh and Mac Nally 2013), ‘relaimpo’ (Grömping 2006) and ‘dominanceanalysis’ (Navarrete and Soares 2020) for hierarchical partitioning, none of which supports the decomposition of adjusted R2. The advantages of decomposing adjusted R2 are twofold. First, adjusted R2 provides an unbiased estimate of R2 and is widely used in ecology (Peres-Neto et al. 2006). Second, by comparing the results for unadjusted and adjusted R2, researchers can determine whether the negative values observed in commonality analysis or hierarchical partitioning result from the use of adjusted R2. These new capabilities enhance the precision and reliability of regression model analysis, making the glmm.hp package a valuable tool for researchers in various domains.

According to our Google Scholar search at the time of writing, the glmm.hp package has gained substantial recognition within the academic community. It has been used in more than 30 peer-reviewed research papers to partition marginal R2 in GLMMs. These publications span a wide spectrum of scientific disciplines, illustrating the versatility and applicability of the glmm.hp package. A selection of these fields and associated references includes plant ecology (e.g. Gu et al. 2022; Guo et al. 2022; Wan et al. 2023; Yan et al. 2023; Yang et al. 2023; Zhang et al. 2022), animal ecology (e.g. Ao et al. 2022; Liu et al. 2023; Wang et al. 2023), environmental science (e.g. Agusto et al. 2022; Wu et al. 2023), agriculture (e.g. Sha et al. 2023), microbiology (e.g. Chen et al. 2023; Fu et al. 2022) and conservation biology (e.g. Le et al. 2023; Tobisch et al. 2023). The broad adoption of the glmm.hp package across these diverse domains underscores its status as a preferred and trusted tool among researchers in ecology and related fields.

In the future, we will continue to enhance the capabilities of the glmm.hp package and optimize its analytical speed to meet the evolving needs of researchers. We encourage all researchers who use the glmm.hp package in their studies to provide proper attribution by citing this article. Citation information can be obtained by typing the following command in R: citation("glmm.hp"). This practice helps acknowledge and support the ongoing development and maintenance of this analytical resource for the scientific community.

Funding

The work was supported by the National Natural Science Foundation of China (32271551) and the Metasequoia funding of Nanjing Forestry University.

Conflict of interest statement. The authors declare that they have no conflict of interest.

REFERENCES

Agusto LE, Qin G, Thibodeau B, et al. (2022) Fiddling with the blue carbon: Fiddler crab burrows enhance CO2 and CH4 efflux in saltmarsh. Ecol Indic 144:109538.

Ao S, Ye L, Liu X, et al. (2022) Elevational patterns of trait composition and functional diversity of stream macroinvertebrates in the Hengduan Mountains region, Southwest China. Ecol Indic 144:109558.

Bartoń K (2022) MuMIn: Multi-model Inference. R package version 1.46.0.

Bates D, Machler M, Bolker BM, et al. (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67:1–48.

Bi J (2012) A review of statistical methods for determination of relative importance of correlated predictors and identification of drivers of consumer liking. J Sens Stud 27:87–101.

Bolker BM, Brooks ME, Clark CJ, et al. (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol 24:127–135.

Brooks ME, Kristensen K, van Benthem KJ, et al. (2017) glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R J 9:378–400.

Budescu DV (1993) Dominance analysis: a new approach to the problem of relative importance of predictors in multiple-regression. Psychol Bull 114:542–551.

Chen J, Zhao Q, Li F, et al. (2023) Nutrient availability and acid erosion determine the early colonization of limestone by lithobiontic microorganisms. Front Microbiol 14:1194871.

Chevan A, Sutherland M (1991) Hierarchical partitioning. Am Stat 45:90–96.

Douma JC, Weedon JT (2019) Analysing continuous proportions in ecology and evolution: a practical introduction to beta and Dirichlet regression. Methods Ecol Evol 10:1412–1430.

Fu Q, Shao YZ, Wang SL, et al. (2022) Soil microbial distribution depends on different types of landscape vegetation in temperate urban forest ecosystems. Front Ecol Evol 10:858254.

Grömping U (2006) Relative importance for linear regression in R: the package relaimpo. J Stat Softw 17:1–27.

Gu J, Song X, Liao Y, et al. (2022) Tree species drive the diversity of epiphytic bryophytes in the alpine forest ecosystem: a case study in Tibet. Forests 13:2154.

Guo LZ, Liu L, Meng HZ, et al. (2022) Biogeographic patterns of leaf element stoichiometry of Stellera chamaejasme L. in degraded grasslands on Inner Mongolia Plateau and Qinghai-Tibetan Plateau. Plants 11:1943.

Harrison XA (2014) Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2:e616.

Kruskal W, Majors R (1989) Concepts of relative importance in recent scientific literature. Am Stat 43:2–6.

Lai J, Zou Y, Zhang S, et al. (2022a) glmm.hp: an R package for computing individual effect of predictors in generalized linear mixed models. J Plant Ecol 15:1302–1307.

Lai JS, Zou Y, Zhang JL, et al. (2022b) Generalizing hierarchical and variation partitioning in multiple regression and canonical analyses using the rdacca.hp R package. Methods Ecol Evol 13:782–788.

Le H, Zhao C, Xu W, et al. (2023) Anthropogenic activities explained the difference in exotic plants invasion between protected and non-protected areas at a northern subtropics biodiversity hotspot. J Environ Manage 345:118939.

Lindeman RH, Merenda PF, Gold RZ (1980) Introduction to Bivariate and Multivariate Analysis. Glenview, IL: Scott Foresman.

Liu Y, Li J, Liu Y, et al. (2023) Interactive effects of flooding duration and sediment texture on the growth and adaptation of three plant species in the Poyang Lake wetland. Biology 12:944.

Nakagawa S, Schielzeth H (2013) A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol Evol 4:133–142.

Nakagawa S, Johnson PCD, Schielzeth H (2017) The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J R Soc Interface 14:20170213.

Navarrete CB, Soares FC (2020) dominanceanalysis: Dominance Analysis. R package version 2.0.0.

Nimon K, Oswald FL (2013) Understanding the results of multiple linear regression: beyond standardized regression coefficients. Organ Res Methods 16:650–674.

Nimon K, Oswald FL, Roberts JK (2013) Yhat: Interpreting Regression Effects. R package version 2.0.0.

Peres-Neto PR, Legendre P, Dray S, et al. (2006) Variation partitioning of species data matrices: estimation and comparison of fractions. Ecology 87:2614–2625.

Pinheiro J, Bates D, DebRoy S, et al. (2020) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-149.

Price SJ, Muncy BL, Bonner SJ, et al. (2016) Effects of mountaintop removal mining and valley filling on the occupancy and abundance of stream salamanders. J Appl Ecol 53:459–468.

R Core Team (2022) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/ (20 September 2023, date last accessed).

Ray-Mukherjee J, Nimon K, Mukherjee S, et al. (2014) Using commonality analysis in multiple regressions: a tool to decompose regression effects in the face of multicollinearity. Methods Ecol Evol 5:320–328.

Sha Z, Wang J, Ma X, et al. (2023) Ammonia loss potential and mitigation options in a wheat-maize rotation system in the North China Plain: a data synthesis and field evaluation. Agric Ecosyst Environ 352:108512.

Stoffel MA, Nakagawa S, Schielzeth H (2017) rptR: repeatability estimation and variance decomposition by generalized linear mixed-effects models. Methods Ecol Evol 8:1639–1644.

Stoffel MA, Nakagawa S, Schielzeth H (2021) partR2: partitioning R2 in generalized linear mixed models. PeerJ 9:e11414.

Tobisch C, Rojas-Botero S, Uhler J, et al. (2023) Conservation-relevant plant species indicate arthropod richness across trophic levels: habitat quality is more important than habitat amount. Ecol Indic 148:110039.

Walsh CJ, Mac Nally R (2013) hier.part: Hierarchical Partitioning. R package version 1.0-4.

Wan J-Z, Wang Q, Wang C-J (2023) Biomass and nitrogen content of petiole and rachis predict leaflet trait variation in compound pinnate leaves of plants. Flora 298:152207.

Wang L, Feng J, Mou P, et al. (2023) Relative abundance of Roe deer (Capreolus pygargus) related to overstory structure and understory food resources in Northeast China. Global Ecol Conserv 46:e02542.

Wu Y, Du Y, Liu X, et al. (2023) Grassland biodiversity response to livestock grazing, productivity, and climate varies across biome components and diversity measurements. Sci Total Environ 878:162994.

Yan Z, Lv T, Liu Y, et al. (2023) Responses of soil phosphorus cycling and bioavailability to plant invasion in river-lake ecotones. Ecol Appl 33:e2843.

Yang X, Jiang Y, Xue F, et al. (2023) Effects of environmental factors on the nonstructural Carbohydrates in Larix principis-rupprechtii. Forests 14:345.

Zeileis A, Kleiber C, Jackman S (2008) Regression models for count data in R. J Stat Softw 27:1–25.

Zhang M, Gao H, Chen S, et al. (2022) Linkages between stomatal density and minor leaf vein density across different altitudes and growth forms. Front Plant Sci 13:1064344.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Handling Editor: Jinbao Liao