Erin Hengel, Almudena Sevilla, Sarah Smith, Measuring research quality in a more inclusive way: Evidence from the UK Research Excellence Framework, Research Evaluation, 2024, rvae013, https://doi.org/10.1093/reseval/rvae013
Abstract
Evidence suggests that common metrics of research quality—e.g. journal publications and citations—are systematically biased against certain groups. But does relying solely on them to evaluate quality lead to lower diversity in academia? In this paper, we start to answer this question by analysing data from the UK’s nationwide research assessment exercise, the Research Excellence Framework. We find that narrowly focussed output-based measures of departmental research quality do indeed negatively correlate with the diversity of departmental staff, while measures of research impact and of the quality of the research environment correlate positively. An aggregate measure that incorporates all three components is therefore likely to better promote staff diversity compared to more narrowly defined output-focused measures. More generally, our results suggest that comprehensive definitions of research quality may be more effective at promoting diversity in academia compared to narrower measures. We further argue that funding decisions informed by broader measures result in more efficient resource allocations across the higher education sector.
1. Introduction
Common metrics of research quality and productivity—such as citations and publication counts—play a crucial role in academic job-market decisions (e.g. tenure and promotion). They also increasingly inform which projects and individuals are awarded competitive, non-recurrent grant funding. Soon, metrics may even drive the allocation of recurrent research funding across institutions, too (see, e.g. MacIntosh 2021).
While most metrics are easy to compute and readily available, they only proxy for the true quality of a project and the true performance of researchers. Consequently, they are measured with error—and according to many studies, this error correlates with researcher characteristics. For example, Card et al. (2020), Hengel (2022) and Hengel and Moon (2023) show that female economists are held to higher acceptance standards at top economics journals compared to male economists. Men are also better connected to their academic networks (Ductor, Goyal and Prummer 2023), which probably improves their outcomes in peer review (for evidence, see, e.g. Colussi 2018). Meanwhile, Ferber (1986, 1988), Dion, Sumner and Mitchell (2018) and Koffi (2021) show that journal articles written by men are less likely to cite women than they are to cite other men, and Larivière et al. (2013) find that articles with a first or last female author are cited less than observably equivalent articles with male authors in the same positions.
If certain groups publish less, and are less well-cited compared to other groups, then relying solely on metrics based on publications or citations may advantage the latter at the expense of the former. In contrast, broader measures of research quality may impose less of a disadvantage on under-represented groups. To date, however, there is little evidence on the practical use of such measures or on whether they are better at increasing the diversity of academic staff compared to narrower measures. This is the contribution of our paper.
In particular, we provide new evidence on the relationship between the multiple, expansive measures of research quality in the UK’s Research Excellence Framework (REF) and the diversity of academic staff. Every 6–7 years, the quality of the research produced by academic departments at UK universities is evaluated in a nation-wide exercise known as the REF. The REF’s scope of assessment is defined broadly to include departments’ ‘outputs’ (academic publications), ‘impact’ (case studies documenting how research has changed policy and practice) and ‘environment’ (narrative accounts of how departments ‘support the production of excellent research’). Government research funding is then allocated to universities according to a weighted average of their departments’ performance in each of these three elements.
Combining departmental-level evaluation data from the 2014 REF with data on departments’ academic staff diversity from the UK’s Higher Education Statistics Agency, we ask whether the broad scope of the REF—and in particular the inclusion of impact and environment in its definition of research quality—is more likely to promote diversity among academic staff compared to an alternative, narrower definition that considers only outputs.1
We find that the output score negatively correlates with our measure of diversity on both counts: departments that scored higher for their outputs were not only less diverse at the time of REF submission but were also less likely to increase their diversity in subsequent years. By contrast, the impact score positively correlates with our measure of diversity at the time of submission, suggesting that more diverse departments produce better impact. Although the environment score negatively correlates with staff diversity at the time of REF submission, it positively correlates with departments’ subsequent progress on diversity, in line with its more forward-looking nature. This evidence highlights that measures of research quality correlate with diversity, but the direction of that correlation depends on how quality is defined. Our findings also suggest that comprehensive measures of quality could mitigate distortions caused by individual, narrowly defined metrics.
Our study is related to a long-standing literature in economics and management on the challenge of rewarding performance in the face of multiple and competing objectives (Kerr 1975; Holmstrom and Milgrom 1991). In particular, our evidence highlights the tension of ‘rewarding A, while hoping for B’ (Kerr 1975) in the context of measuring and incentivising research quality in higher education. Our evidence on the relationship between the scope of research quality measurement and diversity can contribute to the discussion in several countries—particularly those that have national research assessment processes—about the use of different measures for evaluating research quality (see, e.g. Bishop 2021). We also contribute to an ongoing debate about promoting diversity in higher education (see, e.g. Gamage and Sevilla 2019; Lundberg and Stearns 2019; Gamage, Sevilla and Smith 2020; Bateman et al. 2021) by providing new evidence on how the choice of performance metrics impacts the diversity of academic staff.
The paper is organized as follows. The next section provides more detail on the UK REF process. Section 3 explains our methodology, while Section 4 presents the results. In Section 5, we discuss the implications of our findings.
2. The REF 2014
Since 1986, research in UK higher education institutions (HEIs) has been subject to a thorough, national assessment process known originally as the Research Assessment Exercise (RAE) and, since 2014, as the Research Excellence Framework (REF).2 The results of the process—which takes place (roughly) every 6–7 years—are primarily used to allocate ∼£2 billion per year of central government research funding across universities, but they are also included in various league tables (e.g. the university rankings produced by both the Complete University Guide and the Guardian University Guide incorporate REF scores) and promoted by individual HEIs in order to attract staff and students. Thus, REF outcomes directly and indirectly determine how resources are allocated between institutions and have had a profound impact on universities’ research investment strategies and hiring and promotion decisions (De Fraja, Facchini and Gathergood 2019).
REF submissions are made at the level of Units of Assessment (UoAs), which correspond broadly to academic departments.3 In REF 2014, assessment of research quality was carried out by 36 subject sub-panels, consisting of academic and external assessors. The sub-panels were organized into four main panels covering medicine, health and life sciences (Panel A), physical sciences, engineering and maths (Panel B), social sciences (Panel C), and arts and humanities (Panel D).
In 2014, each UoA submitted the following three elements to the REF:
A curated collection of its staff members’ research outputs (e.g. books and academic articles).
A limited number of impact case studies documenting the wider social impact of staff members’ research (e.g. the change in policy and practice that their research achieved).
A narrative account of the UoA’s research environment, covering the following four dimensions: (i) the coherence of the UoA’s research agenda; (ii) resources, facilities and infrastructure; (iii) external engagement; and (iv) ‘people’, which included the promotion of equality and diversity among the UoA’s staff members.4
While the outputs and impact elements evaluated the quality of departments’ research, the REF 2014’s environment score instead measured their strategies, processes and culture for supporting that research. It was also the only component that was explicitly forward-looking, in that it was intended to identify departments that could sustain a positive research environment going forward.5
REF 2014’s sub-panel members read and assessed the quality of every submitted UoA’s outputs, impact and environment without making formal use of metrics such as citations and journal rankings.6 Instead, quality was assessed subjectively against the following broad criteria: outputs were judged on their ‘originality, significance and rigour’; impact case studies were judged for their ‘reach and significance’; and research environments were judged for their ‘vitality and sustainability’. Against these criteria, research quality was graded from 4* (highest) to 1* (lowest) according to the broad standards summarized in Table 1. Each sub-panel and main panel additionally conducted benchmarking exercises to agree on more specific standards for each grade. Many sub-panels also double-scored submissions to improve the consistency of assessment, and impact case studies were also evaluated by external assessors working outside academia.
Table 1. REF 2014 grading standards for outputs, impact and environment

| Grade | Outputs | Impact | Environment |
|---|---|---|---|
| 4* | World-leading | Outstanding | World-leading |
| 3* | Internationally excellent… but falls short of the highest standards | Very considerable | Internationally excellent… but falls short of the highest standards |
| 2* | Recognized internationally | Considerable | Recognized internationally |
| 1* | Recognized nationally | Recognized but modest | Recognized nationally |
At the end of the exercise, the shares of each UoA’s outputs, impact and environment that were graded 4*, 3*, etc. were published on the REF 2014 website. Each UoA also received an overall grade profile that was a weighted sum of the grades given to each of the three elements.7 This final grade profile was used to determine the allocation of government funding, with zero weight given to 1* and 2* research and the highest weight given to 4* research (De Fraja, Facchini and Gathergood 2019).
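Concretely, using the element weights from footnote 7 (shown here only as a schematic reconstruction, not the official published formula), the share of a UoA’s overall profile awarded each grade $g \in \{4^*, 3^*, 2^*, 1^*\}$ was

$$\mathit{overall}_g = 0.65 \times \mathit{outputs}_g + 0.20 \times \mathit{impact}_g + 0.15 \times \mathit{environment}_g,$$

where each right-hand term is the share of that element graded $g$.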
3. Methodology
3.1 Conceptual framework
In this section, we present a very simple framework to clarify our thinking on the distortions introduced when research quality is imperfectly measured and to motivate the empirical approach that follows. Assume that the quality of research in department $d$ is determined by $q_d = f(T_d)$, where $T_d$ is the talent of department $d$. Assume also that the government would like to distribute funds to different departments according to $q_d$, but $q_d$ is unobserved. Instead, only a proxy of it, $m_d$, is observed:

$$m_d = q_d + b_d. \tag{1}$$

$m_d$ is assumed to positively correlate with $q_d$—i.e. departments with higher $q_d$ usually also have higher $m_d$—but the error $b_d$ is systematically biased in favour of people from certain groups. For example, suppose $m_d$ mapped the number of citations accruing to $T_d$. Given evidence of bias in the decision to cite (Ferber 1986, 1988; Larivière et al. 2013; Dion, Sumner and Mitchell 2018; Koffi 2021), $m_d$ would likely underestimate the quality of female talent in department $d$ and over-estimate the quality of its male talent.
Equation (1) highlights an important implication of using metrics as a measure of research quality: unless the proxy $m_d$ perfectly captures the underlying construct of interest $q_d$, it will result in a misallocation of money within the sector—e.g. money will go to institutions that produce the most-highly cited publications which, in a world where citations are biased in favour of a particular group, are unlikely to be the most diverse institutions.
Furthermore, by rewarding $m_d$ instead of $q_d$, departments are incentivised to reduce the diversity of $T_d$. Let $T_d^m$ denote the pool of talent chosen to maximize $m_d$ and $T_d^q$ the pool that would be chosen to maximize $q_d$. Since $m_d$ is systematically biased in favour of people from certain groups, $T_d^m$ will likely be less diverse than $T_d^q$: departments choose to hire a pool of talent that is disproportionately composed of members of the groups advantaged by $m_d$. An implication of this is that departments that perform well on the basis of a biased measure of research quality will tend to be less diverse than those that perform less well.
One way to move closer to $q_d$ would be to augment $m_d$ with a complementary measure that positively correlates with diversity. In principle, the impact and environment measures in the REF 2014 may have fulfilled this role—indeed, the people element of the environment score explicitly included the promotion of equality and diversity, and panel members were required to consider this as part of their assessment. This insight motivates our empirical analysis, described in the next section, which examines the relationship between the different measures of quality in the REF and (a measure of) diversity.
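As an illustration (not the REF’s official funding formula, although it borrows the REF 2014 element weights reported in Section 2), such an augmented measure could be written as

$$M_d = 0.65\,\mathit{out}_d + 0.20\,\mathit{imp}_d + 0.15\,\mathit{env}_d,$$

where $\mathit{out}_d$, $\mathit{imp}_d$ and $\mathit{env}_d$ denote department $d$’s output, impact and environment scores. If $\mathit{imp}_d$ and $\mathit{env}_d$ positively correlate with diversity, the composite $M_d$ disadvantages diverse departments less than $\mathit{out}_d$ alone.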
3.2 Empirical approach
The dependent variable, $D_{dsi}$, is a measure of the diversity of academic staff in department $d$ of higher education institution $i$ allocated to sub-panel $s$. We regress it on each element’s score, defined as the weighted sum of the shares of outputs, impact and environment that were rated 4* and 3*, i.e. $\mathit{score} = 4 \times \%\,4^* + 1 \times \%\,3^*$:8

$$D_{dsi} = \alpha + \beta_{\mathrm{out}}\,\mathit{out}_{dsi} + \beta_{\mathrm{imp}}\,\mathit{imp}_{dsi} + \beta_{\mathrm{env}}\,\mathit{env}_{dsi} + \mu_i + \lambda_s + \varepsilon_{dsi} \tag{2}$$

$$\Delta D_{dsi} = \alpha + \beta_{\mathrm{out}}\,\mathit{out}_{dsi} + \beta_{\mathrm{imp}}\,\mathit{imp}_{dsi} + \beta_{\mathrm{env}}\,\mathit{env}_{dsi} + \mu_i + \lambda_s + \varepsilon_{dsi} \tag{3}$$

In order to remove systematic variation across HEIs and subjects, Equations (2) and (3) additionally control for fixed effects for institutions ($\mu_i$) and sub-panels ($\lambda_s$).
To estimate Equations (2) and (3), we measure $D_{dsi}$ as the percentage of academic staff in a department who were non-white and/or female in 2013, the year UoAs made their REF submissions. To capture $\Delta D_{dsi}$, we subtract $D_{dsi}$ in 2013 from $D_{dsi}$ in 2018. Although these proxies of diversity are by no means comprehensive, they do capture important dimensions of under-representation (e.g. gender) that have been shown to matter in terms of publications and citations.
Our measure of $D_{dsi}$ comes from the academic staff census data collected by the Higher Education Statistics Agency (HESA). HESA staff data are reported by universities and cover all individuals on a contract of employment with a publicly funded higher education provider in the UK during a given academic year (1 August to 31 July). To identify academic staff, we restrict our data to non-administrative staff members on academic contracts who are engaged in teaching and/or research. We additionally exclude senior management (including heads of school and function heads) and staff members employed by professional service departments (e.g. central administration, staff and student facilities, and catering).
We merge our HESA data on departments’ demographic profiles with publicly available information on departments’ REF 2014 performance using the mapping described in Supplementary Appendix A.9 After merging, our final dataset covers 1,736 academic departments across 36 different disciplines at 151 UK higher education institutions.10 Basic summary statistics are provided in Supplementary Appendix A.
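To make the estimation concrete, the following is a minimal sketch of Equation (2); it is not the code used in this paper, and the file name, column names and the 4-to-1 score weighting are assumptions consistent with the definitions above.

```python
# Minimal sketch of estimating Equation (2): departmental diversity in 2013
# regressed on REF element scores, with sub-panel and institution fixed
# effects and standard errors clustered at the institution level.
# All file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ref2014_hesa_merged.csv")  # one row per department (UoA)

# Element scores: weighted sum of the shares rated 4* and 3*,
# assuming the 4:1 funding-formula weights described in the text.
for elem in ["output", "impact", "environment"]:
    df[f"{elem}_score"] = 4 * df[f"{elem}_pct4"] + 1 * df[f"{elem}_pct3"]

fit = smf.ols(
    "diversity_2013 ~ output_score + impact_score + environment_score"
    " + C(subpanel) + C(institution)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["institution"]})
print(fit.summary())
```

Equation (3) would replace `diversity_2013` with the 2013–18 change in diversity as the dependent variable.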
4. Results
4.1 Main results
Our main regression results are presented in Table 2. Panel A displays results from estimating Equation (2) using the 2013 percentage of non-white-male staff members as the dependent variable. Panel B shows results from estimating Equation (3) using the post-REF improvement in diversity (2013–18) as the dependent variable. Column (1) includes no controls; columns (2) and (3) add, respectively, sub-panel and HEI fixed effects.
Table 2. Correlations between REF 2014 scores and departmental diversity

| | (1) | (2) | (3) |
|---|---|---|---|
| **Panel A. Dependent variable: $D_{dsi}$** | | | |
| Output | −0.047*** | −0.005 | −0.011 |
| | (0.011) | (0.010) | (0.011) |
| Impact | 0.023*** | 0.009* | 0.010** |
| | (0.006) | (0.005) | (0.005) |
| Environment | −0.015** | −0.020*** | −0.015*** |
| | (0.006) | (0.005) | (0.005) |
| Constant | 52.846*** | | |
| | (1.815) | | |
| Env.-out. | 0.047 | 0.269 | 0.771 |
| Env.-imp. | 0.000 | 0.000 | 0.001 |
| Out.-imp. | 0.000 | 0.216 | 0.094 |
| Sub-panel f.e. | | ✓ | ✓ |
| Institution f.e. | | | ✓ |
| No. obs. | 1,635 | 1,635 | 1,635 |
| R-squared | 0.041 | 0.448 | 0.538 |
| **Panel B. Dependent variable: $\Delta D_{dsi}$** | | | |
| Output | −0.013** | −0.014* | −0.014 |
| | (0.006) | (0.007) | (0.009) |
| Impact | 0.000 | 0.000 | −0.002 |
| | (0.004) | (0.004) | (0.004) |
| Environment | 0.007* | 0.008* | 0.009** |
| | (0.004) | (0.004) | (0.004) |
| Constant | 5.038*** | | |
| | (1.207) | | |
| Env.-out. | 0.027 | 0.036 | 0.041 |
| Env.-imp. | 0.288 | 0.234 | 0.095 |
| Out.-imp. | 0.091 | 0.105 | 0.213 |
| Sub-panel f.e. | | ✓ | ✓ |
| Institution f.e. | | | ✓ |
| No. obs. | 1,598 | 1,598 | 1,598 |
| R-squared | 0.004 | 0.046 | 0.166 |

Note. Results from estimating Equation (2) (Panel A) and Equation (3) (Panel B). $D_{dsi}$ is the share of non-white-male staff in a department in 2013 (in percentages); $\Delta D_{dsi}$ is the change in $D_{dsi}$ between 2013 and 2018. Scores are the weighted sum of 4* and 3* research ($4 \times \%\,4^* + 1 \times \%\,3^*$). Sample excludes multiple submissions from the same departments (65 observations). Standard errors clustered at the institution level in parentheses. ***, ** and * significant at the 1%, 5% and 10% level, respectively.
Looking first at Panel A, higher output scores are associated with lower shares of non-white-male staff members at the time of REF submission (i.e. $\beta_{\mathrm{out}} < 0$), indicating that departments with higher-scoring outputs were generally less diverse in 2013. Adding sub-panel fixed effects, the coefficient remains negative but becomes smaller and statistically insignificant, indicating that some subjects have systematically high output scores and low levels of diversity. By contrast, a department’s impact score positively correlates with our measure of diversity (i.e. $\beta_{\mathrm{imp}} > 0$), and this is robust to including sub-panel and institution fixed effects. This indicates that more diverse departments produce better impact. Perhaps surprisingly, the coefficient on the environment score is also negative (i.e. $\beta_{\mathrm{env}} < 0$), even after including sub-panel and institution fixed effects. Although panel members were asked to consider measures to promote diversity and equality as part of the environment evaluation, the relationship between the environment score and our measure of diversity at the time of submission is negative.
Estimating Equation (3)—i.e. using the change in non-white-male staff between 2013 and 2018 ($\Delta D_{dsi}$) as the dependent variable—we see that departments that scored highly on outputs experienced a decline in diversity (an increase in the share of white men) in the years following the REF (i.e. $\beta_{\mathrm{out}} < 0$). Adding sub-panel and institution fixed effects does not change the magnitude of the coefficient but increases the standard error such that the estimated effect is no longer statistically significant (column (3)). The correlation between the impact score and the post-REF change in diversity is very close to zero.
By contrast, there is a positive relationship between the environment score and future diversity improvements (i.e. $\beta_{\mathrm{env}} > 0$). This suggests that the environment score—the only component of the REF that is forward-looking—may capture aspects of departmental strategies, processes and culture that are important for promoting diversity. The size of the implied effect is quite large. Across the 36 sub-panels, the (average) inter-quartile range of (our transformed) environment score is ∼200, which would equate to a 1.80 percentage point increase in the share of non-white-male staff (based on the coefficient of 0.009), compared to a mean increase over the period of 3.93 percentage points.
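For concreteness, the arithmetic behind this magnitude is

$$0.009 \times 200 = 1.8 \ \text{percentage points}, \qquad 1.8 / 3.93 \approx 0.46,$$

i.e. moving a department across the inter-quartile range of the environment score is associated with an improvement in diversity equal to roughly half of the mean increase over the period.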
4.2 Further analysis
Is it possible to say whether there are particular aspects of departmental processes and culture that are associated with improvements in our measure of diversity? This insight could help to inform strategies to increase representation from under-represented groups. It could also form the basis for designing alternative—and more targeted—approaches that reward specific drivers of increased representation, rather than the broad and subjective environment measure. We consider two possible candidates—Athena SWAN accreditation (a UK-wide initiative aimed at improving gender equality in higher education) and the quality of management practices at the department level.
4.2.1 Athena SWAN
The Athena SWAN Charter was launched in 2005 to advance the careers of women, initially in STEM fields but later across all academic fields. Athena SWAN awards are given—at bronze, silver or gold level—to universities and, separately, to individual departments that can demonstrate a commitment to gender equality. The submission process, which typically takes a couple of years, requires a comprehensive audit of gender equality and a set of concrete proposals for change (see Gamage, Sevilla and Smith 2020). Many environment statements refer to Athena SWAN—either because the department already has an award or because it is in the process of applying for one.
We re-run Equations (2) and (3), additionally including a binary indicator (‘Athena’) which takes the value 1 if the department’s environment statement mentions Athena SWAN. The results are reported in columns (1) and (3) of Table 3. We find that departments that refer to Athena SWAN tend to have lower diversity at the time of REF submission. This seemingly counter-intuitive result may suggest that the decision to apply for Athena SWAN is a response to low diversity; the results in column (3), however, show that departments mentioning Athena SWAN also make greater subsequent improvements in diversity in the years following REF submission. Nevertheless, neither of these correlations is statistically significant.
Table 3. Correlations between REF 2014 scores and departmental diversity, additional controls

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Dependent variable: | $D_{dsi}$ | $D_{dsi}$ | $\Delta D_{dsi}$ | $\Delta D_{dsi}$ |
| Output | −0.011 | −0.011 | −0.014 | −0.029 |
| | (0.011) | (0.033) | (0.009) | (0.027) |
| Impact | 0.010** | 0.024* | −0.002 | −0.038** |
| | (0.005) | (0.013) | (0.004) | (0.015) |
| Environment | −0.014*** | −0.016 | 0.009** | 0.017 |
| | (0.005) | (0.015) | (0.004) | (0.013) |
| Athena | −0.666 | −2.158 | 0.119 | 2.538 |
| | (0.916) | (2.567) | (0.858) | (2.517) |
| Management | | 0.319 | | 0.349 |
| | | (0.670) | | (0.536) |
| People management | | 0.462** | | 0.538** |
| | | (0.213) | | (0.217) |
| Sub-panel f.e. | ✓ | ✓ | ✓ | ✓ |
| Institution f.e. | ✓ | ✓ | ✓ | ✓ |
| No. obs. | 1,635 | 166 | 1,598 | 164 |
| R-squared | 0.538 | 0.877 | 0.166 | 0.718 |

Note. Results from estimating Equation (2) (columns (1) and (2)) and Equation (3) (columns (3) and (4)). $D_{dsi}$ is the share of non-white-male staff in a department in 2013 (in percentages); $\Delta D_{dsi}$ is the change in $D_{dsi}$ between 2013 and 2018. Scores are the weighted sum of 4* and 3* research ($4 \times \%\,4^* + 1 \times \%\,3^*$). Athena is an indicator variable equal to one if the department mentioned the word ‘Athena’ at least once in its environment statement; Management is a measure of the (average) quality of management practices relating to operations (on a scale of 1–5) collected by McCormack, Propper and Smith (2014); People management is a measure of the (average) quality of management practices relating to personnel. Sample excludes multiple submissions from the same departments (65 observations). Standard errors clustered at the institution level in parentheses. ***, ** and * significant at the 1%, 5% and 10% level, respectively.
The coefficients on the environment score in columns (1) and (3) of Table 3 are similar to those shown in Table 2. Thus, controlling for whether a department’s environment statement mentions Athena SWAN has little impact on the magnitude or significance of the correlation between a department’s environment score and the diversity of its staff. This suggests that the environment score’s relationship with promoting equality and diversity captures more than just whether a department has (or is applying for) an Athena SWAN award. (For further analysis and discussion of what the environment measures, see Supplementary Appendix C.)
4.2.2 Management practices
The way a department is run may be an important factor in determining its strategies, processes and culture, i.e. the environment score may reflect the quality of management of a department. There is a body of literature in economics on measuring management quality in different organizations, showing that the quality of an organization’s (measured) management practices in relation to operations and people correlates positively with its overall performance (see Bloom et al. 2014). This relationship holds for many different sectors, including UK higher education (McCormack, Propper and Smith 2014). There is also evidence that better managed organizations have practices that facilitate a better work-life balance, including part-time work flexibility, time off for family duties, childcare support and the ability to work from home (Bloom and Van Reenen 2006). This suggests that better-managed organizations might have environments that are more conducive to a higher share of women, but this has not been tested explicitly.
Scores reflecting the quality of management at the departmental level (specifically, operations management quality and people management quality) were collected for ∼160 departments (covering English, Psychology, Business and Computer Science) in 2012 by McCormack, Propper and Smith (2014). We add these (standardized) management scores as further controls in Equations (2) and (3) to see whether there is any evidence that management practices can explain the observed environment effect. We find that better managed departments—particularly in the dimension of people management—are indeed more diverse, both at the time of REF submission (close to when the management scores were collected) and afterwards (columns (2) and (4), respectively). However, after including the management practice scores, the positive relationship between environment score and post-REF improvement remains, and increases in magnitude, although it also becomes insignificant, likely due to smaller samples.
Given the small samples, these results are only suggestive. Moreover, we do not know whether they hold only for the specific departments sampled (business, computer science, English and psychology) or instead extend more broadly. Nevertheless, they provide preliminary evidence that people management processes may be an important component of a positive environment that increases representation from historically under-represented groups; this is a promising direction for future research.
5. Discussion
Our paper provides new evidence from REF 2014 on the relationship between alternative measures of the quality of research in an academic department and the diversity of its academic staff, measured by the share of historically under-represented groups. The main findings are that a measure of output research quality is negatively correlated with this measure of diversity, while measures of the impact of research and the quality of the research environment positively correlate with it.
One implication is that the scope of research quality measurement matters for diversity in higher education. In several countries with national research assessment processes, there have been debates on the best way to assess research quality. There is a push for metric-based systems for assessing outputs, which have the attraction of being cheaper to implement. The Australian Research Council, for example, has used an evaluation system strongly supported by bibliometric indicators in its Excellence in Research for Australia assessments (Arnold et al. 2018). To the extent that narrow, metrics-based approaches are biased against certain groups, however, our evidence indicates that they will result in a misallocation of resources within the sector.
A second implication is that broadening the scope of research quality measures can mitigate some of the negative effects on diversity. By incorporating measures of research impact and environment quality alongside a measure of output quality, the REF allocated more funding to departments that increased diversity than it otherwise would have done, although the differences are small. To quantify the effect of incorporating the environment score, we can compare the average post-REF change in diversity, weighted by the amount of funding that departments receive (according to the funding formula $4 \times \%\,4^* + 1 \times \%\,3^*$), first based only on outputs and second incorporating environment scores. The output-weighted increase in diversity (the reduction in the share of white men) is 3.73 percentage points. Adding the environment score increases this to 3.76. This is a positive effect, but a small one, partly because the environment score carries a small weight in the overall REF 2014 assessment (0.15 compared to 0.65 for outputs) and partly because the environment score is closely correlated with the output score. If outputs and environment were weighted equally, the weighted increase in diversity would be 3.81. The plans for REF 2029 are to reduce the weight given to outputs and to increase the weight given to environment.
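Schematically, this comparison computes a funding-weighted average of post-REF changes in diversity; in the sketch below, $f_d$ denotes department $d$’s share of formula funding under a given weighting of the elements:

$$\overline{\Delta D} = \sum_d f_d\,\Delta D_d, \qquad \sum_d f_d = 1,$$

evaluated first with $f_d$ based on output scores alone (3.73 percentage points), then with the environment score incorporated (3.76), and finally with outputs and environment weighted equally (3.81).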
The point of this paper is to provide evidence that the scope of research quality measurement matters and can have implications for under-represented groups. We have considered the UK REF because it offers a range of different measures of quality, but this is not an endorsement of current REF measures. As has been discussed, there may be a high level of subjectivity in the assessments. There is also ambiguity in the four different elements that are included in the environment measure. Unpacking this—and understanding exactly which elements of the research environment are beneficial for diversity—remains a topic for further discussion and research.
Acknowledgements
We are grateful to the editor and three referees for valuable comments. The HESA data analysed in this paper are the copyright of the Higher Education Statistics Agency (HESA) Limited. Neither HESA nor HESA Services Limited are responsible for any inferences or conclusions derived from the data or other information supplied by HESA Limited or HESA Services Limited.
Supplementary data
Supplementary data are available at Research Evaluation Journal online.
Funding
This work was supported by the European Research Council through the PARENTIME project, Consolidator Grant (CoG), SH3, ERC-2017-COG.
Conflict of interest statement. None declared.
Notes
1. Our measure of diversity is the share of staff members in a department who are not white men. Although this measure does not capture all dimensions of diversity that matter, it does measure the presence of historically under-represented groups. It is also easily measurable on a consistent basis over time and across institutions using existing administrative data on the population of academics.
2. Similar assessments have been introduced in the Netherlands (Observatory of Science and Technology), Italy (Triennial Research Evaluation), Australia (Excellence in Research for Australia) and New Zealand (Performance-based Research Fund).
3. Universities can—and do—decide which sub-panel to submit particular staff to. For example, economics staff can be submitted either to the economics and econometrics sub-panel or the business and management sub-panel.
4. Alongside the environment statement, UoAs were required to provide information on grant income and numbers of PhD students. However, these data were contextualized within the environment statement itself.
5. See Supplementary Appendix C for further discussion and insights on what, precisely, the environment score in the REF 2014 was measuring.
6. Several sub-panels (e.g. clinical medicine, physics, and economics and econometrics) had access to citation data, but these data were used to supplement rather than replace peer assessment.
7. The weights given to outputs, impact and environment were, respectively, 65%, 20% and 15%.
8. This was also the formula used to determine funding allocations after the REF 2014 concluded (De Fraja, Facchini and Gathergood 2019).
9. In 13 departments, REF environment, impact and output results were not published because the number of submitted staff was three or fewer.
10. For several departments, HESA data on staff demographics in either 2013 or 2018 were unavailable. We also exclude 65 observations corresponding to multiple submissions from the same department. (For example, University of Chester made two environment submissions to the ‘Geography, environmental studies and archaeology’ UoA, one for ‘Geography and development studies’ and another for ‘Archaeology’. Both observations are excluded from the analysis.) As a result, the final main estimation samples shown in Tables 2 and 3 include only 1,635 observations when the dependent variable is $D_{dsi}$ and 1,598 when the dependent variable is $\Delta D_{dsi}$.