Asmamaw Demis Bizuneh, Anju E Joham, Chau Thien Tay, Sylvia Kiconco, Arul Earnest, Raja Ram Dhungana, Larisa V Suturina, Xiaomiao Zhao, Alessandra Gambineri, Fahimeh Ramezani Tehrani, Bulent O Yildiz, Jin Ju Kim, Liangzhi Xu, Christian Chigozie Makwe, Helena J Teede, Ricardo Azziz, The PCOS Phenotype in Unselected Populations study: ethnic variation in population-based normative cut-offs for defining hirsutism, European Journal of Endocrinology, Volume 192, Issue 3, March 2025, Pages 228–239, https://doi.org/10.1093/ejendo/lvaf030
Abstract
Hirsutism, a diagnostic feature of polycystic ovary syndrome (PCOS), is often defined using arbitrary percentile cutoffs, rather than normative cutoffs from population-based data. We aimed to define normative cutoffs for hirsutism in diverse populations.
Unselected population-based cluster analysis of individual participant data (IPD).
Individual participant data from the community-based studies in the PCOS Phenotype in Unselected Populations (P-PUP) study IPD asset underwent k-means cluster analysis of hirsutism, directly assessed using the modified Ferriman–Gallwey (mFG) visual scale. The primary outcome was ethnicity-specific normative cutoffs for the mFG score. Medians and cutoffs were compared across ethnic groups.
We included 9829 unselected, medically unbiased participants aged 18-45 years from 12 studies conducted across 8 countries: China, Iran, Italy, Nigeria, Russia, South Korea, Turkey, and the United States. The mFG cutoff scores for hirsutism on cluster analysis varied across ethnicities, ranging from 4 to 8. White Iranians had the highest cutoff score of 8, followed by White Italians and Black Africans with a cutoff of 7. Asian Han Chinese, White Russians, Turkish women, and Black Americans shared a cutoff of 5; White Americans, Asian Koreans, Asian Russians, and Mixed Russians shared a cutoff of 4. Comparing medians and mFG cutoffs across ethnicities confirmed the same differences.
This study confirms the 2023 International PCOS Guidelines recommendations defining hirsutism as an mFG score between 4 and 6 for the majority of populations studied, with few exceptions. However, we also highlight ethnic variation in mFG cutoff scores, suggesting that clinicians consider ethnicity in optimal diagnosis and personalized interventions.
To our knowledge, this is the first study to use a large global IPD dataset of well-phenotyped, community-based, medically unbiased unselected populations of women of reproductive age to establish normative cutoff scores for hirsutism. We defined the normative cutoff for hirsutism using mFG scores in 9829 participants employing cluster analysis. We found significant variation in the distribution and cutoff values for the mFG score across ethnicities, ranging from 4 to 8. As expected, we also found that women classified as hirsute based on cluster-derived cutoffs exhibited a higher prevalence of PCOS, polycystic ovarian morphology, ovulatory dysfunction, and hyperandrogenemia. Extension of this work across broader world regions is needed to determine natural cutoff values of mFG scores in large, unselected populations.
Introduction
Polycystic ovary syndrome (PCOS) is a common endocrine disorder that manifests in 10%-13% of women during their reproductive years.1,2 The diagnosis of PCOS is based on the International PCOS Guidelines and in adults requires the presence of 2 of the following 3 criteria: oligo or anovulation, hyperandrogenism (hirsutism and/or hyperandrogenemia), and polycystic ovarian morphology on ultrasound or elevated anti-Mullerian hormone levels.3-6 Hirsutism is characterized by the presence of excessive terminal (coarse, pigmented, medullated, and exceeding 5 mm in length) hair growth in women, with PCOS being the most common associated condition.7 Hirsutism is most frequently assessed clinically using the modified Ferriman–Gallwey (mFG) visual scale, which quantifies hair growth in 9 body areas, including the upper lip, chin and neck, midline chest, upper and lower abdomen, upper arms, thighs, and upper and lower back, each graded on a score of 0-4.8,9
Hirsutism is present in 70% to 80% of women with PCOS depending on the mFG cutoff used7,10 and is among the leading causes of diminished quality of life (QoL) and psychological well-being in women with PCOS.11,12 Hirsutism is associated with feelings of embarrassment, shame, and social isolation,13-16 leading to diminished self-perceived health, anxiety, and depression.12,14 Hirsutism in individuals with PCOS is generally correlated with the degree of biochemical hyperandrogenism17-20 and metabolic dysfunction19,21,22 and is associated with cardiometabolic complications, including insulin resistance, metabolic syndrome, type 2 diabetes, and heart disease.21,23-25 However, the correlation is not absolute, as some women with hirsutism exhibit normal androgen levels.26
Although the 95th percentile mFG score in Ferriman's original population of young white women was >7, Ferriman recognized that an arbitrary percentile does not describe a natural or normative cutoff and defined “hirsutism when body hair scores exceeded 4 and non-hirsute when less.8,27” However, this was subsequently erroneously interpreted as an mFG score of ≥8 to define hirsutism. This contrasts with conventional diagnostic approaches, where cutoffs are based on short and long-term complications in the condition or on the natural clustering of the data.28 To address the controversy and limitations in mFG cutoffs for hirsutism diagnosis, cluster analysis, which can delineate natural groupings, is the recommended data reduction method.29,30 Although isolated studies have attempted to utilize cluster analysis,31,32 there remains a need for validation in larger, multi-ethnic unselected populations.31-33 In fact, recent studies across diverse ethnicities and countries suggest this cutoff value may not be universally applicable in the clinical setting.26,34
The current International PCOS Guidelines recommend an mFG cutoff score of between 4 and 6, whilst emphasizing ethnic variations.3-6 Defining how much male-like hair growth should be considered abnormal or reflect hirsutism has been challenging, given the limited medically unbiased population data. This is despite the high prevalence and significant negative impact of hirsutism on women's lives, and it was a major research priority in the International PCOS research roadmap.35 To address this priority gap, the PCOS Phenotype in Unselected Populations (P-PUP) study was established as an international multicenter collaboration to define ethnicity-specific mFG and other diagnostic cutoff scores in diverse medically unbiased unselected populations.33 This specific study utilizes individual participant data to define mFG score distribution and cutoffs across different ethnic groups, using cluster analysis.
Methods
Search strategy and selection criteria
The detailed methodology for the P-PUP individual participant data (IPD) study can be found in both the pre-registration on PROSPERO (ID = CRD42021267847) and the published protocol.33 P-PUP was generated by a collaborative global consortium and encompasses community-based studies conducted across multiple countries and world regions.32,36-46 All individual studies received ethical approval from the relevant committees in their respective countries/institutions. This study was conducted in accordance with the principles outlined in the Declaration of Helsinki, and the Monash University Human Research Ethics Committee (HUMREC) approved the P-PUP study (ID: 26938).
A systematic search of the EMBASE and Medline (Ovid) databases and a manual Google search were conducted from 1990 to October 2, 2020, to identify eligible studies. The key search terms included PCOS, ovulatory dysfunction (OD), hirsutism (mFG score), polycystic ovary morphology (PCOM), and unselected populations, with the full search strategy for Medline (Ovid) presented in Table S1. Known PCOS experts (R.A. and H.J.T.) were consulted to identify potential ongoing or unpublished studies. We included any population or community-based study assessing medically unbiased, unselected populations, meeting a minimum sample size of 300, and reporting at least one directly assessed PCOS diagnostic feature, with no language restrictions.
A single reviewer (S.K.) screened the titles and abstracts. Full-text articles were screened in duplicate (S.K. and A.d.B.) with discrepancies resolved by a third reviewer (H.J.T.). We contacted the corresponding and/or lead authors of all eligible studies to request de-identified IPD, and 2 reminder emails were sent to those who did not respond. Data-sharing agreements were completed, and data were shared on sociodemographic factors (age and race/ethnicity), cardiometabolic outcomes [weight, height, body mass index (BMI), waist-hip ratio, and waist circumference], and PCOS features including OD, clinical and biochemical hyperandrogenism, and PCOM (follicular number per ovary [FNPO] and ovarian volume [OV]).
Study quality appraisal
Two reviewers (A.d.B. and R.R.D.) appraised included studies using the AXIS appraisal tool for cross-sectional studies.47 The tool consists of 20 items that assess various aspects such as the study's aim, design, sample size, outcome variable measurement, statistical methods, response rate, result consistency, discussion and conclusion, limitations, and ethical approval, with responses categorized as “yes,” “no,” or “don’t know.” Overall quality was rated as low (score 1-7), medium (score 8-14), or high (score 15-20).
Data analysis
The primary outcome was mFG cutoff scores using cluster analysis. All populations included in our analysis were assessed directly for the presence of terminal hair growth, and hirsutism was assessed using mFG visual scale scores for terminal hair growth in 9 androgen-sensitive body areas (upper lip, chin, chest, upper and lower abdomen, upper arm, thigh, and upper and lower back). Scores were assigned from 0 (no growth of terminal hair) to 4 (extensive terminal hair growth) in each of the 9 sites as per the published photographic atlas.9 In all populations, trained healthcare professionals (midwives, nurses, and physicians including general practitioners, gynecologists, or medical and reproductive endocrinologists) evaluated all participants in the IPD dataset, with individual total scores ranging from 0-36.32,36-46 We did not include data obtained by self-report of the subjects themselves.
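The scoring described above can be illustrated with a minimal Python sketch that sums the nine site scores into a total mFG score. The area keys and the function name `total_mfg_score` are hypothetical labels chosen for this example; they are not identifiers from the P-PUP dataset.

```python
from typing import Mapping

# Hypothetical keys for the 9 androgen-sensitive body areas listed in the text.
MFG_AREAS = (
    "upper_lip", "chin", "chest", "upper_abdomen", "lower_abdomen",
    "upper_arm", "thigh", "upper_back", "lower_back",
)


def total_mfg_score(area_scores: Mapping[str, int]) -> int:
    """Sum the nine site scores (each 0-4) into a total between 0 and 36."""
    if set(area_scores) != set(MFG_AREAS):
        raise ValueError("scores are required for exactly the 9 mFG body areas")
    for area, score in area_scores.items():
        if score not in (0, 1, 2, 3, 4):
            raise ValueError(f"{area}: score must be between 0 and 4")
    return sum(area_scores.values())


# Illustrative participant (made-up values, not study data).
example = dict.fromkeys(MFG_AREAS, 0)
example.update(upper_lip=2, chin=1, lower_abdomen=2)
print(total_mfg_score(example))  # 5
```

Rejecting incomplete records in this sketch mirrors the requirement for complete mFG data, although how the original studies handled partially scored participants is not stated here.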
We included women of reproductive age (18-45 years) who had complete data on mFG scores and ethnicity. Women outside this age range, and those of ethnicities with very few participants, were excluded. Additionally, women who did not provide informed consent, women with natural/surgical menopause, pregnant and lactating women, and women with ovarian pathology or with elevated levels of the hormones tested to exclude other conditions in the diagnosis of PCOS, including follicle-stimulating hormone (FSH, >25 IU/L), prolactin (>67.2 µg/L), 17-hydroxyprogesterone (17-OHP, >30 nmol/L), and thyroid-stimulating hormone (TSH, >5 µU/mL), were not included in the analysis. All laboratory values for hormones and androgens were converted to International System of Units (SI) units. The distribution of mFG scores between ethnic groups was determined descriptively based on means and medians. Standardized data (mean = 0, SD = 1) underwent k-means cluster analysis (k = 2) with clusters determined using the average silhouette method.
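To make the clustering step concrete, the sketch below standardizes a list of total mFG scores and runs a plain two-cluster k-means (Lloyd's algorithm) in one dimension. It is a simplified illustration, not the study's code: the silhouette step is omitted, the function names are hypothetical, and taking the cutoff as the lowest score assigned to the upper cluster is an assumption, because the text does not spell out how a cutoff is read off the clusters.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and SD 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mu) / sd for v in values]


def two_means_1d(z, max_iter=100):
    """Lloyd's algorithm with k = 2 on 1-D data; label 1 marks the upper cluster."""
    lo, hi = min(z), max(z)  # start the two centroids at the extremes
    labels = []
    for _ in range(max_iter):
        # Assign each point to the nearer centroid (ties go to the lower cluster).
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in z]
        new_lo = mean(v for v, lab in zip(z, labels) if lab == 0)
        new_hi = mean(v for v, lab in zip(z, labels) if lab == 1)
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


def cluster_cutoff(scores):
    """Assumed rule: the cutoff is the smallest raw score in the upper cluster."""
    labels = two_means_1d(standardize(scores))
    return min(s for s, lab in zip(scores, labels) if lab == 1)


# Toy scores for illustration only (not study data).
toy_scores = [0, 0, 0, 1, 1, 1, 2, 2, 3, 8, 9, 10, 12]
print(cluster_cutoff(toy_scores))  # 8 for this toy data
```

In practice a library implementation with multiple random starts would usually be preferred; the fixed initialization here simply keeps the example deterministic.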
The distribution of clinical and biochemical features among women above or below the hirsutism cutoff scores was assessed using an independent t-test (for normally distributed data), Mann–Whitney U-test (for skewed data), and chi-square test (for categorical data). In characterizing the populations, laboratory androgen measurements and ultrasonography methods were performed as documented in the original primary publications.32,36-45
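For readers unfamiliar with the Mann–Whitney U-test mentioned above, the following sketch computes only the U statistic by direct pair counting. It is an illustration under stated assumptions: the function name is hypothetical, the data are made up, and no P-value is derived; the statistical software named in this section would normally handle that.

```python
def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Made-up values for two groups, e.g. above versus below a cutoff.
above = [3, 4, 5]
below = [1, 2, 3]
u_above = mann_whitney_u(above, below)
u_below = mann_whitney_u(below, above)
print(u_above, u_below)  # 8.5 0.5
```

The two statistics always sum to the number of pairs (here 3 × 3 = 9), which is a quick sanity check on any implementation.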
A mixed-effects logistic regression model was used to assess the effects of age, BMI, and active oral contraceptive pill (OCP) use (defined as the use of OCP at the time of the survey or within the last 3 months) on mFG cutoffs.48 This approach is appropriate for modeling the binary outcome hirsutism status, accounting for the clustering of subjects by the center and ensuring precise estimates for the fixed effects (individual predictors) and the random effects attributable to clustering.49,50 A sensitivity analysis was conducted to evaluate the potential impact of active OCP use on the mFG score cutoffs. Other secondary outcomes were the presence of PCOS, PCOM, OD, and hyperandrogenemia, as defined in individual studies. All the medians and cutoff values determined by cluster analysis were compared between ethnic groups using the Kruskal–Wallis test. Statistical significance was declared with a P-value less than .05. Analyses were performed using Stata (version 18.0)51 and RStudio (version 4.2.2).
Results
Study characteristics and risk of bias
The literature search yielded 6507 publications, of which 1437 duplicates were removed. Screening of titles and abstracts excluded 4728 studies, leaving 342 studies for full-text review. Of these, 313 studies were excluded for various reasons, such as the selected/referred population (n = 185), conference abstracts and reviews (n = 72), sample size <300 (n = 24), not directly assessed/self-report (n = 12), not having PCOS/features (n = 11), ineligible population with predefined comorbidity (n = 6), and the same research group/data (n = 3). Finally, 29 studies met the eligibility criteria and their authors were contacted. Of these, the authors of 19 studies, encompassing 31 580 participants, did not respond. A total of 10 investigators responded, with 7 completing the agreement and sharing their IPD. However, 3 studies52-54 with a total of 17 439 participants did not complete the data-sharing process, leaving a total of 49 019 IPD not included. Additionally, 2 ongoing studies were identified through the PCOS experts, and these 2 additional studies contributed to more than one study from different populations. Taken together, the P-PUP study dataset included 12 medically unbiased, unselected community-based studies with 12 513 participants from China (n = 2; 4645 women),32,36 Iran (n = 3; 2517 women),37-39 Italy (n = 1; 519 women),40 Nigeria (n = 1; 440 women),46 Russia (n = 1; 2695 women),41 South Korea (n = 1; 499 women),42 Turkey (n = 1; 392 women),43 and the United States (n = 2; 806 women)44,45 (Figure 1). All studies were cross-sectional, with sample sizes ranging from 392 to 3000. Quality appraisal of included studies is presented in Table S2, with all included studies having a low risk of bias (high quality) (Table 1).
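The counts reported in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable names are chosen for this example.

```python
# Study selection flow, using the counts quoted in the paragraph above.
identified = 6507
duplicates = 1437
excluded_title_abstract = 4728
full_text_reviewed = identified - duplicates - excluded_title_abstract

full_text_exclusions = {
    "selected/referred population": 185,
    "conference abstracts and reviews": 72,
    "sample size <300": 24,
    "not directly assessed/self-report": 12,
    "not having PCOS/features": 11,
    "ineligible population with predefined comorbidity": 6,
    "same research group/data": 3,
}
eligible = full_text_reviewed - sum(full_text_exclusions.values())

# Participants whose IPD were not obtained (non-responders plus incomplete sharing).
ipd_not_included = 31580 + 17439

# Participants per country in the final P-PUP dataset.
participants_by_country = {
    "China": 4645, "Iran": 2517, "Italy": 519, "Nigeria": 440,
    "Russia": 2695, "South Korea": 499, "Turkey": 392, "United States": 806,
}

print(full_text_reviewed, eligible, ipd_not_included,
      sum(participants_by_country.values()))
# 342 29 49019 12513
```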

Dataset | Country | Sample size | Ethnicity | Ovulatory dysfunction definition | Clinical hyperandrogenism definition | Hyperandrogenemia definition | Assay method | PCOM definition | Diagnosis criteria | Quality score |
---|---|---|---|---|---|---|---|---|---|---|
Zhao et al. 201132 | China | 3000 | Asian Han Chinese | <8 cycles/year or cycle length >35 days | mFG ≥8 or acne | Above 95th centile; TT: 3.08 nmol/L, FAI: 6.74, A4: 17.67 nmol/L. | CLIA | ≥12 follicles | Rotterdam | 18 |
Zhuang et al. 201436 | China | 1645 | Asian Han Chinese | Cycle length ≥ 35 days or absent for >3 months | mFG ≥6 or acne | Two standard deviations above the normal (FT > 0.0111 nmol/L). | RIA | ≥12 follicles or ≥10 cm3 | Rotterdam | 18 |
Tehrani et al. 2011a37 | Iran | 1000 | White Iranian | Cycle length >35 days or P4 < 4 ng/mL | mFG ≥8, acne, or alopecia | Above 95th centile; TT: 3.09 nmol/L, FAI: 5.39, DHEAS: 4.85 µmol/L, A4: 10.12 nmol/L. | EIA | N/A | NIH 1990 | 19 |
Tehrani et al. 2011b38 | Iran | 915 | White Iranian | Cycle length >35 days | mFG ≥8 | Above 95th centile; TT: 3.05 nmol/L, FAI: 5.47, DHEAS: 6.67 µmol/L, A4: 8.03 nmol/L. | EIA | ≥12 follicles or ≥10 cm3 | Rotterdam | 19 |
Tehrani et al. 201439 | Iran | 602 | White Iranian | Amenorrhea, cycle length <21 days or >35 days | mFG ≥8 | Above 95th centile; TT: 3.09 nmol/L, FAI: 5.39, DHEAS: 4.85 µmol/L, A4: 10.12 nmol/L. | EIA | ≥12 follicles or ≥10 cm3 | Rotterdam | 19 |
Gambineri et al. 201340 | Italy | 519 | White Italian | <8 cycles/year | mFG ≥8 or alopecia | TT > 1.57 nmol/L | LC-MS/MS | N/A | Rotterdam | 18 |
Makwe et al. 202346 | Nigeria | 440 | Black African | Cycle length <26 days or >35 days | mFG ≥6 | N/A | N/A | ≥20 follicles or ≥10 cm3 | Rotterdam | 20 |
Suturina et al. 202241 | Russia | 1134 | White Russian, Asian Russian, and Mixed Russian | <8 cycles/year or cycle length < 21 days or >35 days | mFG ≥3 for healthy controls | Above 98th centile; TT: 2.56 nmol/L (White Russians) & 1.42 nmol/L (Asian Russians and Mixed Russians), FAI: 6.9 (White Russians) & 2.9 (Asian Russians and Mixed Russians), DHEAS: 9.62 µmol/L. | LC-MS/MS & CLIA | ≥12 follicles or ≥10 cm3 | Rotterdam | 17 |
Kim et al. 201142 | South Korea | 499 | Asian Korean | <8 cycles/year or cycle length >35 or amenorrhea | mFG ≥6 | Above 95th centile; TT: 2.36 nmol/L, FAI: 5.36. | RIA | ≥12 follicles or ≥10 cm3 | Rotterdam | 17 |
Yildiz et al. 201243 | Turkey | 392 | Turkish | Cycle length ≤23 days or ≥ 35 days or P4 < 5 ng/mL | mFG ≥6 | Above 95th centile; TT: 1.9 nmol/L, FAI: 4.94, DHEAS: 8.83 µmol/L, A4: 10.37 nmol/L. | CLIA, RIA | ≥12 follicles or ≥10 cm3 | Rotterdam | 17 |
Knochenhauer et al. 199845 | USA | 383 | White Americans and Black Americans | ≤8 cycles/year or cycle >35 days | mFG ≥6 | Above 95th centile; TT: 2.94 nmol/L, FT: 0.026 nmol/L, DHEAS: 6.64 µmol/L, A4: 8.73 nmol/L. | RIA, RIA after extraction | N/A | NIH 1990 | 17 |
Azziz et al. 200444 | USA | 388 | White Americans and Black Americans | ≤8 cycles/year or cycle length <26 or >35 days or day 22-24 P4 < 4 ng/mL. | mFG ≥6 | Above 95th centile; TT: 2.94 nmol/L, FT: 0.026 nmol/L, DHEAS: 6.64 µmol/L, A4: 8.73 nmol/L. | RIA, RIA after extraction | N/A | NIH 1990 | 17 |
A4, androstenedione; FAI, free androgen index; FT, free testosterone; mFG score, modified Ferriman–Gallwey score; TT, total testosterone; DHEAS, dehydroepiandrosterone sulfate; P4, progesterone; CLIA, chemiluminescence immunoassay; EIA, enzyme immunoassay; LC-MS/MS, liquid chromatography-tandem mass spectrometry; RIA, radioimmunoassay; N/A, data not available. The reported laboratory values were converted to SI units using the following conversion factors: for A4, 3.49 from ng/mL to nmol/L and 0.00349 from pg/mL to nmol/L; for TT, 3.47 from ng/mL to nmol/L and 0.0347 from ng/dL to nmol/L; for FT, 0.003467 from pg/mL to nmol/L and 0.03467 from ng/dL to nmol/L; for DHEAS, 0.00271 from ng/mL to µmol/L and 0.0271 from µg/dL to µmol/L. Quality was assessed using the AXIS tool and rated as high (15-20), medium (8-14), or low (1-7).
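The conversion factors listed in the table footnote can be applied as a simple lookup. The sketch below is illustrative only: the dictionary layout, the function name `to_si`, and the ASCII unit label `ug/dL` (standing in for µg/dL) are assumptions made for this example, while the factors themselves are those quoted in the footnote.

```python
# (analyte, reported unit) -> multiplicative factor to SI units.
# A4, TT, and FT convert to nmol/L; DHEAS converts to umol/L.
CONVERSION_FACTORS = {
    ("A4", "ng/mL"): 3.49,
    ("A4", "pg/mL"): 0.00349,
    ("TT", "ng/mL"): 3.47,
    ("TT", "ng/dL"): 0.0347,
    ("FT", "pg/mL"): 0.003467,
    ("FT", "ng/dL"): 0.03467,
    ("DHEAS", "ng/mL"): 0.00271,
    ("DHEAS", "ug/dL"): 0.0271,
}


def to_si(analyte: str, value: float, unit: str) -> float:
    """Convert a reported laboratory value to SI units using the footnote factors."""
    try:
        factor = CONVERSION_FACTORS[(analyte, unit)]
    except KeyError:
        raise ValueError(f"no conversion factor for {analyte} in {unit}") from None
    return value * factor


# Example with a made-up total testosterone value of 50 ng/dL.
tt_nmol_per_l = to_si("TT", 50, "ng/dL")
print(round(tt_nmol_per_l, 3))  # 1.735
```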
Participant selection and baseline characteristics
Of 12 513 participants across 12 studies, 1793 were excluded due to age <18 years (n = 609), age >45 years (n = 1178), or missing data on age (n = 6). Another 891 reproductive-age women were excluded due to not providing informed consent (n = 3), missing mFG scores (n = 70), missing ethnicity (n = 111), or infrequent ethnicities (n = 47). Additionally, women who experienced natural/surgical menopause (n = 70), were currently pregnant or lactating (n = 39), had elevated FSH (n = 99), prolactin (n = 38), 17-OHP (n = 4), or TSH levels (n = 356), or had ovarian pathology (n = 48) were excluded. A final sample of 9829 participants was included in the analysis (Figure 2). The median (interquartile range, IQR) age of the participants was 31 years (25-37 years), with a mean BMI of 23.6 ± 5.2 kg/m2 (range, 12.5-67.0 kg/m2). The median (IQR) mFG score overall was 1 (0-3).

Distribution and cutoffs of mFG scores by ethnicity
In all studies, hirsutism was assessed using mFG visual scale scores by trained healthcare professionals, including gynecologists,32,36 general practitioners,37 midwives,38 staff from medical universities,39 physicians,40,41 a gynecologic endocrinologist,42 a single physician with repeat assessment by another,43 nurses with reexamination by physicians,44,45 and medically trained research assistants with reevaluation by physicians.46 The dataset encompassed 11 diverse ethnic groups across 8 countries and revealed variation in the distribution of mFG scores. Notably, White Iranian, White Italian, and Black African populations had higher median mFG scores than other ethnic groups. Compared with the Asian Han Chinese reference group, median mFG scores differed significantly in every ethnic group (P < 0.001) except Black Americans (P = 0.131) (Table 2).
Ethnic group | n (%) | mFG score (mean ± SD) | mFG score median (IQR) | P-valuea |
---|---|---|---|---|
Asian (Han) Chinese | 4133 (42.1) | 1.8 ± 2.6 | 1 (0-2) | Reference |
Asian Korean | 483 (4.9) | 1.5 ± 2.5 | 0 (0-2) | <0.001 |
Asian Russian | 347 (3.5) | 0.8 ± 1.9 | 0 (0-1) | <0.001 |
Black African | 405 (4.1) | 2.4 ± 2.7 | 2 (1-3) | <0.001 |
Black American | 383 (3.9) | 1.7 ± 2.6 | 1 (0-2) | 0.131 |
Mixed Russian | 112 (1.1) | 1.2 ± 2.0 | 0 (0-1) | <0.001b |
White American | 320 (3.3) | 1.2 ± 2.1 | 0 (0-1) | <0.001 |
White Iranian | 2234 (22.7) | 4.6 ± 4.5 | 3 (1-5) | <0.001 |
White Italian | 181 (1.9) | 5.3 ± 3.3 | 5 (3-7) | <0.001b |
White Russian | 844 (8.6) | 1.3 ± 2.3 | 0 (0-2) | <0.001 |
Turkish | 387 (3.9) | 2.4 ± 2.5 | 2 (0-3) | <0.001 |
Total population | 9829 (100) | 2.4 ± 3.4 | 1 (0-3) | <0.001 |
Ethnic group | n (%) | mFG score (mean ± SD) | mFG score, median (IQR) | P-value^a |
---|---|---|---|---|
Asian (Han) Chinese | 4133 (42.1) | 1.8 ± 2.6 | 1 (0-2) | Reference |
Asian Korean | 483 (4.9) | 1.5 ± 2.5 | 0 (0-2) | <0.001 |
Asian Russian | 347 (3.5) | 0.8 ± 1.9 | 0 (0-1) | <0.001 |
Black African | 405 (4.1) | 2.4 ± 2.7 | 2 (1-3) | <0.001 |
Black American | 383 (3.9) | 1.7 ± 2.6 | 1 (0-2) | 0.131 |
Mixed Russian | 112 (1.1) | 1.2 ± 2.0 | 0 (0-1) | <0.001^b |
White American | 320 (3.3) | 1.2 ± 2.1 | 0 (0-1) | <0.001 |
White Iranian | 2234 (22.7) | 4.6 ± 4.5 | 3 (1-5) | <0.001 |
White Italian | 181 (1.9) | 5.3 ± 3.3 | 5 (3-7) | <0.001^b |
White Russian | 844 (8.6) | 1.3 ± 2.3 | 0 (0-2) | <0.001 |
Turkish | 387 (3.9) | 2.4 ± 2.5 | 2 (0-3) | <0.001 |
Total population | 9829 (100) | 2.4 ± 3.4 | 1 (0-3) | <0.001 |
^a Kruskal–Wallis test for median mFG score, adjusted for both age and body mass index (BMI). ^b Small sample size. IQR, interquartile range; SD, standard deviation.
Modified Ferriman–Gallwey (mFG) cutoffs by k-means cluster analysis and corresponding centiles by ethnicity
Our model used the mFG score as the input for clustering. The number of clusters was determined with the average silhouette method (Figure S1), which yielded an average silhouette coefficient of 0.75, indicating good clustering. The mFG cutoff scores varied across ethnic groups, ranging from 4 to 8. White Iranians had the highest cutoff of 8 (82nd percentile), followed by White Italians and Black Africans with a cutoff of 7 (78th and 94th percentiles, respectively). Asian Han Chinese, White Russians, Turkish, and Black Americans shared a cutoff of 5 (90th, 94th, 89th, and 91st percentiles, respectively). Notably, White Americans, Asian Koreans, Asian Russians, and Mixed Russians shared a lower cutoff of 4, with corresponding percentiles ranging from 87th to 94th (Table 3). When all ethnicities were combined, an mFG score of ≥7 was considered abnormal.
Table 3. Modified Ferriman–Gallwey cutoffs generated using k-means clustering and corresponding centiles by ethnicity.
Ethnic group | mFG score cutoff | Corresponding cutoff percentile | Prevalence of hirsutism, % (95% CI) |
---|---|---|---|
Asian (Han) Chinese | 5 | 88th | 12.3 (11.3-13.4) |
Asian Korean | 4 | 85th | 15.3 (12.2-18.8) |
Asian Russian | 4 | 94th | 6.0 (3.8-9.1) |
Black African | 7 | 94th | 6.9 (4.6-9.8) |
Black American | 5 | 91st | 11.5 (8.5-15.1) |
Mixed Russian | 4 | 90th | 13.4 (7.7-21.1)^a |
White American | 4 | 91st | 11.9 (8.5-15.9) |
White Iranian | 8 | 82nd | 21.1 (19.4-22.8) |
White Italian | 7 | 75th | 34.3 (27.4-41.7)^a |
White Russian | 5 | 94th | 8.3 (6.5-10.4) |
Turkish | 5 | 89th | 16.8 (13.2-20.9) |
All ethnicities (combined) | 7 | 91st | 10.0 (9.4-10.6) |
^a Small sample size. CI, confidence interval. The prevalence of hirsutism was determined as the proportion of participants with an mFG score at or above (≥) the cutoff listed for each ethnicity.
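The clustering step behind these cutoffs can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes two clusters (hirsute and non-hirsute) instead of choosing the number of clusters by the silhouette method, it uses invented scores, and it takes the cutoff to be the lowest score assigned to the higher cluster. The function names are hypothetical.

```python
from statistics import mean


def kmeans_1d_two_clusters(scores, max_iter=100):
    """Lloyd's algorithm for k = 2 on 1-D data; returns (low_center, high_center)."""
    low, high = float(min(scores)), float(max(scores))
    if low == high:
        raise ValueError("need at least two distinct scores")
    for _ in range(max_iter):
        # In one dimension, the nearest-center rule is a split at the midpoint.
        boundary = (low + high) / 2
        new_low = mean(s for s in scores if s <= boundary)
        new_high = mean(s for s in scores if s > boundary)
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving: converged
        low, high = new_low, new_high
    return low, high


def cluster_cutoff(scores):
    """Lowest score that falls in the higher (assumed hirsute) cluster."""
    low, high = kmeans_1d_two_clusters(scores)
    boundary = (low + high) / 2
    return min(s for s in scores if s > boundary)


def percent_below(scores, cutoff):
    """Percentage of participants scoring below the cutoff."""
    return 100 * sum(s < cutoff for s in scores) / len(scores)


# Invented mFG scores for 100 hypothetical participants (not study data).
example_scores = [0] * 50 + [1] * 20 + [2] * 12 + [3] * 6 + [6] * 5 + [8] * 4 + [10] * 3
cutoff = cluster_cutoff(example_scores)
print(cutoff, percent_below(example_scores, cutoff))  # prints: 6 88.0
```

In this toy sample the cluster-derived cutoff of 6 sits at the 88th percentile, which shows how such a cutoff need not coincide with a conventional 90th or 95th percentile.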
The mFG score for hirsutism varied in both cutoffs and prevalence. A Kruskal–Wallis test adjusted for age and BMI showed significant differences in mFG cutoffs across ethnicities (P < 0.001) (Figure 3). Based on the ethnicity-specific mFG cutoff scores, the prevalence of hirsutism ranged from 6.0% in Asian Russians to 34.3% in White Italians and was 14.2% across all ethnicities.
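As a rough illustration of how a prevalence estimate and its 95% confidence interval can be obtained from counts, the sketch below uses the Wilson score interval. This is an assumption: the article does not state which interval method was used, so results may differ slightly from the published intervals. The counts (1397 hirsute of 9829 participants) are those reported in this article; the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95% coverage)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("require n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Overall prevalence using ethnicity-specific cutoffs: 1397 of 9829 participants.
lower, upper = wilson_interval(1397, 9829)
print(f"{100 * 1397 / 9829:.1f}% ({100 * lower:.1f}-{100 * upper:.1f})")
```

The Wilson interval is chosen here only because it behaves well for small groups and proportions near 0; an exact (Clopper-Pearson) interval would be an equally reasonable choice.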

Figure 3. Modified Ferriman–Gallwey cutoff values for defining hirsutism using k-means clustering by ethnicity.
We also explored the association of age, BMI, and OCP use with hirsutism using a mixed-effects logistic regression model. Each one-year increase in age was associated with approximately 5.5% lower odds of hirsutism (P < 0.001), whereas each one-unit increase in BMI was associated with approximately 5.2% higher odds (P < 0.001). OCP use was not significantly associated with hirsutism as measured by mFG score (Table 4). Sensitivity analysis showed that excluding women on active OCPs had no impact on the cutoff for most ethnicities (Table S3), with a minor impact on the mFG score cutoff for White Russians (from 5 to 4) and Mixed Russians (from 4 to 3).
Parameter | OR | s.e. | 95% CI | P-value |
---|---|---|---|---|
Intercept | 0.27 | 0.065 | 0.16-0.43 | <0.001 |
Age | 0.94 | 0.005 | 0.94-0.95 | <0.001 |
BMI | 1.05 | 0.007 | 1.04-1.07 | <0.001 |
OCP use | 0.80 | 0.118 | 0.59-1.07 | 0.125 |
BMI, body mass index; OCP, oral contraceptive pill; OR, odds ratio; s.e., standard error; CI, confidence interval.
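The link between an odds ratio and a percentage change in odds can be sketched as follows. The helper name is hypothetical, and the inputs are the rounded odds ratios from the table above, so the results (about -6% per year of age and +5% per BMI unit) only approximate the 5.5% and 5.2% quoted in the text, which presumably derive from unrounded estimates.

```python
def percent_change_in_odds(odds_ratio, units=1.0):
    """Percentage change in odds for a `units` increase in the predictor.

    Odds ratios act multiplicatively, so a change of several units
    compounds as odds_ratio ** units.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return (odds_ratio**units - 1.0) * 100.0


print(round(percent_change_in_odds(0.94), 1))           # age, per year
print(round(percent_change_in_odds(1.05), 1))           # BMI, per 1 kg/m2
print(round(percent_change_in_odds(1.05, units=5), 1))  # BMI, per 5 kg/m2
```

Because the effect compounds, a 5-unit increase in BMI corresponds to more than five times the one-unit percentage change.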
Clinical and biochemical features; comparison between hirsute and non-hirsute groups
Based on k-means clustering within each center, participants were categorized into hirsute and non-hirsute groups. Of the 9829 participants, 1397 (14.2%) were classified in the hirsute cluster, while the remaining 8432 (85.8%) were categorized in the non-hirsute cluster. The median age was significantly lower in the hirsute group (28 years) vs the non-hirsute group (31 years), and the mean BMI and waist circumference were significantly higher in the hirsute group (24.1 kg/m2 and 77.4 cm) vs the non-hirsute group (23.5 kg/m2 and 76.3 cm). Based on the population cutoff by k-means clustering at the center level, those who were hirsute had a higher prevalence of PCOS, PCOM, ovulatory dysfunction, and hyperandrogenemia than those who were non-hirsute (Table 5, Table S4-S14, Figure S2).
Table 5. Distribution of clinical features in groups with and without hirsutism among all ethnicities (n = 9829).
Features | Hirsute (n = 1397) | Non-hirsute (n = 8432) | P-value |
---|---|---|---|
Age, years | 28.0 (23.0-34.0) | 31.0 (25.0-37.0) | <0.001 |
BMI, kg/m2 | 24.1 ± 5.6 | 23.5 ± 5.1 | <0.001 |
WHR | 0.80 ± 0.07 | 0.80 ± 0.08 | 0.010 |
Waist circumference, cm | 77.4 ± 12.7 | 76.3 ± 11.4 | 0.001 |
TT, nmol/L | 1.76 ± 0.95 | 1.53 ± 0.99 | <0.001 |
FT, nmol/L | 0.008 (0.006-0.011) | 0.007 (0.005-0.009) | <0.001 |
FAI | 2.95 (1.80-4.65) | 2.05 (1.14-3.57) | <0.001 |
A4, nmol/L | 6.96 ± 4.99 | 7.15 ± 4.45 | 0.293 |
DHEAS, µmol/L | 4.40 (2.82-6.30) | 4.07 (2.45-5.76) | <0.001 |
PCOS, yes | 454 (32.8) | 646 (7.9) | <0.001 |
PCOS, no | 930 (67.2) | 7564 (92.1) | |
PCOM, yes | 301 (28.2) | 1430 (21.4) | <0.001 |
PCOM, no | 765 (71.8) | 5253 (78.6) | |
Ovulatory dysfunction, yes | 366 (26.2) | 1361 (16.1) | <0.001 |
Ovulatory dysfunction, no | 1031 (73.8) | 7065 (83.9) | |
Hyperandrogenemia, yes | 283 (27.0) | 774 (15.4) | <0.001 |
Hyperandrogenemia, no | 764 (73.0) | 4250 (84.6) | |
BMI, body mass index; PCOS, polycystic ovary syndrome; PCOM, polycystic ovary morphology; WHR, waist-to-hip ratio; A4, androstenedione; FAI, free androgen index; FT, free testosterone; TT, total testosterone; DHEAS, dehydroepiandrosterone sulfate. Values are mean ± standard deviation (compared by t-test), median (interquartile range) for skewed data (compared by Mann–Whitney U-test), or n (%) for categorical data (compared by chi-square test).
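The categorical comparisons in this table can be illustrated with a short Python sketch of a Pearson chi-square test on a 2×2 table, applied to the PCOS counts above. It assumes no continuity correction, which the article does not specify, and the function name is hypothetical; it is meant as a sketch of the general technique and not a reproduction of the authors' analysis.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom, the upper-tail
    probability of the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return statistic, erfc(sqrt(statistic / 2))


# PCOS status by group: hirsute (454 yes, 930 no), non-hirsute (646 yes, 7564 no).
statistic, p_value = chi_square_2x2(454, 930, 646, 7564)
print(f"PCOS in hirsute group: {100 * 454 / (454 + 930):.1f}%")
print(f"PCOS in non-hirsute group: {100 * 646 / (646 + 7564):.1f}%")
print(f"chi-square = {statistic:.1f}, P < 0.001: {p_value < 0.001}")
```

The group denominators here (1384 and 8210) are smaller than the group sizes in the column headers because PCOS status was not available for every participant.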
Discussion
Hirsutism is the most widely used diagnostic criterion for defining clinical hyperandrogenism in PCOS and other androgen excess disorders. The mFG visual scale, a modification of the method originally proposed by Ferriman and Gallwey, is the most commonly used method for clinically assessing hirsutism by evaluating terminal hair growth on the face and body in a male-like pattern.8 The current International PCOS Guidelines suggest an mFG score of 4-6 to diagnose hirsutism, while emphasizing the importance of considering ethnicity in the assessment. Here we addressed the controversy over mFG cutoff scores in hirsutism diagnosis in the first IPD study in medically unbiased, unselected community-based populations of women aged 18-45 years, defining cutoffs and showing variation in cutoffs and prevalence across ethnicities from 8 countries on 4 continents.
Using k-means cluster analysis, we observed significant variation in mFG cutoff scores across ethnicities, ranging from 4 to 8, with White (Caucasian) Iranians the highest at 8, corresponding to the 82nd centile. White (Caucasian) Italians and Black Africans had an mFG cutoff score of 7, with corresponding centiles of 78th and 94th, respectively. Asian Han Chinese, White (Caucasian) Russians, Turkish, and Black Americans shared an mFG cutoff score of 5, with corresponding percentiles of 90th, 94th, 89th, and 91st, respectively. Asian Koreans, Asian (Buryat) Russians, and White (Caucasian) Americans shared an mFG cutoff score of 4, with corresponding percentiles of 87th, 94th, and 91st, respectively. This novel approach groups participants naturally, so that those in the same cluster are very similar to each other and dissimilar to those in the other cluster, and the boundary between clusters defines the cutoff.29 A few prior studies have assessed mFG scores by healthcare professionals in unselected premenopausal women, suggesting an mFG cutoff of >3 in White and Black women,55 ≥5 in Asian Han Chinese,32 and ≥1 in Caucasian adolescent populations.31 These cluster analysis studies were conducted in single centers and required validation in large multi-ethnic cohorts. Our study provides this validation by defining hirsutism using cluster analysis on IPD mFG score data gathered from 8 countries worldwide.
In addition to cluster analysis, other approaches to establishing cutoffs for a condition such as hirsutism include associations with clinical outcomes, as in diabetes, where cutoff glucose levels align with vascular outcomes. Alternatively, arbitrary percentiles can be set, as has been the case for mFG scores in hirsutism, eg, the 90th, 95th, or 98th percentiles. This latter approach has been used to date to define hirsutism and is unlikely to represent the true natural cutoff for excess male-like terminal face and body hair growth56 because it assumes, without foundation, that the population prevalence of hirsutism is 10%, 5%, or 2%, respectively. The current study showed that the percentiles corresponding to natural cluster cutoffs do not align with the conventional arbitrary 95th percentile, yet such percentiles have been used to generate mFG cutoff scores including ≥6 in Koreans,42 ≥8 in Black and White Americans,55 Spanish Caucasians,57 and Hispanic Mexicans,58 and ≥9 for Turkish women.59
Indeed, Ferriman himself noted the error of defining hirsutism based on arbitrary percentiles. In the original study, Ferriman and his resident Gallwey reported that among a selected population of 161 consecutive women aged 18-38 years (presumably mostly Anglo-Saxon white) attending a general medical outpatient clinic, “[a]n ‘hormonal’ score above 5 was found in 9.9% (16 women), and above 7 in 4.3% (7 women), but scores above 10 were found in only 1.2% (2 women)”.8 Nonetheless, Ferriman went on to define women as “hirsute when body hair scores exceeded 4 and non-hirsute when less”.27 Unfortunately, subsequent observers did not fully grasp the original Ferriman and Gallwey study and incorrectly promulgated the view that an mFG score of ≥8, a value reflecting the 95th percentile of Ferriman and Gallwey's original population, defined hirsutism. This is a historical error that this study highlights and, we hope, can dispel. Overall, this IPD study significantly advances knowledge on hirsutism cutoffs by using clustering to define natural cut points in data from unselected, community-based populations.
The use of medically biased populations has significantly influenced the PCOS phenotypes observed and reported, by potentially introducing referral bias.60,61 To address these limitations, the Androgen Excess & PCOS Society recommended that researchers determine the “natural” cutoff values in large, unselected populations using cluster analysis.29 Our study addressed this gap using participant-level data from medically unbiased, unselected populations with diverse ethnicities across 8 countries. This approach avoided the referral bias often seen in studies of women with PCOS and other forms of androgen excess, allowing the identification of the “natural” cutoff values for the mFG score and likely leading to a more accurate diagnosis of hirsutism. Future studies should extend this work across broader world regions to further substantiate and generalize our findings.
This study has several strengths. First, it leverages one of the largest participant pools assembled for the study of hirsutism and androgen excess and uses IPD to optimize consistent variable definitions across studies, minimizing heterogeneity and enhancing the comparability of results. Second, by encompassing diverse ethnicities around the world, it directly addresses the critical issue of ethnic variations in mFG scores. Third, the inclusion of unselected medically unbiased population- or community-based cohorts minimizes selection and referral bias, a common pitfall of clinic-based studies. Fourth, the cutoff values are derived from cluster analysis, which approximates the “natural” cutoff in the populations. Finally, the observed high prevalence of predictors like PCOS, PCOM, ovulatory dysfunction, and hyperandrogenemia in hirsute populations aligns with established biological plausibility.
Limitations include that the mFG visual scale, while widely used, is inherently subjective and prone to inter- and intra-observer variability.62 However, all mFG scores in this study were assessed by trained healthcare professionals, including nurses, midwives, general practitioners, gynecologists, and reproductive endocrinologists, using the standardized mFG visual scale.9 Furthermore, we mitigated potential subjectivity-related heterogeneity via appropriate statistical methods.
Our findings demonstrate that the mFG score distribution and the cutoff defined by k-means clustering vary between ethnicities, emphasizing the need to consider ethnicity in the diagnosis of hirsutism in women. We also recognize that defining ethnicity is challenging, especially in diverse and multi-ethnic populations in areas of high immigration and complex mixed ancestries.63 There is no gold standard for measuring ethnicity; however, self-reported ethnicity data can be used, as was the case here.64,65
We relied on the primary authors’ definitions of biochemical hyperandrogenemia. Two studies used liquid chromatography-tandem mass spectrometry (LC-MS/MS),40,41 whereas all others relied on immunoassays, which are less specific and sensitive. Seven studies assessed all participants,36,38-41,43,45 3 studies37,42,44 assessed selected subgroups, and the remaining study collected blood samples from a random half of the population.32 Definitions of hyperandrogenemia varied, with studies relying on first-line tests (total testosterone [TT], free testosterone [FT], or free androgen index [FAI]) or second-line tests (androstenedione [A4] and dehydroepiandrosterone sulfate [DHEAS]). The absence of comprehensive androgen data for the entire population across studies, along with varying assay methods and cutoffs, may have biased prevalence estimates across ethnicities. Nonetheless, we found significantly higher androgen levels in the hirsute population.
In conclusion, this is the first IPD study to explore the normative mFG score cutoff for defining hirsutism in a large, medically unbiased, unselected multi-ethnic population of reproductive-age women across the world using cluster analysis. It represents a significant paradigm shift in our understanding of hirsutism, androgen excess, and PCOS diagnosis. Notably, women classified as hirsute based on these cluster-derived cutoffs also exhibited a higher prevalence of PCOS, PCOM, ovulatory dysfunction, and hyperandrogenemia. The findings also present a nuanced picture, with prevalence and cutoffs varying by ethnicity, suggesting that clinicians and researchers should consider the ethnic context for optimal diagnostic accuracy and personalized interventions. These data also largely confirm the current recommendation of the 2023 International PCOS Guidelines to define hirsutism as an mFG score between 4 and 6 for the majority of populations studied, with a few exceptions.
Acknowledgments
We extend our gratitude to the participants of the original studies, without whom this research would not have been possible.
Supplementary material
Supplementary material is available in European Journal of Endocrinology online.
Funding
This study was funded by the National Health and Medical Research Council Australia (NHMRC) via the Centre for Research Excellence for Women Health in Reproductive Life (CRE-WHiRL) #1171592. A.D.B. is supported by a PhD scholarship funded by the Monash Graduate Scholarship (MGS) and the Monash International Tuition Scholarship (MITS). A.E.J. is funded by NHMRC CRE-WHiRL fellowship. C.T.T. is supported by CRE-WHiRL funded by the NHMRC. C.M.M. received grants from the University of Lagos, Central Research Grant CRC/03/2018, Ferring Pharmaceuticals Grant #2021/2944, and the Foundation for Research and Education Excellence. H.J.T. is funded by an NHMRC Fellowship #2009326.
Authors’ contributions
Asmamaw Bizuneh (Conceptualization [lead], Data curation [lead], Formal analysis [lead], Methodology [lead], Software [lead], Validation [lead], Visualization [lead], Writing – original draft [lead], Writing – review & editing [lead]), Anju Joham (Conceptualization [equal], Formal analysis [equal], Methodology [equal], Project administration [equal], Supervision [equal], Writing – original draft [equal], Writing – review & editing [equal]), Chau Thien Tay (Methodology [equal], Supervision [equal], Writing – original draft [equal], Writing – review & editing [equal]), Sylvia Kiconco (Conceptualization [equal], Resources [equal]), Arul Earnest (Methodology [equal], Supervision [equal], Writing – review & editing [equal]), Raja Dhungana (Methodology [equal]), Larisa V. Suturina (Writing – review & editing [equal]), Xiaomiao Zhao (Writing – review & editing [equal]), Alessandra Gambineri (Writing – review & editing [equal]), Fahimeh Ramezani Tehrani (Writing – review & editing [equal]), Bulent O Yildiz (Writing – review & editing [equal]), Jin-Ju Kim (Writing – review & editing [equal]), Liangzhi Xu (Writing – review & editing [equal]), Christian Makwe (Writing – review & editing [equal]), Helena Teede (Conceptualization [equal], Methodology [equal], Project administration [equal], Writing – original draft [equal], Writing – review & editing [equal]), and Ricardo Azziz (Conceptualization [equal], Methodology [equal], Writing – original draft [equal], Writing – review & editing [equal]).
Data availability
Individual participant data from each contributing center have been provided to the P-PUP collaboration with the understanding that they would be used solely for the purpose of the P-PUP study and would not be released to others. Request for such data should be directed to the data custodians of each respective study. For the P-PUP study protocol, see https://pubmed.ncbi.nlm.nih.gov/34829300/.
References
Author notes
Helena J Teede and Ricardo Azziz are joint senior authors.
Conflict of interest: AEJ serves as a Board member for Androgen Excess and Polycystic Ovary Syndrome, received honoraria for presentations at educational events at Novo Nordisk, and is a recipient of Freestyle Libre sensors. CTT serves as Chair of CRE-WHiRL ECR Group 2020-2023 and AEPCOS EC-SIG group 2020-2022, as a Committee member of the Endocrine Society of Australia's ECR Group, and is an employee of Monash Health. AG serves on the Editorial Board for the European Journal of Endocrinology and on the Advisory Board of the Italian Society of Endocrinology. RA serves as a consultant to May Health, Core Access Surgical Technologies, Spruce Biosciences, and Postera; is an investor in Martin Imaging; received honoraria for speaking from the Davidson-Mestman course, Merck, and Stya Paul Oration; and serves on the Editorial Board for the Journal of Clinical Endocrinology and Metabolism, on the Board of Trustees for the Endocrine Society, as a member of the DSMB for grant NCT03625531; ChiCTR180001730 to The First Affiliated of Guangzhou Medical University and grant for the SUPER study to the University of Michigan, and previously served as CEO of the American Society for Reproductive Medicine. All the other authors declare that they have no conflicts of interest.