Abstract

This perspective article offers a meditation on FST and other quantities developed by Sewall Wright to describe population structure, defined as any departure from reproduction through random union of gametes. Concepts related to the F-statistics draw from studies of the partitioning of variation, identity coefficients, and diversity measures. Relationships between the first two approaches have recently been clarified and unified. This essay addresses the third pillar of the discussion: Nei’s GST and related measures. A hierarchy of probabilities of identity-by-state provides a description of the relationships among levels of a structured population with respect to genetic diversity. Explicit expressions for the identity-by-state probabilities are determined for models of structured populations undergoing regular inbreeding and recurrent mutation. Levels of genetic diversity within and between subpopulations reflect mutation as well as migration. Accordingly, indices of population structure are inherently locus-specific, contrary to the intentions of Wright. Some implications of this locus-specificity are explored.

Introduction

Among the fundamental descriptors of genetic variation in structured populations are Wright’s hierarchical F-statistics. Here, population structure corresponds to any departure from random union of gametes, including regular inbreeding and geographical subdivision of the gamete pool. Notably, FST appears throughout the evolutionary literature (see accompanying virtual issue). As one might expect for such a widely used quantity, FST serves a broad array of uses, including as a measure of distance between groups.

Sewall Wright (1921, 1923) introduced the F-statistics at the very inception of the field of population genetics and worked to clarify their meaning through subsequent decades (e.g. Wright 1943, 1951, 1965, 1977). This arena of evolutionary theory featured a collision of multiple conceptual approaches, primary among them the partitioning of variance, probabilities of identity by descent (IBD), and diversity. A few quotes may serve to convey a flavor of the discussion.

F was … proposed as an inbreeding coefficient giving “the departure from the amount of homozygosis under random mating toward complete homozygosis” … It has been used since as a measure of such departure relative to a specified foundation stock, not necessarily random bred. (Wright 1951, p. 325)

FST or Θ was introduced as the correlation between uniting gametes relative to those across all subdivisions … The concept of “relative to” is not an easy one, and it has made the study of population structure difficult. (Weir 2012, p. 638)

… gene diversity is defined by using the gene frequencies at the present generation, so that no assumption is required about the pedigrees of individuals, selection, and migration in the past. (Nei 1977, p. 225)

Weir and Goudet (2017) have provided a masterful unification of the variance partitioning and IBD approaches, clarifying ideas and approaches and summarizing the massive literature devoted to the theoretical aspects alone. Addressed here is the third pillar of thought on the definition and interpretation of FST: the approach through genetic diversity, especially indices associated with Nei’s GST (Nei 1973, 1977).

Wright developed the F-statistics within the context of pedigrees extending few generations into the past relative to the age of segregating genetic variants. In this context, genes belonging to the same allelic class can be assumed to be identical by descent (IBD), meaning derived through an unbroken series of Watson–Crick replication events from a single gene held by an ancestor at the head of the pedigree. Here, identity by state (IBS) describes genes that belong to the same allelic class, with the definition of an allelic class highly context-dependent (see note on terminology at the end of this section).

Upon moving from his position at the United States Department of Agriculture as Senior Animal Husbandman to take a faculty position at the University of Chicago, Wright generalized pedigree-based approaches and concepts to encompass evolutionary biology (Crow 1994). Across evolutionary timescales, the observation of IBS between genes does not necessarily imply IBD, although as a practical matter, inferences about IBD have always derived from observations of IBS. Furthermore, depending on whether one’s notion of descent excludes mutation, the strong inference of IBD does not necessarily imply the observation of IBS.

This article begins with a review of an early study that provides some insight into the patterns that motivated Wright to develop the F-statistics and then rephrases Nei’s (1977) hierarchy of genetic diversity measures in terms of IBS probabilities. Explicit expressions for the IBS probabilities and GST*, Nei’s analog of FST, are determined for the island model with regular inbreeding and a two-deme model with asymmetric migration and coalescence rates. Beyond the substitution of probabilities of IBD by probabilities of IBS, the shift in perspective compels a consideration of the origin of observed genetic variation as well as its pattern.

An explicit statement regarding terminology may facilitate communication among the various components of the large literature in this area. Here, θ denotes the scaled mutation rate fundamental to population genetics:

while Θ denotes IBD probabilities or associations among genes (see Cockerham 1967, 1973). Furthermore, gene here refers to a set of nucleotides passed from parent to offspring, while allele is a shortened form of allelomorph, in the sense of Bateson and Saunders (1902). Alleles or allelic classes represent flavors of genes that might segregate in a population at a given location (locus) in the genome. For example, Mendel attributed the relative proportions of round and wrinkled peas observed in controlled crosses to segregation of the dominant R allele and recessive r allele. Allelic classes for major histocompatibility loci may reflect functional differences, with higher fitness associated with greater heterozygosity. For a single nucleotide site regarded as a locus, each kind of nucleotide segregating at the site in the population may be defined as an allelic class.

Population Structure in the Island Model

Wright (1943, pp. 124–126) characterized hierarchical measures of differentiation in a population subdivided into a number of “intermediate” groups, each of which comprises a number of breeding groups undergoing random mating. Using the expected variance in allele frequency as a basis for comparison, he expressed the relative excess diversity among demes in the form

(1)

(Wright 1943, equation (43)), for Fi and Ft measures of genetic association at the levels of the intermediate groups and the total population, respectively.

A description of the context in which Wright developed the hierarchical F-statistics may serve to clarify their meaning (see Chapter 16 of Wright 1978). Wright’s analysis of the pedigrees of British Shorthorn cattle registered in the Coates herd book illustrates the conceptual foundation and provides an example of a hierarchy in gene diversity.

Case Study

Wright (1923) studied the history of key lines of British Shorthorn cattle developed by brothers Charles and Robert Colling, who were among the earliest adopters of selective breeding techniques. Individual bovines bred by the Collings became widely admired celebrities, including Favourite, considered the greatest stud of its day, and its son Comet, the first bull to break the milestone of the 1,000 guinea selling price. Thomas Bates, another influential breeder, established a direct maternal line from a cow sired by Comet. Wright noted that the relatively low fertility of this highly prized Duchess line only increased market prices.

Figure 1 illustrates Wright’s application of the hierarchical F-statistics to the complex history of intense inbreeding in the British Shorthorns over the first 40 years of the Coates herd book (Wright 1923; McPhee and Wright 1925; Wright 1951). By convention in pedigree studies, members of the base population of 1780 are considered non-inbred and unrelated to one another, with FIT corresponding to the correlation between uniting gametes accumulated over the subsequent generations. For the celebrated stud Favourite, Wright (1923) computed a value of FIT of 0.192: it was nearly as inbred as the product of a parent-offspring mating (1/2² = 0.25). Thomas Bates’s Duchess line, descended from Favourite, shows even higher levels of inbreeding, with declines reflecting deliberate outcrossing.
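The comparison with a parent-offspring mating can be reproduced with Wright's path-counting formula for the inbreeding coefficient, F = Σ (1/2)^(n1+n2+1) (1 + FA), summed over paths through each common ancestor A. The following is a minimal sketch; the function name and the tuple layout for paths are hypothetical choices made for illustration and are not taken from Wright's analysis of the herd book.

```python
def inbreeding_coefficient(paths):
    """Wright's path-counting formula.

    Each path is a tuple (n_sire, n_dam, f_ancestor): the numbers of
    generations from the sire and from the dam back to a common ancestor,
    and the inbreeding coefficient of that ancestor.
    """
    return sum(0.5 ** (n1 + n2 + 1) * (1.0 + fa) for n1, n2, fa in paths)

# Parent-offspring mating: the sire is itself the common ancestor (n_sire = 0)
# and the dam is his daughter (n_dam = 1); the ancestor is assumed non-inbred.
parent_offspring = inbreeding_coefficient([(0, 1, 0.0)])

# Full-sib mating: two common ancestors, each one generation from both parents.
full_sib = inbreeding_coefficient([(1, 1, 0.0), (1, 1, 0.0)])
```

Under these assumptions, both matings give 1/4, the benchmark against which the value of 0.192 reported for Favourite is compared.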

Fig. 1. Hierarchical F-statistics for British Shorthorn pedigrees (Wright 1951). Reprinted with permission.

The line labeled FIS appears to correspond to

$$F_{IS} = \frac{F_{IT} - G_w}{1 - G_w} \tag{2}$$

in which FIT now represents the average correlation between uniting gametes of registered Shorthorn cattle and Gw the correlation between a pair of gametes randomly sampled from across the breed. Periods with positive values of FIS reflect the formation of offspring through the fusion of gametes more highly correlated than random gametes (FIT>Gw), and negative values indicate gametes less correlated than random (Gw>FIT).

Of particular interest is FST, represented by the dashed line in Fig. 1. Wright (1943, 1951, 1965, 1977) has described FST as the correlation between uniting gametes for hypothetical offspring that would be produced by the institution of a single generation of random mating. Among those offspring, FIS=0 and FIT=Gw (with Gw unchanged from its value in the parental generation), which implies

$$F_{ST} = G_w \tag{3}$$

This expression together with (2) implies the iconic partitioning

$$(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST}) \tag{4}$$

(e.g. Wright 1943, after his Equation (48)).

Note that this notion of FST is distinct from Wright’s observation regarding the relationship between FST and the Wahlund (1928) variance among allele frequencies. Wright (1969, p. 295) explains that in cases in which reproduction proceeds by random union of gametes in all demes and the demes are completely isolated from one another (absence of gene flow), FIT for the entire assemblage of non-communicating demes corresponds to the Wahlund variance. As FIS=0 within demes, the partitioning (4) implies that FST=FIT corresponds to the Wahlund variance as well. Nei (1965) provides an explicit demonstration. This scenario does not apply to the British Shorthorns, for which the population comprises interlocking webs of reproduction rather than discrete, isolated demes.

Hierarchical Structure

An enduring source of confusion surrounding the hierarchical F-statistics concerns which components are probabilities and which are correlations, whether they are scaled relative to an underlying demographic structure, and how that structure is defined. An explicit description of a hierarchy similar to that discussed by Nei (1977) may serve to clarify the discussion.

Hierarchy of IBS Probabilities

Let hi denote the probability of observing non-identity by state (non-IBS) between a pair of genes randomly sampled from level i of the hierarchy, without regard to the structure at any lower level. For level i, a measure of association analogous to the G-statistics of Nei (1977) corresponds to

$$G_i^* = 1 - \frac{h_{i-1}}{h_i} = \frac{h_i - h_{i-1}}{h_i} \tag{5}$$

At every level, the measure of association represents the probability of non-identity relative to the probability of non-identity on the next lower level. Rearrangement in the form of (1),

$$1 - G_i^* = \frac{h_{i-1}}{h_i},$$

emphasizes the analogy between the Gi* and Wright’s (1951) panmictic indices (complements of the F-statistics).

Extension across an arbitrary number (L) of levels of organization produces

$$\prod_{i=1}^{L} (1 - G_i^*) = \frac{h_0}{h_L} \tag{6}$$

Identifying the analog of (1 − FIT) with this product implies

$$G_{IT}^* = 1 - \frac{h_0}{h_L} \tag{7}$$

These expressions mirror those given by Nei (1977, see his (22)), but with non-IBS probabilities replacing average indices of gene diversity ($\bar{H}$).
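The telescoping of the product across levels can be checked numerically. The following minimal sketch assumes that the measure at level i takes the form Gi* = 1 − h(i−1)/hi, as the description of (5) suggests; the function names and the example non-IBS probabilities are hypothetical and serve only as an illustration.

```python
from math import isclose, prod

def g_star(h):
    """Return [G_1*, ..., G_L*] from non-IBS probabilities h = [h_0, ..., h_L].

    Assumes G_i* = 1 - h_{i-1} / h_i at every level of the hierarchy.
    """
    return [1.0 - h[i - 1] / h[i] for i in range(1, len(h))]

def overall_index(h):
    """One minus the product of (1 - G_i*) over all levels (analog of F_IT)."""
    return 1.0 - prod(1.0 - g for g in g_star(h))

# Hypothetical non-IBS probabilities: within individuals (h_0),
# within demes (h_1), and in the population as a whole (h_2).
h_example = [0.20, 0.25, 0.40]
g_levels = g_star(h_example)

# The product telescopes, so only the lowest and highest levels matter.
telescopes = isclose(overall_index(h_example), 1.0 - h_example[0] / h_example[-1])
```

With these illustrative values, the level-specific measures are 0.2 and 0.375, and the overall index is 0.5 regardless of the intermediate value h1.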

Wright’s Island Model

A fundamental starting point for the study of population structure, Wright’s island model postulates d demes, each with N diploid reproductives. Let f represent the probability of observing IBS between the pair of gametes that united to form a random individual. Similarly, gw denotes the probability of IBS between genes randomly sampled from distinct individuals residing in the same deme and gb the IBS probability between genes randomly sampled from distinct demes. While the notation evokes the Malécot coefficients (Malécot 1969; Jacquard 1974), f, gw, and gb here refer to IBS in an evolving population and not to IBD in a pedigree.

The base level of the hierarchy refers to a pair of gametes that united to form a random zygote. Those genes represent distinct allelic classes with probability

$$h_0 = 1 - f.$$

At the next level (demes comprising N reproductive zygotes), h1 refers to a gene pair sampled from a given deme without regard to the structuring at the base level (genes organized into zygotes):

At this level of the hierarchy (5),

$$G_1^* = 1 - \frac{h_0}{h_1} = 1 - \frac{1 - f}{h_1} \tag{8}$$

To large terms (>O(1/N)), G1* represents the level of IBS within individuals relative to their local deme (Wright 1951). It is the analog of FIS (2).

Proceeding to the next level, the non-IBS probability between a pair of genes sampled from the metapopulation without regard to structuring at lower levels (demes or zygotes) corresponds to

$$h_2 = \frac{1}{d}\,h_1 + \left(1 - \frac{1}{d}\right)(1 - g_b),$$

where 1/d is the probability that the gene pair derives from a single deme. At this level of the hierarchy (5),

$$G_2^* = 1 - \frac{h_1}{h_2} \tag{9}$$

This last expression is the analog of Nei’s reformulation of FST, later called Nei’s GST (equation (18) in Nei 1977).

Weir and Goudet (2017) explicitly acknowledge that their allele-based FST (βWT) departs from Wright’s formulation in its comparison of within-deme association to between-deme association (analogous to replacing h2 by (1 − gb)). They indicate that their formulation is simpler in that relative deme sizes need not be specified. Under the island model, which assumes uniform deme size, G2* is analogous to βWT in populations comprising many demes (d → ∞).

In general, a hierarchy might well include structures beyond those considered to this point. An analysis of the British Shorthorns, for example, might address relationships to other bovine breeds, other ungulates, or other mammals. The levels of organization of human variation in the analysis of molecular variance of Rosenberg et al. (2002) provide another example. Restricting consideration to level L=2, one might regard the product (6) as the analog of (1 − FIT):

$$(1 - G_1^*)(1 - G_2^*) = \frac{h_0}{h_2} \tag{10}$$

Under the identification of G1* with FIS and G2* with FST, this expression recovers the analog of Wright’s iconic partitioning (4):

$$(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST}).$$

Relative Scaling

In the case study of the Shorthorn cattle (Fig. 1), Wright describes FIT as the correlation between uniting gametes relative to the base population of 1780. Here, each Gi* measure (5) is explicitly defined relative to lower levels within the same population. In the extensive literature addressing the F-statistics, the nature of scaling or even whether all components are scaled has not always been clear.

Cockerham (1967, 1969, 1973) addressed relationships among the F-statistics, the partitioning of correlations, and probabilities of IBD:

I found in the treatment of a single hierarchy of subpopulations … that it was only necessary to define two correlations—simply, that between genes within individuals, F, and that between genes of different individuals in the same population, $\bar{\Theta}$. Another correlation, entirely a function of these two, is that between genes within individuals within subpopulations

These three correlations manipulate just as the F-statistics, and obviously F=FIT, f=FIS and $\bar{\Theta}$=FST for all intents and purposes. Another distinction is that F and $\bar{\Theta}$ have not been defined relative to some standard, although they become so in practice and FIT and FST are defined relative to a total. (Cockerham 1973, p. 681)

Chesser (1991, p. 439) indicates that Cockerham’s (1969, 1973) identification of FIT with f tacitly assumes the absence of IBS among genes at the highest level of the hierarchy. In the hierarchical analysis described here (7),

$$1 - \frac{h_0}{h_L} = f$$

only if genes at level L are constrained to be non-IBS (hL=1). In general, Equation (6) suggests that at all levels, the analogs of Nei’s G-statistics correspond to ratios of non-IBS probabilities, without constraints on their values.

To confirm Wright’s (1977) description of FST as the correlation between uniting gametes in hypothetical offspring produced by a single generation of random mating, I determine the components of the partitioning described in (10) for those offspring. Wright (1921, p. 119), as well as Malécot (1969) and Jacquard (1974), noted that random union of gametes implies that uniting gametes correspond to gametes randomly sampled from the parental gamete pool:

for the tilde indicating values for the hypothetical offspring. Substitution of this expression into (8) produces

and substitution into (10):

which reflects the assumption that changes of $\tilde{g}_w$ and $\tilde{g}_b$ from their counterparts a single generation earlier are O(1/N). These expressions confirm that the overall relationship among the hypothetical offspring corresponds to GST* before the production of those offspring:

IBS Probabilities in Subdivided Populations

This section presents explicit expressions for probabilities of IBS between pairs of genes within individuals (f), between individuals within demes (gw), and between demes (gb) under Wright’s (1943) island model with regular inbreeding and also a two-deme model shown by Weir and Goudet (2017) to yield negative indices of population structure.

Island Model with Regular Inbreeding

Under Wright’s island model with d demes, each with N diploid reproductives, a gene in a given deme descends in the immediately preceding generation from a gene in any other deme at backward migration rate m. All lineages undergo mutation at rate u per generation.

To facilitate the exploration of the IBS-based approach and the hierarchy of population structure it implies (section “Hierarchical Structure”), the model analyzed here also includes regular inbreeding (partial self-fertilization) and the K-allele Jukes–Cantor (1969) model of mutation as incorporated into Takahata’s (1983) analysis of Nei’s GST. Under the K-allele model, a mutational event in a lineage changes its allelic class to any of K − 1 alternative allelic classes with uniform probability. It provides a simple mechanism for generating IBS apart from IBD.

Reproduction by an individual begins with the production of an egg cell, which is fertilized either by another of its own gametes (with probability s) or by a gamete sampled from the gamete pool of the deme in which it resides (with the complement probability). Under the assumption that gamete production is not limiting, self-fertilization by an individual has no effect on its contribution to the local gamete pool. Accordingly, a pair of genes randomly sampled from distinct individuals residing in the same deme derive from the same reproductive in the parental generation with probability 1/N, for N the number of reproductive individuals, irrespective of the rate of selfing. The probabilities that a random offspring is uniparental (uniting gametes contributed by a single individual) or biparental (uniting gametes contributed by distinct individuals) correspond, respectively, to

(11)

Per-generation rates of mutation (u) and backward migration (m) are assumed to be of order 1/N, with selfing and outcrossing occurring on the much shorter timescale of generations:

(12)

Coalescence between a random pair of autosomal genes held by distinct individuals residing in the same deme requires that their ancestral lineages trace back to the same reproductive in the immediately preceding generation. Given parent-sharing, the lineages either derive from the same complement in the shared parent, with probability c, or descend from distinct complements, with probability (1 − c). Even though

$$c = \frac{1}{2},$$

these events are distinguished for clarity. Figure 2 depicts the ancestry of complements held by a single individual (filled dots), with open dots representing homologs held by ancestors. With probability (1 − σ), the shared parent of the two lineages is biparental, with the lineages descending from distinct individuals in the preceding generation (outcross). Otherwise, with probability σ, the shared parent is uniparental, with the lineages once again either coalescing (probability c) or remaining distinct (probability 1 − c) in the immediately preceding generation. Of the two absorbing states (outcross and coalescence), this process resolves to outcross with probability

$$\frac{1 - \sigma}{1 - \sigma(1 - c)} \tag{13}$$

and to coalescence with the complement probability. One of the many definitions of effective number (Ne) addresses the rate of coalescence (Ewens 1982; Crow and Denniston 1988). Using that a pair of genes sampled from distinct individuals residing in the same deme share a parent with probability 1/N, the rate of coalescence corresponds to

(14a)

This expression is consistent with Pollak’s (1987, p. 354) definition of effective number under partial selfing in populations without subdivision. The relative rate of coalescence corresponds to

(14b)

This expression indicates that regular inbreeding (s>0) increases the rate of coalescence (Δ>1) by reducing effective number (N>Ne).

Fig. 2. Lineages of complements (filled dots) borne by a single individual. Open dots depict homologs not known to be ancestors of the focal lineages.

In the model under consideration (12), resolution of the process depicted in Fig. 2 occurs virtually instantaneously relative to mutation and migration. In the absence of those events, lineages that trace back to a shared parent are not IBS only if they resolve to separation in distinct individuals (13) and the ancestral lineages are not IBS:

(15)

While gw here actually represents the IBS probability at least one generation prior to the point in time associated with f, we treat the probabilities as contemporary because changes in gw during the short interval required for resolution are of order 1/N. Expression (15) provides the component of the hierarchy identified with GIS (8):

(16)

This classical expression for FIS has been derived on numerous occasions, from models with mutation (Pollak 1987) and without mutation (Haldane 1924; Li 1976, Chapter 13).
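The resolution process of Fig. 2 can be sketched as a two-outcome absorbing chain. The minimal example below rests on assumptions that are not confirmed by the text as extracted: the probability of uniparental origin σ is taken to equal the selfing rate s, c is set to 1/2, and the per-generation rate of coalescence is written as 1/(2Ne). The function names are hypothetical.

```python
def outcross_probability(sigma, c=0.5):
    """Probability that two distinct complements held by one individual
    resolve to distinct individuals (outcross) rather than coalescing.

    Solves P = (1 - sigma) + sigma * (1 - c) * P.
    """
    return (1.0 - sigma) / (1.0 - sigma * (1.0 - c))

def f_is_analog(sigma, c=0.5):
    """Within-individual index 1 - (1 - f) / (1 - g_w), assuming that
    1 - f equals the outcross probability times (1 - g_w)."""
    return 1.0 - outcross_probability(sigma, c)

def effective_number(N, sigma, c=0.5):
    """Effective number from the rate of coalescence, assumed to be
    (1 / N) * P(coalescence | shared parent) = 1 / (2 * Ne)."""
    coalesce = c + (1.0 - c) * (1.0 - outcross_probability(sigma, c))
    return N / (2.0 * coalesce)

s = 0.5                                   # assumed selfing rate (sigma = s)
fis = f_is_analog(s)                      # classical s / (2 - s)
ne = effective_number(100, s)             # N * (1 - s / 2)
relative_rate = 100 / ne                  # N / Ne, exceeds 1 whenever s > 0
```

Under these assumptions the sketch recovers the classical value s/(2 − s) for the within-individual index and the reduction of effective number from N to N(1 − s/2).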

While the process portrayed in Fig. 2 resolves over the course of a few generations, IBS probabilities between genes sampled from the same deme (gw) or from distinct demes (gb) evolve on the much longer timescale determined by mutation, migration, and coalescence (12). Appendix A describes the derivation of the steady-state values of these quantities. Substitution of those expressions (A.2) into (9) produces

(17)

in which M denotes the scaled rate of backward migration and θ the scaled rate of mutation (A.1). Here, the relative rate of coalescence Δ (14b) captures the entire effect of regular inbreeding: the reduction in effective number within demes from N to N(1 − s/2). Furthermore, Equation (17) indicates that GST* declines uniformly with increases in rates of migration (M) and mutation (θ). This trend mirrors previous findings that migration and mutation both tend to reduce identity within demes (Spieth 1974; Slatkin 1982; Strobeck 1987).

Under the assumptions of random mating within demes (Δ=1), large deme number (d → ∞), the infinite-allele model of mutation (K → ∞), and the absence of mutation (θ=0), GST* (17) reduces to a very well-known expression:

(Wright 1951). Although the ultimate fate of the metapopulation is fixation on one of the alleles under any positive rate of migration among demes (M>0), Wright’s analyses did not incorporate mutation. In contrast, Takahata (1983) derived the expectation of Nei’s GST using the stationary distribution of allele frequencies obtained from a diffusion approximation that explicitly incorporates mutation. His results, which are consistent with (17), indicate that the expectation of Nei’s GST depends on the scaled rate of mutation (θ).
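The dependence of GST* on both migration and mutation can be explored numerically without the closed form (17). The sketch below is an assumption-laden illustration rather than the derivation of Appendix A: it treats each deme as a pool of 2N genes with random union of gametes (no selfing), uses exact one-generation recursions for the island model with K-allele mutation, approximates h1 by 1 − gw, and solves the resulting two-equation linear system for the steady state. All function names are hypothetical.

```python
def island_ibs(N, d, m, u, K):
    """Steady-state IBS probabilities (g_w, g_b) under an island model with
    d demes of 2N genes, backward migration rate m, and K-allele mutation."""
    q = 1.0 / (2 * N)                      # coalescence given same parental deme
    # Probability that two lineages trace to the same parental deme, for a
    # pair sampled within a deme (a) or from two distinct demes (b).
    a = (1 - m) ** 2 + m ** 2 / (d - 1)
    b = 2 * m * (1 - m) / (d - 1) + m ** 2 * (d - 2) / (d - 1) ** 2
    # IBS after one round of mutation, given IBS (alpha) or non-IBS (beta).
    alpha = (1 - u) ** 2 + u ** 2 / (K - 1)
    beta = 2 * u * (1 - u) / (K - 1) + u ** 2 * (K - 2) / (K - 1) ** 2
    lam = alpha - beta
    # Steady state of g = const + A g, solved by Cramer's rule.
    a11, a12 = lam * a * (1 - q), lam * (1 - a)
    a21, a22 = lam * b * (1 - q), lam * (1 - b)
    cw, cb = beta + lam * a * q, beta + lam * b * q
    det = (1 - a11) * (1 - a22) - a12 * a21
    g_w = (cw * (1 - a22) + a12 * cb) / det
    g_b = ((1 - a11) * cb + a21 * cw) / det
    return g_w, g_b

def g2_star(N, d, m, u, K):
    """Deme-level index 1 - h_1 / h_2, with h_1 approximated by 1 - g_w."""
    g_w, g_b = island_ibs(N, d, m, u, K)
    h1 = 1.0 - g_w
    h2 = h1 / d + (1.0 - 1.0 / d) * (1.0 - g_b)
    return 1.0 - h1 / h2

low_mutation = g2_star(N=100, d=10, m=0.01, u=1e-5, K=4)
high_mutation = g2_star(N=100, d=10, m=0.01, u=1e-4, K=4)
more_migration = g2_star(N=100, d=10, m=0.02, u=1e-4, K=4)
```

In this sketch, raising either the mutation rate or the migration rate lowers the index, in line with the trend described for (17). With many demes, many allelic classes, and a negligible mutation rate, the value approaches the familiar 1/(1 + 4Nm).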

Negative Indices of Population Structure

Weir and Goudet (2017) directly address the role of mutation in a two-deme model, allowing unequal migration rates and deme sizes. Their numerical analysis of the underlying recursions governing evolutionary change identifies parameter combinations that yield negative deme-specific FST. As noted earlier, Wright’s observation that FST reduces to the (non-negative) Wahlund variance applies to a metapopulation comprising demes among which no migration occurs (Wright 1969, p. 295).

Uyenoyama et al. (2019) obtained analytical expressions for steady-state IBS probabilities in a model similar to Weir and Goudet’s. Appendix B describes the approach and provides the IBS probabilities (B.1) for a pair of genes, both sampled from deme 0 (gw,0), both from deme 1 (gw,1), and one from each deme (gb). Those expressions confirm that all IBS measures evolve to unity in the absence of mutation (θ=0).

For Ni the number of reproductive individuals in deme i (i=0,1), N denotes the average number of reproductives within demes,

$$N = \frac{N_0 + N_1}{2},$$

and r denotes the proportion of reproductives that reside in deme 0:

$$r = \frac{N_0}{N_0 + N_1} \tag{18}$$

For effective number Ne defined as the rate of coalescence of a pair of lineages sampled from the same deme (compare (14a)):

irrespective of the relative sizes of the demes. In fact, this property holds for any number of demes (Appendix B). As in the definition of θ and M in the island model (A.1), the Mi correspond to the limits of the product of backward migration rates (mi) and N:

$$M_i = \lim_{N \to \infty} N m_i \tag{19}$$

The analysis presented here restricts consideration to rates of mutation and migration of the order of the inverse of the average deme size (12). In contrast, Fig. 2 of Weir and Goudet (2017) appears to indicate that much of the parameter space associated with negative deme-specific FST includes migration rates of greater order. In addition, note that the demes are labeled 0 and 1 in Uyenoyama et al. (2019) and 1 and 2 in Weir and Goudet (2017).

In terms of the notation used here, Weir and Goudet’s (2017) measure of population structure specific to deme 0 corresponds to

$$\beta_{WT}^{0} = \frac{g_{w,0} - g_b}{1 - g_b} \tag{20a}$$

where gw,0 is the IBS probability between a pair of genes sampled from deme 0 and gb is the IBS probability between a gene sampled from deme 0 and a gene sampled from deme 1. An analogous deme-specific measure based on GST* (9) might be defined as

(20b)

These expressions imply

While these measures agree only in the absence of population structure (gw,0=gb), both assume negative values only if the IBS probability between genes sampled from different demes exceeds that between genes sampled from deme 0 (gb>gw,0). In all cases in which the measures differ, the absolute value of $\beta_{WT}^{0}$ exceeds the absolute value of $G_{ST,0}^*$, irrespective of sign.

From (B.1), negativity of the deme-specific indices of population structure (20) requires that r (18), the proportion of reproductives that reside in deme 0, exceed

(21)

This qualitative behavior appears consistent with the trends depicted in Fig. 2 of Weir and Goudet (2017), in which most parameter combinations with negative FST include larger sizes for the specified deme (N1>N2). Negative indices can also arise for values of N1 smaller than N2, provided that backward migration rates in deme 2 are sufficiently large (m2>0.15). For N2=1,000, backward migration of this magnitude would suggest scaled migration rates (our Mi) in excess of 150, values which may be inconsistent with our assumption of rates of migration, mutation, and coalescence of similar orders of magnitude (12).

By analogy to soft selection (Wallace 1975), one might consider soft migration, under which migrant number (Mi) is relatively unconstrained by local deme size. In this case, rmin (21), the minimum relative size of deme 0 that implies non-positive GST,0*, declines uniformly with increasing M0. This trend signifies that the conditions for greater between-deme than within-deme similarity (gb>gw,0) become less stringent as the number of newly migrated genes increases. This behavior is qualitatively consistent with the negative slope of the red curve in Fig. 2 of Weir and Goudet (2017). That large local deme size and high numbers of migrants reduce within-deme identity (gw) relative to between-deme identity (gb) is consistent with previous studies of the island model (e.g. Slatkin 1982; Strobeck 1987).

Alternatively, hard migration might require equality between the numbers of new migrant genes in the two demes:

(22)

Imposition of this constraint on (21) indicates that both analogs of deme-specific FST (20) never take negative values (r>rmin is never satisfied).

Discussion

This analysis addresses the effect of the nature of the mutation process on Nei’s GST and related measures (6), providing expressions for the probability that a pair of genes share their allelic class. The shift in perspective from IBD to identity-by-state (IBS) obviates the need to specify an ancestral base population and to account for subsequent changes in allele frequency or other characteristics (Nei 1977). It also demands explicit characterization of the mutation process.

A rephrasing in terms of IBS probabilities of Nei’s (1977) hierarchy of diversity measures offers a resolution to the long-standing confusion surrounding the meaning of FST as a “relative” measure, as noted by Weir (2012) in the Introduction. The measure of association (5) between a pair of genes sampled at any level of the hierarchy corresponds to the non-IBS probability relative to the non-IBS probability at the level immediately below it.

Locus-Specific Effects on Population Structure

Wright intended that FST (denoted as F in the following quote) primarily reflect genome-wide effects of population structure:

It should be noted that if the coefficient F is used for the purpose for which it was originally introduced, the description of population structure, it cannot take cognizance of rates of mutation or selection since these are specific for each locus. (Wright 1952, p. 312)

Even so, the level of diversity observed segregating at a locus in natural populations generally reflects the rate of mutation, a quantity that is often locus-specific. Weir (2012) has made the important point that Wright’s FST is not a statistic but rather a parameter of the population. From this perspective, ensuring that patterns of variation at multiple loci contribute to estimates of the same quantity may require accounting for the mutation process.

Some authors have found the dependence of FST on the spectrum of allele frequencies in a sample of genes problematic (e.g. Jost 2008). One way to address this concern entails adopting an index of population structure that is less sensitive to allele frequencies. Slatkin’s FST (1991, distinguished here by an asterisk) is defined not in terms of allele frequencies, but rather the expected ages of the most recent common ancestor (MRCA) of randomly sampled pairs of genes:

$$F_{ST}^* = \frac{E[T] - E[T_s]}{E[T]} \tag{23}$$

where Ts is a random variable representing the age of the MRCA of a pair sampled from the same deme and T is the age of the MRCA of a pair sampled from the population at large without regard to population structure. Inferences based on empirical data still of course require the observation of genetic variation, but estimates can be obtained in cases in which genetic distance between genes increases linearly with time since their separation. Slatkin (1991) noted that FST* converges to Nei’s GST as the rate of mutation vanishes (u → 0).

Another approach entails directly confronting the origin and nature of the genetic variation that serves as the basis for inferences about population structure. In his analysis of Nei’s GST, Takahata (1983) derived expectations of diversity at the stationary distribution of allele frequencies implied by the mechanism of mutation and population structure. While accounting for mutation is less common in the context of pedigree analysis, its importance as the ultimate source of genetic variation has of course long been acknowledged. Malécot’s (1969) expression for the IBD probability in unstructured populations,

agrees with the expression from the Ewens Sampling Formula (ESF; Ewens 1972) for samples of size 2. Significantly, Thompson (2008) invoked the ESF, which depends on the scaled mutation rate, to characterize the base population from which IBD probabilities among four genes are determined. Pollak (1987) explicitly included mutation in his analysis of the F-statistics under partial selfing.
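The agreement for samples of size 2 can be checked directly. The sketch below implements the Ewens Sampling Formula in its standard form and evaluates it for a pair of genes, where the probability that both genes carry the same allele reduces to 1/(1 + θ). The scaling of θ (for example θ = 4Nu for a diploid population) is an assumption of this illustration, and the function name is hypothetical.

```python
from math import factorial

def esf_probability(counts, theta):
    """Ewens Sampling Formula.

    counts[j - 1] is the number of allelic classes represented exactly j times
    in the sample; theta is the scaled mutation rate.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    rising = 1.0
    for i in range(n):
        rising *= theta + i              # theta (theta + 1) ... (theta + n - 1)
    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob

theta = 0.8                               # illustrative value only
p_same = esf_probability([0, 1], theta)   # one allele observed twice
p_diff = esf_probability([2, 0], theta)   # two alleles observed once each
print(p_same, 1 / (1 + theta))
```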

Within the context of the hierarchy of IBS probabilities considered here (6), mutation as well as migration affects population structure by reducing identity within demes. Expressions for pairwise IBS probabilities at steady state under the island model (17) indicate lower levels of population structure in regions of the genome characterized by higher mutation rates (θ). This pattern is consistent with trends observed in empirical studies suggesting that high-diversity loci, including microsatellites, tend to yield lower values of FST than low-diversity loci, including biallelic single nucleotide polymorphisms (Jakobsson et al. 2013, and references therein). Furthermore, the nature of the process of mutation as well as its rate influences GST*. Under the simple K-allele mutation model, the effect of mutation increases as the number of possible allelic classes declines. For K=4, the magnitude of the term involving θ in GST* (17) increases by a factor of 4/3 over the infinite-alleles model.
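The factor of 4/3 quoted for K = 4 is consistent with a rescaling of the mutation term by K/(K − 1). The sketch below assumes that form, which is inferred from the stated value rather than taken from Equation (17). The reasoning is that under symmetric K-allele mutation each event destroys the identity of an IBS pair and restores identity to a non-IBS pair with probability 1/(K − 1), for a net weight of 1 + 1/(K − 1). The function name is hypothetical.

```python
from fractions import Fraction

def k_allele_factor(K):
    """Assumed inflation of the mutation term relative to infinite alleles."""
    if K < 2:
        raise ValueError("K must be at least 2")
    # Loss of identity (weight 1) plus back mutation into the same class.
    return 1 + Fraction(1, K - 1)

for K in (2, 4, 20, 1000):
    print(K, k_allele_factor(K))
```

Under this assumed form the factor declines toward 1 as K grows, recovering the infinite-alleles case, and is largest for a biallelic locus.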

FST as an Index of Distance

Among the broad array of empirical uses of FST, a prominent one is as an index of genetic distance between populations. For example, it forms the basis of a striking trend supporting the serial founder hypothesis for the origins of humanity (Ramachandran et al. 2005).

Also defined in terms of genetic diversity, Nei’s (1972) distance is closely related to FST. In the context of the two-deme model discussed here and in Weir and Goudet (2017), the analog of this measure in terms of IBS probabilities corresponds to

(24)

in which gb denotes the probability of IBS between a gene sampled from deme 0 and a gene sampled from deme 1 and gw,i is the IBS probability between a pair of genes sampled from distinct individuals in deme i (i=0,1). Other authors (see Slatkin 1982 and references therein) have also proposed expressions of the form gb/gw,i as measures of correlation among demes induced by migration.
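For illustration, the following sketch computes an index with the form of Nei’s (1972) standard distance, D = −ln(J_XY / sqrt(J_X J_Y)), with the gene identities replaced by the IBS probabilities defined above. That (24) takes exactly this form is an assumption of the sketch. The function name and the numerical values are hypothetical.

```python
import math

def nei_distance_analog(g_b, g_w0, g_w1):
    """Assumed analog of Nei's D: -ln(g_b / sqrt(g_w0 * g_w1))."""
    if min(g_b, g_w0, g_w1) <= 0:
        raise ValueError("IBS probabilities must be positive")
    return -math.log(g_b / math.sqrt(g_w0 * g_w1))

# Illustrative values: identity between demes is half that within demes.
print(nei_distance_analog(0.2, 0.4, 0.4))
```

Under this form the index is zero when identity between demes equals the geometric mean of the identities within demes, and it grows as between-deme identity declines.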

Although one might expect a measure of distance to satisfy the triangle inequality, Nei (1972) noted that his D does not necessarily have this property. Arbisser and Rosenberg (2020) showed that for a biallelic SNP that occurs at distinct frequencies in three subpopulations, FST always fails to satisfy the triangle inequality.
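A small numerical check illustrates the failure of the triangle inequality. The sketch assumes a Nei-type pairwise FST for two equally weighted populations at a biallelic locus, FST = (p − q)² / ((p + q)(2 − p − q)), in which p and q are the frequencies of one allele. The function names and the grid of frequencies are hypothetical.

```python
from itertools import combinations

def pairwise_fst(p, q):
    """Pairwise FST for two equally weighted populations; needs 0 < p + q < 2."""
    return (p - q) ** 2 / ((p + q) * (2 - p - q))

def violates_triangle(p1, p2, p3):
    """True if the largest pairwise FST exceeds the sum of the other two."""
    f = sorted(pairwise_fst(a, b) for a, b in combinations((p1, p2, p3), 2))
    return f[2] > f[0] + f[1]

freqs = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
all_fail = all(violates_triangle(*trio) for trio in combinations(freqs, 3))
print(violates_triangle(0.1, 0.5, 0.9), all_fail)
```

For frequencies 0.1, 0.5, and 0.9, the two adjacent comparisons each give about 0.19 while the extreme comparison gives 0.64, so the longest "side" exceeds the sum of the other two.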

Population- and locus-specific measures of FST may provide a means of identifying populations or genomic regions with unusual selective or demographic histories (Weir et al. 2005). Weir and Goudet (2017) have noted a growing literature on model-specific Bayesian methods designed to detect outlier values of FST, perhaps indicative of selection at particular loci. In this case, accounting for the mutation process becomes all the more important as the relative effects of migration and mutation may well vary across loci.

In a two-deme model with mutation explicitly represented, Weir and Goudet (2017) determined conditions under which deme-specific measures (20) can assume negative values, even in the absence of selection. Negativity reflects a higher probability of IBS in between-deme comparisons than in within-deme comparisons (gb > gw,0). Examination of the analytical expressions for the IBS probabilities obtained by Uyenoyama et al. (2019) for a similar two-deme model ((21), Appendix B) indicates that factors that tend to reduce gw,0 include large local deme sizes (large N0) and high numbers of newly arrived migrants (large M0). In addition, the nature of migration affects whether negative βWT0 and GST,0* can arise. Under conservative or hard migration, which requires equality between the numbers of new migrant genes in the two demes (22), the indices of population structure are always positive at steady state. Unlike deme-specific measures of FST (20), the analog of Nei’s distance (24) is always positive in this two-deme model.

Acknowledgments

I thank Molecular Biology and Evolution Editors-in-Chief Claudia Russo and Brandon Gaut for inviting this Perspective piece, and the reviewers for their comments. I am deeply indebted to Sudhir Kumar and all members of the Institute of Genomics and Evolutionary Medicine (iGEM) at Temple University for their support and gracious hospitality during the completion of this essay. I especially appreciate the opportunity to work in Masatoshi Nei’s former office. Public Health Service grant GM 37841 provided partial funding for this research.

Data availability

No new data were generated or analyzed in support of this research.

Appendix A. Wright’s Island Model with Inbreeding

Wright’s (1943) island model describes a population comprising d demes with identical characteristics. Hudson (1990) derived IBS probabilities for a pair of genes sampled from a single deme and from distinct demes. Presented here are modifications to incorporate partial selfing (Fig. 2) and the K-allele model of mutation.

As noted in section “Island Model with Regular Inbreeding”, the process depicted in Fig. 2 resolves on the timescale of generations, virtually instantaneously relative to the processes of mutation, migration, and coalescence. Accordingly, Equation (15) gives the relationship between h0 = (1 − f) and h1 = (1 − gw), and one needs to determine only the non-IBS probabilities between genes sampled from a single deme (1 − gw) or from distinct demes (1 − gb) to extend Hudson’s analysis.

In the genealogical history of a pair of genes, mutation occurs in at least 1 lineage at rate

at least 1 lineage undergoes backward migration at rate

and coalescence occurs between a pair of lineages residing in the same deme at rate

where Δ is the relative rate of coalescence (14b). Assuming independence among the events and exponentially distributed waiting times, the probability that the most recent event is mutation, for example, corresponds to

Ignoring terms of O(u²) or smaller and taking the limits

(A.1)

reduces this expression to

Similarly, the probability that the most recent event corresponds to migration is

and to coalescence,

The probability that a pair of genes sampled from distinct individuals residing in the same deme are IBS corresponds to

in which the prime indicates the state of the lineages just prior to the most recent evolutionary event. Coalescence between the pair terminates the genealogy in the MRCA. For backward migration as the most recent event, the ancestral state corresponds to an IBS pair of genes in distinct demes, which occurs with probability gb. For mutation as the most recent event, the ancestral lineages must reside in the local deme. Those lineages are IBS with probability gw and non-IBS with probability (1 − gw). While a single mutation in an IBS pair cannot generate the descendant state, a non-IBS pair can become IBS through a mutation that transfers one gene into the allelic class of the other, which occurs with probability 1/(K − 1), where K is the number of allelic classes.
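The first-event probabilities that enter this recursion follow from a standard property of independent exponential waiting times: the probability that a given event occurs first is its rate divided by the sum of all rates. The sketch below computes these ratios and checks them against a small simulation. The rate values are arbitrary placeholders rather than the scaled rates of (A.1), and the function names are hypothetical.

```python
import random

def first_event_probabilities(rates):
    """Probability that each event is the most recent one: rate / total rate."""
    total = sum(rates.values())
    return {event: rate / total for event, rate in rates.items()}

def simulate_first_event(rates, n_draws=20000, seed=1):
    """Monte Carlo check using independent exponential waiting times."""
    rng = random.Random(seed)
    wins = dict.fromkeys(rates, 0)
    for _ in range(n_draws):
        waits = {event: rng.expovariate(rate) for event, rate in rates.items()}
        wins[min(waits, key=waits.get)] += 1
    return {event: count / n_draws for event, count in wins.items()}

# Placeholder rates for a pair of lineages residing in the same deme.
rates = {"mutation": 0.5, "migration": 2.0, "coalescence": 1.0}
exact = first_event_probabilities(rates)
approx = simulate_first_event(rates)
print(exact)
print(approx)
```

For a pair of lineages in distinct demes, the same calculation applies with the coalescence entry removed, since coalescence can occur only between lineages residing in the same deme.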

For a gene pair sampled from distinct demes, the most recent evolutionary event cannot correspond to coalescence, which can occur only between lineages residing in the same deme. The probability that a pair of genes sampled from distinct demes are IBS corresponds to

A backward migration event that transfers one gene of the descendant pair into the deme of the other gene implies an ancestor comprising IBS genes in the same deme, which occurs with probability gw. Any other origin of the migrating lineage preserves the residence of the lineages in distinct demes.

Solution of these expressions at steady state (g′w = gw, g′b = gb) produces

(A.2a)
(A.2b)

in which

It is simple to show that this equilibrium state is locally attracting.

That both non-IBS probabilities (A.2) are proportional to the mutation parameter θ reflects that diversity at steady state requires mutation. In subdivided populations (d>1) reproducing through random union of gametes (s=1/N), the expressions obtained from (A.2) for IBS probabilities gw and gb, respectively, reduce to Takahata’s (1983) expected gene identities within (J0) and between (J1) demes, provided that the rightmost vector in his Equation (7) is replaced by

The absence of the four in the lower element appears to be a typographical error in that publication. A labeled coalescent argument (Karlin and McGregor 1972; Uyenoyama et al. 2019) also yields the non-IBS expressions (A.2).
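The verbal recursions of this appendix can be sketched numerically. The fragment below solves, at steady state, the pair of linear equations implied by the description: a within-deme pair is IBS if the most recent event is coalescence, or migration from an IBS between-deme pair, or a mutation that carries a non-IBS pair into the same allelic class; a between-deme pair is treated analogously without coalescence. The rates theta, M, and c are generic pairwise rates in a common time unit, not the specific scalings of (A.1) and (14b). The probability 1/(d − 1) that a migrant lineage originates in the deme of the other gene is an assumption appropriate to an island model. All names are hypothetical.

```python
def island_ibs(theta, M, c, d, K=None):
    """Steady-state IBS probabilities (g_w, g_b) within and between demes.

    theta, M, c: pairwise rates of mutation, backward migration, coalescence.
    d: number of demes (d >= 2); K: number of allelic classes (None = infinite).
    Requires M > 0 and c > 0.
    """
    back = 0.0 if K is None else 1.0 / (K - 1)  # mutation makes a non-IBS pair IBS
    # Pair in one deme: mutation, migration, and coalescence compete.
    total_w = theta + M + c
    pu, pm, pc = theta / total_w, M / total_w, c / total_w
    # Pair in distinct demes: only mutation and migration compete.
    total_b = theta + M
    qu, qm = theta / total_b, M / total_b
    r = 1.0 / (d - 1)  # assumed chance the migrant came from the other gene's deme
    # g_w = pc + pm * g_b + pu * back * (1 - g_w)
    # g_b = qu * back * (1 - g_b) + qm * (r * g_w + (1 - r) * g_b)
    a11, a12, b1 = 1 + pu * back, -pm, pc + pu * back
    a21, a22, b2 = -qm * r, 1 + qu * back - qm * (1 - r), qu * back
    det = a11 * a22 - a12 * a21
    g_w = (b1 * a22 - a12 * b2) / det
    g_b = (a11 * b2 - a21 * b1) / det
    return g_w, g_b

print(island_ibs(theta=0.5, M=2.0, c=1.0, d=5))
```

With these placeholder rates the within-deme and between-deme IBS probabilities are approximately 0.4 and 0.2. Setting theta to zero returns g_w = g_b = 1, in line with the observation that diversity at steady state requires mutation.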

Appendix B. Two-Deme Model

Uyenoyama et al. (2019) addressed the analog of the ESF (Ewens 1972) in a two-deme population with random mating within demes. Their recursion determines the allele frequency spectrum under the infinite-alleles model of mutation inductively, for progressively larger sample sizes. For samples of size 2, their Equation (15) provides the IBS probabilities for a pair of genes sampled within a given deme or a pair sampled from different demes.

In this two-deme model, deme i (i=0,1) comprises Ni diploid reproductives, with

(18) the proportion of reproductives residing in deme 0. Deme i has backward migration rate mi. In both demes, mutation occurs at rate u at the autosomal locus from which the sample derives. Migration and mutation occur at rates of order 1/N, where N is the average number of reproductives:

For a d-deme model, with

we retain the definition of N as the unweighted arithmetic mean of the number of reproductives across demes. Effective number Ne, defined as the rate of coalescence of a pair of lineages sampled from the same deme, corresponds to

This expression implies that irrespective of the number of demes or their relative sizes, the effective number again corresponds to N, the unweighted average number of reproductives across demes:

Accordingly, the relative coalescence rate (14b) corresponds to

reflecting random mating within demes.
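One way to see why the effective number can reduce to the unweighted mean deme size is sketched below. It assumes, as a reading of the argument rather than a statement of the omitted expressions, that a same-deme pair of lineages resides in deme i with probability proportional to Ni and then coalesces at rate 1/(2Ni), so that the average pairwise coalescence rate is 1/(2N). The function names and deme sizes are hypothetical.

```python
from fractions import Fraction

def pairwise_coalescence_rate(deme_sizes):
    """Average coalescence rate for a pair of lineages sharing a deme."""
    total = sum(deme_sizes)
    # Assumed weighting: deme i is chosen with probability N_i / total, and
    # two lineages in a deme of N_i diploids coalesce at rate 1 / (2 N_i).
    return sum(Fraction(n, total) * Fraction(1, 2 * n) for n in deme_sizes)

def effective_number(deme_sizes):
    """Number N_e for which the pairwise coalescence rate equals 1 / (2 N_e)."""
    return 1 / (2 * pairwise_coalescence_rate(deme_sizes))

for sizes in ([100, 300, 200], [50, 150], [10, 10, 10, 970]):
    print(sizes, effective_number(sizes), Fraction(sum(sizes), len(sizes)))
```

Under this weighting each deme contributes the same amount, 1/(2 × total), to the average rate, so the result depends only on the number of demes and their total size.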

For the two-deme case (d=2), Equation (15) of Uyenoyama et al. (2019) provides the IBS probabilities for a pair of genes sampled from deme 0 (gw,0), from deme 1 (gw,1), and from both demes (gb):

(B.1)

for

with θ given in (A.1) and the Mi in (19). As expected, the entire population becomes monomorphic (gw,0=gb=gw,1=1) in the absence of mutation (θ=0).

References

Arbisser IM, Rosenberg NA. FST and the triangle inequality for biallelic markers. Theor Pop Biol. 2020:133:117–129. https://doi.org/10.1016/j.tpb.2019.05.003.

Bateson W, Saunders ER. The facts of heredity in the light of Mendel’s discovery. Rep Evol Commun R Soc. 1902:1:125–160.

Chesser RK. Gene diversity and female philopatry. Genetics. 1991:127(2):437–447. https://doi.org/10.1093/genetics/127.2.437.

Cockerham CC. Group inbreeding and coancestry. Genetics. 1967:56(1):89–104. https://doi.org/10.1093/genetics/56.1.89.

Cockerham CC. Variance of gene frequencies. Evolution. 1969:23(1):72–84. https://doi.org/10.2307/2406485.

Cockerham CC. Analysis of gene frequencies. Genetics. 1973:74(4):679–700. https://doi.org/10.1093/genetics/74.4.679.

Crow JF. Sewall Wright 1889–1988. In: Biographical memoirs. Vol. 64. Washington (DC): The National Academies Press; 1994. p. 439–469.

Crow JF, Denniston C. Inbreeding and variance effective population numbers. Evolution. 1988:42(3):482–495. https://doi.org/10.2307/2409033.

Ewens WJ. The sampling theory of selectively neutral alleles. Theor Pop Biol. 1972:3(1):87–112. https://doi.org/10.1016/0040-5809(72)90035-4.

Ewens WJ. On the concept of effective population size. Theor Pop Biol. 1982:21(3):373–378. https://doi.org/10.1016/0040-5809(82)90024-7.

Haldane JBS. A mathematical theory of natural and artificial selection. Part II: the influence of partial self-fertilisation, inbreeding, assortative mating, and selective fertilization on the composition of Mendelian populations, and on natural selection. Biol Rev. 1924:1(3):158–163. https://doi.org/10.1111/j.1469-185X.1924.tb00546.x.

Hudson RR. Gene genealogies and the coalescent process. In: Futuyma D, Antonovics J, editors. Oxford surveys in evolutionary biology. Vol. 7. New York: Oxford University Press; 1990. p. 1–44.

Jacquard A. The genetic structure of populations. English translation by D. Charlesworth and B. Charlesworth of Malécot G. 1970. Structures Génétiques des Populations. Paris: Masson & Cie. New York: Springer-Verlag; 1974.

Jakobsson M, Edge MD, Rosenberg NA. Genotype, haplotype and copy-number variation in worldwide human populations. Genetics. 2013:193(2):515–528. https://doi.org/10.1534/genetics.112.144758.

Jost L. GST and its relatives do not measure differentiation. Mol Ecol. 2008:17(18):4015–4026. https://doi.org/10.1111/mec.2008.17.issue-18.

Jukes TH, Cantor CR. Evolution of protein molecules. In: Munro HN, editor. Mammalian protein metabolism. New York: Academic Press; 1969. p. 240–252.

Karlin S, McGregor J. Addendum to a paper of W. Ewens. Theor Pop Biol. 1972:3(1):113–116. https://doi.org/10.1016/0040-5809(72)90036-6.

Li CC. First course in population genetics. Pacific Grove (CA): Boxwood Press; 1976.

Malécot G. The mathematics of heredity. English translation by D. M. Yermanos of Malécot G. 1948. Les Mathématiques de l’Hérédité. Paris: Masson. San Francisco (CA): W. H. Freeman & Co.; 1969.

McPhee HC, Wright S. Mendelian analysis of the pure breeds of livestock. III. The shorthorns. J Hered. 1925:15(6):205–215. https://doi.org/10.1093/oxfordjournals.jhered.a102593.

Nei M. Variation and covariation of gene frequencies in subdivided populations. Evolution. 1965:19(2):256–258. https://doi.org/10.2307/2406379.

Nei M. Genetic distance between populations. Am Nat. 1972:106(949):283–292. https://doi.org/10.1086/282771.

Nei M. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA. 1973:70(12):3321–3323. https://doi.org/10.1073/pnas.70.12.3321.

Nei M. F-statistics and analysis of gene diversity in subdivided populations. Ann Hum Genet. 1977:41(2):225–233. https://doi.org/10.1111/ahg.1977.41.issue-2.

Pollak E. On the theory of partially inbreeding finite populations. I. Partial selfing. Genetics. 1987:117(2):353–360. https://doi.org/10.1093/genetics/117.2.353.

Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci USA. 2005:102(44):15942–15947. https://doi.org/10.1073/pnas.0507611102.

Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. Genetic structure of human populations. Science. 2002:298(5602):2381–2385. https://doi.org/10.1126/science.1078311.

Russo CAM, Eyre-Walker A, Katz LA, Gaut BS. Forty years of inferential methods in the journals of the society for molecular biology and evolution. Mol Biol Evol. 2024:41(1):msad264. https://doi.org/10.1093/molbev/msad264.

Slatkin M. Testing neutrality in subdivided populations. Genetics. 1982:100(3):533–645. https://doi.org/10.1093/genetics/100.3.533.

Slatkin M. Inbreeding coefficients and coalescence times. Genet Res. 1991:58(2):167–175. https://doi.org/10.1017/S0016672300029827.

Spieth PT. Gene flow and genetic differentiation. Genetics. 1974:78(3):961–965. https://doi.org/10.1093/genetics/78.3.961.

Strobeck C. Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics. 1987:117(1):149–155. https://doi.org/10.1093/genetics/117.1.149.

Takahata N. Gene identity and genetic differentiation of populations in the finite island model. Genetics. 1983:104(3):497–512. https://doi.org/10.1093/genetics/104.3.497.

Thompson EA. The IBD process along four chromosomes. Theor Pop Biol. 2008:73(3):369–373. https://doi.org/10.1016/j.tpb.2007.11.011.

Uyenoyama MK, Takebayashi N, Kumagai S. Inductive determination of allele frequency spectrum probabilities in structured populations. Theor Pop Biol. 2019:129:148–159. https://doi.org/10.1016/j.tpb.2018.10.004.

Wahlund S. Zusammensetzung von population und korrelationserscheinung vom standpunkt der vererbungslehre aus betrachtet. Hereditas. 1928:11(1):65–106. https://doi.org/10.1111/j.1601-5223.1928.tb02483.x.

Wallace B. Hard and soft selection revisited. Evolution. 1975:29(3):465–473. https://doi.org/10.2307/2407259.

Weir BS. Estimating F-statistics: a historical view. Philos Sci. 2012:79(5):637–643. https://doi.org/10.1086/667904.

Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005:15(11):1468–1476. https://doi.org/10.1101/gr.4398405.

Weir BS, Goudet J. A unified characterization of population structure and relatedness. Genetics. 2017:206(4):2085–2103. https://doi.org/10.1534/genetics.116.198424.

Wright S. Systems of mating I, II, III, IV, V. Genetics. 1921:6(2):111–178. https://doi.org/10.1093/genetics/6.2.111.

Wright S. Mendelian analysis of the pure breeds of livestock. II. The duchess family of shorthorns as bred by Thomas Bates. J Hered. 1923:14(9):379–422. https://doi.org/10.1093/oxfordjournals.jhered.a102373.

Wright S. Isolation by distance. Genetics. 1943:28(2):114–138. https://doi.org/10.1093/genetics/28.2.114.

Wright S. The genetical structure of populations. Ann Eugen. 1951:15(1):323–354. https://doi.org/10.1111/j.1469-1809.1949.tb02451.x.

Wright S. The theoretical variance within and among subdivisions of a population that is in a steady state. Genetics. 1952:37(3):312–321. https://doi.org/10.1093/genetics/37.3.312.

Wright S. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution. 1965:19(3):395–420. https://doi.org/10.1111/j.1558-5646.1965.tb01731.x.

Wright S. Evolution and the genetics of populations. Vol. 2. The theory of gene frequencies. Chicago (IL): Chicago University Press; 1969.

Wright S. Evolution and the genetics of populations. Vol. 3. Experimental results and evolutionary deductions. Chicago (IL): Chicago University Press; 1977.

Wright S. The relation of livestock breeding to theories of evolution. J Anim Sci. 1978:46(5):1192–1200. https://doi.org/10.2527/jas1978.4651192x.

Author notes

This perspective is part of a series of articles celebrating 40 years since our journal, Molecular Biology and Evolution, was founded (Russo et al. 2024). The perspective is accompanied by virtual issues, a selection of papers on measuring population structure published by Genome Biology and Evolution and Molecular Biology and Evolution.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Claudia Russo