We conducted three sets of pairwise comparisons: human versus chimpanzee, human versus mouse, and mouse versus rat. Figure 4 shows the distributions (smoothed histograms) of the posterior means and the MLEs of t and ω in those comparisons. In the human–chimpanzee comparison, the Bayesian ω estimates are shifted slightly to the right relative to the MLEs for low ω values and to the left for high ω values. The mean, median, and 25th and 75th percentiles of the Bayesian estimates are 0.369, 0.320, and (0.180, 0.500), whereas those of the MLEs are 0.307, 0.193, and (0.062, 0.411) (table 3). The human and chimpanzee genes are very similar, and the patterns resemble those observed in the computer simulation for low t values. Moreover, writing t̂ and ω̂ for the MLEs, there are 377 and 2,507 gene alignments in which t̂ = 0 and ω̂ = 0, respectively, as well as 2 and 423 alignments in which t̂ = ∞ and ω̂ = ∞, respectively. The Bayesian method does not produce any such extreme estimates. The number of genes in which the ω estimate is >1 is 1,121 for ML and 299 for the Bayesian method (table 4). The discrepancy results from two effects, a short evolutionary distance and a short sequence length, both of which mean that the data carry little information and the prior has greater influence. Genes with ω̂ > 1 tend to be short (median sequence length 313 codons, compared with 454 codons for all genes).

For example, one of the 1,121 genes with ω̂ > 1 has ω̂ = 1.22 (95% confidence interval [CI] 0.37–4.01) and a posterior mean of ω of 0.93 (95% credibility interval [CI] 0.36–2.43). This gene is 262 codons long and has a small evolutionary distance, with t̂ = 0.043 (95% CI 0.024–0.077) and a posterior mean of t of 0.047 (95% CI 0.027–0.082), so the prior has an impact. Another gene has ω̂ = 1.27 (95% CI 0.75–2.16) and a posterior mean of ω of 1.13 (95% CI 0.60–2.13). This gene is 257 codons long, and the ML and Bayesian distance estimates are 0.17 (95% CI 0.13–0.24) and 0.18 (95% CI 0.13–0.24), respectively. The second gene is similar in length to the first, but because the sequence distance is greater, the prior is much less important. In a third gene, 1,019 codons long, the MLEs are t̂ = 0.041 (95% CI 0.030–0.056) and ω̂ = 1.27 (95% CI 0.77–2.07), compared with Bayesian posterior means of 0.042 (95% CI 0.031–0.057) for t and 1.13 (95% CI 0.59–2.14) for ω. In this case the prior has little effect because the gene is long.
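The ML and Bayesian counts of genes with an ω estimate above 1 follow from the human–chimpanzee cross-classification in table 4. A minimal Python sketch of that arithmetic is given here; the dictionary layout and variable names are illustrative only, and the four counts are those reported in table 4.

```python
# Human–chimpanzee cross-classification of genes, as reported in table 4.
# Keys are (class of the ML estimate, class of the Bayesian estimate).
counts = {
    ("ml<1", "bayes<1"): 13094,
    ("ml<1", "bayes>1"): 0,
    ("ml>1", "bayes<1"): 822,
    ("ml>1", "bayes>1"): 299,
}

# Marginal totals: genes with an estimate above 1 under each method.
ml_gt1 = sum(n for (ml, _), n in counts.items() if ml == "ml>1")
bayes_gt1 = sum(n for (_, bayes), n in counts.items() if bayes == "bayes>1")
total = sum(counts.values())

print(ml_gt1, bayes_gt1, total)
```

The four cells sum to the 14,215 genes analyzed in this comparison. No gene has a Bayesian estimate above 1 together with an MLE below 1, whereas 822 genes with an MLE above 1 have a posterior mean below 1.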
Fig. 4. Distributions (smoothed histograms) of Bayesian and ML estimates of t and ω from mammalian and bacterial pairwise gene comparisons. Numbers of genes analyzed in each comparison are shown in the right part of the figure.

Table 3. Descriptive Statistics of Bayesian (top row) and ML (bottom row) Estimates of t and ω from Pairwise Comparisons of Protein-Coding Genes from Mammalian Species and Bacterial Strains.

| Comparison | No. of Genes | Method | ω Mean | ω SD | ω 25% | ω 50% | ω 75% | ω N0 | ω N∞ | t Mean | t SD | t 25% | t 50% | t 75% | t N0 | t N∞ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human–chimpanzee | 14,215 | Bayesian | 0.369 | 0.246 | 0.180 | 0.320 | 0.500 | 0 | 0 | 0.025 | 0.072 | 0.013 | 0.019 | 0.028 | 0 | 0 |
| | | ML | 0.307 | 0.418 | 0.062 | 0.193 | 0.411 | 2507 | 423 | 0.022 | 0.042 | 0.010 | 0.016 | 0.025 | 377 | 2 |
| Human–mouse | 14,624 | Bayesian | 0.130 | 0.125 | 0.044 | 0.093 | 0.176 | 0 | 0 | 0.812 | 0.574 | 0.503 | 0.691 | 0.958 | 0 | 0 |
| | | ML | 0.126 | 0.157 | 0.040 | 0.089 | 0.170 | 221 | 0 | 0.849 | 1.252 | 0.499 | 0.686 | 0.952 | 0 | 30 |
| Mouse–rat | 13,359 | Bayesian | 0.168 | 0.168 | 0.055 | 0.118 | 0.228 | 0 | 0 | 0.242 | 0.179 | 0.163 | 0.215 | 0.281 | 0 | 0 |
| | | ML | 0.159 | 0.180 | 0.046 | 0.108 | 0.215 | 509 | 0 | 0.238 | 0.232 | 0.161 | 0.212 | 0.278 | 0 | 3 |
| Escherichia coli K-12–E. coli O157 | 2,619 | Bayesian | 0.179 | 0.170 | 0.055 | 0.116 | 0.252 | 0 | 0 | 0.080 | 0.354 | 0.026 | 0.043 | 0.068 | 0 | 0 |
| | | ML | 0.099 | 0.174 | 0.001 | 0.034 | 0.110 | 912 | 31 | 0.073 | 0.527 | 0.020 | 0.038 | 0.064 | 1216† | † |
| E. coli K-12–Salmonella typhimurium LT2 | 2,619 | Bayesian | 0.037 | 0.042 | 0.016 | 0.025 | 0.042 | 0 | 0 | 2.261 | 1.546 | 1.153 | 1.836 | 3.129 | 0 | 0 |
| | | ML | 0.025 | 0.042 | 0.006 | 0.018 | 0.032 | 164 | 0 | 5.052 | 8.481 | 1.087 | 1.748 | 4.066 | 0 | 217 |

Note: The F61 model is used for codon frequencies. Results for ML have been calculated after removing the infinite estimates. N0 is the number of genes with the MLE t̂ = 0 or ω̂ = 0, whereas N∞ is the number of genes with the MLE t̂ = ∞ or ω̂ = ∞.
†The t counts N0 and N∞ for this row are run together as "1216" in the source and cannot be separated with certainty.
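The ML rows of table 3 are summarized after removing infinite estimates, with zero and infinite MLEs tallied separately as N0 and N∞. A minimal Python sketch of that kind of summary is given here. The function name, the example values, the use of the sample SD, and the quartile interpolation method are all assumptions made for illustration; the source does not state which conventions were used.

```python
import math
import statistics

def summarize_mles(estimates):
    """Count zero and infinite MLEs, then summarize the finite ones."""
    n_zero = sum(1 for x in estimates if x == 0.0)
    n_inf = sum(1 for x in estimates if math.isinf(x))
    # Infinite estimates are dropped before computing moments and quartiles;
    # zero estimates are kept.
    finite = [x for x in estimates if not math.isinf(x)]
    q1, median, q3 = statistics.quantiles(finite, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(finite),
        "sd": statistics.stdev(finite),  # sample SD (assumed convention)
        "quartiles": (q1, median, q3),
        "N0": n_zero,
        "Ninf": n_inf,
    }

# Hypothetical omega MLEs for seven genes (not data from the study).
omega_hat = [0.0, 0.05, 0.2, 0.4, math.inf, 1.3, 0.0]
summary = summarize_mles(omega_hat)
print(summary["N0"], summary["Ninf"], summary["mean"])
```

Dropping the infinite values keeps the mean and SD finite, but the summaries then describe only a subset of genes, which is one reason the N0 and N∞ columns are reported alongside them.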

Table 4. The Numbers of Genes with ω Estimate Greater or Less than 1 Using the Bayesian and ML Methods.

| Data | ML estimate | Bayesian ω < 1 | Bayesian ω > 1 | NL | NB |
|---|---|---|---|---|---|
| Human–chimpanzee | ω̂ < 1 | 13,094 | 0 | 78 | 3 |
| | ω̂ > 1 | 822 | 299 | | |
| Human–mouse | ω̂ < 1 | 14,617 | 0 | 2 | 2 |
| | ω̂ > 1 | 1 | 6 | | |
| Mouse–rat | ω̂ < 1 | 13,313 | 0 | 5 | 2 |
| | ω̂ > 1 | 10 | 36 | | |
| Escherichia coli K-12–E. coli O157 | ω̂ < 1 | 2,574 | 0 | 0 | 0 |
| | ω̂ > 1 | 43 | 2 | | |
| E. coli K-12–Salmonella typhimurium LT2 | ω̂ < 1 | 2,617 | 0 | 0 | 0 |
| | ω̂ > 1 | 2 | 0 | | |

Note: NL and NB are given once per comparison. NL is the number of genes with ω̂ > 1 that is statistically significant based on the LRT at the 5% level (one-sided with critical value 2.71) in the likelihood method, whereas NB is the number of genes with P(ω > 1 | x) > 0.95 in the Bayesian analysis.
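The critical value 2.71 quoted in the note is the upper 10% point of the χ² distribution with one degree of freedom, so halving the tail probability gives a one-sided test at the 5% level. A minimal Python sketch of that relationship, using only the standard library, is given here. The function names and the handling of estimates at or below 1 are assumptions made for illustration, not a description of the software used in the study.

```python
import math

def chi2_1_sf(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    # For 1 df, X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

def one_sided_p(lrt_statistic, omega_hat):
    """One-sided P value for the alternative omega > 1."""
    if omega_hat <= 1.0:
        # Assumption: an estimate at or below 1 gives no evidence for omega > 1.
        return 1.0
    return 0.5 * chi2_1_sf(lrt_statistic)

# P value exactly at the critical value, for a hypothetical estimate above 1.
p_at_critical = one_sided_p(2.71, omega_hat=1.5)
print(round(p_at_critical, 3))
```

Under this convention a gene is counted in NL when its MLE exceeds 1 and its LRT statistic exceeds 2.71.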
