Figure 1.
Performance of genotype calling using the naive genotype caller method and BCFtools multiallelic caller with beta-distributed errors in the estimation of the quality scores, by read depth. Colors indicate different variances in the beta distribution (0 and 10⁻⁵, respectively). The line types indicate the Li GL model (1) and McKenna GL model (2) (for details, see the Supplementary Material, Section 1.4). The genotype calling error rates (y-axis) and call rates (x-axis) are defined in the main text. The average per-site read depth is indicated in the top left corner of each plot. The curves are obtained by varying the GQ threshold for genotype calling. The vertical line segments below 0 on the y-axis denote the minimum GQ threshold of 20, and the horizontal line segments after the final call rate on the x-axis denote the final error rate of each group. The data are from 20 replicates of 100 diploid individuals simulated using msprime, resulting in 328 230 variable sites per simulation replicate (for details, see the Supplementary Material, Section 2). The BCFtools multiallelic caller is used for each population separately, and the prior parameter is disabled.

