Aaron B A Shafer, Joshua M Miller, Marty Kardos, Cross-Species Application of SNP Chips is Not Suitable for Identifying Runs of Homozygosity, Journal of Heredity, Volume 107, Issue 2, March 2016, Pages 193–195, https://doi.org/10.1093/jhered/esv137
Abstract
Cross-species application of single-nucleotide polymorphism (SNP) chips is a valid, relatively cost-effective alternative to the high-throughput sequencing methods generally required to obtain a genome-wide sampling of polymorphisms. Kharzinova et al. (2015) examined the applicability of SNP chips developed in domestic bovids (cattle and sheep) to a semi-wild cervid (reindeer). The ancestors of bovids and cervids diverged between 20 and 30 million years ago (Hassanin and Douzery 2003; Bibi et al. 2013). Empirical work has shown that for a SNP chip developed in a bovid and applied to a cervid species, approximately 50% genotype success with 1% of the loci being polymorphic is expected (Miller et al. 2012). The genotyping of Kharzinova et al. (2015) follows this pattern. However, these data are not appropriate for identifying runs of homozygosity (ROH) and can be problematic for estimating linkage disequilibrium (LD), and we caution readers in this regard.
Inbreeding (mating between relatives) results in parts of the genome that are identical-by-descent (IBD) in the offspring. More specifically, IBD segments of the genome occur when identical copies of a chromosome segment, originating from a common ancestor of the parents, are transmitted to an offspring. Importantly, IBD is manifested in ROH, and inbreeding between individuals with a more recent shared ancestor translates to longer ROH (Kardos et al. 2015). Domestic cattle have a long history of selective breeding programs, and the rate of inbreeding has generally been increasing (USDA 2006). Screening a variety of cattle breeds with the bovineSNP50 chip, Ferenčaković et al. (2013) found ROH to vary between 2.36 and 4.01 Mb. Using the same SNP chip, but with fewer than 1500 polymorphic SNPs, Kharzinova et al. (2015) documented values around 15 Mb (with some over 30 Mb). This is despite the fact that there is no strong evidence of inbreeding in reindeer (Roed 1985; Holand et al. 2007).
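The link between the recency of the shared ancestor and ROH length can be made concrete with a standard approximation that is not stated in the text above and is offered here only as a sketch. It assumes that an IBD segment inherited through g generations on each side of the pedigree (2g meioses in total) is broken up by recombination at a constant rate, so that its genetic length is approximately exponentially distributed:

```latex
% Assumption (not from the article): segment length \ell, in Morgans,
% after 2g meioses is approximately exponential with rate 2g.
\ell \sim \mathrm{Exp}(2g), \qquad
\mathbb{E}[\ell] = \frac{1}{2g}\ \text{Morgans} = \frac{100}{2g}\ \text{cM}
```

Under this approximation, a common ancestor 5 generations back gives an expected segment of about 10 cM, while one 50 generations back gives about 1 cM, which is why older inbreeding leaves only short ROH that require dense markers to detect.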
It is difficult, if not impossible, to reliably detect ROH using a small number of polymorphic loci. To demonstrate this, we tested the effect of the number of polymorphic loci on the detection of ROH with data simulated using the approach of Kardos et al. (2015). We simulated a single population with an effective population size (Ne) of 75 diploid individuals and an immigration rate of m = 1/75 (1 immigrant per generation on average). The simulated genomes included 20 equally sized autosomes, a genetic map length of 3600 cM, and a total physical length of 3 Gb. We simulated 100 000 polymorphic SNPs (mean expected heterozygosity = 0.30). We compared the length distribution of the true IBD chromosome segments to the lengths of ROH detected with PLINK. We applied the same PLINK settings as in Kharzinova et al. (2015) using 25K, 50K, and 100K SNPs.
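To give a sense of scale for these marker sets, the following minimal sketch converts the simulation parameters above into an average inter-SNP spacing and a genome-wide recombination rate. It assumes evenly spaced markers, which is a simplification; the function names are illustrative, and the 1500-SNP case simply reuses the simulated 3 Gb genome length rather than any measured reindeer genome size.

```python
# Simulation parameters quoted in the text above.
GENOME_BP = 3_000_000_000  # total physical length: 3 Gb
MAP_CM = 3600              # total genetic map length in cM


def mean_spacing_kb(n_snps: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance between adjacent SNPs in kb, assuming even spacing."""
    return genome_bp / n_snps / 1_000


def recombination_rate_cm_per_mb(map_cm: float = MAP_CM,
                                 genome_bp: int = GENOME_BP) -> float:
    """Genome-wide average recombination rate in cM per Mb."""
    return map_cm / (genome_bp / 1_000_000)


if __name__ == "__main__":
    # 1500 is the upper bound on polymorphic SNPs mentioned for the reindeer
    # data; applying it to a 3 Gb genome is an assumption for illustration.
    for n in (100_000, 50_000, 25_000, 1_500):
        print(f"{n:>7} SNPs: one SNP every {mean_spacing_kb(n):g} kb on average")
    print(f"recombination rate: {recombination_rate_cm_per_mb():g} cM/Mb")
```

With markers spaced on the order of megabases apart, a run of a few consecutive homozygous genotypes already spans many megabases, so the reported length says more about marker spacing than about the underlying IBD segment.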
The distributions of the lengths of the true IBD segments and of the ROH detected with 50K and 100K SNPs are shown in Figure 1. Analysis based on 25K loci failed to detect any ROH. Apparently, the ROH analyses of Kharzinova et al. (2015) were carried out on all of the SNPs in the array (including those that were not polymorphic), which invalidates the analysis and explains the very high ROH values (we thank the authors for providing the original log files that confirmed fixed sites were used). It is inappropriate to include fixed loci in analyses of ROH because these loci are uninformative of the presence of ROH within a population sample (i.e., ROH are inferred using only polymorphic sites). It is clear from our simulations that the majority of IBD segments cannot be detected using small numbers of loci. For this reason, it is recommended that more than 100 000 SNPs be used for ROH analyses (Purcell et al. 2007). Another consideration in the cross-species application of SNP chips is that ROH analysis depends on genome coordinates. This will introduce a bias if karyotypes differ between species. In the study of Kharzinova et al. (2015), reindeer have a karyotype of 2n = 70 (Nes et al. 1965), while cattle have 2n = 60 (Wurster and Benirschke 1968) and sheep have 2n = 54 (Di Meo et al. 2007). The cross-species application of SNP chips and ROH analysis will thus produce misleading inferences on inbreeding unless a high number of polymorphic SNPs cross-amplify from a species with a similar karyotype.
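The effect of including fixed loci can be illustrated with a toy scan for consecutive homozygous genotypes. This is a deliberately simplified sketch, not PLINK's sliding-window algorithm; the function name, the `min_snps` threshold, and the 20-locus data set are all hypothetical.

```python
def homozygous_runs(positions, genotypes, min_snps=3):
    """Return (start_bp, end_bp, n_snps) for each maximal stretch of
    consecutive homozygous calls containing at least min_snps loci.

    genotypes are coded 0 or 2 (homozygous) and 1 (heterozygous).
    """
    runs = []
    start = last = None
    count = 0
    for pos, g in zip(positions, genotypes):
        if g != 1:                      # homozygous call extends the run
            if start is None:
                start, count = pos, 0
            count += 1
            last = pos
        else:                           # heterozygous call ends the run
            if start is not None and count >= min_snps:
                runs.append((start, last, count))
            start = None
    if start is not None and count >= min_snps:
        runs.append((start, last, count))
    return runs


# Hypothetical data: 20 loci at 1 Mb spacing, only 3 polymorphic in the sample.
positions = [i * 1_000_000 for i in range(1, 21)]
polymorphic = {0, 10, 19}               # indices of polymorphic loci
genotypes = [0] * 20                    # fixed loci are homozygous in everyone
genotypes[0] = genotypes[19] = 1        # individual is heterozygous at two loci

runs_all = homozygous_runs(positions, genotypes)
runs_poly = homozygous_runs(
    [positions[i] for i in sorted(polymorphic)],
    [genotypes[i] for i in sorted(polymorphic)],
)

if __name__ == "__main__":
    print("all loci:        ", runs_all)
    print("polymorphic only:", runs_poly)
```

In this toy case, scanning all loci yields a single 17 Mb "run" made almost entirely of fixed sites, whereas the three informative loci alone provide no support for any ROH.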

Figure 1. True IBD segments and ROH detected in simulated data. Histograms show the distribution of the lengths of true IBD segments (a), ROH detected using 100K SNPs (b), and ROH detected using 50K SNPs (c) in PLINK analyses as described in the main text.
The measurements of linkage disequilibrium (LD) by Kharzinova et al. (2015) also appear problematic. Applying the ovineSNP50 chip to wild sheep, Miller et al. (2011) observed a genome-wide r² of 0.04, with syntenic comparisons (i.e., loci on the same chromosome) approaching an r² of 0.19. With the bovineSNP50 chip, Bohmanova et al. (2010) observed a mean r² of 0.24 for SNPs less than 40 kb apart. Kharzinova et al. (2015) reported r² values of 0.54 and 0.41 for these respective chips. The cause of this extreme LD is not immediately clear, but the high level of polymorphism could be the problem. High heterozygosity results in increased power to detect LD (Ott and Rabinowitz 1997); however, the observed levels of heterozygosity in this study are nearly double what is expected for both chips (0.91–0.48; 0.79–0.44), and in stark contrast to a similar study using the canine SNP chip on a wild phocid, species that are even more divergent (0.24–0.25; Hoffman et al. 2013). It is likely that paralogues, balancing selection, small sample size, or contamination are impacting these estimates (as well as the ROH analysis). Elevated LD and heterozygosity do not appear to be a common trend with cross-species application of SNP chips (e.g., Miller et al. 2011; Hoffman et al. 2013), but they do inhibit the utility of these markers for population genetic inference on reindeer.
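For readers unfamiliar with the statistic, the sketch below computes r² between two loci as the squared Pearson correlation of allele dosages (0, 1, or 2 copies of an allele per individual), which is one common estimator for unphased genotypes. Whether this matches the exact estimator used in the studies cited above is an assumption, and the function name and example genotypes are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between allele dosages at two loci.

    x and y are equal-length sequences of 0/1/2 dosages, one per individual.
    Returns NaN when either locus is monomorphic, because r2 is undefined
    without variation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 individuals")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    locus_a = [0, 1, 2, 0, 1, 2]
    print(genotype_r2(locus_a, locus_a))        # perfectly associated loci
    print(genotype_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # independent loci
```

Because the estimate is a correlation, it is sensitive to small sample sizes and to genotyping artifacts that create shared excess heterozygosity across loci, such as paralogous probes or contamination.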
In conclusion, we agree that cross-species application of chips is a valid approach to obtain SNPs in non-model organisms. However, these markers will generally not be suitable for estimating meaningful ROH values for the reasons stated above. Studies should further ensure that LD and heterozygosity estimates produced by chips are in accordance with commonly observed values before making population genetic inferences.
References
Author notes
Address correspondence to Aaron B.A. Shafer at the address above, or e-mail: [email protected].
Corresponding editor: Taras Oleksyk