-
PDF
- Split View
-
Views
-
Cite
Cite
Wanxia Gai, Guangya Wang, W K Jacky Lam, Liz Y P Yuen, Peiyong Jiang, Stephanie C Y Yu, Tak Y Leung, So Ling Lau, Y M Dennis Lo, K C Allen Chan, Universal Targeted Haplotyping by Droplet Digital PCR Sequencing and Its Applications in Noninvasive Prenatal Testing and Pharmacogenetics Analysis, Clinical Chemistry, Volume 70, Issue 8, August 2024, Pages 1046–1055, https://doi.org/10.1093/clinchem/hvae076
- Share Icon Share
Abstract
The analysis of haplotypes of variants is important for pharmacogenomics analysis and noninvasive prenatal testing for monogenic diseases. However, there is a lack of robust methods for targeted haplotyping.
We developed digital PCR haplotype sequencing (dHapSeq) for targeted haplotyping of variants, which is a method that compartmentalizes long DNA molecules into droplets. Within one droplet, 2 target regions are PCR amplified from one template molecule, and their amplicons are fused together. The fused products are then sequenced to determine the phase relationship of the single nucleotide polymorphism (SNP) alleles. The entire haplotype of 10s of SNPs can be deduced after the phase relationship of individual SNPs are determined in a pairwise manner. We applied dHapSeq to noninvasive prenatal testing in 4 families at risk for thalassemia and utilized it to detect NUDT15 diplotypes for predicting drug tolerance in pediatric acute lymphoblastic leukemia (72 cases and 506 controls).
For SNPs within 40 kb, phase relation can be determined with 100% accuracy. In 7 trio families, the haplotyping results for 97 SNPs spanning 185 kb determined by dHapSeq were concordant with the results deduced from the genotypes of both parents and the fetus. In 4 thalassemia families, a 19.3-kb Southeast Asian deletion was successfully phased with 97 downstream SNPs, enabling noninvasive determination of fetal inheritance using relative haplotype dosage analysis. In the NUDT15 analysis, the variant status and phase of the variants were successfully determined in all cases and controls.
The dHapSeq represents a robust and scalable haplotyping approach with numerous clinical and research applications.
Introduction
Determining the phase relationship of genetic variants, known as haplotypes, is important in many clinical applications to understand their biological significance. For example, in pharmacogenetic analysis, the phase relationship of 2 different variants could affect the degree of drug adjustment. The analysis of haplotype is also important in noninvasive prenatal testing (NIPT) for monogenic diseases based on maternal plasma DNA analysis. To determine the inheritance of maternal variants, the dosage of the 2 maternal haplotypes in maternal plasma DNA can be compared. The maternal haplotype that the fetus has inherited would be present at a slightly higher concentration, and this method is called relative haplotype dosage (RHDO) analysis (1). One key piece of information required for RHDO is the phase relationship of the alleles at the single nucleotide polymorphism (SNP) loci of interest and the maternal variants, i.e., the maternal haplotypes.
Currently, there is a lack of existing methods for accurate and cost-effective haplotyping of SNP loci. In previous studies, we have used either genotype information from the fetus (or other family members) to determine maternal haplotypes or whole genome single-molecule barcoding analysis to perform genome-wide haplotyping (1–3). Experimental methods for determining haplotypes include clone pool dilution sequencing (4), transposase enzyme-linked long-read sequencing technology (5), crosslinking and proximal ligation sequencing (6), and third-generation sequencing (7). In general, these haplotyping methods yield genome-wide, nontargeted haplotype assemblies. In clinical applications, a more cost-effective targeted approach that phases disease-related regions is preferable. In the context of NIPT for monogenic disease, phasing of around 20 single SNPs is generally sufficient for RHDO analysis (2). Existing targeted haplotyping methods, including targeted isolation of bacterial artificial chromosomes (8), imaging of DNA fragments on a glass surface (9, 10), and targeted proximal ligation sequencing (11), require relatively complicated experimental protocols and are challenging to adopt for clinical application.
In a previous study, it was shown that 2 distant loci on the same chromosome can be fused together using fusion PCR with single molecule partitioning (12). However, applying this method in a clinical setting is challenging. First, this method only allows the fusion of 2 regions. Haplotyping of dozens of SNPs would involve laborious pairwise analyses. Moreover, emulsion generation by manual shaking for single DNA molecule partitioning is unlikely to produce consistent results. Here, we present a universal targeted haplotyping method—digital PCR haplotype sequencing (dHapSeq)—that can phase multiple target regions in a single reaction. The method utilizes a high-throughput droplet generation system to partition individual DNA molecules so that fusion PCRs can be performed on 2 targets. We first validated the performance of dHapSeq using samples from family trios and then applied dHapSeq for NIPT for α-thalassemia and pharmacogenetic analysis for the nudix hydrolase 15 (NUDT15) gene.
Materials and Methods
Sample Collection and Processing
This study was approved by the institution's ethics committee. Pregnant women were recruited from the Department of Obstetrics and Gynaecology, and children with acute lymphoblastic leukemia (ALL) were recruited from the Department of Pediatrics, Prince of Wales Hospital, Hong Kong, with written consent. Details of sample processing are described in Supplemental Materials and Methods.
The Principle of dHapSeq
The principle of dHapSeq is illustrated in Fig. 1. A droplet-based microfluidic system was used to compartmentalize individual high molecular weight DNA molecules such that fusion PCR could be conducted on a single template DNA molecule (Fig. 1A, Supplemental Material for DNA input amount and partitioning statistics). For dHapSeq of phasing 2 regions (Supplemental Fig. 1), 2 primer pairs were employed: F(I) and R(I) to amplify the first region and F(II) and R(II) to amplify the second region. The 5′ ends of R(I) and F(II) have overlapping regions that complement each another. Thus, the 2 PCR amplicons anneal at their overlapping ends, allowing polymerase to extend them into a fused product containing the phase information of both regions. The outer primers, F(I) and R(II), were used at approximately 25 times higher concentrations than the inner primers, R(I) and F(II). This concentration difference allowed excess F(I) and R(II) to preferentially amplify fully fused products, minimizing residual unfused amplicons.

Schematic illustration of the dHapSeq method. (A), High molecular weight DNA molecules are compartmentalized into droplets; (B), In droplets, multiplexed fusion PCR is conducted to link multiple target regions from a single molecule in a pairwise manner (i.e., pairing each group A region with each group B region). In the initial rounds of PCR, each region is amplified separately using target-specific primers. The overlapping region (OLR1) engineered to the 5′ end of each group A reverse primer and the 5' end of each group B forward primer would make the 3′ end of Strand X complementary to the 3′ end of Strand Z. A fused product is then formed after linking the 2 strands together and extending them with the polymerase. The 2 overlapping regions (OLR2 and OLR3) engineered to the 5' ends of group A forward primers and group B reverse primers serve as annealing sites for a pair of universal primers, respectively. Thus, all fused products can be amplified simultaneously by the pair of universal primers; (C), The fused products are purified from droplets and amplified using adapter primers to form sequencing libraries; (D), After amplified products are sequenced, haplotypes are determined based on the pairwise relationship of the alleles in the fused products.
A key advantage of dHapSeq is its ability to phase multiple target regions simultaneously through linking target regions in a pairwise manner (Fig. 1B). Multiple target regions were divided into 2 groups: group A and group B. The division of target regions into 2 groups prevented the fusion of PCR products from the same locus. With an overlapping region (ORL1) engineered to target-specific primers, PCR products of group A could be linked to those of group B. Additionally, 2 overlapping regions (ORL2 and ORL3) were engineered to generate common binding sites for amplification by a pair of universal primers. The universal primers were used at higher concentrations than target-specific primers, allowing preferential amplification of fused products and minimizing unfused products.
After fusion PCR, the fused products were purified from droplets, amplified with adapter primers, and sequenced (Fig. 1C and D). Each fused product would then contain a group A target region and a group B target region. The alleles of a fused product would reveal a part of a haplotype, and the entire haplotype could then be determined from the linked alleles of different combinations of group A and B loci.
DHapSeq Library Construction and Sequencing Analysis
PCR fusion was performed using a Bio-Rad QX100/QX200 Droplet Digital PCR system, and the fused products were sequenced using an Illumina platform. A detailed description of the dHapSeq method used for haplotyping 97 SNPs and diplotyping NUDT15 is provided in Supplemental Materials and Methods. Primer sequences are provided in Supplemental Dataset 1 and Supplemental Tables 1 and 2.
The bioinformatics analysis for phasing multiple SNPs is illustrated in Supplemental Fig. 2.
Determination of the Paternally and Maternally Transmitted Haplotypes Through Targeted Sequencing of Maternal Plasma
The maternal plasma DNA of 4 α0-thalassemia families was sequenced using a targeted capture approach (Supplemental Materials and Methods).
The paternally transmitted haplotype was determined using a binomial test performed in R (see Supplemental Materials and Methods for details). The maternally transmitted haplotype was determined using the RHDO analysis, as described previously (1) with some modifications (see Supplemental Fig. 3 and Supplemental Materials and Methods for details).
High-resolution Melting Genotyping Analysis
Two high-resolution melting (HRM) assays were designed to detect variants in NUDT15 exons 1 and 3 (see Supplemental Table 2 for primer sequences).
Results
Effect of Distance Between SNPs on Haplotyping Accuracy
We evaluated how the distance between 2 directly linked SNPs would affect the haplotyping results. To establish a gold standard, we determined the haplotype of 10 heterozygous SNPs (Supplemental Table 3) for a pregnant woman based on genotype analysis of the mother and her fetus using microarray. The distances between the first SNP and each of the remaining 9 SNPs were 2.6, 5.1, 11.6, 20.5, 29.6, 40.5, 49.2, 75.1, and 105.5 kb (Supplemental Fig. 4). The maternal genomic DNA was isolated using the MagAttract HMW DNA Kit (Qiagen), and its size was determined to be over 50 kb (Supplemental Fig. 5). Fusion PCRs were performed to link the first SNP with the other 9 SNPs. A median of 452 363 (range: 392 003–554 593) fused products were sequenced for the 9 pairs of SNPs. The percentage of correct allele combinations, i.e., combinations consistent with the gold-standard haplotypes, was 93.9% at a distance of 2.6 kb, and decreased with increasing distance (Supplemental Table 4). Most of the incorrect combinations were caused by “switching errors,” i.e., swapping 2 heterozygous alleles.
To evaluate dHapSeq's ability to determine haplotypes, 100 in silico simulation tests were conducted for each distance. Each simulation test used 200 randomly selected sequenced fragments containing 2 SNP regions. Haplotypes were determined if the summed frequencies of fragments supporting one scenario were ≥60%. If the summed frequency in either scenario was <60%, the haplotypes were undetermined. We observed that haplotype determination rates were 100% for all distances of ≤40.5 kb, while for distances of ≥49.2 kb, the rate decreased to 7% on average (Table 1). All determined haplotypes were 100% accurate regardless of distance. A decrease in the determination rate over long distances was attributed to incorrect linking. These results indicated that designing all target regions within 40 kb may increase the signal to noise ratio. Nevertheless, the total length of the targeted haplotype regions is not constrained to 40 kb, as any 2 regions over 40 kb apart can be further linked by the link relationships with the regions in between.
The rate and accuracy of haplotype determination for SNPs at different distances.
Distance (kb) . | SNP pairing . | Rate of haplotype determination (%) . | Accuracy of determined haplotypes (%) . |
---|---|---|---|
2.6 | SNP 1–SNP 2 | 100 | 100 |
5.1 | SNP 1–SNP 3 | 100 | 100 |
11.6 | SNP 1–NP 4 | 100 | 100 |
20.5 | SNP 1–SNP 5 | 100 | 100 |
29.6 | SNP 1–SNP 6 | 100 | 100 |
40.5 | SNP 1–SNP 7 | 100 | 100 |
49.2 | SNP 1–SNP 8 | 15 | 100 |
75.1 | SNP 1–SNP 9 | 0 | NA |
105.5 | SNP 1–SNP 10 | 7 | 100 |
Distance (kb) . | SNP pairing . | Rate of haplotype determination (%) . | Accuracy of determined haplotypes (%) . |
---|---|---|---|
2.6 | SNP 1–SNP 2 | 100 | 100 |
5.1 | SNP 1–SNP 3 | 100 | 100 |
11.6 | SNP 1–NP 4 | 100 | 100 |
20.5 | SNP 1–SNP 5 | 100 | 100 |
29.6 | SNP 1–SNP 6 | 100 | 100 |
40.5 | SNP 1–SNP 7 | 100 | 100 |
49.2 | SNP 1–SNP 8 | 15 | 100 |
75.1 | SNP 1–SNP 9 | 0 | NA |
105.5 | SNP 1–SNP 10 | 7 | 100 |
The rate and accuracy of haplotype determination for SNPs at different distances.
Distance (kb) . | SNP pairing . | Rate of haplotype determination (%) . | Accuracy of determined haplotypes (%) . |
---|---|---|---|
2.6 | SNP 1–SNP 2 | 100 | 100 |
5.1 | SNP 1–SNP 3 | 100 | 100 |
11.6 | SNP 1–NP 4 | 100 | 100 |
20.5 | SNP 1–SNP 5 | 100 | 100 |
29.6 | SNP 1–SNP 6 | 100 | 100 |
40.5 | SNP 1–SNP 7 | 100 | 100 |
49.2 | SNP 1–SNP 8 | 15 | 100 |
75.1 | SNP 1–SNP 9 | 0 | NA |
105.5 | SNP 1–SNP 10 | 7 | 100 |
Distance (kb) . | SNP pairing . | Rate of haplotype determination (%) . | Accuracy of determined haplotypes (%) . |
---|---|---|---|
2.6 | SNP 1–SNP 2 | 100 | 100 |
5.1 | SNP 1–SNP 3 | 100 | 100 |
11.6 | SNP 1–NP 4 | 100 | 100 |
20.5 | SNP 1–SNP 5 | 100 | 100 |
29.6 | SNP 1–SNP 6 | 100 | 100 |
40.5 | SNP 1–SNP 7 | 100 | 100 |
49.2 | SNP 1–SNP 8 | 15 | 100 |
75.1 | SNP 1–SNP 9 | 0 | NA |
105.5 | SNP 1–SNP 10 | 7 | 100 |
Phasing 97 SNPs Downstream of the Alpha-globin Genes
To evaluate the performance of dHapSeq in phasing multiple regions, we examined SNPs on chromosome 16 downstream of the alpha-globin (HBA1 and HBA2) genes. Deletions within these genes result in α-thalassemia. We selected 97 SNPs with minor allele frequencies exceeding 0.2 to increase the chance of them being heterozygous and informative for downstream RHDO analysis. The 97 SNPs spanned approximately 185 kb. As dHapSeq has optimal performance for phasing SNPs within 40 kb, the 97 SNPs were grouped into 6 sets, each containing 20 to 24 SNPs. The mean distance between the farthest SNPs in each group was 39 kb (Supplemental Fig. 6). The adjacent sets had 6 to 8 overlapping SNPs in order to phase SNPs across the sets. Since these SNPs had minor allele frequencies of more than 0.2, the likelihood of having at least one overlapping SNP being heterozygous is over 90%.
We phased 97 SNPs in 14 parental samples from 7 family trios (Supplemental Table 5). We also genotyped the family trio consisting of the father, mother, and fetus using targeted capture sequencing (Supplemental Table 6). Based on the trio genotypes, the parental haplotypes were determined and served as gold standards for verifying the dHapSeq results. The complete results of the parental haplotypes by dHapSeq and the genotypes of the trios are provided in Supplemental Dataset 2. In 12 of the 14 subjects, all 97 SNPs were phased into 1 haplotype block. The remaining 2 subjects had 97 SNPs phased into 2 and 3 haplotype blocks since all overlapping SNPs between blocks were homozygous. The haplotypes determined by dHapSeq were identical to the haplotype deduced from the genotypes of the family trios in all of the cases.
Application in Noninvasive Assessment of Fetal Southeast Asian Deletion α0-thalassemia
We applied the dHapSeq method to the noninvasive assessment of fetal α0-thalassemia. The disease is associated with the Southeast Asian (SEA) deletion, a deletion of a 19.3-kb region on chromosome 16 that includes the HBA1 and HBA2 genes. Four families seeking prenatal diagnosis for α0-thalassemia were analyzed (Table 2). We successfully resolved the haplotypes for the 8 parents with dHapSeq. The 19.3-kb SEA deletion was linked to the downstream haplotype block comprising the previously described 97 SNPs (Fig. 2A). Across the 8 parents, a mean of 76 (range: 61–97) SNPs were successfully phased into one haplotype block adjacent to the SEA deletion (Table 2). Detailed results are provided in the Supplemental SI Dataset 3.

Application of dHapSeq for noninvasive assessment of fetal SEA deletion α0-thalassemia. (A), Principle of dHapSeq for linking the 19.3-kb SEA deletion to the downstream haplotype block comprising 97 SNPs. A pair of primers (i.e., FSEA and RSEA) flanking the SEA deletion region were designed. The reverse primer for the SEA deletion (i.e., RSEA) overlapped with the forward primer for the downstream adjacent heterozygous SNP (i.e., FSNP). In dHapSeq, only a molecule containing the SEA deletion can be amplified and formed into fused products. The SEA deletion-linked SNP allele and its corresponding haplotype can then be determined. Panels (B) to (E) represent the sequential probability ratio test classification processes for RHDO analysis to determine the fetal inheritance of the maternal variant in the four families at risk for α0-thalassemia. The 2 solid lines of each panel represent the statistical thresholds for determining which maternal haplotype the fetus inherits. The x-axis represents the number of maternal plasma DNA molecules analyzed, and the y-axis represents the fraction of molecules carrying the alleles on maternal Hap I, i.e., the maternal haplotype associated with the variant. Hap II is the maternal haplotype associated with the wild-type. When the data points are above the upper statistical threshold and fall within the upper region, it indicates that the maternal Hap I is statistically overrepresented in maternal plasma and the fetus is deduced to have inherited Hap I. If the data points fall between the upper and lower lines of statistical thresholds, it indicates that the RHDO analysis has not reached statistical significance. If the data points fall below the lower statistical threshold, it indicates that the maternal Hap II is overrepresented in maternal plasma statistically and the fetus is deduced to have inherited the maternal Hap II.
Family . | Fetal sex . | Gestational age . | Fetal DNA fraction (%) . | Genotype . | Number of SNPs in the haplotype block adjacent to the SEA deletion determined by dHapSeq . | SNPs used in NIPT . | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
Father . | Mother . | Fetus . | Father . | Mother . | Overlappeda . | Paternal informativeb . | Maternal informativec . | ||||
α-thal F1 | M | 14wk 3d | 9.3 | αα/--SEA | αα/--SEA | --SEA/--SEA | 97 | 61 | 61 | 15 | 15 |
α-thal F2 | F | 12wk 3d | 13.2 | αα/αα | αα/--SEA | αα/--SEA | 82 | 61 | 61 | 0d | 13 |
α-thal F3 | F | 18wk 2d | 7.7 | αα/--SEA | αα/--SEA | --SEA/--SEA | 87 | 61 | 61 | 11 | 16 |
α-thal F4 | M | 17wk 1d | 7.1 | αα/--SEA | αα/--SEA | αα/--SEA | 97 | 65 | 65 | 16 | 31 |
Family . | Fetal sex . | Gestational age . | Fetal DNA fraction (%) . | Genotype . | Number of SNPs in the haplotype block adjacent to the SEA deletion determined by dHapSeq . | SNPs used in NIPT . | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
Father . | Mother . | Fetus . | Father . | Mother . | Overlappeda . | Paternal informativeb . | Maternal informativec . | ||||
α-thal F1 | M | 14wk 3d | 9.3 | αα/--SEA | αα/--SEA | --SEA/--SEA | 97 | 61 | 61 | 15 | 15 |
α-thal F2 | F | 12wk 3d | 13.2 | αα/αα | αα/--SEA | αα/--SEA | 82 | 61 | 61 | 0d | 13 |
α-thal F3 | F | 18wk 2d | 7.7 | αα/--SEA | αα/--SEA | --SEA/--SEA | 87 | 61 | 61 | 11 | 16 |
α-thal F4 | M | 17wk 1d | 7.1 | αα/--SEA | αα/--SEA | αα/--SEA | 97 | 65 | 65 | 16 | 31 |
--SEA, Southeast Asian deletion.
aThis column displays the number of SNPs that overlap between the father and mother's haplotype blocks. These SNPs were used further in analyses to determine paternal and maternal inheritance.
bParental informative SNPs are those that are heterozygous in the father and homozygous in the mother. These SNPs were used to determine the fetal inheritance of the paternal variant.
cMaternal informative SNPs are those that are heterozygous in the mother regardless of whether the SNP is homozygous or heterozygous in the father. These SNPs were used to determine the fetal inheritance of the maternal variant.
dIn the family α-thal F2, the father was homozygous wild-type (αα/αα). The paternal inheritance was not tested since the fetus would certainly inherit a wild-type allele from its father.
Family . | Fetal sex . | Gestational age . | Fetal DNA fraction (%) . | Genotype . | Number of SNPs in the haplotype block adjacent to the SEA deletion determined by dHapSeq . | SNPs used in NIPT . | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
Father . | Mother . | Fetus . | Father . | Mother . | Overlappeda . | Paternal informativeb . | Maternal informativec . | ||||
α-thal F1 | M | 14wk 3d | 9.3 | αα/--SEA | αα/--SEA | --SEA/--SEA | 97 | 61 | 61 | 15 | 15 |
α-thal F2 | F | 12wk 3d | 13.2 | αα/αα | αα/--SEA | αα/--SEA | 82 | 61 | 61 | 0d | 13 |
α-thal F3 | F | 18wk 2d | 7.7 | αα/--SEA | αα/--SEA | --SEA/--SEA | 87 | 61 | 61 | 11 | 16 |
α-thal F4 | M | 17wk 1d | 7.1 | αα/--SEA | αα/--SEA | αα/--SEA | 97 | 65 | 65 | 16 | 31 |
Family . | Fetal sex . | Gestational age . | Fetal DNA fraction (%) . | Genotype . | Number of SNPs in the haplotype block adjacent to the SEA deletion determined by dHapSeq . | SNPs used in NIPT . | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
Father . | Mother . | Fetus . | Father . | Mother . | Overlappeda . | Paternal informativeb . | Maternal informativec . | ||||
α-thal F1 | M | 14wk 3d | 9.3 | αα/--SEA | αα/--SEA | --SEA/--SEA | 97 | 61 | 61 | 15 | 15 |
α-thal F2 | F | 12wk 3d | 13.2 | αα/αα | αα/--SEA | αα/--SEA | 82 | 61 | 61 | 0d | 13 |
α-thal F3 | F | 18wk 2d | 7.7 | αα/--SEA | αα/--SEA | --SEA/--SEA | 87 | 61 | 61 | 11 | 16 |
α-thal F4 | M | 17wk 1d | 7.1 | αα/--SEA | αα/--SEA | αα/--SEA | 97 | 65 | 65 | 16 | 31 |
--SEA, Southeast Asian deletion.
aThis column displays the number of SNPs that overlap between the father and mother's haplotype blocks. These SNPs were used further in analyses to determine paternal and maternal inheritance.
bParental informative SNPs are those that are heterozygous in the father and homozygous in the mother. These SNPs were used to determine the fetal inheritance of the paternal variant.
cMaternal informative SNPs are those that are heterozygous in the mother regardless of whether the SNP is homozygous or heterozygous in the father. These SNPs were used to determine the fetal inheritance of the maternal variant.
dIn the family α-thal F2, the father was homozygous wild-type (αα/αα). The paternal inheritance was not tested since the fetus would certainly inherit a wild-type allele from its father.
We assessed the fetal inheritance of paternal and maternal haplotypes in the 4 families by sequencing maternal plasma DNA (Supplemental Table 6). As an illustration, in family 1, both the father and mother were carriers of SEA deletion α0-thalassemia. Based on the dHapSeq results, we denoted the mutant-linked haplotypes in the father and mother as pHap I and mHap I, respectively, and the wildtype-linked haplotypes in the father and mother as pHap II and mHap II, respectively. The paternal and maternal haplotype blocks contained 97 and 61 SNPs, respectively, resulting in 61 overlapped SNPs for further analysis. Using 15 SNPs that were heterozygous in the father and homozygous in the mother, we determined that the fetus had inherited the mutant-linked pHap I from the father (Supplemental Table 7). To determine the maternal inheritance, RHDO analysis was conducted using 15 SNPs that were heterozygous in the mother. The alleles on mHap I were significantly overrepresented in maternal plasma, suggesting the fetus inherited the mutant-linked mHap I (Fig. 2B). Hence, the fetus had inherited mutant alleles from both parents and was a α0-thalassemia major.
The same approach was applied to families 2 to 4 to determine fetal inheritance (Fig. 2C–E, and Supplemental Table 7). The fetuses in families 2 and 4 were deduced to be heterozygous α0-thalassemia carriers, while the fetus in family 3 was a homozygous α0-thalassemia major. The deduced fetal variant status was concordant with the clinical diagnosis as determined by invasive prenatal testing in each family.
Application in Diplotype Analysis of NUDT15 Variants
NUDT15 encodes an enzyme that metabolizes thiopurines, which can be used for the treatment of ALL and autoimmune conditions and for supporting organ transplant recipients. Variants in NUDT15 are associated with thiopurine-induced myelosuppression due to excessive active metabolites. Four coding variants located in exons 1 and 3 of NUDT15 have been reported, resulting in 6 clinically relevant haplotypes known as *1 to *6 (Fig. 3A). In a subject carrying heterozygous c.50_55dup and c.415C > T, it is important to determine whether the variants are in cis (i.e., diplotype *1/*2) or in trans (i.e., diplotype *3/*6), because the 2 types of diplotypes are associated with different degrees of intolerance to thiopurine drugs. We used dHapSeq to analyze the diplotypes of NUDT15 variants in 2 cohorts of subjects: 72 subjects (62 ALL patients and 10 of their family numbers) in cohort 1 and 506 control subjects in cohort 2.

Application of the dHapSeq method in diplotype analysis of NUDT15 variants. (A), Four common variants represent 6 major haplotypes of NUDT15; (B), HRM analysis targeting NUDT15 exon 1 identified the heterozygous c.50_55dup and heterozygous c.52G > A variants, differentiating from the wild-type allele; (C), HRM analysis targeting NUDT15 exon 3 identified heterozygous and homozygous c.415C > T variants; (D), The strategy for combining the dHapSeq and HRM methods to identify NUDT15 diplotypes in 506 control subjects.
For subjects in cohort 1, we validated the dHapSeq results by comparing them with those determined using gold-standard methods, namely Sanger sequencing and family-based analysis. Diplotypes determined by dHapSeq were consistent with those determined by gold-standard methods (Table 3). In particular, dHapSeq correctly determined the diplotypes of 8 subjects who carried heterozygous c.50_55dup and c.415C > T variants, as confirmed by family-based analysis (Supplemental Fig. 7).
Diplotype of NUDT15 gene in 2 cohorts of subjects determined by dHapSeq and HRM.
Cohort 1, 62 ALL patients and 10 of their family members . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by Sanger and HRM . | Genotype of exon 3 by Sanger and HRM . | Number of samples . | Diplotype determined by Sanger and family analysis . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 44 | *1/*1 | *1/*1 | 61.1 |
c.50_55dup/WT | c.415C > T/WT | 8 | *1/*2 | *1/*2 | 11.1 |
WT/WT | c.415C > T/WT | 11 | *1/*3 | *1/*3 | 15.3 |
c.50_55dup /WT | WT/WT | 4 | *1/*6 | *1/*6 | 5.6 |
c.50_55dup /WT | c.415C > T/c.415C > T | 2 | *2/*3 | *2/*3 | 2.8 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | *3/*3 | 2.8 |
c.52G > A/WT | c.415C > T/WT | 1 | *3/*5 | *3/*5 | 1.4 |
Cohort 1, 62 ALL patients and 10 of their family members . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by Sanger and HRM . | Genotype of exon 3 by Sanger and HRM . | Number of samples . | Diplotype determined by Sanger and family analysis . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 44 | *1/*1 | *1/*1 | 61.1 |
c.50_55dup/WT | c.415C > T/WT | 8 | *1/*2 | *1/*2 | 11.1 |
WT/WT | c.415C > T/WT | 11 | *1/*3 | *1/*3 | 15.3 |
c.50_55dup /WT | WT/WT | 4 | *1/*6 | *1/*6 | 5.6 |
c.50_55dup /WT | c.415C > T/c.415C > T | 2 | *2/*3 | *2/*3 | 2.8 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | *3/*3 | 2.8 |
c.52G > A/WT | c.415C > T/WT | 1 | *3/*5 | *3/*5 | 1.4 |
Cohort 2, 506 control subjects . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by HRM . | Genotype of exon 3 by HRM . | Number of samples . | Diplotype determined by HRM . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 380 | *1/*1 | — | 75.1 |
WT/WT | c.415C > T/WT | 62 | *1/*3 | — | 12.3 |
c.52G > A/WT | WT/WT | 12 | *1/*5 | — | 2.4 |
c.50_55dup/WT | WT/WT | 22 | *1/*6 | — | 4.3 |
c.50_55dup/WT | c.415C > T/c.415C > T | 5 | *2/*3 | — | 1.0 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | — | 0.4 |
c.50_55dup/WT | c.415C > T/WT | 22 | — | *1/*2 | 4.3 |
Undetermined | WT/WT | 1 | — | *5/*6 | 0.2 |
Cohort 2, 506 control subjects . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by HRM . | Genotype of exon 3 by HRM . | Number of samples . | Diplotype determined by HRM . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 380 | *1/*1 | — | 75.1 |
WT/WT | c.415C > T/WT | 62 | *1/*3 | — | 12.3 |
c.52G > A/WT | WT/WT | 12 | *1/*5 | — | 2.4 |
c.50_55dup/WT | WT/WT | 22 | *1/*6 | — | 4.3 |
c.50_55dup/WT | c.415C > T/c.415C > T | 5 | *2/*3 | — | 1.0 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | — | 0.4 |
c.50_55dup/WT | c.415C > T/WT | 22 | — | *1/*2 | 4.3 |
Undetermined | WT/WT | 1 | — | *5/*6 | 0.2 |
Diplotype of NUDT15 gene in 2 cohorts of subjects determined by dHapSeq and HRM.
Cohort 1, 62 ALL patients and 10 of their family members . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by Sanger and HRM . | Genotype of exon 3 by Sanger and HRM . | Number of samples . | Diplotype determined by Sanger and family analysis . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 44 | *1/*1 | *1/*1 | 61.1 |
c.50_55dup/WT | c.415C > T/WT | 8 | *1/*2 | *1/*2 | 11.1 |
WT/WT | c.415C > T/WT | 11 | *1/*3 | *1/*3 | 15.3 |
c.50_55dup /WT | WT/WT | 4 | *1/*6 | *1/*6 | 5.6 |
c.50_55dup /WT | c.415C > T/c.415C > T | 2 | *2/*3 | *2/*3 | 2.8 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | *3/*3 | 2.8 |
c.52G > A/WT | c.415C > T/WT | 1 | *3/*5 | *3/*5 | 1.4 |
Cohort 1, 62 ALL patients and 10 of their family members . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by Sanger and HRM . | Genotype of exon 3 by Sanger and HRM . | Number of samples . | Diplotype determined by Sanger and family analysis . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 44 | *1/*1 | *1/*1 | 61.1 |
c.50_55dup/WT | c.415C > T/WT | 8 | *1/*2 | *1/*2 | 11.1 |
WT/WT | c.415C > T/WT | 11 | *1/*3 | *1/*3 | 15.3 |
c.50_55dup /WT | WT/WT | 4 | *1/*6 | *1/*6 | 5.6 |
c.50_55dup /WT | c.415C > T/c.415C > T | 2 | *2/*3 | *2/*3 | 2.8 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | *3/*3 | 2.8 |
c.52G > A/WT | c.415C > T/WT | 1 | *3/*5 | *3/*5 | 1.4 |
Cohort 2, 506 control subjects . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by HRM . | Genotype of exon 3 by HRM . | Number of samples . | Diplotype determined by HRM . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 380 | *1/*1 | — | 75.1 |
WT/WT | c.415C > T/WT | 62 | *1/*3 | — | 12.3 |
c.52G > A/WT | WT/WT | 12 | *1/*5 | — | 2.4 |
c.50_55dup/WT | WT/WT | 22 | *1/*6 | — | 4.3 |
c.50_55dup/WT | c.415C > T/c.415C > T | 5 | *2/*3 | — | 1.0 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | — | 0.4 |
c.50_55dup/WT | c.415C > T/WT | 22 | — | *1/*2 | 4.3 |
Undetermined | WT/WT | 1 | — | *5/*6 | 0.2 |
Cohort 2, 506 control subjects . | |||||
---|---|---|---|---|---|
Genotype of exon 1 by HRM . | Genotype of exon 3 by HRM . | Number of samples . | Diplotype determined by HRM . | Diplotype determined by dHapSeq . | Diplotype frequency (%) . |
WT/WT | WT/WT | 380 | *1/*1 | — | 75.1 |
WT/WT | c.415C > T/WT | 62 | *1/*3 | — | 12.3 |
c.52G > A/WT | WT/WT | 12 | *1/*5 | — | 2.4 |
c.50_55dup/WT | WT/WT | 22 | *1/*6 | — | 4.3 |
c.50_55dup/WT | c.415C > T/c.415C > T | 5 | *2/*3 | — | 1.0 |
WT/WT | c.415C > T/c.415C > T | 2 | *3/*3 | — | 0.4 |
c.50_55dup/WT | c.415C > T/WT | 22 | — | *1/*2 | 4.3 |
Undetermined | WT/WT | 1 | — | *5/*6 | 0.2 |
dHapSeq diplotyping analysis of NUDT15 can benefit subjects with heterozygous variants at both loci. To rapidly identify such cases in a large-scale study, we developed a simple genotyping method based on HRM analysis. Two HRM assays were developed to target exon 1 and exon 3 of NUDT15. Subjects with different genotypes at exon 1 and exon 3 were identified based on the shape of normalized melting peaks (Fig. 3B and C).
We combined HRM and dHapSeq methods to analyze the 506 subjects in cohort 2 (Fig. 3D). The subjects were first genotyped by HRM analysis (Table 3). Diplotypes were directly determined from the genotyping results in 483 subjects. In the remaining 23 subjects, 22 had heterozygous variants for both c.50_55dup and c.415C > T. One subject had an undetermined genotype since the HRM melting peak shape was different from that of all known genotypes in cohort 1 (Supplemental Fig. 8). The 23 subjects were analyzed using dHapSeq. All the 22 double-heterozygous subjects were determined to be *1/*2 diplotype. The subject with an HRM-undetermined genotype was determined to carry both c.50_55dup and c.52G > A variants. Thus, its diplotype was *5/*6. Additionally, the frequencies of NUDT15 diplotypes in our cohort (Table 3) were comparable to those found in the 1000 Genomes East Asian Phased Data (13).
Discussion
In this study, we developed dHapSeq for determining the haplotypes of an individual over a range of variants. This method bridges the gaps in the current technologies for haplotyping. Current haplotyping methods, e.g., transposase enzyme-linked long-read sequencing technology by Illumina (5), provides a nontargeted haplotyping across the whole genome. Such a method is not cost-effective for analyzing target regions of clinical interest. On the other hand, we can perform targeted analysis with overlapping multiplex PCR or probe-based capture enrichment followed by sequencing. In such approaches, the application of second-generation sequencing would pose a challenge as the distance between 2 SNPs are too long to be covered by short sequencing reads. A third-generation sequencing technique can overcome this problem by reading a long molecule that includes 2 or a few SNPs. However, as the sequences flanking the SNPs are not of interest, most of the sequencing power used in this approach is “wasted.” Hence, these current haplotyping methods are not cost-effective or scalable for routine clinical applications. The dHapSeq method, on the other hand, provides a robust and scalable method for haplotyping a range of targeted SNPs.
The dHapMap technology involves 2 key steps: 1) linking multiple variants using high-throughput droplet digital fusion PCR and 2) analyzing the fused products at once with short-read sequencing. The ability to fuse multiple SNPs was achieved by grouping SNPs into 2 groups, linking SNPs in a pairwise manner and amplifying all combinations of paired-linked SNPs using universal primers. The linking of multiple regions is much more challenging than just linking 2 regions as previously described (12). When linking multiple regions, it is important to minimize the number of residual unfused amplicons from each region. These unfused amplicons may be linked in a subsequent PCR following droplet breakage, resulting in haplotyping errors. To solve this problem, we used high concentrations of universal pairs to preferentially amplify the fused products and minimize unfused amplicons. When linking multiple regions, it is advantageous to use high-throughput droplet generation systems, rather than generating emulsions through manual shaking as in the previous study (12). Droplets generated by manual shaking are variable in size, leading to an uneven partitioning of DNA molecules. Partitioning 2 molecules into a droplet would result in incorrect results for haplotyping. However, it is challenging to implement fusion PCR with an automated droplet generation system because most commercially available systems (e.g., BioRad ddPCR droplet generation system) use Taq polymerase. This polymerase adds a single adenosine (i.e., [A]) to the 3′ end of amplified DNA fragments. In a fusion PCR reaction, the [A] extension may result in the amplified strands not being complementary, thus preventing the fusion of the 2 regions. We addressed this by adding an additional nucleotide to the primers (Supplemental Text and Supplemental Figs. 9 and 10). In addition, a high concentration of DNA input can adversely affect haplotyping accuracy by increasing the likelihood of having 2 target molecules in one droplet. Therefore, the input amount of DNA needs to be precisely controlled. In this study, 3.2 ng of genomic DNA was used for droplet generation. As approximately 20 000 droplets were generated, it is expected that less than 0.1% of droplets would be expected to contain 2 or more molecules.
We showed that the dHapSeq method can phase SNP alleles within 40 kb accurately and efficiently (Table 1). When the distance of 2 SNPs goes beyond 40 kb, the probability of successfully phasing the SNP alleles is reduced. This drop in efficiency is likely related to the size of the DNA molecules within the reaction droplet. The optimal distance for haplotyping can further be improved using DNA extraction methods that preserve longer DNA molecules. The phase information of more distant SNPs can be determined through the linkage to the SNPs in between them. In the validated sample set of 7 family trios, the haplotype of 97 SNPs across a 185 kb region could be determined.
We applied dHapSeq for the noninvasive diagnosis of fetal monogenic diseases. While the RHDO method is currently considered the most accurate approach for determining the fetal inheritance of a mutant allele from the mother, it requires parental haplotype information flanking the variant (1). The lack of robust methods for obtaining such haplotype information has limited the widespread adoption of RHDO-based noninvasive prenatal testing in clinical practice. By utilizing dHapSeq, we were able to phase 97 SNPs and a 19.3-kb SEA deletion within 4 families at risk of α0-thalassemia, allowing us to noninvasively determine the fetal inheritance of the mutant allele. As dHapSeq is a targeted analysis of the SNP of interest, the amount of sequencing required for each case is small. Hence, the dHapSeq approach should be scalable and can be applied to large populations for NIPT in monogenic diseases or to other monogenic conditions in addition to α0-thalassemia.
We applied dHapSeq method for the pharmacogenetic analysis of the NUDT15 variants. NUDT15 variants are associated with intolerance to thiopurine, a key drug used for treating ALL (14). NUDT15 diplotype information is essential for individualizing thiopurine dosing, especially for individuals who possess heterozygous variants at 2 loci (13). With dHapSeq, we successfully determined the NUDT15 diplotypes of 62 ALL patients and 10 of their family members. Our dHapSeq method is based on sequencing, so it can target a variety of variant types, including those that are difficult to detect using probe-based methods, for example diplotype *3/*5 (15).
The targeted nature of the dHapSeq approach makes it a cost-effective and scalable solution for clinical applications. As the fused products containing the phase information of 2 SNP alleles is only around 140 bp, the amount of sequencing required for dHapSeq is small. With dHapSeq, the estimated cost for haplotyping 100 SNPs in one sample is less than $100 USD. As illustrated in our examples, haplotyping is playing an increasingly important role in clinical practice, in particular in pharmacogenomics analyses and NIPT. The dHapSeq represents a robust and scalable approach for haplotyping for hundreds of SNPs. It would be a useful tool for clinical practice and research.
Supplemental Material
Supplemental material is available at Clinical Chemistry online.
Nonstandard Abbreviations
dHapSeq, digital PCR haplotype sequencing; NIPT, noninvasive prenatal testing; RHDO, relative haplotype dosage; SNP, single nucleotide polymorphism; ALL, acute lymphoblastic leukemia; ORL, overlapping region; HRM, high-resolution melting; SEA, Southeast Asian.
Human Genes
NUDT15, nudix hydrolase 15 gene; HBA1, hemoglobin subunit alpha 1 gene; HBA2, hemoglobin subunit alpha 2 gene.
Author Contributions
The corresponding author takes full responsibility that all authors on this publication have met the following required criteria of eligibility for authorship: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved. Nobody who qualifies for authorship has been omitted from the list.
Authors’ Disclosures or Potential Conflicts of Interest
Upon manuscript submission, all authors completed the author disclosure form.
Research Funding
This work was supported by the Innovation and Technology Commission under the InnoHK initiative and research grants from the Council of the Hong Kong Special Administrative Region Government under the theme-based research scheme (T12-401/16-W) to Y.M.D. Lo and K.C.A. Chan. Y.M.D. Lo is supported by an endowed chair from the Li Ka Shing Foundation.
Disclosures
These authors have employment or leadership roles with: W.K.J. Lam, DRA Limited; P. Jiang, Insighta, DRA Limited, KingMed Future; K.C.A. Chan, Insighta, DRA Limited, Take2 Holdings Limited, Centre for Novostics. Consultant or advisory roles: P. Jiang, KingMed Future; Y.M.D. Lo, Grail, Decheng Capital. Stock ownership: W.K.J. Lam, Grail/Illumina; P. Jiang, Grail/Illumina, DRA Limited, Take2 Holdings Limited; Y.M.D. Lo, Grail/Illumina, DRA Limited, Take2 Holdings Limited, Insighta; K.C.A. Chan, Grail/Illumina, DRA Limited, Take2 Holdings Limited, Insighta. A patent application on the described technology has been filed by W. Gai, Y.M.D. Lo, and K.C.A. Chan and licensed to DRA Limited founded by K.C.A. Chan, R.W.K Chiu., and Y.M.D. Lo. P. Jiang; patent applications related to cell-free DNA have been filed. P. Jiang, royalties or licenses from Grail/Illumina, Xcelom, DRA Limited, Take2 Holdings Limited. Y.M.D. Lo and K.C.A. Chan, royalties or licenses from Grail/Illumina, Xcelom, DRA Limited, Take2 Holdings Limited, and Insighta. S.C.Y. Yu has received patent royalties from Illumina, Xcelom, Take2, and DRA; has been granted or filed patents on cell-free nucleic acid analyses; and has received travel support from Oxford Nanopore Technologies for meeting participation.
Role of Sponsor
The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, preparation of manuscript, or final approval of manuscript.
Data Availability Statement
Sequencing data for subjects analyzed in this study have been deposited at the European Genome-Phenome Archive, (https://www.ebi.ac.uk/ega/), with the accession number of EGAS00001007600.
References
Author notes
Wanxia Gai and Guangya Wang contributed equally to this study.