Vinojini Vivekanandam, Rebecca Ellmers, Dipa Jayaseelan, Henry Houlden, Roope Männikkö, Michael G Hanna, In silico versus functional characterization of genetic variants: lessons from muscle channelopathies, Brain, Volume 146, Issue 4, April 2023, Pages 1316–1321, https://doi.org/10.1093/brain/awac431
Abstract
Accurate determination of the pathogenicity of missense genetic variants of uncertain significance is a major challenge for implementing genetic data in clinical practice. In silico predictive tools are used to score variants’ pathogenicity. However, their value in clinical settings is often unclear, as they have not usually been validated against robust functional assays. We compared nine widely used in silico predictive tools, including the more recently developed tools EVE and REVEL, with detailed cell-based electrophysiology for 126 CLCN1 variants discovered in patients with the skeletal muscle channelopathy myotonia congenita. We found poor accuracy for most tools. The highest accuracy was obtained with MutationTaster (85.48%) and REVEL (82.54%). Both of these scores showed poor specificity, although specificity was better using EVE. Combining methods based on concordance improved performance overall but still lacked specificity. Our calculated statistics for the predictive tools differed from values reported for other genes in the literature, suggesting that the utility of the tools varies between genes. Overall, current predictive tools for this chloride channel are not reliable for clinical use, and tools with better specificity are urgently required. Improving the accuracy of predictive tools is a wider issue and a major challenge for the effective clinical implementation of genetic data.
Introduction
The advent of next-generation and whole-genome sequencing is generating unprecedented volumes of genetic data. Accurate interpretation of novel variants of uncertain significance in the clinical context is arguably one of the biggest challenges in genomic medicine, and accurate classification is paramount. Falsely rejecting a pathogenic variant leads to an unnecessary, ongoing search for the underlying genetic cause and a missed diagnosis, while incorrectly attributing pathogenicity has significant consequences for patients and their families. Several in silico predictive algorithms have been developed to help determine the pathogenicity of missense single-nucleotide variants and are routinely used. However, their efficacy and reliability for specific genes require assessment.
The prediction tools considered in the variant scoring framework of the American College of Medical Genetics (ACMG) include PolyPhen-2, SIFT, Align-GVGD and MutationTaster2.1 These tools consider the nature of the substituting amino acid and the conservation of the substituted residue. More recently, metapredictors such as REVEL have been developed, which predict pathogenicity by combining individual tools.2 In 2021, EVE, a predictive model built with deep generative models trained on evolutionary data, was released.3 The Association for Clinical Genomic Science (ACGS) and diagnostic laboratory guidelines consider the concordance of tools when building support for the pathogenicity of novel variants.4,5
While studies comparing the efficacy of these tools have been performed in specialties such as cancer, audiology and cardiology, few have been conducted in neurology.1,6 Moreover, several previous studies have compared in silico predictive algorithms with databases such as ClinVar, which raises concerns about circular comparisons, because ClinVar variant classifications already take in silico predictions into account.7,8 To our knowledge, aside from the validation performed by the authors of EVE, there have been no other comparisons with EVE scores.
Ion channels provide an attractive model system for comparing predicted and recorded measures of pathogenicity, as electrophysiological data assessing the function of channel variants are often readily available. In particular, as part of the diagnostic platform for myotonia, we routinely characterize the function of CLCN1 variants identified in patients with myotonia. CLCN1 encodes skeletal muscle chloride voltage-gated channel 1 (CLC-1), which regulates the electrical excitability of muscle.9 Variants that reduce chloride conductance increase muscle membrane excitability, causing myotonia.10 Myotonia has several causes; myotonia congenita is the most common form of non-dystrophic myotonia and can be inherited in an autosomal dominant or autosomal recessive manner. We compared in silico predictive tools with pathogenicity as determined by functional characterization of variants in CLCN1.
Materials and methods
Our dataset included 126 CLC-1 missense variants functionally characterized as a part of the diagnostic platform of skeletal muscle channelopathies. Assessment of pathogenicity for several of these variants was recently reported.11
In silico prediction
Alamut Visual 2.15 (64-bit; SOPHiA Genetics) was used to obtain pathogenicity scores and classifications from PolyPhen-2, Align-GVGD (aGVGD), SIFT and MutationTaster.1,12–16 gnomAD frequencies and Grantham distances were also extracted. Ensembl was used to obtain pathogenicity scores and classifications for REVEL, MetaLR, CADD and MutationAssessor.17–21 EVE scores were taken from the EVE platform.3
Functional determination
Methods for the generation of channel variants, expression of channel variants in Xenopus oocytes, electrophysiological analysis using two-electrode voltage-clamp and criteria for determination of pathogenicity were recently described by Suetterlin et al.11 Briefly, if the voltage of half-maximal activation was positive to a cut-off value of −18.6 mV, or if the channel variant expressed no or only minimal ClC-1 currents, the variant was considered to be pathogenic (Fig. 1A). Variants with other loss-of-function features as reported in Suetterlin et al.11 were also considered pathogenic.
![Functional assessment of the pathogenicity of CLC-1 variants and comparison to in silico predictive tools. (A) Voltage of half-maximal activation (V1/2) is plotted against current amplitude for 126 CLC-1 variants. Please note change of scale at 0 mV. The vertical red line and horizontal pink line represent the cut-off voltage [vertical (−18.6 mV)] and current amplitude [horizontal (−2.5 µA)]. Data for wild-type channel are shown in red and all the variants in the wild-type channel quadrant defined by the cut-off lines were considered benign. Several variants showed no currents or showed currents that could not be characterized only in terms of V1/2 and current amplitude.11 The V1/2 values of these variants were not assessed but are plotted in the graph with 0 current amplitude in blue. Variants in orange show wild-type-like voltage dependence of activation and current amplitude, but the rate of activation differed from wild-type. Based on the cut-off criteria, these were not classified as pathogenic. (B) ROC curves for in silico predictive tools + Grantham distance.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/brain/146/4/10.1093_brain_awac431/1/m_awac431f1.jpeg?Expires=1747906787&Signature=yJtcXesQQR1~HK8WmbpQkPHwS3v1ddfLJaOKHYcLflydmHKtT5yK-4FSo1C38oRCQWMGQuK8K8HwyrgxQHuBvH4fp4F8BJeBabOt5a8Or7cf-aoMxLYZu~NolICjPxg92bpgLLAfTRfsn70ywtNj0c3-5r6GZk5vLyg4Jy47LX3v1pRddIfIxCIkA-5c9eNasZ9HieAUz4h5fEQ-tvzqTpP7arzm~4v4XyOy68ihlnp4QVYmGVM0GnXKwvfI93LdL1M8NByOWAAduvpbTv1PIUhB~cpl5YewbZ8sxNJGTJqdu2s61Z-zK7uIRUAl7A4Gl-H~VH1rdm-U7ZYiXGriEg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Functional assessment of the pathogenicity of CLC-1 variants and comparison with in silico predictive tools. (A) Voltage of half-maximal activation (V1/2) plotted against current amplitude for 126 CLC-1 variants. Note the change of scale at 0 mV. The vertical red line and horizontal pink line represent the cut-off voltage (−18.6 mV) and current amplitude (−2.5 µA), respectively. Data for the wild-type channel are shown in red, and all variants in the wild-type channel quadrant defined by the cut-off lines were considered benign. Several variants showed no currents, or showed currents that could not be characterized in terms of V1/2 and current amplitude alone.11 The V1/2 values of these variants were not assessed; they are plotted in blue with a current amplitude of 0. Variants in orange show wild-type-like voltage dependence of activation and current amplitude, but a rate of activation that differed from wild-type; based on the cut-off criteria, these were not classified as pathogenic. (B) ROC curves for the in silico predictive tools and Grantham distance.
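The cut-off criteria described above (and shown in Fig. 1A) amount to a simple decision rule in which any single criterion is enough to call a variant pathogenic. The sketch below is a hypothetical encoding of that rule, not the authors' analysis code: the class, function and field names are illustrative, tail currents are assumed to be signed (inward currents negative), and the `other_lof` flag is only a stand-in for the additional loss-of-function features described by Suetterlin et al., which are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional

# Cut-offs taken from the text and Fig. 1A.
V_HALF_CUTOFF_MV = -18.6       # voltage of half-maximal activation
TAIL_CURRENT_CUTOFF_UA = -2.5  # peak tail current at -100 mV; inward currents are negative


@dataclass
class VariantRecording:
    """Hypothetical summary of one variant's two-electrode voltage-clamp data."""
    v_half_mv: Optional[float]  # None when V1/2 could not be assessed
    tail_current_ua: float      # signed peak tail current at -100 mV (µA)
    other_lof: bool = False     # stand-in for other loss-of-function features


def is_pathogenic(rec: VariantRecording) -> bool:
    """Apply the cut-off rule: any one criterion is enough for 'pathogenic'."""
    if rec.other_lof:
        return True
    # No or minimal current: less negative than the amplitude cut-off.
    if rec.tail_current_ua > TAIL_CURRENT_CUTOFF_UA:
        return True
    # Activation shifted positive to the voltage cut-off.
    if rec.v_half_mv is not None and rec.v_half_mv > V_HALF_CUTOFF_MV:
        return True
    # Inside the wild-type quadrant of Fig. 1A: classified as benign.
    return False


# Illustrative values only, not study data.
print(is_pathogenic(VariantRecording(v_half_mv=-60.0, tail_current_ua=-8.0)))
print(is_pathogenic(VariantRecording(v_half_mv=-5.0, tail_current_ua=-6.0)))
print(is_pathogenic(VariantRecording(v_half_mv=None, tail_current_ua=-0.1)))
```

Variants with altered activation kinetics but wild-type-like V1/2 and amplitude (orange in Fig. 1A) fall through to the final `return False`, consistent with their classification as not pathogenic.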
Statistical analyses were performed using Excel version 16.65 and IBM SPSS version 26, and results were expressed as sensitivity, specificity, positive predictive value, negative predictive value, accuracy and receiver operating characteristic (ROC) curves. With TP, TN, FP and FN denoting true positives, true negatives, false positives and false negatives, respectively, the following equations were used: sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); positive predictive value (PPV) = TP / (TP + FP); negative predictive value (NPV) = TN / (TN + FN); and accuracy = (TP + TN) / (TP + TN + FP + FN).
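The five statistics can be computed directly from confusion-matrix counts, as in the minimal sketch below (the function and field names are hypothetical). One assumption is worth making explicit: the accuracy values in Table 1 are reproduced when variants with an uncertain prediction are left out of all counts. For MutationTaster, for example, (88 + 18) / (88 + 18 + 15 + 3) = 106/124 = 85.48%.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    accuracy: float


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Compute the five statistics from confusion-matrix counts.

    Uncertain predictions are assumed to have been excluded before counting,
    so they appear in no numerator or denominator.
    """
    return Metrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        accuracy=(tp + tn) / (tp + tn + fp + fn),
    )


# MutationTaster counts from Table 1 (its 2 uncertain predictions excluded).
m = diagnostic_metrics(tp=88, fp=15, tn=18, fn=3)
print({name: round(value, 4) for name, value in m._asdict().items()})
```

Rounded to two decimals, these outputs match the MutationTaster column of Table 1 (sensitivity 0.97, specificity 0.55, PPV 0.85, NPV 0.86, accuracy 85.48%).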
Data availability
The data that support the findings of this study are available from the corresponding author, upon reasonable request.
Results
Of the 126 CLCN1 variants, 91 were pathogenic and 35 were benign based on functional characterization (Fig. 1A). Variants were considered pathogenic if the voltage of half-maximal activation was positive to −18.6 mV or the peak tail-current amplitude at −100 mV was smaller in magnitude than 2.5 µA (i.e. less negative than −2.5 µA). Among the variants with reduced current amplitude, most cells did not show any currents.11 In addition, variants with other loss-of-function features that could not be characterized in terms of the voltage of half-maximal activation or current amplitude were also considered pathogenic.11
Comparing the prediction tools, MutationTaster, REVEL, EVE and PolyPhen had above 80% accuracy. Sensitivity, specificity, positive and negative predictive values as well as accuracy for each tool are shown in Table 1.
Measure | MutationTaster | REVEL | EVE | PolyPhen | MetaLR | SIFT | MutationAssessor | aGVGD | CADD
---|---|---|---|---|---|---|---|---|---
True positive, n (% of total pathogenic) | 88 (97) | 87 (96) | 59 (65) | 74 (81) | 91 (100) | 62 (68) | 39 (43) | 26 (29) | 13 (14)
False negative, n | 3 | 4 | 9 | 5 | 0 | 29 | 12 | 40 | 78
Total actual pathogenic, n | 91 | 91 | 91 | 91 | 91 | 91 | 91 | 91 | 91
True negative, n (% of total benign) | 18 (51) | 17 (49) | 19 (54) | 10 (29) | 7 (20) | 26 (74) | 17 (49) | 31 (89) | 32 (91)
False positive, n | 15 | 18 | 8 | 15 | 28 | 9 | 18 | 0 | 3
Total actual benign, n | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35
Uncertain, n | 2 | 0 | 31 | 22 | 0 | 0 | 40 | 29 | 0
Accuracy | 85.48% | 82.54% | 82.11% | 80.77% | 77.78% | 69.84% | 65.12% | 58.76% | 35.71%
Sensitivity | 0.97 | 0.96 | 0.87 | 0.94 | 1.00 | 0.68 | 0.76 | 0.39 | 0.14
Specificity | 0.55 | 0.49 | 0.70 | 0.40 | 0.20 | 0.74 | 0.49 | 1.00 | 0.91
Positive predictive value | 0.85 | 0.83 | 0.88 | 0.83 | 0.76 | 0.87 | 0.68 | 1.00 | 0.81
Negative predictive value | 0.86 | 0.81 | 0.68 | 0.67 | 1.00 | 0.47 | 0.59 | 0.44 | 0.29
Most accurate tool listed on the left and least accurate on the right.
ROC curve analysis suggested that the better predictive tools for CLCN1 are EVE, MutationTaster, MetaLR and REVEL (Fig. 1B). The highest area under the curve (AUC) was obtained for REVEL (Table 2).
Measure | REVEL | MutationTaster | MetaLR | MutationAssessor | SIFT | EVE | PolyPhen | CADD
---|---|---|---|---|---|---|---|---
AUC (SE) | 0.89 (0.03) | 0.88 (0.03) | 0.86 (0.04) | 0.83 (0.04) | 0.82 (0.04) | 0.80 (0.05) | 0.75 (0.05) | 0.66 (0.06)
95% CI | 0.83–0.95 | 0.81–0.94 | 0.79–0.93 | 0.76–0.90 | 0.74–0.89 | 0.70–0.89 | 0.66–0.85 | 0.54–0.77
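AUC has a convenient interpretation: it is the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign variant. The sketch below computes this Mann–Whitney form of the AUC from two lists of scores, and a normal-approximation 95% CI from an AUC and its SE. That SPSS's asymptotic interval takes exactly this form is an assumption here, and the function names and toy scores are hypothetical. Applied to the REVEL row, an SE of 0.03 reproduces the reported 0.83–0.95 interval, whereas an SE of 0.3 could not.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pathogenic: Sequence[float], benign: Sequence[float]) -> float:
    """Probability that a pathogenic variant outscores a benign one (ties = 0.5)."""
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))


def normal_ci(auc: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval, clipped to the [0, 1] range."""
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical toy scores (higher = more likely pathogenic), not study data.
toy_auc = auc_mann_whitney([0.9, 0.8, 0.7], [0.6, 0.75, 0.2])
print(round(toy_auc, 3))

# REVEL in Table 2: AUC 0.89 with SE 0.03.
lo, hi = normal_ci(0.89, 0.03)
print(round(lo, 2), round(hi, 2))
```

Tools for which low scores indicate a damaging variant, such as SIFT, would need their scores negated before use. Because the reported SEs are rounded, other rows match only approximately (MutationTaster gives 0.82–0.94 against the reported 0.81–0.94).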
When considering the concordance of different tools, as is done under the ACMG criteria, ACGS recommendations and diagnostic laboratory consensus, and requiring at least three of four tools to agree for a prediction to be accepted, we found that 80 variants were classified correctly, 12 were classified incorrectly and 34 could not be classified because of a lack of concordance (Table 3). The tools commonly used when applying the ACMG criteria are PolyPhen, SIFT, MutationTaster and aGVGD.
Measure | Tools used in the ACMG/ACGS guidelines (PolyPhen, SIFT, MutationTaster, aGVGD) | Tools performing well on AUC and specificity (REVEL, MutationTaster, MetaLR, EVE) | REVEL, MutationTaster, EVE
---|---|---|---
Sensitivity | 0.92 | 0.99 | 0.97
Specificity | 0.74 | 0.48 | 0.65
Accuracy | 86.95% | 87.72% | 90.00%
Positive predictive value | 0.90 | 0.87 | 0.91
Negative predictive value | 0.80 | 0.92 | 0.85
Variants with all scores concordant, n (% of all variants) | 39 (30.95%) [4 of 4 scores concordant] | 70 (55.56%) [4 of 4 scores concordant] | 0 [3 of 3 scores concordant]
Variants with partial concordance, n (% of all variants) | 53 (42.06%) [3 of 4 scores concordant] | 44 (34.92%) [3 of 4 scores concordant] | 80 (63.49%) [2 of 3 scores concordant]
Variants unclassified, n | 34 | 12 | 46
We also examined concordance between REVEL, MetaLR, MutationTaster and EVE, as these four scores had good AUC and specificity in our data, again requiring at least three of the four scores to agree for a prediction to be included. Using these scores, 100 variants were classified correctly, 14 were classified incorrectly and 12 could not be classified because of a lack of concordance. Although more variants could be classified using the concordance of these four scores, with good accuracy and sensitivity, specificity fell to 0.48 (Table 3). When MetaLR, which had poor individual specificity, was excluded, the concordant specificity of the remaining three scores (REVEL, MutationTaster and EVE) improved to 0.65 (Table 3).
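The "three of four" concordance rule described above can be expressed as a thresholded vote that sorts each variant into one of the three reported categories: classified correctly, classified incorrectly, or unclassified for lack of concordance. The sketch below assumes that an uncertain prediction (None) counts towards neither label, which the text does not specify; the names and example predictions are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

PATHOGENIC, BENIGN = "pathogenic", "benign"


def concordant_call(calls: Sequence[Optional[str]], min_agree: int) -> Optional[str]:
    """Return the label backed by at least `min_agree` tools, else None.

    None entries are uncertain predictions; here they count for neither
    label (an assumption about how uncertain calls were handled).
    """
    if 2 * min_agree <= len(calls):
        raise ValueError("min_agree must be a strict majority of the tools")
    counts = Counter(c for c in calls if c is not None)
    for label in (PATHOGENIC, BENIGN):
        if counts[label] >= min_agree:
            return label
    return None


def tally(all_calls: Sequence[Sequence[Optional[str]]],
          truths: Sequence[str], min_agree: int) -> Tuple[int, int, int]:
    """Count (correct, incorrect, unclassified) against functional labels."""
    correct = incorrect = unclassified = 0
    for calls, truth in zip(all_calls, truths):
        call = concordant_call(calls, min_agree)
        if call is None:
            unclassified += 1
        elif call == truth:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect, unclassified


# Hypothetical predictions from four tools for three variants.
P, B = PATHOGENIC, BENIGN
calls = [[P, P, B, P], [P, P, B, B], [B, B, None, B]]
truth = [P, P, P]
print(tally(calls, truth, min_agree=3))
```

Requiring a strict majority guarantees that at most one label can reach the threshold, so the order in which the labels are checked does not affect the result.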
In CLCN1, variant location has previously been shown to be important.9,11 Variants in the intracellular domain are more likely to be benign, whereas those in the transmembrane domain are more likely to be pathogenic. In our dataset, 27 variants were intracellular and 99 were in the transmembrane domain. In our concordance analysis using REVEL + MutationTaster + EVE, 12 of 27 (44.44%) intracellular variants and 88 of 99 (88.89%) transmembrane variants were predicted correctly. Using the ACMG/ACGS guideline-based tools (PolyPhen + SIFT + MutationTaster + aGVGD), 17 of 27 (62.96%) intracellular variants and 63 of 99 (63.64%) transmembrane variants were predicted correctly.
Discussion
In silico prediction tools are commonly used to score novel variants, but their validity is often unclear. Assessing it requires comparison against robust datasets that capture the clinical and functional features of the variants. We compared the functional features of CLC-1 variants against in silico tools. While MutationTaster, REVEL, EVE and PolyPhen had above 80% accuracy and relatively good sensitivity (above 0.8), the specificity of all four tools was poor. Of these four, EVE had the best specificity, at 0.70. This is far from ideal for clinical application, but it is much better than the specificity of the other three tools, and EVE retained good accuracy and sensitivity. EVE is trained on evolutionary sequences, which may contribute to its higher specificity.3
The AUC of EVE was 0.80. Although this is a good score, it was below that of REVEL, MutationTaster, MetaLR, MutationAssessor and SIFT. The AUC of REVEL was high, at 0.89. This was not surprising given that REVEL is a meta-predictor that combines several individual tools. Based on AUC, the best in silico predictions for CLCN1 were obtained with REVEL, MutationTaster and MetaLR.
The AUC values we report for CLCN1 are lower than other reported AUC values based on the literature. When EVE was compared to ClinVar datasets, an AUC of 0.91 was reported.3 Similarly, REVEL was compared to SwissVar with an AUC of 0.908.2 MetaLR has a reported AUC of 0.883. However, such comparisons are inherently circular, as such databases (ClinVar, SwissVar) incorporate in silico predictive algorithms in categorizing variants as benign or pathogenic.8,22
Few studies compare predictive algorithms to variants that have been functionally characterized in vitro. Similar characterization can be performed with genes responsible for cardiac channelopathies causing long QT syndromes.23 When in silico prediction tools were compared to KCNQ1, KCNH2 and SCN5A variants characterized in vivo or by co-segregation, AUC for PolyPhen was 0.77 for all genes combined and 0.715 for SIFT. When looking at individual genes, the AUC varied from 0.63 to 0.94 using the same score (PolyPhen).
Comparing PolyPhen, SIFT and MutationTaster with the functional characterization of RYR1 variants by in vitro contracture tests on muscle biopsies yielded AUCs of 0.94 (PolyPhen), 0.98 (SIFT) and 0.92 (MutationTaster).24 These values are much higher than the AUC values we obtained for CLCN1.
These studies demonstrate clear differences in the AUC for in silico predictive tools for different genes. It is likely that this is due to variations in complex factors such as penetrance and pattern of inheritance. This is important to consider when interpreting a novel variant. Reported AUC, specificity and sensitivity for in silico predictive tools should not be applied generally to all genes.
Using the concordance of several tools appears to improve performance. Concordance (all three in agreement) between REVEL, MutationTaster and EVE improved the accuracy, sensitivity and positive and negative predictive values compared with the ACMG or diagnostic laboratory tools (3 of 4 concordant among PolyPhen, SIFT, MutationTaster and aGVGD). Specificity was slightly reduced, a recurring issue across all in silico predictive tools. However, an accuracy of 90% makes a compelling case for considering the newer predictive tools and concordance in the interim, while better tools are developed. Additionally, the predictions of these tools appear consistent with the known differences in variant pathogenicity by location within the gene. Domains and loci of variants are likely to be useful features to include in the design of future predictive tools.25 However, the pathogenicity of variants in some functional domains may not be detectable with certain functional analyses; for example, some intracellular CLC-1 variants exert their pathogenic effect by disrupting muscle-specific protein interactions.26 Practically, we suggest that variants in less well-conserved domains particularly require functional studies. In CLC-1, for example, variants outside the transmembrane domain are less well conserved.11
The correlation of functional features with clinical characteristics such as inheritance pattern is not perfect, as expected for skeletal muscle channelopathies, in which variants show variable clinical features within and between pedigrees. Also, depending on the type of functional analysis, only certain forms of pathogenicity can be detected; for example, exonic variants affecting splicing or tissue-specific interactions will not be detected by heterologous expression and electrophysiological analysis. Nevertheless, functional expression is a strong indicator of pathogenicity and is classified as such in the ACMG criteria. This yields a robust dataset, particularly compared with ClinVar-based datasets, in which some variants are reported without any indicators of pathogenicity.
Additional limitations of functional analysis include the time taken, the labour-intensive process and the technical expertise needed. Functional expression of a new variant can take months, depending on the assay. Not all genes, and indeed not all channel genes, can be expressed. High-throughput electrophysiology platforms that use automated multi-channel patch-clamping may overcome some barriers, time in particular. However, the initial purchase cost and the cost per data point of these platforms are significant. High-throughput platforms may become an option in the future as access improves and costs fall. Traditional functional expression requires substantial equipment and technical experience. In contrast, more accurate in silico tools could be applied by clinicians, geneticists and bioinformaticians.
As per the ACMG guidelines, multiple lines of in silico prediction provide supporting or moderate evidence of a variant being benign or pathogenic, while functional analyses can provide strong evidence. Thus, at present, where a functional assay is available, it should be sought. In silico predictions can provide a preliminary estimate that may precede functional analysis by months or years and, once the functional analysis is complete, provide supportive evidence for the pathogenicity of the variant. In the absence of functional assays or other strong indicators, in silico predictive tools are part of the main pathway for assessing pathogenicity. Developing more specific predictive tools is a key area of need in genomics, particularly for genes without a method for functional assessment. At present, classification for genes without robust expression systems is limited to the accuracy achievable with traditional parameters, including conservation, nature of the mutation, mutation hotspots or clinical validation such as segregation testing, which may not always be possible.
Ideally, improved algorithms would be developed that can be applied rapidly to new variants, and newer machine learning techniques may make this possible. Machine learning techniques such as multi-task learning on channel datasets have been used to develop models that predict variant pathogenicity, and such algorithms may also incorporate homology modelling approaches.27 However, the key challenge remains using a dataset large enough to train an algorithm without compromising the validity of the data included.28,29 Larger, more inclusive datasets tend to incorporate unvalidated data points. For example, the multi-task learning support vector machine (MTL-SVM) model for potassium channels is partly trained on non-human data that may not arise in a disease context.29
At present, clinical assessment incorporating functional and in silico predictions is imperative. Other causes of myotonia need to be considered and excluded. In patients with other causes of myotonia, for example myotonic dystrophy, pathogenic CLCN1 variants can alter the phenotype and must be considered in clinical assessment. For some variants, electrophysiological patterns may not be able to determine the mode of inheritance, and clinical assessment will be important for genetic counselling.
Our study in CLCN1, using a robust dataset and including newer predictive models, supports data from other fields of medicine illustrating the poor utility of current in silico predictive tools. Overall, tools with improved specificity that maintain good sensitivity are urgently required, and future assessments should be performed against robust, functionally validated datasets. Importantly, the AUC, specificity and sensitivity of predictive tools vary between genes and require independent assessment for each gene. While predictive tools may support the scoring of a variant, functional assessment of the variant is warranted where possible.
Funding
M.G.H. is supported by an MRC strategic award for the International Centre for Genomic Medicine in Neuromuscular Diseases. V.V. is supported by the HSS for Channelopathies, Mission Possible (National Brain Appeal) and The Jon Moulton Charity. Our work is supported by the UCLH NIHR Biomedical Research Centre. We provide the only NHS England-commissioned national diagnostic service and reference laboratory for child and adult muscle channelopathies, led from University College London NHS Foundation Trust; for further details contact [email protected].
Competing interests
The authors report no competing interests.