The |$H_{+}$| metric is an internal validity measure for assessing the performance of induced cluster labels. Multidimensional scaling (MDS) plots of the |$\mathtt{sc\_mixology}$| scRNA-seq data set, with shapes representing true cell type labels and colors representing induced (or predicted) cluster labels from four hierarchical clustering methods implemented in the hclust() function in the base R stats package: (a) Ward’s method, (b) single linkage, (c) complete linkage, and (d) unweighted pair group method with arithmetic mean (UPGMA). (e) Scatter plot of |$H_{+}$| (an internal validity metric) against the Adjusted Rand Index (ARI) (an external validity metric), demonstrating shared information between the two metrics, which |$H_{+}$| (calculated with the HPE algorithm 1 using |$p=101$|) recovers without the need for an externally labeled set of observations. (f) A performance plot with three internal validity metrics (|$y$|-axis scaled between 0 and 1): (i) |$1-H_{+}$| (for ease of comparison), calculated from labels induced with |$k=2,\dots,10$| (|$x$|-axis), (ii) mean silhouette score, and (iii) within-cluster sum of squares (WCSS). The “peak” of the |$1-H_{+}$| metric at the correct |$k=3$| indicates that |$H_{+}$| identifies the most accurate labeling in a manner comparable to established internal validity measures, namely the “peak” in the mean silhouette score and the “bend” in the WCSS curve.
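To make the external metric in panel (e) concrete, the following is a minimal sketch of how the Adjusted Rand Index can be computed from a vector of true labels and a vector of induced labels. It is not the authors' code: the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration, and the sketch uses only the Python standard library rather than the R tooling named in the caption. It does not compute |$H_{+}$|, which needs no true labels.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same observations.

    Hypothetical helper for illustration; label values may be any hashable type.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label vectors must have the same length")

    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        # Fewer than two observations: the partitions agree trivially.
        return 1.0

    # Contingency table cells and its row/column margins.
    cell_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    # Number of observation pairs placed together under each view.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2

    if max_index == expected:
        # Degenerate case: both labelings are trivial partitions.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (assumed, not taken from sc_mixology): three true cell types.
true_types = ["a", "a", "b", "b", "c", "c"]
perfect = [2, 2, 0, 0, 1, 1]      # same partition, different cluster names
one_cluster = [0, 0, 0, 0, 0, 0]  # everything merged into a single cluster

print(adjusted_rand_index(true_types, perfect))
print(adjusted_rand_index(true_types, one_cluster))
```

The ARI depends only on the partition, not on the cluster names, so the relabeled but otherwise perfect clustering scores 1, while merging every cell into a single cluster scores 0, the value expected under chance agreement.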