Figure 2.
Methods for optimal clusterings. a, b) Strength of clustering (silhouette coefficient) across all 128 tree sets under each tree distance metric in (a) original tree space; (b) 2D PCoA mapping. Box plots denote median and interquartile range; strong evidence that medians differ exists where notches do not overlap. c, d) Number of clusters in optimal clustering under each tree distance method, calculated from (c) original distances; (d) 2D PCoA mapping. Tree sets lacking “reasonable” structure (i.e., silhouette coefficient $< 0.5$) are taken to exhibit a single cluster. e, f) Mean difference (variation of information) between optimal clusterings obtained under (e) each tree distance metric; (f) each clustering method, from data sets exhibiting at least “reasonable” clustering structure. Brighter colors represent greater differences. g) Definition (silhouette coefficient) of optimal clustering obtained under each clustering method, summarized for all distances and tree sets. Bars denote medians and interquartile ranges. h) Method obtaining clustering with highest silhouette coefficient, across all tree spaces with at least “reasonable” clustering structure (silhouette coefficient $> 0.5$).
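The single-cluster rule used in panels c and d can be illustrated with a minimal sketch. It assumes the silhouette coefficient of a clustering is the mean silhouette width over all items, computed from a square distance matrix, and that a singleton cluster scores 0. The function names `mean_silhouette` and `n_clusters_reported` are hypothetical, and so is the treatment of a coefficient of exactly 0.5, which the caption does not specify. This is not the authors' implementation.

```python
def _avg(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)


def mean_silhouette(dist, labels):
    """Mean silhouette width of a clustering, given a square distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # assumed convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(_avg(dist[i][j] for j in members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return _avg(widths)


def n_clusters_reported(dist, candidate_labelings, threshold=0.5):
    """Pick the candidate with the highest silhouette coefficient; report one
    cluster if even that candidate falls below the threshold."""
    scored = [(mean_silhouette(dist, labs), labs) for labs in candidate_labelings]
    score, best = max(scored, key=lambda pair: pair[0])
    if score < threshold:  # no "reasonable" structure: treat as a single cluster
        return 1, score
    return len(set(best)), score
```

For example, four items at positions 0, 1, 10 and 11 on a line split cleanly into two clusters, whereas items at 0, 1, 2 and 3 yield a best coefficient below 0.5 and are therefore reported as a single cluster.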
