Table 1

Overview of the clustering methods evaluated in this study

Method	Environment and availability	Short description	Reference
kmeans	R package (stats)	Iterative partitioning algorithm; used with or without good initial values (centroids).	[10]
cmeans	R package (e1071)	Iterative soft partitioning algorithm; with or without good initial values.	[11]
DBSCAN	R package (dbscan)	Density-based clustering algorithm; the minimum number of neighbours and the neighbourhood radius have to be pre-specified.	[12]
dpcp	R code from GitHub; R shiny app	Two-step approach; DBSCAN is utilized to identify the locations of first-order clusters, then cmeans is applied.	[13]
flowSOM	R package from Bioconductor	Self-organizing map to find winning nodes, followed by hierarchical clustering on those representatives.	[14]
flowPeaks	R package from Bioconductor	Based on finite Gaussian mixture models; starts with kmeans to compute a smooth density function empirically, then merging is performed; no need to specify the number of clusters.	[15]
flowClust	R package from Bioconductor	Based on t mixture models with Box-Cox transformation; model parameters are inferred using an Expectation-Maximization algorithm; the number of clusters can be pre-specified or automatically chosen by the Bayesian information criterion (BIC).	[16]
flowMerge	R package from Bioconductor	Extension of flowClust intended to solve the issue of flowClust producing too many clusters in the automatic mode; the best model is selected by the change point in entropy.	[17]
SamSPECTRAL	R package from Bioconductor	Graph-based method; starts with data reduction to limit the computational cost, then computes the similarity matrix, followed by spectral clustering, then kmeans; the number of clusters can be pre-specified or automatically determined by the knee point of the eigenvalue plot.	[18]
calico	R code from GitHub; R shiny app	Based on gridding and kmeans; starts with sample space gridding to reduce the differences in density, then a first round of kmeans is run on the gridded data to obtain centroids for the second round of kmeans.	[19]
ddPCRclust	R package from Bioconductor	Ensemble-based approach that combines the outcomes of flowDensity, SamSPECTRAL and flowPeaks.	[20]

Additional details, including the software package versions used for each clustering method, are given in Appendix Table S2.
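
The DBSCAN entry in Table 1 notes that a neighbour count and a radius must be fixed in advance. As a rough illustration of how these two parameters act, the following is a minimal pure-Python sketch of the textbook DBSCAN procedure. It is not the code of the R dbscan package listed in the table; the function name `dbscan` and the parameter names `eps` (radius) and `min_pts` (neighbour count, counting the point itself) are assumptions made for this example only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    Illustrative sketch only: brute-force neighbour search, O(n^2).
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within radius eps of point i (i included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            # Not a core point; may still be claimed later as a border point.
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

With two tight groups of four points and one isolated point, a radius that spans a group assigns two cluster ids and marks the isolated point as noise, whereas a radius that is too small for the same `min_pts` leaves every point as noise. This sensitivity is why both values have to be chosen before clustering.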
