The workflow of REBET. (A) Input: a gene expression data matrix with |$n$| cells in rows and |$p$| genes in columns. (B) SC3 was used to cluster the input data over a range of candidate cluster numbers (|$k$| = 2 to 10, from top to bottom). Small squares, triangles and circles all represent cells, and each shape denotes a different cell cluster (three cell clusters in total). Cells enclosed by the same blue-shaded circle are assigned to the same cluster. (C) Each clustering result was treated as a batch variable, and the corresponding batch effect was removed from the input data. To illustrate the effect of this removal, the batch-effect-removed data were visualized with t-distributed Stochastic Neighbor Embedding (t-SNE). When |$k$| is smaller than the true number of cell clusters, the batch-effect-removed data |$\textbf{D}_{k}$| still contain cell cluster label information; for example, the cells in |$\textbf{D}_{2}$| can still be separated along the first principal component. (D) Each batch-effect-removed dataset was clustered again, and the mixing degree coefficient ARNMI was calculated. The |$k$| with the minimum ARNMI value is the number of cell clusters determined by REBET.
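The selection loop in panels (B) to (D) can be sketched in Python as below. This is a minimal illustration under several assumptions, not REBET's implementation: plain k-means stands in for SC3, subtracting each cluster's mean profile stands in for the batch-effect removal step, the second-round clustering reuses the same |$k$|, and ARNMI is assumed here to be the average of the adjusted Rand index (ARI) and normalized mutual information (NMI) between the first-round labels (the "batch") and the second-round labels. All function names are hypothetical, and only NumPy is used.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means; a stand-in for SC3 (assumption)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each cell to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def remove_batch(X, batch):
    """Hypothetical batch correction: subtract each batch's mean profile."""
    out = np.asarray(X, dtype=float).copy()
    batch = np.asarray(batch)
    for b in np.unique(batch):
        mask = batch == b
        out[mask] -= out[mask].mean(axis=0)
    return out


def contingency(a, b):
    """Cross-tabulate two labelings (rows: labels of a, columns: labels of b)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    C = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(C, (ai, bi), 1)
    return C


def _pairs(x):
    # Number of unordered pairs (x choose 2), elementwise.
    return x * (x - 1) / 2.0


def ari(a, b):
    """Adjusted Rand index between two labelings."""
    C = contingency(a, b)
    sum_ij = _pairs(C).sum()
    sum_a, sum_b = _pairs(C.sum(axis=1)).sum(), _pairs(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / _pairs(C.sum())
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:  # degenerate partitions, treated as a perfect match
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def _entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the two entropies."""
    P = contingency(a, b)
    P /= P.sum()
    pa, pb = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz])).sum()
    denom = (_entropy(pa) + _entropy(pb)) / 2.0
    return 1.0 if denom == 0 else mi / denom


def arnmi(a, b):
    # Assumed definition: average of ARI and NMI.
    return (ari(a, b) + nmi(a, b)) / 2.0


def select_k(X, ks=range(2, 11), seed=0):
    """Return the k with the smallest assumed ARNMI, plus all scores."""
    scores = {}
    for k in ks:
        first = kmeans(X, k, seed=seed)      # (B) cluster the input with k
        D_k = remove_batch(X, first)         # (C) treat the clusters as batches
        second = kmeans(D_k, k, seed=seed)   # (D) re-cluster D_k (same k assumed)
        scores[k] = arnmi(first, second)
    return min(scores, key=scores.get), scores
```

Under this assumed definition, a low score means the second-round labels are close to independent of the clusters whose effect was removed, which is the well-mixed situation that the minimum-ARNMI rule looks for. Swapping in a real SC3 run or a dedicated batch-correction method would only change `kmeans` and `remove_batch`.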