Fig. 2.
Complete clustering pipeline diagram. Phases that are executed in parallel are shown as stacked boxes. The steps are: (1) filter near-duplicates; (2) compute suboptimal structures; (3) compute the sparse vector encoding; (4) compute the global feature index and return the top dense sets; (5) refine clusters with the structural alignment procedure; (6) build a covariance model from the remaining high-quality instances; (7) populate each cluster with the retrieved instances; (8) remove the clustered instances and iterate from step 4; and (9) merge redundant clusters.
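
The iterative part of the pipeline (steps 4 to 9 in the caption) can be read as a loop over a shrinking pool of unclustered instances. The following is a minimal sketch of that control flow only, not the authors' implementation: every function name and parameter (`iterative_clustering`, `find_dense_sets`, `refine`, `build_model`, `populate`, `merge`, `max_rounds`) is hypothetical, the per-step logic is supplied by the caller, steps 1 to 3 are assumed to have been run beforehand, and the parallel execution shown in the figure is omitted.

```python
from typing import Callable, Hashable, Iterable, List, Set


def iterative_clustering(
    instances: Iterable[Hashable],
    find_dense_sets: Callable[[Set], List[Set]],  # step 4 (hypothetical hook)
    refine: Callable[[Set], Set],                 # step 5 (hypothetical hook)
    build_model: Callable[[Set], object],         # step 6 (hypothetical hook)
    populate: Callable[[object, Set], Set],       # step 7 (hypothetical hook)
    merge: Callable[[List[Set]], List[Set]],      # step 9 (hypothetical hook)
    max_rounds: int = 10,                         # assumed stopping bound
) -> List[Set]:
    """Control-flow sketch of steps 4-9; all step logic is caller-supplied."""
    remaining = set(instances)
    clusters: List[Set] = []

    for _ in range(max_rounds):
        if not remaining:
            break
        # Step 4: candidate dense sets among the still-unclustered instances.
        candidates = find_dense_sets(remaining)
        found_any = False
        for candidate in candidates:
            # Step 5: refine the candidate, ignoring already-clustered items.
            core = refine(candidate & remaining)
            if not core:
                continue
            # Step 6: build a model from the refined core.
            model = build_model(core)
            # Step 7: populate the cluster from the unclustered pool.
            members = populate(model, remaining) & remaining
            if not members:
                continue
            clusters.append(members)
            # Step 8: remove clustered instances before the next round.
            remaining -= members
            found_any = True
        if not found_any:
            break  # no new cluster in this round, so stop iterating

    # Step 9: merge redundant clusters.
    return merge(clusters)
```

Passing the steps in as callables keeps the sketch independent of any particular tool for dense-set detection, alignment, or model building; the loop stops when the pool is empty, when a round yields no new cluster, or after `max_rounds` rounds.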