-
PDF
- Split View
-
Views
-
Cite
Cite
Stefano Torniamenti, Mario Pasquato, Pierfrancesco Di Cintio, Alessandro Ballone, Giuliano Iorio, M Celeste Artale, Michela Mapelli, Hierarchical generative models for star clusters from hydrodynamical simulations, Monthly Notices of the Royal Astronomical Society, Volume 510, Issue 2, February 2022, Pages 2097–2110, https://doi.org/10.1093/mnras/stab3608
- Share Icon Share
ABSTRACT
Star formation in molecular clouds is clumpy, hierarchically subclustered. Fractal structure also emerges in hydrodynamical simulations of star-forming clouds. Simulating the formation of realistic star clusters with hydrodynamical simulations is a computational challenge, considering that only the statistically averaged results of large batches of simulations are reliable, due to the chaotic nature of the gravitational N-body problem. While large sets of initial conditions for N-body runs can be produced by hydrodynamical simulations of star formation, this is prohibitively expensive in terms of computational time. Here, we address this issue by introducing a new technique for generating many sets of new initial conditions from a given set of star masses, positions, and velocities from a hydrodynamical simulation. We use hierarchical clustering in phase space to inform a tree representation of the spatial and kinematic relations between stars. This constitutes the basis for the random generation of new sets of stars which share the clustering structure of the original ones but have individually different masses, positions, and velocities. We apply this method to the output of a number of hydrodynamical star-formation simulations, comparing the generated initial conditions to the original ones through a series of quantitative tests, including comparing mass and velocity distributions and fractal dimension. Finally, we evolve both the original and the generated star clusters using a direct N-body code, obtaining a qualitatively similar evolution.
1 INTRODUCTION
A large fraction of star formation happens in clusters (Lada & Lada 2003; but see also the recent findings by Reina-Campos et al. 2019 and Ward, Kruijssen & Rix 2020) which are clumpy and hierarchically substructured (Larson 1995; Bastian et al. 2009; Dib & Henning 2019). Young star clusters often show signatures of fractality (Cartwright 2009; Kuhn et al. 2019), complex motions between sub-clumps (Cantat-Gaudin et al. 2019), and possibly rotation (Hénault-Brunet et al. 2012). In addition, the early expulsion of gas due to stellar winds and supernova explosions brings young clusters out of dynamical equilibrium, causing an expansion phase (Hills 1980; Goodwin & Bastian 2006; Baumgardt & Kroupa 2007; Pfalzner 2009). The comprehension of the early evolution of star clusters is of fundamental importance to interpret the later phases of their life. For example, Corsaro et al. (2017) have shown that the rotation signature from the parent molecular cloud persists in the alignment of stellar spins in some open clusters, even today. Rotation in young and open star clusters is confirmed by numerical simulations (e.g. Lee & Hennebelle 2016; Mapelli 2017; Ballone et al. 2020). Also, observations of globular clusters show signatures of rotation, sometimes with significant dynamical effects (Bianchini et al. 2013; Fabricius et al. 2014; Ferraro et al. 2018; Kamann et al. 2018; Dalessandro et al. 2021) that may have been imprinted in the first phases of their evolution.
Gravitational N-body simulations are a key tool to model the early star cluster evolution, but they often start from rather idealized initial conditions, sampled from equilibrium models, such as Plummer (1911) or King (1966) models. Even though more sophisticated models are available (e.g. Lynden-Bell 1962; Michie & Bodenheimer 1963; King 1966; Prendergast & Tomer 1970; Wilson 1975; Bertin & Stiavelli 1984; Lupton & Gunn 1987; Trenti & Bertin 2005; An & Evans 2006; Varri & Bertin 2012; Gieles & Zocchi 2015; Daniel, Heggie & Varri 2017; Claydon et al. 2019), these were developed with the goal of describing the current, quasi-equilibrium state of star clusters, and do not perform well in describing the early and out-of-equilibrium structure of the clusters. Thus, by design, they bear little resemblance to observed primordial conditions in embedded clusters. For more than a decade, fractal initial conditions have been used as a starting point for realistic simulations (e.g. Goodwin & Whitworth 2004; Allison et al. 2010; Küpper et al. 2011a,b; Parker, Goodwin & Allison 2011; Parker et al. 2014; Park, Goodwin & Kim 2018; Di Carlo et al. 2019), but even this approach does not guarantee that all the relevant characteristics of the actual primordial conditions of star clusters are correctly captured.
A seemingly obvious but computationally demanding way to generate realistic initial conditions for star clusters is to run suites of hydrodynamical simulations, coupled with appropriate recipes for handling star formation and other sub-grid physics (e.g. Klessen & Burkert 2000; Bonnell, Bate & Vine 2003; Bate 2009a; Federrath & Klessen 2012; Krumholz, Klein & McKee 2012; Dale, Ercolano & Bonnell 2015; Fujii & Portegies Zwart 2016; Geen et al. 2016; Seifried et al. 2017; Lee & Hennebelle 2019; Wall et al. 2019; Zamora-Avilés et al. 2019). Despite these efforts, large sets of simulations including all the relevant physics are at present hard to come by, notwithstanding ever-advancing hardware capabilities. This is compounded by the fact that N-body simulations, even direct-summation ones, eventually diverge from the true solution of the N-body problem for most initial conditions due to numerical errors and the chaotic nature of the problem (e.g. see Goodman, Heggie & Hut 1993; Hemsendorf & Merritt 2002; Kandrup & Sideris 2003; Boekholt & Portegies Zwart 2015; Di Cintio & Casetti 2019, 2020; Manwadkar, Trani & Leigh 2020; Wang & Hernandez 2021 and references therein), with the consequence that only ensemble-averaged results are considered reliable within the current consensus. To obtain such averages, multiple N-body runs are needed, each with its own initial conditions.
The aim of this work is to introduce a novel approach to obtain initial conditions for N-body simulations, at a tiny fraction of the computational cost of running additional independent hydrodynamical simulations. Our approach relies on a clustering algorithm that is a method to identify similar instances in a sample and assign them to groups or clusters. The usage of clustering algorithms is widespread in cosmology, where different methods have been developed to identify overdense gravitationally bound systems (i.e. groups or dark matter haloes and subhaloes). There is a huge variety of group-finding algorithms, which perform clustering in different ways. For example, density-peak locators and spherical overdensity finders identify density peaks in the particle distribution and draw spheres of decreasing density around them, down to a density threshold (see e.g. Press & Schechter 1974). Direct particle collectors, instead, connect particles based on a linking length criterion. Such scheme is the basis of the friend-of-friends prescription, which is used in the context of halo-finding in cosmological simulations (Davis et al. 1985) and in galaxy clusters from observational data (Murphy, Geach & Bower 2012; see also Feng & Modi 2017 and references therein). Other group finders extend these two approaches to include particle velocity information (see e.g. Diemand, Kuhlen & Madau 2006; Maciejewski et al. 2009).
Here, we adopt a hierarchical clustering algorithm. As opposed to the aforementioned algorithms, hierarchical clustering does not rely on the definition of a density threshold or a length-scale, but is provided with a definition of similarity between groups. In particular, it proceeds in a hierarchical way, by connecting the most similar pair of clusters, starting from individual instances, until a certain number of groups is reached. This hierarchical construction allows not only to identify overdensities in the distribution of stars, but also to draw information about the structure of star systems at different scales. New stellar clusters can thus be obtained by modifying selected nodes in the hierarchical structure, depending on the properties we want to preserve or modify.
Our generative model is meant to produce new large-scale distributions of sink particles, by preserving the properties that make them so realistic, such as their complex fractal structure. The goodness of our method is evaluated by comparing the fractal structure of the new realizations to that of the original cluster. Also, we check if the new velocity distributions are consistent with the original distribution. Finally, we run N-body simulations of the newly generated clusters to test if they are consistent with the stochastic fluctuations of the original simulations.
The paper is organized as follows. In Section 2, we recap the properties of the hydrodynamical simulations we used to generate our original initial condition sets; Section 3 presents our approach for generating new realizations, while in Section 4 we describe our results and run various checks to compare the generated realizations to the original simulations. In Section 5, we discuss and draw conclusions.
2 SMOOTHED-PARTICLE HYDRODYNAMICAL SIMULATIONS
2.1 Initial conditions and simulation set-up
As a starting point for this work, we used the sink particles from 10 smoothed-particle hydrodynamics (hereafter SPH) simulations of molecular clouds performed by Ballone et al. (2020) using the gasoline2 code (Wadsley, Stadel & Quinn 2004; Wadsley, Keller & Quinn 2017). In the following text, we may refer to these sink particles as ‘stars’ for convenience.
The initial conditions of the SPH simulations are spherical molecular clouds with total gaseous mass in the range 104 ≤ Mmc/M⊙ ≤ 105 (see the last column of Table 1), uniform temperature T0 = 10 K and uniform density ρ0 = 2.5 × 102 cm−3. All runs have a fixed number of initial SPH particles equal to 107, corresponding to a gas mass resolution of 10−3 to 10−2 M⊙, depending on the cloud mass. Stars form during the simulation by means of a sink particle algorithm based on the same prescriptions as Bate, Bonnell & Price (1995). The spatial resolution (kernel size) of the simulation can get as small as 0.001 pc in the densest regions.
Properties of the end states of the SPH simulations of Ballone et al. (2020).
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . | |$M_{\rm mc}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|---|
m1e4 | 2523 | 6 | 1.19 | 2.30 | 4.22 × 103 | 104 |
m2e4 | 2571 | 4 | 1.32 | 2.12 | 6.69 × 103 | 2 × 104 |
m3e4 | 2825 | 5 | 1.48 | 2.20 | 1.03 × 104 | 3 × 104 |
m4e4 | 2868 | 2 | 1.47 | 2.17 | 1.44 × 104 | 4 × 104 |
m5e4 | 2231 | 4 | 1.47 | 1.80 | 1.41 × 104 | 5 × 104 |
m6e4 | 3054 | 5 | 1.69 | 2.15 | 2.04 × 104 | 6 × 104 |
m7e4 | 4214 | 9 | 1.50 | 2.20 | 3.15 × 104 | 7 × 104 |
m8e4 | 2945 | 6 | 1.60 | 1.86 | 2.83 × 104 | 8 × 104 |
m9e4 | 3161 | 4 | 1.52 | 1.90 | 3.05 × 104 | 9 × 104 |
m1e5 | 3944 | 6 | 1.46 | 2.20 | 3.80 × 104 | 105 |
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . | |$M_{\rm mc}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|---|
m1e4 | 2523 | 6 | 1.19 | 2.30 | 4.22 × 103 | 104 |
m2e4 | 2571 | 4 | 1.32 | 2.12 | 6.69 × 103 | 2 × 104 |
m3e4 | 2825 | 5 | 1.48 | 2.20 | 1.03 × 104 | 3 × 104 |
m4e4 | 2868 | 2 | 1.47 | 2.17 | 1.44 × 104 | 4 × 104 |
m5e4 | 2231 | 4 | 1.47 | 1.80 | 1.41 × 104 | 5 × 104 |
m6e4 | 3054 | 5 | 1.69 | 2.15 | 2.04 × 104 | 6 × 104 |
m7e4 | 4214 | 9 | 1.50 | 2.20 | 3.15 × 104 | 7 × 104 |
m8e4 | 2945 | 6 | 1.60 | 1.86 | 2.83 × 104 | 8 × 104 |
m9e4 | 3161 | 4 | 1.52 | 1.90 | 3.05 × 104 | 9 × 104 |
m1e5 | 3944 | 6 | 1.46 | 2.20 | 3.80 × 104 | 105 |
Note. After the name of each simulation (Col. 1), we report the number of stars generated (Col. 2), the number of macroscopic sub-clumps (Col. 3), the virial ratio (αvir ≡ 2K/|W|, Col. 4), the γ coefficient of the mass-spectrum fitting function of (equation 2, Col. 5), the total mass of the stars (Col. 6), and the mass of the parent molecular cloud (Col. 7).
Properties of the end states of the SPH simulations of Ballone et al. (2020).
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . | |$M_{\rm mc}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|---|
m1e4 | 2523 | 6 | 1.19 | 2.30 | 4.22 × 103 | 104 |
m2e4 | 2571 | 4 | 1.32 | 2.12 | 6.69 × 103 | 2 × 104 |
m3e4 | 2825 | 5 | 1.48 | 2.20 | 1.03 × 104 | 3 × 104 |
m4e4 | 2868 | 2 | 1.47 | 2.17 | 1.44 × 104 | 4 × 104 |
m5e4 | 2231 | 4 | 1.47 | 1.80 | 1.41 × 104 | 5 × 104 |
m6e4 | 3054 | 5 | 1.69 | 2.15 | 2.04 × 104 | 6 × 104 |
m7e4 | 4214 | 9 | 1.50 | 2.20 | 3.15 × 104 | 7 × 104 |
m8e4 | 2945 | 6 | 1.60 | 1.86 | 2.83 × 104 | 8 × 104 |
m9e4 | 3161 | 4 | 1.52 | 1.90 | 3.05 × 104 | 9 × 104 |
m1e5 | 3944 | 6 | 1.46 | 2.20 | 3.80 × 104 | 105 |
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . | |$M_{\rm mc}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|---|
m1e4 | 2523 | 6 | 1.19 | 2.30 | 4.22 × 103 | 104 |
m2e4 | 2571 | 4 | 1.32 | 2.12 | 6.69 × 103 | 2 × 104 |
m3e4 | 2825 | 5 | 1.48 | 2.20 | 1.03 × 104 | 3 × 104 |
m4e4 | 2868 | 2 | 1.47 | 2.17 | 1.44 × 104 | 4 × 104 |
m5e4 | 2231 | 4 | 1.47 | 1.80 | 1.41 × 104 | 5 × 104 |
m6e4 | 3054 | 5 | 1.69 | 2.15 | 2.04 × 104 | 6 × 104 |
m7e4 | 4214 | 9 | 1.50 | 2.20 | 3.15 × 104 | 7 × 104 |
m8e4 | 2945 | 6 | 1.60 | 1.86 | 2.83 × 104 | 8 × 104 |
m9e4 | 3161 | 4 | 1.52 | 1.90 | 3.05 × 104 | 9 × 104 |
m1e5 | 3944 | 6 | 1.46 | 2.20 | 3.80 × 104 | 105 |
Note. After the name of each simulation (Col. 1), we report the number of stars generated (Col. 2), the number of macroscopic sub-clumps (Col. 3), the virial ratio (αvir ≡ 2K/|W|, Col. 4), the γ coefficient of the mass-spectrum fitting function of (equation 2, Col. 5), the total mass of the stars (Col. 6), and the mass of the parent molecular cloud (Col. 7).
In order to induce a non-isotropic evolution, the SPH gas particles are initially given a turbulent, divergence-free, Gaussian random velocity field with a different random seed for each simulation of the set, following a Burgers (1948) velocity power-law spectrum with index −4 (Bate 2009b). With respect to the classical Kolmogorov (1941) power spectrum (with index −11/3), the Burgers power spectrum better matches turbulence in compressive flows, where shocks are present (Federrath 2013). The clouds are in an initial marginally bound state, so that their initial virial ratio αvir ≡ 2K/|W| = 2, where K and W are the gas kinetic and potential energy, respectively.
No stellar feedback was included in this set of simulations, and we simply decided to assume that our clusters are the result of instantaneous gas removal at 3 Myr after the beginning of the hydrodynamical simulation to roughly simulate the effect of the first supernova explosions. Indeed, Dale et al. (2015) have shown that the pre-supernova gas removal is expected to play a minor effect on the survival and dynamics of stellar clusters and we also checked that at 3 Myr the gas accounts for a small fraction of the mass where most of the stellar mass is residing. Furthermore, at 3 Myr all the clouds converted about 30–40 per cent of their gas mass into sink particles, in agreement with previous hydrodynamical simulations showing that stellar feedback should lead to a maximum star formation efficiency of about this amount (e.g. Vázquez-Semadeni et al. 2010; Dale et al. 2015; Gavagnin et al. 2017; Li et al. 2019). For more details on such choices, we refer the reader to Ballone et al. (2020).
2.2 Structural properties of the SPH simulations
Independently of the specific initial value of Mmc, our SPH simulations present a clumpy structure with Ns ≈ 3 × 103 stars,1 organized in a maximum of Nc = 9 main sub-clumps for m7e4 to a minimum of Nc = 2 for m4e4. Sub-clumps are identified heuristically as groups of neighbouring stars containing more than |$0.05\, {} N_{\rm s}$|, where self-potential energy exceeds that of the rest of the system. Fig. 1 shows the x-y, y-z, and z-x projections of the stars position on the three coordinate planes for the system m1e4, with their masses m shown in colour. We find a rather prominent primordial mass segregation, with heavier stars typically well within the central regions of the main clumps and lighter stars at larger distances from the geometric centres of such subsystems. All systems are above the virial condition, with αvir ranging from 1.19 for the m1e4 case, to 1.69 for m6e4.

From left to right, projections in the x-y, y-z, and z-x planes of the end state of the m1e4 simulation. The colour map marks the mass of the individual stars in units of M⊙.
In order to quantify the properties of the end states of the SPH simulations, we have evaluated their distributions of inter-particle distances f(d), mass spectra f(m), and velocity distributions f(v). Fig. 2 shows these distributions for the sink particles of the simulations m1e4, m3e4, m5e4, m7e4, and m9e4. The distribution of inter-particle distances shows a quite complex structure with several slope changes. The clumpy structure of the particles’ spatial distribution gives rise to several peaks in f(d), corresponding to the distances between the clumps themselves. For the specific case of m1e4, the peaks are located roughly at 0.1, 0.45, 1.75, and 3 pc (as highlighted by the vertical dotted lines), which can be identified as the distances between the approximate centres of the main clumps of the particles shown in Fig. 1.

Distributions of inter-particle distances f(d) (left-hand panel), mass spectra f(m) (middle panel), and velocity distribution f(v) (right-hand panel) for the sink particles taken from the simulations m1e4, m3e4, m5e4, m7e4, and m9e4. The vertical dotted lines in the left-hand panel mark the position of the main peaks of f(d), corresponding to the distances between the main sub-clusters, for the m1e4 case. The thin dotted line in the right-hand panel marks the v−3 power-law trend of the velocity distributions.
The velocity distributions f(v) do not show a relevant dependence on the specific initial value of Mmc, as shown in the right hand panel of Fig. 2. Qualitatively, the velocity distribution is well described by a Maxwell–Boltzmann distribution from v = 0 to 5 km s−1 (value corresponding to the peak of f(v)) and then shows a v−3 power-law trend. The properties of the SPH simulations are summarized in Table 1.
3 METHODS
In the following text, we describe our new procedure to build a generative model of star cluster initial conditions. In principle, a generative model’s goal is to learn a representation of an intractable distribution given a usually finite number of samples. The generator typically maps from a latent domain on which a simple distribution is defined, such as a multivariate Gaussian on Rn, to the complex data domain (e.g. Ruthotto & Haber 2021). Recently, most of the interest in generative models is driven by deep learning approaches, such as generative adversarial networks (Goodfellow et al. 2014). However, in principle, much simpler models such as hidden Markov models (Rabiner & Juang 1986; Eddy 2004) or grammars (e.g. Chomsky 1959; Jelinek, Lafferty & Mercer 1992; Beaumont & Stepney 2009) meet the definition of generative model in the broader sense defined above. The latter have proved useful in the description and generation of objects displaying fractal structure, as in the case of Lindenmayer systems applied to plant growth (Lindenmayer 1968a,b; Prusinkiewicz & Hanan 2013).
Our generative approach focuses on reproducing the complex fractal structure of embedded star clusters from hydrodynamical simulations (see e.g. fig 4 in Ballone et al. 2020) by capturing the relations between sub-clusters at different scales through a hierarchical clustering algorithm. This will eventually allow us to generate new realizations by modifying their macro structure, i.e. the relations between large sub-clusters. The parameters that characterize the relevant properties of these clumps and their relations can be treated as the latent domain of our generative model.
We proceed in two steps. First, we use a hierarchical clustering algorithm to identify clumps of stars at different scales in the phase space of the original hydrodynamical simulation output. The clumps are organized by the algorithm into a hierarchical tree |$\mathcal {T}$|, where the root node contains the whole set of stars and each subsequent node represents a two-way split with each branch being a clump of stars, down to the leaf nodes representing individual stars. For each node |$\mathcal {T}_i$|, we describe the relevant physical properties of the cluster in terms of the distance vector between the centres of mass of the clumps |$\mathbf {l}_i$|, their relative velocity vector |$\mathbf {u}_i$|, and the mass ratio between the two clumps. To describe how the mass is split at each node we refer to qi, defined as the ratio between the lightest of the two resulting groups and the total mass of the node. With this definition, mass ratios fall between 0 (maximally unequal split) and 0.5 (equal-mass split). The description of the star clusters in terms of the hierarchical clustering algorithm is given in Section 3.2, but its goal in short is to capture structure as a function of scale, similarly to what was done in e.g. Elmegreen et al. (2006) by applying smoothing kernels of different sizes.
Second, we generate a new realization of particle positions and velocities by placing clumps of stars (and sub-clumps down to the individual stars) in phase space. To build a new realization of total mass M (details in Section 3.3), we start with one particle at rest in the origin of our coordinate system, initially containing the total mass of the cluster M. Then, we iteratively split it into new particles and place them, at each step i, at a distance |$\mathbf {l}_i$| from each other, moving with relative velocity |$\mathbf {u}_i$|. The relevant variables |$\mathbf {l}_i$|, |$\mathbf {u}_i$|, and the relevant mass ratio qi are taken from the tree |$\mathcal {T}$| except for the first step(s), which are drawn from a tree |$\mathcal {T}^\prime$| built on a different simulation. While this does not guarantee that the outcome will be described by a tree with statistical properties that match those of |$\mathcal {T}$|, it is at least heuristically convincing in the case of very hierarchical distributions. Moreover, we will check ex post that the realizations generated in this way have a set of desirable properties with respect to the original cluster. The details about the generative procedure are given in Section 3.3.
3.1 Hierarchical clustering
Hierarchical clustering algorithms arrange data into a tree-like structure representing nested groups, capturing clustering structure at different scales. In particular, we use an agglomerative clustering algorithm (see the chapter on agnes in Kaufman & Rousseeuw 1990). This means that the tree-like hierarchy of clusters is built from the bottom up: the algorithm starts from individual points, and merges the most similar ones into clusters until some stopping criterion is satisfied (e.g. until only a specified number of clusters are left). This way of proceeding can be thought as drawing a tree with a branch for every pair of clusters that merge.2 A dendrogram can be used to display the resulting tree structure, with leaf nodes corresponding to individual points and the root corresponding to the whole data set. We refer the interested reader to Pasquato & Milone (2019) for an illustration of this and other clustering algorithms in an astronomical context. Here, we selected this algorithm because it is well suited for studying the complex structure of the hydrodynamical simulations described in Section 2, since it is informative on very different scales and it can capture clusters (and sub-clusters) of various sizes. We use the implementation offered by the scikit-learn library (Pedregosa et al. 2011).3
3.1.1 Linkage
Moving towards the root of the tree, an agglomerative clustering algorithm merges at each node either two groups with each other or a lone point into a group. This process is based on a notion of (dis)similarity between groups which may be defined in multiple ways, or linkages. We considered four different linkages and evaluated their performance in clustering the sink particle spatial distribution.
- The single linkage merges the two clusters that have the minimum distance between any points in the two groups:where i and j represent sink particles belonging to group A and B, respectively, and li, j is the distance between two such particles.(3)$$\begin{eqnarray} \Delta _{AB} {:=} \min {(l_{{i \in A},\, {}{j \in B}})}, \end{eqnarray}$$
- The average linkage merges the two clusters that have the smallest average distance between all their points:(4)$$\begin{eqnarray} \Delta _{AB} {:=} {\rm mean}{(l_{{i \in A},\, {}{j \in B}})}. \end{eqnarray}$$
- The complete linkage (also known as maximum linkage) merges the two clusters that have the smallest maximum distance between their points:(5)$$\begin{eqnarray} \Delta _{AB} {:=} \max {(l_{{i \in A},\, {}{j \in B}})}. \end{eqnarray}$$
- Ward’s linkage merges two clusters such that the variance within all clusters increases the least. This often leads to clusters that are relatively equally sized. Ward’s linkage is defined as follows:where the index i denotes the generic i-th particle and cA, cB, and cA∪B denote the centroids of sets A, B, and A∪B respectively. Equation (6) corresponds to the increase in variance with respect to the relevant centroids as groups A and B are merged. Merging groups decrease the number of centroids by one, so variance is bound to increase, but using Ward’s linkage results in cluster mergers that minimize its increase at each step.(6)$$\begin{eqnarray} \Delta ^2_{AB} = \sum _{i \in {A \cup B}} l^2_{i, c_{A \cup B}} - \left(\sum _{i \in A} l^2_{i, c_A} + \sum _{i \in B} l^2_{i, c_B} \right), \end{eqnarray}$$
Fig. 3 shows how the choice of the linkage affects the structure of the first three nodes of the tree of m1e4. The single linkage approach leads to a single, big sub-clump, separated from a few isolated stars. In fact, following this prescription, two blobs that just touch in one point are considered similar and get merged into one pretty quickly, even if their centres of mass are far from each other. In contrast, single isolated stars are merged only in the final branches. The average and complete linkage perform poorly as well, likely because their merging criterion is too simple to fit the complex structure of the hydrodynamical clusters. Finally, Ward’s linkage performs well in describing the large-scale structure of the cluster, as it correctly identifies the main clumps and is thus informative about the structure of the cluster. For this reason, hereafter we will consider only Ward’s linkage.

First nodes from the trunk in the hierarchical tree for the m1e4 simulation, obtained by considering different linkages: single (first column), average (second column), complete (third column), and Ward (last column) linkage. The panels in the first row show the first node of the tree that splits the sink particles of the simulation into two groups (blue and orange). In the second node, the blue group is split further into two subgroups (blue and green, panels in the second row). The third node splits the blue group into the blue and the red groups (panels in the lower row).
3.2 Application of hierarchical clustering to stellar clusters
We applied agglomerative clustering to the stellar clusters from hydrodynamical simulations introduced in Section 2. The trees are built by relying on Euclidean distance between sink particles in the phase space as a measure of dissimilarity, so that particles sharing both similar positions and similar velocities tend to be grouped together. Before applying the algorithm, we scaled the positions and the velocities by their standard deviations. This step or some such is necessary so that the result of our clustering does not depend on the arbitrary choice of the unit of measurement of time.
The right column of Fig. 3 shows the groups of sink particles corresponding to the first two nodes of the learned tree (starting from the root). The first node splits the sinks into two big chunks, and the second node splits off a smaller clump from one of these.4 Our choice of using Ward linkage results in the splitting off of the most massive sub-clumps in the first branches of the tree, leading to an overall balanced tree. The first splitting thus gives information about the distribution of the sub-clumps at large scales and, moving towards the leaves of the trees, sub-clusters are split in smaller and smaller sub-clumps, as desired for our task.
Fig. 4 shows the mass ratios between sub-clumps branching off at different depths within the tree. The distribution of mass ratios is not particularly affected by the tree depth. This is expected if the structure of the sink particle distribution is scale invariant, as moving down the tree (towards the leaves) probes smaller scales by construction. Additionally, Fig. 4 shows that the distribution is similar across different simulations, spanning a range of total mass of an order of magnitude. To assess if the mass ratios can be considered as drawn from the same distribution (after properly rescaling the mass), we performed pairwise Kolmogorov–Smirnov tests. Despite multiple testing we never obtain a p-value below 10−2, so we have no reason to suspect that the distributions are different. Also, we performed the same test on the sub-distributions shown in Fig. 4 separately. Our test always obtains p-values above 10−1, with the only exception for the comparison between the middle nodes of m1e4 and m9e4, where p-value = 10−1.2. This result suggests that, despite some statistical fluctuations, the splitting in mass is performed in the same way at different scales for all the simulations.

Distribution of the mass of the lightest of the two resulting groups at any given split, in units of the parent group. The top left panel shows the distribution calculated for all nodes in the learned tree. The top right panel shows the distribution for the top 1/3 of the nodes from the root (big clumps), the bottom left for the middle 1/3 of the nodes (intermediate-size clumps), and the bottom right for the lower 1/3 of the nodes (small clumps to individual stars).
Similar information on the scaling behaviour of our simulations can be extracted from Figs 5 and 6, where we show the distribution of the distances between the clumps (|$l=|\mathbf {l}|$|) and that of their relative velocities (|$u=|\mathbf {u}|$|). In particular, the positions of the maxima of the distributions shift towards lower values by moving from the top to the bottom nodes, confirming that the tree is considering smaller and smaller scales. Also in this case, all the simulations show very similar distributions at each level for both the distances and the relative velocities. The distribution of the angles between the relative velocity and the distance, |$\theta = \arccos {(\mathbf {l} \cdot \mathbf {u} \, (l \, u)^{-1})}$|, is shown in Fig. 7. This distribution appears flat except for a rise at cos θ ≈ 1 which corresponds to relative velocity parallel to the separation vector between clumps, which is expected in a supervirial cluster undergoing overall expansion.

Same as Fig. 4 but for the distribution of the distances (scaled by their variance) between the centres of mass of two resulting groups at any given split of the agglomerative clustering hierarchical tree.

Same as Fig. 4 but for the distribution of the relative velocities (scaled by their variance) between the centres of mass of two resulting groups at any given split of the agglomerative clustering hierarchical tree.

Same as Fig. 4 but for the distribution of the cosine of the angle between the relative velocity and the distance of the centres of mass of two resulting groups at any given split of the agglomerative clustering hierarchical tree.
Relevant physical information can be drawn by considering the relation between quantities of the same node in the agglomerative clustering tree. Fig. 8 shows the relation between the distance of the sub-clumps and their relative velocity, for each node. The main sub-clumps, which correspond to the nodes closest to the root, show a direct proportionality between these two quantities, possibly due to rigid rotation. In contrast, on the smallest scales, the single particle relative velocity shows a tendency to decline with the square root of their distance, as would happen for two clumps (or even two individual stars) orbiting one another under the influence of each other’s monopole potential. Interestingly, all relative motions between clumps take place between the rigid and Keplerian extremes.

Scatter plot of the relative velocity between the centres of mass of two different sub-clumps corresponding to a given node in the agglomerative clustering tree as a function of their distance. The colour gradient maps the depth of the node (from the root, in blue, to the leaves, in yellow) within the hierarchical tree, Nnode. The superimposed lines represent two limit slopes corresponding to rigid rotation (blue) and Keplerian motion (orange).
3.3 Generating new realizations
As explained in Section 3.2, the application of the agglomerative clustering algorithm to stellar clusters allows us to inform a tree |$\mathcal {T}$| encoding their hierarchical structure. Each node of the three |$\mathcal {T}_i$| is associated with the relevant properties |$\mathbf {l}_i$|, |$\mathbf {u}_i$|, and qi, which quantify the relations between the sub-clumps corresponding to the branches departing from the node. Thus, the tree essentially encodes instructions to generate a new star cluster, as it can be traversed from the top, iteratively splitting an initial particle until the leaf level is reached, where individual stars have been produced. In our case, the goal is to change the cluster at the global structure level, nearest to the trunk of the tree, thus creating different sub-clumps configurations while preserving the small-scale properties of the sub-clumps (such as their fractal structure). We thus take the quantities |$\mathbf {l}_i$|, |$\mathbf {u}_i$|, and qi associated with the nodes |$\mathcal {T}_i$| for i < k and replace them with the quantities |$\mathbf {l}^\prime _i$|, |$\mathbf {u}^\prime _i$|, and |$q^\prime _i$| associated with the nodes |$\mathcal {T^\prime }_i$| of another tree |$\mathcal {T^\prime }$|, learned from a different set of sink particles. This grafting procedure represents a way to combine the large-scale properties of one simulation with the small-scale properties of another. For the results presented in Section 4, these nodes are sampled randomly from other simulations.
The generation procedure is implemented as follows. First, we consider a particle with a mass M1 equal to the total mass of the cluster considered, placed at the centre of mass of the cluster. The particle is first split into two particles of masses M11 and M12 such that M11 + M12 = M1 and |$\min ({M_1}_1, {M_1}_2)/M_1 = q^\prime _1$|. The positions and velocities of the new particles are assigned such that their centre of mass is at rest in the origin of the system, their distance vector is |$\mathbf {l}^\prime _i$|, and their relative velocity |$\mathbf {u}_1^\prime$|. This splitting procedure is then repeated until a cluster with the same number of particles as the reference one is obtained. At each step, the particle-to-split is chosen by considering the same order of splitting as the original reference tree. This procedure may at times result in very low mass particles. We remove these planet-sized objects with a cut-off at the minimum mass of the original stars on which |$\mathcal {T}$| was learned.
3.3.1 Grafting depth
In the procedure described above, the choice of the grafting depth k determines how different the new realizations are from the original system. A low value of k produces generations that are very similar to the original one at all scales. In contrast, when k is high, also the small scales are modified substantially. In our case, we want to generate new clusters that are similar to the original one but, at the same time, cannot be considered as its copies. We evaluated how the choice of the grafting depth affects the spatial structure of the new generations. In particular, we considered the distributions of distances and the fractal dimensions obtained by generating sets of one hundred new realizations for m1e4, at different values of k. The left-hand panel of Fig. 9 shows the general shape of the distributions of inter-particle distances. Predictably, the realizations obtained by modifying just one node match the original distribution better than those that change two or three nodes, and present the smallest spread. The peaks correspond to sub-clumps of sinks that are formed in different numbers and sizes in each realization. At small distances, the new realizations recover the general trend of the original distribution, as meant for our method. For the case with k = 1, this happens at about 1 pc, meaning that only the very large scales (the distance between the main sub-clumps) are modified. By increasing k, also the smaller scales are altered, and the original shape is recovered later. This suggests that very few changes are sufficient to produce generations that can be defined as different from the original cluster. The distributions for k ≤ 3 are consistent with the original simulation throughout the range of distances.

Left-hand panel: distribution of inter-particle distances f(d) for the sink particles taken from the m1e4 simulation (thick purple line) and three distributions of new generations obtained by replacing the first 1 (blue), 2 (green, hatched area), and 3 (yellow) nodes, corresponding to k = 2, 3, and 4 in the notation used above. The shaded area encloses the distribution of the new generations, and the solid line is the median of the distribution. Right-hand panel: average number of neighbours Nr around a star, within a sphere with radius rneigh, for different values of rneigh. Lines and colours are the same as in the left-hand panel. The black dotted lines represent the trend expected for distributions with a uniform fractal dimension, for β = 1.6, 2, and 3.
In the right-hand panel of Fig. 9, we have computed the fractal dimension by means of the average number of neighbours of the stars within a given distance, following Ballone et al. (2020). The distribution of neighbours of m1e4 is not described by a single power law of non-integer index β, as one would expect in a simple fractal structure, but presents two slope changes at around ≈10−1 and ≈2 pc (see also Ballone et al. 2020). To guide the eye, the three dotted lines mark the theoretical distance distributions in the case of a pure fractal distribution with |$N_{\rm r}\propto r_{\rm neigh}^{\beta }$|, for β = 1.6, 2, and 3. The generated distributions match the general trend and the changes in the slope of the original simulation very well, showing that our method captured the underlying structure of the particle distribution in the 3D space at all scales. Like for the inter-particle distance distribution, the choice of k = 2 produces only minimal differences from the original m1e4 profile. In the following text, we will focus on generations with k = 3, which allows to produce a distribution of clusters that are distinguishable but still consistent with the original one at all scales.
Fig. 10 shows the distribution of velocities for the new generations obtained by setting k = 3, as compared to the original sink particle trend. The median of the new generations matches the original distribution at all velocities, both on the low-velocity tail, where the Maxwell–Boltzmann trend seems to be preserved, and on the sharper power-law trend at high velocities. At very low values (|$u\lt 1 \, \mathrm{km/s}$|), the very low number of stars causes large fluctuations in the distribution of new generations, but their median trend is still well consistent with the original one.

Distribution of velocities f(u) for the sink particles taken from the m1e4 simulation (orange line) and for a distribution of new generations obtained by considering k = 3 (blue). The shaded area encloses the distribution of the new generations, and the solid line is the median of the distribution.
4 RESULTS
4.1 Properties of the newly generated systems
In this section, we discuss the properties of the systems generated using our procedure starting from the simulation m1e4, which presents the highest resolution.
Fig. 11 shows the spatial distributions of five new generations obtained with the method described in Section 3.3, compared to the original one. The new generations show a strong substructured configuration, with a different number of clumps, depending on the single realization, which has drawn branches from different simulations. Also, a strong degree of mass segregation is still present in the single sub-clumps, as highlighted by the colour coding. This primordial mass segregation in the individual realizations qualitatively matches the one present in the original cluster.

x-y projection of the m1e4 system (top left panel, see also Fig. 1) and five different new generations. The colour code marks the different masses of the sink particles and their new generations.
In Fig. 12, we compare the mass distribution of m1e4 to those of the new generations. In this case, our method leaves the slope of the mass function largely unaltered for most of the mass spectrum. At the boundaries of the mass spectrum, some discrepancies are present. This is due to the fact that the change in the first nodes may split up a relatively small particle more times than in the original cluster, and leaves more massive particles less split. This explains the higher number of particles at the boundaries of the mass spectrum with respect to the original one. The sharper cut-off at |$m\approx {10^{-1}}\, {}{\rm M}_{\odot }$| is due to the fact that all masses below this threshold are systematically removed. In general, the fit with equation (2) is rather good, yielding values of γ around 2.3, reminiscent of a Salpeter (1955) slope.

Mass spectrum of the sink particles of the simulation m1e4 (thick purple line) and of five different generated systems (thin lines).
Due to the redistribution of particle positions, velocities, and masses in the generation process, the value of the total virial ratio αvir may be significantly altered (with respect, in this case, to the value of 1.19 for the m1e4 case) ranging from a minimum of 0.46 to a maximum of 2.08. Clearly, the future dynamical evolution depends heavily on the virial ratio, which, in turn, is heavily affected by the left tail of the particle pairwise distances. There is indeed a margin of variation in short distances between realizations, as shown in Fig. 9. However, the shortest distances in any stellar system essentially correspond to binary-star semiaxes. Our hydrodynamical simulations were not designed to faithfully reproduce an observational initial mass function (Ballone et al. 2021) nor to capture binary properties. In Torniamenti et al. (2021), we introduce a realistic binary distribution with a separate procedure. While binary binding energy is a large fraction of the total binding energy in many realistic scenarios, the time-scale over which this energy is exchanged with the cluster at large is much longer than the dissolution time for the typical system under consideration: hard binaries are dynamically inert in the short term. To check that this is the source of the observed virial ratio mismatch, we have operated two diagnostics. First of all, we have recomputed the virial ratio αvir for Ns times excluding each time a different particle. This gives us a robust way to quantify the virial ratio, as the spread in the resulting distribution will be driven by instances in which a member of a very close binary was excluded. In all cases, the value of αvir of the original system lies well inside the distribution of αvir obtained by removing one particle at time from a given generation.
Second, we have also computed the αvir by excluding the binding energy of stars with separation under a varying threshold between one-tenth and one-half of the average inter-particle distance. We found that the large variations in the value of αvir observed for the generated clusters is essentially due to the different distributions of tightly bound particles in the generated clusters and the parent sink particle system produced by our SPH simulations. Thus, different values of the virial ratio will result in a similar dynamical evolution on the time-scales of interest, as shown below by evolving our realization through direct N-body simulations. We list the nominal virial coefficients of our generated realizations together with other properties in Table 2.
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|
m1e4g1 | 2006 | 6 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g2 | 2509 | 6 | 1.41 | 2.3 | 4.20 × 103 |
m1e4g3 | 2512 | 6 | 1.57 | 2.3 | 4.20 × 103 |
m1e4g4 | 1998 | 8 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g5 | 2512 | 5 | 1.68 | 2.3 | 4.20 × 103 |
m1e4g6 | 2491 | 7 | 1.50 | 2.3 | 4.20 × 103 |
m1e4g7 | 2081 | 8 | 0.48 | 2.3 | 4.20 × 103 |
m1e4g8 | 2512 | 9 | 1.81 | 2.3 | 4.20 × 103 |
m1e4g9 | 2196 | 4 | 0.46 | 2.3 | 4.20 × 103 |
m1e4g10 | 2496 | 7 | 1.57 | 2.3 | 4.20 × 103 |
m3e4g1 | 2765 | 5 | 0.80 | 2.2 | 1.03 × 104 |
m3e4g2 | 2805 | 7 | 1.39 | 2.2 | 1.03 × 104 |
m3e4g3 | 2811 | 5 | 1.16 | 2.2 | 1.03 × 104 |
m3e4g4 | 2719 | 5 | 1.20 | 2.2 | 1.03 × 104 |
m3e4g5 | 2747 | 7 | 1.48 | 2.2 | 1.03 × 104 |
m3e4g6 | 2774 | 6 | 1.40 | 2.2 | 1.03 × 104 |
m3e4g7 | 2750 | 6 | 1.46 | 2.2 | 1.03 × 104 |
m3e4g8 | 2770 | 7 | 1.13 | 2.2 | 1.03 × 104 |
m3e4g9 | 2628 | 7 | 0.94 | 2.2 | 1.03 × 104 |
m3e4g10 | 2764 | 6 | 0.94 | 2.2 | 1.03 × 104 |
m6e4g1 | 2747 | 7 | 1.65 | 2.1 | 2.04 × 104 |
m6e4g2 | 2823 | 7 | 1.80 | 2.1 | 2.04 × 104 |
m6e4g3 | 2900 | 5 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g4 | 2718 | 6 | 1.66 | 2.1 | 2.04 × 104 |
m6e4g5 | 2967 | 6 | 1.75 | 2.1 | 2.04 × 104 |
m6e4g6 | 2752 | 5 | 1.30 | 2.1 | 2.04 × 104 |
m6e4g7 | 2998 | 6 | 1.55 | 2.1 | 2.04 × 104 |
m6e4g8 | 2833 | 6 | 1.36 | 2.1 | 2.04 × 104 |
m6e4g9 | 3001 | 6 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g10 | 3015 | 5 | 1.82 | 2.1 | 2.04 × 104 |
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|
m1e4g1 | 2006 | 6 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g2 | 2509 | 6 | 1.41 | 2.3 | 4.20 × 103 |
m1e4g3 | 2512 | 6 | 1.57 | 2.3 | 4.20 × 103 |
m1e4g4 | 1998 | 8 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g5 | 2512 | 5 | 1.68 | 2.3 | 4.20 × 103 |
m1e4g6 | 2491 | 7 | 1.50 | 2.3 | 4.20 × 103 |
m1e4g7 | 2081 | 8 | 0.48 | 2.3 | 4.20 × 103 |
m1e4g8 | 2512 | 9 | 1.81 | 2.3 | 4.20 × 103 |
m1e4g9 | 2196 | 4 | 0.46 | 2.3 | 4.20 × 103 |
m1e4g10 | 2496 | 7 | 1.57 | 2.3 | 4.20 × 103 |
m3e4g1 | 2765 | 5 | 0.80 | 2.2 | 1.03 × 104 |
m3e4g2 | 2805 | 7 | 1.39 | 2.2 | 1.03 × 104 |
m3e4g3 | 2811 | 5 | 1.16 | 2.2 | 1.03 × 104 |
m3e4g4 | 2719 | 5 | 1.20 | 2.2 | 1.03 × 104 |
m3e4g5 | 2747 | 7 | 1.48 | 2.2 | 1.03 × 104 |
m3e4g6 | 2774 | 6 | 1.40 | 2.2 | 1.03 × 104 |
m3e4g7 | 2750 | 6 | 1.46 | 2.2 | 1.03 × 104 |
m3e4g8 | 2770 | 7 | 1.13 | 2.2 | 1.03 × 104 |
m3e4g9 | 2628 | 7 | 0.94 | 2.2 | 1.03 × 104 |
m3e4g10 | 2764 | 6 | 0.94 | 2.2 | 1.03 × 104 |
m6e4g1 | 2747 | 7 | 1.65 | 2.1 | 2.04 × 104 |
m6e4g2 | 2823 | 7 | 1.80 | 2.1 | 2.04 × 104 |
m6e4g3 | 2900 | 5 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g4 | 2718 | 6 | 1.66 | 2.1 | 2.04 × 104 |
m6e4g5 | 2967 | 6 | 1.75 | 2.1 | 2.04 × 104 |
m6e4g6 | 2752 | 5 | 1.30 | 2.1 | 2.04 × 104 |
m6e4g7 | 2998 | 6 | 1.55 | 2.1 | 2.04 × 104 |
m6e4g8 | 2833 | 6 | 1.36 | 2.1 | 2.04 × 104 |
m6e4g9 | 3001 | 6 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g10 | 3015 | 5 | 1.82 | 2.1 | 2.04 × 104 |
Note.After the name of the generated cluster (Col. 1), we report the total number of stars (Col. 2), the number of macroscopic sub-clumps (Col. 3), the virial ratio (Col. 4), the γ coefficient of the mass-spectrum fitting function (equation 2, Col. 5), and the total mass of the stars (Col. 6).
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|
m1e4g1 | 2006 | 6 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g2 | 2509 | 6 | 1.41 | 2.3 | 4.20 × 103 |
m1e4g3 | 2512 | 6 | 1.57 | 2.3 | 4.20 × 103 |
m1e4g4 | 1998 | 8 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g5 | 2512 | 5 | 1.68 | 2.3 | 4.20 × 103 |
m1e4g6 | 2491 | 7 | 1.50 | 2.3 | 4.20 × 103 |
m1e4g7 | 2081 | 8 | 0.48 | 2.3 | 4.20 × 103 |
m1e4g8 | 2512 | 9 | 1.81 | 2.3 | 4.20 × 103 |
m1e4g9 | 2196 | 4 | 0.46 | 2.3 | 4.20 × 103 |
m1e4g10 | 2496 | 7 | 1.57 | 2.3 | 4.20 × 103 |
m3e4g1 | 2765 | 5 | 0.80 | 2.2 | 1.03 × 104 |
m3e4g2 | 2805 | 7 | 1.39 | 2.2 | 1.03 × 104 |
m3e4g3 | 2811 | 5 | 1.16 | 2.2 | 1.03 × 104 |
m3e4g4 | 2719 | 5 | 1.20 | 2.2 | 1.03 × 104 |
m3e4g5 | 2747 | 7 | 1.48 | 2.2 | 1.03 × 104 |
m3e4g6 | 2774 | 6 | 1.40 | 2.2 | 1.03 × 104 |
m3e4g7 | 2750 | 6 | 1.46 | 2.2 | 1.03 × 104 |
m3e4g8 | 2770 | 7 | 1.13 | 2.2 | 1.03 × 104 |
m3e4g9 | 2628 | 7 | 0.94 | 2.2 | 1.03 × 104 |
m3e4g10 | 2764 | 6 | 0.94 | 2.2 | 1.03 × 104 |
m6e4g1 | 2747 | 7 | 1.65 | 2.1 | 2.04 × 104 |
m6e4g2 | 2823 | 7 | 1.80 | 2.1 | 2.04 × 104 |
m6e4g3 | 2900 | 5 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g4 | 2718 | 6 | 1.66 | 2.1 | 2.04 × 104 |
m6e4g5 | 2967 | 6 | 1.75 | 2.1 | 2.04 × 104 |
m6e4g6 | 2752 | 5 | 1.30 | 2.1 | 2.04 × 104 |
m6e4g7 | 2998 | 6 | 1.55 | 2.1 | 2.04 × 104 |
m6e4g8 | 2833 | 6 | 1.36 | 2.1 | 2.04 × 104 |
m6e4g9 | 3001 | 6 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g10 | 3015 | 5 | 1.82 | 2.1 | 2.04 × 104 |
Name . | Ns . | Nc . | αvir . | γ . | |$M_{\rm sink}\, \left[{\rm M}_\odot \right]$| . |
---|---|---|---|---|---|
m1e4g1 | 2006 | 6 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g2 | 2509 | 6 | 1.41 | 2.3 | 4.20 × 103 |
m1e4g3 | 2512 | 6 | 1.57 | 2.3 | 4.20 × 103 |
m1e4g4 | 1998 | 8 | 0.60 | 2.3 | 4.20 × 103 |
m1e4g5 | 2512 | 5 | 1.68 | 2.3 | 4.20 × 103 |
m1e4g6 | 2491 | 7 | 1.50 | 2.3 | 4.20 × 103 |
m1e4g7 | 2081 | 8 | 0.48 | 2.3 | 4.20 × 103 |
m1e4g8 | 2512 | 9 | 1.81 | 2.3 | 4.20 × 103 |
m1e4g9 | 2196 | 4 | 0.46 | 2.3 | 4.20 × 103 |
m1e4g10 | 2496 | 7 | 1.57 | 2.3 | 4.20 × 103 |
m3e4g1 | 2765 | 5 | 0.80 | 2.2 | 1.03 × 104 |
m3e4g2 | 2805 | 7 | 1.39 | 2.2 | 1.03 × 104 |
m3e4g3 | 2811 | 5 | 1.16 | 2.2 | 1.03 × 104 |
m3e4g4 | 2719 | 5 | 1.20 | 2.2 | 1.03 × 104 |
m3e4g5 | 2747 | 7 | 1.48 | 2.2 | 1.03 × 104 |
m3e4g6 | 2774 | 6 | 1.40 | 2.2 | 1.03 × 104 |
m3e4g7 | 2750 | 6 | 1.46 | 2.2 | 1.03 × 104 |
m3e4g8 | 2770 | 7 | 1.13 | 2.2 | 1.03 × 104 |
m3e4g9 | 2628 | 7 | 0.94 | 2.2 | 1.03 × 104 |
m3e4g10 | 2764 | 6 | 0.94 | 2.2 | 1.03 × 104 |
m6e4g1 | 2747 | 7 | 1.65 | 2.1 | 2.04 × 104 |
m6e4g2 | 2823 | 7 | 1.80 | 2.1 | 2.04 × 104 |
m6e4g3 | 2900 | 5 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g4 | 2718 | 6 | 1.66 | 2.1 | 2.04 × 104 |
m6e4g5 | 2967 | 6 | 1.75 | 2.1 | 2.04 × 104 |
m6e4g6 | 2752 | 5 | 1.30 | 2.1 | 2.04 × 104 |
m6e4g7 | 2998 | 6 | 1.55 | 2.1 | 2.04 × 104 |
m6e4g8 | 2833 | 6 | 1.36 | 2.1 | 2.04 × 104 |
m6e4g9 | 3001 | 6 | 1.82 | 2.1 | 2.04 × 104 |
m6e4g10 | 3015 | 5 | 1.82 | 2.1 | 2.04 × 104 |
Note.After the name of the generated cluster (Col. 1), we report the total number of stars (Col. 2), the number of macroscopic sub-clumps (Col. 3), the virial ratio (Col. 4), the γ coefficient of the mass-spectrum fitting function (equation 2, Col. 5), and the total mass of the stars (Col. 6).
4.2 N-body simulations
Our method aims to generate large samples of initial conditions for N-body simulations. To test that our realizations are indeed suitable for this use, we evolve via direct N-body simulations the three original clusters (m1e4, m3e4, and m6e4) and 10 different generated clusters per each of the three original ones. Finding that the evolution of the generated clusters is neither identical nor dramatically different with respect to the original cluster is one of the main test-beds of our method. In fact, our method can be successfully used only if the new clusters evolve in a similar way as the original one, but are sufficiently different not to be an exact copy. Ideally, the generated clusters should behave as different random realizations of the same underlying physical distributions.
We ran our simulations with the direct N-body code nbody6+ + gpu (Wang et al. 2015). Due to a neighbour scheme (Nitadori & Aarseth 2012), nbody6+ + gpu efficiently handles the collisional force contributions at short time-scales as well as those at longer time intervals, to which all the members in the system contribute. The force integration also includes a solar neighbourhood-like static external tidal field (Wang et al. 2016). Stellar evolution is not included in our runs, for the sake of simplicity and to make the comparison with the original cluster more straightforward. We evolved the clusters for 10 Myr.
Table 2 shows the main initial properties of the generated clusters for which we ran the N-body simulations. Fig. 13 shows the projection in the x-y plane of the original m1e4 cluster and of four generations, at different times. The global evolution of the new generated clusters shows a variety of configurations depending on the different distribution of mass. In some cases, distinct sub-clumps are present at |$t \gt 1 \, \mathrm{Myr}$| and tidally interact with each other before eventually merging. In the case of m1e4g4, two distinct sub-clumps are still present at |$10 \, \mathrm{Myr}$|.

Projection in the x-y plane of the evolution of the original cluster (m1e4, upper panel) and four different generated clusters (lower panels) as a function of time. The clusters are shown at their initial configuration (first column) and at three different time steps: 1 Myr (second column), 5 Myr (third column), and 10 Myr (last column). The colour code marks the different masses of the sink particles and their generations.
A more quantitative description of the global evolution of the clusters can be given in terms of the evolution of the 10 per cent and 50 per cent Lagrangian radii (r10 and r50), centred in the centre of density.5 Figs 14 and 15 show the evolution of r10 and r50 for the original clusters and the generated ones. In all the cases, the original evolution lies within the limits of the distribution of the generated clusters, which shows a large spread. This spread is consistent with the large stochastic fluctuations that we expect in the evolution of such low-mass clusters (see e.g. Torniamenti et al. 2021).

Evolution of the 10 per cent Lagrangian radius for the original sink particles and for 10 different generations of m1e4 (upper panel), m3e4 (middle panel), and m6e4 (lower panel). The orange line represents the original sink particle system, and the blue lines are the generated clusters.

5 DISCUSSION AND CONCLUSIONS
We introduced a new method for generating a number of new realizations from a given set of initial conditions (particle masses, positions, and velocities) produced by hydrodynamical simulations. The realizations are built to display a different large-scale structure, but share similar properties at smaller scales, preserving in particular the fractal dimension of the original simulation. We have shown that they can be used as initial conditions for N-body simulations, producing a comparable evolution to the original cluster. This suggests that our method is suitable for drawing the initial conditions of a large set of N-body simulations at an infinitesimal fraction of the computational cost of generating initial conditions from a hydrodynamical simulation.
Our novel approach relies on informing a hierarchical clustering structure (represented as a tree) from the original initial condition data through agglomerative clustering. This is later turned into new realizations by modifying the initial branches of the tree (encoding the relations between the biggest sub-clumps in the simulation). This results in realizations with different macroscopic properties from the original one (e.g. the number of big clumps and their distances), while approximately preserving the characteristics of small-scale structure responsible for most of dynamical evolution (e.g. the distribution of pairwise distances between individual stars). In principle, this scheme is very flexible, allowing to choose how much of the large-scale structure we control directly, by choosing the number of initial branches we modify.
The realizations we obtained with our method qualitatively resemble the original simulation when visualized in three-dimensional space. In our case, the original distribution of stars was generated by hydrodynamical simulations of embedded clusters, so our new realizations appear qualitatively indistinguishable from the output of these simulations. The mass spectrum and the velocity distribution are also very similar to the original simulation. The distribution of the number of neighbours as a function of distance reveals that the fractal dimension of our realizations and that of the original simulation match on different scales (they both show a similar complex fractal pattern).
Finally, we ran direct N-body simulations of a sample of generated initial conditions for three different original star clusters. In all the cases, the new generations show a realistic evolution on all scales, bracketing that of the original one, as shown by the trend of the 10 per cent and 50 per cent Lagrangian radii. Our analysis suggests that this method is a promising way to generate new mass and phase-space distributions from existing hydrodynamical simulations, thus increasing our sample of initial conditions for N-body simulations. The speed-up in computation obtained by our new method is tremendous: generating initial conditions from hydrodynamical simulations requires about 1.5 × 105 core hours per simulation, while our procedure takes about 10 core seconds to train the initial tree distribution and generate a new realization.
ACKNOWLEDGEMENTS
We thank the anonymous referee for their useful comments, which helped to improve this work. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 896248. MP’s initial contribution to this material is based upon work supported by Tamkeen under the NYU Abu Dhabi Research Institute grant CAP3. MM, AB, and GI acknowledge financial support from the European Research Council for the ERC Consolidator grant DEMOBLACK, under contract no. 770017. PFDC acknowledges financial support from MIUR-PRIN2017 project Coarse-grained description for non-equilibrium systems and transport phenomena (CO-NEST) n.201798CZL. MM and MCA acknowledge financial support from the FWF Austrian Science Fund grant P31154-N27.
DATA AVAILABILITY
The data underlying this article will be shared on reasonable request to the corresponding authors.
Footnotes
The approximately constant value of Ns follows from the fact that the star formation efficiency is roughly independent of Mmc (see table 1 in Ballone et al. 2020). The star formation efficiency is indeed rather dictated by the physical processes involved in the simulations, which, in all cases, start from the same values of cloud temperature and density.
The procedure of drawing a hierarchy of merging substructures may recall the merger tree history, which is used in cosmology to track the assembly of substructures across time (see e.g. Rodriguez-Gomez et al. 2015). Agglomerative clustering algorithms, however, do not imply any evolution in time, but use the tree-like structure to identify groups of instances at different scales.
The details about the implementation of the algorithm can be found at this link.
Even as we describe the tree from the root up (writing occasionally in terms of splits/splitting) agglomerative methods build the tree from the leaves, i.e. the individual sink particles.
The local density around each star was calculated as the density of the sphere that includes the 300 closest stars.