Hierarchical generative models for star clusters from hydrodynamical simulations

Properties of the end states of the SPH simulations of Ballone et al. (2020).

Name	N_s	N_c	α_vir	γ	|$M_{\rm sink}\, \left[{\rm M}_\odot \right]$|	|$M_{\rm mc}\, \left[{\rm M}_\odot \right]$|
m1e4	2523	6	1.19	2.30	4.22 × 10³	10⁴
m2e4	2571	4	1.32	2.12	6.69 × 10³	2 × 10⁴
m3e4	2825	5	1.48	2.20	1.03 × 10⁴	3 × 10⁴
m4e4	2868	2	1.47	2.17	1.44 × 10⁴	4 × 10⁴
m5e4	2231	4	1.47	1.80	1.41 × 10⁴	5 × 10⁴
m6e4	3054	5	1.69	2.15	2.04 × 10⁴	6 × 10⁴
m7e4	4214	9	1.50	2.20	3.15 × 10⁴	7 × 10⁴
m8e4	2945	6	1.60	1.86	2.83 × 10⁴	8 × 10⁴
m9e4	3161	4	1.52	1.90	3.05 × 10⁴	9 × 10⁴
m1e5	3944	6	1.46	2.20	3.80 × 10⁴	10⁵

Note. After the name of each simulation (Col. 1), we report the number of stars generated (Col. 2), the number of macroscopic sub-clumps (Col. 3), the virial ratio (α_vir ≡ 2K/|W|, Col. 4), the γ coefficient of the mass-spectrum fitting function (equation 2; Col. 5), the total mass of the stars (Col. 6), and the mass of the parent molecular cloud (Col. 7).

Table 1.

In order to induce a non-isotropic evolution, the SPH gas particles are initially given a turbulent, divergence-free, Gaussian random velocity field with a different random seed for each simulation of the set, following a Burgers (1948) velocity power-law spectrum with index −4 (Bate 2009b). With respect to the classical Kolmogorov (1941) power spectrum (with index −11/3), the Burgers power spectrum better matches turbulence in compressive flows, where shocks are present (Federrath 2013). The clouds are in an initial marginally bound state, so that their initial virial ratio α_vir ≡ 2K/|W| = 2, where K and W are the gas kinetic and potential energy, respectively.

During the hydrodynamical simulation, the gas equation of state has been set to be adiabatic, while radiative cooling by dust has been modelled as in Boley (2009) and Boley et al. (2010). The amount of energy lost by cooling was calculated through the divergence of the heat flux

$$\begin{eqnarray} \nabla \cdot F_{\rm cool}=-\frac{(36\pi)^{1/3}\sigma (T^4-T_{\rm irr}^4)}{s(\Delta \tau +1/\Delta \tau)}. \end{eqnarray}$$

(1)

In the equation above, σ is the Stefan–Boltzmann constant, T the gas temperature, T_irr the irradiation temperature, s = (m/ρ)^{1/3}, and |$\Delta \tau =s\, {}k\, {}\rho$|, where m and ρ are the gas particle mass and density and k is the local opacity. The dust-to-gas ratio has been fixed to a constant value for each different dust species. For k, the adopted Planck and Rosseland dust opacities are taken from D’Alessio, Calvet & Hartmann (2001). The irradiation temperature, which represents the minimum temperature allowed by the dust that acts as a thermostat for the gas, is set to |$T_{\rm irr}=10 \, \mathrm{K}$|.
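As an illustration of equation (1), the following Python sketch evaluates the cooling term for a single gas particle. The function name, the choice of cgs units, and the scalar opacity argument are assumptions made here for illustration; they are not taken from the simulation code.

```python
import numpy as np

# Stefan-Boltzmann constant in cgs units (erg cm^-2 s^-1 K^-4).
SIGMA_SB = 5.670374419e-5

def div_f_cool(m, rho, T, kappa, T_irr=10.0):
    """Divergence of the cooling flux as written in equation (1).

    m, rho : particle mass and density
    T      : gas temperature in K
    kappa  : local opacity (hypothetical scalar input)
    T_irr  : irradiation temperature in K (10 K in the text)
    """
    s = (m / rho) ** (1.0 / 3.0)      # characteristic particle size
    dtau = s * kappa * rho            # optical depth across the particle
    prefactor = (36.0 * np.pi) ** (1.0 / 3.0) * SIGMA_SB
    return -prefactor * (T**4 - T_irr**4) / (s * (dtau + 1.0 / dtau))
```

The term vanishes when T = T_irr and is negative (net energy loss) for T > T_irr, consistent with the role of the irradiation temperature as a floor.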

No stellar feedback was included in this set of simulations, and we simply decided to assume that our clusters are the result of instantaneous gas removal at 3 Myr after the beginning of the hydrodynamical simulation to roughly simulate the effect of the first supernova explosions. Indeed, Dale et al. (2015) have shown that pre-supernova gas removal is expected to have a minor effect on the survival and dynamics of stellar clusters, and we also checked that at 3 Myr the gas accounts for a small fraction of the mass in the regions where most of the stellar mass resides. Furthermore, at 3 Myr all the clouds converted about 30–40 per cent of their gas mass into sink particles, in agreement with previous hydrodynamical simulations showing that stellar feedback should lead to a maximum star formation efficiency of about this amount (e.g. Vázquez-Semadeni et al. 2010; Dale et al. 2015; Gavagnin et al. 2017; Li et al. 2019). For more details on such choices, we refer the reader to Ballone et al. (2020).

2.2 Structural properties of the SPH simulations

Independently of the specific initial value of M_mc, our SPH simulations present a clumpy structure with N_s ≈ 3 × 10³ stars,¹ organized in a number of main sub-clumps ranging from a minimum of N_c = 2 for m4e4 to a maximum of N_c = 9 for m7e4. Sub-clumps are identified heuristically as groups of neighbouring stars containing more than |$0.05\, {} N_{\rm s}$| stars, whose self-potential energy exceeds that of the rest of the system. Fig. 1 shows the projections of the stars' positions on the x-y, y-z, and z-x coordinate planes for the system m1e4, with their masses m shown in colour. We find a rather prominent primordial mass segregation, with heavier stars typically well within the central regions of the main clumps and lighter stars at larger distances from the geometric centres of such subsystems. All systems are supervirial (α_vir > 1), with α_vir ranging from 1.19 for the m1e4 case to 1.69 for m6e4.
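For reference, the virial ratio α_vir ≡ 2K/|W| of a set of point masses can be evaluated by direct summation, as in the minimal Python sketch below. The function name and the default unit system (pc, km s⁻¹, M_⊙) are assumptions made for illustration, and no gravitational softening is applied.

```python
import numpy as np

def virial_ratio(mass, pos, vel, G=4.30091e-3):
    """Return alpha_vir = 2K/|W| for a set of point masses.

    The default G is in pc (km/s)^2 / Msun, so masses are in Msun,
    positions in pc, and velocities in km/s.
    """
    mass = np.asarray(mass, dtype=float)
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    # Kinetic energy measured in the centre-of-mass frame.
    v_com = np.average(vel, axis=0, weights=mass)
    kinetic = 0.5 * np.sum(mass * np.sum((vel - v_com) ** 2, axis=1))
    # Potential energy: sum over each pair of particles exactly once.
    i, j = np.triu_indices(len(mass), k=1)
    dist = np.linalg.norm(pos[i] - pos[j], axis=1)
    potential = -G * np.sum(mass[i] * mass[j] / dist)
    return 2.0 * kinetic / abs(potential)
```

With this definition, a system in virial equilibrium has α_vir = 1 (for instance, an equal-mass binary on a circular orbit), while a marginally bound system has α_vir = 2.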

Figure 1.

From left to right, projections in the x-y, y-z, and z-x planes of the end state of the m1e4 simulation. The colour map marks the mass of the individual stars in units of M_⊙.

In order to quantify the properties of the end states of the SPH simulations, we have evaluated their distributions of inter-particle distances f(d), mass spectra f(m), and velocity distributions f(v). Fig. 2 shows these distributions for the sink particles of the simulations m1e4, m3e4, m5e4, m7e4, and m9e4. The distribution of inter-particle distances shows a quite complex structure with several slope changes. The clumpy structure of the particles’ spatial distribution gives rise to several peaks in f(d), corresponding to the distances between the clumps themselves. For the specific case of m1e4, the peaks are located roughly at 0.1, 0.45, 1.75, and 3 pc (as highlighted by the vertical dotted lines), which can be identified as the distances between the approximate centres of the main clumps of the particles shown in Fig. 1.
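A distribution of inter-particle distances such as f(d) can be obtained from the particle positions with a few lines of Python. The sketch below is an assumption-laden illustration (hypothetical function name, user-supplied bins), not the analysis code used for Fig. 2.

```python
import numpy as np

def pair_distance_hist(pos, bins):
    """Histogram of all pairwise distances between particles.

    pos  : (N, 3) array of positions
    bins : bin edges (or number of bins) passed to numpy.histogram
    Returns the pair distances, the normalised histogram, and the bin edges.
    """
    pos = np.asarray(pos, dtype=float)
    # Index every unordered pair (i < j) exactly once.
    i, j = np.triu_indices(len(pos), k=1)
    d = np.linalg.norm(pos[i] - pos[j], axis=1)
    hist, edges = np.histogram(d, bins=bins, density=True)
    return d, hist, edges
```

For N ≈ 3 × 10³ particles this involves a few million pairs, which fits comfortably in memory; logarithmically spaced bins are a natural choice when the distances span several orders of magnitude.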

Figure 2.

Distributions of inter-particle distances f(d) (left-hand panel), mass spectra f(m) (middle panel), and velocity distribution f(v) (right-hand panel) for the sink particles taken from the simulations m1e4, m3e4, m5e4, m7e4, and m9e4. The vertical dotted lines in the left-hand panel mark the position of the main peaks of f(d), corresponding to the distances between the main sub-clusters, for the m1e4 case. The thin dotted line in the right-hand panel marks the v⁻³ power-law trend of the velocity distributions.

The mass spectra of sink particles approximately follow the same power-law structure between a low-mass and a high-mass cut-off. The differences in the lower mass limit are due to the different mass resolution of the hydrodynamical simulations, which are initialized with different total masses but the same number of particles, as explained in Section 2.1. At higher masses, where the physical processes involved in the simulation become the dominant factor in shaping the mass function, all the spectra recover the same slope. We have fitted the numerically recovered mass spectra with the bona fide function:

$$\begin{eqnarray} f(m)=\frac{C}{\left(m^2+m_*^2\right)^{\gamma /2}}, \end{eqnarray}$$

(2)

where C is a normalization constant, m_* is a scale mass that for the explored systems is always in the range 0.8–4 M_⊙, while exponent γ ranges from ≈1.8 to ≈2.3.
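A fit of equation (2) to a binned mass spectrum can be sketched as follows, using scipy.optimize.curve_fit. Fitting in logarithmic space, the initial guess, and the function names are choices made here for illustration; the text does not specify how the fit was actually performed.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_mass_spectrum(m, log_c, m_star, gamma):
    """Natural logarithm of equation (2): f(m) = C / (m^2 + m_*^2)^(gamma/2)."""
    return log_c - 0.5 * gamma * np.log(m**2 + m_star**2)

def fit_mass_spectrum(m, f, p0=(0.0, 1.0, 2.0)):
    """Fit (C, m_*, gamma) to a spectrum f sampled at masses m (all f > 0)."""
    m = np.asarray(m, dtype=float)
    f = np.asarray(f, dtype=float)
    popt, _ = curve_fit(log_mass_spectrum, m, np.log(f), p0=p0)
    log_c, m_star, gamma = popt
    # Only m_*^2 enters the model, so the sign of m_* is arbitrary.
    return np.exp(log_c), abs(m_star), gamma
```

Working with log f keeps the low-amplitude, high-mass tail from being ignored by the least-squares fit, which matters here because γ is set by that tail.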

The velocity distributions f(v) do not show a relevant dependence on the specific initial value of M_mc, as shown in the right-hand panel of Fig. 2. Qualitatively, the velocity distribution is well described by a Maxwell–Boltzmann distribution from v = 0 to 5 km s⁻¹ (the value corresponding to the peak of f(v)) and then shows a v⁻³ power-law trend. The properties of the SPH simulations are summarized in Table 1.

3 METHODS

In the following text, we describe our new procedure to build a generative model of star cluster initial conditions. In principle, a generative model’s goal is to learn a representation of an intractable distribution given a usually finite number of samples. The generator typically maps from a latent domain on which a simple distribution is defined, such as a multivariate Gaussian on Rⁿ, to the complex data domain (e.g. Ruthotto & Haber 2021). Recently, most of the interest in generative models is driven by deep learning approaches, such as generative adversarial networks (Goodfellow et al. 2014). However, in principle, much simpler models such as hidden Markov models (Rabiner & Juang 1986; Eddy 2004) or grammars (e.g. Chomsky 1959; Jelinek, Lafferty & Mercer 1992; Beaumont & Stepney 2009) meet the definition of generative model in the broader sense defined above. The latter have proved useful in the description and generation of objects displaying fractal structure, as in the case of Lindenmayer systems applied to plant growth (Lindenmayer 1968a,b; Prusinkiewicz & Hanan 2013).

Our generative approach focuses on reproducing the complex fractal structure of embedded star clusters from hydrodynamical simulations (see e.g. fig. 4 in Ballone et al. 2020) by capturing the relations between sub-clusters at different scales through a hierarchical clustering algorithm. This will eventually allow us to generate new realizations by modifying their macro structure, i.e. the relations between large sub-clusters. The parameters that characterize the relevant properties of these clumps and their relations can be treated as the latent domain of our generative model.

We proceed in two steps. First, we use a hierarchical clustering algorithm to identify clumps of stars at different scales in the phase space of the original hydrodynamical simulation output. The clumps are organized by the algorithm into a hierarchical tree |$\mathcal {T}$|⁠, where the root node contains the whole set of stars and each subsequent node represents a two-way split with each branch being a clump of stars, down to the leaf nodes representing individual stars. For each node |$\mathcal {T}_i$|⁠, we describe the relevant physical properties of the cluster in terms of the distance vector between the centres of mass of the clumps |$\mathbf {l}_i$|⁠, their relative velocity vector |$\mathbf {u}_i$|⁠, and the mass ratio between the two clumps. To describe how the mass is split at each node we refer to q_i, defined as the ratio between the lightest of the two resulting groups and the total mass of the node. With this definition, mass ratios fall between 0 (maximally unequal split) and 0.5 (equal-mass split). The description of the star clusters in terms of the hierarchical clustering algorithm is given in Section 3.2, but its goal in short is to capture structure as a function of scale, similarly to what was done in e.g. Elmegreen et al. (2006) by applying smoothing kernels of different sizes.
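The quantities attached to each node can be computed from the two groups of stars produced by a split, as in the minimal Python sketch below. The function name and the sign convention (vectors pointing from the first group to the second) are assumptions made for illustration.

```python
import numpy as np

def split_properties(m_a, x_a, v_a, m_b, x_b, v_b):
    """Return (l, u, q) for a node that splits into groups a and b.

    l : separation vector between the two centres of mass (b minus a)
    u : relative velocity of the two centres of mass (b minus a)
    q : mass of the lighter group over the total mass, in (0, 0.5]
    """
    m_a = np.asarray(m_a, dtype=float)
    m_b = np.asarray(m_b, dtype=float)
    # Mass-weighted mean positions and velocities of each group.
    com = lambda m, x: np.average(np.asarray(x, dtype=float), axis=0, weights=m)
    l = com(m_b, x_b) - com(m_a, x_a)
    u = com(m_b, v_b) - com(m_a, v_a)
    q = min(m_a.sum(), m_b.sum()) / (m_a.sum() + m_b.sum())
    return l, u, q
```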

Second, we generate a new realization of particle positions and velocities by placing clumps of stars (and sub-clumps down to the individual stars) in phase space. To build a new realization of total mass M (details in Section 3.3), we start with one particle at rest in the origin of our coordinate system, initially containing the total mass of the cluster M. Then, we iteratively split it into new particles and place them, at each step i, at a distance |$\mathbf {l}_i$| from each other, moving with relative velocity |$\mathbf {u}_i$|⁠. The relevant variables |$\mathbf {l}_i$|⁠, |$\mathbf {u}_i$|⁠, and the relevant mass ratio q_i are taken from the tree |$\mathcal {T}$| except for the first step(s), which are drawn from a tree |$\mathcal {T}^\prime$| built on a different simulation. While this does not guarantee that the outcome will be described by a tree with statistical properties that match those of |$\mathcal {T}$|⁠, it is at least heuristically convincing in the case of very hierarchical distributions. Moreover, we will check ex post that the realizations generated in this way have a set of desirable properties with respect to the original cluster. The details about the generative procedure are given in Section 3.3.

3.1 Hierarchical clustering

Hierarchical clustering algorithms arrange data into a tree-like structure representing nested groups, capturing clustering structure at different scales. In particular, we use an agglomerative clustering algorithm (see the chapter on agnes in Kaufman & Rousseeuw 1990). This means that the tree-like hierarchy of clusters is built from the bottom up: the algorithm starts from individual points, and merges the most similar ones into clusters until some stopping criterion is satisfied (e.g. until only a specified number of clusters are left). This way of proceeding can be thought of as drawing a tree with a branch for every pair of clusters that merge.² A dendrogram can be used to display the resulting tree structure, with leaf nodes corresponding to individual points and the root corresponding to the whole data set. We refer the interested reader to Pasquato & Milone (2019) for an illustration of this and other clustering algorithms in an astronomical context. Here, we selected this algorithm because it is well suited for studying the complex structure of the hydrodynamical simulations described in Section 2, since it is informative on very different scales and it can capture clusters (and sub-clusters) of various sizes. We use the implementation offered by the scikit-learn library (Pedregosa et al. 2011).³

3.1.1 Linkage

Moving towards the root of the tree, an agglomerative clustering algorithm merges at each node either two groups with each other or a lone point into a group. This process is based on a notion of (dis)similarity between groups which may be defined in multiple ways, or linkages. We considered four different linkages and evaluated their performance in clustering the sink particle spatial distribution.

The single linkage merges the two clusters that have the minimum distance between any points in the two groups:
$$\begin{eqnarray} \Delta _{AB} {:=} \min {(l_{{i \in A},\, {}{j \in B}})}, \end{eqnarray}$$
(3)
where i and j represent sink particles belonging to group A and B, respectively, and l_{i, j} is the distance between two such particles.
The average linkage merges the two clusters that have the smallest average distance between all their points:
$$\begin{eqnarray} \Delta _{AB} {:=} {\rm mean}{(l_{{i \in A},\, {}{j \in B}})}. \end{eqnarray}$$
(4)
The complete linkage (also known as maximum linkage) merges the two clusters that have the smallest maximum distance between their points:
$$\begin{eqnarray} \Delta _{AB} {:=} \max {(l_{{i \in A},\, {}{j \in B}})}. \end{eqnarray}$$
(5)
Ward’s linkage merges two clusters such that the variance within all clusters increases the least. This often leads to clusters that are relatively equally sized. Ward’s linkage is defined as follows:
$$\begin{eqnarray} \Delta ^2_{AB} = \sum _{i \in {A \cup B}} l^2_{i, c_{A \cup B}} - \left(\sum _{i \in A} l^2_{i, c_A} + \sum _{i \in B} l^2_{i, c_B} \right), \end{eqnarray}$$
(6)
where the index i denotes the generic i-th particle and c_A, c_B, and c_A∪B denote the centroids of sets A, B, and A∪B respectively. Equation (6) corresponds to the increase in variance with respect to the relevant centroids as groups A and B are merged. Merging groups decreases the number of centroids by one, so variance is bound to increase, but using Ward’s linkage results in cluster mergers that minimize its increase at each step.
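The four dissimilarities of equations (3)–(6) can be evaluated for two given groups of points with the short Python sketch below. It is meant only to make the definitions concrete (the function name is hypothetical); a clustering library evaluates these quantities internally and far more efficiently.

```python
import numpy as np

def linkage_distances(A, B):
    """Evaluate equations (3)-(6) for two groups of points A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Matrix of distances between every point of A and every point of B.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

    def sum_sq(X):
        # Sum of squared distances of the points from their own centroid.
        return np.sum((X - X.mean(axis=0)) ** 2)

    ward_sq = sum_sq(np.vstack([A, B])) - (sum_sq(A) + sum_sq(B))
    return {
        "single": d.min(),      # equation (3)
        "average": d.mean(),    # equation (4)
        "complete": d.max(),    # equation (5)
        "ward_sq": ward_sq,     # equation (6)
    }
```

For example, for A = {(0, 0), (1, 0)} and B = {(3, 0), (4, 0)} the single, average, and complete linkages are 2, 3, and 4, while the Ward increase is 9, which equals n_A n_B / (n_A + n_B) times the squared distance between the two centroids.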

Fig. 3 shows how the choice of the linkage affects the structure of the first three nodes of the tree of m1e4. The single linkage approach leads to a single, big sub-clump, separated from a few isolated stars. In fact, following this prescription, two blobs that just touch in one point are considered similar and get merged into one pretty quickly, even if their centres of mass are far from each other. In contrast, single isolated stars are merged only in the final branches. The average and complete linkage perform poorly as well, likely because their merging criterion is too simple to fit the complex structure of the hydrodynamical clusters. Finally, Ward’s linkage performs well in describing the large-scale structure of the cluster, as it correctly identifies the main clumps and is thus informative about the structure of the cluster. For this reason, hereafter we will consider only Ward’s linkage.

Figure 3.

First nodes from the trunk in the hierarchical tree for the m1e4 simulation, obtained by considering different linkages: single (first column), average (second column), complete (third column), and Ward (last column) linkage. The panels in the first row show the first node of the tree that splits the sink particles of the simulation into two groups (blue and orange). In the second node, the blue group is split further into two subgroups (blue and green, panels in the second row). The third node splits the blue group into the blue and the red groups (panels in the lower row).

3.2 Application of hierarchical clustering to stellar clusters

We applied agglomerative clustering to the stellar clusters from hydrodynamical simulations introduced in Section 2. The trees are built by relying on Euclidean distance between sink particles in the phase space as a measure of dissimilarity, so that particles sharing both similar positions and similar velocities tend to be grouped together. Before applying the algorithm, we scaled the positions and the velocities by their standard deviations. This step, or an equivalent normalization, is necessary so that the result of our clustering does not depend on the arbitrary choice of the unit of measurement of time.
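A minimal sketch of this step with scikit-learn is given below. Using a single standard deviation for all position components and another for all velocity components is an assumption (the text does not say whether the scaling is applied per component), and the function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def phase_space_tree(pos, vel):
    """Build the full Ward-linkage tree in scaled phase space.

    Returns the merge list (children_) and the merge heights (distances_)
    as provided by scikit-learn.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    # Assumed scaling: one standard deviation for positions, one for velocities.
    X = np.hstack([pos / pos.std(), vel / vel.std()])
    # n_clusters=None with distance_threshold=0 makes scikit-learn
    # compute the whole hierarchy instead of stopping early.
    model = AgglomerativeClustering(
        n_clusters=None, distance_threshold=0.0, linkage="ward"
    )
    model.fit(X)
    return model.children_, model.distances_
```

Each row of the merge list joins two nodes, with indices below N referring to individual particles and larger indices to previously formed groups, so reading the list from the last row backwards walks the tree from the root towards the leaves.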

The right column of Fig. 3 shows the groups of sink particles corresponding to the first two nodes of the learned tree (starting from the root). The first node splits the sinks into two big chunks, and the second node splits off a smaller clump from one of these.⁴ Our choice of using Ward linkage results in the splitting off of the most massive sub-clumps in the first branches of the tree, leading to an overall balanced tree. The first splitting thus gives information about the distribution of the sub-clumps at large scales and, moving towards the leaves of the trees, sub-clusters are split in smaller and smaller sub-clumps, as desired for our task.

Fig. 4 shows the mass ratios between sub-clumps branching off at different depths within the tree. The distribution of mass ratios is not particularly affected by the tree depth. This is expected if the structure of the sink particle distribution is scale invariant, as moving down the tree (towards the leaves) probes smaller scales by construction. Additionally, Fig. 4 shows that the distribution is similar across different simulations, spanning a range of total mass of an order of magnitude. To assess if the mass ratios can be considered as drawn from the same distribution (after properly rescaling the mass), we performed pairwise Kolmogorov–Smirnov tests. Despite multiple testing, we never obtain a p-value below 10⁻², so we have no reason to suspect that the distributions are different. We also performed the same test on the sub-distributions shown in Fig. 4 separately. Our test always obtains p-values above 10⁻¹, the only exception being the comparison between the middle nodes of m1e4 and m9e4, where the p-value is 10^{−1.2}. This result suggests that, despite some statistical fluctuations, the splitting in mass is performed in the same way at different scales for all the simulations.
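Pairwise two-sample Kolmogorov–Smirnov tests of this kind can be run with scipy.stats.ks_2samp, as in the sketch below. The dictionary layout (simulation name mapped to an array of mass ratios) and the function name are assumptions made for illustration.

```python
from itertools import combinations
from scipy.stats import ks_2samp

def pairwise_ks(samples):
    """Two-sample KS p-value for every pair of simulations.

    samples : dict mapping a simulation name to a sequence of mass ratios q
    Returns a dict mapping (name_a, name_b) to the p-value.
    """
    pvalues = {}
    for a, b in combinations(sorted(samples), 2):
        pvalues[(a, b)] = float(ks_2samp(samples[a], samples[b]).pvalue)
    return pvalues
```

With ten simulations there are 45 pairs, so a few moderately small p-values are expected by chance alone; inspecting the minimum over all pairs is a simple way to keep this multiple-testing effect in view.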

Figure 4.

Distribution of the mass of the lightest of the two resulting groups at any given split, in units of the parent group. The top left panel shows the distribution calculated for all nodes in the learned tree. The top right panel shows the distribution for the top 1/3 of the nodes from the root (big clumps), the bottom left for the middle 1/3 of the nodes (intermediate-size clumps), and the bottom right for the lower 1/3 of the nodes (small clumps to individual stars).

Similar information on the scaling behaviour of our simulations can be extracted from Figs 5 and 6, where we show the distribution of the distances between the clumps (⁠|$l=|\mathbf {l}|$|⁠) and that of their relative velocities (⁠|$u=|\mathbf {u}|$|⁠). In particular, the positions of the maxima of the distributions shift towards lower values by moving from the top to the bottom nodes, confirming that the tree is considering smaller and smaller scales. Also in this case, all the simulations show very similar distributions at each level for both the distances and the relative velocities. The distribution of the angles between the relative velocity and the distance, |$\theta = \arccos {(\mathbf {l} \cdot \mathbf {u} \, (l \, u)^{-1})}$|⁠, is shown in Fig. 7. This distribution appears flat except for a rise at cos θ ≈ 1 which corresponds to relative velocity parallel to the separation vector between clumps, which is expected in a supervirial cluster undergoing overall expansion.
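The cosine of the angle θ defined above can be computed for all nodes at once, as in the following sketch (the function name is hypothetical):

```python
import numpy as np

def cos_theta(l, u):
    """Cosine of the angle between separation l and relative velocity u.

    l, u : arrays of shape (n_nodes, 3), or single 3-vectors
    """
    l = np.atleast_2d(np.asarray(l, dtype=float))
    u = np.atleast_2d(np.asarray(u, dtype=float))
    dot = np.sum(l * u, axis=1)
    return dot / (np.linalg.norm(l, axis=1) * np.linalg.norm(u, axis=1))
```

Values close to +1 correspond to clumps receding from each other along their separation vector, values close to −1 to clumps approaching each other, and values near 0 to purely tangential relative motion.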

Figure 5.

Same as Fig. 4 but for the distribution of the distances (scaled by their variance) between the centres of mass of two resulting groups at any given split of the agglomerative clustering hierarchical tree.

Figure 6.

Same as Fig. 4 but for the distribution of the relative velocities (scaled by their variance) between the centres of mass of two resulting groups at any given split of the agglomerative clustering hierarchical tree.

Figure 7.

Same as Fig. 4 but for the distribution of the cosine of the angle between the relative velocity and the distance of the centres of mass of two resulting groups at any given split of the agglomerative clustering hierarchical tree.

Relevant physical information can be drawn by considering the relation between quantities of the same node in the agglomerative clustering tree. Fig. 8 shows the relation between the distance of the sub-clumps and their relative velocity, for each node. The main sub-clumps, which correspond to the nodes closest to the root, show a direct proportionality between these two quantities, possibly due to rigid rotation. In contrast, on the smallest scales, the single particle relative velocity shows a tendency to decline with the square root of their distance, as would happen for two clumps (or even two individual stars) orbiting one another under the influence of each other’s monopole potential. Interestingly, all relative motions between clumps take place between the rigid and Keplerian extremes.

Figure 8.

Scatter plot of the relative velocity between the centres of mass of two different sub-clumps corresponding to a given node in the agglomerative clustering tree as a function of their distance. The colour gradient maps the depth of the node (from the root, in blue, to the leaves, in yellow) within the hierarchical tree, N_node. The superimposed lines represent two limit slopes corresponding to rigid rotation (blue) and Keplerian motion (orange).

3.3 Generating new realizations

As explained in Section 3.2, the application of the agglomerative clustering algorithm to stellar clusters allows us to build a tree |$\mathcal {T}$| encoding their hierarchical structure. Each node of the tree |$\mathcal {T}_i$| is associated with the relevant properties |$\mathbf {l}_i$|, |$\mathbf {u}_i$|, and q_i, which quantify the relations between the sub-clumps corresponding to the branches departing from the node. Thus, the tree essentially encodes instructions to generate a new star cluster, as it can be traversed from the top, iteratively splitting an initial particle until the leaf level is reached, where individual stars have been produced. In our case, the goal is to change the cluster at the global structure level, nearest to the trunk of the tree, thus creating different sub-clump configurations while preserving the small-scale properties of the sub-clumps (such as their fractal structure). We thus take the quantities |$\mathbf {l}_i$|, |$\mathbf {u}_i$|, and q_i associated with the nodes |$\mathcal {T}_i$| for i < k and replace them with the quantities |$\mathbf {l}^\prime _i$|, |$\mathbf {u}^\prime _i$|, and |$q^\prime _i$| associated with the nodes |$\mathcal {T^\prime }_i$| of another tree |$\mathcal {T^\prime }$|, learned from a different set of sink particles. This grafting procedure represents a way to combine the large-scale properties of one simulation with the small-scale properties of another. For the results presented in Section 4, these nodes are sampled randomly from other simulations.

The generation procedure is implemented as follows. First, we consider a particle with a mass M₁ equal to the total mass of the cluster considered, placed at the centre of mass of the cluster. The particle is first split into two particles of masses M₁₁ and M₁₂ such that M₁₁ + M₁₂ = M₁ and |$\min (M_{11}, M_{12})/M_1 = q^\prime _1$|. The positions and velocities of the new particles are assigned such that their centre of mass is at rest in the origin of the system, their distance vector is |$\mathbf {l}^\prime _1$|, and their relative velocity is |$\mathbf {u}_1^\prime$|. This splitting procedure is then repeated until a cluster with the same number of particles as the reference one is obtained. At each step, the particle to split is chosen by following the same order of splitting as in the original reference tree. This procedure may at times result in very low mass particles. We remove these planet-sized objects with a cut-off at the minimum mass of the original stars on which |$\mathcal {T}$| was learned.
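A single splitting step can be sketched in Python as follows. The function name is hypothetical, and assigning the lighter fragment to the first child is an arbitrary convention adopted here, since the text does not specify which of the two fragments is the lighter one.

```python
import numpy as np

def split_particle(M, X, V, l, u, q):
    """Split a particle of mass M at position X with velocity V in two.

    The children have masses q*M and (1-q)*M, separation vector l and
    relative velocity u (second child minus first), and their centre of
    mass keeps the position and velocity of the parent.
    """
    X, V = np.asarray(X, dtype=float), np.asarray(V, dtype=float)
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    m1, m2 = q * M, (1.0 - q) * M      # assumed: first child is the lighter
    # Offsets from the parent are inversely proportional to the child mass.
    x1, x2 = X - (m2 / M) * l, X + (m1 / M) * l
    v1, v2 = V - (m2 / M) * u, V + (m1 / M) * u
    return (m1, x1, v1), (m2, x2, v2)
```

Applying this step repeatedly, with (l, u, q) read from the grafted nodes for the first splits and from the original tree afterwards, conserves the total mass and the centre-of-mass position and velocity at every stage.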

3.3.1 Grafting depth

In the procedure described above, the choice of the grafting depth k determines how different the new realizations are from the original system. A low value of k produces generations that are very similar to the original one at all scales. In contrast, when k is high, the small scales are also modified substantially. In our case, we want to generate new clusters that are similar to the original one but, at the same time, cannot be considered as its copies. We evaluated how the choice of the grafting depth affects the spatial structure of the new generations. In particular, we considered the distributions of distances and the fractal dimensions obtained by generating sets of one hundred new realizations for m1e4, at different values of k. The left-hand panel of Fig. 9 shows the general shape of the distributions of inter-particle distances. Predictably, the realizations obtained by modifying just one node match the original distribution better than those that change two or three nodes, and present the smallest spread. The peaks correspond to sub-clumps of sinks that are formed in different numbers and sizes in each realization. At small distances, the new realizations recover the general trend of the original distribution, as intended by our method. For the case with k = 1, this happens at about 1 pc, meaning that only the very large scales (the distance between the main sub-clumps) are modified. By increasing k, the smaller scales are also altered, and the original shape is recovered later. This suggests that very few changes are sufficient to produce generations that can be defined as different from the original cluster. The distributions for k ≤ 3 are consistent with the original simulation throughout the range of distances.

Figure 9.

Left-hand panel: distribution of inter-particle distances f(d) for the sink particles taken from the m1e4 simulation (thick purple line) and three distributions of new generations obtained by replacing the first 1 (blue), 2 (green, hatched area), and 3 (yellow) nodes, corresponding to k = 2, 3, and 4 in the notation used above. The shaded area encloses the distribution of the new generations, and the solid line is the median of the distribution. Right-hand panel: average number of neighbours N_r around a star, within a sphere with radius r_neigh, for different values of r_neigh. Lines and colours are the same as in the left-hand panel. The black dotted lines represent the trend expected for distributions with a uniform fractal dimension, for β = 1.6, 2, and 3.

In the right-hand panel of Fig. 9, we have computed the fractal dimension by means of the average number of neighbours of the stars within a given distance, following Ballone et al. (2020). The distribution of neighbours of m1e4 is not described by a single power law of non-integer index β, as one would expect in a simple fractal structure, but presents two slope changes at ≈10⁻¹ and ≈2 pc (see also Ballone et al. 2020). To guide the eye, the three dotted lines mark the theoretical distance distributions in the case of a pure fractal distribution with |$N_{\rm r}\propto r_{\rm neigh}^{\beta }$|, for β = 1.6, 2, and 3. The generated distributions match the general trend and the changes in the slope of the original simulation very well, showing that our method captured the underlying structure of the particle distribution in the 3D space at all scales. As for the inter-particle distance distribution, the choice of k = 2 produces only minimal differences from the original m1e4 profile. In the following text, we will focus on generations with k = 3, which allows us to produce a distribution of clusters that are distinguishable from the original one but still consistent with it at all scales.
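The average number of neighbours within a given radius can be computed by direct pair counting, as in the sketch below. This is an illustrative assumption about the procedure (the function name is hypothetical), not the implementation of Ballone et al. (2020).

```python
import numpy as np

def mean_neighbours(pos, radii):
    """Average number of neighbours within each radius in `radii`.

    pos   : (N, 3) array of positions
    radii : sequence of neighbourhood radii r_neigh
    """
    pos = np.asarray(pos, dtype=float)
    n = len(pos)
    # Distances of all unordered pairs; each close pair contributes one
    # neighbour to both of its members, hence the factor of two.
    i, j = np.triu_indices(n, k=1)
    d = np.linalg.norm(pos[i] - pos[j], axis=1)
    return np.array([2.0 * np.count_nonzero(d < r) / n for r in radii])
```

For a pure fractal the result scales as a power law of the radius with index β, so a local slope can be estimated from a straight-line fit to log N_r against log r_neigh over a restricted range of radii, for example with numpy.polyfit.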

Fig. 10 shows the distribution of velocities for the new generations obtained by setting k = 3, as compared to the original sink particle trend. The median of the new generations matches the original distribution at all velocities, both on the low-velocity tail, where the Maxwell–Boltzmann trend seems to be preserved, and on the sharper power-law trend at high velocities. At very low values (|$u\lt 1 \, \mathrm{km/s}$|), the very low number of stars causes large fluctuations in the distribution of new generations, but their median trend is still consistent with the original one.

Figure 10.

Distribution of velocities f(u) for the sink particles taken from the m1e4 simulation (orange line) and for a distribution of new generations obtained by considering k = 3 (blue). The shaded area encloses the distribution of the new generations, and the solid line is the median of the distribution.
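A median curve and shaded band of the kind shown in Figs 9 and 10 can be built by histogramming every generation on a common set of bins. The sketch below assumes that the shaded area spans the minimum and maximum over the generations, which the caption does not state explicitly; `distribution_envelope` is a hypothetical name.

```python
import numpy as np


def distribution_envelope(samples, bins):
    """Median, lower and upper envelope of normalised histograms.

    samples : list of 1D arrays, one per generated cluster (e.g. speeds u)
    bins    : common bin edges used for every generation
    """
    hists = np.array(
        [np.histogram(s, bins=bins, density=True)[0] for s in samples]
    )
    # Statistics are taken bin by bin across the generations (axis 0).
    return np.median(hists, axis=0), hists.min(axis=0), hists.max(axis=0)
```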

4 RESULTS

4.1 Properties of the newly generated systems

In this section, we discuss the properties of the systems generated using our procedure starting from the simulation m1e4, which presents the highest resolution.

Fig. 11 shows the spatial distributions of five new generations obtained with the method described in Section 3.3, compared to the original one. The new generations are strongly substructured, with a number of clumps that varies from one realization to the next, because each realization draws branches from different simulations. A strong degree of mass segregation is also still present in the individual sub-clumps, as highlighted by the colour coding. This primordial mass segregation in the individual realizations qualitatively matches the one present in the original cluster.

Figure 11.

x-y projection of the m1e4 system (top left panel, see also Fig. 1) and five different new generations. The colour code marks the different masses of the sink particles and their new generations.

In Fig. 12, we compare the mass distribution of m1e4 to those of the new generations. Our method leaves the slope of the mass function largely unaltered over most of the mass spectrum. Some discrepancies appear at the boundaries of the mass spectrum: changing the first nodes may split a relatively small particle more times than in the original cluster, while leaving more massive particles less split. This explains the higher number of particles at both ends of the mass spectrum with respect to the original one. The sharper cut-off at $m\approx 10^{-1}\,{\rm M}_{\odot}$ arises because all masses below this threshold are systematically removed. In general, the fit with equation (2) is rather good, yielding values of γ around 2.3, reminiscent of a Salpeter (1955) slope.

Figure 12.

Mass spectrum of the sink particles of the simulation m1e4 (thick purple line) and of five different generated systems (thin lines).
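Equation (2) is defined earlier in the paper and is not reproduced in this section, so the sketch below does not implement it. As a stand-in, it assumes a pure power law dN/dm ∝ m^(−γ) above a chosen lower mass `m_min` and uses the standard maximum-likelihood estimator for the slope. The name `power_law_slope` and the choice of `m_min` are assumptions made for illustration.

```python
import numpy as np


def power_law_slope(masses, m_min):
    """Maximum-likelihood slope gamma for dN/dm proportional to m**(-gamma).

    Only masses >= m_min are used; the estimator is
    gamma = 1 + n / sum(ln(m_i / m_min)).
    """
    m = np.asarray(masses, dtype=float)
    m = m[m >= m_min]
    return 1.0 + m.size / np.sum(np.log(m / m_min))
```

Applied to the masses of a generated cluster, with `m_min` set above the cut-off at about 10⁻¹ M⊙, this gives a quick consistency check on the slope without binning the data. The result need not coincide with the γ of equation (2) if that fitting function is not a pure power law.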

Because the generation process redistributes particle positions, velocities, and masses, the total virial ratio α_vir may be significantly altered: with respect to the value of 1.19 for m1e4, it ranges from a minimum of 0.46 to a maximum of 2.08. The future dynamical evolution clearly depends heavily on the virial ratio, which, in turn, is heavily affected by the left tail of the distribution of pairwise particle distances. There is indeed a margin of variation at short distances between realizations, as shown in Fig. 9. However, the shortest distances in any stellar system essentially correspond to the semimajor axes of binary stars. Our hydrodynamical simulations were designed neither to faithfully reproduce an observational initial mass function (Ballone et al. 2021) nor to capture binary properties. In Torniamenti et al. (2021), we introduce a realistic binary distribution with a separate procedure. While the binding energy of binaries is a large fraction of the total binding energy in many realistic scenarios, the time-scale over which this energy is exchanged with the cluster at large is much longer than the dissolution time of the typical system under consideration: hard binaries are dynamically inert in the short term.

To check that this is the source of the observed virial ratio mismatch, we performed two diagnostic tests. First, we recomputed the virial ratio α_vir N_s times, each time excluding a different particle. This gives us a robust way to quantify the virial ratio, as the spread in the resulting distribution is driven by instances in which a member of a very close binary is excluded. In all cases, the value of α_vir of the original system lies well inside the distribution of α_vir obtained by removing one particle at a time from a given generation.
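The leave-one-out test can be sketched as below, using α_vir ≡ 2K/|W| as defined in the note to Table 1. This is an illustrative implementation under stated assumptions: velocities are taken in a fixed reference frame (the centre of mass is not recomputed after each removal), units are M⊙, pc, and km/s, and the function names are hypothetical. The pairwise distance matrix makes the memory cost scale as N², which is acceptable for the few thousand particles considered here.

```python
import numpy as np

G_ASTRO = 4.30091e-3  # gravitational constant in pc (km/s)^2 / Msun


def energies_per_particle(mass, pos, vel, G=G_ASTRO):
    """Kinetic energy k_i of each particle and the potential energy w_i
    summed over all pairs that contain particle i."""
    k = 0.5 * mass * np.sum(vel**2, axis=1)
    sep = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(sep, np.inf)  # no self-interaction
    w = -G * mass * np.sum(mass[None, :] / sep, axis=1)
    return k, w


def virial_ratio(mass, pos, vel, G=G_ASTRO):
    """alpha_vir = 2K / |W| for the whole system."""
    k, w = energies_per_particle(mass, pos, vel, G)
    # Each pair is counted in two of the w_i, hence the factor 1/2.
    return 2.0 * k.sum() / abs(0.5 * w.sum())


def leave_one_out_virial(mass, pos, vel, G=G_ASTRO):
    """alpha_vir recomputed N_s times, each time excluding one particle."""
    k, w = energies_per_particle(mass, pos, vel, G)
    K, W = k.sum(), 0.5 * w.sum()
    # Removing particle i removes k_i and every pair energy involving i.
    return 2.0 * (K - k) / np.abs(W - w)
```

A member of a very tight pair has a large |w_i|, so removing it shifts α_vir strongly. These are the instances expected to drive the spread of the returned distribution.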

Second, we computed α_vir while excluding the binding energy of stars with separations below a threshold, which we varied between one-tenth and one-half of the average inter-particle distance. We found that the large variations in α_vir observed for the generated clusters are essentially due to the different distributions of tightly bound particles in the generated clusters and in the parent sink particle system produced by our SPH simulations. Thus, different values of the virial ratio will result in a similar dynamical evolution on the time-scales of interest, as shown below by evolving our realizations through direct N-body simulations. We list the nominal virial ratios of our generated realizations together with other properties in Table 2.

Table 2.

Properties of the generated clusters starting from m1e4, m3e4, and m6e4.

Name	N_s	N_c	α_vir	γ	$M_{\rm sink}\,[{\rm M}_\odot]$
m1e4g1	2006	6	0.60	2.3	4.20 × 10³
m1e4g2	2509	6	1.41	2.3	4.20 × 10³
m1e4g3	2512	6	1.57	2.3	4.20 × 10³
m1e4g4	1998	8	0.60	2.3	4.20 × 10³
m1e4g5	2512	5	1.68	2.3	4.20 × 10³
m1e4g6	2491	7	1.50	2.3	4.20 × 10³
m1e4g7	2081	8	0.48	2.3	4.20 × 10³
m1e4g8	2512	9	1.81	2.3	4.20 × 10³
m1e4g9	2196	4	0.46	2.3	4.20 × 10³
m1e4g10	2496	7	1.57	2.3	4.20 × 10³
m3e4g1	2765	5	0.80	2.2	1.03 × 10⁴
m3e4g2	2805	7	1.39	2.2	1.03 × 10⁴
m3e4g3	2811	5	1.16	2.2	1.03 × 10⁴
m3e4g4	2719	5	1.20	2.2	1.03 × 10⁴
m3e4g5	2747	7	1.48	2.2	1.03 × 10⁴
m3e4g6	2774	6	1.40	2.2	1.03 × 10⁴
m3e4g7	2750	6	1.46	2.2	1.03 × 10⁴
m3e4g8	2770	7	1.13	2.2	1.03 × 10⁴
m3e4g9	2628	7	0.94	2.2	1.03 × 10⁴
m3e4g10	2764	6	0.94	2.2	1.03 × 10⁴
m6e4g1	2747	7	1.65	2.1	2.04 × 10⁴
m6e4g2	2823	7	1.80	2.1	2.04 × 10⁴
m6e4g3	2900	5	1.82	2.1	2.04 × 10⁴
m6e4g4	2718	6	1.66	2.1	2.04 × 10⁴
m6e4g5	2967	6	1.75	2.1	2.04 × 10⁴
m6e4g6	2752	5	1.30	2.1	2.04 × 10⁴
m6e4g7	2998	6	1.55	2.1	2.04 × 10⁴
m6e4g8	2833	6	1.36	2.1	2.04 × 10⁴
m6e4g9	3001	6	1.82	2.1	2.04 × 10⁴
m6e4g10	3015	5	1.82	2.1	2.04 × 10⁴

Note. After the name of the generated cluster (Col. 1), we report the total number of stars (Col. 2), the number of macroscopic sub-clumps (Col. 3), the virial ratio (Col. 4), the γ coefficient of the mass-spectrum fitting function (equation 2, Col. 5), and the total mass of the stars (Col. 6).

4.2 N-body simulations

Our method aims to generate large samples of initial conditions for N-body simulations. To test that our realizations are indeed suitable for this use, we evolve via direct N-body simulations the three original clusters (m1e4, m3e4, and m6e4) and 10 different generated clusters for each of them. Finding that the evolution of the generated clusters is neither identical to nor dramatically different from that of the original cluster is one of the main tests of our method. In fact, our method can be successfully used only if the new clusters evolve in a similar way to the original one, but are sufficiently different not to be an exact copy. Ideally, the generated clusters should behave as different random realizations of the same underlying physical distributions.

We ran our simulations with the direct N-body code nbody6++gpu (Wang et al. 2015). Thanks to a neighbour scheme (Nitadori & Aarseth 2012), nbody6++gpu efficiently handles the collisional force contributions at short time-scales as well as those at longer time intervals, to which all the members in the system contribute. The force integration also includes a solar neighbourhood-like static external tidal field (Wang et al. 2016). Stellar evolution is not included in our runs, for the sake of simplicity and to make the comparison with the original cluster more straightforward. We evolved the clusters for 10 Myr.

Table 2 shows the main initial properties of the generated clusters for which we ran the N-body simulations. Fig. 13 shows the projection in the x-y plane of the original m1e4 cluster and of four generations, at different times. The global evolution of the newly generated clusters shows a variety of configurations, depending on the different distribution of mass. In some cases, distinct sub-clumps are present at $t > 1\,\mathrm{Myr}$ and tidally interact with each other before eventually merging. In the case of m1e4g4, two distinct sub-clumps are still present at $10\,\mathrm{Myr}$.

Figure 13.

Projection in the x-y plane of the evolution of the original cluster (m1e4, upper panel) and four different generated clusters (lower panels) as a function of time. The clusters are shown at their initial configuration (first column) and at three different time steps: 1 Myr (second column), 5 Myr (third column), and 10 Myr (last column). The colour code marks the different masses of the sink particles and their generations.

A more quantitative description of the global evolution of the clusters can be given in terms of the evolution of the 10 per cent and 50 per cent Lagrangian radii (r₁₀ and r₅₀), centred on the centre of density.⁵ Figs 14 and 15 show the evolution of r₁₀ and r₅₀ for the original clusters and the generated ones. In all cases, the original evolution lies within the limits of the distribution of the generated clusters, which shows a large spread. This spread is consistent with the large stochastic fluctuations that we expect in the evolution of such low-mass clusters (see e.g. Torniamenti et al. 2021).

Figure 14.

Evolution of the 10 per cent Lagrangian radius for the original sink particles and for 10 different generations of m1e4 (upper panel), m3e4 (middle panel), and m6e4 (lower panel). The orange line represents the original sink particle system, and the blue lines are the generated clusters.

Figure 15.

Same as Fig. 14 but for the 50 per cent Lagrangian radius.
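The Lagrangian radii of Figs 14 and 15 can be sketched as follows. The local density follows footnote 5 (the density of the sphere that includes the 300 closest stars). Taking the density centre as the density-weighted mean position is an assumption made here, since the text does not specify the weighting. The function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def density_centre(mass, pos, n_neigh=300):
    """Density-weighted mean position (assumed definition of the centre)."""
    n_neigh = min(n_neigh, len(mass) - 1)
    tree = cKDTree(pos)
    # k = n_neigh + 1 because each star is returned as its own nearest point.
    dist, idx = tree.query(pos, k=n_neigh + 1)
    radius = dist[:, -1]  # distance to the n_neigh-th closest star
    enclosed_mass = mass[idx].sum(axis=1)
    rho = enclosed_mass / (4.0 / 3.0 * np.pi * radius**3)
    return np.average(pos, axis=0, weights=rho)


def lagrangian_radius(mass, pos, centre, fraction):
    """Radius of the sphere around `centre` enclosing `fraction` of the mass."""
    r = np.linalg.norm(pos - centre, axis=1)
    order = np.argsort(r)
    cumulative = np.cumsum(mass[order])
    # First star at which the enclosed mass reaches the requested fraction.
    k = np.searchsorted(cumulative, fraction * cumulative[-1])
    return r[order][k]
```

Calling `lagrangian_radius` with `fraction=0.1` and `fraction=0.5` on each snapshot would give curves analogous to r₁₀ and r₅₀.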

5 DISCUSSION AND CONCLUSIONS

We introduced a new method for generating a number of new realizations from a given set of initial conditions (particle masses, positions, and velocities) produced by hydrodynamical simulations. The realizations are built to display a different large-scale structure, but share similar properties at smaller scales, preserving in particular the fractal dimension of the original simulation. We have shown that they can be used as initial conditions for N-body simulations, producing an evolution comparable to that of the original cluster. This suggests that our method is suitable for drawing the initial conditions of a large set of N-body simulations at a tiny fraction of the computational cost of generating initial conditions from a hydrodynamical simulation.

Our novel approach relies on inferring a hierarchical clustering structure (represented as a tree) from the original initial condition data through agglomerative clustering. This is later turned into new realizations by modifying the initial branches of the tree (encoding the relations between the biggest sub-clumps in the simulation). This results in realizations with different macroscopic properties from the original one (e.g. the number of big clumps and their distances), while approximately preserving the characteristics of the small-scale structure responsible for most of the dynamical evolution (e.g. the distribution of pairwise distances between individual stars). In principle, this scheme is very flexible, allowing us to choose how much of the large-scale structure we control directly, by choosing the number of initial branches we modify.
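The first step of this scheme, building the tree and identifying its top-level branches, can be illustrated with SciPy's hierarchical clustering routines. This is only a sketch: the feature space (here whatever array is passed as `features`, e.g. 3D positions) and the Ward linkage are assumptions, the actual implementation is the one referred to in footnote 3, and the modification of the branches themselves is not shown. The name `top_level_clumps` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def top_level_clumps(features, k, method="ward"):
    """Build an agglomerative tree and cut it into at most k top branches.

    features : (n_particles, n_features) array
    k        : number of top-level branches to identify
    Returns the linkage matrix Z (n_particles - 1 merges, built from the
    leaves up) and an integer clump label for each particle.
    """
    Z = linkage(features, method=method)
    labels = fcluster(Z, t=k, criterion="maxclust")
    return Z, labels
```

The rows of `Z` record the merges in order of increasing distance, so the last rows correspond to the relations between the biggest sub-clumps, which are the part of the tree that the method modifies.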

The realizations we obtained with our method qualitatively resemble the original simulation when visualized in three-dimensional space. In our case, the original distribution of stars was generated by hydrodynamical simulations of embedded clusters, so our new realizations appear qualitatively indistinguishable from the output of these simulations. The mass spectrum and the velocity distribution are also very similar to the original simulation. The distribution of the number of neighbours as a function of distance reveals that the fractal dimension of our realizations and that of the original simulation match on different scales (they both show a similar complex fractal pattern).

Finally, we ran direct N-body simulations of a sample of generated initial conditions for three different original star clusters. In all the cases, the new generations show a realistic evolution on all scales, bracketing that of the original one, as shown by the trend of the 10 per cent and 50 per cent Lagrangian radii. Our analysis suggests that this method is a promising way to generate new mass and phase-space distributions from existing hydrodynamical simulations, thus increasing our sample of initial conditions for N-body simulations. The speed-up in computation obtained by our new method is tremendous: generating initial conditions from hydrodynamical simulations requires about 1.5 × 10⁵ core hours per simulation, while our procedure takes about 10 core seconds to train the initial tree distribution and generate a new realization.
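As a rough check of the quoted speed-up, and assuming that both figures refer to a single realization on comparable cores, the ratio of the two costs is

```latex
\frac{1.5\times10^{5}\ \mathrm{core\ h}\times 3600\ \mathrm{s\,h^{-1}}}{10\ \mathrm{core\ s}}
  = \frac{5.4\times10^{8}\ \mathrm{core\ s}}{10\ \mathrm{core\ s}}
  = 5.4\times10^{7}
```

that is, a saving of between seven and eight orders of magnitude per realization.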

ACKNOWLEDGEMENTS

We thank the anonymous referee for their useful comments, which helped to improve this work. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 896248. MP’s initial contribution to this material is based upon work supported by Tamkeen under the NYU Abu Dhabi Research Institute grant CAP³. MM, AB, and GI acknowledge financial support from the European Research Council for the ERC Consolidator grant DEMOBLACK, under contract no. 770017. PFDC acknowledges financial support from MIUR-PRIN2017 project Coarse-grained description for non-equilibrium systems and transport phenomena (CO-NEST) n.201798CZL. MM and MCA acknowledge financial support from the FWF Austrian Science Fund grant P31154-N27.

DATA AVAILABILITY

The data underlying this article will be shared on reasonable request to the corresponding authors.

Footnotes

1

The approximately constant value of N_s follows from the fact that the star formation efficiency is roughly independent of M_mc (see table 1 in Ballone et al. 2020). The star formation efficiency is instead dictated by the physical processes involved in the simulations, which, in all cases, start from the same values of cloud temperature and density.

2

The procedure of drawing a hierarchy of merging substructures may recall the merger tree history, which is used in cosmology to track the assembly of substructures across time (see e.g. Rodriguez-Gomez et al. 2015). Agglomerative clustering algorithms, however, do not imply any evolution in time, but use the tree-like structure to identify groups of instances at different scales.

3

The details about the implementation of the algorithm can be found at this link.

4

Although we describe the tree from the root up (occasionally writing in terms of splits/splitting), agglomerative methods build the tree from the leaves, i.e. the individual sink particles.

5

The local density around each star was calculated as the density of the sphere that includes the 300 closest stars.

REFERENCES

Allison R. J., Goodwin S. P., Parker R. J., Portegies Zwart S. F., de Grijs R., 2010, MNRAS, 407, 1098, doi:10.1111/j.1365-2966.2010.16939.x

An J. H., Evans N. W., 2006, AJ, 131, 782, doi:10.1086/499305

Ballone A., Mapelli M., Di Carlo U. N., Torniamenti S., Spera M., Rastello S., 2020, MNRAS, 496, 49, doi:10.1093/mnras/staa1383

Ballone A., Torniamenti S., Mapelli M., Di Carlo U. N., Spera M., Rastello S., Gaspari N., Iorio G., 2021, MNRAS, 501, 2920, doi:10.1093/mnras/staa3763

Bastian N., Gieles M., Ercolano B., Gutermuth R., 2009, MNRAS, 392, 868, doi:10.1111/j.1365-2966.2008.14107.x

Bate M. R., 2009a, MNRAS, 392, 1363, doi:10.1111/j.1365-2966.2008.14165.x

Bate M. R., 2009b, MNRAS, 397, 232, doi:10.1111/j.1365-2966.2009.14970.x

Bate M. R., Bonnell I. A., Price N. M., 1995, MNRAS, 277, 362, doi:10.1093/mnras/277.2.362

Baumgardt H., Kroupa P., 2007, MNRAS, 380, 1589, doi:10.1111/j.1365-2966.2007.12209.x

Beaumont D., Stepney S., 2009, 2009 IEEE Congress on Evolutionary Computation. IEEE, Piscataway, New Jersey, United States, p. 2446

Bertin G., Stiavelli M., 1984, A&A, 137, 26

Bianchini P., Varri A. L., Bertin G., Zocchi A., 2013, ApJ, 772, 67, doi:10.1088/0004-637X/772/1/67

Boekholt T., Portegies Zwart S., 2015, Comput. Astrophys. Cosmol., 2, 2, doi:10.1186/s40668-014-0005-3

Boley A. C., 2009, ApJ, 695, L53, doi:10.1088/0004-637X/695/1/L53

Boley A. C., Hayfield T., Mayer L., Durisen R. H., 2010, Icarus, 207, 509, doi:10.1016/j.icarus.2010.01.015

Bonnell I. A., Bate M. R., Vine S. G., 2003, MNRAS, 343, 413, doi:10.1046/j.1365-8711.2003.06687.x

Burgers J. M., 1948, Adv. Appl. Mech., 1, 171, doi:10.1016/S0065-2156(08)70100-5

Cantat-Gaudin T. et al., 2019, A&A, 626, A17, doi:10.1051/0004-6361/201834957

Cartwright A., 2009, MNRAS, 400, 1427, doi:10.1111/j.1365-2966.2009.15540.x

Chomsky N., 1959, Inf. Control, 2, 137, doi:10.1016/S0019-9958(59)90362-6

Claydon I., Gieles M., Varri A. L., Heggie D. C., Zocchi A., 2019, MNRAS, 487, 147, doi:10.1093/mnras/stz1109

Corsaro E. et al., 2017, Nat. Astron., 1, 0064, doi:10.1038/s41550-017-0064

D’Alessio P., Calvet N., Hartmann L., 2001, ApJ, 553, 321, doi:10.1086/320655

Dale J. E., Ercolano B., Bonnell I. A., 2015, MNRAS, 451, 987, doi:10.1093/mnras/stv913

Dalessandro E., Raso S., Kamann S., Bellazzini M., Vesperini E., Bellini A., Beccari G., 2021, MNRAS, 506, 813, doi:10.1093/mnras/stab1257

Daniel K. J., Heggie D. C., Varri A. L., 2017, MNRAS, 468, 1453, doi:10.1093/mnras/stx571

Davis M., Efstathiou G., Frenk C. S., White S. D. M., 1985, ApJ, 292, 371, doi:10.1086/163168

Di Carlo U. N., Giacobbo N., Mapelli M., Pasquato M., Spera M., Wang L., Haardt F., 2019, MNRAS, 487, 2947, doi:10.1093/mnras/stz1453

Di Cintio P., Casetti L., 2019, MNRAS, 489, 5876, doi:10.1093/mnras/stz2531

Di Cintio P., Casetti L., 2020, in Bragaglia A., Davies M., Sills A., Vesperini E., eds, Star Clusters: From the Milky Way to the Early Universe. Vol. 351, Cambridge University Press, Cambridge, United Kingdom, p. 426, doi:10.1017/S1743921319006744

Dib S., Henning T., 2019, A&A, 629, A135, doi:10.1051/0004-6361/201834080

Diemand J., Kuhlen M., Madau P., 2006, ApJ, 649, 1, doi:10.1086/506377

Eddy S. R., 2004, Nat. Biotechnol., 22, 1315

Elmegreen B. G., Elmegreen D. M., Chandar R., Whitmore B., Regan M., 2006, ApJ, 644, 879, doi:10.1086/503797

Fabricius M. H. et al., 2014, ApJ, 787, L26, doi:10.1088/2041-8205/787/2/L26

Federrath C., 2013, MNRAS, 436, 1245, doi:10.1093/mnras/stt1644

Federrath C., Klessen R. S., 2012, ApJ, 761, 156, doi:10.1088/0004-637X/761/2/156

Feng Y., Modi C., 2017, Astron. Comput., 20, 44, doi:10.1016/j.ascom.2017.05.004

Ferraro F. R. et al., 2018, ApJ, 860, 50, doi:10.3847/1538-4357/aabe2f

Fujii M. S., Portegies Zwart S., 2016, ApJ, 817, 4, doi:10.3847/0004-637X/817/1/4

Gavagnin E., Bleuler A., Rosdahl J., Teyssier R., 2017, MNRAS, 472, 4155, doi:10.1093/mnras/stx2222

Geen S., Hennebelle P., Tremblin P., Rosdahl J., 2016, MNRAS, 463, 3129, doi:10.1093/mnras/stw2235

Gieles M., Zocchi A., 2015, MNRAS, 454, 576, doi:10.1093/mnras/stv1848

Goodfellow I. J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y., 2014, preprint (arXiv:1406.2661)

Goodman J., Heggie D. C., Hut P., 1993, ApJ, 415, 715, doi:10.1086/173196

Goodwin S. P., Bastian N., 2006, MNRAS, 373, 752, doi:10.1111/j.1365-2966.2006.11078.x

Goodwin S. P., Whitworth A. P., 2004, A&A, 413, 929, doi:10.1051/0004-6361:20031529

Hemsendorf M., Merritt D., 2002, ApJ, 580, 606, doi:10.1086/343027

Hénault-Brunet V. et al., 2012, A&A, 545, L1, doi:10.1051/0004-6361/201219472

Hills J. G., 1980, ApJ, 235, 986, doi:10.1086/157703

Jelinek F., Lafferty J. D., Mercer R. L., 1992, in Laface P., De Mori R., eds, Speech Recognition and Understanding. Springer Berlin Heidelberg, Berlin, Heidelberg, p. 345

Kamann S. et al., 2018, MNRAS, 473, 5591, doi:10.1093/mnras/stx2719

Kandrup H. E., Sideris I. V., 2003, ApJ, 585, 244, doi:10.1086/345948

Kaufman L., Rousseeuw P. J., 1990, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Statistics, Hoboken, New Jersey, United States

King I. R., 1966, AJ, 71, 64, doi:10.1086/109857

Klessen R. S., Burkert A., 2000, ApJS, 128, 287, doi:10.1086/313371

Kolmogorov A., 1941, Akademiia Nauk SSSR Doklady, 30, 301

Krumholz M. R., Klein R. I., McKee C. F., 2012, ApJ, 754, 71, doi:10.1088/0004-637X/754/1/71

Kuhn M. A., Hillenbrand L. A., Sills A., Feigelson E. D., Getman K. V., 2019, ApJ, 870, 32, doi:10.3847/1538-4357/aaef8c

Küpper A. H. W., Maschberger T., Kroupa P., Baumgardt H., 2011a, Astrophysics Source Code Library, record ascl:1107.015

Küpper A. H. W., Maschberger T., Kroupa P., Baumgardt H., 2011b, MNRAS, 417, 2300, doi:10.1111/j.1365-2966.2011.19412.x

Lada C. J., Lada E. A., 2003, ARA&A, 41, 57, doi:10.1146/annurev.astro.41.011802.094844

Larson R. B., 1995, MNRAS, 272, 213, doi:10.1093/mnras/272.1.213

Lee Y.-N., Hennebelle P., 2016, A&A, 591, A30, doi:10.1051/0004-6361/201527981

Lee Y.-N., Hennebelle P., 2019, A&A, 622, A125, doi:10.1051/0004-6361/201834428

Li H., Vogelsberger M., Marinacci F., Gnedin O. Y., 2019, MNRAS, 487, 364, doi:10.1093/mnras/stz1271

Lindenmayer A., 1968a, J. Theor. Biol., 18, 280, doi:10.1016/0022-5193(68)90079-9

Lindenmayer A., 1968b, J. Theor. Biol., 18, 300, doi:10.1016/0022-5193(68)90080-5

Lupton R. H., Gunn J. E., 1987, AJ, 93, 1106, doi:10.1086/114394

Lynden-Bell D., 1962, MNRAS, 123, 447, doi:10.1093/mnras/123.5.447

Maciejewski M., Colombi S., Springel V., Alard C., Bouchet F. R., 2009, MNRAS, 396, 1329, doi:10.1111/j.1365-2966.2009.14825.x

Manwadkar V., Trani A. A., Leigh N. W. C., 2020, MNRAS, 497, 3694, doi:10.1093/mnras/staa1722

Mapelli M., 2017, MNRAS, 467, 3255, doi:10.1093/mnras/stx304

Michie R. W., Bodenheimer P. H., 1963, MNRAS, 126, 269, doi:10.1093/mnras/126.3.269

Murphy D. N. A., Geach J. E., Bower R. G., 2012, MNRAS, 420, 1861, doi:10.1111/j.1365-2966.2011.19782.x

Nitadori K., Aarseth S. J., 2012, MNRAS, 424, 545, doi:10.1111/j.1365-2966.2012.21227.x

Parker R. J., Goodwin S. P., Allison R. J., 2011, MNRAS, 418, 2565, doi:10.1111/j.1365-2966.2011.19646.x

Parker R. J., Wright N. J., Goodwin S. P., Meyer M. R., 2014, MNRAS, 438, 620, doi:10.1093/mnras/stt2231

Park S.-M., Goodwin S. P., Kim S. S., 2018, MNRAS, 478, 183, doi:10.1093/mnras/sty1083

Pasquato M., Milone A., 2019, preprint (arXiv:1906.04983)

Pedregosa F. et al., 2011, J. Mach. Learn. Res., 12, 2825

Pfalzner S., 2009, A&A, 498, L37, doi:10.1051/0004-6361/200912056

Plummer H. C., 1911, MNRAS, 71, 460, doi:10.1093/mnras/71.5.460

Prendergast K. H., Tomer E., 1970, AJ, 75, 674, doi:10.1086/111008

Press W. H., Schechter P., 1974, ApJ, 187, 425, doi:10.1086/152650

Prusinkiewicz P., Hanan J., 2013, Lindenmayer Systems, Fractals, and Plants. Lecture Notes in Biomathematics. Vol. 79, Springer Science & Business Media, Berlin, Germany

Rabiner L., Juang B., 1986, IEEE ASSP Magazine, 3, 4, doi:10.1109/MASSP.1986.1165342

Reina-Campos M., Kruijssen J. M. D., Pfeffer J. L., Bastian N., Crain R. A., 2019, MNRAS, 486, 5838, doi:10.1093/mnras/stz1236

Rodriguez-Gomez V. et al., 2015, MNRAS, 449, 49, doi:10.1093/mnras/stv264

Ruthotto L., Haber E., 2021, preprint (arXiv:2103.05180)

Salpeter E. E., 1955, ApJ, 121, 161, doi:10.1086/145971

Seifried D. et al., 2017, MNRAS, 472, 4797, doi:10.1093/mnras/stx2343

Torniamenti S., Ballone A., Mapelli M., Gaspari N., Di Carlo U. N., Rastello S., Giacobbo N., Pasquato M., 2021, MNRAS, 507, 2253, doi:10.1093/mnras/stab2238

Trenti M., Bertin G., 2005, A&A, 429, 161, doi:10.1051/0004-6361:20041023

Varri A. L., Bertin G., 2012, A&A, 540, A94, doi:10.1051/0004-6361/201118300

Vázquez-Semadeni E., Colín P., Gómez G. C., Ballesteros-Paredes J., Watson A. W., 2010, ApJ, 715, 1302, doi:10.1088/0004-637X/715/2/1302

Wadsley J. W., Stadel J., Quinn T., 2004, New Astron., 9, 137, doi:10.1016/j.newast.2003.08.004

Wadsley J. W., Keller B. W., Quinn T. R., 2017, MNRAS, 471, 2357, doi:10.1093/mnras/stx1643

Wall J. E., McMillan S. L. W., Mac Low M.-M., Klessen R. S., Portegies Zwart S., 2019, ApJ, 887, 62, doi:10.3847/1538-4357/ab4db1

Wang L. et al., 2016, MNRAS, 458, 1450, doi:10.1093/mnras/stw274

Wang L., Hernandez D. M., 2021, preprint (arXiv:2104.10843)

Wang L., Spurzem R., Aarseth S., Nitadori K., Berczik P., Kouwenhoven M. B. N., Naab T., 2015, MNRAS, 450, 4070, doi:10.1093/mnras/stv817

Ward J. L., Kruijssen J. M. D., Rix H.-W., 2020, MNRAS, 495, 663, doi:10.1093/mnras/staa1056

Wilson C. P., 1975, AJ, 80, 175, doi:10.1086/111729

Zamora-Avilés M. et al., 2019, MNRAS, 487, 2200, doi:10.1093/mnras/stz1235