SUMMARY

Large-scale earthquake sequence simulations using the boundary element method (BEM) incur extreme computational costs because a dense matrix must be multiplied by a slip rate vector at every time step. Hierarchical matrices (H-matrices) have often been used to accelerate this multiplication. However, the complexity of the structures of the H-matrices and the communication costs between processors limit their scalability, so they cannot be used efficiently on distributed memory computer systems. Lattice H-matrices have recently been proposed as a tool to improve the parallel scalability of H-matrices. In this study, we developed a method for earthquake sequence simulations applicable to 3-D non-planar faults with lattice H-matrices. We present a simulation example and verify the mesh convergence of our method for a 3-D non-planar thrust fault using rectangular and triangular discretizations. We also performed performance and scalability analyses of our code. Our simulations, using over |${10}^5$| degrees of freedom, demonstrated parallel acceleration beyond |${10}^4$| MPI processes and a more than 10-fold acceleration over the best performance obtained with normal H-matrices. Using this code, we can perform unprecedented large-scale earthquake sequence simulations on geometrically complex faults with supercomputers. The software is open source and freely available.

1 INTRODUCTION

Physics-based numerical simulations are an important tool for studying the underlying mechanisms of earthquake processes. Among them, earthquake sequence simulations using rate and state friction laws, originating from Tse & Rice (1986) and Rice (1993), are widespread. Compared with single dynamic rupture simulations, the other widespread class of physics-based earthquake simulations, earthquake sequence simulations do not require assumed initial conditions because they solve the motion of the fault during both coseismic and interseismic periods within a single numerical framework. Several researchers now use earthquake sequence simulations to understand how faults behave under various conditions and how different model ingredients (e.g. fault rheology) influence an earthquake sequence (Erickson et al. 2020).

Among various computational methods, the boundary element method (BEM) is often used because of its ease in handling complex fault geometries (Hori et al. 2004; Ohtani et al. 2014; Qiu et al. 2016; Yu et al. 2018; Thompson & Meade 2019). Another advantage of BEM is its relatively small computational cost: a recent comparison highlights that, for a given grid spacing and accuracy, the computational cost of BEM is smaller than that of volume-discretized methods (Jiang et al. 2022).

Nevertheless, large-scale BEM simulations require huge computational costs. Increasing the resolution of a simulation usually requires reducing the characteristic element size, and in 3-D simulations (a 2-D fault in 3-D space), halving the characteristic element size increases the number of discretized elements N by a factor of 4. In the original BEM, the computational cost of each time step scales with |$O( {{N}^2} )$|⁠, because multiplications of a dense matrix and a vector (the slip rate distribution) are necessary to evaluate the stress change on each element at every time step. Furthermore, the time step width must be small if we use small elements, which increases the number of matrix-vector multiplications. Thus, the computational cost increases rapidly with decreasing element size.

Several methods can reduce the complexity of |$O( {{N}^2} )$| to |$O( {N{\rm{log}}N} )$|⁠. The fast Fourier transform (FFT) method is often used for this purpose (Kato 2003; Lapusta & Liu 2009), but is limited to planar or relatively simple fault geometries owing to the assumption of translational symmetry (Romanet & Ozawa 2022; Barbot 2021). FFT-based methods also cannot handle non-vertical faults. Hierarchical matrices (H-matrices, Hackbusch 1999) are an alternative that can be used for more general fault geometries. Ohtani et al. (2011) showed a significant acceleration in earthquake cycle simulations with H-matrices, and it is now common to use H-matrices in BEM-based quasi-dynamic earthquake sequence simulations (Ohtani et al. 2014; Hyodo et al. 2016; Romanet 2017; Galvez et al. 2020; Heimisson 2020; Ozawa & Ando 2021). For example, Hyodo et al. (2016) performed earthquake sequence simulations of the Nankai trough megathrust using ∼300 000 elements. H-matrices have also recently been used in elastodynamic problems (Chaillat et al. 2017; Sato & Ando 2021).

A few parallelized H-matrix libraries have been developed to deal with large-scale computation. For example, HLIBpro (Bebendorf & Kriemann 2005) is software for parallel H-matrices on distributed memory systems. Additionally, there are several parallel H-matrix libraries for GPUs, such as HiCMA (Keyes et al. 2020), hmglib (Zaspel 2019) and HACApK on GPU (Hoshino et al. 2018). In the earthquake modelling community, hmmvp (Bradley 2014) was developed for arbitrarily shaped faults composed of rectangular elements and was later used in QDYN (Luo et al. 2017; Galvez et al. 2020), an open-source earthquake sequence code.

However, Ida et al. (2014) showed that conventional H-matrices have a weakness in parallel scalability, which is important for computations on supercomputers. Owing to the increase in communication costs and load imbalance, the parallel speed-up is generally less than expected from ideal linear scalability. Ida et al. (2014) showed that the computational speed of an H-matrix-vector multiplication (HMVM) saturates below 100 cores for a Poisson-equation problem with |$N\sim 100\,000$|⁠. Thus, a large number of cores cannot be used efficiently with conventional H-matrices.

Recently, Ida (2018) proposed lattice H-matrices. Lattice H-matrices have a structure that allows a more efficient communication pattern than normal H-matrices while maintaining the |$O( {N{\rm{log}}N} )$| memory compression. In addition, a relatively good load balance is maintained with lattice H-matrices even if a large number of MPI (Message Passing Interface) processes is used. They have been applied to micromagnetic simulations (Ida et al. 2020). The implementation of lattice H-matrices is freely available as open-source software in the HACApK library.

In this paper, we present a state-of-the-art quasi-dynamic earthquake simulator using conventional and lattice H-matrices, which is applicable to arbitrarily shaped fault(s) embedded in an elastic half-space. Our computational code HBI is open-source and freely available under the MIT license. The code has also been validated against a benchmark problem for a planar strike-slip fault as defined by the Simulation of Earthquakes and Aseismic Slip (SEAS) project (see Jiang et al. 2022). The structure of the manuscript is as follows. In Section 2, we describe the method for BEM-based earthquake sequence simulations. In Section 3, normal and lattice H-matrices are described. In Section 4, we show the simulation results and parallel scalability. Section 5 discusses and concludes the study.

2 METHOD OF EARTHQUAKE SEQUENCE SIMULATIONS ON 3-D NON-PLANAR FAULTS

The basic structure of our method is similar to many previous earthquake sequence simulations. Using BEM (Section 2.1), the shear and normal stress changes due to slip are obtained. Coupling them with the rate and state friction law for each element leads to three ODEs per element, as shown in Section 2.2. To solve the time evolution problem, we use the Runge–Kutta method with error-based control of the step width.

2.1 Boundary element method

We assume a homogeneous and isotropic medium. The medium satisfies the equilibrium equation and Hooke's law with Lamé constants |$\lambda $| and |$\mu $|⁠,
|${\sigma }_{ij,j} = 0,$| (1)
|${\sigma }_{ij} = \lambda {\epsilon }_{kk}{\delta }_{ij} + 2\mu {\epsilon }_{ij},$| (2)
where |${\sigma }_{ij}$| is the stress tensor, |${\epsilon }_{ij}$| the strain tensor and |${\delta }_{ij}$| the Kronecker delta, and the subscripts i and j run over |$x,y,z$|⁠. The subscript |$,j$| indicates the partial derivative with respect to the spatial coordinate |${x}_j$|⁠. The Einstein summation convention is used.
We use the BEM to compute the elastic interaction (Bonnet 1999), in which the elliptic partial differential eqs (1) and (2) are transformed into integral equations. The shear stress change |${\rm{\Delta }}\tau $| and normal stress change |${\rm{\Delta }}\sigma $| are represented as the integral of the kernel function multiplied by the slip distribution on the fault surface (displacement discontinuity):
|${\rm{\Delta }}\tau ( {\boldsymbol{x}} ) = \int_S {K}_{\mathrm{ shear}}( {{\boldsymbol{x}},{\boldsymbol{\xi }}} )\,{\rm{\Delta }}u( {\boldsymbol{\xi }} )\,{\rm{d}}S( {\boldsymbol{\xi }} ),$| (3)
|${\rm{\Delta }}\sigma ( {\boldsymbol{x}} ) = \int_S {K}_{\mathrm{ normal}}( {{\boldsymbol{x}},{\boldsymbol{\xi }}} )\,{\rm{\Delta }}u( {\boldsymbol{\xi }} )\,{\rm{d}}S( {\boldsymbol{\xi }} ),$| (4)
where |${K}_{\mathrm{ shear}}$| and |${K}_{\mathrm{ normal}}$| are elastostatic integration kernels derived from the Green's functions (e.g. Segall 2010), and |${\rm{\Delta }}u$| is the slip distribution.
To numerically evaluate eqs (3) and (4), we divide the fault surface into N elements and denote the index set as |$I\ = \ 1, \ldots ,N$|⁠. The shapes of the elements are either rectangular or triangular. In a discretized form using step functions as the base functions, the stress changes on the ith element are represented as:
|${\rm{\Delta }}{\tau }_i = \sum_{j \in I} {A}_{ij}{D}_j,$| (5)
|${\rm{\Delta }}{\sigma }_i = \sum_{j \in I} {B}_{ij}{D}_j,$| (6)
where |$D\ \in {\mathbb{R}}^N$| is the slip vector and A and |$B \in {\mathbb{R}}^{N \times N}$| are dense stiffness matrices. The entries of A and B are calculated using the half-space solutions of Nikkhoo & Walter (2015) and Okada (1992) for uniform slip (i.e. piecewise-constant interpolation) in triangular and rectangular elements, respectively. The evaluation point of the stress components is the centroid of each element. Triangular unstructured elements offer more flexibility in fault geometry than rectangular elements; note that Okada's solution has the limitation that two parallel sides of an element must be horizontal. However, Barall & Tullis (2016) found that rectangles outperform triangles in terms of the accuracy of the stress values. We compare the performance of triangular and rectangular meshes in later sections.
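As a concrete illustration, the discretized evaluation of eqs (5) and (6) amounts to two dense matrix-vector products. The following minimal Python sketch (not the HBI implementation) assumes a hypothetical user-supplied function `stress_kernel(src, rcv)` that returns the shear and normal tractions at the centroid of a receiver element caused by unit slip on a source element, as would be provided by the solutions of Okada (1992) or Nikkhoo & Walter (2015):

```python
import numpy as np

def build_stiffness(elements, stress_kernel):
    """Assemble the dense stiffness matrices A (shear) and B (normal).

    `elements` is a sequence of element descriptions and
    `stress_kernel(src, rcv)` is a hypothetical user-supplied function
    returning the (shear, normal) tractions at the centroid of `rcv`
    caused by unit slip on `src`.
    """
    n = len(elements)
    A = np.empty((n, n))
    B = np.empty((n, n))
    for i, rcv in enumerate(elements):
        for j, src in enumerate(elements):
            A[i, j], B[i, j] = stress_kernel(src, rcv)
    return A, B

def stress_change(A, B, D):
    """Eqs (5) and (6): stress changes produced by the slip vector D."""
    return A @ D, B @ D
```

Storing and multiplying A and B densely in this way is exactly the cost that the H-matrices of Section 3 avoid.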

Notably, the normal stress change has often been neglected in previous earthquake sequence simulations, unlike in single-event dynamic rupture simulations. Normal stress changes originate from broken symmetries such as non-planar faults, free surfaces and material heterogeneities.

2.2 Governing equations

The boundary condition of each element is governed by the regularized rate and state friction law. Following Rice et al. (2001), the shear and normal tractions at each element are related as follows:
|${\tau }_i( t ) = a{\sigma }_i( t )\,\mathrm{arcsinh}\left[ \frac{{V}_i( t )}{2{V}_0}\exp \left( \frac{{\psi }_i( t )}{a} \right) \right],$| (7)
where |${V}_i\ ( t ) = \frac{{d{D}_i}}{{dt}}\ $| is the slip rate, |${\psi }_i( t )$| is the state variable, a is the coefficient of the direct effect and |${V}_0$| is the reference slip rate.
The evolution law for the state variable is given by the aging law (Dieterich 1979; Ruina 1983):
|$\frac{d{\psi }_i}{dt} = \frac{b{V}_0}{{d}_c}\left[ \exp \left( \frac{{f}_0 - {\psi }_i}{b} \right) - \frac{{V}_i}{{V}_0} \right],$| (8)
where |${f}_0$| is the reference friction coefficient, b is the coefficient of the evolution effect and |${d}_c$| is the characteristic slip distance. Using the stiffness matrices A and B calculated in the previous section, the shear and normal stress rates are given as follows:
|$\frac{d{\tau }_i}{dt} = \sum_{j \in I} {A}_{ij}{V}_j + {\dot{\tau }}_i - \frac{\mu }{2{c}_s}\frac{d{V}_i}{dt},$| (9)
|$\frac{d{\sigma }_i}{dt} = \sum_{j \in I} {B}_{ij}{V}_j + {\dot{\sigma }}_i,$| (10)
where |$\mu $| is the rigidity, |${c}_s$| is the S wave speed, |${\dot{\tau }}_i$| and |${\dot{\sigma }}_i$| are the tectonic loading rates for shear and normal stresses on the ith element, respectively. The first terms in both eqs (9) and (10) represent the stress rates caused by slip (time derivative of eqs 5 and 6). The third term for the shear stress is radiation damping, which is an approximation of inertia (Rice 1993). Earthquake sequence simulations using this approximation are ‘quasi-dynamic,’ and the effect of this approximation has been explored in previous studies (e.g. Lapusta & Liu 2009).
We eliminate |$d{V}_i/dt$| from eq. (9) using the total derivative of V:
|$\frac{d{V}_i}{dt} = \frac{\partial {V}_i}{\partial {\tau }_i}\frac{d{\tau }_i}{dt} + \frac{\partial {V}_i}{\partial {\sigma }_i}\frac{d{\sigma }_i}{dt} + \frac{\partial {V}_i}{\partial {\psi }_i}\frac{d{\psi }_i}{dt},$| (11)
so that:
|$\frac{d{\tau }_i}{dt} = \left[ \sum_{j \in I} {A}_{ij}{V}_j + {\dot{\tau }}_i - \frac{\mu }{2{c}_s}\left( \frac{\partial {V}_i}{\partial {\sigma }_i}\frac{d{\sigma }_i}{dt} + \frac{\partial {V}_i}{\partial {\psi }_i}\frac{d{\psi }_i}{dt} \right) \right] \Bigg/ \left( 1 + \frac{\mu }{2{c}_s}\frac{\partial {V}_i}{\partial {\tau }_i} \right),$| (12)
where the partial derivatives are (from eq. 7):
|$\frac{\partial {V}_i}{\partial {\tau }_i} = \frac{2{V}_0}{a{\sigma }_i}\cosh \left( \frac{{\tau }_i}{a{\sigma }_i} \right)\exp \left( -\frac{{\psi }_i}{a} \right),$| (13a)
|$\frac{\partial {V}_i}{\partial {\sigma }_i} = -\frac{2{V}_0{\tau }_i}{a{\sigma }_i^2}\cosh \left( \frac{{\tau }_i}{a{\sigma }_i} \right)\exp \left( -\frac{{\psi }_i}{a} \right),$| (13b)
|$\frac{\partial {V}_i}{\partial {\psi }_i} = -\frac{{V}_i}{a}.$| (13c)
Eqs (13a)–(13c) are substituted into eq. (12). Eqs (8), (10) and (12) form |$3N$| ordinary differential equations (ODEs) |$\frac{{d{\boldsymbol{y}}}}{{dt}} = \ f( {\boldsymbol{y}} )$|⁠, where |${\boldsymbol{y\ }} = \ ( {{\psi }_1, \ldots ,{\psi }_N,{\tau }_1, \ldots {\tau }_N,{\sigma }_1, \ldots ,{\sigma }_N} )$|⁠. We solve these equations using the Runge–Kutta method with adaptive time-stepping (Press et al. 2007). We compute |$y( {t + {\rm{\Delta }}{t}_{\mathrm{ try}}} )$| with fifth-order accuracy. If the maximum value of the relative difference between the fourth- and fifth-order solutions is larger than the allowance |${\varepsilon }_{\mathrm{ RK}}$|⁠, we retry the time integration as follows:
(14)
We choose |${\varepsilon }_{\mathrm{ RK}} = {10}^{ - 4}\ .$| If larger values are used, numerical instability sometimes arises. If the error is below the threshold, we update the variables and calculate the next time step using the following formula:
(15)
As a result, the time step is approximately inversely proportional to the maximum slip rate. This property results from the displacement in each time step having to be smaller than the characteristic state evolution distance. Lapusta et al. (2000) and many other studies explicitly adopted time step widths inversely proportional to the slip rate based on stability analyses. The resultant |${\rm{\Delta }}t$| weakly decreases with decreasing element size if other parameters are identical. Finally, the slip is updated as follows:
(16)
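The step-size control described above can be sketched as follows (a minimal Python illustration rather than the HBI implementation; `rk45_step` stands for any embedded Runge–Kutta pair returning fourth- and fifth-order estimates, and the shrink/growth exponents follow the standard error-based control of Press et al. 2007 rather than necessarily matching the exact forms of eqs 14 and 15):

```python
import numpy as np

EPS_RK = 1.0e-4  # error allowance, as chosen in the text

def adaptive_step(f, y, t, dt_try, rk45_step, safety=0.9):
    """Advance y by one accepted step, shrinking dt until the maximum
    relative difference between the embedded estimates is below EPS_RK.
    `rk45_step(f, y, t, dt)` is a placeholder returning (y4, y5), the
    fourth- and fifth-order solutions."""
    dt = dt_try
    while True:
        y4, y5 = rk45_step(f, y, t, dt)
        err = np.max(np.abs(y5 - y4) / np.maximum(np.abs(y5), 1.0e-30))
        if err <= EPS_RK:
            break
        dt *= safety * (EPS_RK / err) ** 0.25   # retry with a smaller step
    dt_next = dt * safety * (EPS_RK / err) ** 0.2  # suggestion for the next step
    return y5, t + dt, dt_next
```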

3 H-MATRICES

The most time-consuming step in solving the ODEs is the matrix-vector multiplication in eqs (9) and (10) because it has |$O( {{N}^2} )$| complexity, whereas the other operations have |$O( N )$| complexity (hereafter referred to as the |$O( N )$| part). We aim to reduce the complexity of the matrix-vector multiplications to |$O( {N{\rm{log}}N} )$| using conventional and lattice H-matrices. Our method is the same as Ida et al. (2014) and Ida (2018), except for the connection between the H-matrix-vector multiplication (HMVM) and the |$O( N )$| part, which is specific to earthquake sequence simulations. In the implementation, we use the open-source library HACApK for the construction of H-matrices and the HMVM. We evaluate the performance of the normal H-matrices (Section 3.1) and lattice H-matrices (Section 3.2) in Section 4.5.

3.1 H-matrices

H-matrices are an efficient method to compress the memory of the dense matrix derived from the integral operator (Hackbusch 1999; Borm et al. 2006). The 3-D elastostatic kernels |${K}_{\mathrm{ shear}}( {{\boldsymbol{x}},{\boldsymbol{\xi }}} )$| and |${K}_{\mathrm{ normal}}( {{\boldsymbol{x}},{\boldsymbol{\xi }}} )$| exhibit |${| {{\boldsymbol{x}} - {\boldsymbol{\xi }}} |}^{ - 3}\ $|decay, and these kernel functions can locally be approximated by a degenerate kernel (i.e. |$K( {{\boldsymbol{x}},{\boldsymbol{\xi }}} )\sim \mathop \sum \limits_k {g}_k( {\boldsymbol{x}} ){h}_k( {\boldsymbol{\xi }} )$|⁠) for distant source and receiver points. This allows H-matrices to be constructed for the dense matrices |$A\ {\rm{and}}\ B$| in eqs (5) and (6) for typical mesh geometries. We construct the H-matrices by the procedure in Sections 3.1.1 and 3.1.2 before the time integration. To advance the time integration by one step, we perform six HMVMs for the shear and normal stress rates (Section 3.1.3). The parallelization technique is described in Section 3.1.4.

3.1.1 Block partitioning

The construction of a block structure of an H-matrix consists of the following steps (Borm et al. 2006). First, we construct a binary cluster tree for the set of triangular or rectangular elements using the |$\ ( {x,\ y,\ z} )\ $| coordinates of their centroid. We denote a cluster set as |${{\rm{\Omega }}}_i\ ( {i\ = \ 1, \ldots ,{N}_{\rm{\Omega }}} )$|⁠. We set the minimum cluster size to 15 (Ida 2018). Then, we construct a partition structure of the matrix using the following admissibility condition:
|$\min \{ {\rm{diam}}( {{{\rm{\Omega }}}_L} ),{\rm{diam}}( {{{\rm{\Omega }}}_M} ) \} \le \eta \,{\rm{dist}}( {{{\rm{\Omega }}}_L,{{\rm{\Omega }}}_M} ),$| (17)
where diam is the diameter of the cluster and dist is the distance between the two clusters. This condition is derived from the ability of the kernel function to approximately degenerate for distant source and receiver points. Typically, we set the parameter |$\eta \ = \ 2$|⁠, following Ohtani et al. (2011).

Fig. 1(b) shows an example of the obtained partition structure of the H-matrices for a rectangular-shaped fault plane (Fig. 1a) divided into triangular elements. We reorder the index of the elements |$I\ = \ \{ {1, \ldots ,N} \}$| according to the structure of the cluster tree in the construction of the matrix structure. Blocks located in far off-diagonal parts tend to be larger than those around the diagonal because they correspond to interactions between distant locations, for which the admissibility condition is easy to satisfy (see eq. 17). The partition structure is discussed further in the context of parallelization later.
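The cluster tree and admissibility-based partitioning can be sketched as follows (a simplified Python illustration under assumed choices for the bisection rule and the cluster distance measure; the actual construction is performed by HACApK):

```python
import numpy as np

MIN_CLUSTER = 15  # minimum cluster size, as in the text

def build_cluster_tree(indices, centroids):
    """Recursively bisect an index set along the longest axis of its
    bounding box (one common choice of splitting rule)."""
    node = {"idx": indices}
    if len(indices) > 2 * MIN_CLUSTER:
        pts = centroids[indices]
        axis = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
        order = indices[np.argsort(pts[:, axis])]
        half = len(order) // 2
        node["children"] = [build_cluster_tree(order[:half], centroids),
                            build_cluster_tree(order[half:], centroids)]
    return node

def diam(node, centroids):
    pts = centroids[node["idx"]]
    return float(np.linalg.norm(pts.max(axis=0) - pts.min(axis=0)))

def dist(a, b, centroids):
    # distance between cluster centres (a simplification of the
    # bounding-box distance used in practice)
    return float(np.linalg.norm(centroids[a["idx"]].mean(axis=0)
                                - centroids[b["idx"]].mean(axis=0)))

def partition(row, col, centroids, eta=2.0, blocks=None):
    """Apply the admissibility condition of eq. (17): admissible blocks
    become low-rank candidates; inadmissible leaves stay dense."""
    if blocks is None:
        blocks = []
    if min(diam(row, centroids), diam(col, centroids)) <= eta * dist(row, col, centroids):
        blocks.append((row["idx"], col["idx"], "low-rank"))
    elif "children" in row and "children" in col:
        for r in row["children"]:
            for c in col["children"]:
                partition(r, c, centroids, eta, blocks)
    else:
        blocks.append((row["idx"], col["idx"], "full"))
    return blocks

# usage: tree = build_cluster_tree(np.arange(N), centroids)
#        blocks = partition(tree, tree, centroids, eta=2.0)
```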

Figure 1.

Left: a square-shaped fault using 6646 unstructured triangular elements created by Gmsh. Right: the corresponding block structures of H-matrices made by the admissibility condition |$\eta \ = \ 2$|⁠.

3.1.2 Low-rank approximation

Let |$L,M\ \subset I$| and let |${A}_{LM} \in {\mathbb{R}}^{| L | \times | M |}$| be a submatrix of |$A \in {\mathbb{R}}^{N \times N}$|⁠. A submatrix |${A}_{LM}$| is compressed by a low-rank approximation (LRA) if possible; otherwise, we use the dense matrix (full-rank matrix). A low-rank approximated submatrix |${\skew7\tilde{A}}_{LM}$| is represented as follows:
|${\skew7\tilde{A}}_{LM} = gh,$| (18)
where |$g \in {\mathbb{R}}^{| L | \times {r}_{LM}},\ h \in {\mathbb{R}}^{{r}_{LM} \times | M |},$| and |${r}_{LM}$| is the rank of the approximated matrix. As the LRA, we apply the adaptive cross approximation (ACA, Bebendorf 2000), whose computational complexity is |$O( {| L |{r}_{LM}} )$| when |$| L | \ge | M |$|⁠. Although the singular value decomposition is the optimal method for LRA, its complexity is |$O( {| L |{{| M |}}^2} )$| as it repeats matrix-vector multiplications |$| M |$| times, which is too computationally expensive because we suppose |$| M | \gg {r}_{LM}$|⁠. The rank |${r}_{LM}$| is controlled by the error tolerance |${\varepsilon }_{\mathrm{ ACA}}$|⁠:
|$\| {{A}_{LM} - {\skew7\tilde{A}}_{LM}} {\|}_F \le {\varepsilon }_{\mathrm{ ACA}}\| {{A}_{LM}} {\|}_F,$| (19)
where |$\| \cdot {\|}_F$| denotes the Frobenius norm. Note that this condition is not rigorously achieved in ACA because we do not compute all entries of the matrix. Instead, in ACA we increase the rank one by one by computing a row or column of the matrix and stop the iteration once the difference becomes smaller than |${\varepsilon }_{\mathrm{ ACA}}$|⁠. This is why the complexity is |$O( {| L |{r}_{LM}} )$|⁠. We evaluate the effect of the value of |${\varepsilon }_{\mathrm{ ACA}}$| in the numerical experiments. We also use the method proposed by Ida et al. (2015) to prevent the H-matrix from having an excessively large rank.
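The ACA iteration with partial pivoting can be sketched as follows (a generic Python illustration, not the HACApK routine; `get_row` and `get_col` are assumed callbacks that evaluate single rows and columns of the submatrix, and the stopping rule uses a simplified incremental estimate of the Frobenius norm):

```python
import numpy as np

def aca(get_row, get_col, nrows, ncols, eps=1.0e-4, max_rank=None):
    """Adaptive cross approximation with partial pivoting.
    Only O((|L| + |M|) r) matrix entries are ever evaluated.
    Returns g (nrows x r) and h (r x ncols) with A_LM ~= g @ h."""
    if max_rank is None:
        max_rank = min(nrows, ncols)
    g_cols, h_rows = [], []
    norm_sq = 0.0      # simplified running estimate of ||g @ h||_F^2
    used = set()
    i = 0              # current pivot row
    for _ in range(max_rank):
        used.add(i)
        # residual of row i: original row minus current approximation
        row = get_row(i) - sum(g[i] * h for g, h in zip(g_cols, h_rows))
        j = int(np.argmax(np.abs(row)))
        pivot = row[j]
        if abs(pivot) < 1.0e-30:
            break
        # residual of column j
        col = get_col(j) - sum(h[j] * g for g, h in zip(g_cols, h_rows))
        g_k, h_k = col / pivot, row
        g_cols.append(g_k)
        h_rows.append(h_k)
        term = np.linalg.norm(g_k) * np.linalg.norm(h_k)
        norm_sq += term ** 2
        if term <= eps * np.sqrt(norm_sq):
            break
        # next pivot row: largest entry of the new column among unused rows
        mask = np.ones(nrows, dtype=bool)
        mask[list(used)] = False
        if not mask.any():
            break
        i = int(np.argmax(np.abs(g_k) * mask))
    if not g_cols:
        return np.zeros((nrows, 1)), np.zeros((1, ncols))
    return np.column_stack(g_cols), np.vstack(h_rows)
```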

3.1.3 H-matrix and vector multiplication

A matrix-vector multiplication |$AV$| is performed submatrix-wise. For full submatrices |${A}_{LM}$|⁠, the arithmetic is the normal matrix-vector multiplication. For low-rank approximated submatrices |${\skew7\tilde{A}}_{LM}$|⁠, we perform the arithmetic as
|${\skew7\tilde{A}}_{LM}{V}_M = g( {h{V}_M} ),$| (20)

The original computation using dense matrices requires |$O( {| L || M |} )$| operations, while it becomes |$O( {( {| L | + | M |} ){r}_{LM}} )$| in an HMVM. If the rank |${r}_{LM}$| is much smaller than |$\min ( {| L |,| M |} )$|⁠, then the number of operations is significantly reduced. The arithmetic of eq. (20) yields a part of the resultant vector, and the full vector is obtained by summing all the submatrix-wise HMVMs.
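A minimal Python sketch of the submatrix-wise product (assuming each block stores its row and column index sets together with either a dense array or the ACA factors g and h):

```python
import numpy as np

def hmvm(blocks, v):
    """H-matrix-vector product y = A v accumulated block by block.
    Each block is a tuple (rows, cols, data), where data is either a
    dense ndarray or a (g, h) pair from the low-rank approximation."""
    y = np.zeros(len(v))
    for rows, cols, data in blocks:
        if isinstance(data, tuple):      # low-rank block, eq. (20)
            g, h = data
            y[rows] += g @ (h @ v[cols])
        else:                            # full (dense) block
            y[rows] += data @ v[cols]
    return y
```

Evaluating the low-rank product as |$g( {h{V}_M} )$| rather than |$( {gh} ){V}_M$| is what yields the |$O( {( {| L | + | M |} ){r}_{LM}} )$| operation count.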

3.1.4 Parallel earthquake sequence simulation using H-matrices

Our earthquake sequence simulation code is parallelized using MPI. Submatrices of the H-matrix are assigned to MPI processes, and each MPI process contains a quasi-1-D-sliced portion of the entire matrix (Fig. 3a). This is not a complete 1-D slice because a submatrix cannot be separated into multiple MPI processes. For the HMVM, each process possesses the full slip rate vector, but in general it obtains only a part of the resultant stress rate vector. To obtain the full stress rate vector, communication between processes is necessary. We do this by sending the vector between neighbouring MPI processes by calling MPI_Isend and MPI_Irecv. Thus, the complexity of the communication cost is |$O( {N{N}_p} )$|⁠, where |${N}_p$| denotes the number of MPI processes. The algorithm is described in detail in Ida et al. (2014).

For the |$O( N )$| part (element-wise computation), each MPI process is responsible for a part of the vector. To construct the full-size vector required for the HMVM, MPI_Allgather is called before the HMVM. To perform a parallel computation of the |$O( N )$| part, MPI_Scatter is called after the HMVM. As the number of MPI processes increases, the performance of this method deteriorates owing to the MPI communication costs (both inside and outside the HMVM).
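A schematic mpi4py sketch of this communication pattern is shown below (assuming N is divisible by the number of processes and passing in the `hmvm` helper sketched in Section 3.1.3; for brevity, the neighbour-to-neighbour exchange inside the HMVM described by Ida et al. 2014 is replaced by a single reduction to one root process):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

def stress_rates_normal_h(v_local, my_blocks, n, hmvm):
    """One HMVM with the normal H-matrix parallelization pattern."""
    # gather the full slip-rate vector required by the local submatrices
    v_full = np.empty(n)
    comm.Allgather(v_local, v_full)
    # local part of the H-matrix-vector product (partial stress rates)
    y_partial = hmvm(my_blocks, v_full)
    # combine partial results and hand each process its slice for the O(N) part
    y_full = np.empty(n) if rank == 0 else None
    comm.Reduce(y_partial, y_full, op=MPI.SUM, root=0)
    y_local = np.empty(n // nproc)
    comm.Scatter(y_full, y_local, root=0)
    return y_local
```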

3.2 Lattice H-matrices

As explained above, earthquake sequence simulations using conventional H-matrices are not suitable for large-scale parallel computations because of the communication cost and load imbalance resulting from their extremely complex structure. To overcome this difficulty, Ida (2018) proposed lattice H-matrices. In this section, we describe the method for earthquake sequence simulations using lattice H-matrices. Hereafter, the H-matrices described in the previous section are referred to as normal H-matrices.

We first construct a cluster tree in the same way as for the normal H-matrices, except that we truncate the cluster tree at depth L. We then construct a lattice structure using the truncated cluster tree. Then, an H-matrix is constructed for each lattice block in the same way as the normal H-matrices if it is admissible in terms of eq. (17). The depth L determines the number of lattice blocks (Figs 2b and c). As a result, the block structure of lattice H-matrices is simpler than that of normal H-matrices. The LRA of each submatrix is the same as for normal H-matrices.

Figure 2.

Comparison of the block structure and assignments to MPI processes of normal and lattice H matrices. Colours correspond to MPI processes |$( {\ {N}_p = \ 9} )$|⁠. (a) is a normal H-matrix. (b) and (c) represent lattice H-matrices using 3 × 3 process grid (shown in blue frames). (b) is a 4 × 4 lattice (⁠|$q\ = \ 1$|⁠) and (c) is an 8 × 8 lattice (⁠|$q\ = \ 2$|⁠).

We utilize the lattice structure for assigning MPI processes. This is achieved by introducing a process grid that has |${N}_{pr}$| rows and |${N}_{pl}$| columns (⁠|${N}_{pr}\ \times \ \ {N}_{pl} = \ {N}_p$|⁠). We array this process grid 2-D-cyclically on the lattice blocks, which means that each MPI process holds non-contiguous blocks of the matrix (Figs 2b and c). The number of lattice blocks is determined by the number of MPI processes such that q process grids are repeated along the rows and columns (Figs 2b and c). This condition gives |$L=\lfloor {\log }_2( {\sqrt {{N}_p} q} )\rfloor$| because a binary tree is adopted. For fixed q, as |${N}_p$| increases, each lattice block becomes smaller. Thus, the entire matrix is divided into a larger number of submatrices, and the memory becomes larger than for the normal H-matrices, as confirmed later. However, in the HMVM procedure using lattice H-matrices, the communication traffic is significantly reduced compared with the algorithm used for normal H-matrices. After the HMVM arithmetic (eq. 20) assigned to each MPI process, the diagonal MPI processes obtain their part of the stress rate vector using MPI_Reduce along each row of the process grid and then send it to the other MPI processes in each column of the process grid using MPI_Bcast. This algorithm, first proposed by Ida et al. (2018) for block low-rank matrices, utilizes the lattice structure, and to perform it the number of processes must be a square number (Fig. 3). Owing to the existence of the diagonal processes, only MPI communications among |$\sqrt {{N}_p} $| processes are necessary, rather than all-to-all communication. It is also notable that only one MPI_Reduce and one MPI_Bcast are called per HMVM regardless of the number of MPI processes. Hence, the complexity of the communication cost for the HMVM using lattice H-matrices is |$O( N )$|⁠, which is reduced from the |$O( {N{N}_p} )$| of normal H-matrices.
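The reduce/broadcast pattern can be sketched with mpi4py as follows (a schematic illustration assuming a square number of processes and locally indexed blocks, and passing in the `hmvm` helper from the Section 3.1.3 sketch; the actual implementation is provided by HACApK):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()
p = int(round(np.sqrt(nproc)))          # process grid is p x p (nproc must be square)
my_row, my_col = divmod(rank, p)

# communicators along one row and one column of the process grid
row_comm = comm.Split(color=my_row, key=my_col)
col_comm = comm.Split(color=my_col, key=my_row)

def stress_rates_lattice_h(v_local, my_blocks, nloc, hmvm):
    """One HMVM with the lattice H-matrix communication pattern.
    Each process holds only the slice of the slip-rate vector
    (size ~ N / sqrt(Np)) needed by its lattice blocks."""
    # local block-wise multiplications (eq. 20 and dense blocks)
    y_partial = hmvm(my_blocks, v_local)
    # reduce partial results onto the diagonal process of each grid row
    y_diag = np.empty(nloc) if my_row == my_col else None
    row_comm.Reduce(y_partial, y_diag, op=MPI.SUM, root=my_row)
    # broadcast the assembled slice down each grid column
    y_local = y_diag if my_row == my_col else np.empty(nloc)
    col_comm.Bcast(y_local, root=my_col)
    return y_local
```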

Figure 3.

Schematic illustration of the algorithm of the MPI communication for HMVM (From Ida 2018).

The HMVM in the lattice H-matrices requires only a part of the slip rate vector (size ∼ |$N/\sqrt {{N}_p} $|⁠) for each MPI process. In addition, each MPI process holds the same indices of the resultant stress rate vector as of its slip rate vector, owing to the use of a square number of processes. Because each MPI process is in charge of the same part of the vector for the element-wise computation (the O(N) part) as for the HMVM, unlike in the normal H-matrix algorithm, MPI communication is not necessary before and after the HMVM. Note that this algorithm performs redundant computations of the O(N) part among |$\sqrt {{N}_p} \ $|MPI processes. However, as confirmed later, the HMVM comprises ∼90 per cent of the computational time for |$O( {{{10}}^5} )$| problems and a few tens of thousands of MPI processes, so this redundant computation does not degrade the overall performance.

4 NUMERICAL EXPERIMENTS

In this section, we perform numerical experiments using our code HBI, which implements the algorithm detailed in the previous sections. For convergence analysis, we use lattice H-matrices. For performance analysis, we use both normal and lattice H-matrices.

4.1 Problem setting

A non-planar fault is embedded in an elastic half-space, with |$\ {c}_s = \ 3.464\ $| km s−1, |$\ {c}_p = \ 6\ $| km s−1 and |$\mu \ = \ 32.04$| GPa. The fault geometry is shown in Fig. 4. The fault is 50 km long along strike and 20 km long along dip. The shallower (30° dip angle) and deeper (10° dip angle) parts are smoothly connected. The upper edge of the fault cuts the free surface. We fix |$b\ = \ 0.020$| and vary the |$a-b$| values, as shown in colour in Fig. 4. We set |$a/b\ = \ 0.75$| in the velocity-weakening zone. The characteristic slip distance |${d}_c$| is uniformly set to 0.02 m. The initial normal and shear stresses are uniformly set to 58 and 100 MPa, respectively. For simplicity, we neglect the depth dependence of the initial shear and normal stresses.

Figure 4.

Fault geometry used in this study. The fault is 50 km in the along-strike length and 20 km in the along-dip length. There is a bend at dip = 10 km. The dip angle is 30º at the surface and 10º at the bottom. The colour indicates the distribution of a–b values.

The loading approach is the backslip method with a plate rate |${V}_{pl} = {10}^{ - 9}\ $| m s−1 for both the shear and normal stresses (e.g. Heimisson 2020).
|${\dot{\tau }}_i = -\sum_{j \in I} {A}_{ij}{V}_{pl},$| (21)
|${\dot{\sigma }}_i = -\sum_{j \in I} {B}_{ij}{V}_{pl}.$| (22)

4.2 Simulation results and convergence test

Fig. 5 shows the slip distributions of the first six events on the fault. Both partial ruptures and full ruptures occur in the velocity-weakening zone of the fault. The interval between two partial ruptures (∼100 d) is much smaller than the period of the earthquake cycle (∼160 yr), which suggests that the former rupture triggers the latter. The cross-sections of cumulative slip shown in Fig. 6 indicate that ruptures nucleate at the edge of the velocity-weakening area and propagate toward its centre. They also show that significant coseismic slip reaches the free surface despite the velocity-strengthening friction there.

Figure 5.

Slip distribution of the first 6 earthquakes. The result of 100 000 rectangular elements.

Figure 6.

Cumulative slip at every 20 yr during the interseismic period (green solid lines) and every 5 s during the coseismic period (purple dashed lines). (a) Cross-section along |$x\ = \ 0$| km. (b) Cross-section along 10 km along dip. The result of 100 000 rectangular elements.

A previous study of a 2-D planar fault demonstrated that, under this type of loading (backslip), partial ruptures occur when |$W/{h}^* \gg 1$| (Cattania 2019), where W is the dimension of the velocity-weakening region and |${h}^*$| is given by |${h}^* = \frac{{2\mu {d}_c}}{{\pi ( {b - a} )\sigma }}\ $|⁠; otherwise, only full system-size ruptures occur. In our model, |$W/{h}^*$|>10 (⁠|${h}^*$| ∼2 km and |$W > $|20 km), so the condition for partial ruptures is met. We believe that free surface effects and fault bends also contribute to the complexity of the earthquake sequence by modulating the elastic stress transfer, but a detailed discussion of this mechanism is beyond the scope of this study.

To attain convergence of the numerical results, previous studies have shown that the following length scale must be resolved by at least a few elements (e.g. Rubin & Ampuero 2005):
|${L}_b = \frac{{\mu {d}_c}}{{b\sigma }}.$| (23)

We perform a convergence test. Fig. 7 shows the evolution of the mean friction coefficient for different mesh sizes |${\rm{\Delta }}s$|⁠. In the case of unstructured triangular elements, |${\rm{\Delta }}s$| is the input parameter of the free software Gmsh, which is used to create unstructured meshes and allows a factor of 1.5 variability in the side lengths. Coarse meshes lead to different timings of ruptures, while finer meshes exhibit good agreement (Fig. 7). It appears that |${L}_b/{\rm{\Delta }}s > 3$| is sufficient to obtain the same event history. However, the number of elements N for triangular meshes is around twice that for rectangular meshes at a given |${\rm{\Delta }}s$|⁠. Thus, triangular meshes need a larger number of elements than rectangular meshes to attain an appropriate result.
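As an illustrative check (these numbers are our own arithmetic under the stated assumptions, not values quoted by the authors), evaluating eq. (23) with the parameters of Section 4.1 (⁠|$\mu = 32.04$| GPa, |${d}_c = 0.02$| m, |$b = 0.02$| and an initial normal stress of 58 MPa) gives:

```latex
L_b = \frac{\mu d_c}{b\,\sigma}
    = \frac{(32.04\times 10^{9}\,\mathrm{Pa})(0.02\,\mathrm{m})}
           {(0.02)(58\times 10^{6}\,\mathrm{Pa})}
    \approx 550\ \mathrm{m},
\qquad
\frac{L_b}{\Delta s} > 3
\;\Longrightarrow\;
\Delta s \lesssim 180\ \mathrm{m}.
```

For reference, 100 000 rectangular elements on the 50 km × 20 km fault correspond to |${\rm{\Delta }}s \approx 100$| m, that is |${L}_b/{\rm{\Delta }}s \approx 5.5$| under these assumptions.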

Figure 7.

The time evolution of mean friction coefficient on the simulations using different element sizes. (a) Rectangular elements and (b) triangular elements. The result of 100 000 rectangular elements.

4.3 Error due to low-rank approximation

We here examine the effect of the approximation errors of the LRA employed in our simulation. Fig. 8 shows the results using different error tolerances of the H-matrices. All the cases show good agreement, although the timing of the events differs slightly. For instance, the timing of the event that occurs around |$t\ = \ 650$| yr differs by 5 yr between |${\varepsilon }_{\mathrm{ ACA}} = {10}^{ - 2}\ $| and |${10}^{ - 3}$| and by 0.5 yr between |${\varepsilon }_{\mathrm{ ACA}} = {10}^{ - 3}\ $| and |${10}^{ - 4}$|⁠. Ohtani et al. (2011) documented larger discrepancies in event timing than those observed here. We suspect that the non-uniform |${d}_c$| distribution (and thus non-uniform |${L}_b/{\rm{\Delta }}s$| distribution) in their model might be the cause of this difference. Galvez et al. (2020) also reported that a much smaller |${\varepsilon }_{\mathrm{ ACA}} = {10}^{ - 8}\ $| was necessary to match the exact solution using a highly heterogeneous |${d}_c$| distribution.

Figure 8.

The time evolution of mean friction coefficient on the simulations using different error tolerance of the ACA in constructing a lattice H-matrix.

4.4 Memory usage of H-matrices and lattice H-matrices

As in Ohtani et al. (2011), we investigate the compression efficiencies of the normal and lattice H-matrices. Fig. 9 shows the memory size of the normal H-matrices as a function of the number of elements. We fix the fault geometry and change the element size to vary the number of elements. We confirm a roughly |$O( {N{\rm{log}}N} )$| dependence of the memory size for both the shear and normal stress matrices. For the shear stress matrix and rectangular meshes, the memory relative to the original dense matrix is 8 per cent for |$N\ = \ 16\,000$| and 0.7 per cent for |$N\ = \ 400\,000$|⁠. Triangular meshes require more memory than rectangular meshes. This is presumably caused by the slightly non-uniform element sizes in unstructured meshes, which lowers the efficiency of the LRA of the submatrices.

Figure 9.

Memory sizes of the normal H matrix with respect to the number of elements. The memory size of the dense matrix (⁠|$O( {{N}^2} )$|⁠) and an |$O( {N{\rm{log}}N} )$| slope are also shown as a reference.

Next, we measure the memory size of the lattice H-matrices while varying the number of MPI processes (in principle, the memory size of the normal H-matrices does not depend on the number of MPI processes). We set |$q\ = \ 4$| except for |${N}_p = \ 1$|⁠. As expected, the overall memory size of the lattice H-matrices increases with the number of MPI processes because of the smaller off-diagonal block sizes (Fig. 10a). However, the maximum memory among the MPI processes is the bottleneck in the computation of the HMVM, and this is plotted in Fig. 10(b). For |${N}_p$|<1000, normal H-matrices are superior because the memory sizes of the diagonal MPI processes in the process grid tend to be large for lattice H-matrices. For |${N}_p$|>1000, the lattice H-matrices show better load balance. The saturation of the maximum memory of the normal H-matrices beyond 200 processes corresponds to the single submatrix that has the largest memory, which cannot be divided among processes.

Figure 10.

(a) Overall memory sizes of the normal and lattice H-matrices. (b) Maximum memory size among MPI processes. The case for shear stress and 100 000 rectangular elements.

4.5 Execution time and parallel scalability

Here we measure the execution time of the numerical simulations. All measurements were performed on Oakforest-PACS (OFP) at the University of Tokyo, each node of which is equipped with an Intel Xeon Phi 7250 (68 cores, 1.4 GHz), 96 GB of DDR4 memory and 16 GB of MCDRAM. The OFP system uses Intel Omni-Path for the interconnect network, with a link throughput of 100 Gbps. We used 64 cores per CPU node. We also used the Intel Fortran compiler with the -O3 optimization option and the Intel MPI library. All results use flat MPI parallelization.

For the lattice H-matrices, we measure the dependence of the execution time of 100 time steps on the number of elements, using 100 and 900 MPI processes (Fig. 11). As expected, we confirm an |$O( {N\mathrm{ log}N} )$| or slightly steeper increase in the execution time.

Figure 11.

Number of elements versus execution time of 100 time steps with the lattice H-matrices. |$O( {N{\rm{log}}N} )$| curve is also shown as a reference.

Next, we measure the parallel scalability (Fig. 12). Our simulation using lattice H-matrices shows a consistent sublinear acceleration beyond 30 000 cores in the case of |$N\ = \ 400\,000$|⁠. We also show the result for the normal H-matrices, which exhibit an almost linear acceleration up to ∼20 cores but rapidly saturate at ∼100 cores. The slowdown beyond 100 cores is caused by the increase in the communication cost, which is proportional to |${N}_p$|⁠.

Figure 12.

Parallel scalability when 100 time steps are performed (N = 400 000 rectangular elements).

Comparing the two methods, the normal H-matrices are faster up to a few hundred MPI processes. The lattice H-matrices are slower there because the maximum memory per MPI process is larger than that of the normal H-matrices, as shown in the previous section (Fig. 10b). The lattice H-matrices outperform the normal H-matrices beyond a few hundred cores owing to the reduction in the communication cost. We do not observe saturation of the computation speed for the lattice H-matrices, even with more than 10 000 cores. Fig. 13 shows the fraction of the computation time spent in the HMVM for the lattice H-matrices. This fraction decreases with an increasing number of MPI processes but is always above 90 per cent. From this figure, we expect further acceleration in performance using additional processors.

Figure 13.

The fraction of calculation time of HMVM over a time step for N = 400 000.

5 DISCUSSION AND CONCLUSIONS

In this study, we developed a method for earthquake sequence simulations with BEM using normal and lattice H-matrices. This method is highly flexible with respect to fault geometry. Numerical experiments were conducted on a 3-D non-planar thrust fault to demonstrate the accuracy of our method in terms of convergence with decreasing mesh size. To the best of our knowledge, this work is the first comparison of triangular and rectangular discretizations in the context of earthquake sequence simulations. Our numerical simulation using a curved thrust fault showed complex patterns in earthquake sequences, which motivates us to conduct further studies focusing on earthquake physics. Our code can also be applied to natural fault systems worldwide and is potentially highly useful for physics-based earthquake hazard assessment.

In our numerical experiments, we confirmed |$O( {N{\rm{log}}N} )$| complexity of the execution time for lattice H-matrices over 100 000 elements. Ohtani et al. (2011) observed a more rapid increase, which was due to the increase of the rank of far off-diagonal blocks of the H-matrices. Hence, they had to set an artificial upper limit on the rank of the submatrices to make large-scale simulations tractable. In contrast, no rapid increase of the rank was observed in our simulations, no matter how large N became, even though our curved fault geometry is more complex than the planar fault model they used. This might be due to the difference in the kernel: Ohtani et al. (2011) used Comninou & Dundurs (1975), while we used Nikkhoo & Walter (2015), which removed artefacts present in previous solutions.

One question is whether the |$O( {N{\rm{log}}N} )$| complexity of our result is maintained for more complex geometries, such as rough faults and/or fault networks (Ozawa & Ando 2021). Additionally, inhomogeneous meshes can be used if the required resolution is not uniform owing to spatial variations in friction and stress conditions. The use of inhomogeneous meshes might change the compressibility of the dense matrices even for planar faults. Further studies are necessary to answer these questions.

We evaluated the parallel scalability of our simulation code using the supercomputer Oakforest-PACS. The lattice H-matrices overcame the high communication costs between MPI processes and enabled efficient computation using a large number of cores. The maximum computation speed for the lattice H-matrices was more than 10 times that of the normal H-matrices. However, the lattice H-matrices were not as efficient as the normal H-matrices for a small number of cores. Thus, the lattice H-matrices should be used especially when a large number of CPUs is available.

Although we only performed flat-MPI simulations, we expect further acceleration using hybrid OpenMP and MPI parallelization. Hybrid parallelization is especially important for extremely large (N > 1 000 000) problems, because only a few MPI processes can be used per CPU node owing to data that cannot be distributed, such as the element coordinates.

ACKNOWLEDGEMENTS

This work is supported by JSPS KAKENHI [grant nos 19J21676, 21H03447 and 19K04031], the ‘Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures’ and ‘High Performance Computing Infrastructure’ in Japan (Project ID: jh210023-NAH) and MEXT under its Earthquake and Volcano Hazards Observation and Research Program. Numerical simulations were executed on Oakforest-PACS and Wisteria, supercomputer systems at the University of Tokyo.

DATA AVAILABILITY

All the simulation results were obtained using the open-source software HBI (https://github.com/sozawa94/hbi).

REFERENCES

Barall, M. & Tullis, T.E., 2016. The performance of triangular fault elements in earthquake simulators, Seismol. Res. Lett., 87(1), 164–170.

Barbot, S., 2021. A spectral boundary-integral method for quasi-dynamic ruptures of multiple parallel faults, Bull. seism. Soc. Am., 111, 1614–1630.

Bebendorf, M., 2000. Approximation of boundary element matrices, Numer. Math., 86, 565–589.

Bebendorf, M. & Kriemann, R., 2005. Fast parallel solution of boundary integral equations and related problems, Comput. Visual. Sci., 8, 121–135.

Bonnet, M., 1999. Boundary Integral Equation Methods for Solids and Fluids, John Wiley and Sons.

Borm, S., Grasedyck, L. & Hackbusch, W., 2006. Hierarchical Matrices (Lecture Note), Max-Planck-Institut für Mathematik.

Bradley, A.M., 2014. Software for efficient static dislocation–traction calculations in fault simulators, Seismol. Res. Lett., 85(6), 1358–1365.

Cattania, C., 2019. Complex earthquake sequences on simple faults, Geophys. Res. Lett., 46(17–18), 10384–10393.

Chaillat, S., Desiderio, L. & Ciarlet, P., 2017. Theory and implementation of H-matrix based iterative and direct solvers for Helmholtz and elastodynamic oscillatory kernels, J. Comput. Phys., 351, 165–186.

Comninou, M. & Dundurs, J., 1975. The angular dislocation in a half space, J. Elasticity, 5(3), 203–216.

Dieterich, J.H., 1979. Modeling of rock friction 1. Experimental results and constitutive equations, J. geophys. Res.: Solid Earth, 84(B5), 2161–2168.

Erickson, B.A. et al., 2020. The community code verification exercise for simulating sequences of earthquakes and aseismic slip (SEAS), Seismol. Res. Lett., 91(2A), 874–890.

Galvez, P., Somerville, P., Petukhin, A., Ampuero, J.P. & Peter, D., 2020. Earthquake cycle modelling of multi-segmented faults: dynamic rupture and ground motion simulation of the 1992 Mw 7.3 Landers earthquake, Pure appl. Geophys., 177(5), 2163–2179.

Hackbusch, W., 1999. A sparse matrix arithmetic based on H-matrices. Part I: introduction to H-matrices, Computing, 62(2), 89–108.

Heimisson, E.R., 2020. Crack to pulse transition and magnitude statistics during earthquake cycles on a self-similar rough fault, Earth planet. Sci. Lett., 537, 116202.

Hori, T., Kato, N., Hirahara, K., Baba, T. & Kaneda, Y., 2004. A numerical simulation of earthquake cycles along the Nankai Trough in southwest Japan: lateral variation in frictional property due to the slab geometry controls the nucleation position, Earth planet. Sci. Lett., 228(3–4), 215–226.

Hoshino, T., Ida, A., Hanawa, T. & Nakajima, K., 2018. Load-balancing-aware parallel algorithms of H-matrices with adaptive cross approximation for GPUs, in 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 35–45.

Hyodo, M., Hori, T. & Kaneda, Y., 2016. A possible scenario for earlier occurrence of the next Nankai earthquake due to triggering by an earthquake at Hyuga-nada, off southwest Japan, Earth Planets Space, 68(6).

Ida, A., 2018. Lattice H-matrices on distributed-memory systems, in Proceedings of the 2018 IEEE 32nd International Parallel and Distributed Processing Symposium (IPDPS 2018), pp. 389–398, IEEE.

Ida, A., Ataka, T. & Furuya, A., 2020. Lattice H-matrices for massively parallel micromagnetic simulations of current-induced domain wall motion, IEEE Trans. Magn., 56(4), 18–21.

Ida, A., Iwashita, T., Mifune, T. & Takahashi, Y., 2014. Parallel hierarchical matrices with adaptive cross approximation on symmetric multiprocessing clusters, J. Inform. Process., 22, 642–650.

Ida, A., Iwashita, T., Ohtani, M. & Hirahara, K., 2015. Improvement of hierarchical matrices with adaptive cross approximation for large-scale simulation, J. Inform. Process., 23(3), 366–372.

Ida, A., Nakashima, H. & Kawai, M., 2018. Parallel hierarchical matrices with block low-rank representation on distributed memory computer systems, in ACM International Conference Proceeding Series, pp. 232–240, IEEE.

Jiang, J., Erickson, B.A., Lambert, V.R., Ampuero, J.P., Ando, R., Barbot, S.D., van Dinther, Y. et al., 2022. Community-driven code comparisons for three-dimensional dynamic modeling of sequences of earthquakes and aseismic slip, J. geophys. Res.: Solid Earth, 127(3).

Kato, N., 2003. Repeating slip events at a circular asperity: numerical simulation with a rate- and state-dependent friction law, Bull. Earthq. Res. Inst., 78, 151–166.

Keyes, D.E., Ltaief, H. & Turkiyyah, G., 2020. Hierarchical algorithms on hierarchical architectures, Philos. Trans. R. Soc. A, 378(2166), 1–16.

Lapusta, N. & Liu, Y., 2009. Three-dimensional boundary integral modeling of spontaneous earthquake sequences and aseismic slip, J. geophys. Res.: Solid Earth, 114(9).

Lapusta, N., Rice, J.R., Ben-Zion, Y. & Zheng, G., 2000. Elastodynamic analysis for slow tectonic loading with spontaneous rupture episodes on faults with rate- and state-dependent friction, J. geophys. Res.: Solid Earth, 105(B10), 23765–23789.

Luo, Y., Ampuero, J.P., Galvez, P., van den Ende, M. & Idini, B., 2017. QDYN: a Quasi-DYNamic earthquake simulator (v1.1), Zenodo.

Nikkhoo, M. & Walter, T.R., 2015. Triangular dislocation: an analytical, artefact-free solution, Geophys. J. Int., 201(2), 1119–1141.

Ohtani, M., Hirahara, K., Hori, T. & Hyodo, M., 2014. Observed change in plate coupling close to the rupture initiation area before the occurrence of the 2011 Tohoku earthquake: implications from an earthquake cycle model, Geophys. Res. Lett., 41(6), 1899–1906.

Ohtani, M., Hirahara, K., Takahashi, Y., Hori, T., Hyodo, M., Nakashima, H. & Iwashita, T., 2011. Fast computation of quasi-dynamic earthquake cycle simulation with Hierarchical Matrices, Proc. Comput. Sci., 4, 1456–1465.

Okada, Y., 1992. Internal deformation due to shear and tensile faults in a half-space, Bull. seism. Soc. Am., 82(2), 1018–1040.

Ozawa, S. & Ando, R., 2021. Mainshock and aftershock sequence simulation in geometrically complex fault zones, J. geophys. Res.: Solid Earth, 126(2).

Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P., 2007. Numerical Recipes 3rd edition: The Art of Scientific Computing, Cambridge University Press.

Qiu, Q. et al., 2016. The mechanism of partial rupture of a locked megathrust: the role of fault morphology, Geology, 44(10), 875–878.

Rice, J.R., 1993. Spatio-temporal complexity of slip on a fault, J. geophys. Res., 98(B6), 9885–9907.

Rice, J.R., Lapusta, N. & Ranjith, K., 2001. Rate and state dependent friction and the stability of sliding between elastically deformable solids, J. Mech. Phys. Solids, 49(9), 1865–1898.

Romanet, P., 2017. Fast algorithms to model quasi-dynamic earthquake cycles in complex fault networks, PhD thesis, https://harshasbhat.github.io/files/Romanet2017a.pdf.

Romanet, P. & Ozawa, S., 2022. Fully dynamic earthquake cycle simulations on a nonplanar fault using the spectral boundary integral element method (sBIEM), Bull. seism. Soc. Am., 112(1), 78–97.

Rubin, A.M. & Ampuero, J.P., 2005. Earthquake nucleation on (aging) rate and state faults, J. geophys. Res.: Solid Earth, 110(B11), 1–24.

Ruina, A., 1983. Slip instability and state variable friction laws, J. geophys. Res.: Solid Earth, 88(B12), 10359–10370.

Sato, D.S. & Ando, R., 2021. A log-linear time algorithm for the elastodynamic boundary integral equation method, Eng. Anal. Boundary Elem., 133, 407–450.

Segall, P., 2010. Earthquake and Volcano Deformation, Princeton University Press.

Thompson, T. & Meade, B., 2019. Earthquake cycle modeling of the Cascadia subduction zone, EarthArXiv preprint.

Tse, S.T. & Rice, J.R., 1986. Crustal earthquake instability in relation to the depth variation of frictional slip properties, J. geophys. Res.: Solid Earth, 91(B9), 9452–9472.

Yu, H., Liu, Y., Yang, H. & Ning, J., 2018. Modeling earthquake sequences along the Manila subduction zone: effects of three-dimensional fault geometry, Tectonophysics, 733, 73–84.

Zaspel, P., 2019. Algorithmic patterns for H-matrices on many-core processors, J. Sci. Comput., 78(2), 1174–1206.

Author notes

Now at: Department of Geophysics, Stanford University, Stanford, USA
