-
PDF
- Split View
-
Views
-
Cite
Cite
Lukas Holbach, Michael Gurnis, Georg Stadler, A Bayesian level set method for identifying subsurface geometries and rheological properties in Stokes flow, Geophysical Journal International, Volume 235, Issue 1, October 2023, Pages 260–272, https://doi.org/10.1093/gji/ggad220
- Share Icon Share
SUMMARY
We aim to simultaneously infer the shape of subsurface structures and material properties such as density or viscosity from surface observations. Modelling mantle flow using incompressible instantaneous Stokes equations, the problem is formulated as an infinite-dimensional Bayesian inverse problem. Subsurface structures are described as level sets of a smooth auxiliary function, allowing for geometric flexibility. As inverting for subsurface structures from surface observations is inherently challenging, knowledge of plate geometries from seismic images is incorporated into the prior probability distributions. The posterior distribution is approximated using a dimension-robust Markov-chain Monte Carlo sampling method, allowing quantification of uncertainties in inferred parameters and shapes. The effectiveness of the method is demonstrated in two numerical examples with synthetic data. In a model with two higher-density sinkers, their shape and location are inferred with moderate uncertainty, but a trade-off between sinker size and density is found. The uncertainty in the inferred is significantly reduced by combining horizontal surface velocities and normal traction data. For a more realistic subduction problem, we construct tailored level-set priors, representing “seismic” knowledge and infer subducting plate geometry with their uncertainty. A trade-off between thickness and viscosity of the plate in the hinge zone is found, consistent with earlier work.
1 INTRODUCTION
Imaging structures at depth, in the crust, lithosphere and mantle and constraining their mechanical properties is a major challenge in geodynamics. While seismic tomography provides images of the lithosphere and mantle, they tend to be blurry with no direct translation from reconstructed velocity anomalies to rheology. Combining seismic images as prior knowledge with models for the mechanical long-term behaviour of the mantle along with consistency between model predictions and observational data is a promising approach. By modelling mantle flow using the Stokes equations, we aim at inferring subsurface structures and material parameters from surface measurements of plate velocities and normal tractions while incorporating seismic information as prior knowledge.
Computations with varying geometries are challenging, in particular when the topology may change. For many physics simulations, level set and phase field methods have been established as flexible and efficient tools accomplishing this (e.g. Osher & Sethian 1988; Boettinger et al. 2002). Inferring geometries from observational data is particularly challenging when access to data is limited and if one is additionally interested in quantifying uncertainties in the inferred parameters. Various approaches for inferring geometric structures have been explored, including (i) inverting for fields, for example, Worthen et al. (2014), (ii) inferring constant parameters in pre-defined geometries, for example, Baumann & Kaus (2015), (iii) explicitly parametrizing geometries, for example, Carpio et al. (2020) and (iv) using level-set parametrizations, for example, Kadu et al. (2017); Muir & Tsai (2019).
When aiming to identify high-dimensional parameter fields in geodynamics, the results often suffer from blurry edges due to the ill-posedness of the problem (Worthen et al. 2014). Therefore, one may resort to pre-defined regions in which one infers spatially restrictive constants (Baumann & Kaus 2015; Reuber et al. 2020). While this approach does not allow one to infer geometric features, it reduces the dimension and thus allows for quantification of uncertainties using sampling methods. Alternatively, one can rely on parametrizations of geometric structures. These include explicit parametrizations of geometric interfaces (Carpio et al. 2020), for which a moderate number of parameters may be sufficient. However, this approach limits geometric flexibility. Implicit parametrizations such as describing geometric structures as level sets of an auxiliary function has become a particularly popular tool for geometric inverse problems due to the shape flexibility it provides (Santosa 1996).
In the context of seismic imaging and full waveform inversion, parametric level set methods have been employed to identify subsurface salt bodies (Kadu et al. 2017), whereas a non-parametric level set method has been used to improve image resolution in travel-time tomography (Muir & Tsai 2019). Additionally, the level-set technique has been applied to joint inversion of density contrast and seismic slowness using gravity and traveltime data (Li & Qian 2016). The above papers deal with a purely deterministic setting and do not allow for quantification of uncertainties. Recently, methods to estimate geometric uncertainties have been applied to level-set based traveltime tomography (Muir et al. 2022). In general, Bayesian inference is a powerful tool to infer geometric uncertainties. In this approach, prior beliefs are stated in terms of probabilities, and the solution to the inverse problem is not one particular shape but instead a probability distribution for the shapes. On the other hand, the curse of dimensionality typically forces one to either work in low dimensions or use dimension-robust inference methods, which is the approach taken here.
Some geodynamic inverse studies have assumed the geometry of the present-day subsurface (e.g., mantle structure) using adjoints of the Stokes equations while employing surface velocity measurements (Ratnaswamy et al. 2015; Rudi et al. 2022). These studies have proven to be efficient at recovering rheological parameters and the covariance between them. However, the models are limited in their ability to decipher the trade-offs between rheology and density or assessing the role of uncertain geometry. Other studies have attempted to recover geometry in the geological past through an integration of the convection equations backward in time (Bunge et al. 2003; Liu et al. 2008).
Here, we build on Iglesias et al. (2016) and use flexible geometric priors for binary functions with parameter-to-observable maps that are expensive to evaluate. We propose a method to construct tailored level-set priors with adjusted mean and variance representing seismic knowledge, building a bridge between seismic and mechanical models. Then, we study the trade-offs between geometry, density and viscosity using a Bayesian framework in which dimension-robust sampling methods are employed to approximate the posterior distribution and quantify uncertainties in the inferred shapes and material parameters. Finally, we investigate how informative different observation types are and demonstrate the benefits of combining them.
2 FORWARD PROBLEM
Over long timescales, the slow deformation of rock can be modelled as an incompressible fluid modeled by the steady-state Stokes equations on a spatial domain Ω:
where the Boussinesq approximation is employed. Here, |$\boldsymbol{\sigma }$| is the stress tensor, ρ the density field and g the gravitational acceleration vector. The velocity and pressure fields are denoted by |$\boldsymbol{u}$| and p, respectively, |$\dot{\boldsymbol{\varepsilon }}(\boldsymbol{u}) = \tfrac{1}{2}(\nabla \boldsymbol{u}+ \nabla \boldsymbol{u}^{\mathsf {T}})$| is the strain rate tensor and |$\rm{\mathbf {I}}$| the second-order identity tensor. We assume the viscosity η to be independent of the velocity and pressure such that the above system of partial differential equations (PDEs) is linear. However, the viscosity may be spatially varying. For illustration, we focus on a bounded rectangular domain |$\Omega \subset { {\mathbb {R}}}^2$|.
In order to close the system, we need to define appropriate boundary conditions. Let ∂Ω denote the boundary of Ω. We consider no normal flow and free-slip boundary conditions everywhere on ∂Ω:
where |$\boldsymbol{n}$| denotes the unit-length outer normal vector, and |$\boldsymbol{T}:= \rm{\mathbf {I}}- \boldsymbol{n}\otimes \boldsymbol{n}$| the tangential projection on ∂Ω with the outer product ⊗, that is , |$\boldsymbol{n}\otimes \boldsymbol{n}= \boldsymbol{n}\boldsymbol{n}^T$|.
Since we discretize the governing equations using a finite element method (see Section 6), we require the weak form of the eqs (1) and (2). With the free-slip boundary conditions, the pressure is only unique up to a constant. To eliminate this additional degree of freedom, we consider only square-integrable pressure fields with zero mean. Let
Multiplying eqs (1) and (2) with test functions |$\boldsymbol{v}\in V$| and |$q\in L^2_0(\Omega )$|, respectively, integrating by parts and adding the equations yields the weak form of the Stokes equations: Find |$(\boldsymbol{u}, p) \in V\times L^2_0(\Omega )$| such that
for all |$(\boldsymbol{v}, q) \in V\times L^2_0(\Omega )$|. The boundary integral in Green’s first identity vanishes due to the free-slip boundary conditions eqs (4) and (5) and thus does not occur in eq. (7). Standard theory about linear saddle point problems (e.g. Girault & Raviart 1986) yields existence of a unique solution of eq. (7) under the assumption |$\rho , \eta \in L^{\infty }_{+}(\Omega )$|, that is, the density and viscosity are essentially bounded from above and below by positive constants.
3 INVERSE PROBLEM
Although mantle structures can be inferred with seismic or with electrical and magnetic imaging methods, rheological parameters cannot be directly determined through measurements and density is poorly determined at regional and smaller scales. Consequently, we estimate rheology and density by solving an inverse problem. More precisely, we are interested in recovering a vector m of unknown parameters (e.g. density, viscosity, etc.) from a finite number of noisy measurements |$\mathbf {y}\in { {\mathbb {R}}}^{d}$|, that is,
where |${\boldsymbol{\delta }}\in { {\mathbb {R}}}^{d}$| models measurement and model errors and |$\mathcal {G}$| is the parameter-to-observable map. In our case, |$\mathcal {G}$| involves solving the Stokes equations in their weak form eq. (7) given the parameters m and evaluating the solution at the measurement locations. Inverse problems governed by PDEs are ill-posed, that is, they do not have a unique solution or small perturbations in the data may lead to large differences in the reconstructions (e.g. Hanke 2017). There are several ways of dealing with this issue, for example, regularizing the problem by incorporating additional penalty terms in a deterministic inversion (e.g. Engl et al. 1996). Instead, we will overcome this issue by formulating the inverse problem in the Bayesian framework in Section 3.1, which has the additional advantage of providing information about uncertainties in the parameters.
One of our aims is to investigate the effect of different data types on the inversion. In all calculations, the data y are observed on the upper boundary of the domain, which corresponds to the Earth’s surface. We will consider two data types individually and in combination: (i) horizontal velocities u1 and (ii) normal traction |$\sigma _{{\boldsymbol{n}}}= \boldsymbol{n}\cdot \boldsymbol{\sigma }\boldsymbol{n}$|, which is related to dynamic topography. We assume that the normal traction has zero mean as topography in practice is measured with respect to some reference.
3.1 Bayesian inversion
We briefly introduce Bayesian inversion as applied here, but a more detailed introduction can be found in Kaipio & Somersalo (2005). In this approach, all quantities in eq. (8) are treated as random variables and we assume some prior knowledge on our parameters of interest in the form of probability distributions. We assume all components of the parameter vector m to be mutually independent and denote the joint Gaussian prior distribution |$\mathbf {m}\sim \mu _0= \mathcal {N}(\mathbf {m}_0, \mathcal {C}_0)$|. Additionally, we assume that the noise vector is independent of m and normally distributed with mean zero and covariance matrix Γ, |${\boldsymbol{\delta }}\sim \mathcal {N}(\mathbf {0}, \mathbf {\Gamma })$|. The goal of Bayesian inversion is to find and describe the posterior probability distribution μy, which is a conditional distribution of the parameters m given the data y and prior distribution μ0. This posterior distribution is defined through Bayes’ law. Since our inversion parameter vector m contains a function, the infinite-dimensional version of Bayes’ law is needed (e.g. Stuart 2010):
where
and we dropped the normalization constant in eq. (9) because it is not needed for the methods used here. Note that |$\mathcal {J}$| is the least-squares data misfit weighted by the inverse noise covariance matrix and thus the posterior distribution takes its largest values around minimizers of |$\mathcal {J}$|, linking Bayesian inversion to deterministic approaches to solving inverse problems.
3.2 Level set parametrization for uncertain geometries
The rheological parameters of interest often exhibit large variation over small length scales. Therefore, we assume them to be piecewise constant functions. As we are primarily concerned with inferring geometric structures, we impose as few restrictions as possible on the interfaces between different structural entities, which we refer to as phases. This is achieved by using a Bayesian version of the level set method that was introduced in Iglesias et al. (2016). The main idea is to describe interfaces as level sets of a smooth function and infer this function from observation data. Note that different from classical level set methods that use the level set function to evolve shapes in some time-dependent or iterative process, here, level set functions are only used to define and sample distributions of geometric subsurface structures. Next, we detail level set parametrizations using the density field ρ as an example.
We denote the indicator function of a set A⊆Ω with |${1\!\!1}_{A}$|, that is,
Let
for some ρi > 0, Ωi⊆Ω, i = 1, …, n, with Ωi∩Ωj = ∅ for i ≠ j. The sets Ωi determine the shape of the different phases. The idea of the level set method is to parametrize these sets with the help of a smooth function |$h:\Omega \longrightarrow { {\mathbb {R}}}$|, the so-called level set function. For given thresholds |$c_i \in { {\mathbb {R}}}$|, i = 0, …, n, we define
such that the shape of the phases is fully determined by the level set function h. This leaves considerable flexibility in describing the geometric structure of ρ and allows us to change the interfaces implicitly by updating h.
The level set function is naturally linked to the physical parameter of interest ρ through the so-called level set map F defined via
where |${\boldsymbol{\rho }} = (\rho _1,\ldots ,\rho _{n})^{\mathsf {T}}$|. This procedure is visualized in Fig. 1. With this parametrization of the density, we can infer the smooth level set function h rather than the piecewise constant function ρ. For clarity, we simplify notation and refer to the adjusted parameter-to-observable map as |$\mathcal {G}$|. Note that by choosing the thresholds ci in eq. (14), an assumption is made on the relative area of the different phases. Furthermore, it is not necessary to specify the phase values, ρi, of ρ a priori. By incorporating these scalar values in the definition of F in eq. (15), they can be included in the parameter vector m for the inverse problem. We will follow this procedure in the computations.

Visualization of the level set map F. (a) Smooth level set function h with different colours indicating different values of the function and the white lines indicating the level set threshold c1 between the two phases. (b) Corresponding piecewise constant density field |$\rho = F(h, {\boldsymbol{\rho }})$|.
As shown in Iglesias et al. (2016), the level set map F and hence the adjusted parameter-to-observable map are discontinuous, leading to difficulties if the problem is treated in a deterministic context. This is another reason for choosing the Bayesian framework for the inverse problem. It also means that the new parameter-to-observable map is nonlinear even if the data depend linearly on the physical parameters of interest. In the following, we focus on linear Stokes equations to demonstrate the method for the sake of simplicity. However, the framework is also applicable to nonlinear rheologies.
3.3 Prior distribution for the level set function
The prior measure on the level set function h is Gaussian, |$\mathcal {N}(h_0, \mathcal {C}_{0,h})$|, with mean h0 and covariance operator |$\mathcal {C}_{0,h}$|. In the function space setting, it is common to choose |$\mathcal {C}_{0,h}$| to be a power of an inverse differential operator. This allows exploitation of fast PDE solvers for sampling from the distribution and yields control over the samples’ regularity (Bui-Thanh et al. 2013; Stuart 2010).
In particular, we define the covariance operator |$\mathcal {C}_{0,h}$| on the domain Ω in terms of the PDE operator |$\mathcal {A}$| as
where |$\operatorname{{\it I}}$| is the identity and Δ the Laplace operator. The use of inverse elliptic operators as covariance operators is motivated by their relationship to Matérn covariance functions, and an empirical relation between τ and α to the correlation length of the resulting random fields can be found in Lindgren et al. (2011). In particular, the constant τ ≥ 0 controls the (inverse) correlation length, whereas α > 0 determines the regularity of the samples. For example, increasing τ leads to small-scale features in realizations of the random field, whereas increasing α yields smoother samples. In the level set context, the latter results in smoother interfaces between the different phases in physical space. Amongst other applications, this type of prior covariance operator has been used in the context of Bayesian level set priors in Dunlop et al. (2016).
For the PDE operator |$\mathcal {A}$|, we employ zero Dirichlet boundary conditions on the bottom part of the boundary, ∂Ωb, and zero Neumann boundary conditions everywhere else:
The implications of these prior modelling choices can be seen in Fig. 1: When the threshold parameter satisfies c1 > 0, the subdomain Ω2 cannot reach the bottom boundary. This is desirable in our applications where we are typically only interested in the upper part of a physical domain and the bottom boundary is artificial and only chosen for computational reasons.
The prior covariance operator |$\mathcal {A}$| is discretized using a finite element method. Defining H ≔ {h ∈ H1(Ω) | h = 0on∂Ωb}, the weak solution h of |$\mathcal {A}h= r$| for arbitrary r ∈ L2(Ω) is defined as follows: Find h ∈ H such that
for all k ∈ H. Existence of a unique solution of eq. (19) follows immediately from the Lax–Milgram theorem (e.g. Evans 1998).
Sampling from the prior distribution requires access to the square root of the covariance operator, |$\mathcal {C}_{0,h}^{1/2} = \mathcal {A}^{-\alpha /2}$|. This is realized by solving systems of the form |$\mathcal {A}h= r$| subsequently α/2 times. Hence, we limit the choices of α to positive even integers.
4 INCORPORATING SEISMIC IMAGES AS PRIOR KNOWLEDGE
Inverting for subsurface structures from surface observations is challenging. Thus, it is advantageous to incorporate any available information about these structures into the prior distribution. For example, seismic images can provide insight into subsurface geometries a priori. Since we do not invert for the physical fields directly, the question arises of how to translate this knowledge into prior information for the level set function. In this section, we propose a method to construct the mean h0 for the prior distribution of the level set function given a seismic image. For the presentation of this idea, we use a density field ρ with two phases and known values |${\boldsymbol{\rho }} = (\rho _1,\rho _2)^{\mathsf {T}}$| as in Fig. 1.
Assume we have a seismic image of our computational domain Ω. Through experimental data or first principle calculations, estimates of the prior density can be made. Let |$\hat{\rho }:\Omega \longrightarrow [\rho _1, \rho _2]$| denote this estimated density field. Our aim is to construct the mean h0 for the level set function in such a way that the expected value of the corresponding binary functions matches the seismic image |$\hat{\rho }$|, that is,
Next, we use eq. (20) to derive a computable expression for h0. For that purpose, we denote the probability density function and cumulative distribution function of the standard normal distribution |$\mathcal {N}(0, 1)$| as φ and Φ, respectively. Furthermore, let |$c:\Omega \times \Omega \longrightarrow { {\mathbb {R}}}$| be the covariance function corresponding to the prior operator |$\mathcal {C}_{0,h}$|. Note that |$h(\mathbf {x}) \sim \mathcal {N}(h_0(\mathbf {x}), c(\mathbf {x}, \mathbf {x}))$| for arbitrary x ∈ Ω, which is equivalent to
In the case with two phases, the level set map simplifies to
where the threshold c1 is given. The inequality h(x) ≤ c1 holds if and only if |$Z\le (c_1 - h_0(\mathbf {x}))/\sqrt{c(\mathbf {x}, \mathbf {x})}$|. Substituting this and eq. (21) into eq. (20) yields
which can be rearranged to obtain the desired prior mean of the level set function:
Note that, in discretized form, the values c(x, x) are the diagonal values of the covariance matrix and h0 can be computed easily.
5 NUMERICAL METHODS
5.1 Discretization and prior distribution
The discretization of the governing equations and the prior distribution are based on the finite element method. In all computations, the mesh consists of triangular elements. Since the domain and local mesh refinement is different for the two models in Sections 6.1 and 6.2, the exact properties are stated there.
For the discretization of the weak form of the Stokes eq. (7), we choose continuous piecewise quadratic basis functions for the velocity components and continuous piecewise linear basis functions for the pressure. This choice of basis functions, also known as Taylor–Hood elements, is stable and a standard choice for the Stokes equations (see e.g. Elman et al. 2014). For computational reasons, we use a different approach for the discretization of the pressure space than stated in Section 2. Instead of enforcing a zero mean on the pressure field, we remove the excess degree of freedom by setting the pressure to zero at one fixed node, resulting in a shift of the pressure field.
We discretize the weak form eq. (19) of the PDE operator underlying the prior covariance using continuous piecewise linear finite elements. More information about the discretization of the relevant inner products can be found in Bui-Thanh et al. (2013). We choose the inverse length scale τ = 14 and the exponent α = 8, amounting in a rather smooth level set function with relatively large correlation length. Choosing (τ, α) constitutes prior modeling, in which one incorporates all prior knowledge into the prior distribution. Practically, to find appropriate (τ, α), we visualize multiple samples from the resulting distribution and aim at choosing parameters (τ, α) such that samples represent potential models for the true parameters . Alternatively, it is possible to consider (τ, α) as hyper-parameters, as in Dunlop et al. (2016), at the cost of increasing the computational requirements. Our implementation is based on FEniCS (Logg et al. 2012), an open-source software library for solving PDEs using the finite element method. FEniCS offers Python interfaces to efficient sparse direct solver tools, such as MUMPS (Amestoy et al. 2001) or SuperLU (Li 2005), which we use for the computations in Section 6.
5.2 Approximating the posterior distribution
Since the parameter-to-observable map |$\mathcal {G}$| is nonlinear, the posterior distribution is non-Gaussian and does not have a closed form. Using Gaussian approximations to the posterior distribution typically requires derivatives of |$\mathcal {G}$|, which are not available here as |$\mathcal {G}$| is discontinuous. Therefore, we rely on sampling methods to explore the distribution. The goal is to find a sequence of samples from the posterior measure μy and use them to compute statistical quantities like expected value or (point-wise) variance of the inversion parameters. In order to construct such a sequence of samples, we employ the pre-conditioned Crank–Nicolson Markov Chain Monte Carlo method (pCN-MCMC; see Cotter et al. 2013), Algorithm 1.
reconditioned Crank-Nicolson Markov Chain Monte Carlo method (pCN-MCMC)
![]() |
![]() |
reconditioned Crank-Nicolson Markov Chain Monte Carlo method (pCN-MCMC)
![]() |
![]() |
This sampling method does not require derivatives of the parameter-to-observable map and is thus applicable to problems using a level-set prior. Another advantage of the pCN-MCMC algorithm is that it is well-defined on infinite-dimensional function spaces, yielding mesh-independent convergence of the Markov chain. However, in practice, one might need a large number of samples for accurate approximations of the posterior distribution, an aspect addressed further in Section 6.1.
As subsequent samples of the Markov chain are correlated, it is important to choose the parameters in Algorithm 1 carefully to obtain a sufficiently large effective sample size. A larger jump parameter β leads to a better exploration of the full space of the posterior distribution but will, in turn, decrease the acceptance rate of the samples, whereas a smaller jump parameter will increase the acceptance rate but might lead to a poor exploration of the space. Both situations result in a slow convergence and large autocorrelation times of the Markov chain. As a compromise, we adjust the jump parameter adaptively, aiming at a statistically desirable acceptance rate of 15–30 per cent (cf. Roberts & Rosenthal 1998).
6 NUMERICAL EXPERIMENTS
We apply the proposed method to two different scenarios with synthetic data: first, one where the data are generated using two sinkers in an otherwise homogeneous rectangular domain. This problem discussed in Section 6.1 is used to develop intuition for the method. Secondly, we use a more realistic geophysical setup with a subduction zone as detailed in Section 6.2. All quantities are non-dimensional. We visualize the results, interpret them in terms of expectations from viscous fluid dynamics and discuss the prospects and limitations of this inference approach.
In both scenarios, the measurements are smoothed point observations taken at some xj, j = 1, …, d, located on the upper part of the boundary, that is, the Earth’s surface. More precisely, defining
the horizontal velocity and mean-zero normal traction measurements are given by
respectively. The exact locations of the measurement points xj, as well as the model setup, used to create synthetic data for the inversion are specified for each model separately. For both models, the synthetic velocity and traction data are corrupted by 5 per cent and 10 per cent uncorrelated relative noise, respectively, to mitigate the “inverse crime” (Kaipio & Somersalo 2005).
6.1 Model 1: Sinkers
The first model consists of two higher-density sinkers in the rectangular domain Ω = [0, 4] × [0, 1], which is discretized with a uniform triangular mesh constructed from 200 squares in x- and 50 squares in y-direction, each of which is split into two triangles (Fig. 2). We aim to recover the geometric structure of the density field ρ, the higher density value ρ2 and the constant background viscosity ηref. Since the velocity is driven by the density difference between ρ1 and ρ2, we consider ρ1 = 1 to be known. To ensure that the density of the sinkers ρ2 is larger than ρ1, we define
and invert for ρlog. For the viscosity, we use ηref = exp (ηlog) and invert for ηlog. As the level set method presented here requires a Gaussian prior for the inversion parameters, ρlog and ηlog, we assume substantial prior uncertainty in ρ2 and ηref. Since the geometric structure of ρ is determined by the level set function h, the full parameter vector is m = (h, ρlog, ηlog), where h is a function and ρlog and ηlog are scalars.

Sinker model setup: density field ρ (in shades of blue) used to generate synthetic observations and corresponding velocity field (arrows). The measurement locations are depicted by red squares . Inversion parameters are the shape of the sinkers, their density, ρ2, and the constant background viscosity, ηref.
In this example, we do not incorporate “seismic” knowledge about the subsurface into the prior mean and choose h0 ≡ 0 as mean for the level set function. The prior mean and variance of the scalar parameters can be found in Table 1. Synthetic data are created with the density field shown in Fig. 2 and the true parameters listed in Table 1, where the observations are taken at 16 equidistant surface points xj indicated by red squares in Fig. 2.
Mean and variance of the Gaussian prior distributions for the scalar parameters and their true values used to create synthetic data for the sinker model.
Model 1: Sinkers . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
ηlog | 4 | 4 | 0 (i.e. ηref = 1) |
Model 1: Sinkers . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
ηlog | 4 | 4 | 0 (i.e. ηref = 1) |
Mean and variance of the Gaussian prior distributions for the scalar parameters and their true values used to create synthetic data for the sinker model.
Model 1: Sinkers . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
ηlog | 4 | 4 | 0 (i.e. ηref = 1) |
Model 1: Sinkers . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
ηlog | 4 | 4 | 0 (i.e. ηref = 1) |
In all computations, we draw five million samples and discard the first million as burn-in. When using Markov chains to explore the posterior distribution, it is important to ensure that the chain is converged and exhibits sufficient mixing to obtain a good approximation of the distribution. For this purpose, we estimate the autocorrelation time for scalar quantities of m and visually inspect their traces. Using the norm of the level set function |$\left\Vert h\right\Vert _{L^2(\Omega )}$| (Fig. 3), we consider the statistical dependence of samples for which autocorrelation drops below 0.1 as negligible. This suggests that roughly every 15Ī000th sample of the chain is statistically independent, which shows that a large number of samples is needed to obtain a reasonable number of statistically independent samples. The trace of |$\left\Vert h\right\Vert _{L^2(\Omega )}$| in Fig. 3 suggests reasonable (but slow) mixing of the chain and is another visualization for the correlation of subsequent samples.

Panel (a) shows autocorrelations for the quantity |$\left\Vert h\right\Vert _{L^2(\Omega )}$|, with the grey region indicating that we consider samples with autocorrelation of 0.1 as statistically independent. Panel (b) shows the trace of the same quantity.
Comparing samples from the prior and posterior distribution of the density field, the horizontal location of the sinkers is recovered well, whereas there is some variance in the vertical location and the sizes of the sinkers (Fig. 4). This is expected since the measurements at the surface are mostly determined by the sinkers’ mass—a larger lower-density sinker excites almost the same surface flow as a smaller higher-density sinker. Furthermore, small artefacts with higher density can appear close to the bottom of the domain. Again, this is due to the observations being taken on the surface and since higher density material leads to less flow when closer to the bottom of the domain due to the no-outflow boundary condition at the bottom boundary.
![Sample density fields from the prior distribution [left two columns, (a)–(f)] and from the posterior distribution [right two columns, (g)–(l)]. Both data types were used in the inversion.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/gji/235/1/10.1093_gji_ggad220/1/m_ggad220fig4.jpeg?Expires=1747895148&Signature=Ji0OKHoDxWBKkgnxlynUd9GSnZmV6LwjnDcCoMxaG0MKszUp0XCndgRaXax5yb~7lHt6ClmO8yPTLmyNlllwAkv0C5UHneR49cXG52GYQkIhm4E9rBKPUsnLeZ2fVn8REaToS32p6InDQ27iwxhPPJjLqKfl-7gy91rWAePvevdr~Rf321TM7a0esoLcKwNVT7vPz3GMhK2n4mpzkVvf-fb6wl73i3k3-cjqyU7hoQWX77An9f9YQ4JKVhhEbnG3UDso3dxrQcSshZEqBGQErmfPS-Eivl1ZC24np0eybcvxozOvJ66pylWI4VFVANRTu2YnpU-VNGoE0o0MH2vnIg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Sample density fields from the prior distribution [left two columns, (a)–(f)] and from the posterior distribution [right two columns, (g)–(l)]. Both data types were used in the inversion.
We study the role of different observation data on the recovered density field comparing three different settings, namely (i) using only horizontal velocities, (ii) using only normal tractions and (iii) using both horizontal velocities and normal tractions as data. For cases (i) and (ii), measurements are taken at 32 equidistant points on the surface whereas 16 equidistant points are used for case (iii) such that the total number of measurements is the same for all three cases.
In Fig. 5, we compare the expected values |$\operatorname{\mathbb {E}}\left[F(h, {\boldsymbol{\rho }})\right]$| of the density field and its point-wise variance for the prior distribution and the posterior distributions corresponding to the different observation data. The point-wise prior variance shows that there is no information about the location of the sinkers a priori, whereas the low variance close to the bottom of the domain is due to our choice of the prior that incorporates the knowledge that sinkers are unlikely to be close to the domain bottom. Using only velocity data, only the horizontal locations of the sinkers are recoverable with their size and depth extent remaining largely uncertain. Parameter recovery is better when only normal traction is used as data, although there are uncertainties mainly in the exact shape of the sinkers. Combining both data types yields the best result. Shape and locations of the sinkers are recovered and the uncertainties are reduced significantly. The advantage of combining both data types is further supported by the prior and posterior distributions of the sinkers’ mass (Fig. 6), i.e., the sinker area multiplied by its density. The true mass is recovered with small uncertainty, particularly compared to the cases with only one type of observation data.

Expected value and point-wise variance of the density field’s prior and posterior distributions. The first row shows the prior (a) mean and (b) point-wise variance, the other rows show the posterior mean and point-wise variance for different data types: (c) and (d) only horizontal velocities (V), (e) and (f) only normal traction (T) and (g) and (h) both horizontal velocities and normal traction (V+T).

Prior and posterior distributions of the sinkers’ mass depending on the type of observation data used in the inference.
Finally, the 2-D joint marginal distributions of the background viscosity ηref and the sinkers’ density ρ2 (Fig. 7) are visualized through the prior probability density function (pdf) as well as kernel density estimates of the posterior pdfs. Using both data types, the background viscosity is constrained well (Fig. 7c). As expected, this uncertainty is larger for the case with just velocity data (Fig. 7a). Furthermore, there is a positive correlation between the background viscosity and the density, in accordance with physical expectations. In contrast to these cases, we do not learn much about ηref or ρ2 when using only normal tractions as data (Fig. 7b). However, one should keep in mind that this is a 2-D marginal distribution of a high-dimensional distribution—in fact, using normal tractions as data allowed us to infer the shape and locations of the sinkers reasonably well (Figs 5e and f).

Comparison of joint marginal distributions of the viscosity ηref and density ρ2 for different data types: (a) only horizontal velocities, (b) only normal traction and (c) both horizontal velocities and normal traction. The contour lines indicate highest density regions containing (from dark to light): 10 per cent, 25 per cent, 50 per cent and 75 per cent of the total probability mass.
Note that despite there being substantial uncertainty in the inferred density for all three observation data types (Fig. 7), we observe that the uncertainty in the overall sinker mass is small (Fig. 6). This is due to the different sinker geometries whose size is anticorrelated to the density, as can also be seen from the posterior samples (Figs 4 g–l).
In a similar model setup, we performed computations where, in addition to the geometric structure of the density field, the structure of the viscosity field was also unknown. However, we used one level set function for both fields, coupling their shapes. We then inverted for the sinker viscosity instead of a constant background viscosity. Independent of the data type used, the locations of the sinkers were well recovered (see Fig. A1). On the other hand, neither the sinkers’ density nor viscosity could be well constrained, leading to more uncertainty about the exact size of the sinkers compared to the model presented in this section. The sinkers’ mass, however, was again inferred with low uncertainty.
6.2 Model 2: subduction zone
The second model describes a subduction zone in the domain Ω = [0, 3] × [0, 1], which is discretized with a mesh that is locally refined around the ridge and hinge zone and further refined around the dipping weak zone that represents the mega-thrust fault (Figs 8 and A2). The goal is to recover the shape of the subducting plate, its density and a “weakened” viscosity in the hinge zone.

Model setup for Section 6.2: (a) density field ρ with corresponding velocity field and (b) log-viscosity field log10(η). The measurement locations are shown in (b): normal traction at red squares , horizontal velocities at white squares □. Inversion parameters are the shape of the subducting plate, the subducting plate’s density, ρ2, and the weakened viscosity in the hinge zone, ηw.
As the location and shape of continental plates are typically well constrained, we assume that the properties of the over-riding plate (including the weak zone) are known and invert for both the density field ρ and viscosity field η in the remaining part of the domain. We assume that the geometries of ρ and η coincide, which is accomplished by using a single level set function h for the shape of both fields. Using the same parametrization as in Section 6.1, we write ρ2 = ρ1 + exp (ρlog) with ρ1 = 1 given and invert for ρlog. Furthermore, we consider the background viscosity η1 = 102 to be known and parametrize the subducting plate’s viscosity as
where the reference viscosity ηpl = 105 is given and the prescribed function γ controls the size, location and smoothness of the hinge zone (see Fig. 8b, cf. Rudi 2019). The physical parameter of interest is the minimal “weakened” viscosity in the hinge zone, denoted by ηw = min xη2(x). However, to allow for sufficiently large variance in this value, we invert for wlog, leading to the full parameter vector m = (h, ρlog, wlog).
Synthetic data are created with the density and viscosity fields (Fig. 8) with the true scalar parameters (Table 2). Normal tractions are measured at all 20 squares depicted on the overriding and subducting plates, whereas horizontal velocities are observed at 12 open squares located only on the subducting plate (Fig. 8b) because the continental plate does not move horizontally in our model due do the boundary conditions.
Mean and variance of the Gaussian prior distributions for the scalar parameters and their true values used to create synthetic data for the plate subduction model.
Model 2: Subduction zone . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
wlog | −3.368 | 1.237 | −4.605 (i.e. ηw = 1000) |
Model 2: Subduction zone . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
wlog | −3.368 | 1.237 | −4.605 (i.e. ηw = 1000) |
Mean and variance of the Gaussian prior distributions for the scalar parameters and their true values used to create synthetic data for the plate subduction model.
Model 2: Subduction zone . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
wlog | −3.368 | 1.237 | −4.605 (i.e. ηw = 1000) |
Model 2: Subduction zone . | |||
---|---|---|---|
Parameter . | Prior mean . | Prior variance . | True value . |
ρlog | 0.105 | 1.046 | 0.693 (i.e. ρ2 = 3) |
wlog | −3.368 | 1.237 | −4.605 (i.e. ηw = 1000) |
Prior mean and variance of the scalar parameters are given in Table 2. For this inversion, we construct a non-zero mean h0 for the prior distribution of the level set function as described in Section 4. We also computed cases with a zero prior mean for comparison. After one million samples, even the most likely samples of the chain obtained using the latter approach were not near the true parameter values (Fig. A3) despite fitting the data reasonably well, supporting the need for a tailored level-set prior mean to infer reasonable values.
In all computations, we draw five million samples and discard the first million as burn-in. Again, traces of the norm of the level set function |$\left\Vert h\right\Vert _{L^2(\Omega )}$| were inspected to check for convergence and sufficient mixing. We estimate the autocorrelation time for these computations was about 15 000, similar to the results in Section 6.1.
Comparing prior and posterior mean and point-wise variance of the density field ρ, the shape of the slab as well as its density are recovered well and the point-wise variance is significantly reduced (Fig. 9). At first sight, the tailored prior mean and corresponding point-wise variance (Figs 9 a–b) seem restrictive. However, comparing samples from the prior and posterior distributions (Fig. 10) shows that this is not the case. The prior density samples mostly do not resemble plates with a large variance in the plate’s density ρ2. On the other hand, the shapes of the posterior density samples closely resemble a true subducting plate with less variance in ρ2. As observed in the sinker model in Section 6.1, higher-density artefacts can appear close to the bottom of the domain because their influence on surface measurements is small.

Expected value and point-wise variance of the density field’s prior and posterior distribution. The first row shows the prior (a) mean and (b) point-wise variance, the second row shows the posterior (c) mean and (d) point-wise variance. For this simulation, both plate velocities and normal traction were used as observations.

Sample density fields from the (a)–(h) prior distribution (left two columns) and (i)–(p) posterior distribution (right two columns). Both data types were used for the inversion.
If just velocity data are used, the conditional distribution illustrates well the trade-off between recovered ρ and ηw (Fig. 11). However, if just traction data are used, the density is substantially narrowed to values around the true value, but the viscosity of the hinge zone is not much changed from the prior marginal. By combining the two data types, the recovery of the density (from the traction) allows for the viscosity of the hinge zone to be much better estimated. The final marginal from the combined data is firmly centred on the true value (Fig. 11). Nevertheless, the shape of the plate does differ somewhat from the true shape within the hinge zone with a slight down warping of the top side of the slab (Fig. 9d). In addition, the top side of the slab has a wider zone of higher point-wise variance. This is consistent with earlier work that has shown that plate velocity is expected to go as |$u_p \approx \frac{C_1}{C_2 + C_3\frac{\eta _L D^3}{R^3}}$| where ηL is the viscosity of the plate, D the thickness of the plate, R the radius of curvature of the plate in the hinge zone and C1, C2 and C3 are constants (Conrad & Hager 1999). The posterior variance on the hinge zone viscosity is roughly 4, and this trades off with plate thickness as ≈41/3, consistent with the size of the zone of high uncertainty around the hinge zone (Fig. 9d).

Comparison of joint marginal and conditional distributions for different data types. The contour lines indicate highest density regions containing (from dark to light): 10 per cent, 25 per cent, 50 per cent and 75 per cent of the total probability mass. The first row shows the joint marginal distributions of the viscosity ηw in the hinge zone and density ρ2 of the subducting plate, where the columns indicate the different data types used: (a) only plate velocities, (b) only normal traction and (c) both plate velocities and normal traction. The second row (d)–(f) depicts the conditional distributions of viscosity and density given the true shape of the subducting plate.
We are able to compare the inversion results when different data types are used given the true shape of the subducting plate, that is, the only unknown parameters are ηw and ρ2 (Figs 11 d–f). This reduces the dimension of the parameter space drastically, and as expected in all three cases, the parameters are better constrained than when geometries are uncertain. Using only velocity data, there is a nonlinear positive correlation between the hinge zone viscosity and subducting plate’s density. If the density is higher (and thus a larger force is pulling the plate), the hinge zone viscosity does not need to be as small for the same bending to occur, and vice versa. In the computation with only normal traction data, the parameters are very well constrained. Moreover, the correlation between the parameters changes its sign. As before, combining the data types yields the best result, in this case with rather low uncertainty.
7 DISCUSSION
We have demonstrated the potential of using Bayesian level set methods to infer subsurface structures and material parameters in Stokes flow and quantify their uncertainties. Through construction of tailored level-set priors, we have formulated a method to incorporate geometric knowledge from seismic imaging in the inversion while imposing few restrictions on the specific shapes. The method, furthermore, allows inversion for density and rheological parameter fields, complementing seismic imaging methods.
A focus on linear Stokes equations allowed for a clearer presentation of the underlying ideas. While the method is applicable to models with nonlinear viscosities, this becomes computationally more challenging due to the substantially higher computational cost per sample. Due to the high-dimensional parameter space and the resulting slow decay of autocorrelation times, many samples are needed to approximate the posterior distribution. Since each sample requires solving the Stokes equations, the overall computational cost increases substantially when nonlinear Stokes equations are used. More efficient sampling methods can potentially alleviate the effect to some extent. A Gaussian approximation to the posterior distribution can be employed to enable faster sampling or as a direct approximation to the distribution (Pinski et al. 2015; Bui-Thanh et al. 2013). Although derivative information is typically required for these approaches and the non-differentiable parameter-to-observable map would need to be adjusted, a derivative-free method to generate samples from a Gaussian approximation to the posterior distribution was proposed recently (Garbuno-Inigo et al. 2020).
In one of the presented model problems, we used locally refined meshes to resolve the subduction physics of the setup. This refinement was static, that is, it was kept the same for all samples. An extension of the present method could be to use dynamic refinement that aims at resolving the level set interface, which changes in each sampling step. However, this would increase the cost per MCMC sample substantially, in particular for the case where the viscosity is constant and thus the Stokes discretization matrix only has to be assembled and factorized once, since these factors can be reused for all samples.
In two numerical examples with synthetic data, the shape of two sinkers as well as the shape of a subducting plate were inferred with their uncertainties, along with densities and viscosity parameters. There were important trade-offs: in models with a pair of sinkers, only their mass could be inferred with low uncertainty as variations in their size and density can lead to the same negative buoyancy force. In the hinge zone of the subducting plate, the plate thickness and viscosity have similar effects on the flow and therefore lead to higher uncertainties in this region, which is consistent with earlier work (Conrad & Hager 1999; Ratnaswamy et al. 2015). The results of both models demonstrate the benefits of combining two different surface data types, horizontal (plate) velocities and normal tractions, which reduced the uncertainties significantly compared to using more measurements of the same data type.
A limitation of some optimization methods meant to infer rheological parameters using adjoints of the Stokes equations (Ratnaswamy et al. 2015; Rudi et al. 2022) is that the density field driving the flow is not perfectly known. These methods could be expanded to include a scaling factor between assumed geometry of the temperature (or density) and their magnitude. For subduction zone problems, this is not ideal as plate velocity is partly resisted by forces within the hinge zone. Specifically, within the hinge zone, the resistance from the bending is the product of the effective viscosity of the plate and the cube of the plate thickness normalized by the radius of curvature of the bending (Conrad & Hager 1999; Ribe 2001; Buffet 2006). Plate geometry and radius of curvature are geometric quantities driving the flow and why substantial variance in the recovered viscosity within the hinge zone was found (Fig. 9d). Uncertainty quantification of mantle rheology needs to account for the uncertainty in the radius of curvature and plate thickness within the hinge zone, for the subduction zone problem.
Although sampling is computationally expensive, there are important applications of the method to better understand the dynamics of subduction zones if both surface displacements and dynamic topography estimates are used together with seismic images. The bathymetry of oceanic trenches are an example of dynamic topography (Zhong & Gurnis 1994; Crameri et al. 2017): Trench depth increases with the age of the subducting plate, the shallow dip of the plate interface and the depth extent of the slab (Hilde & Uyeda 1983), trends that are reproducible with dynamic, forward models of subduction (Zhong & Gurnis 1994). However, in models of trench depth and plate convergence, there are important trade-offs between the inverted viscosity of the hinge zone and plate interface with slab dip, shape and density. Consequently, there are opportunities to exploit the Bayesian level set method to understand these relations better with seismic constraints.
There are now numerous constraints on slab shape that could be used in such inversions with level sets. When inverting for the shape of the subducting plate, it became evident that a tailored level-set prior representing “seismic” knowledge is crucial when dealing with more realistic models, as we were not able to obtain meaningful results without a tailored prior mean. Constraints on the prior mean could come from mapping traditional body wave seismic tomographic images (Xue & Allen 2007; Zhao et al. 1992) while prior variance could come from both tomography and detailed constraints on velocity gradients from seismic wave forms (Chen et al. 2007; Chu et al. 2012) or seismic interferometry (Shen & Zhan 2020). Construction of such density priors requires the mapping from seismic velocities to densities, which is often accomplished through thermodynamic relations determined experimentally or from first principle calculations, such as through the framework in Stixrude & Lithgow-Bertelloni (2005). Taken together, there are now sufficient surface data and seismic controls within individual subduction zones to formally infer rheological parameters within the hinge zone and formally quantify uncertainties.
ACKNOWLEDGEMENTS
The authors would like to acknowledge helpful discussions with Matthew Dunlop and thank Gaël Choblet as well as Jack Muir for their valuable and insightful reviews. The work of LH was supported by the Fulbright Foreign Student Program. MG was supported by the National Science Foundation through OCE-2049086.
DATA AVAILABILITY
The authors declare that all other data supporting the findings of this study are available within the paper and its supplementary material files. The code used for the forward models and the inference is available upon request.
References
APPENDIX A: SUPPLEMENTARY MATERIAL

Expected value and point-wise variance of the density field’s prior and posterior distributions when inverting for the sinkers’ viscosity. The first row shows the prior (a) mean and (b) point-wise variance, the other rows show the posterior mean and point-wise variance for different data types: (c) and (d) only horizontal velocities (V), (e) and (f) only normal traction (T), (g) and (h) both horizontal velocities and normal traction (V+T).

Log-viscosity field log10(η) with mesh refinement: (a) plate collision zone, including weak zone and (b) ridge (top-left corner of domain).

Most likely sample in the Markov chain after one million samples using zero prior mean for the level set function. The density field does not resemble the true shape of the subducting plate. Both data types were used for the simulation.