Divergence-free solutions to electromagnetic forward and adjoint problems: a regularization approach

SUMMARY

We discuss a regularization approach to improving numerical stability of the frequency domain Maxwell equations of electromagnetic geophysics. To enforce divergence-free conditions we add a scaled grad-div operator to the curl–curl equation for electric fields. This deflates the null space of the curl–curl operator and significantly reduces the condition number of the linear system derived for the finite difference (FD) approximation, resulting in faster and more stable iterative solution. We explicitly discuss extensions needed to solve the adjoint problems and consider two apparently different approaches to the sensitivity calculation, which we show are ultimately equivalent. To complete assessment of the new approach in practice we have implemented the modified solver in the ModEM inversion code, and compared it to more standard methods (based on solving with a divergence correction) on inversion of a real data set. This, and simpler tests of the forward solver in isolation, demonstrate that, with appropriate scaling of the added grad-div term, the regularization approach can dramatically improve the efficiency of Krylov subspace methods. Run times for the inversion test are reduced by almost a factor of three, making 3-D FD inversion significantly more practical.

Electromagnetic theory, Magnetotellurics, Numerical modelling, Geomagnetism and Electromagnetism

1 INTRODUCTION

At the low frequencies appropriate to electromagnetic (EM) induction methods, including magnetotellurics (MT) and most controlled source problems, displacement currents can be ignored. In this quasi-static approximation, the time harmonic Maxwell equations can be reduced to a system of partial different equations (PDEs), either

\begin{eqnarray*} \nabla \times \nabla \times E + i\omega \mu \sigma E = {S_E} \end{eqnarray*}

(1)

or:

\begin{eqnarray*} \nabla \times \rho \nabla \times H + i\omega \mu H = {S_H}, \end{eqnarray*}

(2)

where E and H are the electric and magnetic fields, ω is angular frequency, σ is the (spatially variable) conductivity of the earth, |$\mu$| is vacuum permeability (here taken as a constant), and |${S_E}$| and |${S_H}$|denote the applied electrical and magnetic sources. In most 3-D applications in EM geophysics one or the other of these second-order systems, with appropriate boundary conditions, is discretized as

\begin{eqnarray*} {{\bf Ax}} = {{\bf b}} \end{eqnarray*}

(3)

and solved numerically (Sadiku 2000). Here we focus on finite difference (FD) solutions to the electric field system (1). A similar approach could be given for magnetic field system in (2). For much of our initial discussion we will also omit the explicit source term in (1), as in the case of MT with a total field formulation, where sources are taken to be outside the model domain. Of course, the right-hand side of (3) is not zero in this case, as boundary data are also included in |${{\bf b}}$|⁠.

For most geophysical applications (1) must be solved in a domain which includes air layers above the Earth where, at least nominally |$\sigma = 0$|⁠, and the PDE reduces to:

\begin{eqnarray*} \nabla \times \nabla \times E = S_E. \end{eqnarray*}

(4)

Because the curl operator has a large null space (i.e. |$\nabla \times \nabla \varphi = 0$| for any scalar |$\varphi$|⁠) (1) will not have a unique solution if |$\sigma$|vanishes in any part of the domain. Commonly this difficulty can be overcome by assuming a small positive constant value for the air conductivity, for exmple, |${\sigma _{air}} = {10^{ - 10}}$| (e.g. Chave & Jones 2012). While this makes the PDE formally non-singular, the discretized system (3) will still typically be very ill-conditioned, especially as ω → 0, and the second term on the left-hand side of (1) becomes increasingly negligible compared to the first. Poor conditioning can cause severe problems for the iterative Krylov space solvers such as BiCGstab, GMRES or QMR (e.g. Saad & van der Vorst 2000) that commonly must be used for the large linear systems that occur in most 3-D EM applications. In particular, with (1) reduced to (4), as in the air and at low frequencies, only the curl of the field is constrained, and spurious modes with large divergence can appear in the iterative solutions (e.g. Jiang et al.1996). Furthermore, for ill-posed systems iterative solvers often encounter slow convergence, or stall before reaching the desired tolerance.

Taking the divergence of (1) (for now omitting the source term, as in MT method) shows that the current-conservation condition:

\begin{eqnarray*} \nabla \cdot J = \nabla \cdot \sigma E = 0 \end{eqnarray*}

(5)

must hold wherever conductivity does not vanish. Furthermore, it is readily shown that under the quasi-static approximation there can be no free charges where conductivity is constant, implying:

\begin{eqnarray*} \nabla \cdot E = 0 \end{eqnarray*}

(6)

in the air. While both of these conditions can be formally derived from (1) (the second requires assuming a small positive value for |${\sigma _{\rm {air}}}$|⁠), they are only weakly enforced in the iterative solution of this system, especially for low frequencies. If these two conditions are enforced explicitly, the null space of the curl–curl operator is deflated (Newman & Alumbaugh 2002), spurious solutions are eliminated, and convergence of iterative solvers can be accelerated.

To this end, a number of closely related approaches have been proposed and used, from the Helmholtz decomposition of classical physics, to the so called ‘divergence correction’ commonly used in EM geophysics (Mackie et al.1994; Smith 1996; Avdeev 2007). Here we focus on a variant on the divergence correction idea, described by Clemens & Weiland (2002). In this scheme, which has so far not been widely discussed in the EM geophysical literature (but see Weaver et al.1999; Mackie & Watts 2012), a modified system of equations is derived by essentially adding scaled versions of (5)–(6) to (1). With proper scaling, the resulting system ensures that the electric fields satisfy both the curl and divergence constraints at the same time, and does not suffer from the singularity of the curl–curl equation in the air.

In the following, we first present the basic theory for the modified equation approach of Clemens & Weiland (2002) and compare this with the more standard iterative divergence correction schemes that have been widely used in EM modelling for geophysics. In contrast to most previous discussions of these sorts of schemes, we pay special attention to aspects of the problem that are critical to the adjoint solutions required for sensitivity calculations. We then discuss practical application of the technique to the 3-D forward and inverse problem for the MT method. Finally, we investigate the factors that influence the efficiency and accuracy of the modified equation approach by running numerical experiments for synthetic models and inversions with field MT data.

2 A MODIFIED (REGULARIZED) SYSTEM OF EQUATIONS

From Helmholtz decomposition to divergence correction

An intuitive move to address the near singularity (1) is to augment these equations with the divergence constraints of (5) and (6). However, if the unknowns are only the electric fields |$E$|⁠, this combined system will obviously be overdetermined. To address this problem, additional variables must also be introduced. There are several ways to do this.

The Helmholtz decomposition writes the electric field as the sum of a solenoidal field and an irrotational field:

\begin{eqnarray*} E = A + \nabla \varphi \end{eqnarray*}

(7)

with the Coulomb gauge condition:

\begin{eqnarray*} \nabla \cdot A = 0, \end{eqnarray*}

(8)

where A and φ are vector and scalar potentials of the electromagnetic field. The overdetermination issue is then resolved by substituting (2.1) and (8) into (1) and directly solving for the electromagnetic vector and scalar potentials, instead of the electric field (e.g. Haber et al.2000; Aruliah et al.2001). Alternatively, the vector electric (or magnetic) fields can be retained, but now augmented with an additional unknown scalar field, essentially a scalar potential (Jiang et al.1996; Schwarzbach 2009). A combined system of equations for electric fields (or magnetic in the case of Jiang et al.1996) and the potential, are then solved simultaneously to enforce both the curl–curl eq. (1) and the divergence constraints (5)–(6). In this |$E - \varphi$| formulation the solution for the scalar potential is identically zero. With either approach the conservation conditions are explicitly enforced, eliminating the spurious modes, and improving convergence.

There is another popular class of methods that also exploits the Helmholtz decomposition. While still solving (1), a preconditioner is built enforcing the conservation conditions of (7) and (8) to deflate the null space in the curl operator and eliminate the spurious modes of (1). The auxiliary problem that arises in the preconditioner can be easily solved by discrete Fourier Transform (Druskin et al.1999), Krylov subspace methods (Newman & Alumbaugh 2002) or algebraic multigrid methods (Arnold et al.2000; Hiptmair & Xu 2007).

Perhaps the most straight-forward approach is the ‘divergence correction’ method of Mackie et al. (1994) and Smith (1996), which is commonly implemented in finite-difference-based geophysical EM modelling and inversion codes (e.g. Mackie et al.2001; Siripunvaraporn et al.2005; Kelbert et al.2014). In a typical application, the linear system derived by discretization of (1) is solved approximately with a Krylov-space method. After a small number of iterations (say, |$n = 20 - 40$|⁠), an intermediate solution |${E_n}$| is obtained. Since (1) does not constrain the divergence, for this intermediate solution we normally will have

\begin{eqnarray*} \nabla \cdot \sigma {E_n} = \kappa {^{\prime}} \ne 0 \end{eqnarray*}

(9)

in the earth, and:

\begin{eqnarray*} \nabla \cdot {E_n} = \kappa \ne 0 \end{eqnarray*}

(10)

in the air, where |$\kappa {^{\prime}}$| and |$\kappa$| denote the current and charge sources, respectively. Therefore, if we solve:

\begin{eqnarray*} \nabla \cdot \sigma (\nabla \varphi ) = \kappa {^{\prime}} \end{eqnarray*}

(11)

in earth and:

\begin{eqnarray*} \nabla \cdot (\nabla \varphi ) = \kappa \end{eqnarray*}

(12)

in air (with appropriate boundary conditions, and matching conditions at the air–earth interface), we can define a potential field |${E_\varphi } = \nabla \varphi$| such that a ‘corrected’ electric field

\begin{eqnarray*} {E_c} = {E_n} - {E_\varphi } \end{eqnarray*}

(13)

satisfies the divergence constraints (5) and (6). The Krylov solver is then reinitialized with |${E_c}$| and the iterative solution of (1) is continued. The two steps, with Krylov iterations to reduce residuals in the curl–curl equation alternating with the correction to enforce the divergence conditions, are repeated until the desired misfit is reached. The divergence correction scheme can be viewed as a variant on the |$E - \varphi$| approach, that splits the full set of variables into two groups (⁠|$E$| and |$\varphi$|⁠) and alternately improves solutions for each. At convergence of the divergence correction scheme |$\varphi \equiv 0$|⁠, as in the |$E - \varphi$| formulation.

Instead of enforcing the explicit divergence conditions sequentially, with an auxiliary system preconditioner or through an expanded system of equations (and unknowns), Weiland (1985) and Clemens & Weiland (2002) discuss an alternative approach: regularize the problem by adding the (appropriately scaled) divergence equations to the curl–curl equation. Weaver et al. (1999) used an essentially equivalent approach to stabilize the magnetic field eq. (2), adding a term to enforce the condition |$\nabla \cdot B = 0$|⁠.

For motivation, note if we take the gradient of (6), add the result to (1), and use the identity |$\nabla \times \nabla \times E = \nabla \nabla \cdot E - {\nabla ^2}E$|⁠, in the air the system of equations reduces to

\begin{eqnarray*} {\nabla ^2}E = 0 \end{eqnarray*}

(14)

with appropriate boundary conditions, the Laplace operator has only a trivial null space, essentially eliminating the singularity problem in the air. By analogy we should add the gradient of (5) to (1) in the Earth where conductivity does not vanish. However, the constraint on current divergence is not dimensionally consistent with the curl–curl equations (and indeed, in SI units, generally |$\sigma \ll 1$|⁠). To effectively enforce both (1) and (5) we should thus instead add something like |$\sigma _0^{ - 1}\nabla (\nabla \cdot \sigma E) = 0$|⁠, for an appropriately chosen reference conductivity |${\sigma _0}$|⁠.

To simplify further discussion, we assume henceforth that the air conductivity |${\sigma _{\rm {air}}} > 0$| is finite. Then, as first suggest by Weiland (1985), we can introduce a (possibly spatially variable) scaling factor |$\lambda$| and regularize (1) in both air and Earth by adding |$\lambda \nabla (\nabla \cdot \sigma E) = 0$| to obtain

\begin{eqnarray*} \nabla \times \nabla \times E - \lambda \nabla (\nabla \cdot \sigma E) + i\omega \mu \sigma E = 0. \end{eqnarray*}

(15)

With an appropriate choice of |$\lambda$|⁠, this modified system now simultaneously constrains the curl and divergence of the electric field in both subdomains, even when the third term vanishes as ω → 0. Note that in the second term of (15) we apply the scaling factor after the divergence and gradient operations. We can also move the scaling factor before the gradient operator (⁠|$\lambda$| is applied on the nodes) since our objective is to enforce |$\nabla \cdot \sigma {{\bf E}} = 0$|⁠; we consider both possibilities here.

In the air we set |$\lambda = 1/{\sigma _{air}}$|⁠, and because the conductivity is constant in this domain the curl–curl operator still reduces to the Laplacian, and (15) becomes |${\nabla ^2}E + i\omega \mu {\sigma _{air}}E = 0$|⁠, a good approximation to (14) when |${\sigma _{air}}$| is small. As we discuss further below, a similar scaling should be applied in the Earth. Ideally, we would have

\begin{eqnarray*} \lambda \nabla (\nabla \cdot \sigma E) \approx \nabla \nabla \cdot E, \end{eqnarray*}

(16)

so that the curl–curl operator would be reduced to the Laplacian. However, as the Earth conductivity is not generally uniform, this simplification cannot be achieved exactly. Intuitively, some sort of average of |${\sigma _{\rm {earth}}}$| should be a reasonable value for a constant scaling factor |$\lambda$|⁠, particularly if the Earth conductivity does not vary widely. In fact, |$\lambda$| can vary spatially to achieve a better approximation of (16). As we show in Section 4, using |$\lambda = 1/{\sigma _{\rm {earth}}}$| for the scaling seems to work best, but only if the regularization term is implemented as |$\nabla \lambda (\nabla \cdot \sigma ){{\bf E}}$|⁠. In the following we use notation consistent with this formulation, allowing for a spatially variable scaling factor.

Numerical implementation

We turn now to details of the numerical implementation of the modified system (15) in the ModEM inversion code described by Egbert & Kelbert (2012) and Kelbert et al. (2014). In the original version of this code (1) is discretized using a standard staggered grid (Yee 1966). The solution to the resulting linear system is found with a pre-conditioned Krylov solver, modified to incorporate the divergence correction scheme outlined above. To derive the correct discretization of (15) requires a bit of care with regard to boundaries. Following Egbert & Kelbert (2012; Appendix B), we decompose the discrete electric field vector|$e$|and curl operator |${{\bf C}}$| into interior and boundary components

\begin{eqnarray*} {{\bf C}} = \left[ {\begin{array}{*{20}{c}} {{{\bf \tilde{C}}}}&{{{{\bf C}}_b}} \end{array}} \right],{{\bf e}} = \left[ \begin{array}{@{}l@{}} {{{\bf \tilde{e}}}}\\ {{{\bf e}}_b} \end{array} \right]. \end{eqnarray*}

(17)

Note that |${{\bf C}}$| maps from all edges to interior faces, while the domain of |${{\bf \tilde{C}}}$| is restricted to interior edges. The electric fields on the boundaries (⁠|${{{\bf e}}_b}$|⁠) are specified, while |${{\bf \tilde{e}}}$| is to be solved for. For the MT case (⁠|$S = 0$|⁠) the discrete form of (1), is then

\begin{eqnarray*} \left[ {\begin{array}{*{20}{c}} {{{{{\bf \tilde{C}}}}^\dagger }{{\bf \tilde{C}}} + {{\bf diag}}(i\omega \mu \sigma )}&{{{{{\bf \tilde{C}}}}^\dagger }{{{\bf C}}_b}} \end{array}} \right]{{\bf e}} = 0, \end{eqnarray*}

(18)

where |${{{\bf \tilde{C}}}^{^\dagger }}$| is the adjoint of |${{\bf \tilde{C}}}$|⁠, which maps from interior faces to interior edges. Note that in (18) there is one equation for each interior edge, while |$e$| includes all edges, boundary and interior. These equations are readily reduced to the square system that we actually solve

\begin{eqnarray*} {{\bf S\tilde{e}}} = \left[ {{{{{\bf \tilde{C}}}}^\dagger }{{\bf \tilde{C}}} + {{\bf diag}}(i\omega \mu \sigma )} \right]{{\bf \tilde{e}}} = - {{{\bf \tilde{C}}}^\dagger }{{{\bf C}}_b}{{{\bf e}}_b}. \end{eqnarray*}

(19)

The simplest way to derive the correct discrete form for the divergence constraint of eq. (5) is to apply a discrete divergence operator |${{\bf \tilde{D}}}$| to (19). Note that the discrete divergence is properly defined as a mapping from interior edges to interior nodes—nodes on the boundaries connect to an incomplete set of edges. Due to the conservation properties of the staggered grid it is readily verified that |${{\bf \tilde{D}}}{{{\bf \tilde{C}}}^\dagger } = 0$|⁠, so applying |${{\bf \tilde{D}}}$| to (18) we obtain the discrete current conservation equation

\begin{eqnarray*} {{\bf \tilde{D}\Sigma \tilde{e}}} = 0, \end{eqnarray*}

(20)

where |${{\bf \Sigma }} = {{\bf diag}}(\sigma )$|⁠. There is one equation in this system for each interior node. To complete our derivation let |${{\bf G}}$| be the gradient operator, a mapping from all nodes to interior edges, and let |${{\bf \tilde{G}}}$| be the restriction of this operator to interior nodes. |${{\bf \tilde{G}}}$| computes a gradient component on all interior edges, but effectively assumes that the input scalar field is 0 on the boundary. Thus for any diagonal matrix of scaling factors |${{\bf \Lambda }}$|⁠,

\begin{eqnarray*} {{\bf \tilde{G}\Lambda \tilde{D}\Sigma \tilde{e}}} = 0 \end{eqnarray*}

(21)

is defined on all interior edges, and enforces the desired discrete condition (20). Thus, the second order difference eq. (15) (here, and subsequently, modified by moving the scaling between the gradient and divergence operators) can be approximated on a discretised grid as:

\begin{eqnarray*} {{{\bf S}}_{\,\rm mod\,}}{{\bf \tilde{e}}} = [{{{\bf \tilde{C}}}^\dagger }{{\bf \tilde{C}}} - {{\bf \tilde{G}\Lambda \tilde{D}\Sigma }} + {{\bf diag}}(i\omega \mu \sigma )]{{\bf \tilde{e}}} = {{\bf b}}, \end{eqnarray*}

(22)

where |${{\bf \Lambda }}$| is the diagonal matrix of scaling values, and the right hand side is |${{\bf b}} = - {{{\bf \tilde{C}}}^\dagger }{{{\bf C}}_b}{{{\bf e}}_b}$|⁠. To simplify notation, we drop the tildes below, assuming implicitly that all operators and vectors are restricted to interior nodes and edges.

All of the matrices appearing in (22) are already used within the ModEM code, so setting up the modified system is easy. The matrix |${{\bf S}}_{\,\rm mod}$| differs from the original (unmodified) matrix only in the additional |${{\bf G\Lambda D}}$| term, and both |${{\bf G}}$| and |${{\bf D}}$| are already used in the conventional iterative divergence correction scheme. We can thus build the modified system of equations by simply adding the additional sparse matrix |${{\bf G\Lambda D\Sigma e}}$| to the original matrix |${{\bf S}}$|⁠. Matrix elements that cancel to within numerical precision (e.g. for edges in the air) are set to zero. It also worth noting that even with the added |${{\bf G\Lambda D}}$| term, our regularized system (22) solves the same physical problem as the original system (19). Apparently, if we could obtain an exact solution of the curl–curl equation with no numerical error, the solution should automatically satisfy the conservation condition, hence it is a solution of the Grad-Div equation (therefore also a solution for our regularized system). We have verified this equivalence using a direct solver for problems of modest size.

Scaling factors

The factor |$\lambda$| will be vastly different between earth and air domains and depending on where the scaling is applied (i.e. on edges or on nodes), some care may be necessary. The situation is illustrated in Fig. 1, allowing for a non-planar interface between the two domains. Clearly, all edges that bound an earth cell (i.e. any cell with non-zero conductivity, shaded in the figure) will have non-zero edge conductivity, and should be considered to be within the earth domain. The remainder, which are surrounded by only air cells, are in the air domain. It is also clear how to distinguish between Earth nodes (cell corners) where (5) should hold and air nodes where the appropriate conservation condition is given by (6). Earth nodes are connected to at least one earth edge (but earth nodes on the interface will also be connected to one or more air edges), while air nodes are only connected to air edges. An apparent complication is that if we apply the scaling factor |${{\bf \Lambda }}$| to the edges, after computing gradients. Some edges (dotted lines in Fig. 1) are used to compute divergences for nodes in both domains, and it is not immediately obvious how these should be scaled. This issue can be resolved by defining separate gradient operators for each subdomain.

Figure 1.

Discretization of a simple 3-D topology model for forward problem and adjoint problem in sensitivity calculation. The earth cells are marked with shaded colour. The filled circles and open circles show the earth nodes and air nodes, respectively. The solid lines are the earth edges, while the dashed lines denote the air edges. Note that the dotted edges are used both in the air and the earth in divergence operator.

Open in new tab Download slide

Thus we define |${{{\bf G}}_A}$| as the gradient operator for air edges. For air edges which connect earth and air nodes, the Earth node is excluded from the gradient computation. Thus, gradients are computed on all interior air edges, under the assumption that the value of the scalar field on all boundary nodes (including the air–earth interface) are zero. For these air edges we set |$\lambda \equiv 1/{\sigma _{\rm {air}}}$|⁠, so we effectively add |${{{\bf G}}_A}{{\bf De}} = 0$| to the original system of equations. For the earth edges, we add |${{{\bf \Lambda }}_E}{{{\bf G}}_E}{{\bf D\Sigma e}} = 0$|⁠, where |${{{\bf G}}_E}$| is the gradient operator restricted to earth nodes, mapping to earth edges, and |${{{\bf \Lambda }}_E}$| represents scaling of earth edges.

If we put |${{\bf \Lambda }}$| before the gradient operator to apply the scaling factor on the nodes we simply add |${{\bf G\Lambda D\Sigma e}} = 0$| to the full system. Although scaling factors are vastly different between earth and air domains, there is no ambiguity about which domain a node belongs to, so using an average node conductivity leads to at least approximately correct scaling of the divergence correction.

Coefficient matrix structure

For the staggered grid, the standard second-order FD stencil for the curl–curl operator has 13 non-zero elements (nzs), while the stencil for the grad-div operator has 11. However, in the air (or anywhere where |$\sigma$| is constant and |$\lambda = 1/\sigma$|⁠) many of these elements cancel out as the modified system of equations reduces to the vector Laplacian, with at most 7 non-zero elements per row. The change in sparsity pattern is illustrated in Fig. 2 for a toy problem (⁠|$6 \times 6 \times 6$| grid with 2 air layers). The reduction in the number of non-zero matrix elements reduces computational time for the matrix-vector multiplies used by iterative solvers. More importantly, collapsing the null space of the curl operator leads to a substantial decrease in matrix condition number. For the toy problem of Fig. 2 the condition number drops from 1.45 × 10¹³ to 474.5. We can thus anticipate convergence of Krylov solvers for the modified system in fewer iterations. Further gains in efficiency can be expected from omission of the divergence correction iterations that solve (12) and (13).

Figure 2.

Sparsity pattern of a 6 by 6 by 6 (including 2 air layers) FD grid for: (a) original system of equations; (b) the grad-div equations; (c) the modified system of equations. Nz denotes the number of non-zero elements and cond is the condition number of the matrix. Note that only the matrix for interior edges (excluding boundaries) is shown. In this simple illustrative case there is some cancellation in the Earth, as the conductivity is homogeneous.

Open in new tab Download slide

A few comments on symmetry of the modified system are also appropriate. As shown in Egbert & Kelbert (2012) the original system matrix |${{\bf S}}$| can be symmetrized by pre-multiplying by |${{\bf V}}$|⁠, the diagonal matrix of edge volume elements for the grid, that is, |${{\bf VS}} = {( {{{\bf VS}}} )^T} = {{{\bf S}}^T}{{\bf V}}$| is a symmetric (but not Hermitian) matrix. It is often easier to solve a symmetric problem, so often (in particular in the ModEM code) the forward solution is obtained by solving |${{\bf VSe}} = {{\bf Vb}}$|⁠, where the regularization term becomes:

\begin{eqnarray*} {{\bf VG\Lambda D\Sigma e}} = 0. \end{eqnarray*}

(23)

It is easily shown that |${{\bf VGD}}$| is also a symmetric matrix, but the diagonal matrices |${{\bf \Lambda }}$| and |${{\bf \Sigma }}$| break the symmetry. It is possible to modify the equations and still maintain symmetry—for example following the node based scaling approach, instead of adding |${{\bf VG\Lambda D\Sigma e}} = 0$| we can add |${{\bf \Sigma VG}}{{{\bf \Lambda }}^2}{{\bf D\Sigma e}} = 0$|⁠, which still enforces the divergence-free conditions. However, our tests suggest that this alternative approach does not perform as well as a scheme based on (22), at least when we use a Krylov solver such as BiCGstab or QMR, which can handle non-symmetric systems. Partial symmetrization with |${{\bf V}}$| still does appear to accelerate convergence.

Non-zero current source

So far our discussion of the modified system has focused on the MT forward problem, where there is no source term within the model domain. However, currents in the earth will not necessarily be divergence free when there is an imposed current source, as in the case of controlled source methods. In fact, even for MT we will need to allow for a non-zero right-hand side of eq. (1) to compute data sensitivities needed for linearized inversion. Formally, the extension to allow for external source currents is quite simple. Taking the divergence of eq. (1) leads to the condition

\begin{eqnarray*} \nabla \cdot \sigma E = {(i\omega \mu )^{ - 1}}\nabla \cdot {S_E} \end{eqnarray*}

(24)

in the conductive earth. Below the subscript|$_E$| is dropped in the source term to simplify notation. Thus, in the presence of an external source the right-hand side for the modified system also should be modified. In the air the divergence of (1) results in a condition on the source

\begin{eqnarray*} \nabla \cdot S = 0. \end{eqnarray*}

(25)

This is physically reasonable, and just says that an imposed current system in a perfect resistor can have no divergence. Mathematically, if the source term does not satisfy (25), there will be no solution to (1) (if air conductivity is exactly zero). The divergence condition (6) still must hold in the air, even in the presence of external sources. Putting all of this together, if a source |${{\bf s}}$| (defined on interior edges) is added to the discrete original system (18) the right-hand side of the modified system (22) should be

\begin{eqnarray*} {{\bf s}} + {(i\omega \mu )^{ - 1}}{{\bf G\Lambda Ds}} + {{\bf b}}, \end{eqnarray*}

(26)

where |${{\bf b}}$| is the boundary condition term, and it is understood that |${{\bf Ds}}$| should vanish in the air.

3 SENSITIVITY CALCULATIONS FOR INVERSE PROBLEMS

Egbert & Kelbert (2012) derive a formal expression for the sensitivity matrix (or Jacobian) |${{\bf J}}$| evaluated at model parameter |${{{\bf m}}_0}$|

\begin{eqnarray*} {{\bf J}} = {{\bf L}}{{{\bf S}}^{ - 1}}{{\bf P}} + {{\bf Q}}, \end{eqnarray*}

(27)

where |${{\bf L}}$| is the matrix of linearized data functionals, |${{{\bf S}}^{ - 1}}$| represents application of the forward solver (with coefficient matrix |${{\bf S}}$|⁠, computed from model parameter |${{{\bf m}}_0}$|⁠), |${{\bf Q}}$| accounts for direct dependence of the data functionals on the model parameters, and

\begin{eqnarray*} {{\bf P}} = {\left. { - \frac{\partial }{{\partial {{\bf m}}}}\left[ {{{\bf S}}{{{\bf e}}_0}} \right]} \right|_{{{{\bf m}}_0}}}. \end{eqnarray*}

(28)

Each column of the matrix |${{\bf P}}$| provides the forcing needed to solve for the sensitivity for a single model parameter. An explicit expression for the transpose of the sensitivity matrix is also useful

\begin{eqnarray*} {{{\bf J}}^T} = {{{\bf P}}^T}{\left( {{{{\bf S}}^T}} \right)^{ - 1}}{{{\bf L}}^T} + {{{\bf Q}}^T}, \end{eqnarray*}

(29)

where |${( {{{{\bf S}}^T}} )^{ - 1}}$| represents solution of the transposed (or adjoint) system. The matrix |${{\bf Q}}$| does not interact with the forward solver, so we will ignore this term here.

There are two (ultimately equivalent) ways to view computation of the sensitivity for the modified system. First, as the formal notation of (27) and (29) already suggests, we can view the solver as a ‘black box’ which solves the discretized equations for a specified input (i.e. forcing and boundary conditions). From this perspective, using the modified system of equations represents a regularization, or pre-conditioner, that improves efficiency of the solver, but does not ultimately change the forward problem. The inputs to any nominally equivalent solver, whether based on the original system matrix |${{\bf S}}$|(with divergence correction), or on the modified system |${{{\bf S}}_{\,\rm mod}}$|⁠, should remain unchanged. Thus the matrices |${{\bf P}}$| and |${{{\bf L}}^T}$| still define the inputs to the (abstract black box) forward and adjoint solvers, respectively. However, as discussed at the end of the previous section, if inputs to the original forward problem are |${{\bf b}}$| (boundary data) and |${{\bf s}}$| (imposed source current; this will be |${{\bf P}}$| for the sensitivity calculation), the right-hand side for the modified system needs to be changed by addition of |${(i\omega \mu )^{ - 1}}{{\bf G\Lambda Ds}}$|⁠. Similar modifications will be required for the right-hand side of the adjoint system, as discussed later.

A different, and conceptually simpler, approach to the sensitivity calculation, which we refer to as the ‘direct’ approach, is based on the observation that the coefficient matrix has been modified to |${{\bf S}}_{\,\rm mod}$|⁠, and, since the added term depends on the conductivity parameter |${{\bf m}}$|⁠, a different expression for |${{{\bf P}}_{\,\rm mod}}$| will be obtained from (28). As we now show, the two perspectives are easily reconciled. Following Egbert & Kelbert (2012) we can make the dependence of the model operators on model parameters explicit. For the modified system of discrete equations, we have

\begin{eqnarray*} {{{\bf S}}_{\,\rm mod\,}}{{{\bf e}}_0} = {{{\bf S}}_0} + \left[ {{{{\bf U}}_{\,\rm mod\,}} + {{\bf U}}} \right]\left( {\pi ({{\bf m}}) \circ {{{\bf e}}_0}} \right), \end{eqnarray*}

(30)

where component discrete operators are given by |${{{\bf S}}_0} = {{{\bf C}}^\dagger }{{\bf C}}$|⁠, |${{\bf U}} = i\omega {\mu _0}{{\bf I}}$|⁠, |${{{\bf U}}_{\,\rm mod\,}} = {{\bf G\Lambda D}}$|⁠, |$\pi ({{\bf m}})$| gives the mapping from the model parameter space to the edges of the staggered grid where electric field components are defined, and |$\circ$| is the pointwise (Hadamard) product (so |$\pi ({{\bf m}}) \circ {{{\bf e}}_0}$| represents currents associated with the background solution). The original (unmodified) operator |${{\bf S}}$| omits |${{{\bf U}}_{\,\rm mod}}$|⁠, but is otherwise identical. Using the decomposition of (30). Egbert & Kelbert (2012) show that

\begin{eqnarray*} {{{\bf P}}_{\,\rm mod\,}} = - \left[ {{{{\bf U}}_{\,\rm mod\,}} + {{\bf U}}} \right]{{\bf diag}}({{{\bf e}}_0}){\left. {\frac{{\partial \pi }}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}}, \end{eqnarray*}

(31)

which is indeed different from |${{\bf P}}$| for the original system, which again omits |${{{\bf U}}_{\,\rm mod}}$|⁠. If we compute the sensitivity using the second, more direct, approach we first solve

\begin{eqnarray*} {{{\bf S}}_{\,\rm mod\,}}{{\bf X}} = {{{\bf P}}_{\,\rm mod\,}} \end{eqnarray*}

(32)

and then form the product |${{\bf LX}}$|⁠. We solve the same equations in the first case, but compute the right-hand side using |${{\bf P}}$| derived via (28) from the original system, modified as in (26),

\begin{eqnarray*} {{\bf P}} + {{\bf G\Lambda DP}} &=& - \left( {{{\bf I}} + {{\bf G\Lambda D}}} \right){{\bf Udiag}}({{{\bf e}}_0}){\left. {\displaystyle\frac{{\partial \pi }}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}}\nonumber \\ &=& - \left( {{{\bf I}} + {{\bf G\Lambda D}}} \right){{\bf Udiag}}({{{\bf e}}_0}){\left. {\displaystyle\frac{{\partial \pi }}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}}\nonumber \\ &=& - \left[ {{{\bf U}} + {{{\bf U}}_{\,\rm mod\,}}} \right]{{\bf diag}}({{{\bf e}}_0}){\left. {\displaystyle\frac{{\partial \pi }}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}}. \end{eqnarray*}

(33)

Thus, (32) is ultimately solved for the same right-hand side, and the two approaches are equivalent. In ModEM |${{\bf L}}$|⁠, |${{{\bf S}}^{ - 1}}$| and |${{\bf P}}$| are independent modules, so we find it more convenient to treat the forward solver as a black box which automatically adjusts the inputs before computing the solution. Then we can leave |${{\bf P}}$| unchanged, and use either the original, or the modified approach for the forward solver, without modifying other parts of the sensitivity calculation.

In arriving at the expression (31) for |${{{\bf P}}_{\,\rm mod}}$| we implicitly assumed that the matrix |${{{\bf U}}_{\,\rm mod}}$| was not a function of the unknown model parameters. In fact, for variable node scaling

\begin{eqnarray*} {{{\bf U}}_{\,\rm mod\,}} = {{\bf Gdiag}}({\pi _n}{({{\bf m}})^{ - 1}}){{\bf D}}, \end{eqnarray*}

(34)

where |${\pi _n}({{\bf m}})$| represents the mapping from conductivity parameter |${{\bf m}}$| to some sort of node-average conductivity. Using the usual product rule for derivatives the extra term in |${{{\bf P}}_{\,\rm mod}}$| must be modified to

\begin{eqnarray*} {\left. {\frac{{\partial {{{\bf U}}_{\,\rm mod\,}}\left( {\pi ({{\bf m}}) \circ {{{\bf e}}_0}} \right)}}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}} &=& {{{\bf U}}_{\,\rm mod\,}}{{\bf diag}}({{{\bf e}}_0}){\left. {\frac{{\partial \pi }}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}} \nonumber \\ && \,+ {\left. {\frac{{\partial {{{\bf U}}_{\,\rm mod\,}}}}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}}\left( {\pi ({{\bf m}}) \circ {{{\bf e}}_0}} \right). \end{eqnarray*}

(35)

Using (35), and following the approach given in Egbert & Kelbert (2012), the second term is

\begin{eqnarray*} G{{\bf diag}}\left( {{{\bf D\Sigma }}{{{\bf e}}_0}} \right){\left. {\frac{{\partial \pi _n^{ - 1}}}{{\partial {{\bf m}}}}} \right|_{{{{\bf m}}_0}}}. \end{eqnarray*}

(36)

But |${{\bf D\Sigma }}{{{\bf e}}_0} \equiv 0$| so (35) reduces to the simpler expression (31). The same argument holds for other possible scaling factors which might have some dependence on model parameters.

Computations with the transposed Jacobian of (29) can be implemented with the same two approaches discussed for the forward problem. Although indirect arguments show that the same result will be obtained with either, this equivalence is apparently not readily established through simple closed-form expressions.

The most obvious way to compute the transposed Jacobian using the modified equations would be to replace |${{\bf S}}$| with |${{\bf S}}_{\,\rm mod\,}^T$| and |${{\bf P}}$| with |${{\bf P}}_{\,\rm mod\,}^T$| in (29). It is also possible to adopt the ‘black box’ approach for adjoint calculations. As already mentioned above, the transpose of the original system satisfies |${{{\bf S}}^T} = {{\bf VS}}{{{\bf V}}^{ - 1}}$|⁠. Thus, the system |${{{\bf S}}^T}{{\bf e}} = {{\bf s}}$| can be found by solving the symmetrized system

\begin{eqnarray*} {\bf VS\tilde{e}} = {\bf s} \end{eqnarray*}

(37)

followed by the direct computation |${{\bf e}} = {{\bf V\tilde{e}}}$|⁠. If we multiply both sides of (37) by |${{\bf D}}{{{\bf V}}^{ - 1}}$| we find

\begin{eqnarray*} {{\bf D}}\Sigma {{\bf \tilde{e}}} = {(i\omega \mu )^{ - 1}}{{\bf D}}{{{\bf V}}^{ - 1}}{{\bf s}}. \end{eqnarray*}

(38)

As for the forward problem the interpretation of this equation is different for air and earth nodes. For the latter case this is a condition on the divergence of currents, while in the air we obtain (for zero conductivity) a condition on the source term |${{\bf D}}{{{\bf V}}^{ - 1}}{{\bf s}} = 0$|⁠. Note that this condition in the air must be satisfied for (37) to have a solution. We discuss some technical issues further in the Appendix. It also should be noted that while the air edges extending from surface nodes (dotted edges in Fig. 1) do not contribute to the divergence of currents on the left hand side of (38), they may contribute to the divergence of the source term. Multiplying (38) by |${{\bf VG\Lambda }}$| and adding to (37) and using (22) we have an alternative equation for |${{\bf \tilde{e}}}$|

\begin{eqnarray*} {{\bf V}}{{{\bf S}}_{\,\rm mod\,}}{{\bf \tilde{e}}} = {{\bf s}} + {(i\omega \mu )^{ - 1}}{{\bf VG\Lambda D}}{{{\bf V}}^{ - 1}}{{\bf s}}. \end{eqnarray*}

(39)

Thus, the original adjoint solution can be obtained using the modified system of equations, with an appropriately modified right-hand side. An alternative approach to computing the transpose of the Jacobian is thus to solve

\begin{eqnarray*} {{\bf V}}{{{\bf S}}_{\,\rm mod\,}}{{\bf X}} = {{{\bf L}}^T} + {(i\omega \mu )^{ - 1}}{{\bf VG\Lambda D}}{{{\bf V}}^{ - 1}}{{{\bf L}}^T}, \end{eqnarray*}

(40)

and then compute

\begin{eqnarray*} {{{\bf J}}^T} = {{{\bf P}}^T}{{\bf VX}}. \end{eqnarray*}

(41)

In contrast to the forward Jacobian calculation, it is not immediately obvious from the equations given that the black box and direct approaches should yield identical results for |${{{\bf J}}^T}$|⁠, even within numerical precision. Different systems of equations are solved, with different right-hand side, and the results are multiplied by different matrices. However, we have shown that the same result for |${{\bf J}}$| are obtained with the two approaches, and the two different expressions for |${{{\bf J}}^T}$| are obtained formally by applying standard rules for matrix transpose. In Section 4, we demonstrate equivalence of the two approaches with computational examples. Within the modular ModEM code, we use the black box approach to computation of |${{{\bf J}}^T}$| or of matrix–vector products such as required for penalty functional gradients.

4 NUMERICAL RESULTS

As an initial comparison between forward solvers based on the modified and original systems of equations we consider one of the most widely used 3-D synthetic models: the COMMEMI 3D-2A model introduced by Zhdanov et al. (1997). The model includes two surface outcropping anomalies in a layered earth, with more details shown in fig. 5(b) of Zhdanov et al. (1997). The model is discretized into a 71 by 71 by 46 mesh (excluding air layers but including padding cells). MT forward responses are calculated at 6 periods from 0.01 to 1000 s. For these, and all other forward calculations we used the BiCGstab method, preconditioned with an incomplete LU decomposition. For calculations with the original system of equations, a divergence correction is performed every 40 BiCGstab iterations. All forward computations were done with a single process running on an Intel(R) Xeon E5 1.7 GHz. Iterations are set to terminate after the calculated relative residual is below 10⁻¹⁰. To simplify discussion, we use CC-DC (curl–curl equations followed by divergence correction) and CCGD (curl–curl equations modified by grad-div operator) respectively, to refer to the original and modified approaches.

First, we compare forward solutions for the COMMEMI 3D-2 resistivity model using ModEM, with forward solvers based on CC-DC and CCGD methods. Fig. 3 compares the calculated surface electric field of XY mode at 100 s using the two approaches. It is quite clear that the surface electric fields computed with both approaches are nearly identical, with relative differences of order 10⁻⁷ or less. After all, we are still solving the same physical problem, just in a different way. It should also be noted that the most pronounced differences between the two solutions are located where the model conductivity is in harsh contrast. This is because contrasts in conductivity may result in abrupt changes in the magnitude of the |${{\bf G\Lambda D\Sigma e}}$| term in the neighbouring cells near the boundary, disrupting the relative weight of the curl–curl and grad-div terms.

Figure 3.

Comparison of the surface Ex field at the central part of COMMEMI 3D-2A model with XY polarization at a period of 100 s using (a) CC-DC approach and (b) CCGD approach. Panel (c) shows the electric field difference of (a) – (b).

Open in new tab Download slide

For the CCGD results shown in Fig. 3 the scaling factor was variable in space and applied at the nodes. Similar agreement with the CC-DC solution is obtained regardless of how the regularization term is scaled—that is implemented as |${{\bf G\Lambda D\Sigma e}}$| or |${{\bf \Lambda GD\Sigma e}}$|⁠, with scale factors in the earth constant, or varying as |$\sigma _{Earth}^{ - 1}$|⁠. On the other hand, performance is significantly affected by details of the scaling. This is illustrated in Fig. 4, where we plot the histogram of model conductivity values, together with wall-clock execution time for the COMMEMI 3D-2A modelling runs at a period of 100 s. If |$\lambda$| is taken as a constant in the earth, performance of course depends on the value used. For both edge and node cases run times are reduced, relative to CC-DC (indicate by the heavy dashed line) only for |$\lambda$| near the centre of the distribution of model conductivities. There is no clear advantage to either node or edge scaling, at least for this test model. The clear winner in terms of speed is spatially variable scaling applied to nodes. The same variable scaling applied to edges is remarkably slower, and in fact is considerably less efficient than the CC-DC approach.

Figure 4.

Distribution of conductivity values in COMMEMI 3D-2A forward model, with average computation time for XY mode forward calculation at 100 s. The dashed line denotes the calculation time using CC-DC approach. The solid line is the forward time for CCGD approach with spatially variable scaling factors λ, while the dot–dashed line gives the comparable result for variable edge scaling. The solid red lines with squares (triangles) denotes the computation time for different constant λ values for edge (node) scaling.

Open in new tab Download slide

Fig. 5 provides a more detailed look at the relative performance of CC-DC and CCGD (variable node scaling), in terms of calculation time and iteration number for a range of periods. The computation time difference is negligible for shorter periods, where the divergence problem is less pronounced, and convergence with either system is fast. In fact, CCGD is slightly slower than CC-DC for the shortest period. As the period increases, the CCGD method gains a significant advantage, with wall times reduced up to 60 per cent of the CC-DC run time at 100 s (∼70.9 s versus 182 s). Note that the reduction of iteration count for the CCGD method is less significant than the reduction in wall time, as each iteration is faster due to the reduction in the number of non-zero elements in the system matrix. Furthermore, the dependence of computation time on period is weaker with the CCGD approach. For the CC-DC approach, time required by the slowest task (1000 s) is ∼5.14 times that of the fastest task (0.01 s). This ratio is only ∼1.93 for the CCGD counterparts. This can be advantageous for parallelized computations, as will be discussed in the inversion tests below.

Comparison of run time (upper panel) and number of BiCGstab iterations required for convergence to a tolerance of 10−10 as a function of period for the COMMEMI 3D-2A model.

Figure 5.

Comparison of run time (upper panel) and number of BiCGstab iterations required for convergence to a tolerance of 10⁻¹⁰ as a function of period for the COMMEMI 3D-2A model.

Open in new tab Download slide

We also use the COMMEMI 3D-2A model to demonstrate the equivalence of the ‘black-box’ adjoint calculation [i.e. using (40) and (41)] to a more direct approach using the unmodified system. In Fig. 6 we show a section through the sensitivity for a single component of the impedance (⁠|${Z_{xy}}$|⁠) measured near the centre of the model (one column of |${{{\bf J}}^T}$|⁠) computed with the adjoint solution as defined by (39). Relative differences from the sensitivity obtained using the original CC-DV approach are of order 10⁻⁵, and are largest on boundaries between blocks of different conductivity.

Figure 6.

Comparison of different approaches to sensitivity calculation, for a single component of the impedance (Z_xy) at a period of 100 s near the centre of the model domain. (a) shows section under site of real part of sensitivity computed with modified system of adjoint equations (CCGD), following (40) and (41). (b) shows difference from standard sensitivity calculation using unmodified system. Panel (c) shows corresponding section through COMMEMI 3D-2A model used for the calculations.

Open in new tab Download slide

To evaluate the performance of the two approaches for more realistic MT forward and inverse calculations, we conducted a series of tests using the 3D inverse model of the Cascadia, USA region described by Patro & Egbert (2008). The model has a spatial dimension of 1460 by 1590 by 550 km (78 by 80 by 34, excluding air layers, but including padding cells) and includes the Pacific Ocean, an a priori constraint in the western part of the model domain. The resistivity model was originally obtained using the WSINV3DMT code (Siripunvaraporn et al.2005) to invert an initial subset of the USArray long-period MT data. This model is also included as a test example in the publicly available ModEM 3D MT inversion package (Kelbert et al.2014).

In the Cascadia forward test, forward responses of the Cascadia model for seven periods ranging from 0.01 to 10000s are calculated. Similar to the synthetic forward test, surface electric fields computed with both approaches for the Cascadia model are nearly identical, with relative differences of order 10⁻⁸ or less for the Ex field with XY mode at 100 s (Fig. 7).

Figure 7.

Comparison of the surface Ex field of the central part of Cascadia model with XY polarization at a period of 100 s using (a) CC-DC approach and (b) CCGD approach. Panel (c) shows the electric field difference of a–b.

Open in new tab Download slide

The different scaling approaches for the grad-div regularization term are also compared for the Cascadia inverse model in Fig. 8. Results are generally similar to those obtained with the simpler COMMEMI 3D-2A model: CCGD with spatially variable node (edge) scaling is much faster (slower) than CC-DC. Curiously, with fixed |$\lambda$| edge scaling works significantly better than node scaling. We give a more detailed comparison of performance for the CC-DC and CCGD (variable node scaling) approaches in Fig. 9. The CCGD approach has an advantage in both the convergence rate and calculation speed for almost all periods. For shorter periods, the computational speed advantage of the CCGD approach is less significant at ∼83 per cent (∼49 s versus 90 s at 0.01 s) with respect to the CC-DC method, while the difference increases to ∼200 per cent for longer period (∼101 s versus 307 s at 1000 s). As with the COMMEMI forward test results, the computation time is also more uniform for different periods with the CCGD approach. Time required for the slowest task (1000 s) is ∼3.42 times that of the fastest task (0.01 s) for the CC-DC approach. The ratio is only ∼2.15 with CCGD.

Figure 8.

Distribution of conductivity values in the Cascadia inverse model, with average computation time for XY mode forward calculation at 100 s, overlain with comparison of efficiency for the 4 scaling approaches. Lines and symbols are as in Fig. 4.

Open in new tab Download slide

Figure 9.

Comparisons of (a) average calculation time and (b) average iteration number from forward calculations at different periods for the Cascadia model, using modified and original system of equations. The calculation time and iteration number are averaged from two polarizations. Note the longest computation time emerges at 1000s instead of the longest period.

Open in new tab Download slide

To more fully assess performance of the modified approach in an inversion context, a series of tests are conducted using the Cascadia data as in Patro & Egbert (2008). The original starting model (80 × 78 × 34) with the ocean is used as we run a series of comparisons between the modified and original approach. Eight periods from the original study, ranging from ∼100 s to 7300 s are used in the inversions. The tolerances for forward and adjoint calculations are both set as 10⁻¹⁰ for this test.

Here 17 CPU cores (including one master to coordinate the parallelization) are used for the test inverse problem. All cores are identical to that used for the forward tests described above. A total of 185 and 198 iterations are taken before the inversions reach the desired misfit (RMS = 1.6) for the CC-DC and CCGD approaches, respectively. First, we compare the final models from both inversions. Fig. 10 demonstrates the horizontal sections of resistivity for the two models at 10, 30, and 70 km depths. It is obvious that the differences from the two inversion models are indistinguishable in the log10 resistivity domain. This shows that the CCGD approach is effectively equivalent to the traditional CC-DC approach.

Figure 10.

Comparison of horizontal resistivity of the Cascadia inversion model at 10, 30 and 70 km. (a–c) show the model calculated with CC-DC approach, (d–f) with CCGD approach and (g–i) shows the relative resistivity difference.

Open in new tab Download slide

However, a small proportion of cells (∼1.5 per cent) show relative differences larger than 10 per cent (Figs 10h–j), even though the forward/adjoint calculation difference of the two approaches is around an order of 10⁻⁶ to 10⁻⁸. This is because a small difference in the forward/adjoint solutions will result in a larger error in the gradient, which directly determines the search step (and the model selected afterwards) in the model line search. The small differences at each step will obviously build up after ∼200 iterations, resulting in more significant differences in the final inverse solution. Also, given that the inverse solution conductivity can vary by orders of magnitude, slight shifts in position of a model feature in an area of large gradient can lead to localized patches with large relative differences. That said, the differences between the inverse solutions are not significant in practice, as for most of the cells the difference is smaller than 1 per cent.

Regarding the performance of the inversion, as the most time-consuming part of geophysical inversion calculations are essentially the forward and adjoint (sensitivity) calculations, the efficiency of the inversion computations should be similarly improved as in the forward cases. The real-world performance difference in a coarse-grain (period-wise) parallelization scheme (as used in ModEM) often comes from the forward calculation for the longest periods. This is because such parallel schemes distribute each period/polarization to one processer, so the slowest forward problems will determine the wall-clock time for the entire calculation. As shown in the forward tests, the CCGD approach leads to a more balanced workload among different periods/polarizations. Therefore, it is naturally more efficient in such schemes.

Fig. 11 compares the overall time required for inversion of the Cascadia data set using different forward solvers. It is clear that the CCGD approach also drastically improves the computational efficiency of the inversion, which is more than 160 per cent faster than the traditional CC-DC method (with wall time of ∼677 min versus 1791 min) in term of overall computational speed. Both forward and adjoint calculations are similarly accelerated. Note that the FWD calculations took roughly twice as much time as adjoint calculations in our inversion test. This is because in ModEM, one NLCG iteration normally requires two forward calculations and only one adjoint calculation (see Kelbert et al.2014 for details). The ‘other’ time consumption (I/O, MPI communications, etc.) is similar (∼4 min) for both approaches as both use the same basic modules in ModEM.

Figure 11.

Comparison of computation time of Cascadia inversion with forward calculation based on CCGD and CC-DC methods.

Open in new tab Download slide

5 DISCUSSION AND CONCLUSIONS

It is clear from our results that success with the CCGD approach depends critically on the scaling of the GD term. In a region of constant conductivity (e.g. in the air, where a small constant |${\sigma _{air}}$| is used) we expect |$\lambda = {\sigma ^{ - 1}}$| to be optimal, as this will reduce the curl–curl operator to the (vector) Laplacian, and essential eliminate the (near) null space of the quasi-static EM operator. However, this simplification will not hold exactly for even the simplest heterogeneous models, and it is probably better to view the scaling as providing a relative weighting of two constraints on the solution—that is the curl–curl equation, and the divergence-free condition |$\nabla \cdot \sigma E = 0$|⁠. Weighting the second constraint by |$\lambda \approx {\sigma ^{ - 1}}$| should enforce both constraints at a similar level in the iterative solution schemes used. Clearly, very different weights must be used within air and earth domains, which typically differ in conductivity by up to 10 orders of magnitude. If, as may often be common in practical (regularized) inversions, conductivities cluster around a central value, the inverse of this mean (perhaps better, median) conductivity might provide a reasonable scalar weight within the Earth domain. However, even for something simple like the COMMEMI model (which has a bimodal distribution of conductivities) there may be no single good scaling factor within the Earth, so we have been motivated to consider spatially varying weights. For all examples we have tested we find that a spatially variable weighting of the divergence-free condition provides the best performance.

This approach is also not without problems, as the appropriate scaling is ambiguous near blocks of different conductivity, or where gradients are sharp. And note that even a smooth regularized inversion may contain discontinuities in conductivity, especially if constraints (such as an ocean domain) are included. As we discussed specifically in the context of the air-earth interface this ambiguity in scaling is especially clear when we apply the (spatially variable) scaling factor on grid edges, after computing gradients. Each grid node corresponds to one constraint on current divergence |$\nabla \cdot \sigma E = 0$|⁠, a balance between terms that must be comparable. Probably the best (but still imperfect) scaling for this constraint should be some average of the conductivities surrounding the node. Near a sharp gradient in conductivity the gradient of this divergence will couple nodes which should properly have greatly different scaling. We hypothesize that this may explain the poor performance of variable scaling applied to edges, even while this variable scale approach seems to be the best choice when applied on the nodes. Further study of the fundamental differences observed between scaling approaches is warranted.

Calculation of the gradient of the data misfit sum of squares during an inversion search (e.g. with NLCG) requires computation of the sensitivity matrix-residual product |${{{\bf J}}^T}{{\bf r}} = {{{\bf P}}^T}{{{\bf S}}^{ - T}}{{{\bf L}}^T}{{\bf r}}$|⁠. We have shown that when we use the modified coefficient matrix |${{{\bf S}}_{\,\rm mod}}$| there are two distinct approaches to this calculation. The first is based on modification of the right-hand side for the adjoint solver [i.e. modify |${{{\bf L}}^T}{{\bf r}}$|⁠; see (40) and (41)] and then multiply the resulting adjoint solution by |${{{\bf P}}^T}$| computed for the original coefficient matrix. The second, perhaps more direct, approach is to solve |${{\bf S}}_{\,\rm mod\,}^T{{\bf x}} = {{{\bf L}}^T}{{\bf r}}$| (without modifying the right-hand side of the modified adjoint system) and then completing the computation as |${{{\bf J}}^T}{{\bf r}} = {{\bf P}}_{\,\rm mod\,}^T{{\bf x}}$|⁠, using a modification of |${{{\bf P}}^T}$| appropriate for the modified coefficient matrix. The argument for equivalence of these two approaches is rather indirect, but mathematically sound. Direct calculation demonstrates that equivalent results for sensitivities are indeed obtained with either approach.

While in the above study our efforts are focused on regularization of an electrical field formation of the second order Maxwell eq. (1), an equivalent regularization term can be applied to the magnetic formulation (2). In this case the conservation condition becomes |$\nabla \cdot B = 0$| for the whole model domain, so the regularization term becomes |$\nabla (\nabla \cdot H) = 0$|⁠, by applying the gradient (and dividing by an assumed constant magnetic susceptibility). As resistivity now appears in the first term on the left-hand side of (2), it is natural to scale the regularization term with a factor of |$\lambda = \rho = {\sigma ^{ - 1}}$|⁠, for dimensional consistency between the curl–curl and grad-div terms. This is the approach taken by Weaver et al. (1999). Note that the regularization term (23) in the actual implementation would become |${{\bf VG\Lambda Dh}} = 0$|⁠, which still retains symmetry. Thus, the regularization approach may be even more attractive for the magnetic field formation, given that a symmetric system is often easier to solve.

We have presented a divergence-free solution approach for time-harmonic Maxwell equations in frequency domain geophysical EM methods. The curl–curl equation is modified, and effectively regularized, by addition of a scaled grad-div operator to simultaneously enforce the divergence gauge condition and the usual curl–curl equation. This approach has been implemented with total electrical field formulation within the ModEM 3-D inversion code (Kelbert et al.2014) and compared with the more standard approach (in EM geophysics) where the divergence correction is performed every few Krylov iterations. Comparison on synthetic and real-world problems shows that the new approach significantly improves computational efficiency in both forward and inversion calculations. The improvements on cases we have tested so far are substantial enough to suggest that the new approach will have a very significant impact for practical applications of 3-D inversion. Further research on improvement in solver efficiency are warranted. In particular, the modified system of equations we solve here is much more diagonally dominant, closely approximating a Poisson equation. In contrast to the original curl–curl equations, systems of this sort can often be solved very efficiently with an algebraic multigrid approach. Exploring this possibility is an obvious next step.

We also believe this approach can be extended to other numerical methods (e.g. finite volume or finite element methods), since the foundation of this divergence-free approach is built on the strong form. With the added regularization term, the null space of the curl operator should be similarly deflated, with the condition number of the discrete system improved regardless of the numerical method used. Of course, the actual performance of this approach in other methods cannot be guaranteed without tests. Assembly of the regularized system would involve the construction of discrete divergence and gradient operators, which should be trivial for staggered grid finite volume or Lagrange finite element methods. However, these operators may not be readily available in some methods, requiring additional effort. For example, in FEM with Nedelec curl-conforming elements, the gradient and divergence operators have to be built in an auxiliary H(1) space, with the curl operators in H(curl) space, as demonstrated in Hiptmair & Xu (2007) method.

ACKNOWLEDGEMENTS

Discussions with Randall Mackie provided the original motivation for this work and helped clarify several issues. The authors thank Dr Alexander Grayver and an anonymous referee for comments that improved the manuscript. Funding for H.D. was provided by NSFC grant 41504062 and 863 program 2014AA06A603. Partial support for GDE was provided by NSF grant EAR1225496.

REFERENCES

Arnold

D.N.

Falk

R.S.

Winther

2000

Multigrid in H(div) and H(curl)

Numer. Math.

197

–

217

Aruliah

D.A.

Ascher

U.M.

Haber

Oldenburg

2001

A method for the forward modelling of 3-D electromagnetic quasi-static problems

Math. Models Methods Appl. Sci.

(

–

10.1142/S0218202501000702

Avdeev

D.B.

2007

EM modeling, forward

Encyclopedia of Geomagnetism and Paleomagnetism

, pp.

215

–

218

., eds.

Gubbins;

Herrero-Bervera

Springer Netherlands

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Chave

Jones

2012

The Magnetotelluric Method: Theory and Practice

, eds.

Chave

A.D.

Jones

A.G.

Cambridge University Press

Available at: http://www.cambridge.org/978052189275

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Clemens

Weiland

2002

Regularization of eddy-current formulations using discrete grad-div operators

IEEE Trans. Magn.

(

569

–

572

Druskin

V.L.

Knizhnerman

L.A.

Lee

1999

New spectral Lanczos decomposition method for induction modeling in arbitrary 3-D geometry

Geophysics

(

701

https://doi.org/10.1190/1.1444579

Egbert

G.D.

Kelbert

2012

Computational recipes for electromagnetic inverse problems

Geophys. J. Int.

189

(

251

–

267

10.1111/j.1365-246X.2011.05347.x

Haber

Ascher

U.M.

Aruliah

D.A.

Oldenburg

D.W.

2000

Fast simulation of 3D electromagnetic problems using potentials

J. Comput. Phys.

163

(

150

–

171

10.1006/jcph.2000.6545

Hiptmair

2007

Nodal auxiliary space preconditioning in H(curl) and H(div) spaces

SIAM J. Numer. Anal.

(

2483

–

2509

Jiang

Povinelli

1996

The origin of spurious solutions in computational electromagnetics

J. Comput. Phys.

123

(

125

104

–

123

10.1006/jcph.1996.0082

Kelbert

Meqbel

Egbert

G.D.

Tandon

2014

ModEM: a modular system for inversion of electromagnetic geophysical data

Comput. Geosci.

–

10.1016/j.cageo.2014.01.010

Mackie

Watts

M.D.

2012

Detectability of 3-D sulphide targets with AFMAG

, In

SEG Technical Program Expanded Abstracts 2012

, pp.

–

Society of Exploration Geophysicists

https://doi.org/10.1190/segam2012-1248.1

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Mackie

R.L.

Rodi

Watts

M.D.

2001

3-D magnetotelluric inversion for resource exploration

, in

Proceedings of the 2001 SEG Annual Meeting

San Antonio, Texas, 9–14 September

, pp.

–

.. .

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Mackie

R.L.

Smith

J.T.

Madden

T.R.

1994

Three-dimensional electromagnetic modeling using finite difference equations : the magnetotelluric example

Radio Sci.

(

923

–

935

Newman

G.A.

Alumbaugh

D.L.

2002

Three-dimensional induction logging problems, Part 2: a finite-difference solution

Geophysics

(

484

https://doi.org/10.1190/1.1468608

Patro

P.K.

Egbert

G.D.

2008

Regional conductivity structure of Cascadia: preliminary results from 3D inversion of USArray transportable array magnetotelluric data

Geophys. Res. Lett.

(

–

Saad

van der Vorst

H.A.

2000

Iterative solution of linear systems in the 20th century

J. Comput. Appl. Math.

123

(

1–2

–

10.1016/S0377-0427(00)00412-X

Sadiku

M.N.O.

2000

Numerical Techniques in Electromagnetics

, 2nd edn, ed.

Raton

CRC Press

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Schwarzbach

2009

Stability of Finite Element Discretization of Maxwell's Equations for Geophysical Applications

Technische Universität Bergakademie Freiberg

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Siripunvaraporn

Egbert

Lenbury

Uyeshima

2005

Three-dimensional magnetotelluric inversion: data-space method

Phys. Earth planet. Inter.

150

(

1–3

–

10.1016/j.pepi.2004.08.023

Smith

J.T.

1996

Conservative modeling of 3-D electromagnetic fields, Part II: Biconjugate gradient solution and an accelerator

Geophysics

(

1319

–

1324

Weaver

J.T.

Agarwal

A.K.

X.H.

1999

3-D Finite-Difference modeling of the magnetic field in geoelectromagnetic induction

, in

Three-Dimensional Electromagnetics

, 1st edn, pp.

426

–

443

., eds.

Oristaglio

Spies

Society of Exploration Geophysicists

https://doi.org/10.1190/1.9781560802154.ch27

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Weiland

1985

On the unique numerical solution of Maxwellian eigenvalue problems in three dimensions

Particle Accelerators

227

–

242

Google Scholar

OpenURL Placeholder Text

WorldCat

Yee

1966

Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media

IEEE Trans. Antennas Propagat.

(

302

–

307

10.1109/TAP.1966.1138693

Zhdanov

M.S.

Varentsov

I.M.

Weaver

J.T.

Golubev

N.G.

Krylov

V.A.

1997

Methods for modelling electromagnetic fields Results from COMMEMI—the international project on the comparison of modelling methods for electromagnetic induction

J. appl. Geophys.

(

3–4

133

–

271

APPENDIX: SOME TECHNICAL DETAILS

If air conductivity is zero, the inverse coefficient matrix |${{{\bf S}}^{ - 1}}$| used in (27) is not defined, and the formal manipulation used to derive the adjoint expression (29) might be questioned. Furthermore, for the modified system (which actually should be non-singular even without a non-zero air conductivity) we argued that for solving the adjoint problem the source term, scaled by the inverse of grid volume weights, should vanish in the air. The source for the adjoint problem of (29), which is derived from the linearized data functional |${{{\bf L}}^T}$|⁠, can in fact extend into the air (e.g. because magnetic fields need to be interpolated to the air-earth interface). It is not immediately clear that |${{{\bf V}}^{ - 1}}{{{\bf L}}^T}$| will be divergence free in the air. We thus explicitly omit the component of |${{\bf D}}{{{\bf V}}^{ - 1}}{{{\bf L}}^T}$| that lies in the air domain when modifying the RIGHT-HAND SIDE of (36), but make no comparable adjustment to |${{\bf L}}$| in the direct sensitivity calculation of (27), or in the original source term |${{\bf s}} = {{{\bf L}}^T}$| in (36). Here we demonstrate more formally the consistency of this approach.

Notation is as in the main paper; specifically we work with discrete curl |${{\bf C}}$| and gradient |${{\bf G}}$| operators on the staggered grid, with curl mapping from edges to faces, and gradient from nodes to edges. Adjoint curl (faces to edges) is denoted |${{{\bf C}}^\dagger }$|⁠; the divergence is (minus) the adjoint gradient and is written as |${{\bf D}} = - {{{\bf G}}^\dagger }$|⁠. Note that all matrices here are real, and that the dagger denotes adjoint relative to an appropriate volume weighted inner product |${\langle {{{\bf a}},{{\bf b}}} \rangle _V} = {{{\bf a}}^T}{{\bf Vb}}$|⁠, and is thus not just the matrix transpose. In particular for the adjoint curl (which maps from faces to edges in the staggered grid) we have

\begin{eqnarray*} {{{\bf C}}^\dagger } = {{{\bf V}}^{ - 1}}{{{\bf C}}^T}{{\bf \bar{V}}}, \end{eqnarray*}

(A1)

where |${{\bf \bar{V}}}$| is the diagonal matrix of volume elements for faces (or perhaps better, edges of the dual grid), and |${{\bf V}}$| is the diagonal matrix of edge volume weights, as in the main text. The operators are orthogonal in the sense that |${{\bf D}}{{{\bf C}}^\dagger } \equiv 0$| and |${{\bf CG}} \equiv 0$|⁠.

Any divergence-free (solenoidal) edge vector can be represented as the curl of a face vector |${{{\bf x}}_C} = {{{\bf C}}^\dagger }{{{\bf x}}_F}$|⁠, and any irrotational (curl-free) edge vector can be expressed as the gradient of a scalar (defined on grid nodes) |${{{\bf x}}_G} = {{\bf Gp}}$|⁠. For any pair of such vectors we evaluate the inner product

\begin{eqnarray*} {\left\langle {{{{\bf x}}_C},{{{\bf x}}_G}} \right\rangle _V} &=& {{\bf x}}_C^T{{\bf V}}{{{\bf x}}_G} = {\left( {{{{\bf C}}^\dagger }{{{\bf x}}_F}} \right)^T}{{\bf VGp}}\nonumber \\ & =& {\left( {{{{\bf V}}^{ - 1}}{{{\bf C}}^T}{{\bf \bar{V}}}{{{\bf x}}_F}} \right)^T}{{\bf VGp}} = {{{\bf x}}_F}^T{{\bf CGp}} = 0. \end{eqnarray*}

(A2)

Any edge vector can be uniquely represented as a sum of solenoidal and irrotational components |${{\bf x}} = {{{\bf x}}_C} + {{{\bf x}}_G}$|⁠, which from (A2) will be orthogonal with respect to the edge-volume-weighted inner product. Of course we have |${{\bf D}}{{{\bf x}}_C} = 0$| and |${{\bf C}}{{{\bf x}}_G} = 0$|⁠. All of this holds if we restrict all vectors, scalars, and operators to the air domain, following the conventions of Fig. 1 and associated discussion.

For the sensitivity calculation, |${{{\bf L}}^T}$| is a sparse edge vector representing the measurement functional, the source term for the adjoint solution of (29). Decompose the scaled sparse vector of (35) into solenoidal and irrotational components

\begin{eqnarray*} {{{\bf V}}^{ - 1}}{{{\bf L}}^T} = {{{\bf q}}_C} + {{{\bf q}}_G}. \end{eqnarray*}

(A3)

The requirement that this be divergence free implies that |${{\bf D}}{{{\bf q}}_G} = 0$|⁠, which in turn implies that |${{{\bf q}}_G} = 0.$| Now consider an electric field forward solution vector |${{\bf e}} = {{{\bf e}}_C} + {{{\bf e}}_G}$|⁠. In the air we should have |${{\bf De}} = {{\bf D}}{{{\bf e}}_G} = 0$|⁠, again implying |${{{\bf e}}_G} = 0$|⁠. Applying the linearized data functional |${{\bf L}}$| to the solution (allow for both components in the corresponding expansion (A3), again restricting everything to the air domain) we find

\begin{eqnarray*} {{\bf Le}} = {\left( {{{{\bf L}}^T}} \right)^T}{{{\bf e}}_C} = {\left( {{{\bf V}}{{{\bf q}}_C} + {{\bf V}}{{{\bf q}}_G}} \right)^T}{{{\bf e}}_C} = {{{\bf q}}_C}^T{{{\bf e}}_C}. \end{eqnarray*}

(A4)

Thus, the irrotational component of the (scaled) data functional (in the air) makes no contribution to the forward sensitivity calculation of (27); omitting this from the source for the adjoint calculation is thus consistent with the requirement that |${{\bf e}}$| be divergence-free in the air. Note that the full data functional vector |${{{\bf L}}^T}$| (including any components in the air) should be included in the original source term [i.e. |${{\bf s}}$| in (36)], but the correction |${{\bf D}}{{{\bf V}}^{ - 1}}{{\bf \Lambda }}{{{\bf L}}^T}$| should be limited to Earth nodes.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Download all slides

Month:	Total Views:
November 2018	40
December 2018	66
January 2019	109
February 2019	73
March 2019	64
April 2019	94
May 2019	62
June 2019	56
July 2019	41
August 2019	58
September 2019	73
October 2019	78
November 2019	63
December 2019	98
January 2020	107
February 2020	80
March 2020	92
April 2020	41
May 2020	79
June 2020	87
July 2020	82
August 2020	76
September 2020	112
October 2020	124
November 2020	105
December 2020	117
January 2021	78
February 2021	105
March 2021	94
April 2021	79
May 2021	87
June 2021	75
July 2021	45
August 2021	63
September 2021	58
October 2021	81
November 2021	58
December 2021	70
January 2022	59
February 2022	64
March 2022	59
April 2022	87
May 2022	51
June 2022	42
July 2022	76
August 2022	59
September 2022	38
October 2022	51
November 2022	33
December 2022	29
January 2023	19
February 2023	26
March 2023	43
April 2023	28
May 2023	43
June 2023	26
July 2023	29
August 2023	27
September 2023	34
October 2023	48
November 2023	43
December 2023	38
January 2024	37
February 2024	33
March 2024	44
April 2024	61
May 2024	70
June 2024	28
July 2024	44
August 2024	24
September 2024	27
October 2024	32
November 2024	27
December 2024	28
January 2025	18
February 2025	22
March 2025	34
April 2025	24
May 2025	10

Article Contents

Divergence-free solutions to electromagnetic forward and adjoint problems: a regularization approach

SUMMARY

1 INTRODUCTION

2 A MODIFIED (REGULARIZED) SYSTEM OF EQUATIONS

From Helmholtz decomposition to divergence correction

Numerical implementation

Scaling factors

Coefficient matrix structure

Non-zero current source

3 SENSITIVITY CALCULATIONS FOR INVERSE PROBLEMS

4 NUMERICAL RESULTS

5 DISCUSSION AND CONCLUSIONS

ACKNOWLEDGEMENTS

REFERENCES

APPENDIX: SOME TECHNICAL DETAILS

Citations

Views

Altmetric

Email alerts

Astrophysics Data System

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Divergence-free solutions to electromagnetic forward and adjoint problems: a regularization approach Open Access

SUMMARY

1 INTRODUCTION

2 A MODIFIED (REGULARIZED) SYSTEM OF EQUATIONS

From Helmholtz decomposition to divergence correction

Numerical implementation

Scaling factors

Coefficient matrix structure

Non-zero current source

3 SENSITIVITY CALCULATIONS FOR INVERSE PROBLEMS

4 NUMERICAL RESULTS

5 DISCUSSION AND CONCLUSIONS

ACKNOWLEDGEMENTS

REFERENCES

APPENDIX: SOME TECHNICAL DETAILS

Citations

Views

Altmetric

Email alerts

Astrophysics Data System

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only

Divergence-free solutions to electromagnetic forward and adjoint problems: a regularization approach