Elastic Full Waveform Inversion Based on the Trust Region Strategy

Show more

1. Introduction

The physical properties of the earth can be inverted quantitatively by the reflected seismic waves. In modern seismology, traveltime inversion is the main technique approach in the 19th century. Since the 1980s [1] [2], full waveform inversion (FWI) has become a rapidly developing subject because of its high resolution for reconstructing media parameters.

In FWI, the Fréchet derivative is required to estimate explicitly before proceeding to the inversion of the linearized system [3]. However, the computational cost of calculating Fréchet or gradient is too large. So some other methods have been taken into account. In [1] [2], the authors proposed to compute the gradient by solving an adjoint problem, which is considered as the initial of time-domain FWI. Since then, the FWI has been an important research topic in exploration geophysics. In the 1990s, the time-domain FWI was extended to the frequency domain [4] [5]. The frequency-domain FWI has the advantage of high computational efficiency and flexibility of data selection [6].

The FWI is actually an optimization iterative process to solve a nonlinear optimization problem. Generally, the computational cost of global optimization methods is extensive large. For this reason, local optimization algorithms are adopted. To avoid iteration falling to an unreasonable local minimum point, the multiscale method was proposed [7] in 1995. The multiscale method performs the inversion from the low frequency to high-frequency information step by step. It includes the long-wavelength information of the model and thus it enhances the robustness of inversion effectively. It can overcome the problem of local minima caused by the poor initial model selection. This technique has been successfully applied to the time-domain acoustic FWI, see e.g. [8] [9]. In order to reconstruct the long wavelength of the macroscopic velocity model, the Laplace-domain FWI is proposed [10]. A hybrid domain method combining the Laplace-domain method and the frequency-domain method together is also developed [11]. This method can be regarded as a special case of frequency-domain damping inversion.

In elastic FWI, there are three different parameters, *i.e*., density and two Lamé parameters. In [12], the authors pointed out that different parameters have coupled effects on the seismic response which is called the trade-off effect or crosstalk. In order to tame the trade-off effect, several hierarchical inversion strategies are developed. In [13], Lamé parameters are inverted with fixed density first and then velocities are updated. In [14], all parameters are inverted simultaneously. However, the overall steplength is estimated by calculating an optimal steplength for every parameter individually.

The Hessian matrix plays an important role in FWI. It contains the information of multiple scattering wavefields and the trade-off effects among different parameters. The off-diagonal blocks of Hessian reflect the relationship between different parameters [12]. The Newton-type method such as the quasi-Newton method, the Gauss-Newton method [15] and the truncated Newton method [16] all contain the Hessian information and so they behave better than the steepest descent method in accuracy. In the calculation of the gradient vector, the adjoint method is usually used [17]. The adjoint method can be extended to the truncated Newton method [18]. For the elastic wave equations, the wave propagator operator is symmetric. The back propagator operator and the forward propagator operator are identical. Besides that, the Gauss-Newton approximation of Hessian matrix is positive and definite. This property can be used in the trust region strategy. Our investigations in this paper show that the trust region method can accelerate the convergence rate and improve the inversion accuracy.

In this paper, we investigate the application of the trust region strategy in FWI. The rest of our paper is organized as follows. In Section 2, the theoretical methods are described in detail. In Section 3, numerical computations and comparisons are presented. Finally, in Section 4, the conclusion is given.

2. Theory

In this section, we will introduce the forward problem first. Then we will describe FWI based on the trust region method. In order to solve the subproblem in the trust region algorithm, the two dimensional subspace method is described.

2.1. Forward Method

The source-free time-domain two dimensional (2D) elastic wave equations can be written as [19]

$\rho \frac{{\partial}^{2}u}{\partial {t}^{2}}=\frac{\partial}{\partial x}\left[\lambda \left(\frac{\partial u}{\partial x}+\frac{\partial v}{\partial y}\right)+2\mu \frac{\partial u}{\partial x}\right]+\frac{\partial}{\partial y}\left[\mu \left(\frac{\partial u}{\partial y}+\frac{\partial v}{\partial x}\right)\right],$ (2.1)

$\rho \frac{{\partial}^{2}v}{\partial {t}^{2}}=\frac{\partial}{\partial y}\left[\lambda \left(\frac{\partial u}{\partial x}+\frac{\partial v}{\partial y}\right)+2\mu \frac{\partial v}{\partial y}\right]+\frac{\partial}{\partial x}\left[\mu \left(\frac{\partial u}{\partial y}+\frac{\partial v}{\partial x}\right)\right],$ (2.2)

where $\rho $ is the density, $\lambda $ and $\mu $ are the Lamé parameters, $u\left(x\mathrm{,}y\mathrm{,}t\right)$ and $v\left(x\mathrm{,}y\mathrm{,}t\right)$ are the displacement in the horizontal $x$ and vertical $y$ directions respectively. Furthermore, (2.1) and (2.2) can be rewritten as the following stress-velocity formulation after adding a source

$\{\begin{array}{l}\rho \frac{\partial {v}_{x}}{\partial t}=\frac{\partial {\sigma}_{xx}}{\partial x}+\frac{\partial {\sigma}_{xy}}{\partial y}+{f}_{x},\hfill \\ \rho \frac{\partial {v}_{y}}{\partial t}=\frac{\partial {\sigma}_{xy}}{\partial x}+\frac{\partial {\sigma}_{yy}}{\partial y}+{f}_{y},\hfill \\ \frac{\partial {\sigma}_{xx}}{\partial t}=\left(\lambda +2\mu \right)\frac{\partial {v}_{x}}{\partial x}+\lambda \frac{\partial {v}_{y}}{\partial y},\hfill \\ \frac{\partial {\sigma}_{yy}}{\partial t}=\lambda \frac{\partial {v}_{x}}{\partial x}+\left(\lambda +2\mu \right)\frac{\partial {v}_{y}}{\partial y},\hfill \\ \frac{\partial {\sigma}_{xy}}{\partial t}=\mu \left(\frac{\partial {v}_{x}}{\partial y}+\frac{\partial {v}_{y}}{\partial x}\right),\hfill \end{array}$ (2.3)

where
${v}_{x}$ and
${v}_{y}$ are the particle velocity components in the
$x$ and
$y$ directions respectively,
${\sigma}_{xx}$,
${\sigma}_{yy}$,
${\sigma}_{xy}$ are the stress tensor components. Here we have added the source, *i.e*., the body force
${f}_{x}$ and
${f}_{y}$ vector to
${v}_{x}$ and
${v}_{y}$ components respectively. We rewrite the system (2.3) in the following matrix form

$\rho {\partial}_{t}\underset{v}{\underset{\ufe38}{\left[\begin{array}{c}{v}_{x}\\ {v}_{y}\end{array}\right]}}=\underset{-{D}^{\u2020}}{\underset{\ufe38}{\left[\begin{array}{ccc}\partial x& \partial y& 0\\ 0& \partial x& \partial y\end{array}\right]}}\underset{\sigma}{\underset{\ufe38}{\left[\begin{array}{c}{\sigma}_{xx}\\ {\sigma}_{xy}\\ {\sigma}_{yy}\end{array}\right]}}+\underset{{f}_{v}}{\underset{\ufe38}{\left[\begin{array}{c}{f}_{x}\\ {f}_{y}\end{array}\right]\mathrm{,}}}$ (2.4)

${\partial}_{t}\underset{\sigma}{\underset{\ufe38}{\left[\begin{array}{c}{\sigma}_{xx}\\ {\sigma}_{xy}\\ {\sigma}_{yy}\end{array}\right]}}=\underset{C}{\underset{\ufe38}{\left[\begin{array}{ccc}\lambda +2\mu & 0& \lambda \\ 0& \mu & 0\\ \lambda & 0& \lambda +2\mu \end{array}\right]}}\underset{-D}{\underset{\ufe38}{\left[\begin{array}{cc}{\partial}_{x}& 0\\ {\partial}_{y}& {\partial}_{x}\\ 0& {\partial}_{y}\end{array}\right]}}\underset{v}{\underset{\ufe38}{\left[\begin{array}{c}{v}_{x}\\ {v}_{y}\end{array}\right]}}\mathrm{,}$ (2.5)

where ${\partial}_{t}$, ${\partial}_{x}$, ${\partial}_{y}$ denote $\frac{\partial}{\partial t}$, $\frac{\partial}{\partial x}$, $\frac{\partial}{\partial y}$. The symbol $\u2020$ denotes the adjoint

operator. Note that the determinant of the matrix *C* is positive. So from (2.4) and (2.5), we have

$\underset{A\left(m\right)}{\underset{\ufe38}{\left[\begin{array}{cc}\rho {\partial}_{t}& {D}^{\u2020}\\ D& {C}^{-1}{\partial}_{t}\end{array}\right]}}\underset{w}{\underset{\ufe38}{\left[\begin{array}{c}v\\ \sigma \end{array}\right]}}=\underset{s}{\underset{\ufe38}{\left[\begin{array}{c}{f}_{v}\\ 0\end{array}\right]\mathrm{.}}}$ (2.6)

The compact form of the forward problem is

$A\left(m\left(x\right)\right)w\left(x\mathrm{,}t\mathrm{;}{x}_{s}\right)=s\left(x\mathrm{,}t\mathrm{;}{x}_{s}\right)\mathrm{,}$ (2.7)

where $m=\left(\rho \mathrm{,}\lambda \mathrm{,}\mu \right)$ is the model parameters, $w=\left(v\mathrm{,}\sigma \right)$. In (2.7), the wave propagator $A\left(m\right)$ is symmetric. Given the media parameters $\rho $, $\lambda $ and $\mu $, the forward problem is to solve system (2.6) numerically.

We have many numerical methods to solve system (2.6), for example, the finite volume method [20] and the staggered-grid method [21] and so on. In this paper, we use the staggered-grid method for its high computational efficiency and relatively easy code implementation. Since the computational domain is finite, the absorbing boundary conditions [22] are required to absorb the boundary reflections. In this paper, we adopt the perfectly matched layer (PML) method [23] [24]. The forward discrete schemes for solving system (2.3) are given in Appendix A.

2.2. Full Waveform Inversion

The FWI is a data-fitting procedure to find the model parameter $m$ by minimizing the data misfit $\delta d$ between the simulated data ${w}^{cal}$ and the observed data ${w}^{obs}$. The objective function can be formulated as

$\chi \left(m\right)=\frac{1}{2}{\displaystyle \underset{s=1}{\overset{{N}_{s}}{\sum}}}\text{\hspace{0.17em}}{\displaystyle \underset{r=1}{\overset{{N}_{r}}{\sum}}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}{\left({w}^{obs}-{w}^{cal}\right)}^{2}\text{d}t\mathrm{,}$ (2.8)

where *T* is the total recording duration,
${N}_{s}$ and
${N}_{r}$ are the number of the sources and receivers respectively. Once the media parameters
$\rho $,
$\lambda $, and
$\mu $ are inverted, we can yield the P-wave velocity and S-wave velocity by

${v}_{p}=\sqrt{\frac{\lambda +2\mu}{\rho}},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{v}_{s}=\sqrt{\frac{\mu}{\rho}},$ (2.9)

respectively.

The misfit function $\chi \left(m\right)$ is minimized iteratively. The update procedure in FWI at $k$ -th iteration can be expressed as

${m}_{k}={m}_{k-1}+{\alpha}_{k}{p}_{k}\mathrm{,}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k=\mathrm{1,2,}\cdots \mathrm{,}$ (2.10)

where ${m}_{k+1}$ and ${m}_{k}$ are the model parameter at $k+1$ and $k$ iteration respectively, ${\alpha}^{k}$ denotes the step length at $k$ iteration, ${p}_{k}$ is the descent direction. The step length ${\alpha}_{k}$ must be found by using the line search technique for nonlinear inverse problem.

Neglecting the summation in (2.8) and introducing the restriction operator *R* at receiver points, then the gradient of the objective function with respect to the model parameter
$m$ is

${\nabla}_{m}\chi =\frac{\partial \chi}{\partial {m}_{i}}={\displaystyle {\int}_{0}^{T}}{\left(\frac{\partial w}{\partial {m}_{i}}\right)}^{\u2020}{R}^{\u2020}\left(Rw-{w}^{obs}\right)\text{d}t\mathrm{,}$ (2.11)

where $\u2020$ is the adjoint operator. Furthermore, the Hessian matrix can be obtained by taking derivative of (2.11) with respect to the model parameter

${H}_{ij}=\underset{{\left({H}_{1}\right)}_{ij}}{\underset{\ufe38}{{\displaystyle {\int}_{0}^{T}}{\left(R\frac{\partial w}{\partial {m}_{i}}\right)}^{\u2020}\left(R\frac{\partial w}{\partial {m}_{j}}\right)\text{d}t}}+\underset{{\left({H}_{2}\right)}_{ij}}{\underset{\ufe38}{{\displaystyle {\int}_{0}^{T}}{\left(\frac{{\partial}^{2}w}{\partial {m}_{i}\partial {m}_{j}}\right)}^{\u2020}{R}^{\u2020}\left(Rw-{d}_{obs}\right)\text{d}t}}\mathrm{.}$ (2.12)

The expression of Hessian *H* is divided into two parts, *i.e*.,
${H}_{1}$ and
${H}_{2}$. The first part
${H}_{1}$ is the Gauss-Newton approximation of Hessian [15], which just contains the 1st-order derivative of the wavefield. The second part
${H}_{2}$ contains the 2nd-order derivative of the wavefield. The full Newton method consists of both the parts
${H}_{1}$ and
${H}_{2}$ [15]. The truncated Newton is based on the computations of descent direction by the conjugated gradient algorithm in the full Newton framework. The BFGS method is an efficient algorithm to compute the inverse of Hessian approximation in the Gauss-Newton method. It is the first quasi-Newton algorithm and named for its discovers Broyden, Fletcher, Goldfarb and Shanno. To save storage memory further, the limited-memory BFGS (L-BFGS) algorithm is usually applied [25] [26] [27] in computations.

2.3. Gradient Calculation

The gradient of the objective function can be obtained by the Lagrangian formulation [17] [18] [28] or the perturbation theory [29]. Following the perturbation theory [29], the gradient of the objective function with respect to model parameters can be rewritten as

$\frac{\partial \chi}{\partial \rho}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\left(\frac{\partial {u}_{x}}{\partial t}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial t}+\frac{\partial {u}_{y}}{\partial t}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial t}\right)\text{d}t\mathrm{,}$ (2.13)

$\frac{\partial \chi}{\partial \lambda}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\left(\frac{\partial {u}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\right)\left(\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)\text{d}t,$ (2.14)

$\frac{\partial \chi}{\partial \mu}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\left\{\left(\frac{\partial {u}_{y}}{\partial x}+\frac{\partial {u}_{x}}{\partial y}\right)\left(\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)+2\left(\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)\right\}\text{d}t\mathrm{,}$ (2.15)

where ${u}_{x}:=u$ and ${u}_{y}:=v$ are the forward displacements in the $x$ and $y$ directions respectively, and ${\stackrel{\u02dc}{u}}_{x}$ and ${\stackrel{\u02dc}{u}}_{y}$ are the corresponding backward displacements in the $x$ and $y$ directions respectively. Equations (2.13)-(2.15) are the gradient expressions in terms of the displacements.

We solve the forward problem based on (2.3) instead of (2.1)-(2.2) by the staggered-grid method. So it is necessary to express the gradient (2.13)-(2.15) with the particle velocity and the stress tensor. The results are the following

$\frac{\partial \chi}{\partial \rho}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\left({v}_{x}{\stackrel{\u02dc}{v}}_{x}+{v}_{y}{\stackrel{\u02dc}{v}}_{y}\right)\text{d}t,$ (2.16)

$\frac{\partial \chi}{\partial \lambda}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\frac{\left({\sigma}_{xx}+{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}+{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{4{\left(\lambda +\mu \right)}^{2}}\text{d}t,$ (2.17)

$\begin{array}{c}\frac{\partial \chi}{\partial \mu}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\{\frac{{\sigma}_{xy}{\stackrel{\u02dc}{\sigma}}_{xy}}{{\mu}^{2}}+\frac{1}{4}[\frac{\left({\sigma}_{xx}+{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}+{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{{\left(\lambda +\mu \right)}^{2}}\\ \text{\hspace{0.17em}}\text{\hspace{0.05em}}+\frac{\left({\sigma}_{xx}-{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}-{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{{\mu}^{2}}]\}\text{d}t\mathrm{,}\end{array}$ (2.18)

where ${v}_{x}$, ${v}_{y}$, ${\sigma}_{xx}$, ${\sigma}_{yy}$, ${\sigma}_{xy}$ are the solutions of forward problem (2.3) whereas ${\stackrel{\u02dc}{v}}_{x}$, ${\stackrel{\u02dc}{v}}_{y}$, ${\stackrel{\u02dc}{\sigma}}_{xx}$, ${\stackrel{\u02dc}{\sigma}}_{yy}$ and ${\stackrel{\u02dc}{\sigma}}_{xy}$ are the solutions of corresponding adjoint problem. The derivation is given in Appendix B.

2.4. Trust Region Method

There are two strategies to solve a minimization problem. One is the line search method and the other is the trust region method. In order to describe the trust-region method more clearly and directly, we consider minimizing an abstract objection function
$f\left(x\right)$, *i.e*.,
$\underset{x}{\mathrm{min}}f\left(x\right)$. Then the line search strategy is

${x}_{k+1}={x}_{k}+{\alpha}_{k}{p}_{k}.$ (2.19)

The line search chooses a direction ${p}_{k}$ first and then finds the step length ${\alpha}_{k}$ along this direction. The direction may be the steepest descent direction, the Newton direction and so on. After fixing the search direction, the search length ${\alpha}_{k}$ is searched by approximately solving a one-dimensional minimization problem

$\underset{\alpha >0}{\mathrm{min}}f\left({x}_{k}+\alpha {p}_{k}\right).$ (2.20)

Usually, the inexact line search methods such as the Wolfe method and the interpolation method are used [26] since solving (2.20) exactly is too expensive. A popular inexact method is the Wolfe method [30]. In this paper, we use the Wolfe method.

In the trust region method, information gathered from objective function
$f\left(x\right)$ is used to construct a model function
${\stackrel{\u02dc}{f}}_{k}$ whose behavior near the current point
${x}_{k}$ is similar to that of the objective function
${f}_{k}$. Because
${\stackrel{\u02dc}{f}}_{k}$ may not be a good approximation of
$f$ when
$x$ is far from
${x}_{k}$, we restrict the search for a minimizer of
${\stackrel{\u02dc}{f}}_{k}$ to some region around
${x}_{k}$. Namely, we find the step *p* by approximately solving the following subproblem

$\begin{array}{l}\underset{p}{\mathrm{min}}\text{\hspace{0.17em}}{\stackrel{\u02dc}{f}}_{k}\left({x}_{k}+p\right)\mathrm{,}\\ \text{s}\text{.t}\mathrm{.}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{x}_{k}+p\in \text{the}\text{\hspace{0.17em}}\text{trust}\text{\hspace{0.17em}}\text{region}\end{array}$ (2.21)

If the solution of the problem (2.21) does not produce a sufficient decrease in $f\left(x\right)$, it means the trust region is too large, and we shrink it and resolve (2.21). Usually, the trust region is a ball defined by $\Vert {p}_{k}\Vert \le \Delta $, where $\Delta >0$ is called the trust region radius. Here the norm is ${\Vert \cdot \Vert}_{2}$ norm.

The model function ${\stackrel{\u02dc}{f}}_{k}$ in (2.21) is usually defined to be a quadratic function

${\stackrel{\u02dc}{f}}_{k}\left(p\right)={f}_{k}+\nabla {f}_{k}^{\text{T}}p+\frac{1}{2}{p}^{\text{T}}{B}_{k}p\mathrm{,}$ (2.22)

where ${f}_{k}$ and $\nabla {f}_{k}$ are the objective function and the gradient at ${x}_{k}$ respectively, and the matrix ${B}_{k}$ is either the Hessian ${\nabla}^{2}{f}_{k}$ or some approximation to it like the Gauss-Newton method.

So combining (2.21) and (2.22) together, the form of the trust region method is

${\stackrel{\u02dc}{f}}_{k}\left(p\right)={f}_{k}+\nabla {f}_{k}^{\text{T}}p+\frac{1}{2}{p}^{\text{T}}{B}_{k}p\mathrm{,}\text{\hspace{1em}}\text{s}\text{.t}\text{.}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\Vert p\Vert \text{\hspace{0.17em}}\le {\Delta}_{k}\mathrm{,}$ (2.23)

where ${\Delta}_{k}$ is the radius. This is the trust region subproblem and we solve it by the two dimensional subspace method [26]. Note that the two dimensional subspace method requires the Hessian is symmetric and positive.

In a sense, the line search method and the trust region method differ in that the line search method starts by fixing the direction ${p}_{k}$ and then searches the step length ${\alpha}_{k}$ whereas in trust region the direction ${p}_{k}$ and the search step ${\alpha}_{k}$ are determined together subject to the trust radius ${\Delta}_{k}$. Theoretically, the trust region is easier to reach quadric convergence than the line search method. More theoretical details can be found in the references (e.g., [31] [32] [33] ).

One key step in the trust region method is the choice of the trust region radius ${\Delta}_{k}$ at each iteration. The choice is based on the agreement between the objective function $f$ and the model function ${\stackrel{\u02dc}{f}}_{k}$ at previous iteration. Given a step ${p}_{k}$, we define the ratio ${\rho}_{k}$ as

${\rho}_{k}=\frac{f\left({x}_{k}\right)-f\left({x}_{k}+{p}_{k}\right)}{{\stackrel{\u02dc}{f}}_{k}\left(0\right)-{\stackrel{\u02dc}{f}}_{k}\left({p}_{k}\right)},$ (2.24)

where the numerator is the actual reduction, and the denominator is the predicted reduction.

If $\Vert {p}_{k}\Vert $ reaches the boundary of the trust region or the ratio ${\rho}_{k}$ is satisfactory enough, the radius is increased by a constant time:

${\Delta}_{k+1}={\alpha}_{1}{\Delta}_{k},\text{\hspace{1em}}{\alpha}_{1}>1.$ (2.25)

If the ratio ${\rho}_{k}$ is too small, which means that the model function ${\stackrel{\u02dc}{f}}_{k}$ is not a good approximation to the objective function $f$ under the radius, then we reduce the trust region radius:

${\Delta}_{k+1}={\alpha}_{2}{\Delta}_{k},\text{\hspace{1em}}0<{\alpha}_{2}<1.$ (2.26)

In this paper, we choose ${\alpha}_{1}=2$ and ${\alpha}_{2}=\frac{1}{4}$.

2.5. FWI Algorithms

In this section, we present the algorithm of the truncated Newton method with the line search strategy and the algorithm of the Gauss-Newton method with the trust-region strategy.

Algorithm 1 is the FWI algorithm by using the truncated Newton method based on the line search method. There are two loops in Algorithm 1. The outer loop is the iteration of the line search strategy, and the inner loop is to compute the iterative direction ${p}_{k}$ and the step length ${\alpha}_{k}$. The core of the inner loop is Hessian-vector product ${B}_{k}{d}_{j}$ and so on. Algorithm 2 is the FWI algorithm by using the Gauss-Newton method based on the trust region method. In Algorithm 2, ${\stackrel{\u02dc}{B}}_{k}$ is the Hessian approximation by the Gauss-Newton method. Comparing Algorithm 1 and Algorithm 2, we know the starting part of both algorithms is the same. The difference is in the rear part of the algorithm. In Algorithm 1, the line search strategy consists of the function of computing the step length and the iterative direction. On the contrary, in Algorithm 2, the trust region strategy combines the two steps together.

3. Numerical Computations

In this section, numerical computations are presented to illustrate the advantages of the trust region method. We also present the inversion results by the L-BFGS method for comparisons. We choose the L-BFGS method to make comparison as it is the most popular quasi-Newton method.

We consider FWI for the Marmousi model which is usually applied to test the ability of an imaging or inversion algorithm. The exact Marmousi model is shown in Figure 1. Figures 1(a)-(c) are the density, P-wave and S-wave velocities respectively. The model is 10 km in length along horizontal direction and 3.48 km

Figure 1. The exact Marmousi model. (a) Density; (b) P-wave velocity; (c) S-wave velocity.

in depth. It is grided by the 500 × 174 mesh with 20 m spatial steps in both $x$ and $y$ directions. We set 100 sources and 400 receivers in the data acquisition. The sources are located at 40 m in depth and the first source is set at $x=800\text{\hspace{0.17em}}\text{m}$ in the horizontal direction. The receivers are located at 460 m in depth with the first receiver is set at $x=800\text{\hspace{0.17em}}\text{m}$ in the horizontal direction. The source space is 40 m and the receiver space is 20 m. The source is the Ricker wavelet with the time function as

$s\left(t\right)=\left(1-\frac{1}{2}{\omega}^{2}{t}^{2}\right)\mathrm{exp}\left(-\frac{1}{4}{\omega}^{2}{t}^{2}\right),$ (3.1)

where $\omega =2\pi {f}_{0}$ and ${f}_{0}=7\text{\hspace{0.17em}}\text{Hz}$ is the central frequency. The source is loaded on the ${v}_{x}$ component. The recording time is 6 s with the time interval 2 ms.

For the elastic wave FWI, the density is more sensitive than the velocities. Some hierarchical inversion strategies developed by [13] and [14] can be applied. In our FWI computations, we don’t apply these strategies. We invert the three media parameters simultaneously. However, a multiscale technique by implementing the inversion from low frequency to high frequency is still used. Moreover, the data of the next high frequency band include the data of the previous low frequency band which guarantees the robustness of inversion. In our computations, four groups of frequencies are used to complete inversion step by step, *i.e*., 2 Hz, 5 Hz, 10 Hz and 20 Hz. For every stage, the stopping criterion is

$\frac{\Vert f\left({x}_{k}\right)-f\left({x}_{k-2}\right)\Vert}{\Vert f\left({x}_{k}\right)\Vert}\le \epsilon \mathrm{,}$ (3.2)

where $f\left({x}_{k}\right)$ is in fact the objective function computed by (2.8), and $\epsilon $ is set 0.01 in our computation. This stopping criterion is based on the relative decrease of the misfit function.

The initial model for FWI is shown in Figure 2. Figures 2(a)-(c) are the initial density, P-wave and S-wave velocities respectively. They are generated by arithmetic average smoothing for the exact model along the depth direction only. In Figure 2, we can see there are no geological structures such as the faults in the initial model like those in the exact model.

The final inversion result by the Gauss-Newton method with the line search strategy is shown in Figure 3. The final inversion result by the Gauss-Newton method with the two dimensional subspace strategy is shown in Figure 4. The final inversion result by the truncated Newton method with the line search strategy is shown in Figure 5. We also complete the FWI by the L-BFGS. The results are similar and we omit the figures for saving space. We remark that the Hessian in the truncated Newton method is not positive and so the two dimensional subspace strategy can not be applied directly. Comparing Figures 3-5 with Figure 1, we can see that both the Gauss-Newton method and the truncated Newton method with the line search strategy can yield good imaging results. And the case of the Gauss-Newton method with the two dimensional subspace strategy is the same.

Figure 6 shows the profile comparison at the position $x=5\text{\hspace{0.17em}}\text{km}$ among the

Figure 2. The initial model for FWI. (a) Density; (b) P-wave velocity; (c) S-wave velocity.

Figure 3. The finial inversion result of Marmousi model by the Gauss-Newton method with the line search strategy. (a) Density; (b) P-wave velocity; (c) S-wave velocity.

Figure 4. The finial inversion result of Marmousi model by the Gauss-Newton method with the two dimensional subspace strategy. (a) Density; (b) P-wave velocity; (c) S-wave velocity.

Figure 5. The inversion result of Marmousi model by the truncated Newton method with the line search strategy. (a) Density; (b) P-wave velocity; (c) S-wave velocity.

Figure 6. The comparison of inversion results at $x=5\text{\hspace{0.17em}}\text{km}$ inverted by the L-BFGS method, the Gauss-Newton method and the truncated Newton method with the line search strategy. (a) Density; (b) P-wave velocity; (c) S-wave velocity.

inversion results by using the L-BFGS method, the Gauss-Newton method and the truncated Newton method with the line search strategy. In Figure 6, we can see both the inversion results of P-wave velocity and S-wave velocity obtained by the three methods are almost the same. However, the inversion results of density by the Gauss-Newton method and the truncated Newton method are obviously better than the L-BFGS method especially in deep positions.

In Table 1, we present the iteration number at every stage (*i.e*., 2 Hz, 5 Hz, 10 Hz and 20 Hz) for the L-BFGS method, the Gauss-Newton (GN) method and the truncated Newton (TN) method by the line search (LS) strategy or the two dimensional subspace (Sub) strategy. From Table 1, we can see that the Gauss-Newton method has obvious advantage over other methods from point of total iteration number.

In Table 2, we compare the computational efficiency for different methods. The total computational time and the time of each iteration are listed in Table 2. The total iteration number is also listed for convenience. As we can see, the Gauss-Newton method with the two dimensional subspace method costs least computational time for each iteration.

In Figure 7, we present a profile comparison of the inversion results at $x=6\text{\hspace{0.17em}}\text{km}$ by the Gauss-Newton method with the line search strategy and the trust region strategy. Figures 7(a)-(c) are the inversion results of density, P-wave velocity and S-wave velocity respectively. We can see that both strategies almost have the same inversion accuracy for P-wave velocity and S-wave

Table 1. The number of iterations at each stage for the L-BFGS method, the Gauss-Newton (GN) method and the truncated Newton (TN) method with the line search (LS) strategy or the two dimensional subspace (Sub) strategy.

Table 2. The total computation time and the time of each iteration for different inversion methods with the line search (LS) strategy or the two dimension subspace (Sub) strategy.

Figure 7. The comparison of inversion results for the Marmousi model at $x=6\text{\hspace{0.17em}}\text{km}$ inverted by using the Gauss-Newton method with the line search strategy and the trust region strategy. (a) Density; (b) P-wave velocity; (b) S-wave velocity.

velocity. For the density, we can see the trust region strategy with the two dimensional subspace strategy has better inversion accuracy than the line search strategy. In Table 3, we present ${l}_{2}$ errors and relative ${l}_{2}$ errors for the different methods. The relative error $\epsilon $ is defined by

$\epsilon =\frac{{\Vert {m}^{exa}-{m}^{cal}\Vert}_{2}}{{\Vert {m}^{exa}\Vert}_{2}}\mathrm{,}$ (3.3)

where ${m}^{exa}$ is the exact model and ${m}^{cal}$ is the inversion result. The relative ${l}_{2}$ error for the truncated Newton method with the two dimensional subspace strategy is 7.28% and it is the least. This fact is consistent with the theory since the Newton method contains the full information of Hessian. For the Gauss-Newton method, we can see that the relative ${l}_{2}$ error by the two dimensional subspace strategy is 7.38% whereas it is 7.49% for the line search strategy. It shows that the two dimensional subspace strategy is helpful to improve the inversion accuracy. Figure 8 shows the absolute ${l}_{2}$ errors of inversion results versus iteration for the L-BFGS method, the Gauss-Newton method and the truncated Newton method.

Table 3. The ${l}_{2}$ errors and relative ${l}_{2}$ errors of final inversion results for different inversion methods with the line search (LS) strategy or the two dimension subspace (Sub) strategy.

Figure 8. The absolute ${l}_{2}$ errors of inversion results versus iteration for the L-BFGS, the Gauss-Newton method and the truncated Newton method.

4. Conclusions

In FWI, the line search strategy is usually applied. In this paper, we have investigated the application of the trust-region strategy to elastic wave multi-parameter inversion. The theoretical methods and algorithms are described in detail. The trust-region subproblem is solved by the two-dimensional subspace method. Numerical computations for Marmousi model by the Gauss-Newton method and the truncated Newton method in the time domain are completed and compared.

We also have presented the corresponding inversion result by the L-BFGS method for comparison since it is the most popular quasi-Newton method. The comparisons demonstrate that both the Gauss-Newton method and the truncated Newton method are more accurate than the L-BFGS method. The trust region strategy is more efficient than the line search strategy. In the line search strategy, the information of Hessian and gradient is utilized for deciding the search direction whereas the search direction and the search step are determined together in the trust region strategy. In the trust region method, as long as the trust-region radius is well updated, the model updating can be performed for every iteration with the fixed trial step and there are no more extra computations of the forward problem. This is the theoretical reason why the trust-region strategy can save computational time. In the trust-region strategy, the two-dimensional subspace method requires that the Hessian matrix is positive and definite. Unfortunately, this condition is not always satisfied for the truncated Newton method. Future work will be the other measures to update the trust-region radius and the application of the trust-region method to other methods.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under grant number 11471328. It is also supported by the Key Project National Natural Science Foundation under grant number 51739007. The computations in this paper are completed on the parallel cluster of State Key Laboratory of Scientific and Engineering Computing.

Appendix A. The Staggered-Grid Schemes for Solving (2.3)

Let
$\Delta t$ be the time step in the time direction and
$h$ the spatial step in
$x$ and
$y$ directions. The discrete indexes for
$x$,
$y$ and
$t$ are denoted by
$j$,
$i$ and*
$n$ * respectively. Then the difference schemes for
${v}_{x}$ and
${v}_{y}$ with the second order accuracy are

${v}_{x}^{n+1}\left(j\mathrm{,}i+\frac{1}{2}\right)={v}_{x}^{n}\left(j\mathrm{,}i+\frac{1}{2}\right)+\Delta t\frac{1}{\rho \left(j\mathrm{,}i+\frac{1}{2}\right)}{\stackrel{^}{v}}_{x}^{n}\left(j\mathrm{,}i+\frac{1}{2}\right)\mathrm{,}$ (A.1)

${v}_{y}^{n+1}\left(j+\frac{1}{2}\mathrm{,}i\right)={v}_{y}^{n}\left(j+\frac{1}{2}\mathrm{,}i\right)+\Delta t\frac{1}{\rho \left(j+\frac{1}{2}\mathrm{,}i\right)}{\stackrel{^}{v}}_{y}^{n}\left(j+\frac{1}{2}\mathrm{,}i\right)\mathrm{,}$ (A.2)

where

${\stackrel{^}{v}}_{x}^{n}\left(j\mathrm{,}i+\frac{1}{2}\right)=\frac{{\sigma}_{xx}\left(j\mathrm{,}i+1\right)-{\sigma}_{xx}\left(j\mathrm{,}i\right)}{h}+\frac{{\sigma}_{xy}\left(j+\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)-{\sigma}_{xy}\left(j-\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)}{h}\mathrm{,}$

${\stackrel{^}{v}}_{y}^{n}\left(j+\frac{1}{2}\mathrm{,}i\right)=\frac{{\sigma}_{xy}\left(j+\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)-{\sigma}_{xy}\left(j+\frac{1}{2}\mathrm{,}i-\frac{1}{2}\right)}{h}+\frac{{\sigma}_{yy}\left(j+\mathrm{1,}i\right)-{\sigma}_{yy}\left(j\mathrm{,}i\right)}{h}\mathrm{.}$

The difference schemes for strain ${\sigma}_{xx}$, ${\sigma}_{xy}$ and ${\sigma}_{yy}$ with the second order accuracy are the following

$\begin{array}{c}{\sigma}_{xy}^{n+1}\left(j+\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)={\sigma}_{xy}^{n}\left(j+\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)+\Delta t\mu \left(j+\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.05em}}\times \left({\stackrel{^}{v}}_{xy}^{n}\left(j+\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)+{\stackrel{^}{v}}_{yx}^{n}\left(j+\frac{1}{2}\mathrm{,}i+\frac{1}{2}\right)\right)\mathrm{,}\end{array}$ (A.3)

${\sigma}_{xx}^{n+1}\left(j,i\right)={\sigma}_{xx}^{n}\left(j,i\right)+\Delta t\lambda \left(j,i\right)\left({\stackrel{^}{v}}_{xx}^{n}\left(j,i\right)+{\stackrel{^}{v}}_{yy}^{n}\left(j,i\right)\right)+2\Delta t\mu \left(j,i\right){\stackrel{^}{v}}_{xx}^{n}\left(j,i\right),$ (A.4)

${\sigma}_{yy}^{n+1}\left(j,i\right)={\sigma}_{yy}^{n}\left(j,i\right)+\Delta t\lambda \left(j,i\right)\left({v}_{xx}^{n}\left(j,i\right)+{\stackrel{^}{v}}_{yy}^{n}\left(j,i\right)\right)+2\Delta t\mu \left(j,i\right){\stackrel{^}{v}}_{yy}^{n}\left(j,i\right),$ (A.5)

where

$\begin{array}{l}{\stackrel{^}{v}}_{xx}^{n}\left(j,i\right)=\frac{{v}_{x}\left(j,i+\frac{1}{2}\right)-{v}_{x}\left(j,i-\frac{1}{2}\right)}{h},\\ {\stackrel{^}{v}}_{yy}^{n}\left(j,i\right)=\frac{{v}_{y}\left(j+\frac{1}{2},i\right)-{v}_{y}\left(j-\frac{1}{2},i\right)}{h},\\ {\stackrel{^}{v}}_{yx}^{n}\left(j+\frac{1}{2},i+\frac{1}{2}\right)=\frac{{v}_{y}\left(j+\frac{1}{2},i+1\right)-{v}_{y}\left(j+\frac{1}{2},i\right)}{h},\\ {\stackrel{^}{v}}_{xy}^{n}\left(j+\frac{1}{2},i+\frac{1}{2}\right)=\frac{{v}_{x}\left(j+1,i+\frac{1}{2}\right)-{v}_{x}\left(j,i+\frac{1}{2}\right)}{h}.\end{array}$

Appendix B. The Derivation for Gradient Formulae (2.16)-(2.18)

Note that

${v}_{x}=\frac{\partial {u}_{x}}{\partial t},\text{\hspace{0.17em}}{v}_{y}=\frac{\partial {u}_{y}}{\partial t},\text{\hspace{0.17em}}{\stackrel{\u02dc}{v}}_{x}=\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial t},\text{\hspace{0.17em}}{\stackrel{\u02dc}{v}}_{y}=\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial t}$. (B.1)

Substituting (B.1) into (2.13), we have

$\frac{\partial \chi}{\partial \rho}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\left(\frac{\partial {v}_{x}}{\partial t}{\stackrel{\u02dc}{u}}_{x}+\frac{\partial {v}_{y}}{\partial t}{\stackrel{\u02dc}{u}}_{y}\right)\text{d}t,$ (B.2)

which is just (2.16).

From the relation of strain and stress [19], we have

${\sigma}_{xx}=\lambda \left(\frac{\partial {u}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\right)+2\mu \frac{\partial {u}_{x}}{\partial x},$ (B.3)

${\sigma}_{yy}=\lambda \left(\frac{\partial {u}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\right)+2\mu \frac{\partial {u}_{y}}{\partial y},$ (B.4)

where ${u}_{x}$ and ${u}_{y}$ are the displacement in the $x$ and $y$ directions respectively. Adding (B.3) and (B.4) together, we have

$\left({\sigma}_{xx}+{\sigma}_{yy}\right)=2\left(\lambda +\mu \right)\left(\frac{\partial {u}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\right),$ (B.5)

that is

$\frac{\partial {u}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}=\frac{{\sigma}_{xx}+{\sigma}_{yy}}{2\left(\lambda +\mu \right)}.$ (B.6)

Similarly, for the backward wavefield ${\stackrel{\u02dc}{u}}_{x}$ and ${\stackrel{\u02dc}{u}}_{y}$, we have

$\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}=\frac{{\stackrel{\u02dc}{\sigma}}_{xx}+{\stackrel{\u02dc}{\sigma}}_{yy}}{2\left(\lambda +\mu \right)}\mathrm{,}$ (B.7)

where ${\stackrel{\u02dc}{u}}_{x}$, ${\stackrel{\u02dc}{u}}_{y}$, ${\stackrel{\u02dc}{\sigma}}_{xx}$ and ${\stackrel{\u02dc}{\sigma}}_{yy}$ are the backward wavefield of the adjoint problem. Substituting (B.6) and (B.7) into (2.14), we get

$\frac{\partial \chi}{\partial \lambda}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\frac{\left({\sigma}_{xx}+{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}+{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{4{\left(\lambda +\mu \right)}^{2}}\text{d}t,$ (B.8)

which is just (2.17).

Since

${\sigma}_{xy}=\mu \left(\frac{\partial {u}_{x}}{\partial y}+\frac{\partial {u}_{y}}{\partial x}\right)\mathrm{,}$ (B.9)

we have

$\frac{\partial {u}_{x}}{\partial y}+\frac{\partial {u}_{y}}{\partial x}=\frac{{\sigma}_{xy}}{\mu},$ (B.10)

and

$\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial y}+\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial x}=\frac{{\stackrel{\u02dc}{\sigma}}_{xy}}{\mu},$ (B.11)

where ${\stackrel{\u02dc}{\sigma}}_{xy}$ is the backward stress field corresponding to ${\sigma}_{xy}$. Multiplying (B.10) and (B.11), we obtain

$\left(\frac{\partial {u}_{y}}{\partial x}+\frac{\partial {u}_{x}}{\partial y}\right)\left(\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)=\frac{{\sigma}_{xy}{\stackrel{\u02dc}{\sigma}}_{xy}}{{\mu}^{2}}\mathrm{.}$ (B.12)

Subtracting (B.3) and (B.4), we have

${\sigma}_{xx}-{\sigma}_{yy}=2\mu \left(\frac{\partial {u}_{x}}{\partial x}-\frac{\partial {u}_{y}}{\partial y}\right)\mathrm{,}$ (B.13)

that is

$\frac{\partial {u}_{x}}{\partial x}-\frac{\partial {u}_{y}}{\partial y}=\frac{{\sigma}_{xx}-{\sigma}_{yy}}{2\mu}.$ (B.14)

Similarly, we have

$\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}-\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}=\frac{{\stackrel{\u02dc}{\sigma}}_{xx}-{\stackrel{\u02dc}{\sigma}}_{yy}}{2\mu}.$ (B.15)

Multiplying (B.14) and (B.15), we have

$\begin{array}{c}\left(\frac{\partial {u}_{x}}{\partial x}-\frac{\partial {u}_{y}}{\partial y}\right)\left(\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}-\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)=\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}-\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}-\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\\ =\frac{\left({\sigma}_{xx}-{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}-{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{4{\mu}^{2}}.\end{array}$ (B.16)

Multiplying (B.6) and (B.7), we have

$\begin{array}{c}\left(\frac{\partial {u}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\right)\left(\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)=\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\\ =\frac{\left({\sigma}_{xx}+{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}+{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{4{\left(\lambda +\mu \right)}^{2}}.\end{array}$ (B.17)

Adding (B.16) and (B.17) together, we have

$\begin{array}{l}\left(\frac{\partial {u}_{x}}{\partial x}-\frac{\partial {u}_{y}}{\partial y}\right)\left(\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}-\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)+\left(\frac{\partial {u}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\right)\left(\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)\\ =\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}-\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}-\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\\ =2\left(\frac{\partial {u}_{x}}{\partial x}\frac{\partial {\stackrel{\u02dc}{u}}_{x}}{\partial x}+\frac{\partial {u}_{y}}{\partial y}\frac{\partial {\stackrel{\u02dc}{u}}_{y}}{\partial y}\right)\\ =\frac{\left({\sigma}_{xx}+{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}+{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{4{\left(\lambda +\mu \right)}^{2}}+\frac{\left({\sigma}_{xx}-{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}-{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{4{\mu}^{2}}.\end{array}$ (B.18)

Substituting (B.12) and (B.18) into (2.15), we have

$\begin{array}{c}\frac{\partial \chi}{\partial \mu}=-{\displaystyle \underset{{x}_{s}}{\sum}}\text{\hspace{0.17em}}{\displaystyle {\int}_{0}^{T}}\{\frac{{\sigma}_{xy}{\stackrel{\u02dc}{\sigma}}_{xy}}{{\mu}^{2}}+\frac{1}{4}[\frac{\left({\sigma}_{xx}+{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}+{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{{\left(\lambda +\mu \right)}^{2}}\\ \text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}+\frac{\left({\sigma}_{xx}-{\sigma}_{yy}\right)\left({\stackrel{\u02dc}{\sigma}}_{xx}-{\stackrel{\u02dc}{\sigma}}_{yy}\right)}{{\mu}^{2}}]\}\text{d}t,\end{array}$ (B.19)

which is just (2.18).

References

[1] Tarantola, A. (1984) Inversion of Seismic Reflection Data in the Acoustic Approximation. Geophysics, 49, 1259-1266.

https://doi.org/10.1190/1.1441754

[2] Lailly, P. (1983) The Seismic Inverse Problem as a Sequence of before Stack Migrations. In: Bednar, J.B., Robinson, E. and Weglein, A., Eds., Conference on Inverse Scattering—Theory and Application, SIAM, Philadelphia, 206-220.

[3] Woodhouse, J.H. and Dziewonski, A.M. (1984) Mapping the Upper Mantle: Three-Dimensional Modeling of Earth Structure by Inversion of Seismic Waveforms. Journal of Geophysical Research: Solid Earth, 89, 5953-5986.

https://doi.org/10.1029/JB089iB07p05953

[4] Pratt, R.G. (1999) Seismic Waveform Inversion in the Frequency Domain, Part 1: Theory and Verification in a Physical Scale Model. Geophysics, 64, 888-901.

https://doi.org/10.1190/1.1444597

[5] Pratt, R.G. and Worthington, M.H. (1990) Inverse Theory Applied to Multi-Source Cross-Hole Tomography, Part 1: Acoustic Wave-Equation Method. Geophysical Prospecting, 38, 287-310.

https://doi.org/10.1111/j.1365-2478.1990.tb01846.x

[6] Sirgue, L. and Pratt, R.G. (2004) Efficient Waveform Inversion and Imaging: A Strategy for Selecting Temporal Frequencies. Geophysics, 69, 231-248.

https://doi.org/10.1190/1.1649391

[7] Bunks, C., Saleck, F.M., Zaleski, S. and Chavent, G. (1995) Multiscale Seismic Waveform Inversion. Geophysics, 60, 1457-1473.

https://doi.org/10.1190/1.1443880

[8] Zhang, W. and Luo, J. (2013) Full-Waveform Velocity Inversion Based on the Acoustic Wave Equation. American Journal of Computational Mathematics, 3, 13-20.

https://doi.org/10.4236/ajcm.2013.33B003

[9] Zhang, W. and Joardar, A. (2018) Acoustic Based Crosshole Full Waveform Slowness Inversion in the Time Domain. Journal of Applied Mathematics and Physics, 6, 1086-1110.

https://doi.org/10.4236/jamp.2018.65094

[10] Shin, C. and Cha, Y.H. (2008) Waveform Inversion in the Laplace Domain. Geophysical Journal International, 173, 922-931.

https://doi.org/10.1111/j.1365-246X.2008.03768.x

[11] Kim, Y., Shin, C., Calandra, H. and Min, D.J. (2013) An Algorithm for 3D Acoustic Time-Laplace-Fourier-Domain Hybrid Full Waveform Inversion. Geophysics, 78, R151-R166.

https://doi.org/10.1190/geo2012-0155.1

[12] Operto, S., Gholami, Y., Prieux, V., Ribodetti, A., Brossier, R., Metivier, L. and Virieux, J. (2013) A Guided Tour of Multiparameter Full-Waveform Inversion with Multicomponent Data: From Theory to Practice. The Leading Edge, 32, 1040-1054.

https://doi.org/10.1190/tle32091040.1

[13] Jeong, W., Lee, H.Y. and Min, D.J. (2012) Full Waveform Inversion Strategy for Density in the Frequency Domain. Geophysical Journal International, 188, 1221-1242.

https://doi.org/10.1111/j.1365-246X.2011.05314.x

[14] Xu, K. and McMechan, G.A. (2014) 2D Frequency-Domain Elastic Full-Waveform Inversion Using Time-Domain Modeling and a Multistep-Length Gradient Approach. Geophysics, 79, R41-R53.

https://doi.org/10.1190/geo2013-0134.1

[15] Pratt, R.G., Shin, C. and Hick, G. (1998) Gauss-Newton and Full Newton Methods in Frequency—Space Seismic Waveform Inversion. Geophysical Journal International, 133, 341-362.

https://doi.org/10.1046/j.1365-246X.1998.00498.x

[16] Yang, P., Brossier, R., M’etivier, L., Virieux, J. and Zhou, W. (2018) A Time-Domain Preconditioned Truncated Newton Approach to Visco-Acoustic Multiparameter Full Waveform Inversion. SIAM Journal on Scientific Computing, 40, B1101-B1130.

https://doi.org/10.1137/17M1126126

[17] Plessix, R.E. (2006) A Review of the Adjoint-State Method for Computing the Gradient of a Functional with Geophysical Applications. Geophysical Journal International, 167, 495-503.

https://doi.org/10.1111/j.1365-246X.2006.02978.x

[18] Métivier, L., Bretaudeau, F., Brossier, R., Operto, S. and Virieux, J. (2014) Full Waveform Inversion and the Truncated Newton Method: Quantitative Imaging of Complex Subsurface Structures. Geophysical Prospecting, 62, 1353-1375.

https://doi.org/10.1111/1365-2478.12136

[19] Aki, K. and Richards, P.G. (1980) Quantitative Seismology: Theory and Methods. Freeman, San Francisco.

[20] Zhang, W., Zhuang, Y. and Zhang, L. (2017) A New High-Order Finite Volume Method for 3D Elastic Wave Simulation on Unstructured Meshes. Journal of Computational Physics, 340, 534-555.

https://doi.org/10.1016/j.jcp.2017.03.050

[21] Virieux, J. (1986) P-SV-Wave Propagation in Heterogeneous Media: Velocitystress Finite-Difference Method. Geophysics, 51, 889-901.

https://doi.org/10.1190/1.1442147

[22] Clayton, R. and Engquist, B. (1977) Absorbing Boundary Conditions for Acoustic and Elastic Wave Equations. Bulletin of the Seismological Society of America, 67, 1529-1540.

https://doi.org/10.1785/BSSA0670061529

[23] Berenger, J.P. (1994) A Perfectly Matched Layer for the Absorption of Electromagnetic Waves. Journal of Computational Physics, 114, 185-200.

https://doi.org/10.1006/jcph.1994.1159

[24] Komatitsch, D. and Tromp, J. (2003) A Perfectly Matched Layer Absorbing Boundary Condition for the Second-Order Seismic Wave Equation. Geophysical Journal International, 154, 146-153.

https://doi.org/10.1046/j.1365-246X.2003.01950.x

[25] Nocedal, J. (1980) Updating Quasi-Newton Matrices with Limited Storage. Mathematics of Computation, 35, 773-782.

https://doi.org/10.1090/S0025-5718-1980-0572855-7

[26] Nocedal, J. and Wright, S.J. (2006) Numerical Optimization. Springer Science & Business Media, New York.

[27] Byrd, R.H., Lu, P., Nocedal, J. and Zhu, C. (1995) A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing, 16, 1190-1208.

https://doi.org/10.1137/0916069

[28] Métivier, L., Brossier, R., Operto, S. and Virieux, J. (2017) Full Waveform Inversion and the Truncated Newton Method. SIAM Review, 59, 153-195.

https://doi.org/10.1137/16M1093239

[29] Albert, T. (2004) Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, Philadelphia.

[30] Philip, W. (1969) Convergence Conditions for Ascent Methods. SIAM Review, 11, 226-235.

https://doi.org/10.1137/1011036

[31] David, M.G. (1981) Computing Optimal Locally Constrained Steps. SIAM Journal on Scientific and Statistical Computing, 2, 186-197.

https://doi.org/10.1137/0902016

[32] Sorensen, D.C. (1982) Newton’s Method with a Model Trust Region Modification. SIAM Journal on Numerical Analysis, 19, 409-426.

https://doi.org/10.1137/0719026

[33] Moré, J.J. and Sorensen, D.C. (1983) Computing a Trust Region Step. SIAM Journal on Scientific and Statistical Computing, 4, 553-572.

https://doi.org/10.1137/0904038