Acoustic, electromagnetic and seismic waves are routinely used to probe the media through which they propagate, and especially to image the spatially-varying velocity field. A fundamental property of these waves that is commonly exploited is their travel time T, defined as the time between the generation of a wave at its source and its detection by a distant observer. In many cases, travel times can be computed under the ray approximation of the exact wave equation, which is valid at high frequencies, when the scale length of heterogeneities in the medium is much larger than the wavelength of the waves. Since the 1970s, the simplicity of ray calculations has underpinned the use of travel time tomography in a variety of disciplines, including seismology, oceanography, petroleum exploration, geotechnical engineering and cosmology. In some disciplines, ray-based tomography is being superseded by full wavefield methods; nevertheless, it remains an important part of a tomographer’s toolbox on account of its computational efficiency.
Over the last several decades, the development of the so-called adjoint state method has allowed tomographic imaging to be applied in cases where it was hitherto infeasible, because of vastly reduced computational effort. To date, this efficiency has mainly been used to enable computationally-intensive forms of tomography, and especially full wavefield tomography. Nevertheless, adjoint state methodology is very widely applicable. It has the potential for significantly speeding up even computationally-light problems, including ray-based tomography. The feasibility of using adjoint state methods in this form of tomography was first investigated by , who demonstrate its effectiveness. In this paper, we further explore its application. We study the mathematical structure of the differential equation that arises out of the adjoint state method (the equation for the so-called adjoint field) and show that it is very closely related to, and in important cases identical to, the transport equation of ray theory. This relationship provides an intuitive understanding of the adjoint field and suggests ways of obtaining further computational efficiency.
Our analysis is divided into four sections: first, we review how the adjoint state method is used to streamline the computation of a critical quantity needed to perform tomography; second, we review the concept of the geometrical spreading of rays and its connection to the transport equation; third, we use the adjoint state method to derive and solve the differential equation for the adjoint field; and lastly, we show that the adjoint equation is very closely related to the transport equation and that its solution can be trivially constructed when the solution to the transport equation (the geometrical spreading function) is known.
2. The Adjoint State Method for Computing the Error Derivative
The main purpose of this section is to define the error derivative, discuss its usefulness and review how the adjoint state method is used to compute it, in the special case where the unknown image is linked to the observed data via the source term in a linear differential equation.
Many types of tomography involve a set of observations , each associated with a spatial position , which are related to an unknown image function by the possibly-nonlinear map . Here, are real spatial coordinates and denotes transpose. Usually, the data, spatial coordinates and image function are presumed to be real. Because no finite number of observations can define a continuous function, the image is usually approximated with a finite number of parameters; that is, as with . For example, the image might be divided into voxels, each with a value . A common approach to image reconstruction is to define individual errors, and total error and then to find the that minimizes ( ; see also   ). Here, is an empirically-chosen constant and is a linear operator that embodies prior information, such as smoothness. Among the many optimization procedures put forward for solving this problem, several commonly used ones, based on the linearized least-squares method (e.g.  ), employ the partial derivative to compute the data perturbation associated with the image perturbation via . Alternatively, other commonly used procedures, based on the gradient-descent method (e.g. ), use the partial derivative to compute the error perturbation associated with the image perturbation via .
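The link between the two derivatives can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, a linear map with a hypothetical matrix `G` as the data derivative, individual errors defined as observed minus predicted data, and a total error equal to the sum of squared errors with the prior-information term omitted. None of these names come from the text.

```python
import numpy as np

def total_error(m, G, d_obs):
    """Sum of squared errors for an assumed linear map d = G m (no prior term)."""
    e = d_obs - G @ m
    return float(e @ e)

def error_derivative(m, G, d_obs):
    """Chain rule: dE/dm_j = -2 * sum_i e_i * (dd_i/dm_j), with dd_i/dm_j = G[i, j]."""
    e = d_obs - G @ m
    return -2.0 * G.T @ e

rng = np.random.default_rng(0)
G = rng.normal(size=(5, 3))      # hypothetical data derivative (5 data, 3 parameters)
d_obs = rng.normal(size=5)       # hypothetical observations
m = rng.normal(size=3)           # current image parameters

grad = error_derivative(m, G, d_obs)

# Independent check of the error derivative by central finite differences.
h = 1e-6
fd = np.array([(total_error(m + h * dm, G, d_obs) - total_error(m - h * dm, G, d_obs)) / (2 * h)
               for dm in np.eye(3)])

# One small gradient-descent step should lower the total error.
E_before = total_error(m, G, d_obs)
E_after = total_error(m - 1e-3 * grad, G, d_obs)
```

In this sketch the gradient-descent step needs only `grad`, whereas a linearized least-squares update would need the full matrix `G`.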
The error derivative can be computed from the data derivative :
However, recent advances in tomography have followed the realization that often can be computed without first computing , giving gradient-descent methods a tremendous computational advantage over least-squares methods. The underlying idea of these adjoint state methods is to promote the error to a field , with the assumption that the data have been measured everywhere, so that and:
Here is the inner product over spatial coordinates. Now, consider the simple case in which the data solves the linear differential equation (together with some appropriate boundary condition). Here, the image is the source term in the differential equation. By differentiating the differential equation, we obtain an expression for :
The partial derivative of total error is computed by differentiating to yield and inserting into (2):
Here denotes adjoint and is an adjoint field that satisfies the so-called adjoint equation (with appropriate boundary conditions). Note that the error plays the role of the source term in the adjoint equation.
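The saving can be seen in a small discrete sketch. The setup is assumed for illustration only: a one-dimensional grid, a hypothetical non-symmetric operator matrix `L_op` (second difference plus first difference, with zero boundary values), boxcar "voxel" basis functions in `Phi`, and a total error equal to the grid-weighted sum of squared errors with error defined as observed minus predicted. With a uniform grid weight, the discrete adjoint of `L_op` is its transpose.

```python
import numpy as np

n, n_vox = 200, 8
dx = 1.0 / (n + 1)
off = np.ones(n - 1)

# Hypothetical linear differential operator: -d2/dx2 + d/dx with zero boundary values.
L_op = (2.0 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)) / dx**2 \
     + (np.diag(off, 1) - np.diag(off, -1)) / (2.0 * dx)

# Boxcar basis functions: column j is 1 inside voxel j and 0 elsewhere.
Phi = np.kron(np.eye(n_vox), np.ones((n // n_vox, 1)))

m_true = np.linspace(1.0, 2.0, n_vox)          # image used to synthesize "observations"
m = np.ones(n_vox)                             # current image estimate
d_obs = np.linalg.solve(L_op, Phi @ m_true)
d = np.linalg.solve(L_op, Phi @ m)             # data solve L d = (source built from m)
e = d_obs - d                                  # error field, known everywhere

# Direct route: one solve per parameter to build the data derivative, then chain rule.
G = np.linalg.solve(L_op, Phi)                 # n_vox solves
grad_direct = -2.0 * dx * (G.T @ e)

# Adjoint route: a single solve of the adjoint equation, with the error as source.
lam = np.linalg.solve(L_op.T, e)               # adjoint field
grad_adjoint = -2.0 * dx * (Phi.T @ lam)       # inner products with the basis functions
```

Both routes give the same error derivative, but the adjoint route replaces `n_vox` solves with one.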
Cases in which the error is known everywhere are uncommon, since they imply observations within the medium, rather than only on its boundary. More typical are the cases where the error is known only at discrete points on the boundary. These cases are handled by writing defining a partial adjoint field and error density :
Here, is the Dirac impulse function. The resulting equation is then solved only for those points at which the error is known and the adjoint field is taken to be the sum of the . This procedure is equivalent to solving the original adjoint equation with error:
3. The Transport Equation of Ray Theory
The main purpose of this section is to review the geometrical interpretation of the transport equation and to highlight its link to ray divergence. However, in order to provide some background for readers unfamiliar with ray theory, and to establish nomenclature, we also present an abridged derivation of the equation.
In many cases, the imaging problem involves a field that is a function of time t as well as spatial coordinates and that satisfies a wave equation of the form . Here, the differential operator contains only spatial derivatives and depends on parameters . The equation reduces to the spatial equation after Fourier transformation of time t to angular frequency , where denotes a transformed variable. The ray approximation is the solution to this equation in the limit , and is achieved by postulating that the solution can be written as a Laurent series of the form  :
Here i is the imaginary unit. The travel time function represents the time needed for a fluctuation in u to propagate from to , and represents its amplitude. The details of the ray solution depend on ; we consider the simple (and common) case , where is a slowness function; that is, a material property that is inversely proportional to the local propagation velocity. Inserting (4) into the differential equation and equating equal powers of lead to the Eikonal equation for :
and a sequence of equations for , the lowest order of which is the transport equation  :
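For orientation, the standard form of this expansion can be written out in one common notation. The symbols here are assumptions and need not match those of Equations (4) to (6): the frequency-domain operator is taken to be $\nabla^2 + \omega^2 s^2$, with $s$ the slowness, $T$ the travel time and $A_n$ the amplitude coefficients.

```latex
% Assumed frequency-domain equation: (\nabla^2 + \omega^2 s^2) u = 0 away from the source
\begin{align}
u(\mathbf{x},\omega) &= e^{i\omega T(\mathbf{x})} \sum_{n=0}^{\infty} \frac{A_n(\mathbf{x})}{(i\omega)^n}
  && \text{(ray series)} \\
\nabla T\cdot\nabla T &= s^2
  && \text{(terms of order } \omega^2\text{: Eikonal equation)} \\
2\,\nabla A_0\cdot\nabla T + A_0\,\nabla^2 T &= 0
  && \text{(terms of order } \omega^1\text{: transport equation)} \\
\nabla\cdot\left(A_0^2\,\nabla T\right) &= 0
  && \text{(equivalent form wherever } A_0 \neq 0\text{)}
\end{align}
```

The last line follows because $\nabla\cdot(A_0^2\nabla T) = A_0\,(2\nabla A_0\cdot\nabla T + A_0\nabla^2 T)$. This divergence form is the one to which Gauss’s theorem can be applied along a ray tube.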
The unit normal to a surface of equal travel time is . A sequence of these vectors connecting surfaces of increasing travel times defines a ray; that is, a parametric curve with arclength and tangent (Figure 1(A)). The volume enclosed by a group of rays is called a ray tube. The Eikonal equation, written as two coupled first order equations in and is:
with boundary conditions
The ray’s starting point is and its take-off direction is . Travel time is then the path integral of the slowness along the ray, as can be seen by manipulating the formula for the directional derivative :
The transport equation, written in terms of , is:
Figure 1. (A) Basic ray theory nomenclature. A wave propagates outward from a source at (black circle), through the medium, to the surface (with normal ). Surfaces of equal travel time (wave fronts, grey curves) are labeled with their travel times , , etc. Normals to wave fronts define rays (blue curves) with tangents . Neighboring rays enclosing a solid angle at the source define a ray tube. (B) Relationship between ray tangents and ray tube cross-sectional area S. Gauss’s theorem is applied to a small volume V along the ray tube, with the shape of a section of a cone, whose cross-sectional area S changes with arc-length and whose volume is . The tangent is parallel to the sides of the section and normal to its ends. See text for further discussion.
The quantity has a simple geometric interpretation, as can be seen by applying Gauss’s theorem (e.g. ) to a volume V along a ray tube, which has the shape of a section of a cone (Figure 1(B)). The cross-sectional area of the ray tube increases from S on the end nearest to the source, to at a distance further away. For small volumes, the integral in Gauss’s theorem is where . The surface integral in Gauss’s theorem has contributions only from the two ends of the cone, of and respectively, which sum to dS. Consequently, Gauss’s theorem implies and the transport equation becomes:
According to the transport equation, the fractional decrease in , measured along a ray, is equal to the fractional increase in area S of the ray tube. In many cases, the quantity has the interpretation of the energy density, so the transport equation embodies conservation of energy. Conventionally, the area of the ray tube is written , where is the geometrical spreading function and is the solid angle subtended by the ray tube at the source (e.g. ). Consequently, where c is a constant. Ray-tracing algorithms that solve (9) typically tabulate both T and R (e.g.  ).
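A minimal ray-tracing sketch shows how travel time and a spreading estimate can be tabulated together. It is a two-dimensional illustration under stated assumptions, not the algorithm of any cited code. The ray equations are taken in the common form dx/dℓ = p/s, dp/dℓ = ∇s, dT/dℓ = s, with p the slowness vector. The functions `trace_ray` and `tube_width_per_angle` are hypothetical names. Spreading is estimated from two neighboring rays compared at equal arc length, which equals comparison at equal travel time only in a homogeneous medium. In two dimensions the tube width per unit take-off angle plays the role that R² plays in three dimensions.

```python
import numpy as np

def trace_ray(x0, angle, slowness, grad_slowness, length, n_steps=1000):
    """Integrate dx/dl = p/s, dp/dl = grad(s), dT/dl = s with midpoint steps."""
    x = np.array(x0, dtype=float)
    p = slowness(x) * np.array([np.cos(angle), np.sin(angle)])  # slowness vector
    T = 0.0
    dl = length / n_steps
    for _ in range(n_steps):
        s = slowness(x)
        x_mid = x + 0.5 * dl * p / s                 # half step to the midpoint
        p_mid = p + 0.5 * dl * grad_slowness(x)
        s_mid = slowness(x_mid)
        x = x + dl * p_mid / s_mid                   # full step using midpoint values
        p = p + dl * grad_slowness(x_mid)
        T += dl * s_mid                              # travel time = path integral of slowness
    return x, T

def tube_width_per_angle(x0, angle, slowness, grad_slowness, length, d_angle=1e-4):
    """Ray-tube width per unit take-off angle, from two neighboring rays."""
    xa, _ = trace_ray(x0, angle - 0.5 * d_angle, slowness, grad_slowness, length)
    xb, _ = trace_ray(x0, angle + 0.5 * d_angle, slowness, grad_slowness, length)
    return np.linalg.norm(xb - xa) / d_angle

# Homogeneous test medium (assumed values): straight rays, T = s0 * length, width = length.
s0, length = 2.0, 3.0
homogeneous = lambda x: s0
no_gradient = lambda x: np.zeros(2)

x_end, T_end = trace_ray([0.0, 0.0], 0.3, homogeneous, no_gradient, length)
width = tube_width_per_angle([0.0, 0.0], 0.3, homogeneous, no_gradient, length)
```

For a heterogeneous medium, `slowness` and `grad_slowness` would be replaced by functions describing that medium; the same loop then yields bent rays.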
4. Adjoint Equation for Travel Time Tomography
The main purpose of this section is to derive and solve the adjoint equation needed to compute the quantity , the derivative of the total travel time error with respect to a model parameter controlling the slowness of the medium. Our derivation focuses on expressing the equation in terms of quantities that vary along rays, so that it can be readily compared to the transport Equation (11). Our derivation is equivalent to, but different from, the one by , being a direct application of perturbation theory, as contrasted to one that employs Lagrange multipliers.
In travel time tomography, travel time observations are considered to be the data, and the slowness , or rather its approximation , is the image function. In order to apply the adjoint methodology as outlined in Section 2, the non-linear Eikonal equation must be linearized about a “background” solution. Let the slowness equal a background slowness plus a small perturbation , where is a small parameter, and the corresponding travel time equal a background travel time plus a small perturbation . Then to first order in , the Eikonal equation becomes:
Equating terms of equal order in yields equations for the background travel time and the perturbation in travel time :
Equation (14b) indicates that the component of in the direction of the background ray direction is . Since plays the role of the source term in the differential equation, the formulation in (3) is applicable. If we define to be an increment of arc length along the unperturbed ray, then this is just an equation involving the directional derivative :
The perturbation in travel time is the integral of the perturbation in slowness along the unperturbed ray. We rewrite the equation for as:
Using the rules and (e.g., ) we obtain an expression for the adjoint equation:
As is typical of first-order equations, the “left hand” boundary condition associated with implies a “right hand” boundary condition for (e.g. ); that is, while at the source at as the end point of the ray (where it touches the boundary of the medium).
The adjoint Equation (17) can be further manipulated:
The formal solution to (17) is well-known (e.g. ):
Here the constant C is chosen to enforce the boundary condition .
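The chain of steps in this section can be summarized in a compact sketch. The notation is assumed for illustration and may differ in sign or normalization from Equations (15) to (19): $T_1$ and $s_1$ are the travel time and slowness perturbations, $\mathbf{t}_0$ the background ray tangent, $\ell$ the arc length, $\lambda$ the adjoint field, $e$ the error acting as source, $R$ the geometrical spreading function and $\ell_B$ the arc length at the end point of the ray.

```latex
\begin{align}
\frac{dT_1}{d\ell} = \mathbf{t}_0\cdot\nabla T_1 &= s_1,
  \qquad T_1 = 0 \ \text{at the source} \\
\left(\lambda,\ \mathbf{t}_0\cdot\nabla T_1\right)
  &= \left(-\nabla\cdot(\lambda\,\mathbf{t}_0),\ T_1\right)
  \qquad \text{(integration by parts; boundary terms vanish)} \\
-\nabla\cdot(\lambda\,\mathbf{t}_0)
  = -\frac{d\lambda}{d\ell} - (\nabla\cdot\mathbf{t}_0)\,\lambda
  &= -\frac{1}{R^2}\frac{d}{d\ell}\left(R^2\lambda\right) = e,
  \qquad \lambda = 0 \ \text{at} \ \ell = \ell_B \\
\lambda(\ell) &= \frac{1}{R^2(\ell)}\int_{\ell}^{\ell_B} R^2(\ell')\,e(\ell')\,d\ell'
\end{align}
```

The third line uses $\nabla\cdot\mathbf{t}_0 = S^{-1}\,dS/d\ell = 2R^{-1}\,dR/d\ell$, which follows from the ray-tube argument of Section 3 with $S = R^2\,d\Omega$. Wherever $e = 0$, the product $R^2\lambda$ is constant along a ray, so $\lambda$ varies as $R^{-2}$.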
5. Analysis of the Role of the Geometrical Spreading
The main purpose of this section is to show that the solution to the adjoint equation can be constructed from the geometrical spreading function, and to interpret this result.
In any region in which , the adjoint Equation (18) has the same form as the transport Equation (12). Since the error is rarely known within the medium, but rather only on its boundary , this restriction is satisfied by all commonly-encountered cases. As we will show below, the similarity of form provides considerable insight into the behavior of the adjoint field .
Ray divergence enters into the adjoint equation through the term. In order to highlight its contribution, we first examine a solution in which this term is zero. Consider a plane wave propagating in the z-direction through a homogeneous layer with (Figure 2). The background travel time is , the ray direction is and . The plane wave satisfies the background Eikonal Equation (14a), since . Since the rays of a plane wave do not diverge, .
Now consider the case where the background slowness is everywhere too small by an amount b, so that the background error grows linearly with distance z; that is, . We will assume that this error is known only on the boundary . Following (5), the adjoint equation is . Because of the Dirac impulse function, the boundary condition for requires some scrutiny. We will consider that the error is defined just below the boundary, at , where . In order to satisfy both the boundary condition of and the adjoint equation, the solution must be discontinuous at ; and in the immediate vicinity of must be . Effectively, the boundary condition is . The solution of the adjoint equation is ; note that it does not depend upon z.
Now consider a slowness perturbation in the form of a very thin rectangular prism, centered at , of thickness D, and having sides at and , and and (so that its volume is ). Since the prism is very thin, it can be approximated as a Dirac impulse function in depth z:
Here, is the Heaviside function, which is unity when its argument is positive and zero otherwise. The partial derivative of total error is:
As expected, , since increasing lowers the error. Also as expected, is proportional to the area of the prism, since the larger its area, the larger the region to which the slowness perturbation is applied. Interestingly, is independent of the position of the prism; that is, the prism can be moved up or down without affecting the error. As we will show below, this insensitivity to position is due to the absence of ray divergence in this plane wave case.
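The plane wave result can be checked by brute force on a small grid. The sketch below uses assumed, illustrative values: layer thickness `h_layer`, background slowness `s0`, slowness deficit `b`, and a prism one grid cell thick. Travel times are computed by summing slowness along vertical rays, the total error is the surface integral of squared error (observed minus predicted), and its derivative with respect to the prism amplitude is compared with the value −2 × (adjoint field) × (prism volume), taking the adjoint field to be the constant b × `h_layer`.

```python
import numpy as np

nx, ny, nz = 20, 20, 50
h_layer = 1.0                        # layer thickness (assumed value)
dx = dy = 0.05
dz = h_layer / nz
s0, b = 1.0, 0.1                     # background slowness and its deficit (assumed values)
T_obs = (s0 + b) * h_layer * np.ones((nx, ny))   # "observed" travel times at the top surface

def total_error(m, k_depth):
    """Total error when a thin prism of amplitude m sits in depth cell k_depth."""
    s = np.full((nx, ny, nz), s0)
    s[5:10, 8:14, k_depth] += m                  # prism: 5 x 6 cells wide, one cell thick
    T_pred = s.sum(axis=2) * dz                  # integrate slowness along vertical rays
    e = T_obs - T_pred
    return float((e**2).sum() * dx * dy)

def dE_dm(k_depth, step=1e-3):
    """Central finite-difference derivative of total error at m = 0."""
    return (total_error(step, k_depth) - total_error(-step, k_depth)) / (2 * step)

prism_area = 5 * 6 * dx * dy
adjoint_prediction = -2.0 * (b * h_layer) * prism_area * dz   # -2 * lambda * volume

shallow = dE_dm(45)                  # prism near the top surface
deep = dE_dm(5)                      # prism near the bottom of the layer
```

The derivative is negative, scales with the prism area, and is the same for the shallow and deep prisms, as the text describes.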
Figure 2. Rays (blue) of a plane wave cross a layer with bottom and top surfaces at and , respectively. A prismatic slowness perturbation (red rectangle) is placed within the layer, with left and right edges at and , respectively. The travel time error , measured on the upper surface (top plot), is reduced in the region where the rays project the prism. Because the rays do not diverge, the size of this region is independent of the depth of the perturbation.
Figure 3. Rays (blue) of a spherical wave start at a source at the center of the sphere at and propagate outward through the sphere to its surface at . A slowness perturbation with the shape of a spherical cap (red cap) is placed within the sphere at radius , with left and right edges at polar angle and , respectively. The travel time error , measured on the upper surface, is reduced in the region where the perturbation is projected by the rays (graph at top).
We now consider a spherical wave propagating in the r-direction through a homogeneous sphere with (Figure 3), described by spherical polar coordinates . The background travel time is , the ray direction is and . The spherical wave satisfies the background Eikonal Equation (14a), since . The area of a ray tube is , from which we conclude that the geometrical spreading function is and the ray divergence is . As in the plane wave case, the background slowness is everywhere too small by an amount b, leading to a background error . We will assume that this error is known only on the boundary . The adjoint Equation (18) reduces to:
The solution is . As is asserted in the Introduction, the solution to this transport-like equation is related to the geometrical spreading function by .
Now consider a slowness perturbation in the form of a very thin spherical cap of fixed thickness D, centered at and and subtending a variable polar angle area such that its area is fixed as :
For a position away from the origin where a spherical cap of thickness D and area is possible, the partial derivative of total error is:
The spherical wave solution (24) differs from the plane wave solution (21) by a factor that involves the ratio of geometric spreading functions, , evaluated at the heterogeneity and the surface. The area, on the surface of the sphere, subtended by the cap decreases with its radius , so that a cap nearer the source reduces the error over a wider region (Figure 4). This example illustrates the importance of geometric spreading on the amplitude of the adjoint field and on the effectiveness of a given perturbation to reduce the error E. Given several perturbations of equal size, the most effective is one whose projection on the boundary, by rays interacting with it, is the largest.
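The role of the spreading ratio can be checked numerically for the spherical case. The values are assumed for illustration: sphere radius `R0`, slowness deficit `b`, and a thin cap of fixed area `A` and thickness `D` at radius `r0`. The brute-force route integrates the squared error over the surface, with the cap projected outward along radial rays. The adjoint route takes the adjoint field at the cap to be b × `R0` × (`R0`/`r0`)², which is the inverse-square spreading behavior described above with the surface error as its boundary value.

```python
import numpy as np

R0, b = 1.0, 0.1          # sphere radius and slowness deficit (assumed values)
A, D = 0.01, 0.02         # cap area and thickness (assumed values)

def dE_dm_brute(r0, n=400_000, step=1e-3):
    """Finite-difference derivative of the surface-integrated squared error."""
    cos_cap = 1.0 - A / (2.0 * np.pi * r0**2)        # cap edge, from A = 2 pi r0^2 (1 - cos)
    mu = -1.0 + (np.arange(n) + 0.5) * (2.0 / n)     # midpoints in cos(polar angle)
    dA = 2.0 * np.pi * R0**2 * (2.0 / n)             # surface area of each ring
    hit = mu > cos_cap                               # rings whose rays cross the cap

    def E(m):
        e = b * R0 - m * D * hit                     # error drops by m * D behind the cap
        return float(np.sum(e**2) * dA)

    return (E(step) - E(-step)) / (2.0 * step)

def dE_dm_adjoint(r0):
    """-2 * (adjoint field at the cap) * (cap volume), assuming 1/R^2 spreading."""
    lam = b * R0 * (R0 / r0)**2
    return -2.0 * lam * A * D

results = {r0: (dE_dm_brute(r0), dE_dm_adjoint(r0)) for r0 in (0.25, 0.5)}
```

The two routes agree to within the resolution of the surface grid, and the cap nearer the source produces the larger derivative because its projection on the surface is larger.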
Although the adjoint field is singular at the source (ray starting point) , the partial derivative is finite there, as can be seen by considering a spherical heterogeneity of radius centered on the origin of the form :
Figure 4. Rays (blue) of a spherical wave, as in Figure 3. One of two alternate slowness perturbations (green and red caps) is placed within the sphere, at radii and , respectively, with . These caps have equal area and equal thickness D. The travel time error , measured on the surface of the sphere, is reduced in the region where the perturbation is projected by the rays (green and red curves in top plot). The reduction in error in this region is the same in both cases, because the thicknesses of the perturbations are equal. However, because the rays diverge, the size of the affected region is larger for the perturbation at .
When the background slowness is spatially varying, the rays have a complicated spatial pattern and the background error , measured on the boundary , is spatially varying. Suppose that the medium has a surface with outward pointing normal . A ray connecting an interior point to can be labeled by . Then, means the point on a boundary at which a ray passing through ends, and arc-length means the distance at along a ray that ends at . Similarly, the geometrical spreading function can be written as ; that is, the geometrical spreading function at associated with the ray that ends at . The adjoint field is then:
Here, the dot product between the ray tangent and surface normal is introduced to account for the increased surface area intersected by the ray tube, in the case (unlike the examples, above) where the ray tube obliquely impinges upon the boundary. Now, suppose that slowness perturbation is represented with voxels, where voxel k has volume , amplitude , and centroid position . When the adjoint field varies slowly compared to the length scale of a voxel (a requirement that excludes the source point) the error derivative is:
Here is the end point of the ray passing through . This result emphasizes the link between the geometrical spreading function R and the partial derivative of total error E. (When the voxel is close to, or overlaps the origin, is still well-defined and finite, but the inner product in (27) must be computed appropriately).
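A voxel-wise evaluation of this kind can be sketched in a few lines. The function below is a hypothetical illustration, not the formula of Equation (27) itself. It assumes that the adjoint field at a voxel equals the boundary error scaled by the squared ratio of spreading functions and divided by the cosine of the incidence angle, which is one reading of the dot-product correction described above. It also assumes a factor of −2, which corresponds to a total error defined as the integral of squared error with error taken as observed minus predicted.

```python
import numpy as np

def voxel_error_derivative(e_end, R_end, R_vox, volume, cos_incidence=1.0):
    """Approximate dE/dm_k for voxels, given quantities tabulated by ray tracing.

    e_end         : travel time error at the end point of the ray through each voxel
    R_end         : geometrical spreading at that end point
    R_vox         : geometrical spreading at the voxel centroid
    volume        : voxel volume
    cos_incidence : dot product of ray tangent and boundary normal (1 for normal incidence)
    """
    e_end, R_end, R_vox, volume, cos_incidence = map(
        np.asarray, (e_end, R_end, R_vox, volume, cos_incidence))
    lam = e_end * (R_end / R_vox)**2 / cos_incidence   # adjoint field at the centroid
    return -2.0 * lam * volume                         # inner product with a boxcar voxel

# Example with assumed values: homogeneous sphere of radius R0, where R equals radius.
R0, b = 1.0, 0.1
r_vox = np.array([0.25, 0.5, 0.75])                    # voxel centroid radii
vol = np.full(3, 2e-4)                                 # equal voxel volumes
grad = voxel_error_derivative(b * R0, R0, r_vox, vol)
```

No differential equation is solved here; the cost is one multiplication per voxel once R and the boundary error have been tabulated.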
6. Conclusions
The key result in this paper is the demonstration that the adjoint equation in ray-based travel time tomography has the same form as the well-known transport equation for ray theoretical amplitudes. Consequently, the spatial variation of the adjoint field is completely controlled by the geometrical spreading function R. This result provides an intuitive understanding of the primary factor controlling the size of the partial derivative of total error E with respect to the slowness of a voxel. The partial derivative is large when ray divergence causes the projection of the voxel on the measurement surface to be large. Since this result provides an explicit formula for in terms of R, it enables to be calculated without resorting to the numerical solution of the adjoint equation. Only an inner product needs to be calculated, and in the case of a voxel parameterization of the slowness image, it can be calculated trivially.
I thank the graduate students who participated in Columbia University’s 2017 Seminar in Adjoint Methods for helpful discussion.
 Justice, J.H., Vassiliou, A.A., Singh, S., Logel, J.D., Hansen, P.A., Hall, B.R., Hurt, P.R. and Solanki, J.J. (1989) Acoustic Tomography for Monitoring Enhanced Oil Recovery. Leading Edge, 8, 12-19.
 Nolet, G. (1987) Seismic Wave Propagation and Seismic Tomography. In: Nolet, G., Ed., Seismic Tomography with Applications in Global Seismology and Exploration Geophysics, Springer, New York, 1-23.
 Dahlen, F., Hung, S.-H. and Nolet, G. (2000) Fréchet Kernels for Finite-Frequency Travel Times—I. Theory. Geophysical Journal International, 141, 157-174.
 Hall, M.C.G., Cacuci, D.G. and Schlesinger, M.E. (1982) Sensitivity Analysis of a Radiative-Convective Model by the Adjoint Method. Journal of the Atmospheric Sciences, 39, 2038-2050.
 Bin Waheed, U., Flagg, G. and Yarman, C.E. (2016) First-Arrival Traveltime Tomography for Anisotropic Media Using the Adjoint-State Method. Geophysics, 81, R147-R155.
 Menke, W. and Eilon, Z. (2015) Relationship between Data Smoothing and the Regularization of Inverse Problems. Pure and Applied Geophysics, 172, 2711-2726.
 Snyman, J.A. and Wilke, D.N. (2018) Practical Mathematical Optimization—Basic Optimization Theory and Gradient-Based Algorithms. Springer Optimization and Its Applications, Second Edition, Springer, New York.
 Tromp, J., Tape, C. and Liu, Q. (2005) Seismic Tomography, Adjoint Methods, Time Reversal and Banana-Doughnut Kernels. Geophysical Journal International, 160, 195-216.
 Menke, W. (2005) Case Studies of Seismic Tomography and Earthquake Location in a Regional Context. In: Levander, A. and Nolet, G., Eds., Seismic Earth: Array Analysis of Broadband Seismograms, Geophysical Monograph Series 157, American Geophysical Union, Washington DC, 7-36.