This article assumes that the reader is familiar with the basic principles and formulas of geometric optics (see  or  ). As a continuation of , it similarly deals only with axially symmetric refracting surfaces of spherical shape, combined into more complex optical systems. Light is considered to consist of a collection of monochromatic rays; its wavelike properties are ignored. A light ray is then identified with its path, consisting of straight-line segments which change direction (according to Snell’s law, see  ) only at a boundary between two optical media with different light speeds. We assume that any such ray deviates from the axis of rotational symmetry by only a small angle (measured in radians), and explore what happens when the paraxial (i.e. first-order) approximation is extended by including quadratic and cubic terms in the corresponding Taylor expansion.
The basic idea is to let a pencil of rays originate at a point-like object and trace their individual paths until they meet again at a single point, called the image of the original object. Unfortunately, such convergence is achieved only approximately, when all but the first terms in the Taylor expansion of the trigonometric functions of the angle by which a ray’s direction deviates from the axis of symmetry are ignored; this is the approach of most textbooks, with the usual results summarized in . We now explore what happens when third-order terms are included in tracing the individual paths of a pencil of rays, and show that these terms are responsible for the so-called aberrations of the resulting image; the purpose of this article is to classify them into five different types and to derive formulas demonstrating their nature, shape, and magnitude. To avoid duplication, we deliberately skip topics covered in detail in , such as distinguishing between real and virtual images, introducing and utilizing the cardinal points of an optical system, and issues related to the aperture and the corresponding vignetting.
Our goal is to provide a deeper understanding of a topic that students often encounter only as a collection of rather puzzling graphs and formulas . Yet the mathematical prerequisites for following our presentation are quite elementary: Taylor expansion of simple functions, basic algebra of low-degree polynomials, and rudimentary knowledge of two-dimensional geometry (circles and straight lines in particular). A computer program is also presented, enabling students to explore various configurations of lenses in terms of the resulting aberrations. Rather than presenting new results, we concentrate on a rigorous yet mathematically elementary validation of existing formulas, including a novel derivation of the exact form of coma (a rather intriguing aberration).
All subsequent formulas are presented to cubic accuracy; this is occasionally emphasized by using the  sign (similarly, the ≈ sign indicates linear accuracy only), while the := sign means “is defined as”. Locations and directions are three-component quantities; the x and y components consist of linear and cubic terms, while the z component (along the axis of rotational symmetry) has constant and quadratic terms only. A single (double) dot over a symbol refers to its linear (quadratic) part.
Our notation and conventions follow the readily available, open-access reference .
2. Single-Surface Refraction
Let a single ray start at an object’s location (note that our objects are points) and follow a unit direction
until it reaches a new medium of refractive index n relative to the previous medium; the boundary between the two media is an axially symmetric spherical surface of radius R (measured from its apex to the sphere’s center, so that a negative value indicates a concave surface), with its apex at . One can show that this happens at
Rewriting (2) in a more explicit form, we get
Note that the z component simplifies to
Proof. Substituting the components of the right-hand side (RHS from now on) of (2) into the equation of the spherical surface, we get
Solving for q can be done directly (the quadratic equation yields two solutions; we have to pick the correct one and expand it up to quadratic terms); alternatively, we can proceed iteratively, as follows. Eliminating small quantities from (8) yields
which implies that, to the same accuracy, q equals g (the other root, g + 2R, would take us to the wrong face of the surface). Similarly, expanding the same equation up to linear terms results in
implying that q̇ (the linear part of q) must be equal to zero. Finally, the quadratically accurate version of (8), namely
where the last term of the left-hand side can be expanded to
Note that (6) holds for all spherical surfaces: convex, concave, and flat (R positive, negative, and infinite, respectively).
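As a numerical check of this iteration, the following Python sketch (ours, not part of the article’s Mathematica code) compares the exact intersection distance with a quadratic-accuracy expression that we obtained by carrying out the steps above for a ray starting in the plane z = 0. It assumes a starting point (x, y, 0), a unit direction (a, b, √(1 − a² − b²)), the apex at (0, 0, g) and the center at (0, 0, g + R); under these assumptions the expansion reads q = g + g(a² + b²)/2 + [(x + ga)² + (y + gb)²]/(2R) to quadratic accuracy. The names a, b, q_exact and q_quadratic are our own.

```python
import math


def q_exact(x, y, a, b, g, R):
    """Exact distance along the ray from (x, y, 0), with unit direction
    (a, b, sqrt(1 - a^2 - b^2)), to the spherical surface whose apex is at
    (0, 0, g) and whose centre is at (0, 0, g + R); R = math.inf means flat."""
    c = math.sqrt(1.0 - a * a - b * b)          # z component of the direction
    if math.isinf(R):
        return g / c                             # flat surface z = g
    dz = -(g + R)                                # z part of (start - centre)
    ud = a * x + b * y + c * dz                  # u . (start - centre)
    root = math.sqrt(ud * ud - (x * x + y * y + dz * dz) + R * R)
    # Two intersections; keep the one on the face near the apex (q close to g).
    return min((-ud - root, -ud + root), key=lambda q: abs(q - g))


def q_quadratic(x, y, a, b, g, R):
    """Quadratic-accuracy expansion of q, treating x, y, a, b as first order."""
    rho2 = (x + g * a) ** 2 + (y + g * b) ** 2   # squared height of the hit point
    return g + g * (a * a + b * b) / 2 + rho2 / (2 * R)


# The leftover error is of fourth order: halving every small quantity
# should shrink it by a factor close to 16, for convex and concave surfaces.
for R in (50.0, -30.0):
    errors = [abs(q_exact(*args) - q_quadratic(*args))
              for args in ((10 * s, 5 * s, 0.1 * s, -0.05 * s, 100.0, R) for s in (0.2, 0.1))]
    print(R, errors[0] / errors[1])
```

The printed ratios should come out close to 16, indicating that the neglected terms are of fourth order in the small quantities.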
At the point of entry, the corresponding unit normal (to the surface to be entered) is then given by
(note that its z component is always positive), which further implies that
Proof. To prove (15), we need both and to quadratic accuracy only; it is thus sufficient to use
We then get (to cubic accuracy)
This implies that
which leads to (15). ∎
The ray’s new direction is then given by
whose x component is, based on (14) and (15)
with an analogous ( and ) expression for the y component. Note that and contribute both their linear and cubic parts. Also note that we do not need to keep track of the z component of a unit vector, since it is always a simple function of the first two components.
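The refraction step can be checked in the same spirit. The Python sketch below (our own naming) implements the standard vector form of Snell’s law, with n the relative index and N a unit normal oriented like the one above (positive z component), and compares the x component of the new direction with its first-order part a/n − (n − 1)h/(nR), where h is the height at which the ray enters the surface and a is the incoming x component. That linear form is our own reduction of the formulas above for a ray in the x-z plane, and is an assumption of the sketch.

```python
import math


def refract(u, N, n):
    """Vector form of Snell's law.
    u: unit incident direction; N: unit surface normal with u . N > 0;
    n: refractive index of the new medium relative to the old one."""
    cos_i = sum(ui * Ni for ui, Ni in zip(u, N))
    sin2_t = (1.0 - cos_i ** 2) / n ** 2          # Snell's law: sin_i = n sin_t
    if sin2_t > 1.0:
        raise ValueError("total internal reflection")
    cos_t = math.sqrt(1.0 - sin2_t)
    # The tangential part of u is divided by n; the normal part becomes cos_t.
    return tuple(ui / n + (cos_t - cos_i / n) * Ni for ui, Ni in zip(u, N))


def slope_exact(h, a, R, n):
    """x component of the refracted direction for a ray in the x-z plane with
    incoming x component a, entering a sphere of radius R at height h."""
    N = (-h / R, 0.0, math.sqrt(1.0 - (h / R) ** 2))   # (centre - P) / R, z part > 0
    u = (a, 0.0, math.sqrt(1.0 - a * a))
    return refract(u, N, n)[0]


def slope_linear(h, a, R, n):
    """First-order (paraxial) part of the same quantity."""
    return a / n - (n - 1.0) * h / (n * R)


# The difference is of third order: halving h and a should reduce it about 8-fold.
for R in (50.0, -25.0):
    errors = [abs(slope_exact(20 * s, 0.2 * s, R, 1.5) - slope_linear(20 * s, 0.2 * s, R, 1.5))
              for s in (0.1, 0.05)]
    print(R, errors[0] / errors[1])
```

Ratios close to 8 are consistent with the statement that direction components carry only linear and cubic parts.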
3. Multiple Surfaces
The whole procedure can then be repeated, starting with the new location and direction, and using a new set of g, R, and n values (which now have to be indexed correspondingly) until a second spherical boundary is reached, and so on. Note that, to use the same procedure, we must first move the coordinate origin along the z axis to the apex of the first surface, so that the new z component again has only quadratic terms. In this manner we can continue until we reach the last surface of the optical system.
A single step of this process is summarized by the following Mathematica program
whose first argument X has the following fully general form
(the remaining arguments are self-explanatory). A triple dot indicates the cubic terms of the corresponding components. The output gives the location and direction of the ray upon entering the next surface. It can then be used as the first argument of the subsequent call to “step” (with new values of g, R, and n), and so on, thus following the ray from one surface to the next until it reaches the end of the optical system. We present some examples of this in due course, but let us first explore what to expect of the final output after k such steps have been taken.
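Since the Mathematica program itself is not reproduced here, the following Python sketch shows an exact, floating-point analogue of one “step” (intersection, refraction, and the shift of the origin to the new apex), chained through a hypothetical thin lens. It performs no Taylor expansion, so it can serve as a reference against which the cubic formulas may be compared; the function names, the tuple layout of X, and the example values are our own assumptions.

```python
import math


def refract(u, N, n):
    """Snell's law in vector form (N: unit normal with u . N > 0)."""
    cos_i = sum(p * q for p, q in zip(u, N))
    cos_t = math.sqrt(1.0 - (1.0 - cos_i ** 2) / n ** 2)
    return tuple(p / n + (cos_t - cos_i / n) * q for p, q in zip(u, N))


def step(X, g, R, n):
    """One exact refraction step, a floating-point analogue of "step".
    X = ((x, y, z), u): location (measured from the previous apex) and unit
    direction. The next surface has its apex at (0, 0, g), radius R (math.inf
    for a flat surface) and relative refractive index n. Returns the point of
    entry, measured from the new apex, together with the refracted direction."""
    (x, y, z), u = X
    if math.isinf(R):                                   # flat surface z = g
        q = (g - z) / u[2]
        P = (x + q * u[0], y + q * u[1], g)
        N = (0.0, 0.0, 1.0)
    else:
        D = (x, y, z - g - R)                           # start minus centre
        ud = sum(p * d for p, d in zip(u, D))
        root = math.sqrt(ud * ud - sum(d * d for d in D) + R * R)
        # Keep the intersection closest to the apex plane z = g.
        q = min((-ud - root, -ud + root), key=lambda s: abs(z + s * u[2] - g))
        P = (x + q * u[0], y + q * u[1], z + q * u[2])
        N = (-P[0] / R, -P[1] / R, (g + R - P[2]) / R)  # unit normal, z part > 0
    return (P[0], P[1], P[2] - g), refract(u, N, n)    # shift origin to the apex


def trace(x0, a0, surfaces):
    """Follow a ray from the object point (x0, 0, 0) through a list of (g, R, n)."""
    X = ((x0, 0.0, 0.0), (a0, 0.0, math.sqrt(1.0 - a0 * a0)))
    for g, R, n in surfaces:
        X = step(X, g, R, n)
    return X


# Hypothetical example: a thin symmetric lens (R = 50, n = 1.5) placed 100 units
# from the object, followed by a flat plane 100 units behind it; paraxially the
# image of x0 lies at -x0, whatever the initial direction a0.
LENS = [(100.0, 50.0, 1.5), (0.0, -50.0, 1 / 1.5), (100.0, math.inf, 1.0)]
for a0 in (1e-5, 0.0, -1e-5):
    (xi, yi, zi), direction = trace(1e-3, a0, LENS)
    print(a0, xi)
```

For such small x0 and a0 all three rays should land within roughly 10⁻¹² of −0.001, i.e. at the same image point to first-order accuracy; larger values make the cubic deviations visible.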
4. Optical Systems
It is clear that, starting with and advancing through k steps of this procedure, the resulting first two components (of both location and direction) will consist of only linear and cubic terms in and . These results must be invariant under each of the two reflections (with respect to the y-z plane and to the x-z plane), i.e. after the simultaneous , (and/or , ) replacement; this reduces the number of potential linear terms from four to two, and that of cubic terms from twenty to ten, thus:
and their , analogs. The cubic coefficients are further constrained by rotational symmetry, meaning that all equations must be invariant under the following replacement: , , and . This implies that
and their analogs. The easiest way to verify these is to use induction: the constraints certainly hold for the initial components (having no cubic terms at all); feeding an X which has a general form of the RHS of (22) and (23), restricted by (24) to (27), into “step”, and checking that the coefficients of the output meet the same restrictions completes the proof (which we leave as an exercise).
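The counting by reflection symmetry can be confirmed mechanically. The Python sketch below (our own illustration; we call the direction components a and b) lists all linear and cubic monomials in the four small quantities and keeps those compatible with both reflections for an x component; the y components follow by exchanging the roles of (x, a) and (y, b).

```python
from itertools import combinations_with_replacement

VARS = ("x", "y", "a", "b")      # object coordinates and direction components


def monomials(degree):
    """Exponent tuples (ex, ey, ea, eb) of all monomials of the given degree."""
    result = []
    for combo in combinations_with_replacement(range(len(VARS)), degree):
        exponents = [0] * len(VARS)
        for i in combo:
            exponents[i] += 1
        result.append(tuple(exponents))
    return result


def allowed_in_x_component(exponents):
    """A term of an x component (location or direction) must change sign under
    (x, a) -> (-x, -a) and stay unchanged under (y, b) -> (-y, -b)."""
    ex, ey, ea, eb = exponents
    return (ex + ea) % 2 == 1 and (ey + eb) % 2 == 0


for degree in (1, 3):
    terms = monomials(degree)
    kept = [t for t in terms if allowed_in_x_component(t)]
    print(degree, len(terms), len(kept))    # 1 4 2, then 3 20 10
```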
Note that the recursive formula for the linear coefficients of the (22/23) transformation can be expressed in a simple matrix form, thus
which follows from a generalization of (6) and (20). Since the determinant of the first RHS matrix is , and the second RHS matrix is the unit matrix when , we get the following expression for the determinant of the left-hand-side matrix
These formulas are interesting in their own right, but also essential for the proof of our next statement.
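The determinant statement can be checked with exact rational arithmetic. The Python sketch below (our own naming) builds the linear part of each step as a refraction matrix times a translation matrix, acting on a location component and the corresponding direction component. We assume the refraction matrix has the paraxial form [[1, 0], [−(n − 1)/(nR), 1/n]], whose determinant is 1/n; this is our reading of the linear parts of (6) and (20). The product over all steps then has a determinant equal to the product of the reciprocals of the relative indices.

```python
from fractions import Fraction as F


def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def step_matrix(g, R, n):
    """Linear part of one step acting on (x, a): translation by g followed by
    paraxial refraction at radius R with relative index n."""
    translate = [[1, g], [0, 1]]                       # determinant 1
    refract = [[1, 0], [-(n - 1) / (n * R), 1 / n]]    # determinant 1/n
    return mat_mul(refract, translate)


def system_matrix(surfaces):
    M = [[F(1), F(0)], [F(0), F(1)]]                   # unit matrix before step 1
    for g, R, n in surfaces:
        M = mat_mul(step_matrix(g, R, n), M)           # each new step acts from the left
    return M


def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Hypothetical two-surface example: relative indices 3/2 and 4/3.
M = system_matrix([(F(100), F(50), F(3, 2)), (F(5), F(-40), F(4, 3))])
print(det(M))          # exactly 1/2, i.e. 1 / (3/2 * 4/3)
```

For a lens surrounded by the same medium on both sides the relative indices multiply to 1, and so does the determinant.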
The coefficients of (22/23) are further restricted by the following identities
Proof. To prove (30) and (32), we first re-state them in the following form
where , applied to a cubic polynomial in and , returns the coefficient of minus three times the coefficient of , and is the expression in parentheses on the RHS of (32).
(and their , analogs), it is easy to verify that
We now proceed by induction: (34) and (35) are certainly true for k = 0 (the initial values of location and direction have no cubic terms). Assuming that they hold for k, we need to prove that they must then hold for k + 1.
We get, for the first component of the generalized (i.e. and ) version of (6),
based on the fact that the proportionate and b proportionate terms contribute zero. The RHS is equal to
which proves (30).
Similarly, based on the generalized version of (20),
thus proving (32). Note that we have replaced by the correspondingly adjusted (the first large parentheses), and that this time it is the a-proportional term which contributes zero.
Proving (31) and (33) is then done in a practically identical way; one only has to modify the definition (to: the coefficient of minus three times the coefficient of ) and replace C by A and D by B. ∎
5. Image Construction
Without loss of generality, we now assume that the object is placed at , and trace a ray with an initial direction of
This will make all terms containing a power of equal to zero in the (22/23) equations, thus simplifying them to read
and analogous expansions of and .
Once we have reached the last (say kth) surface of the optical system, we create an imaginary flat surface (R = ∞) at a distance from the last surface’s apex (its relative refractive index n is equal to 1, since the optical medium remains the same). We choose this distance so as to make the coefficient of the v term in the linear part of the first location component equal to zero, so that all rays emanating from our object converge (to linear accuracy) to a single point. Since, based on (28) with ,
this is achieved by taking . Thus, any object with an initial z coordinate equal to 0 (this defines the object plane; objects located in this plane form what we call a scenery) will come into sharp focus (to first-order accuracy) in the image plane thus created.
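The choice of the image-plane distance can be sketched with the same kind of matrices. Writing the linear coefficients after the last surface as x_k = A x + B v and a_k = C x + D v (our own labels, which need not coincide with the article’s coefficient names), a further translation by h gives (A + hC)x + (B + hD)v for the location, so h = −B/D removes the v term and the magnification becomes A + hC = (AD − BC)/D. The lens in the example is hypothetical; the expected values follow from the thin-lens equation with f = R/(2(n − 1)) = 50 and an object at 2f.

```python
def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def step_matrix(g, R, n):
    """Paraxial step on (x, v): translation by g, then refraction (R, n)."""
    return mat_mul([[1.0, 0.0], [-(n - 1.0) / (n * R), 1.0 / n]], [[1.0, g], [0.0, 1.0]])


def image_plane(surfaces):
    """Return the distance h of the image plane behind the last apex and the
    lateral magnification, from the linear coefficients A, B (location) and
    C, D (direction) accumulated over all steps."""
    M = [[1.0, 0.0], [0.0, 1.0]]
    for g, R, n in surfaces:
        M = mat_mul(step_matrix(g, R, n), M)
    (A, B), (C, D) = M
    h = -B / D              # makes the coefficient of v, B + h*D, vanish
    return h, A + h * C     # the magnification; it also equals (A*D - B*C) / D


# Hypothetical thin symmetric lens (R = 50, n = 1.5), object 100 units in front.
h, m = image_plane([(100.0, 50.0, 1.5), (0.0, -50.0, 1 / 1.5)])
print(h, m)                 # approximately 100 and -1
```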
Nevertheless, the cubic terms of the final location of our image indicate that the convergence is not perfect: the image may be slightly displaced from its ideal location (thus distorting the shape of the original scenery), smeared in a variety of ways, or both. Since we have made that coefficient equal to zero, this implies that , which, together with (26), enables us to further simplify the coordinates of the final image’s location to
Let us now explore how these cubic terms affect the quality of the image.
· The x³ term displaces the image away from (or towards) the optical axis, depending on the sign of its coefficient; the effect grows with the magnitude of that coefficient and with the cube of the image’s distance from the axis, thus distorting the original scenery (see Figure 1).
· The terms cubic in the direction components smear each image (ideally a single point) into a small disk whose radius is proportional to the cube of the largest v (i.e. of the aperture). Most rays are concentrated at its center, and the light intensity diminishes towards the edge (for a uniformly filled aperture, in proportion to r^(−4/3), where r is the distance from this center). This is called spherical aberration, and it is the same for all images, regardless of their distance from the optical axis (see Figure 2).
· The terms linear in x and quadratic in the direction components similarly smear the image into a 60° wedge pointing towards the optical axis, with a high-intensity apex at the image’s original location and an intensity that decreases as the wedge spreads outward (see Figure 2). This is the so-called coma; the size of the wedge is proportional not only to the square of the aperture but also to the image’s distance from the optical axis.
· The terms quadratic in x and linear in the direction components have two different manifestations. Their average effect, namely
can be removed by changing the image plane to
i.e. to a slightly curved (spherical, to a sufficient approximation) surface of radius (we call it the screen from now on); this aberration is called the medial field curvature.
· The remaining
then yields a disk of uniform intensity on the new screen (in the original image plane it would form an ellipse). The size of the disk is determined by the system’s aperture (see  ), but it is also proportional to the first two factors in (53), so that the disk shrinks back to a point for images close to the optical axis.
Figure 1. Positive and negative distortion.
Figure 2. Spherical aberration, coma and astigmatism.
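The spot shapes described in the list above can be reproduced with a few lines of Python. The sketch assumes a uniformly filled circular aperture with direction components (v_x, v_y) normalized to the unit disk, the spherical-aberration term in the rotationally symmetric form S(v_x² + v_y²)(v_x, v_y), and coma in the standard third-order form Kx(3v_x² + v_y², 2v_xv_y) for an object at (x, 0); S, K, and the sampling scheme are our own illustrative choices.

```python
import math


def pupil(n_rings=300, n_angles=360):
    """Aperture points (vx, vy) filling the unit disk uniformly in area:
    ring radii sqrt((i + 0.5) / n_rings) sit in the middle of equal-area rings."""
    points = []
    for i in range(n_rings):
        rho = math.sqrt((i + 0.5) / n_rings)
        for j in range(n_angles):
            phi = 2.0 * math.pi * j / n_angles
            points.append((rho * math.cos(phi), rho * math.sin(phi)))
    return points


def spherical_spot(S, pts):
    """Displacements S * (vx^2 + vy^2) * (vx, vy) for an on-axis object."""
    return [(S * (vx * vx + vy * vy) * vx, S * (vx * vx + vy * vy) * vy) for vx, vy in pts]


def coma_spot(K, x, pts):
    """Displacements K * x * (3 vx^2 + vy^2, 2 vx vy) for an object at (x, 0)."""
    return [(K * x * (3 * vx * vx + vy * vy), 2 * K * x * vx * vy) for vx, vy in pts]


pts = pupil()

# Spherical aberration with S = 1: the fraction of rays within distance r of
# the centre should be r**(2/3), so the intensity dF/dr / (2 pi r) ~ r**(-4/3).
spot = spherical_spot(1.0, pts)
for r in (0.01, 0.1, 0.5):
    inside = sum(1 for dx, dy in spot if math.hypot(dx, dy) <= r) / len(spot)
    print(r, round(inside, 4), round(r ** (2 / 3), 4))

# Coma: the widest angle, seen from the apex, is 30 degrees on either side.
spot = coma_spot(1.0, 1.0, pts)
print(max(abs(math.degrees(math.atan2(dy, dx))) for dx, dy in spot))
```

Under the coma term, a ring of aperture radius ρ is mapped onto a circle of radius Kxρ² centered 2Kxρ² from the apex; every such circle is tangent to the two lines at ±30°, which is where the 60° wedge comes from.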
More interestingly, by further modifying the screen’s curvature (making its radius equal to ), we may fully remove the first component of (53), so that all rays lying in the x-z plane intersect at a single point (their tangential focus), while the remaining rays are smeared into a straight-line segment perpendicular to the x-z plane (i.e. in the sagittal direction). Similarly, when the radius is changed to , it is the rays lying in the perpendicular (sagittal) plane that converge to a single point, the sagittal focus, while the rest of them form a line segment in the x direction. This effect is called astigmatism (see Figure 2).
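Reflection and rotational symmetry leave only two independent combinations among the terms quadratic in x and linear in the direction; for an object at (x, 0) we write them, with hypothetical coefficients p and q of our own, as x²((p + q)v_x, p v_y). Moving the screen longitudinally adds a further displacement proportional to (v_x, v_y) (the paraxial image-side direction times the shift), which we lump into a single parameter. The Python sketch below shows the three screen positions discussed above; since each required shift is proportional to x², each screen is curved.

```python
import math


def astig_spot(p, q, x, shift, pts):
    """Displacements x**2 * ((p + q) vx, p vy) for an object at (x, 0), plus
    shift * (vx, vy) from moving the screen; 'shift' lumps together the screen
    displacement and the paraxial image-side direction coefficient."""
    return [((x * x * (p + q) + shift) * vx, (x * x * p + shift) * vy) for vx, vy in pts]


# Rays on the rim of the pencil (|v| = 1 in units of the aperture).
rim = [(math.cos(2 * math.pi * j / 360), math.sin(2 * math.pi * j / 360)) for j in range(360)]
p, q, x = 0.3, 0.2, 2.0        # hypothetical coefficients and object height

medial = astig_spot(p, q, x, -x * x * (p + q / 2), rim)   # screen at the average
tangential = astig_spot(p, q, x, -x * x * (p + q), rim)   # first component removed
sagittal = astig_spot(p, q, x, -x * x * p, rim)           # second component removed

print(min(math.hypot(*d) for d in medial), max(math.hypot(*d) for d in medial))  # both |q| x^2 / 2
print(max(abs(dx) for dx, dy in tangential), max(abs(dy) for dx, dy in tangential))
print(max(abs(dx) for dx, dy in sagittal), max(abs(dy) for dx, dy in sagittal))
```

On the medial screen the rim rays form a circle of radius |q|x²/2, so the full pencil fills a uniform disk; on the tangential screen the x displacement vanishes and the spot is a segment along y, and on the sagittal screen the roles are exchanged.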
Note that these formulas have been derived assuming a conical pencil of rays whose central ray is parallel to the z axis. This would often result in most of these rays missing the first spherical surface (which always has only a finite radial extent). It is therefore important to redirect the cone towards the central part of this surface, which can be achieved by changing (47) to
so that (to a good approximation) the central ray (properly called the chief or principal ray, see  ) enters the first surface at its apex. This maximizes the size of the light pencil that passes through the optical system and builds the corresponding image (the situation is actually more complicated: the cone should be directed at the so-called entrance pupil, but discussing this would take us beyond the scope of this article). This will correspondingly change the coefficients in (50), but not the general form of the expansion; this is easy to prove, and it is also handled automatically by our Mathematica program.
7.1. Simple Lens
When an optical system consists of more than one lens, finding general formulas for the individual aberrations is not feasible (they would be extremely lengthy functions of many parameters). We thus do this only for the simplest possible optical system, namely a single lens whose two surfaces have the same shape (radii R and -R) and lie at zero distance from each other (the thin-lens approximation, which works reasonably well when their distance is small). To get the answer, all we need to do is type:
Note that to find (denoted h and H in the program) of the image plane, we had to eliminate the v term in the linear part of the first component of the image’s location.
Running this code yields the following results: there is zero distortion, while the remaining aberration terms are
where n is the lens’s index of refraction and g is the distance from the object to the first surface.
Note that the largest value of x is given by g multiplied by the so-called field of view, while the largest value of v is given by the radius of the lens (its extent in the x-y plane) divided by g; this is important to keep in mind when comparing the coefficients of different aberrations.
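Without a computer-algebra system, the spherical-aberration coefficient of such a lens can still be estimated numerically. The Python sketch below (our own, with hypothetical numbers) traces exact meridional rays from an on-axis object through a zero-thickness lens with surfaces of radius R and −R, stops them at the paraxial image plane given by the thin-lens equation with f = R/(2(n − 1)), and divides the resulting height by v³. It does not reproduce the article’s symbolic result; it only illustrates that this ratio approaches a constant, as a cubic term should.

```python
import math


def refract(u, N, n):
    """Snell's law in vector form for 2-D vectors (x, z); N is a unit normal, u . N > 0."""
    cos_i = u[0] * N[0] + u[1] * N[1]
    cos_t = math.sqrt(1.0 - (1.0 - cos_i ** 2) / n ** 2)
    k = cos_t - cos_i / n
    return (u[0] / n + k * N[0], u[1] / n + k * N[1])


def hit_sphere(p, u, apex, R):
    """Intersection of the ray p + q u with the circle of radius |R| whose apex
    is at (0, apex) and centre at (0, apex + R); returns the point and unit normal."""
    zc = apex + R
    D = (p[0], p[1] - zc)
    ud = u[0] * D[0] + u[1] * D[1]
    root = math.sqrt(ud * ud - D[0] ** 2 - D[1] ** 2 + R * R)
    q = min((-ud - root, -ud + root), key=lambda s: abs(p[1] + s * u[1] - apex))
    P = (p[0] + q * u[0], p[1] + q * u[1])
    return P, (-P[0] / R, (zc - P[1]) / R)


def image_height(g, R, n, v):
    """Height at the paraxial image plane of the ray leaving the axial object
    point with direction (v, sqrt(1 - v^2)) through a zero-thickness lens with
    surface radii R and -R."""
    f = R / (2.0 * (n - 1.0))                  # thin-lens (lensmaker) focal length
    h = 1.0 / (1.0 / f - 1.0 / g)              # paraxial image distance
    p, u = (0.0, 0.0), (v, math.sqrt(1.0 - v * v))
    for radius, index in ((R, n), (-R, 1.0 / n)):
        p, N = hit_sphere(p, u, g, radius)     # both apexes sit at z = g
        u = refract(u, N, index)
    return p[0] + (g + h - p[1]) * u[0] / u[1]  # continue to the plane z = g + h


# Hypothetical numbers: g = 100, R = 50, n = 1.5 (so f = 50 and h = 100).
for v in (0.02, 0.01, 0.005):
    print(v, image_height(100.0, 50.0, 1.5, v) / v ** 3)   # tends to a constant
```

The ratio comes out negative, i.e. marginal rays cross the axis before the paraxial image plane, which is the familiar undercorrected spherical aberration of a single positive lens.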
7.2. Objects at Infinity
When the object’s distance from the optical system (our g₁) is orders of magnitude larger than the size of the system itself, it is convenient to employ a different approach: the object’s location can then be specified by the incoming rays’ direction (they arrive practically parallel to each other), and x and y become the first two coordinates of the point at which any such ray enters the plane.
This requires reversing the roles of x and v when interpreting the resulting aberrations: the term proportional to v³ now represents distortion, while the term that goes with x³ yields spherical aberration, etc. We demonstrate how this works, again using a thin lens, but this time allowing its two surfaces to have different radii, R₁ and R₂.
This results in the sum of the following terms
the last being the spherical aberration.
7.3. Cooke Triplet
This is an old (going back to 1935) design of a camera objective consisting of three lenses (see  ); its details can be read off the following Mathematica code (all distances and radii are in mm).
The program yields the usual sum of four aberration terms plus, as a by-product (see  ), the focal length of the system (98.66 mm).
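The focal length can be obtained from the linear (matrix) part of the ray transfer alone. The Python sketch below (our own; it does not contain the triplet’s prescription, which is given in the Mathematica code) multiplies refraction and translation matrices for all surfaces, ignoring the object distance, and reads off f = −1/C, where C relates the outgoing direction to the height of an incoming ray parallel to the axis. This assumes the same medium on both sides of the system. The example is a hypothetical thick lens, for which the result can be checked against the lensmaker’s formula.

```python
def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def focal_length(surfaces):
    """Effective focal length of a system in air from its paraxial matrix.
    surfaces: list of (g, R, n); each g is the distance from the previous apex,
    and the first g (object distance) is ignored since f does not depend on it."""
    M = [[1.0, 0.0], [0.0, 1.0]]
    for i, (g, R, n) in enumerate(surfaces):
        translate = [[1.0, g if i > 0 else 0.0], [0.0, 1.0]]
        refract = [[1.0, 0.0], [-(n - 1.0) / (n * R), 1.0 / n]]
        M = mat_mul(refract, mat_mul(translate, M))
    # A ray entering parallel to the axis at height x leaves with slope C * x,
    # so it crosses the axis a distance -1/C behind the rear principal plane.
    return -1.0 / M[1][0]


# Hypothetical thick lens: n = 1.5, R1 = 50, R2 = -50, thickness 10.
print(focal_length([(0.0, 50.0, 1.5), (10.0, -50.0, 1 / 1.5)]))   # about 51.72
```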
We have already mentioned that the largest value of x/g is determined by the system’s field of view, which in this case is about 25 degrees. This can be established by using the same program to follow a principal ray entering the system at 25 degrees and noting that, upon reaching the last lens, it is located at the lens’s very edge (all three lenses have roughly the same diameter of about 200 mm); a ray entering at a larger angle would therefore not make it through the system. Similarly, the largest value of v is, to a good approximation, given by the corresponding radius (100 mm) divided by g. To be able to compare the individual aberrations directly, we then express them all in powers of
instead of the original x and v. Note that both X and V are now dimensionless, each having the maximum possible value of 1.
This is achieved by extending our program by the following extra line:
The result is still an expression too lengthy to quote here, due to its g dependence. But a simple graph reveals that the expression rather quickly converges (becoming sufficiently accurate when ) to its limit of
where all coefficients are in mm. This should be compared to the size of the actual image, which our program locates at
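Expressing the aberrations in powers of X and V amounts to rescaling each coefficient: a term c·xⁱvʲ becomes c·x_maxⁱ·v_maxʲ·XⁱVʲ, whose coefficient has the dimension of length (mm here) because X and V are dimensionless. The small Python helper below (our own; the coefficients in its example are made up for illustration and are not the triplet’s) performs this substitution.

```python
def rescale(coeffs, x_max, v_max):
    """Re-express sum c * x**i * v**j as sum c' * X**i * V**j with X = x / x_max
    and V = v / v_max, so that c' = c * x_max**i * v_max**j. coeffs maps the
    exponent pair (i, j) to c; the result has the same keys."""
    return {(i, j): c * x_max ** i * v_max ** j for (i, j), c in coeffs.items()}


# Hypothetical coefficients (not the triplet's), with x in mm and v dimensionless:
# spherical aberration ~ v**3, coma ~ x * v**2, astigmatism ~ x**2 * v, distortion ~ x**3.
coeffs = {(0, 3): -2.0e3, (1, 2): 5.0, (2, 1): -1.0e-2, (3, 0): 1.0e-5}
print(rescale(coeffs, x_max=40.0, v_max=0.05))
```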
We would like to reiterate that this article has focused on a single issue, the third-order aberrations of axially symmetric systems of lenses, and has deliberately avoided many other important issues related to optical-system design. We also acknowledge that the ultimate goal of understanding aberrations is to be able to design optical systems which minimize them, something we have not attempted here since it goes well beyond the scope of this article. We have also skipped yet another important aberration, the so-called chromatic aberration, which is due to the index of refraction changing with the color of the light. Similarly, we have avoided any mention of the wavelike nature of light and of the limitations it imposes on forming an image of an object. Our bibliography lists several books (e.g.  and  ) which provide more information on many of the topics left out by this article.