1. Introduction and Background
In time-spectral methods for time-dependent ordinary partial differential equations, a spectral representation is employed for the temporal domain. As an alternative to standard finite time differencing, this approach has been studied by several authors  -  . It is sometimes held, however, that computing the solution simultaneously over all space-time is inefficient   . In this work, we will show that high efficiency can indeed be obtained through the use of optimizing methods, including spatial and temporal subdomains.
The focus is here on the recently developed Generalized Weighted Residual Method (GWRM), where truncated Chebyshev expansions are employed   . Similarly as for other time-spectral approaches, the CFL condition and other grid causality conditions associated with time marching algorithms are eliminated. Although the problems to be solved typically are causal, the method is acausal in the sense that the time dependence is calculated by a global minimization procedure (the weighted residual formalism) acting on the time integrated problem, recalling that, in standard WRM, initial value problems are transformed into a set of coupled linear or nonlinear ordinary differential equations for the time-dependent expansion coefficients  . These are solved using finite differencing techniques.
In the GWRM, not only temporal and spatial but also physical parameter domains may be treated spectrally using Chebyshev polynomials, being of interest for carrying out parameter scaling dependence in a single computation. How this works becomes clear as the method is briefly described in the next section.
Various suggestions related to spectral methods in time have been put forth in the literature. As early as 1979, a pseudo-spectral method, based on iterative calculation and an approximate factorization of the given equations, was suggested  . Also, some early ideas, not developed further, are due to Peyret and Taylor  .
In 1986 and 1989, Tal-Ezer   , proposed time-spectral methods for linear, periodic hyperbolic and parabolic equations, respectively, using polynomial approximation of the evolution operator in a Chebyshev least square sense. Periodicity was assumed for the spatial domain by using Fourier spectral approximation. The method extends the traditional stability criterion for explicit algorithms, where the space resolution parameter , to higher efficiency, yields a stability condition . This approach to extend the time step in explicit methods was further studied in  . The method is not widely used; a reason for this may be its complexity and its restriction to certain classes of problems. Later, Luo extended the method to more general boundary conditions and multiple spatial dimensions  .
In 1987, a “double spectral method”, with polynomial spectral functions in both space and time variables was suggested for nonlinear diffusion equations  . The method, using a somewhat different convergence scheme than the GWRM, was found to be more efficient than a related scheme where only the spatial domain was spectrally decomposed.
Ierley et al.  used a Fourier representation in space and a Chebyshev representation in time for solving a class of nonlinear parabolic partial differential equations with periodic boundary conditions. Similarly as for the GWRM, the Burger equation and other problems were solved with high resolution. Tang and Ma  also used a Fourier representation for solution of parabolic equations, but introduced Legendre Petrov-Galerkin methods for the temporal domain.
In 1994, Bar-Yoseph et al.   used space-time spectral element methods for solving one-dimensional nonlinear advection-diffusion problems and second order hyperbolic equations. Chebyshev polynomials were later employed in space-time least-squares spectral element methods  .
A theoretical analysis of Chebyshev solution expansion in time and one-dimensional space, for equal spectral orders, was given in  . The minimized residuals employed were however different from those of the GWRM. More recently Dehghan and Taleei  found solutions to the non-linear Schrödinger equation, using a time-space pseudo-spectral method.
Time-spectral methods feature high order accuracy in time. For primarily implicit finite difference methods, deferred correction methods may provide high order temporal accuracy       . High-order collocation solutions are found by performing a series of correction sweeps with a low-order time-stepping method. Both classical and spectral deferred correction methods, however, employ relatively short time intervals for iterated correction of the solution, whereas the GWRM time spectral functions are supported in long time intervals; for ODEs about two orders of magnitudes times typical Runge-Kutta step lengths.
A relatively recent approach to increase the temporal efficiency of finite difference methods is time-parallelization via the parareal algorithm  . This method, however, features rather low parallel efficiency and improvements have been suggested, for example the use of spectral deferred corrections  . An interesting Jacobian-free Newton-Krylov method for implicit time-spectral solution of the compressible Navier-Stokes equations has recently been put forth by Attar  .
A time-spectral method for periodic unsteady computations, using a Fourier representation in time, was suggested in  and further developed in  and  . A generalization to quasi-periodic problems was developed in  . In summary, although time-spectral methods have been explored in various forms by several authors during the last few decades, and were found to be highly accurate, the GWRM as described in  has not been pursued extensively.
Returning to the issue of efficiency, most of the GWRM computational effort is spent in solving the system of linear or nonlinear (depending on the type of problem) algebraic equations for the Chebyshev series coefficients. Iterative root solvers require either the computation of an inverse matrix or the solution of an equivalent matrix equation. As a simple example consider GWRM solution of a 1D initial-value PDE, employing Chebyshev polynomials of order K and L in time and space, respectively. Then numerical operations are typically required for matrix inversion and operations using LU decomposition for solving the corresponding matrix equation  . Should large modal number K and L be necessary for sufficient resolution of the full computational domain, the corresponding large number of operations may indeed prohibit any positive comparison with finite difference methods. Furthermore the memory requirements can be shown to scale as . It is clear that measures need to be taken to reduce these numbers.
The paper is arranged as follows. In the next section, a short introduction to the GWRM is provided. In Section 3 several methods for improved GWRM efficiency will be presented. These will, in turn, be implemented as we compare the efficiency of the GWRM versus explicit and implicit finite differencing methods in section 4. The paper ends with discussion and conclusion.
2. The Generalized Weighted Residual Method (GWRM)
We may write a system of parabolic or hyperbolic initial-value partial differential equations symbolically as
where is the solution vector, D is a linear or nonlinear matrix operator and is an explicitly given source (or forcing) term. Note that D may depend on both physical variables (t, and ) and physical parameters (denoted ) and that f is assumed arbitrary but non-dependent on . Initial as well as (Dirichlet, Neumann or Robin) boundary conditions are assumed known.
Our aim is to determine a spectral solution of Equation (1), using Chebyshev polynomials  in all dimensions. For simplicity, we restrict the discussion to a single equation with one spatial dimension x and one physical parameter p. Thus the solution is approximated by (prime denotes that each zero index renders a multiplication by ½)
The Chebyshev polynomials of the first kind (henceforth simply referred to as Chebyshev polynomials) are defined by . These are real ordinary polynomials of degree n, orthogonal in the interval over a weight . Thus , , and so forth. Extension to arbitrary computational intervals for t, x and p is described in  .
As in standard WRM a residual R is defined as, here using the Picard integral formulation of Equation (1):
The coefficients of the Chebyshev series are subsequently determined from the set of algebraic equations being generated by R from the requirement that the residual should satisfy the Galerkin WRM defined over the full computational domain:
Interval variables and P are defined in  . The right hand terms of Equation (1) have all been expanded in Chebyshev polynomials. The resulting algebraic equations are solved using the iterative solver SIR  , which features improved convergence characteristics as compared to Newton’s method with line-search. Further details of the GWRM procedure, including handling of boundary conditions, can be found in   .
All computations are performed using the computer mathematics program Maple. The GWRM is easily coded in languages like Matlab or Fortran, but absolute computational speed is not important for the comparisons with finite difference methods made here; rather it is important that all comparisons are carried out within the same computational environment.
3. Improving Efficiency
An early implementation of the GWRM was compared with finite difference methods for solving two elementary initial-value problems in  . Studies of accuracy and efficiency were made for the nonlinear 1D Burger equation and a linear, forced 1D wave equation, respectively.
The 1D Burger equation, being related to problems in fluid mechanics and magnetohydrodynamics (MHD), is
where υ can be interpreted as a (kinematic) viscosity. For comparisons, we use an exact solution of this equation  . It was found that, for specified accuracy, the Burger equation was solved about two times faster for by the Lax-Wendroff method than by the GWRM and about four times faster with a semi-implicit method, advancing the linear diffusive term with the Crank-Nicolson scheme and the nonlinear convective term explicitly.
The 1D forced wave equation being solved is
where the forcing function is . The exact solution is , featuring two time scales where we choose the driving term time scale to be much longer than the intrinsic time scale; the respective ratio is . The primary aim was here to average out the fast time scale behavior in order to generate approximate solutions following the slower time scale. For similar accuracy, the GWRM was here about 10 times faster than Lax-Wendroff and 30 times faster than Crank-Nicolson.
In the following, we will present algorithm improvements that substantially enhance the performance of the GWRM for the 1D Burger and the 1D forced wave equations described above. Comparisons will be made with the explicit Lax-Wendroff and implicit Crank-Nicolson methods. Although more efficient time stepping methods for the model problems have been developed, these are chosen as well known reference methods.
Furthermore, GWRM performance improvements for a third, advanced problem will be studied; the set of 14 (actually 7 complex), linearized ideal MHD equations modelling the stability of a magnetically confined plasma.
How then, is the GWRM made more efficient? The measures that are taken fall into two categories: a) optimal adaption of the root solver SIR to the GWRM and b) streamlining of the GWRM itself. Below we present the ideas and algorithms that have been developed for these categories; performance results will be given in the next section.
3.1. SIR Optimization
ODEs and PDEs can be solved globally by the GWRM scheme given in section 2 using single spatial and temporal domains. High resolution then requires high modal numbers K and L (we let M = 0 in this paper) which in turn results in a large set of nonlinear or linear algebraic equations to be solved simultaneously by SIR. A natural step to avoid the corresponding cubic and quadratic dependencies on N for the number of operations and memory storage, respectively, would be to divide the physical domain into coupled subdomains in space and time.
Substantial CPU time would be saved if the subdomain equations could be computed independently to some extent. Attempts to update the spatial domains independently at each iteration, using previous iterates for boundary conditions only, was however found to be only partially successful  . Convergence requires for this approach that the initial iterates are chosen very close to the solution. In fact it has been shown both theoretically and computationally that iteration convergence, in terms of a limited maximum norm, usually requires a formulation that, by some procedure, couples all equations in each iteration   . In the following this latter, “dependent” subdomain approach is thus employed.
The root solver SIR  is at the core of the GWRM. We will now discuss the measures that have been taken to optimize SIR for GWRM use.
S1. Matrix and vector numerical package. It is important that the computational environment includes efficient packages for standard operations on vectors and matrices. In Maple, the transition from the linalg to the Linear Algebra package resulted in faster handling of the matrix equations. Certain packages, like Vector Calculus, should not be called globally since they slow down computations.
S2. Solution of matrix equations. In SIR, the matrix equation is solved iteratively, where the vector x contains the Chebyshev coefficients of the solution u, φ is a vector with components that are functions of the coefficients, and A is a linear matrix operator being computed to provide optimal convergence at each iteration. To determine A, a linear matrix equation involving the system Jacobian needs to be solved. A large fraction of the GWRM CPU time lies here. Using LU decomposition solution of this system, instead of inversion of J, a dependence rather than for the number of operations is obtained for large matrices. For small matrices, however, inversion turns out to be faster, thus there is an option to choose either method.
S3. Choice of equation solver mode. For many problems, SIR can be run as Newton’s method since sufficient convergence is achieved and less iterations are needed. For improved convergence, SIR default settings  are preferably used.
S4. Effect of A matrix on convergence. When solving linear algebraic equations, the matrix A needs to be computed only for the first domain, provided that the domains are equidistant in time, and can then be re-used for the following time domains. This fact is extremely useful when dividing the temporal domain of the problem into subdomains. Nonlinear PDEs usually require at least 5 - 10 iterations. For the last few iterations, however, the A matrix is nearly constant. Thus substantial CPU time is saved by computing A in the first few iterations only; even beautiful houses can be built with ugly scaffolds.
S5. Band matrix methods. Sparse, band-shaped Jacobian matrices J occur in problems where many spatial subdomains are employed because only neighboring domains are analytically coupled. The Maple Linear Algebra package has built-in algorithms that automatically handle sparse matrix equations efficiently.
S6. J matrix differentiation. The Jacobian J is obtained exactly by analytical differentiation of φ. This is a tedious procedure that, without optimization, may require more than 50% of the total GWRM CPU time for matrices of dimension about 3000 or higher. By implementing algorithms that differentiate the non-zero band matrix elements only, favorable scaling with the number of spatial subdomains is obtained for very large matrices.
S7. Spatial and temporal subdomain influence on φ. In particular for nonlinear problems, the components of φ may be lengthy and complex, thus being time-consuming to differentiate analytically. Significant speed is gained by the use of spatial and temporal subdomains, since then the same global accuracy may be obtained using lower order Chebyshev polynomial expansions in each subdomain, resulting in more manageable φ vectors for differentiation.
S8. Choice of initial vector x = x0. As for all iterative methods, SIR convergence strongly depends on the choice of initial vector x0. The closer to the solution, the faster the convergence. In GWRM computations, x0 is typically taken to be the initial condition or, when multiple time domains are used, the solution for the end of the previous time interval. Thus, if the temporal length is reduced, the solution vector x will arbitrarily approach the initial guess x = x0. Hence, GWRM convergence is always guaranteed. In some computations particularly well-conditioned choices of x0 can be made. For example in numerical weather prediction, several scenarios are computed with slightly different initial conditions in order to provide ensemble results. Rapid GWRM convergence can then be reached by using solutions x from previous computations as x0  .
3.2. GWRM Optimization
Next follows a discussion on the measures taken to optimize the GWRM.
S9. Spatial and temporal subdomains. The use of spatial and temporal subdomains implies that the same accuracy can be retained with lower order Chebyshev polynomials in each domain. Optimistically, if this order could be reduced to half by halving the interval, a speed gain of about a factor 4 would be obtained because of the cubic dependence on the number of modes and that the number of intervals is doubled. In reality, the story is more complicated and there is usually an optimum subinterval length  . For the time domain this means that GWRM time intervals may be a factor of 100 longer than the time steps of, for example, Runge-Kutta methods. For the spatial domain the optimum Chebyshev order is typically much higher than those of finite element methods. As mentioned regarding SIR optimization, a large number of spatial subdomains is favorable for efficiency since the corresponding global Jacobian will become a very sparse band matrix due to that only immediately neighboring domains will contribute to non-zero near-diagonal matrix elements.
S10. Overlapping spatial subdomains. It is preferable to use overlapping spatial subdomains in Chebyshev spectral methods as compared to a procedure where function and functional derivative values are matched at the borders. A standard is two-point overlap (“hand-shake”). The reason is that the Chebyshev spectral space representation of derivatives is sensitive to the values of higher order coefficients, which values are quite approximative both during early iterations and for solutions that do not need to be precisely calculated. The amount of overlap can be chosen arbitrarily; very small values (order 10−6 of the spatial domain) are usually favorable for high accuracy. The number of overlap points required to preserve boundary condition information across the spatial domain is a function of the number of first order PDEs that are solved  .
S11. Adaptive temporal subdomains. Time overlap is only used for the temporal domains when it enhances convergence, since accuracy generally is negatively affected. Adaptive time interval length, however, greatly enhances efficiency. Best results have been obtained by starting with a relatively long time interval; if convergence is not reached, the time interval is automatically reduced and a new computation is performed. The algorithm regularly strives to increase the time interval length, which procedure is very forceful in smooth computational terrain. It may be mentioned that this algorithm is very robust since Chebyshev polynomials are limited to values in the interval [−1, 1]; thus the numerical values of higher order Chebyshev coefficients directly measure convergence.
S12. Time parallelization. The use of spatial subdomains opens up the possibility for performing strongly parallel computations in each time interval. In an approach termed “the Common Boundary Condition method” (CBC) we solve the local physical equations of each subdomain in parallel for each iteration, whereas the global computation only involves the boundary equations that connect the domains. This promising procedure is relatively complex and will be reported elsewhere.
S13. Clenshaw’s algorithm. Nearly all GWRM computations take place in spectral space. The computation of a Chebyshev series however, which may be needed for example when handling overlapping temporal domains, suffers from truncation errors at higher modal numbers. Clenshaw’s algorithm  allows accurate high order representations and should be used instead.
S14. End conditions. Since the GWRM is an acausal algorithm, initial conditions can be traded for end conditions for possible improvement of numerical stability. This potential avenue is, so far, only explored for some simple cases with neutral result.
Finally we would like to mention our present development of an adaptive spatial subdomain method that automatically widens the spatial domains in smooth regions and narrows them near sharp spatial gradients. The idea is to narrow the domains with the highest amplitudes of the highest order Chebyshev coefficients, since these indicate limited resolution. Substantially extended accuracy, with only marginally increased computing time, has been found for the Burger equation. Results will be reported elsewhere.
Early implementations of the GWRM were compared with finite difference methods with respect to convergence, accuracy and efficiency for the two model problems discussed above   . Efficiency enhancement of the GWRM, employing the ideas of Section 3, will now be demonstrated for these cases. An advanced problem, formulated as 14 coupled MHD PDEs, will subsequently be addressed.
4.1. Accuracy―The Burger Equation
In  Burger’s equation was solved by the GWRM for with the parameters T = 10, Nt = 1, K = 9, Ns = 2, L = 7 using a relatively fast algorithm where the spatial subdomains were solved independently at each iteration, and coupled thereafter. We here and henceforth use the notation T = t1 − t0, with t0 = 0. The number of time intervals is denoted by Nt and Ns is the number of spatial subdomains. Obtained run parameters were CPU time 2.48 s and memory use 182 MB. This algorithm is, however, often numerically unstable  and is therefore not reported elsewhere in this paper. The alternative unoptimized code in  , with the spatial subdomains simultaneously (‘dependently’) solved at each iteration, required 5 iterations for an accuracy of 1.0 × 10−3, using 14.1 s CPU time and 192 Mb. The new, optimized code (employing the measures of section 3) is substantially more efficient, using 1.27 s and 37.1 MB.
Comparing with finite difference methods, the same accuracy is obtained with the second order Lax-Wendroff method  in 2.37 s, using 234 MB of memory. The spatial grid needs 70 points for accuracy whereas 1000 time steps are needed to satisfy the dominating CFL criterion for this problem   . A semi-implicit method, advancing the linear diffusive term using the Crank-Nicolson scheme and the nonlinear convective term explicitly was also implemented. Again employing , only 500 time steps, 0.47 CPU s and 37.1 MB of memory were required for an accuracy of 1.0 × 10−3.
In summary, for an accuracy of 1.0 × 10−3 the optimized GWRM solution of the Burger equation for required about half the CPU time of the explicit Lax-Wendroff method and only 15% of the memory. The semi-implicit Crank-Nicolson method needed the same amount of memory as the GWRM but was about two times faster. In this section accuracy is studied, so we now turn to comparisons for higher accuracy.
Using the optimized GWRM, again for , an increased accuracy of 1.0 × 10−4 was obtainable for T = 10, Nt = 5, K = 6, Ns = 5, L = 7, requiring 4 iterations for each time interval, 6.72 CPU s and 88.3 MB. The Lax-Wendroff method needed 57.4 CPU s and 1430 MB, using and 8100 time steps. Corresponding parameters for the semi-implicit method was 28.6 CPU s, 456 MB, and 4500 time steps. Increasing accuracy to 1.0 × 10−5, the GWRM provides a solution for T = 10, Nt = 12, K = 6, Ns = 8, L = 7, with 3 iterations for each time interval, in 32.3 CPU s using 195 MB of memory. This accuracy could neither be achieved with the Lax-Wendroff nor the Crank-Nicolson method within 180 CPU s or below 3000 MB of memory. As an example, 2.0 × 10−5 accuracy was found for the latter method using and 22,000 time steps in 472 CPU s for 3390 MB memory use.
In summary, for and an accuracy of 1.0 × 10−4, the optimized GWRM solution required 12% of the CPU time and 6% of the memory of the Lax-Wendroff method. When compared to the Crank-Nicolson method, the numbers become 23% and 19% for CPU and memory requirements, respectively. The GWRM consequently strongly outperforms both finite difference methods for higher accuracies. For lower accuracies the finite difference methods become more competitive.
It is well known that spectral methods often are less efficient for problems where shocks or steep gradients must be resolved. This is confirmed for the stiffer 1D Burger case with . A steep gradient towards develops due to convection, as can be seen in Figure 1. The GWRM provides a 7.0 × 10−4 accurate solution for T = 10, Nt = 5, K = 6, Ns = 9, L = 7, with maximum 4 iterations for each time interval, in 17.4 CPU s using 181 MB of memory. The Lax-Wendroff method requires, for the same accuracy, 2.75 CPU s and 180 MB, with = 1/80 and 1000 time steps. Corresponding parameters for the semi-implicit method are 4.62 CPU s and 187 MB, using = 1/80 and 4000 time steps. For an accuracy of 1.0・10−4 the GWRM needs T = 10, Nt = 10, K = 6, Ns = 20, L = 7, with maximum 4 iterations for each time interval, in 153 CPU s using 306 MB of memory. The Lax-Wendroff method uses 73.2 CPU s and 1420 MB for the parameters and 10,000 time steps, whereas the semi-implicit method uses 106 CPU s and 2040 MB for the parameters and 20000 time steps. Thus it is again seen that for high accuracy the GWRM becomes comparatively more efficient and much less memory demanding than the finite difference methods.
Of particular interest is GWRM CPU time and memory scaling with Nt and Ns. Using the case mentioned at the beginning of this section we have performed scans where and . It was found that CPU time scales as and memory usage as (for ). These scalings represent a substantial improvement as compared to the cubic and quadratic
Figure 1. GWRM solution of Burger’s equation for . For parameters see text.
scalings with for CPU time and memory, respectively, that hold for unoptimized code without subdomains (assuming KNt and LNs global modes would be used instead).
Finally we consider which of the measures S1-S8, G1-G6 that contribute most to the improved GWRM performance. Clearly simultaneous use of temporal and spatial subdomains (G1, G2) is important through the avoidance of high order global temporal and spatial modes. The CPU time linear dependence (and memory independence) on Nt is expected, whereas band matrix methods (S5), and also measures S1-S4, S6-S7, contribute to the weak Ns dependence. The present Burger problem is easily solved by SIR, which converges also in Newton mode, being quite insensitive to the choice of initial vector x0 (S8). Measures G3-G6 were unimportant here. We may mention, however, that measure G3, automatic time interval adaption, may improve efficiency substantially in certain problems; for example in a solution of three coupled, time-dependent and chaotic ODEs it leads to GWRM efficiency beyond that of fourth order Runge-Kutta methods  .
4.2. Efficiency-A Forced Wave Equation
The forced 1D wave equation studied in  features two distinct time scales; a slow time scale associated with the driving function and a fast system time scale. A major reason for developing the GWRM was its potential to average out small scale oscillations, thus enhancing efficiency by using a reduced number of spectral modes to follow slow time scales only. Explicit finite difference methods are here hampered by the limiting CFL conditions associated with signals travelling at the fast time scale. Indeed it was found in  that, for the problem studied, the GWRM is substantially faster than the Lax-Wendroff and Crank-Nicolson’s methods. The latter method is slowed down by the need to solve matrix equations at each time step since multiple equations are solved.
Focusing on efficiency in finding smooth, time-averaged solutions, accuracy is here determined by comparison with the slow time scale part of the exact solution, that is the second term of . The optimized GWRM code solves the case in  (T = 30, Nt = 1, K = 6, Ns = 1, L = 8 for A = 10, α = 2π/T, β = 3π, n = 3) to an accuracy of 0.08 in 0.212 CPU s using 36.1 MB of memory. The Lax-Wendroff method solution of  (with and 900 time steps) needs 0.828 s and 69.1 MB for an accuracy of 0.30.
Being a hyperbolic equation, the wave equation is not well suited for the use of implicit finite difference methods because of the problem of resolving phase at time steps larger than the maximum time step stipulated by the CFL condition. Here, however, the emphasis is rather on time-averaged accuracy and efficiency, thus it is of interest to see how an implicit method like Crank-Nicolson’s performs. This method has now been optimized in relation to  . For the case using 100 time steps, a limited time-averaged accuracy of 0.87 was
Figure 2. GWRM time-averaged solution versus time t of the wave equation at , compared to exact, oscillatory solution. For parameters, see text.
achieved employing 1.16 s and 64.1 MB. The Lax-Wendroff method is thus preferable of the two finite difference methods, in spite of being explicit. The GWRM, however, is more accurate and much faster than both the finite difference methods.
The case above features a single wavelength of the slow time scale. In practical situations, many period and slow time scale solutions often are of interest. In Figure 2 we show a GWRM solution of the same problem above for 10 periods (with T = 200, Nt = 10, K = 6, Ns = 2, L = 8 for A = 10, α = 20π/T, β = 3π, n = 3). A global accuracy of 0.22 was obtained using 2.66 CPU s and 83.2 MB of memory. Using Ns = 1 (a single spatial domain) nearly the same accuracy was obtained in 1.08 s, using 66.7 MB.
Comparing with the finite difference methods, Lax-Wendroff obtains the same accuracy with and 10,000 time steps (CFL limit) using as much as 15.8 CPU s and 442 MB. The Crank-Nicolson method features low accuracy because of phase drift and is thus outperformed by the Lax-Wendroff method.
4.3. GWRM Solution of the Linearized, Perturbed Ideal MHD Equation
Magnetohydrodynamic (MHD) stability usually is a necessary condition for magnetically confined fusion plasmas. Theoretically, the stability of a specified plasma equilibrium may be tested by linearizing the so-called ideal MHD equations
Standard notation has been used; mass density, scalar kinetic pressure, fluid flow, magnetic field, current density and electric field are denoted by , respectively. The notation for the total derivative has been used. Having specified a plasma equilibrium and the boundary conditions (in this case in circular cylindrical geometry), a perturbation is applied and the plasma time dynamics is investigated for possible exponential growth, in which case the equilibrium is unstable. Details are given in  , where it is shown that 14 coupled scalar (7 complex) PDEs need to be solved simultaneously. Notable is that the evolution, in the unstable case, will be given by the competition of a number of unstable modes with different number of radial nodes. As the fastest growing mode (with zero radial nodes) starts to dominate, memory of the initial perturbation is gone.
The stability of two equilibria will be studied here, applying the GWRM. The first is that of  ; a) a screw-pinch equilibrium with radially constant current density profile and axial magnetic field B0z = 0.05 (normalized units, erroneously given as 0.2 in  ); the second case b) is a pure z-pinch equilibrium so that B0z = 0. The azimuthally perturbation has Fourier mode number m = 1 and the axial mode number is k = 10. Both equilibria are strongly unstable to this perturbation, featuring exponential growth rates of order unity (normalized to the Alfvén time). A difficulty for the GWRM is thus to polynomially resolve the exponential growth in time. In order for the dominant mode to develop, the equations typically needs to be solved for times T = 10 or more. For benchmarking, GWRM results are compared with an eigenvalue shooting code  . All computations are run in Maple on the same platform.
First we note that the CPU time and memory requirement for case A, earlier discussed in  , was 26.0 s and 444 MB respectively. In total 5 time intervals were used for a single spatial domain (Ns = 1); furthermore temporal K and spatial L maximum mode numbers were both 5. Employing the improvements described in Sections 3.1 and 3.2, the CPU time becomes reduced to 4.44 s and memory to 89.8 MB. Both cases gave the same result; growth rates = 0.83 within 1% error and eigenfunctions within approximately 2% error.
Of particular interest is dependence on number of time intervals Nt and number of spatial domains Ns. Since a linear equation is solved, the A matrix needs to be solved only for the first time interval (see S4). For the case above, the first time interval needs 1.49 s for full solution, whereas succeeding time intervals on average require only 0.68 s, thus a 54% reduction. The CPU time scaling with Nt for these time intervals is linear. Memory requirements are essentially independent of Nt.
For case B) the parameters T = 20, Nt = 3, K = 5, Ns = 1, L = 5 were here used for a run that took 11.1 s, using 105 MB memory, with 15% maximum error in eigenfunction ur. Using Ns = 2 (with 1.0 ´ 10−6 overlap), the CPU time increased to 26.2 s and memory to 336 MB, whereas eigenfunction error decreased to 5%; see Figure 3. The correct growth rate 1.04 was achieved within 1% error.
Figure 3. GWRM solution showing exponential time evolution (a.u.) of perturbed radial velocity u1r versus time t and radial coordinate r, for unstable equilibrium B.
Increasing the number of spatial domains, the CPU time scaling was obtained, in stark contrast to the unoptimized scaling . The memory scaling was found to be (rather than ) as a result of memory use unrelated to SIR.
The ambition of this work has been to evaluate the performance of optimized implementations of the time-spectral method GWRM as compared to finite difference methods in time. In early work   some example PDEs were solved. It was found that the explicit Lax-Wendroff and implicit Crank-Nicolson methods were both somewhat more efficient in finding accurate solutions to the 1D Burger equation, whereas the GWRM outperformed the finite difference methods in tracing the longer time scale behavior of a PDE representing a forced wave equation. An advanced problem in MHD, including 14 simultaneous PDEs, was accurately solved but it was, in this early work, realized that the cubic CPU time and quadratic memory dependence on the total number of Chebyshev modes limits the performance of the method. Subdomains in time and space should be used for advanced problems. Subsequently, in  , it was found that the spatial domains could be decoupled during the iterations for some problems, which dramatically increases performance, but the method is not universally stable and usually requires good initial guesses (by, for example, using short time intervals) for the root solver SIR.
In the present work we report on results using fully coupled spatial subdomains. As described in Section 3, a number of measures to enhance efficiency both for the GWRM itself, but also for the root solver SIR, have been developed. Returning to the earlier model problems, using the new algorithms, we now find strongly enhanced performance. Of primary importance are the improved CPU time and memory scalings, where the uses of sparse matrix methods play central roles.
Let us estimate the requirements for solution of an advanced 2D initial-boundary value problem. Primarily, GWRM efficiency depends on the solution of a matrix equation in SIR for determining the matrix A, used by SIR. In the absence of subdomains in time and space, the dimension N of this matrix is determined by the number of simultaneous equations to be solved , and the number of Chebyshev modes ( ); thus , assuming K + 1 ≈ K etc. With , , , , we obtain . Standard Gauss elimination requires operations for each SIR iteration. Thus, for this case, operations, which would call for high performance computers.
A substantial improvement in efficiency comes from the introduction of subdomains in time and space. We let , and . Here and denote the number of spatial subdomains in the x- and y-directions, respectively and is the number of temporal subdomains. , , and denote the number of Chebyshev modes used for each domain. In the unoptimized case we have approximately and . Letting , , , , , , we find and , Clearly, spatial optimization is of the essence. The scalings found from the optimizations presented in this paper substantially improves the situation. Using the optimized dependency obtained in this work, rather than , we find , a substantial reduction. A gigahertz table top computer could thus solve the problem within a few hours.
The scalings discussed above are indeed validated for the 1D problems considered in this paper; taking into account the dependence good agreement is obtained with the CPU times used.
Turning to 3D problems, we may assume a further scaling of the number of operations with . Thus for a problem with and , using the above parameters, we have and , which is not prohibitive for high performance computers.
Further novel ideas for improving GWRM efficiency can, however, be employed. In recent work, to be published elsewhere, the number of simultaneous global spatial equations to be solved by SIR is reduced to the boundary equations (external plus internal) only. The physics equations of each spatial subdomain are solved locally at each iteration, and strong time parallelization is possible. The resulting improved scalings are particularly important for problems with multiple spatial dimensions.
In this paper we have not employed automatic adaption of the time intervals (G4). This method has been proven to be very efficient when the GWRM was used for solving a set of chaotic differential equations in time, typical for numerical weather prediction  . Time adaption lead to accurate GWRM solution of this problem at least as efficient as fourth order Runge-Kutta methods. As mentioned, automatic spatial subdomain adaption is presently investigated with promising initial results. Thus automatic temporal and spatial interval adaption, global solution of boundary equations only and parallelization will be interesting further paths of development of the GWRM for applications on advanced problems.
The time-spectral generalized weighted residual method (GWRM) employs a Chebyshev polynomial representation in time instead of the time differencing procedures that are typical for standard methods for solving differential equations. Unoptimized use of the method is, however, hampered in efficiency by the cubic dependence of the number of operations on the total number of Chebyshev modes. Several measures for enhancing efficiency, primarily sparse matrix methods, have here been studied, employing multiple temporal and spatial domains.
It was found that Burger’s 1D equation, with viscosity parameter , was solved significantly faster and more accurate by the GWRM than by the explicit Lax-Wendroff and the implicit Crank-Nicolson finite difference methods for accuracies of order 1.0 × 10−4 or higher. Furthermore, it was found that GWRM CPU time scales as and memory usage as, where Nt and Ns are the number of time intervals and spatial subdomains, respectively. This is a significant improvement of the and scalings, respectively, characteristic of the unoptimized case.
The slower time scale of a forced wave equation problem, solved by all three methods, is found and followed by the GWRM much faster and using less memory than the finite difference methods.
For an ideal MHD stability problem, it was found that the performance enhancement measures S1-S8, G1-G5 of section 3 yielded a more than five-fold increase in efficiency. Being a linear problem, for which information from the first time interval can be reused, the CPU time for further time intervals becomes halved. A CPU time scaling with spatial subdomains was obtained; a substantial reduction of the unoptimized scaling. The memory scaling was somewhat improved to (as compared to). The scalings enable solution of advanced 2D and 3D problems using the GWRM.
In closing, it may be mentioned that all obtained GWRM solutions are analytical piece-wise polynomial expressions in time and space, thus immediately tractable for analysis. By using Chebyshev expansions also in parameter space, scaling behavior can be determined in a single GWRM run, as demonstrated in  .
 Ierley, B.S.G., Spencer, B. and Worthing, R. (1992) Spectral Methods in Time for a Class of Parabolic Partial Differential Equations. Journal of Computational Physics, 102, 88-97.
 Kosloff, D. and Tal-Ezer, H. (1993) A Modified Chebyshev Pseudospectral Method with an O(N-1) Time Step Restriction. Journal of Computational Physics, 104, 457-469.
 Zrahia, A.U. and Bar-Yoseph, P. (1994) Space-Time Spectral Element Method for Solution of Second-Order Hyperbolic Equations. Computer Methods in Applied Mechanics and Engineering, 116, 135-146.
 Bar-Yoseph, U.Z.P., Moses, E. and Yarin, A. (1995) Space-Time Spectral Element Methods for One-Dimensional Nonlinear Advection-Diffusion Problems. Journal of Computational Physics, 119, 62-74.
 Tang, J.-G. and Ma, H.-P. (2002) Single and Multi-Interval Legendre-Methods in Time for Parabolic Equations. Advances in Computational Mathematics, 17, 349-367.
 Gopinath, A.K. and Jameson, A. (2005) Time Spectral Method for Periodic Unsteady Computations over Two- and Three-Dimensional Bodies. 43rd AIAA Aerospace Sciences Meeting and Exhibit, Reno, 10-13 January 2005, 14 p.
 Dehghan, M. and Taleei, A. (2010) Numerical Solution of Nonlinear Schrödinger Equation by Using Time-Space Pseudo-Spectral Method. Numerical Methods for Partial Differential Equations, 26, 979-992.
 Mavriplis, Z.Y.D.J., Yang, Z. and Mundis, N. (2012) Extensions of Time Spectral Methods for Practical Rotorcraft Problems. 50th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, Nashville, 9-12 January 2012, Paper AIAA 2012-0423.
 Emmett, M. and Minion, M. (2012) Toward an Efficient Parallel in Time Method for Partial Differential Equations. Communications in Applied Mathematics and Computational Science, 7, 105-132.
 Attar, P.J. (2015) Jacobian-Free Newton-Krylov Method for Implicit Time-Spectral Solution of the Compressible Navier-Stokes Equations. International Journal for Numerical Methods in Fluids, 79, 1-15.
 Scheffel, J. (2011) Time-Spectral Solution of Initial-Value Problems. In: Jang, C.-L., Ed., Partial Differential Equations: Theory, Analysis and Applications, Nova Science Publishers, Inc., Hauppauge, 1-49.
 Dutt, A., Greengard, L. and Rokhlin, V. (2000) Spectral Deferred Correction Methods for Ordinary Differential Equations. BIT Numerical Mathematics, 40, 241-266.
 Huang, J., Jia, J. and Minion, M. (2006) Accelerating the Convergence of Spectral Deferred Correction Methods. Journal of Computational Physics, 214, 633-656.
 Layton, A.T. (2009) On the Efficiency of Spectral Deferred Correction Methods for Time-Dependent Partial Differential Equations. Applied Numerical Mathematics, 59, 1629-1643.
 Scheffel, J. and Håkansson, C. (2009) Solution of Systems of Nonlinear Equations, a Semi-Implicit Approach. Applied Numerical Mathematics, 59, 2430-2443.
 Scheffel, J. and Mirza, A. (2012) Time-Spectral Solution of Initial-Value Problems-Subdomain Approach. American Journal of Computational Mathematics, 2, 72-81.
 Riva, F., Milanese, L. and Ricci, P. (2017) Uncertainty Propagation by Using Spectral Methods: A Practical Application to a Two-Dimensional Turbulence Fluid Model. Physics of Plasmas, 24, Article ID 102302.