In this paper, we consider the numerical solution of the Helmholtz equation $\Delta u + k^2 u = f$ in the Cartesian domain with homogeneous Dirichlet boundary conditions, where $\Delta$ is the Laplacian operator, $k$ is a real parameter representing the wavenumber, $u$ is the amplitude associated with wave propagation, and $f$ is a forcing function. Without loss of generality, we assume $u = 0$ on the boundary, i.e.,
We present a highly parallel numerical algorithm for solving the Helmholtz equation discretized by the Chebyshev pseudospectral scheme. Chebyshev pseudospectral methods have long been used to solve partial differential equations numerically. This discretization yields a linear system whose coefficient matrix has a special block structure that admits a block diagonalization; together with a proper permutation, this decouples the original matrix into several independent submatrices. By exploiting a reflexivity property inherent in the Chebyshev differentiation matrix, each of these submatrices can be further decomposed into two independent submatrices whenever the boundary conditions at both ends of the decoupled 1D subproblems are the same. This second level of decomposition effectively doubles the number of independent submatrices, yielding a higher degree of coarse-grain parallelism.
This paper is organized as follows. In Section 2, we present the general structure of the linear system derived from the Chebyshev collocation method discretized independently in the x and y directions. In Section 3, we address the two-level block decomposition. This decomposition was proposed by the author for solving Poisson equations with identical grids in both directions. In this paper, we reformulate the decomposition to allow the number of grid points in the x direction to differ from that in the y direction, and we derive explicit forms for the decomposed submatrices, making this approach more flexible in applications. The computational procedure and a numerical example demonstrating its usefulness are presented in Sections 4 and 5, respectively.
2. Linear System from the Chebyshev Collocation Method
In this section, we briefly describe the Chebyshev collocation method for solving the Helmholtz equation in two dimensions. We assume that the problem is discretized with a tensor product grid $(x_i, y_j)$, where $x_i = \cos(i\pi/(p+1))$, $i = 0, \dots, p+1$, and $y_j = \cos(j\pi/(q+1))$, $j = 0, \dots, q+1$, are the Chebyshev points in the x and y directions, respectively. Let $D_x$, indexed from 0 to $p+1$, be the Chebyshev spectral differentiation matrix associated with the x direction. The entries of $D_x$ are given as

$(D_x)_{00} = \frac{2(p+1)^2 + 1}{6}, \qquad (D_x)_{p+1,\,p+1} = -\frac{2(p+1)^2 + 1}{6},$

$(D_x)_{jj} = \frac{-x_j}{2(1 - x_j^2)}, \quad j = 1, \dots, p, \qquad (D_x)_{ij} = \frac{c_i}{c_j}\,\frac{(-1)^{i+j}}{x_i - x_j}, \quad i \neq j,$

where $c_0 = c_{p+1} = 2$ and $c_j = 1$ otherwise. The Chebyshev spectral differentiation matrix $D_y$ associated with the y direction can be obtained in a similar way.
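These formulas can be assembled directly. The following sketch uses Python/NumPy rather than the Octave environment mentioned later; `cheb_diff` is an illustrative helper name, not from the paper.

```python
import numpy as np

def cheb_diff(n):
    """Chebyshev spectral differentiation matrix on the n + 1 points
    x_i = cos(i*pi/n), i = 0, ..., n (the standard construction)."""
    x = np.cos(np.pi * np.arange(n + 1) / n)    # Chebyshev points
    c = np.ones(n + 1)
    c[0] = c[-1] = 2.0                          # c_0 = c_n = 2, c_i = 1 otherwise
    D = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        for j in range(n + 1):
            if i != j:
                D[i, j] = (c[i] / c[j]) * (-1.0) ** (i + j) / (x[i] - x[j])
    D[0, 0] = (2.0 * n**2 + 1.0) / 6.0          # corner entries
    D[n, n] = -(2.0 * n**2 + 1.0) / 6.0
    for i in range(1, n):
        D[i, i] = -x[i] / (2.0 * (1.0 - x[i] ** 2))  # interior diagonal
    return D, x
```

Spectral differentiation is exact for polynomials of degree at most n, so `D @ x**2` reproduces `2*x`; one can also check numerically the anti-reflexivity property $J D J = -D$ used in Section 3.2.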
Now, let $\tilde{D}_x$ be the stripped matrix of $D_x^2$, obtained by removing the first and last rows and columns of $D_x^2$, and let $\tilde{D}_y$ be the stripped matrix of $D_y^2$. We have
With the grid just described, the Chebyshev collocation method yields, after removing the boundary nodes whose values are already known from the system, the following linear system
where $u$ and $f$ collect the values of the unknown and of the forcing function at the internal grid points, and $I_k$ represents the identity matrix of dimension $k$.
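As a concrete sketch, the system matrix can be assembled with Kronecker products. The ordering shown here, with the y index varying fastest, is one consistent convention assumed for illustration; the paper's own ordering may differ.

```python
import numpy as np

def helmholtz_matrix(Dx_t, Dy_t, k):
    """Assemble K = Dx_t (x) I_q + I_p (x) Dy_t + k^2 I_{pq}, where Dx_t and
    Dy_t are the stripped second-derivative matrices of dimensions p and q."""
    p, q = Dx_t.shape[0], Dy_t.shape[0]
    return (np.kron(Dx_t, np.eye(q))
            + np.kron(np.eye(p), Dy_t)
            + k**2 * np.eye(p * q))
```

For instance, with $p = 2$, $q = 1$, and $k = 0$, the assembled matrix is just the x-direction block shifted by the single y-direction entry.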
Many solution schemes can be employed to solve Equation (3), including Gaussian elimination, Gauss-Seidel iteration, ADI iteration, and matrix diagonalization. In this paper, the block version of the matrix diagonalization method is employed to first decompose the matrix K into a block diagonal matrix consisting of q diagonal blocks. Each diagonal block is then further decomposed into two smaller diagonal sub-blocks using reflexive decompositions, yielding a total of 2q diagonal blocks. Note that q is the number of internal grid points along the y direction. This two-step block decomposition scheme allows the linear system in Equation (3) to be decoupled into 2q linear subsystems, which can then be solved in parallel with coarse-grain parallelism using multiprocessors or networked computers.
3. Two-Level Block Decompositions
3.1. Level-1 Decomposition: Block Diagonalization
The diagonalization scheme has been used elsewhere. In this section, however, we address the decomposition of the matrix K into q diagonal blocks using a different representation. To begin with, let $Q$ be a matrix such that the similarity transformation $Q^{-1} \tilde{D}_y Q$ diagonalizes $\tilde{D}_y$ into $\Lambda$:
A natural choice of $Q$ is to have its columns formed from the eigenvectors of $\tilde{D}_y$, so that $\Lambda$ is a diagonal matrix consisting of the eigenvalues of $\tilde{D}_y$. Now split the coefficient matrix K in Equation (3) into three parts $K_1$, $K_2$, and $K_3$:
and let $\hat{K} = \hat{Q}^{-1} K \hat{Q}$, where $\hat{Q} = \operatorname{diag}(Q, \dots, Q)$ contains $p$ copies of $Q$. We have the transformed matrix
We now show that the transformed matrix $\hat{K}$ can be permuted to yield a block diagonal matrix. First, we observe that both $\hat{K}_2$ and $\hat{K}_3$ are diagonal matrices, and $\hat{K}_1$ is a diagonal block matrix, i.e., a block matrix whose blocks are all diagonal.
Let $d_{ij}$ be the $(i, j)$ element of $\tilde{D}_x$, $1 \le i, j \le p$. In its matrix form, $\hat{K}_1$ can be expressed as

$\hat{K}_1 = \begin{pmatrix} d_{11} I & d_{12} I & \cdots & d_{1p} I \\ d_{21} I & d_{22} I & \cdots & d_{2p} I \\ \vdots & \vdots & & \vdots \\ d_{p1} I & d_{p2} I & \cdots & d_{pp} I \end{pmatrix},$

where $I$ is of dimension q. Apparently, $\hat{K}$ is also a diagonal block matrix, because both $I$ and $\Lambda$ are diagonal matrices. The diagonal block matrix $\hat{K}$ can be rearranged to yield a block diagonal matrix $\tilde{K}$:
via appropriate permutations of rows and columns, as shown below. Let $\tilde{K}_i$ be the $i$-th diagonal block of $\tilde{K}$, and let $P$ be a permutation matrix such that the transformation $P^T \hat{K} P$ rearranges the entries of $\hat{K}$ in such a way that

$(\tilde{K}_i)_{jl} = d_{jl} + (\lambda_i + k^2)\,\delta_{jl},$

where $d_{jl}$ is the $(j, l)$ element of $\tilde{D}_x$ and $\lambda_i$ is the $i$-th diagonal element of $\Lambda$. Then we have

$\tilde{K}_i = \tilde{D}_x + (\lambda_i + k^2) I_p, \qquad i = 1, \dots, q,$

with $\lambda_i$ being the $i$-th eigenvalue of $\tilde{D}_y$. Accordingly,

$\tilde{K} = P^T \hat{Q}^{-1} K \hat{Q} P = \operatorname{diag}(\tilde{K}_1, \tilde{K}_2, \dots, \tilde{K}_q),$
since $P^{-1} = P^T$. In other words, the matrix K has been transformed into a block diagonal matrix consisting of q blocks. This decomposition takes advantage of the homogeneity of the boundary conditions in Equation (1) along the y direction to decouple the 2D problem into q independent 1D problems. A similar decomposition can be performed by using the homogeneity of the problem along the x direction.
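The level-1 claim can be checked numerically. The sketch below uses random stand-ins for the stripped matrices and the same assumed Kronecker ordering as before (y index fastest); all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, k = 4, 3, 1.0

# Random stand-ins for the stripped matrices; any matrices work for
# checking the algebra (these are NOT actual Chebyshev blocks).
Dx_t = rng.standard_normal((p, p))
Dy_t = rng.standard_normal((q, q))

# Assumed Kronecker ordering: y index varying fastest.
K = (np.kron(Dx_t, np.eye(q)) + np.kron(np.eye(p), Dy_t)
     + k**2 * np.eye(p * q))

lam, Q = np.linalg.eig(Dy_t)            # eigenpairs of the y-direction block
Qhat = np.kron(np.eye(p), Q)            # block diagonal: p copies of Q
Khat = np.linalg.solve(Qhat, K @ Qhat)  # Qhat^{-1} K Qhat

# Permutation grouping unknowns by y-eigenmode: entry i*q + j -> j*p + i.
P = np.zeros((p * q, p * q))
for i in range(p):
    for j in range(q):
        P[i * q + j, j * p + i] = 1.0
Ktil = P.T @ Khat @ P

# Each diagonal block equals Dx_t + (lam_j + k^2) I_p; off-blocks vanish.
for j in range(q):
    blk = Ktil[j * p:(j + 1) * p, j * p:(j + 1) * p]
    assert np.allclose(blk, Dx_t + (lam[j] + k**2) * np.eye(p))
```

The assertions confirm that, after the similarity transformation and the permutation, the system splits into q independent blocks of the stated form.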
3.2. Level-2 Decomposition: Reflexive Decomposition
As seen from the previous section, the matrix K can be transformed to a block diagonal matrix, implying that the original problem can be decomposed into independent subproblems which can then be solved in parallel using multiprocessors. Although each diagonal block $\tilde{K}_i$ in Equation (6) can be further decomposed into a diagonal matrix using a single transformation matrix that consists of the eigenvectors of $\tilde{D}_x$, this diagonalization yields little advantage in general, especially in a multiprocessing environment, unless the eigenpairs are readily available, which is not the case for the matrix $\tilde{D}_x$.
In this section, we shall employ the reflexivity property of the diagonal blocks to further decompose each of them into two independent sub-blocks, using a decomposition technique called the reflexive decomposition.
Let $J_n$ be the cross identity matrix of dimension n, i.e., the matrix with ones on the anti-diagonal and zeros elsewhere. It is trivial to see that $J_n^2 = I_n$. Now, we note that the Chebyshev differentiation matrix $D_x$ is anti-reflexive and $D_x^2$ is reflexive with respect to $J$. In other words,

$J D_x J = -D_x \qquad \text{and} \qquad J D_x^2 J = D_x^2.$
Accordingly, the matrix $\tilde{D}_x$, which is the stripped matrix of $D_x^2$ (Equation (2)), is reflexive with respect to $J_p$. In fact, it is a centrosymmetric matrix. Recall that $\tilde{K}_i = \tilde{D}_x + (\lambda_i + k^2) I_p$. Therefore, the blocks $\tilde{K}_i$ satisfy the reflexivity property

$J_p \tilde{K}_i J_p = \tilde{K}_i, \qquad i = 1, \dots, q,$
indicating that each $\tilde{K}_i$ can be decomposed via reflexive decompositions into two independent sub-blocks of equal size when p is even, or of sizes differing by one when p is odd. The decompositions can be carried out through orthogonal transformations; here we just present the results.
First consider the case when p is even. Assume $p = 2s$ for some integer $s$, and let $\tilde{K}_i$ be evenly partitioned into $s \times s$ sub-blocks. By centrosymmetry,

$\tilde{K}_i = \begin{pmatrix} A_i & B_i \\ J B_i J & J A_i J \end{pmatrix}.$

Taking advantage of the reflexivity property of $\tilde{K}_i$ by applying the orthogonal transformation $X^T \tilde{K}_i X$, where

$X = \frac{1}{\sqrt{2}} \begin{pmatrix} I & I \\ -J & J \end{pmatrix},$

one can easily decompose $\tilde{K}_i$ into two decoupled diagonal sub-blocks of equal size:

$X^T \tilde{K}_i X = \begin{pmatrix} A_i - B_i J & 0 \\ 0 & A_i + B_i J \end{pmatrix}. \qquad (7)$
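The even-case formula can be verified on any centrosymmetric matrix. The sketch below builds a random centrosymmetric matrix, not an actual Chebyshev block, and checks the decoupling.

```python
import numpy as np

rng = np.random.default_rng(1)
s = 3
J = np.fliplr(np.eye(s))                      # cross identity of dimension s

# Build a random centrosymmetric matrix M of even dimension p = 2s.
A, B = rng.standard_normal((s, s)), rng.standard_normal((s, s))
M = np.block([[A, B], [J @ B @ J, J @ A @ J]])
Jp = np.fliplr(np.eye(2 * s))
assert np.allclose(Jp @ M @ Jp, M)            # reflexivity with respect to Jp

# Orthogonal transformation X from the reflexive decomposition.
X = np.block([[np.eye(s), np.eye(s)], [-J, J]]) / np.sqrt(2)
T = X.T @ M @ X

assert np.allclose(T[:s, s:], 0)              # off-diagonal blocks vanish
assert np.allclose(T[s:, :s], 0)
assert np.allclose(T[:s, :s], A - B @ J)      # first decoupled sub-block
assert np.allclose(T[s:, s:], A + B @ J)      # second decoupled sub-block
```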
In the case when p is odd, say $p = 2s + 1$, we consistently partition $\tilde{K}_i$ and $J_p$ as

$\tilde{K}_i = \begin{pmatrix} A_i & b_i & B_i \\ c_i^T & \alpha_i & c_i^T J \\ J B_i J & J b_i & J A_i J \end{pmatrix}, \qquad J_p = \begin{pmatrix} 0 & 0 & J \\ 0 & 1 & 0 \\ J & 0 & 0 \end{pmatrix}.$

By taking the transformation matrix to be

$X = \frac{1}{\sqrt{2}} \begin{pmatrix} I & 0 & I \\ 0 & \sqrt{2} & 0 \\ -J & 0 & J \end{pmatrix},$

the matrix $\tilde{K}_i$ can be decoupled via the orthogonal transformation $X^T \tilde{K}_i X$ as

$X^T \tilde{K}_i X = \begin{pmatrix} A_i - B_i J & 0 & 0 \\ 0 & \alpha_i & \sqrt{2}\, c_i^T \\ 0 & \sqrt{2}\, b_i & A_i + B_i J \end{pmatrix}.$
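The odd case can be checked the same way, again on a random centrosymmetric matrix built to the partitioned pattern above (illustrative names only).

```python
import numpy as np

rng = np.random.default_rng(2)
s = 2
J = np.fliplr(np.eye(s))

# Random centrosymmetric matrix of odd dimension p = 2s + 1.
A, B = rng.standard_normal((s, s)), rng.standard_normal((s, s))
b, c = rng.standard_normal((s, 1)), rng.standard_normal((s, 1))
alpha = rng.standard_normal()
M = np.block([[A, b, B],
              [c.T, np.array([[alpha]]), c.T @ J],
              [J @ B @ J, J @ b, J @ A @ J]])
Jp = np.fliplr(np.eye(2 * s + 1))
assert np.allclose(Jp @ M @ Jp, M)            # reflexivity check

z = np.zeros((s, 1))
X = np.block([[np.eye(s), z, np.eye(s)],
              [z.T, np.array([[np.sqrt(2)]]), z.T],
              [-J, z, J]]) / np.sqrt(2)
T = X.T @ M @ X

assert np.allclose(T[:s, s:], 0) and np.allclose(T[s:, :s], 0)
assert np.allclose(T[:s, :s], A - B @ J)                 # sub-block of size s
assert np.allclose(T[s, s], alpha)                       # sub-block of size s+1:
assert np.allclose(T[s, s + 1:], np.sqrt(2) * c.T)       #   [alpha  sqrt(2) c^T]
assert np.allclose(T[s + 1:, s], np.sqrt(2) * b.ravel()) #   [sqrt(2) b  A + BJ]
assert np.allclose(T[s + 1:, s + 1:], A + B @ J)
```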
In either case, let $\hat{X} = \operatorname{diag}(X, \dots, X)$, with q copies of X. The orthogonal transformation $D = \hat{X}^T \tilde{K} \hat{X}$ then yields a matrix that obviously consists of 2q independent diagonal blocks, since each $X^T \tilde{K}_i X$ has two independent diagonal blocks.
4. Computational Procedure
The numerical solution to the Helmholtz Equation (1) discretized by the Chebyshev pseudospectral method yields the linear system
where $I_p$ is the identity matrix of dimension p, $u$ is a $pq$-vector, $f$ a $pq$-vector, and K a $pq \times pq$ matrix. Here p (q) is the number of internal grid points in the x (y) direction. The two-level decomposition scheme described in this paper yields the following transformed linear system consisting of 2q independent subsystems, each of dimension $p/2$ when p is even.
The sizes of the sub-blocks in D differ by at most 1 when p is an odd number. Computationally, this decomposition is very attractive. Not only can the decoupled subsystems be solved much more efficiently, they can also be solved in parallel with coarse-grain parallelism on a computer system with multiple processors. This two-level decomposition consists of the following three stages.
1) Transform $K u = f$ to $\hat{K} \hat{u} = \hat{f}$, where $\hat{K} = \hat{Q}^{-1} K \hat{Q}$, $\hat{u} = \hat{Q}^{-1} u$, and $\hat{f} = \hat{Q}^{-1} f$.
This is a typical similarity transformation that is achieved by premultiplying both sides of $K u = f$ by $\hat{Q}^{-1}$ and inserting $\hat{Q} \hat{Q}^{-1}$, which is simply an identity matrix, between K and u. The matrix Q has been defined in Section 3.1 and can be found in Equation (5).
2) Transform $\hat{K} \hat{u} = \hat{f}$ to $\tilde{K} \tilde{u} = \tilde{f}$, where $\tilde{K} = P^T \hat{K} P$, $\tilde{u} = P^T \hat{u}$, and $\tilde{f} = P^T \hat{f}$.
The main purpose of this transformation is to reorder the unknowns of the linear system so that the decoupled coefficient matrix is converted to a block diagonal matrix ($\tilde{K}$) for the sake of computational convenience. The matrix P, which is just a permutation matrix, has been explained in Section 3.1 and can be found in Equation (6).
3) Transform $\tilde{K} \tilde{u} = \tilde{f}$ to $D v = g$, where $D = \hat{X}^T \tilde{K} \hat{X}$, $v = \hat{X}^T \tilde{u}$, and $g = \hat{X}^T \tilde{f}$.
This transformation is analogous to that in stage 1, except that X here is an orthogonal matrix. Therefore we replace $\hat{X}^{-1}$ by $\hat{X}^T$. The matrices X and D can be found in Section 3.2.
The advantage of this approach is that the decomposed form of D is explicitly known; therefore, the actual decompositions from K to $\hat{K}$, from $\hat{K}$ to $\tilde{K}$, and from $\tilde{K}$ to D are not necessary. Furthermore, the diagonal blocks of D can be obtained without any matrix-matrix multiplications. The computational procedure of this two-level decomposition boils down to the following three typical steps.
1) Decomposition step. In this step, we shall compute D and g. Since each diagonal block of D, namely $A_i - B_i J$ and $A_i + B_i J$ (Equation (7)), is independent and explicitly known, D can be obtained in parallel using 2q processors without any difficulty. The main task here is thus to obtain g. This can be achieved by first transforming f to $\hat{f}$, which involves solving the linear system $\hat{Q} \hat{f} = f$. It is essential to note that $\hat{Q} = \operatorname{diag}(Q, \dots, Q)$ is a block diagonal matrix and, therefore, $\hat{f}$ can be obtained by solving p independent linear subsystems, each of size q, instead of solving the linear system as a whole, which is of size pq. This step can be performed in parallel with large granularity. The matrix Q, consisting of the eigenvectors of $\tilde{D}_y$, is in general not known explicitly and thus has to be computed from $\tilde{D}_y$. Fortunately, $\tilde{D}_y$ is only of dimension q. Once $\hat{f}$ has been computed, $\tilde{f}$ and g can be obtained easily, because $\tilde{f}$ is simply a permuted version of $\hat{f}$ and $g = \hat{X}^T \tilde{f}$ involves no matrix-vector multiplications due to the special form of X.
2) Solution step. In this step, we solve the linear system $D v = g$ for v. Note that D consists of 2q diagonal blocks $D_1, D_2, \dots, D_{2q}$. By consistently partitioning the vectors v and g into 2q parts, based on the sizes of the blocks, the linear system can now be expressed as

$D_i v_i = g_i$

for $i = 1, 2, \dots, 2q$. Since all of the subsystems are independent, the linear system can be solved in parallel using 2q processors, one for each subsystem.
3) Retrieval step. Once v has been obtained, the solution u to the original system can be retrieved from v as follows. We first compute $\tilde{u}$ from v using $\tilde{u} = \hat{X} v$, which again involves no matrix-vector multiplication. We then obtain $\hat{u}$ by simply permuting $\tilde{u}$ using $\hat{u} = P \tilde{u}$. The final solution u can now be computed by the matrix multiplication $u = \hat{Q} \hat{u}$. Note that $\hat{X}$ and $\hat{Q}$ are block diagonal, implying that $\tilde{u}$ and u can be obtained in parallel with coarse-grain parallelism.
To end this section, it is worth mentioning that this two-level decomposition is a similarity transformation; therefore, all eigenvalues of the original matrix K are preserved in the matrix D. Obtaining eigenvalues from D is far more efficient than from K.
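Eigenvalue preservation can be illustrated on a single block: the spectrum of a centrosymmetric matrix is the union of the spectra of its two decoupled sub-blocks. A small sketch with a random symmetric centrosymmetric matrix (so all eigenvalues are real):

```python
import numpy as np

rng = np.random.default_rng(4)
s = 2
J = np.fliplr(np.eye(s))

# A symmetric centrosymmetric test block: A = A^T and J B^T J = B.
A = rng.standard_normal((s, s))
A = A + A.T
B0 = rng.standard_normal((s, s))
B = (B0 + J @ B0.T @ J) / 2
M = np.block([[A, B], [J @ B @ J, J @ A @ J]])

# Spectrum of M equals the union of the spectra of the decoupled sub-blocks.
ev_full = np.sort(np.linalg.eigvalsh(M))
ev_split = np.sort(np.concatenate(
    [np.linalg.eigvalsh(A - B @ J), np.linalg.eigvalsh(A + B @ J)]))
assert np.allclose(ev_full, ev_split)
```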
5. Numerical Example
To demonstrate the usefulness of the two-level decomposition scheme, we present a numerical example derived from the Helmholtz equation over the square domain $[-1, 1] \times [-1, 1]$, subject to homogeneous Dirichlet boundary conditions, on a grid with $p = 4$ and $q = 3$ internal points. The numerical results presented in this section were obtained using GNU Octave, software that offers many powerful high-level programming features for scientific computation, similar to MATLAB. It suffices to show the final decomposed submatrices to illustrate the advantage of this approach. By excluding the grid points on the boundary and using the lexicographic ordering to number the internal nodes, the matrix K, of dimension 12, is
For simplicity, we have rounded off the numbers after the last digit shown. First, we observe that both $\tilde{D}_x$ and $\tilde{D}_y$ are reflexive. In fact, they are centrosymmetric matrices. Let the columns of Q be the eigenvectors of $\tilde{D}_y$, of dimension 3, and let P be the following permutation matrix of dimension 12,
From the Level-1 decomposition presented in Section 3, we easily obtain the decoupled diagonal blocks $\tilde{K}_i = \tilde{D}_x + (\lambda_i + k^2) I$ without the need of any matrix multiplications, once the eigenvalues $\lambda_i$ of $\tilde{D}_y$, $i = 1, 2, 3$, are available:
The numerical values clearly indicate that each $\tilde{K}_i$ is reflexive with respect to $J_4$. Now, applying the Level-2 decomposition to each $\tilde{K}_i$ by taking
where the unshown entries in the matrices are 0. Apparently, each $\tilde{K}_i$ has been further decomposed into two independent diagonal blocks, yielding a total of six independent diagonal blocks from the original matrix K. Note that the decomposed matrices can be obtained directly from $A_i$, $B_i$, and $J$ using Equation (7), without the need of computing $X^T \tilde{K}_i X$. Note also that the eigenvalues of K, of dimension 12, can now be easily obtained from the six diagonal blocks, each of dimension only 2, a clear indication of the advantage of this approach.
6. Conclusions

In this paper, we have presented a two-level block decomposition scheme for the numerical solution of the Helmholtz equation in the Cartesian domain $[-1, 1] \times [-1, 1]$, subject to homogeneous Dirichlet boundary conditions and discretized with the Chebyshev pseudospectral method. For a computational grid with $p$ internal grid points in the x direction and $q$ in the y direction, our first-level decomposition yields $q$ independent 1D subproblems, excluding the known values at boundary points, using the eigenpairs of the 1D second-order Chebyshev differentiation matrix along y. The decoupled coefficient matrices are then rearranged to form a block diagonal matrix through a proper permutation. The second level of decomposition further decouples each of the subproblems into two smaller, independent subproblems using reflexive decompositions, yielding a total of $2q$ independent subproblems.
The general form of the final decoupled submatrices has also been explicitly derived, eliminating the need to perform matrix-matrix multiplications during the decomposition step. A numerical example is presented to illustrate the applicability and to demonstrate the advantage of this two-level decomposition scheme. This scheme leads naturally to a highly parallel numerical algorithm for solving the problem under consideration, since all subproblems can be solved in parallel using multiprocessors.
To close our conclusion, it is worth mentioning that this block decomposition can readily be employed for finding the eigenvalues of the problem as well. This is because all eigenvalues of the original matrix are preserved in the decomposed submatrices: the first-level decomposition is a similarity transformation and the second-level decomposition is an orthogonal transformation. The preservation of all eigenvalues in the submatrices makes this block decomposition scheme even more attractive, because finding eigenvalues from the decomposed submatrices is obviously much more efficient than from the original matrix, not to mention the computational benefit of the large-grain parallelism induced by the decomposition.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.