The understanding of muscle structure and muscular movements is the foundation for many scientific researches and clinical applications in image processing, medical imaging and human physiology. However, it is generally challenging to determine whether an observed anatomical structure, whether it be the brain, the tongue, the heart or the limb, is “normal” because there exists a great difference in the underlying anatomical structures for even a small group of subjects. In addition to the great inter-subject differences, significant variance also exists in the soft tissue anatomy on even a single subject. For instance, a recent imaging-based research has indicated that the size and shape of the human heart vary significantly at multiple cardiac phases of a heartbeat, among different heart beats and across various subjects . Given these potential sources of anatomical differences during muscular motion, it has long been a dream for scientists to establish a standard sequence of images representing the “expected” muscle structure and motion pattern for a targeted group of subjects. If such a standard sequence of images, i.e., a spatiotemporal atlas, were to exist, then it serves as the ground truth for quantitatively interpreting the observed muscle movement and accurately characterizing the motion variability of a specific subject as versus to the general motion pattern .
Despite the usefulness of spatiotemporal atlas, its construction often involves procedures that are time-consuming and computationally expensive. In order to create a set of high quality spatiotemporal atlas, initial image formation procedures need to be first performed on each subject from the targeted group. Upon completion of this initial step, it is often essential to define a common space towards which accurate mappings of all subjects can be mapped into . After this common space is successfully defined, group-wise registration techniques are often employed to extract the “expected” anatomical features and motion patterns . Several methods have been proposed to facilitate this step, including group-wise registration with kernel regression , group-wise registration with individual growth model , template alignment techniques  and diffeomorphic image registration . It is worth noting that each of the above mentioned steps involves multiple image formation and image processing procedures that are often nonlinear, memory inefficient and computationally expensive. Specifically in this paper, the author constructed a small training set of spatiotemporal atlas using a diffeomorphic image registration method with Lipschitz-norm-based temporal alignment routine .
Construction of high quality spatiotemporal atlas is even more challenging when the underlying image data are undersampled. The quality of the constructed spatiotemporal atlas depends heavily on the data quality from the image formation steps. Therefore, the quality of the resulting atlas images is very likely compromised if they are constructed upon initial images that are contaminated by geometric distortion, imaging artifacts and noise. This situation, unfortunately, is often seen with clinical applications that involve fast medical imaging, where sparse sampling is performed on the image formation step due to certain physical or physiological considerations . Figure 1 illustrates representative atlas images constructed based on sparse-sampled Fourier space (k-space) data from a dynamic magnetic resonance imaging (MRI) experiment. The top and bottom portions of the initial images are cropped to enable better spatial alignment under subject motion. The lower jaw of the subject in time frame 1001 and 2076 and the nose tip of the subject in time frame 4118 suffer from significant geometric distortion. These geometric distortions mainly result from sparse
Figure 1. Representative atlas image frames (time frame 1001, 2076, 4118) constructed based on sparse sampled k-space data from a dynamic MRI experiment.
sampling in the image formation step and significantly compromises the ensuing quantitative analysis of anatomical features on the resulting atlas image sequence. As can be seen, the subject’s lower jaw and nose tip in the constructed atlas image suffer from significant geometric distortion, which renders the atlas less useful for quantitative analysis on the underlying anatomical features. To remedy the compromised image quality from sparse sampling, anatomical models need to be incorporated to compensate for the image artifacts and distortions from the image formation step .
This paper targets at the above mentioned technical challenges and aims to develop a practical method for generating high quality spatiotemporal atlas images within reasonable computation time and memory requirements. The authors formulate the atlas generation problem as multi-level modulation (M-ary) classification problem. Specifically, a small subset of spatiotemporal atlas images is first constructed as training set using a diffeomorphic registration routine as mentioned above. With this training set, atlas construction proceeds as choosing an atlas image that is “closest” to the one in the training set. A fast template matching method based on Fourier space samples has been chosen to perform this task. The principles underlying these two methods are given in the METHODS section and evaluations of their performance are given in the RESULTS section of this paper. Concluding remarks will also be given at the end of the paper with detailed discussion on the theoretical, numerical and algorithmic problems occur during the formation of the atlas.
2.1. Constructing the Training Set of Spatiotemporal Atlas
Diffeomorphic group-wise image registration lies at the core of constructing the training set of the spatiotemporal atlas. Specifically, the initial images for registration are first obtained from a numerical phantom constructed for evaluating dynamic MRI data sampling and reconstruction methods . The group-wise image registration procedure is performed with the large deformation diffeomorphic metric mapping (LDDMM) algorithm  : let Iinit represent an initial image sequence obtained from an image formation step, Iatlas represent a constructed atlas image sequence on an open and bounded image domain Ω, φ (x, t) denote a continuous differentiable function representing the diffeomorphic transformation from Iinit to Iatlas parameterized over time t. It is also assumed that φ (x, t) can be computed by integrating over a time-dependent velocity field v defined as follows ,
where the function φ (x, t) also satisfies the following integration,
Given Equation (1) and Equation (2), it has been indicated in  that the diffeomorphic image registration procedure can be formulated as a convex optimization problem that aims at minimizing a energy functional as follows
where the first term in the energy functional is a data consistency term and the second term is a regularization term with λ as a regularization parameter. To better leverage the spatial correlation between Iinit to Iatlas,  defines a pair of diffeomorphism over φ1 and φ2 and leads to an improved energy functional as follows,
where Ψ (·) represents a similarity measure. By introducing diffeomorphism into the energy functional, the above formulation guarantees that the forward and inverse mappings between Iinit to Iatlas are “symmetric”. This optimization problem is convex and its global optimum can be obtained with a gradient-descent-based routine . Implementation of this registration routine is based on the ANTs open source software library  and is revised from an existing implementation in .
2.2. Template Matching Functional MRI
Template matching often refers to a range of image processing methods that match a subset of a given image to the targeted template image. Although the underlying principles of template matching are not especially advanced, template matching has found great use and has proven effective in many applications such as object tracking, feature recognition and video matching. Multiple components directly interact with the performance of a template matching algorithm—prototype templates, probability distribution, similarity measure and feature space. Details of each of these three components are given in the following paragraphs.
1) Prototype Templates: Prototype templates are usually created in two ways: a) directly generate prototype templates from an existing physiological model. b) extract prototype templates from predetermined representative images  . In this report, the author chooses the latter approach because having a set of predetermined template images (which are available from publicly accessible data bases) removes the complicated procedures of developing and validating the quality of physiological models . Specifically in this paper, initial prototype templates are created from the training set of the spatiotemporal atlas with manual adjustment on image orientation and contrast using the image processing toolbox in Matlab 2014b (Mathworks, Natick, MA). Three representative prototype templates are shown in Figure 2. As can be seen, the prototype template images are free from geometric distortion and imaging artifacts.
2) Probability Distribution: The underlying probability distribution is critical towards accurately defining how well the targeted image “matches” with the template. The similarity measure is obviously important in the decision rule because its value directly influences the likelihood ratio and therefore the decision boundary (under Gaussian assumption as introduced in class). The risk associated with a certain decision rule (i.e., generating a incorrect frame of spatiotemporal atlas) can only be minimized on the condition that the underlying probability distribution is properly defined.
Despite the importance of probability distribution, defining an appropriate distribution is not an easy task for the problem of generating a spatiotemporal atlas. This is because: 1) There exists a certain level of spatial and temporal misalignment between the targeted images and the prototype templates. 2) The targeted image data are often given as Fourier space (k-space) data. 3) The situation gets worse when the targeted image data are only partially available, or sparsely sampled. Aiming at these difficulties, it has been developed in  an estimation scheme for determining the probability distribution from multiple incomplete features in both image and feature domains. Details of the probability density function projection theorem and the extension to the atlas-generation problem in this report are given in the following paragraphs.
The probability density function projection theorem provides a setting where the algorithm can work with features in both domains the image domain I and the feature domain Z. For the proposed problem in this project, we choose the feature domain Z to be the undersampled Fourier domain, i.e., the sparsely sampled k-space data from the MRI machine. The Neyman-Fisher factorization theorem  states that if Z = ψ (I) is a sufficient statistics for hypothesis H, then the posterior probability p (I|H) can be factorized into the product of two terms as follows:
Figure 2. Representative prototype template frames (template 3001, 3021 and 3041) constructed by performing manual adjustment of image orientation and contrast on the training set of spatiotemporal atlas.
Significance of the Neyman-Fisher factorization theorem lies in that it removes the dependence of p (I|H) and separates it into the product of a function g, whose dependence on H is obtained via ψ (I), and another function h. This allows the cross-talk between the raw data domain and the image data domain to be more manageable. In the case of having M prototype templates, it has been indicated  that the posterior probability can be defined as
where a specific feature set zj is extracted for each hypothesis Hj. It is noticed that the term) p (zj|Hj) lies at the denominator and, therefore, its accuracy may significantly impact the accuracy of the resulting p (I|Hj) even a slight error in p (zj|H0) may cause p (I|Hj) to vary to a large amount. Considering this, it is suggested in  to choose an analytical probability density function as a reference hypothesis. A number of potential analytical solutions for a number of statistics have also been provided as references . Also, according to the Neyman-Fisher factorization theorem , the reference hypothesis H0 can be arbitrarily defined as long as zj represents a sufficient statistic for H0 and Hj In practice, however, it has been demonstrated  that careful choice over the reference hypothesis H0 can significantly improve the accuracy of determining the probability distribution.
3) Similarity Measure: Similarity measure should be a real-valued function that quantifies the similarity between two objects. In the context of this project, the similarity measure should be a function of spatial location, temporal location, image perspective, image contrast and image orientations. Although various definitions of similarity measures have been defined in the literature    , these definitions are invariably based on either the vector norm or the matrix norm. Specifically, if we assume a vector x representing a patch in the incoming image data (an putting it into a column),
The performance of various vector norms has been evaluated and the Frobenius norm (with p = 2 in the above expression) has been chosen due to its computational convenience. Also, if we choose to overload the notation with matrix norm, the lp norm of a mapping A from the image space to the feature space given by,
In the context of this paper, p can take on various non-negative values. In particular, the case of p = 1 corresponds to the template matching approach by finding the mean absolute difference (MAD) and the case of p = 2 corresponds to the template matching approach by finding the mean squared errors (MSE). It should be noted that there is no limitation on the form of the mapping A as long as its induced norm can be properly defined.
4) Feature Space: Defining an appropriate feature space is important for a practical template matching technique. As mentioned in the above paragraphs, the incoming image data for construction of the spatiotemporal atlas are provided in forms of k-space. The relation between the image space data and the k-space data are given by the Fourier transform relationship as follows,
where d (k, t) represents the acquired data in k-space along time, Ω represents the spatial support of the spatiotem-poral image function I (x, t), k represents the coordinates in the Fourier space (where data are acquired from) and η (x, t) represents the measurement noise. In a more practical setting, sparse sampling is applied to acquire the k-space and the relation between the image space data and the acquired data is given by
where U represents an undersampling operator that sparsely collects samples in the k-space along time. Figure 3 illustrates the relationship between a representative image in Ω, its associated image in the k-space and the sampling trajectory in the k-space. As can be seen, the image data in k-space are only partially acquired with a sparse spiral trajectory. Therefore, it is difficult to define an appropriate similarity measure based on the data acquired from this trajectory.
As can be seen with Equation (10) and Figure 3, the provided data for the construction of spatiotemporal atlas are acquired in the k-space. Mathematically, the k-space data can be interpreted as the integration of all the “spatial frequency” components in the image space I. In this case, it is challenging to define and separate anatomical features from the acquired data because each sample contains information from all the features from I. Considering this, this project proposes to apply singular value decomposition (SVD) on the acquired k-space data to extract features. The author decides to use SVD to extract anatomical features from the acquired data because: 1) numerical procedures in calculating SVD are independent of the spatial coordinates of the acquired data d (k, t). Using SVD removes the difficulties of interpolating the non-Cartesian sparse
Figure 3. The relationship between an image in Ω, its associated image in the k-space and the associated sampling trajectory in the k-space. As can be seen, the sampling trajectory is a sparse spiral trajectory that only collects sparse, non-Cartesian samples in the k-space.
sampled data onto a Cartesian grid; and 2) SVD has proven useful for extracting the general data features as a variation of the principal component analysis. Mathematically, this can be expressed as
where d (k,t) represents the acquired data organized into a matrix, whose column space represents the spatial domain and row space represents the temporal domain, L represents the order of decomposition, dl (k) represents the spatial subspace of d (k, t), σl represents the singular values corresponding to each index l and dl (t) represents the temporal subspace of d (k, t). Mathematically, Equation (11) can be written in a more condensed form,
where are chosen as the feature space in the context of this project. It should be noted that , instead of are chosen as the feature space because the template matching problem mainly attempts to match an incoming image towards the subject motion at a speciﬁc time point. However, the template matching problem can also be formulated as one that matches the spatial features towards the prototype template. Therefore, it would be also reasonable to deﬁne feature space over . The evaluation and comparison between these two methods will be evaluated in the future.
In Figure 4, nonlinear image registration procedures that are computationally expensive (left); the proposed fast template matching methods in this report (middle). The absolute difference between these two images is shown on the right. It is obvious that the difference is mainly due to contrast difference between the image-registration-based image and the prototype image. This result demonstrates the effectiveness of the fast template matching. The schematic diagram of how the spatiotemporal atlas was created from numerical phantom and training set of images are shown in Figure 5.
Figure 4. Comparison between the spatiotemporal images generated from.
Figure 5. Schematic diagram of the method to create spatiotemporal atlas through fast template matching.
To demonstrate the effectiveness of the method introduced in this paper, a comparison between two spatiotemporal atlas images has been performed. As seen in Figure 4, the image on the left was constructed with expensive nonlinear image processing steps. The image in the middle, however, was constructed using the fast template matching method introduced in this paper. The image on the right illustrates the absolute difference between the previous two images. It is obvious that the image created with template matching inherits the anatomical features from the prototype templates and matches well with the image generated with computationally expensive image processing. In addition, the differences are mainly caused by the contrast difference, but do not contain signiﬁcant structural misalignment or geometric distortion.
To demonstrate how well the spatiotemporal atlas generated from fast template matching represent the true movements of the subject, a temporal proﬁle is given in Figure 6 across a vertical strip across the tongue tip. As can be seen, the ﬁrst 600 frames of the spatiotemporal atlas are taken directly as the prototype templates, while the rest of the temporal frames of the spatiotemporal atlas are generated by the fast template matching algorithm. It is obvious that there is no signiﬁcant temporal blurring or spatial distortion occurred in the generated frames, especially when compared with those in the prototype frames. In addition, the temporal dynamics of the subject are well represented as repetitions of the motion in the prototype templates. This result demonstrates that the template matching algorithm is capable of generating high quality spatiotemporal atlas.
An important motivation of this project is to accelerate the generation of spatiotemporal atlas. To illustrate the speed up available from the proposed template matching algorithm, a comparative study on computation time was performed (versus generating the spatiotemporal atlas using expensive nonlinear image registration procedures) using the “tic” and “toc” commands in Matlab2014b (Mathworks, Natick, MA). In particular, comparison was performed on the time spent on generating a total of 100 frames of spatiotemporal atlas. The total computation time for the nonlinear image registration method was 10.53 hours on a personal computer with an Intel i5 CPU and 6 GB of RAM. As contrast, the template matching algorithm took 1.51 hours with an acceleration factor of 7. As
Figure 6. The temporal proﬁle of the generated spatiotemporal atlas.
the number of frames in the spatiotemporal atlas increase, it is reasonable to expect an even larger number of acceleration factor. This comparative study shows the effectiveness to apply the template matching algorithm to reduce computation time.
An M-ary classiﬁcation problem lies at the core of the generation of spatiotemporal atlas. There exist many algorithms to perform this task and many have proven useful. For instance, one potential algorithm to realize this goal is the artiﬁcial neural networks. However, the proposed template matching algorithm outperforms artiﬁcial neural networks in terms of computation complexity for obvious reasons. In addition, the template matching algorithm is much easier to implement—the author has attempted to implement the neural network methods based on a combination of Matlab 2014b (Mathworks, Natick, MA) and ArrayFire (a GPU jacket software for Matlab), but was forced to stop this attempt because of the “segmentation fault” problem during computation. Even though such problems can be solved by updating software versions and computation hardware, the template matching algorithm still stands out as a practical solution to generating spatiotemporal atlases in a reliable fashion.
A representative frame of the spatiotemporal atlas is shown on the left. A vertical strip across the tongue tip (yellow dashed line) is plotted along time to form the temporal proﬁle. As can be seen, the ﬁrst 600 frames of the spatiotemporal atlas are taken directly as the prototype templates, while the rest of the temporal frames of the spatiotemporal atlas are generated by the fast template matching algorithm. There is no signiﬁcant temporal blurring or spatial distortion in the generated frames compared with those in the prototype frames.
It will be useful to compare the obtained result with an underlying ground truth. As mentioned in the METHODS section of the report, the probability distribution is deﬁned in an empirical fashion because of the incoming data are sub-optimal—spatial distortion, temporal misalignment, contamination from noise and incomplete data from sparse sampling. Considering these practical difﬁculties, the author decides to examine the “correctness” of the generated spatiotemporal atlas by directly comparing the result with the traditional method—generating spatiotemporal atlas using computationally-expensive image registration methods. Based on the results in Figure 6, the proposed template matching algorithm is capable of generating spatiotemporal atlas free of spatial distortion and temporal misalignment. Therefore, it is reasonable to assume that the determined probability distribution is at least near optimal.
This paper focuses on applying the principles of statistical learning and pattern recognition to accelerate the generation of spatiotemporal atlas. Speciﬁcally, the paper investigates the fast template matching algorithm and applies it to generate high quality spatiotemporal atlas within reasonable time and computation complexity. Unlike existing methods that focus on the image domain in the generation of spatiotemporal atlas, the template matching algorithm introduced in this research allows prototype templates to be matched based on incomplete samples of the image from the Fourier space. In particular, the singular value decomposition is performed to extract features from the sparse-sampled data and template matching is performed in this feature space. Conceptual feasibility of the method has been validated. Practical effectiveness of the method has been evaluated on publicly accessible data sets. The results demonstrate that fast template matching algorithm is capable of generating high quality spatiotemporal atlas from sparse-sampled data in short computation time. This study provides a practical method of accelerating the generation of spatiotemporal atlas. This allows it to serve as the ground truth for quantitatively interpreting the observed muscle movement in medical imaging, as well as accurately characterizing the motion variability of a specific subject as versus to the general motion pattern in medical research and clinical applications.
 Alessandrini, M., Basarab, A., Liebgott, H. and Bernard, O. (2013) Cardiac Motion Assessment from Echocardiographic Image Sequences by Means of the Structure Multivector. 2013 IEEE International Ultrasonics Symposium (IUS), Prague, 21-25 July 2013, 1541-1544.
 Woo, J., Stone, M. and Prince, J.L. (2015) Multimodal Registration via Mutual Information Incorporating Geometric and Spatial Context. IEEE Transactions on Image Processing, 24, 757-769.
 Woo, J., Xing, F., Lee, J., Stone, M. and Prince, J.L. (2015) Construction of an Unbiased Spatio-Temporal Atlas of the Tongue during Speech. Information Processing in Medical Imaging, 723-732.
 Serag, A., Aljabar, P., Ball, G., Counsell, S.J., Boardman, J.P., Rutherford, M.A., Edwards, A.D., Hajnal, J.V. and Rueckert, D. (2012) Construction of a Consistent High-Definition Spatio-Temporal Atlas of the Developing Brain Using Adaptive Kernel Regression. NeuroImage, 59, 2255-2265.
 Liao, S., Jia, H., Wu, G., Shen, D., Initiative, A.D.N., et al. (2012) A Novel Framework for Longitudinal Atlas Construction with Groupwise Registration of Subject Image Sequences. NeuroImage, 59, 1275-1289.
 Durrleman, S., Pennec, X., Trouvé, A., Gerig, G. and Ayache, N. (2009) Spatiotemporal Atlas Estimation for Developmental Delay Detection in Longitudinal Datasets. Medical Image Computing and Computer-Assisted Intervention—MICCAI 297-304.
 Cao, Y., Miller, M., Winslow, R.L. and Younes, L. (2005) Large Deformation Diffeomorphic Metric Mapping of Vector fields. IEEE Transactions on Medical Imaging, 24, 1216-1230.
 Avants, B.B., Tustison, N.J., Song, G., Cook, P.A., Klein, A. and Gee, J.C. (2011) A Reproducible Evaluation of ANTs Similarity Metric Performance in Brain Image Registration. NeuroImage, 54, 2033-2044.
 Baggenstoss, P.M. (1998) Class-Specific Feature Sets in Classification. Proceedings of the 1998 IEEE International Symposium on Intelligent Control (ISIC) Held Jointly with IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA) Intell, Gaithersburg, 17 September 1998, 413-416.