The Better Accuracy of Strassen-Winograd Algorithms (FastMMW)
Author: Paolo D’Alberto
Affiliation: FastMMW, San Jose, USA.
ABSTRACT
The first error theory and bounds for fast matrix multiplication based on the Strassen-Winograd algorithms (FastMMW) were formulated in the 1970s. That theory introduced what is now known as weakly-stable error analysis, in which the error bounds are expressed in matrix norms rather than component-wise. While the theory debunked the instability myth through matrix scaling and a clean, simple analysis, its bounds are available only as properties of whole matrices: they are too coarse, pessimistic, at times cited to suggest instability, and not usable for algorithm optimization. We build on the original theory and reformulate the bounds: we show that tighter norm-wise and component-wise bounds are achievable through orthogonal algorithm optimizations. To achieve even finer discrimination and to circumvent norm bounds altogether, we develop an error theory based on concepts from communication theory and statistics: we investigate lower and upper bounds, estimate the practical bounds, and characterize the algorithmic nature of the error for the class of random matrices. The theory and tools are not limited to random matrices, and we foresee further investigation of other matrix classes and algorithms. We propose new, more accurate algorithms and show, theoretically and empirically, that the maximum absolute error of any FastMMW algorithm can be improved by 10% - 20% per recursion (halving the error over 4 recursions). Our theory and practice will, in turn, kick-start the development of hybrid algorithms as accurate as vendor GEMM implementations, and in certain cases even more accurate for random matrices.
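For readers unfamiliar with the algorithm family the abstract analyzes, the following is a minimal sketch of the classic Strassen-Winograd recursion (7 submatrix multiplications and 15 additions per level, versus 8 multiplications for the classical blocked product). It assumes square matrices with power-of-two dimensions and uses illustrative variable names, not the paper's notation; the paper's optimized, more accurate variants are not reproduced here.

```python
import numpy as np

def winograd(A, B, leaf=64):
    """Strassen-Winograd matrix product: 7 multiplies, 15 adds per recursion.

    Recurses until the submatrix size drops to `leaf`, then falls back to
    the ordinary product. Assumes square, power-of-two-sized inputs.
    """
    n = A.shape[0]
    if n <= leaf:
        return A @ B
    h = n // 2
    # Quadrant views of the operands.
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Pre-additions (the source of the extra rounding error the paper bounds).
    S1 = A21 + A22; S2 = S1 - A11; S3 = A11 - A21; S4 = A12 - S2
    T1 = B12 - B11; T2 = B22 - T1; T3 = B22 - B12; T4 = T2 - B21
    # The seven recursive products.
    M1 = winograd(A11, B11, leaf)
    M2 = winograd(A12, B21, leaf)
    M3 = winograd(S4, B22, leaf)
    M4 = winograd(A22, T4, leaf)
    M5 = winograd(S1, T1, leaf)
    M6 = winograd(S2, T2, leaf)
    M7 = winograd(S3, T3, leaf)
    # Post-additions combining the products into the result quadrants.
    U1 = M1 + M6
    U2 = U1 + M7
    U3 = U1 + M5
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = M1 + M2
    C[:h, h:] = U3 + M3
    C[h:, :h] = U2 - M4
    C[h:, h:] = U2 + M5
    return C

# Usage: compare one recursion level against the classical product.
rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128))
B = rng.standard_normal((128, 128))
max_abs_err = np.max(np.abs(winograd(A, B) - A @ B))
```

The pre- and post-additions are where the weakly-stable error enters: each recursion level mixes submatrix magnitudes before multiplying, which is why the error bounds in this line of work are stated per recursion level.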

Cite this paper
D’Alberto, P. (2014) The Better Accuracy of Strassen-Winograd Algorithms (FastMMW). Advances in Linear Algebra & Matrix Theory, 4, 9-39. doi: 10.4236/alamt.2014.41002.