The Better Accuracy of Strassen-Winograd Algorithms (FastMMW)

Paolo D’Alberto^{*}

Show more

References

[1] Strassen, V. (1969) Gaussian Elimination Is Not Optimal. Numerische Mathematik, 14, 354-356.

http://dx.doi.org/10.1007/BF02165411

[2] Douglas, C.C., Heroux, M., Slishman, G. and Smith, R.M. (1994) GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen’s Matrix-Matrix Multiply Algorithm. Journal of Computational Physics, 110, 1-10.

http://dx.doi.org/10.1006/jcph.1994.1001

[3] Demmel, J. and Higham, N. (1992) Stability of Block Algorithms with Fast Level-3 BLAS. ACM Transactions on Mathematical Software, 18, 274-291.

http://dx.doi.org/10.1145/131766.131769

[4] Demmel, J., Dumitriu, J., Holtz, O. and Kleinberg, R. (2006) Fast Matrix Multiplication Is Stable.

[5] Brent, R.P. (1970) Error Analysis of Algorithms for Matrix Multiplication and Triangular Decomposition Using Winograd’s Identity. Numerische Mathematik, 16, 145-156.

http://dx.doi.org/10.1007/BF02308867

[6] Miller, W. (1975) Computational Complexity and Numerical Stability. SIAM Journal on Computing, 4, 97-107.

http://dx.doi.org/10.1137/0204009

[7] Bini, D. and Lotti, G. (1980) Stability of Fast Algorithms for Matrix Multiplication. Numerische Mathematik, 36, 63-72.

http://dx.doi.org/10.1007/BF01395989

[8] Edelman, A. and Rao, N. (2005) Random Matrix Theory. Acta Numerica, 14, 233-297.

http://dx.doi.org/10.1017/S0962492904000236

[9] Kolmogorov, A.N. and Uspenskiiq, V.A. (1987) Algorithms and Randomness. Theory of Probability and Its Applications, 32, 389-412.

http://dx.doi.org/10.1137/1132060

[10] Winograd, S. (1968) A New Algorithm for Inner Product. IEEE Transactions on Computers, 17, 693-694.

[11] Higham, N.J. (1990) Exploiting Fast Matrix Multiplication within the Level 3 BLAS. ACM Transactions on Mathematical Software, 16, 352-368.

http://dx.doi.org/10.1145/98267.98290

[12] Higham, N.J. (2002) Accuracy and Stability of Numerical Algorithms. 2nd Edition, SIAM, Philadelphia.

http://dx.doi.org/10.1137/1.9780898718027

[13] Badin, M., D’Alberto, P., Bic, L., Dillencourt, M. and Nicolau, A. (2011) Improving the Accuracy of High Performance Blas Implementations Using Adaptive Blocked Algorithms. In Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD ’11, Washington, DC, IEEE Computer Society, 26-29 October 2011, 120-127.

[14] Castaldo, A.M., Clint Whaley, R. and Chronopoulos, A.T. (2008) Reducing Floating Point Error in Dot Product Using the Superblock Family of Algorithms. SIAM Journal on Scientific Computing, 31, 1156-1174.

http://dx.doi.org/10.1137/070679946

[15] Dongarra, J.J., Du Croz, J., Duff, I.S. and Hammarling, S. (1990) A Set of Level 3 Basic Linear Algebra Subprograms. ACM Transaction in Mathematical Software, 16, 1-17.

http://dx.doi.org/10.1145/77626.79170

[16] Goto, K. and van de Geijn, R.A. (2008) Anatomy of Highperformance Matrix Multiplication. ACM Transactions on Mathematical Software.

http://dx.doi.org/10.1145/1356052.1356053

[17] Priestley, M.B. (1981) Spectral Analysis and Time Series. Academic Press Inc, New York.

[18] Brockwell, P.J. and Davis, R.A. (2006) Time Series: Theory and Methods. Springer, New York.

[19] D’Alberto, P., Bodrato, M. and Nicolau, A. (2011) Exploiting Parallelism in Matrix-Computation Kernels for Symmetric Multiprocessor Systems: Matrix-Multiplication and Matrix-Addition Algorithm Optimizations by Software Pipelining and Threads Allocation. ACM Transaction in Mathematical Software, 38, 1-2.

[20] Welch, P.D. (1969) A Fixed-Point Fast Fourier Transform Error Analysis. IEEE Transactions on Audio and Electroacoustics, 17, 151-157.

http://dx.doi.org/10.1109/TAU.1969.1162035

[21] Loos, S. and Wise, D.S. (2009) Strassen’s Matrix Multiplication Relabeled.

[22] Li, J.J., Ranka, S. and Sahni, S. (2011) Strassen’s Matrix Multiplication on Gpus. 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), Tainan, 7-9 December 2011, 157-164.