The first error theory and bounds for Fast Matrix Multiplication based on the Strassen-Winograd algorithms (FastMMW) were formulated in the 70s. The theory introduces the concept, which is now known as weakly-stable error analysis, where the error bounds must use matrix norms instead of component-wise bounds. While the theory debunked the instability myth by using matrix scaling and a clean and simple analysis, its bounds are available only as properties of the whole matrices, which are too coarse, pessimistic, at times used to suggest instability, and are not used for algorithm optimization. We build on top of the original theory in order to reformulate the bounds: we show that tighter norm-wise and component-wise bounds are achievable by orthogonal algorithm optimizations. To achieve even better discrimination and circumvent the use of norm bounds, we develop an error theory by using communication and statistics concepts: we investigate lower and upper bounds, we estimate the practical bounds, and we investigate the algorithmic nature of the error for the class of random matrices. The theory and tools are not limited to random matrices and we can foresee further investigations to different matrix classes and algorithms. We propose new and more accurate algorithms. We show that we can improve theoretically and empirically the maximum absolute error of any FastMMW algorithm by 10% - 20% per recursion (we reduce the error by half for 4 recursions). Our theory and practice, in turn, will provide a kick start for the development of hybrid algorithms as accurate as the vendor GEMM implementation, and in certain cases even more accurate for random matrices.
 Strassen, V. (1969) Gaussian Elimination Is Not Optimal. Numerische Mathematik, 14, 354-356.http://dx.doi.org/10.1007/BF02165411
 Douglas, C.C., Heroux, M., Slishman, G. and Smith, R.M. (1994) GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen’s Matrix-Matrix Multiply Algorithm. Journal of Computational Physics, 110, 1-10.http://dx.doi.org/10.1006/jcph.1994.1001
 Demmel, J. and Higham, N. (1992) Stability of Block Algorithms with Fast Level-3 BLAS. ACM Transactions on Mathematical Software, 18, 274-291.http://dx.doi.org/10.1145/131766.131769
 Demmel, J., Dumitriu, J., Holtz, O. and Kleinberg, R. (2006) Fast Matrix Multiplication Is Stable.
 Brent, R.P. (1970) Error Analysis of Algorithms for Matrix Multiplication and Triangular Decomposition Using Winograd’s Identity. Numerische Mathematik, 16, 145-156. http://dx.doi.org/10.1007/BF02308867
 Miller, W. (1975) Computational Complexity and Numerical Stability. SIAM Journal on Computing, 4, 97-107.http://dx.doi.org/10.1137/0204009
 Bini, D. and Lotti, G. (1980) Stability of Fast Algorithms for Matrix Multiplication. Numerische Mathematik, 36, 63-72.http://dx.doi.org/10.1007/BF01395989
 Edelman, A. and Rao, N. (2005) Random Matrix Theory. Acta Numerica, 14, 233-297.http://dx.doi.org/10.1017/S0962492904000236
 Kolmogorov, A.N. and Uspenskiiq, V.A. (1987) Algorithms and Randomness. Theory of Probability and Its Applications, 32, 389-412. http://dx.doi.org/10.1137/1132060
 Winograd, S. (1968) A New Algorithm for Inner Product. IEEE Transactions on Computers, 17, 693-694.
 Higham, N.J. (1990) Exploiting Fast Matrix Multiplication within the Level 3 BLAS. ACM Transactions on Mathematical Software, 16, 352-368.http://dx.doi.org/10.1145/98267.98290
 Higham, N.J. (2002) Accuracy and Stability of Numerical Algorithms. 2nd Edition, SIAM, Philadelphia.http://dx.doi.org/10.1137/1.9780898718027
 Badin, M., D’Alberto, P., Bic, L., Dillencourt, M. and Nicolau, A. (2011) Improving the Accuracy of High Performance Blas Implementations Using Adaptive Blocked Algorithms. In Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD ’11, Washington, DC, IEEE Computer Society, 26-29 October 2011, 120-127.
 Castaldo, A.M., Clint Whaley, R. and Chronopoulos, A.T. (2008) Reducing Floating Point Error in Dot Product Using the Superblock Family of Algorithms. SIAM Journal on Scientific Computing, 31, 1156-1174. http://dx.doi.org/10.1137/070679946
 Dongarra, J.J., Du Croz, J., Duff, I.S. and Hammarling, S. (1990) A Set of Level 3 Basic Linear Algebra Subprograms. ACM Transaction in Mathematical Software, 16, 1-17.http://dx.doi.org/10.1145/77626.79170
 Goto, K. and van de Geijn, R.A. (2008) Anatomy of Highperformance Matrix Multiplication. ACM Transactions on Mathematical Software.http://dx.doi.org/10.1145/1356052.1356053
 Priestley, M.B. (1981) Spectral Analysis and Time Series. Academic Press Inc, New York.
 Brockwell, P.J. and Davis, R.A. (2006) Time Series: Theory and Methods. Springer, New York.
 D’Alberto, P., Bodrato, M. and Nicolau, A. (2011) Exploiting Parallelism in Matrix-Computation Kernels for Symmetric Multiprocessor Systems: Matrix-Multiplication and Matrix-Addition Algorithm Optimizations by Software Pipelining and Threads Allocation. ACM Transaction in Mathematical Software, 38, 1-2.
 Welch, P.D. (1969) A Fixed-Point Fast Fourier Transform Error Analysis. IEEE Transactions on Audio and Electroacoustics, 17, 151-157. http://dx.doi.org/10.1109/TAU.1969.1162035
 Loos, S. and Wise, D.S. (2009) Strassen’s Matrix Multiplication Relabeled.
 Li, J.J., Ranka, S. and Sahni, S. (2011) Strassen’s Matrix Multiplication on Gpus. 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), Tainan, 7-9 December 2011, 157-164.