I: Identity matrix.

A set of vectors {v_1, ..., v_n} is orthogonal if and only if $v_i^T v_j = 0$ whenever i ≠ j. If, in addition, every vector in an orthogonal set has a norm of 1, the set is said to be orthonormal. This definition is central to QR factorization, in which the columns of Q are orthonormal. The MOD tool uses Householder-based QR factorization. The Householder algorithm as implemented in hardware is shown in Figure 4.
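As a software illustration of the idea (not the hardware design of Figure 4), the following is a minimal sketch of Householder QR in plain Python. The function names `householder_qr`, `matmul`, and `transpose` are hypothetical helpers introduced here, and the code uses floating point, whereas the generated hardware may use a different number format.

```python
import math


def householder_qr(a):
    """Return (q, r) such that a = q * r, q orthogonal, r upper triangular."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        # Part of column k on and below the diagonal.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # Householder vector v = x + sign(x0) * ||x|| * e1.
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)
        # r <- H r, with H = I - 2 v v^T / (v^T v), acting on rows k..m-1.
        for j in range(k, n):
            dot = sum(v[i] * r[k + i][j] for i in range(len(v)))
            f = 2.0 * dot / vtv
            for i in range(len(v)):
                r[k + i][j] -= f * v[i]
        # q <- q H, so that q accumulates H_1 H_2 ... H_k.
        for i in range(m):
            dot = sum(q[i][k + t] * v[t] for t in range(len(v)))
            f = 2.0 * dot / vtv
            for t in range(len(v)):
                q[i][k + t] -= f * v[t]
    return q, r


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


a = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
q, r = householder_qr(a)
qtq = matmul(transpose(q), q)  # close to the identity: columns of q are orthonormal
qr = matmul(q, r)              # close to a
```

Checking that `qtq` is close to the identity matrix is a direct test of the orthonormality condition stated above.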

3.5. Matrix Inversion

Another matrix operation used in many DSP and image processing applications is matrix inversion. A square matrix A and its inverse $A^{-1}$ satisfy $AA^{-1} = I$, where I is the identity matrix; the inverse exists if and only if det(A) ≠ 0. The inverse of a matrix is an important part of the calculation of many matrix transformations and of the solution of linear equations. Unfortunately, like matrix multiplication, calculating a matrix inverse requires many addition and multiplication operations and may be costly to implement at the hardware level for large matrices. Hardware implementation of the matrix inverse is an important part of the MOD. The MOD can design area-efficient matrix inverse hardware using a few different techniques, which are tuned for area efficiency or speed according to the required design parameters. The matrix inversion used in the MOD is based on LU factorization. A block diagram of matrix inversion using LU factorization is given in Figure 5.
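The LU-based approach can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the MOD hardware architecture: it uses Doolittle LU factorization without pivoting (so it fails on a zero pivot even for some invertible matrices), and the names `lu_decompose` and `lu_inverse` are hypothetical.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting: a = l * u, l unit lower triangular."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of u.
        for j in range(i, n):
            u[i][j] = a[i][j] - sum(l[i][k] * u[k][j] for k in range(i))
        if u[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # Column i of l.
        for j in range(i + 1, n):
            l[j][i] = (a[j][i] - sum(l[j][k] * u[k][i] for k in range(i))) / u[i][i]
    return l, u


def lu_inverse(a):
    """Invert a by solving a * x = e_c for every column e_c of the identity."""
    n = len(a)
    l, u = lu_decompose(a)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        # Forward substitution: l * y = e_c (l has a unit diagonal).
        y = [0.0] * n
        for i in range(n):
            e = 1.0 if i == c else 0.0
            y[i] = e - sum(l[i][k] * y[k] for k in range(i))
        # Back substitution: u * x = y.
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - sum(u[i][k] * x[k] for k in range(i + 1, n))) / u[i][i]
        for i in range(n):
            inv[i][c] = x[i]
    return inv


inv = lu_inverse([[4.0, 3.0], [6.0, 3.0]])
```

Because the factorization is computed once and reused for every column, the same substitution stages serve both inversion and the solution of linear systems.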

Figure 4. The Householder QR factorization algorithm.

Figure 5. Matrix inversion using LU transformation.

The MOD tool can design and validate hardware that solves a system of linear equations using a couple of different methods, similar to those explained above for matrix inversion. Because solving linear equations and inverting a matrix are so similar, further hardware reduction is possible by reusing common sections of the hardware, yielding a system that can both solve a linear equation and invert a matrix. A system of linear equations shown in Equation (1) can be solved using LU factorization.

$Ax = b$ (1)

$A = LU$. (2)

Substituting Equation (2) into Equation (1), the system of linear equations can be written as in Equation (3). The unknowns x are then calculated using forward substitution in Equation (4), where y is an intermediate vector, followed by back substitution in Equation (5):

$LUx = b$ (3)

$Ly = b$ (4)

$Ux = y$. (5)
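The two substitution steps can be sketched as below, assuming the factors L (lower triangular) and U (upper triangular) of A are already available. The function names and the small 2×2 example are illustrative assumptions, not taken from the MOD tool.

```python
def forward_substitution(l, b):
    """Solve l * y = b for lower-triangular l, from the first row down."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    return y


def back_substitution(u, y):
    """Solve u * x = y for upper-triangular u, from the last row up."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(u[i][k] * x[k] for k in range(i + 1, n))) / u[i][i]
    return x


def solve_lu(l, u, b):
    """Solve (l * u) * x = b by forward then back substitution."""
    return back_substitution(u, forward_substitution(l, b))


# Example: A = [[4, 3], [6, 3]] factors as L * U with the matrices below.
l = [[1.0, 0.0], [1.5, 1.0]]
u = [[4.0, 3.0], [0.0, -1.5]]
b = [10.0, 12.0]
x = solve_lu(l, u, b)  # x is [1.0, 2.0]
```

Each substitution touches only one triangle of the matrix, which is why the pair is much cheaper than forming an explicit inverse.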

A similar solution system can be used if QR factorization is available.
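For illustration, the standard QR-based derivation is sketched below. It assumes a square, nonsingular A with factorization A = QR, where Q is orthogonal and R is upper triangular; the source does not spell out this derivation.

```latex
\begin{aligned}
Ax &= b, \qquad A = QR, \qquad Q^{T}Q = I \\
QRx &= b \\
Rx &= Q^{T}b
\end{aligned}
```

Since R is upper triangular, the last line is solved by back substitution alone; the forward substitution step of the LU approach is replaced by the matrix-vector product $Q^{T}b$.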

4. Design Results

The MOD design system allows for fast TTM and higher accuracy. The system uses the same verification method for both FPGA and VLSI designs, as shown in Figure 6.

The MOD design system can generate Verilog HDL code and a testbench file for matrix operations of any size, subject to the limitations of the MATLAB version, computer memory, and hard drive size. The tool also creates a MATLAB file for verification and error analysis.

One of the great advantages of the MOD design and verification system is the ability to change the size and optimization of the design in a minimal amount of time. Designers can compare many designs and make decisions without spending valuable verification time.

Figure 6. The MOD verification flow.

5. Conclusion

A design tool for matrix operations has been developed for low-power and high-speed applications. The MOD decreases system design and verification time by up to 64% without compromising speed and efficiency. The MOD uses a smart control system that is optimized based on the desired operations, and it is a bridge between RTL and HLS: it uses RTL-based basic blocks to design most complicated arithmetic operations through structural modeling, combined with HLS-style fast and optimized verification. Any designed system can be reconfigured in the MOD at any time without repeating the same design and verification effort. The key objective of the proposed tool is to reduce TTM and increase productivity by verifying the hardware during the design process. Future work will include support for all basic arithmetic operations, VHDL, additional matrix factorizations, curve fitting, and floating-point arithmetic.

Acknowledgements

The authors would like to thank Xilinx, Inc. [6] for their valuable support.

Cite this paper
Aslan, S. and Saniie, J. (2016) Matrix Operations Design Tool for FPGA and VLSI Systems. Circuits and Systems, 7, 43-50. doi: 10.4236/cs.2016.72005.
References
[1]   Andrieux, J., Feix, M., Mourgues, G., Bertrand, P., Izrar, B. and Nguyen, V. (1987) Optimum Smoothing of the Wigner-Ville Distribution. IEEE Transactions on Acoustics, Speech and Signal Processing, 35, 764-769.
http://dx.doi.org/10.1109/TASSP.1987.1165204

[2]   Duncan, A. and Hendry, D. (1995) Area Efficient DSP Datapath Synthesis. Proceedings of EURO-DAC’95, European Design Automation Conference with EURO-VHDL, Brighton, 18-22 September 1995, 130-135.

[3]   (2006) The VLSI Handbook. 2nd ed. CRC Press, Boca Raton, FL.

[4]   Wang, X. and Ziavras, S.G. (2004) Parallel LU Factorization of Sparse Matrices on FPGA-Based Configurable Computing Engines. Concurrency and Computation: Practice and Experience, 16, 319-343.
http://onlinelibrary.wiley.com/doi/10.1002/cpe.748/abstract

[5]   Yang, H., Ziavras, S. and Hu, J. (2007) FPGA-Based Vector Processing for Matrix Operations. Fourth International Conference on Information Technology (ITNG 07), Las Vegas, 2-4 April 2007, 989-994.

[6]   Xilinx.
http://www.xilinx.com/

[7]   MICROWIND-Home.
http://microwind.net/

[8]   Reconfigurable Computing. Elsevier.
https://www.elsevier.com/books/recon_gurable-computing/hauck/978-0-12-370522-8

[9]   SIAM: Toward an Optimal Algorithm for Matrix Multiplication.
http://www.siam.org/news/news.php?id=174

[10]   Ohtaki, Y., Takahashi, D., Boku, T. and Sato, M. (2004) Parallel Implementation of Strassen’s Matrix Multiplication Algorithm for Heterogeneous Clusters. 18th International Parallel and Distributed Processing Symposium (2004), Santa Fe, 26-30 April 2004, 112.

[11]   Yang, Y., Zhao, Y. and Inoue, Y. (2005) High-Performance Systolic Arrays for Band Matrix Multiplication. 2005 IEEE International Symposium on Circuits and Systems, ISCAS 2005, Kobe, 23-26 May 2005, 1130-1133.

[12]   Desmouliers, C., Aslan, C., Oruklu, E., Saniie, J. and Vallina, F. (2010) HW/SW Co-Design Platform for Image and Video Processing Applications on Virtex-5 FPGA Using PICO. 2010 IEEE International Conference on Electro/Information Technology (EIT 2010), Normal, 20-22 May 2010, 1-6.

[13]   Aslan, S., Oruklu, E. and Saniie, J. (2009) Realization of Area Efficient QR Factorization Using Unified Division, Square Root, and Inverse Square Root Hardware. 2009 IEEE International Conference on Electro/Information Technology (EIT 2009), Windsor, 7-9 June 2009, 245-250.

 
 