IJCNS, Vol. 3 No. 1, January 2010
Exploiting Loop-Carried Stream Reuse for Scientific Computing Applications on the Stream Processor
Abstract: Compared with other stream applications, scientific stream programs are usually bound by memory accesses. Reusing streams across different loop iterations, i.e. loop-carried stream reuse, can improve the locality of the stream register file (SRF) and thus greatly reduce memory accesses. In this paper, after analyzing scientific computing applications, we first present an algorithm that identifies loop-carried stream reuse and an algorithm that exploits it. We then run several representative microbenchmarks and scientific stream programs, with and without our optimization, on Isim, a cycle-accurate stream processor simulator. Experimental results show that our algorithms effectively exploit loop-carried stream reuse and thus greatly improve the performance of memory-bound scientific stream programs.
Cite this paper: W. XU, Q. DOU, Y. ZHANG, G. LI and X. YANG, "Exploiting Loop-Carried Stream Reuse for Scientific Computing Applications on the Stream Processor," International Journal of Communications, Network and System Sciences, Vol. 3 No. 1, 2010, pp. 32-37. doi: 10.4236/ijcns.2010.31003.
