JCC  Vol.3 No.8 , August 2015
Improving Global Performance on GPU for Algorithms with Main Loop Containing a Reduction Operation: Case of Dijkstra’s Algorithm
Author(s) Amadou Chaibou1, Oumarou Sie1,2
ABSTRACT
In this paper, we study the impact of copying data in GPU computing. GPU computing allows implementing parallel computations at low cost: a GPU can be purchased at under USD 500. Many studies have shown that GPU can be used to speed up the calculations. But for algorithms requiring doing a part of the calculations on GPU and another part on CPU, alternately, latency due to the copy of the data is a performance degradation factor. To illustrate this, we consider the Dijkstra’s algorithm on the shortest path used in solving optimization problems. This algorithm is very heavy to run on sequential machine. So, we are considering a parallel approach on GPU. Note that Dijkstra’s algorithm has been subject of many implementations on GPU. In the present work, we use two platforms with external GPU. Graphs are represented in adjacency matrix. During the computation of this algorithm, intermediates results are copied from GPU to CPU or from CPU to GPU. The purpose of this work is to measure the impact of these copies in the overall performance of the algorithm. For that we calculate time due to the copying data’s implementation; then we compare results with implementation computing only on CPU memory (zero-copy). The real impact shown by experiments demonstrates the interest of this study. GP-GPU programmers have to think that they will use either memory zero-copy or GPU memory. The challenge for GPU’s manufacturers is how to reduce this impact.

Cite this paper
Chaibou, A. and Sie, O. (2015) Improving Global Performance on GPU for Algorithms with Main Loop Containing a Reduction Operation: Case of Dijkstra’s Algorithm. Journal of Computer and Communications, 3, 41-54. doi: 10.4236/jcc.2015.38005.
References
[1]   Dijkstra, E.W. (1959) A Note on Two Problems in Connexion with Graphs. Numerische Mathematik, 1, 269-271.https://repository.cwi.nl/noauth/search/fullrecord.php?publnr=9256

[2]   Davidson, A., Baxter, S., Garland, M. and Owens, J.D. (2014) Work-Efficient Parallel GPU Methods for Single-Source Shortest Path. IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, 19-23 May 2014, 349-359.http://dx.doi.org/10.1109/ipdps.2014.45

[3]   Hajela, G. and Pandey, M. (2014) Parallel Implementations for Solving Shortest Path Problem Using Bellman-Ford. International Journal of Computer Applications (0975-8887), 95, 1-6.

[4]   Singla, G., Tiwari, A. and Singh, D.P. (2013) New Approach for Graph Algorithms on GPU Using CUDA. International Journal of Computer Applications (0975-8887), 72, 38-42.

[5]   Singh, D.P. and Khare, N. (2012) A Study of Different Parallel Implementations of Single Source Shortest Path Algorithms. International Journal of Computer Applications, 54, 26-30.

[6]   Kumar, S., Misra, A. and Tomar, R.S. (2011) A Modified Parallel Approach to Single Source Shortest Path Problem for Massively Dense Graphs Using CUDA. 2nd International Conference on Computer and Communication Technology (ICCCT), Allahabad, 15-17 September 2011, 635-639.

[7]   Buluc, A., Gilbert, J.R. and Bulak, C. (2010) Solving Path Problems on the GPU. Parallel Computing, 36, 241-253.http://dx.doi.org/10.1016/j.parco.2009.12.002

[8]   Martin, P., Torres, R. and Gavilanes, A. (2009) CUDA Solutions for the SSSP Problem. In: Allen, G., Nabrzyski, J., Seidel, E., van Albada, G., Dongarra, J. and Sloot, P.M.A., Eds., Computational Science—ICCS 2009, Vol. 5544, Springer, Berlin, 904-913. http://dx.doi.org/10.1007/978-3-642-01970-8

[9]   Okuyama, T., Ino, F. and Hagihara, K. (2008) A Task Parallel Algorithm for Computing the Costs of All-Pairs Shortest Paths on the CUDA-Compatible GPU. International Symposium on Parallel and Distributed Processing with Applications, Sydney, 10-12 December 2008, 284-291.
http://dx.doi.org/10.1109/ispa.2008.40

[10]   Ortega-Arranz, H., Torres, Y., Llanos, D.R. and Gonzalez-Escribano, A. (2013) A New GPU-Based Approach to the Shortest Path Problem. International Conference on High Performance Computing and Simulation, Helsinki, 1-5 July 2013, 505-511.

[11]   Harish, P. and Narayanan, P.J. (2007) Accelerating Large Graph Algorithms on the GPU Using CUDA. Lecture Notes in Computer Science 4873. Springer, Berlin, 197-208.
http://dx.doi.org/10.1007/978-3-540-77220-0_21

[12]   Crauser, A., Mehlhorn, K., Meyer, U. and Sanders, P. (1998) A Parallelization of Dijkstra’s Shortest Path Algorithm. In: Brim, L., Gruska, J. and Zlatuska, J., Eds., Mathematical Foundations of Computer Science (MFCS), Lecture Notes in Computer Science, Vol. 1450, Springer, Berlin, 722-731.
http://dx.doi.org/10.1007/bfb0055823

[13]   Harishm, P., Vineet, V. and Narayanan, P.J. (2009) Large Graph Algorithms for Massively Multithreaded Architectures. Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, Technical Report No. IIIT/TR/2009/74.

[14]   Crobak, J.R., Berry, J.W., Madduri, K. and Bader, D.A. (2007) Advanced Shortest Paths Algorithms on a Massively-Multithreaded Architecture. IEEE International Parallel and Distributed Processing Symposium, 2007. IPDPS 2007, Long Beach, 26-30 March 2007, 1-8.
http://dx.doi.org/10.1109/IPDPS.2007.370687

[15]   Gregor, D. and Lumsdaine, A. (2005) The Parallel BGL: A Generic Library for Distributed Graph Computations. Parallel Object-Oriented Scientific Computing (POOSC).

[16]   Gregor, D., Edmonds, N., Barrett, B. and Lumsdaine, A. (2005) The Parallel Boost Graph Library.
http://www.osl.iu.edu/research/pbgl

[17]   Edmond, N., Breuer, A., Gregor, D. and Lumsdaine, A. (2006) Single-Source Shortest Paths with the Parallel Boost Graph Library. 9th DIMACS Implementation Challenge, Shortest Paths, Piscataway, November 2006.

[18]   Gregor, D. and Lumsdaine, A. (2005) Lifting Sequential Graph Algorithms for Distributed-Memory Parallel Computation. Proceedings of the 2005 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’ 05), San Diego, October 2005, 423-437.

[19]   Rétvári, G., Bró, J.J. and Cinkler, T. (2007) On Shortest Path Representation. IEEE/ACM Transactions on Networking, 15, 1293-1306. http://dx.doi.org/10.1109/TNET.2007.900708

[20]   Sanders, J. and Kandrot, E. (2001) CUDA par l’exemple: Une introduction à la programmation parallèle de GPU. Pearson, 27 mai 2011.

[21]   NVIDIA CUDATM (2012) CUDA C Best Practices Guide.

[22]   NVIDIA CUDATM (2012) Nvidia Cuda C Programming Guide.

[23]   Pironneau, O. (2009) Calcul Parallèle et CUDA. LJLL, Mars 2009.

[24]   Harris, M. (2008) Optimizing Parallel Reduction in CUDA. NVIDIA.

[25]   Grama, A., Gupta, A., Karypis, G. and Kumar, V. (2003) Introduction to Parallel Computing. 2nd Edition, Addison-Wesley, Boston.

[26]   DIMACS 9th DIMACS Implementation Challenge—Shortest Paths.
http://www.dis.uniroma1.it/challenge9/download.shtml

[27]   United States Census Bureau Http://www2.census.gov

[28]   Bader, D.A. and Madduri, K. (2006) GTgraph: A Suite of Synthetic Graph Generators.

 
 
Top