JCC Vol.10 No.9, September 2022
Runtime Power Allocation Based on Multi-GPU Utilization in GAMESS
Abstract: Modern processors provide frequency scaling and power limiting capabilities that can be used to manage the power consumption of parallel applications at runtime. In this work, a runtime strategy is proposed to maximize performance under a given power budget by distributing the available power among the GPUs according to their relative utilization. Time series forecasting methods were used to develop workload prediction models that accurately predict GPU utilization during application execution. Experiments were performed on an NVIDIA DGX-1 multi-GPU platform equipped with eight NVIDIA V100 GPUs, running quantum chemistry calculations in the GAMESS package. Under a limited power budget, the proposed strategy may deliver as much as a hundred times better GAMESS performance than that obtained when the power is distributed equally among all the GPUs.
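The idea of distributing a fixed power budget according to relative GPU utilization can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `allocate_power`, the per-GPU limits `min_w` and `max_w`, and the wattages in the usage example are hypothetical values chosen for illustration, and the utilization figures stand in for whatever a workload prediction model would supply.

```python
from typing import List, Sequence


def allocate_power(utilization: Sequence[float], budget_w: float,
                   min_w: float, max_w: float) -> List[float]:
    """Split budget_w (watts) across GPUs in proportion to utilization.

    Assumption for illustration: every GPU must receive at least min_w and
    at most max_w. These limits are hypothetical, not taken from the paper.
    """
    n = len(utilization)
    if n == 0:
        return []
    if budget_w < n * min_w:
        raise ValueError("budget is below the sum of per-GPU minimum caps")

    # Each GPU starts at its minimum cap; only the surplus is shared.
    caps = [min_w] * n
    remaining = min(budget_w, n * max_w) - n * min_w
    active = set(range(n))

    while remaining > 1e-9 and active:
        total_util = sum(utilization[i] for i in active)
        saturated = set()
        spent = 0.0
        for i in active:
            # Proportional weight; fall back to an equal split if all are idle.
            weight = (utilization[i] / total_util if total_util > 0
                      else 1.0 / len(active))
            grant = min(remaining * weight, max_w - caps[i])
            caps[i] += grant
            spent += grant
            if caps[i] >= max_w - 1e-9:
                saturated.add(i)
        remaining -= spent
        if not saturated:
            break  # every share was granted in full
        active -= saturated  # redistribute the leftover among the rest
    return caps


if __name__ == "__main__":
    # Hypothetical predicted utilizations (%) for four GPUs, 600 W budget.
    print(allocate_power([90, 30, 0, 0], budget_w=600, min_w=100, max_w=300))
```

With these illustrative inputs, the busiest GPU receives the largest cap, whereas an equal split would give each of the four GPUs 150 W regardless of load.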
Cite this paper: Sosonkina, M., Sundriyal, V. and Galvez Vallejo, J.L. (2022) Runtime Power Allocation Based on Multi-GPU Utilization in GAMESS. Journal of Computer and Communications, 10, 66-80. doi: 10.4236/jcc.2022.109005.

 
 