IJCNS  Vol.8 No.5 , May 2015
Predictive Prefetching for Parallel Hybrid Storage Systems
Author(s) Maen M. Al Assaf*
ABSTRACT
In this paper, we present a predictive prefetching mechanism that is based on probability graph approach to perform prefetching between different levels in a parallel hybrid storage system. The fundamental concept of our approach is to invoke parallel hybrid storage system’s parallelism and prefetch data among multiple storage levels (e.g. solid state disks, and hard disk drives) in parallel with the application’s on-demand I/O reading requests. In this study, we show that a predictive prefetching across multiple storage levels is an efficient technique for placing near future needed data blocks in the uppermost levels near the application. Our PPHSS approach extends previous ideas of predictive prefetching in two ways: (1) our approach reduces applications’ execution elapsed time by keeping data blocks that are predicted to be accessed in the near future cached in the uppermost level; (2) we propose a parallel data fetching scheme in which multiple fetching mechanisms (i.e. predictive prefetching and application’s on-demand data requests) can work in parallel; where the first one fetches data blocks among the different levels of the hybrid storage systems (i.e. low-level (slow) to high-level (fast) storage devices) and the other one fetches the data from the storage system to the application. Our PPHSS strategy integrated with the predictive prefetching mechanism significantly reduces overall I/O access time in a hybrid storage system. Finally, we developed a simulator to evaluate the performance of the proposed predictive prefetching scheme in the context of hybrid storage systems. Our results show that our PPHSS can improve system performance by 4% across real-world I/O traces without the need of using large size caches.

Cite this paper
Assaf, M. (2015) Predictive Prefetching for Parallel Hybrid Storage Systems. International Journal of Communications, Network and System Sciences, 8, 161-180. doi: 10.4236/ijcns.2015.85018.
References
[1]   Al Assaf, M.M. Informed Prefetching in Distributed Multi-Level Storage Systems.
http://hdl.handle.net/10415/2935

[2]   Yang, C.K., Mitra, T. and Chiueh, T. (2002) A Decoupled Architecture for Application-Specific File Prefetching. Proceedings of the Freenix Track: 2002 USENIX Annual Conference, Monterey, 10-15 June 2002, 157-170.

[3]   Griffioen, J. and Appleton, R. (1994) Reducing File System Latency Using a Predictive Approach. Proceedings of the 1994 USENIX Annual Technical Conference, Boston, 6-10 June 1994.

[4]   Patterson Hugo, R., Gibson, G., Stodolsky, D. and Zelenka, J. (1995) Informed Prefetching and Caching. Proceedings of the 15th ACM Symposium on Operating System Principles, Colorado, 3-6 December 1995, 79-95.

[5]   Chen, Y., Byna, S., Sun, X., Thakur, R. and Gropp, W. (2008) Hiding I/O Latency with Pre-Execution Prefetching for Parallel Applications. Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Austin, 15-21 November 2008, 1-10.

[6]   Al Assaf, M.M., Jiang, X.F., Riduan Abid, M. and Qin, X. (2013) Eco-Storage: A Hybrid Storage System with Energy-Efficient Informed Prefetching. Journal of Signal Processing Systems, 72, 165-180.
http://dx.doi.org/10.1007/s11265-013-0784-9

[7]   Al Assaf, M.M., Qin, X., Jiang, X., Zhang, J. and Alghamdi, M. (2012) A Pipelining Approach to Informed Prefetching in Distributed Multi-Level Storage Systems. Proceedings of the 11th IEEE International Symposium on Network Computing and Applications (IEEE NCA’12), Cambridge, 23-25 August 2012, 87-95.

[8]   Song, J. and Zhang, X. (2004) ULC: A File Block Placement and Replacement Protocol to Effectively Exploit Hierarchical Locality in Multilevel Buffer Caches. Proceedings of the 24th International Conference on Distributed Computer Systems, Tokyo, 23-26 March 2004, 168-177.

[9]   Zhang, Z., Lee, K., Ma, X. and Zhou, Y. (2008) PFC: Transparent Optimization of Existing Prefetching Strategies for Multi-Level Storage Systems. Proceedings of the 28th International Conference on Distributed Computing System, Beijing, 17-20 June 2008, 740-751.
http://dx.doi.org/10.1109/icdcs.2008.89

[10]   Thomasian, A. (2006) Multi-Level RAID for Very Large Disk Arrays. ACM SIGMETRICS Performance Evaluation Review, 33, 17-22.
http://dx.doi.org/10.1145/1138085.1138091

[11]   Nijim, M. (2010) Modelling Speculative Prefetching for Hybrid Storage Systems. Proceedings of the 2010 IEEE 5th International Conference on Networking, Architecture, and Storage, Macau, 15-17 July 2010, 143-151.

[12]   Kaneko, T. (1974) Optimal Task Switching Policy for a Multilevel Storage System. IBM Journal of Research and Development, 18, 310-315.
http://dx.doi.org/10.1147/rd.184.0310

[13]   Rivera, G. and Tseng, C.-W. (1999) Locality Optimizations for Multi-Level Caches. Proceedings of the 1999 ACM/ IEEE Conference on Supercomputing (CDROM), Portland, 14-19 November 1999, Article No. 2.
http://dx.doi.org/10.1145/331532.331534

[14]   Przybylski, S., Horowitz, M. and Hennessy, J. (1988) Performance Tradeoffs in Cache Design. Proceedings of the 15th Annual International Symposium on Computer Architecture, Honolulu, 30 May-2 June 1988, 290-298.
http://dx.doi.org/10.1109/isca.1988.5239

[15]   Przybylski, S., Horowitz, M. and Hennessy, J. (1989) Characteristics of Performance-Optimal Multi-Level Cache Hierarchies. Proceedings of 16th Annual International Symposium on Computer Architecture, 28 May-1 June 1989, 114-121.
http://dx.doi.org/10.1109/ISCA.1989.714545

[16]   Jason, F. (2002) Multi-Level Memory Prefetching for Media and Stream Processing. Proceedings of the IEEE International Conference on Multimedia and Expo, St. Louis, 26-29 August 2002, 101-104.

[17]   Kang, J. and Sung, W. (2001) A Multi-Level Block Priority Based Instruction Caching Scheme for Multimedia Processors. Proceedings of the 24th International Conference on Distributed Computer Systems, Antwerp, 125-132.

[18]   Jiang, X.F., Al Assaf, M.M., Zhang, J., Alghamdi, M.I., Ruan, X.J., Muzaffar, T. and Qin, X. (2013) Thermal Modeling of Hybrid Storage Clusters. Journal of Signal Processing Systems, 72, 181-193.
http://dx.doi.org/10.1007/s11265-013-0787-6

[19]   Lewis, J., Alghamdi, M.I., Assaf, M.A., Ruan, X.-J., Ding, Z.-Y. and Qin, X. (2010) An Automatic Prefetching and Caching System. Proceedings of the 29th International Performance Computing and Communications Conference (IPCCC), Albuquerque, 9-11 December 2010, 180-187.
http://dx.doi.org/10.1109/PCCC.2010.5682310

[20]   Vellanki, V. and Chervenak, A.L. (1999) A Cost-Benefit Scheme for High Performance Predictive Prefetching. Proceedings of the ACM/IEEE SC99 Conference on Supercomputing, Portland, 14-19 November 1999.

[21]   Chen, Y., Byna, S. and Sun, X. (2007) Data Access History Cache and Associated Data Prefetching Mechanisms. Proceedings of the AMC/IEEE Conference on Supercomputing, Reno, 10-16 November 2007, 1-12.
http://dx.doi.org/10.1145/1362622.1362651

[22]   Wang, J.Y.Q., Ong, J.S., Coady, Y. and Feeley, M.J. (2000) Using Idle Workstations to Implement Predictive Prefetching. Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing, Pittsburgh, 1-4 August 2000, 87-94.

[23]   Domenech, J., Sahuquillo, J., Gil, J.A. and Pont, A. (2006) The Impact of the Web Prefetching Architecture on the Limits of Reducing User’s Perceived Latency. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Hong Kong, 18-22 December 2006, 740-744.

[24]   Jeon, J., Lee, G., Cho, H. and Ahn, B. (2003) A Prefetching Web Caching Method Using Adaptive Search Patterns. Proceedings of the 2003 IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, Victoria, 28-30 August 2003, 37-40.

[25]   James, O. and Daniel, A.R. (2002) Markov Model Prediction of I/O Requests for Scientific Applications. Proceedings of the 16th International Conference on Supercomputing, New York, 22-26 June 2002, 147-155.
http://dx.doi.org/10.1145/514191.514214

[26]   Nanopoulos, A., Katsaros, D. and Manolopoulos, Y. (2003) A Data Mining Algorithm for Generalized Web Prefetching. IEEE Transactions on Knowledge and Data Engineering, 15.
http://dx.doi.org/10.1109/TKDE.2003.1232270

[27]   Hugo Patterson, R., Gibson, G.A. and Satyanarayanan, M. (1993) A Status Report on Research in Transparent Informed Prefetching. ACM SIGOPS Operating Systems Review, 27, 21-34.
http://dx.doi.org/10.1145/155848.155855

[28]   Tomkins, A., Hugo Patterson, R. and Gibson, G. (1997) Informed Multi-Process Prefetching and Caching. Proceedings of the 1997 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Seattle, 15-18 June 1997, 100-114.
http://dx.doi.org/10.1145/258612.258680

[29]   Hugo Patterson, R., Gibson, G.A. and Satyanarayanan, M. (1992) Using Transparent Informed Prefetching (TIP) to Reduce File Read Latency. Proceedings of the Conference on Mass Storage Systems and Technologies, Greenbelt, 22-24 September 1992, 329-342.

[30]   Hugo Patterson, R. and Gibson, G.A. (1994) Exposing I/O Concurrency with Informed Prefetching. Proceedings of the Third International Conference on Parallel and Distributed Information Systems, Austin, 28-30 September 1994, 7-16.
http://dx.doi.org/10.1109/PDIS.1994.331737

[31]   Huizinga, D.M. and Desai, S. (2000) Implementation of Informed Prefetching and Caching in Linux. Proceedings of the International Conference on Information Technology, Las Vegas, 27-29 March 2000, 443-448.

[32]   Chen, Y., Byna, S., Sun, X., Thakur, R. and Gropp, W. (2008) Exploring Parallel I/O Concurrency with Speculative Prefetching. Proceedings of the 2008 37th International Conference on Parallel Processing, Portland, 8-12 September 2008, 422-429.
http://dx.doi.org/10.1109/icpp.2008.54

[33]   Kimbrel, T., Cao, P., Felten, E., Karlin, A. and Li, K. (1996) Integrated Parallel Prefetching and Caching. Proceedings of the 1996 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Philadelphia, 23-26 May 1996, 262-263.
http://dx.doi.org/10.1145/233013.233052

[34]   Ganger, G.R., Worthington, B.L., Hou, R.Y. and Patt, Y.N. (1994) Disk Arrays: High-Performance, High-Reliability Storage Subsystems. Computer, 27, 30-36.
http://dx.doi.org/10.1109/2.268882

[35]   Chang, F. and Gibson, G.A. (1999) Automatic I/O Hint Generation through Speculative Execution. Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, New Orleans, 22-25 February 1999, 1-14.

[36]   Byna, S., Chen, Y., Sun, X.-H., Thakur, R. and Gropp, W. (2008) Parallel I/O Prefetching Using MPI File Caching and I/O Signatures. Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Austin, 15-21 November 2008.

[37]   Li, C.P., Shen, K. and Papathanasiou, A.E. (2007) Competitive Prefetching for Concurrent Sequential I/O. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, Lisbon, 21-23 March 2007, 189-202.
http://dx.doi.org/10.1145/1272996.1273017

[38]   Nijim, M., Zong, Z., Qin, X. and Nijim, Y. (2010) Multi-Layer Prefetching for Hybrid Storage Systems: Algorithms, Models, and Evaluations. Proceedings of the 2010 39th International Conference on Parallel Processing Workshops, San Diego, 13-16 September 2010, 44-49.
http://dx.doi.org/10.1109/ICPPW.2010.18

[39]   Borthakur, D. (2008) HDFS Architecture, the Apache Software Foundation.
http://hadoop.apache.org/docs/

[40]   Shafer, J., Rixner, S. and Cox, A.L. (2010) The Hadoop Distributed Filesystem: Balancing Portability and Performance. Proceedings of the IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), White Plains, 28-30 March 2010, 122-133.
http://dx.doi.org/10.1109/ISPASS.2010.5452045

[41]   Hadoop Archive Guide. http://hadoop.apache.org/docs/

[42]   Lasr Trace Machine 01. http://iotta.snia.org/

 
 
Top