Aperiodic Checkpoint Placement Algorithms—Survey and Comparison

ABSTRACT

In this article we summarize some
aperiodic checkpoint placement algorithms for a software system over infinite
and finite operation time horizons, and compare them in terms of computational
accuracy. The underlying problem is formulated as the maximization of
steady-state system availability and is to determine the optimal aperiodic
checkpoint sequence. We present two exact computation algorithms in both
forward and backward manners and two approximate ones; constant hazard
approximation and fluid approximation, toward this end. In numerical examples
with Weibull system failure time distribution, it is shown that the combined
algorithm with the fluid approximation can calculate effectively the exact
solutions on the optimal aperiodic checkpoint sequence.

KEYWORDS

Checkpoint Placement; Aperiodic Policy; Availability Models; Computation Algorithms; Comparison

Checkpoint Placement; Aperiodic Policy; Availability Models; Computation Algorithms; Comparison

Cite this paper

S. Hiroyama, T. Dohi and H. Okamura, "Aperiodic Checkpoint Placement Algorithms—Survey and Comparison,"*Journal of Software Engineering and Applications*, Vol. 6 No. 4, 2013, pp. 41-53. doi: 10.4236/jsea.2013.64A006.

S. Hiroyama, T. Dohi and H. Okamura, "Aperiodic Checkpoint Placement Algorithms—Survey and Comparison,"

References

[1] S. Hiroyama, T. Dohi and H. Okamura, “Comparison of Aperiodic Checkpoint Placement Algorithms,” In: G. S. Tomar, R.-S. Chang, O. Gervasi, T. H. Kim and S. K. Bandyopadhyay, Eds., Advanced Computer Science and Information (AST 2010), Communications in Computer and Information Science, Vol. 74, Springer-Verlag, Berlin, 2010, pp. 145-156.

[2] V. F. Nicola, “Checkpointing and Modeling of Program Execution Time,” In: M. R. Lyu, Ed., Software Fault Tolerance, John Wiley & Sons, New York, 1995, pp. 167-188,

[3] K. Naruse and S. Maeji, “Optimal Checkpoint Intervals for Computer Systems,” In: S. Nakamura and T. Nakagawa, Eds., Stochastic Reliability Modeling, Optimization and Applications, World Scientific, Singapore City, 2010, pp. 205-239.

[4] J. W. Young, “A First Order Approximation to the Optimum Checkpoint Interval,” Communications of the ACM, Vol. 17, No. 9, 1974, pp. 530-531. doi:10.1145/361147.361115

[5] F. Baccelli, “Analysis of Service Facility with Periodic Checkpointing,” Acta Informatica, Vol. 15, No. 1, 1981, pp. 67-81. doi:10.1007/BF00269809

[6] K. M. Chandy, “A Survey of Analytic Models of RollBack and Recovery Strategies,” IEEE Computer, Vol. 8, No. 5, 1975, pp. 40-47. doi:10.1109/C-M.1975.218955

[7] K. M. Chandy, J. C. Browne, C. W. Dissly and W. R. Uhrig, “Analytic Models for Rollback and Recovery Strategies in Database Systems,” IEEE Transactions on Software Engineering, Vol. SE-1, No. 1, 1975, pp. 100-110. doi:10.1109/TSE.1975.6312824

[8] T. Dohi, N. Kaio and S. Osaki, “Optimal Ccheckpointing and Rollback Sstrategies with Media Failures: Statistical Estimation Algorithms,” Proceedings of 1999 Pacific Rim International Symposium on Dependable Computing (PRDC 1999), Hong Kong, 16-17 December 1999, pp. 161-168.

[9] T. Dohi, N. Kaio and S. Osaki, “The Optimal Age-Dependent Checkpoint Strategy for a Stochastic System Subject to General Failure Mode,” Journal of Mathematical Analysis and Applications, Vol. 249, No. 1, 2000, pp. 80-94. doi:10.1006/jmaa.2000.6939

[10] T. Dohi, N. Kaio and K. S. Trivedi, “Availability Models with Age Dependent-Checkpointing,” Proceedings of 21st Symposium on Reliable Distributed Systems (SRDS 2002), Osaka, 13-16 October 2002, pp. 130-139.

[11] E. Gelenbe and D. Derochette, “Performance of Rollback Recovery Systems under Intermittent Failures,” Communications of the ACM, Vol. 21, No. 6, 1978, pp. 493-499. doi:10.1145/3 59511.359531

[12] E. Gelenbe, “On the Optimum Checkpoint Interval,” Journal of the ACM, Vol. 26, No. 2, 1979, pp. 259-270. doi:10.1145/322123.322131

[13] E. Gelenbe and M. Hernandez, “Optimum Checkpoints with Age Dependent Failures,” Acta Informatica, Vol. 27, No. 6, 1990, pp. 519-531. doi:10.1007/BF00277388

[14] P. B. Goes and U. Sumita, “Stochastic Models for Performance Analysis of Database Recovery Control,” IEEE Transactions on Computers, Vol. C-44, No. 4, 1995, pp. 561-576. doi:10.1109/1 2.376170

[15] P. B. Goes, “A Stochastic Model for Performance Evaluation of Main Memory Resident Database Systems,” ORSA Journal of Computing, Vol. 7, No. 3, 1997, pp. 269-282. doi:10.1287/ijoc.7.3.269

[16] V. Grassi, L. Donatiello and S. Tucci, “On the Optimal Checkpointing of Critical Tasks and Transaction-Oriented Systems,” IEEE Transactions on Software Engineering, Vol. SE-18, No. 1, 1992, pp. 72-77. doi:10.1109/32.120317

[17] N. Kobayashi and T. Dohi, “Bayesian Perspective of Optimal Checkpoint Placement,” Proceedings of 9th IEEE International Symposium on High Assurance Systems Engineering (HASE 2005), Heidelberg, 12-14 October 2005, pp. 143-159.

[18] V. G. Kulkarni, V. F. Nicola and K. S. Trivedi, “Effects of Checkpointing and Queueing on Program Performance,” Stochastic Models, Vol. 6, No. 4, 1990, pp. 615-648. doi:10.1080/153 26349908807166

[19] V. F. Nicola and J. M. Van Spanje, “Comparative Analysis of Different Models of Checkpointing and Recovery,” IEEE Transactions on Software Engineering, Vol. SE-16, No. 8, 1990, pp. 807-821. doi:10.1109/32.57620

[20] U. Sumita, N. Kaio and P. B. Goes, “Analysis of Effective Service Time with Age Dependent Interruptions and Its Application to Optimal Rollback Policy for Database Management,” Queueing Systems, Vol. 4, No. 3, 1989, pp. 193-212. doi:10.1007/BF02100266

[21] P. L’Ecuyer and J. Malenfant, “Computing Optimal Checkpointing Strategies for Rollback and Recovery Systems,” IEEE Transactions on Computers, Vol. C-37, No. 4, 1988, pp. 491-496. doi:10.1109/1 2.2197

[22] A. Ziv and J. Bruck, “An On-Line Algorithm for Checkpoint Placement,” IEEE Transactions on Computers, Vol. C-46, No. 9, 1997, pp. 976-985. doi:10.1109/12.620479

[23] N. H. Vaidya, “Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme,” IEEE Transactions on Computers, Vol. C-46, No. 8, 1997, pp. 942-947. doi:10.1109/12.609281

[24] H. Okamura, Y. Nishimura and T. Dohi, “A Dynamic Checkpointing Scheme Based on Reinforcement Learning,” Proceedings of 2004 Pacific Rim International Symposium on Dependable Computing (PRDC 2004), Tahiti, 3-5 March 2004, pp. 151-158.

[25] S. Toueg and ?. Babaoglu, “On the Optimum Checkpoint Selection Problem,” SIAM Journal of Computing, Vol. 13, No. 3, 1984, pp. 630-649. doi:10.1137/0213039

[26] N. Kaio and S. Osaki, “A Note on Optimum Checkpointing Policies,” Microelectronics and Reliability, Vol. 25, No. 3, 1985, pp. 451-453. doi:10.1016/0026-2714(85)90195-7

[27] S. Fukumoto, N. Kaio and S. Osaki, “A Study of Checkpoint Generations for a Database Recovery Mechanism,” Computers & Mathematics with Applications, Vol. 24, No. 1-2, 1992, pp. 63-70. doi:10.1016/0898-1221(92)90229-B

[28] S. Fukumoto, N. Kaio and S. Osaki, “Optimal Checkpointing Strategies Using the Checkpointing Density,” Journal of Information Processing, Vol. 15, No. 1, 1992, pp. 87-92.

[29] Y. Ling, J. Mi and X. Lin, “A Variational Calculus Approach to Optimal Checkpoint Placement,” IEEE Transactions on Computers, Vol. 50, No. 7, 2001, pp. 699-707. doi:10.1109/12.936236

[30] T. Ozaki, T. Dohi, H. Okamura and N. Kaio, “Min-Max Checkpoint Placement under Incomplete information,” Proceedings of 2004 International Conference on Dependable Systems and Networks (DSN 2004), Florence, June 28-July 1 2004, pp. 721-730.

[31] T. Ozaki, T. Dohi, H. Okamura and N. Kaio, “Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle,” IEEE Transactions on Dependable and Secure Computing, Vol. 3, No. 2, 2006, pp. 130-140. doi:10.1109/TDSC.2006.22

[32] T. Dohi, T. Ozaki and N. Kaio, “Optimal Sequential Checkpoint Placement with Equality Constraints,” Proceedings of 2nd IEEE International Symposium on Dependable Autonomic and Secure Computing (DASC 2006), Indianapolis, 29 September-1 October 2006, pp. 77-84.

[33] K. Iwamoto, T. Maruo, H. Okamura and T. Dohi, “Aperiodic Optimal Checkpoint Sequence under Steady-State System Availability Criterion,” Proceedings of 2006 Asian International Workshop on Advanced Reliability Modeling (AIWARM 2006), Busan, 24-25 August 2006, pp. 251-258.

[34] H. Okamura, K. Iwamoto and T. Dohi, “A Dynamic Programming Algorithm for Software Rejuvenation Scheduling under Distributed Computation Circumstance,” Journal of Computer Science, Vol. 2, No. 6, 2006, pp. 505-512. doi:10.3844/jcssp.2006.505.512

[35] H. Okamura, K. Iwamoto and T. Dohi, “A DP-Based Optimal Checkpointing Algorithm for Realtime Appications,” International Journal of Reliability, Quality and Safety Engineering, Vol. 13, No. 4, 2006, pp. 323-340. doi:10.1142/S0218539306002288

[36] H. Okamura and T. Dohi, “Comprehensive Evaluation of Aperiodic Checkpointing and Rejuvenation Schemes in Operational Software System,” Journal of Systems and Software, Vol. 83, No. 9, 2010, pp. 1591-1604. doi:10.1016/j.jss.2009.06.058

[37] T. Ozaki, T. Dohi and N. Kaio, “Numerical Computation Aalgorithms for Ssequential Checkpoint Placement,” Performance Evaluation, Vol. 66, No. 6, 2009, pp. 311-326. doi:10.1016/j.p eva.2008.11.003

[38] R. E. Barlow and F. Proschan, “Mathematical Theory of Reliability,” Society for Industrial and Applied Mathematics, Philadelphia, 1996. doi:10.1137/1.9781611971194

[39] K. Naruse, T. Nakagawa and Y. Okuda, “Optimal Checking Time of Backup Operation for a Database System,” In: T. Dohi, S. Osaki and K. Sawaki, Eds., Recent Advances in Stochastic Operations Research, World Scientific, Singapore City, 2007, pp. 131-143.

[40] K. Naruse, T. Nakagawa and S. Maeji, “Optimal Sequential Checkpoint Intervals for Error Detection,” In: T. Dohi, S. Osaki and K. Sawaki, Eds., Recent Advances in Stochastic Operations Research II, World Scientific, Singapore City, 2009, pp. 213-224

[1] S. Hiroyama, T. Dohi and H. Okamura, “Comparison of Aperiodic Checkpoint Placement Algorithms,” In: G. S. Tomar, R.-S. Chang, O. Gervasi, T. H. Kim and S. K. Bandyopadhyay, Eds., Advanced Computer Science and Information (AST 2010), Communications in Computer and Information Science, Vol. 74, Springer-Verlag, Berlin, 2010, pp. 145-156.

[2] V. F. Nicola, “Checkpointing and Modeling of Program Execution Time,” In: M. R. Lyu, Ed., Software Fault Tolerance, John Wiley & Sons, New York, 1995, pp. 167-188,

[3] K. Naruse and S. Maeji, “Optimal Checkpoint Intervals for Computer Systems,” In: S. Nakamura and T. Nakagawa, Eds., Stochastic Reliability Modeling, Optimization and Applications, World Scientific, Singapore City, 2010, pp. 205-239.

[4] J. W. Young, “A First Order Approximation to the Optimum Checkpoint Interval,” Communications of the ACM, Vol. 17, No. 9, 1974, pp. 530-531. doi:10.1145/361147.361115

[5] F. Baccelli, “Analysis of Service Facility with Periodic Checkpointing,” Acta Informatica, Vol. 15, No. 1, 1981, pp. 67-81. doi:10.1007/BF00269809

[6] K. M. Chandy, “A Survey of Analytic Models of RollBack and Recovery Strategies,” IEEE Computer, Vol. 8, No. 5, 1975, pp. 40-47. doi:10.1109/C-M.1975.218955

[7] K. M. Chandy, J. C. Browne, C. W. Dissly and W. R. Uhrig, “Analytic Models for Rollback and Recovery Strategies in Database Systems,” IEEE Transactions on Software Engineering, Vol. SE-1, No. 1, 1975, pp. 100-110. doi:10.1109/TSE.1975.6312824

[8] T. Dohi, N. Kaio and S. Osaki, “Optimal Ccheckpointing and Rollback Sstrategies with Media Failures: Statistical Estimation Algorithms,” Proceedings of 1999 Pacific Rim International Symposium on Dependable Computing (PRDC 1999), Hong Kong, 16-17 December 1999, pp. 161-168.

[9] T. Dohi, N. Kaio and S. Osaki, “The Optimal Age-Dependent Checkpoint Strategy for a Stochastic System Subject to General Failure Mode,” Journal of Mathematical Analysis and Applications, Vol. 249, No. 1, 2000, pp. 80-94. doi:10.1006/jmaa.2000.6939

[10] T. Dohi, N. Kaio and K. S. Trivedi, “Availability Models with Age Dependent-Checkpointing,” Proceedings of 21st Symposium on Reliable Distributed Systems (SRDS 2002), Osaka, 13-16 October 2002, pp. 130-139.

[11] E. Gelenbe and D. Derochette, “Performance of Rollback Recovery Systems under Intermittent Failures,” Communications of the ACM, Vol. 21, No. 6, 1978, pp. 493-499. doi:10.1145/3 59511.359531

[12] E. Gelenbe, “On the Optimum Checkpoint Interval,” Journal of the ACM, Vol. 26, No. 2, 1979, pp. 259-270. doi:10.1145/322123.322131

[13] E. Gelenbe and M. Hernandez, “Optimum Checkpoints with Age Dependent Failures,” Acta Informatica, Vol. 27, No. 6, 1990, pp. 519-531. doi:10.1007/BF00277388

[14] P. B. Goes and U. Sumita, “Stochastic Models for Performance Analysis of Database Recovery Control,” IEEE Transactions on Computers, Vol. C-44, No. 4, 1995, pp. 561-576. doi:10.1109/1 2.376170

[15] P. B. Goes, “A Stochastic Model for Performance Evaluation of Main Memory Resident Database Systems,” ORSA Journal of Computing, Vol. 7, No. 3, 1997, pp. 269-282. doi:10.1287/ijoc.7.3.269

[16] V. Grassi, L. Donatiello and S. Tucci, “On the Optimal Checkpointing of Critical Tasks and Transaction-Oriented Systems,” IEEE Transactions on Software Engineering, Vol. SE-18, No. 1, 1992, pp. 72-77. doi:10.1109/32.120317

[17] N. Kobayashi and T. Dohi, “Bayesian Perspective of Optimal Checkpoint Placement,” Proceedings of 9th IEEE International Symposium on High Assurance Systems Engineering (HASE 2005), Heidelberg, 12-14 October 2005, pp. 143-159.

[18] V. G. Kulkarni, V. F. Nicola and K. S. Trivedi, “Effects of Checkpointing and Queueing on Program Performance,” Stochastic Models, Vol. 6, No. 4, 1990, pp. 615-648. doi:10.1080/153 26349908807166

[19] V. F. Nicola and J. M. Van Spanje, “Comparative Analysis of Different Models of Checkpointing and Recovery,” IEEE Transactions on Software Engineering, Vol. SE-16, No. 8, 1990, pp. 807-821. doi:10.1109/32.57620

[20] U. Sumita, N. Kaio and P. B. Goes, “Analysis of Effective Service Time with Age Dependent Interruptions and Its Application to Optimal Rollback Policy for Database Management,” Queueing Systems, Vol. 4, No. 3, 1989, pp. 193-212. doi:10.1007/BF02100266

[21] P. L’Ecuyer and J. Malenfant, “Computing Optimal Checkpointing Strategies for Rollback and Recovery Systems,” IEEE Transactions on Computers, Vol. C-37, No. 4, 1988, pp. 491-496. doi:10.1109/1 2.2197

[22] A. Ziv and J. Bruck, “An On-Line Algorithm for Checkpoint Placement,” IEEE Transactions on Computers, Vol. C-46, No. 9, 1997, pp. 976-985. doi:10.1109/12.620479

[23] N. H. Vaidya, “Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme,” IEEE Transactions on Computers, Vol. C-46, No. 8, 1997, pp. 942-947. doi:10.1109/12.609281

[24] H. Okamura, Y. Nishimura and T. Dohi, “A Dynamic Checkpointing Scheme Based on Reinforcement Learning,” Proceedings of 2004 Pacific Rim International Symposium on Dependable Computing (PRDC 2004), Tahiti, 3-5 March 2004, pp. 151-158.

[25] S. Toueg and ?. Babaoglu, “On the Optimum Checkpoint Selection Problem,” SIAM Journal of Computing, Vol. 13, No. 3, 1984, pp. 630-649. doi:10.1137/0213039

[26] N. Kaio and S. Osaki, “A Note on Optimum Checkpointing Policies,” Microelectronics and Reliability, Vol. 25, No. 3, 1985, pp. 451-453. doi:10.1016/0026-2714(85)90195-7

[27] S. Fukumoto, N. Kaio and S. Osaki, “A Study of Checkpoint Generations for a Database Recovery Mechanism,” Computers & Mathematics with Applications, Vol. 24, No. 1-2, 1992, pp. 63-70. doi:10.1016/0898-1221(92)90229-B

[28] S. Fukumoto, N. Kaio and S. Osaki, “Optimal Checkpointing Strategies Using the Checkpointing Density,” Journal of Information Processing, Vol. 15, No. 1, 1992, pp. 87-92.

[29] Y. Ling, J. Mi and X. Lin, “A Variational Calculus Approach to Optimal Checkpoint Placement,” IEEE Transactions on Computers, Vol. 50, No. 7, 2001, pp. 699-707. doi:10.1109/12.936236

[30] T. Ozaki, T. Dohi, H. Okamura and N. Kaio, “Min-Max Checkpoint Placement under Incomplete information,” Proceedings of 2004 International Conference on Dependable Systems and Networks (DSN 2004), Florence, June 28-July 1 2004, pp. 721-730.

[31] T. Ozaki, T. Dohi, H. Okamura and N. Kaio, “Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle,” IEEE Transactions on Dependable and Secure Computing, Vol. 3, No. 2, 2006, pp. 130-140. doi:10.1109/TDSC.2006.22

[32] T. Dohi, T. Ozaki and N. Kaio, “Optimal Sequential Checkpoint Placement with Equality Constraints,” Proceedings of 2nd IEEE International Symposium on Dependable Autonomic and Secure Computing (DASC 2006), Indianapolis, 29 September-1 October 2006, pp. 77-84.

[33] K. Iwamoto, T. Maruo, H. Okamura and T. Dohi, “Aperiodic Optimal Checkpoint Sequence under Steady-State System Availability Criterion,” Proceedings of 2006 Asian International Workshop on Advanced Reliability Modeling (AIWARM 2006), Busan, 24-25 August 2006, pp. 251-258.

[34] H. Okamura, K. Iwamoto and T. Dohi, “A Dynamic Programming Algorithm for Software Rejuvenation Scheduling under Distributed Computation Circumstance,” Journal of Computer Science, Vol. 2, No. 6, 2006, pp. 505-512. doi:10.3844/jcssp.2006.505.512

[35] H. Okamura, K. Iwamoto and T. Dohi, “A DP-Based Optimal Checkpointing Algorithm for Realtime Appications,” International Journal of Reliability, Quality and Safety Engineering, Vol. 13, No. 4, 2006, pp. 323-340. doi:10.1142/S0218539306002288

[36] H. Okamura and T. Dohi, “Comprehensive Evaluation of Aperiodic Checkpointing and Rejuvenation Schemes in Operational Software System,” Journal of Systems and Software, Vol. 83, No. 9, 2010, pp. 1591-1604. doi:10.1016/j.jss.2009.06.058

[37] T. Ozaki, T. Dohi and N. Kaio, “Numerical Computation Aalgorithms for Ssequential Checkpoint Placement,” Performance Evaluation, Vol. 66, No. 6, 2009, pp. 311-326. doi:10.1016/j.p eva.2008.11.003

[38] R. E. Barlow and F. Proschan, “Mathematical Theory of Reliability,” Society for Industrial and Applied Mathematics, Philadelphia, 1996. doi:10.1137/1.9781611971194

[39] K. Naruse, T. Nakagawa and Y. Okuda, “Optimal Checking Time of Backup Operation for a Database System,” In: T. Dohi, S. Osaki and K. Sawaki, Eds., Recent Advances in Stochastic Operations Research, World Scientific, Singapore City, 2007, pp. 131-143.

[40] K. Naruse, T. Nakagawa and S. Maeji, “Optimal Sequential Checkpoint Intervals for Error Detection,” In: T. Dohi, S. Osaki and K. Sawaki, Eds., Recent Advances in Stochastic Operations Research II, World Scientific, Singapore City, 2009, pp. 213-224