Cite this paper
Qu, J., Yin, C. and Song, S. (2015) The Optimization and Improvement of MapReduce in Web Data Mining. Journal of Software Engineering and Applications, 395-406. doi: 10.4236/jsea.2015.88039
 Dean, J. and Ghemawat, S. (2004) MapReduce: Simplified Data Processing on Large Clusters. OSDI, 137-149.
 Ghemawat, S., Gobioff, H. and Leung, S.T. (2003) The Google File System. Proceedings of the SOSP’03, Bolton Landing, 19-22 October 2003, 29-43.
 Cutting, D. (2005) Scalable Computing with MapReduce. OSCON.
 Borthakur, D. (2007) The Hadoop Distributed File System: Architecture and Design. Apache Software Foundation, 5-14.
 Stonebraker, M., Abadi, D., DeWitt, D.J., et al. (2010) MapReduce and Parallel DBMSs: Friends or Foes? Communications of the ACM, 53.
 White, T. (2009) Hadoop: The Definitive Guide. O’Reilly Media, 153-174.
 Zaharia, M., Konwinski, A. and Joseph, A.D. (2008) Improving MapReduce Performance in Heterogeneous Environments. Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, San Diego, 8-10 December 2008, 9-15.
 Becerra, Y., Beltran, V., Carrera, D., Gonzalez, M., Torres, J. and Ayguade, E. (2009) Speeding Up Distributed MapReduce Applications Using Hardware Accelerators. Proceedings of the 2009 International Conference on Parallel Processing, Vienna, 22-25 September 2009, 42-49. http://dx.doi.org/10.1109/ICPP.2009.59
 Fei, X., Lu, S. and Lin, C. (2009) A MapReduce-Enabled Scientific Workflow Composition Framework. Proceedings of the IEEE International Conference on Web Services, Los Angeles, 6-10 July 2009, 663-670.
 Hadoop 0.20 Documentation, Capacity Scheduler.
 Hadoop 0.20 Documentation, Fair Scheduler.
 Tian, C., Zhou, H., He, Y. and Zha, L. (2009) A Dynamic MapReduce Scheduler for Heterogeneous Workloads. Proceedings of the 8th International Conference on Grid and Cooperative Computing, Lanzhou, 27-29 August 2009, 218-224.
 Dean, J. and Ghemawat, S. (2004) MapReduce: Simplified Data Processing on Large Clusters. Proceedings of OSDI’04, San Francisco, 5 December 2004, 137-150.
 Pike, R., Dorward, S., Griesemer, R., et al. (2005) Interpreting the Data: Parallel Analysis with Sawzall. Scientific Programming, 13, 227-298. http://dx.doi.org/10.1155/2005/962135
 Lammel, R. (2006) Google’s MapReduce Programming Model—Revisited. Draft, 26 p.
 Tian, F. and Chen, K. (2011) Towards Optimal Resource Provisioning for Running MapReduce Programs in Public Clouds. Proceedings of the 2011 IEEE International Conference on Cloud Computing (CLOUD), Washington DC, 4-9 July 2011, 155-162.
 Kim, K., Jeon, K., Han, H., Kim, S., Jung, H. and Yeom, H.Y. (2008) MRBench: A Benchmark for MapReduce Framework. Proceedings of the 2008 14th IEEE International Conference on Parallel and Distributed Systems, Melbourne, 8-10 December 2008, 11-18. http://dx.doi.org/10.1109/ICPADS.2008.70