Cloud computing, as a disruptive technology, provides a dynamic, elastic, and promising computing environment for tackling the challenges of big data processing and analytics. Hadoop and MapReduce are widely used open-source frameworks in cloud computing for storing and processing big data in a scalable fashion. Spark is a more recent parallel computing engine that works together with Hadoop and outperforms MapReduce through in-memory computing and high-level programming features. In this paper, we present the design and implementation of a productive, domain-specific big data analytics cloud platform built on top of Hadoop and Spark. To increase user productivity, we created a variety of data processing templates that reduce the programming effort. With the platform, geophysicists can productively design and implement scalable seismic data processing algorithms without handling the details of data management or the complexity of parallelism. The platform generates a complete data processing application from the user's kernel program and a few simple configurations, allocates resources, and executes the application in parallel on top of Spark and Hadoop. We evaluated the platform's productivity and performance in experiments with a few basic but representative data processing algorithms from the petroleum industry.
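As a rough illustration of the template idea described above, the following minimal Python sketch shows how a user-supplied kernel and a small configuration could be combined into a parallel per-trace job. It is not the platform's actual code: the names `run_template`, `remove_mean`, and the `partitions` and `workers` configuration keys are hypothetical, and a local thread pool stands in for the Spark executors that the real platform would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# One seismic trace, modelled here as a plain list of samples (an assumption).
Trace = List[float]


def run_template(kernel: Callable[[Trace], Trace],
                 traces: Sequence[Trace],
                 config: Dict[str, int]) -> List[Trace]:
    """Hypothetical per-trace template.

    Splits the input into partitions, applies the user kernel to every
    trace of each partition in parallel, and returns the results in the
    original trace order.
    """
    n_parts = max(1, config.get("partitions", 1))
    # Ceiling division so that every trace lands in some partition.
    size = -(-len(traces) // n_parts) if traces else 1
    parts = [traces[i:i + size] for i in range(0, len(traces), size)]

    def process(part: Sequence[Trace]) -> List[Trace]:
        return [kernel(t) for t in part]

    # The thread pool is only a stand-in for a cluster scheduler.
    with ThreadPoolExecutor(max_workers=max(1, config.get("workers", 2))) as pool:
        results = list(pool.map(process, parts))  # map keeps partition order

    return [t for part in results for t in part]


def remove_mean(trace: Trace) -> Trace:
    """Example user kernel: subtract the mean amplitude from one trace."""
    mean = sum(trace) / len(trace)
    return [sample - mean for sample in trace]


demo_traces = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [0.0, 10.0]]
demo_output = run_template(remove_mean, demo_traces,
                           {"partitions": 2, "workers": 2})
```

In this sketch the user writes only `remove_mean` and the configuration dictionary; partitioning, scheduling, and result ordering stay inside the template, which mirrors the division of labour the abstract describes.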