JSEA, Vol. 5 No. 4, April 2012
Measuring Whitespace Pattern Sequences as an Indication of Plagiarism
Abstract: Several methods and technologies exist for comparing the statements, comments, strings, identifiers, and other visible elements of source code in order to identify similarity efficiently. In a prior paper we found that comparing whitespace patterns was not, by itself, precise enough to identify copying. However, several possible methods for improving the precision of a whitespace pattern comparison were presented, the most promising of which was an examination of the sequences of lines with matching whitespace patterns. This paper demonstrates a method of evaluating sequences of matching whitespace patterns and presents a detailed study of the method’s reliability.
Cite this paper: N. Baer and R. Zeidman, "Measuring Whitespace Pattern Sequences as an Indication of Plagiarism," Journal of Software Engineering and Applications, Vol. 5 No. 4, 2012, pp. 249-254. doi: 10.4236/jsea.2012.54029.
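The abstract does not define a whitespace pattern or say how sequences are scored, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a line's pattern is obtained by collapsing every run of non-whitespace characters to a single `x` while keeping spaces and tabs verbatim, and it scores two files by the longest run of consecutive lines whose patterns match. The names `whitespace_pattern` and `longest_matching_sequence` are hypothetical.

```python
import re
from typing import Sequence


def whitespace_pattern(line: str) -> str:
    """Collapse each run of non-whitespace characters to 'x'.

    Assumption: spaces and tabs are kept as-is, so a tab-indented line
    and a space-indented line yield different patterns.
    """
    return re.sub(r"\S+", "x", line.rstrip("\r\n"))


def longest_matching_sequence(a_lines: Sequence[str], b_lines: Sequence[str]) -> int:
    """Length of the longest run of consecutive lines in both files
    whose whitespace patterns are identical (longest common substring
    over the two pattern lists, computed by dynamic programming)."""
    a = [whitespace_pattern(line) for line in a_lines]
    b = [whitespace_pattern(line) for line in b_lines]
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous line of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    file_a = ["int main() {", "\tint x = 1;", "\treturn x;", "}"]
    file_b = ["void f() {", "\tchar c = 2;", "\treturn c;", "}", "// end"]
    print(longest_matching_sequence(file_a, file_b))
```

In this toy input the visible text of the two files differs on every line except the closing brace, yet the first four lines share the same spacing, so the whole of `file_a` forms one matching sequence. A long run of this kind is the sort of signal the abstract suggests is more telling than isolated matching lines; how long a run must be before it is meaningful is an empirical question this sketch does not address.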
