IIM, Vol. 4 No. 4, July 2012
A New Metric for Measuring Relatedness of Scientific Papers Based on Non-Textual Features
Abstract: Measuring the relatedness of two papers is a problem that arises in many applications, e.g., recommendation, clustering, and classification of papers. In this paper, a digital library is modeled as a directed graph in which each node represents one of three types of entities (papers, authors, or venues) and each edge represents a relationship between these entities. Based on this graph model, six different types of relations between two papers are considered, and a new metric is proposed for evaluating the relatedness of the papers. The metric relies only on relational features and does not consider textual features. We have used it in combination with a textual similarity measure in the context of citation recommendation systems. Experimental results show that using this metric can improve the quality of the recommendations.
Cite this paper: F. Zarrinkalam and M. Kahani, "A New Metric for Measuring Relatedness of Scientific Papers Based on Non-Textual Features," Intelligent Information Management, Vol. 4 No. 4, 2012, pp. 99-107. doi: 10.4236/iim.2012.44016.
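
The graph model summarized in the abstract could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not list the six relation types or define the metric, so the class `LibraryGraph`, the edge labels (`cites`, `written_by`, `published_in`), and the three relations checked here are all assumptions chosen for the example.

```python
from collections import defaultdict


class LibraryGraph:
    """Directed graph whose nodes are papers, authors, or venues (hypothetical sketch)."""

    def __init__(self):
        self.node_type = {}            # node id -> "paper" | "author" | "venue"
        self.edges = defaultdict(set)  # (source id, edge label) -> set of target ids

    def add_node(self, node_id, kind):
        self.node_type[node_id] = kind

    def add_edge(self, source, label, target):
        self.edges[(source, label)].add(target)

    def targets(self, source, label):
        # .get avoids creating empty entries for unknown keys
        return self.edges.get((source, label), set())


def shared_relations(graph, paper_a, paper_b):
    """Return the labels of the (assumed) relations that hold between two papers."""
    found = []
    # Assumed relation 1: one paper cites the other, in either direction.
    if paper_b in graph.targets(paper_a, "cites") or paper_a in graph.targets(paper_b, "cites"):
        found.append("direct_citation")
    # Assumed relation 2: the papers have at least one author in common.
    if graph.targets(paper_a, "written_by") & graph.targets(paper_b, "written_by"):
        found.append("common_author")
    # Assumed relation 3: the papers appeared in the same venue.
    if graph.targets(paper_a, "published_in") & graph.targets(paper_b, "published_in"):
        found.append("common_venue")
    return found


# Toy library: two papers, one shared author, two different venues.
g = LibraryGraph()
for node_id, kind in [("p1", "paper"), ("p2", "paper"), ("a1", "author"),
                      ("v1", "venue"), ("v2", "venue")]:
    g.add_node(node_id, kind)
g.add_edge("p1", "cites", "p2")
g.add_edge("p1", "written_by", "a1")
g.add_edge("p2", "written_by", "a1")
g.add_edge("p1", "published_in", "v1")
g.add_edge("p2", "published_in", "v2")

relations = shared_relations(g, "p1", "p2")
print(relations)  # ['direct_citation', 'common_author']
```

A relatedness score in this spirit would combine such relational signals into a single number; how the paper weights and combines its six relations is not stated in the abstract, so that step is left out here.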
