SN, Vol. 3, No. 2, February 2014
A Process to Support Analysts in Exploring and Selecting Content from Online Forums
Abstract: The public content increasingly available on the Internet, especially in online forums, enables researchers to study society in new ways. However, qualitative analysis of online forums is very time-consuming, and most of the content is unrelated to the researchers’ interests. Consequently, analysts face the following problem: how can they efficiently explore and select the content to be analyzed? This article introduces a new process to support analysts in solving this problem. The process is based on unsupervised machine learning techniques such as hierarchical clustering and term co-occurrence networks. A tool that helps apply the proposed process was created to provide consolidated and structured results, including measurements and a content exploration interface.
Cite this paper: Carvalho, D., Marcacini, R., Lucena, C. and Rezende, S. (2014) A Process to Support Analysts in Exploring and Selecting Content from Online Forums. Social Networking, 3, 86-93. doi: 10.4236/sn.2014.32011.
