JIS Vol.8 No.1, January 2017
Web Search Query Privacy, an End-User Perspective
Abstract: While search engines have become vital tools for finding information on the Internet, privacy remains a growing concern because search engines are technologically able to retain user search logs. Although such capabilities might enable better personalized search results, the confidentiality of user intent remains uncertain. Even with web search query obfuscation techniques, a further challenge remains: reusing the same obfuscation methods is problematic, given that search engines have enormous computation and storage resources for query disambiguation. A number of web search query privacy procedures also require the cooperation of the search engine, which in such cases is a non-trusted entity, making query obfuscation even more challenging. In this study, we first review how search engines work with regard to web search queries and user intent. Second, we present this material in a manner accessible to readers outside computer science, with the intent of introducing knowledge of web search engines so that non-computer scientists can approach web search query privacy innovatively. As a contribution, we identify and highlight areas open for further investigative and innovative research on end-user personalized web search privacy, that is, methods that can be executed on the user side without the involvement of third parties such as search engines. The goal is to motivate future web search obfuscation heuristics that give users control over their personal search privacy.
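The abstract observes that reusing the same obfuscation method is problematic. The following minimal Python sketch illustrates one way this can fail. It rests on simplifying assumptions of our own: a hypothetical fixed list of three decoy queries, and an observer who can link all submitted batches to one user. When identical decoys accompany every genuine query, the decoys are exactly the queries that recur in every logged batch, so stripping them exposes the real ones. The names `FIXED_DECOYS`, `obfuscate_static`, `flag_recurring`, and `recover_real` are illustrative only and do not come from the paper.

```python
import random
from collections import Counter

# Hypothetical fixed decoy list, reused unchanged in every session (the weak pattern).
FIXED_DECOYS = ["weather tomorrow", "pasta recipes", "football scores"]


def obfuscate_static(real_query, rng):
    """User-side obfuscation: hide real_query among the same fixed decoys.

    The batch is shuffled so that position alone does not reveal the real query.
    """
    batch = FIXED_DECOYS + [real_query]
    rng.shuffle(batch)
    return batch


def flag_recurring(batches):
    """Observer's view: queries present in every logged batch are likely decoys.

    Needs several batches; with a single batch every query would be flagged.
    """
    counts = Counter(q for batch in batches for q in set(batch))
    return {q for q, c in counts.items() if c == len(batches)}


def recover_real(batches):
    """Remove the recurring (decoy) queries from each batch to expose the genuine ones."""
    decoys = flag_recurring(batches)
    return [sorted(set(batch) - decoys) for batch in batches]


if __name__ == "__main__":
    rng = random.Random(7)
    real = ["symptoms of flu", "tax filing deadline", "cheap flights to lisbon"]
    logged = [obfuscate_static(q, rng) for q in real]
    print(recover_real(logged))
    # prints [['symptoms of flu'], ['tax filing deadline'], ['cheap flights to lisbon']]
```

Drawing decoys at random from a large, regularly refreshed pool avoids this particular giveaway, although frequency analysis over enough sessions may still help an observer separate decoys from genuine queries.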
Cite this paper: Mivule, K. (2017) Web Search Query Privacy, an End-User Perspective. Journal of Information Security, 8, 56-74. doi: 10.4236/jis.2017.81005.