The World Wide Web is essential to the general public nowadays. From a
data analysis viewpoint, it provides rich opportunities to gather
observational data on a large scale. This paper focuses on modeling the
behavior of visitors to an academic website. Although the conventional
probability models, which have been used in the literature to fit data from a commercial
website, capture the power law behavior in our data, they fail to
capture other important features such as the long tail. We propose a new model based on
the identities of the users. Qualitative and quantitative tests, which are used to
compare how well each model fits our data, show that the new model outperforms
the two conventional probability models.
Cite this paper
F. Phoa and J. Sanchez, "Modeling the Browsing Behavior of World Wide Web Users," Open Journal of Statistics, Vol. 3, No. 2, 2013, pp. 145-154. doi: 10.4236/ojs.2013.32016