Prediction of marketing strategies performance based on clickstream data

Document Type: Conference Paper


Department of Industrial Engineering and Management Systems, Amirkabir University of Technology, Tehran, Iran


Today, Internet-based businesses are among the most useful tools for generating economic gains in both developing and developed countries. It can even be said that the expansion of the World Wide Web has led other businesses to seek customers through online advertising in order to increase their sales. This study presents a data-driven approach to predicting the performance of the marketing strategies of an online shopping store. The data were collected by a Polish online shopping website in 2008 and were obtained from the UCI Machine Learning Repository. In the data preparation phase, a decision tree (DT) is developed and 13 customer features are selected for the modeling phase. The proposed method uses the rminer package of the R software to develop three classification models: a neural network (NN), a support vector machine (SVM), and logistic regression (LR). The three models are then compared using two criteria: the area under the curve (AUC) and the receiver operating characteristic (ROC) curve. The comparison shows that the NN technique predicts better than the other two models. This result can help marketing managers plan website design effectively to attract new visitors and shoppers.
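The comparison criterion named in the abstract can be illustrated with a small sketch. The study itself used the rminer package in R; the Python code below is not the authors' implementation, and the function names (`roc_points`, `auc`) and the toy labels and scores are assumptions made purely for illustration. It builds the ROC curve by sweeping the decision threshold over a model's predicted scores and computes the AUC with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted scores, higher meaning "more likely positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples with the same score are handled together, so ties
        # produce a single diagonal segment on the curve.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical toy data: these are NOT results from the study.
    y_true = [1, 0, 1, 0, 1, 0]
    toy_scores = {
        "model_a": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
        "model_b": [0.6, 0.5, 0.9, 0.7, 0.4, 0.2],
    }
    for name, s in toy_scores.items():
        print(name, "AUC =", round(auc(y_true, s), 3))
```

An AUC of 1.0 means the model ranks every positive example above every negative one, while 0.5 corresponds to random ranking, so the model with the higher AUC is preferred when classifiers are compared on the same test data.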

