Predicting Probable Product Swaps in Customer Behaviour: An In-depth Analysis of Forecasting Techniques, Factors Influencing Decisions, and Implications for Business Strategies

Mohit M Rao; Virendra Kumar Shrivastava

doi:10.4108/eetsis.4049

Authors

Mohit M Rao, Alliance University
Virendra Kumar Shrivastava, Alliance University


Keywords:

Prediction, Product swap, Feature Selection, Random Forest, ranking, chi-square test, Support Vector Machine, Machine Learning, Artificial Intelligence

Abstract

Introduction: This study examines the factors that influence product swap requests and predicts the likelihood of such requests, focusing on product usage, product attributes, and customer behaviour, particularly in the IT industry.

Objectives: To analyse customer and product data from a leading IT company in order to uncover the determinants of swap requests.

Methods: Product and customer data were gathered and processed, and machine learning methods (Random Forest, Support Vector Machine, and Naïve Bayes) were used to identify the variables that influence product swap requests and to classify requests accordingly.
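The keywords list feature selection, ranking, and a chi-square test, but the abstract does not show how attributes were ranked. As a hedged illustration only, the sketch below computes Pearson's chi-square statistic of independence between one categorical attribute and a binary swap label, then ranks attributes by that statistic. The function names, the attribute names `plan_tier` and `region`, and the toy values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def chi_square_statistic(feature: Sequence[str], label: Sequence[int]) -> float:
    """Pearson chi-square statistic for independence between a categorical
    attribute and a binary label (hypothetically 1 = swap requested, 0 = not)."""
    if len(feature) != len(label):
        raise ValueError("feature and label must have the same length")
    n = len(feature)
    observed: Dict[Tuple[str, int], int] = {}  # counts per (category, label) cell
    row_totals: Dict[str, int] = {}            # counts per attribute category
    col_totals: Dict[int, int] = {}            # counts per label value
    for f, y in zip(feature, label):
        observed[(f, y)] = observed.get((f, y), 0) + 1
        row_totals[f] = row_totals.get(f, 0) + 1
        col_totals[y] = col_totals.get(y, 0) + 1
    stat = 0.0
    for f, r in row_totals.items():
        for y, c in col_totals.items():
            expected = r * c / n  # expected count if attribute and label are independent
            stat += (observed.get((f, y), 0) - expected) ** 2 / expected
    return stat


def rank_features(columns: Dict[str, Sequence[str]],
                  label: Sequence[int]) -> List[Tuple[str, float]]:
    """Rank attributes from strongest to weakest association with the label."""
    scores = [(name, chi_square_statistic(values, label))
              for name, values in columns.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: plan_tier perfectly separates the label, region does not.
columns = {
    "plan_tier": ["basic", "basic", "pro", "pro", "basic", "pro", "basic", "pro"],
    "region": ["eu", "us", "eu", "us", "eu", "us", "us", "eu"],
}
label = [1, 1, 0, 0, 1, 0, 1, 0]
print(rank_features(columns, label))  # → [('plan_tier', 8.0), ('region', 0.0)]
```

Ranking on the raw statistic favours attributes with many categories; when attributes differ in cardinality, comparing p-values with (r − 1)(c − 1) degrees of freedom is the more principled choice.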

Results: The study analysed a large dataset of 320K product purchase requests and 30K swap requests from a prominent social media company. The dataset contains 520 attributes covering customer and product details, usage data, purchase history, and chatter comments related to swap requests. Random Forest, Support Vector Machine, and Naïve Bayes models were compared; the Random Forest model was fine-tuned for optimal results, and feature importance was determined from F1 scores to assess how relevant each attribute is to swap requests.
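The abstract compares Naïve Bayes with Random Forest and SVM but does not state which Naïve Bayes variant was used. Purely as an illustration, the following is a categorical Naïve Bayes classifier with Laplace smoothing for categorical attributes; the variant, the class name `CategoricalNaiveBayes`, the smoothing parameter `alpha`, and the toy data are assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, Sequence


class CategoricalNaiveBayes:
    """Hypothetical categorical Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing strength; 1.0 is classic Laplace smoothing

    def fit(self, rows: Sequence[Sequence[str]],
            labels: Sequence[int]) -> "CategoricalNaiveBayes":
        self.n = len(labels)
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # value_counts[c][j][v]: how often feature j takes value v within class c
        self.value_counts = {c: [Counter() for _ in range(n_features)]
                             for c in self.class_counts}
        # distinct values seen per feature, used in the smoothing denominator
        self.vocab = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[c][j][v] += 1
                self.vocab[j].add(v)
        return self

    def log_scores(self, row: Sequence[str]) -> Dict[int, float]:
        """Unnormalised log posterior: log P(c) + sum_j log P(x_j | c)."""
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / self.n)
            for j, v in enumerate(row):
                num = self.value_counts[c][j][v] + self.alpha  # Counter gives 0 if unseen
                den = count + self.alpha * len(self.vocab[j])
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, row: Sequence[str]) -> int:
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical rows: [plan_tier, region]; label 1 = swap requested.
rows = [["basic", "eu"], ["basic", "us"], ["pro", "eu"],
        ["pro", "us"], ["basic", "eu"], ["pro", "us"]]
labels = [1, 1, 0, 0, 1, 0]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(["basic", "us"]))  # → 1
```

Working in log space avoids numerical underflow when many of the 520 attributes are multiplied together.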

Conclusion: Three algorithms were evaluated: Support Vector Machine, Naïve Bayes, and Random Forest. The Random Forest model, fine-tuned using feature importance, performed best, with an accuracy of 0.83 and an F1 score of 0.86.
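Accuracy and F1 are both computed from confusion-matrix counts: accuracy = (TP + TN) / N and F1 = 2·TP / (2·TP + FP + FN). Because F1 ignores true negatives, it can exceed accuracy, as in the reported 0.86 versus 0.83. The abstract does not say which class or averaging scheme the F1 score refers to; the sketch below assumes binary F1 with swap requests (label 1) as the positive class, and the example labels are made up.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and binary F1, with label 1 as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: TP = 4, TN = 2, FP = 1, FN = 1
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Here accuracy is 6/8 = 0.75 while F1 is 8/10 = 0.8, showing how a model can score higher on F1 than on accuracy when true negatives are scarce.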
