A novel two-level clustering algorithm for time series group forecasting

Document Type: Research Paper

Authors

Department of Industrial Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Iran

Abstract

Parametric models are among the most widely used methods for time series forecasting, but in recent years non-parametric or machine learning methods have largely replaced statistical methods. In this study, we develop a novel two-level clustering algorithm to forecast datasets of short time series using a multi-step approach that combines clustering, a sliding window, and an MLP neural network. In the first-level clustering, the time series in the training part of the dataset are clustered. Then, for each first-level cluster, a long time series is built by concatenating the time series it contains. After that, a sliding window restructures each of these long time series into equal-size sub-series, which are clustered at the second level. An MLP network is then fitted to each final cluster. Finally, the distance between the test data and the center of each final cluster is calculated, the nearest cluster is selected, and the model fitted to that cluster produces the final forecast. Using the WAPE index, we compare the proposed algorithm with a one-level clustering algorithm from the literature in terms of the mean result and the best result over ten runs. The results reveal that the proposed algorithm improves the WAPE index in terms of the mean and the best solution by 8.78% and 5.24%, respectively. In addition, a comparison of the standard deviation across runs shows that the proposed algorithm is more stable, with a 3.24 decrease in this index. This study proposes a two-level clustering algorithm for forecasting short time series datasets that improves both the accuracy and the stability of time series forecasting.
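The steps described in the abstract can be sketched in plain Python as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. Both clustering levels use plain k-means with Euclidean distance, because the abstract does not name the clustering method. A hypothetical one-hidden-layer `TinyMLP` trained by SGD stands in for the paper's MLP, whose architecture is not given. First-level series are assumed to have equal length and to be scaled to roughly [0, 1]. The names `kmeans`, `sliding_windows`, `two_level_fit`, and `two_level_predict`, and the parameters `k1`, `k2` (cluster counts) and `w` (window length), are illustrative. WAPE is computed as sum |actual - forecast| / sum |actual|, returned as a fraction.

```python
import math
import random


def wape(actual, forecast):
    """Weighted absolute percentage error as a fraction (lower is better)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(abs(a) for a in actual)


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means with Euclidean distance (an assumed choice); returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def sliding_windows(series, w):
    """Restructure a series into equal-size (w inputs, next value) pairs."""
    return [(series[i:i + w], series[i + w]) for i in range(len(series) - w)]


class TinyMLP:
    """Hypothetical one-hidden-layer tanh MLP trained by SGD on squared error."""

    def __init__(self, n_in, n_hidden=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2, self.lr = 0.0, lr

    def _hidden(self, x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(wj * hj for wj, hj in zip(self.w2, self._hidden(x))) + self.b2

    def fit(self, X, y, epochs=300):
        for _ in range(epochs):
            for x, target in zip(X, y):
                h = self._hidden(x)
                err = sum(wj * hj for wj, hj in zip(self.w2, h)) + self.b2 - target
                for j, hj in enumerate(h):
                    grad_h = err * self.w2[j] * (1 - hj * hj)  # backprop through tanh
                    self.w2[j] -= self.lr * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= self.lr * grad_h * xi
                    self.b1[j] -= self.lr * grad_h
                self.b2 -= self.lr * err
        return self


def two_level_fit(train, k1, k2, w):
    """Level 1 clusters whole series; level 2 clusters windows of each concatenated cluster."""
    _, labels1 = kmeans(train, k1)
    models = []  # (final-cluster center, fitted model) pairs
    for c in range(k1):
        # Concatenate every series in first-level cluster c into one long series.
        long_series = [v for s, lab in zip(train, labels1) if lab == c for v in s]
        pairs = sliding_windows(long_series, w)
        if not pairs:
            continue
        X, y = [p[0] for p in pairs], [p[1] for p in pairs]
        k = min(k2, len(X))
        centers2, labels2 = kmeans(X, k)
        for j in range(k):
            Xj = [x for x, lab in zip(X, labels2) if lab == j]
            yj = [t for t, lab in zip(y, labels2) if lab == j]
            if Xj:
                models.append((centers2[j], TinyMLP(w).fit(Xj, yj)))
    return models


def two_level_predict(models, window):
    """Route a test window to the nearest final-cluster center and use that cluster's model."""
    _, model = min(models, key=lambda cm: math.dist(window, cm[0]))
    return model.predict(window)


if __name__ == "__main__":
    # Hypothetical toy data: three rising and three falling series scaled to [0, 1].
    train = [[0.1 + 0.05 * i + 0.01 * k for i in range(12)] for k in range(3)]
    train += [[0.9 - 0.05 * i - 0.01 * k for i in range(12)] for k in range(3)]
    models = two_level_fit(train, k1=2, k2=2, w=3)
    print(two_level_predict(models, [0.5, 0.55, 0.6]))
    print(wape([10, 20, 30], [12, 18, 30]))
```

Running the script prints a one-step forecast for the window `[0.5, 0.55, 0.6]` and the WAPE of a small hand-made example (4/60, about 0.067). Because concatenation happens before windowing, some windows straddle the join between two series. The sketch keeps them, as the concatenation step implies.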
