Comparative Analysis of ARIMA and Random Forest Models for Forecasting COVID-19 Cases in China

Authors

  • Ran Chen, Department of Applied Mathematics, Case Western Reserve University, Cleveland, United States

DOI:

https://doi.org/10.54097/bga9mp76

Keywords:

COVID-19 forecasting, ARIMA model, random forest model, time series, machine learning.

Abstract

The COVID-19 pandemic has had a profound impact on public health systems and the global economy, making accurate forecasting of infection cases crucial for formulating effective intervention strategies. This study systematically compares the performance of the ARIMA time series model and the Random Forest machine learning model in predicting daily COVID-19 cases in China from 2020 to 2022. The data were partitioned into training and testing sets for model development and evaluation. The results indicate that the Random Forest model significantly outperforms the ARIMA model on all evaluated metrics, including the residual mean and standard deviation and key error indicators, and that it better captures the timing and amplitude of infection peaks and troughs. The study thus provides clear empirical evidence for model selection in epidemic prediction: when epidemic data are complex, machine learning models may be more reliable than traditional time series methods.
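The evaluation procedure described in the abstract (a train/test partition followed by residual-based metrics) can be illustrated with a minimal sketch. This is not the author's code. The case counts are made-up numbers, the 80/20 split ratio is an assumption, and MAE and RMSE are assumed stand-ins for the "key error indicators", which the abstract does not name. The forecast is a naive last-value placeholder rather than an ARIMA or Random Forest model; any model's predictions could be passed to the same metric function.

```python
import math
from statistics import mean, pstdev


def chronological_split(series, train_fraction=0.8):
    """Split a time series into train/test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def residual_metrics(actual, predicted):
    """Summarize forecast residuals (actual minus predicted)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        "residual_mean": mean(residuals),
        "residual_std": pstdev(residuals),  # population standard deviation
        "mae": mean(abs(r) for r in residuals),
        "rmse": math.sqrt(mean(r * r for r in residuals)),
    }


# Hypothetical daily case counts, for illustration only.
daily_cases = [10, 12, 15, 20, 18, 25, 30, 28, 35, 40]

train, test = chronological_split(daily_cases, train_fraction=0.8)

# Placeholder forecast: repeat the last training value.
# A fitted ARIMA or Random Forest model would supply this list instead.
naive_forecast = [train[-1]] * len(test)

metrics = residual_metrics(test, naive_forecast)
print(metrics)
```

Splitting chronologically rather than at random keeps future observations out of the training set, which matters for any time series comparison of this kind. A residual mean close to zero suggests an unbiased forecast, while the standard deviation, MAE, and RMSE describe the spread and size of the errors.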

Published

27-12-2025

How to Cite

Chen, R. (2025). Comparative Analysis of ARIMA and Random Forest Models for Forecasting COVID-19 Cases in China. Highlights in Business, Economics and Management, 65, 851-855. https://doi.org/10.54097/bga9mp76