Public Health Weekly Report 2025; 18(34): 1261-1276
Published online July 8, 2025
https://doi.org/10.56786/PHWR.2025.18.34.1
© The Korea Disease Control and Prevention Agency
Hyun-Kyung Kim 1, Boyeong Ryu 2, Min-Gyu Yoo 2, Jaehoon Kim 2, Kyung-Duk Min 3*
1Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Korea, 2Division of Disease Control Research Planning, Bureau of Department of Data Science, Korea Disease Control and Prevention Agency, Cheongju, Korea, 3College of Veterinary Medicine, Chungbuk National University, Cheongju, Korea
*Corresponding author: Kyung-Duk Min, Tel: +82-43-261-8393, E-mail: kdmin@chungbuk.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted distribution and reproduction in any medium, provided the original work is properly cited.
Objectives: During the initial outbreak of coronavirus disease 2019 (COVID-19), numerous predictive studies were conducted amid high uncertainty regarding the characteristics of the virus, and the study results were considered in the policymaking process.
Methods: This study systematically analyzed research papers that predicted the spread of COVID-19 in the Republic of Korea. Focusing on 138 studies published between 2020 and October 15, 2024, it examined the data and methodologies employed and explored ways to enhance the utility of predictive outcomes in managing infectious disease outbreaks.
Results: These methodologies included mathematical models, statistical models, and machine learning–based approaches to predict COVID-19 spread patterns. Beyond forecasting future outbreak trends, these predictive models were also instrumental in evaluating existing measures and proposing effective policies through scenario-based assumptions.
Conclusions: This study’s findings highlight the importance of multidisciplinary collaboration in developing predictive models to effectively prepare for and respond to infectious diseases. By doing so, it aims to minimize the public health impacts of infectious diseases.
Key words: Coronavirus disease 2019; Forecasting; Projection; Modelling
Predictive models for infectious diseases can be used to analyze epidemic patterns, assess policy effectiveness, and establish policy-based evidence.
During the pandemic, numerous prediction studies were conducted using methodologies from various fields such as mathematics, engineering, and sociology. Domestic coronavirus disease 2019 data were used to develop predictive models to forecast international epidemic trends.
Multidisciplinary collaboration is essential for refining predictive models to control the spread of infectious diseases. These models can be used in future preparedness and response plans for emerging infectious diseases.
The coronavirus disease 2019 (COVID-19), which was first detected in Wuhan, Hubei Province, China, in December 2019, rapidly spread across the globe, causing the World Health Organization to officially declare it a global pandemic on March 11, 2020. In the early stages of the outbreak, there was a high level of uncertainty regarding the epidemiological characteristics of the COVID-19 virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]), such as its routes of transmission and infectiousness. In response, studies using predictive models were conducted to forecast the transmission patterns and scale of COVID-19 and to evaluate the effectiveness of the various interventions implemented. These studies were instrumental in designing strategies to effectively control and respond to the spread of infectious diseases.
Predictions can be broadly divided into short-term forecasting and long-term projection. Long-term projection is primarily based on specific models or assumptions and aims to show long-term trends under certain scenarios, such as vaccination rates or behavioral changes. It is intended to elucidate potential outcomes resulting from specific assumptions. In contrast, short-term forecasting refers to attempts made to predict the future course of an infectious disease over a short time frame, typically spanning a few weeks, by providing specific and detailed estimates [1].
Short-term forecasting of infectious disease outbreaks is useful for understanding the current epidemic situation and preparing prompt responses. In contrast, scenario-based long-term projections are valuable because they support long-term decision-making and facilitate comparisons of outbreak trajectories under various conditions. Infectious disease forecasting provides scientific evidence for establishing public health systems and can contribute to the development of effective prevention policies as well as the preparation of proactive crisis response systems through outbreak prediction simulations. Since the proposal of the S (Susceptible)–I (Infected)–R (Recovered) compartmental model by British mathematicians Kermack and McKendrick in 1927 [2], extensive research has been conducted on infectious disease forecasting. In recent years, methodologies have expanded to include a variety of approaches from fields such as machine learning, deep learning, and network analysis. During the COVID-19 pandemic, major international organizations, such as the United States Centers for Disease Control and Prevention (CDC) and the European Centre for Disease Prevention and Control, also used predictive models to mitigate the spread of the disease and establish effective response strategies. The U.S. CDC, in particular, collaborated with more than 100 academic and research institutions during the pandemic to generate short-term forecasts of confirmed cases, hospitalizations, and deaths. These forecasts were published on CDC’s website and served as valuable insights for informing policy decision-making.
It is important to improve the accuracy of outbreak scale predictions to enable proactive preparation for future infectious disease events [3]. To support this effort, the present study conducted a literature review of domestic research papers on COVID-19 forecasting, with the aim of securing the necessary scientific evidence, understanding the current state of research, identifying existing needs, and establishing research priorities. Through this review, the study sought to explore effective strategies for managing infectious disease outbreaks and aimed to propose ways for enhancing the practical utility of forecasting results.
To understand the current state of COVID-19 forecasting research in the Republic of Korea (ROK), relevant literature was collected using PubMed, the most widely used international academic database in the field of health and medicine. The literature search was conducted until October 15, 2024, with no restrictions on the start date of the search period. Keywords used for the review were categorized into three main groups, “Korea,” “COVID-19,” and “Prediction,” and combinations of terms from these categories were used during the search. The keywords and query formulas were finalized through a review by the research team (Table 1).
| Topic | Search strategy |
|---|---|
| Republic of Korea | Korea[Title/Abstract] OR “South Korea”[Title/Abstract] OR “Republic of Korea”[Title/Abstract] |
| COVID-19 | covid[Title/Abstract] OR corona[Title/Abstract] OR sars-cov-2[Title/Abstract] OR COVID-19[Title/Abstract] |
| Prediction | predict*[Title/Abstract] OR forecast*[Title/Abstract] OR projecti*[Title/Abstract] OR model*[Title/Abstract] |
COVID-19=coronavirus disease 2019.
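The search strategy in Table 1 can be sketched as a single query string. The snippet below is a minimal illustration, not the authors' actual query: it assumes that terms within a topic are joined with OR (as shown in Table 1) and that the three topics are joined with AND, which the text implies but does not state. The names `GROUPS` and `build_query` are hypothetical.

```python
# Search-term groups copied from Table 1 (typeset quotes replaced by straight quotes).
GROUPS = {
    "Republic of Korea": [
        "Korea[Title/Abstract]",
        '"South Korea"[Title/Abstract]',
        '"Republic of Korea"[Title/Abstract]',
    ],
    "COVID-19": [
        "covid[Title/Abstract]",
        "corona[Title/Abstract]",
        "sars-cov-2[Title/Abstract]",
        "COVID-19[Title/Abstract]",
    ],
    "Prediction": [
        "predict*[Title/Abstract]",
        "forecast*[Title/Abstract]",
        "projecti*[Title/Abstract]",
        "model*[Title/Abstract]",
    ],
}


def build_query(groups):
    """Join terms inside each topic with OR, then join the topics with AND.

    The AND combination across topics is an assumption made for this sketch.
    """
    blocks = ["(" + " OR ".join(terms) + ")" for terms in groups.values()]
    return " AND ".join(blocks)


query = build_query(GROUPS)
print(query)
```

Keeping the term lists separate from the joining logic makes it easy to add or drop a synonym and rerun the search without rewriting the whole query by hand.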
The Preferred Reporting Items for Systematic reviews and Meta-Analyses flow diagram was used to systematically describe the literature selection process, presenting each step of the selection procedure. The literature included studies that used domestic data to predict COVID-19 epidemiological indicators in ROK and were published in English. All other papers were excluded. Out of the 929 retrieved articles, 181 that matched the research topic were selected through title and abstract screening based on the inclusion and exclusion criteria. After a full-text review, a total of 138 research papers were included in the final review (Figure 1, Supplementary Table 1; available online).
A review of domestic literature on COVID-19 forecasting revealed that a total of 138 research studies were conducted over the past 5 years, from 2020 to October 15, 2024. There were a record number of publications immediately after the COVID-19 outbreak, with 49 studies published in 2020 and 42 in 2021. This number has since decreased, with 18 papers published in 2023 (Table 2). Furthermore, an analysis of the selected literature, which aided in identifying trends in research topics and methodologies of COVID-19 forecasting studies by year, revealed that mathematical models, such as compartment and agent-based models, were most frequently used, accounting for 70.3% (97 studies) of the papers. Statistical models, machine learning–based predictive models, and hybrid models that combine compartment models with machine learning approaches were also utilized (Table 3, Figure 2). A closer examination of the characteristics of the models used in domestic COVID-19 forecasting research showed that factors, such as research objectives, data availability, prediction accuracy, and policy applicability, were comprehensively considered when selecting models. Detailed descriptions, advantages and disadvantages, and key features of each model type are summarized in Table 4.
| Year | Published papers |
|---|---|
| 2020 | 49 |
| 2021 | 42 |
| 2022 | 23 |
| 2023 | 18 |
| 2024 | 6 |
| Total | 138 |
| Model | Published papers |
|---|---|
| Mathematical models | 97 |
| Statistical models | 20 |
| Machine learning and artificial intelligence-based models | 11 |
| Mixed models | 10 |
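The shares quoted in the text can be recomputed from the counts in Tables 2 and 3. The following is a small arithmetic check using only the published counts; the variable names are chosen for this illustration.

```python
# Counts copied from Table 2 (by year) and Table 3 (by model type).
papers_by_year = {2020: 49, 2021: 42, 2022: 23, 2023: 18, 2024: 6}
papers_by_model = {
    "Mathematical models": 97,
    "Statistical models": 20,
    "Machine learning and artificial intelligence-based models": 11,
    "Mixed models": 10,
}

total = sum(papers_by_model.values())

# Both tables should describe the same set of included studies.
assert total == sum(papers_by_year.values())

# Percentage share of each model type, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in papers_by_model.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```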
| Model type | Model | Specific details | Strengths | Limitations |
|---|---|---|---|---|
| Mathematical models | Compartmental models | Differential equation-based models (e.g., SIR, SEIR, SIRD). Population divided into S, E, I, R, D compartments. Transition rates between compartments modeled as parameters. | Intuitive reflection of biological mechanisms. Epidemiologically meaningful parameters. Can simulate intervention scenarios. Predictive with few parameters. | Assumes homogeneous population, reducing realism. Uncertainty due to simplified parameter estimation. Difficult to reflect spatial heterogeneity. Sensitive to initial settings. |
| Mathematical models | Agent-based models | Models individual behaviors and interactions. Defines agent attributes and behavioral rules. Derives macro patterns from micro interactions. | Captures individual-level heterogeneity. Reflects spatial structures and network effects. Analyzes micro-level impacts of interventions. Represents realistic population structures. | High computational complexity. Difficult to validate due to many parameters. Subjectivity in rule setting. Scalability issues in large simulations. |
| Machine learning models | Long Short-Term Memory (LSTM) | Deep learning model specialized for sequential/time-series data. Learns long-term dependencies in recurrent neural networks. Selective memory through gate mechanisms. Learns patterns from past infection data. | High predictive accuracy. Learns complex nonlinear patterns. Capable of learning long-term dependencies. Can handle multivariate time-series data. No need for strict statistical assumptions. | Lack of interpretability (black-box). Requires sufficient training data. Complex hyperparameter tuning. Limited generalizability to out-of-distribution scenarios. |
| Machine learning models | Gradient Boosting Machine (GBM) | Sequential combination of weak learners (e.g., decision trees). Builds the final predictive model by gradually reducing errors. Utilizes various feature variables for prediction. | High predictive accuracy. Handles diverse data types. Provides variable importance metrics. | Complex hyperparameter tuning. Limited parallelization due to sequential learning. Limited in capturing temporal dependencies. |
| Statistical models | ARIMA, ARIMAX | Models autoregressive, differencing, and moving average components. Predicts by normalizing non-stationary time series. Uses Box-Jenkins methodology. | Systematic approach based on statistical theory. Parameters are statistically interpretable. Quantifies uncertainty via confidence intervals. Can model with small datasets. | Models only linear relationships. Poor adaptability to structural changes. Subjectivity in order selection. Reduced accuracy in long-term predictions. |
| Statistical models | Exponential growth models | Models exponential increase in case counts. Estimates spread speed using growth rate parameters. | Simplicity and interpretability. Rapid prediction in early spread stages. Efficient modeling with few parameters. Intuitive understanding of growth rates. | Limited in reflecting complex spread patterns. May overestimate in long-term predictions. Difficult to incorporate intervention effects. Does not reflect saturation effects. |
| Hybrid models | | Combines advantages of different modeling approaches. Integration of compartmental models with machine learning or ensemble of statistical and machine learning models. Multi-scale modeling approach. | Overcomes limitations of single models. Ensures predictive accuracy and interpretability. Integrates diverse data sources. Provides stable prediction performance. | Increased model complexity and implementation difficulty. Higher computational cost. Complexity in determining weights among models. Challenges in validation and interpretation. |
SIR=susceptible-infectious-recovered; SEIR=susceptible-exposed-infectious-recovered; SIRD=susceptible-infected-recovered-deceased; ARIMA=Autoregressive Integrated Moving Average Model; ARIMAX=Autoregressive Integrated Moving Average with Exogenous Variables.
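The compartmental entry in Table 4 (population split into compartments, with transition rates as parameters) can be illustrated with the basic SIR model. The sketch below is a generic textbook formulation integrated with a simple forward-Euler scheme, not the implementation of any reviewed study. The function name `simulate_sir`, the step size, and the parameter values (`beta=0.3`, `gamma=0.1`, a population of 10,000) are illustrative assumptions and are not estimates for COVID-19 in ROK.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler and return daily states.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    beta is the transmission rate and gamma the recovery rate (both per day).
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((day, s, i, r))
    return trajectory


# Hypothetical parameters chosen only to produce a visible epidemic curve.
traj = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
peak_day, _, peak_infected, _ = max(traj, key=lambda row: row[2])
print(f"Peak of about {peak_infected:.0f} infected on day {peak_day}")
```

Intervention scenarios of the kind discussed in the reviewed studies could be explored in such a sketch by lowering `beta` over a chosen time window, which is one reason compartmental models lend themselves to scenario-based projection.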
Mathematical models typically divide the total population into groups such as “Susceptible,” “Infected,” and “Recovered,” following the classic S-I-R design. These models have since evolved to incorporate additional groups, such as the “Hospitalized,” “Vaccinated,” and “Quarantined” populations, resulting in various advanced models such as SEIHR, V-SEIR, SEIHQ, and SEIQRDV3P [4-7]. In studies that used machine learning and algorithmic models, epidemic trends were primarily predicted using techniques such as the Long Short-Term Memory and Gradient Boosting Machine models [8-11]. Among statistical models, time series models such as the Autoregressive Integrated Moving Average Model (ARIMA) and Autoregressive Integrated Moving Average with Exogenous Variables (ARIMAX) were most commonly used. Logistic growth models and various regression models were additionally employed [12-15]. Recently, the use of hybrid models that integrate methodologies from multiple disciplines has increased notably in infectious disease forecasting. Models such as m-SIQRD and SIRVD-DL, which combine mathematical modeling with deep learning techniques, have been applied to predict the spread patterns of COVID-19 [10,16,17].
In the early stages of the COVID-19 outbreak, research primarily focused on short-term forecasting of confirmed case numbers using data released by the Korea Disease Control and Prevention Agency (KDCA). Over time, however, the scope of predicted variables expanded to include death counts, the number of critically ill patients, and the effective reproduction number (Rt). Moreover, although several studies focused exclusively on predicting the spread of COVID-19 within ROK, a considerable number of comparative studies also included other countries such as Italy, Hong Kong, and India [18-20].
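For the exponential growth models listed among the statistical approaches in Table 4, the growth rate parameter can be estimated by a straight-line fit to the logarithm of case counts. The sketch below is a generic least-squares illustration, not a method taken from any reviewed study. The function name `fit_exponential_growth` is hypothetical, and the input series is synthetic (generated from an exact exponential curve), not surveillance data.

```python
import math


def fit_exponential_growth(counts):
    """Fit log(count) = log(c0) + r * t by ordinary least squares, t = 0, 1, 2, ...

    Returns the daily growth rate r and the fitted initial count c0.
    """
    if len(counts) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two strictly positive counts")
    times = range(len(counts))
    logs = [math.log(c) for c in counts]
    t_mean = sum(times) / len(counts)
    y_mean = sum(logs) / len(logs)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    intercept = y_mean - slope * t_mean
    return slope, math.exp(intercept)


# Synthetic series: 10 initial cases growing at 0.2 per day for 14 days.
synthetic = [10 * math.exp(0.2 * day) for day in range(14)]
rate, initial = fit_exponential_growth(synthetic)
doubling_time = math.log(2) / rate
print(f"growth rate {rate:.3f} per day, doubling time {doubling_time:.2f} days")
```

Because the fit assumes unbounded growth, it illustrates the limitation noted in Table 4: such a model has no saturation term and tends to overestimate when extended far beyond the early phase of an outbreak.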
In addition to studies that predicted epidemic trends using epidemiological indicators, many other studies evaluated the effects of interventions such as vaccination, social distancing, school closures, mask-wearing, and changes in individual behavior. Several also examined the influence of variations in population mobility, modeled through different scenarios, on the patterns of the outbreak [21-23]. Various types of mobility-related data, such as mobile carrier data and consumer behavior patterns, were used for forecasting. In addition, several indices, including the stringency index and the Oxford COVID-19 Government Response Tracker index, were incorporated to account for the intensity of social distancing measures [24].
The aim of the present study was to establish a foundation for more effective management of future infectious disease outbreaks by identifying trends in COVID-19 forecasting research conducted in ROK. As a result, it was found that forecasting studies on the spread of COVID-19 used a diverse range of data sources and methodologies. Approaches from other fields, such as machine learning, and hybrid models were also integrated and applied in addition to well-established mathematical models. In addition, numerous studies extended beyond simply forecasting trends in deaths, confirmed cases, or hospitalizations to evaluate the effectiveness of implemented policies in order to inform and support future decision-making processes.
A retrospective review of the analyzed studies showed that medium- to long-term predictions (covering more than 1 month) often differed from the actual epidemic trends owing to factors not considered at the time of prediction, such as the emergence of new variants, changes in public health policies, and shifts in social behaviors. In particular, the accuracy of existing predictive models decreased significantly whenever the virus mutated or its characteristics changed. This limitation of infectious disease forecasting models highlights the importance of continuous model calibration and updates.
This study confirmed the impact of predictive models on scientific decision-making and the potential for expanding these models through multidisciplinary collaboration. Continuous cooperation among experts and relevant government agencies is essential for the ongoing advancement of infectious disease forecasting research. Specifically, by expanding the data and resources available for infectious disease forecasting through multidisciplinary efforts, uncertainties in future outbreaks can be reduced and more effective response systems can be established. The limitations in prediction accuracy identified in this study also call for ensemble forecasting that draws on multiple methodologies instead of relying on a single model, as well as for real-time model calibration systems. When using forecasting results, it is crucial to explicitly consider prediction uncertainties and to support decision-making with a range of scenarios. Institutional frameworks are also needed to ensure that research findings are used effectively. For forecasting results to be reflected in infectious disease response policies in practice, insights derived from predictive models must be closely linked to the implementation of actual public health policies. Such a link enables health authorities to respond proactively based on scientific evidence and contributes to ensuring public safety.
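The call for ensemble forecasting with explicit uncertainty can be sketched as follows. This is only one simple way to combine forecasts: the use of the median as the ensemble value and of the minimum and maximum across models as a crude range is an assumption of this sketch, not a method prescribed by the reviewed studies. The function name `ensemble_forecast`, the model labels, and all numbers are hypothetical.

```python
from statistics import median


def ensemble_forecast(forecasts):
    """Combine per-model forecasts into a median and a min-max range per horizon.

    forecasts maps a model label to a list of predicted values, one per horizon.
    """
    lengths = {len(series) for series in forecasts.values()}
    if len(lengths) != 1:
        raise ValueError("all models must cover the same forecast horizons")
    horizons = lengths.pop()
    summary = []
    for h in range(horizons):
        values = [series[h] for series in forecasts.values()]
        summary.append(
            {
                "horizon": h + 1,
                "median": median(values),
                "low": min(values),
                "high": max(values),
            }
        )
    return summary


# Made-up weekly case forecasts from three hypothetical models.
example = {
    "compartmental": [1200, 1500, 1900],
    "arima": [1100, 1250, 1400],
    "lstm": [1300, 1700, 2300],
}
result = ensemble_forecast(example)
for row in result:
    print(row)
```

Reporting the range alongside the central value makes the widening disagreement between models at longer horizons visible, which is the kind of uncertainty the text recommends communicating to decision-makers.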
The KDCA is currently developing an integrated risk analysis system to monitor both domestic and international public health threats, aiming to establish a comprehensive prevention and medical response system for nationwide and large-scale infectious disease pandemics. Standard operating procedures that detail risk assessments according to crisis levels and specific situations have already been established. Moreover, the KDCA has formulated mid- to long-term plans for the prevention of novel infectious disease pandemics by refining outbreak size predictions through methods such as artificial intelligence modeling and by formulating comprehensive research plans for implementation in various phases [3]. Future preparedness for infectious diseases is expected to improve through the continuous development of refined predictive models via multidisciplinary collaboration across fields such as mathematics, statistics, artificial intelligence, public health, and medicine, ultimately reducing the impact of infectious diseases on public health.
Ethics Statement: Not applicable.
Funding Source: None.
Acknowledgments: None.
Conflict of Interest: The authors have no conflicts of interest to declare.
Author Contributions: Conceptualization: HKK, KDM. Data curation: HKK. Supervision: KDM. Validation: BYR, MGY, JHK. Visualization: HKK. Writing–original draft: HKK. Writing–review & editing: HKK, KDM.
Published online August 28, 2025 https://doi.org/10.56786/PHWR.2025.18.34.1
Copyright © The Korea Disease Control and Prevention Agency.
Hyun-Kyung Kim 1
, Boyeong Ryu 2
, Min-Gyu Yoo 2
, Jaehoon Kim 2
, Kyung-Duk Min 3*
1Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Korea, 2Division of Disease Control Research Planning, Bureau of Department of Data Science, Korea Disease Control and Prevention Agency, Cheongju, Korea, 3College of Veterinary Medicine, Chungbuk National University, Cheongju, Korea
Correspondence to:*Corresponding author: Kyung-Duk Min, Tel: +82-43-261-8393, E-mail: kdmin@chungbuk.ac.kr
This is an Open Access aritcle distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) which permits unrestricted distribution, and reproduction in any medium, provided the original work is properly cited.
Objectives: During the initial outbreak of coronavirus disease 2019 (COVID-19), numerous predictive studies were conducted amid high uncertainty regarding the characteristics of the virus, and the study results were considered in the policymaking process.
Methods: This study systematically analyzed research papers that predicted the spread of COVID-19 in the Republic of Korea. Focusing on 138 studies published between 2020 and October 15, 2024, it examined the data and methodologies employed and explored ways to enhance the utility of predictive outcomes in managing infectious disease outbreaks.
Results: These methodologies included mathematical models, statistical models, and machine learning–based approaches to predict COVID-19 spread patterns. Beyond forecasting future outbreak trends, these predictive models were also instrumental in evaluating existing measures and proposing effective policies through scenario-based assumptions.
Conclusions: This study’s findings highlight the importance of multidisciplinary collaboration in developing predictive models to effectively prepare for and respond to infectious diseases. By doing so, it aims to minimize the public health impacts of infectious diseases.
Keywords: Coronavirus disease 2019, Forecasting, Projection, Modelling
Predictive models for infectious diseases can be used to analyze epidemic patterns, assess policy effectiveness, and establish policy-based evidence.
During the pandemic, numerous prediction studies were conducted using methodologies from various fields such as mathematics, engineering, and sociology. Domestic coronavirus disease 2019 data were used to develop predictive models to forecast international epidemic trends.
Multidisciplinary collaboration is essential for refining predictive models to control the spread of infectious diseases. These models can be used in future preparedness and response plans for emerging infectious diseases.
The coronavirus disease 2019 (COVID-19), which was first detected in Wuhan, Hubei Province, China, in December 2019, rapidly spread across the globe, causing the World Health Organization to officially declare it a global pandemic on March 11, 2020. In the early stages of the outbreak, there was a high level of uncertainty regarding the epidemiological characteristics of the COVID-19 virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]), such as its routes of transmission and infectiousness. In response, studies using predictive models were conducted to forecast the transmission patterns and scale of COVID-19 and to evaluate the effectiveness of the various interventions implemented. These studies were instrumental in designing strategies to effectively control and respond to the spread of infectious diseases.
Predictions can be broadly divided into short-term forecasting and long-term projection. Long-term projection is primarily based on specific models or assumptions and aims to show long-term trends under certain scenarios, such as vaccination rates or behavioral changes. It is intended to elucidate potential outcomes resulting from specific assumptions. In contrast, short-term forecasting refers to attempts made to predict the future course of an infectious disease over a short time frame, typically spanning a few weeks, by providing specific and detailed estimates [1].
Short-term forecasting of infectious disease outbreaks is useful for understanding the current epidemic situation and preparing prompt responses. In contrast, scenario-based long-term projections are valuable because they support long-term decision-making and facilitate comparisons of outbreak trajectories under various conditions. Infectious disease forecasting provides scientific evidence for establishing public health systems and can contribute to the development of effective prevention policies as well as the preparation of proactive crisis response systems through outbreak prediction simulations. Since the proposal of the S (Susceptible)–I (Infected)–R (Recovered) compartmental model by British mathematicians Kermack and McKendrick in 1927 [2], extensive research has been conducted on infectious disease forecasting. In recent years, methodologies have expanded to include a variety of approaches from fields such as machine learning, deep learning, and network analysis. During the COVID-19 pandemic, major international organizations, such as the United States Centers for Disease Control and Prevention (CDC) and the European Centre for Disease Prevention and Control, also used predictive models to mitigate the spread of the disease and establish effective response strategies. The U.S. CDC, in particular, collaborated with more than 100 academic and research institutions during the pandemic to generate short-term forecasts of confirmed cases, hospitalizations, and deaths. These forecasts were published on CDC’s website and served as valuable insights for informing policy decision-making.
It is important to improve the accuracy of outbreak scale predictions to enable proactive preparation for future infectious disease events [3]. To support this effort, the present study conducted a literature review of domestic research papers on COVID-19 forecasting, with the aim of securing the necessary scientific evidence, understanding the current state of research, identifying existing needs, and establishing research priorities. Through this review, the study sought to explore effective strategies for managing infectious disease outbreaks and aimed to propose ways for enhancing the practical utility of forecasting results.
To understand the current state of COVID-19 forecasting research in the Republic of Korea (ROK), relevant literature was collected using PubMed, the most widely used international academic database in the field of health and medicine. The literature search was conducted until October 15, 2024, with no restrictions on the start date of the search period. Keywords used for the review were categorized into three main groups, “Korea,” “COVID-19,” and “Prediction,” and combinations of terms from these categories were used during the search. The keywords and query formulas were finalized through a review by the research team (Table 1).
| Topic | Search strategy |
|---|---|
| Republic of Korea | Korea[Title/Abstract] OR “South Korea”[Title/Abstract] OR “Republic of Korea”[Title/Abstract] |
| COVID-19 | covid[Title/Abstract] OR corona[Title/Abstract] OR sars-cov-2[Title/Abstract] OR COVID-19[Title/Abstract] |
| Prediction | predict*[Title/Abstract] OR forecast*[Title/Abstract] OR projecti*[Title/Abstract] OR model*[Title/Abstract] |
COVID-19=coronavirus disease 2019..
The Preferred Reporting Items for Systematic reviews and Meta-Analyses flow diagram was used to systematically describe the literature selection process, presenting each step of the selection procedure. The literature included studies that used domestic data to predict COVID-19 epidemiological indicators in ROK and were published in English. All other papers were excluded. Out of the 929 retrieved articles, 181 that matched the research topic were selected through title and abstract screening based on the inclusion and exclusion criteria. After a full-text review, a total of 138 research papers were included in the final review (Figure 1, Supplementary Table 1; available online).
A review of domestic literature on COVID-19 forecasting revealed that a total of 138 research studies were conducted over the past 5 years, from 2020 to October 15, 2024. There were a record number of publications immediately after the COVID-19 outbreak, with 49 studies published in 2020 and 42 in 2021. This number has since decreased, with 18 papers published in 2023 (Table 2). Furthermore, an analysis of the selected literature, which aided in identifying trends in research topics and methodologies of COVID-19 forecasting studies by year, revealed that mathematical models, such as compartment and agent-based models, were most frequently used, accounting for 70.3% (97 studies) of the papers. Statistical models, machine learning–based predictive models, and hybrid models that combine compartment models with machine learning approaches were also utilized (Table 3, Figure 2). A closer examination of the characteristics of the models used in domestic COVID-19 forecasting research showed that factors, such as research objectives, data availability, prediction accuracy, and policy applicability, were comprehensively considered when selecting models. Detailed descriptions, advantages and disadvantages, and key features of each model type are summarized in Table 4.
| Year | Published papers |
|---|---|
| 2020 | 49 |
| 2021 | 42 |
| 2022 | 23 |
| 2023 | 18 |
| 2024 | 6 |
| Total | 138 |
| Model | Published papers |
|---|---|
| Mathematical models | 97 |
| Statistical models | 20 |
| Machine learning and artificial intelligence-based models | 11 |
| Mixed models | 10 |
| Model type | Model | Specific details | Strengths | Limitations |
|---|---|---|---|---|
| Mathematical models | Compartmental models | Differential equation-based models (e.g., SIR, SEIR, SIRD). Population divided into S, E, I, R, D compartments. Transition rates between compartments modeled as parameters. | Intuitive reflection of biological mechanisms. Epidemiologically meaningful parameters. Can simulate intervention scenarios. Predictive with few parameters. | Assumes homogeneous population, reducing realism. Uncertainty due to simplified parameter estimation. Difficult to reflect spatial heterogeneity. Sensitive to initial settings. |
| Mathematical models | Agent-based models | Models individual behaviors and interactions. Defines agent attributes and behavioral rules. Derives macro patterns from micro interactions. | Captures individual-level heterogeneity. Reflects spatial structures and network effects. Analyzes micro-level impacts of interventions. Represents realistic population structures. | High computational complexity. Difficult to validate due to many parameters. Subjectivity in rule setting. Scalability issues in large simulations. |
| Machine learning models | Long Short-Term Memory (LSTM) | Deep learning model specialized for sequential/time-series data. Learns long-term dependencies in recurrent neural networks. Selective memory through gate mechanisms. Learns patterns from past infection data. | High predictive accuracy. Learns complex nonlinear patterns. Capable of learning long-term dependencies. Can handle multivariate time-series data. No need for strict statistical assumptions. | Lack of interpretability (black-box). Requires sufficient training data. Complex hyperparameter tuning. Limited generalizability to out-of-distribution scenarios. |
| Machine learning models | Gradient Boosting Machine (GBM) | Sequential combination of weak learners (e.g., decision trees). Builds the final predictive model by gradually reducing errors. Utilizes various feature variables for prediction. | High predictive accuracy. Handles diverse data types. Provides variable importance metrics. | Complex hyperparameter tuning. Limited parallelization due to sequential learning. Limited in capturing temporal dependencies. |
| Statistical models | ARIMA, ARIMAX | Models autoregressive, differencing, and moving average components. Predicts by normalizing non-stationary time series. Uses Box-Jenkins methodology. | Systematic approach based on statistical theory. Parameters are statistically interpretable. Quantifies uncertainty via confidence intervals. Can model with small datasets. | Models only linear relationships. Poor adaptability to structural changes. Subjectivity in order selection. Reduced accuracy in long-term predictions. |
| Statistical models | Exponential growth models | Models exponential increase in case counts. Estimates spread speed using growth rate parameters. | Simplicity and interpretability. Rapid prediction in early spread stages. Efficient modeling with few parameters. Intuitive understanding of growth rates. | Limited in reflecting complex spread patterns. May overestimate in long-term predictions. Difficult to incorporate intervention effects. Does not reflect saturation effects. |
| Hybrid models | | Combines advantages of different modeling approaches. Integration of compartmental models with machine learning or ensemble of statistical and machine learning models. Multi-scale modeling approach. | Overcomes limitations of single models. Ensures predictive accuracy and interpretability. Integrates diverse data sources. Provides stable prediction performance. | Increased model complexity and implementation difficulty. Higher computational cost. Complexity in determining weights among models. Challenges in validation and interpretation. |
SIR=susceptible-infectious-recovered; SEIR=susceptible-exposed-infectious-recovered; SIRD=susceptible-infected-recovered-deceased; ARIMA=Autoregressive Integrated Moving Average Model; ARIMAX=Autoregressive Integrated Moving Average with Exogenous Variables.
Mathematical models typically divide the total population into compartments such as “Susceptible,” “Infected,” and “Recovered,” following the classic S-I-R design. These models have since evolved to incorporate additional compartments, such as the “Hospitalized,” “Vaccinated,” and “Quarantined” populations, resulting in advanced variants such as SEIHR, V-SEIR, SEIHQ, and SEIQRDV3P [4-7]. Studies that used machine learning and algorithmic models primarily predicted epidemic trends with techniques such as the Long Short-Term Memory and Gradient Boosting Machine models [8-11]. Among statistical models, time series models such as the Autoregressive Integrated Moving Average Model (ARIMA) and Autoregressive Integrated Moving Average with Exogenous Variables (ARIMAX) were most commonly used; logistic growth models and various regression models were also employed [12-15]. More recently, hybrid models that integrate methodologies from multiple disciplines have been increasingly used for infectious disease forecasting. Models such as m-SIQRD and SIRVD-DL, which combine mathematical modeling with deep learning techniques, have been applied to predict the spread patterns of COVID-19 [10,16,17].
In the early stages of the COVID-19 outbreak, research primarily focused on short-term forecasting of confirmed case numbers using data released by the Korea Disease Control and Prevention Agency (KDCA). Over time, the scope of predicted variables expanded to include death counts, the number of critically ill patients, and the effective reproduction number (Rt). Moreover, although several studies focused exclusively on predicting the spread of COVID-19 within ROK, a considerable number of comparative studies also included other countries such as Italy, Hong Kong, and India [18-20].
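To make the S-I-R design and the effective reproduction number mentioned above concrete, the following is a minimal sketch of the classic three-compartment model, integrated with a simple forward Euler scheme. It is not taken from any of the reviewed studies: the function names, the parameter values (`beta`, `gamma`, population size), and the step size are illustrative assumptions. For this basic model, the effective reproduction number is the basic reproduction number (`beta / gamma`) scaled by the susceptible fraction.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with a forward Euler scheme.

    Returns one (S, I, R) tuple per day, starting at day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt  # S -> I
            new_recoveries = gamma * i * dt                  # I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def effective_reproduction_number(beta, gamma, susceptible, population):
    """R_t for the basic SIR model: R_0 scaled by the susceptible fraction."""
    return (beta / gamma) * susceptible / population


# Illustrative values only (assumptions, not estimates from any study).
POPULATION = 1_000_000
BETA, GAMMA = 0.3, 0.1

trajectory = simulate_sir(POPULATION, initial_infected=10,
                          beta=BETA, gamma=GAMMA, days=200)

# Day on which the number of infectious individuals is largest.
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
rt_at_start = effective_reproduction_number(
    BETA, GAMMA, trajectory[0][0], POPULATION)
rt_at_peak = effective_reproduction_number(
    BETA, GAMMA, trajectory[peak_day][0], POPULATION)
```

In this sketch the epidemic peaks when the effective reproduction number falls to about 1, which is why that quantity is a natural forecasting target. The extended models cited above (for example, those adding hospitalized or vaccinated compartments) follow the same pattern with more compartments and transition rates.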
In addition to studies that predicted epidemic trends using epidemiological indicators, many other studies evaluated the effects of interventions such as vaccination, social distancing, school closures, mask-wearing, and changes in individual behavior. Several also examined the influence of variations in population mobility, modeled through different scenarios, on the patterns of the outbreak [21-23]. Various types of mobility-related data, such as mobile carrier data and consumer behavior patterns, were used for forecasting. In addition, several indices, including the stringency index and the Oxford COVID-19 Government Response Tracker index, were incorporated to account for the intensity of social distancing measures [24].
The aim of the present study was to establish a foundation for more effective management of future infectious disease outbreaks by identifying trends in COVID-19 forecasting research conducted in ROK. The review found that forecasting studies on the spread of COVID-19 used a diverse range of data sources and methodologies. In addition to well-established mathematical models, approaches from other fields, such as machine learning, and hybrid models were integrated and applied. Numerous studies also went beyond forecasting trends in deaths, confirmed cases, or hospitalizations and evaluated the effectiveness of implemented policies to inform and support future decision-making.
A retrospective review of the analyzed studies showed that medium- to long-term predictions (horizons longer than 1 month) often differed from the actual epidemic trends owing to factors not considered at the time of prediction, such as the emergence of new variants, changes in public health policies, and shifts in social behaviors. In particular, the accuracy of existing predictive models decreased significantly whenever the virus mutated or its characteristics changed. This limitation of infectious disease forecasting models highlights the importance of continuous model calibration and updates.
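The effect described above can be illustrated with a small synthetic experiment. The sketch below assumes a hypothetical daily case series whose growth rate shifts on day 30 (standing in for a new variant or a policy change), calibrates an exponential growth model once on the earlier data, and then compares forecast errors at a short and a long horizon. All numbers and function names are illustrative assumptions, not data or methods from the reviewed studies.

```python
import math


def fit_log_linear(days, counts):
    """Ordinary least squares fit of log(count) = intercept + rate * day."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    day_mean = sum(days) / n
    log_mean = sum(logs) / n
    rate = (sum((d - day_mean) * (y - log_mean) for d, y in zip(days, logs))
            / sum((d - day_mean) ** 2 for d in days))
    return rate, log_mean - rate * day_mean


def predict(rate, intercept, day):
    """Case count implied by the fitted exponential growth model."""
    return math.exp(intercept + rate * day)


def synthetic_cases(day, change_day=30, rate_before=0.05, rate_after=0.12):
    """Hypothetical daily cases whose growth rate shifts after change_day."""
    if day <= change_day:
        return 100 * math.exp(rate_before * day)
    at_change = 100 * math.exp(rate_before * change_day)
    return at_change * math.exp(rate_after * (day - change_day))


# Calibrate once on days 0-29, before the hypothetical shift.
early_days = list(range(30))
rate, intercept = fit_log_linear(
    early_days, [synthetic_cases(d) for d in early_days])


def relative_error(day):
    """Absolute forecast error of the early model, relative to the truth."""
    truth = synthetic_cases(day)
    return abs(predict(rate, intercept, day) - truth) / truth


short_term_error = relative_error(36)  # about 1 week past the calibration window
long_term_error = relative_error(60)   # about 1 month past the calibration window

# Recalibrate on a recent 14-day window that follows the shift.
recent_days = list(range(31, 45))
recent_rate, recent_intercept = fit_log_linear(
    recent_days, [synthetic_cases(d) for d in recent_days])
```

Under these assumptions, the model calibrated before the shift drifts further from the synthetic truth as the horizon lengthens, while refitting on a recent window recovers the new growth rate. Real surveillance data are noisier, so this only illustrates why periodic recalibration matters, not how much accuracy it would restore in practice.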
This study confirmed the impact of predictive models on scientific decision-making and the potential for expanding these models through multidisciplinary collaboration. Continuous cooperation among experts and relevant government agencies is essential for the ongoing advancement of infectious disease forecasting research. Specifically, expanding the data and resources available for infectious disease forecasting through multidisciplinary efforts is expected to reduce uncertainties in future outbreaks and enable more effective response systems. The limitations in prediction accuracy identified in this study also call for ensemble forecasting that draws on multiple methodologies instead of relying on a single model, as well as real-time model calibration systems. When forecasting results are used, it is crucial to explicitly consider prediction uncertainties and to support decision-making with a range of scenarios. Institutional frameworks are also needed to ensure effective utilization of research findings. For forecasting results to be reflected in infectious disease response policies in practice, insights derived from predictive models must be closely linked to the implementation of actual public health policies. Such a linkage enables health authorities to respond proactively on the basis of scientific evidence and contributes to ensuring public safety.
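As one possible reading of the ensemble forecasting and uncertainty reporting recommended above, the sketch below combines forecasts from several models using weights inversely proportional to each model's recent error, and reports the spread across models alongside the point forecast. The weighting rule, the model labels, and all numbers are hypothetical assumptions chosen for illustration; the reviewed studies may combine models differently.

```python
def inverse_error_weights(recent_errors):
    """Weight each model by 1 / (recent error), normalized to sum to 1.

    Assumes every error is strictly positive.
    """
    inverse = {name: 1.0 / err for name, err in recent_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def ensemble_forecast(forecasts, weights):
    """Combine per-model daily forecasts into a point value and a range.

    The range is simply the minimum and maximum across models, a crude
    indication of between-model disagreement rather than a formal interval.
    """
    horizon = len(next(iter(forecasts.values())))
    combined = []
    for day in range(horizon):
        values = [series[day] for series in forecasts.values()]
        point = sum(weights[name] * forecasts[name][day] for name in forecasts)
        combined.append({"point": point, "low": min(values), "high": max(values)})
    return combined


# Hypothetical 3-day-ahead daily case forecasts from three model families.
forecasts = {
    "compartmental": [1200.0, 1300.0, 1450.0],
    "arima": [1150.0, 1180.0, 1210.0],
    "lstm": [1250.0, 1400.0, 1600.0],
}
# Hypothetical mean absolute errors on a recent validation window.
recent_errors = {"compartmental": 80.0, "arima": 120.0, "lstm": 100.0}

weights = inverse_error_weights(recent_errors)
combined = ensemble_forecast(forecasts, weights)
```

Reporting the low and high values next to the point forecast keeps between-model disagreement visible to decision-makers, which fits the recommendation to support decisions with a range of scenarios.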
The KDCA is currently developing an integrated risk analysis system to monitor both domestic and international public health threats, with the aim of establishing a comprehensive prevention and medical response system for nationwide, large-scale infectious disease pandemics. Standard operating procedures that detail risk assessments according to crisis levels and specific situations have already been established. Moreover, the KDCA has formulated mid- to long-term plans for the prevention of novel infectious disease pandemics, which include refining outbreak size predictions through methods such as artificial intelligence modeling and developing comprehensive research plans for phased implementation [3]. Preparedness for future infectious diseases is expected to improve through the continuous development of refined predictive models via multidisciplinary collaboration across fields such as mathematics, statistics, artificial intelligence, public health, and medicine, ultimately reducing the impact on public health.
Ethics Statement: Not applicable.
Funding Source: None.
Acknowledgments: None.
Conflict of Interest: The authors have no conflicts of interest to declare.
Author Contributions: Conceptualization: HKK, KDM. Data curation: HKK. Supervision: KDM. Validation: BYR, MGY, JHK. Visualization: HKK. Writing–original draft: HKK. Writing–review & editing: HKK, KDM.
| Topic | Search strategy |
|---|---|
| Republic of Korea | Korea[Title/Abstract] OR “South Korea”[Title/Abstract] OR “Republic of Korea”[Title/Abstract] |
| COVID-19 | covid[Title/Abstract] OR corona[Title/Abstract] OR sars-cov-2[Title/Abstract] OR COVID-19[Title/Abstract] |
| Prediction | predict*[Title/Abstract] OR forecast*[Title/Abstract] OR projecti*[Title/Abstract] OR model*[Title/Abstract] |
COVID-19=coronavirus disease 2019.