OJEpi Vol.10 No.4, November 2020
Real-Time COVID-19 Forecasting for Four States of India Using a Regression Transmission Model
Abstract: Introduction: More than a million people are reported to have been infected with COVID-19 in India since the beginning of the pandemic. However, the epidemic is not the same across the country. State-level variations, rapidly changing disease dynamics and the response have created uncertainty about the appropriate use of models to project the future. Method: This paper aims to use a validated semi-mechanistic stochastic model to generate short-term forecasts. The analysis used data available in the respective state government bulletins for four states. It used a simplified transmission model fitted by Markov Chain Monte Carlo simulation with Metropolis-Hastings updating. Results: Two weeks of forecasts were compared with the actual data. The actual cases reported by the respective states are well within the 25th and 75th percentiles of the forecasts. The results indicate a reliable method for real-time short-term forecasting of COVID-19 cases. For the 1st week, the projected interquartile range and the actual reported cases for the states of Kerala, Tamil Nadu, Andhra Pradesh and Odisha were (1064 - 2532) 2234, (17,503 - 50,125) 27,214, (5225 - 11,003) 9563 and (2559 - 4461) 3925, respectively. Similarly, for the 2nd week, the projected interquartile range and the actual reported cases were (1055 - 7803) 4221, (18,298 - 73,952) 31,488, (4705 - 23,224) 13,357 and (2701 - 9037) 4175, respectively. Conclusion: This real-time forecast can be used as an early warning tool for projecting changes in the epidemic in the near future, triggering proactive management steps.

1. Introduction

As of 18th July 2020, more than a million cases of COVID-19 had been reported from India [1]. The first case was reported on 31st of January 2020 [2]. As the COVID-19 pandemic continued to sweep across the world, the country took a series of measures to address it. These included improving the testing capacity and adopting a testing strategy to identify cases [3]. However, uncertainty over the duration and burden of the pandemic is visible in reports, both peer-reviewed and not peer-reviewed [4]. These reports estimate the size of the epidemic to range from a few hundred thousand to a few hundred million, with the peak varying between April and July 2020 [5] [6]. Since the first reported case, India has implemented several non-pharmaceutical interventions to address the pandemic [7].

India follows a federal structure, where health is a state subject and the centre plays a supporting role in times of need. While the earlier models provide the bigger picture, planning the response needs short-term projections that keep a close eye on the upcoming wave of cases and can help the states in local decision making. The states in India differ in population size, density and direct connectivity to other parts of the world [8]. The burden of COVID-19, in terms of both cases and deaths, differs from state to state [1]. Also, the strength of the health system is not uniform across the country, so the local response is expected to differ [9].

Short-term forecasting has been found useful for similar infectious diseases earlier [10] [11] [12]. These models take reported cases as inputs and use methods such as a discrete-time stochastic model, a conditional intensity of accumulation of cases using a non-parametric probability method, or generation-dependent growth factors, respectively, to develop simple but robust models for forecasting disease. Short-term projections using growth models and a modified SIR (Susceptible-Infected-Recovered) model have also been used early in the epidemic [13] [14]. Keeping the above in view, this paper focuses on creating real-time, short-term projections for COVID-19 that can be helpful for the states in India and are applicable not only to the early but also to the later part of the epidemic.

2. Methodology

2.1. Data

The data required for the model were available from several public domains. However, due to reported discrepancies, the authors chose the sources that had the desired information and were reported by the government through daily bulletins [15]. We collected daily updated data on the number of confirmed cases from the respective state government daily bulletins and dashboards, which had been reporting daily since the first case was identified in each state. The data were available on the respective state government websites [16] [17] [18] [19]. The states included were Kerala, Tamil Nadu, Andhra Pradesh and Odisha. The period of data collection was from 30th January 2020 to 18th July 2020.

2.2. Model

We re-calibrated a semi-mechanistic discrete-time stochastic compartmental disease model; the details of the model can be found elsewhere [20]. The model consisted of two integer states or compartments of “Susceptible” - “Exposed” - “Infectious”, and thus can be considered mechanistic. The mean latent period was assumed to be 2.5 days. The duration of infectiousness was based on a literature search, which showed a pre-symptomatic period of 5 - 6 days, with 97% of infected persons showing signs of infection before 12 days [21], and that in India disease detection took around 4 days from the day of sample collection [22]. As soon as people were detected they were separated from the non-infected population through quarantine measures, which stopped them from further transmitting COVID-19. Thus, we assumed the infectious period to be 8 days, with a range of 4 to 12 days.

All new infections entered the “Exposed” compartment following a Poisson distribution whose mean is the product of the time-varying reproductive rate r(t) and the proportion infected at the time. The model used the available information on the latent and infectious periods of the disease to move people between compartments using independent geometric transitions with a first-order Euler method [23]. Random-walk methods have been used elsewhere for modelling the reproductive rate in outbreak situations [24]. The model used a multiplicative normal random walk with a log-linear drift to generate the r(t) parameter. Assuming uniform prior distributions for the parameters, the model was fit to the number of reported cases [10] [25]. The reported cases were adjusted at each state space using particle Markov Chain Monte Carlo simulation with Metropolis-Hastings updating [26].
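The transmission step described in this section can be illustrated with a short simulation. The Python sketch below is not the authors' R/C++ implementation; it is a minimal illustration under stated assumptions. The function name `simulate_outbreak`, the default values of `drift` and `sigma`, and the scaling of the Poisson mean (r(t) multiplied by the number infectious and divided by the infectious period) are assumptions made for illustration. Geometric waiting times are approximated by daily binomial draws with probability 1/(mean period), using the 2.5-day latent period and 8-day infectious period given in the text.

```python
import numpy as np


def simulate_outbreak(days, exposed0, infectious0, r0,
                      drift=0.0, sigma=0.05,
                      latent_days=2.5, infectious_days=8.0, seed=0):
    """Simulate one trajectory of a simple Exposed-Infectious model.

    Returns the path of r(t) and the daily number of people who become
    infectious. All defaults other than the two periods are hypothetical.
    """
    rng = np.random.default_rng(seed)
    p_leave_exposed = 1.0 / latent_days          # daily chance of E -> I
    p_leave_infectious = 1.0 / infectious_days   # daily chance of leaving I

    exposed, infectious = exposed0, infectious0
    log_r = np.log(r0)
    r_path, new_cases = [], []

    for _ in range(days):
        r_t = np.exp(log_r)
        # Assumption: expected new infections per day is r(t) * I / period.
        new_exposed = rng.poisson(r_t * infectious * p_leave_infectious)
        # Geometric transitions, drawn as one binomial per compartment.
        to_infectious = rng.binomial(exposed, p_leave_exposed)
        removed = rng.binomial(infectious, p_leave_infectious)

        exposed += new_exposed - to_infectious
        infectious += to_infectious - removed

        r_path.append(r_t)
        new_cases.append(to_infectious)
        # Multiplicative normal random walk with log-linear drift:
        # log r(t+1) = log r(t) + drift + sigma * N(0, 1).
        log_r += drift + sigma * rng.standard_normal()

    return np.array(r_path), np.array(new_cases)
```

Calling `simulate_outbreak(days=14, exposed0=50, infectious0=50, r0=1.2)` gives one possible two-week trajectory; repeating the call with different seeds gives the spread from which percentile bands could be drawn.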

The interquartile (25th-75th percentile) and 5-95th percentile ranges were calculated. The model used a minimum of 400 particles (400 - 800) and 30,000 iterations until the overall acceptance value was within the acceptable range of 20 - 30 per cent (Table 1) [27] [28]. The starting timeline and the initial cases varied between states, as the first cases were detected at different times in different states. Credible intervals for the reported cases, forecasted cases and time-varying reproduction number were generated from the posterior distribution samples. Two-week forecasts starting from 5th July 2020 were generated and validated against the actual reported data for the forecast period. R version 4.0 was used for the analysis [29]. C++ was used through the Rcpp package for computational efficiency, and the ggplot2 package was used to produce charts [30].
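The role of the acceptance rate and of the percentile summaries can be shown on a toy problem. The Python sketch below runs a plain random-walk Metropolis-Hastings sampler for a single Poisson rate under a uniform prior; it is not the particle MCMC used in the paper. The function names, the `toy_counts` values (made up for illustration only), the proposal step and the burn-in length are all assumptions.

```python
import math
import numpy as np


def make_log_posterior(counts, upper=1000.0):
    """Log posterior of a Poisson rate under a uniform prior on (0, upper)."""
    total, n = float(np.sum(counts)), len(counts)

    def log_posterior(rate):
        if not 0.0 < rate < upper:
            return -math.inf  # outside the support of the uniform prior
        # Poisson log-likelihood, dropping terms that do not depend on rate.
        return total * math.log(rate) - n * rate

    return log_posterior


def random_walk_metropolis(log_posterior, start, step, n_iter=30000, seed=0):
    """Return the chain and the overall acceptance rate."""
    rng = np.random.default_rng(seed)
    current, current_lp = start, log_posterior(start)
    samples = np.empty(n_iter)
    accepted = 0
    for k in range(n_iter):
        proposal = current + step * rng.standard_normal()
        proposal_lp = log_posterior(proposal)
        # Metropolis-Hastings rule for a symmetric proposal.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples[k] = current
    return samples, accepted / n_iter


def summarise(samples, burn_in=1000):
    """Median, interquartile range and 5-95th percentile range."""
    q5, q25, q50, q75, q95 = np.percentile(samples[burn_in:], [5, 25, 50, 75, 95])
    return {"median": q50, "iqr": (q25, q75), "p5_p95": (q5, q95)}


# Made-up daily counts, for illustration only (not data from the paper).
toy_counts = np.array([12, 9, 11, 10, 8, 13, 10])
samples, acceptance = random_walk_metropolis(
    make_log_posterior(toy_counts), start=10.0, step=6.0)
summary = summarise(samples)
```

In practice the proposal `step` would be adjusted until `acceptance` falls in the desired band; a larger step lowers the acceptance rate and a smaller step raises it.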

3. Results

The results show an increasing number of cases, at varying levels, for all the states except Tamil Nadu, where the number is expected to remain stable [Figures 1-4]. For every figure, the input incidence data are denoted by black dots in the lower left. The time-varying reproductive number r(t) during the same period is denoted with black lines and shaded regions in the upper left. The forecasted results for r(t) and the number of cases are illustrated with blue lines and shaded regions. In all shaded regions, the central line indicates the median, the darker shaded region indicates the interquartile range and the lighter shaded region indicates the 5-95th percentile range.

The common result for all the states is a clearly time-varying r(t) that is greater than 1 and shows a stable trend over the forecast period. The number of cases shows an increasing trend in the remaining three states (Table 2). All the actual reported figures are within the projected 25th to 75th percentile (interquartile) range. However, the first week of projection matches the actual values more closely than the following week.
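The interquartile-range comparison can be checked directly from the weekly figures listed in the abstract. The Python sketch below uses only those published numbers; the function name `coverage_report` and the "relative width" measure (interval width divided by the actual count) are assumptions introduced for illustration and are not part of the original analysis.

```python
# (lower quartile, upper quartile, actual reported cases), from the abstract.
week1 = {
    "Kerala": (1064, 2532, 2234),
    "Tamil Nadu": (17503, 50125, 27214),
    "Andhra Pradesh": (5225, 11003, 9563),
    "Odisha": (2559, 4461, 3925),
}
week2 = {
    "Kerala": (1055, 7803, 4221),
    "Tamil Nadu": (18298, 73952, 31488),
    "Andhra Pradesh": (4705, 23224, 13357),
    "Odisha": (2701, 9037, 4175),
}


def coverage_report(week):
    """For each state, flag whether the actual count lies inside the
    projected interquartile range and how wide that range is relative
    to the actual count (an illustrative measure, not from the paper)."""
    report = {}
    for state, (lower, upper, actual) in week.items():
        report[state] = {
            "within_iqr": lower <= actual <= upper,
            "relative_width": (upper - lower) / actual,
        }
    return report


for label, week in (("week 1", week1), ("week 2", week2)):
    for state, row in coverage_report(week).items():
        print(label, state, row["within_iqr"], round(row["relative_width"], 2))
```

For every state the actual count falls inside the projected range in both weeks, and the relative width is larger in the second week than in the first, which is one way to read the looser agreement at the longer horizon.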

Table 1. Parameters and Priors.

Table 2. Comparison of Day-wise projections, upper and lower interquartile range with actual reported COVID-19 cases.

4. Discussion

The study covered a population at risk of nearly 200 million. Though all the states are in the coastal region of India, they differ in language, socioeconomic dynamics and health system capacity [9] [37]. Though the first case of COVID-19 was reported from Kerala in late January, most of the other states in the study reported their first cases in March. This analysis uses reported cases from the public health system and shows the heterogeneity of the epidemic through the time-varying effective reproductive rate and a near-future forecast of the COVID-19 burden, which closely matches the actual number of cases.

Figure 1. Two weeks forecast: Time varying reproduction number and new cases in Kerala.

Figure 2. Two weeks forecast: Time varying reproduction number and new cases in Tamil Nadu.

Figure 3. Two weeks forecast: Time varying reproduction number and new cases in Andhra Pradesh.

Figure 4. Two weeks forecast: Time varying reproduction number and new cases in Odisha.

4.1. Epidemic Model and Validity

The results from four different states show that the model is robust and can be deployed for other states too. The variations between the actual and forecasted figures need to be analysed with a view to finding reasonable explanations [38]. Most models are based on the assumption that the “given current situation remains the same for future”, which is difficult to achieve, particularly given changing testing policy and response strategy in the states [39] [40]. Changing case definitions, testing strategies and responses to the pandemic are known to have influenced our understanding of the trajectory of the epidemic, not only in India but around the world [41]. Social distancing and lockdown measures were shown to reduce the contagion, a reduction that was reported to reverse once reopening started [42] [43]. As this model demonstrates that near-future cases can be predicted closely, repeated application of the model at constant time intervals can provide vital information on the projected number of cases in the short run and can thus be used for deploying mitigation strategies. The criticism that outputs of mathematical models are not always useful may be considered in the right spirit, with a reasonable understanding of the models and of the need to think beyond one-time application of models [44] [45].

4.2. Effective Reproduction Number

The effective reproductive number indicates the risk of the epidemic at a given point in time. Though an r(t) lower than unity indicates that the epidemic is losing force, it must be read alongside the fact that most of the population has not been infected, which provides little protection, leaves herd immunity a long way off and thus sustains the potential risk of transmission. It is also important to mention that an r(t) lower than unity does not exclude the potential for localised outbreaks, which can influence the epidemic trajectory [33] [46]. However, the trend in r(t) does provide advance information on the highly unpredictable nature of this epidemic.

4.3. Need for Better Data

Any model output is only as good as its data and its assumptions. Details of the day of symptom onset, the day of sample collection, the day of diagnosis and the day of reporting were not available in the respective public domains of the states, and thus were not considered in the model. Availability of this information may improve model performance further. Also, as testing capacity was gradually increased, the delay between sample collection and testing declined [3]. Local data for each state, with a state-specific serial interval or generation time, could have improved this analysis. Other published studies from India have also relied on external information sources [7] [47]. It is expected that those replicating the study can focus on further improving the model by addressing these gaps. The states’ responses were mixed, with some states opting for institutional quarantine only and others for both home and institutional quarantine. An additional limitation is the model’s reliance on published case reports, which makes it sensitive to local variation in testing strategy.

4.4. Consensus on Models

Many types of forecasting and predictive models have been published on the COVID-19 epidemic in India [48] [49]. All these models are relevant to the scientific quest and add to the knowledge base on this novel virus. However, different models provide different results, creating confusion and making it difficult for policymakers to take decisions. In contrast, other epidemic response initiatives in India focus on a single tool for epidemic projection, leading to harmony in the response [50]. Another way of managing a large number of real-time scientific publications is to aggregate them all and add mutual accountability [51]. Most importantly, real-time forecasting can happen only if it is repeated at regular intervals [14].

5. Conclusion

The real-time short-term forecasting used in the four states provides a good approximation of the near-future epidemic trajectory. The tool is available in the public domain and needs to be used repeatedly at regular intervals to track the epidemic at the local level. This can be used for understanding the future burden.


Acknowledgements

The constant guidance and sharing of original codes by Jason Asher are highly appreciated. The paper also acknowledges the critical inputs of Dr. Shailaja Tetali and Dr. Jammy Rajesh.

Ethics Statement

The analysis was done using publicly available secondary data and no patient information was used; thus, ethical clearance was not required.

Cite this paper: Choudhury, L. and Kumar, B. (2020) Real-Time COVID-19 Forecasting for Four States of India Using a Regression Transmission Model. Open Journal of Epidemiology, 10, 335-345. doi: 10.4236/ojepi.2020.104027.

[1]   Ministry of Health and Family Welfare. Government of India (2020) Update COVID-19 India, as on 4th June.

[2]   Press Information Bureau. Government of India (2020) Update on Novel Coronavirus: One Positive Case Reported in Kerala.

[3]   Abraham, P., et al. (2020) Laboratory Surveillance for SARS-CoV-2 in India: Performance of Testing & Descriptive Epidemiology of Detected COVID-19, January 22-April 30, 2020. Indian Journal of Medical Research, 151, 424-437.

[4]   Mandal, S., et al. (2020) Prudent Public Health Intervention Strategies to Control the Coronavirus Disease 2019 Transmission in India: A Mathematical Model-Based Approach. Indian Journal of Medical Research, 151, 190-199.

[5]   Ray, D., et al. (2020) Predictions and Role of Interventions for COVID-19 Outbreak in India: Crisis of Virus in India (COVIND).

[6]   Chatterjee, K., Chatterjee, K., Kumar, A. and Shankar, S. (2020) Healthcare Impact of COVID-19 Epidemic in India: A Stochastic Mathematical Model. Medical Journal Armed Forces India, 76, 147-155.

[7]   Patel, P., Athotra, A., Vaisakh, T., Dikid, T., Jain, S. and NCDC COVID Incident Management Team (2020) Impact of Nonpharmacological Interventions on COVID-19 Transmission Dynamics in India. Indian Journal of Public Health, 64, 142.

[8]   Kucharski, A.J., et al. (2020) Early Dynamics of Transmission and Control of COVID-19: A Mathematical Modelling Study. The Lancet Infectious Diseases, 20, 553-558.

[9]   Aayog, N. (2019) Healthy States Progressive India: Report on the Ranks of States and Union Territories. Health Index.

[10]   Nishiura, H. (2011) Real-Time Forecasting of an Epidemic Using a Discrete Time Stochastic Model: A Case Study of Pandemic Influenza (H1N1-2009). BioMedical Engineering OnLine, 10, 15.

[11]   Kelly, J.D., et al. (2019) Real-Time Predictions of the 2018–2019 Ebola Virus Disease Outbreak in the Democratic Republic of the Congo Using Hawkes Point Process Models. Epidemics, 28, Article ID: 100354.

[12]   Akhmetzhanov, A.R., et al. (2018) Real Time Forecasting of Measles Using Generation-Dependent Mathematical Model in Japan, 2018. PLOS Currents.

[13]   Roosa, K., et al. (2020) Real-Time Forecasts of the COVID-19 Epidemic in China from February 5th to February 24th, 2020. Infectious Disease Modelling, 5, 256-263.

[14]   Althaus, C.L. (2020) Real-Time Modeling and Projections of the COVID-19 Epidemic in Switzerland.

[15]   Hindustan Times (2020) Discrepancies in Covid Data Leave Haryana Health Authorities Baffled—Chandigarh.

[16]   Government of Karnataka (2020) Home—COVID-19 Information Portal.

[17]   Health and Family Welfare Department (2020) Daily Bulletin—StopCoronaTN. Government of Tamil Nadu.

[18]   Department of Health and Family Welfare (2020) COVID Dashboard. Government of Odisha.

[19]   Government of Andhra Pradesh (2020) COVID-19: Andhra Pradesh Department of Health, Medical, & Family Welfare.

[20]   Asher, J. (2018) Forecasting Ebola with a Regression Transmission Model. Epidemics, 22, 50-55.

[21]   Stephen, M. and Baum, G. (2020) COVID-19 Incubation Period: An Update. NEJM Journal Watch, 2020.

[22]   Hindustan Times (2020) City to Breach 3k Covid-19 Cases Today as Health Department Clears Backlog of Samples—Gurugram—Hindustan Times.

[23]   Aderhold, A., Husmeier, D. and Grzegorczyk, M. (2017) Approximate Bayesian Inference in Semi-Mechanistic Models. Statistics and Computing, 27, 1003-1040.

[24]   Funk, S., Camacho, A., Kucharski, A.J., Eggo, R.M. and Edmunds, W.J. (2018) Real-Time Forecasting of Infectious Disease Dynamics with a Stochastic Semi Mechanistic Model. Epidemics, 22, 56-61.

[25]   Endo, A., van Leeuwen, E. and Baguelin, M. (2019) Introduction to Particle Markov-Chain Monte Carlo for Disease Dynamics Modellers. Epidemics, 29, Article ID: 100363.

[26]   Liu, F., Li, X. and Zhu, G. (2020) Using the Contact Network Model and Metropolis-Hastings Sampling to Reconstruct the COVID-19 Spread on the “Diamond Princess”. Science Bulletin, 65, 1297-1305.

[27]   Roberts, G.O., Gelman, A. and Gilks, W.R. (1997) Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms. Annals of Applied Probability, 7, 110-120.

[28]   Bédard, M. (2008) Optimal Acceptance Rates for Metropolis Algorithms: Moving beyond 0.234. Stochastic Processes and Their Applications, 118, 2198-2222.

[29]   R Core Team (2020) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.

[30]   Wickham, H. (2009) ggplot2: Elegant Graphics for Data Analysis. In: ggplot2, Springer, New York, 1-7.

[31]   Liu, Y., Gayle, A.A., Wilder-Smith, A. and Rocklöv, J. (2020) The Reproductive Number of COVID-19 Is Higher Compared to SARS Coronavirus. Journal of Travel Medicine, 27, taaa021.

[32]   Viceconte, G. and Petrosillo, N. (2020) Covid-19 R0: Magic Number or Conundrum? Infectious Disease Reports, 12, 8516.

[33]   Peirlinck, M., Linka, K., Sahli Costabal, F. and Kuhl, E. (2020) Outbreak Dynamics of COVID-19 in China and the United States. Biomechanics and Modeling in Mechanobiology, 1-15.

[34]   Song, R., et al. (2020) Clinical and Epidemiological Features of COVID-19 Family Clusters in Beijing, China. Journal of Infection, 81, E26-E30.

[35]   Ganyani, T., et al. (2020) Estimating the Generation Interval for Coronavirus Disease (COVID-19) Based on Symptom Onset Data, March 2020. Eurosurveillance, 25, Article ID: 2000257.

[36]   Boldog, P., Tekeli, T., Vizi, Z., Dénes, A., Bartha, F.A. and Röst, G. (2020) Risk Assessment of Novel Coronavirus COVID-19 Outbreaks Outside China. Journal of Clinical Medicine, 9, 571.

[37]   Indian Institute for Population Sciences (IIPS) and ICF (2017) National Family Health Survey (NFHS-4), 2015-16: India. Int. Inst. Popul. Sci. ICF, 1-192.

[38]   Roda, W.C., Varughese, M.B., Han, D. and Li, M.Y. (2020) Why Is It Difficult to Accurately Predict the COVID-19 Epidemic? Infectious Disease Modelling, 5, 271-281.

[39]   Telangana Coronavirus News: Highest Positivity, Slow Recoveries, Low Testing: Telangana’s Worsening Covid-19 Stats. The Economic Times.

[40]   Covid Testing Numbers in Karnataka Down Sharply and So Are Fresh Cases. The Economic Times.

[41]   Tsang, T.K., et al. (2020) Effect of Changing Case Definitions for COVID-19 on the Epidemic Curve and Transmission Parameters in Mainland China: A Modelling Study. The Lancet Public Health, 5, e289-e296.

[42]   Anderson, R.M., Hollingsworth, T.D., Baggaley, R.F., Maddren, R. and Vegvari, C. (2020) COVID-19 Spread in the UK: The End of the Beginning? The Lancet.

[43]   ECDC (2020) Rapid Risk Assessment: Resurgence of Reported Cases of COVID 19 in the EU/EEA, the UK and EU Candidate and Potential Candidate Countries. Risk Assessment.

[44]   Shah, K., Awasthi, A., Modi, B., Kundarpur, R. and Saxena, D. (2020) Unfolding Trends of COVID-19 Transmission in India: Critical Review of Available Mathematical Models. Indian Journal of Community Health, 32, 206-214.

[45]   Holmdahl, I. and Buckee, C. (2020) Wrong But Useful—What Covid-19 Epidemiologic Models Can and Cannot Tell Us. The New England Journal of Medicine, 383, 303-305.

[46]   Delamater, P.L., Street, E.J., Leslie, T.F., Yang, Y.T. and Jacobsen, K.H. (2019) Complexity of the Basic Reproduction Number (R0). Emerging Infectious Diseases, 25, 1-4.

[47]   Bhaskar, A., Ponnuraja, C., Srinivasan, R. and Padmanaban, S. (2020) Distribution and Growth Rate of COVID-19 Outbreak in Tamil Nadu: A Log-Linear Regression Approach. Indian Journal of Public Health, 64, 188.

[48]   Tiwari, S., Kumar, S. and Guleria, K. (2020) Outbreak Trends of Coronavirus (COVID-19) in India: A Prediction. Disaster Medicine and Public Health Preparedness.

[49]   Tomar, A. and Gupta, N. (2020) Prediction for the Spread of COVID-19 in India and Effectiveness of Preventive Measures. Science of the Total Environment, 728, Article ID: 138762.

[50]   National Institute of Medical Statistics and National AIDS Control Organisation (2012) Technical Report India: HIV Estimates-2012. New Delhi.

[51]   COVID-19 Forecasts: Cumulative Deaths CDC.