The State of São Paulo, in the Southeast of Brazil, has a history of frequent storms accompanied by lightning, which cause several impacts on society. These storms are associated with the climatic characteristics of the region, which shows large space-time variation in lightning incidence, as well as with a continuous process of urbanization, which intensifies the development of these storms.
Over the years, several studies using different methodologies have shown that the Southeast Region of Brazil is among the regions of the world with the highest incidence of this phenomenon. In the State of São Paulo alone, there are around 700,000 lightning discharges per year.
Consequently, there is currently great concern about the increase in lightning incidence, mainly because of the destructive power of this phenomenon. Although much of it occurs inside the cloud, that is, without contact with the surface of the Earth, the portion that reaches the ground is large enough to cause considerable damage to man-made structures, particularly in large cities. This damage includes failures of electric systems, breakdowns in telecommunications towers and buildings, and burning of electronic equipment, among others, with losses to society estimated at 500 million dollars a year in Brazil alone.
In addition, lightning can cause fatalities; according to world statistics, it is the second leading cause of death by meteorological phenomena on the planet. In Brazil alone, there are around 130 deaths per year, according to a survey of lightning deaths between 2000 and 2009. In that decade, 1321 people died after being struck by lightning, and the Southeast Region had the highest number of fatalities, with 29% of the total. More recent statistics show that, between 2000 and 2014, there were 263 fatalities in the State of São Paulo.
These data reveal the importance of understanding the behavior of this phenomenon in the future climate. At the short-term forecast scale, studies have been developed based on meteorological parameters and/or cloud microphysics. However, the long-term projection of this phenomenon remains a complex obstacle that still requires further studies and methodological techniques, since lightning is not an output variable of numerical forecasting models and the climatic parameters that modulate its occurrence still need to be studied.
In view of this, the present study aims to contribute to advancing knowledge of cloud-to-ground (CG) lightning incidence in the State of São Paulo by means of future climate projections of the occurrence of this phenomenon.
The results obtained will serve as a basis for the construction and improvement of short- and long-term alert systems for the State of São Paulo, thus allowing preventive measures to be taken to minimize the impacts caused by this phenomenon.
In addition, the warning about the increasing frequency of extreme climatic events caused by the intensification of global warming, issued by the Intergovernmental Panel on Climate Change (IPCC) in its latest report, AR5, reinforces the need for research on predictability, which may point to periods of higher lightning incidence.
Finally, one of the main justifications for this kind of evaluation is that studies of this nature for this phenomenon in this region are still very incipient. Nevertheless, such an evaluation is of great relevance to several sectors and can support environmental interventions that minimize the impacts caused by lightning incidence.
2. Data and Methodology
Because lightning is not an output variable of climate models, obtaining projections of this phenomenon required tests to evaluate the relationship between lightning and the ocean-atmosphere variables that are outputs of the models. These tests used observed data (reanalysis from the National Centers for Environmental Prediction/National Center for Atmospheric Research, NCEP/NCAR) for summer, the period of greatest occurrence of the phenomenon. Once the mathematical function that describes the behavior of a dependent variable (explained or predicted) as a function of other independent variables (explanatory or predictive) is known, it is possible to make future projections using model data.
2.1.1. Observational Data
The CG lightning data used in this work for the State of São Paulo, in the Southeast of Brazil (Figure 1) come from the Integrated Network for the Detection of Atmospheric Discharges (RINDAT) and the Brazilian Network for the Detection of Atmospheric Discharges (BrasilDAT).
Sixteen years of data were considered, corresponding to the austral summer period from 1999 to 2014, of which the 1999-2010 data are from the RINDAT network and the 2011-2014 data are from the BrasilDAT network. For the studied period, RINDAT showed detection efficiency above 80% and BrasilDAT above 90%. These values indicate that the data from both networks are suitable for use. The networks detect the electromagnetic pulse from a lightning strike and calculate the latitude and longitude of the point of incidence and the time of occurrence in UTC, among other characteristics.
Several tests were performed using oceanic-atmospheric parameters such as sea surface temperature (SST), precipitation, air temperature, outgoing longwave radiation (OLR) and the omega difference between the tropospheric levels of 850 and 500 hPa, to verify which of these variables presented the best relation with the lightning. These tests were done for both simultaneous and lagged correlations.
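The simultaneous and lagged correlation tests mentioned above can be illustrated with a minimal Python sketch. It assumes one value per summer season for each series; the function names and the sample values are hypothetical and are not the data or code used in the study.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(predictor, lightning, lag):
    """Correlate the predictor at step t - lag with lightning at step t (lag >= 0)."""
    if lag == 0:
        return pearson(predictor, lightning)
    return pearson(predictor[:-lag], lightning[lag:])


# Hypothetical values for illustration only (not the study's data).
# The lightning series is built to follow the SST series with a one-step delay.
sst = [1.0, 2.0, 0.5, 3.0, 2.5, 1.5, 4.0, 3.5]
lightning = [0.0, 1.1, 2.1, 0.4, 3.2, 2.4, 1.6, 4.1]

for lag in range(3):
    print(lag, round(lagged_correlation(sst, lightning, lag), 2))
```

With a short record such as the 16 summers used here, each additional lag removes one pair from the sample, so only small lags are practical.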
The atmospheric variables were selected over the area of the State of São Paulo, and the SST was selected over the ocean. The selected SST areas correspond to the oceanic regions with the highest correlation values, each comprising an area of 5˚ × 5˚: Pacific Ocean (Lat: 46˚S to 50˚S and Lon: 111˚W to 107˚W), South Atlantic (Lat: 57˚S to 61˚S and Lon: 50˚W to 46˚W), and Tropical Atlantic (Lat: 24˚S to 28˚S and Lon: 40˚W to 36˚W).
Figure 1. Location of the State of São Paulo, in the Southeast of Brazil. Elevation data source: National Institute for Space Research (INPE), made available by the Environmental Planning Coordination of the Environment Secretariat of the State of São Paulo (CPLA/SMA).
Although all these parameters were used in the tests, we sought a regression model with a small number of independent variables, because the data sample is not very extensive. When a model is fitted to a small sample, each additional predictor brings the fit closer to perfect, which, counterintuitively, is undesirable: the model starts to fit noise in the sample rather than the underlying relationship. Using only one or two variables avoids relying on several sources of data and allows the relationship between the parameters to be properly determined. Thus, one must either reduce the number of independent variables or increase the sample size.
The results of these tests showed that the parameters most strongly related to lightning were the SST of the South Atlantic Ocean and omega. Therefore, these variables were used for the future projections. The methodological procedures used to obtain these projections are described in Subsection 2.2.
2.1.2. Climate Models
For the projection of future climate scenarios, we used data from two robust CMIP5 models: HadGEM2-ES and CSIRO-Mk3.6. The Hadley Centre Global Environmental Model version 2-Earth System (HadGEM2-ES), from the UK Met Office Hadley Centre, is an atmospheric general circulation model coupled to an ocean model. Its atmospheric component has horizontal resolution N96, that is, approximately 1.250˚ in latitude and 1.875˚ in longitude, with 38 vertical levels, whereas the oceanic component has a horizontal resolution of 1˚, increasing to 1/3˚ at the equator, and 40 vertical levels. The model has a time step of 30 minutes for the atmosphere and surface components and one hour for the oceanic component. HadGEM2-ES provides a good representation of atmospheric conditions over South America, especially in the December-January-February (DJF) summer quarter.
The CSIRO-Mk3.6 global climate model is a coupled ocean-atmosphere model of the Commonwealth Scientific and Industrial Research Organisation (CSIRO) of Australia, with sea ice dynamics and a soil-canopy scheme with prescribed vegetation properties. The atmospheric component of CSIRO-Mk3.6 has a horizontal resolution (spectral T63) of approximately 1.875˚ in latitude and 1.875˚ in longitude, with 18 vertical levels.
The oceanic component is based on version 2.2 of the Modular Ocean Model (MOM2.2), with a horizontal resolution of approximately 0.9375˚ in latitude and 1.875˚ in longitude and 30 vertical levels. CSIRO-Mk3.6 simulations were used in this work because, among other reasons, the model represents SST variability closer to the observed data and is therefore considered more reliable for climate projections.
2.2.1. Multiple Linear Regression
To carry out climate projections of lightning, the multiple linear regression technique was used. It evaluates the relationship between a single predicted variable and two or more predictor variables and allows projections to be made from that relationship. This technique is important because natural phenomena are generally multivariate and do not depend on a single factor.
Through this analysis, it is also possible to determine the individual weight of each variable in the set of relations, obtaining as a final result the combined contribution of all the parts involved and the degree of relationship between the variables under analysis.
Thus, in this work the dependent variable is the CG lightning, and the independent variables are the SST in the South Atlantic Ocean and the omega variable. The combination of independent variables used together to predict the dependent variable is also known as the regression equation or regression model. The equation used has the form:
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon \qquad (1) $$
The dependent variable is represented by $y$, and the independent variables by $x_1, x_2, \ldots, x_n$. The term $\beta_0$ is called the intercept or linear coefficient, and represents the value of the intersection of the regression line with the Y-axis. The terms $\beta_1, \beta_2, \ldots, \beta_n$ are the angular coefficients, and the term $\varepsilon$ represents the residue or regression error.
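As an illustration of a regression model with an intercept, angular coefficients and two predictors, the following Python sketch fits the coefficients by ordinary least squares through the normal equations. It is a dependency-free sketch under stated assumptions: the function names (`solve_linear_system`, `fit_mlr`, `predict`) and the sample values are hypothetical, and the study's own software and data are not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Singular systems are not handled in this sketch.
    """
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(predictors, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn.

    predictors: one row [x1, ..., xn] per sample. Returns [b0, b1, ..., bn].
    """
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, p):
    """Evaluate the fitted equation for one row of predictor values."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], p))


# Hypothetical (omega, SST) pairs and lightning values, for illustration only.
samples = [[-0.8, 0.3], [0.1, -0.5], [0.6, 1.2], [-0.2, 0.9], [1.1, -1.0], [0.4, 0.2]]
lightning = [0.9, -0.6, 0.7, 1.0, -1.8, -0.1]

coeffs = fit_mlr(samples, lightning)
print([round(c, 3) for c in coeffs])
```

In practice a numerical library would normally be used for the fit; the explicit normal equations are shown only to make each term of the regression equation visible.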
Based on the multiple correlation result, and in order to examine the results in a detailed and systematic way, validation was performed using the cross-validation method. In this work, the dataset was divided into 16 subsets, one for each year in the CG lightning time series; in each simulation, fifteen subsets were used for training and the remaining one was used for testing.
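The sixteen-fold scheme described above amounts to leaving one year out at a time. The sketch below shows one common way to organize such a loop in Python. It is generic in the model: `fit` and `predict` are passed in as functions, and a single-predictor line fit is used only to keep the example short. All names and values are hypothetical, and the way the study computed the multiple R of each simulation may differ from this sketch.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (single predictor)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def predict_line(model, x_value):
    intercept, slope = model
    return intercept + slope * x_value


def leave_one_out(x, y, fit, predict):
    """For each sample k, train on all other samples and predict sample k."""
    predictions = []
    for k in range(len(y)):
        x_train = x[:k] + x[k + 1:]
        y_train = y[:k] + y[k + 1:]
        model = fit(x_train, y_train)
        predictions.append(predict(model, x[k]))
    return predictions


# Hypothetical 16-value series (one value per year), for illustration only.
x = [float(i) for i in range(16)]
y = [2.0 * v + 1.0 + (0.5 if i % 2 else -0.5) for i, v in enumerate(x)]

held_out = leave_one_out(x, y, fit_line, predict_line)
mean_abs_error = mean(abs(p - o) for p, o in zip(held_out, y))
print(round(mean_abs_error, 3))
```

Because each prediction is made for a year that was excluded from the fit, the resulting errors indicate how well the relationship transfers to data it has not seen.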
2.2.2. Measures of Error and Correction of the Models
To evaluate the performance of the climate model simulations, the simulated data were compared directly with the observed data by means of the bias and the root mean square error (RMSE).
The bias (mean error, ME) is the most direct measure of the performance of a numerical model: it indicates whether the simulation underestimated or overestimated the actual values. A negative value means that the model tends to underestimate the observed data, and a positive value means that it tends to overestimate them. This measure of error can be obtained from Equation (2):
$$ \mathrm{bias} = \frac{1}{N} \sum_{i=1}^{N} \left( S_i - O_i \right) \qquad (2) $$
where $O_i$ is the observed value of the variable at the i-th instant of time, $S_i$ is the value of the same variable derived from the model at the same instant, and $N$ is the sample size. The result can be any real value and has the same unit as the variable under analysis. The closer the result is to zero, the better the performance of the model, that is, the smaller the deviation between simulated and observed data.
Another way to verify the efficiency of the models is the root mean square error (RMSE), given by the square root of the mean of the squared differences between the simulated and observed data, as presented in Equation (3):
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( S_i - O_i \right)^2 } \qquad (3) $$
The RMSE can assume any non-negative value and has the same unit of measure as the series under study. As with the bias, the closer its result is to zero, the greater the efficiency of the model in reproducing the actual data. In general, the RMSE is expressed as a percentage of the average of the observations (relative error). Thus, the RMSE (%) represents the ratio between the error value and the mean of the observations, multiplied by one hundred.
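The error measures described above can be written compactly in Python. This is a minimal sketch: the function names are hypothetical, the sign convention follows the text (simulated minus observed, so a negative bias means underestimation), and taking the absolute value of the observed mean in the relative RMSE is an assumption made here so that the percentage stays positive for variables with a negative mean.

```python
from math import sqrt


def bias(simulated, observed):
    """Mean error: average of (simulated - observed)."""
    return sum(s - o for s, o in zip(simulated, observed)) / len(observed)


def rmse(simulated, observed):
    """Root mean square error between simulated and observed values."""
    return sqrt(
        sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)
    )


def relative_rmse(simulated, observed):
    """RMSE as a percentage of the mean of the observations.

    Assumption: the absolute value of the mean is used as the reference.
    """
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(simulated, observed) / abs(mean_obs)


# Hypothetical values for illustration only.
observed = [1.0, 2.0, 3.0]
simulated = [2.0, 3.0, 4.0]
print(bias(simulated, observed), rmse(simulated, observed),
      relative_rmse(simulated, observed))
```

A constant offset, as in this example, gives a bias and an RMSE of the same magnitude; when the errors change sign from one instant to the next, the bias can be near zero while the RMSE remains large.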
To adjust the model data (removal of the systematic error of the simulations), a statistical method adapted from, and widely used in, previous studies was applied. The method is based on the mean and standard deviation of the observed and simulated data series, as given by Equation (4):
$$ X_i^{corr} = \left( X_i - \bar{X}_{sim} \right) \frac{\bar{\sigma}_{obs}}{\bar{\sigma}_{sim}} + \bar{X}_{obs} \qquad (4) $$
where $X_i$ represents a value of the simulation, $\bar{X}_{sim}$ the mean of the simulated values, $\bar{\sigma}_{obs}$ the mean of the standard deviations of the observed series, $\bar{\sigma}_{sim}$ the mean of the standard deviations of the simulated series, and $\bar{X}_{obs}$ the average of the observed data.
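A correction based on the means and standard deviations of the observed and simulated series can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's implementation: the function name `bias_correct` is hypothetical, and a single population standard deviation per series is used in place of the mean of standard deviations mentioned in the text.

```python
from statistics import mean, pstdev


def bias_correct(simulated, observed):
    """Shift and rescale the simulated series to the observed mean and spread.

    Assumption: one standard deviation per series (pstdev) is used.
    """
    mean_sim, mean_obs = mean(simulated), mean(observed)
    sd_sim, sd_obs = pstdev(simulated), pstdev(observed)
    return [(s - mean_sim) * (sd_obs / sd_sim) + mean_obs for s in simulated]


# Hypothetical values for illustration only: the simulation is too warm
# and too variable compared with the observations.
observed = [1.0, 1.5, 2.0, 2.5]
simulated = [4.0, 5.0, 6.0, 7.0]

corrected = bias_correct(simulated, observed)
print(corrected)
```

Each corrected value keeps its relative position within the simulated series, so the ordering and timing of highs and lows in the model output are preserved while the mean and spread are matched to the observations.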
3. Results and Discussions
This section presents the results obtained for the projections of CG lightning in the State of São Paulo. The equation obtained in the cross-validation process, which aims to evaluate the stability of the relationship found, expresses L(t), the variation of lightning over time, as a function of O (omega) and SA (the SST in the South Atlantic Ocean).
For this analysis, the values of the variables were normalized to a unit value in order to obtain the contribution of each term in the regression equation. Among the variables studied, the SST of the South Atlantic Ocean presented the greatest contribution. This probably occurs because SST is a basic parameter for climatic anomalies. The omega variable also presented a satisfactory relation with lightning, since it is associated with the convection/cloudiness observed over the study area. The residue or regression error obtained in this relation was 0.63, and the multiple correlation coefficient (R) was 0.84, equivalent to approximately 84%.
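One common way to put variables with different units on a comparable scale is to standardize each series to zero mean and unit standard deviation. The Python sketch below assumes that this is what the normalization to a unit value refers to; that reading, the function name `standardize` and the sample values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return the series as anomalies in units of its own standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical SST values in degrees Celsius, for illustration only.
sst = [1.2, 0.8, 1.9, 1.5, 0.6, 2.0]
sst_std = standardize(sst)
print([round(v, 2) for v in sst_std])
```

When all predictors are standardized in this way, the magnitudes of the regression coefficients can be compared directly, which is what allows the contribution of each variable to be ranked.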
Figure 2 presents the values of multiple R in the cross-validation simulations. In most simulations, the correlation coefficient was approximately 0.84. However, in some simulations the relationship between the study variables and lightning reached values of approximately 0.87 (87%), as in simulations 1, 2 and 10, equivalent to the years 1999, 2000 and 2008, respectively. This shows how representative the relationship between the variables under analysis and lightning is. The validation process is important because it shows whether the equation obtained can be applied to other data samples.
Given the above, it became feasible to analyze future lightning projections using model data. However, to properly analyze the future dynamics of lightning incidence, it is necessary to first examine the performance of these models in simulating the variables used. Therefore, the evaluation of the model simulations is presented first, with bias and RMSE quantified, followed by the future projections under the RCP scenarios.
Figure 2. Multiple R values of the cross-validation simulations (95% confidence level).
Table 1 presents the results of the error evaluation of the models. For the SST of the South Atlantic Ocean, HadGEM2-ES overestimated this parameter, with high bias and RMSE values (3.6˚C and 180.3%, respectively). CSIRO-Mk3.6 presented a good result, with an underestimate of only −0.2˚C. The RMSE of this model for this parameter was 34.2%.
Thus, for the South Atlantic SST, the CSIRO-Mk3.6 model performed more satisfactorily than HadGEM2-ES, since its simulated data were closer to the observed data. HadGEM2-ES tends to have higher SST in this region, which in a future climate could indicate an intensification of lightning incidence over the State of São Paulo, given the relation between the SST of this region and lightning.
As with SST, the omega variable was also better simulated by the CSIRO-Mk3.6 model, in both indices under analysis. The systematic error of HadGEM2-ES was −0.009 Pa∙s−1, whereas that of CSIRO-Mk3.6 was −0.006 Pa∙s−1. The RMSE of HadGEM2-ES was 45.5%, and that of CSIRO-Mk3.6 was 20.5%. These results suggest that, in the future climate, HadGEM2-ES will represent greater convection/cloudiness over the study area, which would also intensify lightning incidence over São Paulo.
The analysis of these indices shows that CSIRO-Mk3.6 outperformed HadGEM2-ES in the simulation of the omega variable, being closer to the reanalysis data.
Given the systematic errors of the models, it was essential to correct them before generating the future projections. Figure 3 presents the results obtained by applying the bias correction method, comparing the observed (reanalysis), simulated and corrected data of the parameters under study. The statistical correction method only removes the bias, without changing the trend of the model time series. This is most clearly seen in the SST of the South Atlantic Ocean simulated by the HadGEM2-ES model (Figure 3(a)), which presented the highest error values, as previously mentioned. In this case, the model (red line) was able to represent the tendency of the observed data (black line) but presented a high bias. After the bias correction method was applied (blue line), the bias was strongly reduced.
Figure 3. Bias correction of the HadGEM2-ES (a, c) and CSIRO-Mk3.6 (b, d) models for the SST (˚C) of the South Atlantic Ocean (Lat.: 57˚S to 61˚S and Lon.: 50˚W to 46˚W) and omega (Pa∙s−1).
Table 1. Error measures of the simulations of the HadGEM2-ES and CSIRO-Mk3.6 models for the South Atlantic Ocean SST (Lat.: 57˚S to 61˚S and Lon.: 50˚W to 46˚W) and omega (Pa∙s−1) fields. The units of the error measures are: bias in ˚C; and RMSE in percentage.
For the omega variable, although the differences were small, CSIRO-Mk3.6 represented the observed data (Figure 3(d)) more effectively than HadGEM2-ES (Figure 3(c)), so the bias correction method performed better for CSIRO-Mk3.6 than for HadGEM2-ES. A similar result, also using the proposed bias correction method, was obtained by Lima, et al. (2017), in a study that evaluated the solar irradiance estimated by the BRAMS model for Northern Brazil.
In general, the HadGEM2-ES and CSIRO-Mk3.6 models tend to overestimate SST in the South Atlantic Ocean and to underestimate omega, while the applied statistical post-processing technique brings these simulated values closer to the observed data. Thus, the bias correction technique applied to the simulated time series was effective, since it did not change the profile of the model data and only reduced their bias.
Figure 4 presents the future projections of lightning incidence for the State of São Paulo, obtained from the ensemble of the aforementioned climate models. Figure 4(a) compares the entire series of lightning anomalies, covering both the observed and the simulated periods, and shows a change in pattern: in the first decade of the series, between 1999 and 2009, the deviations of lightning incidence were around −0.5, whereas from 2010 onward most of the deviations are positive, with values around 1. Another important aspect of the time series of lightning incidence over the State (still in Figure 4(a)) is that a cycle between maxima and minima seems to occur, from the observed data to the end of the future projections. However, statistically significant results would require longer data series, both observed and projected.
Figure 4. Climatic projections of atmospheric discharges for the State of São Paulo, considering: (a) Anomaly; (b) Frequency (in percentage) of the events above and below the average.
To quantify these deviations, Figure 4(b) shows their percentages, computed as simple proportions (rule of three). During the 1999-2014 series, most of the events were negative anomalies, with a value of 61.1%, while positive deviations accounted for 38.9%, mainly concentrated between 2010 and 2014. For the first period of the future projections, between 2017 and 2032, the intermediate-low emissions scenario (RCP4.5) shows 100% of the deviations above the average. The high emissions scenario (RCP8.5) also shows a higher percentage of above-average events, with a value of 81.25%, while below-average events account for 18.75%.
In the last analyzed period, covering the years 2033 to 2048, the RCP4.5 scenario shows mostly above-average events, with 93.7% above and 6.3% below the mean. Similarly to the RCP4.5 scenario for this period, RCP8.5 shows a greater percentage of deviations above the mean, with a value of 93.3%, while negative deviations account for around 6.7%. These results reveal that, in general, a percentage increase is expected in the occurrence of above-average lightning events in the future climate in both emission scenarios.
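The percentages of above- and below-average events are simple proportions of the number of years in each window. A minimal Python sketch is given below; the function name and the sample anomalies are hypothetical, and counting a zero anomaly as not above average is an assumption of this sketch.

```python
def anomaly_frequencies(anomalies):
    """Return (percent above average, percent not above average).

    Assumption: an anomaly of exactly zero is counted in the second group.
    """
    n = len(anomalies)
    above = sum(1 for a in anomalies if a > 0)
    return 100.0 * above / n, 100.0 * (n - above) / n


# Hypothetical 16-year window with 13 positive and 3 negative anomalies.
window = [0.8, 1.1, -0.2, 0.5, 0.9, 1.3, 0.4, -0.6,
          0.7, 1.0, 0.2, 0.6, -0.1, 1.2, 0.3, 0.9]
print(anomaly_frequencies(window))
```

For a 16-year window, each year represents 6.25% of the events, so 13 above-average years would correspond to 81.25% and 15 to 93.75%.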
These results suggest that lightning is more susceptible to extreme weather events. Similar results, although for other meteorological variables, have been reported previously. According to that work, because these extreme events are associated with natural climate variability, there is evidence that they may become even more intense and frequent in a warmer future climate, as indicated in the fifth IPCC report (AR5), in which an increase in global temperature of the order of 0.9˚C since 1850 was detected. In this report, the IPCC attributed global warming to human activities. This demonstrates the importance of the climatic predictability of lightning for preventing the impacts caused by this phenomenon.
4. Conclusions
The present work studied the future projections of CG lightning for the State of São Paulo, based on the multiple regression technique and using the global climate models HadGEM2-ES and CSIRO-Mk3.6. The data were normalized to a unit value, taking the mean as zero, in order to measure the individual weight of each variable in the regression analysis and the anomalies of the incidence of the phenomenon in the future projections.
Because of systematic errors in the models, bias correction was applied to the model data before use, in order to obtain more satisfactory results. The regression technique yielded a multiple R value of 0.84, revealing the strong relationship between the variables under study and lightning in the State of São Paulo, and thus enabling the climatic projections of lightning incidence.
In the future projections, the comparison between the deviations of the present and future climates showed that, in the current climate, most of the events occurred below the average, that is, they presented negative anomalies of lightning incidence. In the future climate, however, the projections indicate above-average anomalies in most events, both in the low emission scenario and in the high emission scenario.
Acknowledgements
The authors thank the Graduate Program in Earth System Science at the National Institute for Space Research and acknowledge the financial support from the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), process 2013/09557-8.