Climate is one of the main elements of the natural environment. Temperature has a direct impact on atmospheric stability, evaporation, precipitation, and many other conditions of life. Temperature change affects living conditions, agriculture, industry, tourism and engineering design. Both the average and the variation components of atmospheric temperature are needed for planning and forecasting in order to prevent hazards to life as well as financial losses.
Long-term trend studies of the maximum, minimum and mean annual air temperature, e.g., in the northwest Himalayan region during the twentieth century, show increasing trends both in the mean and in the diurnal range of temperature. In the last century, the daily maximum temperatures increased more than the low temperatures decreased, resulting in a rise of about 1.6˚C in the mean temperature. An extrapolation for the next century would give an increase of about 3˚C to 5˚C in the global mean annual temperature. A more exact forward prediction approach may give a different result. It is difficult, however, to analyze slowly changing trends in the outside temperature, as they are buried within large-amplitude, harmonic, cyclic variations as well as stochastic and/or chaotic changes.
The observed outside air temperature, $T(t,\tau)$, includes two seasonal components, represented by $t$ (the day of the year) and $\tau$ (the hour of day $t$). When tabulated, $T(t,\tau)$ is a matrix with as many rows as the total number of days and 24 columns, one for each hour of the day. The outside air temperature has a regular, seasonal variation in the daily average temperature every year (more exactly, every four years, the true time period) and an hourly variation in the hourly mean temperature during any given day. These two components are regular and periodic, caused by the Earth's movement in the solar system. Other regular changes may be superimposed on those caused by the known movement of the Earth, such as changes in the heat balance of the globe caused by industrialization.
The goal of the paper is to separate the observed outside air temperature into a variation caused by the Earth's movement plus any other quasi-stationary thermodynamic effects, and a random variation caused by stochastic and/or chaotic, local environmental changes. The hourly temperature variations must first be separated from the seasonal ones. To describe the seasonal temperature variation, the daily average temperature, $\bar{T}(t)$, must be defined. It may be defined as the integral mean value:

$$\bar{T}(t) = \frac{1}{24}\int_{0}^{24} T(t,\tau)\,d\tau \qquad (1)$$
In (1), $t$ denotes days and $\tau$ is the hour of the day, ranging from 0 to 24 h. If hourly average temperatures are available, (1) gives the accurate value of the daily average temperature, which reflects the thermodynamic energy of the air. If the daily mean temperature is obtained from a weather service, the data may be pre-calculated from the Standard Model, which takes the average of the daily maximum and minimum air temperatures, $\bar{T}(t) \approx [T_{\max}(t) + T_{\min}(t)]/2$. The Standard Model, reviewed by Bilbao et al., may not be accurate if the temperature change is not symmetrical between day and night. Nevertheless, the convenience of using the Standard Model instead of evaluating (1) from the hourly temperature data for every day may outweigh the concerns about the accuracy of the daily mean temperature.
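To make the difference between the integral mean of (1) and the Standard Model concrete, the sketch below approximates the integral with the trapezoidal rule over 25 hourly readings (both midnights included, an assumption about how hourly data are sampled) and compares it with the max/min average. The two temperature profiles are invented for illustration only.

```python
import numpy as np

def daily_mean_integral(hourly):
    """Trapezoidal approximation of Eq. (1): (1/24) * integral of T over 0..24 h.

    `hourly` holds 25 readings at tau = 0, 1, ..., 24 h (both midnights).
    """
    hourly = np.asarray(hourly, dtype=float)
    # Each 1-hour segment contributes the mean of its two end readings.
    return np.sum((hourly[1:] + hourly[:-1]) / 2.0) / 24.0

def standard_model(hourly):
    """Standard Model estimate: mean of the daily maximum and minimum."""
    hourly = np.asarray(hourly, dtype=float)
    return (hourly.max() + hourly.min()) / 2.0

tau = np.arange(25.0)

# Symmetric (sinusoidal) day: both estimates give the same daily mean.
symmetric_day = 10.0 + 5.0 * np.sin(2.0 * np.pi * tau / 24.0)

# Asymmetric day: flat 10 degrees with a 4-hour warm spell at 20 degrees.
asymmetric_day = np.full(25, 10.0)
asymmetric_day[12:16] = 20.0
```

For the symmetric day both methods return 10.0, while for the asymmetric day the integral gives 280/24 ≈ 11.67 and the Standard Model gives 15.0, which is the kind of bias the text warns about when day and night are not symmetrical.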
Two components of the air temperature are distinguished in the present paper for modeling the variation of the daily mean air temperature, $\bar{T}(t)$, with time. The first component, $T_R(t)$, describes a regular trend, expressed by functions of time and constants that do not change over the period for which the model is defined. The regular trend is assumed to be stationary over a long period of time and characteristic of a given physical location, governed by deterministic causes such as the Earth's movement in the solar system. The second component, $T_S(t)$, is a random, stochastic variation around the regular trend. It is caused by stochastic and/or chaotic processes in the atmosphere and is defined as the difference between the observed outside air temperature and the temperature from the stationary trend model, $T_S(t) = \bar{T}(t) - T_R(t)$. The daily mean outside temperature on any given day is therefore the sum of the regular trend component and the stochastic variation:

$$\bar{T}(t) = T_R(t) + T_S(t) \qquad (2)$$
Note that the stochastic component is assumed to be stationary and independent of the seasonal variation, a simplification for the model formulation. However, the stochastic temperature variation may be more disturbed in some parts of the year than in others, which leaves room for improving this assumption, a task left for the interested reader.
The analytic function for $T_R(t)$ must be the best fit to the measured outside temperature data for a given location. The concept of Fourier series approximation is employed to construct a model for the mainly periodic ambient air temperature. Joseph Fourier, a French mathematician and physicist, introduced in 1807 the approximation of any function over a finite interval of length $P$ by an infinite sum of sine and cosine functions, $f(t) = \alpha_0 + \sum_{k=1}^{\infty}[\alpha_k\cos(\omega_k t) + \beta_k\sin(\omega_k t)]$, such that $\omega_k = 2\pi k/P$, where $\alpha_k$ and $\beta_k$ are unknown constants. For brevity, the equivalent amplitude-phase formulation $f(t) = \alpha_0 + \sum_{k=1}^{\infty} A_k\sin(\omega_k t + b_k)$ may be used. Instead of an infinite series, a finite sum, $f(t) \approx \alpha_0 + \sum_{k=1}^{N} A_k\sin(\omega_k t + b_k)$, is used to approximate a mainly periodic function, eliminating the terms multiplied by the insignificant amplitudes $A_k$, $k > N$.
Applying the concept to $T_R(t)$ gives a mean temperature, $T_m$, and a harmonic variation component, $\sum_{k=1}^{N} A_k\sin(\omega_k t + b_k)$, where $A_k$ is the amplitude of the $k$-th harmonic component of $T_R(t)$. In some models, the amplitude may be a linear function of time, $A_k(t) = A_{k,0} + A_{k,1}t$.
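The equivalence between the sine-cosine form and the amplitude-phase form follows from $A\sin(\omega t + b) = A\cos b\,\sin\omega t + A\sin b\,\cos\omega t$. A minimal sketch of the conversion, with an illustrative yearly frequency of $2\pi/365.25$ rad/day (an assumption chosen for the example):

```python
import numpy as np

def to_amplitude_phase(s, q):
    """Rewrite s*sin(w t) + q*cos(w t) as A*sin(w t + b).

    Since A*sin(w t + b) = A*cos(b)*sin(w t) + A*sin(b)*cos(w t),
    s = A*cos(b) and q = A*sin(b), hence A = hypot(s, q), b = atan2(q, s).
    """
    return np.hypot(s, q), np.arctan2(q, s)

w = 2.0 * np.pi / 365.25          # one-year angular frequency (rad/day)
t = np.arange(0.0, 1461.0)        # four years of daily time steps
A, b = to_amplitude_phase(3.0, 4.0)
lhs = 3.0 * np.sin(w * t) + 4.0 * np.cos(w * t)
rhs = A * np.sin(w * t + b)       # identical curve, amplitude 5
```

The amplitude-phase form keeps one amplitude and one phase per frequency, which is the parameter layout ($A_k$, $b_k$) used in the fitting functions of Section 3.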
There are various choices for modeling the $T_R(t)$ component, listed as M1 through M5. The M1-type model is a linear function. It has the fewest coefficients and can be used to describe the change of the yearly mean temperature. However, it cannot reflect any periodic temperature variation. The M2-type model is the general Fourier series function. It assumes that the yearly mean temperature and the amplitudes for the pre-selected, finite number of frequencies are constant. It might be accurate for a short period of time, such as one year. The problem with the M2-type model is that it cannot reflect the long-term changes with time of the average, maximum and minimum temperatures found in Bhutiyani's study. The M3-type model extends the M2-type model by letting the yearly mean temperature vary with time. However, it still cannot reflect the changes of the maximum and minimum temperatures with time. The M4-type model improves on the M2-type model by allowing the maximum and minimum temperatures to vary, but it still uses a constant yearly mean temperature. The M5-type model combines the advantages of both the M3-type and M4-type models.
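Therefore, the five model types considered are as follows:
M1. Variable mean temperature: $T_R(t) = T_m + c\,t$
M2. Constant mean temperature and constant amplitude series: $T_R(t) = T_m + \sum_{k=1}^{N} A_k\sin(\omega_k t + b_k)$
M3. Variable mean temperature and constant amplitude series: $T_R(t) = T_m + c\,t + \sum_{k=1}^{N} A_k\sin(\omega_k t + b_k)$
M4. Constant mean temperature and variable amplitude series: $T_R(t) = T_m + \sum_{k=1}^{N} (A_{k,0} + A_{k,1}t)\sin(\omega_k t + b_k)$
M5. Variable mean temperature and variable amplitude series: $T_R(t) = T_m + c\,t + \sum_{k=1}^{N} (A_{k,0} + A_{k,1}t)\sin(\omega_k t + b_k)$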
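All five model types can be viewed as special cases of one general form, $T_m + c\,t + \sum_k (A_{k,0} + A_{k,1}t)\sin(\omega_k t + b_k)$, with some terms switched off. The sketch below assumes exactly that form; the function name and argument layout are hypothetical, chosen only to show how M1 through M5 relate.

```python
import numpy as np

def regular_trend(t, tm, c=0.0, a0=(), a1=(), b=(), omegas=()):
    """T_R(t) = Tm + c*t + sum_k (A_k0 + A_k1*t) * sin(w_k*t + b_k).

    Special cases (hypothetical argument layout):
      M1: no harmonics (a0, b, omegas empty)
      M2: c = 0 and a1 empty (constant mean, constant amplitudes)
      M3: a1 empty (variable mean, constant amplitudes)
      M4: c = 0 (constant mean, variable amplitudes)
      M5: all terms present
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = tm + c * t
    w = np.asarray(omegas, dtype=float)
    if w.size:
        a0 = np.asarray(a0, dtype=float)
        a1 = np.asarray(a1, dtype=float)
        # Amplitude per (day, frequency); linear in time only if a1 is given.
        amp = a0 + np.outer(t, a1) if a1.size else a0
        phase = np.outer(t, w) + np.asarray(b, dtype=float)
        out = out + np.sum(amp * np.sin(phase), axis=1)
    return out

m1 = regular_trend([0.0, 10.0], tm=5.0, c=0.1)
m3 = regular_trend([0.0], 1.0, 0.5, a0=[2.0], b=[np.pi / 2], omegas=[1.0])
m5 = regular_trend([1.0], 0.0, 0.0, a0=[1.0], a1=[1.0], b=[np.pi / 2], omegas=[0.0])
```

Writing the models this way makes it clear that M5 is the only form in which both the mean and the amplitudes can drift with time.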
The M1-type model is used for comparison with the other models when evaluating the variation of the yearly mean temperature. Bhutiyani studied the average, maximum and minimum temperature trends over 100 years and found all of them changing with time. Therefore, the M2-type model, with a stationary, constant mean temperature and constant amplitudes, is inferior for practical application and is not used for comparisons. For brevity, the M4-type model is neither recommended nor studied, as the M5-type model gives better results for the same effort. Therefore, only the M1, M3 and M5 model types are used in model testing.
The first task is to suppress the stochastic temperature variation in order to find the statistically most significant trend for a stationary base temperature model. Second, the stochastic or chaotic deviations must be characterized so that the model matches the observed temperature. In this way, designers and analysts can conduct their studies or designs safely, without missing the expected maximum or minimum temperature values for the study period. Figure 1 shows the components of establishing an analytical temperature model.
Figure 1. Components of establishing an analytical temperature model.
2. Measured Temperature Data
Daily mean temperature measurements from a Midwestern location in North America over 20 years are used in the study. The data, $\bar{T}(t)$, cover 7305 days from 04/01/1996 to 03/31/2016, were downloaded from https://www.wunderground.com, and are plotted in Figure 2(a). Figure 2(b) and Figure 2(c) are enlargements of Figure 2(a) for year 1 and year 20, respectively. Although the regular trend, $T_R(t)$, is discussed in Section 3, the trend from the M5-type model is also shown in Figure 2(a), Figure 2(b) and Figure 2(c) to illustrate the model concept. As a practical compromise, the daily mean temperature data, $\bar{T}(t)$, are taken from the Standard Model method. Three different ways of using the measured data are tried for comparison and to find the best way of data processing.
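The record length can be checked directly from the calendar: the span from 04/01/1996 to 03/31/2016 contains five February 29ths (2000 through 2016), so it holds 20 × 365 + 5 = 7305 days, i.e. exactly 365.25 days per record year. A short standard-library check:

```python
from datetime import date, timedelta

start, end = date(1996, 4, 1), date(2016, 3, 31)
n_days = (end - start).days + 1          # both end dates included
all_days = [start + timedelta(days=i) for i in range(n_days)]
leap_days = [d for d in all_days if (d.month, d.day) == (2, 29)]
mean_year_length = n_days / 20           # 20 record years
```

This is also why the 20-year record splits cleanly into five 1461-day (4-year) cycles.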
2.1. Single-Year Temperature Cycle Evaluation
In the first usage of the data, the measured daily mean temperatures for 20 years are divided into twenty single-year sets, years 1 to 20, to analyze the model validity for the M1 and M3 model types. The individual yearly data sets, $\bar{T}_j(t)$, $j = 1, \ldots, 20$, are grouped into 15 regular years and 5 leap years.
2.2. Four-Year Temperature Cycle Evaluation
In the second usage of the data, the 20 years of measured daily mean temperatures are divided into 4-year sets, giving the 5 groups 1) through 5) below. The justification for employing four years as the true solar time period is that the yearly time period of regular years is short by 0.25 days, while that of leap years is long by 0.75 days, which affects the averaging. The 4-year time sequence of 1461 days is considered the repeating time period of the stationary temperature component in the model. Therefore, when a one-year time period is used, the properties of temperature (average, etc.) must be evaluated separately for regular and leap years.
1) For years 1 - 4: days $t = 1, \ldots, 1461$
2) For years 5 - 8: days $t = 1462, \ldots, 2922$
3) For years 9 - 12: days $t = 2923, \ldots, 4383$
4) For years 13 - 16: days $t = 4384, \ldots, 5844$
5) For years 17 - 20: days $t = 5845, \ldots, 7305$
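The grouping above can be reproduced from the calendar. In this sketch each record year $j$ is assumed to run from April 1 to March 31, so it contains February of calendar year $1996 + j$; the helper name is hypothetical.

```python
import calendar

def record_year_is_leap(j, first_year=1996):
    """Record year j (1-based) runs April 1 to March 31, so it contains
    February of calendar year first_year + j (hypothetical helper)."""
    return calendar.isleap(first_year + j)

year_lengths = [366 if record_year_is_leap(j) else 365 for j in range(1, 21)]
leap_record_years = [j for j in range(1, 21) if record_year_is_leap(j)]

# Consecutive 4-year groups and their 1-based day ranges in the record.
group_ranges = []
first_day = 1
for g in range(5):
    length = sum(year_lengths[4 * g:4 * g + 4])   # one leap year per group
    group_ranges.append((first_day, first_day + length - 1))
    first_day += length
```

Every 4-year group contains exactly one leap year (record years 4, 8, 12, 16 and 20), so each group spans 1461 days and the 365/366-day rounding of single-year fits averages out.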
2.3. Continuous, Repeating Four-Year Temperature Cycles Evaluation
In the third usage of the data, the complete record of 7,305 daily values, $\bar{T}(t)$, $t = 1, \ldots, 7305$, is used for testing and establishing the properties of $T_R(t)$ in the M3- and M5-type models.
3. Determination of TR
To determine the model coefficients, the obvious choice is the least-squares (LSQ) fit method. The LSQ method optimally fits the data to a given function with unknown constant parameters such that the root-mean-square error between the model and the measured data is minimized. The fitted equations represent the significant, regular temperature trend, $T_R(t)$. Extracting the significant part of a noisy observation may also be done by filtering or with neural networks. The advantage of the LSQ method is that the function can be defined in advance, whereas signal processing does not give an analytical form of such a function.
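For model forms with fixed frequencies and constant amplitudes (M1, M3), the LSQ problem can be made linear by fitting each harmonic as $s_k\sin\omega_k t + q_k\cos\omega_k t$ and converting to $A_k$, $b_k$ afterwards. This is a common linearization, assumed here; the paper does not state which solver was used. The synthetic data below are illustrative only.

```python
import numpy as np

def fit_m3(t, y, omegas):
    """Least-squares fit of Tm + c*t + sum_k A_k*sin(w_k*t + b_k) with fixed w_k.

    Each harmonic is fitted as s_k*sin(w_k t) + q_k*cos(w_k t), which is linear
    in (s_k, q_k); A_k = hypot(s_k, q_k) and b_k = atan2(q_k, s_k) afterwards.
    """
    t = np.asarray(t, dtype=float)
    w = np.asarray(omegas, dtype=float)
    X = np.column_stack([np.ones_like(t), t,
                         np.sin(np.outer(t, w)), np.cos(np.outer(t, w))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    n = w.size
    s, q = coef[2:2 + n], coef[2 + n:2 + 2 * n]
    return coef[0], coef[1], np.hypot(s, q), np.arctan2(q, s)

# Synthetic, noise-free check over one 4-year cycle (values are illustrative).
t = np.arange(1461.0)
omegas = 2.0 * np.pi / np.array([365.25, 182.625])
y = (10.0 + 0.002 * t
     + 20.0 * np.sin(omegas[0] * t + 0.3)
     + 3.0 * np.sin(omegas[1] * t - 1.0))
tm, c, A, b = fit_m3(t, y, omegas)
```

On noise-free data the fit recovers $T_m = 10$, $c = 0.002$, $A = (20, 3)$ and $b = (0.3, -1.0)$; on measured data the same call returns the statistically best trend in the least-squares sense.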
To support the different ways of using the measured data, different fitting functions are defined, each with a number of unknown parameters to be determined by the LSQ fitting algorithm. Five fitting functions are considered for the M1, M3 and M5 model types, as follows.
3.1. TR Function for the Entire 20 Years for M1-Type Model
The LSQ linear function is:

$$T_R(t) = T_m + c\,t \qquad (13)$$
3.2. TR Function for M3-Type Model
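For single-year data in the M3-type model, a regular year of 365 days and a leap year of 366 days must be distinguished. The LSQ function for a regular year is written as:

$$T_R(t) = T_m + c\,t + \sum_{k=1}^{15} A_k\sin(\omega_k t + b_k), \quad t = 1, \ldots, 365 \qquad (14)$$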
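where the angular frequencies $\omega_k$, $k = 1, \ldots, 15$, based on the 365-day period, are all fixed frequency components. The unknown parameters are $T_m$, $c$, $A_1$ through $A_{15}$, and $b_1$ through $b_{15}$.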
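The LSQ function for a leap year is derived from (14) by changing 365 days to 366 days:

$$T_R(t) = T_m + c\,t + \sum_{k=1}^{15} A_k\sin(\omega_k t + b_k), \quad t = 1, \ldots, 366 \qquad (15)$$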
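where the angular frequencies $\omega_k$, $k = 1, \ldots, 15$, now based on the 366-day period, are all fixed frequency components. The unknown parameters are $T_m$, $c$, $A_1$ through $A_{15}$, and $b_1$ through $b_{15}$.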
For the 4-year and 20-year data sets, the LSQ function adds the four-year and two-year period frequencies and changes the one-year period to 365.25 days:

$$T_R(t) = T_m + c\,t + \sum_{k=1}^{17} A_k\sin(\omega_k t + b_k) \qquad (16)$$
where the angular frequencies $\omega_k$, $k = 1, \ldots, 17$, are all fixed frequency components. The unknown parameters are $T_m$, $c$, $A_1$ through $A_{17}$, and $b_1$ through $b_{17}$.
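The exact frequency values are not listed in this section. Purely as a hypothetical illustration, the sketch below assumes the 15 single-year components are harmonics of the yearly period ($P/k$, $k = 1, \ldots, 15$) and that the 4-year and 2-year periods are prepended for the multi-year functions. It also counts the unknowns, which gives 36 constants for an M3-type fit and 53 for an M5-type fit with 17 frequencies.

```python
import numpy as np

def frequency_set(year_days, multi_year=False, n_harmonics=15):
    """Hypothetical fixed frequency set in rad/day.

    ASSUMPTION: n_harmonics harmonics of the yearly period (periods year_days/k);
    with multi_year=True the 4-year and 2-year periods are prepended.
    """
    periods = [year_days / k for k in range(1, n_harmonics + 1)]
    if multi_year:
        periods = [4.0 * year_days, 2.0 * year_days] + periods
    return 2.0 * np.pi / np.array(periods)

w_regular = frequency_set(365.0)                  # regular single year
w_leap = frequency_set(366.0)                     # leap single year
w_cycle = frequency_set(365.25, multi_year=True)  # 4-year and 20-year sets

n_params_m3 = 2 + 2 * w_cycle.size   # Tm, c, A_k, b_k
n_params_m5 = 2 + 3 * w_cycle.size   # Tm, c, A_k0, A_k1, b_k
```

Whatever the actual frequency choice, the parameter counts depend only on the number of components, so 17 fixed frequencies always lead to 53 constants in the M5-type model.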
3.3. TR Function for M5 Type Model
With the assumption that the amplitudes may also vary with time, a modified LSQ function based on (16) is established:

$$T_R(t) = T_m + c\,t + \sum_{k=1}^{17} (A_{k,0} + A_{k,1}t)\sin(\omega_k t + b_k) \qquad (17)$$
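where the angular frequencies $\omega_k$, $k = 1, \ldots, 17$, are all fixed frequency components. The unknown parameters are $T_m$, $c$, $A_{1,0}$ through $A_{17,0}$, $A_{1,1}$ through $A_{17,1}$, and $b_1$ through $b_{17}$, 53 constants in total.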
4. Evaluation of the Statistically Most Significant, Mean Temperature Trends
The LSQ fitting method is used to determine the mean temperature trend, $T_R(t)$, of the measured data, $\bar{T}(t)$. The LSQ method provides the statistically most significant result for $T_R(t)$ as a regular, deterministic trend, suppressing the random component of the temperature around $T_R(t)$, which is assumed to be normally distributed noise due to stochastic or chaotic causes.
First, the best LSQ fit is determined separately for each single-year data set. Function (14) is fitted to the regular years and (15) to the leap years. The fitted results are shown in Figure 3(a), Figure 3(b) and Figure 3(c). The mean, maximum and minimum values of the fitted results are shown in Figure 4.
Figure 4. The mean, maximum and minimum values for LSQ fitting results for each year.
Second, the best LSQ fit is found using function (16) separately for the five 4-year temperature data sets. The fitted results are shown in Figure 5(a), Figure 5(b) and Figure 5(c), and the mean, maximum and minimum values of the fitted results are shown in Figure 6. The parameters of the fitted functions, $T_{R,i}(t)$, $i = 1, \ldots, 5$, are listed in Tables 1-5, respectively.
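The per-period mean, maximum and minimum values plotted in Figures 4 and 6 are simple summaries of a fitted daily curve. A minimal sketch, assuming the fitted curve is available as a daily array and the period lengths (365/366 days or 1461 days) are known; the helper name is hypothetical:

```python
import numpy as np

def period_stats(values, lengths):
    """Mean, maximum and minimum of a daily series over consecutive periods.

    `lengths` lists the number of days in each period, e.g. 365/366 for
    single years or 1461 for 4-year sets; they must add up to len(values).
    """
    values = np.asarray(values, dtype=float)
    edges = np.cumsum([0] + list(lengths))
    if edges[-1] != values.size:
        raise ValueError("period lengths must add up to the series length")
    return [(values[a:b].mean(), values[a:b].max(), values[a:b].min())
            for a, b in zip(edges[:-1], edges[1:])]

# Tiny illustration with two "periods" of 4 and 6 days.
stats = period_stats(np.arange(10.0), [4, 6])
```

Applied to a single-year fit, `lengths` would be the list of 365/366-day record years; for the 4-year fits it would be five entries of 1461.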
Third, the best LSQ fit is determined using the M3-type function (16) for all 20 years of data, $\bar{T}(t)$, together. The fitted function is depicted in Figure 7(a), Figure 7(b) and Figure 7(c), and its parameters are listed in Table 6. The best LSQ fit is also determined using the improved M5-type function (17), again applied to all 20 years of data. The fitted results are depicted in Figure 8(a), Figure 8(b) and Figure 8(c), and the parameters of the fitted function are listed in Table 7.
5. Evaluation of the Stochastic Variation and Final Analytical Temperature Model
The stochastic variation is obtained by subtracting the statistically fitted regular trend from the measured data. The stochastic variation, $T_S(t)$, is the difference between the measured data and the deterministic function result:

$$T_S(t) = \bar{T}(t) - T_R(t) \qquad (18)$$
The data of (18) are depicted in Figure 9(a), Figure 9(b) and Figure 9(c). The density analysis of the (18) data and the MATLAB normal distribution fit are shown in Figure 10. From the fit, a mean value, $\mu$, and a standard deviation, $\sigma$, are obtained, where $\mu$ and $\sigma$ are the sample mean and standard deviation of $T_S(t)$ over the $n$ days, $t$, of the record. Using $\mu$ and $\sigma$, a normally distributed random noise series is generated to represent $T_S(t)$, with a MATLAB function that generates $n$ random values from the normal distribution with mean $\mu$ and standard deviation $\sigma$. Applied to $T_S(t)$, it gives:

$$T_S(t) \sim \mathcal{N}(\mu, \sigma^2), \quad t = 1, \ldots, n \qquad (19)$$

Figure 6. The mean, maximum and minimum values for LSQ fitting results for 4-year sets.
Table 1. Parameters for year 1 - 4 data, the LSQ fitted function (16) result.
Table 2. Parameters for year 5 - 8 data, the LSQ fitted function (16) result.
Table 3. Parameters for year 9 - 12 data, the LSQ fitted function (16) result.
Table 4. Parameters for year 13 - 16 data, the LSQ fitted function (16) result.
Table 5. Parameters for year 17 - 20 data, the LSQ fitted function (16) result.
Table 6. Parameters for year 1 - 20 data, the LSQ fitted function (16) result.
Table 7. Parameters for year 1 - 20 data, the LSQ fitted function (17) result.
To obtain the simulated temperature, $T_{sim}(t)$, (19) is added to (17):

$$T_{sim}(t) = T_R(t) + T_S(t) \qquad (20)$$
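The residual statistics and the simulation step can be sketched with NumPy in place of the MATLAB functions used in the paper. This assumes the sample standard deviation (`ddof=1`) and NumPy's `Generator.normal`; the synthetic trend below is illustrative, not the fitted model.

```python
import numpy as np

def stochastic_stats(observed, regular):
    """T_S = observed - regular trend (Eq. 18); return its sample mean and std."""
    ts = np.asarray(observed, dtype=float) - np.asarray(regular, dtype=float)
    return ts.mean(), ts.std(ddof=1)

def simulate(regular, mu, sigma, seed=None):
    """Regular trend plus one N(mu, sigma) draw per day, as in Eqs. (19)-(20)."""
    regular = np.asarray(regular, dtype=float)
    rng = np.random.default_rng(seed)
    return regular + rng.normal(mu, sigma, size=regular.size)

# Illustrative check: recover mu and sigma from a synthetic 7305-day record.
t = np.arange(7305.0)
regular = 10.0 + 15.0 * np.sin(2.0 * np.pi * t / 365.25)
observed = simulate(regular, 0.0, 2.0, seed=1)
mu_hat, sigma_hat = stochastic_stats(observed, regular)
```

Each call with a different seed gives a different but statistically equivalent realization, which is why (20) is an algorithm rather than an analytical function.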
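However, (20) is not an analytical function, since it includes an algorithm (the random number generator). To overcome this, and recognizing that the daily random variation is a sample of $\mathcal{N}(\mu, \sigma^2)$, the maximum and minimum values can be generated with a 99 per cent confidence by a fluctuating temperature with a 2-day cycle time:

$$T_S(t) = z\,\sigma\cos(\pi t) \qquad (21)$$

where $z$ is the multiplication factor of the standard deviation ($z = 3$ in the example).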
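Substituting the preferred M5-type model (17) for $T_R(t)$, the final analytical temperature model, $T_{model}(t)$, is:

$$T_{model}(t) = T_m + c\,t + \sum_{k=1}^{17}(A_{k,0} + A_{k,1}t)\sin(\omega_k t + b_k) + z\,\sigma\cos(\pi t) \qquad (22)$$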
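The deterministic substitute for the random term can be checked numerically. This sketch assumes the 2-day fluctuation has the form $z\sigma\cos(\pi t)$ with an integer day index starting at 0 (both assumptions), so the curve alternates between the upper and lower safe limits on consecutive days.

```python
import numpy as np

def two_day_envelope(regular, sigma, z=3.0):
    """Regular trend plus z*sigma*cos(pi*t), a fluctuation with a 2-day cycle.

    ASSUMPTION: integer day index t starting at 0, so cos(pi*t) alternates
    +1, -1 and the curve hits the upper limit on even days, the lower on odd.
    """
    regular = np.asarray(regular, dtype=float)
    t = np.arange(regular.size)
    return regular + z * sigma * np.cos(np.pi * t)

def safe_limits(regular, sigma, z=3.0):
    """Lower and upper bounds T_R - z*sigma and T_R + z*sigma."""
    regular = np.asarray(regular, dtype=float)
    return regular - z * sigma, regular + z * sigma

curve = two_day_envelope(np.zeros(4), sigma=1.0)
lower, upper = safe_limits(np.zeros(4), sigma=1.0)
```

Because the fluctuation is a closed-form function of $t$, the final model stays fully analytical while still reaching both the maximum and the minimum safe limits.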
A comparison between the simulated temperature, $T_{sim}(t)$, from (20) and the measured data, $\bar{T}(t)$, is shown in Figure 11(a), Figure 11(b) and Figure 11(c). Using an uncertainty interval of ±3σ in the example, the safe limits of the maximum and minimum temperature are shown in Figure 12(a), Figure 12(b) and Figure 12(c). A smaller or wider uncertainty interval may also be selected, depending on the accepted risk regarding the number of days on which the outside air temperature falls outside the expected range.
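The trade-off between the width of the uncertainty interval and the number of days outside it can be quantified if the residuals are exactly normal (an idealization of the near-normal histogram). A standard-library sketch:

```python
import math

def coverage(z):
    """Probability that a normal variable falls within mu +/- z*sigma."""
    return math.erf(z / math.sqrt(2.0))

def expected_days_outside(z, n_days):
    """Expected number of days outside the +/- z*sigma band."""
    return (1.0 - coverage(z)) * n_days
```

For $z = 3$ the coverage is about 99.73 per cent, so roughly 20 of the 7305 days would be expected to fall outside the band; $z \approx 1.96$ corresponds to 95 per cent coverage.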
Figure 10. Density analysis of the Equation (18) data and normal distribution fit.
6. Discussion of the Results
6.1. Discussion of the Results for the Air Temperature Variation as a Regular Trend
For comparison purposes, the linear regression function (13) is applied over the entire 20-year period either to various fitted model results or to the original, unprocessed data. The model results fitted to the shorter time periods represent the significant part of the repeated trends, whereas the noise is intentionally suppressed in the LSQ sense. Therefore, a linear regression over the 20-year period is assumed to evaluate the most significant, time-averaged linear change in the magnitude of $T_R(t)$. One would commonly expect a linear regression over the 20-year period of the original data to provide an unbiased result for the same linear change. The following studies fit a longer-time linear regression to the model results, $T_R(t)$, of shorter time periods:
a) Yearly mean temperatures (for 15 regular and 5 leap years) evaluated from the functions fitted to the single-year data, with the M1-type model;
b) Yearly mean temperatures evaluated from the functions fitted to the 4-year data sets, with the M1-type model;
c) Yearly mean temperatures from the function fitted to the continuous 20-year data, with the M1-type model;
d) The 20 years of measured data, $\bar{T}(t)$, used unprocessed.
The results of the evaluation are listed in Table 8. As shown, the linear trends for the mean value, $T_m$, and the slope, $c$, are very similar for cases a) and b), with b) giving a lower RMS error than a). The reason lies in the fact that the true time period of the variation of the outside temperature is 365.25 days, which introduces a rounding error with a weight of −0.25/4 days for the regular years and +0.75/4 days for the leap years in the single-year model fits. The model fit to the 4-year time periods does not have this rounding error and, therefore, a smoother fit is expected. Indeed, the RMS error of 0.39 for case b) is lower than the value of 0.603 for case a).

Table 8. Linear temperature variation trends evaluated from different models and processes.
The results in case c) are identical to those of case a), for the obvious reason that the same linear regression is applied twice in sequence, the second time yielding a zero RMS value. The result for case d) is very different from those in cases a) through c). Why does the 20-year data set give a negative average temperature change that would translate to “global cooling” as opposed to “global warming” for the example location? The answer is the wrong choice of function type, a linear function of time, for the most significant variation trend, $T_R(t)$. This exercise highlights the importance of selecting the shape of $T_R(t)$. If a function as inadequate as a linear one is selected for $T_R(t)$ to estimate the periodic outside temperature, its coefficients cannot be trusted even for the general slope, as demonstrated by case d).
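The effect behind case d) can be reproduced with synthetic data (not the paper's record): a purely periodic signal with no trend at all still produces a nonzero straight-line slope, whose sign depends on the phase at which the record starts, while an M3-style fit that estimates the linear term together with the harmonic recovers a zero slope.

```python
import numpy as np

# Synthetic, purely periodic record: 20 years of 365.25 days, no trend at all.
t = np.arange(7305.0)
omega = 2.0 * np.pi / 365.25
y = 20.0 * np.sin(omega * t)

# M1-style straight line fitted to the raw periodic data.
slope_linear = np.polyfit(t, y, 1)[0]

# M3-style fit: the linear term is estimated together with the harmonic.
X = np.column_stack([np.ones_like(t), t, np.sin(omega * t), np.cos(omega * t)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_m3 = coef[1]

print(f"straight-line slope: {slope_linear * 365.25:+.4f} per year")
print(f"M3 linear term c:    {slope_m3 * 365.25:+.4f} per year")
```

Here the straight line reports a spurious drift of roughly -0.1 per year (about -2 degrees over 20 years) although the signal has no trend, which illustrates why the slope of a linear fit to raw periodic data cannot be trusted.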
Two more choices are also studied for evaluating the linear trend, which these models already include as the mean value, $T_m$, and the slope, $c$. Because of these built-in components, no additional linear regression fit is needed to determine $T_m$ and $c$:
e) The 20 years of measured data, $\bar{T}(t)$, unprocessed; M3-type model;
f) The 20 years of measured data, $\bar{T}(t)$, unprocessed; M5-type model.
The results from the evaluation are listed in Table 8. As shown, the linear trends for the mean value, $T_m$, and the slope, $c$,
Re-fitting a linear regression model to the model output data, $T_R(t)$, of the M5-type model in case f), in order to recapture the mean value, $T_m$, and the slope, $c$, does not return the values built into the best-fit model, as shown for case g) in Table 8. The reason for the mismatch is that a linear function is an inadequate type for evaluating the more complex interactions of the periodic and monotonic components in $T_R(t)$. The general proof of this mismatch is left to the reader.
6.2. Discussion of the Results for the Random Component of Air Temperature Variation
The model component describing the random part of the air temperature due to stochastic or chaotic causes is simplified to be time-independent. The stochastic component, $T_S(t)$, is assumed to have zero mean and zero slope with time. No attempt has been made to vary the magnitude of randomness with the seasons; refinement of this component is left for the interested reader. The observed histogram for the example shows a close-to-normal distribution, which allows the error limit for daily mean temperature fluctuations to be estimated from the standard deviation, σ, obtained from model identification.
6.3. Discussion of the Complete Temperature Model for Daily Air Temperature Variation
The complete temperature model is given in (21) and (22). The model predicts the daily average temperature variation with time, as well as the expected maximum and minimum temperatures due to the stochastic process components. The comparison between the measured data and the model prediction, with a ±3σ amplitude around the M5-type function in (22), is illustrated in Figure 12(a). The zoomed-in graphs in Figure 12(b) and Figure 12(c) show that the measured values almost always remain between the modeled maximum and minimum values.
• Analytical functional forms and their numerical algorithms are presented for representing the measured, time-variable outside air temperature for engineering design and analysis of the human environment. The algorithms for $T_R(t)$ and $T_S(t)$ are easy to use for processing the data sets, $\bar{T}(t)$, available from the weather service at any physical location, typically several tens of thousands of measured values. The final form of the outside air temperature function needs only a few dozen constants.
• The final analytical temperature model can be used not only to represent the historical temperature data but also to predict future temperature variations at the location where the input measurements were taken. The upper and lower boundaries may be used for safe temperature prediction.
• The regular component of the temperature change with time, $T_R(t)$, in the M5-type model is described by a linear function plus a Fourier series with time-variable amplitudes, representing the long-term linear change in both the mean temperature and the amplitudes. Only 53 constants, obtainable with the presented method, are needed to represent the outside mean air temperature on any day of the year over as many decades as needed.
• The confidence interval for the stochastic variation may be selected by the user via the multiplication factor of the standard deviation of the mismatch between the measured, time-variable outside air temperature and the regular component of the analytical model, $T_R(t)$.
• The stochastic component used in the final model, $T_S(t)$, is stable and stationary. Its variability over the seasons of the year may be considered in a future study but is presently omitted for simplicity.
• The study shows that future temperature trends, such as cooling or warming, can only be evaluated using an M5-type model fitted to the data. The trend-setting components, such as the annual change of the mean temperature or the change with time of the amplitudes of the periodic components, can only be evaluated with a model that has these components built into its structure.
• The minimum adequate time period for building an outside air temperature model is 4 years, the periodic cycle time of the solar environment. It is recommended to use a multiple of 4-year periods for model building (e.g., the 5 × 4 = 20-year period in the present study), preferably as long a period as data are available.
A research grant from the National Institute for Occupational Safety and Health (NIOSH) is gratefully acknowledged. The research was also supported by the GINOP-2.3.2-15-2016-00010 “Development of enhanced engineering methods with the aim at utilization of subterranean energy resources” project of the Research Institute of Applied Earth Sciences of the University of Miskolc in the framework of the Széchenyi 2020 Plan, funded by the European Union, co-financed by the European Structural and Investment Funds.