Meteorological data are useful for varied applications across many socio-economic sectors. They can be used in weather and climate forecasting, disaster risk reduction, water resources management, landscape planning, and many others. However, the availability of meteorological data requires a good network of manual observation stations at the surface, in the upper air, and on the ocean, as well as other support systems which facilitate the collection, recording, processing, archiving, and other data management operations. In sub-Saharan Africa, such networks are limited due to low investment and capacity (Dupar et al., 2021). Such situations constrain the development, provision, and maintenance of quality climate services and their application.
In Kenya and the East African region, meteorological data are a very important resource because weather and climate variability are driven by several global influences, including the El Niño and La Niña phenomena in the tropical Pacific, the Congo air mass, the Inter-Tropical Convergence Zone, and the Indian Ocean temperatures, as well as local climatic factors such as the lake circulation effects (Marchant et al., 2007; Berhane & Zaitchik, 2014), all of which require regular monitoring and evaluation. The region has had its fair share of severe weather and extreme climate events such as flooding, hailstorms, and droughts, which have caused loss of human life and other adverse socio-economic and environmental impacts. To monitor and evaluate the weather patterns in Kenya, the national meteorological service, the Kenya Meteorological Department (KMD), operates and controls 40 Synoptic Stations spread across the country (Figure 1, https://meteo.go.ke/) and about 600 rain gauge stations operated by private observers. The synoptic stations are operated manually by KMD personnel on a continuous basis. Generally, the station network is sparse compared to the World Meteorological Organization’s (WMO) recommended practice regarding the spacing between neighboring stations of 20 km. Although the KMD has in the last few years installed a number of Automatic Weather Stations (AWS), the data are not yet fully integrated into meteorological applications or shared globally through the Global Telecommunication System (GTS) of the WMO as required. This is mainly because the quality of the AWS datasets is not yet well known.
Figure 1. Map of Kenya showing the location of some of the Synoptic Stations (ringed symbols) operated by KMD. The study location is indicated in red. Source: https://meteo.go.ke/.
Most studies that have carried out inter-comparisons of meteorological data have focused on satellite-based weather parameters and gridded data (Ayasha, 2021; Rivoire et al., 2021; Schumacher et al., 2020; Ford & Quiring, 2019; Zeng et al., 2018). These studies have largely ignored the significant biases that can be addressed using data from Automatic Weather Stations relative to surface observation stations.
The key advantage of comparisons between ground station observations and datasets from AWSs is that the datasets can provide more coverage in time and space and hence a better description of the weather and climate of a given area. Further, since ground station observations may have some uncertainties (especially when some data are missing), comparing them with AWS data may bridge the gap and hence improve their quality and use. This study, therefore, provides the means to enhance the quality and quantity of available observational datasets through the calibration of the AWS data, leading to improvements in early warning services. Since 2019, the WMO’s HIGH Impact Weather LAke SYstem (HIGHWAY) project, funded by the United Kingdom (UK) Department for International Development (DFID), has been promoting early warning systems (EWS) to improve resilience to weather and climate extremes for the local communities around the Lake Victoria region by exploring the potential of using AWS data sets in Kenya.
Within the above context, this study aims at: 1) inter-comparing the ground-based observational data sets (rainfall, temperature, wind speed and direction, surface pressure, and relative humidity) from a KMD synoptic weather station with data from the Trans-African Hydro-Meteorological Observatory (TAHMO) AWS and the 3D-Printed Automatic Weather Station (3D-PAWS), co-located at KMD; and 2) inter-comparing the data from the two AWSs (TAHMO and 3D-PAWS).
This study is organized as follows. Section 2 describes the data sets used in the study including details of the study area. In section 3 statistical methods used in the analysis and comparison of the different data sets are presented. The results from the analyses are given in Section 4 while the discussion of the results is in Section 5.
2. Data and Study Area
2.1. Study Area Description
The datasets used in this study are from the Dagoretti Corner Meteorological Station, which is located in Nairobi, Kenya. All three stations are located at separate sites within the Meteorological Station compound. The manual synoptic weather station is located at Lon. 36.75˚E and Lat. 1.3˚S. The TAHMO AWS is at Lon. 36.7602˚E and Lat. 1.3018389˚S. The 3D-PAWS AWS is located at Lon. 36.7601˚E and Lat. 1.30172˚S. The three stations are located at an average altitude of 1790 m above mean sea level. The geographic positions of the AWSs and the synoptic weather station show that these stations are more or less co-located.
The weather regime of the study area is semi-humid tropical, with an average annual rainfall of 1060 mm, a mean annual temperature of 17.8˚C, and maximum temperatures reaching about 25.5˚C. The rainfall distribution is bimodal, occurring in two seasons: March-May (long rains) and October-December (short rains). The January-February period is generally dry, while June-September is cool and dry with occasional rains.
The data sets used in this study are from the manual synoptic weather station, the TAHMO AWS and the 3D-PAWS at the Dagoretti Corner Meteorological Station in Nairobi. The manual synoptic station data sets of daily rainfall, daily temperature at 06Z and 12Z, daily minimum and maximum temperature, daily relative humidity, hourly wind speed, hourly wind direction, surface pressure and solar radiation were acquired from the National Climate Database at KMD. Thus the observations from the manual synoptic weather station are at daily and hourly intervals. The TAHMO observations were obtained from KMD and are at 5-minute intervals, while the 3D-PAWS observations at 1-minute intervals were obtained from the University Corporation for Atmospheric Research (UCAR).
Quality controls and checks were carried out on the data from the three sources and examined for consistency. Subsequently, the parameters selected for the comparative analysis were Rainfall, Temperature, Pressure, Relative Humidity, Solar Radiation, Wind speed and Direction. The data used in the analysis covered the period 2016 to 2018 and part of 2019.
2.3. Data Structure
Data sets for all the parameters from the manual station were prepared in a single “.csv” file with daily values in a cross tab format while the TAHMO data were organized in five-minute values in a list in multiple files summarized in one “.csv” file per day. The 3D-PAWS data in ASCII format were stored in separate files for each sensor (e.g., humidity and temperature, rainfall, wind direction and speed, and surface pressure) with a resolution of 1-min records. All the AWS data from 3D-PAWS and TAHMO were processed to match the temporal resolution of the manual station by aggregating the minute and hourly data into daily values using the R-statistical software (R Core Team, 2017) and Excel. To enable the analysis, the specific parameters were prepared as follows.
2.4. Rainfall
To match the daily observation period (24 hours) of the manual synoptic weather station, the rainfall data from TAHMO and 3D-PAWS were accumulated to daily values starting from 0600Z of the current day to 0600Z of the next day and cast back by one day.
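The accumulation described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study (which relied on R and Excel). It assumes that each AWS record is a (UTC timestamp, rainfall in mm) pair stamped at the end of its sampling interval, and that "cast back by one day" means the total for the window from 0600Z on day D to 0600Z on day D+1 is labelled with day D. The function name `daily_rainfall_06z` is hypothetical.

```python
from datetime import datetime, timedelta


def daily_rainfall_06z(records):
    """Sum (timestamp_utc, rain_mm) records into 0600Z-to-0600Z daily totals.

    Assumption: the window (0600Z on day D, 0600Z on day D+1] is labelled
    with day D, i.e. the total is cast back by one day.
    """
    totals = {}
    for ts, rain in records:
        # Shift by 6 h (plus 1 microsecond so that a record stamped exactly
        # at 0600Z closes the previous window rather than opening a new one).
        day = (ts - timedelta(hours=6, microseconds=1)).date()
        totals[day] = totals.get(day, 0.0) + rain
    return totals
```

The same function applies to both the 5-minute TAHMO and the 1-minute 3D-PAWS records, since it only relies on the timestamps.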
2.5. Surface Pressure
The observed surface pressure from the manual synoptic station is at hourly timescale and therefore the AWSs surface pressure data from TAHMO and 3D-PAWS were processed to extract the observation from the top of each hour to match the observation time of the manual synoptic station.
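As a rough sketch of this matching step, the snippet below keeps only the AWS records stamped exactly on the hour and pairs them with the manual observations that share the same timestamp. The data layout (a list of (timestamp, value) pairs for the AWS and a timestamp-keyed dictionary for the manual station) and the function names are assumptions made for illustration.

```python
from datetime import datetime


def top_of_hour(records):
    """Keep only (timestamp, value) records stamped exactly on the hour."""
    return {ts: v for ts, v in records if ts.minute == 0 and ts.second == 0}


def pair_with_manual(manual, aws_records):
    """Pair manual and AWS values that share an hourly timestamp.

    manual: dict mapping timestamp -> manual observation
    aws_records: iterable of (timestamp, value) at 1- or 5-minute steps
    """
    hourly = top_of_hour(aws_records)
    return [(ts, value, hourly[ts])
            for ts, value in sorted(manual.items())
            if ts in hourly]
```

Because the pairing is driven by the manual timestamps, the same helper would also serve for variables observed only at 0600Z and 1200Z.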
2.6. Relative Humidity
The manual synoptic weather station data were at 0600Z and 1200Z and therefore AWSs observations from TAHMO and 3D-PAWS were matched for the 0600Z and 1200Z times to match the observations from the manual synoptic weather station.
2.7. Solar Radiation
The hourly manual synoptic weather station daily total radiation data were not considered in the analysis since they could not be matched with the observations from the 1-minute and 5-minute temporal resolutions from the 3D-PAWS and TAHMO stations.
2.8. Wind Speed
The wind speed data for the manual synoptic weather station were at the hourly time stamp and hence the AWS data from TAHMO and 3D-PAWS were matched at the top of each hour with the manual synoptic station observations. However, due to their structure, the 3D-PAWS data were aggregated by averaging the observations in the 10 minutes before the top of the hour so as to match the procedure used for the manual observations.
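The 10-minute averaging for wind speed could look like the sketch below. It assumes 1-minute (timestamp, speed) records, treats the averaging window as the 10 minutes ending at (and including) the top of the hour, and skips missing values stored as `None`; these choices, and the name `ten_minute_mean`, are illustrative assumptions rather than details given in the text.

```python
from datetime import datetime, timedelta
from statistics import fmean


def ten_minute_mean(records, hour_ts, window_minutes=10):
    """Mean of the values in (hour_ts - window, hour_ts]; None if no data."""
    start = hour_ts - timedelta(minutes=window_minutes)
    values = [v for ts, v in records
              if start < ts <= hour_ts and v is not None]
    return fmean(values) if values else None
```

Returning `None` for an empty window keeps the hourly series aligned with the manual record, so that the gap can later be written out as a missing value.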
2.9. Wind Direction
Similar to the other hourly data sets, the wind direction data from the manual synoptic weather station were matched with the AWS data from TAHMO and 3D-PAWS at the top of the hour. The 3D-PAWS data were also processed to match the manual observations using a procedure similar to that used for the wind speed.
2.10. Temperature (Dry Bulb, Tmax, Tmin)
The dry bulb temperature data from the manual synoptic weather station were at 0600Z and 1200Z, while maximum (Tmax) and minimum (Tmin) temperature data were extracted as single values for each day. To match these, the AWS temperature observations from TAHMO and 3D-PAWS were extracted at 0600Z and 1200Z on each day of the inter-comparison. Similar to solar radiation, the AWS Tmax and Tmin were extracted from the 1-minute and 5-minute temperature values from 3D-PAWS and TAHMO, respectively, for any given day. However, the Tmax and Tmin values for TAHMO and 3D-PAWS were only extracted and matched with the manual data if a complete record was available for a given day.
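The completeness rule for Tmax and Tmin can be illustrated with the following sketch. It assumes that a "complete record" means every expected sample of the day is present (288 samples for 5-minute TAHMO data, 1440 for 1-minute 3D-PAWS data) and that days are delimited in UTC; the text does not specify either point, so both are assumptions, as is the function name.

```python
from collections import defaultdict


def daily_extremes(records, expected_per_day):
    """Return {date: (tmin, tmax)} for days with a complete record only.

    records: iterable of (timestamp, temperature); None marks a missing value
    expected_per_day: e.g. 288 for 5-minute data, 1440 for 1-minute data
    """
    by_day = defaultdict(list)
    for ts, temp in records:
        if temp is not None:
            by_day[ts.date()].append(temp)
    # Days with fewer samples than expected are dropped, not estimated.
    return {day: (min(temps), max(temps))
            for day, temps in by_day.items()
            if len(temps) == expected_per_day}
```

A strict rule of this kind explains why outages, such as a battery failing overnight, sharply reduce the number of days available for comparison.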
2.11. Data Processing, Quality Checks and Controls
Missing data were not indicated after the initial extraction of the different parameters from the 3 data sources, leading to data sets of varying lengths. To correct this, identification and insertion of gaps were done in all the parameters until they all had equal lengths. However, the rainfall data from the manual synoptic station was in a cross tab table format (Figure 2) which was converted to a list to enable inter-comparison with the AWS datasets before the gaps were inserted. The gaps that were identified in all the datasets were filled using “NA”.
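A minimal sketch of these two steps, converting a cross tab table to a list and inserting gaps, is given below. The cross tab layout is assumed to be one row per (year, month) with one column per day of the month, which may differ from the actual layout shown in Figure 2; the function names are hypothetical.

```python
from datetime import date, timedelta


def crosstab_to_list(rows):
    """Flatten {(year, month): [day1, day2, ...]} into {date: value}."""
    out = {}
    for (year, month), values in rows.items():
        for day, value in enumerate(values, start=1):
            try:
                out[date(year, month, day)] = value
            except ValueError:
                continue  # column for a day that does not exist, e.g. 30 Feb
    return out


def insert_gaps(series, start, end, missing="NA"):
    """Return one (date, value) pair per day in [start, end], filling gaps."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [(d, series.get(d, missing)) for d in days]
```

Applying `insert_gaps` with the same start and end dates to every data source yields series of equal length, which is what the pairwise comparisons require.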
Further, the 3D-PAWS data files were in “.xml” format, which was converted to the “.csv” format prior to extraction and aggregation. In addition, the 3D-PAWS data posed some challenges in aggregating and selecting the right values since they contained multiple values for the top of each hour. This led to the 3D-PAWS data being reprocessed into consistent, continuous ASCII formatted files that included missing records and bad data flags. Once this was achieved, it was possible to consistently compare the reformatted 3D-PAWS data (see Figure 3) and TAHMO data files in similar formats. The gaps in the AWS datasets identified through this process should be considered potential sources of error in comparison analyses, data checks, and tests of meteorological observations.
Figure 2. Sample of rainfall data from the manual station in a cross tab format.
2.12. Analysis Methods
To address the objectives of this study, we used several statistical methods to assess and carry out inter-comparison analyses. Graphical methods were used to visualize the data through line and scatter plots. To visually define a matching point between the different distributions of a data series from the three data sources, we checked on concurrent points that did not overlap. To gain better comparisons, each variable (e.g. rainfall) was plotted for all the three sources (TAHMO, 3D-PAWS and Manual) on one graph.
We used the simple Pearson correlation method to assess the strength of association between the variables from the three sources (Zou et al., 2003). The Pearson correlation method measures the linear correlation between two variables X and Y using the correlation coefficient (r) and is given by the equation:
$$r_{XY} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}$$
where rXY or r is the correlation coefficient, N is the number of observations, ΣX is the sum of the x scores (values), ΣY is the sum of the y scores (values), ΣXY is the sum of the products of the x and y values, and ΣX² and ΣY² are the sums of the squared x and y values. The values of r range between +1 and −1, with +1 showing a perfect positive linear correlation, 0 showing no linear correlation, and −1 showing a perfect negative linear correlation. The larger the absolute value of r, the stronger the association between the two variables.
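For illustration, the Pearson coefficient defined above can be computed from the sums of scores and products as in the sketch below. The handling of missing values (pairs are dropped when either value is `None`) is an assumption about how gaps would be treated, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from the sums of scores, squares, and products.

    Pairs in which either value is None (a gap) are dropped first.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    sx = sum(a for a, _ in pairs)
    sy = sum(b for _, b in pairs)
    sxy = sum(a * b for a, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    denom = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denom == 0:
        raise ValueError("r is undefined when either series is constant")
    return (n * sxy - sx * sy) / denom
```

Dropping pairs rather than single values keeps the two series aligned in time, which matters when the stations have gaps on different days.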
We tested the statistical significance of the correlation coefficients (r) using the Shapiro-Wilk normality test (Emerson, 2015) and the Anderson-Darling normality test (Liebscher, 2016). Shapiro-Wilk’s method is based on the correlation between the data and the corresponding normal scores and is widely used and recommended for normality tests because it provides better estimates compared to the Kolmogorov-Smirnov method. However, the Anderson-Darling method is efficient in the analysis of samples with N > 5000, whereas the Shapiro-Wilk method is suited to samples with N < 5000.
Figure 3. Screen shot of reformatted 3D-PAWS data (for Temperature and RH).
For these normality tests, a result that is not significant (p-value > 0.05) is consistent with the null hypothesis that the sample is normally distributed. Conversely, if the test is significant, the distribution is considered to be non-normal. All correlations are considered to be significant at the 0.05 level.
To support the normality tests, we used graphical methods to visualize the associations between the variables. In particular, we used Q-Q (quantile-quantile) plots to assess how closely the compared data samples followed the normal distribution. Overall, we summarized and displayed the coefficients from the comparisons using a correlation matrix.
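The points of a normal Q-Q plot can be generated as in the sketch below, which pairs each sorted sample value with the corresponding quantile of a normal distribution fitted to the sample mean and standard deviation. The plotting positions (i − 0.5)/n are one common convention and are an assumption here, as is the function name; the study's own plots were produced in R.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(sample):
    """Return (theoretical_quantile, observed_value) pairs for a normal Q-Q plot."""
    data = sorted(sample)
    n = len(data)
    # Normal distribution fitted to the sample mean and standard deviation.
    dist = NormalDist(fmean(data), stdev(data))
    return [(dist.inv_cdf((i - 0.5) / n), value)
            for i, value in enumerate(data, start=1)]
```

If the sample is close to normal, the returned points fall near the 1:1 line; systematic curvature, as would be expected for daily rainfall, indicates departure from normality.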
Lastly, we used a regression method to estimate the line of best fit for the correlations between the variables from the three data sources.
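A line of best fit between two stations' observations can be estimated as sketched below. Ordinary least squares is assumed here, since the text does not name the specific regression method; the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With the manual station as x and an AWS as y, a slope near 1 and an intercept near 0 would indicate close agreement, while a non-zero intercept would point to a systematic offset between the instruments.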
3.1. Pre-Analysis Results
The initial analysis of the 3 datasets for the period 2017-2018 indicated that there was good agreement between the temperature readings of the manual station and TAHMO but comparatively lower agreement with the 3D-PAWS. Comparisons for rainfall and relative humidity were largely variable. One reason for this might be that the TAHMO AWS had a broken rain-gauge sensor for a brief period between 2017 and June 2018.
3.2. Time Series Comparisons
Figure 4 displays the variability in the time series of Tmax, Tmin, T06, and RH06 for the manual station and 3D-PAWS from 2017 to 2019. The graph shows that only a few of the values of Tmax, Tmin, T06, and RH06 overlap for both stations. This was apparently due to battery failure at night, which left more observations for temperature and RH at 12Z from the 3D-PAWS station, which was mostly operational during daytime hours.
Figure 5 compares the time series of minimum and maximum temperatures for the 3 data sources for the periods with consistent data. The temporal patterns in the three data sets show reasonable agreement, although the 3D-PAWS minimum temperature shows a warm bias of about 10˚C relative to the TAHMO and manual station series. As mentioned earlier, the battery on the 3D-PAWS was not working properly during the non-daylight hours of the inter-comparison period. This problem calls for continuous monitoring and maintenance of AWSs and all other stations to ensure that measurements continue. Because of this, the reprocessing and computing of the temperature for the 3D-PAWS significantly reduced the number of matching records but also produced a fair comparison between the stations. Similarly, the patterns of the maximum temperature indicate some disparities in some periods where all three data sets are observed. For instance, there is considerable agreement in the maximum temperature between the manual station and 3D-PAWS, except that the manual temperatures show some spikes and missing data. These results suggest that the consistent agreement in maximum temperature between 3D-PAWS and the manual station arose because 3D-PAWS was working more efficiently during daytime hours than during night-time hours.
Figure 4. Time series of Tmax/Tmin from KMD and 3D-PAWS (top panel), temperature and RH at 06 UTC for KMD and 3D-PAWS (middle panel), and temperature and RH at 12 UTC for KMD and 3D-PAWS (bottom panel).
Figure 5. Comparisons for (A) daily minimum temperature for Dagoretti (red), TAHMO (blue), and 3D-PAWS (green) within the period 2017/2018; (B) daily maximum temperature between Dagoretti, TAHMO and 3D-PAWS; (C) daily maximum temperature between Dagoretti and 3D-PAWS; (D) between TAHMO and 3D-PAWS.
3.3. Correlations Results
Modestly higher correlations were observed for the minimum temperature between TAHMO and manual station (r = 0.65) and for the maximum temperature (r = 0.61 to r = 0.86) (Figure 6). The correlations for maximum temperature for TAHMO and 3D-PAWS were modest (r = 0.56).
The Shapiro-Wilk normality test showed that the distributions of the minimum and maximum temperatures are not significantly different from the normal distribution and are hence considered normally distributed (p-value > 0.05). For example, the maximum temperature for the manual station had the best normal distribution pattern compared to the other data sets.
Most of the other meteorological variables from the 3 stations indicated positive correlations. The strongest correlation for relative humidity was at 1200Z between the manual station and 3D-PAWS (r = 0.59), with the lowest correlations being observed at 0600Z. The surface pressure and relative humidity displayed normal distributions, unlike other variables such as rainfall, which is not normally distributed at any of the 3 stations and is better described by a non-normal distribution (Figure 7). Rainfall in Kenya has largely followed a log-normal distribution and other exponential distributions. Surface pressure at the manual station was highly correlated with TAHMO (r = 0.67, p < 0.05) and 3D-PAWS (r = 0.65, p < 0.005) and between the two AWS datasets (Figure 9, p = 0.99). Similarly, several variables indicated normal distribution patterns which were statistically significant (p < 0.05, Table 1).
Figure 6. Correlations for daily minimum and maximum temperature between manual and TAHMO ((A) and (D)), manual and 3D-PAWS ((B) and (E)) and 3D-PAWS and TAHMO ((C) and (F)). The minimum and maximum temperatures are fitted with a linear regression line, while the gray envelope is the 95% confidence interval.
Figure 7. Q-Q plots for normality test for daily rainfall ((A) and (B)), surface pressure at 0600Z ((C) and (D)), relative humidity at 0600Z and at 1200Z ((E) and (F)), and wind direction and wind speed at 0600Z between manual (Dagoretti) and 3D-PAWS (RH_3D_12Z) stations ((G), (H), (I), and (J)).
The correlations for wind speed and direction for the manual, 3D-PAWS and TAHMO stations were fairly strong. In support of this, the wind roses for 3D-PAWS and the manual station indicated consistent, predominantly north-easterly (NE) winds at the manual station location (Figure 8). However, the wind speeds were higher at the manual station than at the AWSs, possibly due to differences in sensor height. This may need to be examined further.
Some earlier results had indicated that there was a very strong positive correlation for surface pressure between manual and TAHMO (r = 0.67), between TAHMO and 3D-PAWS (r = 0.96) and between manual and 3D-PAWS (r = 0.65) (Table 2 and Figure 9).
Figure 8. Windrose for 3D-PAWS (left panel) and KMD (right panel)
Figure 9. Correlation for hourly surface pressure between manual station and TAHMO AWS observations (left) and between the TAHMO and 3D-PAWS (right) and fitted with a linear regression line at the 95% confidence interval.
Table 1. Shapiro-Wilk normality test for comparisons between manual station, TAHMO and 3D-PAWS meteorological variables.
Table 2. Summary of correlation coefficients for Dagoretti minimum and maximum temperatures between manual, TAHMO AWS and 3D-PAWS observations.
4. Discussion and Conclusions
Rainfall and other meteorological parameters such as temperature are invaluable for not only the monitoring and forecasting of the weather but also management of climate related disasters. However, sparseness of observation stations for measurements and collection of these data especially in sub-Saharan Africa and other developing countries is a major challenge. National Meteorological and Hydrological Services globally have been the main meteorological data collecting institutions. There is a need for enhancement of the observation network, through establishment of more automatic weather stations in most countries, as well as improved monitoring and prediction of weather and climate patterns.
Our study of manual and AWS data sets has demonstrated that the two modes of weather observation compare well and can therefore guide decision making. The analyses of meteorological data and comparisons between the manual station and AWSs revealed considerable agreement between most of the weather parameters in spite of the low correlations found for the rainfall and wind observations compared to the other variables (Table 2). This is in agreement with other findings that have shown that meteorological parameters measured from a ground station can compare relatively well with other observations from a reference source, e.g., an AWS (Dombrowski et al., 2021).
Whereas the surface pressure was highly correlated between the three stations, some studies have found uncertainties in the comparisons between ground station pressure and the reference data (e.g., Dombrowski et al., 2021). The high variability between some meteorological variables at different stations could be due to several factors such as instrument error, change of location, damage, or lack of station maintenance (Ford et al., 2020).
Overall, comparison of the manual station data and the TAHMO and 3D-PAWS observations showed that there was potential for concurrences between the different variables even at some small spatial co-locations of the different measurement instruments (Table 2).
Despite the strong correlations between the different variables, anomalies were present when assessing some parameters such as wind speed and solar radiation due to complexities in aggregating such observations between the manual and automatic stations. This provides a challenge where such parameters may be required to complement monitoring and forecasting of the weather.
Based on our findings, the different meteorological parameters compared reasonably well between the three stations. The correlation coefficients between the parameters from the manual station and the AWSs were within acceptable levels and can be used as a basis for validation and application of the data in forecasting and other uses. The findings also indicate that observations from the 3D-PAWS and TAHMO stations compared well with each other and with the manual station when both AWSs were in operation.
In conclusion, manual station datasets can be used alongside observations from AWS after adequate assessment of the quality and agreements between the data sets have been done.
We thank the WMO HIGHWAY project for facilitating the two workshops that enabled the organisation and analysis of the datasets, and we recognize the support of UK Aid (UK Met Office). We would also like to thank the Kenya Meteorological Department, Trans African Meteorological and Hydrological Observatories (TAHMO), and the University Corporation for Atmospheric Research/National Centre for Atmospheric Research (UCAR/NCAR) for providing the data that were used in this study.
Figure S1. (A): Correlation matrix for minimum and maximum temperature for 3D-PAWS, Dagoretti (manual) and TAHMO; (B): Q-Q plots for minimum temperature for (a) TAHMO (b) Manual (Dagoretti) and (c) 3D-PAWS and for maximum temperature for (d) TAHMO (e) Manual (Dagoretti) and (f) 3D-PAWS; (C): Comparisons of daily rainfall (a), surface pressure at 0600Z (b), relative humidity at 0600Z (c), relative humidity at 1200Z, wind direction at 0600Z and wind speed at 0600Z between manual (Dagoretti) station and 3D-PAWS; (D): Comparisons of correlations for daily rainfall, surface pressure at 0600Z, relative humidity at 0600Z, relative humidity at 1200Z, wind direction and wind speed at 0600Z between manual (Dagoretti) and 3D-PAWS (RH_3D_12Z) stations; (E): Correlation matrix for daily rainfall, surface pressure at 0600Z, relative humidity at 0600Z and at 1200Z, wind direction and wind speed at 0600Z between manual (Dagoretti) and 3D-PAWS (RH_3D_12Z) stations; (F): Correlation for hourly surface pressure between manual station and 3D-PAWS observations fitted with a linear regression line at the 95% confidence interval.
 Ayasha, N. (2021). A Comparison of Rainfall Estimation Using Himawari-8 Satellite Data In Different Indonesian Topographies. International Journal of Remote Sensing and Earth Sciences (IJReSES), 17, 189-200.
 Berhane, F., & Zaitchik, B. (2014). Modulation of Daily Precipitation over East Africa by the Madden-Julian Oscillation. Journal of Climate, 27, 6016-6034.
 Dombrowski, O., Hendricks Franssen, H. J., Brogi, C., & Bogena, H. R. (2021). Performance of the ATMOS41 All-In-One Weather Station for Weather Monitoring. Sensors, 21, Article No. 741.
 Dupar, M., Weingärtner, L., & Opitz-Stapleton, S. (2021). Investing for Sustainable Climate Services: Insights from African Experience.
 Ford, T. W., & Quiring, S. M. (2019). Comparison of Contemporary in situ, Model, and Satellite Remote Sensing Soil Moisture with a Focus on Drought Monitoring. Water Resources Research, 55, 1565-1582.
 Ford, T. W., Quiring, S. M., Zhao, C., Leasor, Z. T., & Landry, C. (2020). Triple Collocation Evaluation of In Situ Soil Moisture Observations from 1200+ Stations as part of the US National Soil Moisture Network. Journal of Hydrometeorology, 21, 2537-2549.
 Liebscher, E. (2016). Approximation of Distributions by Using the Anderson Darling Statistic. Communications in Statistics—Theory and Methods, 45, 6732-6745.
 Marchant, R., Mumbi, C., Behera, S., & Yamagata, T. (2007). The Indian Ocean dipole—The Unsung Driver of Climatic Variability in East Africa. African Journal of Ecology, 45, 4-16.
 Rivoire, P., Martius, O., & Naveau, P. (2021). A Comparison of Moderate and Extreme ERA-5 Daily Precipitation with Two Observational Data Sets. Earth and Space Science, 8, e2020EA001633.
 Schumacher, V., Justino, F., Fernández, A., Meseguer-Ruiz, O., Sarricolea, P., Comin, A. et al. (2020). Comparison between Observations and Gridded Data Sets over Complex Terrain in the Chilean Andes: Precipitation and Temperature. International Journal of Climatology, 40, 5266-5288.
 Zeng, Q., Wang, Y., Chen, L., Wang, Z., Zhu, H., & Li, B. (2018). Inter-Comparison and Evaluation of Remote Sensing Precipitation Products over China from 2005 to 2013. Remote Sensing, 10, Article No. 168.