Rainfall is an essential boundary condition for the design and operation of urban drainage systems. Compared with natural catchments, detailed knowledge of the distribution of precipitation characteristics such as duration, depth, and intensity is more important for operating such systems because of their small spatial scales and short response times from rainfall to runoff. In Mumbai, severe floods occur almost every year and have caused considerable damage to property, networks, and infrastructure. Furthermore, the extreme rainfall event of 26 July 2005 demonstrated the high spatial variability of rainfall across different areas over a 24 h period  . It is therefore important to represent rainfall time series as discrete entities (rain events) that can be employed in specific applications  . Such individual rain events can also help overcome the lack of long-term precipitation data at a specific temporal and spatial resolution   .
A comprehensive review shows that rain gauge time series are usually broken down into separate rain events using a defined dry period between rains, referred to as the minimum inter-event time (MIT), which is often combined with a minimum event depth   . Dunkerley  noted that MIT values ranged from 3 min to 24 h, with minimum event depths ranging from 0.1 mm to 13 mm. Statistical techniques, such as autocorrelation analysis   , and multifractal and self-organized criticality theories  have also been used. Many researchers    have assumed that the probability density of inter-event times (IETs) can be adequately represented by an exponential distribution and have determined the MIT using the coefficient of variation (COV) method.
The objective of this research was therefore to study the spatial characteristics of individual rain events. To this end, individual events were identified at each station using MIT values calculated by the COV method. Once events are identified, numerous properties can be computed for each of them, such as total depth, total duration, mean rain rate, maximum rain rate, peak timing, previous inter-event time, fraction of event depth until the event maximum, and fraction of intra-event rainless periods. The interdependence between these properties and IETs can be studied to understand precipitation characteristics in different regions, and it has been analyzed by researchers in various countries and regions (e.g., Blue Nile River, Ethiopia,  ; Italy,  ; Illinois,  ; the Czech Republic,  ; Malaysia,   ; Australian drylands,  ). In this study, eight properties that have been widely used in previous studies were employed to characterize rainfall.
The properties used for rainfall characterization can further be employed to cluster the rain gauges. Researchers in various parts of the world, including India, have grouped rain gauges using techniques such as correlation analysis   , principal component analysis  , cluster analysis   , neural networks  and shared nearest neighbor (SNN) clustering  ; however, most of these studies were conducted at a national rather than a regional scale. Apart from these techniques, self-organizing maps (SOMs) are a promising tool for cluster analysis   . In this study, SOMs were used to cluster rain gauges into groups (regions) such that the similarity of rain gauges within a region is maximized while the similarity between regions is minimized.
The research presented in this paper is motivated by two observations: most studies of this region have used data from only two India Meteorological Department (IMD) stations (Colaba and Santacruz) to investigate the formation and prediction of extreme events and to estimate design rainfall, and, until now, there have been no reports on MIT, rain event property analysis, or rain gauge clustering using sub-hourly data. For example, Sherly et al.  proposed a framework for design precipitation estimation using a multivariate semiparametric approach, while Sen et al.  used a 43-year IMD rainfall dataset to perform on-site design rainfall estimation for quantifying the spatiotemporal variability used in  . Additionally, Nayak and Ghosh  used support vector machines and other machine-learning-based statistical techniques to predict extreme rainfall. Singh et al.  noted that the unusual spatiotemporal behavior of extreme rainfall might be related to various physical drivers, such as the Indian Ocean Dipole, the El Niño-Southern Oscillation along with the East Atlantic Pattern, and strong winds from the Arabian Sea carrying abundant moisture to the surrounding regions. Their results were based on a study of the spatiotemporal characteristics of extreme rainfall using two years of data from 26 rain gauges. Because previous studies have not focused on rain event properties, the present study investigates rain event properties and their interdependence with IETs, with Mumbai, India, as the study region. This study also seeks to answer the following questions:
1) What is the appropriate MIT for the entire Mumbai region using the COV method?
2) How can we analyze and understand the spatial characteristics of rainfall properties using SOMs?
3) Which rain gauges have similar properties, i.e., lie in the same cluster?
To answer the above questions, the sub-hourly data from a dense gauge network in Mumbai were analyzed. The paper is structured as follows. Section 2 provides a brief description of the study area, data source, and rain gauge characteristics. Section 3 describes the methodology used for the analysis of MIT, rain event properties and clustering using SOM. Section 4 presents the primary findings and their significance. Finally, Section 5 concludes the paper.
2. Study Area and Data Description
Mumbai is situated on the western coast of India and extends between 18.00˚ - 19.20˚N and 72.00˚ - 73.00˚E. It has a total land area of 468 km², with ~68 km² making up the city and 370 km² making up the suburbs  . The region has a humid tropical climate, with the southwest monsoon, which brings moisture from the Indian Ocean, lasting from June to September. Mumbai was originally a group of islands that have since been joined through land reclamation to meet the demand for space. As a result, the region has a complex topography comprising flatlands (at or below mean sea level), urban areas, and hilly areas [Sanjay Gandhi National Park, located in the northern part, reaches an elevation of up to 450 m above mean sea level (a.m.s.l.)].
Sub-hourly precipitation data (Rd, 15-min interval) were acquired from the Municipal Corporation of Greater Mumbai (MCGM). These data were collected from 60 rain gauges for the southwest monsoon period (June-September) from 2006 to 2016. After accounting for missing data, newer installations, and the reliability, consistency, and operation period of each rain gauge, data from 47 stations were retained for the present study. Figure 1 shows the unique codes and locations of the gauges overlaid on Shuttle Radar Topography Mission (SRTM) elevation data, and Table 1 presents the basic characteristics of the rain gauges at the meteorological stations.
3. Methodology
This section describes the methodology used to achieve the objectives of this study.
3.1. Data Preparation and Determination of MIT
In this study, continuous rainfall data recorded by the rain gauges at the meteorological stations were analyzed on a sub-hourly basis. To ensure the reliability of the data and to identify suspect or incorrect values, a validation method was applied. In the validation process, the data were not modified but were supplemented with an appropriate flag, namely “Suspect” or “Missing”, followed by the application of a range test  and a double mass curve. Only stations with more than three years of data were considered. The entire procedure resulted in the inclusion of 47 stations, with record lengths varying from 4 to 11 years. To ensure a near-uniform spatial distribution of stations, records with minor gaps in the sub-hourly precipitation data (gaps of up to a few days) were retained.
Figure 1. Unique codes and locations of rain gauges at meteorological stations throughout Mumbai with SRTM.
Table 1. Basic characteristics of rain gauges at meteorological stations throughout Mumbai.
Precipitation is recorded using a rain gauge, which measures the amount of water precipitated by clouds that reaches the ground. The precipitation records can then be grouped into independent events based on their homogeneity. Separating independent events from a rain gauge time series is, however, subjective. For example, in Figure 2, groups A, B, and C can each be considered an independent event, or all three groups may be part of one event. Another possibility is to combine A and B into one event and treat group C as another. Hence, an MIT is required to determine the start and end times of such independent events: two storms separated by a rainless period shorter than the specified MIT are considered a single event, as shown on the left-hand side of Figure 2, whereas a rainless period equal to or longer than the MIT separates them into two events, as shown on the right-hand side. In the present study, the optimal MIT was estimated using the COV method proposed by Restrepo-Posada and Eagleson  and applied by several researchers. This method provides a simple check based on the COV of IETs: if the IETs are approximately exponentially distributed, their mean and standard deviation are equal, so the COV should be 1. The MIT is therefore varied systematically, and the value that yields COV = 1 is taken as optimal.
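The event separation and COV check described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that a single station's record is a list of 15-min depths (mm) with no missing values, measures gaps in whole time steps, and picks the first candidate MIT whose COV falls to 1 or below rather than interpolating to exactly COV = 1. All function names are hypothetical.

```python
import statistics

def dry_spells(depths):
    """Lengths (in time steps) of rainless runs that lie between two wet steps.
    Dry runs before the first or after the last wet step are ignored."""
    wet = [i for i, d in enumerate(depths) if d > 0]
    return [b - a - 1 for a, b in zip(wet, wet[1:]) if b - a > 1]

def split_events(depths, mit_steps):
    """Return (first_wet_step, last_wet_step) pairs, one per event.
    A dry gap shorter than mit_steps stays inside the current event;
    a gap of mit_steps or longer closes it."""
    wet = [i for i, d in enumerate(depths) if d > 0]
    if not wet:
        return []
    events = []
    start = prev = wet[0]
    for i in wet[1:]:
        if i - prev - 1 >= mit_steps:
            events.append((start, prev))
            start = i
        prev = i
    events.append((start, prev))
    return events

def cov_of_iets(depths, mit_steps):
    """COV (sample standard deviation / mean) of the IETs implied by a given MIT."""
    iets = [g for g in dry_spells(depths) if g >= mit_steps]
    if len(iets) < 2:
        return float("nan")
    return statistics.stdev(iets) / statistics.mean(iets)

def select_mit(depths, candidates):
    """First candidate MIT (in steps, tested in increasing order) with COV <= 1."""
    for m in sorted(candidates):
        c = cov_of_iets(depths, m)
        if c == c and c <= 1.0:  # c == c is False for NaN
            return m
    return None
```

With 15-min data, candidate MITs from 15 min to 24 h correspond to `range(1, 97)` time steps, and the selected value is multiplied by 15 to convert it to minutes.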
3.2. Self-Organizing Maps (SOM)
The essential information contained in each rain event should be extracted using a limited set of well-chosen properties. However, there is no standard or generally accepted set of properties that can precisely describe and summarize an event. Based on the literature, the properties most often used are duration, depth, mean rain rate, maximum rainfall intensity, and intra-event dry periods, and hydrological studies have used event peak intensity aggregated over different time steps     . We also include properties describing the fraction of intra-event rainless periods, the previous inter-event time, and the position of the peak within the event (time of peak and fraction of event depth until the event maximum), which are relevant to overland flow generation, runoff, and infiltration     . In this study, eight properties were obtained for each rain event derived from the 47 rain gauges; these are listed in Table 2.
Figure 2. Diagram illustrating the classification of rain events using inter-event times.
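Table 2 is not reproduced here, so the exact property definitions are not available. The sketch below assumes plausible definitions of the eight properties for one station's 15-min series, with events given as (first wet step, last wet step) index pairs such as an MIT-based split would produce. In particular, measuring peak timing at the midpoint of the peak step and including the peak step in the depth fraction are assumptions, and `event_properties` is a hypothetical name.

```python
def event_properties(depths, events, step_h=0.25):
    """Compute eight descriptors per event from a series of step depths (mm).

    events: (first_wet_step, last_wet_step) pairs in chronological order.
    step_h: time-step length in hours (0.25 h for 15-min data).
    """
    props = []
    prev_end = None
    for start, end in events:
        chunk = depths[start:end + 1]
        n = len(chunk)
        depth = sum(chunk)           # > 0 because events start and end on wet steps
        duration_h = n * step_h
        k = chunk.index(max(chunk))  # first step holding the event maximum
        props.append({
            "depth_mm": depth,
            "duration_h": duration_h,
            "mean_rate_mmh": depth / duration_h,
            "max_rate_mmh": chunk[k] / step_h,
            # peak-step midpoint as a fraction of event duration (assumed definition)
            "time_of_peak": (k + 0.5) / n,
            # share of event depth up to and including the peak step (assumed definition)
            "depth_frac_to_peak": sum(chunk[:k + 1]) / depth,
            # share of steps inside the event with no rain
            "dry_frac": sum(1 for d in chunk if d <= 0) / n,
            # dry time since the previous event ended (None for the first event)
            "prev_iet_h": None if prev_end is None else (start - prev_end - 1) * step_h,
        })
        prev_end = end
    return props
```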
The event characteristics at each station, described by the properties in Table 2, can be seen as vectors in a high-dimensional data space. We used the SOM, a flexible data-mining method for exploratory data analysis   , to analyze this high-dimensional data space for the studied rain gauges. A SOM is an unsupervised learning algorithm based on artificial neural networks that produces a low-dimensional representation of a high-dimensional input dataset. SOMs can be used for a variety of tasks in exploratory data analysis, such as clustering, data compression, non-linear projection, and pattern recognition.
A SOM comprises cells that are arranged on a regular grid. Each cell is represented by a d-dimensional weight vector and is connected to nearby cells by a neighborhood relation, which determines the structure, i.e., the topology, of the resulting SOM. The SOM is created through iterative training: in each step, input vectors corresponding to data samples in the data matrix are picked at random, and the distances between them and all weight vectors of the SOM are computed. The cell whose weight vector is nearest to a given input vector is that input vector's best-matching unit (BMU). After the BMU is found, its weight vector is updated so that the BMU and its neighbors move towards the input vector. The SOM was then trained on the whole dataset with the batch algorithm, which computes each weight vector as an average of the data samples, weighted by the neighborhood function evaluated at each sample's BMU.
Table 2. Variables characterizing rain events in this study.
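The batch update described above can be illustrated with a small pure-Python SOM. This is a sketch under simplifying assumptions, not the MATLAB tool used in the study: it uses a rectangular grid instead of a hexagonal lattice, a Gaussian neighborhood, one epoch per entry of `radii`, and a deterministic diagonal initialization as a crude stand-in for the PCA-based linear initialization common in SOM toolboxes. All names are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(weights, x):
    """Index of the best-matching unit: the cell whose weight vector is nearest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))

def init_weights(data, rows, cols):
    """Spread cells along the diagonal of the data bounding box (deterministic start)."""
    lo = [min(c) for c in zip(*data)]
    hi = [max(c) for c in zip(*data)]
    span = (rows - 1) + (cols - 1)
    weights = []
    for r in range(rows):
        for c in range(cols):
            t = (r + c) / span if span else 0.0
            weights.append([l + t * (h - l) for l, h in zip(lo, hi)])
    return weights

def batch_som(data, rows, cols, radii):
    """Batch SOM: in each epoch every cell's weight vector becomes the average of all
    samples, weighted by a Gaussian of the grid distance between the cell and each
    sample's BMU (BMUs are fixed at the start of the epoch)."""
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = init_weights(data, rows, cols)
    for sigma in radii:
        winners = [coords[bmu(weights, x)] for x in data]
        new = []
        for cell in coords:
            num = [0.0] * len(data[0])
            den = 0.0
            for x, win in zip(data, winners):
                h = math.exp(-sq_dist(cell, win) / (2 * sigma ** 2))
                den += h
                num = [a + h * v for a, v in zip(num, x)]
            # keep the old vector in the (unlikely) case of numerical underflow
            new.append([a / den for a in num] if den > 0 else weights[len(new)])
        weights = new
    return weights
```

A rough phase followed by a fine phase can be imitated by a shrinking radius schedule such as `[3, 2.5, 2, 1.5, 1] + [1] * 5`; the number of epochs here is illustrative only. Note that batch training uses no learning rate, which applies only to the sequential variant.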
In this paper, we used a SOM tool in MATLAB to create a SOM with the algorithm described by Vesanto et al.  . The training dataset comprises the eight properties together with three geographic coordinates (the x, y, and z positions) of each rain gauge. The final dataset has 11 attributes and about 23,000 data samples. Prior to training, the data were linearly normalized to the [0, 1] interval to prevent attributes with large values from having a disproportionate influence on the training. However, geospatial data have particular characteristics, such as processes that operate at all scales and gradual, fuzzy, or vague transitions, which raise some issues when the SOM algorithm is applied. These issues can be resolved using recommended approaches such as data pre-processing (normalization and attribute weighting) and geo-initialization   . Therefore, before the analysis, the data were normalized, and, to preserve the geospatial aspects, the x variable was scaled down using the ratio max(x)/max(y) and weighted by a factor of eight.
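The normalization and attribute-weighting step can be sketched as below. This is an illustration under stated assumptions: the column layout (eight properties, then x, y, z) and the function names are hypothetical, and because the text does not specify exactly how the max(x)/max(y) scaling was combined with normalization, the sketch only shows min-max scaling to [0, 1] followed by multiplying a chosen column by a weight.

```python
def minmax_normalise(rows):
    """Linearly rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def weight_columns(rows, weights):
    """Multiply selected columns by a factor; `weights` maps column index -> factor."""
    return [[v * weights.get(j, 1.0) for j, v in enumerate(r)] for r in rows]
```

With the hypothetical layout above, `weight_columns(minmax_normalise(data), {8: 8.0})` would give the x coordinate a weight of eight, so that geographic position has a stronger pull on the BMU search than any single event property.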
SOM training is usually performed in two phases: rough tuning and fine tuning. A relatively large initial learning rate (0.5) and neighborhood radius (3) are used in the rough-tuning phase, whereas smaller values (0.05 and 1, respectively) are used from the beginning of the fine-tuning phase. Before the algorithm is applied, several parameters need to be defined; together, they play an important role in the SOM algorithm because they can influence the results. The parameters used for training the SOM are summarized in Table 3. The MATLAB package proposed a map of size 49 × 15 for the data, based on a hexagonal lattice. However, for an easier presentation of the map, we built a smaller map of size 16 × 5. This increased the quantization error from 0.420 to 0.559 and reduced the topographic error from 0.126 to 0.052.
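The two quality measures quoted above can be computed as in the following sketch, which uses their usual definitions: the quantization error is the mean distance from each sample to its BMU, and the topographic error is the fraction of samples whose first and second BMUs are not adjacent on the map. The adjacency rule shown is a 4-neighbor rule for a rectangular grid; the hexagonal lattice used in the study would need a 6-neighbor rule instead. Function names are hypothetical.

```python
import math

def quantization_error(data, weights):
    """Mean Euclidean distance between each sample and its best-matching unit."""
    return sum(min(math.dist(x, w) for w in weights) for x in data) / len(data)

def topographic_error(data, weights, coords, adjacent):
    """Share of samples whose first and second BMUs are not neighbors on the grid."""
    errors = 0
    for x in data:
        order = sorted(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
        if not adjacent(coords[order[0]], coords[order[1]]):
            errors += 1
    return errors / len(data)

def rect_adjacent(a, b):
    """4-neighborhood on a rectangular grid of (row, col) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
```

A smaller map has fewer prototypes to cover the data, so samples lie further from their BMUs (higher quantization error), while neighboring prototypes are forced closer together in data space (lower topographic error). This is the trade-off reported for the 16 × 5 map.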
An ideal SOM analysis produces such clear results that the visualized maps can be reliably interpreted simply by looking at them, although an additional partitioning step that uses the SOM as an intermediate stage is often recommended to obtain more precise results   . In this study, common SOM visualizations are used, such as the U-matrix, component planes, the assignment of rain gauges to neurons, and the distribution of index properties represented by bar charts. Furthermore, the study uses hierarchical clustering, an unsupervised method, to cluster the SOM  . The approach starts with each data point as an individual cluster and, at each step, merges the closest pair of clusters until only one cluster remains. Hence, it is also known as the agglomerative approach, and it requires a definition of cluster proximity. In this investigation, cluster proximity is defined as the average pairwise proximity between all pairs of points in different clusters, i.e., the average group distance. The outcome is a tree-like diagram called a dendrogram, which shows both the cluster and sub-cluster relationships and the order in which the clusters were merged. The closeness of clusters is depicted by the lengths of the branches, and the data items can be clustered by cutting the dendrogram.
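A minimal sketch of the agglomerative, average-linkage procedure described above follows. It assumes that the points being clustered are the SOM weight vectors (prototypes) and that the dendrogram is cut at a chosen number of clusters k. It is written for clarity rather than speed, since it recomputes every pairwise cluster distance at each merge; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'` would normally be used. The function name is hypothetical.

```python
import math

def average_linkage(points, k):
    """Agglomerative clustering with average linkage, stopped when k clusters remain.
    Returns (labels, merges): a cluster label per point, and the merge history as
    (members_a, members_b, distance) tuples, i.e. the dendrogram up to the cut."""
    clusters = [[i] for i in range(len(points))]
    merges = []

    def avg_dist(a, b):
        # mean of all pairwise Euclidean distances between members of a and b
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((clusters[p], clusters[q], d))
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels, merges
```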
Table 3. Parameters for SOM analysis.
4. Results and Discussion
The results obtained in this study are presented and discussed in this section, which is further divided into three subsections.
4.1. Minimum Inter-Event Time (MIT)
The MIT values were estimated using the COV method. In this approach, candidate MIT values ranging from 15 min to 24 h were tested for all stations. For each MIT, events were identified and the COV of the IETs was computed. The results are shown in Figure 3, where COV decreases with increasing MIT. The MIT at which COV = 1 was determined for each station, and these values are listed in Table 1. The average MIT for the study area was 5 h, which appears reasonable given that MITs of less than 6 h have been suggested for urban applications    ; the minimum value was 2 h at F North station and the maximum was 8 h at Workshop Kandivali station. The average annual number of storm events during the southwest monsoon season in the study period at each station is also given in Table 1; it varies from 25 to 121.
4.2. Self-Organizing Maps (SOM)
The resulting dataset considered in this analysis comprises the eight properties together with three geographic coordinates (the x, y, and z positions) of each rain gauge. The final dataset has 11 attributes representing the characteristics of about 23,000 rain events at 47 rain gauges. Rain gauges with similar responses are grouped by training a SOM and applying hierarchical clustering to it.
Figure 3. Coefficient of variation against inter-event time for stations Andheri (a), Gawan Pada (b), Mulund (c), and Worli (d) in Mumbai.
The U-matrix (Figure 4(a)) reveals two distinct parts of the map: blue areas in the southwest part indicate units with a high level of similarity, which can be viewed as discrete clusters, while a column of red and orange cells at the center of the southern side separates these neurons from the rest of the map and forms a cluster border. Overall, the U-matrix suggests that there are at least four to five clusters in the data. The assignment of the 47 rain gauges to neurons on the SOM grid (Figure 4(c)) indicates that gauges mapped to the same neuron can be considered to behave comparably and form the smallest unit of a cluster; in this way, the SOM reduced the 47 rain gauges to 40 neurons. Dendrogram-based hierarchical clustering was then applied to the SOM to reduce the number of clusters further. Combining this with the assignment of rain gauges to neurons (Figure 4(c)), the rain gauges can be grouped into six meaningful clusters, C1 to C6. The spatial distribution of the clustered rain gauges (Figure 5) shows that rain gauges with similar response behavior are grouped together.
A marginal rain gauge is defined as a rain gauge that is assigned to a neuron lying, at no more than a medium distance, next to a neuron that belongs to a different cluster. As a definitive test for marginal rain gauges, we used their quantization errors to the BMU and to the nearest neuron of the neighboring cluster. Rain gauges with similar quantization errors to neurons in different clusters can be regarded as belonging to both neurons, and thus possibly to both clusters. With this strategy, we refined the clustering and areal grouping of the rain gauges (Figure 5) and, based on their response behavior, identified three marginal cases (rain gauges M-05, M-12, and M-38). These rain gauges belonged to neurons at the edge of a cluster with a medium distance to a neighboring neuron of another cluster. Among their first four BMUs, there was a frequent alternation between the two clusters, with only slightly increasing quantization errors. We therefore assigned each of these rain gauges to two clusters: M-05 to C1 and C2, M-12 to C5 and C6, and M-38 to C1 and C2.
Figure 4. Representations of a SOM: (a) U-matrix: neurons of the SOM are labeled by numbers; structures are formed by visualizing the distances between neighboring neurons, and the additional hexagons between neurons show medium distances between two neurons; (b) component planes for each index, displaying the mean values of each property on the neurons of the SOM; (c) assignment of rain gauges to neurons; gauge IDs correspond to their BMU.
Figure 5. Location and clustering of each rain gauge using SOM.
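The marginal-gauge check can be sketched as follows. The text describes the criterion qualitatively (similar quantization errors to neurons of different clusters among the first four BMUs), so the numeric tolerance `rel_tol` below is a hypothetical choice, not a value from the study, and `marginal_clusters` is a hypothetical name. The sketch assumes a gauge is represented by a single feature vector and that each neuron already carries a cluster label from the dendrogram cut.

```python
import math

def marginal_clusters(x, weights, neuron_cluster, n_bmus=4, rel_tol=0.25):
    """Clusters reached by the n_bmus nearest neurons whose quantization error is
    within rel_tol (relative) of the BMU's error. More than one cluster in the
    result marks the gauge as marginal between those clusters."""
    ranked = sorted(range(len(weights)), key=lambda i: math.dist(x, weights[i]))[:n_bmus]
    best = math.dist(x, weights[ranked[0]])
    clusters = []
    for i in ranked:
        qe = math.dist(x, weights[i])
        if qe <= best * (1 + rel_tol) and neuron_cluster[i] not in clusters:
            clusters.append(neuron_cluster[i])
    return clusters
```

A gauge for which this returns two labels, such as `["C1", "C2"]`, would be assigned to both clusters, mirroring the treatment of M-05, M-12, and M-38.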
4.3. Cluster Analysis
In this study, we assumed that rain gauges that behave similarly are grouped together by training a SOM and applying hierarchical clustering. Using the information from the component planes (Figure 4(b)) and the distribution of properties for each neuron, each cluster can be characterized by a unique combination of rain event characteristics, as listed in Table 4.
Table 4. Characteristics of clusters C1 to C6 based on rain event properties.
Note: ++, very high; +, high; , medium; −, low; −−, very low.
These clusters are discussed below:
Cluster 1 (C1) consists of 14 stations located near the Arabian Sea, mainly covering the southwestern area of Mumbai. It is characterized by high-intensity, low-depth, and short-duration events. A large number of storms were observed in this area, as reflected by its low previous IET. The component planes (Figure 4(b)) of the fraction of event depth until the event maximum and of the time of peak show that events in this area deliver most of their rain before reaching high intensity, with peaks in the second or third quartile of the event. This may be attributed to the southwest monsoon winds, which carry substantial moisture inland from the Arabian Sea. Because of this high moisture, the region experiences heavy monsoonal rainfall and has a typical tropical monsoon climate.
Cluster 2 (C2) comprises nine sites; it covers the southwest and extends further inland. This cluster represents a lowland urban area with high-rise buildings and is characterized by long-duration, very high-intensity, high-depth events. Rainfall in this region peaks in the first quartile of the event duration (low time-of-peak values), with most of the rainfall occurring after the peak. This may be due to the high-rise buildings in this region.
Cluster 3 (C3) consists of four sites located between the Sanjay Gandhi National Park and the Chembur hills. Although this cluster is distinct from cluster 2, it shares some of its characteristics, such as very high intensity and high rainfall depth, but its events are of average duration. This cluster does not present a clear picture of the rainfall peaks within an event, which may be due to the funneling action of the hills.
Cluster 4 (C4) is located predominantly in the western part of the city, between the Arabian Sea and the Sanjay Gandhi National Park. It is characterized by low previous IET, high precipitation intensity, and long-duration, average-depth events, with maximum intensity during the first quartile of the event. Furthermore, the fraction of event depth until the event maximum varies between 0.4 and 0.7 in this region, which indicates that most storm events receive similar amounts of rainfall before and after the high-intensity peak; this may be attributed to high wind gusts caused by the funneling action of the hills.
Cluster 5 (C5) consists of 10 sites located on the flatland between the Arabian Sea and the windward side of the Sanjay Gandhi National Park, predominantly in the northern part of the city. The characteristics of this cluster are similar to those of cluster 3, except that this cluster has a very low IET and a high time-of-peak value.
Cluster 6 (C6) comprises four sites located predominantly in the northeastern part of the Sanjay Gandhi National Park, on the leeward side of the hills. The results show that this region experiences the least rainfall in terms of frequency, duration, and intensity (see Table 4). Additionally, the low values of Rmax, the fraction of event depth until the event maximum, and depth indicate that events in this region are smooth, with few sharp peaks. This may be due to the shadowing effect of the hills, which decreases the rainfall amount.
The results confirm the effect of Mumbai's complex topography, namely the flatland near the Arabian Sea, high-rise buildings (urban area), and mountain and hill areas (the Sanjay Gandhi National Park in the northern part), on the spatial variation of rainfall. The results show that the rain gauges in cluster 2 (C2), located in the urban pocket, received intense precipitation. This supports the findings of Paul et al.  , who showed that the urban signature of extreme precipitation is reflected in the rainfall recorded at stations only when the stations are located within urban pockets affected by intense precipitation. We did not regroup the sites further, for the following reasons:
1) To avoid losing valuable precipitation information, the dissimilar sites (M-10, M-27, and M-35) were not excluded from the analysis.
2) Extraordinary events were not excluded, since we consider evaluating them separately to be an unacceptable hindrance to the process.
5. Conclusions
The primary objective of this research was to study the spatial characteristics of rain events in Mumbai. This was achieved by training a SOM on 11 attributes (eight rain event properties and three geographic coordinates) derived from sub-hourly rain gauge data for 2006-2016. The main conclusions are as follows:
1) This study emphasizes the need for event analysis over traditional sample analysis and demonstrates the advantages of event analysis when data availability is limited. Sample analysis with long integration times (hours or days) usually mixes rainy and clear-sky periods, as well as distinct physical processes of the rainfall event, within the same sample. Conversely, data acquired over short intervals (minutes) are sensitive to the sensor's characteristics (detection threshold, sensor area, and noise). By analyzing rain events instead, the relationships between individual rain event properties can be identified.
2) The various properties derived from rain gauge data can be used to characterize precipitation in rainfall time-series studies, which in turn support research on topics including meteorology, hydrology, and climate.
3) The combination of the selected properties describes rain events in depth by providing a relatively accurate summary. These properties are total depth, total duration, mean rain rate, maximum rain rate, peak timing, previous inter-event time, fraction of event depth until the event maximum, and fraction of intra-event rainless periods.
4) The dependence between event depth and previous IET confirms that, for events interrupted by rainless periods, independent rain events cannot be considered an alternative to the rainfall time series in forecasting.
5) The results show that the gauges can be clustered into six distinct groups using SOM. Furthermore, the clusters confirm the spatial variation of rainfall as a result of the complex topography of Mumbai, namely, the flatland near the Arabian Sea, high-rise buildings (urban area), and mountain and hill areas (Sanjay Gandhi National Park located in the northern part).
The limitations of the study are as follows:
1) The uncertainties associated with the MIT selection criteria have not been addressed in the present analysis; this is a potential area of future research.
2) The cluster hypotheses need to be further tested using the climatological data (temperature, wind speed, wind direction, etc.) measured by MCGM at the 47 stations and radar data obtained from the India Meteorological Department, Mumbai; this is a potential area of future research.
3) As this study was based on observations made in Mumbai, a coastal city with complex orography, the general validity of these hypotheses remains to be confirmed through studies across different cities; this is a potential area of future research.
The authors thank the MCGM for providing the rain gauge data and the India Meteorological Department, Mumbai, for their comments and suggestions, which significantly contributed to improving the clarity of the paper.