Fossil fuels such as crude oil, coal and gas are currently the world’s main sources of energy. Fossil fuel reserves are expected to be depleted in the near future, with studies estimating that oil and gas could run out as fuels as soon as 35 to 37 years from the time of writing.
To slow the depletion of fossil fuels, renewable sources of energy are being adopted as alternatives. Solar energy is one of the alternatives being researched, because the solar radiation reaching the Earth’s surface is several orders of magnitude larger than humanity’s global power consumption.
Accurately predicting incoming solar radiation is important for improving the efficiency of a solar energy conversion system. One method of prediction is the empirical model, a technique that uses meteorological parameters as inputs to predict future values of solar radiation.
The main shortcomings of the empirical model are its focus on long-term prediction, its reliance on existing meteorological data, and its inability to identify abnormalities or account for sudden changes in the data.
This paper proposes an alternative to empirical models: a combination of pattern recognition through data clustering techniques and data modelling through artificial neural networks. Clustering the data yields an organized set of inputs with which the neural network can be trained more accurately for prediction. A focused time-delay neural network is then applied to each cluster.
2. Perceptually Important Point (PIP)
Perceptually Important Point (PIP) is a concept introduced by Chung, Fu, Luk and Ng. The purpose of PIP is to define the shape of a graph using its “critical points”: points that mark a change in the trend of the graph. The graph can be divided into clusters separated by the PIPs, which allows data to be grouped by trend similarity.
The process behind identifying PIPs is as follows:
1) The start and end of the graph are set as the starting and end points.
2) The point on the graph with the greatest vertical distance from the straight line joining the starting and end points is set as the first PIP. It also serves as the end point of the first cluster and the start point of the second cluster.
3) Subsequent PIPs are obtained by finding, within each cluster, the point with the greatest vertical distance from the line joining that cluster’s start and end points, forming new clusters.
Algorithmically, the process behind segmenting the graph through PIPs can be explained as follows (Figure 1).
Points P1 and P2 on the time-series are established, and the straight line (gradient) between the two points is obtained.
A point Pc on this line and a point Pn on the time-series with the same x-axis value are obtained. The initial x-axis value is (x1 + 1).
Figure 1. Finding first point.
The difference between the y-axis values of Pn and Pc is obtained and set as the distance d.
The x-values of Pc and Pn are incremented by one step and d is calculated again. If the new value of d is greater than the previous value, the new value is stored. Otherwise, the new value is discarded and the old value of d is kept as the greatest vertical distance.
The x-values of Pc and Pn are incremented repeatedly until P2 is reached.
The time-series can then be segmented into cluster 1 (data from P1 to Pn) and cluster 2 (data from Pn to P2), where Pn is the point with the greatest distance d (Figure 2).
The process is repeated recursively for cluster 1 and cluster 2 to obtain points Pm and Pl, which segment the time-series into 4 clusters: P1 to Pm, Pm to Pn, Pn to Pl and Pl to P2.
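The procedure above can be sketched in MATLAB as follows. This is a minimal illustration, not the authors' implementation: the function name findPips, the depth argument (number of recursive splits), the use of the absolute difference for d, and the sample series are all assumptions introduced here. The series is assumed to be sampled at unit steps on the x-axis, and a local function in a script requires MATLAB R2016b or later.

```matlab
% Sketch only: recursive PIP search by greatest vertical distance.
% The series below is made up for illustration, not measured data.
y = [0 1 3 7 12 14 13 9 4 2 0];
pips = findPips(y, 1, numel(y), 2);   % 2 levels of recursion (assumed)
disp(pips)                            % for this series: 3 6 7

function idx = findPips(y, i1, i2, depth)
    % Returns the indices of the PIPs strictly between i1 (P1) and i2 (P2).
    y = y(:).';                       % force a row vector
    idx = [];
    if depth == 0 || i2 - i1 < 2
        return                        % no interior point left to test
    end
    x  = (i1 + 1):(i2 - 1);           % x-values of the candidate points Pn
    % Points Pc on the straight line from P1 to P2 at the same x-values
    yc = y(i1) + (y(i2) - y(i1)) .* (x - i1) ./ (i2 - i1);
    d  = abs(y(x) - yc);              % vertical distance d for each candidate
    [~, k] = max(d);                  % greatest d; the first maximum wins ties
    n  = x(k);                        % index of the PIP for this segment
    % Recurse into cluster 1 (P1 to Pn) and cluster 2 (Pn to P2)
    idx = [findPips(y, i1, n, depth - 1), n, findPips(y, n, i2, depth - 1)];
end
```

With depth set to 2, the sketch returns the three points that the text calls Pm, Pn and Pl, in ascending order of x. Computing all distances at once with max gives the same result as stepping through the x-values one at a time and keeping the largest d seen so far.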
3. FTDNN Using MATLAB
The algorithm for training and using a Focused Time-Delay Neural Network (FTDNN) can be obtained from MathWorks’ Neural Network Toolbox. FTDNN was selected as the neural network of choice for its speed relative to other forms of neural networks.
All values or names within angle brackets (<>) are subject to change as per user specifications.
First, the cluster dataset has to be loaded and converted into a time sequence using the following commands:
y = y(1:
y = con2seq(y);
The FTDNN is then created using the following commands, with the tapped delay lines, hidden layer neurons and number of epochs being variable depending on optimal parameters:
Figure 2. Finding subsequent points.
Prediction begins at the first value in the series after the delay, so the initial values that fill the delay also have to be loaded:
p = y(
t = y(
i = y(1:
The network is then trained to perform one-step-ahead prediction:
The network is then ready for use through the calling of the network as a function:
The resulting prediction can subsequently be converted for plotting:
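The steps in this section can be sketched end to end as follows. This is a hedged illustration rather than the authors' script: the delay length, hidden layer size, epoch count and the synthetic input series are placeholder assumptions, and the variable Pi stands for the initial delay values (named i in the text). The calls con2seq, timedelaynet, train and cell2mat are functions of MATLAB and its neural network toolbox, which must be installed.

```matlab
% Sketch only: one-step-ahead FTDNN workflow with assumed settings.
numDelays = 8;                        % tapped delay line length (assumed)
numHidden = 10;                       % hidden layer neurons (assumed)
numEpochs = 1000;                     % number of training epochs (assumed)

y = sin(0.1 * (1:600)) + 1;           % stand-in series, not measured data
y = con2seq(y);                       % numeric row vector -> time sequence

net = timedelaynet(1:numDelays, numHidden);  % create the FTDNN
net.trainParam.epochs = numEpochs;
net.divideFcn = '';                   % use every sample for training

p  = y(numDelays + 1:end);            % inputs: the series after the delay
t  = y(numDelays + 1:end);            % targets: same samples, predicted from the past
Pi = y(1:numDelays);                  % initial values that fill the delay line

net = train(net, p, t, Pi);           % train for one-step-ahead prediction
yp  = net(p, Pi);                     % call the network as a function

ypMat = cell2mat(yp);                 % convert the prediction for plotting
plot(1:numel(ypMat), cell2mat(t), 1:numel(ypMat), ypMat);
legend('actual', 'predicted');
```

Because the input delays start at 1, each output is computed only from earlier samples, so passing the same sequence as input and target trains the network to predict the next value. Training starts from random initial weights, so the predictions differ slightly from run to run.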
Using the PIP method described earlier, 3 points are obtained. Due to the recursive nature of the PIP algorithm, an odd number of points will always be obtained.
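A short count makes this concrete, under the assumption that every cluster is split exactly once at each level of recursion. The symbol k, the number of recursion levels, is a label introduced here and does not appear in the original text.

```latex
N_{\mathrm{PIP}}(k) = \sum_{j=0}^{k-1} 2^{j} = 2^{k} - 1,
\qquad
N_{\mathrm{cluster}}(k) = 2^{k}
```

For k = 2 this gives the 3 points and 4 clusters described in Section 2. If the recursion stops early in some clusters, the count need not follow this formula.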
For this paper, all solar radiation data is obtained from the Geography Weather Station of the National University of Singapore.
As shown in Figure 3, the point (705, 784) is a PIP, indicating a turning point in the graph. In the context of Figure 3, (705, 784) indicates that at the 705th minute of the day the incoming solar energy is 784 Wh/m².
The first cluster is selected to be the data from minute 405 to minute 705. The point at minute 480 is ignored to reduce the number of clusters, because the trends before and after minute 480 are similar.
The next 17 readings are used to form cluster 2, and the subsequent 20 readings are used to form cluster 3. Readings before cluster 1 and after cluster 3 are ignored because they are approximately zero: they indicate that no solar radiation is incident during that period, so they are not required as training data.
Figure 3. Executing PIP on 1st August 2017 solar radiation data.
Figure 4 shows a visual representation of the actual data (blue) plotted against the predicted data (orange). The prediction is obtained using 1 day of training data as input to the network, to predict a day’s worth of solar radiation. Visually, the two sets of data are fairly close.
Using the mean absolute percentage error (MAPE), it can be observed that for most of the readings the percentage difference between the forecast and actual values is below 10%. Values close to 0 are ignored, because small actual values cause large deviations in the percentage error (Figure 5).
Figure 6 shows the result of utilizing the FTDNN on all 3 clusters individually, with 15 days of training data used as the input for each cluster.
Figure 4. FTDNN on cluster 1.
Figure 5. Percentage differences in readings of cluster 1.
Figure 6. FTDNN on 3 clusters combined.
The resulting adjusted MAPE values of clusters 1, 2 and 3 are 3.890%, 4.129% and 1.180% respectively.
While MAPE does not portray the accuracy of the prediction exactly, it is sufficient to show a basic level of competency of the network in performing predictions.
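As a hedged illustration of the error measure, the sketch below computes the standard MAPE and one possible form of the adjusted MAPE. The text does not define the adjustment beyond ignoring values close to 0, so the cut-off minLevel is a hypothetical parameter, and the readings are made-up numbers rather than results from this paper.

```matlab
% Sketch only: MAPE and a thresholded ("adjusted") MAPE on made-up readings.
actual   = [0.5 120 430 784 610 250 2];   % hypothetical actual values
forecast = [3.0 126 415 770 640 240 9];   % hypothetical forecast values
minLevel = 10;                            % hypothetical cut-off for "close to 0"

% Absolute percentage error of each reading
ape = 100 * abs((actual - forecast) ./ actual);

mapeAll = mean(ape);                      % standard MAPE over all readings
keep    = abs(actual) >= minLevel;        % drop readings close to 0
mapeAdj = mean(ape(keep));                % adjusted MAPE over the kept readings

fprintf('MAPE = %.2f%%, adjusted MAPE = %.2f%%\n', mapeAll, mapeAdj);
```

The first and last readings show why the adjustment matters: an absolute error of a few units is a very large percentage of an actual value near zero, and it dominates the unadjusted mean.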
In this paper, a system combining the use of data clustering and neural networks is proposed to optimize the prediction of solar radiation.
The paper provides a fundamental level of knowledge on Perceptually Important Points and the focused time-delay neural network that was used in the project.
The methodologies discussed are not limited to the prediction of solar radiation; they can also be applied more generally in other fields, such as the prediction of stock market trends or of water current strength for turbines.
De Soete, G. and Carroll, J. (1994) K-Means Clustering in a Low-Dimensional Euclidean Space. In: Diday, E., Lechevallier, Y., Schader, M., Bertrand, P. and Burtschy, B., Eds., New Approaches in Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-51175-2_24
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K. and Lang, K. (1989) Phoneme Recognition Using Time Delay Neural Networks. IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 328-339. https://doi.org/10.1109/29.21701
Fu, T.C., Chung, F.L., Ng, V. and Luk, R. (2001) Evolutionary Segmentation of Financial Time Series into Subsequences. Proceedings of 2001 Congress on Evolutionary Computation, Seoul, 27-30 May 2001, 426-430.