Electricity is one of the essential resources for human activity, and power plants are built to supply communities with the electricity they need. The power delivered by a plant fluctuates through the year for many reasons, including environmental conditions. Accurate analysis of thermodynamic power plants with mathematical models requires a large number of parameters and assumptions in order to represent the unpredictability of the actual system. Instead of mathematically modelling the system’s thermodynamics, machine learning approaches can be used. One such approach is the artificial neural network (ANN). Because ANNs can capture nonlinear relationships, the environmental conditions are taken as the model inputs and the generated power as the model output. With such a model, the output power of the plant can be predicted from the environmental conditions.
Artificial neural networks (ANNs) were originally proposed in the mid-20th century as a computational model of the human brain. Their use was limited by the computational power available at the time and by some unsolved theoretical problems. However, they have been increasingly studied and applied now that greater computational power and large datasets are available. In a typical modern power plant, a large amount of parametric data is stored over long periods of time; therefore, a large database of operational data is always available at no additional cost.
Researchers have used ANNs to model a wide variety of engineering systems. Many have reported the feasibility and reliability of ANN models as simulation and analysis tools for power plant processes and components. Relatively few studies have considered the steam turbine (ST) in a combined cycle power plant (CCPP). In one study, the total power output of a cogeneration power plant with three gas turbines, three HRSGs and one steam turbine was predicted. Niu studied the control strategy of the gas turbine in a combined cycle power plant using a linearization model technique. Samani used two consecutive artificial neural networks to model a combined cycle power plant, with relative humidity, atmospheric pressure, ambient temperature and the exhaust vacuum of the steam turbine as inputs. The exhaust steam pressure alone is a function of ambient conditions and is not a deterministic parameter. Tüfekci and Kaya compared various machine learning methods for predicting the full load electrical power output of a base load operated combined cycle power plant. In this work, a regression ANN model is studied in detail.
1.1. Combined Cycle Power Plant (CCPP)
One kind of power plant is the combined cycle power plant (CCPP), which is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators (HRSG) (Figure 1).
Figure 1. Combined Cycle Power Plant (CCPP) diagram.
In a CCPP, electricity is generated by gas and steam turbines that are combined in one cycle. A combined-cycle power plant produces up to 50 percent more electricity from the same fuel than a traditional simple-cycle plant by routing the waste heat from the gas turbine to the nearby steam turbine, which generates extra power. The mechanism of a combined cycle power plant can be stated as follows:
1) Fuel burns in the gas turbine, which makes the turbine blades spin and drive an electricity generator.
2) The heat recovery steam generator (HRSG) captures exhaust heat from the gas turbine, uses it to create steam, and delivers the steam to the steam turbine.
3) The steam turbine uses the steam delivered by the HRSG to generate additional electricity by driving an electricity generator.
Gas turbine load is sensitive to the ambient conditions, mainly ambient temperature (AT), atmospheric pressure (AP), and relative humidity (RH). Steam turbine load, however, is sensitive to the exhaust steam pressure (or vacuum, V).
Combined cycle power plants (CCPPs) have a higher fuel conversion efficiency than conventional power plants, i.e. they consume less fuel to produce the same amount of electricity, which results in a lower power price and lower emissions to the environment.
1.2. ANN Definition
Artificial neural networks (ANNs), or connectionist systems, are a computational model used in computer science and other research disciplines. The model is based on a large collection of simple neural units (artificial neurons), loosely analogous to the observed behavior of a biological brain’s axons. Each neural unit is connected with many others, and links can enhance or inhibit the activation state of adjoining neural units. Each individual neural unit computes its output using a summation function. There may be a threshold function or limiting function on each connection and on the unit itself, such that the signal must surpass the limit before propagating to other neurons. These systems are self-learning and trained, rather than explicitly programmed, and excel in areas where the solution or feature detection is difficult to express in a traditional computer program. The training of an ANN starts with random weights, which are then adjusted iteratively to minimize the error.
1.3. ANN Advantages
A major advantage of ANNs is that they are non-parametric models, whereas most statistical methods are parametric models that require a stronger background in statistics. In addition, and most importantly, ANNs easily handle highly non-linear modelling. However, an ANN is a black-box learning approach: it cannot interpret the relationship between input and output and cannot deal with uncertainties.
1.4. Regression ANN
Neural networks are good at fitting functions. In fact, there is proof that a fairly simple neural network can fit any practical function  .
1.5. ANN in MATLAB
Neural Network Toolbox™ provides algorithms, functions, and apps to create, train, visualize, and simulate neural networks. It includes algorithms and tools for regression, pattern recognition, classification, clustering, deep learning, time series and dynamic systems, and many others, which cover the usage of ANN models. The workflow for the neural network design process has seven primary steps:
1) Collect data
2) Create the network
3) Configure the network
4) Initialize the weights and biases
5) Train the network
6) Validate the network
7) Use the network
Some of these steps can be done automatically using default values and settings in the toolbox; however, the user can also set every detail manually. The Neural Network Toolbox™ software can be used at four different levels.
The first level is represented by the GUIs. These provide a quick way to access the power of the toolbox for many problems of function fitting, pattern recognition, clustering and time series analysis. In addition, a MATLAB code script (.m file) can be generated at the desired level of detail, reproducing the settings used in the network study.
The second level of toolbox use is through basic command-line operations. The command-line functions use simple argument lists with intelligent default settings for function parameters. (Users can override all of the default settings, for increased functionality.)
The third level of toolbox use is customization of the toolbox. This advanced capability allows the user to create custom neural networks while still having access to the full functionality of the toolbox.
The fourth level of toolbox usage is the ability to modify any of the code files contained in the toolbox. Every computational component is written in MATLAB® code and is fully accessible.
1.6. Regression ANN in MATLAB
Regression (Fit Data) in the Neural Network Toolbox in MATLAB can be accessed using a GUI or command-line functions. Two GUIs can be used to design and train the network:
1) Neural Network tool (nntool), the general neural network tool, which offers full control of settings. With this GUI, the user can design any type of neural network, not only a regression ANN.
2) Neural Fitting tool (nftool), which leads the user through solving a data fitting problem with a two-layer feed-forward network trained with Levenberg-Marquardt or scaled conjugate gradient back-propagation. It has a limited set of options. The user can select data from the MATLAB® workspace or use one of the example datasets. After training, the network's performance is evaluated using mean squared error and regression analysis, and the results can be analyzed further using visualization tools such as a regression fit or a histogram of the errors. The user can then evaluate the performance of the network on a test set.
1.7. Aim of the Study
The aim of this work is to apply a feed-forward artificial neural network (ANN) and to examine the effects of its various options when it is used to obtain a regression model that predicts the electrical output power (EP) of a combined cycle power plant from four inputs. More specifically, this work uses the MATLAB Neural Network Toolbox to study the stochastic behavior of the regression neural network, the effect of the number of neurons in the hidden layer, the effect of the size of the training data subset, the effect of the number of input variables, the results of different training functions, data preprocessing, and the statistical error.
2. Method Description
In this study, the MATLAB Neural Network Toolbox is used; the dataset was obtained from a freely available source. Throughout this paper, the terms Test and Performance have the following meanings:
1) Test: refers to testing on the whole dataset (9568 observations), which gives more realistic results.
2) Performance: refers to the mean squared error (MSE).
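The performance measure defined above can be written out in a few lines. The following is a minimal sketch in plain Python rather than MATLAB; the function name `mean_squared_error` and the sample values are illustrative assumptions and are not taken from the dataset.

```python
def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and network outputs."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    if not targets:
        raise ValueError("at least one observation is required")
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)


# Illustrative values only (MW); not taken from the dataset.
targets = [450.0, 460.0, 470.0]
outputs = [452.0, 457.0, 471.0]
mse = mean_squared_error(targets, outputs)
print(f"MSE = {mse:.3f} MW^2")
```

Because the errors are squared, the MSE is expressed in MW² when the output power is in MW.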
The main scheme of this study is to compare the networks obtained with different option settings. The comparison is always between the performances of the networks on the whole dataset (the Test dataset). The following subsections describe the data, show how the training sub-dataset is obtained, identify the features (inputs) to be studied, discuss data normalization and show the selection of the neural network structure size.
2.1. Data Overview
The dataset was obtained from an online source. It contains 9568 data points collected from a combined cycle power plant over 6 years (2006-2011), when the power plant was set to work at full load. The features consist of the hourly average ambient variables temperature (AT), ambient pressure (AP), relative humidity (RH) and exhaust vacuum (V), which are used to predict the net hourly electrical energy output (EP, labelled PE in the figures and tables) of the plant. The dataset features are summarized in Table 1.
Figure 2 shows the relationship between each of the variables (AT, AP, RH, and V) and the output power (EP), with a linear regression line in each chart.
Table 1. Features (dataset variables) summary.
Figure 2. Linear Correlation between inputs and output PE.
2.2. Sub Dataset Selection
Since the main goal of this work is to apply and test regression with a neural network, the whole dataset is not needed for training. Only a smaller subset is systematically picked from the dataset to train, validate and initially test the network. By default, the MATLAB Neural Network Toolbox divides the dataset into training, validation and test subsets of 70%, 15% and 15%. Since the full dataset is large and can itself be used for testing, the test subset is reduced to 0%, with 75% for training and 25% for validation. Finally, the test is performed with all the data points of the original dataset, and the results are compared for different subset sizes.
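The subset selection and the 75%/25% division described above can be sketched as follows. This is a plain-Python illustration, not the toolbox's own routine. The text does not state how the subset is systematically picked, so evenly spaced sampling is an assumption here, as are the function names and the subset size of 1000.

```python
import random


def systematic_subset(rows, size):
    """Pick `size` rows at evenly spaced positions (assumed sampling scheme)."""
    if not 0 < size <= len(rows):
        raise ValueError("size must be between 1 and len(rows)")
    step = len(rows) / size
    return [rows[int(i * step)] for i in range(size)]


def split_train_validation(rows, train_ratio=0.75, seed=None):
    """Randomly divide rows into training and validation parts (no test part)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# Row indices stand in for the 9568 observations of the dataset.
dataset = list(range(9568))
subset = systematic_subset(dataset, 1000)  # hypothetical subset size
train, validation = split_train_validation(subset, seed=1)
print(len(train), len(validation))
```

The final test would then be run on all 9568 rows, including those used for training and validation, as the text describes.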
2.3. Data Normalization
One way to improve network training is to linearly normalize the inputs and outputs to a certain range. The standard normalization maps each feature to the range (−1, 1); this is the default in the MATLAB Neural Network Toolbox, and both inputs and outputs are normalized by default. Another range may be used, e.g. (0.01, 0.99). Another normalization practice is to map each feature to a specified mean and variance; typically the mean is 0 and the standard deviation is 1. Since the toolbox performs this step automatically, no manual normalization is required. However, mapping to the range (0.01, 0.99) is also applied here, and the results are compared with those obtained when the data are not normalized beforehand.
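The two normalization schemes described above can be illustrated with a short sketch. It is written in plain Python and is not the toolbox code; the function names `map_min_max` and `map_std`, the use of the sample standard deviation, and the temperature values are assumptions made for illustration.

```python
from statistics import mean, stdev


def map_min_max(values, new_min=-1.0, new_max=1.0):
    """Linearly map values so that their minimum and maximum hit the new range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def map_std(values, target_mean=0.0, target_std=1.0):
    """Shift and scale values to a chosen mean and (sample) standard deviation."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [target_mean + (v - mu) / sigma * target_std for v in values]


# Illustrative ambient temperatures in degrees Celsius; not from the dataset.
temperatures = [2.0, 10.0, 20.0, 38.0]
scaled = map_min_max(temperatures)                # range (-1, 1)
scaled_01 = map_min_max(temperatures, 0.01, 0.99)  # range (0.01, 0.99)
standardized = map_std(temperatures)              # mean 0, std 1
```

Both mappings are linear, so they change the scale of each feature but not the shape of its distribution.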
2.4. Feature Selection
As shown in Figure 2, AT and V have strong negative linear relations with the output PE, while AP and RH have weak positive linear relations with it. This is further shown by the correlation coefficients between the inputs and the output (PE) in Table 2.
The strength of the linear relations between the variables is shown in Table 3 in terms of the correlation R and the correlation strength R². It can be seen that AT and V are strongly linearly related to each other and to the output PE, while AP and RH have weak linear relations to all other variables and to the output.
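For reference, the correlation R and its square can be computed as in the sketch below. It assumes the standard Pearson coefficient, which the text does not name explicitly; the function name and the sample values are illustrative and are not taken from Table 2 or Table 3.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Illustrative, perfectly linear values; not taken from the dataset.
at = [5.0, 15.0, 25.0, 35.0]        # ambient temperature
pe = [480.0, 465.0, 450.0, 435.0]   # output power, MW
r = pearson_r(at, pe)
r_squared = r ** 2
print(f"R = {r:.3f}, R^2 = {r_squared:.3f}")
```

A negative R, as in this made-up example, corresponds to the kind of negative relation the text reports between AT and PE.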
Although it is clear that the governing variables are AT and V, the effect of the absence and presence of each variable is studied.
2.5. Setting of Networks
2.5.1. Two-Layer Feed-Forward Network
We use the tan-sigmoid transfer function in the hidden layer and a linear output layer. This is the standard network for function approximation, and it has been shown to be a universal approximating network. The network diagram is shown in Figure 3.
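The forward pass of such a network can be sketched in a few lines. This plain-Python illustration assumes that the tan-sigmoid transfer function is the hyperbolic tangent, to which it is mathematically equivalent. The weights, biases and input values are arbitrary placeholders, not trained values from this study.

```python
from math import tanh


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Two-layer feed-forward pass: tanh hidden layer, single linear output."""
    hidden = [
        tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Four (normalized) inputs in the order AT, V, AP, RH; two hidden neurons.
x = [0.5, -0.2, 0.1, 0.3]
hidden_weights = [[0.4, -0.6, 0.1, 0.2], [-0.3, 0.8, 0.5, -0.1]]
hidden_biases = [0.1, -0.2]
output_weights = [1.5, -0.7]
output_bias = 0.05
prediction = forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(prediction)
```

Training adjusts the weight and bias values; the structure of the computation stays the same for any hidden layer size.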
2.5.2. Training Algorithms
Two algorithms are examined here: the Levenberg-Marquardt algorithm (trainlm, the default) and the Bayesian regularization algorithm (trainbr).
Table 2. Variables correlation with output PE.
Table 3. Correlation between features.
Figure 3. Universal approximating network (Regression Neural Network Structure  ).
2.5.3. Hidden Layer Size
The number of neurons in the hidden layer depends on the function to be approximated, which cannot generally be known before training. The Levenberg-Marquardt algorithm requires the number of neurons (hidden layer size) to be specified in advance. The effect of the hidden layer size is therefore examined in this study by applying a variety of hidden layer sizes.
3. Results and Discussion
The following subsections present the results of varying the network options.
3.1. Variation of Results Using Same Settings and Sub Datasets Data
Training the same network with the same settings and the same dataset gives a different output on each run because of:
1) The randomness of the initial weights and biases at every training run of the neural network.
2) The randomness of dividing the dataset into training, validation and test sets.
Table 4 shows the settings used for each run.
The results are shown in Table 5, where this run-to-run variation can be observed. For the same test data, the resulting regression (R) values vary between 0.78 and 0.94, and the resulting performance (MSE) values vary between 32 and 140.
3.2. Effect of Different Values of Hidden Layer Size (Number of Neurons)
To examine the effect of the hidden layer size, the network is trained with the settings shown in Table 6. Note that the network is trained 10 times for each hidden layer size, and only the best resulting network is considered. This is done to overcome the variation shown in Section 3.1.
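The repeat-and-keep-the-best procedure can be sketched generically as follows. The helper `best_of_runs` and the stand-in `fake_training_run` are hypothetical: a real run would train a network and return its MSE on the complete dataset, whereas the stand-in only draws a random number to mimic run-to-run variation.

```python
import random


def best_of_runs(train_once, n_runs=10):
    """Call train_once() n_runs times and keep the (mse, model) pair with lowest MSE."""
    best = None
    for _ in range(n_runs):
        mse, model = train_once()
        if best is None or mse < best[0]:
            best = (mse, model)
    return best


# Stand-in for a real training run (assumption): returns a random "test MSE".
rng = random.Random(0)


def fake_training_run():
    mse = rng.uniform(20.0, 140.0)
    return mse, {"note": "placeholder for a trained network"}


best_mse, best_model = best_of_runs(fake_training_run, n_runs=10)
print(f"best MSE over 10 runs: {best_mse:.2f}")
```

Keeping the best of several runs reduces the influence of an unlucky random initialization on the comparison between settings.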
From Table 7 it can be observed that, for the same dataset and settings, a larger hidden layer is not always beneficial. Comparing the performance (MSE) values on the test dataset, the worst network is obtained at size 100, while the best performance is obtained at size 3. The tendency to obtain better networks with smaller hidden layers indicates that the relation is strongly linear, given that a hidden layer size of zero corresponds to a purely linear relationship. The same results are shown in Figure 4.
Table 4. Settings of Experiment (Variation of results using same settings and sub datasets data).
Table 5. Variation of the results using same settings and sub datasets data.
Table 6. Settings of Experiment (Effect of different values of hidden layer size).
Table 7. Effect of different values of hidden layer size.
Figure 4. Effect of different values of hidden layer size on performance (MSE).
3.3. Effect of Different Train Dataset Size
Here we examine different sizes of the training dataset, which in practice comprises the training and validation datasets. The settings for this experiment are shown in Table 8.
As shown in Table 8, the network is trained 10 times for each dataset size to overcome the variation in results mentioned in Section 3.1. From the results in Table 9, it can be observed that the networks generally improve as the training dataset size increases. At small dataset sizes, any increase results in improved network performance. On the other hand, beyond a dataset size of 100, only little improvement is obtained by further increasing the dataset size. The same results are shown in Figure 5.
Figure 5. Effect of different train dataset size on performance (MSE).
Table 8. Settings of Experiment (Effect of different train dataset size).
Table 9. Effect of different train dataset size.
3.4. Effects of Absence and Presence of Each Variable
Each variable has a certain effect on the output: some have a large effect (main variables), while others may have little effect, if any. Here we examine the four variables (AT, V, AP and RH), which give 15 different combinations. Each combination is trained 10 times to overcome the randomness discussed in Section 3.1. These settings are shown in Table 10.
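The count of 15 follows from taking every non-empty subset of four variables (2⁴ − 1 = 15). A short plain-Python sketch enumerates them; the variable labels are those used in the text, and the list name `subsets` is illustrative.

```python
from itertools import combinations

variables = ["AT", "V", "AP", "RH"]

# Every non-empty subset of the four inputs, from single variables up to all four.
subsets = [
    combo
    for k in range(1, len(variables) + 1)
    for combo in combinations(variables, k)
]

for combo in subsets:
    print(" + ".join(combo))
print(len(subsets), "combinations")
```

Each tuple in `subsets` corresponds to one input configuration to be trained and tested.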
From the results shown in Table 11, it is clear that the presence of AT has the main effect on the quality of the network; in fact, using AT alone already gives a satisfactory model. The remaining variables are introduced to the network in order to increase its quality. V also has a good impact on the model quality, whereas AP and RH have only a slight improving effect. The best network is obtained when using only AT, V and RH: it has the best performance and the best correlation (regression) when tested on the complete dataset (9568 data points). Including AP generally has a negative effect on the model quality. We can therefore conclude that introducing some variables may degrade the network quality, i.e. not every added variable improves the model.
3.5. Effect of Using Different Training Function
In all previous sections, the Levenberg-Marquardt (trainlm) training function was used. Here we examine and compare another well-known training function, Bayesian regularization (trainbr), which is an improved variant of the former. The settings are shown in Table 12.
From the results in Table 13, it can be seen that, for the same settings and dataset, Bayesian regularization (trainbr) performs better than the Levenberg-Marquardt (trainlm) function. It is also notable that it uses no validation subset, only a training set. Lastly, note the number of epochs needed to obtain the network: it is more than 10 times the number needed by the Levenberg-Marquardt (trainlm) function.
3.6. Effect of Normalizing the Dataset before Network Training
Here the dataset is normalized to the range (0.01, 0.99), and the quality of the resulting network is compared with that of a network trained on the dataset without prior normalization, which is then normalized by the MATLAB Neural Network Toolbox itself. The toolbox has two options for normalization. The first is the standard normalization to the range (−1, 1) using the function mapminmax, which is the default in the toolbox. The second is normalization to a specified mean (typically 0) and standard deviation (typically 1) using the function mapstd. The settings for this experiment are shown in Table 14.
From the results in Table 15, note that the performance values for the normalized data are themselves normalized, so they are very small. Comparing the regression (correlation) values and the numbers of epochs, the three methods are equivalent. Using the data without prior normalization makes the results easier to read and is more convenient, since the MATLAB Neural Network Toolbox normalizes the data anyway. Note that network training is run 10 times for each set of settings to overcome the randomness discussed in Section 3.1.
Table 10. Settings of Experiment (Effects of absence and presence of each variable).
Table 11. Effects of absence and presence of each variable.
Table 12. Settings of Experiment (Effect of using different training function).
Table 13. Effect of using different training function.
Table 14. Settings of Experiment (Effect of normalizing dataset).
Table 15. Effect of normalizing dataset.
3.7. Comparisons of Target and Resulted Outputs
Here we consider two groups of resulting outputs:
1) Training & Validation group: the outputs produced by the network for the input data used in training and validation. This group gives a sense of the validity of the model.
2) Test (Complete dataset) group: the outputs produced by the network for the test input data, which in this study is the complete dataset. This group gives a measure of the network's success.
The comparisons are based on a network with the settings and sub-dataset size shown in Table 16.
A general comparison is presented visually in Figure 6 and Figure 7. It can be observed that the network outputs for the input data used in training and validation are closer to their target outputs, whereas the outputs from the complete-dataset test deviate more from their target outputs. This becomes clear when comparing the performance (MSE) of each output group, as shown in Table 17.
Furthermore, the errors of the two groups are compared in Table 17, Figure 8 and Figure 9. Table 17 shows that the mean error and the standard deviation of the two groups are close in value.
Table 16. Experiment Settings (Comparisons of target and resulted outputs).
Table 17. Error in result output.
Figure 6. Target (MW) vs. Result Output PE (MW) (Training and Validation).
Figure 7. Target (MW) vs. Result Output PE (MW) (Complete Database).
Figure 8. Output power (PE) Error (MW) Vs. No of Instances (Training and Validation).
Figure 9. Output power (PE) Error (MW) vs. No of Instances (Complete Database).
Figure 8 and Figure 9 show the error in the results for the training and validation group and for the test group (complete dataset), along with the number of instances in each group having the same error value. At first glance, the error within each group follows a normal distribution. Comparing the two groups, most of the error in both lies between −10 and +10, which agrees with the statistical fact that 99.73% of normally distributed data fall within a range of six sigma (6σ), i.e. (µ − 3σ, µ + 3σ); using the data from Table 17, this range is approximately (−12.5, 12.5) in our case. In addition, the error range of the test results (−44, +21) is roughly double that of the training and validation group (−25, +11), because both the minimum and the maximum errors are roughly doubled. Note from Figure 7 that the model tends to overestimate the output at some points, since the negative tail is wider than the positive one. However, the mean errors in Table 17 are positive values near zero, so it can be concluded that these large negative errors occur at only a few points and that more points have positive errors.
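The three-sigma range can be checked numerically with a short sketch. It is written in plain Python and uses synthetic, normally distributed errors. The standard deviation of 4.0 MW is an arbitrary illustrative choice, not a value from Table 17, and the function names are assumptions.

```python
import random
from statistics import mean, stdev


def three_sigma_range(errors):
    """Return (mu - 3*sigma, mu + 3*sigma) for a sample of errors."""
    mu = mean(errors)
    sigma = stdev(errors)
    return mu - 3 * sigma, mu + 3 * sigma


def fraction_within(errors, low, high):
    """Share of errors lying inside the closed interval [low, high]."""
    return sum(1 for e in errors if low <= e <= high) / len(errors)


# Synthetic errors (MW) drawn from a normal distribution; illustrative only.
rng = random.Random(0)
errors = [rng.gauss(0.0, 4.0) for _ in range(10000)]

low, high = three_sigma_range(errors)
share = fraction_within(errors, low, high)
print(f"3-sigma range: ({low:.2f}, {high:.2f}), share inside: {share:.4f}")
```

For errors that really are normally distributed, the share inside the range should be close to the 99.73% quoted in the text; a clearly lower share would suggest heavier tails than a normal distribution.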
To compare the number of instances versus error between the two groups, it is more convenient to compare the normalized number of instances, i.e. the percentage of each group. This is shown in Figure 10 and Figure 11, along with normal distribution curves based on the means and standard deviations in Table 17. The two charts are very similar to each other.
Figure 10. Output power (PE) Error (MW) vs. percentage of instances (Training and Validation).
Figure 11. Output power (PE) Error (MW) vs. percentage of instances (Complete dataset).
If the sign of the error (positive or negative) is neglected, since we want to describe how close each group's results are to its target values, the same charts can be drawn with absolute values. These are presented in Figure 12 and Figure 13.
In this case, the training and validation group has a larger percentage of its instances close to zero error. This gives the same information extracted from Figure 6 and Figure 7: the test results deviate more from their targets.
As an experiment, 20 data points are selected randomly and tested; the results and errors are shown in Figure 14 and Figure 15. They show that the error range of (−12.5, +12.5) found above holds well for these randomly selected points.
Figure 12. Absolute Output power (PE) Error (MW) vs. percentage of Instances (Training and Validation).
Figure 13. Absolute Output power (PE) Error (MW) vs. percentage of Instances (Complete dataset).
Figure 14. Predicted Output Power (MW) and error (MW) at 20 random data points from test results.
Figure 15. Prediction error (MW) at 20 random data points from test results.
4. Conclusions
A regression artificial neural network (ANN) is used to model the electrical output power (EP) of a combined cycle power plant based on four inputs. The data are taken from published work freely available online. The MATLAB Neural Network Toolbox is used to program the ANN model. The model is studied by examining the effects of various settings on the neural network performance. In total, seven experiments are carried out.
The results show that the performance of the ANN model varies randomly each time it is trained, because of the randomness of the initial values of the weights and biases. It is also observed that increasing the number of neurons in the hidden layer does not necessarily improve the quality of the model; in fact, the number of neurons has an oscillating effect on the model performance. Increasing the dataset size (more data points for the same variables) provides better networks to some extent. Increasing the number of input variables does not always lead to better network quality; some variables reduce the quality of the model when introduced, while others increase it. This has to be studied through the correlation among the variables themselves and between the variables and the output. In addition, different training functions are compared for the same settings and dataset; in this work, Bayesian regularization performed better than the Levenberg-Marquardt algorithm. The dataset normalization methods provided by the toolbox are also examined.
Lastly, the results are compared with the target output values for the training and validation group and for the test group, which is the complete dataset. The comparison shows that the results are very close to the target outputs for both groups. In addition, it shows that the error in each group is normally distributed with a mean value near zero. The standard deviations of the error in the two groups are almost equal.
References
 Dehghani Samani, A. (2018) Combined Cycle Power Plant with Indirect Dry Cooling Tower Forecasting Using Artificial Neural Network. Decision Science Letters, 7, 131-142.
 Jahed Armaghani, D., Mohd Amin, M.F., Yagiz, S., Faradonbeh, R.S. and Abdullah, R.A. (2016) Prediction of the Uniaxial Compressive Strength of Sandstone Using Various Modeling Techniques. International Journal of Rock Mechanics and Mining Sciences, 85, 174-186.
 Moayedi, H. and Jahed Armaghani, D. (2018) Optimizing an ANN Model with ICA for Estimating Bearing Capacity of Driven Pile in Cohesionless Soil. Engineering with Computers, 34, 347-356.
 Khandelwal, M., et al. (2018) Implementing an ANN Model Optimized by Genetic Algorithm for Estimating Cohesion of Limestone Samples. Engineering with Computers, 34, 307-317.
 Baghban, A., Pourfayaz, F., Ahmadi, M.H., Kasaeian, A., Pourkiaei, S.M. and Lorenzini, G. (2017) Connectionist Intelligent Model Estimates of Convective Heat Transfer Coefficient of Nanofluids in Circular Cross-Sectional Channels. Journal of Thermal Analysis and Calorimetry, 132, 1-27.
 Khosravani, H., Castilla, M., Berenguel, M., Ruano, A. and Ferreira, P. (2016) A Comparison of Energy Consumption Prediction Models Based on Neural Networks of a Bioclimatic Building. Energies, 9, 57.
 Jihad, A.S. and Tahiri, M. (2018) Forecasting the Heating and Cooling Load of Residential Buildings by Using a Learning Algorithm “Gradient Descent”, Morocco. Case Studies in Thermal Engineering, 12, 85-93.
 Noori Rahim Abadi, S.M.A., Mehrabi, M. and Meyer, J.P. (2018) Prediction and Optimization of Condensation Heat Transfer Coefficients and Pressure Drops of R134a Inside an Inclined Smooth Tube. International Journal of Heat and Mass Transfer, 124, 953-966.
 Wan, C., Xu, Z., Pinson, P., Dong, Z.Y. and Wong, K.P. (2014) Optimal Prediction Intervals of Wind Power Generation. IEEE Transactions on Power Systems, 29, 1166-1174.
 Bizzarri, F., Bongiorno, M., Brambilla, A., Gruosso, G. and Gajani, G.S. (2013) Model of Photovoltaic Power Plants for Performance Analysis and Production Forecast. IEEE Transactions on Sustainable Energy, 4, 278-285.
 Mahmoud, T., Dong, Z.Y. and Ma, J. (2018) An Advanced Approach for Optimal Wind Power Generation Prediction Intervals by Using Self-Adaptive Evolutionary Extreme Learning Machine. Renewable Energy, 126, 254-269.
 Khosravi, A., Nahavandi, S. and Creighton, D. (2013) Prediction Intervals for Short-Term Wind farm Power Generation Forecasts. IEEE Transactions on Sustainable Energy, 4, 602-610.
 Embrechts, M.J., Schweizerhof, A.L., Bushman, M. and Sabatella, M.H. (2000) Neural Network Modeling of Turbofan Parameters. Volume 4: Manufacturing Materials and Metallurgy; Ceramics; Structures and Dynamics; Controls, Diagnostics and Instrumentation; Education, V004T04A008.
 Boccaletti, C., Cerri, G. and Seyedan, B. (2001) A Neural Network Simulator of a Gas Turbine with a Waste Heat Recovery Section. Journal of Engineering for Gas Turbines and Power, 123, 371.
 Erdem, H.H. and Sevilgen, S.H. (2006) Case Study: Effect of Ambient Temperature on the Electricity Production and Fuel Consumption of a Simple Cycle Gas Turbine in Turkey. Applied Thermal Engineering, 26, 320-326.
 Tüfekci, P. (2014) Prediction of Full Load Electrical Power Output of a Base Load Operated Combined Cycle Power Plant Using Machine Learning Methods. International Journal of Electrical Power & Energy Systems, 60, 126-140.
 Niu, L.X. and Liu, X.J. (2008) Multivariable Generalized Predictive Scheme for Gas Turbine Control in Combined Cycle Power Plant. IEEE Conference on Cybernetics and Intelligent Systems, 21-24 September 2008, 791-796.
 Kaya, H., Tüfekci, P. and Gürgen, S.F. (2012) Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine. International Conference on Emerging Trends in Computer and Electronics Engineering, Dubai, 24-25 March 2012, 13-18.