Randomized weights neural networks with a single hidden layer offer fast learning and good generalization performance. The input weights of the hidden layer are generated randomly, and the hidden-layer outputs are obtained by applying an activation function, so they also carry this randomness. The output weights are then computed analytically with the pseudoinverse. Mutual information, grounded in probability theory, quantifies the mutual dependence between two variables. In this paper, a simple mutual-information-based feature selection method is used to select the hidden-layer outputs that are most closely related to the prediction variable. The hidden nodes with high mutual information values are retained to form a new hidden layer, which reduces the size of the hidden layer. The output weights of this new hidden layer are again learned with the pseudoinverse method. The proposed method is compared with the original randomized algorithms on the concrete compressive strength benchmark dataset.
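The pipeline described above (random hidden layer, mutual-information ranking of hidden nodes, pseudoinverse retraining on the kept nodes) can be sketched in NumPy as follows. This is a minimal illustration under assumptions the abstract does not fix: a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], a 10-bin histogram (plug-in) estimator of mutual information, and a fixed count `n_keep` of retained nodes rather than a threshold. The names `train_rwnn_mi`, `mutual_information`, `hidden_outputs`, `predict` and `n_keep` are hypothetical, and the synthetic data in the usage block only stands in for the concrete compressive strength dataset.

```python
import numpy as np


def hidden_outputs(X, W, b):
    """Sigmoid hidden-layer outputs H = g(XW + b) for fixed random W, b (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(x; y) in nats; the estimator is an assumption."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def train_rwnn_mi(X, y, n_hidden=50, n_keep=20, seed=0):
    """Hypothetical helper: random hidden layer -> MI ranking -> pseudoinverse on kept nodes."""
    rng = np.random.default_rng(seed)
    # 1. Random input weights and biases; they are never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = hidden_outputs(X, W, b)
    # 2. Score each hidden node by the MI between its output and the target.
    scores = np.array([mutual_information(H[:, j], y) for j in range(n_hidden)])
    # 3. Keep the n_keep highest-scoring nodes as the new, smaller hidden layer.
    keep = np.argsort(scores)[::-1][:n_keep]
    # 4. Output weights of the reduced layer via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H[:, keep]) @ y
    return W[:, keep], b[keep], beta


def predict(X, W, b, beta):
    """Network output for the reduced hidden layer."""
    return hidden_outputs(X, W, b) @ beta


if __name__ == "__main__":
    # Synthetic regression data, used only as a stand-in for the benchmark dataset.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 3))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2]
    W, b, beta = train_rwnn_mi(X, y)
    rmse = np.sqrt(np.mean((predict(X, W, b, beta) - y) ** 2))
    print(f"kept {W.shape[1]} hidden nodes, training RMSE = {rmse:.4f}")
```

Ranking nodes by their relevance to the target alone, as here, ignores redundancy among the kept nodes; criteria such as max-relevance min-redundancy address that, but this sketch stays with the simple relevance-only ranking the abstract describes.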