Speech is the most prominent and essential mode of human communication. The mechanical recognition of human speech has attracted a great deal of technological interest over the past five decades. At present, many residences and institutions use various security systems to protect their belongings, for instance passwords and user ID/PIN combinations. Nevertheless, these systems are not fully secure, because a PIN code can easily be hacked and an ID card can be stolen and copied. For these reasons, a different security technology is needed to increase private citizens' confidence in the security system. The use of speech recognition in security systems, where better protection of citizens' confidential information is needed, has developed remarkably. Biometric technology uses an attribute of the customer as the password. Each person's attributes are unique, even for twins. Therefore, a voice recognition system protects the administrator user (or customer). In this paper, the problem of voice recognition is considered, and a voice analysis and recognition system is constructed for specific spoken words.
Automatic Speech Recognition (ASR) is the technique of representing human speech in a computer. A more technical definition is given by Jurafsky: the building of a system for mapping an acoustic signal to a string of words. The main objective of an ASR system is to transform a speech signal into a text transcript of the spoken words accurately and efficiently, regardless of the characteristics of the speaker, the environment, or the device used to record the speech (i.e. the microphone). The process begins when a speaker decides what to say and actually speaks a sentence. The software then produces a speech waveform, which contains the words of the sentence along with extraneous sounds and pauses in the spoken input. Subsequently, the software attempts to decode the speech into the best estimate of the sentence. First, the speech signal is converted into a sequence of vectors measured over the duration of the speech signal. Then, using a syntactic decoder, it produces a valid sequence of representations.
The voice recognition system comprises two main modules: feature extraction and feature matching. Feature extraction derives a small amount of data from the speech signal that is later used to represent each user. Feature matching, on the other hand, identifies an unknown user by comparing the features extracted from the voice input with those from a set of known users.
The speech signal and its characteristics can be represented in two domains: the time domain and the frequency domain. An utterance is the vocalization of a word or words that represent a single meaning to the computer.
Meanwhile, advances in spectrum analysis have facilitated the development of modern communication systems. A spectrum analyzer is the part of the system that divides a signal according to its frequency components. In pattern recognition, the spectra of specified sound signals are stored as reference patterns. If the sound of an object is sufficiently similar to a stored reference pattern, it is identified as that reference. A computer performs the correlation operations between the input signal and the reference signal.
This article presents a design of a sound recognition system for the frequency range up to 20 kHz. Figure 1 shows the sound spectrum (a representation of sound in terms of the amount of vibration at each individual frequency).
Figure 1. Sound spectrum.
By comparing the spectra of different signals, we can distinguish between different sounds.
Object identification is a pattern-recognition problem, and it is worthwhile to consider some general properties of a pattern-recognition system. Such a system measures a set of features from a representation of an object and then applies a classifier to make a classification decision. In ASR, as in any other recognizer, the comparison is done by the computer. The general block diagram of such a system is shown in Figure 2.
2. Real Time Analyzer
A sound signal is a continuous signal, and such a signal is normally analyzed in short sections (short time). The most suitable analyzer for sound signals is the real-time analyzer. It obtains the whole spectrum in parallel from the same section of the signal, so it can not only follow varying signals but also obtain the spectrum much faster, because it examines all frequencies within its range at the same time. The most direct way of implementing such a system is to apply the signal to a parallel bank of filter/amplifier/detector channels, as shown in Figure 3.
The electrical signal from the microphone is pre-amplified by an operational amplifier and fed to the band-pass filters (BPFs), which have a very high Q-factor and are centred at selected frequencies. Each filter output then goes to a detector that measures the RMS value of the signal. The detection accuracy is governed by how narrow the filter bands are and by how slowly the analyzer sweeps over the desired range of frequencies. In this system, each channel measures the power in its band, so increasing the number of channels increases the precision of the analysis. Sharper filters also increase the accuracy.
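The detector stage described above can be illustrated in software. The following is a minimal sketch, assuming each filter channel delivers a list of voltage samples; the function names `rms` and `channel_rms` are hypothetical and do not come from the original system.

```python
import math

def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def channel_rms(channel_outputs):
    """RMS level of every filter channel (one list of samples per channel)."""
    return [rms(channel) for channel in channel_outputs]

# Illustrative data only: one full period of a unit sine and a constant level.
n = 64
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
levels = channel_rms([sine, [0.5] * n])
```

For a unit-amplitude sine the RMS level is 1/sqrt(2), which is the kind of single number each detector hands on to the rest of the system.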
2.1. Digital Analyzer
Figure 2. General sound recognition system.
Figure 3. Real time analyzer.
A digital spectrum analyzer takes advantage of the ability to convert a sampled time series directly into a frequency spectrum. Two principles are used: digital filtering and the Fast Fourier Transform (FFT), both of which can be realized either as fast software on a computer or in hardware using a digital signal processor (DSP). The FFT is an efficient method of computing the Discrete Fourier Transform (DFT) and serves as a digital processing tool.
The analyzer samples the incoming signal at a high rate, saving the data into a block. The unit then performs a digital FFT operation on the data block to generate the power spectrum; the resolution depends on the number of sample points in the block.
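The block-based FFT analysis can be sketched as follows. This is a textbook recursive radix-2 FFT, not the implementation used in the paper. The block length of 256 samples and the 1 kHz test tone are assumptions chosen for illustration, and the 32 kHz sampling rate is the value quoted later in the text.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; the block length must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_spectrum(block, sample_rate):
    """Return (frequencies, power) for the non-negative frequency bins."""
    n = len(block)
    spectrum = fft(block)
    bins = range(n // 2 + 1)
    freqs = [k * sample_rate / n for k in bins]
    power = [abs(spectrum[k]) ** 2 / n ** 2 for k in bins]
    return freqs, power

sample_rate = 32000   # Hz
n = 256               # assumed block length, giving 32000 / 256 = 125 Hz per bin
tone = [math.sin(2 * math.pi * 1000 * k / sample_rate) for k in range(n)]
freqs, power = power_spectrum(tone, sample_rate)
peak = freqs[max(range(len(power)), key=power.__getitem__)]
```

The bin spacing is `sample_rate / n`, so doubling the number of points in the block halves the spacing, which is the dependence of resolution on block size noted above.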
2.2. The Implemented System
Figure 4 shows the block diagram of the system. The output signal of the microphone and pre-amplifier is analyzed into octave bands by a bank of filter channels. Each filter in the bank passes a narrow frequency band. When the filters have equal passbands, they form a uniform filter bank; otherwise they form a non-uniform filter bank.
The band-pass filters define the band of energy to be detected. Following the standard specifications for non-uniform bandwidth filtering, the sound band (125 - 16000) Hz is divided into eight octave bands:
where fc(i) is the centre frequency, fu(i) is the upper frequency, and fl(i) is the lower frequency of band (i).
Figure 4. Block diagram of the implemented circuit.
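The centre, upper, and lower frequencies of the eight bands can be tabulated with a short sketch. It assumes the conventional octave-band relations, which may differ from the equations used in the paper: each centre frequency is twice the previous one, and the band edges lie at fc/sqrt(2) and fc*sqrt(2). The function name `octave_bands` is hypothetical.

```python
import math

def octave_bands(first_centre_hz=125.0, count=8):
    """Return (lower, centre, upper) frequencies in Hz for each octave band.

    Assumes the conventional relations fc(i+1) = 2*fc(i),
    fl = fc / sqrt(2) and fu = fc * sqrt(2).
    """
    bands = []
    for i in range(count):
        fc = first_centre_hz * 2 ** i
        bands.append((fc / math.sqrt(2), fc, fc * math.sqrt(2)))
    return bands

bands = octave_bands()
centres = [fc for _, fc, _ in bands]
```

Under these assumptions the centre frequencies run from 125 Hz to 16 kHz, and the upper edge of each band coincides with the lower edge of the next.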
2.2.1. Automatic Control System
The objective of the control circuit is to detect the presence of a sound signal and then to signal the other circuits in the system to operate.
Detection is performed using a (710) comparator and a (7404) inverter. The output of the RMS detector circuit is connected to the comparator to detect the sound signal. A conservative threshold voltage is chosen so that the sound level always exceeds it while it remains greater than the noise level in the air. The threshold level is 100 mV, derived from the +15 V supply by means of a voltage divider. The comparator output is at logic “0” when there is no signal and at logic “1” when a sound signal is detected.
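The comparator decision can be modelled in a few lines. This is only a behavioural sketch of the threshold test, not a model of the (710) device. The resistor values are hypothetical, chosen so that the divider yields 100 mV from +15 V, and the sample RMS levels are invented for illustration.

```python
THRESHOLD_V = 0.100  # 100 mV threshold quoted in the text

def divider_output(v_supply, r_top, r_bottom):
    """Output voltage of an unloaded two-resistor voltage divider."""
    return v_supply * r_bottom / (r_top + r_bottom)

def comparator(rms_voltage, threshold=THRESHOLD_V):
    """Return logic 1 when the RMS level exceeds the threshold, else logic 0."""
    return 1 if rms_voltage > threshold else 0

# Hypothetical resistor values: 15 V * 100 / (14900 + 100) = 0.1 V.
v_th = divider_output(15.0, 14900.0, 100.0)

# Invented RMS detector readings in volts: quiet, quiet, sound, sound, quiet.
readings = [0.02, 0.05, 0.25, 0.40, 0.03]
gate = [comparator(v) for v in readings]
```

The resulting `gate` sequence plays the role of the enable signal that tells the rest of the system when a sound is present.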
Figure 5 shows the timing diagram of the control circuit. Signal (A) represents the RMS value of the input sound signal, signal (B) is used to turn the multiplexer on, and signal (C) is used to operate the (74293) counter connected to the multiplexer.
2.2.2. The Interface Circuit
The interface block diagram is shown in Figure 6. The heart of the sound system is a special-purpose processor called the Digital Signal Processor (DSP), which has a special architecture and an instruction set designed to process analogue information that has been converted to digital form.
The RAM chip is used for storing temporary sound data. The ROM chip contains all the instructions necessary to operate the DSP. The sound card adapter can process and store audio signals directly from the microphone or from the audio spectrum analyzer.
Figure 5. Timing diagram of the control circuit.
Figure 6. The interface circuit block diagram.
Using the sound card as an interface circuit is convenient, because it is available in all types of modern computer and its software is flexible, providing necessary options such as the number of samples per second and the number of bits per sample.
3. Experimental Results
In this section, the experimental data are presented and the experimental results are discussed. A simple MATLAB program was built to evaluate the Relative Average Voltage (RAV) of each measured signal.
3.1. Preparation of Experimental Data
The recognition system in this work is designed to recognize objects by their sound signals. Data from the source sound signals are used to test the system. The data are collected under the following conditions:
- The environmental recording conditions of the sound samples must be the same; otherwise considerable differences in the spectrograms will appear.
- The sound should be recorded in a quiet free field in order to minimize undesired noise.
- The recording of each sound source is repeated many times in order to minimize the dissimilarities between the spectrograms.
- A high-quality microphone is used, with a wind-screen over the microphone to attenuate wind noise.
The spectrogram of the sound signal obtained by the spectrum analyzer can be described as a three-dimensional time-frequency-amplitude pattern. If the spectrum patterns vary with time, the signal is time variant, as in Figure 7(a); otherwise it is time invariant, meaning that it has the same spectrum pattern at any time, as in Figure 7(b).
For speech signals, short-time spectrum analysis must be used, in which the recognition process operates on a short part of the sound signal, as shown in Figure 8.
The system in this work uses the RMS spectrum representation. The sounds of the source are converted into digital data (samples) and delivered to the computer, where they can be written as a matrix: the rows represent the eight frequencies of the spectrum analysis, the columns represent the spectrum patterns, and each element represents the RMS amplitude of the signal at the individual frequency. Using the “RAV” approach, the spectrum patterns are converted into a single pattern vector,
where AK(i) is the amplitude of pattern (i) in channel K and N is the total number of samples per channel.
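The reduction of the spectrum matrix to one vector can be sketched as below. This is a guess at the computation, since the defining formula is not reproduced in the text: it assumes that the RAV of channel K is the arithmetic mean of AK(i) over the N samples of that channel, with no further normalization. The function name `rav_vector` and the sample matrix are hypothetical.

```python
def rav_vector(patterns):
    """Average each channel's amplitudes into one value per channel.

    patterns[k][i] is the RMS amplitude of channel k in spectrum pattern i.
    Assumption: RAV_K = (1/N) * sum of A_K(i) for i = 1..N.
    """
    vector = []
    for channel in patterns:
        if not channel:
            raise ValueError("every channel needs at least one sample")
        vector.append(sum(channel) / len(channel))
    return vector

# Invented two-channel, three-pattern matrix (the real system has eight rows).
example = [
    [2.0, 4.0, 6.0],
    [1.0, 1.0, 1.0],
]
rav = rav_vector(example)
```

With eight filter channels the result is an eight-element vector, one entry per octave band.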
According to the Nyquist theorem (which states that an analogue signal must be sampled at a rate of at least twice its highest frequency), the sampling rate of the A/D converter must be 32 kHz (2 × 16 kHz), since the highest frequency in the system is 16 kHz.
Figure 7. (a) Spectrum pattern. (b) Spectrum pattern.
Figure 8. Short-time spectrum analysis for a speech signal.
3.2. Recognition of Signals
All the pattern vectors of the reference sounds are stored. When a new signal from an unknown source is received and analyzed, it is converted into an RAV vector. This vector is then compared with every reference vector by calculating the absolute differences between its elements and the corresponding elements of the reference vector. The decision rule is that of the “nearest neighbor”.
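The nearest-neighbor decision described above can be sketched as follows, using the average absolute difference as the distance. The labels and three-element vectors are invented for illustration and are not taken from the paper's tables. The function names are hypothetical.

```python
def average_abs_difference(a, b):
    """Mean of the element-wise absolute differences of two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def nearest_reference(test_vector, references):
    """Return (label, distance) of the reference vector closest to test_vector."""
    scores = {
        label: average_abs_difference(test_vector, ref)
        for label, ref in references.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]

# Invented reference patterns; the real vectors have eight elements each.
references = {
    "A": [10.0, 20.0, 30.0],
    "B": [12.0, 18.0, 40.0],
}
label, distance = nearest_reference([11.0, 19.0, 38.0], references)
```

Here the test vector is assigned to "B", because its average absolute difference from "B" (about 1.33) is smaller than from "A" (about 3.33).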
The sounds of eight words were analyzed, converted into RAV vectors, and stored as reference patterns; these vectors are shown in Table 1.
Six of the sounds were then recorded again to obtain vectors that serve as test patterns; these are shown in Table 2.
Table 3 reports the final results, obtained by calculating the average absolute difference between each test vector and each of the reference vectors. The numbers highlighted in red indicate the lowest value in each row, i.e. the best match.
The accuracy of the recognition depends on the similarity of the acoustic properties of the sound signals. Greater accuracy is obtained when the differences between the sounds are larger.
Using a larger number of sounds in the test might further improve the assessment of the system accuracy. It has been observed that greater accuracy would be obtained if each sound source had a series of patterns produced by several repetitions of the recording process in different environmental conditions.
Table 1. Reference pattern.
Table 2. Test patterns.
Table 3. Average differences between references and test patterns.
This means that accuracy increases with the amount of information available to the recognition process.
The sounds of the sources are analyzed, converted into RAV vectors, and stored as eight reference patterns, while six of the sounds are repeated and stored as test patterns, as in Table 1 and Table 2 respectively. The average absolute differences between the vectors of these two tables are reported in Table 3. For example, the value of 4.78 is the average absolute difference between the elements of test vector (B) in Table 2 and the corresponding elements of reference vector (A) in Table 1. Each entry in a row of Table 3 is the average difference between a test vector and a reference vector.
Using a real-time analysis scheme with eight channels, very good performance was achieved in evaluating the spectrograms of the sound signals. According to the results in Table 3, which give the average absolute difference between each test vector and each of the reference vectors, the recognition rate for (B, C, D, E, G) was 100% and for (F) was 90%, which means that the probability of error is approximately 2%. This was achieved even though there are substantial similarities between the sounds of the words. To assess the system accuracy more reliably, a larger number of sounds must be used in the test.
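The overall error figure can be checked from the per-word rates quoted above. The sketch assumes that every test word was tried the same number of times, so that the overall rate is the plain average of the six per-word rates; the text does not state the number of trials.

```python
# Per-word recognition rates in percent, as quoted in the text.
rates = {"B": 100.0, "C": 100.0, "D": 100.0, "E": 100.0, "F": 90.0, "G": 100.0}

# Assumption: equal number of trials per word, so a simple mean applies.
mean_rate = sum(rates.values()) / len(rates)
error_percent = 100.0 - mean_rate
```

Under that assumption the mean recognition rate is about 98.3% and the error is about 1.7%, which rounds to the stated 2%.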
Increasing the number of octave bands may increase the accuracy of the sound recognition; likewise, using a larger bank of (16) or (24) filters instead of the (8) filters in the spectral analysis may lead to better accuracy. However, if the spectrum analyzer is too precise, the system may easily fail to recognize the sound source.
It has been observed that greater accuracy would be obtained if each sound source had a series of patterns produced by several repetitions of the recording process in different environmental conditions.
Martonow, M. (2009) Design of Fusion Classifiers for Voice-Based Access Control System of Building Security. WRI World Congress on Computer Science and Information Engineering, Los Angeles, CA, 31 March-2 April 2009, 80-84.
Jurafsky, D. and Martin, J.H. (2000) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, Upper Saddle River, NJ.
Rabiner, L.R. and Juang, B.H. (2004) Statistical Methods for the Recognition and Understanding of Speech. Rutgers University and the University of California, Santa Barbara; Georgia Institute of Technology, Atlanta.
Poornima, S. (2016) Basic Characteristics of Speech Signal Analysis. International Journal of Innovative Research & Development, 5, 169-173.