Steganography is the process of making the presence of secret data undetectable in a carrier. Secret messages are embedded within a larger cover in such a way that an observer cannot detect the presence or the contents of the hidden message. Various carrier file formats can be used, but digital images are the most common carriers because of the redundancy of image data and how frequently images are shared on the internet. Steganalysis is the procedure of detecting the presence of a hidden message embedded by steganography. Image steganalysis methods use features that are affected by steganographic embedding, followed by a machine learning classifier. To apply this approach, the steganalyst extracts features from a training dataset to train the classifier and then validates the classifier on a testing dataset. If the results are accurate, the classifier is considered effective.
Classification performance decreases as the testing dataset diverges from the training dataset. The dataset obtained after feature extraction depends on the steganographic algorithm used to hide the data in the cover, the feature extraction algorithm, and the properties of the cover source. If a similar cover source is used, feature extraction produces a dataset with a similar representation, and the classifier's results are satisfactory. Conversely, if a different cover source is used, the dataset produced by feature extraction also differs, and the classification results degrade.
The complexity of steganographic algorithms has increased, so image steganalysis must now be designed with a high-dimensional feature space. Earlier steganographic methods include invisible ink, spread spectrum communication, covert channels, and microdots. The fundamental requirement of a steganographic system is that the hidden information causes only minor modifications to the cover object. Steganographic methods are classified into two types:
・ Spatial domain steganography,
・ Transform domain steganography.
The spatial domain method is frequently used because of its high capacity for hidden information and its ease of implementation. Here, secret messages are embedded directly by altering the pixel intensity values of an image. Spatial domain methods are further divided into the Least Significant Bit (LSB) insertion technique, the Pixel Value Difference (PVD) method, mapping pixels to hidden data, the Random Pixel Embedding (RPE) method, the Edge Based data Embedding (EBE) method, the labeling or connectivity method, the histogram shifting method, pixel intensity based methods, the Enhanced Least Significant Bit (ELSB) algorithm, and texture based methods. In LSB based methods, the least significant bits of the pixels are replaced by message bits. In block based methods, the cover is divided into equal-sized blocks before the data are embedded. In edge based methods, sharper edge regions are used to hide the message. Among these methods, the ELSB algorithm offers more security. In the grey level method, the host image is divided into blocks and the corresponding secret message bits are embedded into each block using layers, which are formed from the binary representation of the pixel values. Spatial domain methods limit the degradation of the original image and can store a large amount of information in an image. Their disadvantage is that they are less robust: hidden data may be lost during image manipulation and can easily be destroyed by simple attacks.
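The LSB replacement idea described above can be illustrated with a minimal Python/NumPy sketch. It assumes an 8-bit grayscale cover held in a NumPy array and writes message bits into pixels in raster order; `lsb_embed` and `lsb_extract` are hypothetical helper names, and practical schemes usually add a key-driven pixel order and a length header.

```python
import numpy as np


def lsb_embed(cover: np.ndarray, bits: list[int]) -> np.ndarray:
    """Replace the least significant bit of the first len(bits) pixels (raster order)."""
    flat = cover.astype(np.uint8).flatten()  # flatten() returns a copy; cover is untouched
    if len(bits) > flat.size:
        raise ValueError("message is longer than the cover capacity")
    msg = np.asarray(bits, dtype=np.uint8)
    # Clear the LSB with 0xFE, then OR in the message bit.
    flat[: msg.size] = (flat[: msg.size] & 0xFE) | msg
    return flat.reshape(cover.shape)


def lsb_extract(stego: np.ndarray, n_bits: int) -> list[int]:
    """Read back the LSBs of the first n_bits pixels."""
    return [int(v) for v in stego.flatten()[:n_bits] & 1]


cover = np.array([[200, 201], [202, 203]], dtype=np.uint8)
stego = lsb_embed(cover, [1, 0, 1, 1])
print(stego.tolist())          # [[201, 200], [203, 203]]
print(lsb_extract(stego, 4))   # [1, 0, 1, 1]
```

Each pixel changes by at most one gray level, which is why the change is visually invisible yet leaves statistical traces that steganalysis can target.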
The transform domain method embeds the secret message in the transform coefficients of the cover image. Numerous transformations and algorithms are applied to the image to hide the message. Transform domain methods have an advantage over spatial domain methods because the hidden information is more resistant to compression, cropping, and image processing. These methods fall into the following categories: Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), lossless or reversible methods, and embedding methods. Every steganographic method uses its own mechanism to embed the secret data in images, and therefore leaves a distinct pattern on the stego images.
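To make the transform domain idea concrete, the sketch below hides bits in the quantized DCT coefficients of one 8 × 8 block, in the spirit of JSteg-style embedding (a technique named here for illustration; the survey does not prescribe it). It assumes a single uniform quantization step `q` instead of a JPEG quantization table, builds the orthonormal DCT-II matrix directly with NumPy, skips the DC coefficient and coefficients equal to 0 or 1, and reads the message back from the coefficients, as a JPEG decoder would. All function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: the 2-D DCT of block X is C @ X @ C.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


C = dct_matrix()


def quantized_dct(block: np.ndarray, q: float = 16.0) -> np.ndarray:
    """Level-shift, transform and uniformly quantize one 8x8 pixel block."""
    coeffs = C @ (block.astype(float) - 128.0) @ C.T
    return np.round(coeffs / q).astype(int)


def embed_in_coefficients(qcoeffs: np.ndarray, bits: list[int]) -> np.ndarray:
    """Write bits into the LSBs of AC coefficients whose value is not 0 or 1."""
    out = qcoeffs.copy()
    flat = out.reshape(-1)  # a view: writes go into `out`
    pending = iter(bits)
    for idx in range(1, flat.size):  # index 0 is the DC coefficient
        if flat[idx] in (0, 1):
            continue
        bit = next(pending, None)
        if bit is None:
            break
        # Works for negative values too, and never produces 0 or 1,
        # so the extractor skips exactly the same coefficients.
        flat[idx] = (flat[idx] & ~1) | bit
    return out


def extract_from_coefficients(qcoeffs: np.ndarray, n_bits: int) -> list[int]:
    bits = []
    for v in qcoeffs.reshape(-1)[1:]:
        if v in (0, 1):
            continue
        bits.append(int(v & 1))
        if len(bits) == n_bits:
            break
    return bits


rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8))
q = quantized_dct(block)
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego_q = embed_in_coefficients(q, message)
print(extract_from_coefficients(stego_q, len(message)) == message)
```

If the message is longer than the number of usable coefficients, the extra bits are silently dropped in this sketch; a real scheme would check capacity first.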
In general, a steganalysis scheme has three stages: image preprocessing, feature extraction, and classification. Preprocessing removes the noise present in the images. Feature extraction is central to image classification: the most relevant features are extracted from an image and used for classification.
This paper is organized as follows. Section 2 describes steganalysis techniques. Section 3 discusses filtering techniques for image preprocessing, Section 4 presents feature extraction techniques, and Section 5 presents classification techniques. Section 6 discusses the results of the survey, and Section 7 concludes the paper.
2. Steganalysis Techniques
Steganalysis techniques are categorized into two types, signature steganalysis and statistical steganalysis, according to the method used to detect a hidden message embedded in an image. Secret information hidden in an image or other digital medium, although invisible to humans, modifies the properties of the medium and produces degradation, unusual characteristics, and patterns. These patterns and characteristics can act as a signature that is transmitted along with the embedded message. Signature steganalysis is further divided into specific signature steganalysis and universal signature steganalysis. The information hiding process also changes the statistical properties of the cover, which a steganalyst attempts to detect; the process of detecting such statistical traces is called statistical steganalysis. Statistical steganalysis is considered more powerful than signature steganalysis because of its sensitivity. Methods based on statistical steganalysis include LSB embedding steganalysis, LSB matching steganalysis, spread spectrum steganalysis, JPEG compression steganalysis, additive noise steganalysis, and transform domain steganalysis. The rest of this section discusses recent steganalysis methods in current use.
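As an illustration of statistical steganalysis aimed at LSB embedding, the sketch below computes the pairs-of-values chi-square statistic of Westfeld and Pfitzmann (an example technique chosen here; the survey does not single it out). LSB replacement only swaps values inside pairs (2k, 2k + 1), so heavy embedding tends to equalize the two counts of each pair and drive the statistic toward zero. The helper name is hypothetical, the all-even toy cover is deliberately extreme, and turning the statistic into a probability would additionally need a chi-square CDF such as `scipy.stats.chi2.sf`, which is not used here.

```python
import numpy as np


def pairs_of_values_chi_square(pixels: np.ndarray) -> float:
    """Chi-square statistic comparing each even count with the mean of its (2k, 2k+1) pair."""
    hist = np.bincount(pixels.astype(np.uint8).ravel(), minlength=256).astype(float)
    even, odd = hist[0::2], hist[1::2]
    expected = (even + odd) / 2.0
    used = expected > 0  # skip empty pairs to avoid division by zero
    return float(np.sum((even[used] - expected[used]) ** 2 / expected[used]))


rng = np.random.default_rng(0)
cover = rng.integers(0, 128, size=(64, 64)) * 2               # toy cover: only even values
stego = (cover & ~1) | rng.integers(0, 2, size=cover.shape)  # full-capacity LSB replacement
print(pairs_of_values_chi_square(cover))   # 2048.0
print(pairs_of_values_chi_square(stego) < pairs_of_values_chi_square(cover))
```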
The block based steganalysis method provides accurate steganalytic performance without increasing the number of features. It has two phases, training and testing. In training, an image is decomposed into smaller blocks and each block is treated as the basic unit of steganalysis. A stego image is obtained by embedding the secret image into a cover image, and this is done for every cover image to obtain cover-stego image pairs. Each image pair is decomposed into smaller blocks of size. A set of features is extracted from each image block, and a tree-structured clustering algorithm groups the blocks into classes based on their extracted features. In testing, the same block decomposition and feature extraction are applied to the test images. Each block is then assigned to a single block class, and a decision is made as to whether it is a stego block. The decisions for the individual blocks are fused into a single decision by the majority voting rule: if the stego blocks outnumber the cover blocks, the image is declared a stego image.
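The majority-vote rule of the testing phase can be sketched as follows, assuming each block has already been routed to a trained block classifier. The block size is left as a parameter because the source does not state it, `toy_block_classifier` is a hypothetical placeholder rather than a real detector, and the tree-structured clustering step is not modeled. Ties are resolved in favor of "cover" here, which is an assumption.

```python
import numpy as np
from typing import Callable


def split_into_blocks(img: np.ndarray, size: int) -> list[np.ndarray]:
    """Non-overlapping size x size blocks; a partial border is ignored."""
    h, w = img.shape
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]


def classify_image(img: np.ndarray, size: int,
                   block_classifier: Callable[[np.ndarray], int]) -> int:
    """Majority vote over blocks: 1 (stego) only if stego blocks outnumber cover blocks."""
    votes = [block_classifier(b) for b in split_into_blocks(img, size)]
    n_stego = sum(votes)
    return int(n_stego > len(votes) - n_stego)


def toy_block_classifier(block: np.ndarray) -> int:
    """Hypothetical stand-in for a per-class trained block detector."""
    return int(block.mean() > 128)


img = np.zeros((16, 16))
img[:, :12] = 200   # 12 of the 16 blocks of size 4 look "stego" to the toy rule
print(classify_image(img, 4, toy_block_classifier))  # 1
```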
The Spatio-Color Rich Model comprises two components, the Spatial Rich Model (SRMQ1) and the Color Rich Model (CRMQ1). The SRMQ1 and CRMQ1 features are computed and used to detect steganography in images. The SRM noise residuals are computed with linear and non-linear local pixel predictors and are grouped into five classes according to the structure of the filters: first order, second order, third order, edge kernels, and square kernels. Collaborative Representation (CR) uses all the training samples, from both cover and stego images, to represent each testing sample by solving a regularized least squares problem. The regularization term on the coding vector is changed from the ℓ1 norm to the ℓ2 norm, which lowers the complexity.
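Two ideas from this paragraph can be sketched in NumPy under clearly simplified assumptions. `srm_first_order_cooccurrence` computes one SRM-style submodel: a first-order horizontal residual, quantized with step `q` and truncated to [-T, T], summarized by a co-occurrence histogram of four neighboring residuals; the full rich model combines many such filters and directions. `crc_classify` codes a test feature vector over all training samples (cover and stego) with an ℓ2-regularized least squares fit and assigns the class whose samples reconstruct it with the smallest residual. Function names and parameter defaults are illustrative, not taken from the cited works.

```python
import numpy as np


def srm_first_order_cooccurrence(x: np.ndarray, q: float = 1.0, T: int = 2) -> np.ndarray:
    """Normalized 4-D co-occurrence of quantized, truncated first-order residuals."""
    x = x.astype(float)
    r = x[:, 1:] - x[:, :-1]                               # horizontal first-order residual
    r = np.clip(np.round(r / q), -T, T).astype(int) + T    # quantize, truncate, map to 0..2T
    base = 2 * T + 1
    a, b, c, d = r[:, :-3], r[:, 1:-2], r[:, 2:-1], r[:, 3:]  # four consecutive residuals
    idx = ((a * base + b) * base + c) * base + d
    hist = np.bincount(idx.ravel(), minlength=base ** 4).astype(float)
    return hist / hist.sum()


def crc_classify(X: np.ndarray, labels: np.ndarray, y: np.ndarray, lam: float = 1e-3):
    """Collaborative representation; columns of X are training feature vectors."""
    # l2-regularized coding over ALL training samples: (X^T X + lam I) alpha = X^T y
    gram = X.T @ X + lam * np.eye(X.shape[1])
    alpha = np.linalg.solve(gram, X.T @ y)
    best_label, best_residual = None, np.inf
    for cls in np.unique(labels):
        mask = labels == cls
        residual = np.linalg.norm(y - X[:, mask] @ alpha[mask])  # class-wise reconstruction
        if residual < best_residual:
            best_label, best_residual = cls, residual
    return best_label


X_train = np.array([[1.0, 0.9, 0.0, 0.1],
                    [0.0, 0.1, 1.0, 0.9]])   # two toy features, four training samples
labels = np.array([0, 0, 1, 1])              # 0 = cover, 1 = stego
print(crc_classify(X_train, labels, np.array([0.95, 0.05])))  # 0
```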
Least Significant Bit (LSB) matching steganalysis exploits the differences between neighboring pixels before and after data embedding. Significant information from neighboring pixels is gathered to form patterns. The number of patterns in the image is counted, random data is embedded into the test image, and the patterns are counted again after embedding. The relationship between the two counts differs depending on whether the test image is a cover or a stego image. Accurate detection of steganography in color images is achieved with the Difference Histogram Characteristic Function (DHCF). When the distance between pixels is large, the differences between their gray values are smoothed by the stego bits introduced by LSB matching. Features extracted from the differences of non-adjacent pixels are used to train the classifier.
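The sketch below illustrates both halves of this paragraph under simplifying assumptions: `lsb_match_embed` performs LSB matching (±1 embedding) on an 8-bit grayscale array, and `difference_hcf_com` computes the center of mass of the characteristic function (the DFT magnitude) of the histogram of differences between pixels `d` columns apart. Because LSB matching adds small independent noise, the difference histogram spreads out, its characteristic function decays faster, and the center of mass typically drops. Only horizontal differences of a single channel are used here, whereas the DHCF method in the survey works on color images; the function names are hypothetical.

```python
import numpy as np


def lsb_match_embed(cover: np.ndarray, bits, rng: np.random.Generator) -> np.ndarray:
    """LSB matching: if a pixel's LSB differs from the bit, randomly add or subtract 1."""
    flat = cover.astype(np.int64).flatten()
    for i, bit in enumerate(bits):
        if (flat[i] & 1) != bit:
            if flat[i] == 0:
                flat[i] += 1            # stay inside the 0..255 range
            elif flat[i] == 255:
                flat[i] -= 1
            else:
                flat[i] += rng.choice((-1, 1))
    return flat.reshape(cover.shape)


def difference_hcf_com(img: np.ndarray, d: int = 2) -> float:
    """Centre of mass of the DFT magnitude of the histogram of x[i, j+d] - x[i, j] (d >= 1)."""
    x = img.astype(np.int64)
    diff = x[:, d:] - x[:, :-d]                         # values in -255..255
    hist = np.bincount((diff + 255).ravel(), minlength=511).astype(float)
    hcf = np.abs(np.fft.fft(hist))[: 511 // 2 + 1]     # non-negative frequencies only
    k = np.arange(hcf.size)
    return float(np.sum(k * hcf) / np.sum(hcf))


rng = np.random.default_rng(1)
cover = np.full((32, 32), 100, dtype=np.uint8)   # toy, perfectly smooth cover
bits = rng.integers(0, 2, size=cover.size)
stego = lsb_match_embed(cover, bits, rng)
print(difference_hcf_com(cover), difference_hcf_com(stego))
```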
3. Image Preprocessing Techniques
Preprocessing is applied to images to remove redundant and irrelevant content. Noise in the image is also removed.
3.1. Discrete Wavelet Transform (DWT)
Discrete Wavelet Transform (DWT) based preprocessing improves the security of steganography that embeds the secret message in the frequency domain of the cover image. It hides the message in significant areas of the cover image, which makes it more robust to attacks.
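A single-level 2-D Haar transform is the simplest instance of the DWT used for this kind of frequency-domain embedding. The NumPy sketch below assumes an image with even height and width and uses the orthonormal Haar filters; subband naming (LH vs. HL) varies between libraries, so the labels here are a convention chosen for illustration. A scheme like the one described would modify coefficients in a chosen subband and then apply the inverse transform.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(x: np.ndarray):
    """One level of the orthonormal 2-D Haar DWT; returns (LL, LH, HL, HH)."""
    x = np.asarray(x, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    lo = (x[0::2, :] + x[1::2, :]) / SQRT2   # low-pass over pairs of rows
    hi = (x[0::2, :] - x[1::2, :]) / SQRT2   # high-pass over pairs of rows
    LL = (lo[:, 0::2] + lo[:, 1::2]) / SQRT2
    LH = (lo[:, 0::2] - lo[:, 1::2]) / SQRT2
    HL = (hi[:, 0::2] + hi[:, 1::2]) / SQRT2
    HH = (hi[:, 0::2] - hi[:, 1::2]) / SQRT2
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH) -> np.ndarray:
    """Exact inverse of haar_dwt2."""
    rows, cols = LL.shape
    lo = np.empty((rows, 2 * cols))
    hi = np.empty((rows, 2 * cols))
    lo[:, 0::2], lo[:, 1::2] = (LL + LH) / SQRT2, (LL - LH) / SQRT2
    hi[:, 0::2], hi[:, 1::2] = (HL + HH) / SQRT2, (HL - HH) / SQRT2
    x = np.empty((2 * rows, 2 * cols))
    x[0::2, :], x[1::2, :] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x


img = np.arange(64, dtype=float).reshape(8, 8)
bands = haar_dwt2(img)
print(np.allclose(haar_idwt2(*bands), img))  # True
```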
3.2. Vector Rank M-Type L Filter (VRML)
The Vector Rank M-type L (VRML) filter removes impulsive noise and speckle noise from color images and video sequences. It combines two estimators, the Median M-type (MM) and the Ansari-Bradley-Siegel-Tukey M-type (AM) estimators, to make the filtering robust. Impulsive noise detectors are also included to improve noise suppression at both low and high densities of impulsive noise. The filter preserves edges and detects noise effectively in the presence of impulsive noise.
3.3. Successive Mean Quantization Transform (SMQT)
The Successive Mean Quantization Transform (SMQT) extracts the structure of the data in a robust manner. Here, the embedding capacity is determined from the non-zero Discrete Cosine Transform (DCT) coefficients. In the Boosted Steganography Scheme (BSS), a cover image is processed in two stages: a preprocessing stage and an embedding stage. In the preprocessing stage, image processing techniques such as contrast enhancement, SMQT, and other manipulations are applied to the cover images.
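The SMQT builds an L-level binary tree of Mean Quantization Units: each unit splits its set of values at the mean, emits a 1 for values above the mean, and passes both halves to the next level. The recursive NumPy sketch below follows that description; it is an illustrative implementation with a hypothetical function name, not the code used in the cited BSS work. The usage lines show a useful property: the output does not change under a positive gain and an offset of the input.

```python
import numpy as np


def smqt(values, levels: int = 8) -> np.ndarray:
    """Successive Mean Quantization Transform with `levels` bits of output."""
    x = np.asarray(values, dtype=float)
    x_flat = x.ravel()
    out = np.zeros(x.shape, dtype=np.int64)
    out_flat = out.reshape(-1)  # view into `out`

    def mean_quantization_unit(idx: np.ndarray, level: int) -> None:
        if level > levels or idx.size == 0:
            return
        vals = x_flat[idx]
        upper = vals > vals.mean()
        out_flat[idx[upper]] += 1 << (levels - level)  # bit weight for this level
        mean_quantization_unit(idx[~upper], level + 1)
        mean_quantization_unit(idx[upper], level + 1)

    mean_quantization_unit(np.arange(x_flat.size), 1)
    return out


img = np.array([[1, 2], [3, 4]])
print(smqt(img, levels=2).tolist())          # [[0, 1], [2, 3]]
print(smqt(3 * img + 7, levels=2).tolist())  # [[0, 1], [2, 3]]
```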
3.4. Scale Invariant Feature Transform (SIFT)
The image saliency detection based preprocessing technique uses the quaternion transform. Region segmentation and the Scale Invariant Feature Transform (SIFT) are used as preprocessing steps to extract accurate features by removing irrelevant ones. Textural features such as contrast, coarseness, directionality, regularity, and roughness are captured.
3.5. Anisotropic Diffusion (AD)
Anisotropic Diffusion (AD) based preprocessing performs image smoothing and image enhancement. The matrix representing the image is flipped from left to right about the vertical axis.
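A classic form of anisotropic diffusion is the Perona-Malik scheme, sketched below in NumPy as an assumed stand-in (the survey does not say which diffusion variant the cited work uses). Each iteration moves every pixel toward its four neighbors, weighted by the edge-stopping function g(d) = exp(-(d/κ)²), so small intensity differences are smoothed while large ones (edges) are almost untouched. `kappa`, `gamma`, and `n_iter` are illustrative defaults (`gamma` ≤ 0.25 keeps the explicit scheme stable), and the final left-right flip mirrors the step mentioned in the text using `np.fliplr`.

```python
import numpy as np


def anisotropic_diffusion(img: np.ndarray, n_iter: int = 10,
                          kappa: float = 20.0, gamma: float = 0.2) -> np.ndarray:
    """Perona-Malik diffusion with zero flux across the image border."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        dn, ds, de, dw = (np.zeros_like(u) for _ in range(4))
        dn[1:, :] = u[:-1, :] - u[1:, :]    # difference to the north neighbour
        ds[:-1, :] = u[1:, :] - u[:-1, :]   # south
        de[:, :-1] = u[:, 1:] - u[:, :-1]   # east
        dw[:, 1:] = u[:, :-1] - u[:, 1:]    # west
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in (dn, ds, de, dw))
        u += gamma * flux
    return u


def diffuse_and_flip(img: np.ndarray) -> np.ndarray:
    """Smooth with anisotropic diffusion, then mirror about the vertical axis."""
    return np.fliplr(anisotropic_diffusion(img))


i, j = np.indices((16, 16))
clean = np.where(j < 8, 0.0, 100.0)                      # vertical step edge
noisy = clean + np.where((i + j) % 2 == 0, 2.0, -2.0)    # small checkerboard noise
out = anisotropic_diffusion(noisy)
print(out[:, :8].std() < noisy[:, :8].std())   # noise reduced in the flat half
print(out[:, 8:].mean() - out[:, :8].mean())   # edge contrast stays close to 100
```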
3.6. Chirp Z-Transform (CZT)
A combination of the Chirp Z-Transform (CZT) and the Goertzel algorithm is used as a preprocessing technique to normalize the image. The input image is divided into a specific number of equal-sized blocks, and the CZT is applied to each block. The CZT evaluates the Z-transform at M points in the Z-plane, transforming the image into the Z-domain. The Goertzel algorithm is then applied to the transformed image as a reconstruction algorithm, producing an inverted version of the original image. The normalized image is obtained once reconstruction is complete.
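For reference, the two transforms named here can be written in a few lines of NumPy. `czt_direct` evaluates the Z-transform of a sequence at M points z_k = A·W^(-k) by direct summation (O(NM), for clarity rather than speed), and `goertzel` computes a single DFT bin with the standard second-order recursion. With A = 1 and W = exp(-2πj/N), both agree with `np.fft.fft`, which is used as a check. How the cited work combines them for block-wise normalization and reconstruction is not specified in enough detail to reproduce, so that part is not modeled, and the function names are hypothetical.

```python
import numpy as np


def czt_direct(x: np.ndarray, m: int, w: complex, a: complex = 1.0 + 0j) -> np.ndarray:
    """X_k = sum_n x[n] * a**(-n) * w**(n*k) for k = 0..m-1 (direct evaluation)."""
    n = np.arange(len(x))
    k = np.arange(m)[:, None]
    return (x * a ** (-n) * w ** (n * k)).sum(axis=1)


def goertzel(x: np.ndarray, k: int) -> complex:
    """Single DFT bin X[k] via s[n] = x[n] + 2cos(w)s[n-1] - s[n-2]."""
    n_samples = len(x)
    omega = 2.0 * np.pi * k / n_samples
    coeff = 2.0 * np.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # X[k] = e^{j*omega} * s[N-1] - s[N-2]
    return complex(np.exp(1j * omega) * s_prev - s_prev2)


x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5])
N = len(x)
print(np.allclose(czt_direct(x, N, np.exp(-2j * np.pi / N)), np.fft.fft(x)))  # True
print(np.isclose(goertzel(x, 3), np.fft.fft(x)[3]))                           # True
```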
3.7. Other Filters
The two-dimensional (2D) Wiener filter is an adaptive noise-removal filter that performs low-pass filtering of a grayscale image.
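A pixel-wise adaptive Wiener filter of this kind estimates the local mean μ and variance σ² in a small window and shrinks each pixel toward μ by the factor max(σ² - ν², 0) / max(σ², ν²), where ν² is the noise variance (estimated as the mean local variance when not given). The NumPy sketch below follows that common formulation; the window size, the reflect padding, and the function name are assumptions, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from typing import Optional
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_wiener(img: np.ndarray, size: int = 3, noise: Optional[float] = None) -> np.ndarray:
    """Pixel-wise adaptive Wiener filter on a 2-D grayscale array (odd window size)."""
    x = img.astype(float)
    windows = sliding_window_view(np.pad(x, size // 2, mode="reflect"), (size, size))
    mu = windows.mean(axis=(-2, -1))        # local mean
    var = windows.var(axis=(-2, -1))        # local variance
    if noise is None:
        noise = float(var.mean())           # noise estimate: average local variance
    gain = np.maximum(var - noise, 0.0) / np.maximum(np.maximum(var, noise), 1e-12)
    return mu + gain * (x - mu)


rng = np.random.default_rng(0)
clean = np.full((32, 32), 100.0)
noisy = clean + rng.normal(0.0, 10.0, size=clean.shape)
restored = adaptive_wiener(noisy)
print(np.abs(restored - clean).mean(), np.abs(noisy - clean).mean())
```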
A Gaussian filter is a low-pass filter with the minimum time-bandwidth product. Gamma Intensity Correction (GIC) enhances the local dynamic range of the image in dark or shadowed regions while compressing it in bright regions.
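Both operations in this paragraph are short to write down. The sketch below builds a normalized Gaussian kernel and applies it with reflect padding, and applies a plain power-law gamma mapping to 8-bit intensities; with γ < 1 the mapping stretches dark intensities and compresses bright ones, which is the behavior the text attributes to GIC. Kernel size, σ, and γ are illustrative values, the function names are hypothetical, and selecting γ for a given image is not modeled here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Normalized 2-D Gaussian kernel (odd size), built as an outer product."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def gaussian_blur(img: np.ndarray, size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Low-pass filter: Gaussian-weighted average of each size x size neighbourhood."""
    x = img.astype(float)
    windows = sliding_window_view(np.pad(x, size // 2, mode="reflect"), (size, size))
    return np.einsum("ijkl,kl->ij", windows, gaussian_kernel(size, sigma))


def gamma_correct(img, gamma: float = 0.5):
    """Power-law mapping of 8-bit intensities; gamma < 1 brightens dark regions."""
    x = np.asarray(img, dtype=float) / 255.0
    return 255.0 * x ** gamma


dark_gap = gamma_correct(20) - gamma_correct(10)       # input gap of 10 in a dark range
bright_gap = gamma_correct(250) - gamma_correct(240)   # input gap of 10 in a bright range
print(dark_gap > 10, bright_gap < 10)                  # True True
```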
4. Feature Extraction Techniques
Feature extraction is the process of extracting the essential information or characteristics from the original image. Several feature extraction techniques are discussed in this section.
4.1. Relative Auto Decorrelation (RAD)
The Relative Auto-Decorrelation (RAD) feature extraction method extracts intrinsic features that improve the discrimination of stego images from covers. Features such as the Local Entropies Sum (LES) and the Clouds Min Sum (CMS) are computed to achieve fast and accurate detection and to improve steganalysis results. The common part of an image is processed by a smart partitioning procedure known as two-dimensional (2D) decorrelation of the received images. The presence of a secret message is identified by a quadratic estimator of the embedding rate, and this rate estimate is adjusted using thresholds derived from the RAC, LES, and CMS features. This type of steganalysis has three stages: image clouding, RAD feature extraction, and quadratic rate estimation. Clouding splits the input image into luminance-aware slices, and every cloud contains pixels of the same luminance in an edge-free region of the image.
4.2. Discrete Cosine Transform (DCT)
Absolute values of the Neighboring Joint probability density (absNJ) features are extracted from the differential Discrete Cosine Transform (DCT) coefficients. This is done by applying
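The procedure in the source is cut off, so the exact steps cannot be recovered. As a hedged illustration only, the sketch below computes an intra-block neighboring joint density of absolute coefficient values, averaged over horizontal and vertical neighbor pairs, from an array of (already quantized) coefficients laid out in 8 × 8 blocks. Whether the cited method feeds differential coefficients, which neighbor directions it uses, and how it bounds values are assumptions; here magnitudes above `T` are clipped to `T`, and the function name is hypothetical.

```python
import numpy as np


def abs_nj_intra(coeffs: np.ndarray, T: int = 3) -> np.ndarray:
    """(T+1) x (T+1) joint density of |c| for intra-block neighbour pairs."""
    h, w = coeffs.shape
    if h % 8 or w % 8:
        raise ValueError("coefficient array must consist of whole 8x8 blocks")
    c = np.minimum(np.abs(coeffs), T).astype(int)                   # clip magnitudes to 0..T
    blocks = c.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)  # (rows, cols, 8, 8)
    base = T + 1
    horiz = (blocks[..., :, :-1] * base + blocks[..., :, 1:]).ravel()
    vert = (blocks[..., :-1, :] * base + blocks[..., 1:, :]).ravel()
    nj_h = np.bincount(horiz, minlength=base ** 2) / horiz.size
    nj_v = np.bincount(vert, minlength=base ** 2) / vert.size
    return ((nj_h + nj_v) / 2.0).reshape(base, base)


rng = np.random.default_rng(0)
toy = rng.integers(-4, 5, size=(16, 16))   # hypothetical quantized coefficients
features = abs_nj_intra(toy)
print(features.shape, round(float(features.sum()), 6))   # (4, 4) 1.0
```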
4.3. Gabor Wavelet Transform
The model based on the Shannon entropy of two-dimensional (2D) Gabor wavelet features exploits the joint localization properties of Gabor wavelets in the spatial and frequency domains. Image texture characteristics are captured through spatial frequency, and the Shannon entropy values of the filtered coefficients are used as steganalysis features. First, the JPEG image is decompressed to the spatial domain without quantizing the pixel values, to avoid information loss. The image is decomposed into 8 × 8 DCT blocks, each with 64 DCT modes. The decompressed image is filtered with 2D Gabor wavelets, and each filtered image can be subsampled to form a sub-image. Entropy features are then extracted from all filtered images; entropy is a measure of randomness that characterizes image texture. The classifier is trained on the selected entropy features.
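The Gabor filtering and entropy steps can be sketched in NumPy as follows. The code builds real-valued Gabor kernels at several orientations, filters a decompressed grayscale image with each, and returns the Shannon entropy (in bits) of a histogram of each filter response. The kernel parameters, the four orientations, the 32-bin histogram, and the function names are assumptions for illustration; the DCT-mode handling and subsampling described above are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size: int = 7, sigma: float = 2.0, theta: float = 0.0,
                 wavelength: float = 4.0, aspect: float = 0.5) -> np.ndarray:
    """Real (cosine) Gabor kernel, made zero-mean so flat areas give no response."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 + aspect ** 2 * yr ** 2) / (2 * sigma ** 2))
    g *= np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()


def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size correlation with reflect padding."""
    pad = kernel.shape[0] // 2
    windows = sliding_window_view(np.pad(img.astype(float), pad, mode="reflect"), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def shannon_entropy(values: np.ndarray, bins: int = 32) -> float:
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def gabor_entropy_features(img: np.ndarray, n_orientations: int = 4) -> np.ndarray:
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    return np.array([shannon_entropy(filter2d(img, gabor_kernel(theta=t))) for t in thetas])


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))
print(gabor_entropy_features(img))   # four entropies, one per orientation
```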
4.4. Spectrum Based Feature Extraction
Spectrum based feature extraction combines the DCT and the Discrete Fourier Transform (DFT). The input image contains many features, and transforming it into the frequency domain reduces redundant information. A subset of the transformed coefficients is retained to preserve the features important for recognition. The DFT is applied to the preprocessed image, and the low-frequency components are located at the center of the spectrum. This center portion is extracted with a centered rectangular mask, and the DCT is then applied to the extracted DFT spectrum to improve the recognition rate. DWT-Dual subband Frequency domain Feature Extraction (DDFFE) combines the DWT, DFT, and DCT to efficiently extract translation- and illumination-invariant features. The DWT uses the approximation coefficients together with the horizontal detail coefficients of the 2D image. The DFT, which captures the frequency characteristics of the image, is applied to compensate for the translation variance of the DWT. The low-frequency components at the center of the DFT spectrum are extracted with a quadruple ellipse mask around the spectrum center. The DFT carries both magnitude and phase information, which are extracted separately. When the DFT is applied to the preprocessed image, the low-frequency components at the corners of the spectrum are shifted to its center.
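The DFT-then-DCT pipeline described first can be sketched in NumPy as below. It takes the magnitude of the 2-D DFT, uses `np.fft.fftshift` to move the low frequencies to the center, keeps a centered `crop` × `crop` rectangle, and applies an orthonormal 2-D DCT to that patch. Keeping only the magnitude makes the features insensitive to circular shifts of the image, which relates to the translation issue mentioned for the DWT. The crop size and function names are assumptions, and the DDFFE variant (DWT subbands, elliptical masks, separate phase features) is not reproduced.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def spectrum_features(img: np.ndarray, crop: int = 8) -> np.ndarray:
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(float))))  # DC moved to centre
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    top, left = cy - crop // 2, cx - crop // 2
    centre = spectrum[top:top + crop, left:left + crop]   # centred rectangular mask
    c = dct_matrix(crop)
    return (c @ centre @ c.T).ravel()                     # 2-D DCT of the low-frequency patch


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))
f = spectrum_features(img)
print(f.shape)                                                   # (64,)
print(np.allclose(spectrum_features(np.roll(img, 5, axis=1)), f))  # True
```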
4.5. Perturbed Quantization (PQ)
Steganalysis of Perturbed Quantization (PQ) steganography focuses on special positions during feature extraction. Changes in the global, local, and dual histogram features are extracted from the DCT coefficients. PQ embedding is applied to double-compressed JPEG images, and their DCT coefficients may change after PQ embedding. The global histogram is built from all DCT coefficients; it is a first-order statistical feature that reflects the overall distribution of the DCT coefficients. The local histogram reveals the distribution of the coefficients at a given position across all DCT blocks. The dual histogram exposes the distribution of a specific value over the positions of all DCT blocks. All histogram features are computed and combined into enhanced histogram features that improve detection performance.
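The three histogram families can be sketched from an array of quantized DCT coefficients arranged in 8 × 8 blocks. In this sketch the global histogram counts values in [-T, T] over all coefficients, the local histogram does the same for one DCT position across blocks, and the dual histogram records, for one value d, how its occurrences are spread over the 64 positions. The range `T`, the chosen positions and values, and all function names are assumptions; the cited work's exact selection is not given.

```python
import numpy as np


def dct_blocks(coeffs: np.ndarray) -> np.ndarray:
    """Reshape an (8m, 8n) coefficient array to (m*n, 8, 8) blocks in raster order."""
    h, w = coeffs.shape
    return coeffs.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3).reshape(-1, 8, 8)


def value_histogram(values: np.ndarray, T: int) -> np.ndarray:
    """Fraction of entries equal to each value in -T..T."""
    v = values[(values >= -T) & (values <= T)].astype(int)
    return np.bincount(v + T, minlength=2 * T + 1) / values.size


def global_histogram(coeffs: np.ndarray, T: int = 5) -> np.ndarray:
    return value_histogram(coeffs.ravel(), T)


def local_histogram(coeffs: np.ndarray, pos: tuple[int, int], T: int = 5) -> np.ndarray:
    return value_histogram(dct_blocks(coeffs)[:, pos[0], pos[1]], T)


def dual_histogram(coeffs: np.ndarray, d: int) -> np.ndarray:
    """8x8 map: share of all occurrences of value d found at each DCT position."""
    counts = (dct_blocks(coeffs) == d).sum(axis=0).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def enhanced_histogram_features(coeffs: np.ndarray, T: int = 5) -> np.ndarray:
    """Concatenate global, a few local and a few dual histograms (illustrative choice)."""
    parts = [global_histogram(coeffs, T)]
    parts += [local_histogram(coeffs, p, T) for p in [(0, 1), (1, 0), (1, 1)]]
    parts += [dual_histogram(coeffs, d).ravel() for d in (-2, -1, 1, 2)]
    return np.concatenate(parts)


coeffs = np.zeros((16, 16), dtype=int)
coeffs[0, 1] = 1     # block (0, 0), position (0, 1)
coeffs[8, 9] = 1     # block (1, 1), position (0, 1)
print(enhanced_histogram_features(coeffs).shape)   # (300,)
```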
4.6. Nonsubsampled Contourlet Transform (NSCT)
The Logarithmic Nonsubsampled Contourlet Transform (LNSCT) extracts invariant features such as strong edges, weak edges, and noise in the image. Extracting and recombining these features produces two components, an illumination component and a reflectance component. The low-pass subband of the image and the low-frequency part of the strong edges form the illumination component, while the weak edges and the high-frequency part of the strong edges form the reflectance component. Because face images are often corrupted by noise, this noise is removed from the reflectance component. The NSCT is included because it captures contours in the original image effectively.
5. Machine Learning Based Classification Techniques
The steganalyst trains the classifier with increasingly complex cover models and larger datasets to obtain more accurate and robust detectors.
5.1. Ensemble Classifier
The ensemble classifier enables fast construction of steganography detectors with improved detection accuracy. To build a detector, a model of the cover source is chosen, and the covers are represented in a lower-dimensional feature space before the classifier is trained. The ensemble classifier consists of many base learners, each trained on a set of cover and stego images, and the final decision is made by aggregating the decisions of the individual base learners.
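A minimal sketch of such an ensemble, in the spirit of random-subspace ensembles of Fisher linear discriminants, is shown below. Each base learner is an FLD trained on a random subset of `d_sub` features, and the final label is a majority vote. Bootstrap sampling, automatic choice of `d_sub` and the number of learners, and threshold tuning used in published ensemble classifiers are omitted; the class and function names, the regularization constant, and the synthetic data are assumptions.

```python
import numpy as np


def train_fld(Xc: np.ndarray, Xs: np.ndarray, reg: float = 1e-6):
    """Fisher linear discriminant; predicts stego when x @ w > b."""
    mc, ms = Xc.mean(axis=0), Xs.mean(axis=0)
    sw = np.atleast_2d(np.cov(Xc, rowvar=False) + np.cov(Xs, rowvar=False))
    w = np.linalg.solve(sw + reg * np.eye(len(mc)), ms - mc)
    b = 0.5 * (mc @ w + ms @ w)   # midpoint between projected class means
    return w, b


class SubspaceFLDEnsemble:
    def __init__(self, n_learners: int = 11, d_sub: int = 3, seed: int = 0):
        self.n_learners, self.d_sub, self.seed = n_learners, d_sub, seed
        self.learners = []

    def fit(self, Xc: np.ndarray, Xs: np.ndarray) -> "SubspaceFLDEnsemble":
        rng = np.random.default_rng(self.seed)
        self.learners = []
        for _ in range(self.n_learners):
            idx = rng.choice(Xc.shape[1], size=self.d_sub, replace=False)  # random subspace
            w, b = train_fld(Xc[:, idx], Xs[:, idx])
            self.learners.append((idx, w, b))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        votes = np.array([(X[:, idx] @ w > b) for idx, w, b in self.learners], dtype=int)
        return (votes.sum(axis=0) > self.n_learners / 2).astype(int)  # majority vote


rng = np.random.default_rng(42)
cover_feats = rng.normal(0.0, 1.0, size=(200, 10))   # synthetic "cover" features
stego_feats = rng.normal(2.0, 1.0, size=(200, 10))   # synthetic "stego" features
model = SubspaceFLDEnsemble().fit(cover_feats, stego_feats)
test_X = np.vstack([rng.normal(0.0, 1.0, (100, 10)), rng.normal(2.0, 1.0, (100, 10))])
test_y = np.r_[np.zeros(100), np.ones(100)]
print((model.predict(test_X) == test_y).mean())
```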
5.2. Ensemble Based Extreme Learning Machine (EN-ELM) Classifier
The Ensemble based Extreme Learning Machine (EN-ELM) algorithm embeds ensemble learning and cross-validation into the training phase. The ensemble is constructed from several predictors trained on the training set with different sets of random parameters. Learning starts by partitioning the entire training set into a number of subsets, and each learner is trained on a reduced subset. The parameters of every predictor are updated according to particular conditions. After training, all predictors in the ensemble are sorted by their norm in increasing order, and the first half of the ensemble classifies the testing samples by majority voting: the total number of votes received by each class is computed, and the class receiving the most votes is taken as the predicted label.
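The training and voting steps described here can be sketched as follows, under stated assumptions: each ELM uses random tanh hidden units with a ridge-regularized output layer, each learner is trained on all subsets except one (a cross-validation style reading of "reduced subset"), the norm used for sorting is taken to be the norm of the output weights, and binary labels are coded as ±1. The parameter-update step mentioned in the text is not modeled, and all names and defaults are hypothetical.

```python
import numpy as np


def train_elm(X, t, n_hidden, rng, C=1.0):
    """Single-output ELM: random hidden layer, ridge-regularized output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    bias = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + bias)
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ t)
    return W, bias, beta


def elm_output(model, X):
    W, bias, beta = model
    return np.tanh(X @ W + bias) @ beta


def train_en_elm(X, y, n_subsets=6, n_hidden=20, seed=0):
    rng = np.random.default_rng(seed)
    t = np.where(y == 1, 1.0, -1.0)                        # +1 stego, -1 cover
    folds = np.array_split(rng.permutation(len(X)), n_subsets)
    models = []
    for k in range(n_subsets):
        keep = np.concatenate([f for j, f in enumerate(folds) if j != k])  # drop one subset
        models.append(train_elm(X[keep], t[keep], n_hidden, rng))
    models.sort(key=lambda m: np.linalg.norm(m[2]))         # smallest output-weight norm first
    return models[: max(1, n_subsets // 2)]                 # keep the first half


def predict_en_elm(models, X):
    votes = np.array([elm_output(m, X) > 0 for m in models], dtype=int)
    return (2 * votes.sum(axis=0) > len(models)).astype(int)  # majority vote


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.r_[np.zeros(100, dtype=int), np.ones(100, dtype=int)]
ensemble = train_en_elm(X, y)
print(len(ensemble), (predict_en_elm(ensemble, X) == y).mean())
```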
5.3. Extreme Learning Machine (ELM) Classifier
The Extreme Learning Machine (ELM) performs well in multi-label classification of large datasets. It applies to Single hidden Layer Feedforward Neural networks (SLFNs). The hidden node parameters are generated randomly from a continuous sampling distribution and need not be tuned. The technique seeks both the smallest training error and the smallest norm of the output weights, which gives better generalization performance: minimizing the norm of the output weights tends to maximize the distance between the two classes in the ELM feature space. The ELM is implemented with the minimum norm least squares method. The ELM provides a unified learning platform for a wide range of feature mappings and can be applied to regression and multiclass classification. With a single output node, the class label closest to the output value is taken as the predicted class label of the input, and the binary classification solution becomes a special case of the multiclass solution. With multiple output nodes, the index of the output node with the highest output value is chosen as the label of the input. The ELM requires less human intervention than the Support Vector Machine (SVM): if the feature mapping is known, the user needs to specify only one parameter.
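The basic ELM recipe described above (random hidden layer, output weights from a minimum-norm least squares solution, arg-max over output nodes) fits in a few lines of NumPy. The sketch uses sigmoid hidden units and `np.linalg.pinv`, whose Moore-Penrose pseudoinverse gives the minimum-norm least squares output weights; the class name, number of hidden nodes, ±1 target coding, and toy data are assumptions for illustration.

```python
import numpy as np


class ELM:
    def __init__(self, n_hidden: int = 20, seed: int = 0):
        self.n_hidden, self.seed = n_hidden, seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid feature mapping

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        # Hidden parameters are drawn at random and never tuned.
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)  # one output node per class
        self.beta = np.linalg.pinv(self._hidden(X)) @ T               # minimum-norm least squares
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]               # highest output node wins


rng = np.random.default_rng(7)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centres])
y = np.repeat([0, 1, 2], 50)
model = ELM().fit(X, y)
X_new = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centres])
print((model.predict(X_new) == np.repeat([0, 1, 2], 20)).mean())
```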
5.4. Differential Evolution (DE) Based ELM Classifier
The Differential Evolution (DE) based ELM uses cross-validation accuracy as the performance indicator to determine the optimal ELM parameters. Integrating spatial context information into the learning process improves classification results. Hyperspectral images are characterized by high-dimensional spectral features, so feature reduction is first applied to reduce the dimensionality of the data, and morphological operations are then applied to the reduced features. The model selection issue associated with the ELM is addressed with simple grid search procedures. The required kernel matrix is computed from the training samples.
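The optimizer behind DE-ELM can be illustrated with the classic DE/rand/1/bin scheme. In a DE-ELM setting the objective would be, for example, the negative cross-validation accuracy of an ELM as a function of its parameters; to keep the sketch self-contained and checkable, a simple quadratic stand-in objective is minimized instead. The mutation factor `F`, crossover rate `CR`, population size, generation count, and function names are assumptions.

```python
import numpy as np


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, n_gen=100, seed=0):
    """Minimize f over a box with DE/rand/1/bin; returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = lo.size
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([f(p) for p in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)   # rand/1 mutation
            cross = rng.random(dim) < CR                # binomial crossover
            cross[rng.integers(dim)] = True             # take at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:                       # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]


def stand_in_objective(p):
    """Hypothetical stand-in for -CV accuracy; minimum at (1, 1)."""
    return float(np.sum((p - 1.0) ** 2))


best_x, best_f = differential_evolution(stand_in_objective, [(-5, 5), (-5, 5)])
print(best_x, best_f)
```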
5.5. Cognitive Ensemble ELM Classifier
The cognitive ensemble ELM classifier is based on the hinge loss function: each individual ELM classifier is built on the hinge loss. Cognition is obtained through a weighted sum of the individual classifiers that emphasizes the winning classifier. The performance of an ELM classifier is affected by the random choice of centers and widths, and the Gaussian centers are distributed over the entire input space. The cognitive component computes the outputs of all classifiers and lists the correctly classified samples. Samples that are repeatedly classified correctly are called winning samples, and samples misclassified by many classifiers are called losing samples. Finally, the output for each sample is the weighted sum of all contributing classifiers.
5.6. Support Vector Machine (SVM) Classifier
The Support Vector Machine (SVM) is a standard machine learning classifier for binary classification. The optimal linear decision surface is found from the training feature vectors, which are projected into a high-dimensional feature space. The SVM constructs the maximum margin hyperplane that separates the feature vectors of the two classes; the larger the margin, the better the classification performance. LIBSVM provides additional penalty weights for unbalanced classes and offers four kernel functions, which project the input vectors into a high-dimensional feature space: linear, polynomial, radial basis function (RBF), and sigmoid. LIBSVM uses the RBF kernel by default, which maps the input feature vectors non-linearly into a high-dimensional feature space.
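The four kernel types listed above can be written out explicitly. The definitions below follow the forms given in the LIBSVM documentation (linear uᵀv, polynomial (γuᵀv + r)^d, RBF exp(-γ‖u - v‖²), and sigmoid tanh(γuᵀv + r)); the default parameter values in this sketch are illustrative and are not LIBSVM's defaults, and the function names are hypothetical.

```python
import numpy as np


def linear_kernel(u, v):
    return float(np.dot(u, v))


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return float((gamma * np.dot(u, v) + coef0) ** degree)


def rbf_kernel(u, v, gamma=1.0):
    diff = np.asarray(u, float) - np.asarray(v, float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return float(np.tanh(gamma * np.dot(u, v) + coef0))


def gram_matrix(X, kernel, **params):
    """Pairwise kernel values K[i, j] = kernel(X[i], X[j])."""
    return np.array([[kernel(a, b, **params) for b in X] for a in X])


u, v = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(linear_kernel(u, v))            # 11.0
print(polynomial_kernel(u, v))        # 1331.0
print(rbf_kernel(u, v, gamma=0.5))    # exp(-4), about 0.0183
```

The RBF Gram matrix is symmetric with ones on the diagonal and positive semidefinite, which is what lets the SVM treat it as an inner product in an implicit high-dimensional feature space.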
A fusion SVM classifier learns the differences in detection performance among the sub-classifiers and the relationships between their detection results. It has three phases: training of the sub-classifiers, training of the fusion classifier, and testing of the fusion classifier. The extracted features are divided into groups according to the correlation between features, and the fusion classifier is trained on the detection results of the sub-classifiers. Increasing the number of classifiers improves the detection accuracy of steganalysis.
5.7. Bayesian Ensemble Classifier
In the Bayesian ensemble classifier, Bayesian estimation is incorporated into the ensemble classifier to improve classification performance. The extracted features are used to train various sub-classifiers. The training set contains original images and stego images, and the relevant features of each image are extracted to form a feature vector. A series of low-dimensional classifiers is used to avoid classification in a high-dimensional space. Each Fisher Linear Discriminant (FLD) sub-classifier is trained on a sub-vector of the features, and its threshold is chosen to minimize the sum of the false alarm and missed detection rates. The FLD sub-classifiers are then combined through the Bayesian mechanism to make optimized decisions.
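The threshold rule for each FLD sub-classifier can be sketched as follows: the FLD direction is computed from the two class means and the pooled within-class scatter, and the threshold is the value that minimizes P_FA + P_MD on the training projections, where a score above the threshold means "stego". The Bayesian fusion of sub-classifiers is not modeled because the text does not specify it; the names, the regularization constant, and the toy data are assumptions.

```python
import numpy as np


def fld_direction(Xc: np.ndarray, Xs: np.ndarray, reg: float = 1e-6) -> np.ndarray:
    """Fisher direction from pooled within-class scatter and the class-mean difference."""
    sw = np.atleast_2d(np.cov(Xc, rowvar=False) + np.cov(Xs, rowvar=False))
    return np.linalg.solve(sw + reg * np.eye(Xc.shape[1]), Xs.mean(axis=0) - Xc.mean(axis=0))


def min_error_threshold(proj_cover: np.ndarray, proj_stego: np.ndarray):
    """Return (t, P_FA + P_MD) minimizing the sum, deciding 'stego' when score > t."""
    scores = np.unique(np.concatenate([proj_cover, proj_stego]))
    candidates = np.concatenate([[scores[0] - 1.0], scores])   # include 'everything stego'
    best_t, best_err = candidates[0], np.inf
    for t in candidates:
        p_fa = np.mean(proj_cover > t)     # covers flagged as stego
        p_md = np.mean(proj_stego <= t)    # stego images missed
        if p_fa + p_md < best_err:
            best_t, best_err = t, p_fa + p_md
    return float(best_t), float(best_err)


print(min_error_threshold(np.array([0.0, 1.0, 2.0]), np.array([3.0, 4.0, 5.0])))  # (2.0, 0.0)

rng = np.random.default_rng(5)
Xc = rng.normal(0.0, 1.0, (200, 4))    # toy cover sub-vectors
Xs = rng.normal(1.5, 1.0, (200, 4))    # toy stego sub-vectors
w = fld_direction(Xc, Xs)
t, err = min_error_threshold(Xc @ w, Xs @ w)
```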
6. Results & Discussion
This survey shows that the existing VRML, SMQT, SIFT, and AD preprocessing techniques have a low probability of detecting hidden data, high computational complexity, and low speed. The folding based median filter provides better noise reduction. The LNSCT, DCT, and Shannon entropy methods are slow and do not extract complex regions. Texture patterns are extracted by the symmetrical pattern based extraction method. Moreover, the ELM and SVM classifiers require a large number of feature vectors. Machine learning classifiers efficiently classify the labels in steganography. The results of this survey are summarized in Table 1, and the overall process flow of steganography is shown in Figure 1. The median testing error for various steganographic methods is given in Table 2 and plotted in Figure 2. Likewise, the median testing error for various feature extraction methods is given in Table 3 and plotted in Figure 3.
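The tables report the median testing error P_E without defining it here. In the ensemble-classifier steganalysis literature it is commonly defined as the minimal average of the false-alarm and missed-detection probabilities on the testing set, with the median then taken over several random training/testing splits. Assuming that convention:

```latex
P_E = \min_{P_{FA}} \frac{1}{2}\left(P_{FA} + P_{MD}(P_{FA})\right)
```

where P_FA is the probability of false alarm (a cover classified as stego) and P_MD is the probability of missed detection (a stego image classified as cover).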
Table 1. Information about different preprocessing, feature extraction and classification techniques.
Figure 1. Overall process flow of steganography.
Table 2. Median testing error for various steganographic methods.
Figure 2. Median testing error (PE) for various stego methods.
Table 3. Median testing error for various feature extraction methods.
Figure 3. Median testing error (PE) for various feature extraction methods.
7. Conclusion and Future Work
In this paper, various techniques for preprocessing, feature extraction, and classification in image steganography are surveyed. The existing preprocessing techniques have a low probability of detecting hidden data and a low operating speed. Traditional feature extraction techniques do not detect stego images efficiently, and they increase the complexity of message retrieval. Previous classification methods are difficult to interpret and do not provide exact classification results; their drawbacks include high computational complexity, long computation time, and the need for a large training set. To overcome these drawbacks, a folded version of the mean filter will be developed as a preprocessing technique for smoothing the image. In future work, symmetrical pattern based feature extraction, for effectively extracting a large number of features from the image, and an integrated machine learning classification method, for accurate classification of steganographic methods, will be implemented.