Text sentiment analysis, also known as opinion mining or tendency analysis, is the process of analyzing, processing, summarizing, and drawing inferences from subjective text that carries emotion. It has a wide range of applications in public opinion monitoring, stock and movie box office forecasting, and consumer preference analysis. Traditional sentiment analysis methods are mainly based on sentiment dictionaries and machine learning, but both face difficulties. Firstly, text is unstructured and varies in length, which makes it hard to fit to classic machine learning classification models. Secondly, feature extraction is difficult: a text may be about a topic, a person, a product, or an event, and extracting features manually takes a lot of effort while often giving poor results. Thirdly, words are related to one another, and it is difficult to incorporate this relational information into the model. The question of how to reduce manual work as far as possible while still quickly mining valuable information and performing sentiment analysis is what brought deep learning into everyone’s field of vision.
Deep learning is a general term for a family of machine learning algorithms based on feature self-learning and deep neural networks (DNN). Its advantages are its strong discriminative ability and its ability to learn features automatically, which make it well suited to high-dimensional, unlabeled, large-scale data. This article organizes its treatment of deep learning-based text sentiment analysis into the following tasks: 1) briefly introduce and compare several classic methods of text sentiment analysis and point out the advantages of deep learning; 2) introduce several mature deep learning methods, with relevant notes; 3) summarize the open problems in deep learning-based text sentiment analysis and put forward suggestions and prospects.
2. Brief Review on the Research Progress of Text Sentiment Analysis
Text sentiment analysis is also called sentiment mining. At its core, sentiment analysis is a classification task with two parts. The first is the subjective/objective classification of text, which reduces the interference of objective text in the analysis. The second is the classification of subjective text, which divides emotions into several categories according to how people express them, for the analysis of particular situations. In addition, the analyzed text can be at the chapter (document) level, the paragraph level, or the sentence level, and different text lengths call for different processing methods. The following introduces some mainstream methods for the sentiment classification of subjective information and points out the problems and deficiencies of current sentiment classification research.
2.1. Emotion Dictionary-Based Approach
The sentiment dictionary-based method is an unsupervised analysis method that usually relies on a combination of “sentiment dictionary + manual judgment”. Turney divides emotions into two categories, excellent and poor, and then uses pointwise mutual information (PMI) to measure the semantic association between a candidate phrase and the words “excellent” and “poor”, respectively. The difference between the two values gives the semantic orientation (SO) of the candidate phrase. The formula is as follows:
$$\mathrm{SO}(\text{phrase}) = \mathrm{PMI}(\text{phrase}, \text{“excellent”}) - \mathrm{PMI}(\text{phrase}, \text{“poor”}) \qquad (1)$$
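As a rough illustration of how the semantic orientation could be computed, the sketch below estimates PMI from document-level co-occurrence counts in a tiny made-up corpus. The corpus, the function names, the single-token treatment of the phrase, and the additive smoothing constant are all assumptions of this sketch and are not taken from Turney's setup.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

def semantic_orientation(phrase, docs, pos="excellent", neg="poor", smooth=0.01):
    """SO(phrase) = PMI(phrase, pos) - PMI(phrase, neg), using per-document co-occurrence."""
    total = len(docs)
    token_sets = [set(d.lower().split()) for d in docs]
    n_phrase = sum(phrase in s for s in token_sets)
    n_pos = sum(pos in s for s in token_sets)
    n_neg = sum(neg in s for s in token_sets)
    # Smoothing (an assumption of this sketch) avoids log(0) for pairs never seen together.
    n_phrase_pos = sum(phrase in s and pos in s for s in token_sets) + smooth
    n_phrase_neg = sum(phrase in s and neg in s for s in token_sets) + smooth
    return (pmi(n_phrase_pos, n_phrase, n_pos, total)
            - pmi(n_phrase_neg, n_phrase, n_neg, total))

# Hypothetical toy corpus, for illustration only.
docs = [
    "excellent plot and lovely acting",
    "lovely music excellent cast",
    "poor script and boring pacing",
    "boring and poor direction",
    "lovely scenery but poor sound",
]
so_lovely = semantic_orientation("lovely", docs)   # co-occurs mostly with "excellent"
so_boring = semantic_orientation("boring", docs)   # co-occurs only with "poor"
```

A positive value suggests the phrase leans towards “excellent” and a negative value towards “poor”; the phrase must occur in the corpus at least once for the counts to be defined.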
Alistair et al. argue that contextual valence shifters (CVS), the factors that shift the polarity of each sentiment word in its current context, must also be taken into account. In 2012, Jinan et al. studied two different sentiment dictionaries and three different scoring methods for sentiment analysis. The scoring methods include weighting techniques commonly used in information retrieval, namely term frequency-inverse document frequency (TF-IDF), and a latent Dirichlet allocation (LDA) strategy. However, the above methods all rely on manually built dictionaries, which have limited coverage and contain human errors. In recent years, with the explosion of network data and the continuous growth of internet language, a dictionary-based method alone can no longer cope with the large number of unknown words and complex ambiguous words. For small amounts of text, however, its accuracy is very high, so it can be considered for use in combination with other methods.
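To make the dictionary-based idea and its coverage limit concrete, here is a minimal sketch of lexicon scoring with a single kind of valence shifter (negation). The lexicon, the negator list, and the rule that a negator flips the next sentiment word are hypothetical choices made for illustration; they do not reproduce any of the cited systems.

```python
# Hypothetical mini-lexicon and negator list, for illustration only.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}

def lexicon_score(text):
    """Sum word polarities, flipping the sign of the first sentiment word after a negator."""
    score = 0.0
    negate = False
    for token in text.lower().split():
        if token in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(token)
        if polarity is not None:
            score += -polarity if negate else polarity
            negate = False  # the shifter is consumed by one sentiment word
    return score

plain = lexicon_score("a great film")        # uses the dictionary entry directly
shifted = lexicon_score("not a good film")   # negation flips the polarity of "good"
unknown = lexicon_score("this film slaps")   # no dictionary hit, so the score stays 0
```

The last call shows the weakness discussed above: a word that is missing from the dictionary contributes nothing, however strong its sentiment.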
2.2. Machine Learning Methods
The core of machine learning-based sentiment analysis is effective feature extraction, followed by the classification of emotion with a classifier. In 2002, Pang and others first applied machine learning algorithms to sentiment classification, using Naive Bayes (NB), Maximum Entropy (ME), and Support Vector Machine (SVM) models to classify the sentiment of text. Only one algorithm is introduced here, taking the Weibo comments on an event as an example. Given several sentiment categories, the NB algorithm assumes that the features of the target text are independent of one another given the category. It then computes the probability of the input text under each category and assigns the text to the category with the maximum probability, thereby solving the text classification problem. In recent years, machine learning-based sentiment classification models have been widely studied, which has led to the rapid development of machine learning in sentiment analysis. Machine learning-based methods run faster, but they still require a great deal of manual annotation and similar work, and assembling high-quality data is costly and time-consuming. Their classification performance is also limited by the need to design complex features, and they adapt poorly across domains.
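The Naive Bayes procedure described above can be sketched in a few lines. This is a minimal multinomial variant with add-one (Laplace) smoothing, which is one common choice and not necessarily the configuration used by Pang and others; the training comments and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Collect the counts needed for prediction from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(text, model):
    """Return the category with the highest log posterior under the independence assumption."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior of the category
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen (word, category) pairs from zeroing the product.
            logp += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical labeled comments, for illustration only.
model = train_nb([
    ("great phone love it", "pos"),
    ("love the camera", "pos"),
    ("terrible battery hate it", "neg"),
    ("hate the screen", "neg"),
])
label_a = predict_nb("love it", model)
label_b = predict_nb("hate the battery", model)
```

Even this small example needs labeled comments for every category, which reflects the annotation cost noted above.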
3. Introduction to Text Sentiment Analysis Based on Deep Learning
In 2006, the concept of deep learning was proposed, and in 2011 Socher introduced a model based on recursive autoencoders to perform sentiment analysis on movie reviews, with clearly better results than traditional methods. In recent years, CNN, RNN, LSTM, and other methods have gradually been applied to sentiment analysis with significant results. This chapter summarizes the characteristics of deep learning methods and introduces several deep neural networks and their applicability to text sentiment analysis.
3.1. Features of Deep Learning Methods
Compared with the sentiment dictionary and machine learning methods, the deep learning method is not perfect either: it has advantages and disadvantages for different types of text. So that it can be put to better use, these advantages and disadvantages are summarized and discussed below.
Firstly, deep learning methods can automatically learn multi-level features, replacing the tedious manual feature extraction of machine learning, and because of the powerful learning and expressive capabilities of deep neural networks, the results are often more accurate than those of traditional methods. However, this expressive power comes from a very large number of parameters, many of them redundant, so a large number of data samples is needed to train the network. This method is therefore more suitable for sentiment analysis of large amounts of data, while traditional methods tend to be more accurate on small volumes of data.
Secondly, traditional machine learning and dictionary construction methods focus on how to build a mathematical model and which features to extract, whereas deep learning methods focus on designing a more efficient network structure and on training more accurate network parameters.
Thirdly, thanks to their powerful capacity for autonomous learning, deep neural networks can automatically adjust the weights of the network to approach the desired output as closely as possible. The same model and training method can be applied to different problems; only the network structure and parameter weights differ from problem to problem. The whole network acts like a function that maps each input to an output. Because of this, deep learning can be applied to many different fields and has achieved good results. However, the diversity and complexity of natural language text can easily bias the sentiment evaluation, especially for Chinese, and handling this is key to further improving deep learning methods.
3.2. Characteristics and Applicability of Several Deep Networks
In recent years, deep network models have been continuously innovated and developed. Different network structures have different characteristics and functions, which shows mainly in the type of text they suit (for example, long or short text), the granularity and scale of the problem, and the type of problem. In the following, some of the more classic deep network models are briefly analyzed and summarized from the perspective of text sentiment analysis.
3.2.1. Based on CNN (Convolutional Neural Network Model)
A convolutional neural network is a kind of feedforward neural network. In recent years, it has been widely used in natural language processing, speech recognition, and image processing. Its structure is mainly composed of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. The structure is shown in Figure 1:
Figure 1. Structure of a classic convolutional neural network (Note: picture reference  ).
As shown in Figure 1, taking short text as an example, the input layer is a vector representation of the input data, which can be written as the matrix:
$$X = [x_1; x_2; \ldots; x_n] \in \mathbb{R}^{n \times k}$$
where $n$ is the number of words in the sentence and $k$ is the dimension of the word vector.
Next, the convolutional layer performs a convolution operation on the input matrix to extract local features. The result can be expressed as:
$$c_i = f(W \cdot x_{i:i+h-1} + b)$$
where $c_i$ represents the $i$-th feature value produced by the convolution operation; $W$ represents the weight matrix; $b$ represents the bias; $f$ represents the activation function; and $x_{i:i+h-1}$ represents the window of $h$ words from word $i$ to word $i+h-1$ in the sentence. After performing the convolution operation on the input matrix, the feature map of the convolution kernel is obtained as:
$$c = [c_1, c_2, \ldots, c_{n-h+1}]$$
where $c \in \mathbb{R}^{n-h+1}$.
The pooling layer is an important layer in the network structure. It extracts the important features from the feature map obtained from the previous layer. In most cases, max pooling is used for sampling. The obtained feature is expressed as:
$$\hat{c} = \max\{c\}$$
The convolution operation builds a vector representation of the sentence from the vector representations of its words and learns this sentence vector as a feature, which makes CNNs well suited to sentiment analysis of short text. Multiple channels can be used for multi-view feature extraction, and weight sharing reduces the number of parameters. The main disadvantages are that complexity is high when processing long text and that, as the number of convolutional layers increases, problems such as vanishing gradients appear.
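The convolution and max pooling steps for one filter can be sketched without any deep learning library. In this sketch the sentence is a list of n word vectors of dimension k, the filter spans h consecutive words, and the names `X`, `W`, `b`, `h`, and the ReLU activation are assumptions chosen for illustration; the toy numbers are not real word embeddings.

```python
import math

def conv_feature_map(X, W, b, activation=math.tanh):
    """Slide a filter W (h x k) over the sentence matrix X (n x k).

    Returns one feature value per window of h consecutive words,
    so the feature map has n - h + 1 entries.
    """
    n, h, k = len(X), len(W), len(X[0])
    feature_map = []
    for i in range(n - h + 1):
        # Element-wise product of the filter with words i .. i+h-1, summed up.
        z = sum(W[r][j] * X[i + r][j] for r in range(h) for j in range(k))
        feature_map.append(activation(z + b))
    return feature_map

def max_over_time(feature_map):
    """Max pooling keeps only the strongest response of the filter."""
    return max(feature_map)

def relu(z):
    return max(0.0, z)

# Toy sentence of n = 4 words with k = 2 dimensional vectors (made-up values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # one filter covering h = 2 words
features = conv_feature_map(X, W, b=0.0, activation=relu)
pooled = max_over_time(features)
```

Because the same `W` is reused at every window position, the number of parameters does not grow with the sentence length, which is the weight sharing mentioned above.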
3.2.2. Based on RNN (Recurrent Neural Network Model)
A recurrent neural network mainly consists of an input layer, a hidden layer, and an output layer. In some text data, earlier and later parts are related, that is, there is a temporal relationship in the data. This is where the “memory function” of the recurrent neural network comes in. Unlike an ordinary fully connected neural network, each hidden neuron of a recurrent neural network remembers its output value from the previous moment, and this value influences the calculation of the output value at the current moment. The structure of the recurrent neural network is shown in Figure 2.
The calculation is as follows:
$$O_t = g(V s_t)$$
$$s_t = f(U x_t + W s_{t-1})$$
where $x$ is the value of the input layer; $s$ is the output of the hidden layer; $U$ is the weight parameter used when calculating from $x$ to $s$; $V$ is the weight parameter used when calculating from the hidden layer to the output layer; $W$ is the weight parameter that controls how the value of the hidden layer at the previous moment influences the value of the hidden layer at the current moment; $O$ is the value of the output layer; $f$ and $g$ are activation functions; and the subscript $t$ denotes the current moment.
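The recurrence described above can be sketched in plain Python using the same weight names U, W, and V. The tanh activation for the hidden layer, the linear output layer, and the toy one-dimensional weights are assumptions of this sketch; the text does not specify them.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, U, W, V, s0):
    """Run a simple recurrent network over the input sequence xs.

    At each step: s_t = tanh(U x_t + W s_{t-1}),  o_t = V s_t.
    """
    s = s0
    states, outputs = [], []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]   # hidden state carries the memory forward
        states.append(s)
        outputs.append(matvec(V, s))      # linear output layer (an assumption)
    return states, outputs

# One-dimensional toy example: the second input is zero,
# yet the hidden state stays non-zero because of s_{t-1}.
states, outputs = rnn_forward(
    xs=[[1.0], [0.0]], U=[[1.0]], W=[[1.0]], V=[[2.0]], s0=[0.0]
)
```

The second hidden state depends only on the first one here, which is the “memory function” in its simplest form.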
But the recurrent neural network has its own shortcomings. During training, if a sequence is long, the gradient vanishes and the parameters can no longer be updated effectively. Traditional RNNs therefore capture long-text information poorly and are better suited to sentence-level sentiment analysis (such as Weibo reviews). Hochreiter and others proposed long short-term memory networks (LSTMs), and Cho and others proposed gated recurrent units (GRU). These variants of the recurrent neural network effectively address the long-term dependence problem by introducing gates, such as the forget gate, to process the input data. Text sentiment analysis is a natural language processing task in which words are related to and depend on one another, so the “memory function” of the recurrent neural network is an advantage: it can analyze the associations between earlier and later words in a sentence to extract more accurate features. With the introduction of LSTM, GRU, and other models, the vanishing gradient problem on long text has been greatly alleviated, and recurrent neural networks are now widely used in the field of sentiment analysis.
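For reference, one commonly used formulation of the LSTM cell with a forget gate is given below. The symbols are assumptions of this note and do not come from the text above: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The cell state is updated additively, so the direct path from $c_{t-1}$ to $c_t$ is scaled only by the forget gate $f_t$. When $f_t$ stays close to 1, information and gradients can pass across many time steps, which is the usual explanation for why gated units cope better with long sequences.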
Figure 2. RNN structure based on time. (Note: picture reference  )
3.2.3. Based on FNN (Fuzzy Neural Network Model)
In FNN networks, the initial text representations are generally bag-of-words (BOW) and vector space model (VSM) representations, which are highly sparse, so these networks are better suited to sentiment analysis at the chapter (document) level. For a text set of a given size, the initial representation of a short text is too sparse. Controlling the size of the text set makes this problem less pronounced, so short text can also be processed. Model training generally combines unsupervised pre-training with supervised parameter adjustment, so it can make use of a large amount of unlabeled data, which is also its advantage.
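The sparsity issue can be seen directly by building bag-of-words vectors for a short and a long text over the same vocabulary. The two texts and the helper names below are made up for illustration, and whitespace tokenization is a simplifying assumption.

```python
def bag_of_words(docs):
    """Build a shared vocabulary and one count vector per document."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0] * len(vocab)
        for tok in d.lower().split():
            vec[index[tok]] += 1
        vectors.append(vec)
    return vocab, vectors

def sparsity(vec):
    """Fraction of vocabulary entries that are zero in this vector."""
    return sum(1 for v in vec if v == 0) / len(vec)

# Hypothetical texts: one short comment and one longer review.
docs = [
    "good phone",
    "the battery life of this phone is good but the screen is dim and the speaker is weak",
]
vocab, vectors = bag_of_words(docs)
short_sparsity = sparsity(vectors[0])
long_sparsity = sparsity(vectors[1])
```

The short comment touches only two of the vocabulary entries, while the longer review fills all of them. The vocabulary grows with the text set, so a larger set makes the short-text vectors sparser still.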
4. Summary and Prospect
This article briefly reviews and analyzes traditional methods of text sentiment analysis. It then introduces several deep learning methods and the categories of text data they suit, and summarizes their particular advantages and applicability. Compared with machine learning methods, deep learning methods avoid much of the complicated process of feature extraction, but they have their own shortcomings. Supervised deep learning still needs a large amount of labeled data for model training. Unsupervised deep learning places very strict requirements on semantic association, yet the understanding of semantics is diverse and often ambiguous, which affects the degree of relevance. Therefore, text sentiment analysis based on deep learning still needs further research, and the author will continue to work in this direction.
Turney, P.D. (2002) Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Meeting on Association for Computational Linguistics, Association for Computational Linguistics, ACM Press, Philadelphia, PA, 417-424.
Alistair, K. and Diana, I. (2006) Sentiment Classification of Movie Reviews Using Contextual Valence Shifters. Computational Intelligence, 22, 110-125.
Jinan, F., Osama, M., Sabah, M., et al. (2012) Opinion Mining over Twitter Space Classifying Tweets Programmatically Using the R Approach. Proceedings of the 7th International Conference on Digital Information Management, Macau, China, 313-319.
Pang, B., Lee, L. and Vaithyanathan, S. (2002) Thumbs up? Sentiment Classification Using Machine Learning Techniques. In: Proceedings of Empirical Methods in Natural Language Processing, MIT Press, Cambridge, MA, 79-86. https://doi.org/10.3115/1118693.1118704
Socher, R., Cliff, C.L., Andrew, Y., et al. (2011) Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In: Getoor, L. and Scheffer, T., Eds., Proceedings of the 28th International Conference on Machine Learning, Bellevue, Omni Press, Madison, WI, 129-136.
Cho, K., Van Merrienboer, B., Gulcehre, C., et al. (2014) Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv Preprint arXiv:1406.1078.