Xu Yilong (徐翼龍), Li Wenfa, Wang Gongming, Huang Lingyun
(*Smart City College, Beijing Union University, Beijing 100101, P.R. China)
(**College of Robotics, Beijing Union University, Beijing 100101, P.R. China)
(***Tianyuan Network Co., Ltd., Beijing 100193, P.R. China)
(****Beijing Tianyuan Network Co., Ltd., Beijing 100193, P.R. China)
(*****Chinatelecom Information Development Co., Ltd., Beijing 100093, P.R. China)
Abstract
Key words: long short-term memory (LSTM), multi-target, natural language processing, stance detection
In recent years, the continuous improvement and development of social media has led an increasing number of people to use social media to share and discuss their attitudes toward different people, events, and objects. Analysis of texts containing stances on social media may help us understand their preferences and opinions. Such information plays an important role in public opinion analysis. A large number of researchers, such as Wang et al.[1] and Li et al.[2], have used stance detection technology to find such information.
Multi-target stance detection[3] is a sub-task of stance detection. It is used to mine the stance classifications of different targets in one text. Typical examples include mining the opinions of different politicians in elections and analyzing user recognition of different brands among similar products. In addition to determining the stance toward a target, multi-target stance detection identifies the corresponding positions of the different targets in the same text.
In the field of multi-target stance detection, most methods combine the 2 tasks of target location (determining the context that describes each target) and stance detection (determining the stance label for a given target) into one task during execution. These methods therefore tend to enlarge the structure of the model to enhance its ability to mine features. The result, however, is the joint optimal solution of the 2 tasks rather than the optimal solution of stance detection alone. Thus, stance detection for a given target is easily affected by the descriptions of other targets, which may reduce the accuracy of the result. Therefore, target location and stance detection should be executed successively and independently.
To enable such execution, the following proposal is made. First, the context range in the text corresponding to each target is located; for the target Hillary Clinton, for example, the context range concerns the clause describing ‘Hillary Clinton being a liar.’ Then, the stance is determined by analyzing the target text within that context range. Based on the above, a bidirectional long short-term memory (Bi-LSTM) network with position-weight is proposed to carry out multi-target stance detection. Bi-LSTM can describe the dependency relationships between words in both the front-to-back and back-to-front directions, and the position-weight vector can describe the impact of each word on the different targets of stance detection. The multi-target stance detection dataset of the 2016 American election is used to validate the proposed method.
In recent years, given the rapid growth in the number of social media users, researchers have begun to focus on stance detection in social media texts. In 2016, the international conferences SemEval[4] and the 5th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC)[5] provided annotated data on stance detection from social media. Thereafter, some researchers began studying different types of data for stance detection[6,7]. Given the similarity between stance detection and sentiment analysis, researchers made efforts to distinguish between them[8] and attempted to use sentiment analysis to obtain improved stance detection results[9].
For social media text, deep learning methods, including the traditional convolutional neural network (CNN) and recurrent neural network (RNN)[10-13] as well as fusion models[14-17], are typically used to detect stance.
Subsequently, researchers began applying stance detection to texts containing different targets in the same category. This is called multi-target stance detection and is regarded as a sub-task of stance detection[18-21]. For example, Liu et al.[19] proposed an approach to automatically obtain zones for each target and combined it with the LSTM method to obtain the corresponding stance. The results demonstrated the effectiveness of two-target stance detection. Lin et al.[21] designed a topic-based approach to detect multiple standpoints in Web text, generating a stance classifier according to the distribution of standpoint-related topic terms. They produced the parameter values of the classifier with this adaptive method and proved its effectiveness through experiments.
The above methods do not use the positional relationship between the words in the text and the targets to help the algorithm distinguish between the content describing different targets. Therefore, in order to obtain the best stance detection effect, it is necessary to extract the appropriate clause as the input text according to the context range corresponding to the target, so as to avoid the influence of unrelated text. Accordingly, in this paper, an unsupervised method to extract the context ranges of the different targets in the text is proposed. Then, a Bi-LSTM network with position-weight is generated by combining position-weight with the Bi-LSTM approach. Finally, the stance labels of the different targets are predicted with LSTM and Softmax classification. The details of the approach are explained in the next section.
The architecture of our model is shown in Fig.1. It consists of the following 5 modules: embedding layer, Bi-LSTM layer, position-weight fusion layer, LSTM layer, and Softmax classifier. The result of combining the word embeddings of all the target topics with the input text serves as the input of the model, and the output consists of the author’s stance labels for all the possible target topics.
Fig.1 Model architecture
The embedding layer transforms every word in the input text into one vector, each of which expresses the relationship between the words applicable to the context. By representing each word in a text as a 1×n vector, the text can be represented as an l×n matrix (l is the number of words in the text, and n is the dimension of each lexical vector). Accordingly, the input text can be converted into a numerical matrix, which facilitates feature extraction by the algorithm.
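As a minimal sketch, the embedding lookup can be pictured as indexing a table of word vectors. The toy vocabulary, the dimension n = 4, and the random table below are illustrative stand-ins, not the paper's pre-trained embeddings:

```python
import numpy as np

# Toy vocabulary and embedding table; in practice the table would come
# from pre-trained word vectors (the random values here are stand-ins).
vocab = {"hillary": 0, "clinton": 1, "is": 2, "a": 3, "liar": 4}
n = 4                                   # embedding dimension
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), n))

def embed(text):
    """Map a whitespace-tokenized text to an l x n matrix."""
    ids = [vocab[w] for w in text.lower().split()]
    return embedding_table[ids]         # shape (l, n)

matrix = embed("Hillary Clinton is a liar")
print(matrix.shape)                     # (5, 4): l=5 words, n=4 dimensions
```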
To extract valid features from unstructured text, LSTM is used to encode the text. The input of the LSTM is a tensor formed by arranging the embedded vectors of the words to be processed in order from front to back. The corresponding output is a tensor composed of implicit states of the LSTM units in order from front to back. LSTM can describe long-distance lexical dependency in the text and is suitable for text data modeling[22].
The Bi-LSTM network consists of a forward LSTM and a backward LSTM. The input of the forward LSTM network is composed of the embedded vectors of the words arranged in order from front to back, and the input of the backward LSTM network is the series of embedded vectors arranged in the opposite order. Thus, Bi-LSTM can describe the dependency relationships between words in both the front-to-back and back-to-front directions. Therefore, in the Bi-LSTM network, each word is transmitted to both a forward LSTM unit and a backward LSTM unit, and its output is the result of splicing the hidden states of the 2 LSTM units.
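The splicing can be sketched with stand-in hidden-state sequences. A real implementation would produce them with the LSTM equations of the next subsection; here random arrays stand in for the forward and backward states:

```python
import numpy as np

l, n = 5, 3                               # words in the text, LSTM output size
rng = np.random.default_rng(0)

# Stand-ins for the hidden states of the two directions.
h_forward = rng.normal(size=(l, n))       # processes words front to back
h_backward_rev = rng.normal(size=(l, n))  # processes words back to front
h_backward = h_backward_rev[::-1]         # re-align to front-to-back order

# Bi-LSTM output for each word: splice the two hidden states.
bilstm_out = np.concatenate([h_forward, h_backward], axis=1)
print(bilstm_out.shape)                   # (5, 6): each word gets a 2n vector
```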
In the LSTM model[23], unit t is calculated as follows:

i_t = \sigma(x_t U_i + h_{t-1} W_i + b_i)

(1)

f_t = \sigma(x_t U_f + h_{t-1} W_f + b_f)

(2)

o_t = \sigma(x_t U_o + h_{t-1} W_o + b_o)

(3)

q_t = \tanh(x_t U_q + h_{t-1} W_q + b_q)

(4)

p_t = f_t \odot p_{t-1} + i_t \odot q_t

(5)

h_t = o_t \odot \tanh(p_t)

(6)

where U ∈ R^{d×n} and W ∈ R^{n×n} are weight matrices, b ∈ R^n is the bias vector, d is the dimension of the word embedding, n is the output size of the LSTM network, σ and tanh denote the sigmoid and hyperbolic tangent activation functions, and ⊙ denotes element-wise multiplication. The LSTM unit consists of the input gate i_t, forget gate f_t, and output gate o_t; q_t is the candidate state, p_t the cell state, and h_t the hidden state.
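Eqs.(1)-(6) can be sketched directly in NumPy; the gate parameters below are randomly initialized stand-ins, and p denotes the cell state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, p_prev, params):
    """One LSTM unit following Eqs.(1)-(6)."""
    U, W, b = params["U"], params["W"], params["b"]  # dicts keyed by gate
    i_t = sigmoid(x_t @ U["i"] + h_prev @ W["i"] + b["i"])   # Eq.(1)
    f_t = sigmoid(x_t @ U["f"] + h_prev @ W["f"] + b["f"])   # Eq.(2)
    o_t = sigmoid(x_t @ U["o"] + h_prev @ W["o"] + b["o"])   # Eq.(3)
    q_t = np.tanh(x_t @ U["q"] + h_prev @ W["q"] + b["q"])   # Eq.(4)
    p_t = f_t * p_prev + i_t * q_t                           # Eq.(5)
    h_t = o_t * np.tanh(p_t)                                 # Eq.(6)
    return h_t, p_t

d, n = 4, 3                             # embedding dim d, hidden size n
rng = np.random.default_rng(0)
params = {
    "U": {g: rng.normal(size=(d, n)) for g in "ifoq"},
    "W": {g: rng.normal(size=(n, n)) for g in "ifoq"},
    "b": {g: np.zeros(n) for g in "ifoq"},
}
h, p = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):       # a 5-word embedded text
    h, p = lstm_step(x, h, p, params)
print(h.shape)                          # (3,): final hidden state
```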
When Bi-LSTM alone is used to extract features, it is difficult to analyze the differences between the multiple targets of the text. This leads to a lack of pertinence when the algorithm processes multiple targets in the same text. Therefore, in order to reflect the differences among the features corresponding to the different targets in the text, a two-stage method is designed. The first step calculates the ultimate position-weight vector, and the second step concatenates the position-weight vector with the output of the Bi-LSTM layer.
2.3.1 Calculating the final position-weight vector
The position-weight vector E is computed by Eqs.(7)-(9) from the position of each word relative to the target's context range.
At this point, each component of vector E represents the influence of each word on the target, as shown in Fig.2. Each element in E is a value between 0 and 1.
Fig.2 Position-weight vector of 2 targets in the same text
In order to control the effect of vector E on the prediction result, a coefficient μ is used to scale each component of E by a factor of μ. The influence of the position weight on the system can be changed by adjusting μ, as follows:

E_μ = E × μ

(10)

where E_μ is the ultimate position-weight vector. Each element in E_μ is a value between 0 and μ.
2.3.2 Concatenating position-weight vector and output of Bi-LSTM
In the Bi-LSTM network, the output of each word is composed of the spliced hidden states of the forward and backward LSTM units. In addition, each word corresponds to one position-weight in E_μ. Thus, a new vector is produced by concatenating the position-weight of each word with the Bi-LSTM output. This vector is taken as the output of the position-weight fusion layer. It can describe not only the dependency between words in the different directions but also the impact of a word on the different targets of stance detection.
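A minimal sketch of the fusion step follows. Since Eqs.(7)-(9) define the actual E, the linear-decay position weights below are a hypothetical stand-in, as is the random Bi-LSTM output:

```python
import numpy as np

l, two_n = 5, 6                            # text length, Bi-LSTM output size 2n
rng = np.random.default_rng(0)
bilstm_out = rng.normal(size=(l, two_n))   # stand-in Bi-LSTM hidden states

# Hypothetical position weights: 1.0 inside the target's context range,
# decaying with distance outside it (illustrative only; the paper's
# Eqs.(7)-(9) define the actual vector E).
target_range = (0, 1)                      # e.g. words 0-1 mention the target
E = np.array([1.0 if target_range[0] <= i <= target_range[1]
              else 1.0 / (1 + min(abs(i - target_range[0]),
                                  abs(i - target_range[1])))
              for i in range(l)])

mu = 3.0
E_mu = E * mu                              # Eq.(10): each element in [0, mu]

# Fusion layer output: concatenate each word's weight to its Bi-LSTM state.
fused = np.concatenate([bilstm_out, E_mu[:, None]], axis=1)
print(fused.shape)                         # (5, 7): 2n + 1 features per word
```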
To determine the stance labels from the fusion of the position-weight and the output of Bi-LSTM, an LSTM is used for re-encoding[23]. This process re-extracts features from the fused tensor of the previous section in the order of the text. The input of this layer is the output vector of the position-weight fusion layer, and the output is the hidden state of the last LSTM unit.
The output of the LSTM layer is taken as the input of this layer, and the Softmax classifier[24]is used to predict the stance labels of the different targets.
The specific process of completing the multi-target stance detection task is shown in Fig.3.
Fig.3 Flow chart for multi-target stance detection
The experiment used the stance detection corpus for the 2016 US general election constructed by Sobhani et al.[3]. This corpus contains 3 datasets, each of which is a collection of tweets and the stance labels of 2 candidates. In the original corpus, the 2 target words of each sentence were combined for analysis in Ref.[3]. The distribution of the data is shown in Table 1. In addition, the model parameters are shown in Table 2.
Table 1 Details of the experimental datasets
Table 2 Main parameter setting in our experiment
As a classification task, stance detection is more concerned with improving the classification accuracy of the “favor” and “against” stances. Therefore, the average F1 score of “favor” and “against” (Favg) was used as the evaluation indicator[4].
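This metric can be sketched as the mean of the per-class F1 scores for “favor” and “against” only, excluding the “none” class from the average; the small true/pred lists below are illustrative:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def favg(true, pred):
    """Average of the F1 scores for 'favor' and 'against' only."""
    scores = []
    for label in ("favor", "against"):
        tp = sum(t == label and p == label for t, p in zip(true, pred))
        fp = sum(t != label and p == label for t, p in zip(true, pred))
        fn = sum(t == label and p != label for t, p in zip(true, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1(prec, rec))
    return sum(scores) / 2

true = ["favor", "against", "none", "favor"]
pred = ["favor", "against", "favor", "against"]
print(favg(true, pred))                 # mean of F1(favor) and F1(against)
```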
The selected baselines are as follows.
Sequence-to-sequence (Seq2Seq)[26]. Recently, the Seq2Seq model has achieved good performance on sequence problems. Therefore, Ref.[3] applied this model to the multi-target stance detection problem. In this method, the text is used as the input of the model, and the stance labels for the different targets are output. The advantage of this algorithm is that it not only mines the stance related to the target from the text but also refers to the relationships between multiple targets.
Target-related zone modeling (TRZM)[19]. This model is proposed for multi-target stance detection tasks. It uses a region segmentation method to divide a text containing 2 targets into 4 parts, and then a multi-input LSTM is used to process these parts to detect the stance results.
In order to verify the effectiveness of the proposed method, the following 2 experiments are carried out: comparison between the proposed method and the related baselines, and comparisons of different parameters in the position-weight fusion layer.
3.4.1 Comparison between the proposed method and the related baselines
The experimental results of the proposed algorithm are compared with those of the other algorithms in Table 3, where PW-Bi-LSTM is the bidirectional LSTM network with position-weight proposed in this paper. Comparing the F1 values of the different methods gives PW-Bi-LSTM > TRZM > Seq2Seq. The conclusions drawn from these results are as follows.
1) Although Seq2Seq has the ability to refer to different labels when detecting the stance, it does not take into account the effect of the positional relationship between the text and the target. This may be why its performance is lower than that of TRZM and the model in this article.
2) The performance of TRZM is not the best, but it can meet the actual requirement, because the combination of feature extraction and deep learning is a good way to improve multi-target stance detection. However, this method splits the integrity of the text, which may be the reason for its weaker performance.
3) The proposed method outperformed the other methods on the 3 datasets, and its macro Favg is 1.4% higher than the corresponding values of the other algorithms. Compared with the other methods, the proposed method can automatically extract the position features of the different targets in the text and expand the tolerance for input differences. For input text with multiple targets, the other methods may be affected by other targets when detecting the stance toward a given target, and their accuracy decreases accordingly. However, the proposed method can avoid the influence of irrelevant targets, and its accuracy does not change much.
Table 3 Performances of our approach and the compared methods
3.4.2 Comparisons of different parameters in the position-weight fusion layer
One of the key parameters affecting the performance of the proposed method is the coefficient μ mentioned in Section 2.3. In order to determine the influence of the ultimate position-weight vector on the algorithm and to find the optimal coefficient μ in the position-weight fusion layer, Favg for different values of μ on the development and test sets of the 3 datasets is compared, as shown in Fig.4. The figure shows that when the ultimate position-weight vector is added to the algorithm (i.e., μ ≠ 0), Favg improves significantly, which indicates that this addition can improve the result of multi-target stance detection. In addition, the effect of the proposed algorithm is related to the coefficient μ. On the 3 datasets, the best results on the development sets are achieved when μ equals 3, 5, and 10, respectively. Thus, the effect of the proposed algorithm can be improved by adjusting the coefficient μ.
Fig.4 Favg for different coefficients (μ) in the proposed method. The x-axis denotes the coefficient size, and the y-axis refers to Favg
In this study, a Bi-LSTM network with position-weight for multi-target stance detection is proposed. The positional relationship between each word and the target is represented as a vector, and this vector is then embedded into the Bi-LSTM model to refine the stance detection. The experimental results demonstrate the validity of the proposed method, showing that adding the multi-target position information can expand the tolerance for input difference and diversity. In the future, additional position feature extraction methods and actual data covering a wider range of topics will be adopted to continuously improve and optimize the algorithm. In addition, the small number of datasets used in this paper leads to large volatility in the experimental results. Therefore, the volatility of the results will be studied in follow-up work.
High Technology Letters, 2020, Issue 4