A sentiment lexicon is vital to the task of Sentiment Analysis (SA), the automatic mining of sentiment expression from text. This fundamental lexical resource comprises words, the smallest sentiment-carrying units of text, annotated for their sentiment properties, and aids in SA tasks on larger pieces of text. Unfortunately, digital dictionaries do not readily include information on the sentiment properties of their entries, and manually compiling sentiment lexicons is costly in annotator time and effort. This has given rise to a large body of research on automated sentiment lexicon generation. The dictionary-based approach leverages digital dictionaries, while the corpus-based approach exploits co-occurrence statistics embedded in text corpora. Although the former approach has been exhaustively investigated, the majority of works focus on terms. The few state-of-the-art models that work at the finer-grained term sense level still exhibit several prominent limitations. For example, the semantic relations algorithm they use retrieves only senses that lie close to the seed senses in the semantic network, which prevents the retrieval of remote sentiment-carrying senses beyond the 'radius' defined by the number of iterations of semantic relations expansion. The proposed model aims to overcome the issues inherent in dictionary-based sense-level sentiment lexicon generation models using: (1) null seed sets, and a morphological approach inspired by the Marking Theory in Linguistics to populate them automatically; (2) a dual-step context-aware gloss expansion algorithm that 'mines' human-defined gloss information from a digital dictionary, ensuring that senses overlooked by the semantic relations expansion algorithm are identified; and (3) a fully unsupervised sentiment categorization algorithm based on Network Theory.
The results demonstrate that context-aware in-gloss matching successfully retrieves senses beyond the reach of the semantic relations expansion algorithm used by prominent, well-known models. Evaluation of the proposed model's ability to accurately assign polarity to senses demonstrates that it is on par with state-of-the-art models against the same gold standard benchmarks. The model has theoretical implications for future work on effectively exploiting the readily available human-defined gloss information in a digital dictionary when assigning polarity to term senses. Extrinsic evaluation in a real-world sentiment classification task on multiple publicly available datasets from varying domains demonstrates its practical implications and application in sentiment analysis, as well as in related fields such as information science, opinion retrieval and computational linguistics.
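
The 'radius' limitation described in this abstract can be illustrated with a minimal sketch of iteration-bounded expansion over a semantic network. The graph, the sense identifiers and the function name `expand_seeds` are hypothetical and chosen for illustration only; this is not the algorithm of the paper or of any specific prior model.

```python
from collections import deque


def expand_seeds(graph, seeds, max_iterations):
    """Return every sense reachable from the seeds in at most max_iterations hops.

    graph maps a sense identifier to the senses it is linked to by
    semantic relations (an assumed, simplified representation).
    """
    reached = {seed: 0 for seed in seeds}  # sense -> hop count from nearest seed
    queue = deque(seeds)
    while queue:
        sense = queue.popleft()
        depth = reached[sense]
        if depth == max_iterations:
            continue  # the expansion radius is exhausted along this path
        for neighbour in graph.get(sense, ()):
            if neighbour not in reached:
                reached[neighbour] = depth + 1
                queue.append(neighbour)
    return set(reached)


# Hypothetical toy network: a chain of four senses.
toy_graph = {
    "good.a.01": ["nice.a.01"],
    "nice.a.01": ["pleasant.a.01"],
    "pleasant.a.01": ["delightful.a.01"],
}
near = expand_seeds(toy_graph, ["good.a.01"], max_iterations=2)
far = expand_seeds(toy_graph, ["good.a.01"], max_iterations=3)
```

With two iterations the sense three hops away is never reached, which is the kind of remote sense that an additional gloss-based step is meant to recover.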

Social media represents an emerging and challenging sector where people's natural language expressions can be easily reported through blogs and short text messages. This is rapidly creating unique content of massive dimensions that needs to be efficiently and effectively analyzed to create actionable knowledge for decision-making processes. A key piece of information that can be grasped from social environments relates to the polarity of text messages. To better capture the sentiment orientation of the messages, several valuable expressive forms could be taken into account. In this paper, three expressive signals typically used in microblogs have been explored: (1) adjectives, (2) emoticons, emphatic and onomatopoeic expressions and (3) expressive lengthening. Once text messages have been normalized so that social media posts better conform to a canonical language, the considered expressive signals have been used to enrich the feature space and train several baseline and ensemble classifiers aimed at polarity classification. The experimental results show that adjectives are more discriminative and have a greater impact than the other expressive signals considered.
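
To make the third signal concrete, the following is a minimal sketch of how expressive lengthening might be detected and normalized with a regular expression. The threshold of three repeated letters, the choice to collapse runs to two letters, and the function name `normalize_lengthening` are assumptions made for illustration; the paper's actual normalization rules are not given in the abstract.

```python
import re

# A letter repeated three or more times in a row (digits and "_" are excluded).
_LENGTHENING = re.compile(r"([^\W\d_])\1{2,}")


def normalize_lengthening(text):
    """Collapse runs of 3+ identical letters to two and count the runs found."""
    run_count = len(_LENGTHENING.findall(text))
    normalized = _LENGTHENING.sub(r"\1\1", text)
    return normalized, run_count


normalized, run_count = normalize_lengthening("sooooo goooood")
```

The count can be kept as an extra feature alongside the normalized text, so that the emphasis carried by the lengthening is not lost when the token is mapped back toward a canonical form.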

Although deep learning breakthroughs in NLP are based on learning distributed word representations by neural language models, these methods suffer from a classic drawback of unsupervised learning techniques. Furthermore, the performance of general-purpose word embeddings has been shown to be heavily task-dependent. To tackle this issue, recent research has proposed learning sentiment-enhanced word vectors for sentiment analysis. However, the common limitation of these approaches is that they require external sentiment lexicon sources, and the construction and maintenance of these resources involve a set of complex, time-consuming, and error-prone tasks. In this regard, this paper proposes a method of sentiment lexicon embedding that represents the semantic relationships of sentiment words better than existing word embedding techniques, without a manually annotated sentiment corpus. The major distinguishing features of the proposed framework are the joint encoding of morphemes and their POS tags, and the training of only important lexical morphemes in the embedding space. To verify the effectiveness of the proposed method, we conducted experiments comparing it with two baseline models. As a result, the revised embedding approach mitigated the problem of the conventional context-based word embedding method and, in turn, improved the performance of sentiment classification.

Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort.

Sentiment analysis on Twitter has attracted much attention recently due to its wide applications in both the commercial and public sectors. In this paper we present SentiCircles, a lexicon-based approach for sentiment analysis on Twitter. Unlike typical lexicon-based approaches, which assign fixed, static prior sentiment polarities to words regardless of their context, SentiCircles takes into account the co-occurrence patterns of words in different contexts in tweets to capture their semantics and update their pre-assigned strength and polarity in sentiment lexicons accordingly. Our approach allows for the detection of sentiment at both entity level and tweet level. We evaluate our proposed approach on three Twitter datasets using three different sentiment lexicons to derive word prior sentiments. Results show that our approach significantly outperforms the baselines in accuracy and F-measure for entity-level subjectivity (neutral vs. polar) and polarity (positive vs. negative) detection. For tweet-level sentiment detection, our approach performs better than the state-of-the-art SentiStrength by 4–5% in accuracy in two datasets, but falls marginally behind by 1% in F-measure in the third dataset.
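
The general idea of letting co-occurrence context update a word's prior polarity can be sketched as follows. This is a deliberately simplified illustration, not the SentiCircles formulation: the averaging rule, the `weight` parameter, the whitespace tokenization and the function name `contextual_polarity` are all assumptions introduced here.

```python
from collections import defaultdict


def contextual_polarity(tweets, prior, weight=0.5):
    """Blend each term's prior polarity with the mean prior of its co-occurring words.

    prior maps a word to a polarity in [-1, 1]; words missing from it are
    treated as neutral (0.0). Terms with no scored context keep their prior.
    """
    context_scores = defaultdict(list)
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i, term in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j and other in prior:
                    context_scores[term].append(prior[other])

    adjusted = dict(prior)
    for term, scores in context_scores.items():
        context = sum(scores) / len(scores)
        adjusted[term] = (1 - weight) * prior.get(term, 0.0) + weight * context
    return adjusted


# Hypothetical toy lexicon and tweets.
toy_prior = {"great": 1.0, "awful": -1.0}
toy_tweets = ["great phone", "phone awful awful"]
adjusted = contextual_polarity(toy_tweets, toy_prior)
```

In this toy data the word "phone" has no prior polarity, yet it ends up slightly negative because it co-occurs more often with negative words, which is the kind of context-dependent shift a static lexicon cannot express.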

A hot topic in recent years, cross-domain sentiment classification aims to learn a reliable classifier using labeled data from a source domain and evaluate the classifier on a target domain. In this vein, most approaches use domain adaptation, which maps data from different domains into a common feature space. To further improve model performance, several methods have been proposed to mine domain-specific information. However, most of them use only a limited part of that information. In this study, we first develop a method of extracting domain-specific words based on the topic information derived from topic models. Then, we propose a Topic Driven Adaptive Network (TDAN) for cross-domain sentiment classification. The network consists of two sub-networks: a semantics attention network and a domain-specific word attention network, the structures of which are based on transformers. These sub-networks take different forms of input, and their outputs are fused as the feature vector. Experiments validate the effectiveness of our TDAN on sentiment classification across domains. Case studies also indicate that topic models have the potential to add value to cross-domain sentiment classification by discovering interpretable and low-dimensional subspaces.

Recently, sentiment classification has received considerable attention within the natural language processing research community. However, since most recent work on sentiment classification has been done on English, there are not enough sentiment resources in other languages. Manual construction of reliable sentiment resources is a very difficult and time-consuming task. Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language (typically English) for sentiment classification of text documents in another language. Most existing research relies on automatic machine translation services to directly project information from one language to another. However, differing term distributions between original and translated text documents and translation errors are the two main problems faced when using machine translation alone. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training to incorporate unlabelled data from the target language into the learning process in a bi-view framework. This model attempts to enrich the training data in an iterative process by adding the most confident automatically labelled examples, as well as a few of the most informative manually labelled examples, from the unlabelled data. Further, the model considers the density of the unlabelled data in order to select more representative unlabelled examples and avoid outlier selection in active learning. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model can effectively improve cross-lingual sentiment classification performance and reduce labelling effort in comparison with some baseline methods.

Existing methods for text generation usually feed the overall sentiment polarity of a product as an input into a seq2seq model to generate a relatively fluent review. However, these methods cannot express more fine-grained sentiment polarity. Although some studies attempt to generate aspect-level sentiment-controllable reviews, they ignore the personalized attributes of reviews. In this paper, a hierarchical template-transformer model is proposed for personalized fine-grained sentiment-controllable generation, which aims to generate aspect-level sentiment-controllable reviews with personalized information. The hierarchical structure can effectively learn sentiment information and lexical information separately. The template transformer uses a part-of-speech (POS) template to guide the generation process and generate a smoother review. To verify our model, we used an existing model to obtain a corpus named FSCG-80 from Yelp, which contains 800K samples, and conducted a series of experiments on this corpus. Experimental results show that our model can achieve up to 89.93% aspect-sentiment control accuracy and generate more fluent reviews.

We propose a topic-dependent attention model for sentiment classification and topic extraction. Our model assumes that a global topic embedding is shared across documents and employs an attention mechanism to derive local topic embeddings for words and sentences. These are subsequently incorporated in a modified Gated Recurrent Unit (GRU) for sentiment classification and extraction of topics bearing different sentiment polarities. Those topics emerge from the words' local topic embeddings learned by the internal attention of the GRU cells in the context of a multi-task learning framework. In this paper, we present the hierarchical architecture, the new GRU unit and the experiments conducted on users' reviews. The experiments demonstrate classification performance on a par with state-of-the-art methodologies for sentiment classification, and topic coherence that outperforms current approaches for supervised topic extraction. In addition, our model is able to extract coherent aspect-sentiment clusters despite using no aspect-level annotations for training.

Nowadays, online word-of-mouth has an increasing impact on people's views and decisions, and the classification and sentiment analysis of online consumer reviews have attracted significant research interest. In this thesis, we propose and implement a new method for the extraction and classification of comments on online dating services (ODS). Unlike traditional emotional analysis, which mainly focuses on product attributes, we attempt to infer and extract the emotion concept of each emotional review by introducing social cognitive theory. In this study, we selected 4,300 comments with extremely negative or positive emotions published on dating websites as a sample, and used three machine learning algorithms to analyze emotions. For testing and comparison, we use several machine learning techniques as well as dictionary-based sentiment analysis. We found that the combination of machine learning and a lexicon-based method can achieve higher accuracy than any single type of sentiment analysis. This research provides a new perspective for user behavior research.

Compared with explicit sentiment analysis, which attracts considerable attention, implicit sentiment analysis is a more difficult task due to the lack of sentiment words. The abundant information in an external sentiment knowledge base can play a significant complementary and expansion role. In this paper, a multi-polarity orthogonal attention model with an embedded sentiment commonsense knowledge graph is proposed to learn the implication of implicit sentiment. We analyze in detail the effectiveness of different knowledge relations in the ConceptNet knowledge base, and propose a matching and filtering method to automatically distill useful knowledge tuples for implicit sentiment analysis. By introducing the sentiment information in the knowledge base, the proposed model can extend the semantics of a sentence with an implicit sentiment. Then, a bidirectional long short-term memory model with multi-polarity orthogonal attention is adopted to fuse the distilled sentiment knowledge with the semantic embedding, effectively enriching the representation of sentences. Experiments on the SMP2019-ECISA implicit sentiment dataset show that our model fully utilizes the information of the knowledge base and improves the performance of Chinese implicit sentiment analysis.

The increasing interest in emotions in online texts creates a demand for financial sentiment analysis. Previous studies mainly focus on coarse-grained document-level or sentence-level sentiment analysis, which ignores the different sentiment polarities of various targets (e.g., company entities) in a sentence. To fill the gap, from a fine-grained target-level perspective, we propose a novel Lexicon Enhanced Collaborative Network (LECN) for targeted sentiment analysis (TSA) in financial texts. In general, the model designs a unified and collaborative framework that can capture the associations of targets and sentiment cues to enhance the overall performance of TSA. Moreover, the model dynamically incorporates sentiment lexicons to guide the sentiment classification, which strengthens the model's ability to understand financial expressions. In addition, the model introduces a selective message-passing mechanism to adaptively control the information flow between the two tasks, thereby improving the collaborative effects. To verify the effectiveness of LECN, we conduct experiments on four financial datasets: SemEVAL2017 Task5 subset1, SemEVAL2017 Task5 subset2, FiQA 2018 Task1, and Financial PhraseBank. Results show that LECN achieves improvements over the state-of-the-art baseline of 1.66 p.p., 1.47 p.p., 1.94 p.p., and 1.88 p.p., respectively, in terms of F1-score. A series of further analyses also indicate that LECN has a better capacity for comprehending domain-specific expressions and can achieve a mutually beneficial effect between tasks.

Social media users are increasingly using both images and text to express their opinions and share their experiences, instead of only text as in conventional social media. Consequently, conventional text-based sentiment analysis has evolved into more complicated studies of multimodal sentiment analysis. To tackle the challenge of effectively exploiting the information from both the visual and the textual content of image-text posts, this paper proposes a new image-text consistency driven multimodal sentiment analysis approach. The proposed approach explores the correlation between the image and the text, followed by a multimodal adaptive sentiment analysis method. More specifically, the mid-level visual features extracted by the conventional SentiBank approach are used to represent visual concepts and are integrated with other features, including textual, visual and social features, to develop a machine learning sentiment analysis approach. Extensive experiments are conducted to demonstrate the superior performance of the proposed approach.

Every day, millions of news articles and (micro)blogs that contain financial information are posted online. These documents often include insightful financial aspects with associated sentiments. In this paper, we predict financial aspect classes and their corresponding polarities (sentiment) within sentences. We use data from the Financial Question & Answering (FiQA) challenge, more precisely the aspect-based financial sentiment analysis task. We incorporate the hierarchical structure of the data by using the parent aspect class predictions to improve the child aspect class prediction (two-step model). Furthermore, we incorporate model output from the child aspect class prediction when predicting the polarity. We improve the F1 score by 7.6% using the two-step model for aspect classification over direct aspect classification in the test set. Furthermore, we improve the state-of-the-art test F1 score of the original aspect classification challenge from 0.46 to 0.70. The model that incorporates output from the child aspect classification performs on par with our plain RoBERTa model in polarity classification. In addition, our plain RoBERTa model outperforms all the state-of-the-art models, lowering the MSE score by at least 28% and 33% for the cross-validation set and the test set, respectively.
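
One simple way a parent prediction can inform the child prediction is to restrict the child candidates to those under the predicted parent. The sketch below shows only that idea; the abstract does not say how the two-step model combines the predictions, so the restriction rule, the function name `two_step_predict`, the aspect names and the scores are all hypothetical.

```python
def two_step_predict(parent_scores, child_scores, hierarchy):
    """Pick the best parent class, then the best child class under that parent.

    parent_scores and child_scores map class names to model scores;
    hierarchy maps each parent class to the list of its child classes.
    """
    parent = max(parent_scores, key=parent_scores.get)
    candidates = hierarchy[parent]
    child = max(candidates, key=lambda name: child_scores.get(name, float("-inf")))
    return parent, child


# Hypothetical aspect hierarchy and scores for a single sentence.
toy_hierarchy = {"Stock": ["Price Action", "Dividend"], "Corporate": ["M&A"]}
toy_parent_scores = {"Stock": 0.7, "Corporate": 0.3}
toy_child_scores = {"M&A": 0.5, "Price Action": 0.4, "Dividend": 0.1}

direct_child = max(toy_child_scores, key=toy_child_scores.get)
parent, child = two_step_predict(toy_parent_scores, toy_child_scores, toy_hierarchy)
```

Here the direct prediction picks a child class whose parent is unlikely, while the two-step prediction stays consistent with the more confident parent decision.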

Electronic word of mouth (eWOM) is prominent and abundant in consumer domains. Both consumers and product/service providers need help in understanding and navigating the resulting information spaces, which are vast and dynamic. The general tone or polarity of reviews, blogs or tweets provides such help. In this paper, we explore the viability of automatic sentiment analysis (SA) for assessing the polarity of a product or service review. To do so, we examine the potential of the major approaches to sentiment analysis, along with star ratings, in capturing the true sentiment of a review. We further model contextual factors (specifically, product type and review length) as two moderators affecting SA accuracy. The results of our analysis of 900 reviews suggest that different tools representing the main approaches to SA display differing levels of accuracy, yet overall, SA is very effective in detecting the underlying tone of the analyzed content, and can be used as a complement or an alternative to star ratings. The results further reveal that contextual factors such as product type and review length play a role in affecting the ability of a technique to reflect the true sentiment of a review.

Digital currency has taken the financial markets by storm ever since its inception. Academia and industry are focusing on artificial intelligence (AI) tools and techniques to study and understand how businesses can draw insights from the large-scale data available online. As the market is driven by public opinion, and social media today provides an encouraging platform to share ideas and views, organizations and policy-makers could use the natural language processing (NLP) technology of AI to analyze public sentiment. Recently, a new and moderately unconventional instrument known as non-fungible tokens (NFTs) has been emerging as an upcoming business market. Unlike the stock market, no precise quantitative parameters exist for the price determination of NFTs. Instead, NFT markets are driven more by public opinion, expectations, the perception of buyers, and the goodwill of creators. This study evaluates the emotions expressed by the public in posts relating to NFTs on the social media platform Twitter. Additionally, this study conducts secondary market analysis to determine the reasons for the growing acceptance of NFTs through sentiment and emotion analysis. We segregate tweets using the Pearson Product-Moment Correlation Coefficient (PPMCC) and study emotions on an eight-category scale (Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust) along with Positive and Negative sentiments. Tweets predominantly contained positive sentiment (~72%), and positive emotions such as anticipation and trust were found to be predominant all over the world. To the best of our understanding, this is the first financial and emotional analysis of its kind of tweets pertaining to NFTs.
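
For reference, the PPMCC named in this abstract is the standard Pearson coefficient, the covariance of two series divided by the product of their standard deviations. A minimal sketch of the computation follows; the function name `pearson` and the toy series are assumptions for illustration, and the abstract does not specify which variables the study correlates or how the coefficient is used to segregate tweets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Hypothetical series: one perfectly increasing with x, one perfectly decreasing.
r_up = pearson([1, 2, 3], [2, 4, 6])
r_down = pearson([1, 2, 3], [3, 2, 1])
```

The coefficient ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship).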

