Similar Articles
 Found 12 similar articles
1.
In this work, we propose BERT-WMAL, a hybrid model that combines information learned from data by a transformer deep learning model with information obtained from a polarized lexicon. The result is a sentence polarity model whose performance is comparable with the state of the art, but with the advantage of providing the end-user with an explanation of the most important terms behind each prediction. The model has been evaluated on three Italian polarity detection datasets: SENTIPOLC, AGRITREND and ABSITA. The first contains 7,410 tweets for training and 2,000 for testing; the second includes 1,000 tweets without a split; the third comprises 2,365 reviews for training and 1,171 for testing. The lexicon-based information proves effective in terms of the F1 measure, improving the F1 score on all the observed datasets: from 0.664 to 0.669 (i.e., 0.772%) on AGRITREND, from 0.728 to 0.734 (i.e., 0.854%) on SENTIPOLC, and from 0.904 to 0.921 (i.e., 1.873%) on ABSITA. The usefulness of this model lies not only in its effectiveness in terms of the F1 measure, but also in its ability to generate predictions that are more explainable and, especially, convincing for end-users. We evaluated this aspect through a user study involving four native Italian speakers, each evaluating 64 sentences with associated explanations. The results demonstrate the validity of this approach, which combines attention weights extracted from the deep learning model with the linguistic knowledge stored in the WMAL lexicon. These considerations allow us to regard the approach presented in this paper as a promising starting point for further work in this research area.
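As a concrete illustration of the combination described above (a sketch of ours, not the authors' code), per-token attention mass could be mixed with WMAL-style lexicon polarities to rank the terms shown to the end-user; the mixing weight `alpha`, the function name, and the toy lexicon entries are illustrative assumptions:

```python
# Hypothetical sketch: rank explanatory terms by mixing transformer attention
# with absolute lexicon polarity. Not the authors' implementation.
def rank_explanatory_terms(tokens, attention, lexicon, alpha=0.5):
    """Score each token by a convex mix of attention and |lexicon polarity|."""
    scores = {}
    for tok, att in zip(tokens, attention):
        pol = abs(lexicon.get(tok, 0.0))  # 0 if the term is not in the lexicon
        scores[tok] = alpha * att + (1 - alpha) * pol
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

lexicon = {"ottimo": 0.9, "pessimo": -0.8}            # toy WMAL-style entries
tokens = ["servizio", "ottimo", "ma", "pessimo", "prezzo"]
attention = [0.10, 0.35, 0.05, 0.30, 0.20]            # per-token attention mass
print(rank_explanatory_terms(tokens, attention, lexicon)[:2])
```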

2.
The polarity shift problem is a major factor affecting the classification performance of machine-learning-based sentiment analysis systems. In this paper, we propose a three-stage cascade model to address the polarity shift problem in the context of document-level sentiment classification. We first split each document into a set of subsentences and build a hybrid model that employs rules and statistical methods to detect explicit and implicit polarity shifts, respectively. Second, we propose a polarity shift elimination method to remove polarity shifts in negations. Finally, we train base classifiers on training subsets divided by the different types of polarity shift, and use a weighted combination of the component classifiers for sentiment classification. Results from a range of experiments show that our approach significantly outperforms several alternative methods for polarity shift detection and elimination.
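A minimal sketch of the first stage, assuming simple clause splitting and an English cue list (the paper's statistical detector for implicit shifts is not reproduced here):

```python
# Illustrative rule-based detection of explicit polarity shifts (negations)
# at the subsentence level; the cue list and splitting rule are simplifications.
import re

NEGATORS = {"not", "no", "never", "without", "hardly"}

def split_subsentences(document):
    """Split a document into subsentences on common clause delimiters."""
    return [s.strip() for s in re.split(r"[.,;!?]", document) if s.strip()]

def has_explicit_shift(subsentence):
    """Flag a subsentence containing an explicit negation cue."""
    return any(tok in NEGATORS or tok.endswith("n't")
               for tok in subsentence.lower().split())

doc = "The plot was not bad, but the acting never convinced me."
for sub in split_subsentences(doc):
    print(has_explicit_shift(sub), "->", sub)
```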

3.
This article describes in-depth research on machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, no systematic research has yet been conducted for the Czech language. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different pre-processing techniques and employ various features and classifiers. We also experiment with five different feature selection algorithms and investigate the influence of named entity recognition and pre-processing on sentiment classification performance. Moreover, in addition to our newly created social media dataset, we also report results for other popular domains, such as movie and product reviews. We believe that this article will not only extend current sentiment analysis research to another family of languages, but will also encourage competition, potentially leading to the production of high-end commercial solutions.
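For orientation, a pipeline of the kind evaluated in the article might look as follows in scikit-learn; the feature set, the chi-squared selector, and the toy Czech snippets are our assumptions, not the article's exact configuration:

```python
# A minimal supervised sentiment pipeline: n-gram TF-IDF features,
# chi-squared feature selection, and a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("select", SelectKBest(chi2, k=2)),  # k is tiny here; tune on real data
    ("clf", LinearSVC()),
])

texts = ["skvělý film", "hrozná služba", "docela dobré", "hrozná nuda"]
labels = [1, 0, 1, 0]                    # 1 = positive, 0 = negative
pipeline.fit(texts, labels)
print(pipeline.predict(["skvělý výkon"]))
```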

4.
Web 2.0 allows people to express and share their opinions about the products and services they buy or use. These opinions can be expressed in various ways: numbers, text, emoticons, pictures, video, audio, and so on. There has been great interest in strategies for extracting, organising and analysing this kind of information. Within social media mining in particular, the use of textual data has been explored in depth and still represents a challenge. On a rating and review website, user satisfaction can be detected both from the rating scale and from the written text. In common practice, however, there is a lack of algorithms able to combine the judgments provided by comments and by scores. In this paper we propose a strategy to jointly measure the user evaluations obtained from the two systems. Text polarity is detected with a sentiment-based approach and then combined with the associated rating score. The new rating scale has a finer granularity and, moreover, enables the reviews to be ranked. We show the effectiveness of our proposal by analysing a set of reviews about the Uffizi Gallery in Florence (Italy) published on TripAdvisor.
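One way to picture the joint measure (a hedged sketch; the paper's exact formula may differ) is a rating shifted by a bounded, weighted polarity term, which both refines the scale and makes same-score reviews rankable:

```python
# Hypothetical joint score: shift a 1-5 rating by a weighted text polarity
# in [-1, 1]. The weight w and the combination rule are assumptions.
def joint_score(rating, text_polarity, w=0.5):
    assert 1 <= rating <= 5 and -1.0 <= text_polarity <= 1.0
    return rating + w * text_polarity  # range [0.5, 5.5], finer than 5 steps

reviews = [(5, 0.9, "stunning collection"),
           (5, -0.2, "great art, endless queues")]
for r, p, text in sorted(reviews, key=lambda x: joint_score(x[0], x[1]),
                         reverse=True):
    print(f"{joint_score(r, p):.2f}  {text}")
```

Two five-star reviews now receive different scores (5.45 vs. 4.90), so they can be ordered by overall satisfaction.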

5.
Despite growing efforts to halt distasteful content on social media, multilingualism has added a new dimension to this problem, and the scarcity of resources makes the challenge even greater for low-resource languages. This work provides a novel method for abusive content detection in multiple low-resource Indic languages. Our observations indicate that a post's tendency to attract abusive comments, together with features such as user history and social context, significantly aids the detection of abusive content. The proposed method first learns social-context and text-context features in two separate modules; the integrated representation from these modules is then learned and used for the final prediction. To evaluate the performance of our method against classical and state-of-the-art methods, we performed extensive experiments on the SCIDN and MACI datasets, consisting of 1.5M and 665K multilingual comments, respectively. Our proposed method outperforms state-of-the-art baseline methods with an average increase of 4.08% and 9.52% in F1-score on the SCIDN and MACI datasets, respectively.
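Schematically, the two-module design could be rendered as below; the layer sizes, feature dimensions, and fusion-by-concatenation choice are our illustrative assumptions:

```python
# Sketch of a two-module detector: one branch for text embeddings, one for
# social-context features, fused by concatenation before the classifier head.
import torch
import torch.nn as nn

class AbuseDetector(nn.Module):
    def __init__(self, text_dim=768, social_dim=16, hidden=128):
        super().__init__()
        self.text_mlp = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.social_mlp = nn.Sequential(nn.Linear(social_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 2)  # abusive vs. non-abusive

    def forward(self, text_emb, social_feats):
        fused = torch.cat([self.text_mlp(text_emb),
                           self.social_mlp(social_feats)], dim=-1)
        return self.head(fused)

model = AbuseDetector()
logits = model(torch.randn(4, 768), torch.randn(4, 16))  # batch of 4 posts
print(logits.shape)  # torch.Size([4, 2])
```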

6.
Opinion mining is one of the most important research tasks in the information retrieval community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Opinion retrieval generally consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which itself focuses on detecting opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information, hypothesizing that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate an opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analyzed document and the reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries; in our study, we adapt them to represent opinionated documents. We carry out several experiments using the Text REtrieval Conference (TREC) Blog06 collection as our analysis collection and the Internet Movie Database (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collections. To improve opinion detection, we study the impact of using different language models to represent the document and the reference collection, alongside different combinations of opinion and retrieval scores, and use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best TREC Blog baseline (baseline4) by 30%.
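For reference, the three smoothed estimates named in the abstract have the following standard forms (Zhai and Lafferty's formulation; the adaptation to opinion scoring is the paper's contribution), where c(w; d) is the count of w in document d, |d| is the document length, and C and U denote the collection and a background model:

```latex
\begin{align*}
  p_{\lambda}(w \mid d) &= (1-\lambda)\, p_{\mathrm{ml}}(w \mid d)
      + \lambda\, p(w \mid C) && \text{(Jelinek-Mercer)}\\[2pt]
  p_{\mu}(w \mid d) &= \frac{c(w; d) + \mu\, p(w \mid C)}{|d| + \mu}
      && \text{(Dirichlet)}\\[2pt]
  p_{\lambda,\mu}(w \mid d) &= (1-\lambda)\,
      \frac{c(w; d) + \mu\, p(w \mid C)}{|d| + \mu}
      + \lambda\, p(w \mid U) && \text{(two-stage)}
\end{align*}
```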

7.
Web 2.0 changed everyday life in many respects, including the whole system that orbits around the purchase of products and services. This revolution necessarily involved companies as well, because customers became increasingly demanding. The diffusion of social media platforms pushed customers to prefer this channel for quickly obtaining information and feedback about what they want to buy, as well as for seeking help after the sale. In this framework, many organisations adopted a new way of providing assistance known as social customer care, where a direct link to companies allows customers to obtain real-time solutions. In this paper, we introduce a new strategy for automatically managing the information contained in the requests that customers send to companies' social media accounts. Our proposal relies on network techniques for extracting high-level structures from texts, highlighting the different concepts expressed in the customers' written requests. The texts can then be organised on the basis of this newly emerging information. An application to the requests sent to the AppleSupport service on Twitter shows the effectiveness of the strategy.
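A toy version of the network step (a sketch under our own assumptions; greedy modularity stands in for whatever concept-extraction technique the paper uses) builds a word co-occurrence graph over requests and reads candidate concepts off its communities:

```python
# Build a word co-occurrence network from customer requests and extract
# candidate concepts as graph communities. Illustrative only.
import itertools
import networkx as nx

requests = [
    "iphone battery drains fast after update",
    "battery drains overnight since last update",
    "cannot restore backup after update",
]

G = nx.Graph()
for req in requests:
    for u, v in itertools.combinations(sorted(set(req.split())), 2):
        w = G.get_edge_data(u, v, default={"weight": 0})["weight"]
        G.add_edge(u, v, weight=w + 1)

communities = nx.community.greedy_modularity_communities(G, weight="weight")
for c in communities:
    print(sorted(c))
```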

8.
Social media has become the most popular platform for free speech. This freedom has given the oppressed an opportunity to raise their voice against injustices, but it has also led to a disturbing trend of spreading hateful content of various kinds. Pakistan has been dealing with sectarian and ethnic violence for the last three decades, and there is now a growing trend of disturbing content about religion, sect, and ethnicity on social media. This necessitates an automated system for the detection of controversial content on social media in Urdu, the national language of Pakistan. The biggest hurdle that has thwarted Urdu language processing is the scarcity of language resources, annotated datasets, and pretrained language models. In this study, we address the problem of detecting Interfaith, Sectarian, and Ethnic hatred on social media in the Urdu language using machine learning and deep learning techniques. In particular, we have: (1) developed and presented guidelines for annotating Urdu text with appropriate labels at two levels of classification, (2) developed a large dataset of 21,759 tweets using these guidelines and made it publicly available, and (3) conducted experiments comparing the performance of eight supervised machine learning and deep learning techniques for the automated identification of hateful content. In the first step, hateful content detection is performed as a binary classification task; in the second step, the classification of Interfaith, Sectarian and Ethnic hatred is performed as a multiclass classification task. Overall, Bidirectional Encoder Representations from Transformers (BERT) proved to be the most effective technique for hateful content identification in Urdu tweets.
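As a pointer to what the BERT setup involves (a minimal sketch assuming a multilingual checkpoint; the study's actual model, labels, and hyperparameters are not reproduced here):

```python
# One training step for binary hateful-content classification with BERT.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"   # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)                 # hateful vs. non-hateful

batch = tokenizer(["<urdu tweet 1>", "<urdu tweet 2>"],
                  padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=torch.tensor([1, 0])).loss
loss.backward()                               # gradients for fine-tuning
```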

9.
In this paper, a new approach to non-parametric signal detection with independent noise sampling is presented. The approach is based on the locally asymptotically optimum (LAO) methodology, which is valid for vanishingly small signals and very large sample sizes, and on semi-parametric statistics. Its unique feature, and essential difference from other techniques, is that LAO non-parametric detectors are optimum according to the Neyman-Pearson criterion, being asymptotically uniformly most powerful at false alarm level α (AUMP(α)), and adaptive in the sense that no loss in Fisher's information number is incurred when the underlying noise process is no longer parametrically defined. Accordingly, they are robust against deviations from the postulated noise model and, unlike other non-parametric detectors, are distribution-free under both hypotheses H0 ("noise only present") and H1 ("signal and noise present"). Non-parametric LAO detectors are derived from an asymptotic stochastic expansion of the log-likelihood ratio for coherent and narrowband incoherent "on-off" signals. Moreover, under the present framework it is shown that, in direct contrast to previously known results, the non-parametric sign detector is AUMP(α) and adaptive even for non-constant signal samples.
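For readers unfamiliar with it, the sign detector mentioned in the closing claim has the standard coherent form below (textbook notation, not reproduced from the paper): with observations x_i = θ s_i + n_i under H1 and known signal samples s_i,

```latex
T_N \;=\; \sum_{i=1}^{N} s_i \,\operatorname{sgn}(x_i)
\;\underset{H_0}{\overset{H_1}{\gtrless}}\; t_\alpha ,
\qquad P\bigl(T_N \ge t_\alpha \mid H_0\bigr) = \alpha .
```

The paper's claim is that this statistic remains AUMP(α) and adaptive even when the s_i are not constant.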

10.
Negation remains a challenging problem in the context of sentiment analysis. The study of negation requires correctly identifying negation markers, their scope, and the interpretation of how negation affects the words within that scope: whether it modifies their meaning or not and, if so, whether it reverses, reduces or increases their polarity value. In addition, if we are interested in managing reviews in languages other than English, the issue becomes even more problematic due to the lack of resources. The present work shows the validity of the SFU ReviewSP-NEG corpus, which we annotated at the negation level, for training supervised polarity classification systems in Spanish. The assessment involved the comparison of different supervised models. The results show the validity of the corpus and allow us to state that annotating how negation affects the words within its scope is important. We therefore propose adding a new phase (phase iii) to the treatment of negation in polarity classification systems: i) identification of negation cues, ii) determination of the scope of negation, iii) identification of how negation affects the words within its scope, and iv) polarity classification taking negation into account.
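The four phases can be lined up in a toy pipeline like the one below (our sketch; the Spanish cue list, the fixed-window scope, and the pure-reversal rule in phase iii are drastic simplifications of what the corpus annotation encodes):

```python
# Toy four-phase negation pipeline: cues, scope, effect, polarity.
CUES = {"no", "nunca", "sin", "tampoco"}

def find_cues(tokens):                   # phase i: negation cue detection
    return [i for i, t in enumerate(tokens) if t in CUES]

def scope(tokens, cue):                  # phase ii: naive scope = next 3 tokens
    return range(cue + 1, min(cue + 4, len(tokens)))

def effect(polarity):                    # phase iii: effect on in-scope words
    return -polarity                     # reversal; could also reduce/increase

def classify(tokens, lexicon):           # phase iv: negation-aware polarity
    negated = {i for c in find_cues(tokens) for i in scope(tokens, c)}
    score = sum(effect(lexicon.get(t, 0)) if i in negated else lexicon.get(t, 0)
                for i, t in enumerate(tokens))
    return "positive" if score > 0 else "negative"

print(classify("no me gusta nada".split(), {"gusta": 1}))  # -> negative
```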

11.
Hate speech detection refers broadly to the automatic identification of language that may be considered discriminatory against certain groups of people. The goal is to help online platforms identify and remove harmful content. Humans are usually capable of detecting hatred in critical cases, such as when the hatred is non-explicit, but how do computer models address this situation? In this work, we aim to contribute to the understanding of ethical issues related to hate speech by analysing two transformer-based models trained to detect hate speech. Our study focuses on the relationship between these models and a set of hateful keywords extracted from three well-known datasets. For the keyword extraction, we propose a metric that takes the division among classes into account in order to favour the most common words in hateful contexts. In our experiments, we first compare the overlap between the extracted keywords and the words to which the models pay the most attention in decision-making. We then investigate the bias of the models towards the extracted keywords: we characterize and use two bias metrics and evaluate two strategies for mitigating the bias. Surprisingly, we show that over 50% of the models' salient words are not hateful, and that the proportion of hateful words is higher among the extracted keywords. We nevertheless show that the models appear to be biased towards the extracted keywords. Experimental results suggest that training the models on hateful texts that do not contain any of the keywords can reduce bias and improve model performance.
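One plausible reading of such a class-aware metric (an assumption of ours; the paper defines its own formula) weights a word's frequency in hateful posts by how concentrated the word's occurrences are in the hateful class:

```python
# Hypothetical keyword metric: frequency in hateful posts times the share of
# the word's occurrences that fall in the hateful class.
from collections import Counter

def hateful_keywords(hateful_docs, other_docs, top_k=3):
    h = Counter(w for d in hateful_docs for w in d.split())
    o = Counter(w for d in other_docs for w in d.split())
    score = {w: h[w] * h[w] / (h[w] + o[w]) for w in h}
    return sorted(score, key=score.get, reverse=True)[:top_k]

print(hateful_keywords(["you people are vermin", "vermin everywhere"],
                       ["the garden has vermin", "nice people here"]))
```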

12.
Because the traditional Canny edge detector struggles to extract the main contour of an aircraft from low-contrast aerial images, an algorithm based on wavelet analysis is investigated to solve this problem. Compared with the results obtained with the Canny edge detector, the proposed method better suppresses the influence of clouds and successfully extracts the main contour of the aircraft.
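A hedged sketch of the general technique (the wavelet family, decomposition level, and threshold are our assumptions, not the paper's parameters): keep the detail coefficients at a coarser scale, where fine cloud texture is attenuated, and threshold their magnitude to obtain an edge map.

```python
# Wavelet-based edge extraction sketch: decompose, keep coarse-scale detail
# bands, and threshold the gradient-like magnitude.
import numpy as np
import pywt

def wavelet_edges(image, wavelet="haar", level=2):
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    cH, cV, _ = coeffs[1]                 # coarsest-scale detail bands
    magnitude = np.hypot(cH, cV)          # edge strength from H/V details
    return magnitude > magnitude.mean() + 2 * magnitude.std()

image = np.random.rand(128, 128)          # stand-in for an aerial image
edges = wavelet_edges(image)
print(edges.shape, int(edges.sum()))
```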
