Similar literature
1.
With the expansion of information on the web, recommendation systems have become one of the most powerful resources for easing users' tasks. Traditional recommendation systems (RS) suggest items based only on feedback submitted by users in the form of ratings. These RS cannot capture definite user preferences, because user-generated content on social media is emerging and situation dependent; these situations are known as contextual dimensions. Though the relationship between contextual dimensions and users' preferences has been demonstrated in various studies, only a few studies have explored the prioritization of varying contextual dimensions. Using all contextual dimensions unnecessarily raises the computational complexity and negatively influences the recommendation results. Thus, an initial attempt has been made to construct a neural network that determines the pertinent contextual dimensions. The experiments are conducted on real-world movie data, the LDOS CoMoDa dataset. The results of the neural networks demonstrate that contextual dimensions have a significant effect on users' preferences, which in turn exerts a strong impact on the satisfaction level of users. Finally, a tensor factorization model is employed to evaluate and validate accuracy by including the pertinent dimensions identified by the neural network, which are modeled as tensors. The results show that including the pertinent dimensions improves recommendation accuracy by a wider margin than including irrelevant dimensions. The theoretical and managerial implications are discussed.

2.
Relation extraction aims at finding meaningful relationships between two named entities within unstructured textual content. In this paper, we define the problem of information extraction as a matrix completion problem in which we employ the notion of universal schemas, formed as a collection of patterns derived from open information extraction systems, as well as additional features derived from grammatical clause patterns and statistical topic models. One of the challenges with earlier work that employs matrix completion methods is that such approaches require a sufficient number of observed relation instances to be able to make predictions. In practice, however, there is often insufficient explicit evidence supporting each relation type that could be used within the matrix model. Hence, existing work suffers from low recall. We extend the state of the art by proposing novel ways of integrating two sets of features, i.e., topic models and grammatical clause structures, to alleviate the low recall problem. More specifically, we propose that it is possible to (1) employ grammatical clause information from textual sentences as an implicit indication of relation type and argument similarity, on the basis that similar relation types and arguments are likely to be observed within similar grammatical structures, and (2) benefit from statistical topic models to determine similarity between relation types and arguments, based on their co-occurrence within the same topics. We have performed extensive experiments on both gold standard and silver standard datasets. The experiments show that our approach addresses the low recall problem of existing methods, improving recall by 21% and f-measure by 8% over the state-of-the-art baseline.

3.
Aspect mining, which aims to extract ad hoc aspects from online reviews and predict the rating or opinion on each aspect, can satisfy personalized needs for evaluating specific aspects of product quality. Recently, with the increase in related research, how to effectively integrate rating and review information has become the key issue in addressing this problem. Considering that matrix factorization is an effective tool for rating prediction and topic modeling is widely used for review processing, it is a natural idea to combine matrix factorization and topic modeling for aspect mining (also called aspect rating prediction). However, this idea faces several challenges concerning suitable sharing factors, scale mismatch, and the dependency relation between rating and review information. In this paper, we propose a novel model to effectively integrate Matrix factorization and Topic modeling for Aspect rating prediction (MaToAsp). To overcome the above challenges and ensure performance, MaToAsp employs items as the sharing factors to combine matrix factorization and topic modeling, and introduces an interpretive preference probability to eliminate scale mismatch. In the hybrid model, we establish a dependency relation from ratings to sentiment terms in phrases. Experiments on two real datasets, Chinese Dianping and English Tripadvisor, show that MaToAsp not only obtains reasonable aspect identification but also achieves the best aspect rating prediction performance compared to recent representative baselines.

4.
Hashing is an emerging topic that has recently attracted widespread attention in multi-modal similarity search applications. However, most existing approaches rely on relaxation schemes to generate binary codes, leading to large quantization errors. In addition, many existing approaches embed labels into the pairwise similarity matrix, which incurs expensive time and space costs and loses category information. To address these issues, we propose Efficient Discrete Matrix factorization Hashing (EDMH). Specifically, EDMH first learns the latent subspace for each individual modality through a matrix factorization strategy, which preserves the semantic structure representation information of each modality. In particular, we develop a semantic label offset embedding learning strategy that improves the stability of label embedding regression. Furthermore, we design an efficient discrete optimization scheme to generate compact binary codes discretely. Finally, we present two efficient learning strategies, EDMH-L and EDMH-S, to pursue high-quality hash functions. Extensive experiments on various widely used databases verify that the proposed algorithms achieve strong performance and outperform some state-of-the-art approaches, with an average improvement of 2.50% (for Wiki), 2.66% (for MIRFlickr) and 2.25% (for NUS-WIDE) over the best available results, respectively.

5.
A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank nonnegative matrix factorization algorithm to retain natural data nonnegativity, thereby eliminating the need to use subtractive basis vector and encoding calculations present in other techniques such as principal component analysis for semantic feature abstraction. Existing techniques for nonnegative matrix factorization are reviewed and a new hybrid technique for nonnegative matrix factorization is proposed. Performance evaluations of the proposed method are conducted on a few benchmark text collections used in standard topic detection studies.
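As an illustration of how a nonnegative factorization encodes a term-document matrix, the following is a minimal sketch of the classic multiplicative-update scheme for the Frobenius objective. It is not the hybrid technique proposed in the abstract, whose details are not given here; the function name `nmf_multiplicative`, the toy matrix, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (terms x documents) as W @ H.

    Uses the standard multiplicative updates, which keep every entry of
    W and H nonnegative as long as the initial values are nonnegative.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, rank)) + eps   # term-by-topic basis vectors
    H = rng.random((rank, n_docs)) + eps    # topic-by-document encodings
    for _ in range(n_iter):
        # eps in the denominators only guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy counts: terms 0-1 occur in documents 0-1,
# terms 2-3 occur in documents 2-3.
V = np.array([
    [3.0, 3.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 4.0],
    [0.0, 0.0, 1.0, 1.0],
])
W, H = nmf_multiplicative(V, rank=2)
residual = np.linalg.norm(V - W @ H)
# Each document can be assigned to its strongest topic.
doc_topics = H.argmax(axis=0)
```

Because both factors stay nonnegative, each document is expressed as a purely additive combination of basis vectors, which is the property the abstract contrasts with principal component analysis.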

6.
Semi-supervised multi-view learning has recently achieved appealing performance by exploiting the consensus relation between samples. However, in addition to the relation between samples, the relation between samples and their assemble centroid is also important to learning. In this paper, we propose a novel model based on orthogonal non-negative matrix factorization, which explores the consensus relations both between samples and between samples and their assemble centroid. Since this model utilizes more consensus information to guide multi-view learning, it can lead to better performance. We also theoretically derive a proposition about the equivalence between partial orthogonality and full orthogonality. Based on this proposition, the orthogonality constraint and the label constraint are implemented simultaneously in the proposed model. Experimental evaluations on five real-world datasets show that our approach outperforms state-of-the-art methods, with an average improvement of 6% in terms of ARI.

7.
In existing unsupervised methods, Latent Semantic Analysis (LSA) is used for sentence selection. However, the obtained results are less meaningful, because singular vectors are used as the bases for sentence selection from given documents, and singular vector components can have negative values. We propose a new unsupervised method using Non-negative Matrix Factorization (NMF) to select sentences for automatic generic document summarization. The proposed method uses non-negative constraints, which are more similar to the human cognition process. As a result, the method selects more meaningful sentences for generic document summarization than those selected using LSA.

8.
An automatic patent categorization system would be invaluable to individual inventors and patent attorneys, saving them time and effort by quickly identifying conflicts with existing patents. In recent years, it has become more and more common to classify all patent documents using the International Patent Classification (IPC), a complex hierarchical classification system comprising eight sections, 128 classes, 648 subclasses, about 7200 main groups, and approximately 72,000 subgroups. So far, however, no patent categorization method has been developed that can classify patents down to the subgroup level (the bottom level of the IPC). Therefore, this paper presents a novel categorization method, the three phase categorization (TPC) algorithm, which classifies patents down to the subgroup level with reasonable accuracy. The experimental results for the TPC algorithm, using the WIPO-alpha collection, indicate that our classification method can achieve 36.07% accuracy at the subgroup level. This is approximately a 25,764-fold improvement over a random guess.
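The fold figure can be sanity-checked with simple arithmetic. The sketch below assumes that a "random guess" means a uniform pick among subgroups and uses the rounded count of approximately 72,000 quoted in the abstract; the exact subgroup count behind the reported figure is not stated here, so the second calculation only back-solves the count that the reported figure would imply. The function name `fold_over_random` is hypothetical.

```python
def fold_over_random(accuracy, n_classes):
    """Ratio of a classifier's accuracy to uniform random guessing."""
    random_accuracy = 1.0 / n_classes
    return accuracy / random_accuracy

# With the rounded subgroup count from the abstract (an assumption):
approx_fold = fold_over_random(0.3607, 72_000)   # about 25,970

# Subgroup count implied by the reported 25,764-fold improvement:
implied_classes = 25_764 / 0.3607                # about 71,428
```

Under these assumptions the rounded count gives a ratio of roughly 25,970, which is close to the reported value and suggests a subgroup count slightly below 72,000.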

9.
Contextual feature selection for text classification   Total citations: 1 (self-citations: 0, citations by others: 1)
We present a simple approach for the classification of “noisy” documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call-for-tender documents, the method can be useful for other web collections that also contain non-topical content. Experiments are conducted on our in-house collection as well as on the 4-Universities data set, Reuters 21578 and 20 Newsgroups. We find a significant improvement on our collection and the 4-Universities data set (10.9% and 4.1%, respectively). Although the best results are obtained by combining bigrams and named entities, the impact of the latter is not found to be significant.

10.
Search result diversification is an effective way to tackle query ambiguity and enhance result novelty. In the context of large information networks, diversifying search results is also critical for the further design of applications such as link prediction and citation recommendation. In previous work, this problem has mainly been tackled through implicit query intent. To further enhance performance on attributed networks, we propose a novel search result diversification approach via nonnegative matrix factorization. Our approach encodes latent query intents as well as nodes as representation vectors using a novel nonnegative matrix factorization model, and the diversity of the results accounts for both query relevance and novelty w.r.t. these vectors. To learn the representation vectors of nodes, we derive multiplicative updating rules to train the nonnegative matrix factorization model. We perform a comprehensive evaluation of our approach against various baselines. The results show the effectiveness of our proposed solution and verify that attributes do help improve diversification performance.
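To make the trade-off between relevance and novelty concrete, here is a minimal sketch of a greedy re-ranking step in the style of maximal marginal relevance. It is a generic stand-in rather than the scoring function of the approach above, which the abstract does not specify; the function name `greedy_diversify`, the `trade_off` parameter, and the toy vectors are assumptions, and the vectors are assumed to be nonzero.

```python
import numpy as np

def greedy_diversify(relevance, vectors, k, trade_off=0.5):
    """Pick k items, balancing relevance against similarity to earlier picks.

    relevance: 1-D array of query relevance scores, one per item.
    vectors:   2-D array of (nonzero) representation vectors, one row per item.
    """
    # Normalise rows so that dot products are cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for c in candidates:
            # Redundancy is the highest similarity to anything already chosen.
            if selected:
                redundancy = max(float(unit[c] @ unit[s]) for s in selected)
            else:
                redundancy = 0.0
            score = trade_off * relevance[c] - (1.0 - trade_off) * redundancy
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates; item 2 is less relevant but novel.
relevance = np.array([0.9, 0.85, 0.5])
vectors = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
ranking = greedy_diversify(relevance, vectors, k=2)   # -> [0, 2]
```

In this toy case the second slot goes to the novel item rather than the near-duplicate, even though the duplicate has higher relevance.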

11.
Effective learning schemes such as fine-tuning, zero-shot, and few-shot learning have been widely used to obtain considerable performance with only a handful of annotated training data. In this paper, we present a unified benchmark to facilitate research on zero-shot text classification in Turkish. For this purpose, we evaluated three methods, namely Natural Language Inference (NLI), Next Sentence Prediction, and our proposed model based on Masked Language Modeling and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We used pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT and mBERT. The results showed that ConvBERT with the NLI method yields the best results with 79% and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by using transformer models not previously attempted for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.

12.
Noise reduction through summarization for Web-page classification   Total citations: 1 (self-citations: 0, citations by others: 1)
Due to the large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement of more than 5.0% compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text-based methods.

13.
Associative classification methods have recently been applied to various categorization tasks due to their simplicity and high accuracy. To improve the coverage for test documents and to raise classification accuracy, some associative classifiers generate a huge number of association rules during the mining step. We present two algorithms to increase the computational efficiency of associative classification: one to store rules very efficiently, and the other to increase the speed of rule matching, using all of the generated rules. Empirical results using three large-scale text collections demonstrate that the proposed algorithms increase the feasibility of applying associative classification to large-scale problems.

14.
Sylvester quaternion tensor equations have a wide range of applications in image processing and in system and control theory. In this paper, using the Kronecker product, the vectorization operator, and the properties of quaternion tensors, we focus mainly on proposing the tensor form of the generalized product-type biconjugate gradient method for solving generalized Sylvester quaternion tensor equations. As an application, we apply the proposed method to restore a blurred and noise-free color video. The numerical results illustrate the effectiveness of our method compared with some existing methods.

15.
This paper focuses on temporal retrieval of activities in videos via sentence queries. Given a sentence query describing an activity, temporal moment retrieval aims at localizing the temporal segment within the video that best describes the textual query. This is a general yet challenging task, as it requires comprehension of both video and language. Existing research predominantly employs coarse frame-level features as the visual representation, obscuring the specific details within the video (e.g., the desired objects “girl” and “cup” and the action “pour”) that may provide critical cues for localizing the desired moment. In this paper, we propose a novel Spatial and Language-Temporal Tensor Fusion (SLTF) approach to resolve these issues. Specifically, the SLTF method first takes advantage of object-level local features and attends to the most relevant local features (e.g., the local features “girl” and “cup”) by spatial attention. We then encode the sequence of local features on consecutive frames with an LSTM network, which can capture the motion information and interactions among these objects (e.g., the interaction “pour” involving these two objects). Meanwhile, language-temporal attention is utilized to emphasize the keywords based on moment context information. Thereafter, a tensor fusion network learns both the intra-modality and inter-modality dynamics, which can enhance the learning of the moment-query representation. Our two proposed attention sub-networks can therefore adaptively recognize the most relevant objects and interactions in the video, and simultaneously highlight the keywords in the query for retrieving the desired moment. Experimental results on three public benchmark datasets (obtained from TACOS, Charades-STA, and DiDeMo) show that the SLTF model significantly outperforms current state-of-the-art approaches, and demonstrate the benefits produced by the new technologies incorporated into SLTF.

16.
The present paper deals with the following three aspects:
   1. It discusses the problem of primitive forms in the family Araliaceae. The genus Tupidanthus Hook. f. & Thoms. was considered primitive by H. Harms (1894) and H. L. Li (1942), whilst another genus, Plerandra A. Gray, was regarded as primitive by R. H. Eyde & C. C. Tseng in 1971. Having made a detailed comparison of the taxonomic characters of these two genera, the present authors believe that neither genus is the most primitive in the Araliaceae. Their affinity is not close, and they possibly evolved in parallel lines from a common ancestor which is so far unknown.
   2. Having studied the systems of the past, the present authors believe that none of them is entirely satisfactory. Bentham (1867) recognized five ‘series’ (in fact equivalent to ‘tribes’, with names ending in -eae) based on the petaline arrangement in the bud, the number of stamens, and the type of endosperm. This is a plausible fundamental treatment for the Araliaceae, but choosing the endosperm as a criterion for dividing tribes is artificial. As we know today, both ruminate and uniform endosperm are often present in the same genus. Seemann’s system (1868) divided the Hederaceae (excl. Trib. Aralieae) into five tribes, taking the locules of the ovary into account as well. The criteria are essentially the same as Bentham’s. The system of Harms (1894) divided the family into three tribes. Two of Bentham’s tribes, Aralieae and Mackinlayeae, are retained, but the other groups were combined in the Trib. Schefflereae. However, Harms did not retain one of the three oldest legitimate names given by Bentham, which is contrary to the principle of priority in the International Code of Botanical Nomenclature. Hutchinson (1967) adopted seven tribes for the family. The criteria essentially follow those of Bentham, but the inflorescence is overstressed. The inflorescence is an artificial taxonomic character for dividing tribes, because some dioecious plants, such as Meryta sinclairii (Hook. f.) Seem., have two types of inflorescence in male and female plants. According to Hutchinson’s arrangement, the male and female plants would be put in separate tribes.
   3. The present authors are of the opinion that in the study of a natural classification of plant groups, emphasis should be laid not only on the characters of the reproductive organs, but on those of vegetative organs as well. The present revised system is based principally upon the characters of both flowers and leaves, and comprises the following five tribes:
      Trib. 1. Plerandreae Benth. emend. Hoo & Tseng
      Trib. 2. Tetraplasandreae Hoo & Tseng
      Trib. 3. Mackinlayeae Benth.
      Trib. 4. Aralieae Benth.
      Trib. 5. Panaceae Benth. emend. Hoo & Tseng

17.
Transfer learning utilizes labeled data available from a related domain (the source domain) to achieve effective knowledge transfer to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationships existing among the documents. In this paper, we propose a novel cross-domain document classification approach called the Link-Bridged Topic model (LBT). LBT consists of two key steps. Firstly, LBT utilizes an auxiliary link network to discover the direct or indirect co-citation relationships among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationships are leveraged to bridge the gap across different domains. Secondly, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on the assumption that the documents of the source and target domains share some common topics from the point of view of both content information and link structure. By mapping the data of both domains into the latent topic spaces, LBT encodes the knowledge about domain commonality and difference as the shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as with the content and link statistics. The shared topics then act as the bridge to facilitate knowledge transfer from the source domain to the target domain. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.

18.
Recently, sentiment classification has received considerable attention within the natural language processing research community. However, since most recent work on sentiment classification has been done in English, there are not enough sentiment resources in other languages. Manual construction of reliable sentiment resources is a very difficult and time-consuming task. Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language (typically English) for sentiment classification of text documents in another language. Most existing research relies on automatic machine translation services to directly project information from one language to another. However, differing term distributions between original and translated text documents and translation errors are the two main problems faced when using machine translation alone. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training to incorporate unlabelled data from the target language into the learning process in a bi-view framework. This model attempts to enrich the training data by adding the most confident automatically-labelled examples, as well as a few of the most informative manually-labelled examples, from unlabelled data in an iterative process. Further, in this model, we consider the density of unlabelled data so as to select more representative unlabelled examples and avoid outlier selection in active learning. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model can effectively improve cross-lingual sentiment classification performance and reduce labelling effort in comparison with some baseline methods.

19.
Extraction of a discriminative subspace associated with pattern classes is critical to many pattern classification problems. Traditionally, pattern class labels are regarded as indicators to discriminate between pattern classes. In this work, a novel indicator model is proposed to extract a discriminant subspace by projecting samples onto a space where the projected categories are mutually orthogonal and in-category normalized. The category orthonormal property and its connections to discriminative subspace extraction are derived. It is shown that the proposed method has a strong connection with the existing Fukunaga-Koontz Transformation but extends the number of categories from two to multiple. For applications with a large dimension size but a limited number of samples, an analytic least-norm solver is developed for calculating the projection function. A discriminative subspace extraction method for multiple classes is proposed and is evaluated in combination with classifiers. Experiments demonstrate promising results from using the extracted category orthonormal subspace for multi-class subspace extraction when the number of samples is small.

20.
A hot topic in recent years, cross-domain sentiment classification aims to learn a reliable classifier using labeled data from a source domain and evaluate the classifier on a target domain. In this vein, most approaches have utilized domain adaptation, which maps data from different domains into a common feature space. To further improve model performance, several methods that aim to mine domain-specific information have been proposed. However, most of them utilize only a limited part of the domain-specific information. In this study, we first develop a method for extracting domain-specific words based on the topic information derived from topic models. Then, we propose a Topic Driven Adaptive Network (TDAN) for cross-domain sentiment classification. The network consists of two sub-networks, a semantics attention network and a domain-specific word attention network, the structures of which are based on transformers. These sub-networks take different forms of input, and their outputs are fused as the feature vector. Experiments validate the effectiveness of TDAN on sentiment classification across domains. Case studies also indicate that topic models have the potential to add value to cross-domain sentiment classification by discovering interpretable and low-dimensional subspaces.
