Similar literature
 Found 20 similar documents (search time: 203 ms)
1.
Social networks and many other graphs are attributed, meaning that their nodes are labelled with textual information such as personal data, expertise or interests. In attributed graphs, a common data analysis task is to find subgraphs whose nodes contain a given set of keywords. In many applications, the size of the subgraph should be limited (i.e., a subgraph with thousands of nodes is not desired). In this work, we introduce the problem of compact attributed group (AG) discovery. Given a set of query keywords and a desired solution size, the task is to find subgraphs with the desired number of nodes, such that the nodes are closely connected and each node contains as many query keywords as possible. We prove that finding an optimal solution is NP-hard and we propose approximation algorithms with a guaranteed ratio of two. Since the number of qualifying AGs may be large, we also show how to find approximate top-k AGs with polynomial delay. Finally, we experimentally verify the effectiveness and efficiency of our techniques on real-world graphs.

2.
Local community detection is an emerging topic in network analysis that aims to detect well-connected communities encompassing sets of previously known seed nodes. In this work, we explore the similar problem of ranking network nodes based on their relevance to the communities characterized by seed nodes. However, seed nodes may not be central enough or numerous enough to produce high-quality ranks. To solve this problem, we introduce a methodology we call seed oversampling, which first runs a node ranking algorithm to discover more nodes that belong to the community and then reruns the same ranking algorithm for the new seed nodes. We formally discuss why this process improves the quality of calculated community ranks if the original set of seed nodes is small and introduce a boosting scheme that iteratively repeats seed oversampling to further improve rank quality when certain ranking algorithm properties are met. Finally, we demonstrate the effectiveness of our methods in improving community relevance ranks given only a few random seed nodes of real-world network communities. In our experiments, boosted and simple seed oversampling yielded better rank quality than the previous neighborhood inflation heuristic, which adds the neighborhoods of original seed nodes to seeds.
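The two-pass idea described above (rank, enlarge the seed set, rank again) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the ranking algorithm, so personalized PageRank is assumed here, and the function names, the `extra` parameter, and the rule "promote the top-ranked non-seed nodes" are all hypothetical choices.

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iters=50):
    """Power iteration for personalized PageRank on an undirected graph.

    adj maps each node to a list of neighbours; seeds is the restart set.
    (Assumed ranking algorithm; the abstract leaves the choice open.)
    """
    nodes = list(adj)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(iters):
        # Every node keeps a (1 - alpha) share of its restart mass ...
        nxt = {v: (1.0 - alpha) * restart[v] for v in nodes}
        # ... and spreads alpha of its current rank evenly over its neighbours.
        for u in nodes:
            if adj[u]:
                share = alpha * rank[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        rank = nxt
    return rank


def seed_oversampling(adj, seeds, extra=2, **kwargs):
    """Rank once, promote the top `extra` non-seed nodes to seeds, rank again."""
    seeds = set(seeds)
    first = personalized_pagerank(adj, seeds, **kwargs)
    candidates = sorted(
        (v for v in adj if v not in seeds), key=lambda v: first[v], reverse=True
    )
    expanded = seeds | set(candidates[:extra])
    return expanded, personalized_pagerank(adj, expanded, **kwargs)


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
expanded_seeds, ranks = seed_oversampling(graph, {"a"}, extra=2)
```

On this toy graph the single seed `a` is expanded with the two other members of its triangle before the second ranking pass. The boosting scheme mentioned in the abstract would repeat this step, which the sketch does not attempt.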

3.
4.
Previous federated recommender systems are based on traditional matrix factorization; they can improve personalized service but are vulnerable to gradient inference attacks. Most of them adopt model averaging to fit the data heterogeneity of federated recommender systems, which increases training cost. To address privacy and efficiency, we propose an efficient federated item similarity model for heterogeneous recommendation, called FedIS, which can train a global item-based collaborative filtering model to eliminate user feature dependencies. Specifically, we extend the neural item similarity model to the federated setting, where each client only locally optimizes the shared item feature matrix. We then propose a fast-convergent federated aggregation method inspired by meta-learning to address heterogeneous user updates and accelerate the convergence of global training. Furthermore, we propose a two-stage perturbation method to protect both local training and transmission while reducing communication costs. Finally, extensive experiments on four real-world datasets validate that FedIS can provide more competitive performance on federated recommendations. Our proposed method also shows significant training efficiency with less performance degradation.

5.
Opinion summarization can facilitate users' decision-making by mining salient review information. However, due to the lack of sufficient annotated data, most early works are based on extractive methods, which restricts the performance of opinion summarization. In this work, we aim to improve the informativeness of opinion summarization to provide better guidance to users. We consider the setting with only reviews and no corresponding summaries, and propose an aspect-augmented model for unsupervised abstractive opinion summarization, denoted as AsU-OSum. We first employ an aspect-based sentiment analysis system to extract opinion phrases from reviews. Then, we construct a heterogeneous graph consisting of reviews and opinion clusters as nodes, which is used to enhance the Transformer-based encoder–decoder framework. Furthermore, we design a novel cascaded attention mechanism to prompt the decoder to pay more attention to the aspects that are more likely to appear in the summary. During training, we introduce a sentiment accuracy reward that further enhances the learning ability of our model. We conduct comprehensive experiments on the Yelp, Amazon, and Rotten Tomatoes datasets. Automatic evaluation results show that our model is competitive and performs better than the state-of-the-art (SOTA) models on some ROUGE metrics. Human evaluation results further verify that our model can generate more informative summaries and reduce redundancy.

6.
Human collaborative relationship inference is a meaningful task for online social networks and is called link prediction in network science. Real-world networks contain multiple types of interacting components and can be modeled naturally as heterogeneous information networks (HINs). The current link prediction algorithms in HINs fail to effectively extract training samples from snapshots of HINs; moreover, they underutilise the differences between nodes and between meta-paths. Therefore, we propose a meta-circuit machine (MCM) that can learn and fuse node and meta-path features efficiently, and we use these features to infer the collaborative relationships in question-and-answer and bibliographic networks. We first utilise meta-circuit random walks to obtain training samples; the basic idea is to perform biased meta-path random walks on the input and target networks successively and then connect them. Then, a meta-circuit recurrent neural network (mcRNN) is designed for link prediction, which represents each node and meta-path by a dense vector and leverages an RNN to fuse the features of node sequences. Experiments on two real-world networks demonstrate the effectiveness of our framework. This study promotes the investigation of potential evolutionary mechanisms for collaborative relationships and offers practical guidance for designing more effective recommendation systems for online social networks.

7.
Learning latent representations for users and points of interest (POIs) is an important task in location-based social networks (LBSN), which could largely benefit multiple location-based services, such as POI recommendation and social link prediction. Many contextual factors, like geographical influence, user social relationships and temporal information, are available in LBSN and would be useful for this task. However, incorporating all these contextual factors for user and POI representation learning in LBSN remains challenging, due to their heterogeneous nature. Although existing representation learning methods for LBSN deliver encouraging performance on POI recommendation and social link prediction, most of them incorporate only one or two of these contextual factors. In this paper, we propose a novel joint representation learning framework for users and POIs in LBSN, named UP2VEC. In UP2VEC, we present a heterogeneous LBSN graph to incorporate all of the aforementioned factors. Specifically, the transition probabilities between nodes inside the heterogeneous graph are derived by jointly considering these contextual factors. The latent representations of users and POIs are then learnt by matching the topological structure of the heterogeneous graph. To evaluate the effectiveness of UP2VEC, a series of experiments are conducted with two real-world datasets (Foursquare and Gowalla) in terms of POI recommendation and social link prediction. Experimental results demonstrate that the proposed UP2VEC significantly outperforms the existing state-of-the-art alternatives. A further experiment shows the superiority of UP2VEC in handling the cold-start problem for POI recommendation.

8.
Search result diversification is an effective way to tackle query ambiguity and enhance result novelty. In the context of large information networks, diversifying search results is also critical for the further design of applications such as link prediction and citation recommendation. In previous work, this problem has mainly been tackled through implicit query intent. To further enhance the performance on attributed networks, we propose a novel search result diversification approach via nonnegative matrix factorization. Our approach encodes latent query intents as well as nodes as representation vectors by a novel nonnegative matrix factorization model, and the diversity of the results accounts for the query relevance and the novelty w.r.t. these vectors. To learn the representation vectors of nodes, we derive the multiplicative updating rules to train the nonnegative matrix factorization model. We perform a comprehensive evaluation of our approach against various baselines. The results show the effectiveness of our proposed solution, and verify that attributes do help improve diversification performance.
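For readers unfamiliar with multiplicative updating rules, the sketch below shows the standard Lee and Seung updates for the plain factorization V ≈ WH under squared error. It is not the paper's model: the abstract says the authors derive rules for their own, novel factorization, which is not specified here. The function names, the small `eps` guard in the denominators, and the toy matrix are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive initialisation so that no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy nonnegative matrix that admits an exact rank-2 nonnegative factorization.
V = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [2.0, 0.0, 4.0]]
W, H = nmf(V, rank=2)
```

Because every update multiplies an entry by a nonnegative ratio, the factors stay nonnegative without any projection step, which is the main appeal of this family of rules.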

9.
The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval related tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool allows the set of documents to be structured hierarchically (using a fuzzy hierarchical structure) and this structure to be represented in a graphical interface (a 3D sphere) over which the user can navigate. Gambal allows the analysis of the documents and the computation of their similarity not only on the basis of the syntactic similarity between words but also based on a dictionary (WordNet 1.7) and latent semantic analysis.

10.
Modeling the temporal context efficiently and effectively is essential to provide useful recommendations to users. In this work, we focus on improving neighborhood-based approaches by integrating three different mechanisms to exploit temporal information. We first present an improved version of a similarity metric between users using a temporal decay function. We then propose an adaptation of the Longest Common Subsequence algorithm to be used as a time-aware similarity metric. Finally, we redefine the neighborhood-based recommenders to be interpreted as ranking fusion techniques, where the neighbor interaction sequence can be exploited by considering the last common interaction between the neighbor and the user. We demonstrate the effectiveness of these approaches by comparing them with other state-of-the-art recommender systems such as Matrix Factorization, Neural Networks, and Markov Chains under two realistic time-aware evaluation methodologies (per user and community-based). We use several evaluation metrics to measure both the quality of the recommendations (in terms of ranking relevance) and their temporal novelty or freshness. According to the obtained results, our proposals are highly competitive and obtain better results than the rest of the analyzed algorithms, producing improvements under the two evaluation dimensions tested consistently through three real-world datasets.
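To make the Longest Common Subsequence idea concrete, the sketch below compares two users by the LCS of their time-ordered item sequences. It is an assumption-laden illustration rather than the authors' adaptation: the abstract does not give the normalisation, so dividing the squared LCS length by the product of the sequence lengths is a hypothetical choice, as are the function names and item identifiers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a, b):
    """Similarity in [0, 1]; the normalisation is an assumption, not the paper's."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))


# Items each user interacted with, oldest first (hypothetical identifiers).
user_u = ["i1", "i2", "i3", "i4"]
user_v = ["i2", "i5", "i3", "i4"]
print(lcs_length(user_u, user_v))      # 3  (i2, i3, i4 appear in the same order)
print(lcs_similarity(user_u, user_v))  # 0.5625
```

Unlike a set-overlap measure, this score rewards two users only for items they consumed in the same relative order, which is what makes it time-aware.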

11.
Recent investigations have revealed that the dynamics of complex networks and systems depend crucially on their temporal structures. Accurately detecting the time instant at which a system changes its internal structure has become a highly significant task, beneficial both to fully understanding the underlying mechanisms of evolving systems and to adequately modeling and predicting their dynamics. In real-world applications, due to a lack of prior knowledge of the explicit equations of evolving systems, an open challenge is how to develop a practical, model-free method that achieves this task based merely on the time-series data recorded from real-world systems. Here, we develop such a model-free approach, named temporal change-point detection (TCD), and integrate both dynamical and statistical methods to address this important challenge in a novel way. The proposed TCD approach, based on exploiting the spatial information of the observed high-dimensional time series, is able not only to detect the separate change points of the systems concerned without knowing, a priori, any information about their equations, but also to harvest all the change points that emerge at a relatively high frequency, which cannot be directly achieved using existing methods and techniques. Practical effectiveness is comprehensively demonstrated using data from representative complex dynamics and real-world systems ranging from biology to geology and even social science.

12.
How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based presentation of retrieval results is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results are major features of a heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.

13.
This paper investigates quasi synchronization of heterogeneous complex networks coupled by N nonidentical Duffing-type oscillators, designing only the internal coupling and using no external controller. To achieve quasi synchronization, the average of the states of all nodes is designed as the virtual target. Heterogeneous complex networks with two kinds of nonlinear node dynamics are analyzed first. Some sufficient conditions for quasi synchronization are obtained without designing any external controller. Quasi synchronization means that the states of all nonidentical nodes keep a bounded error with respect to the virtual target. Then the heterogeneous complex network with impulsive coupling, in which the network is coupled only at discrete impulsive instants, is further discussed. Some sufficient conditions for the heterogeneous complex network with impulsive coupling are derived. Based on these results, a heterogeneous complex network can still reach quasi synchronization even if its nodes are only coupled at discrete impulsive instants. Finally, two examples are provided to verify the theoretical results.

14.
Influence maximization (IM) has shown wide applicability in many fields over the past decades. Previous research on IM mainly focused on dyadic relationships and did not consider the higher-order relationships between entities that have been repeatedly revealed in many real systems. An adaptive degree-based heuristic algorithm, Hyper Adaptive Degree Pruning (HADP), which aims to iteratively select nodes with low influence overlap as seeds, is proposed in this work to tackle the IM problem in hypergraphs. Furthermore, we extend algorithms from ordinary networks as baselines. Results on 8 empirical hypergraphs show that HADP surpasses the baselines in terms of both effectiveness and efficiency, with an improvement of up to 46.02%. Moreover, we test the effectiveness of our algorithm on synthetic hypergraphs generated with varying degree heterogeneity. The improvement in effectiveness increases from 2.66% to 14.67% as degree heterogeneity increases, which indicates that HADP performs especially well in hypergraphs with high heterogeneity, a property that is ubiquitous in real-world systems.

15.
Interdocument similarities are the fundamental information source required in cluster-based retrieval, which is an advanced retrieval approach that significantly improves performance during information retrieval (IR). An effective similarity metric is query-sensitive similarity, which was introduced by Tombros and van Rijsbergen as a method to more directly satisfy the cluster hypothesis that forms the basis of cluster-based retrieval. Although this method is reported to be effective, existing applications of query-specific similarity are still limited to vector space models, which have no connection to probabilistic approaches. We suggest a probabilistic framework that defines query-sensitive similarity based on probabilistic co-relevance, where the similarity between two documents is proportional to the probability that they are both co-relevant to a specific given query. We further simplify the proposed co-relevance-based similarity by decomposing it into two separate relevance models. We then formulate all the requisite components for the proposed similarity metric in terms of scoring functions used by language modeling methods. Experimental results obtained using standard TREC test collections consistently showed that the proposed query-sensitive similarity measure performs better than term-based similarity and existing query-sensitive similarity in the context of Voorhees’ nearest neighbor test (NNT).

16.
Link prediction, which aims to predict future or missing links among nodes, is a crucial research problem in social network analysis. A unique few-shot challenge is link prediction on newly emerged link types without sufficient verification information in heterogeneous social networks, such as commodity recommendation on new categories. Most current approaches for link prediction rely heavily on sufficient verified link samples and almost ignore the shared knowledge between different link types. Hence, they tend to suffer from data scarcity in heterogeneous social networks and fail to handle newly emerged link types that lack sufficient verified link samples. To overcome this challenge, we propose a model based on meta-learning, called the meta-learning adaptation network (MLAN), which acquires transferable knowledge from historical link types to improve the prediction performance on newly emerged link types. MLAN consists of three main components: a subtask slicer, a meta migrator, and an adaptive predictor. The subtask slicer is responsible for generating community subtasks for the link prediction on historical link types. Subsequently, the meta migrator simultaneously completes multiple community subtasks from different link types to acquire transferable subtask-shared knowledge. Finally, the adaptive predictor employs the parameters of the meta migrator to fuse the subtask-shared knowledge from different community subtasks and learn the task-specific knowledge of newly emerged link types. Experiments conducted on real-world social media datasets show that our proposed MLAN outperforms state-of-the-art models in few-shot link prediction in heterogeneous social networks.

17.
Estimating the similarity between two legal case documents is an important and challenging problem, having various downstream applications such as prior-case retrieval and citation recommendation. There are two broad approaches for the task: citation network-based and text-based. Prior citation network-based approaches consider citations only to prior-cases (also called precedents) (PCNet). This approach misses important signals inherent in Statutes (written laws of a jurisdiction). In this work, we propose Hier-SPCNet that augments PCNet with a heterogeneous network of Statutes. We incorporate domain knowledge for legal document similarity into Hier-SPCNet, thereby obtaining state-of-the-art results for network-based legal document similarity. Both textual and network similarity provide important signals for legal case similarity, but until now, only trivial attempts have been made to unify the two signals. In this work, we apply several methods for combining textual and network information for estimating legal case similarity. We perform extensive experiments over legal case documents from the Indian judiciary, where the gold standard similarity between document-pairs is judged by law experts from two reputed Law institutes in India. Our experiments establish that our proposed network-based methods significantly improve the correlation with domain experts’ opinion when compared to the existing methods for network-based legal document similarity. Our best-performing combination method (that combines network-based and text-based similarity) improves the correlation with domain experts’ opinion by 11.8% over the best text-based method and 20.6% over the best network-based method. We also establish that our best-performing method can be used to recommend/retrieve citable and similar cases for a source (query) case, which are well appreciated by legal experts.

18.
Text clustering is a well-known method for information retrieval, and numerous methods for classifying words, documents or both together have been proposed. Frequently, textual data are encoded using vector models so the corpus is transformed into a matrix of terms by documents; using this representation, text clustering generates groups of similar objects on the basis of the presence/absence of the words in the documents. An alternative way to work on texts is to represent them as a network where nodes are entities connected by the presence and distribution of the words in the documents. In this work, after summarising the state of the art of text clustering, we will present a new network approach to textual data. We undertake text co-clustering using methods developed for social network analysis. Several experimental results will be presented to demonstrate the validity of the approach and the advantages of this technique compared to existing methods.

19.
In information retrieval, cluster-based retrieval is a well-known attempt to resolve the problem of term mismatch. Clustering requires similarity information between the documents, which is difficult to calculate in a feasible time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical basis has not been fully explored. In this regard, we provide a conceptual view of adaptive document clustering based on query-based similarities, by regarding the user’s query as a concept. As a result, the adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework, and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. Evaluation results show that retrieval based on query-based similarities significantly improves on the baseline, while being comparable to other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.

20.
Teaching images, as an important auxiliary tool in teaching and learning, are fundamentally different from the general domain images. Besides visually similar images being more likely to share common labels, teaching images also face the challenge of visual-knowledge inconsistency, including intra-knowledge visual difference and inter-knowledge visual similarity. To address the above challenges, we present KBHN, a knowledge-aware bi-hypergraph network, which not only considers coarse-grained visual features, but also extracts fine-grained knowledge features that reflect knowledge intention hidden in teaching images. In detail, a visual hypergraph is constructed to connect images with visual similarity. It further enriches coarse-grained visual features by modeling the high-order visual relations among teaching images. Moreover, a knowledge hypergraph based on typical images is built to aggregate images with similar knowledge information, which innovatively extracts fine-grained knowledge features by modeling high-order knowledge correlations between local regions. Furthermore, a multi-head attention mechanism is adopted to fuse visual-knowledge features for enriching image representation. A teaching image dataset is constructed to train and validate our model, which contains 20744 real-world images annotated with 24 knowledge points. Experimental results demonstrate that KBHN, incorporating visual-knowledge features, achieves state-of-the-art performance compared to existing methods.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号