Similar Documents
A total of 20 similar documents were retrieved (search time: 31 ms)
1.
This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted on diverse well-known public datasets, composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus demonstrating the capability of the proposal to represent queries through a unified graph-based model of rank fusions.
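A minimal sketch of the general idea, not the authors' exact formulation: each ranker's result list is turned into a small weighted graph whose edge weights decay with rank position, the per-ranker graphs are merged into a fusion graph, and two fusion graphs are compared through the weight of their common edges, a crude stand-in for the minimum-common-subgraph score. All names (`fusion_graph`, `graph_similarity`, the reciprocal-rank weighting) are illustrative assumptions.

```python
from collections import defaultdict

def fusion_graph(ranked_lists, top_k=5):
    """Merge several ranked lists (lists of doc ids, best first) into one weighted
    graph: documents co-occurring near the top of a list are linked, with weights
    decaying by rank position (reciprocal-rank style)."""
    graph = defaultdict(float)
    for ranking in ranked_lists:
        top = ranking[:top_k]
        for i, u in enumerate(top):
            for j, v in enumerate(top):
                if u != v:
                    graph[(u, v)] += 1.0 / ((i + 1) * (j + 1))
    return dict(graph)

def graph_similarity(g1, g2):
    """Score two fusion graphs by the weight of their common edges,
    a simple surrogate for a minimum-common-subgraph computation."""
    common = set(g1) & set(g2)
    return sum(min(g1[e], g2[e]) for e in common)

if __name__ == "__main__":
    # Two isolated rankers (e.g., textual and visual) for a query document,
    # and for a candidate document in the collection.
    query_ranks = [["d1", "d2", "d3", "d4"], ["d2", "d1", "d5", "d3"]]
    cand_ranks = [["d2", "d1", "d4", "d6"], ["d1", "d2", "d3", "d7"]]
    gq, gc = fusion_graph(query_ranks), fusion_graph(cand_ranks)
    print("fusion-graph similarity:", round(graph_similarity(gq, gc), 3))
```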

2.
This paper studies how to learn accurate ranking functions from noisy training data for information retrieval. Most previous work on learning to rank assumes that the relevance labels in the training data are reliable. In reality, however, the labels usually contain noise due to the difficulties of relevance judgments and several other reasons. To tackle the problem, in this paper we propose a novel approach to learning to rank, based on a probabilistic graphical model. Considering that the observed label might be noisy, we introduce a new variable to indicate the true label of each instance. We then use a graphical model to capture the joint distribution of the true labels and observed labels given features of documents. The graphical model distinguishes the true labels from observed labels, and is specially designed for ranking in information retrieval. Therefore, it helps to learn a more accurate model from noisy training data. Experiments on a real dataset for web search show that the proposed approach can significantly outperform previous approaches.

3.
This paper is concerned with the quality of training data in learning to rank for information retrieval. While many data selection techniques have been proposed to improve the quality of training data for classification, the study on the same issue for ranking appears to be insufficient. As pointed out in this paper, it is inappropriate to extend technologies for classification to ranking, and the development of novel technologies is sorely needed. In this paper, we study the development of such technologies. To begin with, we propose the concept of “pairwise preference consistency” (PPC) to describe the quality of a training data collection from the ranking point of view. PPC takes into consideration the ordinal relationship between documents as well as the hierarchical structure on queries and documents, which are both unique properties of ranking. Then we select a subset of the original training documents, by maximizing the PPC of the selected subset. We further propose an efficient solution to the maximization problem. Empirical results on the LETOR benchmark datasets and a web search engine dataset show that with the subset of training data selected by our approach, the performance of the learned ranking model can be significantly improved.
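A toy sketch of the pairwise-preference idea under simplifying assumptions: consistency of a document collection is measured as the fraction of labeled pairs within the same query whose label order agrees with a reference feature score, and documents are dropped greedily while the consistency of the remainder improves. The exact PPC objective and optimization in the paper differ; every name and weighting choice here is illustrative.

```python
from itertools import combinations

def ppc(docs):
    """docs: list of (query_id, label, feature_score).
    Fraction of same-query pairs whose label order agrees with the feature order."""
    agree = total = 0
    for a, b in combinations(docs, 2):
        if a[0] != b[0] or a[1] == b[1]:
            continue
        total += 1
        if (a[1] - b[1]) * (a[2] - b[2]) > 0:
            agree += 1
    return agree / total if total else 1.0

def select_consistent_subset(docs):
    """Greedily remove the document whose removal most increases PPC."""
    selected = list(docs)
    improved = True
    while improved and len(selected) > 2:
        improved = False
        base = ppc(selected)
        best_i, best_gain = None, 0.0
        for i in range(len(selected)):
            gain = ppc(selected[:i] + selected[i + 1:]) - base
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is not None:
            del selected[best_i]
            improved = True
    return selected

if __name__ == "__main__":
    data = [("q1", 2, 0.9), ("q1", 1, 0.5), ("q1", 0, 0.7),  # the 0-labeled doc looks noisy
            ("q2", 1, 0.8), ("q2", 0, 0.2)]
    subset = select_consistent_subset(data)
    print("PPC before:", round(ppc(data), 2), "after:", round(ppc(subset), 2))
```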

4.
Ranking is one of the fundamental tasks in information retrieval, data mining, and social network analysis. The rapid development of online social networks and social media has accumulated massive amounts of graph data, consisting of nodes that represent entities and edges that represent relationships between entities. The connections between nodes in graph data are complex and usually lack an explicit total order, which makes graph ranking particularly important in graph data analysis. Graph ranking algorithms fall into two broad categories: algorithms oriented toward node centrality and algorithms oriented toward the diversity of node sets. Unlike traditional graph ranking, diversified graph ranking considers the fusion of ranking and clustering, reflected in how well the selected node set covers the whole network. In recent years, diversified graph ranking has attracted wide attention and achieved a series of research advances, and the results have been successfully applied in many scenarios such as search result ranking, automatic document summarization, information recommendation systems, and influence maximization. This article reviews the state of the art and main progress of diversified graph ranking, divides existing methods into three categories according to their underlying ideas (marginal-gain maximization, competitive random walks, and mutual reinforcement of clustering and ranking), and discusses the strengths and weaknesses of each category. Finally, it points out that designing effective evaluation metrics and standard benchmark datasets and resolving the tension between accuracy and speed are key directions for future research on diversified graph ranking.
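The marginal-gain-maximization family mentioned in the review can be illustrated with a tiny greedy sketch: repeatedly pick the node that covers the largest number of not-yet-covered nodes (itself plus its neighbors), so that the selected set is both central and diverse. This is a generic coverage heuristic, not any specific algorithm from the surveyed literature; the graph is made up for the example.

```python
def diversified_ranking(adj, k):
    """adj: dict node -> set of neighbors (undirected). Greedily select k nodes
    maximizing the marginal gain in network coverage (node plus its neighbors)."""
    covered, selected = set(), []
    for _ in range(k):
        best, best_gain = None, -1
        for v in adj:
            if v in selected:
                continue
            gain = len(({v} | adj[v]) - covered)
            if gain > best_gain:
                best, best_gain = v, gain
        selected.append(best)
        covered |= {best} | adj[best]
    return selected

if __name__ == "__main__":
    # Two loosely connected clusters; a purely centrality-based ranking could pick
    # two hubs from the same cluster, while the greedy coverage rule spreads out.
    adj = {
        "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a", "e"},
        "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
    }
    print(diversified_ranking(adj, 2))
```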

5.
In this paper, we focus on the problem of discovering internally connected communities in event-based social networks (EBSNs) and propose a community detection method by utilizing social influences between users. Different from traditional social networks, EBSNs contain different types of entities and links, and users in EBSNs have more complex behaviours. This leads to poor performance of the traditional social influence computation method in EBSNs. Therefore, to quantify the pairwise social influence accurately in EBSNs, we first propose to compute two types of social influences, i.e., structure-based social influence and behaviour-based social influence, by utilizing the online social network structure and offline social behaviours of users. In particular, based on the specific features of EBSNs, the similarities of user preference on three aspects (i.e., topics, regions and organizers) are utilized to measure the behaviour-based social influence. Then, we obtain the unified pairwise social influence by combining these two types of social influences through a weight function. Next, we present a social-influence-based community detection algorithm, referred to as SICD. In SICD, inspired by the nonlinear feature learning ability of the autoencoder, we first devise a neighborhood-based deep autoencoder algorithm to obtain nonlinear community-oriented latent representations of users, and then utilize the k-means algorithm for community detection. Experimental results on a real-world dataset show the effectiveness of our proposed algorithm.
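A simplified sketch of the influence-combination step only (the SICD algorithm itself, with the autoencoder and k-means, is not reproduced here): structure-based influence is approximated by neighbourhood overlap in the online follow graph, behaviour-based influence by cosine similarity of offline preference vectors over topics, regions and organizers, and the two are blended with a weight. The function names, the weight and the toy data are assumptions for illustration.

```python
import math

def structural_influence(neighbors, u, v):
    """Jaccard overlap of online neighbourhoods as a crude structural influence."""
    nu, nv = neighbors[u], neighbors[v]
    return len(nu & nv) / len(nu | nv) if nu | nv else 0.0

def behavioural_influence(prefs, u, v):
    """Cosine similarity of preference vectors over (topics, regions, organizers)."""
    a, b = prefs[u], prefs[v]
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pairwise_influence(neighbors, prefs, u, v, lam=0.5):
    """Unified influence as a weighted combination of the two components."""
    return lam * structural_influence(neighbors, u, v) + \
        (1 - lam) * behavioural_influence(prefs, u, v)

if __name__ == "__main__":
    neighbors = {"u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2"}}
    prefs = {"u1": [3, 1, 0, 2], "u2": [2, 1, 0, 3], "u3": [0, 0, 5, 0]}
    print(round(pairwise_influence(neighbors, prefs, "u1", "u2"), 3))
```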

6.
张理  魏奇锋  顾新 《现代情报》2009,40(2):122-131
[Purpose/Significance] For academic communities in scientific collaboration networks, this paper proposes a seed-selection method for knowledge diffusion based on the absorptive capacity of collaborators, in order to improve the efficiency of knowledge diffusion within communities and promote knowledge absorption by community members. [Method/Process] Simulation experiments were carried out with the R software. Based on real scientific collaboration network data from the Stanford Large Network Dataset Collection (SNAP), academic communities were detected with the "WalkTrap" community detection algorithm. In each academic community, the node whose collaborators have the largest total absorptive capacity was taken as that community's knowledge diffusion seed; on this basis, knowledge diffusion simulations were run within each community, and four other methods based on network centrality were compared with the proposed method. [Result/Conclusion] The seed-selection method based on collaborators' absorptive capacity outperforms the other four methods in terms of the overall knowledge level of the network, the uniformity of the knowledge-level distribution, and the speed of knowledge growth in the early stage of diffusion, and this advantage becomes more pronounced as the differences in node absorptive capacity grow. The stronger the average absorptive capacity of nodes, or the larger the average node degree of the network, the less the knowledge diffusion efficiency is affected by the choice of seed-selection method.
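A small sketch of the seed-selection rule described above, under stated assumptions: within each detected community, the seed is the member whose collaborators (neighbours inside the community) have the largest total absorptive capacity. Community detection and the diffusion simulation themselves are omitted; the data structures are illustrative.

```python
def pick_seed(community, neighbors, capacity):
    """community: set of nodes; neighbors: node -> set of collaborators;
    capacity: node -> absorptive capacity. Return the member whose in-community
    collaborators have the largest total absorptive capacity."""
    def collaborators_capacity(v):
        return sum(capacity[u] for u in neighbors[v] & community)
    return max(community, key=collaborators_capacity)

if __name__ == "__main__":
    community = {"a", "b", "c", "d"}
    neighbors = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
    capacity = {"a": 0.9, "b": 0.2, "c": 0.4, "d": 0.8}
    print("seed:", pick_seed(community, neighbors, capacity))
```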

7.
Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods, to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization – they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence–concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization – users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that using incremental summarization can help in better understanding news articles.
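A bare-bones sketch of ranking sentences on a bipartite sentence–concept graph with iterative updates, assuming a simple normalized-propagation scheme. The paper studies several graph models and their convergence; this is only one plausible instantiation, and the matrix `W` is made up for the example.

```python
import numpy as np

def rank_sentences(W, iters=50, tol=1e-9):
    """W[i, j] > 0 if sentence i mentions concept j (e.g., a Wikipedia concept).
    Alternately propagate scores sentence -> concept -> sentence until stable."""
    n_sent, _ = W.shape
    s = np.ones(n_sent) / n_sent
    for _ in range(iters):
        c = W.T @ s
        c /= c.sum()
        s_new = W @ c
        s_new /= s_new.sum()
        if np.abs(s_new - s).sum() < tol:
            break
        s = s_new
    return s

if __name__ == "__main__":
    # 4 sentences x 3 concepts; sentences 0 and 2 each mention two concepts
    # and therefore end up at the top of the ranking.
    W = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    scores = rank_sentences(W)
    print("sentence ranking:", np.argsort(-scores))
```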

8.
Hiring appropriate editors, chairs and committee members for academic journals and conferences is challenging. It requires a targeted search for high-profile scholars who are active in the field as well as in the publication venue. Many author-level metrics have been employed for this task, such as the h-index, PageRank and their variants. However, these metrics are global measures which evaluate authors’ productivity and impact without differentiating the publication venues. From the perspective of a venue, it is also important to have a localised metric which can specifically indicate the significance of academic authors for the particular venue. In this paper, we propose a relevance-based author ranking algorithm to measure the significance of authors to individual venues. Specifically, we develop a co-authorship network considering the author–venue relationship, which integrates the statistical relevance of authors to individual venues. RelRank, an improved PageRank algorithm that embeds author relevance, is then proposed to rank authors for each venue. Extensive experiments are carried out to analyse the proposed RelRank in comparison with classic author-level metrics on three datasets of different research domains. We also evaluate the effectiveness of RelRank and the comparison metrics in recommending editorial boards of three venues using test data. Results demonstrate that RelRank is able to identify not only the high-profile scholars but also those who are particularly significant for individual venues.
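The spirit of RelRank can be illustrated with a personalized-PageRank sketch in which the teleport distribution is the normalized relevance of each author to the target venue, so authors who are both well connected and venue-relevant rise to the top. This is not the paper's exact update rule; the graph, relevance values and damping factor are illustrative.

```python
import numpy as np

def relrank(adj, relevance, d=0.85, iters=100):
    """adj[i, j] = 1 if authors i and j co-authored (co-authorship graph);
    relevance: author-to-venue relevance scores used as the teleport vector."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0           # avoid division by zero for isolated authors
    M = adj / col_sums                       # column-normalized transition matrix
    p = relevance / relevance.sum()          # venue-specific teleport distribution
    r = np.ones(n) / n
    for _ in range(iters):
        r = (1 - d) * p + d * (M @ r)
    return r

if __name__ == "__main__":
    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 1],
                    [1, 1, 0, 0],
                    [0, 1, 0, 0]], dtype=float)
    relevance = np.array([0.1, 0.1, 0.7, 0.1])  # author 2 publishes mostly at this venue
    print(np.round(relrank(adj, relevance), 3))
```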

9.
In this paper, we focus on the problem of automatically generating an amplified abstract of a scientific paper that represents the paper's most influential aspects. These influential aspects can be illustrated by the target paper's abstract and by the citation sentences discussing the target paper, which appear in the papers citing it. We extract representative sentences through a data-weighted reconstruction (DWR) approach that jointly leverages the content and structure of the target paper's abstract and of the citation sentences. Our study makes two contributions. First, sentence weights are learned by exploiting regularization for ranking on a heterogeneous bibliographic network; in particular, sentence-to-sentence similarity relationships are identified by a language-modeling-based approach and added to the bibliographic network. Second, a data-weighted reconstruction objective function is optimized to select the most representative sentences, i.e., those that reconstruct the original sentence set with minimum error. In this process, the sentence weights play a critical role. Experimental evaluation on a real dataset confirms the effectiveness of our approach.

10.
In view of the "teaming-up" phenomenon among inventors in innovation activities, the GN algorithm is applied to identify the community structure of the inventor collaborative innovation network in the renewable energy industry. According to the different network positions of community members and community brokers, dynamic community configurations are classified by tracking communities across adjacent periods, and the relationship between community configuration and community innovation capability is examined empirically. The results show that the inventor collaborative innovation network exhibits a clear community structure, and different types of dynamic community configurations have significantly different effects on the innovation of the inventors in a community. Overall, configurations that balance dynamism and stability outperform configurations in which both elements are dynamic or both are static. Specifically, "turbulent" communities have the weakest innovation capability, "bond" communities the strongest, and "independent" and "rigid" communities lie in between. To promote inventor innovation, policy should therefore favor a compromise between the dynamism and the stability of innovation networks.

11.
Identifying and extracting user communities is an important step towards understanding social network dynamics from a macro perspective. For this reason, the work in this paper explores various aspects related to the identification of user communities. To date, user community detection methods employ either explicit links between users (link analysis), or users’ topics of interest in posted content (content analysis), or both in tandem. Little work has considered temporal evolution when identifying user communities so as to group together those users who share not only similar topical interests but also similar temporal behavior towards their topics of interest. In this paper, we identify user communities through multimodal feature learning (embeddings). Our core contributions can be enumerated as (a) we propose a new method for learning neural embeddings for users based on their temporal content similarity; (b) we learn user embeddings based on their social network connections (links) through neural graph embeddings; (c) we systematically interpolate temporal content-based embeddings and social link-based embeddings to capture both social network connections and temporal content evolution for representing users, and (d) we systematically evaluate the quality of each embedding type in isolation and also when interpolated together and demonstrate their performance on a Twitter dataset under two different application scenarios, namely news recommendation and user prediction. We find that (1) content-based methods produce higher quality communities compared to link-based methods; (2) methods that consider temporal evolution of content, our proposed method in particular, show better performance compared to their non-temporal counterparts; (3) communities that are produced when time is explicitly incorporated in user vector representations have higher quality than the ones produced when time is incorporated into a generative process, and finally (4) while link-based methods are weaker than content-based methods, their interpolation with content-based methods leads to improved quality of the identified communities.
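The interpolation step described in (c) can be sketched in a few lines, assuming both embedding types are already available: normalize each vector, combine them with a mixing weight, and cluster the combined representations to obtain communities. The random embeddings, the weight alpha and the use of k-means here are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def interpolate(content_emb, link_emb, alpha=0.5):
    """Row-normalize the two embedding matrices and mix them with weight alpha."""
    def normalize(X):
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        return X / np.clip(norms, 1e-12, None)
    return alpha * normalize(content_emb) + (1 - alpha) * normalize(link_emb)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_users, d = 20, 8
    content_emb = rng.normal(size=(n_users, d))   # e.g., temporal content embeddings
    link_emb = rng.normal(size=(n_users, d))      # e.g., graph (link) embeddings
    users = interpolate(content_emb, link_emb, alpha=0.6)
    communities = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(users)
    print("community assignment:", communities)
```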

12.
[Purpose/Significance] Working from the two perspectives of individual user nodes and the global network, this study recommends scholars in virtual academic communities on the basis of user similarity and trust, so as to improve the quality of scholar recommendation. [Method/Process] First, the LDA topic model is used to mine the topics of the blog posts published by scholars and to compute blog-post similarity; friend similarity is computed from the proportion of common friends between scholars; blog-post similarity and friend similarity are then fused into a user similarity measure; finally, user similarity and trust are combined to recommend scholars. [Result/Conclusion] A scholar recommendation method for virtual academic communities based on user similarity and trust is proposed, which exploits both user-node information and global network information to recommend scholars to users of virtual academic communities. [Innovation/Limitation] Fusing scholar information from both the user-node and the global-network perspectives effectively improves the quality of scholar recommendation in virtual academic communities. A limitation is that this paper mainly considers scholars' trust at the global network level; the interactive trust relationships between user nodes remain to be studied further.
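A minimal sketch of the fusion logic described above, with made-up weights and inputs: blog-post (topic) similarity and common-friend similarity are combined into a user similarity score, which is then combined with a trust score to rank candidate scholars. The LDA step is assumed to have already produced topic distributions; all names and weights are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def friend_similarity(friends, u, v):
    """Proportion of common friends between two scholars."""
    fu, fv = friends[u], friends[v]
    return len(fu & fv) / len(fu | fv) if fu | fv else 0.0

def recommend(target, candidates, topics, friends, trust, w_topic=0.5, w_trust=0.4):
    """Rank candidate scholars for `target` by fused similarity and trust."""
    def score(v):
        sim = w_topic * cosine(topics[target], topics[v]) + \
            (1 - w_topic) * friend_similarity(friends, target, v)
        return (1 - w_trust) * sim + w_trust * trust[target].get(v, 0.0)
    return sorted(candidates, key=score, reverse=True)

if __name__ == "__main__":
    topics = {"u": [0.6, 0.3, 0.1], "a": [0.5, 0.4, 0.1], "b": [0.1, 0.1, 0.8]}
    friends = {"u": {"x", "y"}, "a": {"x", "z"}, "b": {"y"}}
    trust = {"u": {"a": 0.7, "b": 0.9}}
    print(recommend("u", ["a", "b"], topics, friends, trust))
```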

13.
A fast and efficient page ranking mechanism for web crawling and retrieval remains a challenging issue. Recently, several link-based ranking algorithms like PageRank, HITS and OPIC have been proposed. In this paper, we propose a novel recursive method based on reinforcement learning, called “DistanceRank”, which treats the distance between pages as punishment in order to compute the ranks of web pages. The distance is defined as the number of “average clicks” between two pages. The objective is to minimize punishment, or distance, so that a page with a smaller distance obtains a higher rank. Experimental results indicate that DistanceRank outperforms other ranking algorithms in page ranking and crawl scheduling. Furthermore, the complexity of DistanceRank is low. We used the web of the University of California at Berkeley for our experiments.

14.
In this paper, we propose a novel method for addressing the multi-equilibria consensus problem for a network of n agents with dynamics evolving in discrete-time. In this method, we introduce, for the first time in the literature, two concepts called primary and secondary layer subgraphs. Then, we present our main results on directed graphs such that multiple consensus equilibria states are achieved, thereby extending the existing single-state consensus convergence results in the literature. Furthermore, we propose an algorithm to determine the number of equilibria for any given directed graph automatically by a computer program. We also analyze the convergence properties of multi-equilibria consensus in directed networks with time-delays under the assumption that all delays are bounded. We show that introducing communication time-delays does not affect the number of equilibria of the given network. Finally, we verify our theoretical results via numerical examples.
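The basic phenomenon, several consensus equilibria coexisting in one directed network, can be reproduced with a short discrete-time simulation: agents repeatedly average their own state with those of their in-neighbours, and a graph with two separate closed groups settles into two different consensus values, while followers converge to mixtures of them. The weight matrix and initial states are illustrative, not the paper's construction.

```python
import numpy as np

def simulate_consensus(W, x0, steps=200):
    """Discrete-time consensus update: x(k+1) = W x(k), with W row-stochastic."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = W @ x
    return x

if __name__ == "__main__":
    # Agents 0-1 and 2-3 form two closed groups; agents 4-5 only listen to others.
    W = np.array([
        [0.5,  0.5,  0.0,  0.0,  0.0, 0.0],
        [0.5,  0.5,  0.0,  0.0,  0.0, 0.0],
        [0.0,  0.0,  0.5,  0.5,  0.0, 0.0],
        [0.0,  0.0,  0.5,  0.5,  0.0, 0.0],
        [0.25, 0.25, 0.25, 0.25, 0.0, 0.0],
        [0.0,  0.0,  0.0,  0.0,  0.5, 0.5],
    ])
    x0 = [1.0, 3.0, 10.0, 14.0, 0.0, 0.0]
    # Two distinct equilibrium values (2 and 12) appear, plus mixtures for followers.
    print(np.round(simulate_consensus(W, x0), 3))
```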

15.
Node clustering on heterogeneous information networks (HINs) plays an important role in many real-world applications. While previous research mainly clusters same-type nodes independently by exploiting structural similarity search, it ignores the correlations of different-type nodes. In this paper, we focus on the problem of co-clustering heterogeneous nodes, where the goal is to mine the latent relevance of heterogeneous nodes and simultaneously partition them into the corresponding type-aware clusters. This problem is challenging in two aspects. First, the similarity or relevance of nodes is not only associated with multiple meta-path-based structures but also related to numerical and categorical attributes. Second, clusters and similarity/relevance searches usually promote each other. To address this problem, we first design a learnable overall relevance measure that integrates the structural and attributed relevance by employing meta-paths and attribute projection. We then propose a novel approach, called SCCAIN, to co-cluster heterogeneous nodes based on constrained orthogonal non-negative matrix tri-factorization. Furthermore, an end-to-end framework is developed to jointly optimize the relevance measures and co-clustering. Extensive experiments on real-world datasets not only demonstrate that SCCAIN consistently outperforms state-of-the-art methods but also validate the effectiveness of integrating attributed and structural information for co-clustering.

16.
Recently, social networks have attracted more and more attention. Accurate community detection in social networks can support better product design, accurate information recommendation and public services. Thus, a community detection (CD) algorithm based on network topology and user interests is proposed in this paper. The work consists of two main parts. In the first part, a focused crawler algorithm is used to acquire personal tags from the tags posted by other users. Tags are then selected from the tag set based on the TFIDF weighting scheme, the semantic extension of tags and the user semantic model. In addition, the tag vector of user interests is derived, with the respective tag weights calculated by an improved PageRank algorithm. In the second part, for detecting communities, an initial social network, which consists of directed, unweighted edges and vertices with interest vectors, is constructed by considering the following/follower relationship. This initial social network is then converted into a new social network with undirected, weighted edges. The edge weights are calculated from the directions and the interest vectors in the initial social network, and the similarity between edges is calculated from the edge weights. Communities are detected by a hierarchical clustering algorithm based on the edge-weighted similarity, and the number of detected communities is determined by the partition density. An extensive experimental study shows that the performance of the proposed user interest detection (PUID) algorithm is better than that of the CF algorithm and the TFIDF algorithm with respect to F-measure, Precision and Recall. Moreover, the Precision of the proposed community detection (PCD) algorithm is improved, on average, by up to 8.21% compared with that of the Newman algorithm and up to 41.17% compared with that of the CPM algorithm.

17.
In this paper we present the relevance ranking algorithm named PolarityRank. This algorithm is inspired by PageRank, the web-page relevance computation method used by Google, and generalizes it to deal with graphs having not only positive but also negative weighted arcs. Besides the definition of our algorithm, this paper includes an algebraic justification, a proof of convergence, and an empirical study in which PolarityRank is applied to two unrelated tasks where a graph with positive and negative weights can be built: the calculation of word semantic orientation and instance selection from a learning dataset.
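A compact sketch of a PageRank-style iteration over a signed graph in the spirit of PolarityRank: each node carries a positive and a negative score, positive arcs propagate like-signed mass, negative arcs propagate mass into the opposite-signed score, and seed nodes of known polarity act as teleport targets. The update below is a simplified reading of that idea, not the exact published equations, and the toy word graph is made up.

```python
def polarity_rank(pos_edges, neg_edges, nodes, e_pos, e_neg, d=0.85, iters=100):
    """pos_edges / neg_edges: dicts (u, v) -> weight of positive / negative arcs.
    e_pos / e_neg: teleport mass on seed nodes of known positive / negative polarity.
    Returns per-node (positive_score, negative_score)."""
    def out_weight(u):
        total = sum(w for (a, _), w in pos_edges.items() if a == u)
        total += sum(w for (a, _), w in neg_edges.items() if a == u)
        return total or 1.0

    pr_pos = {v: 1.0 / len(nodes) for v in nodes}
    pr_neg = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new_pos = {v: (1 - d) * e_pos.get(v, 0.0) for v in nodes}
        new_neg = {v: (1 - d) * e_neg.get(v, 0.0) for v in nodes}
        for (u, v), w in pos_edges.items():      # positive arcs keep the sign of the mass
            share = d * w / out_weight(u)
            new_pos[v] += share * pr_pos[u]
            new_neg[v] += share * pr_neg[u]
        for (u, v), w in neg_edges.items():      # negative arcs flip the sign of the mass
            share = d * w / out_weight(u)
            new_pos[v] += share * pr_neg[u]
            new_neg[v] += share * pr_pos[u]
        pr_pos, pr_neg = new_pos, new_neg
    return {v: (pr_pos[v], pr_neg[v]) for v in nodes}

if __name__ == "__main__":
    nodes = ["good", "nice", "awful"]
    pos = {("good", "nice"): 1.0, ("nice", "good"): 1.0}
    neg = {("good", "awful"): 1.0, ("awful", "good"): 1.0}
    scores = polarity_rank(pos, neg, nodes, e_pos={"good": 1.0}, e_neg={"awful": 1.0})
    for word, (p, n) in scores.items():
        print(f"{word}: semantic orientation = {p - n:+.3f}")
```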

18.
钟磊  宋香荣  孙瑞娜 《情报杂志》2021,40(4):194-199
[Purpose/Significance] With the development of the Internet and social media, online "opinion leaders" play an increasingly important role in the dissemination and exchange of information in online communities and exert a great influence on online public opinion in all aspects of social life. Identifying online "opinion leaders" and understanding their characteristics and patterns has therefore become an important aspect of research on online information dissemination. [Method/Process] Building on the idea of PageRank, the TF-IDF of users' texts is used to compute the connection strength between user nodes in an online community; on this basis the PageRank algorithm is improved, and a LeaderRank method is proposed to evaluate the importance of user nodes in the online community. Combined with other indicators and a BP neural network, experiments on "opinion leader" identification and further data mining are carried out. [Result/Conclusion] The experimental results show that the proposed method achieves a higher identification rate than the neural network alone, can be used flexibly in combination with other indicators and methods, and offers better applicability, extensibility and stability.
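A short sketch of LeaderRank-style scoring with text-derived edge weights, under assumptions: the edge weights stand in for the TF-IDF-based connection strength described above (they are given directly here rather than computed from text), and the classic LeaderRank ground node, linked bidirectionally to every user, replaces PageRank's damping factor. The BP-neural-network step is not shown.

```python
def leaderrank(edges, nodes, iters=200):
    """edges: dict (u, v) -> weight (e.g., TF-IDF-based connection strength).
    Adds a ground node linked to and from every user, then iterates
    s(v) <- sum over in-links of w(u, v) / out_weight(u) * s(u)."""
    g = "_ground_"
    all_nodes = list(nodes) + [g]
    w = dict(edges)
    for v in nodes:                        # ground node: bidirectional unit links
        w[(g, v)] = 1.0
        w[(v, g)] = 1.0
    out_w = {u: 0.0 for u in all_nodes}
    for (u, _), wt in w.items():
        out_w[u] += wt
    score = {v: 1.0 for v in nodes}
    score[g] = 0.0
    for _ in range(iters):
        new = {v: 0.0 for v in all_nodes}
        for (u, v), wt in w.items():
            new[v] += wt / out_w[u] * score[u]
        score = new
    # Redistribute the ground node's score evenly, as in standard LeaderRank.
    bonus = score.pop(g) / len(nodes)
    return {v: s + bonus for v, s in score.items()}

if __name__ == "__main__":
    users = ["u1", "u2", "u3", "u4"]
    edges = {("u2", "u1"): 0.9, ("u3", "u1"): 0.8, ("u4", "u1"): 0.7, ("u1", "u2"): 0.3}
    ranking = sorted(leaderrank(edges, users).items(), key=lambda kv: -kv[1])
    print(ranking)   # u1 should come out as the opinion-leader candidate
```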

19.
In this paper, a new robust relevance model is proposed that can be applied to both pseudo and true relevance feedback in the language-modeling framework for document retrieval. There are at least three main differences between our new relevance model and other relevance models. First, the proposed model brings the original query back into the relevance model by treating it as a short, special document, in addition to a number of top-ranked documents returned from the first-round retrieval for pseudo feedback, or a number of relevant documents for true relevance feedback. Second, instead of using a uniform prior as in the original relevance model proposed by Lavrenko and Croft, documents are assigned different priors according to their lengths (in terms) and ranks in the first-round retrieval. Third, the probability of a term in the relevance model is further adjusted by its probability in a background language model. In both the pseudo and true relevance cases, we have compared the performance of our model to that of two baselines: the original relevance model and a linear combination model. Our experimental results show that the proposed new model outperforms both baselines in terms of mean average precision.
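A simplified sketch of the kind of estimate described above, not the exact formulas of the paper: the feedback set is the top-ranked documents plus the query treated as a short pseudo-document, each document receives a prior that decays with its rank and grows with its length, and the expanded term distribution is discounted by a background model. All weighting choices below are illustrative assumptions.

```python
import math
from collections import Counter

def relevance_model(query, top_docs, collection, bg_weight=0.3):
    """query: list of terms; top_docs: list of token lists in rank order;
    collection: list of token lists used for the background model."""
    # Treat the query itself as an extra short, highly trusted feedback document.
    feedback = [(query, 1.0)] + [
        (doc, math.log(1 + len(doc)) / (rank + 2))   # longer, higher-ranked docs weigh more
        for rank, doc in enumerate(top_docs)
    ]
    bg = Counter(t for doc in collection for t in doc)
    bg_total = sum(bg.values())

    rel = Counter()
    for doc, prior in feedback:
        tf = Counter(doc)
        for term, cnt in tf.items():
            rel[term] += prior * cnt / len(doc)
    # Discount terms that are merely frequent in the whole collection.
    total = sum(rel.values())
    scores = {t: (1 - bg_weight) * (w / total) - bg_weight * (bg[t] / bg_total)
              for t, w in rel.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    collection = [["cheap", "flight", "ticket"], ["flight", "delay", "news"],
                  ["hotel", "booking", "cheap"], ["the", "the", "news"]]
    query = ["cheap", "flight"]
    top_docs = [["cheap", "flight", "ticket"], ["flight", "delay", "news"]]
    for term, score in relevance_model(query, top_docs, collection)[:5]:
        print(f"{term:8s} {score:+.3f}")
```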

20.
Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions are able to precisely predict a score for each of the documents, in turn exploited to effectively rank them. Although the scoring efficiency of LtR models is critical in several applications – e.g., it directly impacts the response time and throughput of Web query processing – it has received relatively little attention so far. The goal of this work is to experimentally investigate the scoring efficiency of LtR models along with their ranking quality. Specifically, we show that machine-learned ranking models exhibit a quality versus efficiency trade-off. For example, each family of LtR algorithms has tuning parameters that can influence both effectiveness and efficiency, where higher ranking quality is generally obtained with more complex and expensive models. Moreover, LtR algorithms that learn complex models, such as those based on forests of regression trees, are generally more expensive and more effective than other algorithms that induce simpler models like linear combinations of features. We extensively analyze the quality versus efficiency trade-off of a wide spectrum of state-of-the-art LtR algorithms, and we propose a sound methodology to devise the most effective ranker given a time budget. To guarantee reproducibility, we used publicly available datasets and we contribute an open source C++ framework providing optimized, multi-threaded implementations of the most effective tree-based learners: Gradient Boosted Regression Trees (GBRT), Lambda-Mart (λ-MART), and the first public-domain implementation of Oblivious Lambda-Mart (Ωλ-MART), an algorithm that induces forests of oblivious regression trees. We investigate how the different training parameters impact the quality versus efficiency trade-off, and provide a thorough comparison of several algorithms in the quality-cost space. The experiments conducted show that there is no overall best algorithm; the optimal choice depends on the time budget.
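The quality-versus-efficiency trade-off can be probed with a small, self-contained experiment: train a gradient-boosted regression ensemble as a pointwise surrogate ranker on synthetic data, then measure NDCG@10 and scoring time for increasingly large prefixes of the ensemble. This sketch uses scikit-learn rather than the C++ framework contributed by the paper; the synthetic data, cut-off points and hyperparameters are arbitrary, and evaluation is done on the training queries only to expose the efficiency side of the trade-off.

```python
import time
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(42)
n_queries, docs_per_q, n_feat = 60, 20, 15

# Synthetic pointwise learning-to-rank data: graded relevance (0-3) is a noisy
# function of the document features.
X = rng.normal(size=(n_queries * docs_per_q, n_feat))
w_true = rng.normal(size=n_feat)
signal = X @ w_true + 0.5 * rng.normal(size=len(X))
y = np.digitize(signal, np.quantile(signal, [0.5, 0.8, 0.95])).astype(float)

model = GradientBoostingRegressor(n_estimators=500, max_depth=3, learning_rate=0.05)
model.fit(X, y)

for n_trees in (50, 100, 250, 500):
    t0 = time.perf_counter()
    preds = None
    for i, stage_pred in enumerate(model.staged_predict(X), start=1):
        if i == n_trees:                     # score with only the first n_trees trees
            preds = stage_pred
            break
    elapsed = time.perf_counter() - t0
    ndcgs = [ndcg_score(y[q * docs_per_q:(q + 1) * docs_per_q][None, :],
                        preds[q * docs_per_q:(q + 1) * docs_per_q][None, :], k=10)
             for q in range(n_queries)]
    print(f"{n_trees:4d} trees  NDCG@10 = {np.mean(ndcgs):.3f}  scoring = {elapsed * 1e3:.1f} ms")
```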
