1.

Unsupervised graph-based rank aggregation for improved retrieval

《Information processing & management》2019,56(4):1260-1279

This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as a document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets, composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus demonstrating the successful capability of the proposal in representing queries based on a unified graph-based model of rank fusions.
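
As a point of reference for what unsupervised rank aggregation looks like in practice, the following is a minimal sketch of Borda count, a classic fusion baseline with no hyperparameters. It is not the fusion-graph method of the paper, whose construction the abstract does not spell out; the function name `borda_fuse` and the toy document ids are illustrative assumptions.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse several ranked lists of document ids with Borda count.

    Illustrative baseline only (assumption): this is NOT the fusion-graph
    method described in the abstract.
    """
    # Every document returned by at least one ranker.
    universe = {doc for ranking in rankings for doc in ranking}
    n = len(universe)
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            # The top document earns n points, the next n - 1, and so on.
            scores[doc] += n - position
        # Documents a ranker did not return earn nothing from that ranker.
    # Highest total first; ties are broken by document id for determinism.
    return sorted(universe, key=lambda d: (-scores[d], d))


fused = borda_fuse([["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]])
print(fused)
```

Because the input is only the ranked lists themselves, the fusion is independent of how each ranker computed its scores, which is the property the abstract emphasizes.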

2.

Semi-supervised document retrieval (total citations: 2; self-citations: 0; citations by others: 2)

Ming Li Hang Li Zhi-Hua Zhou 《Information processing & management》2009

This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a neural network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage unlabeled data in learning.
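
The abstract names BM25 as the traditional IR model used during data labeling. The sketch below shows one common form of the BM25 scoring function on pre-tokenized documents. The idf variant (with 1 added inside the logarithm so it stays non-negative), the defaults `k1=1.2` and `b=0.75`, and the toy corpus are assumptions; the abstract does not say which variant or parameters SSRank uses.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Assumptions: non-negative idf variant, typical k1 and b defaults.
    `docs` must be a non-empty list of token lists.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    ["graph", "rank", "fusion"],
    ["rank", "rank", "model"],
    ["control", "system"],
]
print(bm25_scores(["rank"], corpus))
```

In a semi-supervised setup of the kind described, scores like these could serve as one signal when deciding how to label unlabeled documents; how SSRank combines them with the learned model is not stated in the abstract.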

3.

Flexible sample selection strategies for transfer learning in ranking

Kevin Duh Akinori Fujino 《Information processing & management》2012

Ranking is a central component in information retrieval systems; as such, many machine learning methods for building rankers have been developed in recent years. An open problem is transfer learning, i.e. how labeled training data from one domain/market can be used to build rankers for another. We propose a flexible transfer learning strategy based on sample selection. Source domain training samples are selected if the functional relationship between features and labels does not deviate much from that of the target domain. This is achieved through a novel application of recent advances from density ratio estimation. The approach is flexible, scalable, and modular. It allows many existing supervised rankers to be adapted to the transfer learning setting. Results on two datasets (Yahoo’s Learning to Rank Challenge and Microsoft’s LETOR data) show that the proposed method gives robust improvements.

4.

[2015, Issue 2] Research status and prospects of diversified graph ranking

Xueqi Cheng (程学旗) 《中国科学院院刊》 (Bulletin of the Chinese Academy of Sciences) 2015,30(2)

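Ranking is one of the fundamental tasks in information retrieval, data mining, and social network analysis. The rapid growth of online social networks and social media has accumulated large amounts of graph data, consisting of nodes that represent entities and edges that represent relationships between entities. The connections between nodes in graph data are complex and usually lack an explicit total order, which makes graph ranking particularly important in graph data analysis. Graph ranking algorithms fall into two main classes: those oriented toward node centrality and those oriented toward the diversity of node sets. Unlike traditional graph ranking, diversified graph ranking combines ranking with clustering, and is reflected in how well a node set covers the network as a whole. In recent years, diversified graph ranking has attracted wide attention and made a series of advances, and its results have been applied successfully to search result ranking, automatic document summarization, recommender systems, influence maximization, and many other scenarios. This article reviews the state of research and the main advances in diversified graph ranking, groups existing methods by research approach into three classes (marginal benefit maximization, competitive random walks, and mutual reinforcement of clustering and ranking), and assesses the strengths and weaknesses of each class. Finally, it identifies designing effective evaluation metrics and standard test collections, and resolving the tension between accuracy and speed, as the key directions for future research on diversified graph ranking.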
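
To make the idea of marginal benefit maximization concrete, here is a minimal greedy sketch. It assumes, purely for illustration, that the benefit of a node set is the number of nodes it covers (each selected node plus its neighbors); the survey covers a family of methods and this is only one simple instance, with the function name and toy graph being hypothetical.

```python
def greedy_diverse_nodes(adjacency, k):
    """Pick up to k nodes, each time taking the largest marginal coverage gain.

    `adjacency` maps a node id to an iterable of neighbor ids.
    Assumption: coverage of a node = the node itself plus its neighbors.
    """
    covered = set()
    selected = []
    candidates = set(adjacency)
    for _ in range(min(k, len(candidates))):
        def gain(node):
            # Newly covered nodes if `node` were added to the selection.
            return len(({node} | set(adjacency[node])) - covered)

        # max() keeps the first maximum, so sorting gives deterministic ties.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        covered |= {best} | set(adjacency[best])
        candidates.remove(best)
    return selected


graph = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a"],
    "d": ["a"],
    "e": ["f"],
    "f": ["e"],
}
print(greedy_diverse_nodes(graph, 2))
```

After the hub `a` is chosen, its neighbors add no new coverage, so the second pick comes from the other cluster. That is the contrast with a pure centrality ranking, which scores nodes independently of what has already been selected.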

5.

A3CRank: An adaptive ranking method based on connectivity, content and click-through data

Ali Mohammad Zareh Bidoki Pedram Ghodsnia Nasser Yazdani Farhad Oroumchian 《Information processing & management》2010

Due to the proliferation and abundance of information on the web, ranking algorithms play an important role in web search. Currently, there are some ranking algorithms based on content and connectivity such as BM25 and PageRank. Unfortunately, these algorithms have low precision and are not always satisfying for users. In this paper, we propose an adaptive method, called A3CRank, based on the content, connectivity, and click-through data triple. Our method tries to aggregate ranking algorithms such as BM25, PageRank, and TF-IDF. We have used reinforcement learning to incorporate user behavior and find a measure of user satisfaction for each ranking algorithm. Furthermore, OWA, an aggregation operator, is used for merging the results of the various ranking algorithms. A3CRank adapts itself to user needs and makes use of user clicks to aggregate the results of ranking algorithms. A3CRank is designed to overcome some of the shortcomings of existing ranking algorithms by combining them and producing an overall better ranking criterion. Experimental results indicate that A3CRank outperforms other combinational ranking algorithms such as Ranking SVM in terms of P@n and NDCG metrics. We have used 130 queries on University of California at Berkeley’s web to train and evaluate our method.
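
The OWA (ordered weighted averaging) operator mentioned above has a compact standard definition: sort the input values in descending order, then take a weighted sum with weights that add up to 1. The sketch below implements that definition; the example scores and weight vectors are placeholders, since the abstract does not give the weights A3CRank derives from click-through data.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to sorted positions, not sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)  # largest value gets weights[0]
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical normalized scores for one page from three rankers.
scores = [0.2, 0.9, 0.5]
print(owa(scores, [1.0, 0.0, 0.0]))        # behaves like max
print(owa(scores, [0.0, 0.0, 1.0]))        # behaves like min
print(owa(scores, [1 / 3, 1 / 3, 1 / 3]))  # behaves like the arithmetic mean
```

The weight vector therefore moves the operator anywhere between an optimistic (max-like) and a pessimistic (min-like) merge, which is what makes it adjustable.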

6.

ListMAP: Listwise learning to rank as maximum a posteriori estimation

《Information processing & management》2022,59(4):102962

Listwise learning to rank models, which optimize the ranking of a document list, are among the most widely adopted algorithms for finding and ranking relevant documents to user information needs. In this paper, we propose ListMAP, a new listwise learning to rank model with prior distribution that encodes the informativeness of training data and assigns different weights to training instances. The main intuition behind ListMAP is that documents in the training dataset do not have the same impact on training a ranking function. ListMAP formalizes the listwise loss function as a maximum a posteriori estimation problem in which the scoring function must be estimated such that the log probability of the predicted ranked list is maximized given a prior distribution on the labeled data. We provide a model for approximating the prior distribution parameters from a set of observation data. We implement the proposed learning to rank model using neural networks. We theoretically discuss and analyze the characteristics of the introduced model and empirically illustrate its performance on a number of benchmark datasets, namely MQ2007 and MQ2008 of the Letor 4.0 benchmark, Set 1 and Set 2 of the Yahoo! learning to rank challenge data set, and the Microsoft 30K and Microsoft 10K datasets. We show that the proposed models are effective across different datasets in terms of the information retrieval evaluation metrics NDCG and MRR at positions 1, 3, 5, 10, and 20.
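
The evaluation metrics named at the end of the abstract can be computed as follows for a single query. This is a sketch under stated assumptions: the gain is taken as `2**rel - 1` with a `log2` position discount (one common NDCG convention; the abstract does not say which is used), and the reciprocal rank is truncated at position `k`.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)
        for i, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank_at_k(relevances, k):
    """1 / rank of the first relevant item within the top k, else 0."""
    for i, rel in enumerate(relevances[:k]):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0


# Relevance labels of one query's results, in predicted rank order.
ranked_labels = [0, 2, 1, 0, 3]
for k in (1, 3, 5):
    print(k, ndcg_at_k(ranked_labels, k), reciprocal_rank_at_k(ranked_labels, k))
```

MRR is then the mean of the reciprocal rank over all queries, and reported NDCG values are likewise averaged per query.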

7.

Towards a unified approach to document similarity search using manifold-ranking of blocks

Xiaojun Wan Jianwu Yang Jianguo Xiao 《Information processing & management》2008

Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. Cosine), and then rank the documents by the similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. a block of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are respectively employed to segment text documents and web pages into blocks. Then, each block is assigned a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve retrieval performance over baseline approaches. The document block is shown to be a better unit than the whole document in the manifold-ranking process.
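
The score propagation step can be sketched with the commonly used manifold-ranking iteration f ← αSf + (1 − α)y, where S is the symmetrically normalized affinity matrix and y marks the query. The value `alpha=0.5`, the iteration count, and the toy chain graph are assumptions; block segmentation (TextTiling, VIPS) and the fusion of block scores into document scores are outside this sketch.

```python
import math


def manifold_rank(weights, query_scores, alpha=0.5, iterations=100):
    """Propagate query scores over a weighted graph.

    `weights` is a symmetric n x n affinity matrix (list of lists) with a
    zero diagonal; `query_scores` is the initial vector y.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # S = D^(-1/2) W D^(-1/2); isolated nodes simply receive no propagation.
    s = [
        [
            weights[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    f = list(query_scores)
    for _ in range(iterations):
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n))
            + (1 - alpha) * query_scores[i]
            for i in range(n)
        ]
    return f


# Chain of four blocks 0 - 1 - 2 - 3; block 0 is the query.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(manifold_rank(chain, [1.0, 0.0, 0.0, 0.0]))
```

Blocks that are not directly similar to the query still receive a score through intermediate blocks, which is how the approach exploits the global structure rather than only pairwise similarity.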

8.

Effective aggregation of various summarization techniques

Parth Mehta Prasenjit Majumder 《Information processing & management》2018,54(2):145-158

A large number of extractive summarization techniques have been developed in the past decade, but very few enquiries have been made as to how these differ from each other or what factors actually affect these systems. Such a comparison, if available, could be used to create a robust ensemble of these approaches, which has the potential to consistently outperform each individual summarization system. In this work we examine the roles of three principal components of an extractive summarization technique: sentence ranking algorithm, sentence similarity metric and text representation scheme. We show that using a combination of several different sentence similarity measures, rather than only one, significantly improves performance of the resultant meta-system. Even simple ensemble techniques, when used in an informed manner, prove to be very effective in improving the overall performance and consistency of summarization systems. A statistically significant improvement of about 5% to 10% in ROUGE-1 recall was achieved by aggregating various sentence similarity measures. In contrast, aggregating several ranking algorithms did not show a significant improvement in ROUGE score, but even in this case the resultant meta-systems were more robust than candidate systems. The results suggest that new extractive summarization techniques should particularly focus on defining a better sentence similarity metric and use multiple sentence similarity scores and ranking algorithms in favour of a particular combination.
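
For readers unfamiliar with the reported metric, ROUGE-1 recall is the fraction of reference-summary unigrams that also appear in the system summary, with counts clipped. The sketch below assumes whitespace tokenization and a single reference; it omits the stemming and stopword options found in full ROUGE tooling, and the example sentences are made up.

```python
from collections import Counter


def rouge1_recall(candidate_tokens, reference_tokens):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    reference = Counter(reference_tokens)
    candidate = Counter(candidate_tokens)
    # A reference word repeated twice is only matched twice if the
    # candidate also contains it at least twice.
    overlap = sum(min(count, candidate[tok]) for tok, count in reference.items())
    total = sum(reference.values())
    return overlap / total if total else 0.0


reference = "the cat sat on the mat".split()
candidate = "the cat lay on a mat".split()
print(rouge1_recall(candidate, reference))
```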

9.

Learning a merge model for multilingual information retrieval

Ming-Feng Tsai Hsin-Hsi Chen Yu-Ting Wang 《Information processing & management》2011

This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we present a number of features that may influence the MLIR merging process. These features are mainly extracted from three levels: query, document, and translation. After the feature extraction, we then use the FRank ranking algorithm to construct a merge model. To the best of our knowledge, this practice is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections for the task of cross-lingual information retrieval (CLIR) in NTCIR3, 4, and 5 are employed to assess the performance of our proposed method. Moreover, several merging methods are also carried out for comparison, including traditional merging methods, the 2-step merging strategy, and the merging method based on logistic regression. The experimental results show that our proposed method can significantly improve merging quality on two different types of datasets. In addition to the effectiveness, through the merge model generated by FRank, our method can further identify key factors that influence the merging process. This information might provide more insight into and understanding of MLIR merging.

10.

An improved model predictive control for uncertain systems with input saturation

Ting Shi Hongye Su Jian Chu 《Journal of The Franklin Institute》2013

In this work, a new design method of model predictive control (MPC) is proposed for uncertain systems with input constraints. By using a new method to deal with actuator constraints, our method can reduce the conservativeness. For the design of the robust MPC controllers, a sequence of feedback control laws is used and a parameter-dependent Lyapunov function is chosen to further reduce the conservativeness. The effectiveness and performance of our MPC design method are demonstrated by an example.

11.

A soft-set decision method for multi-attribute ranking problems based on intuitionistic fuzzy information

Haiyan Zhao (赵海燕) Weimin Ma (马卫民) Bingzhen Sun (孙秉珍) 《科技管理研究》 (Science and Technology Management Research) 2017,(12)

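For multi-attribute alternative ranking problems in which attribute information is fuzzy, this paper proposes a decision method based on intuitionistic fuzzy soft sets. Intuitionistic fuzzy soft sets combine the strengths of soft sets and intuitionistic fuzzy sets: they place no restrictions on the approximate description of objects and can express the fuzzy nature of things more objectively. The paper first defines the comprehensive accuracy, comprehensive hesitancy, and comprehensive score of an intuitionistic fuzzy soft set, together with the choice value and acceptance level of its level sets. On this basis, it proposes a choice-value criterion, an accuracy criterion, a hesitancy criterion, and a score criterion for ranking multi-attribute alternatives. It then proposes a ranking method that uses these criteria in combination. Finally, a numerical example and discussion confirm the feasibility and effectiveness of the method. The method takes full account of the decision maker's intuitive information and subjective preferences, and can effectively support the selection and ranking of multi-attribute alternatives.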

12.

RoSAS: Deep semi-supervised anomaly detection with contamination-resilient continuous supervision

《Information processing & management》2023,60(5):103459

Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (such as binary or ordinal data labels) is exploited, which leads to suboptimal learning of anomaly scores that essentially take on a continuous distribution. Therefore, this paper proposes a novel semi-supervised anomaly detection method, which devises contamination-resilient continuous supervisory signals. Specifically, we propose a mass interpolation method to diffuse the abnormality of labeled anomalies, thereby creating new data samples labeled with continuous abnormal degrees. Meanwhile, the contaminated area can be covered by new data samples generated via combinations of data with correct labels. A feature learning-based objective is added to serve as an optimization constraint to regularize the network and further enhance the robustness w.r.t. anomaly contamination. Extensive experiments on 11 real-world datasets show that our approach significantly outperforms state-of-the-art competitors by 20%–30% in AUC-PR and obtains more robust and superior performance in settings with different anomaly contamination levels and varying numbers of labeled anomalies.

13.

DistanceRank: An intelligent ranking algorithm for web pages

Ali Mohammad Zareh Bidoki Nasser Yazdani 《Information processing & management》2008

A fast and efficient page ranking mechanism for web crawling and retrieval remains a challenging issue. Recently, several link-based ranking algorithms like PageRank, HITS and OPIC have been proposed. In this paper, we propose a novel recursive method based on reinforcement learning, called “DistanceRank”, which treats the distance between pages as punishment when computing the ranks of web pages. The distance is defined as the number of “average clicks” between two pages. The objective is to minimize punishment, or distance, so that a page with a smaller distance has a higher rank. Experimental results indicate that DistanceRank outperforms other ranking algorithms in page ranking and crawling scheduling. Furthermore, the complexity of DistanceRank is low. We have used University of California at Berkeley’s web for our experiments.

14.

Predefined-time tracking of nonlinear strict-feedback systems with time-varying output constraints

《Journal of The Franklin Institute》2022,359(8):3492-3516

In this paper, the problem of predefined-time tracking with time-varying output constraints (TVOC) is investigated for a class of nonlinear strict-feedback systems. First, sufficient conditions for the studied problem are presented. Then, a recursive design algorithm for the controller is proposed using the backstepping technique. A novel stabilizing function is constructed by adding a fractional term, which is capable of decreasing the asymmetric time-varying Barrier Lyapunov Function (BLF) to the origin within any desired settling time. After that, it is shown that under the proposed control, all the closed-loop signals are bounded, and the tracking error converges to zero within any desired settling time and remains zero thereafter without violating the output constraint. The settling time is independent of both the design parameters and the initial conditions, and can be set as desired. Finally, two examples are given to illustrate the effectiveness of the proposed method.

15.

Using the revised EM algorithm to remove noisy data for improving the one-against-the-rest method in binary text classification

Hyoungdong Han Youngjoong Ko Jungyun Seo 《Information processing & management》2007

Automatic text classification is the problem of automatically assigning predefined categories to free text documents, thus reducing the manual labor required by traditional classification methods. When we apply binary classification to multi-class text classification, we usually use the one-against-the-rest method. In this method, if a document belongs to a particular category, the document is regarded as a positive example of that category; otherwise, the document is regarded as a negative example. As a result, each category has a positive data set and a negative data set. However, this one-against-the-rest method has a problem: the documents of a negative data set are not labeled manually, while those of a positive set are labeled by humans. Therefore, the negative data set probably includes a lot of noisy data. In this paper, we propose applying the sliding window technique and the revised EM (Expectation Maximization) algorithm to binary text classification to solve this problem. As a result, we can improve binary text classification by extracting potentially noisy documents from the negative data set using the sliding window technique and removing actually noisy documents using the revised EM algorithm. The results of our experiments showed that our method achieved better performance than the original one-against-the-rest method in all the data sets and all the classifiers used in the experiments.
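
The source of the noise is easier to see once the one-against-the-rest split is written out. The sketch below builds the positive and negative sets per category from single-label data; the function name and toy documents are hypothetical, and the sliding window and revised EM steps proposed in the paper are not reproduced here.

```python
def one_vs_rest_sets(labeled_docs):
    """Build (positives, negatives) per category from (doc, label) pairs."""
    categories = sorted({label for _, label in labeled_docs})
    sets = {}
    for category in categories:
        # Positives were explicitly assigned this category by an annotator.
        positives = [doc for doc, label in labeled_docs if label == category]
        # Negatives are simply everything else: nobody checked that they
        # really do not belong to this category.
        negatives = [doc for doc, label in labeled_docs if label != category]
        sets[category] = (positives, negatives)
    return sets


docs = [("d1", "sports"), ("d2", "politics"), ("d3", "sports"), ("d4", "tech")]
for category, (pos, neg) in one_vs_rest_sets(docs).items():
    print(category, pos, neg)
```

In this toy data, `d2` lands in the negative set for `sports` only because it was labeled `politics`; if it also discusses sports, it is exactly the kind of noisy negative the paper aims to detect and remove.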

16.

Forecasting hourly attraction tourist volume with search engine and social media data for decision support

《Information processing & management》2023,60(4):103399

Developing a tourism forecasting function in decision support systems has become critical for businesses and governments. The existing forecasting models considering spatial relations contain insufficient information, and the spatial aggregation of simple tourist volume series limits the forecasting accuracy. Using human-generated search engine and social media data has the potential to address this issue. In this paper, a spatial aggregation-based multimodal deep learning method for hourly attraction tourist volume forecasting is developed. The model first extracts the daily features of attractions from search engine data, then mines the spatial aggregation relationships in social media data and multi-attraction tourist volume data. Finally, the model fuses hourly features with daily features to produce forecasts. The model is tested using a dataset containing several attractions with real-time tourist volume at 15-minute intervals from November 27, 2018, to March 18, 2019, in Beijing. The empirical and Diebold-Mariano test results demonstrate that the proposed framework can outperform state-of-the-art baseline models with statistically significant improvements at the 1% level. Compared with the best baseline model, the MAPE values are reduced by 50.0% and 27.3% for 4A attractions and 5A attractions, respectively, and the RMSE values are reduced by 48.3% and 26.1%, respectively. The method in this paper can be used as a function embedded in the decision support system to help multi-department collaboration.
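
The two error measures quoted above, and the way a percentage reduction between two models is typically derived, can be sketched as follows. The numbers in the example are made up for illustration and are not from the paper; reading "reduced by 50.0%" as a relative reduction is also an assumption.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def relative_reduction(baseline_error, proposed_error):
    """Percentage by which the proposed model lowers the baseline's error."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical hourly visitor counts and two sets of forecasts.
actual = [100, 200, 400]
baseline = [120, 160, 480]
proposed = [110, 180, 440]
print(mape(actual, baseline), mape(actual, proposed))
print(relative_reduction(mape(actual, baseline), mape(actual, proposed)))
```

MAPE is scale-free, which makes it comparable across attractions with very different visitor volumes, while RMSE weights large absolute misses more heavily.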

17.

Rank-based self-training for graph convolutional networks

Daniel Carlos Guimarães Pedronette Longin Jan Latecki 《Information processing & management》2021,58(2):102443

Graph Convolutional Networks (GCNs) have been established as a fundamental approach for representation learning on graphs, based on convolution operations on non-Euclidean domains, defined by graph-structured data. GCNs and variants have achieved state-of-the-art results on classification tasks, especially in semi-supervised learning scenarios. A central challenge in semi-supervised classification is how to exploit as much as possible of the useful information encoded in the unlabeled data. In this paper, we address this issue through a novel self-training approach for improving the accuracy of GCNs on semi-supervised classification tasks. A margin score is used through a rank-based model to identify the most confident sample predictions. Such predictions are exploited as an expanded labeled set in a second-stage training step. Our model is suitable for different GCN models. Moreover, we also propose a rank aggregation of labeled sets obtained by different GCN models. The experimental evaluation considers four GCN variations and traditional benchmarks extensively used in the literature. Significant accuracy gains were achieved for all evaluated models, reaching results comparable or superior to the state-of-the-art. The best results were achieved for rank aggregation self-training on combinations of the four GCN models.
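
A margin-based selection step of the kind described can be sketched as follows. It assumes the margin is the gap between the two largest predicted class probabilities, and that the top `k` nodes by margin are promoted to pseudo-labels for the second training stage; the abstract does not give the exact scoring or cut-off rule, so both are assumptions, as are the function names and toy predictions.

```python
def margin_score(probabilities):
    """Gap between the best and second-best class probability (needs >= 2 classes)."""
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_confident(predictions, k):
    """Rank unlabeled nodes by margin and return the top k as pseudo-labels.

    `predictions` maps a node id to its list of class probabilities.
    Returns a list of (node id, predicted class index) pairs.
    """
    ranked = sorted(
        predictions.items(),
        key=lambda item: (-margin_score(item[1]), item[0]),
    )
    selected = []
    for node, probs in ranked[:k]:
        predicted_class = max(range(len(probs)), key=probs.__getitem__)
        selected.append((node, predicted_class))
    return selected


first_stage = {
    "n1": [0.5, 0.4, 0.1],    # ambiguous: small margin
    "n2": [0.9, 0.05, 0.05],  # confident
    "n3": [0.2, 0.7, 0.1],
}
print(select_confident(first_stage, 2))
```

Ranking by margin rather than by the top probability alone penalizes predictions where a second class is a close competitor, so fewer ambiguous nodes enter the expanded labeled set.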

18.

Answer extraction and ranking strategies for definitional question answering using linguistic features and definition terminology

Kyoung-Soo Han Young-In Song Sang-Bum Kim Hae-Chang Rim 《Information processing & management》2007

We propose answer extraction and ranking strategies for definitional question answering using linguistic features and definition terminology. A passage expansion technique based on simple anaphora resolution is introduced to retrieve more informative sentences, and a phrase extraction method based on syntactic information of the sentences is proposed to generate a more concise answer. In order to rank the phrases, we use several sources of evidence, including external definitions and definition terminology. Although external definitions are useful, they cannot cover all possible targets. The definition terminology score, which reflects how definition-like a phrase is, is devised to compensate for the incomplete external definitions. Experimental results show that the proposed answer extraction and ranking methods are effective and that our proposed system is comparable to state-of-the-art systems.

19.

An interval 2-tuple linguistic method for determining customer requirement weights in QFD

Zhen Li (李震) 《科技管理研究》 (Science and Technology Management Research) 2015,(13)

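Determining customer requirement weights is the most critical and central stage of quality function deployment (QFD). To handle the different linguistic assessment sets that evaluators of customer requirements may use, this paper proposes a multi-attribute evaluation method based on standardized interval 2-tuple linguistic representations. It defines interval 2-tuples, aggregation operators, and a possibility-degree ranking method, presents a fuzzy multi-attribute information evaluation algorithm based on the interval 2-tuple linguistic model, and verifies the feasibility of the method with a worked example.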

20.

A noise-tolerant graphical model for ranking

Xiubo Geng Tao Qin Tie-Yan Liu Xue-Qi Cheng 《Information processing & management》2012

This paper studies how to learn accurate ranking functions from noisy training data for information retrieval. Most previous work on learning to rank assumes that the relevance labels in the training data are reliable. In reality, however, the labels usually contain noise due to the difficulties of relevance judgments and several other reasons. To tackle the problem, in this paper we propose a novel approach to learning to rank, based on a probabilistic graphical model. Considering that the observed label might be noisy, we introduce a new variable to indicate the true label of each instance. We then use a graphical model to capture the joint distribution of the true labels and observed labels given features of documents. The graphical model distinguishes the true labels from observed labels, and is specially designed for ranking in information retrieval. Therefore, it helps to learn a more accurate model from noisy training data. Experiments on a real dataset for web search show that the proposed approach can significantly outperform previous approaches.