Condorcet fusion is a distinctive fusion method that has been found useful in information retrieval. Two basic requirements for Condorcet fusion to improve retrieval effectiveness are: (1) all component systems involved should be more or less equally effective; and (2) each information retrieval system should be developed independently, so that each component result is more or less equally different from the others. These two requirements may not be satisfied in many cases, in which case weighted Condorcet becomes a good option. However, how to assign weights for weighted Condorcet has not been investigated.
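
As a rough illustration of the idea in this abstract, the sketch below fuses ranked lists by pairwise majority voting, with an optional weight per component system. It is a minimal sketch under stated assumptions, not the authors' method: ranking documents by their number of pairwise wins (a Copeland-style tally), treating unranked documents as ranked last, and the function name `condorcet_fuse` are all choices made here for illustration. The abstract does not say how weights should be assigned, so the weights in the usage example are arbitrary.

```python
def condorcet_fuse(rankings, weights=None):
    """Fuse ranked lists by (optionally weighted) pairwise majority voting.

    rankings: list of ranked lists of document ids, best first.
    weights:  one vote weight per ranking (assumption: defaults to 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    # Rank position of each document in each list.
    positions = [{doc: i for i, doc in enumerate(r)} for r in rankings]
    docs = sorted({d for r in rankings for d in r})

    def margin(a, b):
        # Weighted votes for "a above b" minus votes for "b above a".
        # Assumption: a document missing from a list counts as ranked last.
        total = 0.0
        for pos, w in zip(positions, weights):
            pa = pos.get(a, float("inf"))
            pb = pos.get(b, float("inf"))
            if pa < pb:
                total += w
            elif pb < pa:
                total -= w
        return total

    # Count how many pairwise contests each document wins.
    wins = {d: 0 for d in docs}
    for a in docs:
        for b in docs:
            if a != b and margin(a, b) > 0:
                wins[a] += 1
    # More wins first; ties broken by document id for determinism.
    return sorted(docs, key=lambda d: (-wins[d], d))


rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
print(condorcet_fuse(rankings))                     # ['a', 'b', 'c']
print(condorcet_fuse(rankings, weights=[1, 3, 1]))  # ['b', 'a', 'c']
```

With equal weights, "a" beats "b" two votes to one. Giving the second system a weight of 3 reverses that contest, which is the kind of effect a weighting scheme would have to control.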

Measuring effectiveness of information retrieval (IR) systems is essential for research and development and for monitoring search quality in dynamic environments. In this study, we employ new methods for automatic ranking of retrieval systems. In these methods, we merge the retrieval results of multiple systems using various data fusion algorithms, use the top-ranked documents in the merged result as the “(pseudo) relevant documents,” and employ these documents to evaluate and rank the systems. Experiments using Text REtrieval Conference (TREC) data provide statistically significant strong correlations with human-based assessments of the same systems. We hypothesize that the selection of systems that would return documents different from the majority could eliminate the ordinary systems from data fusion and provide better discrimination among the documents and systems. This could improve the effectiveness of automatic ranking. Based on this intuition, we introduce a new method for the selection of systems to be used for data fusion. For this purpose, we use the bias concept that measures the deviation of a system from the norm or majority and employ the systems with higher bias in the data fusion process. This approach provides even higher correlations with the human-based results. We demonstrate that our approach outperforms the previously proposed automatic ranking methods.
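
The pipeline this abstract describes (fuse the runs, treat the top of the fused list as pseudo-relevant, score each system against it) can be sketched as follows. This is only an illustrative sketch: the abstract mentions "various data fusion algorithms" without naming them here, so Borda count is swapped in as an assumption, and the function names, the cutoff `k`, the evaluation `depth`, and the use of precision as the measure are all hypothetical choices. The bias-based system selection step is not modelled.

```python
def borda_fuse(rankings):
    """Borda-count fusion (assumed here; the abstract does not fix the algorithm)."""
    docs = {d for r in rankings for d in r}
    n = len(docs)
    points = {d: 0 for d in docs}
    for r in rankings:
        for i, d in enumerate(r):
            points[d] += n - i  # documents a system did not return get 0 from it
    return sorted(docs, key=lambda d: (-points[d], d))


def rank_systems_by_pseudo_relevance(runs, k=10, depth=10):
    """Rank systems by precision against pseudo-relevant documents.

    runs:  dict mapping system name -> ranked list of document ids.
    k:     how many top fused documents count as pseudo-relevant (assumption).
    depth: how many documents of each run are evaluated (assumption).
    """
    fused = borda_fuse(list(runs.values()))
    pseudo_relevant = set(fused[:k])
    scores = {
        name: sum(1 for d in run[:depth] if d in pseudo_relevant) / depth
        for name, run in runs.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


runs = {
    "A": ["d1", "d2", "d3"],
    "B": ["d1", "d3", "d4"],
    "C": ["d5", "d6", "d1"],
}
for name, score in rank_systems_by_pseudo_relevance(runs, k=2, depth=3):
    print(name, round(score, 3))
```

In this toy input the fused list starts with d1 and d3, so systems A and B (which return both) rank above C (which returns only d1).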

Data fusion has been investigated by many researchers and has been used in implementing several information retrieval systems. However, the results from data fusion vary in different situations. Finding out under which conditions data fusion leads to performance improvement is an important issue. In this paper, we present an analysis of the behaviour of several well-known methods such as CombSum and CombMNZ for fusion of multiple information retrieval results. Based on this analysis, we predict the performance of the data fusion methods. Experiments are conducted with three groups of results submitted to TREC 6, TREC 2001, and TREC 2004. The experiments show that the prediction of the performance of data fusion is quite accurate, and that it can be used in situations very different from the training examples. Compared with previous work, our result is more accurate and better suited to applications, since varying numbers of component systems can be supported, whereas only two were used previously.
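
For readers unfamiliar with the two methods named in this abstract, here is a minimal sketch of CombSum and CombMNZ. CombSum adds up a document's scores across systems; CombMNZ multiplies that sum by the number of systems that retrieved the document. The min-max score normalization and the function names are assumptions of this sketch, as is counting a document as "retrieved" whenever it appears in a system's result, even if its normalized score is 0.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one system's scores to [0, 1] (normalization choice is an assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(results):
    """CombSum: sum of normalized scores over all systems."""
    fused = defaultdict(float)
    for scores in results:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    return dict(fused)


def comb_mnz(results):
    """CombMNZ: CombSum score times the number of systems returning the document."""
    sums = comb_sum(results)
    hits = defaultdict(int)
    for scores in results:
        for doc in scores:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


results = [
    {"d1": 4.0, "d2": 2.0, "d3": 0.0},  # system 1: document -> raw score
    {"d1": 1.0, "d4": 3.0},             # system 2
]
print(comb_sum(results))  # {'d1': 1.0, 'd2': 0.5, 'd3': 0.0, 'd4': 1.0}
print(comb_mnz(results))  # {'d1': 2.0, 'd2': 0.5, 'd3': 0.0, 'd4': 1.0}
```

Under CombSum, d1 and d4 tie. CombMNZ breaks the tie in favour of d1 because two systems retrieved it.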

Nowadays, access to information requires managing multimedia databases effectively, and so multi-modal retrieval techniques (particularly image retrieval) have become an active research direction. In the past few years, many content-based image retrieval (CBIR) systems have been developed. However, despite the progress achieved in CBIR, the retrieval accuracy of current systems is still limited and often worse than that of purely textual information retrieval systems. In this paper, we propose to combine content-based and text-based approaches to multi-modal retrieval in order to achieve better results and overcome the shortcomings of these techniques when they are used separately. For this purpose, we use a medical collection that includes both images and unstructured text. We retrieve images from a CBIR system and textual information through a traditional information retrieval system. Then, we combine the results obtained from both systems in order to improve the final performance. Furthermore, we use the information gain (IG) measure to reduce and improve the textual information included in multi-modal information retrieval systems. We have carried out several experiments that combine this reduction technique with a merger of visual and textual information. The results obtained are highly promising and show the benefit obtained when textual information is managed to improve conventional multi-modal systems.

It is known that retrieval effectiveness can be significantly improved by combining multiple sources of evidence from different query or document representations, or from multiple retrieval techniques. In this paper, we combine evidence from different relevance feedback methods, and investigate various aspects of the combination. We first generate multiple query vectors for a given information problem in a fully automatic way by expanding an initial query vector with various relevance feedback methods. We then perform retrieval runs for the multiple query vectors, and combine the retrieval results. Experimental results show that combining the evidence of different relevance feedback methods can lead to substantial improvements in retrieval effectiveness.

In Mongolian, two different alphabets are used, Cyrillic and Mongolian. In this paper, we focus solely on the Mongolian language using the Cyrillic alphabet, in which a content word can be inflected when concatenated with one or more suffixes. Identifying the original form of content words is crucial for natural language processing and information retrieval. We propose a lemmatization method for Mongolian. The advantage of our lemmatization method is that it does not rely on noun dictionaries, enabling us to lemmatize out-of-dictionary words. We also apply our method to indexing for information retrieval. We use newspaper articles and technical abstracts in experiments that show the effectiveness of our method. Our research is the first significant exploration of the effectiveness of lemmatization for information retrieval in Mongolian.

How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based presentation of retrieval results is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results are major features of a heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval result representations and quality, the cluster hypothesis still holds, and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.

There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide to the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR.

The Authority and Ranking Effects play a key role in data fusion. The former refers to the fact that the potential relevance of a document increases exponentially as the number of systems retrieving it increases, and the latter to the phenomenon that documents higher up in ranked lists and found by more systems are more likely to be relevant. Data fusion methods commonly use all the documents returned by the different retrieval systems being compared. Yet, as documents further down in the result lists are considered, a document’s probability of being relevant decreases significantly and a major source of noise is introduced. This paper presents a systematic examination of the Authority and Ranking Effects as the number of documents in the result lists, called the list depth, is varied. Using TREC 3, 7, 8, 12 and 13 data, it is shown that the Authority and Ranking Effects are present at all list depths. However, if the systems in the same TREC track retrieve a large number of relevant documents, then the Ranking Effect only begins to emerge as more systems have found the same document and/or the list depth increases. It is also shown that the Authority and Ranking Effects are not an artifact of how the TREC test collections have been constructed.
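
One way to probe the Authority Effect described in this abstract is to group documents by how many systems retrieved them within a given list depth and compute the fraction of relevant documents in each group. The sketch below does exactly that. The function name, the input layout, and the toy data are hypothetical, and this is not the paper's actual experimental procedure.

```python
from collections import Counter, defaultdict


def relevance_rate_by_overlap(runs, relevant, depth):
    """Map 'number of systems retrieving a document' -> fraction relevant.

    runs:     list of ranked lists of document ids, one per system.
    relevant: set of document ids judged relevant.
    depth:    list depth, i.e. how many top documents of each run are used.
    """
    # Count, for each document, how many systems return it within the depth.
    found_by = Counter()
    for run in runs:
        found_by.update(set(run[:depth]))

    # Group documents by that count and measure the share that is relevant.
    groups = defaultdict(list)
    for doc, n_systems in found_by.items():
        groups[n_systems].append(doc in relevant)
    return {n: sum(flags) / len(flags) for n, flags in sorted(groups.items())}


runs = [
    ["d1", "d2", "d3"],
    ["d1", "d3", "d4"],
    ["d1", "d5", "d2"],
]
relevant = {"d1", "d3"}
print(relevance_rate_by_overlap(runs, relevant, depth=3))
# {1: 0.0, 2: 0.5, 3: 1.0}
```

In this toy example the relevance rate rises with the number of systems that found a document, which is the pattern the Authority Effect predicts. Varying `depth` shows how the picture changes with list depth.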

张鹤琼  叶青 《大众科技》2013,(5):190-191
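To address the current state and problems of teaching the course "Medical Information Retrieval and Utilization" across the medical specialties at our institution, a teaching approach that combines PBL with LBL was introduced. It both improved the teachers' own competence and cultivated the students' capacity for independent learning, creating a virtuous cycle in which teaching and learning reinforce each other.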

Automatic text classification is the problem of automatically assigning predefined categories to free-text documents, thus reducing the manual labor required by traditional classification methods. When we apply binary classification to multi-class text classification, we usually use the one-against-the-rest method. In this method, if a document belongs to a particular category, the document is regarded as a positive example of that category; otherwise, the document is regarded as a negative example. As a result, each category has a positive data set and a negative data set. However, this one-against-the-rest method has a problem: the documents of a negative data set are not labeled manually, while those of a positive set are labeled by humans. Therefore, the negative data set probably includes a lot of noisy data. In this paper, we propose applying the sliding window technique and the revised EM (Expectation Maximization) algorithm to binary text classification to solve this problem. As a result, we can improve binary text classification by extracting potentially noisy documents from the negative data set using the sliding window technique and removing actually noisy documents using the revised EM algorithm. The results of our experiments showed that our method achieved better performance than the original one-against-the-rest method on all the data sets and with all the classifiers used in the experiments.
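
The one-against-the-rest construction described in this abstract can be sketched in a few lines: for each category, documents labeled with that category become positives and every other document becomes a negative. The function name and the single-label input format are assumptions made for illustration. The sliding window and revised EM steps are not shown, since the abstract does not specify them.

```python
def one_against_rest_datasets(labeled_docs):
    """Build a (positives, negatives) pair for every category.

    labeled_docs: list of (document, category) pairs.
    Assumption: each document carries exactly one category label.
    """
    categories = sorted({category for _, category in labeled_docs})
    datasets = {}
    for target in categories:
        positives = [doc for doc, cat in labeled_docs if cat == target]
        # Negatives are never checked by hand against `target`; this is
        # where the noisy examples the abstract mentions can come from.
        negatives = [doc for doc, cat in labeled_docs if cat != target]
        datasets[target] = (positives, negatives)
    return datasets


corpus = [
    ("stocks fall on rate fears", "finance"),
    ("team wins cup final", "sport"),
    ("bank sponsors football league", "sport"),
]
for category, (pos, neg) in one_against_rest_datasets(corpus).items():
    print(category, len(pos), len(neg))
# finance 1 2
# sport 2 1
```

In the toy corpus, the third document lands in the negative set for "finance" even though it is arguably finance-related, which illustrates the kind of noise the abstract sets out to remove.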

王雅南 《中国科技信息》2011,(4):251-252,242
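Drawing on the nature of the information retrieval course and the realities of curriculum reform in higher vocational colleges, and taking information resource types as the carrier, this paper designs work-process-based learning scenarios for the information retrieval course in higher vocational education. It analyzes the construction of the learning scenarios, the specific learning tasks and the logical relationships among them, and the study guidance and assessment system for an integrated theory-and-practice form of teaching.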

Activities in today's world are mainly supported by data-driven web applications that make extensive use of databases and data services. This phenomenon has led to the rise of Data Scientists as professionals of major relevance, who extract value from data and create state-of-the-art data artifacts that generate further value. In recent years, the term Data Scientist has attracted significant attention. Consequently, it is relevant to understand its origin, knowledge base and skill set, in order to adequately describe the profile and distinguish it from others such as Business Analyst. This work proposes a conceptual model for the professional profile of a Data Scientist and evaluates the representativeness of this profile in two commonly recognized competence/skills frameworks in the field of Information and Communications Technology (ICT), namely the European e-Competence Framework (e-CF) and the Skills Framework for the Information Age (SFIA). The results indicate that a significant part of the knowledge base and skill set of Data Scientists is related to ICT competences/skills, including programming, machine learning and databases. The Data Scientist professional profile is adequately represented in these two frameworks, but it is mainly seen as a multi-disciplinary profile, combining contributions from different areas, such as computer science, statistics and mathematics.
