首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
On the heterogeneous web information spaces, users have been suffering from efficiently searching for relevant information. This paper proposes a mediator agent system to estimate the semantics of unknown web spaces by learning the fragments gathered during the users' focused crawling. This process is organized as the following three tasks; (i) gathering semantic information about web spaces from personal agents while focused crawling in unknown spaces, (ii) reorganizing the information by using ontology alignment algorithm, and (iii) providing relevant semantic information to personal agents right before focused crawling. It makes the personal agent possible to recognize the corresponding user's behaviors in semantically heterogeneous spaces and predict his searching contexts. For the experiments, we implemented comparison-shopping system with heterogeneous web spaces. As a result, our proposed method efficiently supported the users, and then, network traffic was also reduced. An erratum to this article can be found at  相似文献   

2.
Information Retrieval Systeme haben in den letzten Jahren nur geringe Verbesserungen in der Retrieval Performance erzielt. Wir arbeiten an neuen Ans?tzen, dem sogenannten Collaborativen Information Retrieval (CIR), die das Potential haben, starke Verbesserungen zu erreichen. CIR ist die Methode, mit der durch Ausnutzen von Informationen aus früheren Anfragen die Retrieval Peformance für die aktuelle Anfrage verbessert wird. Wir haben ein eingeschr?nktes Szenario, in dem nur alte Anfragen und dazu relevante Antwortdokumente zur Verfügung stehen. Neue Ans?tze für Methoden der Query Expansion führen unter diesen Bedingungen zu Verbesserungen der Retrieval Performance . The accuracy of ad-hoc document retrieval systems has reached a stable plateau in the last few years. We are working on so-called collaborative information retrieval (CIR) systems which have the potential to overcome the current limits. We define CIR as a task, where an information retrieval (IR) system uses information gathered from previous search processes from one or several users to improve retrieval performance for the current user searching for information. We focus on a restricted setting in CIR in which only old queries and correct answer documents to these queries are available for improving a new query. For this restricted setting we propose new approaches for query expansion procedures. We show how CIR methods can improve overall IR performance.
CR Subject Classification H.3.3  相似文献   

3.
4.
5.
Recently direct optimization of information retrieval (IR) measures has become a new trend in learning to rank. In this paper, we propose a general framework for direct optimization of IR measures, which enjoys several theoretical advantages. The general framework, which can be used to optimize most IR measures, addresses the task by approximating the IR measures and optimizing the approximated surrogate functions. Theoretical analysis shows that a high approximation accuracy can be achieved by the framework. We take average precision (AP) and normalized discounted cumulated gains (NDCG) as examples to demonstrate how to realize the proposed framework. Experiments on benchmark datasets show that the algorithms deduced from our framework are very effective when compared to existing methods. The empirical results also agree well with the theoretical results obtained in the paper.  相似文献   

6.
浅析基于内容的视频信息检索技术   总被引:1,自引:0,他引:1  
本文对视频信息及其特点进行了简要介绍,对基于内容的视频检索技术进行了详细的探讨,并就检索技术的发展现状预测了基于内容的视频检索技术的未来发展趋势。  相似文献   

7.
Internet空间中的信息资源是异构的,人们要想从Internet中发现,收集和维护自己需要的信息则要花费大量的时间和精力。本文针对Internet检索系统存在的不足,进行了Agent对旅游信息资源个性化检索的研究。  相似文献   

8.
In this paper, a novel neighborhood based document smoothing model for information retrieval has been proposed. Lexical association between terms is used to provide a context sensitive indexing weight to the document terms, i.e. the term weights are redistributed based on the lexical association with the context words. A generalized retrieval framework has been presented and it has been shown that the vector space model (VSM), divergence from randomness (DFR), Okapi Best Matching 25 (BM25) and the language model (LM) based retrieval frameworks are special cases of this generalized framework. Being proposed in the generalized retrieval framework, the neighborhood based document smoothing model is applicable to all the indexing models that use the term-document frequency scheme. The proposed smoothing model is as efficient as the baseline retrieval frameworks at runtime. Experiments over the TREC datasets show that the neighborhood based document smoothing model consistently improves the retrieval performance of VSM, DFR, BM25 and LM and the improvements are statistically significant.  相似文献   

9.
信息检索扩展技术研究   总被引:1,自引:0,他引:1  
本文针对信息检索在查询扩展方面的不足,提出了一种结合本体理论和用户相关反馈技术的查询扩展方法。以FirteX作为检索平台, 选取WordNet作为本体扩展资源来验证本文所提出的查询扩展算法,实现结果表明该方法比基于余弦相似性的查询扩展方法在平均查全率、平均查准率方面有更大的优点。  相似文献   

10.
基于Web 的《信息检索》多媒体CAI课件的设计制作   总被引:1,自引:0,他引:1  
论述了基于Web的《信息检索》多媒体CAI课件的内容结构及其特点,着重探讨"专业文献信息检索与利用"课件的设计与制作技术.认为基于Web的《信息检索》多媒体CAI课件的设计与制作思路为:设计课件的内容结构-选择制作软件-搜集材料并制作课件脚本-填充课件树状结构各链接的相关内容-适当地运用色彩、声音、图像及动画效果准确而形象地表达信息内容.  相似文献   

11.
To evaluate Information Retrieval Systems on their effectiveness, evaluation programs such as TREC offer a rigorous methodology as well as benchmark collections. Whatever the evaluation collection used, effectiveness is generally considered globally, averaging the results over a set of information needs. As a result, the variability of system performance is hidden as the similarities and differences from one system to another are averaged. Moreover, the topics on which a given system succeeds or fails are left unknown. In this paper we propose an approach based on data analysis methods (correspondence analysis and clustering) to discover correlations between systems and to find trends in topic/system correlations. We show that it is possible to cluster topics and systems according to system performance on these topics, some system clusters being better on some topics. Finally, we propose a new method to consider complementary systems as based on their performances which can be applied for example in the case of repeated queries. We consider the system profile based on the similarity of the set of TREC topics on which systems achieve similar levels of performance. We show that this method is effective when using the TREC ad hoc collection.  相似文献   

12.
基于网络的信息检索教学设计   总被引:2,自引:0,他引:2  
随着网络的发展与教学环境的变化,信息检索的教学方式与教学内容发生了很大变化.文章对网络环境下高校信息检索的教育背景及其重要性进行分析,并对提高信息检索教育提出对策和建议.  相似文献   

13.
A standard approach to Information Retrieval (IR) is to model text as a bag of words. Alternatively, text can be modelled as a graph, whose vertices represent words, and whose edges represent relations between the words, defined on the basis of any meaningful statistical or linguistic relation. Given such a text graph, graph theoretic computations can be applied to measure various properties of the graph, and hence of the text. This work explores the usefulness of such graph-based text representations for IR. Specifically, we propose a principled graph-theoretic approach of (1) computing term weights and (2) integrating discourse aspects into retrieval. Given a text graph, whose vertices denote terms linked by co-occurrence and grammatical modification, we use graph ranking computations (e.g. PageRank Page et al. in The pagerank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998) to derive weights for each vertex, i.e. term weights, which we use to rank documents against queries. We reason that our graph-based term weights do not necessarily need to be normalised by document length (unlike existing term weights) because they are already scaled by their graph-ranking computation. This is a departure from existing IR ranking functions, and we experimentally show that it performs comparably to a tuned ranking baseline, such as BM25 (Robertson et al. in NIST Special Publication 500-236: TREC-4, 1995). In addition, we integrate into ranking graph properties, such as the average path length, or clustering coefficient, which represent different aspects of the topology of the graph, and by extension of the document represented as a graph. Integrating such properties into ranking allows us to consider issues such as discourse coherence, flow and density during retrieval. We experimentally show that this type of ranking performs comparably to BM25, and can even outperform it, across different TREC (Voorhees and Harman in TREC: Experiment and evaluation in information retrieval, MIT Press, 2005) datasets and evaluation measures.  相似文献   

14.
There have been a number of linear, feature-based models proposed by the information retrieval community recently. Although each model is presented differently, they all share a common underlying framework. In this paper, we explore and discuss the theoretical issues of this framework, including a novel look at the parameter space. We then detail supervised training algorithms that directly maximize the evaluation metric under consideration, such as mean average precision. We present results that show training models in this way can lead to significantly better test set performance compared to other training methods that do not directly maximize the metric. Finally, we show that linear feature-based models can consistently and significantly outperform current state of the art retrieval models with the correct choice of features.
  相似文献   

15.
Over the last three decades, research in Information Retrieval (IR) shows performance improvement when many sources of evidence are combined to produce a ranking of documents. Most current approaches assess document relevance by computing a single score which aggregates values of some attributes or criteria. They use analytic aggregation operators which either lead to a loss of valuable information, e.g., the min or lexicographic operators, or allow very bad scores on some criteria to be compensated with good ones, e.g., the weighted sum operator. Moreover, all these approaches do not handle imprecision of criterion scores. In this paper, we propose a multiple criteria framework using a new aggregation mechanism based on decision rules identifying positive and negative reasons for judging whether a document should get a better ranking than another. The resulting procedure also handles imprecision in criteria design. Experimental results are reported showing that the suggested method performs better than standard aggregation operators.  相似文献   

16.
We present a new ranking algorithm that combines the strengths of two previous methods: boosted tree classification, and LambdaRank, which has been shown to be empirically optimal for a widely used information retrieval measure. Our algorithm is based on boosted regression trees, although the ideas apply to any weak learners, and it is significantly faster in both train and test phases than the state of the art, for comparable accuracy. We also show how to find the optimal linear combination for any two rankers, and we use this method to solve the line search problem exactly during boosting. In addition, we show that starting with a previously trained model, and boosting using its residuals, furnishes an effective technique for model adaptation, and we give significantly improved results for a particularly pressing problem in web search—training rankers for markets for which only small amounts of labeled data are available, given a ranker trained on much more data from a larger market.  相似文献   

17.
In this article, we introduce an out-of-the-box automatic term weighting method for information retrieval. The method is based on measuring the degree of divergence from independence of terms from documents in terms of their frequency of occurrence. Divergence from independence has a well-establish underling statistical theory. It provides a plain, mathematically tractable, and nonparametric way of term weighting, and even more it requires no term frequency normalization. Besides its sound theoretical background, the results of the experiments performed on TREC test collections show that its performance is comparable to that of the state-of-the-art term weighting methods in general. It is a simple but powerful baseline alternative to the state-of-the-art methods with its theoretical and practical aspects.  相似文献   

18.
A structured document retrieval (SDR) system aims to minimize the effort users spend to locate relevant information by retrieving parts of documents. To evaluate the range of SDR tasks, from element to passage to tree retrieval, numerous task-specific measures have been proposed. This has resulted in SDR evaluation measures that cannot easily be compared with respect to each other and across tasks. In previous work, we defined the SDR task of tree retrieval where passage and element are special cases. In this paper, we look in greater detail into tree retrieval to identify the main components of SDR evaluation: relevance, navigation, and redundancy. Our goal is to evaluate SDR within a single probabilistic framework based on these components. This framework, called Extended Structural Relevance (ESR), calculates user expected gain in relevant information depending on whether it is seen via hits (relevant results retrieved), unseen via misses (relevant results not retrieved), or possibly seen via near-misses (relevant results accessed via navigation). We use these expectations as parameters to formulate evaluation measures for tree retrieval. We then demonstrate how existing task-specific measures, if viewed as tree retrieval, can be formulated, computed and compared using our framework. Finally, we experimentally validate ESR across a range of SDR tasks.  相似文献   

19.
基于Agent的信息检索课程网络教学系统设计   总被引:1,自引:0,他引:1  
Agent是人工智能和计算机软件领域中一种新兴的技术,其在远程教育中的应用具有很大的优越性.文章在分析传统的基于web信息检索课远程教育不足的基础上,探讨了用Agent思想解决远程教育的问题,并设计了一个基于Agent的信息检索课教学系统模型.  相似文献   

20.
The application of visualization techniques to information retrieval (IR) has resulted in the development of innovative systems and interfaces that are now available for public use. Visualization tools have emerged in research environments and more recently on the Web to retrieve information. Questions arise in regard to the utility of Web-based IR visualization tools for assisting users not only in manipulating search output, but also in managing the information retrieval process. To understand how Web-based visualization tools enable visual information retrieval, this article reviews some of the human perceptual theory behind the graphical interface of information visualization systems, analyzes iconic representations and information density on visualization displays, and examines information retrieval tasks that have been used in visualization system user research. This article is timely since it addresses new technologies for Web information retrieval and discusses future information visualization user research directions.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号