Similar literature
Found 20 similar documents; search took 46 ms.
1.
Adapting information retrieval to query contexts   Total citations: 1, self-citations: 0, citations by others: 1
In current IR approaches, documents are retrieved only according to the terms specified in the query. The same answers are returned for the same query regardless of the user and the search goal. In reality, many other contextual factors strongly influence a document’s relevance, and they should be taken into account in IR operations. This paper proposes a method, based on language modeling, to integrate several contextual factors so that document ranking is adapted to the specific query context. We consider three contextual factors in this paper: the topic domain of the query, the characteristics of the document collection, and context words within the query. Each contextual factor is used to generate a new query language model that specifies some aspect of the information need. All these query models are then combined to produce a more complete model of the underlying information need. Our experiments on TREC collections show that each contextual factor can positively influence IR effectiveness and that the combined model results in the highest effectiveness. This study shows that it is both beneficial and feasible to integrate more contextual factors into current IR practice.
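
The combination step described in this abstract can be illustrated with a minimal sketch. It assumes the query models are unigram distributions combined by linear interpolation; the function name `mix_query_models`, the example terms, and the equal weights are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def mix_query_models(models, weights):
    """Linearly interpolate several unigram query models.

    models  -- list of dicts mapping term -> probability
    weights -- one non-negative mixing weight per model (hypothetical values)
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mixed = defaultdict(float)
    for model, weight in zip(models, weights):
        for term, prob in model.items():
            # Normalise the weights so the result is again a distribution.
            mixed[term] += (weight / total) * prob
    return dict(mixed)


# Hypothetical models: the original query and a domain-specific model.
original = {"java": 0.5, "island": 0.5}
domain = {"java": 0.25, "travel": 0.75}
combined = mix_query_models([original, domain], [0.5, 0.5])
```

In practice the mixing weights would presumably be tuned on held-out queries; equal weights are used here only to keep the arithmetic easy to follow.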

2.
In information retrieval, cluster-based retrieval is a well-known attempt to resolve the problem of term mismatch. Clustering requires similarity information between documents, which is difficult to calculate in a feasible amount of time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical basis has not been fully explored. In this regard, we provide a conceptual view of adaptive document clustering based on query-based similarities, by regarding the user’s query as a concept. As a result, the adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. The evaluation results show that retrieval based on query-based similarities significantly improves on the baseline, while being comparable to other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.

3.
Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.

4.
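Ma Wei. 情报科学 (Information Science), 2006, 24(7): 1066-1068
This paper presents an algorithm that automatically expands queries using word-based concept learning. The algorithm expands the query word by word by learning the concept-descriptor words that appear in the current query. Experiments show that, compared with the traditional vector space retrieval model and relevance feedback algorithms, the algorithm substantially improves recall and precision. The method can be applied to retrieval in digital libraries, on the WWW, and in similar settings.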

5.
This paper presents a relevance model to rank the facts of a data warehouse that are described in a set of documents retrieved with an information retrieval (IR) query. The model is based on language modeling and relevance modeling techniques. We estimate the relevance of the facts by the probability of finding their dimension values and the query keywords in the documents that are relevant to the query. The model is the core of the so-called contextualized warehouse, which is a new kind of decision support system that combines structured data sources and document collections. The paper evaluates the relevance model with the Wall Street Journal (WSJ) TREC test subcollection and a self-constructed fact database.

6.
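This paper studies the application of RUP (Rational Unified Process) to ERP system modeling. ERP systems are modeled using the UML modeling language and the Rational Unified Process. On top of the RUP modeling framework, a three-dimensional modeling framework is built by adding a business-domain dimension, and a multi-view method is used for business modeling within the RUP process: the business model is realized with the workflow view at its core, combined with the function, resource, organization, and information views. Mapping business use cases to system use cases then yields a one-to-one correspondence between the business model and the system requirements, analysis, design, and implementation models, thereby achieving the application of RUP to ERP system modeling.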

7.
Estimating the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimate is expected to be not only effective, in terms of high mean retrieval performance over all queries, but also stable, in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias–variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias–variance with regard to retrieval performance and the estimation quality of query models. We then investigate several estimated query models, analyzing when and why the bias–variance tradeoff occurs, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections has been conducted to systematically evaluate our bias–variance analysis. Our approach and results can potentially form an analysis framework and a novel evaluation strategy for query language modeling.
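
The two properties named in this abstract, effectiveness as mean performance and stability as variance across queries, can be computed with a small sketch. This is not the paper's bias–variance formulation; the function name and the per-query scores below are made up purely for illustration.

```python
from statistics import mean, pvariance


def effectiveness_and_stability(per_query_scores):
    """Return (mean, population variance) of per-query retrieval scores.

    A higher mean indicates higher effectiveness; a lower variance
    indicates more stable performance across queries.
    """
    scores = list(per_query_scores)
    return mean(scores), pvariance(scores)


# Hypothetical average-precision values for two query models on four queries.
model_a = [0.5, 0.5, 0.5, 0.5]   # consistent across queries
model_b = [1.0, 0.0, 1.0, 0.0]   # same mean, but unstable

mean_a, var_a = effectiveness_and_stability(model_a)
mean_b, var_b = effectiveness_and_stability(model_b)
```

Both hypothetical models have the same mean score, so a comparison on mean performance alone would not distinguish them; the variance is what separates the stable model from the unstable one.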

8.
This paper proposes an approach to tackle the problem of querying large volumes of statistical RDF data. Our approach relies on pre-aggregation strategies to better manage the analysis of this kind of data. Specifically, we define a conceptual model to represent original RDF data with aggregates in a multidimensional structure. A set of translation rules for converting a well-known multidimensional RDF modelling vocabulary into the proposed conceptual model is then proposed. We implement the conceptual model in six different data stores: two RDF triple stores (Jena TDB and Virtuoso), one graph-oriented NoSQL database (Neo4j), one column-oriented data store (Cassandra), and two relational databases (MySQL and PostgreSQL). We compare the querying performance, with and without aggregates, in these data stores. Experimental results on real-world datasets containing 81.92 million triples show that pre-aggregation reduces query runtime in all data stores. The Neo4j NoSQL database and the relational databases with aggregates outperform the triple stores, reducing query runtime by up to 99%.
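
The idea of pre-aggregation can be shown with a minimal in-memory sketch: totals along one dimension are computed once, so later queries read a stored aggregate instead of scanning every observation. The record layout, the field names, and the function `pre_aggregate` are assumptions made for illustration; the paper itself works with RDF data and the data stores listed above.

```python
from collections import defaultdict


def pre_aggregate(observations, dimension):
    """Sum the 'value' field of each observation, grouped by one dimension."""
    totals = defaultdict(float)
    for obs in observations:
        totals[obs[dimension]] += obs["value"]
    return dict(totals)


# Hypothetical statistical observations with two dimensions.
observations = [
    {"region": "north", "year": 2020, "value": 10.0},
    {"region": "north", "year": 2021, "value": 5.0},
    {"region": "south", "year": 2020, "value": 7.0},
]

# Computed once and stored; a later "total per region" query is a lookup.
by_region = pre_aggregate(observations, "region")
```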

9.
We propose a new query reformulation approach, using a set of query concepts that are introduced to precisely denote the user’s information need. Since a document collection is considered to be a domain which includes latent primitive concepts, we identify those concepts through local pattern discovery and global modeling using data mining techniques. For a new query, we select its most associated primitive concepts and choose the most probable interpretations as query concepts. We discuss the issue of constructing the primitive concepts from either the whole corpus or the retrieved set of documents. Our experiments are performed on the TREC8 collection. The experimental evaluation shows that our approach is as good as current query reformulation approaches, while being particularly effective for poorly performing queries. Moreover, we find that the approach using the primitive concepts generated from the set of retrieved documents leads to the most effective performance.

10.
There are several language constructs and mechanisms that provide some sort of support for conceptual modeling in object-oriented programming: the Liskov substitution principle, the Meyer programming by contract, the Beta inner construct, interfaces in C# and Java, and the separation of subtype hierarchies from subclass hierarchies as in Timor and Sather. All these mechanisms and constructs are powerful and useful tools to enforce the conceptual modeling trait of the inheritance mechanism in object-oriented programming. Their purpose is to ensure semantic compatibility of classes related by an inheritance hierarchy. When applied independently, these mechanisms can lead to more correct inheritance hierarchies that are easy to understand, use, and extend. This article discusses the different mechanisms and studies the interaction between them. It investigates whether conceptual modeling can be satisfactorily achieved by using such tools. It will be shown that the interaction between these tools might lead to contradictions and might prevent legitimate inheritance hierarchies. This article proposes a framework for a conceptual modeling mechanism that will better support conceptual modeling at the language level.

11.
Ontologies are frequently used in information retrieval; their main applications are the expansion of queries, the semantic indexing of documents, and the organization of search results. Ontologies provide lexical items, allow conceptual normalization, and provide different types of relations. However, how to optimize an ontology for information retrieval tasks is still unclear. In this paper, we use an ontology query model to analyze the usefulness of ontologies in effectively performing document searches. Moreover, we propose an algorithm to refine ontologies for information retrieval tasks, with preliminary positive results.

12.
In the KL divergence framework, the extended language modeling approach faces the critical problem of estimating a query model, which is the probabilistic model that encodes the user’s information need. For query expansion in initial retrieval, the translation model has been proposed to incorporate term co-occurrence statistics. However, the translation model is difficult to apply, because the term co-occurrence statistics must be constructed offline. Especially in a large collection, constructing such a large matrix of term co-occurrence statistics prohibitively increases time and space complexity. In addition, reliable retrieval performance cannot be guaranteed because the translation model may include noisy non-topical terms in documents. To resolve these problems, this paper investigates an effective method to construct co-occurrence statistics and eliminate noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of a translation model that reduces the number of terms with non-zero probabilities by eliminating non-topical terms in documents. Through experimentation on seven different test collections, we show that the query model estimated from the parsimonious translation model significantly outperforms not only the baseline language modeling approach, but also the non-parsimonious models.
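
For readers unfamiliar with the KL divergence framework mentioned in this abstract, the following is a minimal sketch of how a query model ranks documents. It does not implement the parsimonious translation model. It assumes Jelinek-Mercer smoothing with a hypothetical mixing weight `lam`, and it assumes every query term occurs somewhere in the collection; the toy documents are invented.

```python
import math
from collections import Counter


def smoothed_doc_model(doc_tokens, collection_counts, collection_len, lam=0.5):
    """Return a function term -> p(term | document), Jelinek-Mercer smoothed."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    def prob(term):
        p_doc = counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        return (1 - lam) * p_doc + lam * p_coll

    return prob


def kl_score(query_model, doc_prob):
    """Score rank-equivalent to -KL(query model || document model).

    The query-entropy term is constant for a given query, so only the
    cross-entropy part sum_w p(w|q) * log p(w|d) is computed.
    Assumes each query term occurs in the collection (otherwise log(0)).
    """
    return sum(p_q * math.log(doc_prob(term))
               for term, p_q in query_model.items() if p_q > 0)


# Hypothetical toy collection and query model.
docs = {
    "d1": ["query", "model", "estimation"],
    "d2": ["cooking", "pasta", "recipe"],
}
collection_counts = Counter(t for tokens in docs.values() for t in tokens)
collection_len = sum(collection_counts.values())
query_model = {"query": 0.5, "model": 0.5}

scores = {
    name: kl_score(query_model,
                   smoothed_doc_model(tokens, collection_counts, collection_len))
    for name, tokens in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

An expanded query model, such as one estimated from a translation model, would simply replace `query_model` here with a distribution over more terms.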

13.
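Stochastic geological modeling under sedimentary microfacies constraints: methods and application   Total citations: 12, self-citations: 0, citations by others: 12
With geological modeling as the main research subject, and guided by the methods and techniques that oilfield development geology still needs to improve, this work carries out a comprehensive study of geological modeling constrained by sedimentary microfacies, grounded in research on modeling methods. After analyzing the current state of research in China and abroad, and in view of the characteristics of development geology work in China, it proposes using stochastic simulation, based on well-log interpretation results, to study sedimentary microfacies quantitatively. The microfacies results are used to constrain the distribution of log-derived reservoir parameters, producing the accurate three-dimensional reservoir geological model required for reservoir numerical simulation. This resolves the shortcoming of conventional oilfield development geology that sedimentary facies results cannot be effectively combined with the distribution of reservoir parameters. Using the IRMS software, which suits this method, an accurate sedimentary microfacies model was built for the Upper Tertiary Guantao Formation reservoir in the western part of the seventh development block of the Gudong oilfield, Shengli oil province, Shandong. The resulting microfacies-constrained three-dimensional reservoir geological model data volume was applied directly to reservoir numerical simulation, with good results both in application and in field implementation.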

14.
This study was undertaken to characterize the information requirements of cancer researchers who were specifically interested in human biological specimens at a comprehensive cancer center, and to determine whether existing information systems could meet those needs. Information required by the cancer center researchers at the University of Pittsburgh Cancer Institute (UPCI, Pittsburgh, PA) was identified through interviews, query analysis, and analysis of publications. For topical matters, the study found that the most frequent types of questions were the following: clinical (50.18%), prognosis (17.87%), diagnosis/disorder-based (50.72%), and research-oriented (51.9%) queries. In terms of the required data elements, pathology data (17.32%) was the most frequently required, followed by clinical history and outcomes (15.18%). In addition, the study identified the 10 main questions concerning human biological samples, and the majority of the questions were represented in a fairly discrete set of information spaces that could be well mapped into the conceptual data model created through the study. The results of this study can be used for initial data modeling when creating a biomedical research data warehouse that would support the majority of the transitional research requirements of the UPCI.

15.
Many of the approaches to image retrieval on the Web have their basis in text retrieval. However, when searchers are asked to describe their image needs, the resulting query is often short and potentially ambiguous. The solution we propose is to perform automatic query expansion using Wikipedia as the source knowledge base, resulting in a diversification of the search results. The outcome is a broad range of images that represent the various possible interpretations of the query. In order to assist the searcher in finding images that match their specific intentions for the query, we have developed an image organization method that uses both the conceptual information associated with each image, and the visual features extracted from the images. This, coupled with a hierarchical organization of the concepts, provides an interactive interface that takes advantage of the searchers’ abilities to recognize relevant concepts, filter and focus the search results based on these concepts, and visually identify relevant images while navigating within the image space. In this paper, we outline the key features of our image retrieval system (CIDER), and present the results of a preliminary user evaluation. The results of this study illustrate the potential benefits that CIDER can provide for searchers conducting image retrieval tasks.

16.
The notion of improvisation has recently emerged in managerial studies as a viable answer to flexibly dealing with unexpected occurrences. Nonetheless, research on improvisation has essentially approached the issue through a metaphorical framework, and has regularly relied on conceptual frameworks residing either at the individual or the team-level. We investigate how team-level processes affect individual improvisation in complex project domains. Using data from 138 team leaders and members belonging to 38 information systems development teams, we test cross-level hypotheses through hierarchical linear modeling. Team behavioral integration and team cohesion positively affect individual improvisation. Moreover, cohesion positively moderates the influence of team behavioral integration on individual improvisation. In concluding this paper we offer theoretical and practical implications.

17.
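Dong Piyan, Ma Wei. 情报科学 (Information Science), 2004, 22(8): 967-970
This paper presents an algorithm for query expansion using related terms. The algorithm is built on fuzzy clustering of search terms, where clustering is based on the co-occurrence of search terms in documents. The clusters related to the search terms in a query form the query's context, and the search terms in those clusters that belong to the context can be used to expand the query. Experiments show that the algorithm improves precision.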

18.
This paper conducts an inquiry into regional transliteration variants across Chinese-speaking regions. We begin by studying the social association of regional transliterations, followed by postulating a computational model for effective transliteration extraction from the Web. In the computational model, we first propose constraint-based exploration by incorporating transliteration knowledge from transliteration modeling and predictive query suggestions from search engines into query formulation as constraints so as to increase the chance of desired transliteration returns in learning regional transliteration variants. Then, we study a cross-training algorithm, which explores the attainably helpful information of transliteration mappings across related regional corpora for the learning of transliteration models, to improve the overall extraction performance. The experimental results show that the proposed method not only effectively harvests a lexicon of regional transliteration variants but also reduces the need for manual data labeling in transliteration modeling. We also carry out an investigation into the underlying characteristics of regional transliterations that motivate the cross-training algorithm.

19.
An iterative method for information retrieval is presented. It uses searchonyms found in the previously retrieved set of documents for query expansion. Only the largest values of the relation of resemblance between the query and the documents are used to form the feedback seed. From this top retrieved set of documents, the most informative features are selected as searchonyms, which are subsequently used in query reformulation. Large operational bibliographic databases are used to simulate the behavior of this method.
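
One iteration of the feedback loop described in this abstract can be sketched as follows. The sketch assumes that "most informative" can be approximated by raw term frequency in the top-ranked documents, which is a simplification and not the selection criterion of the paper; the function name `expand_query`, its parameters, and the toy documents are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=2, n_new_terms=1):
    """Add the most frequent new terms from the top-k ranked documents.

    ranked_docs -- tokenised documents, best match first
    top_k       -- size of the feedback seed
    n_new_terms -- how many expansion terms to add
    """
    seed = ranked_docs[:top_k]
    counts = Counter(term
                     for doc in seed
                     for term in doc
                     if term not in query_terms)
    new_terms = [term for term, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms


# Hypothetical ranking for the query ["solar"]; only the first two
# documents form the feedback seed.
ranked_docs = [
    ["solar", "panel", "energy", "energy"],
    ["solar", "energy", "grid"],
    ["banana", "bread"],
]
expanded = expand_query(["solar"], ranked_docs)
```

Repeating the retrieval with `expanded` and calling `expand_query` again on the new ranking would give the iterative behaviour the abstract refers to.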

20.
One of the major problems in information retrieval is the formulation of queries on the part of the user. This entails specifying a set of words or terms that express their informational need. However, it is well-known that two people can assign different terms to refer to the same concepts. The techniques that attempt to reduce this problem as much as possible generally start from a first search, and then study how the initial query can be modified to obtain better results. In general, the construction of the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Depending on the technique used to formulate the new query several strategies are distinguished. These strategies are based on the idea that if two terms are similar (with respect to any criterion), the documents in which both terms appear frequently will also be related. The technique we used in this study is known as query expansion using similarity thesauri.
