首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 10 毫秒
1.
In this paper, we introduce a novel knowledge-based word-sense disambiguation (WSD) system. In particular, the main goal of our research is to find an effective way to filter out unnecessary information by using word similarity. For this, we adopt two methods in our WSD system. First, we propose a novel encoding method for word vector representation by considering the graphical semantic relationships from the lexical knowledge bases, and the word vector representation is utilized to determine the word similarity in our WSD system. Second, we present an effective method for extracting the contextual words from a text for analyzing an ambiguous word based on word similarity. The results demonstrate that the suggested methods significantly enhance the baseline WSD performance in all corpora. In particular, the performance on nouns is similar to those of the state-of-the-art knowledge-based WSD models, and the performance on verbs surpasses that of the existing knowledge-based WSD models.  相似文献   

2.
In this paper, we present the state of the art in the field of information retrieval that is relevant for understanding how to design information retrieval systems for children. We describe basic theories of human development to explain the specifics of young users, i.e., their cognitive skills, fine motor skills, knowledge, memory and emotional states in so far as they differ from those of adults. We derive the implications these differences have on the design of information retrieval systems for children. Furthermore, we summarize the main findings about children’s search behavior from multiple user studies. These findings are important to understand children’s information needs, their search strategies and usage of information retrieval systems. We also identify several weaknesses of previous user studies about children’s information-seeking behavior. Guided by the findings of these user studies, we describe challenges for the design of information retrieval systems for young users. We give an overview of algorithms and user interface concepts. We also describe existing information retrieval systems for children, in specific web search engines and digital libraries. We conclude with a discussion of open issues and directions for further research. The survey provided in this paper is important both for designers of information retrieval systems for young users as well as for researchers who start working in this field.  相似文献   

3.
Evaluation research on information retrieval (IR) systems has thus far been narrowly focused and disjointed. This research attempts to narrow the gap by providing a comprehensive and integrated multiple criteria decision-theoretic approach for the evaluation of IR systems. The approach, which is based on the Analytic Hierarchy Process (AHP), is illustrated in the context of a domain-specific IR system. The novelty of this approach lies in the focus on the user aspect and the application of decision-making theories in the IR field.  相似文献   

4.
Contextual document clustering is a novel approach which uses information theoretic measures to cluster semantically related documents bound together by an implicit set of concepts or themes of narrow specificity. It facilitates cluster-based retrieval by assessing the similarity between a query and the cluster themes’ probability distribution. In this paper, we assess a relevance feedback mechanism, based on query refinement, that modifies the query’s probability distribution using a small number of documents that have been judged relevant to the query. We demonstrate that by providing only one relevance judgment, a performance improvement of 33% was obtained.  相似文献   

5.
In this paper, the scalability and quality of the contextual document clustering (CDC) approach is demonstrated for large data-sets using the whole Reuters Corpus Volume 1 (RCV1) collection. CDC is a form of distributional clustering, which automatically discovers contexts of narrow scope within a document corpus. These contexts act as attractors for clustering documents that are semantically related to each other. Once clustered, the documents are organized into a minimum spanning tree so that the topical similarity of adjacent documents within this structure can be assessed. The pre-defined categories from three different document category sets are used to assess the quality of CDC in terms of its ability to group and structure semantically related documents given the contexts. Quality is evaluated based on two factors, the category overlap between adjacent documents within a cluster, and how well a representative document categorizes all the other documents within a cluster. As the RCV1 collection was collated in a time ordered fashion, it was possible to assess the stability of clusters formed from documents within one time interval when presented with new unseen documents at subsequent time intervals. We demonstrate that CDC is a powerful and scaleable technique with the ability to create stable clusters of high quality. Additionally, to our knowledge this is the first time that a collection as large as RCV1 has been analyzed in its entirety using a static clustering approach.  相似文献   

6.
There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods are worthwhile. The purpose of this paper is to show how local search can be used to solve some well known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide on the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR.  相似文献   

7.
A new concept of a bipolar query against collections of textual documents, i.e. in the context of information retrieval (IR), is introduced using recent developments in bipolar information modeling and bipolar database queries. Specifically, a particular approach to bipolar queries with an explicit “and possibly” type of an aggregation operator is used. An effective and efficient processing of such bipolar queries using standard IR data structures is briefly discussed. The bipolar queries proposed combine a flexibility provided by fuzzy logic with a more sophisticated representation of user preferences and intentions. This combination can make the search of vast resources of textual document, notably those available via the Internet, more intelligent.  相似文献   

8.
This research focuses specifically on uncertainty and information seeking in a digital environment. In this research we argue that different types of uncertainty are associated with the information seeking process and that, with the proliferation of new and different search tools, sources and channels, uncertainty, positive/desirable or negative/undesirable, continues to be a significant factor in the search process. Users may feel uncertain at any stage of the information search and retrieval process and uncertainty may remain even after completion of the process resulting in what may be called persistent uncertainty. An online questionnaire was used to collect data from users in the higher education sector. There were three parts to the questionnaire focusing on: information seeking activities, information seeking problems, and access to specific information channels or sources. Quantitative analysis was carried out on the data collected through the online questionnaire. A total of 668 responses were returned from the chosen user categories of academic staff, research staff and research students. This research has shown that there are some information seeking activities and information seeking problems that are the most common causes of uncertainty among significant number of users from different disciplines, age, gender, ICT skills, etc. This is also the case with respect to access to and use of specific information sources/channels, although the degrees of uncertainty in relation are relatively small. Possible implications of this study and further research issues are indicated.  相似文献   

9.
Documents circulating in paper form are increasingly being substituted by its electronic equivalent in the modern office today so that any stored document can be retrieved whenever needed later on. The office worker is already burdened with information overload, so effective and efficient retrieval facilities become an important factor affecting worker productivity. This paper first reviews the features of current document management systems with varying facilities to manage, store and retrieve either reference to documents or whole documents. Information retrieval databases, groupware products and workflow management systems are presented as developments to handle different needs, together with the underlying concepts of knowledge management. The two problems of worker finiteness and worker ignorance remain outstanding, as they are only partially addressed by the above-mentioned systems. The solution lies in a shift away from pull technology where the user has to actively initiate the request for information towards push technology, where available information is automatically delivered without user intervention. Intelligent information retrieval agents are presented as a solution together with a marketing scenario of how they can be introduced.  相似文献   

10.
How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based retrieval results presentation is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results, are major features of an heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.  相似文献   

11.
Information Retrieval (IR) develops complex systems, constituted of several components, which aim at returning and optimally ranking the most relevant documents in response to user queries. In this context, experimental evaluation plays a central role, since it allows for measuring IR systems effectiveness, increasing the understanding of their functioning, and better directing the efforts for improving them. Current evaluation methodologies are limited by two major factors: (i) IR systems are evaluated as “black boxes”, since it is not possible to decompose the contributions of the different components, e.g., stop lists, stemmers, and IR models; (ii) given that it is not possible to predict the effectiveness of an IR system, both academia and industry need to explore huge numbers of systems, originated by large combinatorial compositions of their components, to understand how they perform and how these components interact together.We propose a Combinatorial visuaL Analytics system for Information Retrieval Evaluation (CLAIRE) which allows for exploring and making sense of the performances of a large amount of IR systems, in order to quickly and intuitively grasp which system configurations are preferred, what are the contributions of the different components and how these components interact together.The CLAIRE system is then validated against use cases based on several test collections using a wide set of systems, generated by a combinatorial composition of several off-the-shelf components, representing the most common denominator almost always present in English IR systems. In particular, we validate the findings enabled by CLAIRE with respect to consolidated deep statistical analyses and we show that the CLAIRE system allows the generation of new insights, which were not detectable with traditional approaches.  相似文献   

12.
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations.Part 1 covers the foundations and the model development for document collection and relevance data, along with the test apparatus. Part 2 covers the further development and elaboration of the model, with extensive testing, and briefly considers other environment conditions and tasks, model training, concluding with comparisons with other approaches and an overall assessment.Data and results tables for both parts are given in Part 1. Key results are summarised in Part 2.  相似文献   

13.
This paper investigates the influence of user characteristics (e.g. search experience and cognitive skills) on user effectiveness. A user study was conducted to investigate this effect, 56 participants completed searches for 56 topics using the TREC test collection. Results indicated that participants with search experience and high cognitive skills were more effective than those with less experience and slower perceptual abilities. However, all users rated themselves with the same level of satisfaction with the search results despite the fact they varied substantially in their effectiveness. Therefore, information retrieval evaluators should take these factors into consideration when investigating the impact of system effectiveness on user effectiveness.  相似文献   

14.
Mobile agent technology has been used in various applications including e-commerce, information processing, distributed network management, and database access. Information search and retrieval can be conducted by mobile agents in a decentralized system. As compared with the client/server model, the mobile agent approach has an advantage of saving network bandwidth and offering flexibility in information search and retrieval. In this paper, we present a model for mobile agents to select the most reputable information host to search and retrieve information. We use opinion-based belief structure to represent, aggregate and calculate the reputation of an information host. Since reputation is a multi-faced concept, our approach first allows the users to rank each information host's quality of service based on a set of evaluation categories. Then, a comprehensive, final reputation of the host is obtained by aggregating those specific category reputations. To recognize the subjective nature of a reputation, the transferable belief model is used to represent and rank the category reputation. Experiments are conducted using the Aglets technology to illustrate mobile agent migration.  相似文献   

15.
In Mongolian, two different alphabets are used, Cyrillic and Mongolian. In this paper, we focus solely on the Mongolian language using the Cyrillic alphabet, in which a content word can be inflected when concatenated with one or more suffixes. Identifying the original form of content words is crucial for natural language processing and information retrieval. We propose a lemmatization method for Mongolian. The advantage of our lemmatization method is that it does not rely on noun dictionaries, enabling us to lemmatize out-of-dictionary words. We also apply our method to indexing for information retrieval. We use newspaper articles and technical abstracts in experiments that show the effectiveness of our method. Our research is the first significant exploration of the effectiveness of lemmatization for information retrieval in Mongolian.  相似文献   

16.
The Condorcet fusion is a distinctive fusion method and was found useful in information retrieval. Two basic requirements for the Condorcet fusion to improve retrieval effectiveness are: (1) all component systems involved should be more or less equally effective; and (2) each information retrieval system should be developed independently and thus each component result is more or less equally different from the others. These two requirements may not be satisfied in many cases, then weighted Condorcet becomes a good option. However, how to assign weights for the weighted Condorcet has not been investigated.  相似文献   

17.
This paper explores the integration of textual and visual information for cross-language image retrieval. An approach which automatically transforms textual queries into visual representations is proposed. First, we mine the relationships between text and images and employ the mined relationships to construct visual queries from textual ones. Then, the retrieval results of textual and visual queries are combined. To evaluate the proposed approach, we conduct English monolingual and Chinese–English cross-language retrieval experiments. The selection of suitable textual query terms to construct visual queries is the major issue. Experimental results show that the proposed approach improves retrieval performance, and use of nouns is appropriate to generate visual queries.  相似文献   

18.
The term mismatch problem in information retrieval is a critical problem, and several techniques have been developed, such as query expansion, cluster-based retrieval and dimensionality reduction to resolve this issue. Of these techniques, this paper performs an empirical study on query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance by performing experimentations on seven test collections of NTCIR and TREC.  相似文献   

19.
In data fusion, the linear combination method is a very flexible method since different weights can be assigned to different systems. However, it remains an open question which weighting schema should be used. In some previous investigations and experiments, a simple weighting schema was used: for a system, its weight is assigned as its average performance over a group of training queries. However, it is not clear if this weighting schema is good or not. In some other investigations, different numerical optimisation methods were used to search for appropriate weights for the component systems. One major problem with those numerical optimisation methods is their low efficiency. It might not be feasible to use them in some situations, for example in some dynamic environments, system weights need to be updated from time to time for reasonably good performance. In this paper, we investigate the weighting issue by extensive experiments. The key point is to try to find the relation between performances of component systems and their corresponding weights which can lead to good fusion performance. We demonstrate that a series of power functions of average performance, which can be implemented as efficiently as the simple weighting schema, is more effective than the simple weighting schema for the linear data fusion method. Some other features of the power function weighting schema and the linear combination method are also investigated. The observations obtained from this study can be used directly in fusion applications of component retrieval results. The observations are also very useful for optimisation methods to choose better starting points and therefore to obtain more effective weights more quickly.  相似文献   

20.
Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based on only the target document collections. First, we discuss two kinds of disambiguation technique: (1) one is a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback. Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English language as a pivot. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparable high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verity our findings. Again, similar results were observed except that the Dice coefficient outperforms slightly the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号