1.
Cross-language information retrieval (CLIR) has so far been studied under the assumption that rich linguistic resources such as bilingual dictionaries or parallel corpora are available. But creating such high-quality resources is labor-intensive, and they are not always at hand. In this paper we investigate the feasibility of using only comparable corpora for CLIR, without relying on other linguistic resources. Comparable corpora are text documents in different languages that cover similar topics and are often naturally attainable (e.g., news articles published in different languages in the same time period). We adapt an existing cross-lingual word association mining method and incorporate it into a language modeling approach to cross-language retrieval. We investigate different strategies for estimating the target query language models. Our evaluation results on the TREC Arabic–English cross-lingual data show that the proposed method is effective for the CLIR task, demonstrating that it is feasible to perform cross-lingual information retrieval with just comparable corpora.
2.
When we speak of information retrieval, we often mean text retrieval, but many other forms of information retrieval application exist. A typical example is collaborative filtering, which suggests interesting items to a user by taking into account other users' preferences or tastes. Because of the uniqueness of the problem, it has been modeled and studied differently in the past, mainly from the viewpoint of preference prediction and machine learning. Only a few attempts have so far been made to bring collaborative filtering back to information (text) retrieval modeling, and new collaborative filtering techniques have been derived as a result. In this paper, we show that, from the algorithmic viewpoint, there is an even closer relationship between collaborative filtering and text retrieval. Specifically, major collaborative filtering algorithms, such as the memory-based ones, essentially calculate the dot product between the user vector (the query vector in text retrieval) and the item rating vector (the document vector in text retrieval). Thus, if we properly structure user preference data and employ the target user's ratings as query input, major text retrieval algorithms and systems can be used directly without any modification. In this regard, we propose a unified formulation under a common notational framework for memory-based collaborative filtering, and a technique for using any text retrieval weighting function with collaborative filtering preference data. Besides confirming the rationale of the framework, our preliminary experimental results also demonstrate the effectiveness of using text retrieval models and systems to perform item ranking tasks in collaborative filtering.
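The dot-product view described in this abstract can be illustrated with a minimal sketch. It assumes an item-based variant in which the target user's ratings act as the query and each candidate item is a "document" whose term weights are its cosine similarities to other items; the function names (`cosine`, `build_item_index`, `rank_items`) and the choice of cosine similarity are illustrative assumptions, not the paper's formulation.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    num = sum(a[k] * b[k] for k in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def build_item_index(ratings):
    """Turn {user: {item: rating}} into {item: {other_item: similarity}}.

    Each item plays the role of a document; its 'terms' are the other items,
    weighted by similarity (an assumption made for this sketch).
    """
    by_item = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return {
        i: {j: cosine(vec_i, vec_j) for j, vec_j in by_item.items() if j != i}
        for i, vec_i in by_item.items()
    }

def rank_items(user_ratings, index):
    """Score each unseen item by the dot product of query and document vectors."""
    scores = {}
    for item, doc in index.items():
        if item in user_ratings:
            continue  # do not recommend items the user has already rated
        scores[item] = sum(r * doc.get(j, 0.0) for j, r in user_ratings.items())
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy preference data (hypothetical).
ratings = {
    "u1": {"a": 5, "b": 5},
    "u2": {"a": 4, "b": 5, "c": 1},
    "u3": {"c": 5, "d": 5},
}
ranking = rank_items({"a": 5}, build_item_index(ratings))
```

Because the scoring step is a plain dot product over sparse vectors, the same data could in principle be loaded into an inverted index and ranked by a text retrieval engine, which is the point the abstract makes.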
3.
Meriem Amina Zingla, Chiraz Latiri, Philippe Mulhem, Catherine Berrut, Yahya Slimani. Information Retrieval, 2018, 21(4): 337-367
Query expansion (QE) is an important process in information retrieval applications that improves the user query and helps in retrieving relevant results. In this paper, we introduce a hybrid query expansion model (HQE) that investigates how external resources can be combined with association rule mining and used to enhance the generation and selection of expansion terms. The HQE model can be run in different configurations, starting from methods based on association rules and combining them with external knowledge. The HQE model handles the two main phases of a QE process, namely the candidate term generation phase and the selection phase. For the first phase, we propose statistical, semantic and conceptual methods to generate new related terms for a given query. For the second phase, we introduce a similarity measure, ESAC, based on Explicit Semantic Analysis, that computes the relatedness between a query and the set of candidate terms. The performance of the proposed HQE model is evaluated in two experimental validations. The first addresses the tweet search task proposed by the TREC Microblog Track 2011 and an ad hoc IR task related to the hard topics of TREC Robust 2004. The second concerns the tweet contextualization task organized by INEX 2014. Overall results highlight the effectiveness of our HQE model and of association rule mining for QE combined with external resources.
4.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in multilingual federated search environments is to translate the user query into the language of each information source and run a monolingual search in each source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources that are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other hand, a more effective merging method was proposed that downloads and translates all retrieved documents into the source language and generates the final ranked list by running a monolingual search in the search client. The latter method is more effective but incurs large online communication and computation costs. This paper proposes an effective and efficient approach to the results merging task for multilingual ranked lists. In particular, it downloads only a small number of documents from the individual ranked lists of each user query and calculates comparable document scores for them by using both the query-based translation method and the document-based translation method. Query-specific and source-specific transformation models can then be trained for the individual ranked lists using the information from these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents, so that the documents can be sorted into a final ranked list. This merging approach is efficient because only a subset of the retrieved documents is downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of this work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
Hao Yuan
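The idea of query-specific and source-specific transformation models can be sketched as follows. This is a minimal illustration that assumes a simple linear model fitted by least squares on the few downloaded documents; the abstract does not say which model family is used, and the names `fit_linear`, `merge`, `ranked_lists` and `training_pairs` are hypothetical.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # degenerate case: all source scores identical
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x

def merge(ranked_lists, training_pairs):
    """Merge per-source ranked lists for one query.

    ranked_lists:   {source: [(doc_id, source_score), ...]}
    training_pairs: {source: [(source_score, comparable_score), ...]}
                    for the small set of downloaded documents only.
    """
    merged = []
    for source, docs in ranked_lists.items():
        xs, ys = zip(*training_pairs[source])
        a, b = fit_linear(xs, ys)  # one model per query and per source
        merged.extend((doc_id, a * score + b) for doc_id, score in docs)
    return sorted(merged, key=lambda d: d[1], reverse=True)

# Toy data (hypothetical): two sources with incomparable score scales.
lists = {
    "en": [("e1", 10.0), ("e2", 8.0)],
    "fr": [("f1", 0.9), ("f2", 0.5)],
}
pairs = {
    "en": [(10.0, 0.5), (8.0, 0.4)],
    "fr": [(0.9, 0.8), (0.5, 0.2)],
}
final = merge(lists, pairs)
```

Only the documents in `training_pairs` need to be downloaded and translated; every other document is scored through the fitted model, which is where the efficiency gain described in the abstract comes from.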
5.
This paper, which deals with Swedish full-text retrieval, studies the problem of morphological variation of query terms in the document database. The Swedish CLEF 2003 test collection was used, and the effects on retrieval effectiveness of combining indexing strategies with query terms were studied. Four of the seven tested combinations involved indexing strategies that used normalization, a form of conflation. All four of these combinations employed compound splitting, both during indexing and at query time. SWETWOL, a morphological analyzer for the Swedish language, was used for normalization and compound splitting. A fifth combination used stemming, while a sixth attempted to group related terms by right-hand truncation of query terms. The truncation was performed by a search expert. These six combinations were compared with each other and with a baseline combination, in which no attempt was made to counteract the problem of morphological variation of query terms in the document database. The truncation combination, the four combinations based on normalization, and the stemming combination all outperformed the baseline. Truncation performed best. The main conclusion of the paper is that truncation, normalization and stemming enhanced retrieval effectiveness in comparison with the baseline. Further, normalization and stemming were not far below truncation.
6.
In retrieving medical free text, users are often interested in answers pertinent to certain scenarios that correspond to common tasks performed in medical practice, e.g., treatment or diagnosis of a disease. A major challenge in handling such queries is that scenario terms in the query (e.g., treatment) are often too general to match specialized terms in relevant documents (e.g., chemotherapy). In this paper, we propose a knowledge-based query expansion method that exploits the UMLS knowledge source to expand the original query with additional terms that are specifically relevant to the query's scenario(s). We compared the proposed method with traditional statistical expansion, which adds terms that are statistically correlated but not necessarily scenario-specific. Our study on two standard testbeds shows that the knowledge-based method, by providing scenario-specific expansion, yields notable improvements over the statistical method in terms of average precision-recall. On the OHSUMED testbed, for example, the improvement is more than 5% averaged over all scenario-specific queries studied and about 10% for queries that mention certain scenarios, such as treatment of a disease and differential diagnosis of a symptom/disease.
Wesley W. Chu
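Scenario-specific expansion of the kind described above can be sketched in a few lines. The small `KNOWLEDGE` table below is a hypothetical stand-in for a knowledge source and does not reflect the actual UMLS data model or any UMLS API; the matching by substring is likewise a simplifying assumption.

```python
# Hypothetical stand-in for a knowledge source:
# (disease concept, scenario) -> terms specific to that scenario.
KNOWLEDGE = {
    ("lung cancer", "treatment"): ["chemotherapy", "radiotherapy"],
    ("lung cancer", "diagnosis"): ["biopsy", "chest x-ray"],
}

def expand_query(query, knowledge):
    """Append terms linked to both the concept and the scenario in the query."""
    q = query.lower()
    expansion = []
    for (concept, scenario), terms in knowledge.items():
        if concept in q and scenario in q:
            # Keep only terms that are new to the query and not yet added.
            expansion.extend(t for t in terms if t not in q and t not in expansion)
    return query.split() + expansion

expanded = expand_query("lung cancer treatment", KNOWLEDGE)
```

A purely statistical expansion would instead add whatever terms co-occur with the query terms, which may include terms from other scenarios such as diagnosis; here only the matching scenario contributes terms.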
7.
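Search Methods and Techniques for Chinese Full-Text Journal Databases (total citations: 5; self-citations: 0; citations by others: 5)
Now that humanity has entered the information age, computer-based information retrieval has become a basic skill for professionals of every kind. At present, both ordinary information users and full-time search specialists suffer from limited search experience and low search proficiency. This article therefore takes the two most influential and most widely used full-text databases in China as examples, analyzes and compares their search functions and features, and discusses how to formulate and optimize search strategies.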
8.
The effectiveness of a video retrieval system largely depends on the choice of underlying text and image retrieval components. The unique properties of video collections (e.g., multiple sources, noisy features and temporal relations) suggest that we examine the performance of these retrieval methods in such a multimodal environment and identify the relative importance of the underlying retrieval components. In this paper, we review a variety of text/image retrieval approaches as well as their individual components in the context of broadcast news video. Numerous components of text/image retrieval are discussed in detail, including retrieval models, text sources, temporal expansion methods, query expansion methods, image features, and similarity measures. For each component, we conduct a series of retrieval experiments on TRECVID video collections to identify its advantages and disadvantages. To provide more complete coverage of video retrieval, we briefly discuss an emerging approach called concept-based video retrieval and review strategies for combining multiple retrieval outputs.
9.
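This article introduces the background to choosing the TRS full-text retrieval system for building a dynamic library website, describes the architecture of the TRS system and the functions of its components, and discusses in detail the application of the TRS system in a library website, including, on the basis of the construction work, solutions to some practical problems that arose in use.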
10.
Multimodal biomedical image indexing and retrieval using descriptive text and global feature mapping
Matthew S. Simpson, Dina Demner-Fushman, Sameer K. Antani, George R. Thoma. Information Retrieval, 2014, 17(3): 229-264
The images found within biomedical articles are sources of essential information useful for a variety of tasks. Due to the rapid growth of biomedical knowledge, image retrieval systems are increasingly becoming necessary tools for quickly accessing the most relevant images from the literature for a given information need. Unfortunately, article text can be a poor substitute for image content, limiting the effectiveness of existing text-based retrieval methods. Additionally, the use of visual similarity by content-based retrieval methods as the sole indicator of image relevance is problematic, since the importance of an image can depend on its context rather than its appearance. For biomedical image retrieval, multimodal approaches are often desirable. We describe in this work a practical multimodal solution for indexing and retrieving the images contained in biomedical articles. Recognizing the importance of text in determining image relevance, our method combines a predominantly text-based image representation with a limited amount of visual information, in the form of quantized content-based visual features, through a process called global feature mapping. The resulting multimodal image surrogates are easily indexed and searched using existing text-based retrieval systems. Our experimental results demonstrate that our multimodal strategy significantly improves upon the retrieval accuracy of existing approaches. In addition, unlike many retrieval methods that utilize content-based visual features, the response time of our approach is negligible, making it suitable for use with large collections.
11.
12.
On information retrieval metrics designed for evaluation with incomplete relevance assessments (total citations: 1; self-citations: 0; citations by others: 1)
Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention. This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs: the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of Type I error, and on Kendall's rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance data sets. According to these experiments, Q′, nDCG′ and AP′ proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased Precision by examining their formal definitions.
Noriko Kando
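Kendall's rank correlation between two system rankings, as used in this kind of robustness study, can be computed with a short sketch. It assumes the tau-a variant (tied pairs count as neither concordant nor discordant) and uses made-up scores; the abstract does not specify which variant the authors use.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two rankings given as {system: score} dicts."""
    systems = sorted(set(scores_a) & set(scores_b))
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both score sets order the pair the same way.
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0

# Hypothetical metric values with full and with reduced relevance data.
full = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
reduced = {"A": 0.35, "B": 0.36, "C": 0.20, "D": 0.05}
tau = kendall_tau(full, reduced)
```

In this toy case only the pair (A, B) swaps order after the reduction, so five of the six pairs are concordant and one is discordant. A metric that is robust to incomplete judgments keeps tau close to 1 as the relevance data shrink.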
13.
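The Song dynasty was an extremely important period in the history of the study of editions of ancient Chinese books. Taking edition authentication as an example, Song scholars began to form their own system of methods, a pioneering achievement. This article summarizes and evaluates the edition authentication methods of Song scholars from two main perspectives: authentication by content and authentication by form.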
14.
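The continual discovery of primitive pottery symbols in archaeological excavations offers a possible way to explore the origins of editorial thinking. The Dinggong pottery inscriptions reflect an unsophisticated editorial awareness among early people of linking symbols into a text and arranging them in regular order, while the Longqiuzhuang pottery symbols reflect a nascent editorial aesthetic in which pictures and symbols complement each other. From a certain point of view, the history of the proliferation of Chinese characters is a history of continually recombining components. This combinatorial way of thinking is of a piece with the ideas of compilation and composition in later editorial work. In fact, it is precisely the existence of the editing of written characters that has driven the successive stages in the evolution of the script.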
15.
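Design and Implementation of a TRS-Based Student Assignment Submission System for a Literature Retrieval Course (total citations: 1; self-citations: 0; citations by others: 1)
This article systematically describes the approach and methods used to develop and implement a "student assignment submission system for the literature retrieval course" with TRS products. Operation has confirmed that the system effectively reduces teachers' marking workload and makes it easier for students to submit assignments, and that it is of significant value for improving the teaching quality of the literature retrieval course.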
16.
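On the Retrieval Awareness of Sci-Tech Journal Editors (total citations: 7; self-citations: 2; citations by others: 7)
A scientific paper passes through review and editing to publication, and is then indexed by secondary-literature services and databases in retrieval tools for readers to use; retrieval plays an important role throughout this process. Accurate and sound retrieval methods make information transfer faster and more effective. Editors are the first people to handle the original paper. The quality of their work affects not only the quality of the paper itself but also, through their editing of its title, abstract and keywords, its indexing and inclusion by secondary-literature services and databases, and in turn readers' retrieval efficiency and the citation frequency of the journal, which makes their work especially important.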
17.
18.
Robert W. P. Luk. Information Retrieval, 2008, 11(6): 539-561
This paper discusses various issues concerning the rank equivalence of Lafferty and Zhai between the log-odds ratio and the query likelihood of probabilistic retrieval models. It highlights that Robertson's concerns about this equivalence may arise when multiple probability distributions are assumed to be uniformly distributed, after assuming that the marginal probability logically follows from Kolmogorov's probability axioms. It also clarifies that there are two types of rank equivalence relations between probabilistic models, namely strict and weak rank equivalence. This paper focuses on strict rank equivalence, which requires the event spaces of the participating probabilistic models to be identical. It is possible for two probabilistic models to be strictly rank equivalent when they use different probability estimation methods. This paper shows that the query likelihood, p(q|d, r), is strictly rank equivalent to p(q|d) of the language model of Ponte and Croft by applying assumptions 1 and 2 of Lafferty and Zhai. In addition, some statistical component language models may be strictly rank equivalent to the log-odds ratio, and some statistical component models using the log-odds ratio may be strictly rank equivalent to the query likelihood. Finally, we suggest adding a random variable for the user information need to the probabilistic retrieval models for clarification when these models deal with multiple requests.
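For orientation, the rank equivalence between the log-odds ratio and the query likelihood can be sketched as below. The wording of the two assumptions is a paraphrase of how they are commonly stated and is not taken from this abstract: assumption 1 is read as the query being independent of the document given non-relevance, and assumption 2 as the document being independent of relevance a priori.

```latex
\begin{align*}
\log \frac{p(r \mid q, d)}{p(\bar r \mid q, d)}
  &= \log \frac{p(q \mid d, r)}{p(q \mid d, \bar r)}
     + \log \frac{p(r \mid d)}{p(\bar r \mid d)}
     && \text{(Bayes' rule)} \\
  &= \log \frac{p(q \mid d, r)}{p(q \mid \bar r)}
     + \log \frac{p(r \mid d)}{p(\bar r \mid d)}
     && \text{(assumption 1: } p(q \mid d, \bar r) = p(q \mid \bar r)\text{)} \\
  &\overset{\text{rank}}{=} \log p(q \mid d, r)
     + \log \frac{p(r \mid d)}{p(\bar r \mid d)}
     && \text{(} p(q \mid \bar r) \text{ does not depend on } d\text{)} \\
  &\overset{\text{rank}}{=} \log p(q \mid d, r)
     && \text{(assumption 2: } p(r \mid d) = p(r)\text{)}
\end{align*}
```

Each step marked "rank" drops a term that is the same for every document under a fixed query, so the document ordering is unchanged even though the scores differ.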
19.
With the help of a team of expert biologist judges, the TREC Genomics track has generated four large sets of "gold standard" test collections, comprising over a hundred unique topics, two kinds of ad hoc retrieval tasks, and their corresponding relevance judgments. Over the years of the track, increasingly complex tasks necessitated the creation of judging tools and training guidelines to accommodate teams of part-time, short-term workers from a variety of specialized biological scientific backgrounds, and to address the consistency and reproducibility of the assessment process. Important lessons were learned about factors that influenced the utility of the test collections, including topic design, annotations provided by judges, methods used for identifying and training judges, and the provision of a central moderator "meta-judge".
20.
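A Discussion on Establishing an Information Resource Support System for Sci-Tech Novelty Search (total citations: 9; self-citations: 0; citations by others: 9)
This article discusses the importance of literature and information resource support in sci-tech novelty search, and explores the feasibility, overall approach, scope and measures for establishing a literature and information resource support system for sci-tech novelty search.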