
1.

Using Statistical Term Similarity for Sense Disambiguation in Cross-Language Information Retrieval Total citations: 2, self-citations: 0, citations by others: 2

Mirna Adriani 《Information Retrieval》2000,2(1):71-82

With the increasing availability of machine-readable bilingual dictionaries, dictionary-based automatic query translation has become a viable approach to Cross-Language Information Retrieval (CLIR). In this approach, resolving term ambiguity is a crucial step. We propose a sense disambiguation technique based on a term-similarity measure for selecting the right translation sense of a query term. In addition, we apply a query expansion technique which is also based on the term similarity measure to improve the effectiveness of the translation queries. The results of our Indonesian to English and English to Indonesian CLIR experiments demonstrate the effectiveness of the sense disambiguation technique. As for the query expansion technique, it is shown to be effective as long as the term ambiguity in the queries has been resolved. In the effort to solve the term ambiguity problem, we discovered that differences in the pattern of word-formation between the two languages render query translations from one language to the other difficult.

2.

A Nonlinear Weighting Method for Topic Extraction Total citations: 15, self-citations: 0, citations by others: 15

韩客松 王永成 《情报学报》2000,19(6):650-653

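Topic extraction is an important task in text processing. This paper first analyzes some quantitative issues that arise when weighting methods for topic extraction are designed, and then proposes a nonlinear weighting method for topic-related words. Comparative experiments show that the method is not only fairly robust but also improves the accuracy of topic extraction to some extent.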

3.

Incremental Relevance Feedback in Japanese Text Retrieval

Gareth Jones Tetsuya Sakai Masahiro Kajiura Kazuo Sumita 《Information Retrieval》2000,2(4):361-384

The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval, examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using number-to-view graphs. Experimental results on the standard BMIR-J2 Japanese language retrieval collection show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, and also in a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones.

4.

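Research and Implementation of Chinese Full-Text Retrieval Technology Total citations: 9, self-citations: 0, citations by others: 9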

李梅 王庆林 《情报学报》2003,22(1):10-17

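This paper designs a Chinese full-text retrieval system. Building on a full-text database indexed by single Chinese characters, it studies full-text retrieval algorithms and proposes scoring formulas for specific retrieval strategies. It also discusses how to rank the result set and uses the amount of user feedback information so that the final results are continually optimized in use.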

5.

Research on Search Result Clustering Based on Keyword Co-occurrence Analysis

李枫林 何洲芳 《情报学报》2011,30(8)

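As the Internet expands rapidly, improving the effectiveness of information retrieval has become quite difficult. This paper first extracts the keywords of each document with a specific algorithm, then uses statistical methods to count the keywords that co-occur across documents and builds the corresponding co-occurrence keyword label matrix, and finally applies a hierarchical clustering algorithm to the co-occurrence keyword labels to form a hierarchical label tree from which document clusters are constructed. The method can effectively categorize the results returned by the source search engine, letting users view information related to the query terms at a higher topical level and accurately locate the information they are interested in. A comparison with the Lingo algorithm shows that the labels produced by the proposed algorithm are more readable and more general, and the F-measure also indicates some improvement in text clustering quality.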

6.

A Semantics-Oriented Information Retrieval Method Total citations: 1, self-citations: 0, citations by others: 1

张明宝 马静 《情报学报》2009,28(4)

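Traditional information retrieval techniques overlook the influence of semantics on the retrieval process, which is an important reason for low precision. This paper proposes a semantics-oriented information retrieval method that uses HowNet-based semantic processing to annotate both user queries and target documents semantically, and HowNet-based lexical chains to filter the feature terms of documents. This enables matching at the semantic level and reduces the interference of large numbers of irrelevant terms with the retrieval results. The paper describes SOIRS, an information retrieval system that implements the method, and reports a comparative experiment against a traditional retrieval system. The results show that the semantics-oriented method clearly outperforms the traditional method in precision.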

7.

Relevance Feedback Techniques in Information Retrieval Systems Total citations: 2, self-citations: 0, citations by others: 2

宋玲丽 成颖 单启成 《情报学报》2005,24(1):34-41

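This paper reviews the relevance feedback techniques used in the Boolean model, the vector space model, and the probabilistic model, focusing on two of them: adjusting query term weights and query expansion. The authors also discuss how to evaluate the effect of relevance feedback on retrieval performance, and identify problems that must be solved before relevance feedback can be applied in practice.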
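Query term reweighting and query expansion in the vector space model are often illustrated with a Rocchio-style update. The following is a minimal sketch of that general technique, not the formulation reviewed in the paper; the function name `rocchio_update` and the default values of `alpha`, `beta`, and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted, expanded query vector (term -> weight).

    query: dict mapping term -> weight.
    relevant / nonrelevant: lists of document vectors (dicts term -> weight)
    judged relevant or non-relevant by the user.
    alpha, beta, gamma are illustrative defaults, not values from the paper.
    """
    new_query = defaultdict(float)

    # Keep the original query terms, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight

    # Add the centroid of relevant documents and subtract the centroid
    # of non-relevant documents.
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] += coeff * weight / len(docs)

    # Terms pushed to a non-positive weight are dropped from the query.
    return {term: w for term, w in new_query.items() if w > 0}
```

Terms that appear only in relevant documents enter the query (expansion), while terms already in the query have their weights adjusted (reweighting).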

8.

On Event Spaces and Probabilistic Models in Information Retrieval

Stephen Robertson 《Information Retrieval》2005,8(2):319-329

A basic notion of probability theory is the event space, on which the probability measure is defined. A probabilistic model needs an event space. However, some classes of events (which we may want to model probabilistically) exhibit structure which does not fit well into the traditional event space notion. A simple one-to-many example is discussed at length. The information retrieval case, involving queries, documents and relevance, is analysed. The event space issue makes for some difficulty in comparing different probabilistic models in IR. Revised version of a paper presented at the MF/IR Workshop, SIGIR 2002, Tampere, Finland, under the title "On Bayesian models and event spaces in information retrieval".

9.

Information Retrieval Using Speech Recognition Total citations: 7, self-citations: 1, citations by others: 7

陈海英 于金辉 《情报学报》2003,22(1):18-21

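This paper first briefly introduces the main ideas of speech recognition technology, then focuses on using it to retrieve text, image, speech, and audio information on the Web, and proposes solutions to the problems caused by the current limitations of speech recognition technology.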

10.

Applying Machine Learning to Text Segmentation for Information Retrieval Total citations: 2, self-citations: 0, citations by others: 2

Xiangji Huang Fuchun Peng Dale Schuurmans Nick Cercone Stephen E. Robertson 《Information Retrieval》2003,6(3-4):333-362

We propose a self-supervised word segmentation technique for text segmentation in Chinese information retrieval. This method combines the advantages of traditional dictionary based, character based and mutual information based approaches, while overcoming many of their shortcomings. Experiments on TREC data show this method is promising. Our method is completely language independent and unsupervised, which provides a promising avenue for constructing accurate multi-lingual or cross-lingual information retrieval systems that are flexible and adaptive. We find that although the segmentation accuracy of self-supervised segmentation is not as high as some other segmentation methods, it is enough to give good retrieval performance. It is commonly believed that word segmentation accuracy is monotonically related to retrieval performance in Chinese information retrieval. However, for Chinese, we find that the relationship between segmentation and retrieval performance is in fact nonmonotonic; that is, at around 70% word segmentation accuracy an over-segmentation phenomenon begins to occur which leads to a reduction in information retrieval performance. We demonstrate this effect by presenting an empirical investigation of information retrieval on Chinese TREC data, using a wide variety of word segmentation algorithms with word segmentation accuracies ranging from 44% to 95%, including 70% word segmentation accuracy from our self-supervised word-segmentation approach. It appears that the main reason for the drop in retrieval performance is that correct compounds and collocations are preserved by accurate segmenters, while they are broken up by less accurate (but reasonable) segmenters, to a surprising advantage. This suggests that words themselves might be too broad a notion to conveniently capture the general semantic meaning of Chinese text. 
Our research suggests machine learning techniques can play an important role in building adaptable information retrieval systems and different evaluation standards for word segmentation should be given to different applications.

11.

Improving Relevance Based on Query Term Occurrence

赵东生 单栋栋 闫宏飞 《情报学报》2011,30(4)

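Improving the relevance of the results returned by an information retrieval system has long been an important research topic in the field. This paper first introduces the concept of query term occurrence information, then gives a formal representation of the query term occurrence weight and combines it with the BM25 model. Two methods are used to compute the query term occurrence weight: linear weighting and factor weighting. Experiments on the GOV2 collection show that, with either method, adding the query term occurrence weight effectively improves the relevance of the retrieval results. For the TREC 2005 queries, the improvement in MAP reaches 15.78% and the improvement in p@10 reaches 3468%. The method described in this paper has been applied in the TREC 2009 Web Track.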
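To make the idea of combining an extra weight with BM25 concrete, here is a rough sketch. The BM25 part follows the commonly used formula; everything else is an assumption, because the abstract does not define the occurrence weight or the two combination rules. In particular, `occurrence_weight` (the fraction of distinct query terms present in the document), the parameter `lam`, and the "linear" and "factor" formulas are hypothetical stand-ins, not the authors' definitions.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """Standard BM25 score of one document for one query."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if tf[term] == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def occurrence_weight(query_terms, doc_terms):
    """HYPOTHETICAL weight: share of distinct query terms found in the document."""
    distinct = set(query_terms)
    if not distinct:
        return 0.0
    return len(distinct & set(doc_terms)) / len(distinct)


def combined_score(query_terms, doc_terms, doc_freq, num_docs, avg_len,
                   lam=0.5, mode="linear"):
    """Combine BM25 with the occurrence weight (both rules are assumptions)."""
    base = bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_len)
    occ = occurrence_weight(query_terms, doc_terms)
    if mode == "linear":
        return base + lam * occ        # additive combination
    if mode == "factor":
        return base * (1 + lam * occ)  # multiplicative combination
    raise ValueError(f"unknown mode: {mode}")
```

Under either rule, a document containing more of the distinct query terms gets a larger boost than one that matches a single term many times.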

12.

Using Corpus-Based Approaches in a System for Multilingual Information Retrieval

Martin Braschler Peter Schäuble 《Information Retrieval》2000,3(3):273-284

We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.

13.

Information Retrieval can Cope with Many Errors

Elke Mittendorf Peter Schäuble 《Information Retrieval》2000,3(3):189-216

The retrieval of documents that originate from digitized and OCR-converted paper documents is an important task for modern retrieval systems. The problems that OCR errors cause for the retrieval process have been subject to research for several years now. We approach the problem from a theoretical point of view and model OCR conversion as a random experiment. Our theoretical results, which are supported by experiments, show clearly that information retrieval can cope even with many errors. It is, however, important that the documents are not too short and that recognition errors are distributed appropriately among words and documents. These results disclose that an expensive manual or automatic post-processing of OCR-converted documents usually does not make sense, but that scanning and OCR must be performed in an appropriate way and with care.

14.

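The Impact of Chinese Word Segmentation on the Retrieval Performance of Chinese Search Engines Total citations: 3, self-citations: 0, citations by others: 3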

金澎 刘毅 王树梅 《情报学报》2006,25(1):21-24

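Taking the characteristics of Chinese web pages into account, this paper studies the impact of Chinese word segmentation on the retrieval performance of Chinese search engines. It first describes the role of Chinese word segmentation in search engines and then introduces commonly used segmentation algorithms. Using web page features, the authors propose a simple "bidirectional matching segmentation strategy with heuristic rules". Finally, the effects of various segmentation algorithms on recall and precision are compared experimentally on a 10 GB corpus; the results show that segmentation performance and retrieval performance are not directly proportional.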
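Bidirectional matching is usually built from forward and backward maximum matching against a dictionary. The sketch below shows that general technique only. The tie-breaking rules (prefer fewer words, then fewer single-character words, then the backward result) are a common textbook heuristic assumed here for illustration, not the web-page-based rules proposed in the paper, and the names `vocab` and `max_len` are likewise illustrative.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation, longest dictionary word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len=4):
    """Greedy right-to-left segmentation, longest dictionary word first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def bidirectional_match(text, vocab, max_len=4):
    """Pick between the two segmentations with simple, assumed heuristics."""
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if fwd == bwd:
        return fwd
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(words):
        return sum(1 for w in words if len(w) == 1)

    # On a tie in word count, prefer fewer single-character words;
    # if still tied, fall back to the backward result.
    return fwd if singles(fwd) < singles(bwd) else bwd
```

For the string 研究生命起源 with a dictionary containing 研究, 研究生, 生命, and 起源, forward matching splits off 研究生 and leaves a stray 命, while backward matching yields 研究 / 生命 / 起源, which the heuristic selects.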

15.

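An Approach to Building a Bibliographic Retrieval System Based on the FRBR Model Total citations: 1, self-citations: 0, citations by others: 1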

富平《数字图书馆论坛》2008,(2):28-39

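Applying the FRBR model is a current focus of digital library research; the model provides a theoretical foundation and an application model for institutions that must retrieve massive digital resources described by many types of metadata. At the application level, however, using the FRBR framework to build a bibliographic retrieval system is still being explored. By analyzing typical metadata records from the National Library of China, this paper proposes ideas and methods for applying the FRBR model.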

16.

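On Strengthening the Combined Search, Search Broadening, and Search Narrowing Functions of Document Description Methods in a Networked Environment Total citations: 8, self-citations: 0, citations by others: 8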

胡燕菘 伍宪 徐建华 《图书馆论坛》2000,20(4):29-31,46

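Discusses in some detail several methods for strengthening the combined search, search broadening, and search narrowing functions of document description methods in a networked environment.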

17.

Implementing Multi-Mode Retrieval of Network Databases with ASP Total citations: 9, self-citations: 0, citations by others: 9

李贺 王平 《情报学报》2000,19(6):592-597

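ASP is a complete solution for network databases recently released by Microsoft. This paper describes ASP's built-in objects, server components, and workflow in detail and, taking the network database of FAW Group as an example, presents a method for implementing multiple retrieval modes for network databases with ASP.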

18.

Problems and Countermeasures in Indexing with Subject Headings, Free Terms, and Keywords

熊定富《重庆图情研究》2007,8(3):54-56

Analyzes the problems that libraries encounter in subject indexing of documents and proposes countermeasures for them.

19.

On the Diversity of Information Retrieval Access Points Total citations: 7, self-citations: 0, citations by others: 7

赵玉玲 滕飞 《重庆图情研究》2007,8(1):40-43,34

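Unable to find a complete set of materials on information retrieval access points while teaching information retrieval courses, the authors compiled their own. After briefly introducing the concept and principles of information retrieval, the article proposes a variety of access points based on the formal features and content features of documents: title, author, serial number, citation, "name", source, relation, date, classification number, abstract, code, and subject term access points, where subject term access points include subject heading, uniterm, descriptor, and keyword access points, among others.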

20.

An Ontology-Based Binary-Categorization Approach for Recognizing Multiple-Record Web Documents Using a Probabilistic Retrieval Model

Quan Wang Yiu-Kai Ng 《Information Retrieval》2003,6(3-4):295-332

The Web contains a tremendous amount of information. It is challenging to determine which Web documents are relevant to a user query, and even more challenging to rank them according to their degrees of relevance. In this paper, we propose a probabilistic retrieval model using logistic regression for recognizing multiple-record Web documents against an application ontology, a simple conceptual modeling approach. We notice that many Web documents contain a sequence of chunks of textual information, each of which constitutes a record. Documents of this type are referred to as multiple-record documents. In our categorization approach, a document is represented by a set of term frequencies of index terms, a density heuristic value, and a grouping heuristic value. We first apply logistic regression analysis to the relevance probabilities using the (i) index terms, (ii) density value, and (iii) grouping value of each training document. Thereafter, the relevance probability of each test document is interpolated from the fitted curves. Contrary to other probabilistic retrieval models, our model makes only a weak independence assumption and is capable of handling any important dependent relationships among index terms. In addition, we use logistic regression, instead of linear regression analysis, because the relevance probabilities of training documents are discrete. Using a test set of car-ads and another one for obituary Web documents, our probabilistic model achieves an average recall ratio of 100%, precision ratio of 83.3%, and accuracy ratio of 92.5%.