Similar articles
Found 20 similar articles (search time: 0 ms)
1.
Evaluating the effectiveness of content-oriented XML retrieval methods   (Total citations: 1, self-citations: 0, citations by others: 1)
Content-oriented XML retrieval approaches aim at a more focused retrieval strategy: instead of retrieving whole documents, they should retrieve document components that are exhaustive with respect to the information need while being as specific as possible. In this article, we show that the evaluation methods developed for standard retrieval must be modified in order to deal with the structure of XML documents. More precisely, the size and overlap of document components must be taken into account. For this purpose, we propose a new effectiveness metric based on a concept space defined upon the notions of exhaustiveness and specificity of a search result. We compare the results of this new metric with the results obtained with the official metric used in INEX, the evaluation initiative for content-oriented XML retrieval.
Gabriella Kazai

2.
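XML and XML-based broadcast retrieval   (Total citations: 3, self-citations: 0, citations by others: 3)
Guo Shaoyou (郭少友), 《情报学报》, 2002, 21(5): 568-572
This paper describes the main features of XML in some detail and briefly introduces DTD and DOM technologies. Then, taking a search across the holdings of multiple libraries as an example, it gives a preliminary discussion of the basic approach to broadcast retrieval using XML technology.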

3.
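Application of XML in library acquisitions work   (Total citations: 3, self-citations: 0, citations by others: 3)
This paper discusses the possibility of using the Internet and XML metadata for electronic data interchange (EDI) between library acquisitions automation systems and book trade systems, and designs and implements an application example, based on XML documents, of a book acquisitions center and a library automation system.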

4.
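XML and its application in libraries   (Total citations: 10, self-citations: 1, citations by others: 10)
After briefly introducing the definition, characteristics, functions, and uses of XML, the article discusses the application prospects of XML technology in libraries.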

5.
This paper investigates the impact of three approaches to XML retrieval: using Zettair, a full-text information retrieval system; using eXist, a native XML database; and using a hybrid system that takes full article answers from Zettair and uses eXist to extract elements from those articles. For the content-only topics, we undertake a preliminary analysis of the INEX 2003 relevance assessments in order to identify the types of highly relevant document components. Further analysis identifies two complementary sub-cases of relevance assessments (General and Specific) and two categories of topics (Broad and Narrow). We develop a novel retrieval module that, for a content-only topic, utilises the information from the resulting answer list of a native XML database and dynamically determines the preferable units of retrieval, which we call Coherent Retrieval Elements. The results of our experiments show that, when each of the three systems is evaluated against different retrieval scenarios (such as different cases of relevance assessments, different topic categories and different choices of evaluation metrics), the XML retrieval systems exhibit varying behaviour and the best performance can be reached for different values of the retrieval parameters. In the case of INEX 2003 relevance assessments for the content-only topics, our newly developed hybrid XML retrieval system is substantially more effective than either Zettair or eXist, and yields robust and very effective XML retrieval.

6.
7.
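A comparative analysis of the coverage, search results, and level of attention of three databases (PubMed, BIOSIS Previews, and EMBASE.com) provides a basis for choosing English-language medical search tools appropriately when searching to select a topic or establish a project in medical research, and helps improve recall for foreign-language literature.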

8.
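Application of XML in digital libraries   (Total citations: 3, self-citations: 0, citations by others: 3)
This paper introduces the origin and development of XML and the background and concept of digital libraries, and discusses the application of XML in digital libraries.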

9.
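Sets out the various concepts of the specificity of retrieval identifiers, the concept of retrieval efficiency, the relationship between identifier specificity and retrieval efficiency, measures for improving specificity at the three stages of the overall literature retrieval process, the issue of keeping specificity at an appropriate level, and the issue of specificity in natural language retrieval.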

10.
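With the rapid development of the knowledge economy and China's accession to the WTO, the status and role of intellectual property in economic development are becoming ever more important. Facing the fierce competition of integrated domestic and international markets and the constraints of unified international market rules, we must improve our understanding of intellectual property, raise patent awareness among the whole population, and strengthen the development, use, and protection of patents. This paper discusses the role of patent literature in the WTO environment and the retrieval and use of Chinese and foreign patent literature online.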

11.
A literature review of pedagogical methods for teaching and learning information retrieval is presented. A taxonomy was built from the analysis of the literature and is used to structure the paper. Information Retrieval (IR) is presented from different points of view: technical levels, educational goals, teaching and learning methods, assessment and curricula. The review is organized around two levels of abstraction which form a taxonomy that deals with the different aspects of pedagogy as applied to information retrieval. The first level looks at the technical level of delivering information retrieval concepts, and at the educational goals as articulated by the two main subject domains where IR is delivered: computer science (CS) and library and information science (LIS). The second level focuses on pedagogical issues, such as teaching and learning methods, delivery modes (classroom, online or e-learning), use of IR systems for teaching, assessment and feedback, and curriculum design. The survey, and its bibliography, provide an overview of the pedagogical research carried out in the field of IR. It also provides a guide for educators on approaches that can be applied to improve students' learning experience.

12.
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. From all the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most-likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing vector space or geometric models, and BM25, as being ‘friendly’ to the normal-exponential, and that the non-convexity problem that the mixture possesses is practically not severe. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.

13.
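An inquiry into "relevance" in information retrieval   (Total citations: 3, self-citations: 0, citations by others: 3)
Starting from the dynamic, multidimensional meaning of "relevance", this paper discusses the factors that influence it, namely the information source, the retrieval system, the user, time, and the environment, and finally draws out some implications of "relevance" for building information retrieval systems.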

14.
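Research on vocabulary control in natural language retrieval   (Total citations: 2, self-citations: 0, citations by others: 2)
This paper briefly analyzes the necessity, measures, characteristics, and new development trends of vocabulary control in natural language retrieval.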

15.
Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach lies in that it is based on features not only from the content of documents, but also from their logical structure. We follow a machine learning, sentence extraction-based summarisation technique. To find which features are more effective for producing summaries, this approach views sentence extraction as an ordering task. We evaluated our summarisation model using the INEX and SUMMAC datasets. The results demonstrate that the inclusion of features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is also effective and well-suited to the task of summarisation in the context of XML documents. Our approach is generic, and is therefore applicable, apart from entire documents, to elements of varying granularity within the XML tree. We view these results as a step towards the intelligent summarisation of XML documents.
Mounia Lalmas

16.
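Research on XML-based MARC   (Total citations: 3, self-citations: 1, citations by others: 3)
This paper analyzes the limitations of the machine-readable cataloging format MARC for use in future digital libraries and proposes an improvement scheme. Taking Harbin Institute of Technology as an example, it attempts an XML conversion of the Chinese machine-readable cataloging format CNMARC used there, making it possible to integrate MARC bibliographic databases with non-bibliographic databases on the Internet. This research is of significance for the use of existing MARC data in future digital libraries.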

17.
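XQuery: a brand-new XML query language   (Total citations: 5, self-citations: 0, citations by others: 5)
As XML becomes widely accepted and used, querying the required information accurately and effectively from XML data sources is becoming more and more important. This paper introduces a brand-new XML query language, XQuery, and sets out XQuery's conclusions, scope of application, supporting foundations, and application prospects.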

18.
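Research on automatic clustering of XML documents   (Total citations: 6, self-citations: 4, citations by others: 6)
Pan Youneng (潘有能), 《情报学报》, 2006, 25(2): 215-220
Building on text clustering, this paper studies the automatic clustering of XML documents. It adapts partitional and hierarchical clustering methods to suit XML document clustering, presents three methods for computing the similarity between documents (element comparison, edge set comparison, and edit distance), and tests and analyzes them on real data.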
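As a rough illustration of what an edge-set comparison (边集比较法) between two XML documents could look like, the sketch below collects each document's parent-child tag pairs and scores their overlap. The abstract does not give the paper's formula, so the Jaccard measure, the function names `edge_set` and `edge_set_similarity`, and the sample documents are all assumptions made for illustration, not the author's method.

```python
import xml.etree.ElementTree as ET


def edge_set(xml_text):
    """Return the set of (parent_tag, child_tag) edges in an XML document."""
    root = ET.fromstring(xml_text)
    edges = set()
    for parent in root.iter():      # visit every element, including the root
        for child in parent:        # direct children only
            edges.add((parent.tag, child.tag))
    return edges


def edge_set_similarity(xml_a, xml_b):
    """Jaccard overlap of the two edge sets (an assumed similarity measure)."""
    a, b = edge_set(xml_a), edge_set(xml_b)
    if not a and not b:
        return 1.0                  # two childless documents: treat as identical
    return len(a & b) / len(a | b)


# Hypothetical sample documents that share two of their three edges.
doc1 = "<book><title/><author><name/></author></book>"
doc2 = "<book><title/><author><email/></author></book>"
print(edge_set_similarity(doc1, doc2))  # 2 shared edges / 4 distinct edges = 0.5
```

A pairwise score of this kind could then feed a partitional or hierarchical clustering routine as the document-to-document similarity.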

19.
This paper briefly describes the concept, characteristics, and composition of XML and its application in digital libraries.

20.
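Web-oriented data mining is a frontier research problem whose main goal is to find data structures and related models that fit the Web. The structure of the Web is now generally regarded as semi-structured, so the first problem Web-oriented data mining must solve is finding a model for semi-structured data sources. The new generation of WWW environment based on XML deals directly with Web data; it is well compatible with existing Web applications and also enables information sharing and exchange on the Web. XML is the abbreviation of "eXtensible Markup Language". The XML specification was developed by the World Wide Web Consortium (W3C) and became a W3C Recommendation in 1998. Many vendors, such as Adobe, IBM, Microsoft, Netscape, Oracle, and Sun, have adopted it and regard it as a key technology. Many recent software releases, such as Navigator, Internet Explorer, and RealPlayer, already use XML technology internally. XML technology is applied in Web data mining in the following respects: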

