Similar Documents
1.
When speaking of information retrieval, we often mean text retrieval. But there exist many other forms of information retrieval applications. A typical example is collaborative filtering, which suggests interesting items to a user by taking into account other users' preferences or tastes. Because of the problem's distinctive nature, it has in the past been modeled and studied differently, mainly from the viewpoints of preference prediction and machine learning. A few attempts have been made to bring collaborative filtering back into information (text) retrieval modeling, and new collaborative filtering techniques have been derived as a result. In this paper, we show that from an algorithmic viewpoint there is an even closer relationship between collaborative filtering and text retrieval. Specifically, major collaborative filtering algorithms, such as memory-based methods, essentially calculate the dot product between the user vector (as the query vector in text retrieval) and the item rating vector (as the document vector in text retrieval). Thus, if we properly structure user preference data and employ the target user's ratings as query input, major text retrieval algorithms and systems can be used directly without any modification. In this regard, we propose a unified formulation, under a common notational framework, for memory-based collaborative filtering, and a technique for using any text retrieval weighting function with collaborative filtering preference data. Besides confirming the rationale of the framework, our preliminary experimental results also demonstrate the effectiveness of using text retrieval models and systems to perform item ranking tasks in collaborative filtering.
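A minimal sketch of the dot-product view described above, assuming user-based memory CF with cosine user similarity: the target user's similarities to the other users act as the query vector, and each item's column of ratings acts as a document vector. The function names and the toy rating matrix are hypothetical, and this is not the paper's unified formulation or its weighting functions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors (0 means unrated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_items(ratings, target):
    """Rank the items the target user has not rated, best first.

    ratings: list of per-user rating lists (rows = users, columns = items).
    Query vector: similarity of the target to every other user.
    Document vector for item i: column i of the rating matrix.
    Score: dot product of the two, as in vector-space text retrieval.
    """
    query = [0.0 if u == target else cosine(ratings[target], row)
             for u, row in enumerate(ratings)]
    scores = {}
    for i in range(len(ratings[target])):
        if ratings[target][i]:
            continue  # already rated by the target: nothing to recommend
        doc = [row[i] for row in ratings]  # item i's "document" vector
        scores[i] = sum(q * d for q, d in zip(query, doc))
    return sorted(scores, key=scores.get, reverse=True)


ratings = [[5, 0, 3, 0],
           [4, 2, 3, 5],
           [0, 5, 0, 1]]
print(rank_items(ratings, 0))  # [3, 1]
```

Item 3 outranks item 1 because user 1 is the only user who resembles user 0, and user 1 rated item 3 higher. Swapping the raw dot product for another text retrieval weighting would change only the scoring line.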

2.
Although always present in text, word sense ambiguity only recently came to be regarded as a potentially solvable problem for information retrieval. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and then surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research, most notably an understanding of the circumstances under which disambiguation may prove useful for retrieval.

3.
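Analysis of the Application Prospects of Grid Technology in Information Retrieval   Cited by: 1 (self-citations: 0, citations by others: 1)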
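Grid computing is a newly emerging information technology. Because the characteristics of grids share common ground with information retrieval, this paper attempts to establish a connection at the point where the two meet, focusing on how grids can be used to solve information retrieval problems, and closes with an outlook on future developments.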

4.
A Statistical Analysis of Chinese Research Papers on Text Classification   Cited by: 1 (self-citations: 0, citations by others: 1)
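Text classification is introduced as an interdisciplinary research field spanning information retrieval, machine learning and computational linguistics, and as an important research direction in information processing; the paper points out its wide application in automatic indexing, information retrieval, text filtering and document organization. Using bibliometric methods, it then statistically analyzes Chinese research papers on text classification published from 1998 to 2005 to examine the current state and main development trends of text classification research in China.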

5.
This paper describes a national experiment in licensing full-text journal information, primarily in the fields of science, technology and medicine. It discusses the initiative of the federal government of Canada in creating the Canada Foundation for Innovation as a new funding agency, with the objective of improving research and creativity in Canadian science. It also discusses the successful efforts initiated by the Canadian Association of Research Libraries/Association des bibliothèques de recherche du Canada to create a funding opportunity for developing the ‘information infrastructure’ for Canadian researchers, and the progress of the resulting Canadian National Site Licensing Project (CNSLP). The evolution of a project governance structure that maintains the support of the 64 participating institutions is reviewed, and the need to develop an appropriate exit strategy at the conclusion of federal funding is also considered.

6.
The need to improve information access on the Web led Illinois to implement lexicographer Dr. Jessica Milstead's subject tree for the Find-It! Illinois Program. In 1999, when the Illinois State Library joined four other states in implementing a state Government Information Locator Service (GILS) project, developing a controlled vocabulary became an essential component for maximizing retrieval of government information. Furthermore, library cataloging tools such as the Library of Congress Subject Headings (LCSH) are insufficient for online retrieval. An analysis of the structure and content of Dr. Milstead's subject tree reveals the importance of new tools for improving online access methods. Illinois' implementation of the subject tree revealed interest in its nationwide application, and the Illinois subject tree has been named the “Jessica Tree” to convey its expanded utility. National adoption of a controlled vocabulary for retrieving state government information online will require collaboration among all states, so that the vision of a Find-It! America can be realized.

7.
The ability to find tables and extract information from them is a necessary component of many information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content presents difficulties for traditional retrieval techniques. This paper describes techniques for extracting tables from text and retrieving answers from the extracted information. We compare machine learning (especially Conditional Random Fields) and heuristic methods for table extraction. To retrieve answers, our approach creates, for each table cell, a cell document containing the cell and its metadata (headers, titles), and the retrieval model ranks the cells of the extracted tables using a language-modeling approach. Performance is tested on government statistical Web sites and news articles, and errors are analyzed in order to improve the system.
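A rough sketch of the cell-document idea, assuming the metadata are the row header, column header and table title, and assuming Dirichlet-smoothed query likelihood as the language-modeling ranker (the abstract does not name a smoothing method). Table extraction itself (CRF or heuristics) is not shown; `cell_document`, `rank_cells`, `mu` and the toy table are hypothetical.

```python
from collections import Counter
from math import log


def cell_document(cell, row_header, col_header, title):
    """Tokens of a 'cell document': the cell text plus its headers and table title."""
    return " ".join([cell, row_header, col_header, title]).lower().split()


def rank_cells(query, cell_docs, mu=10.0):
    """Return cell-document indices ranked by Dirichlet-smoothed log P(query | doc)."""
    collection = Counter(t for doc in cell_docs for t in doc)
    total = sum(collection.values())
    results = []
    for idx, doc in enumerate(cell_docs):
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            p_coll = collection[term] / total
            if p_coll == 0:
                continue  # unseen in every cell: log(0) for all docs, so skip it
            score += log((tf[term] + mu * p_coll) / (len(doc) + mu))
        results.append((score, idx))
    return [idx for _, idx in sorted(results, reverse=True)]


docs = [
    cell_document("4.5", "Illinois", "unemployment rate", "state labor statistics"),
    cell_document("312000", "Illinois", "labor force", "state labor statistics"),
    cell_document("5.1", "Ohio", "unemployment rate", "state labor statistics"),
]
print(rank_cells("illinois unemployment rate", docs))  # [0, 2, 1]
```

The Illinois unemployment-rate cell ranks first because its cell document is the only one containing all three query terms; folding headers into each cell is what lets a bare number like "4.5" match a query at all.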

8.
After a brief discussion of mind mapping and concept mapping, a model for a ‘3‐D index’ is developed which combines a concept map with an index to enable hierarchical and relational data retrieval. It is suggested that such sophisticated information retrieval methods are required for the complex and extensive electronic data sources and encyclopaedic‐type titles now being compiled.
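One way to picture the ‘3-D index’ as a data structure, purely as an assumption: ordinary index entries (term to locators) overlaid with concept-map links, where broader/narrower links support hierarchical retrieval and symmetric related-term links support relational retrieval. The class, its methods and the sample entries are hypothetical; the paper's actual model may differ.

```python
from collections import defaultdict


class ConceptIndex:
    """Hypothetical '3-D index': term -> locators, plus concept-map links."""

    def __init__(self):
        self.locators = defaultdict(set)  # term -> page/section locators
        self.narrower = defaultdict(set)  # hierarchical links: term -> narrower terms
        self.related = defaultdict(set)   # relational links (kept symmetric)

    def add(self, term, *locs, broader=None, related=()):
        self.locators[term].update(locs)
        if broader is not None:
            self.narrower[broader].add(term)
        for other in related:
            self.related[term].add(other)
            self.related[other].add(term)

    def retrieve(self, term, follow_related=False):
        """Locators for `term` and every term below it in the hierarchy.

        With follow_related=True, terms one relational hop away from any of
        those terms are included as well.
        """
        found, stack = set(), [term]
        while stack:
            t = stack.pop()
            if t not in found:
                found.add(t)
                stack.extend(self.narrower.get(t, ()))
        if follow_related:
            found |= {r for t in found for r in self.related.get(t, ())}
        return sorted(loc for t in found for loc in self.locators.get(t, ()))


idx = ConceptIndex()
idx.add("mammals", "p.10")
idx.add("whales", "p.42", broader="mammals", related=["oceans"])
idx.add("oceans", "p.77")
print(idx.retrieve("mammals"))                       # ['p.10', 'p.42']
print(idx.retrieve("mammals", follow_related=True))  # ['p.10', 'p.42', 'p.77']
```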

9.
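Starting from the definition of text mining, this paper analyzes the text mining process, including text preprocessing, text knowledge discovery, evaluation of text patterns and presentation of text patterns, and describes in detail the applications of text mining in proactive information services, information retrieval systems, patent information analysis and other areas.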
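The four stages named above can be sketched as a toy pipeline. Here frequent co-occurring term pairs stand in for text knowledge discovery, and a document-support threshold stands in for pattern evaluation; both choices, the stop list and all function names are assumptions for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations

STOPWORDS = {"a", "and", "for", "in", "is", "of", "the", "to"}  # toy stop list


def preprocess(doc):
    """Stage 1, preprocessing: lowercase, split on whitespace, drop stopwords."""
    return {t for t in doc.lower().split() if t not in STOPWORDS}


def discover(term_sets):
    """Stage 2, discovery: count the documents containing each unordered term pair."""
    counts = Counter()
    for terms in term_sets:
        counts.update(combinations(sorted(terms), 2))
    return counts


def evaluate(counts, n_docs, min_support=0.5):
    """Stage 3, evaluation: keep pairs whose document support meets the threshold."""
    return {pair: c / n_docs for pair, c in counts.items() if c / n_docs >= min_support}


def present(patterns):
    """Stage 4, presentation: one readable line per pattern, highest support first."""
    ranked = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in ranked]


docs = ["patent search and retrieval",
        "patent retrieval systems",
        "text mining for patent analysis"]
term_sets = [preprocess(d) for d in docs]
print(present(evaluate(discover(term_sets), len(docs))))
# ['patent + retrieval: support 0.67']
```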

10.
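This paper uses Borland IDAPI relational database integration technology to integrate multiple relational database systems and manages them with the information storage and retrieval software QUICKIMS, achieving full-text retrieval over relational databases. The data structures, data access methods and data types of PC-based and SQL-based relational databases are integrated; the results of queries on base tables and on single or multiple databases are transferred to generate the files and indexes that QUICKIMS requires; and Boolean retrieval, prefix-match (front-anchored) retrieval, field-restricted retrieval, adjacency retrieval and positional retrieval are provided over the relational databases. Because relational database data are converted dynamically, wasted storage space is reduced.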
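The retrieval modes listed above can be illustrated with a small positional inverted index built over rows fetched from a relational table. This is a hypothetical sketch in plain Python; it does not use Borland IDAPI or QUICKIMS, whose interfaces are not described here, and every function name is invented for illustration. Boolean retrieval falls out of ordinary set operations on the returned row ids.

```python
import re
from collections import defaultdict


def build_index(rows):
    """Positional inverted index over relational rows.

    rows: list of dicts (field -> text), e.g. rows fetched from any SQL table.
    Key: (field, term) -> {row_id: [word positions]}.
    """
    index = defaultdict(lambda: defaultdict(list))
    for row_id, row in enumerate(rows):
        for field, text in row.items():
            for pos, term in enumerate(re.findall(r"\w+", str(text).lower())):
                index[(field, term)][row_id].append(pos)
    return index


def field_search(index, field, term):
    """Field-restricted retrieval: ids of rows whose `field` contains `term`."""
    return set(index.get((field, term.lower()), {}))


def prefix_search(index, field, prefix):
    """Prefix-match retrieval: rows whose `field` has a term starting with `prefix`."""
    prefix = prefix.lower()
    return {r for (f, t), postings in index.items()
            if f == field and t.startswith(prefix) for r in postings}


def adjacent_search(index, field, first, second, max_gap=1):
    """Adjacency/positional retrieval: `second` within `max_gap` words after `first`."""
    a = index.get((field, first.lower()), {})
    b = index.get((field, second.lower()), {})
    return {r for r in a.keys() & b.keys()
            if any(0 < pb - pa <= max_gap for pa in a[r] for pb in b[r])}


rows = [{"title": "Full text retrieval of relational databases", "author": "Li"},
        {"title": "Relational algebra", "author": "Wang"},
        {"title": "Text retrieval systems", "author": "Li"}]
idx = build_index(rows)
# Boolean AND of a field-restricted search and a prefix match
print(sorted(field_search(idx, "author", "Li") & prefix_search(idx, "title", "relat")))  # [0]
```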

11.
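Research and Development of an Internet-Based Chinese Full-Text Retrieval System with Post-Controlled Vocabulary   Cited by: 2 (self-citations: 0, citations by others: 2)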
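This paper surveys research on post-controlled vocabulary retrieval systems in China and abroad, and focuses on the design and development of a Chinese full-text retrieval system with post-controlled vocabulary built on the iBASE unstructured database system.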
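A post-controlled vocabulary applies vocabulary control at search time instead of at indexing time: the full text is stored as-is, and a query term is expanded to its equivalent terms before matching. A minimal sketch under that reading follows; the vocabulary groups and function names are hypothetical and unrelated to iBASE, and plain substring matching is a simplification chosen because Chinese text has no word delimiters.

```python
# Hypothetical post-controlled vocabulary: each set groups equivalent terms.
VOCAB_GROUPS = [
    {"计算机", "电脑", "电子计算机"},  # "computer"
    {"互联网", "因特网", "Internet"},  # "Internet"
]


def expand(term, groups=VOCAB_GROUPS):
    """Map a query term to all of its equivalents (itself only if uncontrolled)."""
    for group in groups:
        if term in group:
            return set(group)
    return {term}


def search(docs, term):
    """Full-text search over raw text: ids of docs containing any equivalent term."""
    variants = expand(term)
    return [i for i, text in enumerate(docs) if any(v in text for v in variants)]


# "computer network security", "computer retrieval system", "library catalog"
docs = ["电脑网络安全", "计算机检索系统", "图书馆目录"]
print(search(docs, "计算机"))  # [0, 1]
```

Because control happens only in `expand`, the vocabulary can be revised without re-indexing the stored full text.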

12.
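Design and Implementation of an Information Retrieval System Based on Web Mining Technology   Cited by: 2 (self-citations: 0, citations by others: 2)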
Wang Yan (王艳), Zhang Fan (张帆). Journal of the China Society for Scientific and Technical Information (情报学报), 2007, 26(3): 339-343
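This paper describes in detail the design and implementation of an information retrieval system based on Web text mining technology. Retrieval based on Web text mining incorporates ideas from text mining: it combines traditional retrieval methods that perform only resource discovery or only information extraction, so that resources can be discovered on the WWW and the information in them extracted for further processing.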
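The combination of resource discovery and information extraction can be sketched with Python's standard `html.parser`: a breadth-first crawl discovers linked pages, and each page's `<title>` is extracted as a trivial stand-in for real information extraction. To stay self-contained, the "web" here is an in-memory dict from URL to HTML, whereas a real system would fetch pages over HTTP. All names and pages are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects outgoing links (discovery) and the <title> text (extraction)."""

    def __init__(self):
        super().__init__()
        self.links, self.title, self._in_title = [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl_and_extract(web, seed, max_pages=10):
    """Breadth-first discovery over `web` (url -> HTML), extracting each page's title."""
    seen, queue, records = {seed}, deque([seed]), {}
    while queue and len(records) < max_pages:
        url = queue.popleft()
        html = web.get(url)
        if html is None:
            continue  # dangling link: nothing to fetch
        parser = PageParser()
        parser.feed(html)
        records[url] = parser.title.strip()  # extraction step
        for link in parser.links:            # discovery step
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records


web = {"/a": "<html><head><title>Home</title></head>"
             "<body><a href='/b'>b</a><a href='/c'>c</a></body></html>",
       "/b": "<title>Retrieval</title><a href='/a'>back</a>",
       "/c": "<title>Mining</title>"}
print(crawl_and_extract(web, "/a"))  # {'/a': 'Home', '/b': 'Retrieval', '/c': 'Mining'}
```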

13.
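[Purpose/Significance] A scale based on flow theory is designed to measure and analyze the information retrieval experience and to explore the patterns of individuals' affective experience during information retrieval. [Method/Process] Using the experience sampling method, questionnaires were administered to users engaged in information retrieval at three university libraries, and statistical analysis of the data was used to test the quality of the scale. Based on respondents' scores on the skill, challenge and skill-challenge balance dimensions, the sample was divided into four channels (flow, anxiety, apathy and boredom), and the quality of experience in each channel was compared. [Results/Conclusions] The results show that the scale is a valid instrument for measuring the information retrieval experience, and that skill level, challenge level and the degree to which they match are the key variables affecting that experience. The flow channel, in which skill and challenge are matched and both high, yields the best experience; the apathy channel yields the worst; and the anxiety and boredom channels fall in between.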
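The four-channel split can be sketched as follows, assuming each respondent has one skill score and one challenge score and that "high" means at or above the sample mean of that dimension. The abstract does not state the actual cut-offs (flow studies often standardize scores per person instead), so treat the threshold and the function name as assumptions.

```python
from statistics import mean


def four_channel(records):
    """Assign each (skill, challenge) score pair to a flow-model channel.

    Assumption: 'high' means at or above the sample mean of that dimension.
    """
    skill_mu = mean(s for s, _ in records)
    chal_mu = mean(c for _, c in records)
    channels = []
    for skill, challenge in records:
        high_skill, high_chal = skill >= skill_mu, challenge >= chal_mu
        if high_skill and high_chal:
            channels.append("flow")     # matched and both high
        elif high_chal:
            channels.append("anxiety")  # challenge exceeds skill
        elif high_skill:
            channels.append("boredom")  # skill exceeds challenge
        else:
            channels.append("apathy")   # matched but both low
    return channels


print(four_channel([(6, 6), (2, 6), (6, 2), (2, 2)]))
# ['flow', 'anxiety', 'boredom', 'apathy']
```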

14.
Summarizing Similarities and Differences Among Related Documents   Cited by: 10 (self-citations: 0, citations by others: 10)
In many modern information retrieval applications, a common problem is the existence of multiple documents covering similar information, as in the case of multiple news stories about an event or a sequence of events. A particular challenge for text summarization is to summarize the similarities and differences in information content among these documents. The approach described here exploits recent progress in information extraction to represent salient units of text and their relationships. By exploiting meaningful relations between units, based on an analysis of text cohesion and of the context in which the comparison is desired, the summarizer can pinpoint similarities and differences and align text segments. In evaluation experiments, these techniques for exploiting cohesion relations yield summaries that (i) help users complete a retrieval task more quickly, (ii) improve alignment accuracy over baselines, and (iii) improve the identification of topic-relevant similarities and differences.
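For intuition about aligning segments across related documents, here is a deliberately simplified sketch that aligns sentences by word-overlap (Jaccard) similarity and reports unaligned sentences as differences. It swaps in plain lexical overlap for the cohesion-based relations the paper exploits; the threshold, the greedy strategy and all names are assumptions.

```python
import re


def tokens(sentence):
    """Lowercased word set of a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def align(doc_a, doc_b, threshold=0.3):
    """Greedily align each sentence of doc_a to its most similar unused sentence of doc_b.

    Returns (pairs, only_a, only_b): aligned index pairs are treated as shared
    content; sentences left unaligned are treated as differences.
    """
    pairs, used_b = [], set()
    for i, sent_a in enumerate(doc_a):
        best_j, best = None, threshold
        for j, sent_b in enumerate(doc_b):
            if j in used_b:
                continue
            score = jaccard(tokens(sent_a), tokens(sent_b))
            if score >= best:
                best_j, best = j, score
        if best_j is not None:
            pairs.append((i, best_j))
            used_b.add(best_j)
    matched_a = {i for i, _ in pairs}
    only_a = [i for i in range(len(doc_a)) if i not in matched_a]
    only_b = [j for j in range(len(doc_b)) if j not in used_b]
    return pairs, only_a, only_b


doc_a = ["The earthquake struck the city on Monday.", "Rescue teams arrived from abroad."]
doc_b = ["An earthquake struck the city on Monday morning.", "Officials estimated damage at millions."]
print(align(doc_a, doc_b))  # ([(0, 0)], [1], [1])
```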

15.
Retrieval of Multimedia Database Information in Digital Libraries   Cited by: 1 (self-citations: 0, citations by others: 1)
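The digital library is both a new concept and a new technology, and its data types include multimedia information such as text, speech, images and video. This article mainly introduces content-based image, audio and video retrieval techniques for the multimedia databases of digital libraries, and discusses development trends in multimedia database retrieval technology.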
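As one concrete instance of content-based image retrieval, the sketch below uses global color histograms compared by histogram intersection, a classic query-by-example technique; the article may cover other features. Images are given as non-empty lists of RGB tuples to avoid any imaging library, and the bin count, function names and sample images are assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel (0-255 input)."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    n = len(pixels)
    return [h / n for h in hist]


def intersection(h1, h2):
    """Histogram intersection similarity in [0, 1] for normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(query_pixels, collection):
    """Rank images (name -> pixel list) by color similarity to the query image."""
    q = color_histogram(query_pixels)
    scored = {name: intersection(q, color_histogram(px)) for name, px in collection.items()}
    return sorted(scored, key=scored.get, reverse=True)


collection = {"red": [(250, 10, 10)] * 4,
              "mostly_red": [(240, 20, 20)] * 3 + [(10, 10, 250)],
              "blue": [(10, 10, 250)] * 4}
print(query_by_example([(255, 0, 0)] * 4, collection))  # ['red', 'mostly_red', 'blue']
```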

16.
The article examines academics' attitudes towards e-journal use. A well-structured questionnaire was designed to elicit the opinions of the users, and responses were gathered from 542 faculty members of five universities. The results show that the characteristics favoring the electronic format over print, in order of preference, are ‘faster access’, ‘available from desktop’, ‘convenience’, ‘remote access’, ‘timeliness’, ‘available at all times’, ‘hyperlinks’, ‘multi-user access’, ‘currency of information’, ‘inclusion of audio–video material’, ‘interactivity’ and ‘animation of graphics’. The characteristics favoring print over the electronic format, in order of priority, are ‘physical comfort’, ‘portability’, ‘ability to underline’, ‘familiarity with format’ and ‘ability to browse’. A majority of the teachers use e-journals for ‘research’, ‘teaching’, ‘writing reports’, ‘current awareness’, ‘background research’ and ‘internal/external presentations’. The problems faced in accessing e-journals include ‘access difficulties’, ‘discomfort of reading from computer screen’, ‘lack of IT knowledge/skill’ and ‘information overload’. A majority of the teachers want future e-journals to have features such as ‘full text index of every article’, ‘searching capability across a wide range of journal articles’, ‘searching capability within an article, display relationship between a wide range of works’ and ‘links to multimedia files’. On the basis of the findings, suggestions are made for maximizing the use of e-journals.

17.
Research on Knowledge-Model-Based Visualization of Text Information Retrieval   Cited by: 5 (self-citations: 0, citations by others: 5)
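Information retrieval visualization refers to techniques that convert document information, user queries, the various information retrieval models, and the internal semantic relationships that remain invisible during retrieval with those models into graphics, display them in a two- or three-dimensional visual space, and offer information retrieval to users through that space. Knowledge-model-based visualization of text information retrieval uses the metadata of information resources to perform visual retrieval. 1 figure, 29 references.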
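To make the idea of placing documents in a visual space from their metadata concrete, here is a sketch of a VIBE-style layout: the user places points of interest (POIs) on screen, and each document is drawn at the weighted centroid of the POIs that match its metadata keywords. This well-known IR visualization technique is swapped in for illustration and is not the paper's knowledge model; the binary weights, the function name and the sample data are assumptions.

```python
def vibe_layout(doc_keywords, pois):
    """Place each document at the weighted centroid of the points of interest.

    doc_keywords: name -> set of metadata keywords.
    pois: keyword -> (x, y) screen position chosen by the user.
    Weight of a POI for a document: 1.0 if the keyword is in its metadata.
    """
    layout = {}
    for name, keywords in doc_keywords.items():
        weights = {k: 1.0 for k in pois if k in keywords}
        total = sum(weights.values())
        if not total:
            continue  # matches no POI, so it is not drawn
        x = sum(pois[k][0] * w for k, w in weights.items()) / total
        y = sum(pois[k][1] * w for k, w in weights.items()) / total
        layout[name] = (x, y)
    return layout


pois = {"retrieval": (0.0, 0.0), "visualization": (10.0, 0.0), "metadata": (5.0, 10.0)}
docs = {"d1": {"retrieval"},
        "d2": {"retrieval", "visualization"},
        "d3": {"retrieval", "visualization", "metadata"},
        "d4": {"music"}}
layout = vibe_layout(docs, pois)
print(layout["d2"])  # (5.0, 0.0)
```

Documents sharing more POI keywords drift toward the middle of those POIs, which is what lets a user read relationships from position alone.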

18.
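[Purpose/Significance] Comparison documents (cited prior art) are key documents for judging whether a patent can be granted or should be invalidated. To address the shortcomings of traditional information retrieval methods, and the scarcity of research applying machine learning to comparison-document retrieval, a patent relevance judgment model is built that incorporates comparison-document information. [Method/Process] Experiments were conducted on a dataset of target patents and comparison documents drawn from patent invalidation decisions; text similarity, co-occurring vocabulary and co-word count features were extracted, and a GBDT model was used to ...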
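The three feature families named above can be sketched for a single (target patent, candidate comparison document) pair. The truncated abstract does not define the features, so the cosine similarity of term counts, the shared-term count and the coverage ratio below are assumptions, as are the function names and the sample texts.

```python
import re
from collections import Counter
from math import sqrt


def terms(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def pair_features(target, reference):
    """Hypothetical feature vector for a (target patent, comparison document) pair:
    [cosine similarity of term counts,
     number of shared distinct terms (co-word count),
     shared terms as a fraction of the target's distinct terms]."""
    ta, tb = Counter(terms(target)), Counter(terms(reference))
    shared = ta.keys() & tb.keys()  # co-occurring vocabulary
    dot = sum(ta[t] * tb[t] for t in shared)
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    cosine = dot / norm if norm else 0.0
    coverage = len(shared) / len(ta) if ta else 0.0
    return [cosine, len(shared), coverage]


feats = pair_features("rotary blade assembly with blade guard", "blade guard for rotary saw")
print([round(x, 3) for x in feats])  # [0.632, 3, 0.6]
```

Feature vectors for labeled pairs could then be passed to a gradient-boosted tree classifier, for example scikit-learn's `GradientBoostingClassifier`, to learn the relevance judgment; that step is omitted here to keep the sketch dependency-free.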

19.
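The biggest changes in moving from document retrieval to information retrieval are, first, a shift from organization based on document units to organization based on information units; and second, a progression from manual classification, subject indexing and author indexing, through machine extraction and indexing of subject terms and free-text terms, to full-text indexing and even hypertext retrieval. Network technology, hypermedia technology and intelligent technologies are the key drivers of this change. Teaching the subject as a discipline requires a practice-oriented teaching method centered on CAI (computer-assisted instruction) courseware and a basic framework for the information retrieval curriculum. 4 references.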
