The results from a series of three experiments that used Text Retrieval Conference (TREC) data and TREC search topics are compared. Each experiment involved a novel user interface (three interfaces in all, one per experiment). User interfaces that made it easier for users to view text were found to improve recall in all three experiments. A distinction was found between a cluster of subjects (a majority of whom were search experts) who tended to read fewer documents more carefully (readers, or exclusives) and subjects who skimmed through more documents without reading them as carefully (skimmers, or inclusives). Skimmers were found to have significantly better recall overall. A major outcome of our experiments at TREC and with the TREC data is that hypertext interfaces to information retrieval (IR) tasks tend to increase recall. Our interpretation of this pattern of results across the three experiments is that increased interaction with the text (more pages viewed) generally improves recall. Findings from one of the experiments indicated that viewing a greater diversity of text on a single screen (i.e., not just more text per se, but more articles available at once) may also improve recall. In an experiment where a traditional (type-in) query interface was contrasted with a condition where queries were marked up on the text, the improvement in recall due to viewing more text was more pronounced with search novices. Our results demonstrate that markup and hypertext interfaces to text retrieval systems can benefit recall and can also benefit novices. The challenge now will be to find modified versions of hypertext interfaces that can improve precision as well as recall, and that can work with users who prefer different types of search strategy or have different types of training and experience.

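Research on the Responsive Management Model   Total citations: 1, self-citations: 0, citations by others: 1
Much has been written on advancing higher-education administration, and nearly all of it calls for transforming the functions of government, yet few works address how those functions should actually change or in what direction. Starting from the concrete details of transforming the functions of higher-education administration, this article proposes an innovation in the higher-education management model.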

The Web is an enormous set of documents connected through hypertext links created by authors of Web pages. These links have been studied quantitatively, but little has been done so far to understand why these links are created. As a first step towards a better understanding, we propose a classification of link types in academic environments on the Web. The classification is multi-faceted and involves different aspects of the source and the target page, the link area and the relationship between the source and the target. Such a classification provides an insight into the diverse uses of hypertext links on the Web, and has implications for browsing and ranking in IR systems by differentiating between different types of links. As a case study we classified a sample of links between sites of Israeli academic institutions.

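An Analysis of Links, a Constituent Element of the Information Structure of Hypertext Systems   Total citations: 3, self-citations: 0, citations by others: 3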
张海涛  刘甲学  宋川 《情报科学》2002,20(4):380-382
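Links are the soul of hypertext. By analyzing links, a constituent element of the information structure of hypertext systems, this article discusses the different types of links used in that structure.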

Addressed here is the issue of ‘topic analysis’ which is used to determine a text’s topic structure, a representation indicating what topics are included in a text and how those topics change within the text. Topic analysis consists of two main tasks: topic identification and text segmentation. While topic analysis would be extremely useful in a variety of text processing applications, no previous study has so far sufficiently addressed it. A statistical learning approach to the issue is proposed in this paper. More specifically, topics here are represented by means of word clusters, and a finite mixture model, referred to as a stochastic topic model (STM), is employed to represent a word distribution within a text. In topic analysis, a given text is segmented by detecting significant differences between STMs, and topics are identified by means of estimation of STMs. Experimental results indicate that the proposed method significantly outperforms methods that combine existing techniques.

A challenge for sentence categorization and novelty mining is to detect not only when text is relevant to the user’s information need, but also when it contains something new which the user has not seen before. This involves two tasks. The first is identifying relevant sentences (categorization) and the second is identifying new information from those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on the English language, but few papers have addressed the problem of multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences, then comparing their performance with that of English. Thereafter, we conduct novelty mining to identify the sentences with new information. Experimental results on TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, both of which greatly outperform Chinese. In the second task, we observe similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, benchmarking our results against novelty mining without categorization shows that categorization is necessary for the successful performance of novelty mining.

Although there is an increasing amount of research on the design and use of conversational agents, it is still difficult for conversational agents to completely replace human service. Therefore, more and more companies have adopted human-AI collaborative systems to deliver customer service. It is important to understand how people obtain information from human-AI collaborative conversations. While existing work relies on self-reported methods to elicit qualitative feedback from users, we derived a categorization system for user messages in human-AI collaborative conversations from a thorough examination of a real-world customer service log, which can objectively reflect the user's information needs. We categorize user messages into five categories and 15 specific types related to three high-level intentions. Two annotators independently classified the same set of 1,478 user messages from 300 conversations and reached moderate agreement. We summarize and report the characteristics of different message types and compare their usage in sessions with only human representatives, only AI representatives, or both. Our results show that different message types vary significantly in usage frequency, length, and text similarity with other messages in a session. The frequency of using different message types in our dataset seems consistent over sessions with different types of representatives, although we also observed significant differences in a few specific message types across those sessions. Our results are used to suggest some areas for improvement and future work in human-AI collaborative conversational systems.

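In modern technology-assisted foreign language teaching, delivering output such as text, images, sound, video, animation, links, and databases requires teaching software built on modern computer technology, including language testing and analysis, corpus construction, dictionaries, machine translation, speech recognition, and visual speech synthesis. Computer-assisted instruction is an important and effective means of foreign language teaching and is continually being refined and developed. This article describes the basic content and aims of modern technology-assisted foreign language teaching and the main software components for computer-based teaching.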

A substantial number of real cases accumulate on current online medical platforms, constituting potentially rich commercial medical value. In order to obtain this value, it is necessary to mine the preference for user perceived cancer risk on online medical platforms. However, user preference on these platforms varies with medical inquiry text environments, and a user's disease-specific online medical inquiry text environment also affects his/her behavioral decisions in real time. In this sense, considering the inner relations between different contexts and user preferences under different disease-specific inquiry text environments, and integrating early cancer texts, will facilitate exploring the patterns of preference for user perceived cancer risk. Therefore, in this paper, the matrix decomposition and Labeled-LDA models are extended to propose a context-based method to obtain the preference for user perceived cancer risk. Firstly, the relationship between user preferences and information in multi-dimensional context is modeled, and a universal method of integrating multi-dimensional contextual information with user preferences is analyzed. Moreover, more accurate user preferences are obtained under the multi-dimensional text space and multi-dimensional disease space. Secondly, the similarity relationships between all disease-specific online medical inquiries and early cancer texts are used to obtain user perceived cancer risk, thus identifying the online medical inquiry texts of user cognized diseases and perceiving the cancer risk. Lastly, by accessing the user preferences under different disease topics and user perceived cancer risk in multi-dimensional contexts, the preference for user perceived cancer risk is obtained more accurately. Based on a large-volume real-world dataset, the relationship between each context and user preferences is assessed, and it is concluded that the method proposed in this paper is superior to the MF-LDA method in obtaining the preference for user perceived cancer risk. This indicates that the proposed method not only expresses user perceived risk, but also clearly expresses the characteristics of the user's preference. Furthermore, it is verified that the integration of context with early cancer text and the establishment of the user preference model are feasible and effective.

Multimedia objects can be retrieved using their context, for instance the text surrounding them in documents. This text may be either near or far from the searched objects. Our goal in this paper is to study the impact, in terms of effectiveness, of text position relative to searched objects. The multimedia objects we consider are described in structured documents such as XML ones. The document structure is therefore exploited to provide this text position in documents. Although structural information has been shown to be an effective source of evidence in textual information retrieval, only a few works have investigated its value in multimedia retrieval. More precisely, the task addressed in this paper is to retrieve multimedia fragments (i.e. XML elements having at least one multimedia object). Our general approach is built on two steps: we first retrieve XML elements containing multimedia objects, and we then explore the surrounding information to retrieve relevant multimedia fragments. In both cases, we study the impact of the surrounding information using the document structure.

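Based on 188 tax-incentive policy documents for independent innovation issued in Anhui Province between 2008 and 2017, this study carries out a quantitative analysis along six dimensions: year of issue, issuing region, policy target, tax categories covered by the incentives, incentive measures, and document form. Besides tracing the development of Anhui's tax-incentive policies for independent innovation, the analysis reveals five problems: policy issuance shows pronounced stage-by-stage characteristics; the volume of policies issued varies widely between regions; policy targets are clearly skewed toward industrial sectors, scientific and technological innovation, and mass innovation (众创); the tax categories and measures of the incentives are poorly designed; and concretely operational policy documents are lacking. Corresponding countermeasures are proposed for these problems.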

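From a comparative perspective on the provincial and municipal levels in Shaanxi Province, this study uses content analysis to quantify Shaanxi's science and technology finance policy documents along four dimensions: overall number, document type, issuing body, and policy instrument. The results show that Shaanxi's science and technology finance policy system is now fairly complete: the number of policies is growing steadily, document types and themes are increasingly varied, and the two levels show a pattern in which the province leads while municipalities learn and then overtake. However, the structure of policy instruments is unbalanced and issuing bodies lack coordination. The study therefore recommends rebalancing the structure of policy instruments, making greater use of demand-side instruments, strengthening coordination among departments both horizontally and vertically, improving the efficiency with which policy is transmitted from the provincial to the municipal level, and making policies more practical and better targeted.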

Traditional index weighting approaches for information retrieval from texts depend on term-frequency-based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they are limited in extracting semantically exact indexes that represent the semantic content of a document. To address this issue, we developed a new indexing formalism that considers not only the terms in a document, but also the concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment.
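To make the contrast between term-level and concept-level indexing concrete, here is a minimal Python sketch. It is not the formalism from the abstract above: the whitespace tokenizer, the `term_to_concept` mapping, and the choice to sum term weights per concept are all illustrative assumptions.

```python
from collections import Counter


def term_frequencies(text):
    """Relative term frequency of each whitespace-separated token (TF baseline)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def concept_weights(tf, term_to_concept):
    """Sum term weights per concept.

    `term_to_concept` is a hypothetical hand-made mapping, used here only
    to illustrate how many terms can collapse into fewer concept dimensions.
    """
    weights = {}
    for term, weight in tf.items():
        concept = term_to_concept.get(term)
        if concept is not None:
            weights[concept] = weights.get(concept, 0.0) + weight
    return weights
```

For example, the terms `stock`, `market`, and `price` could all map to one hypothetical `finance` concept, so three term dimensions become a single concept dimension.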

On the web, a huge variety of text collections contain knowledge in different expertise domains, such as technology or medicine. The texts are written for different uses and thus for people having different levels of expertise in the domain. Texts intended for professionals may not be understandable at all by a lay person, and texts for lay people may not contain all the detailed information needed by a professional. Many information retrieval applications, such as search engines, would offer a better user experience if they were able to select the text sources that best fit the expertise level of the user. In this article, we propose a novel approach for assessing the difficulty level of a document: our method assesses difficulty for each user separately. The method enables, for instance, offering information in a personalised manner based on the user’s knowledge of different domains. The method is based on the comparison of terms appearing in a document and terms known by the user. We present two ways to collect information about the terminology the user knows: by directly asking users about the difficulty of terms or, as a novel automatic approach, indirectly by analysing texts written by the users. We examine the applicability of the methodology with text documents in the medical domain. The results show that the method is able to distinguish between documents written for lay people and documents written for experts.
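The comparison of document terms with terms known by the user can be sketched as follows. This is only an illustration under stated assumptions: the scoring rule (share of distinct document terms the user does not know) and the function name `document_difficulty` are hypothetical, not the measure used in the article.

```python
def document_difficulty(doc_terms, known_terms):
    """Share of distinct document terms that are absent from the user's vocabulary.

    Returns a value in [0, 1]; 0.0 means every term is known to this user.
    The scoring rule is an assumption made for illustration.
    """
    distinct = set(doc_terms)
    if not distinct:
        return 0.0
    unknown = distinct - set(known_terms)
    return len(unknown) / len(distinct)
```

Because the score depends on `known_terms`, the same document can come out difficult for a lay reader and easy for a clinician, which is the per-user behaviour the abstract describes.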

丛敬军  阎辉 《情报科学》2003,21(12):1280-1282,1310
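Disorientation (getting lost) is a major drawback of the hypertext information model, and solving it requires navigation methods for hypertext data. Starting from a study of the link structure of hypertext data, this article explains the main causes of disorientation in the hypertext digital information of digital libraries and examines navigation methods for addressing the disorientation that occurs when browsing digital-library Web pages.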

Broken hypertext links are a frequent problem on the Web. Sometimes the page which a link points to has disappeared forever, but in many other cases the page has simply been moved to another location in the same web site or to another one. In some cases the page, besides being moved, is updated, becoming slightly different from the original but still rather similar. In all these cases it can be very useful to have a tool that provides pages highly related to the broken link, from which the most appropriate one can be selected. The relationship between the broken link and its possible linkable pages can be defined as a function of many factors. In this work we have employed several resources, both in the context of the link and on the Web, to look for pages related to a broken link. From the resources in the context of a link, we have analyzed several sources of information such as the anchor text, the text surrounding the anchor, the URL and the page containing the link. We have also extracted information about a link from the Web infrastructure, such as search engines, Internet archives and social tagging systems. We have combined all of these resources to design a system that recommends pages that can be used to recover the broken link. A novel methodology is presented to evaluate the system without resorting to user judgments, thus increasing the objectivity of the results and helping to adjust the parameters of the algorithm. We have also compiled a web page collection with true broken links, which has been used for a human evaluation of the full system.

Multi-label text categorization refers to the problem of assigning each document to a subset of categories by means of multi-label learning algorithms. Unlike for English and most other languages, benchmark datasets are unavailable for Arabic, which prevents evaluating multi-label learning algorithms for Arabic text categorization. As a result, only a few recent studies have dealt with multi-label Arabic text categorization, and those used non-benchmark and inaccessible datasets. Therefore, this work aims to promote multi-label Arabic text categorization through (a) introducing “RTAnews”, a new benchmark dataset of multi-label Arabic news articles for text categorization and other supervised learning tasks, which is publicly available in several formats compatible with existing multi-label learning tools such as MEKA and Mulan; and (b) conducting an extensive comparison of most of the well-known multi-label learning algorithms for Arabic text categorization in order to have baseline results and show the effectiveness of these algorithms for Arabic text categorization on RTAnews. The evaluation involves four multi-label transformation-based algorithms: Binary Relevance, Classifier Chains, Calibrated Ranking by Pairwise Comparison and Label Powerset, with three base learners (Support Vector Machine, k-Nearest-Neighbors and Random Forest); and four adaptation-based algorithms (Multi-label kNN, Instance-Based Learning by Logistic Regression Multi-label, Binary Relevance kNN and RFBoost). The reported baseline results show that both RFBoost and Label Powerset with Support Vector Machine as base learner outperformed the other compared algorithms. Results also demonstrated that adaptation-based algorithms are faster than transformation-based algorithms.
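Two of the transformation-based approaches named above, Binary Relevance and Label Powerset, differ mainly in how they turn label sets into single-label training targets. The Python sketch below shows only that transformation step, in their standard textbook form. The function names and the toy labels are assumptions for illustration, and no base learner, MEKA, or Mulan API is involved.

```python
def binary_relevance_targets(label_sets, all_labels):
    """Binary Relevance: one independent 0/1 target vector per label."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def label_powerset_targets(label_sets):
    """Label Powerset: each distinct label combination becomes one class id."""
    class_ids = {}
    targets = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            # Assign ids in order of first appearance.
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids
```

Binary Relevance trains one binary classifier per label and ignores label co-occurrence, whereas Label Powerset trains a single multi-class classifier whose number of classes grows with the number of distinct label combinations in the data.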

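《天演论》 (Tianyanlun) marks the starting point of modern Chinese intellectual history; from then on, Western scholarly thought has been a central issue that Chinese intellectuals have had to confront. Yet 《天演论》 is a translation. The scholarly background of its translator, 严复 (Yan Fu), overlapped only partly with Huxley's, in the navy, and their ultimate concerns likewise overlapped only partly, in the humanistic significance of evolutionary theory. Yan Fu believed he was promoting Huxley's way of "self-strengthening and preserving the race". Readers of the day responded enthusiastically to 《天演论》, drawn entirely by the new vocabulary Yan Fu had coined: things compete and nature selects, the fittest survive, or else the nation perishes and the race dies out. As for what Huxley actually said, who his opponents were, and what his evolutionary theory and political stance were, a century later the prevailing views in Chinese scholarship still have not gone beyond Yan Fu. This article points out that in the original text of 《天演论》 Huxley put forward three theories of "evolution": artificial selection (which creates domesticated organisms), natural selection (which creates new species), and civilization (which creates the human cultural world). All three take "selection" as their core concept, echoing the traditional Chinese political ideal of "selecting the worthy and the capable" (选贤与能).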

庞梅 《科研管理》2008,29(4):48-54
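Abstract: Whether in the companies of the United Kingdom and the United States, where share ownership is dispersed, or in those of continental European and East Asian countries, where ownership is concentrated and controlling shareholders exist, protecting shareholders' rights and interests is a central problem of the corporate system. A comparison of how shareholder-rights protection has evolved under different corporate governance models shows that the role of law is crucial, and that the effective way for law to protect shareholders' rights is to strengthen shareholder participation in corporate governance. The revision of China's 《公司法》 (Company Law) and the launch of the split-share structure reform respond to the urgent need for shareholder-rights protection and follow the international trend in protecting those rights.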

This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for categorization of documents consisting of a collection of fields, or arbitrary tree-structured documents that can be adequately modeled with such a flat structure. The approaches range from trivial modifications of text modeling to more elaborate schemes, specifically tailored to structured documents. We combine these methods with three different text classification algorithms and evaluate their performance on four standard datasets containing different types of semi-structured documents. The best results were obtained with stacking, an approach in which predictions based on different structural components are combined by a meta classifier. A further improvement of this method is achieved by including the flat text model in the final prediction.
