Similar literature
 20 similar documents found (search time: 31 ms)
1.
In the context of social media, users usually post relevant information corresponding to the contents of events mentioned in a Web document. This information possesses two important values in that (i) it reflects the content of an event and (ii) it shares hidden topics with sentences in the main document. In this paper, we present a novel model to capture the nature of relationships between document sentences and post information (comments or tweets) in sharing hidden topics for summarization of Web documents by utilizing relevant post information. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks document sentences and user posts based on their importance to the topics. The sentence-user-post relation is formulated in a shared topic matrix, which represents their mutual reinforcement. Our proposed matrix co-factorization algorithm computes the score of each document sentence and user post and extracts the top-ranked document sentences and comments (or tweets) as a summary. We apply the model to the task of summarization on three social context summarization datasets in two languages, English and Vietnamese, and also on DUC 2004 (a standard corpus for the traditional summarization task). According to the experimental results, our model significantly outperforms basic matrix factorization and achieves ROUGE scores competitive with state-of-the-art methods.

2.
Ontologies are frequently used in information retrieval; their main applications are query expansion, semantic indexing of documents, and the organization of search results. Ontologies provide lexical items, allow conceptual normalization, and provide different types of relations. However, how to optimize an ontology for information retrieval tasks is still unclear. In this paper, we use an ontology query model to analyze the usefulness of ontologies in effectively performing document searches. Moreover, we propose an algorithm to refine ontologies for information retrieval tasks, with positive preliminary results.

3.
This work addresses the information retrieval problem of auto-indexing Arabic documents. Auto-indexing a text document refers to automatically extracting words that are suitable for building an index for the document. In this paper, we propose an auto-indexing method for Arabic text documents. This method is mainly based on morphological analysis and on a technique for assigning weights to words. The morphological analysis uses a number of grammatical rules to extract stem words that become candidate index words. The weight assignment technique computes weights for these words relative to the containing document. The weight is based on how widely the word is spread across the document and not only on its rate of occurrence. The candidate index words are then sorted in descending order by weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples. For these examples, we obtained an average recall of 46% and an average precision of 64%.
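The weighting idea in this abstract (a word counts for more when it is spread across the document, not merely frequent) can be sketched as follows. This is a minimal illustration only: the abstract does not give the authors' formula, so the product of relative frequency and segment coverage, the function names `spread_weights` and `rank_candidates`, and the `n_segments` parameter are all assumptions. The input is taken to be a list of already-stemmed candidate words; the Arabic morphological analysis is not reproduced.

```python
from collections import Counter


def spread_weights(words, n_segments=4):
    """Hypothetical weight: relative frequency times the share of
    equal-length document segments in which the word occurs."""
    if not words:
        return {}
    n_segments = max(1, min(n_segments, len(words)))
    seg_len = -(-len(words) // n_segments)  # ceiling division
    segments = [words[i:i + seg_len] for i in range(0, len(words), seg_len)]
    weights = {}
    for word, count in Counter(words).items():
        frequency = count / len(words)
        spread = sum(word in seg for seg in segments) / len(segments)
        weights[word] = frequency * spread
    return weights


def rank_candidates(words, n_segments=4):
    """Sort candidate index words by descending weight (ties alphabetically)."""
    weights = spread_weights(words, n_segments)
    return sorted(weights, key=lambda w: (-weights[w], w))


# "k" and "m" are equally frequent, but "k" occurs in two segments and "m" in one.
stems = ["k", "a", "k", "b", "m", "m", "c", "d"]
print(rank_candidates(stems)[:2])  # ['k', 'm']
```

Multiplying the two factors is only one way to combine them; any monotone combination would preserve the property that, at equal frequency, the more widely spread word ranks higher.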

4.
The documents retrieved by a web search are useful if the information they contain contributes to some task or information need. To measure search result utility, studies have typically focused on perceived usefulness rather than on actual information use. We investigate the actual usefulness of search results, as indicated by their use as sources in an extensive writing task, and the factors that make a writer successful at retrieving useful sources. Our data comprise 150 essays written by 12 writers whose querying, clicking and writing activities were recorded. By tracking authors’ text reuse behavior, we quantify the search results’ contribution to the task more accurately than before. We model the overall utility of the search results retrieved throughout the writing process using path analysis, and compare a binary utility model (Reuse Events) to one that quantifies a degree of utility (Reuse Amount). The Reuse Events model has greater explanatory power (63% vs. 48%); in both models, the number of clicks is by far the strongest predictor of useful results (with β-coefficients up to 0.7), while dwell time has a negative effect (β between −0.14 and −0.21). In conclusion, we propose a new measure of search result usefulness based on a source’s contribution to an evolving text. Our findings are valid for tasks where text reuse is allowed, but also have implications for designing indicators of search result usefulness for general writing tasks.

5.
Documents circulating in paper form are increasingly being replaced by their electronic equivalents in the modern office, so that any stored document can be retrieved whenever it is needed later. The office worker is already burdened with information overload, so effective and efficient retrieval facilities become an important factor affecting worker productivity. This paper first reviews the features of current document management systems with varying facilities to manage, store and retrieve either references to documents or whole documents. Information retrieval databases, groupware products and workflow management systems are presented as developments to handle different needs, together with the underlying concepts of knowledge management. The two problems of worker finiteness and worker ignorance remain outstanding, as they are only partially addressed by the above-mentioned systems. The solution lies in a shift away from pull technology, where the user has to actively initiate the request for information, towards push technology, where available information is automatically delivered without user intervention. Intelligent information retrieval agents are presented as a solution, together with a marketing scenario of how they can be introduced.

6.
Lexical cohesion is a property of text, achieved through lexical-semantic relations between words in text. Most information retrieval systems make use of lexical relations in text only to a limited extent. In this paper we empirically investigate whether the degree of lexical cohesion between the contexts of query terms’ occurrences in a document is related to its relevance to the query. Lexical cohesion between distinct query terms in a document is estimated on the basis of the lexical-semantic relations (repetition, synonymy, hyponymy and sibling) that exist between their collocates, that is, words that co-occur with them in the same windows of text. Experiments suggest that significant differences exist between the lexical cohesion in relevant and non-relevant document sets. A document ranking method based on lexical cohesion shows some performance improvements.

7.
In this paper, the scalability and quality of the contextual document clustering (CDC) approach are demonstrated for large data sets using the whole Reuters Corpus Volume 1 (RCV1) collection. CDC is a form of distributional clustering, which automatically discovers contexts of narrow scope within a document corpus. These contexts act as attractors for clustering documents that are semantically related to each other. Once clustered, the documents are organized into a minimum spanning tree so that the topical similarity of adjacent documents within this structure can be assessed. The pre-defined categories from three different document category sets are used to assess the quality of CDC in terms of its ability to group and structure semantically related documents given the contexts. Quality is evaluated based on two factors: the category overlap between adjacent documents within a cluster, and how well a representative document categorizes all the other documents within a cluster. As the RCV1 collection was collated in a time-ordered fashion, it was possible to assess the stability of clusters formed from documents within one time interval when presented with new unseen documents at subsequent time intervals. We demonstrate that CDC is a powerful and scalable technique with the ability to create stable clusters of high quality. Additionally, to our knowledge this is the first time that a collection as large as RCV1 has been analyzed in its entirety using a static clustering approach.

8.
Professional work is often regulated by procedures that shape the information seeking involved in performing a task. Yet, research on professionals’ information seeking tends to bypass procedures and depict information seeking as an informal activity. In this study we analyze two healthcare tasks governed by procedures: triage and timeouts. While information seeking is central to both procedures, we find that the coordinating nurses rarely engage in information seeking when they triage patients. Conversely, the physicians value convening for timeouts to seek information. To explain these findings we distinguish between junior and expert professionals and between uncertain and equivocal tasks. The triage procedure specifies which information to retrieve, but expert professionals such as the coordinating nurses tend to perform triage, which is an uncertain task, by holistic pattern recognition rather than information seeking. For timeouts, which target an equivocal task, the procedure facilitates information seeking by creating a space for open-ended collaborative reflection. Both junior and expert physicians temporarily suspend patient treatment in favor of this opportunity to reflect on their actions, though partly for different reasons. We discuss implications for models of professionals’ information seeking.

9.
Effective knowledge management in a knowledge-intensive environment can place heavy demands on the information filtering (IF) strategies used to model workers’ long-term task-needs. Because of the growing complexity of knowledge-intensive work tasks, a profiling technique is needed to deliver task-relevant documents to workers. In this study, we propose an IF technique with task-stage identification that provides effective codification-based support throughout the execution of a task. Task-needs pattern similarity analysis based on a correlation value is used to identify a worker’s task-stage (the pre-focus, focus formulation, or post-focus task-stage). The identified task-stage is then incorporated into a profile adaptation process to generate the worker’s current task profile. The results of a pilot study conducted in a research institute confirm that there is a low or negative correlation between search sessions and transactions in the pre-focus task-stage, whereas there is at least a moderate correlation between search sessions/transactions in the post-focus stage. Compared with the traditional IF technique, the proposed IF technique with task-stage identification achieves, on average, a 19.49% improvement in task-relevant document support. The results confirm the effectiveness of the proposed method for knowledge-intensive work tasks.

10.
The nature of the task that leads a person to engage in information interaction, as well as of information seeking and searching tasks, has been shown to influence individuals’ information behavior. Classifying tasks in a domain has been viewed as a departure point of studies on the relationship between tasks and human information behavior. However, previous task classification schemes either classify tasks with respect to the requirements of specific studies or merely classify a certain category of task. Such approaches do not lead to a holistic picture of task, since a task involves different aspects. Therefore, the present study aims to develop a faceted classification of task, which can incorporate work tasks and information search tasks into the same classification scheme and characterize tasks in such a way as to help people make predictions of information behavior. For this purpose, previous task classification schemes and their underlying facets are reviewed and discussed. Analysis identifies essential facets and categorizes them into Generic facets of task and Common attributes of task. Generic facets of task include Source of task, Task doer, Time, Action, Product, and Goal. Common attributes of task include Task characteristics and User’s perception of task. Corresponding sub-facets and values are identified as well. In this fashion, a faceted classification of task is established which could be used to describe users’ work tasks and information search tasks. This faceted classification provides a framework to further explore the relationships among work tasks, search tasks, and interactive information retrieval, and to advance adaptive IR systems design.

11.
With the advent of various Semantic Web services and applications, semantic annotation has emerged as an important research topic. The application of semantically annotated ontologies has been evident in numerous information processing and retrieval tasks. One such task is product design, where a semantically annotated ontology can support many applications that are critical to design-related tasks. However, ontology development in design engineering remains a time-consuming and tedious task that demands considerable human effort. In the context of product family design, management of different product information that features efficient indexing, update, navigation, search and retrieval across product families is both desirable and challenging. For instance, an efficient way of retrieving timely information on a product family can be useful for tasks such as product family redesign and new product variant derivation when requirements change. However, current research on and application of information search and navigation in product families is mostly limited to the structural aspect, which is insufficient to handle advanced information search, especially when the query targets multiple aspects of a product. This paper attempts to address this problem by proposing an information search and retrieval framework based on a semantically annotated multi-facet product family ontology. In particular, we propose a document profile (DP) model to suggest semantic tags for annotation purposes. Using a case study of digital camera families, we illustrate how the faceted search and retrieval of product information can be accomplished. We also exemplify how new product variants can be derived based on the designer’s query of requirements via the faceted search and retrieval of product family information. Lastly, in order to highlight the value of our current work, we briefly discuss some further research and applications in design decision support, e.g. commonality analysis and variety comparison, based on the semantically annotated multi-facet product family ontology.

12.
Contextual document clustering is a novel approach which uses information theoretic measures to cluster semantically related documents bound together by an implicit set of concepts or themes of narrow specificity. It facilitates cluster-based retrieval by assessing the similarity between a query and the cluster themes’ probability distribution. In this paper, we assess a relevance feedback mechanism, based on query refinement, that modifies the query’s probability distribution using a small number of documents that have been judged relevant to the query. We demonstrate that by providing only one relevance judgment, a performance improvement of 33% was obtained.
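The query-refinement step described in this abstract can be illustrated with a small sketch. The abstract specifies neither the update rule nor the similarity measure, so everything concrete here is an assumption: the linear interpolation with weight `alpha`, the use of Jensen-Shannon divergence to compare the query with cluster themes, and the function names `distribution`, `refine_query`, `js_divergence` and `rank_clusters`.

```python
import math
from collections import Counter


def distribution(tokens):
    """Maximum-likelihood term distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def refine_query(query_dist, relevant_docs, alpha=0.5):
    """Assumed update: mix the query distribution with the pooled
    distribution of the documents judged relevant."""
    pooled = distribution([t for doc in relevant_docs for t in doc])
    if not pooled:
        return dict(query_dist)
    terms = set(query_dist) | set(pooled)
    return {t: (1 - alpha) * query_dist.get(t, 0.0) + alpha * pooled.get(t, 0.0)
            for t in terms}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 means identical distributions)."""
    terms = set(p) | set(q)
    mid = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl(a):
        return sum(a[t] * math.log2(a[t] / mid[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def rank_clusters(query_dist, cluster_themes):
    """Order cluster names from most to least similar to the query."""
    return sorted(cluster_themes,
                  key=lambda name: js_divergence(query_dist, cluster_themes[name]))


themes = {
    "finance": distribution(["bank", "loan", "credit", "interest"]),
    "rivers": distribution(["bank", "river", "water", "flood"]),
}
query = distribution(["bank"])  # ambiguous on its own
refined = refine_query(query, [["river", "water", "bank"]])  # one relevance judgment
print(rank_clusters(refined, themes))
```

With the single-term query both themes are equally close; after one relevant document is folded in, the refined distribution carries mass on "river" and "water", which separates the two themes.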

13.
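The influence of information-environment factors on the indexing and processing of document information   Total citations: 2, self-citations: 0, citations by others: 2
Qiu Bin (仇滨), Zhao Aimin (赵爱民). Information Science (《情报科学》), 2000, 18(5): 405-407, 411
Drawing on the practice of the authors' own library, this paper analyzes the main factors that make up the information environment, points out how changes in these factors affect the indexing and processing of document information in libraries, and presents the authors' views.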

14.
The retrieval effectiveness of the underlying document search component of an expert search engine can have an important impact on the effectiveness of the generated expert search results. In this large-scale study, we perform novel experiments in the context of the document search and expert search tasks of the TREC Enterprise track, to measure the influence that the performance of the document ranking has on the ranking of candidate experts. In particular, our experiments show that while the expert search system performance is related to the relevance of the retrieved documents, surprisingly, it is not always the case that increasing document search effectiveness causes an increase in expert search performance. Moreover, we simulate document rankings designed with expert search performance in mind and, through a failure analysis, show why even a perfect document ranking may not result in a perfect ranking of candidate experts.

15.
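The historical evolution of information resources management: from information source management to information resources management   Total citations: 14, self-citations: 0, citations by others: 14
Ma Feicheng (马费成). Information Science (《情报科学》), 1998, 16(3): 251-256
Based on the human information process, the article divides the management of knowledge and information into three stages: the traditional management stage, the information management stage, and the information resources management stage. It analyzes the characteristics, tasks, goals, and management methods of each stage, and explains why the contemporary model of information resources management was both inevitable and necessary. Finally, it points out the significance of information resources management for high-speed information networks.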

16.
Access to information via handheld devices supports decision making away from one’s computer. However, limitations include small screens and constrained wireless bandwidth. We present a summarization method that transforms online content for delivery to small devices. Unlike previous algorithms, ours assumes nothing about document formatting, and induces a hierarchical structure based on the relative importance of sentences within the document. Compared with delivering full documents, the method reduces the bytes transferred by half. An experiment also demonstrates that when given hierarchical summaries, users are no less accurate in answering questions about the documents.

17.
Scholars are reading more journal articles than ever, so it is important that they focus on the relevant text within the articles they read. To support this goal, this study explores enhancements to a journal reading system by applying the idea of the functional unit, the smallest information unit with a distinct function within the four major components of scholarly journal articles: Introduction, Methods, Results and Discussion. This study examined a set of functional units and their associations with scholarly journal article use tasks through literature analysis and validation surveys. Forty-one typical functional units were found in psychology journal articles, with varying relevance to five tasks requiring use of information in journal articles. The relationships among sets of functional units for particular tasks were also identified. A taxonomy was developed incorporating the relationships between functional units and information use tasks, which can be used to inform system design. Based on this taxonomy, a prototype journal reading environment signalling functional units was designed and implemented for testing.

18.
Nowadays, new ways of managing and accessing health-care information are continuously appearing. Web-based Personal Health Records (web PHRs) have the potential to make data about health care available to clinicians, researchers and students in different medical contexts and applications. Therefore, the number of web PHRs accessible through the Internet has grown enormously, and as a result health-care professionals are currently burdened with more and more data. Unfortunately, these data probably do not always have adequate levels of quality, so professionals' work cannot always be as successful as expected. To alleviate this problem, the present work focuses on improving document filtering results in the context of web PHR management. To achieve this goal, a new kind of document filtering model is proposed. This model is based on fuzzy prototypes, which are defined by means of conceptual prototypes. These prototypes are obtained through a data quality analysis of documents. This analysis guarantees that filtered information will be relevant enough for the information user. The complete model provides an efficient document filtering strategy that can be very useful when it is necessary to deal with a constant flow of new information.

19.
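Research on the application of document clustering to Web search results   Total citations: 1, self-citations: 0, citations by others: 1
With the rapid development of the Internet, information has grown explosively and produced information overload, and to a considerable extent search is the only option for coping with it. However, current search engines have obvious shortcomings: first, the number of search results is huge; second, the results are presented as a linear list. This paper proposes organizing search engine results by document clustering, which alleviates these problems to some extent.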

20.
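Parsing web pages with HTMLParser makes it possible to extract information such as Link, image, meta, and title from between tags. HTMLParser is used to extract the title, keywords, abstract, author, source, and other information from Web documents; after cleaning, the data are stored in a MySQL database for subsequent data mining. The paper discusses this process.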
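The extraction step described in this abstract can be sketched with the `HTMLParser` class from Python's standard-library `html.parser` module, which may differ from the HTMLParser library the abstract refers to. The class name `MetadataExtractor`, the function `extract_metadata`, and the mapping of the `keywords`, `description` and `author` meta tags to keywords, abstract and author are illustrative assumptions; the cleaning rules and the MySQL storage step are not shown.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects the title, named meta tags, link targets and image sources."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"].strip()
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return a flat record that could later be cleaned and stored."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    keywords = parser.meta.get("keywords", "")
    return {
        "title": " ".join(parser.title.split()),  # collapse whitespace
        "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
        "abstract": parser.meta.get("description", ""),
        "author": parser.meta.get("author", ""),
        "links": parser.links,
        "images": parser.images,
    }


page = ('<html><head><title>Sample paper</title>'
        '<meta name="keywords" content="clustering, search"></head>'
        '<body><a href="/full.pdf">PDF</a></body></html>')
print(extract_metadata(page))
```

Returning a plain dictionary keeps the parsing step independent of the storage layer, so the same record could be written to MySQL or any other store.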
