Similar Literature
20 related records retrieved.
1.
High-quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and its results are difficult to reproduce. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on the 'corrected AIC' to avoid multicollinearity, and (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17–51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
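As a rough illustration of the voting idea described above, the sketch below fits ordinary least-squares models on subsets of automatic-metric features, scores each candidate model with the corrected AIC (AICc), and averages the predictions of the best few. The feature subsets, model count, and the use of plain OLS are illustrative assumptions, not the paper's exact VRM configuration.

```python
# Hedged sketch of a voted-regression-style evaluator: fit OLS models on
# feature subsets, keep those with the best corrected AIC (AICc), and average
# (vote) their predictions. Assumes n > k + 1 for every candidate model.
import itertools
import numpy as np

def aicc(rss, n, k):
    """Corrected AIC for an OLS model with k parameters on n samples."""
    aic = n * np.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def fit_ols(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    residual = y - X @ beta
    return beta, float(residual @ residual)

def voted_regression(X, y, max_features=3, top_models=5):
    n, d = X.shape
    scored = []
    for size in range(1, max_features + 1):
        for cols in itertools.combinations(range(d), size):
            Xs = np.column_stack([np.ones(n), X[:, cols]])
            beta, rss = fit_ols(Xs, y)
            scored.append((aicc(rss, n, Xs.shape[1]), cols, beta))
    scored.sort(key=lambda t: t[0])
    selected = scored[:top_models]            # the "voters"

    def predict(X_new):
        preds = []
        for _, cols, beta in selected:
            Xs = np.column_stack([np.ones(len(X_new)), X_new[:, cols]])
            preds.append(Xs @ beta)
        return np.mean(preds, axis=0)         # voting = averaging predictions
    return predict

# Example: X holds automatic metric scores (e.g. ROUGE variants) per summary,
# y holds human evaluation scores; predict() then estimates human scores.
```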

2.
Timeline generation systems are a class of algorithms that produce a sequence of time-ordered sentences or text snippets extracted in real time from high-volume streams of digital documents (e.g. news articles), focusing on retaining relevant and informative content for a particular information need (e.g. a topic or event). These systems have a range of uses, such as producing concise overviews of events for end-users (human or artificial agents). To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information-nugget or cluster-based ground truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. sentences) to an explicit representation of what information a 'good' summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior works have reported cases where such evaluations fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which the timeline summarization test collections fail to generalize to new summarization systems, and then we propose, evaluate and analyze new automatic solutions to this issue. In particular, using a depooling methodology over 19 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, the risk of systems being mis-ranked increases as the effectiveness of the systems held out from the pool increases. To reduce the risk of mis-ranking systems, we also propose a range of automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective at increasing the robustness of the TREC-TS test collections, as they are able to generate large numbers of missing matches with high accuracy, markedly reducing the number of mis-rankings by up to 50%.
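The depooling idea can be illustrated with a small sketch: score each system against ground truth built from the full pool, rescore it against labels built without its own contributions, and compare the two rankings. The recall-style scoring function and data layout below are placeholder assumptions, not the TREC-TS evaluation protocol.

```python
# Hedged sketch of a depooling robustness check: compare system rankings with
# and without each system's own contributions to the pooled ground truth.
from scipy.stats import kendalltau

def score(system_items, labels):
    """Recall-style score: fraction of labelled items the system retrieved."""
    return len(system_items & labels) / max(len(labels), 1)

def depooling_check(contributions):
    """contributions: dict system -> set of items it contributed to the pool."""
    full_pool = set().union(*contributions.values())
    pooled = [score(items, full_pool) for items in contributions.values()]
    heldout = []
    for name, items in contributions.items():
        reduced = set().union(*(v for k, v in contributions.items() if k != name))
        heldout.append(score(items, reduced))
    tau, _ = kendalltau(pooled, heldout)
    return tau   # low tau -> rankings change when systems are held out

systems = {
    "sysA": {"s1", "s2", "s3"},
    "sysB": {"s2", "s4"},
    "sysC": {"s1", "s5", "s6"},
}
print(depooling_check(systems))
```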

3.
Automatic text summarization attempts to provide an effective solution to today's unprecedented growth of textual data. This paper proposes an innovative graph-based text summarization framework for generic single- and multi-document summarization. The summarizer benefits from two well-established text semantic representation techniques, Semantic Role Labelling (SRL) and Explicit Semantic Analysis (ESA), as well as the constantly evolving collective human knowledge in Wikipedia. SRL is used to achieve semantic parsing of sentences, whose word tokens are represented as vectors of weighted Wikipedia concepts using the ESA method. The essence of the developed framework is to construct a unique concept-graph representation underpinned by semantic role-based multi-node (sub-sentence level) vertices for summarization. We have empirically evaluated the summarization system using the standard publicly available dataset from the Document Understanding Conference 2002 (DUC 2002). Experimental results indicate that the proposed summarizer outperforms all related state-of-the-art comparators in single-document summarization based on the ROUGE-1 and ROUGE-2 measures, while ranking second in the ROUGE-1 and ROUGE-SU4 scores for multi-document summarization. The testing also demonstrates the scalability of the system: varying the evaluation data size is shown to have little impact on summarizer performance, particularly for the single-document summarization task. In a nutshell, the findings demonstrate the power of role-based and vectorial semantic representation when combined with the crowd-sourced knowledge base in Wikipedia.

4.
In the context of social media, users usually post relevant information corresponding to the contents of events mentioned in a Web document. This information has two important properties: (i) it reflects the content of an event, and (ii) it shares hidden topics with sentences in the main document. In this paper, we present a novel model to capture the nature of relationships between document sentences and post information (comments or tweets) in sharing hidden topics, for summarization of Web documents utilizing relevant post information. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks document sentences and user posts based on their importance to the topics. The sentence-user-post relation is formulated in a shared-topic matrix, which represents their mutual reinforcement. Our proposed matrix co-factorization algorithm computes the score of each document sentence and user post and extracts the top-ranked document sentences and comments (or tweets) as a summary. We apply the model to the summarization task on three social context summarization datasets in two languages, English and Vietnamese, as well as on DUC 2004 (a standard corpus of the traditional summarization task). According to the experimental results, our model significantly outperforms basic matrix factorization and achieves ROUGE scores competitive with state-of-the-art methods.
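A minimal sketch of the shared-topic co-factorization idea follows: a sentence-term matrix and a post-term matrix are factorized against a common topic-term factor using multiplicative updates, and sentences are ranked by their topic weights. This is a generic joint NMF formulation written for illustration; the paper's exact objective, constraints, and scoring are not reproduced.

```python
# Hedged sketch of joint factorization with a shared topic factor: A (sentences
# x terms) and B (posts x terms) share the topic-term matrix H; sentences and
# posts are then ranked by their learned topic weights.
import numpy as np

def co_factorize(A, B, n_topics=5, iters=200, eps=1e-9):
    rng = np.random.default_rng(0)
    Ws = rng.random((A.shape[0], n_topics))   # sentence-topic weights
    Wp = rng.random((B.shape[0], n_topics))   # post-topic weights
    H = rng.random((n_topics, A.shape[1]))    # shared topic-term matrix
    for _ in range(iters):
        Ws *= (A @ H.T) / (Ws @ H @ H.T + eps)
        Wp *= (B @ H.T) / (Wp @ H @ H.T + eps)
        H *= (Ws.T @ A + Wp.T @ B) / (Ws.T @ Ws @ H + Wp.T @ Wp @ H + eps)
    return Ws, Wp, H

def rank_sentences(A, B, top_k=3):
    Ws, _, _ = co_factorize(A, B)
    scores = Ws.sum(axis=1)                   # importance of each sentence
    return np.argsort(-scores)[:top_k]        # indices of the summary sentences
```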

5.
We show some limitations of the ROUGE evaluation method for automatic summarization. We present a method for automatic summarization based on a Markov model of the source text. By a simple greedy word-selection strategy, summaries with high ROUGE scores are generated. These summaries would, however, not be considered good by human readers. The method can be adapted to trick different settings of the ROUGEeval package.
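The ROUGE-gaming strategy can be sketched roughly as follows: build a bigram (Markov) model of the source text and greedily append the most frequent continuation word. The output overlaps heavily with source n-grams, which is what ROUGE rewards, while being unreadable to humans. Tokenization, start-word choice, and length limit here are illustrative assumptions.

```python
# Hedged sketch of the idea: a bigram Markov model of the source plus greedy
# word selection yields text with high n-gram overlap but no real fluency.
from collections import Counter, defaultdict

def build_bigram_model(tokens):
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def greedy_summary(tokens, length=20):
    model = build_bigram_model(tokens)
    unigrams = Counter(tokens)
    word = unigrams.most_common(1)[0][0]      # start from the most frequent word
    out = [word]
    for _ in range(length - 1):
        followers = model.get(word)
        if not followers:
            break
        word = followers.most_common(1)[0][0] # greedy: most likely continuation
        out.append(word)
    return " ".join(out)

source = "the cat sat on the mat and the cat ate the food on the mat".split()
print(greedy_summary(source, length=10))
```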

6.
A new approach to narrative abstractive summarization (NATSUM) is presented in this paper. NATSUM is centered on generating a narrative, chronologically ordered summary about a target entity from several news documents related to the same topic. To achieve this, our system first creates a cross-document timeline where a time point contains all the event mentions that refer to the same event. This timeline is enriched with all the arguments of the events extracted from the different documents. Secondly, using natural language generation techniques, one sentence is produced for each event using the arguments involved in that event. Specifically, a hybrid surface realization approach is used, based on over-generation and ranking techniques. The evaluation demonstrates that NATSUM performs better than extractive summarization approaches and competitive abstractive baselines, improving the F1-measure by at least 50% when a real scenario is simulated.

7.
The rise in the amount of textual resources available on the Internet has created the need for tools for automatic document summarization. The main challenges of query-oriented extractive summarization are (1) to identify the topics of the documents and (2) to recover query-relevant sentences of the documents that together cover these topics. Existing graph- or hypergraph-based summarizers use graph-based ranking algorithms to produce individual relevance scores for the sentences. Hence, these systems fail to measure the topics jointly covered by the sentences forming the summary, which tends to produce redundant summaries. To address the issue of selecting non-redundant sentences that jointly cover the main query-relevant topics of a corpus, we propose a new method using the powerful theory of hypergraph transversals. First, we introduce a new topic model based on the semantic clustering of terms in order to discover the topics present in a corpus. Second, these topics are modeled as the hyperedges of a hypergraph in which the nodes are the sentences. A summary is then produced by generating a transversal of nodes in the hypergraph. Algorithms based on the theory of submodular functions are proposed to generate the transversals and to build the summaries. The proposed summarizer outperforms existing graph- or hypergraph-based summarizers by at least 6% in ROUGE-SU4 F-measure on the DUC 2007 dataset. It is moreover cheaper than existing hypergraph-based summarizers in terms of computational time complexity.
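A compact sketch of summary construction as a hypergraph transversal: topics are hyperedges over sentence nodes, and a greedy submodular-style selection repeatedly picks the sentence covering the most not-yet-covered topics under a length budget. Topic discovery and query-relevance weighting from the paper are omitted; the data layout is an assumption.

```python
# Hedged sketch: greedy selection of sentences (nodes) that hit all topic
# hyperedges within a summary length budget.
def greedy_transversal(topics, sentence_lengths, budget):
    """topics: list of sets of sentence indices (one hyperedge per topic)."""
    covered = set()
    chosen = []
    used = 0
    while len(covered) < len(topics):
        def gain(s):
            return sum(1 for t, topic in enumerate(topics)
                       if t not in covered and s in topic)
        candidates = [s for s in range(len(sentence_lengths))
                      if s not in chosen and used + sentence_lengths[s] <= budget]
        if not candidates:
            break
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break
        chosen.append(best)
        used += sentence_lengths[best]
        covered |= {t for t, topic in enumerate(topics) if best in topic}
    return chosen

topics = [{0, 2}, {1, 2}, {3}]          # which sentences cover each topic
print(greedy_transversal(topics, sentence_lengths=[5, 7, 6, 4], budget=12))
```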

8.
In practice, it is almost impossible to directly add a controller to each node in a complex dynamical network, owing to the high control cost and the difficulty of practical implementation, especially for large-scale networks. To address this issue, the pinning control strategy has been introduced as a feasible alternative. The objective of this paper is first to recall some recent advances in global pinning synchronization of complex networks with general communication topologies. A systematic review is presented covering modeling, network topologies, control methodologies, theoretical analysis methods, and pinned-node selection and localization schemes (pinning strategies). Fully distributed adaptive laws are subsequently proposed for the coupling strength as well as the pinning control gains, and sufficient conditions are obtained to synchronize and pin a general complex network to a preassigned trajectory. Moreover, some open problems and future work in the field are also discussed.
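For orientation, a generic form of adaptive pinning control as it commonly appears in this literature is given below; the specific symbols, gain laws, and sufficient conditions in the surveyed paper may differ.

\[
\dot{x}_i(t) = f\bigl(x_i(t)\bigr) + c(t)\sum_{j=1}^{N} a_{ij}\,\Gamma\,x_j(t) - c(t)\,d_i(t)\,\Gamma\,\bigl(x_i(t)-s(t)\bigr),\qquad i=1,\dots,N,
\]
where \(s(t)\) is the preassigned trajectory, \(a_{ij}\) is the (diffusive) coupling configuration, \(\Gamma\) the inner coupling matrix, and \(d_i(t)=0\) for unpinned nodes, with adaptive laws of the form
\[
\dot{d}_i(t) = h_i\,\bigl(x_i(t)-s(t)\bigr)^{\top}\Gamma\,\bigl(x_i(t)-s(t)\bigr),\qquad
\dot{c}(t) = \rho\sum_{i=1}^{N}\bigl(x_i(t)-s(t)\bigr)^{\top}\Gamma\,\bigl(x_i(t)-s(t)\bigr),
\]
so that both the pinning gains \(d_i\) and the coupling strength \(c\) adjust themselves using only local synchronization errors.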

9.
Clinical trials that terminate prematurely without reaching conclusions raise financial, ethical, and scientific concerns. Scientific studies in all disciplines are initiated with extensive planning and deliberation, often by a team of highly trained scientists. To ensure that the quality, integrity, and feasibility of funded research projects meet the required standards, research-funding agencies such as the National Institutes of Health and the National Science Foundation pass proposed research plans through a rigorous peer review process before making funding decisions. Yet some study proposals successfully pass through all the rigorous scrutiny of the scientific peer review process, and the proposed investigations still end up being terminated before yielding results. This study demonstrates an algorithm that quantifies the risk of a study being terminated based on the analysis of patterns in the language used to describe the study prior to its implementation. To quantify the risk of termination, we use data from the ClinicalTrials.gov repository, from which we extracted structured data that flagged study characteristics, and unstructured text data that described the study goals, objectives, and methods in a standard narrative form. We propose an algorithm to extract distinctive words from this unstructured text data that are most frequently used to describe trials that were completed successfully versus those that were terminated. Binary variables indicating the presence of these distinctive words in trial proposals are used as input to a random forest, along with standard structured data fields. In this paper, we demonstrate that this combined modeling approach yields robust predictive probabilities in terms of both sensitivity (0.56) and specificity (0.71), relative to a model that utilizes the structured data alone (sensitivity = 0.03, specificity = 0.97). These predictive probabilities can be applied to make judgements about a trial's feasibility using information that is available before any funding is granted.
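The combined feature design can be sketched as below: binary indicators for a set of distinctive words found in trial descriptions are concatenated with structured fields and fed to a random forest. The word list, structured fields, and toy data are placeholders; the study's actual feature extraction from ClinicalTrials.gov is not reproduced.

```python
# Hedged sketch of the combined model: distinctive-word flags + structured
# fields as input to a random forest that scores termination risk.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

distinctive_words = ["pilot", "feasibility", "recruitment", "interim"]  # illustrative
descriptions = [
    "A pilot feasibility study with interim analyses of recruitment rates ...",
    "A randomized double-blind trial of drug X versus placebo ...",
]
structured = np.array([[120, 1], [800, 0]])     # e.g. enrollment target, industry-funded
terminated = np.array([1, 0])                   # 1 = terminated, 0 = completed

vectorizer = CountVectorizer(vocabulary=distinctive_words, binary=True)
word_flags = vectorizer.fit_transform(descriptions).toarray()
X = np.hstack([word_flags, structured])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, terminated)
print(clf.predict_proba(X)[:, 1])               # estimated risk of termination per trial
```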

10.
11.
Creativity is considered a human characteristic; creative endeavors, including automatic story generation, have been a major challenge for artificial intelligence. To understand how humans create and evaluate stories, we (1) construct a story dataset and (2) analyze the relationship between emotions and story interestingness. Given that understanding how to move readers emotionally is a crucial creative technique, we focus on the role of emotions in evaluating reader satisfaction. Although conventional research has highlighted emotions read from a text, we hypothesize that readers' emotions do not necessarily coincide with those of the characters. The story dataset created for this study describes situations surrounding two characters. Crowdsourced volunteers label the stories with the emotions of the two characters and those of readers; we then empirically analyze the relationship between emotions and interestingness. The results show that a story's score has a stronger relationship to the readers' emotions than to the characters' emotions.

12.
Searching for relevant material that satisfies the information need of a user within a large document collection is a critical activity for web search engines. Query expansion (QE) techniques are widely used by search engines to disambiguate the user's information need and to improve information retrieval (IR) performance. Knowledge-based, corpus-based, and relevance feedback are the main QE techniques; they employ different approaches for expanding the user query with synonyms of the search terms (word synonymy) in order to retrieve more relevant documents, and for filtering out documents that contain the search terms but with a different meaning than the user intended (the word polysemy problem). This work surveys existing query expansion techniques, highlights their strengths and limitations, and introduces a new method that combines the power of knowledge-based or corpus-based techniques with that of relevance feedback. Experimental evaluation on three information retrieval benchmark datasets shows that applying knowledge- or corpus-based query expansion techniques to the results of the relevance feedback step improves information retrieval performance, with knowledge-based techniques providing significantly better results than their simple relevance feedback alternatives on all datasets.
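A small sketch of layering a knowledge-based step on top of relevance feedback: retrieve with the original query, take the strongest terms of the top-ranked (pseudo-relevant) documents, and then expand those terms with synonyms from a knowledge source. The toy synonym map stands in for a resource such as WordNet or a domain thesaurus; the retrieval model and parameter values are assumptions.

```python
# Hedged sketch: pseudo-relevance feedback followed by knowledge-based
# synonym expansion of the feedback terms.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

synonyms = {"car": ["automobile"], "fast": ["quick", "rapid"]}   # illustrative

def expand_query(query, docs, top_docs=2, top_terms=3):
    vec = TfidfVectorizer()
    D = vec.fit_transform(docs)
    q = vec.transform([query])
    ranked = np.argsort(-cosine_similarity(q, D).ravel())[:top_docs]
    # relevance-feedback step: strongest terms of the pseudo-relevant documents
    centroid = np.asarray(D[ranked].mean(axis=0)).ravel()
    terms = [vec.get_feature_names_out()[i] for i in np.argsort(-centroid)[:top_terms]]
    # knowledge-based step: add synonyms of the feedback terms
    expanded = set(query.split()) | set(terms)
    for t in terms:
        expanded |= set(synonyms.get(t, []))
    return " ".join(sorted(expanded))

docs = ["a fast car on the road", "the quick automobile race", "cooking pasta at home"]
print(expand_query("fast car", docs))
```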

13.
Tables in documents are a widely available and rich source of information, but they are not yet well utilised computationally because of the difficulty of automatically extracting their structure and data content. There has been a plethora of systems proposed to solve the problem, but current methods offer low usability and accuracy and lack precision in detecting data from diverse layouts. We propose a component-based design and implementation of table processing concepts which offers flexibility and re-usability as well as high performance on a wide range of table types. In this paper, we describe TEXUS, a fully automated table processing system that takes a PDF document and detects tables in a layout-independent manner. We introduce TEXUS's table-processing-specific document model and its two-phase processing pipeline design. Through an extensive evaluation on a dataset comprised of complex financial tables, we show the performance of the system on different table types.

14.
Handwriter identification aims to simplify the task of forensic experts by providing them with semi-automated tools that enable them to narrow down the search before determining the final identification of an unknown handwritten sample. An identification algorithm produces a list of predicted writers of the unknown handwritten sample, ranked by confidence measure, for use by the forensic expert, who makes the final decision. Most existing handwriter identification systems use either statistical or model-based approaches. To further improve performance, this paper proposes to deploy a combination of both approaches, using Oriented Basic Image features and the concept of a grapheme codebook. To reduce the resulting high dimensionality of the feature vector, Kernel Principal Component Analysis is used. To gauge the effectiveness of the proposed method, a performance analysis has been carried out using the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting. The results achieved an accuracy of 96%, demonstrating the method's superiority when compared against similar techniques.
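The dimensionality-reduction and ranking step might look roughly like the sketch below: high-dimensional descriptor vectors are projected with Kernel PCA and a probabilistic classifier then ranks candidate writers for an unknown sample. Feature extraction (oriented basic image features and grapheme-codebook histograms) is not shown, and the data, kernel, and classifier choice are illustrative assumptions.

```python
# Hedged sketch: Kernel PCA for dimensionality reduction, followed by a
# classifier that ranks candidate writers by predicted probability.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((60, 500))                      # 60 samples, 500-dim descriptors
writers = rng.integers(0, 6, size=60)          # 6 candidate writers

model = make_pipeline(
    KernelPCA(n_components=40, kernel="rbf"),  # non-linear dimensionality reduction
    SVC(probability=True),
)
model.fit(X, writers)

query = rng.random((1, 500))                   # unknown handwritten sample
probs = model.predict_proba(query)[0]
ranked = np.argsort(-probs)                    # ranked list of candidate writers
print(list(zip(ranked, probs[ranked])))
```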

15.
Intellectual Property Rights (IPRs) are widely acknowledged to be of central importance to research universities worldwide. This article assesses the patent portfolio of Universidade Federal Fluminense (UFF), which is located in the State of Rio de Janeiro in Brazil and is currently the largest Brazilian research university in terms of the total number of enrolled undergraduate students. More specifically, this study provides robust empirical evidence that a major university in a developing country is strengthening its patent portfolio by increasing the number of patent applications. In addition, it was also possible to observe that UFF's patent portfolio is dominated by emerging technologies addressing human necessities, such as medical, hygiene, or veterinary products.

16.
Since its introduction, television has been the main channel of advertising investment aimed at influencing customers' purchase behavior. Many have attributed the mere exposure effect as the source of influence on purchase intention and purchase decision; however, most studies of television advertisement effects are not only outdated, but their sample sizes are questionable and their environments do not reflect reality. With the advent of the internet, social media, and new information technologies, many recent studies focus on the effects of online advertisement, yet investment in television advertisement has still not declined. In response to this, we applied the machine learning algorithms SVM and XGBoost, as well as Logistic Regression, to construct a number of prediction models based on at-home advertisement exposure time and demographic data, examining the predictability of Actual Purchase and Purchase Intention behaviors of 3000 customers across 36 different products over a span of 3 months. If we were able to predict purchase behaviors with models based on exposure time more reliably than with models based on demographic data, the obvious strategy for businesses would be to increase the number of adverts. On the other hand, if models based on exposure time had unreliable predictability in contrast to models based on demographic data, doubts would surface about the effectiveness of the heavy investment in television advertising. Based on our results, we found that models based on advert exposure time were consistently low in their predictability in comparison with models based on demographic data only, and with models based on both demographic data and exposure time data. We also found that there was no statistically significant difference between these last two kinds of models. This suggests that advert exposure time has little to no short-term effect in increasing positive actual purchase behavior.
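The comparison described above can be illustrated by cross-validating the same classifiers on exposure-time features only, demographic features only, and both combined. Synthetic data stands in for the panel data, and scikit-learn's gradient boosting stands in for XGBoost to keep the sketch dependency-free; the actual study design is not reproduced.

```python
# Hedged sketch: cross-validated comparison of the same models on three
# feature sets (exposure only, demographics only, both).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
exposure = rng.random((n, 4))                      # per-product ad exposure minutes
demographics = rng.random((n, 5))                  # age, income, household size, ...
purchased = (demographics[:, 0] + 0.1 * rng.standard_normal(n) > 0.5).astype(int)

feature_sets = {
    "exposure only": exposure,
    "demographics only": demographics,
    "both": np.hstack([exposure, demographics]),
}
models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "GBoost": GradientBoostingClassifier(),        # stand-in for XGBoost
}
for fname, X in feature_sets.items():
    for mname, model in models.items():
        acc = cross_val_score(model, X, purchased, cv=5).mean()
        print(f"{fname:18s} {mname:7s} accuracy={acc:.2f}")
```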

17.
In addressing persistent gaps in existing theories, recent advances in data-driven research approaches offer novel perspectives and exciting insights across a spectrum of scientific fields concerned with technological change and its socio-economic impact. The present investigation suggests a novel approach to identify and analyze the evolution of technology sectors, in this case information and communications technology (ICT), considering international collaboration patterns and knowledge flows and spillovers via information inputs derived from patent documents. The objective is to utilize and explore information regarding inventors' geo-locations, technology sector classifications, and patent citation records to construct various types of networks. This, in turn, opens up avenues to discover the nature of evolutionary pathways in ICT trajectories and provides evidence of how the overall ICT knowledge space, as well as directional knowledge flows within the ICT space, have evolved differently. It is expected that this data-driven inquiry will deliver intuitive results for decision makers seeking evidence for future resource allocation and interested in identifying well-suited collaborators for the development of potential next-generation technologies. Further, it will equip researchers in technology management, economic geography, and similar fields with a systematic approach to analyzing the evolutionary pathways of technological advancements, and will further enable the exploitation and development of new theories regarding technological change and its socio-economic consequences.
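One way to sketch the network-construction step: map citing and cited patents to inventor countries (or ICT subsector classes) and aggregate citations into a weighted directed graph whose edges proxy knowledge flows. The records, country mapping, and indicators below are illustrative assumptions, not the paper's data or metrics.

```python
# Hedged sketch: build a weighted directed knowledge-flow network from
# (citing patent, cited patent) pairs aggregated by inventor country.
import networkx as nx

citations = [   # (citing patent, cited patent), illustrative records
    ("US-1", "KR-7"), ("US-1", "JP-3"), ("KR-7", "JP-3"), ("DE-2", "US-1"),
]
country_of = {"US-1": "US", "KR-7": "KR", "JP-3": "JP", "DE-2": "DE"}

G = nx.DiGraph()
for citing, cited in citations:
    src, dst = country_of[citing], country_of[cited]
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1
    else:
        G.add_edge(src, dst, weight=1)

# Simple indicators of position in the knowledge-flow network
print("out-flows:", dict(G.out_degree(weight="weight")))
print("in-flows:", dict(G.in_degree(weight="weight")))
print("pagerank:", nx.pagerank(G, weight="weight"))
```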

18.
This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. The acquired terminology, integrated into the ontology, extends the capabilities of the semantically rich document representation with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. The semantically rich representations obtained from the first module serve as input to the document classification module, which aims at finding the most appropriate category for that document through deep learning. Three different deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a feedforward network with three hidden layers of 1024 neurons obtains the highest document classification performance on the INFUSE dataset. The performance in terms of F1 score is further increased by almost five percentage points, to 78.10%, for the same network configuration when the relevant terminology integrated into the ontology is applied to enrich the document representation. Furthermore, we conducted a comparative performance evaluation using various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.
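The reported best configuration (a feedforward network with three hidden layers of 1024 neurons) can be sketched with a standard multilayer perceptron, as below. The input features and data are placeholders for the semantically enriched document representations; the training regime is not the paper's.

```python
# Hedged sketch: a 3 x 1024 feedforward classifier over enriched document vectors.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 300))            # 300-dim enriched document representations (placeholder)
y = rng.integers(0, 4, size=200)      # four financial document categories (placeholder)

clf = MLPClassifier(hidden_layer_sizes=(1024, 1024, 1024),
                    activation="relu", max_iter=50, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```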

19.
Automatic word spacing in Korean remains a significant task in natural language processing owing to the extremely complex word spacing rules involved. Most previous models remove all spaces in the input sentences and insert new spaces into the modified sentences. If the input sentences include only a small number of spacing errors, these models often return sentences with even more spacing errors than the input, because they remove correct spaces that were typed intentionally by the users. To reduce this problem, we propose an automatic word spacing model based on a neural network that effectively uses the word spacing information in the input sentences. The proposed model comprises a space insertion layer and a spacing-error correction layer. Using an approach similar to previous models, the space insertion layer inserts word spaces into input sentences from which all spaces have been removed. The spacing-error correction layer post-corrects the spacing errors of the space insertion model using the word spacing typed by users. Because the two layers are tightly connected in the proposed model, the backpropagation flow is not blocked; as a result, space insertion and error correction are performed simultaneously. In experiments, the proposed model outperformed all compared models on all measures on the same test data. In addition, it exhibited reliable performance (word-unit F1-measures of 94.17%–97.87%) regardless of how many word spacing errors were present in the input sentences.
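A toy sketch of the two-layer architecture: a space-insertion encoder predicts a space after each character, and an error-correction layer re-reads those predictions together with the spacing the user actually typed; both live in one module so gradients flow through them jointly. The recurrent units, sizes, and feature wiring are assumptions, not the paper's network.

```python
# Hedged sketch of two tightly connected layers: space insertion followed by
# correction conditioned on the user-typed spacing, trained end to end.
import torch
import torch.nn as nn

class SpacingModel(nn.Module):
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.insert_rnn = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        self.insert_out = nn.Linear(2 * hidden, 1)       # space-insertion logits
        # correction layer sees the insertion logit plus the user-typed space bit
        self.correct_rnn = nn.GRU(2 * hidden + 2, hidden, batch_first=True)
        self.correct_out = nn.Linear(hidden, 1)          # corrected space logits

    def forward(self, chars, user_spaces):
        h, _ = self.insert_rnn(self.embed(chars))        # (B, T, 2*hidden)
        insert_logits = self.insert_out(h)               # (B, T, 1)
        feats = torch.cat([h, insert_logits, user_spaces.unsqueeze(-1)], dim=-1)
        c, _ = self.correct_rnn(feats)
        return insert_logits.squeeze(-1), self.correct_out(c).squeeze(-1)

model = SpacingModel(vocab_size=2000)
chars = torch.randint(0, 2000, (4, 30))                  # batch of character ids
user_spaces = torch.randint(0, 2, (4, 30)).float()       # spacing typed by the user
insert_logits, corrected_logits = model(chars, user_spaces)
# Both heads can be trained with BCEWithLogitsLoss against gold spacing labels.
```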

20.
Dealing with unstructured issues, such as the transition to a sustainable energy system, requires stakeholder participation. A stakeholder dialogue should enhance learning about a problem and its potential solutions. However, a stakeholder dialogue will not be effective in just any form. Part and parcel of the development of methodologies for stakeholder dialogue is the evaluation of those methodologies. The aim of this paper is to show how a methodology for stakeholder dialogue can be evaluated in terms of learning. This paper suggests three criteria for the evaluation of learning in stakeholder dialogue: (1) an operationalizable definition of the desired effect of the dialogue, (2) the inclusion of a reference situation or control condition, and (3) the use of congruent and replicable evaluation methods. Q methodology was used in a quasi-experimental design to analyse to what extent learning took place in a stakeholder dialogue on energy options from biomass in the Netherlands. It is concluded that the dialogue had a significant effect: it increased participants' understanding of the diversity of perspectives. This effect is traced back to particular methodological and design elements in the dialogue.
