Similar Documents
20 similar documents found
1.
Using lexical chains for keyword extraction   Total citations: 9, self-citations: 0, citations by others: 9
Keywords can be considered condensed versions of documents and short forms of their summaries. In this paper, automatic keyword extraction from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text and can be said to represent the semantic content of a portion of that text. Although lexical chains have been used extensively in text summarization, their use for keyword extraction has not been fully investigated. This paper describes a keyword extraction technique that uses lexical chains and reports encouraging results.
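The abstract does not say how chains are built or which features the supervised learner uses. As a minimal sketch, assuming a hypothetical word-to-group table `SEMANTIC_GROUP` in place of a real thesaurus such as WordNet, words can be grouped into chains and turned into per-word features (the feature names are also hypothetical) that a classifier could consume:

```python
from collections import defaultdict

# Hypothetical relatedness table: word -> semantic group label.
# A real system would derive these relations from a thesaurus such as WordNet.
SEMANTIC_GROUP = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "engine": "vehicle", "road": "transport", "highway": "transport",
}


def build_lexical_chains(tokens):
    """Group related tokens into chains, one chain per semantic group."""
    chains = defaultdict(list)
    for tok in tokens:
        group = SEMANTIC_GROUP.get(tok.lower())
        if group is not None:
            chains[group].append(tok.lower())
    return dict(chains)


def chain_features(tokens):
    """Per-word features (hypothetical) for a supervised keyword classifier."""
    feats = {}
    for members in build_lexical_chains(tokens).values():
        for word in set(members):
            feats[word] = {
                "chain_length": len(members),         # tokens in the chain
                "chain_distinct": len(set(members)),  # distinct words in it
                "word_freq": members.count(word),     # occurrences of this word
            }
    return feats


tokens = "the car and the automobile on the highway near a car".split()
print(chain_features(tokens)["car"])  # {'chain_length': 3, 'chain_distinct': 2, 'word_freq': 2}
```

Words outside any chain (such as "the") receive no features, so a classifier would treat them as non-candidates.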

2.
With the advent of Web 2.0, many online platforms, such as social networks, online blogs, and magazines, produce massive amounts of textual data. This data carries information that can be used for the betterment of humanity, so there is a pressing need to extract useful information from it. This study presents an overview of approaches for extracting these valuable information nuggets from text and presenting them in a brief, clear, and concise way. To this end, two major tasks are reviewed: automatic keyword extraction and text summarization. To compile the literature, scientific articles were collected from major digital computing research repositories. Based on the collected literature, the survey covers everything from early approaches to recent advances using machine learning. The survey finds that annotated benchmark datasets are not available for various sources of textual data, such as Twitter and social forums, and this scarcity of datasets has resulted in relatively slow progress in many domains. Also, the application of deep learning techniques to automatic keyword extraction remains relatively unexplored, so the impact of various deep architectures is an open research direction. For text summarization, deep learning techniques have been applied since the advent of word vectors and currently define the state of the art for abstractive summarization. One of the major current challenges in both tasks is the semantically aware evaluation of generated results.

3.
In information retrieval systems, one of the most important and difficult operations is extracting appropriate keywords from documents. This paper proposes an effective substring search method that extends the multi-keyword pattern matching machine of Aho and Corasick (AC), called the AC machine. The proposed method makes it possible to extract as many keyword candidates as possible and to select the keywords suited to the user's purpose at the retrieval stage. It comprises four types of substring search: exact, prefix, suffix, and proper substring search. The paper also proposes an algorithm for constructing the retrieval structure to speed up substring search. Simulation results show that the retrieval time of the proposed method is as fast as that of trie-based key retrieval.
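For reference, the standard Aho-Corasick machine that the method extends can be sketched in Python as below. This shows only the classic goto/failure/output construction and a single scan over text; the paper's four substring search modes and its retrieval structure are not reproduced, and the function names are illustrative:

```python
from collections import deque


def build_ac(keywords):
    """Build the goto, failure and output functions of an Aho-Corasick machine."""
    goto, fail, out = [{}], [0], [set()]
    for kw in keywords:                      # goto function: a trie of keywords
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(kw)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:                             # breadth-first: failure links
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]           # inherit matches ending here
    return goto, fail, out


def search(text, machine):
    """Return sorted (start_index, keyword) pairs for every occurrence."""
    goto, fail, out = machine
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kw in out[state]:
            hits.append((i - len(kw) + 1, kw))
    return sorted(hits)


machine = build_ac(["he", "she", "his", "hers"])
print(search("ushers", machine))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Because the failure links let the scan continue without backtracking in the text, all keyword occurrences are found in one pass.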

4.
In this paper we demonstrate a new method for concentrating the set of key-words of a thesaurus. The method is based on a mathematical study we carried out of the distribution of characters in a defined natural language. We have built a concentration function f that generates only a few synonyms. Applying this function to the set of key-words of a thesaurus reduces each key-word to four characters without synonymity. (With three characters, the rate of synonymity is approx. 1/1000.) A new binary file structure allows the thesaurus to be contained in a table of less than 700 bytes.
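The abstract does not give the concentration function f itself. To make the notion of a synonymity rate concrete, the sketch below measures, for any reduction function, the fraction of distinct key-words that share their code with another key-word. The stand-in `truncate_code` (keep the first k characters) is an assumption for illustration only and is not the authors' f, which is derived from character-distribution statistics:

```python
from collections import Counter


def truncate_code(word, k=4):
    """Stand-in reduction (NOT the paper's f): keep the first k characters."""
    return word[:k].lower()


def synonymity_rate(keywords, reduce=truncate_code):
    """Fraction of distinct key-words whose code collides with another's."""
    words = set(keywords)
    if not words:
        return 0.0
    codes = Counter(reduce(w) for w in words)
    clashing = sum(1 for w in words if codes[reduce(w)] > 1)
    return clashing / len(words)


print(synonymity_rate(["information", "informatics", "retrieval", "thesaurus"]))  # 0.5
```

Naive truncation collides on common prefixes such as "info", which is why a function tuned to character distributions can do far better at the same code length.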

5.
This paper presents a robust and comprehensive graph-based rank aggregation approach for combining the results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme that is independent of how the isolated ranks are formulated. Our approach can combine arbitrary models defined in terms of different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad hoc retrieval problem as document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and automatically expressing inter-relationships among retrieval results. We claim that, by doing so, the retrieval system can benefit from learning the manifold structure of datasets, leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, encapsulates contextual information encoded from multiple ranks, which can be used directly for ranking without further computation or post-processing over the graphs. Based on these graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted on diverse well-known public datasets composed of textual, image, and multimodal documents. The experiments show that our method achieves top performance, yielding better effectiveness scores than state-of-the-art baselines and large gains over the rankers being fused, which demonstrates the proposal's ability to represent queries with a unified graph-based model of rank fusion.
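The fusion-graph method itself (with minimum common subgraph scoring) is too involved for a short example. For contrast, the sketch below shows reciprocal rank fusion (RRF), a common unsupervised baseline for combining isolated rankers. It is a different, swapped-in technique, and unlike the proposed method it has a hyperparameter `k` (60 is a conventional choice):

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over rankers of 1 / (k + rank(d))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


runs = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
print(reciprocal_rank_fusion(runs))  # ['b', 'a', 'c', 'd']
```

RRF uses only rank positions, so like the proposed method it is independent of how each ranker scores documents, but it ignores the inter-relationships among results that the fusion graphs are designed to capture.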

6.
Hao Jie. Dazhong Keji (Popular Science & Technology), 2012, (2): 18-20
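To obtain slices of programs that contain exception-handling constructs, this paper progressively refines the program representation from the traditional control flow graph to the system dependence graph, gives an algorithm for constructing the system dependence graph, and on that basis obtains precise program slices. The method handles the changes in data flow and control flow caused by exception constructs, and helps automate program dependence analysis based on exception propagation.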
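Once a system dependence graph with exception-induced edges is available, a backward slice is the set of nodes reachable from the slicing criterion along reversed dependence edges. The sketch below is a simplified intraprocedural version under that assumption; it omits the interprocedural summary edges a full SDG slicer needs, and the example program and node numbering are hypothetical:

```python
from collections import deque

# Hypothetical program, one statement per node id:
#   1: x = read()
#   2: try:
#   3:     y = 10 / x
#   4: except ZeroDivisionError:
#   5:     y = 0
#   6: print(y)
# Each node maps to the nodes it depends on.
EXAMPLE_DEPS = {
    3: [1],     # data: uses x
    4: [3],     # exception: the handler runs only if 3 raises
    5: [4],     # control: executes inside the handler
    6: [3, 5],  # data: y may come from 3 or 5
}


def backward_slice(deps, criterion):
    """Nodes the criterion transitively depends on, found by BFS over deps."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for pred in deps.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen


print(sorted(backward_slice(EXAMPLE_DEPS, 5)))  # [1, 3, 4, 5]
```

Dropping the exception edge `4 -> 3` shrinks the slice for statement 5 to `{4, 5}`, which would wrongly omit the division and the input it depends on.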

7.
Abnormal event detection in videos plays an essential role in public security. However, most weakly supervised learning methods ignore the relationship between complicated spatial correlations and the dynamic trends of temporal patterns in video data. In this paper, we provide a new perspective: spatial similarity and temporal consistency are used to construct Spatio-Temporal Graph-based CNNs (STGCNs). For feature extraction, we use Inflated 3D (I3D) convolutional networks, which better capture appearance and motion dynamics in videos. For the spatial graph and the temporal graph, each video segment is regarded as a vertex, and an attention mechanism is introduced to allocate attention to each segment. For the spatio-temporal fusion graph, we propose a self-adaptive weighting scheme to fuse the two graphs. Finally, we build a ranking loss and a classification loss to improve the robustness of STGCNs. We evaluate STGCNs on the UCF-Crime dataset (128 h in total) and the ShanghaiTech dataset (317,398 frames in total), obtaining AUC scores of 84.2% and 92.3%, respectively. The experimental results also show effectiveness and robustness under other evaluation metrics.
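The abstract names a ranking loss and a classification loss without giving their form. A common choice in weakly supervised video anomaly detection is the multiple-instance hinge ranking loss, which requires the highest-scoring segment of an abnormal video to outscore the highest-scoring segment of a normal video by a margin. The sketch below assumes that form plus binary cross-entropy; the weighting `lam` and all function names are hypothetical and may differ from STGCNs:

```python
import math


def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge loss pushing the top abnormal segment above the top normal one."""
    return max(0.0, margin - max(abnormal_scores) + max(normal_scores))


def bce(score, label, eps=1e-7):
    """Binary cross-entropy for one video-level score in (0, 1)."""
    s = min(max(score, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(s) + (1 - label) * math.log(1 - s))


def total_loss(abnormal_scores, normal_scores, lam=1.0):
    """Hypothetical combination: ranking loss + lam * classification loss.

    Video-level scores are taken as the max over segment scores."""
    rank = mil_ranking_loss(abnormal_scores, normal_scores)
    cls = bce(max(abnormal_scores), 1) + bce(max(normal_scores), 0)
    return rank + lam * cls


print(total_loss([0.1, 0.9, 0.3], [0.2, 0.4]))
```

Only the maximum segment score of each video enters the ranking term, which is what lets training work with video-level labels instead of segment-level annotations.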

8.
This paper presents a novel query expansion method, integrated into a graph-based algorithm for query-focused multi-document summarization, to address the limited information in the original query. Our approach uses both sentence-to-sentence and sentence-to-word relations to select query-biased informative words from the document set, and uses them as query expansions to improve the sentence ranking. Compared with previous query expansion approaches, ours captures more relevant information with less noise. We performed experiments on the Document Understanding Conference (DUC) 2005 and DUC 2006 data, and the evaluation results show that the proposed query expansion method significantly improves system performance and makes our system comparable to state-of-the-art systems.
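As a rough sketch of selecting expansion words through sentence-to-word relations, the function below scores each sentence by how many query words it contains and passes that score to the other words in the sentence. This is a single propagation step over the sentence-word bipartite graph, a simplification assumed here; the paper's graph-based ranking and its noise control are not reproduced:

```python
from collections import defaultdict


def expand_query(query, sentences, n=2):
    """Pick up to n expansion words from sentences that overlap the query.

    Sentence relevance = number of distinct query words it contains; each
    non-query word accumulates the relevance of the sentences it occurs in."""
    q = set(query.lower().split())
    word_score = defaultdict(float)
    for sent in sentences:
        words = set(sent.lower().split())
        rel = len(words & q)
        for w in words - q:
            word_score[w] += rel
    ranked = sorted(word_score, key=lambda w: (-word_score[w], w))
    return [w for w in ranked[:n] if word_score[w] > 0]


docs = [
    "solar energy photovoltaic panels",
    "photovoltaic panels convert sunlight",
    "energy storage batteries",
    "solar panels on roofs",
]
print(expand_query("solar energy", docs))  # ['panels', 'photovoltaic']
```

A fuller version would iterate this propagation (sentences to words and back) and rank words by their converged scores.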

9.
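Addressing keyword queries over XML data, this paper examines the strengths and weaknesses of existing query techniques and proposes a semantics-based XML keyword retrieval algorithm. The keywords entered by the user are classified into condition keywords and result keywords. Condition keywords are used only to restrict the query scope and do not appear in the result set. The paper defines semantically related node pairs, gives a method for identifying them, and proposes an XML query algorithm based on keyword classification and semantically related node pairs.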
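A minimal sketch of the keyword classification, using Python's `xml.etree.ElementTree`: condition keywords are matched against text values to filter records, and result keywords name the elements returned. The `record_tag` parameter that fixes the record level is a hypothetical simplification standing in for the paper's semantically related node pairs, which the abstract does not define:

```python
import xml.etree.ElementTree as ET


def keyword_query(xml_text, record_tag, condition_kws, result_kws):
    """Condition keywords filter records; result keywords pick output elements.

    Condition keywords only restrict the scope and never appear in the output."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.iter(record_tag):
        # All non-blank text values inside this record, case-folded.
        values = {t.strip().lower() for t in rec.itertext() if t.strip()}
        if all(kw.lower() in values for kw in condition_kws):
            for tag in result_kws:
                for node in rec.iter(tag):
                    results.append(node.text)
    return results


LIBRARY = (
    "<library>"
    "<book><author>Knuth</author><title>TAOCP</title><year>1968</year></book>"
    "<book><author>Wirth</author><title>Algorithms + Data Structures</title>"
    "<year>1976</year></book>"
    "</library>"
)
print(keyword_query(LIBRARY, "book", ["Knuth"], ["title"]))  # ['TAOCP']
```

Here "Knuth" acts only as a condition keyword, so the author name is used to locate the record but only the title is returned.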

10.
Learning latent representations of users and points of interest (POIs) is an important task in location-based social networks (LBSN) that can greatly benefit multiple location-based services, such as POI recommendation and social link prediction. Many contextual factors, such as geographical influence, user social relationships, and temporal information, are available in LBSN and are useful for this task. However, incorporating all of these contextual factors into user and POI representation learning in LBSN remains challenging because of their heterogeneous nature. Although existing representation learning methods for LBSN deliver encouraging performance in POI recommendation and social link prediction, most of them incorporate only one or two of these contextual factors. In this paper, we propose UP2VEC, a novel joint representation learning framework for users and POIs in LBSN. UP2VEC uses a heterogeneous LBSN graph that incorporates all of the aforementioned factors. Specifically, the transition probabilities between nodes in the heterogeneous graph are derived by jointly considering these contextual factors. The latent representations of users and POIs are then learned by matching the topological structure of the heterogeneous graph. To evaluate the effectiveness of UP2VEC, a series of experiments is conducted on two real-world datasets (Foursquare and Gowalla) for POI recommendation and social link prediction. Experimental results demonstrate that UP2VEC significantly outperforms existing state-of-the-art alternatives. Further experiments show the superiority of UP2VEC in handling the cold-start problem in POI recommendation.
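UP2VEC's exact transition probabilities are not given in the abstract. The sketch below assumes one plausible form: each edge carries a type (social, check-in, geographical) and a strength, and hypothetical per-type `weights` combine them into a single distribution for random walks. Walks like these are typically fed to a skip-gram model to learn embeddings, a step omitted here:

```python
import random


def transition_probs(graph, node, weights):
    """Combine edge types into one distribution over a node's neighbors.

    graph[node] is a list of (neighbor, edge_type, strength); weights maps
    each edge type to a hypothetical importance factor."""
    scored = [(nbr, weights[etype] * s) for nbr, etype, s in graph[node]]
    total = sum(w for _, w in scored)
    return [(nbr, w / total) for nbr, w in scored] if total > 0 else []


def random_walk(graph, start, length, weights, rng=random):
    """Sample a walk of up to `length` nodes using the combined probabilities."""
    walk = [start]
    for _ in range(length - 1):
        probs = transition_probs(graph, walk[-1], weights)
        if not probs:
            break  # dead end: no weighted neighbors
        nbrs, ps = zip(*probs)
        walk.append(rng.choices(nbrs, weights=ps, k=1)[0])
    return walk


GRAPH = {  # users u1, u2 and POIs p1, p2 (toy data)
    "u1": [("u2", "social", 1.0), ("p1", "checkin", 3.0)],
    "u2": [("u1", "social", 1.0), ("p2", "checkin", 1.0)],
    "p1": [("u1", "checkin", 3.0), ("p2", "geo", 0.5)],
    "p2": [("u2", "checkin", 1.0), ("p1", "geo", 0.5)],
}
WEIGHTS = {"social": 1.0, "checkin": 1.0, "geo": 2.0}
print(random_walk(GRAPH, "u1", 8, WEIGHTS, rng=random.Random(0)))
```

Raising one factor's weight (for example `geo`) makes walks cross that kind of edge more often, which is one way several contextual factors can shape a single set of walks.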

11.
A prefix trie index (originally called trie hashing) is applied to the problem of providing fast search, fast loading, and fast updates in a bibliographic or full-text retrieval system. For all but the largest dictionaries, a single key search in the dictionary under trie hashing takes exactly one disk read. Front compression of search keys is used to improve performance. Partially combining the postings into the dictionary is analyzed as a way to obtain both faster retrieval and better update properties for the trie hashing inverted file. Statistics are given for a test database consisting of an online catalog at the Graduate School of Library and Information Science Library of the University of Western Ontario. The effects of changing various prefix trie parameters are tested in this application.

12.
Cheng Ting, Wang Bing. Xiandai Qingbao (Journal of Modern Information), 2014, 34(7): 135-140
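Using scientometric methods, this paper compiles statistics on 7,055 CNKI journal articles on government affairs disclosure, analyzes the co-occurrence matrix of high-frequency keywords in this research, and derives a co-occurrence clustering tree and a co-occurrence network. Based on the keyword co-occurrence analysis, it identifies the research hotspots and future research trends in government affairs disclosure.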
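The keyword co-occurrence matrix at the core of this kind of analysis counts, for each pair of high-frequency keywords, how many papers list both. A minimal sketch, where the frequency threshold `min_freq` and the sample keywords are assumptions; clustering the matrix into a tree would be a separate step:

```python
from collections import Counter
from itertools import combinations


def cooccurrence(papers, min_freq=2):
    """Symmetric co-occurrence counts among keywords used by >= min_freq papers."""
    freq = Counter(kw for kws in papers for kw in set(kws))
    high = sorted(k for k, c in freq.items() if c >= min_freq)
    idx = {k: i for i, k in enumerate(high)}
    matrix = [[0] * len(high) for _ in high]
    for kws in papers:
        present = sorted({k for k in kws if k in idx})
        for a, b in combinations(present, 2):
            matrix[idx[a]][idx[b]] += 1
            matrix[idx[b]][idx[a]] += 1
    return high, matrix


papers = [
    ["open government", "transparency", "e-government"],
    ["open government", "transparency"],
    ["e-government", "service"],
    ["open government", "e-government"],
]
labels, m = cooccurrence(papers)
print(labels)  # ['e-government', 'open government', 'transparency']
print(m)       # [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
```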

13.
A Parallel Extraction Method for High-Order Smooth Surfaces   Total citations: 1, self-citations: 0, citations by others: 1
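To address the high computational complexity and large data volume of the high-order smooth surface algorithm, a parallel method is proposed to speed it up. The algorithm is first parallelized; optimization techniques are then applied to improve its performance, and matrix compression is used to reduce memory usage. Experiments show an average speedup of 1.87 on a dual-core processor.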
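For reference, the standard definitions of speedup and parallel efficiency give the following for the reported figure, where T_1 is the serial run time and T_p the run time on p cores. The last line additionally assumes Amdahl's law, S_p = 1/((1 - f) + f/p), and ignores parallel overheads, so the parallelizable fraction f is only a rough estimate; the abstract does not report it:

```latex
\begin{aligned}
S_p &= \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p},\\
E_2 &= \frac{1.87}{2} = 0.935, \qquad \frac{T_2}{T_1} = \frac{1}{1.87} \approx 0.535,\\
f &= \frac{p}{p-1}\left(1 - \frac{1}{S_p}\right) = 2\left(1 - \frac{1}{1.87}\right) \approx 0.93.
\end{aligned}
```

An efficiency of about 0.94 on two cores means the parallel version runs in roughly 53.5% of the serial time.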

14.
15.
Answering complex questions requires inferring and synthesizing information from multiple documents, a task that can be seen as a kind of topic-oriented, informative multi-document summarization. In generic summarization, the stochastic, graph-based random walk method for computing the relative importance of textual units (i.e., sentences) has proved very successful. This method usually measures sentence similarity with TF*IDF; however, the major limitation of the TF*IDF approach is that it retains only word frequencies and ignores sequence, syntactic, and semantic information. This paper examines the impact of syntactic and semantic information on the graph-based random walk method for answering complex questions. First, we apply tree kernel functions to measure the similarity between sentences within the random walk framework. Then, we extend this work to incorporate the Extended String Subsequence Kernel (ESSK) in a similar manner. Experimental results show that using kernels to include syntactic and semantic information is effective for this task.
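A minimal sketch of the random walk ranking, assuming a LexRank-style power iteration with damping 0.85. The similarity function is pluggable, which is where the paper substitutes tree kernels or ESSK; the bag-of-words `cosine` used here is only a stand-in for those kernels:

```python
import math
from collections import Counter


def cosine(a, b):
    """Bag-of-words cosine similarity; a stand-in for a tree kernel or ESSK."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def random_walk_rank(sentences, sim=cosine, damping=0.85, iters=100):
    """Power iteration over a row-normalized sentence-similarity graph."""
    n = len(sentences)
    w = [[sim(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    row_sums = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                # Rows with no similar sentence jump uniformly.
                scores[i] * (w[i][j] / row_sums[i] if row_sums[i] > 0 else 1.0 / n)
                for i in range(n))
            for j in range(n)
        ]
    return scores


sents = ["the cat sat on the mat", "a cat sat on a mat", "dogs bark loudly"]
print(random_walk_rank(sents))
```

Swapping `sim` for a kernel function changes only how edges are weighted; the ranking procedure stays the same, which is what lets the paper compare similarity measures within one framework.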

16.
In sponsored search, many advertisers do not achieve their expected performance, while the search engine also has ample room to improve its revenue. Specifically, because of improper keyword bidding, many advertisers cannot win the competitive ad auctions to get their desired ad impressions; meanwhile, a significant portion of search queries have no ads displayed on their result pages, even though many of them have commercial value. We propose recommending a group of relevant yet less competitive keywords to an advertiser, so that the advertiser can win some (originally empty) ad slots and accumulate impressions. At the same time, the search engine's revenue is also boosted, since many empty ad slots are filled. Mathematically, we model the problem as a mixed integer program that maximizes the advertiser's revenue and the relevance of the recommended keywords while minimizing keyword competitiveness, subject to bid and budget constraints. Solving it yields an optimal group of keywords and their optimal bid prices for an advertiser. Simulation results show that the proposed method is highly effective in increasing ad impressions, expected clicks, advertiser revenue, and search engine revenue.
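The abstract does not give the mixed integer program. The sketch below assumes a simplified linear objective (expected revenue plus weighted relevance minus weighted competitiveness) with fixed bids and a single budget constraint, and solves it by exhaustive search, which is only feasible for a handful of candidates. The weights `alpha` and `beta` and the tuple layout are hypothetical; the paper also optimizes bid prices, which this sketch does not:

```python
from itertools import combinations


def recommend_keywords(candidates, budget, alpha=1.0, beta=1.0):
    """Exhaustively pick the keyword subset with the best objective.

    candidates: (keyword, expected_revenue, relevance, competitiveness, bid).
    Objective: revenue + alpha * relevance - beta * competitiveness,
    subject to the sum of bids <= budget. A MIP solver replaces this loop
    for realistic candidate counts."""
    best, best_val = (), float("-inf")
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            if sum(c[4] for c in subset) > budget:
                continue  # violates the budget constraint
            val = sum(c[1] + alpha * c[2] - beta * c[3] for c in subset)
            if val > best_val:
                best, best_val = subset, val
    return [c[0] for c in best], best_val


CANDIDATES = [
    ("cheap shoes", 5.0, 0.9, 0.2, 3.0),
    ("running shoes", 8.0, 1.0, 0.9, 6.0),  # valuable but highly competitive
    ("trail sneakers", 4.0, 0.8, 0.1, 2.0),
]
print(recommend_keywords(CANDIDATES, budget=6.0))
```

With this toy data the two less competitive keywords together beat the single high-revenue one, mirroring the paper's idea of steering advertisers toward less contested keywords.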

17.
18.
Desirable characteristics of index displays are outlined, and a single underlying structure for producing a variety of index displays is proposed: a network in which both concepts and the links between concepts are represented by nodes, so that information about both can be stored using the same mechanisms. Algorithmic extraction of subnetworks for local or modular use is discussed, as well as possible approaches to structuring index displays. Some details are given of the implementation to date, which uses a DECsystem-10 COBOL ISAM database, and of experiments with display generation.
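The key design point, links represented as nodes, can be sketched with a small Python class. Because a link is a node, it can carry attributes (such as a relation type) through the same mechanism as a concept. The class and method names are hypothetical, and the radius-limited breadth-first search is one simple assumption for extracting a local subnetwork:

```python
from collections import deque


class IndexNetwork:
    """Concepts and links are both nodes, so both can carry attributes."""

    def __init__(self):
        self.nodes = {}  # node id -> attribute dict
        self.adj = {}    # node id -> set of adjacent node ids

    def add_concept(self, cid, **attrs):
        self.nodes[cid] = {"kind": "concept", **attrs}
        self.adj.setdefault(cid, set())

    def add_link(self, lid, source, target, **attrs):
        """A link is itself a node, adjacent to the two concepts it joins."""
        self.nodes[lid] = {"kind": "link", **attrs}
        self.adj.setdefault(lid, set()).update({source, target})
        self.adj[source].add(lid)
        self.adj[target].add(lid)

    def subnetwork(self, start, radius):
        """Concepts within `radius` concept-to-concept hops of start (BFS)."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] >= 2 * radius:  # each concept hop crosses a link node
                continue
            for nbr in self.adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        return {n for n in dist if self.nodes[n]["kind"] == "concept"}


net = IndexNetwork()
for c in "ABCD":
    net.add_concept(c, label=f"concept {c}")
net.add_link("l1", "A", "B", relation="broader")
net.add_link("l2", "B", "C", relation="related")
net.add_link("l3", "C", "D", relation="related")
print(sorted(net.subnetwork("A", 1)))  # ['A', 'B']
```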

19.
20.
This study proposes a probabilistic model for automatically extracting English noun phrases without part-of-speech tagging or any syntactic analysis. The technique is based on a Markov model whose initial parameters are estimated by a phrase lookup program with a phrase dictionary and then optimized with a set of maximum entropy (ME) parameters for a set of morphological features. Using the Viterbi algorithm with the trained Markov model, the program can dynamically extract noun phrases from input text. Experiments show that this technique is comparable in effectiveness to the best existing techniques.
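A minimal sketch of Viterbi decoding for noun-phrase chunking, assuming a three-state model (B begins a phrase, I continues it, O is outside) with hand-set toy probabilities. In the paper, the parameters come from a phrase dictionary and are refined with maximum entropy over morphological features; none of that is reproduced here:

```python
import math

STATES = ("B", "I", "O")
START = {"B": 0.6, "I": 0.01, "O": 0.39}  # toy values, all nonzero
TRANS = {
    "B": {"B": 0.01, "I": 0.98, "O": 0.01},
    "I": {"B": 0.05, "I": 0.50, "O": 0.45},
    "O": {"B": 0.70, "I": 0.01, "O": 0.29},
}
EMIT = {
    "B": {"the": 0.5, "a": 0.4},
    "I": {"old": 0.3, "man": 0.3, "dog": 0.3},
    "O": {"sees": 0.8},
}


def emit_p(state, token):
    """P(token | state), with a small floor for unseen tokens."""
    return EMIT[state].get(token, 1e-3)


def viterbi(tokens, states=STATES, start_p=START, trans_p=TRANS, emit=emit_p):
    """Most probable state sequence, computed in log space."""
    V = [{s: math.log(start_p[s]) + math.log(emit(s, tokens[0])) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit(s, tokens[t])))
            back[t][s] = prev
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):  # follow back-pointers
        path.append(back[t][path[-1]])
    return path[::-1]


def noun_phrases(tokens, tags):
    """Join each B token and the I tokens that follow it into one phrase."""
    phrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B" or (tag == "I" and not current):
            if current:
                phrases.append(" ".join(current))
            current = [tok]
        elif tag == "I":
            current.append(tok)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


words = "the old man sees a dog".split()
tags = viterbi(words)
print(tags)                       # ['B', 'I', 'I', 'O', 'B', 'I']
print(noun_phrases(words, tags))  # ['the old man', 'a dog']
```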


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417