Similar Documents
20 similar documents found; search took 46 ms
1.
This work aims to extract possible causal relations between noun phrases. Some causal relations are manifested by lexical patterns such as causal verbs and their subcategorization. We use lexical patterns as a filter to find causality candidates and recast causality extraction as a binary classification problem. To solve it, we introduce probabilities that a word pair or a concept pair is part of a causal noun phrase pair. We also use the probability that a cue phrase is a causality pattern. These probabilities are learned from a raw corpus in an unsupervised manner. With this probabilistic model, we increase both precision and recall. Our causality extraction achieves an F-score of 77.37%, an improvement of 21.14 percentage points over the baseline model. Long-distance causal relations are extracted using binary-tree-style cue phrases. We propose an incremental cue phrase learning method based on a cue phrase confidence score measured after each causal classifier learning step. After cue phrase learning, recall improves by 15.37 percentage points.
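
The candidate-filtering step above (lexical patterns as a filter before classification) can be sketched as follows. This is a minimal illustration, assuming a small hypothetical list of causal cue verbs and treating the raw text on either side of the cue as the cause and effect spans; the paper's actual patterns, noun phrase chunking, and probabilistic classifier are not reproduced here.

```python
import re

# Hypothetical causal cue verbs; the paper's real pattern set is not given in the abstract.
CAUSAL_CUES = ["causes", "caused", "leads to", "led to",
               "results in", "resulted in", "triggers"]

# Longest cues first so multi-word cues are tried before shorter ones.
_CUE_PATTERN = "|".join(re.escape(c) for c in sorted(CAUSAL_CUES, key=len, reverse=True))
_CAUSAL_RE = re.compile(
    rf"^(?P<cause>.+?)\s+(?P<cue>{_CUE_PATTERN})\s+(?P<effect>.+?)[.!?]?$",
    re.IGNORECASE,
)


def causal_candidates(sentence):
    """Return (cause_span, cue, effect_span) if a cue pattern matches, else None.

    Spans are raw text around the cue, not parsed noun phrases; a real system
    would chunk them and pass the pair to a causal/non-causal classifier.
    """
    m = _CAUSAL_RE.match(sentence.strip())
    if m is None:
        return None
    return (m.group("cause"), m.group("cue").lower(), m.group("effect"))
```

Only sentences that pass this filter would be handed to the binary classifier, which keeps the classification step focused on plausible candidates.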

2.
This paper presents a Foreign-Language Search Assistant that uses noun phrases as the fundamental units for document translation and for query formulation, translation, and refinement. The system (a) supports the foreign-language document selection task by providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information displayed in the cross-language document summaries. Our results challenge two implicit assumptions in most cross-language information retrieval research: first, that once documents in the target language are found, machine translation is the optimal way of informing the user about their contents; and second, that in an interactive setting the optimal way of formulating and refining the query is to help the user choose appropriate translations for the query terms.

3.
In this paper, we propose a common phrase index, an efficient index structure that supports phrase queries in a very large text database. Our structure extends previous index structures for phrases and achieves better query efficiency at a modest extra storage cost. Efficiency can be improved further by implementing the index according to our observation of the dynamic nature of the common word set. In our experimental evaluation, a common phrase index using 255 common words improves query time by about 11% overall and about 62% for large queries (queries of long phrases) compared with an auxiliary nextword index. Moreover, its extra storage cost is only about 19%. Compared with an inverted index, the improvement is about 72% overall and 87% for large queries. We also propose implementing a common phrase index with a dynamic update feature; our experiments show that this yields further gains in time efficiency.
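
For readers unfamiliar with the baseline, the following is a minimal sketch of a word-pair index in the spirit of a nextword index: each adjacent word pair maps to its (document, position) postings, and a phrase query is answered by chaining pair lookups. It is a simplified illustration, not the paper's common phrase index, and all function names are hypothetical.

```python
from collections import defaultdict


def build_nextword_index(docs):
    """Map (word, next_word) -> list of (doc_id, position of word)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for pos in range(len(words) - 1):
            index[(words[pos], words[pos + 1])].append((doc_id, pos))
    return index


def phrase_query(index, phrase):
    """Return sorted (doc_id, start_position) pairs where the phrase occurs."""
    words = phrase.lower().split()
    if len(words) < 2:
        raise ValueError("phrase must contain at least two words")
    # Candidate start positions come from the first word pair.
    matches = set(index.get((words[0], words[1]), []))
    # Every following pair must start exactly `offset` positions later in the same doc.
    for offset in range(1, len(words) - 1):
        hits = set(index.get((words[offset], words[offset + 1]), []))
        matches = {(d, p) for (d, p) in matches if (d, p + offset) in hits}
    return sorted(matches)
```

Indexing pairs rather than single words means a phrase query never has to intersect the very long postings lists of frequent words on their own, which is the kind of cost the common phrase index targets.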

4.
Abstractive summarization aims to generate a concise summary covering the salient content of one or more text documents. Many recent abstractive summarization methods are built on the transformer model, which captures long-range dependencies in the input text and allows parallelization. In the transformer encoder, calculating attention weights is a crucial step in encoding the input documents. Input documents usually contain key phrases that convey salient information, and it is important to encode these phrases completely. However, existing transformer-based summarization methods do not consider key phrases in the input when determining attention weights. Consequently, some tokens within key phrases receive only small attention weights, which hinders encoding the semantic information of the input documents. In this paper, we introduce prior knowledge of key phrases into a transformer-based summarization model to guide the model in encoding key phrases. For the contextual representation of each token in a key phrase, we assume that tokens within the same key phrase contribute more than other tokens in the input sequence. Based on this assumption, we propose the Key Phrase Aware Transformer (KPAT), a model with a highlighting mechanism in the encoder that assigns greater attention weights to tokens within key phrases. Specifically, we first extract key phrases from the input document and score their importance. We then build a block diagonal highlighting matrix that encodes these phrases' importance scores and positions. To combine self-attention weights with the key phrases' importance scores, we design two highlighting attention structures for each head, as well as multi-head highlighting attention. Experimental results on two datasets from different summarization tasks and domains (Multi-News and PubMed) show that our KPAT model significantly outperforms advanced summarization baselines.
Further experiments analyze the contribution of each part of the model to summarization performance and verify the effectiveness of the proposed highlighting mechanism.
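
The block diagonal highlighting idea can be sketched as follows, assuming (hypothetically) that the highlighting matrix is scaled by a weight `alpha` and added to the scaled dot-product attention logits before the softmax. The abstract does not specify how KPAT's two per-head structures combine the scores, so this is one plausible variant rather than the authors' formulation.

```python
import math


def highlighting_matrix(seq_len, phrases):
    """Block diagonal matrix H: H[i][j] = score if tokens i and j lie in the same key phrase.

    `phrases` is a list of (start, end_exclusive, importance_score) tuples.
    """
    H = [[0.0] * seq_len for _ in range(seq_len)]
    for start, end, score in phrases:
        for i in range(start, end):
            for j in range(start, end):
                H[i][j] = score
    return H


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def highlighted_attention(Q, K, H, alpha=1.0):
    """Attention weights from scaled dot products plus alpha * H (assumed additive bias)."""
    d = len(Q[0])
    weights = []
    for i, q in enumerate(Q):
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + alpha * H[i][j]
                  for j, k in enumerate(K)]
        weights.append(softmax(logits))
    return weights
```

Because the bias is only nonzero inside each phrase's diagonal block, a token in a key phrase shifts attention toward the other tokens of the same phrase, while tokens outside any phrase keep their ordinary attention distribution.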

5.
6.
Operational-level automatic indexing requires an efficient means of normalizing natural language phrases. Subject switching requires an efficient means of translating one set of authorized terms into another. This paper explains a phrase structure rewrite system, called a Lexical Dictionary, that performs these functions, and describes its background, operational use, other applications, and ongoing research.

7.
In this article, we describe a retrieval schema that goes beyond the classical information retrieval keyword hypothesis and also takes linguistic variation into account. Guided by the failures and successes of other state-of-the-art approaches, as well as by our own experience with the Irena system, our approach is based on phrases and incorporates linguistic resources and processors. To this end, we introduce the phrase retrieval hypothesis to replace the keyword retrieval hypothesis. We suggest a representation of phrases suitable for indexing and an architecture for such a retrieval system. Syntactic normalization is introduced to improve retrieval effectiveness, and morphological and lexico-semantic normalizations are adjusted to fit this model.

8.
The fundamental idea of the work reported here is to extract index phrases from texts with the help of a single-word concept dictionary and a thesaurus containing relations among concepts. The work is based on the fact that, within every phrase, the single words composing it are related in a certain well-defined manner, and the type of relation holding between concepts depends only on the concepts themselves. Relations can therefore be stored in a semantic network. The algorithm described extracts single-word concepts from texts and combines them into phrases using the semantic relations between these concepts stored in the network. The results show that phrase extraction from texts by this semantic method is possible and, compared with purely syntactic or statistical methods, offers many advantages in the precision and completeness of the text's meaning representation. However, the results also show that some syntactic and morphological “filtering” should be included for efficiency reasons.

9.
Networked information retrieval aims at the interoperability of heterogeneous information retrieval (IR) systems. In this paper, we show how differences concerning search operators and database schemas can be handled by applying data abstraction concepts in combination with uncertain inference. Different data types with vague predicates are required to allow for queries referring to arbitrary attributes of documents. Physical data independence separates search operators from access paths, thus solving text search problems related to noun phrases, compound words and proper nouns. Projection and inheritance on attributes support the creation of unified views on a set of IR databases. Uncertain inference allows for query processing even on incompatible database schemas.

10.
Distant supervision (DS) can automatically generate large amounts of labelled training data and has been widely used for relation extraction. However, automatically labelled DS data usually contain many wrong labels (Riedel, Yao, & McCallum, 2010). This paper presents a novel method for reducing wrong labels. The proposed method uses a semantic Jaccard measure with word embeddings to compute the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence, and uses this similarity to filter out wrong labels. While reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence; this phrase captures features for relation classification and avoids the negative impact of irrelevant term sequences, from which previous neural relation extraction models often suffer. For relation classification, the core dependency phrases are also used as input to a convolutional neural network (CNN). The experimental results show that methods using the filtered DS data perform much better in relation extraction than methods using the original DS data, indicating that the semantic-similarity-based method is effective in reducing wrong labels. The CNN model that takes the core dependency phrases as input achieves the best relation extraction performance of all, which indicates that the core dependency phrases are sufficient for the CNN to capture the features needed for relation classification while avoiding the negative impact of irrelevant terms.
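
A semantic Jaccard measure can be sketched as a soft Jaccard index in which two words count as shared when the cosine similarity of their embeddings reaches a threshold. The greedy one-to-one matching, the 0.8 threshold, and the toy two-dimensional embeddings are assumptions for illustration; the abstract does not give the paper's exact formula.

```python
import math

# Toy, hand-made embeddings for illustration only (not trained vectors).
EXAMPLE_EMB = {"born": [1.0, 0.1], "birthplace": [0.95, 0.2], "in": [0.0, 1.0]}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_jaccard(phrase_a, phrase_b, emb, threshold=0.8):
    """Soft Jaccard: |soft intersection| / (|A| + |B| - |soft intersection|)."""
    a = [w for w in phrase_a.split() if w in emb]
    b = [w for w in phrase_b.split() if w in emb]
    unmatched_b = list(b)
    matched = 0
    for wa in a:
        # Greedily pick the most similar still-unmatched word at or above the threshold.
        best, best_sim = None, threshold
        for wb in unmatched_b:
            sim = cosine(emb[wa], emb[wb])
            if sim >= best_sim:
                best, best_sim = wb, sim
        if best is not None:
            unmatched_b.remove(best)
            matched += 1
    union = len(a) + len(b) - matched
    return matched / union if union else 0.0
```

For example, "born in" and "birthplace" share no surface words, so their exact Jaccard index is 0, but this soft version scores them 0.5 because "born" and "birthplace" are close in the toy embedding space.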

11.
We propose answer extraction and ranking strategies for definitional question answering that use linguistic features and definition terminology. A passage expansion technique based on simple anaphora resolution is introduced to retrieve more informative sentences, and a phrase extraction method based on the syntactic information of the sentences is proposed to generate more concise answers. To rank the phrases, we use several types of evidence, including external definitions and definition terminology. Although external definitions are useful, they clearly cannot cover all possible targets. A definition terminology score, which reflects how definition-like a phrase is, is devised to compensate for incomplete external definitions. Experimental results show that the proposed answer extraction and ranking methods are effective and that our system is comparable to state-of-the-art systems.

12.
This paper presents a method for normalizing English titles and retrieving them. A title expressed as a noun phrase or a noun clause is converted into a function-expression by parsing. To achieve a reasonable recall rate as well as a high precision rate in retrieval, the function-expression is transformed into a predicate-governor form and then normalized to a standard form. From this form, various items are extracted and recorded in a hierarchical, tree-like inverted file. To keep the recall rate at a reasonable level, several retrieval stages based on key-term and case-label matching are implemented. Retrieval is controlled by how precisely the case-labels are specified for each key-term.

13.
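Li Jing. 《科教文汇》 (Kejiao Wenhui), 2012, (5): 129, 135
This paper reviews the main results of thirty years of research on the semantics of bare noun phrases in English and Chinese, aiming to provide a fairly systematic analysis.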

14.
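From the perspective of organizational learning, this paper describes the diffusion and evolution of corporate culture. It first builds a mathematical model of the learning speed of corporate culture. By solving and discussing the model, it proposes a five-stage conceptual model of organizational learning of corporate culture (discovery, invention, execution, cohesion, and differentiation), and analyzes the actors, main content, main modes, and speed of organizational learning of corporate culture at each stage. Finally, an example is used to validate the proposed model.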

15.
In this research, we evaluate the effect of gender-targeted advertising on the performance of sponsored search advertising. We analyze nearly 7,000,000 records spanning 33 consecutive months of a keyword advertising campaign from a major US retailer. To determine the effect of demographic targeting, we classify the campaign's key phrases by their probability of targeting a specific gender, and then compare key performance indicators among these groupings using the critical sponsored search metrics: impressions, clicks, cost per click, sales revenue, orders, items, and return on advertising. Our findings show that the gender orientation of a key phrase is a significant determinant in predicting behavior and performance, with statistically different consumer behavior for all attributes as the probability of a male or female key phrase changes. However, gender-neutral phrases perform best overall, generating 20 times the return on advertising of any gender-targeted category. Insights from this research could help target sponsored advertising more effectively to searchers and potential consumers.
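
The grouping-and-comparison step can be illustrated with a small sketch that bins key phrases by an assumed probability of male targeting and computes return on advertising (revenue divided by ad cost) per group. The 0.3/0.7 cut-offs, the record layout, and the sample numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def gender_group(p_male, hi=0.7, lo=0.3):
    """Hypothetical binning of a key phrase by its probability of targeting males."""
    if p_male >= hi:
        return "male"
    if p_male <= lo:
        return "female"
    return "neutral"


def roas_by_group(records):
    """records: iterable of (p_male, revenue, cost). Returns {group: revenue / cost}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # group -> [revenue, cost]
    for p_male, revenue, cost in records:
        t = totals[gender_group(p_male)]
        t[0] += revenue
        t[1] += cost
    return {g: rev / cost for g, (rev, cost) in totals.items() if cost > 0}
```

Aggregating revenue and cost before dividing gives a spend-weighted return per group, rather than an average of per-phrase ratios that would over-weight low-spend phrases.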

16.
The paper presents two approaches to interactively refining users' search formulations and their evaluation in the new High Accuracy Retrieval from Documents (HARD) track of TREC-12. The first method asks the user to select a number of sentences that represent documents. The second method shows the user a list of noun phrases extracted from the initial document set. Both methods then expand the query based on the user's feedback. The TREC results show that one of the methods is an effective means of interactive query expansion and yields significant performance improvements. The paper also compares the methods and analyzes the evaluation results in detail.

17.
The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on extending these techniques to deal with phrase-level variation in European languages, taking Spanish as a case in point. We propose using syntactic dependencies as complex index terms in an attempt to solve the problems arising from both syntactic and morpho-syntactic variation and thereby obtain more precise index terms. These dependencies are obtained through a shallow parser based on cascades of finite-state transducers, in order to minimize the parsing overhead. We also study different sources of syntactic information (queries or documents), as well as restricting the dependencies to those obtained from noun phrases. Our approaches were tested on the CLEF corpus and yield consistent improvements over classical word-level non-linguistic techniques. The results show, on the one hand, that syntactic information extracted from documents is more useful than that extracted from queries. On the other hand, restricting dependencies to those corresponding to noun phrases substantially reduces storage and management costs, albeit at the expense of a slight reduction in performance.

18.
Term occurrence A is included in term occurrence B if A is a substring of B. By making a single pass through a slightly non-standard KWIC index, every recurring phrase can be detected, and its inclusion relationships with other phrases and/or single words can be computed. Results obtained by processing a corpus of 2675 medical titles indicate that several properties definable in terms of inclusion relationships among terms have significance for vocabulary control. Preliminary results from a corpus of more than 62,000 medical titles have confirmed this finding.
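
The notion of inclusion can be illustrated with a brute-force sketch that collects recurring word n-grams from a set of titles and lists pairs where one phrase occurs contiguously inside a longer one. This is a simplification for illustration: it enumerates n-grams directly rather than making the single pass over a KWIC index described above, and the sample titles in the usage check are made up.

```python
from collections import Counter


def recurring_phrases(titles, max_len=4, min_count=2):
    """Word n-grams (as tuples) of length 1..max_len occurring at least min_count times."""
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {p for p, c in counts.items() if c >= min_count}


def is_included(a, b):
    """True if word sequence a occurs contiguously inside the strictly longer sequence b."""
    n = len(a)
    return n < len(b) and any(b[i:i + n] == a for i in range(len(b) - n + 1))


def inclusion_pairs(phrases):
    """All (included, including) pairs among the given phrases."""
    return sorted((a, b) for a in phrases for b in phrases if is_included(a, b))
```

For titles such as "acute renal failure in children", "chronic renal failure", and "renal failure treatment", the recurring terms are "renal", "failure", and "renal failure", and both single words are reported as included in the two-word phrase.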

19.
Gyro simulation is an important part of inertial navigation research, and its main difficulty is stochastic error modeling. One commonly used stochastic model for a fiber optic gyro (FOG) is Gaussian white (GW) noise plus a first-order Markov process. The model parameters are usually obtained from a static FOG experiment using time series analysis or the Allan variance method. In practice, however, a physical FOG may not be available for such an experiment. This paper proposes a simulation method for the stochastic errors of a FOG in which the model parameters are set from performance indicators, chosen as the angle random walk (ARW) and bias stability. The ARW and bias stability of the GW noise and of the first-order Markov process are analyzed separately, and their analytical expressions are derived to reveal the relation between the model parameters and the performance indicators. To verify the theory, a large number of simulations were carried out; the results show that the statistical performance indicators of the simulated signals are consistent with the theory. Furthermore, a simulation of a VG951 FOG is designed, and the Allan variance curve of the simulated signal agrees with that of the real device.
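
The white-noise-plus-first-order-Markov model and an Allan variance check can be sketched as follows. The sketch assumes rates in deg/s and an ARW coefficient N in deg/√s (a value quoted in deg/√h is divided by 60), uses the standard relation σ_w = N/√Δt for the per-sample white noise and an exact discretization of the Markov process, and computes the non-overlapping Allan variance, whose white-noise part should follow N²/τ. Parameter names are hypothetical, and the paper's bias stability derivation is not reproduced.

```python
import math
import random


def simulate_fog(n, dt, arw, sigma_markov, tau_c, seed=0):
    """Simulated gyro rate samples: white noise (set by ARW) + first-order Gauss-Markov bias.

    arw: angle random walk N in rate-units * sqrt(s), e.g. deg/sqrt(s).
    sigma_markov: stationary std of the Markov bias; tau_c: its correlation time in s.
    """
    rng = random.Random(seed)
    sigma_w = arw / math.sqrt(dt)                          # per-sample white-noise std
    phi = math.exp(-dt / tau_c)                            # Markov transition coefficient
    sigma_eta = sigma_markov * math.sqrt(1.0 - phi * phi)  # keeps the bias stationary
    b = rng.gauss(0.0, sigma_markov)                       # start in the stationary distribution
    out = []
    for _ in range(n):
        b = phi * b + rng.gauss(0.0, sigma_eta)
        out.append(rng.gauss(0.0, sigma_w) + b)
    return out


def allan_variance(rate, dt, m):
    """Non-overlapping Allan variance at cluster time tau = m * dt; returns (tau, avar)."""
    k = len(rate) // m
    means = [sum(rate[i * m:(i + 1) * m]) / m for i in range(k)]
    diffs = [(means[i + 1] - means[i]) ** 2 for i in range(k - 1)]
    return m * dt, sum(diffs) / (2 * (k - 1))
```

With the Markov term switched off, the Allan variance at τ = 1 s should land close to N² (0.01 for N = 0.1), which is a quick sanity check on the ARW parameterization before adding the bias term.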

20.
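Based on 110 papers on embedded library services indexed in the Chinese Social Sciences Citation Index (CSSCI) from 2007 to 2013, this study uses the knowledge-mapping visualization software CiteSpace III to process and analyze co-citations, noun phrases, burst terms, and other features of the data. It reveals the representative authors, core institutions, evolutionary path, research frontiers, and hot topics of embedded library service research in China, with the aim of providing a useful reference for further research on embedded library services in China.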


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP Registration No. 09084417 (京ICP备09084417号)