19 similar documents found; search took 250 ms.
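Sequence-to-sequence models are the most widely used approach in text summarization research. However, this approach does not make full use of the linguistic features of the text, and the generated summary sentences suffer from missing and repeated words, which harms the accuracy and readability of the summary. To address this, the paper proposes FCLA (Feature Combination and Local Attention), a summary generation method that combines feature fusion with local attention. Using review...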

张秋子  陆伟  程齐凯  黄永 《情报工程》2015,1(2):064-072
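To identify abbreviations and their corresponding definitions in large volumes of English academic text, this paper proposes MELearn-AI, an automatic abbreviation recognition algorithm. Working from a manually annotated dataset, the algorithm treats the task as sequence labeling and uses a maximum entropy model to recognize abbreviations automatically in English academic text from the computer science domain. On "Paren-sen", the evaluation dataset built for this paper, MELearn-AI achieves 95.8% precision and 86.3% recall, a clear improvement over the two control experiments. The proposed method gives satisfactory results on computer science academic text and helps readers better understand and use the terminology of the field.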

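Drawing on a study of the open-source natural language processing platform GATE and the conditional random field model, this paper proposes an efficient named entity recognition strategy for the electronic products domain. It offers a solution for the preliminary work of an internship project, namely recognizing named entities such as product brands and attributes in the electronic products domain by automated methods, and provides low-level support for higher-level applications that may follow, such as a domain-specific automatic question answering system. The method is a new approach based on a cascaded model that combines rules with statistics, inheriting the advantages of both rule-based and statistical recognition methods. By analyzing the characteristics of the electronic products domain, the system recognizes more than twenty types of named entity, such as brand and weight. Comparative experiments show that the system achieves satisfactory recognition performance.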

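Borrowing ideas from automatic text classification, and based on a document weight merging method with an N-gram language model, the authors design an experimental system for identifying experts' research fields. A preliminary evaluation of automatic research field identification, using "Wuhan University" as the example, shows that the system identifies experts' research fields with very high precision.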

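Theory, Application, Input, Process, and Output are five elements for examining the development of a discipline or field. Together they form the TAIPO model, which is used here to discuss development trends in bibliometric analysis research. Bibliometric analysis research will develop in the following directions: statistical regularities and theoretical synthesis will receive more attention; metric indicators and evaluation systems will keep improving; data volumes will grow, with more linked analysis of heterogeneous, multi-source data; the granularity of measurement will become finer, moving from the document level to the sentence and paragraph level; and automation will increase, enabling information visualization and automatic report generation.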

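Introducing statistical language models, a natural language processing technique, into information retrieval has produced a series of new retrieval models. Typical examples include the query likelihood model, the generative relevance model, the term dependence model, the statistical translation model, the Poisson distribution model, and the risk minimization framework. This paper analyzes the evolution of these retrieval models from the perspective of statistical models and N-gram techniques. It closes with a summary of how retrieval models based on statistical language models have developed, and of future trends and challenges.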
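As an illustration of the query likelihood model named in the abstract above, the following is a minimal sketch that ranks documents by the log-probability of the query under a unigram document model. Dirichlet smoothing, the value `mu=2000.0`, the function name `query_likelihood`, and the two toy documents are all assumptions made for this example; the abstract does not specify any of them.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection_tf, collection_len, mu=2000.0):
    """Log P(query | doc) under a unigram model with Dirichlet smoothing.

    mu is an assumed smoothing parameter, not a value from the source.
    """
    doc_tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        # Background probability of the term in the whole collection.
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never seen in the collection: skip it
        # Smoothed document probability of the term.
        p = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p)
    return score


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "language model for information retrieval".split(),
    "d2": "statistical machine translation of patent text".split(),
}
collection_tf = Counter(t for d in docs.values() for t in d)
collection_len = sum(collection_tf.values())

query = "language model retrieval".split()
ranking = sorted(
    docs,
    key=lambda d: query_likelihood(query, docs[d], collection_tf, collection_len),
    reverse=True,
)
```

Here `ranking` places `d1` first, because it contains all three query terms while `d2` only receives the smoothed background probability for each of them.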

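Faults in word choice and sentence construction are problems of language norms. Poorly handled relations between sentences, loose transitions that read haltingly, and ideas that jump from one point to another are problems of language coherence. Expressing a complete idea usually takes several sentences working together. The sentences stand in definite relations and are connected to one another like the links of a bicycle chain, which we call a "sentence chain". Coherent language is a requirement of the basic grade level for the college entrance examination (gaokao) essay, because only when language is coherent and fluent can a writer pursue literary grace and striking expression. Language incoherence in candidates' essays takes three forms.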

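As Internet resources become multilingual and the languages used by users grow more diverse, cross-language information retrieval is becoming an increasingly important research area, and evaluation is a vital part of the development of retrieval systems. NTCIR is an evaluation conference on cross-language information retrieval for Asian languages. This paper introduces the history of NTCIR, its evaluation tasks, and its evaluation corpora. NTCIR has become a well-known international conference in this research area. As the number of participating teams grows and the evaluation collections are gradually improved, its influence can be expected to expand further and to benefit related disciplines.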

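Machine translation for the patent domain has in recent years become one of the important application areas of machine translation. This paper presents a hybrid Chinese-English machine translation system for patent text. The system is built around a rule-based system and combines rule-based translation with a phrase-based statistical translation system. In the hybrid system, the rule-based system handles the analysis and transfer stages for the source language, generating the source-language syntactic parse tree and transfer tree and determining the basic syntactic frame of the target language. In the target-language generation stage, the statistical system then selects suitable target word forms according to the generated target syntactic structure and produces the final candidate translations. Tested with automatic evaluation metrics, the hybrid system outperforms both the standalone rule-based system and the standalone statistical system, which shows that the hybrid approach is effective and feasible and that it improves translation quality.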

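Faults in word choice and sentence construction are problems of language norms. Poorly handled relations between sentences, loose transitions that read haltingly, and ideas that jump from one point to another are problems of language coherence.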

In clinical research and clinical decision-making, it is important to know whether a study changes or only supports the current standards of care for specific disease management. We define such a change as transformative and such support as incremental research. Making this distinction usually requires a great deal of domain expertise and time from human readers. Faculty Opinions provides a well-annotated corpus on whether a study challenges or only confirms established research. In this study, a machine learning approach is proposed to distinguish transformative from incremental clinical evidence. Texts from both the abstract and a 2-year window of citing sentences are collected for a training set of clinical studies recommended and labeled by Faculty Opinions experts. We achieve the best performance, with an average AUC of 0.755 (0.705–0.875), using Random Forest as the classifier and citing sentences as the feature. The results show that transformative research has more typical language patterns in citing sentences than in abstract sentences. We provide clinicians and researchers with an efficient tool for identifying clinical evidence that challenges or only confirms established claims.

The retrieval of sentences that are relevant to a given information need is a challenging passage retrieval task. In this context, the well-known vocabulary mismatch problem is particularly severe because of the fine granularity of the task. Short queries, which are usually the rule rather than the exception, aggravate the problem. Consequently, effective sentence retrieval methods tend to apply some form of query expansion, usually based on pseudo-relevance feedback. Nevertheless, there are no extensive studies comparing different statistical expansion strategies for sentence retrieval. In this work we study thoroughly the effect of distinct statistical expansion methods on sentence retrieval. We start from a set of retrieved documents in which relevant sentences have to be found. In our experiments different term selection strategies are evaluated, and we provide empirical evidence to show that expansion before sentence retrieval yields competitive performance. This is particularly novel because expansion for sentence retrieval is often done after sentence retrieval (i.e. expansion terms are mined from a ranked set of sentences) and there are no comparative results available between the two types of expansion. Furthermore, this comparison is particularly valuable because it has important implications for time efficiency. We also carefully analyze expansion on weak and strong queries and demonstrate clearly that expanding queries before sentence retrieval is not only more convenient for efficiency purposes, but also more effective when handling poor queries.
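The idea of expanding the query before sentence retrieval can be sketched roughly as follows: expansion terms are taken from the retrieved documents first, and the sentences are ranked only afterwards with the expanded query. Selecting terms by raw frequency and scoring sentences by term overlap are simplifying assumptions for this sketch, as are the names `expand_query` and `rank_sentences` and the toy data; the abstract does not say which term selection strategies or scoring functions the authors evaluate.

```python
from collections import Counter


def expand_query(query, retrieved_docs, k=2):
    """Add the k most frequent non-query terms found in the retrieved documents.

    Frequency-based selection is an assumption made for illustration.
    """
    counts = Counter(
        term
        for doc in retrieved_docs
        for sentence in doc
        for term in sentence
        if term not in query
    )
    return list(query) + [term for term, _ in counts.most_common(k)]


def rank_sentences(query, retrieved_docs):
    """Rank all sentences by how many distinct query terms they contain."""
    query_terms = set(query)
    sentences = [sentence for doc in retrieved_docs for sentence in doc]
    return sorted(sentences, key=lambda s: len(query_terms & set(s)), reverse=True)


# Hypothetical retrieved documents, each a list of tokenized sentences.
retrieved_docs = [
    [
        ["solar", "panels", "convert", "sunlight"],
        ["photovoltaic", "cells", "convert", "sunlight"],
    ],
    [["sunlight", "powers", "homes"]],
]

query = ["solar"]
expanded = expand_query(query, retrieved_docs)    # expansion happens first
ranked = rank_sentences(expanded, retrieved_docs) # sentence retrieval second
```

With the original one-word query, only the first sentence matches. After expansion, the sentence about photovoltaic cells also scores above zero even though it never mentions "solar", which is the vocabulary mismatch effect the abstract describes.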

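Research progress in natural language semantic analysis   Total citations: 5, self-citations: 0, citations by others: 5
Following the levels at which natural language is organized (words, sentences, and discourse), the paper analyzes for each level what semantic analysis involves, the existing research strategies, their theoretical basis, and the main methods, and it compares the two main research strategies in current use. Word-level semantic analysis means determining the meaning of words and measuring the semantic similarity or relatedness between two words. Sentence-level semantic analysis covers sentence meaning analysis and sentence similarity analysis. Text-level semantic analysis is the process of identifying semantic information such as the meaning, topic, and category of a text. Current natural language semantic analysis follows two main strategies: analysis based on knowledge or semantic rules, and analysis based on statistics. The paper concludes that methods combining statistics and rules will be the mainstream of future natural language semantic analysis, and that ontological semantics is an important foundation for it.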

A machine learning approach to sentiment analysis in multilingual Web texts   Total citations: 1, self-citations: 0, citations by others: 1
Sentiment analysis, also called opinion mining, is a form of information extraction from text of growing research and commercial interest. In this paper we present our machine learning experiments with regard to sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train from a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express with regard to certain consumption products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems, namely the noisy character of the input texts, the attribution of the sentiment to a particular entity and the small size of the training set. We succeed in identifying positive, negative and neutral feelings towards the entity under consideration with ca. 83% accuracy for English texts based on unigram features augmented with linguistic features. The accuracy results of processing the Dutch and French texts are ca. 70 and 68% respectively, due to the larger variety of linguistic expressions that more often diverge from standard language, thus demanding more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques for reducing the number of examples to be manually annotated.

张家俊  宗成庆 《情报工程》2017,3(3):021-028
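Over the past two years, neural machine translation (NMT) models have dominated machine translation research, yet statistical machine translation (SMT) remains competitive in many applications, especially in specialized domains. How to use deep learning to improve existing statistical machine translation has become a major concern for researchers. Because the language model is one of the core modules of statistical machine translation, this paper approaches the problem from the language model and explores the use of neural network language models in statistical machine translation. It examines word-based and phrase-based neural network language models; translation experiments from Chinese to English and from Chinese to Japanese show that neural network language models significantly improve the translation quality of statistical machine translation.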

We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines which minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
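For readers unfamiliar with the modified Huber loss mentioned in the abstract above, a minimal sketch of its usual definition for binary labels follows. The function name `modified_huber_loss` and the label convention `y` in {-1, +1} are assumptions of this sketch, and it shows only the per-example loss, not the authors' training procedure.

```python
def modified_huber_loss(y, score):
    """Modified Huber loss for a label y in {-1, +1} and a real-valued score.

    With margin z = y * score:
      z >= -1  ->  max(0, 1 - z) ** 2   (squared hinge, zero once z >= 1)
      z <  -1  ->  -4 * z               (linear, so outliers are penalized less)
    """
    z = y * score
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2
    return -4.0 * z


# The two branches meet at z = -1, where both give a loss of 4.
losses = [modified_huber_loss(1, s) for s in (2.0, 0.0, -1.0, -2.0)]
```

The quadratic branch behaves like a smoothed hinge loss near the decision boundary, while the linear branch limits the influence of badly misclassified examples.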

《Communication monographs》2012,79(2):130-139

This study examined the effects of variation in language intensity on the perceived aggressiveness of sentences representing five empirically established levels of verbal aggression. Subjects read and rated the aggressiveness of replicated sentences in which the level of verbal aggression and language intensity had been systematically varied. Tests of the hypothesized relationship between language intensity, verbal aggression and perceived aggressiveness gave evidence that frequency adverbs do affect the perceived aggressiveness of sentences at most levels of verbal aggression. Increasing language intensity increases perceived verbal aggression only at low levels of verbal aggression; decreasing language intensity is most effective at higher levels of verbal aggression.

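Because sentences in patent documents tend to be long, the training corpus for statistical machine translation is split into clauses to obtain bilingual clause sequences, and clause alignments are then generated with a strategy that combines statistics and rules. A bilingual corpus of simple clauses is built and used to retrain the statistical machine translation system. This improves, to some extent, the phrase alignment and word alignment of the original bilingual training corpus and makes deeper use of the translation information contained in the parallel corpus. Applied to statistical machine translation of patents and compared experimentally on the NTCIR-9 test set, the method yields fairly satisfactory translation results.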
