19 similar documents found (search time: 175 ms)
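In word sense disambiguation for biomedical text, contextual semantic representations suffer from low precision and neglect linguistic properties. To address this, a new language model based on Bi-LSTM is proposed. The model takes the word order of the context into account and embeds the meaning of the whole sentence into a low-dimensional continuous space through unsupervised learning, producing high-quality context representations. These representations are then used to build vectors for the ambiguous words, and cosine similarity is computed to classify each ambiguous word. Experiments show that, compared with traditional linear language models, the semantic vectors generated by the Bi-LSTM represent the meaning of ambiguous words better and achieve high accuracy on different biomedical text datasets (95.01/91.27).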
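The final classification step described above, comparing a context vector with one vector per candidate sense by cosine similarity, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors, the sense labels, and the function names `cosine` and `classify_sense` are all hypothetical, and in the described system the vectors would come from the trained Bi-LSTM.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify_sense(context_vec, sense_vecs):
    """Return the sense label whose vector is most similar to the context vector."""
    return max(sense_vecs, key=lambda label: cosine(context_vec, sense_vecs[label]))

# Hypothetical toy vectors; a real system would take these from the trained model.
sense_vecs = {
    "cold_temperature": [0.9, 0.1, 0.0],
    "cold_illness": [0.1, 0.8, 0.3],
}
context_vec = [0.2, 0.7, 0.4]
predicted = classify_sense(context_vec, sense_vecs)
```

With these toy values the context vector lies much closer to the `cold_illness` vector, so that label is selected.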

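To improve the precision and recall of keyword extraction from short texts, and to discover keywords that do not appear in the text but express its topic well, a method for short-text keyword extraction and expansion is proposed. During extraction, the method considers several kinds of features, including statistical features, topic features, and word collocation features, and corrects each word's score step by step to obtain fairly accurate keywords. During expansion, it computes the similarity between the extracted keywords and topic feature words to obtain expanded keywords that reflect the topic of the short text well. Using topic features and keyword expansion requires an auxiliary corpus of long texts that is strongly related to the topic; the method performs well when such a corpus is available.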

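Statistics-based keyword extraction ignores the deeper semantic information of words. Lexical-chain-based extraction can make up for this shortcoming, but building lexical chains requires computing semantic similarity, which in turn requires a knowledge base. A keyword extraction algorithm that combines lexical chains with a mutual information model is therefore proposed. The text is first preprocessed. Lexical chains and the mutual information model are then used to express the semantic relations between words and to identify keywords that the knowledge base does not cover, as well as keywords that are highly related but have unsatisfactory similarity values. Experimental results show that the algorithm improves both precision and recall over statistics-based and lexical-chain-based keyword extraction algorithms.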
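One common way to score word association without a knowledge base is pointwise mutual information (PMI) over document-level co-occurrence. The sketch below illustrates that idea only; the abstract does not specify which mutual information formulation is used, so the choice of PMI, the function name `pmi_table`, and the toy documents are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(documents):
    """PMI (base 2) for every word pair that co-occurs in at least one document."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = sorted(set(doc))          # count each word once per document
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    table = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = word_counts[a] / n_docs
        p_b = word_counts[b] / n_docs
        table[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return table

# Hypothetical tokenized documents.
docs = [
    ["text", "keyword", "chain"],
    ["text", "keyword"],
    ["chain", "graph"],
    ["graph", "node"],
]
pmi = pmi_table(docs)
```

Pairs are keyed in alphabetical order, for example `pmi[("keyword", "text")]`. A positive value means the two words co-occur more often than independence would predict, which is the kind of signal that can flag related words missing from a knowledge base.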

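For classifying, clustering, and querying phrase-level texts, a new method for computing the similarity of Chinese phrase texts is proposed. The similarity values it produces, and the trend in similarity when one text is compared with several others, are reasonable, so the method meets the needs of phrase text classification, clustering, and information retrieval.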

Wei Lan (魏澜), Wang Kun (王坤). 《成人教育》 (Adult Education), 2023, (11): 63-71
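Internships are an important part of talent cultivation in vocational education in China and an important route to reforming and innovating the talent cultivation model. Using key milestone policies as the basis, the development of China's internship policy for vocational education students is divided into stages. Grounded theory is used to apply open coding to the policy texts of each stage and extract keywords. TF-IDF weighting and policy effectiveness levels are then used to weight the keywords and the policy texts of each stage, respectively. A keyword co-word similarity matrix is built for each stage from cosine similarity, and a keyword co-word network graph is drawn. By analyzing the subgroups in each stage's co-word network one by one, five patterns in the evolution of internship policy are identified. On that basis, policy recommendations are made on internship assessment and evaluation, the supervision and management of internship work, preferential policies for enterprises, and workforce development.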
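The weighting and similarity steps can be illustrated with a small sketch: each keyword is represented by its TF-IDF weights across the policy texts, and the cosine similarity between two such vectors gives one cell of the co-word similarity matrix. This is an assumption-laden illustration rather than the paper's procedure: the TF-IDF variant (relative term frequency times `log(N/df)`), the function names, and the toy keyword lists are hypothetical, and the policy effectiveness weighting is omitted.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([(tf[w] / len(doc)) * math.log(n / df[w]) for w in vocab])
    return vocab, rows

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def keyword_similarity(docs):
    """Cosine similarity between keyword columns of the TF-IDF matrix."""
    vocab, rows = tfidf_matrix(docs)
    cols = [[row[j] for row in rows] for j in range(len(vocab))]
    sim = [[cosine(a, b) for b in cols] for a in cols]
    return vocab, sim

# Hypothetical coded keywords for three policy texts.
docs = [
    ["internship", "assessment", "enterprise"],
    ["internship", "supervision"],
    ["enterprise", "assessment"],
]
vocab, sim = keyword_similarity(docs)
```

The resulting matrix is symmetric, and thresholding it yields the edges of a co-word network. Note that with this IDF variant a keyword appearing in every text gets zero weight and therefore zero similarity to everything.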

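The influence diffusion model, which uses influence as its measure, is widely used to mine and analyze opinion leaders and hot topics in social networks. Because it does not consider the similarity of text content when computing influence, its accuracy in identifying opinion leaders is low. An influence diffusion content model is proposed to address this. An external link structure between posts is built from reply relations; a vector space model is used to compute the content similarity between posts and build an internal link structure; and each post is assigned an influence value according to the proportion of high-frequency keywords it contains. The process integrates the network properties of the reply structure with post content similarity and improves the accuracy of opinion leader identification. Experimental results show that the method outperforms the influence diffusion model.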

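Text similarity computation underlies text classification, text clustering, automatic summarization, and information extraction, and its performance directly affects the quality of text classification, clustering, and summarization. Text similarity is also used in many other natural language processing tasks. This paper studies text similarity computation in depth and, based on the characteristics of natural language, proposes computing the semantic similarity of two texts by comparing their key semantic pairs.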

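As informatization deepens, application domains have accumulated large amounts of text data recorded in semi-structured form. Information extraction techniques emerged to extract useful information quickly and effectively from large-scale, domain-oriented semi-structured text. One of the core algorithms of text information extraction is computing the similarity of words or phrases. For Chinese phrase similarity in domain-oriented semi-structured text, a pattern matching algorithm is first used to extract Chinese phrases from the raw semi-structured text. The phrase similarity method based on common substrings is then improved by incorporating domain semantic dependency relations, which makes the phrase similarity computation more reliable. Experimental results show that the proposed algorithm performs well.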
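As a point of reference, a plain common-substring similarity (the kind of baseline the abstract says it improves on) might look like the sketch below. The normalization `2 * LCS / (len(a) + len(b))` and the function names are assumptions; the domain semantic dependency improvement is not described in enough detail to reproduce and is not included.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1   # extend the match ending at (i-1, j-1)
                best = max(best, curr[j])
        prev = curr
    return best

def phrase_similarity(a, b):
    """Similarity in [0, 1]: shared substring length relative to the average phrase length."""
    if not a and not b:
        return 1.0
    return 2 * longest_common_substring(a, b) / (len(a) + len(b))

# The two phrases share the three-character substring "相似度".
score = phrase_similarity("文本相似度", "短语相似度计算")
```

Because Python strings are sequences of Unicode characters, the same code works character by character on Chinese phrases without segmentation.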

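To give the conversion and comparison between source-language and target-language texts a principled basis, a comprehensive analysis method is used to examine the meaning structure of texts and its cross-lingual reconstruction. The study shows that text meaning is a multidimensional structure made up of three dimensions: the author's subject construction system, the language system, and the reader's subject construction system. The more similar the meaning structures of the source and target texts, the higher the translation quality. Analysis based on this model of text meaning structure offers a new perspective on the cross-lingual reconstruction of text meaning.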

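Based on an analysis of current ontology mapping methods, an improved ontology mapping method is proposed. The method takes into account the influence of a concept's name, instances, attributes, and relations on similarity computation, making the computation of concept similarity more comprehensive and accurate.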
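One simple way to combine the four factors named above is a weighted sum of per-factor similarities. The sketch below is only an illustration under stated assumptions: the weights, the use of `difflib.SequenceMatcher` for names, Jaccard overlap for instances, attributes, and relations, and the dictionary layout of a concept are all hypothetical, since the abstract does not say how the factors are combined.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def concept_similarity(c1, c2, weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of name, instance, attribute, and relation similarity."""
    w_name, w_inst, w_attr, w_rel = weights
    name_sim = SequenceMatcher(None, c1["name"].lower(), c2["name"].lower()).ratio()
    return (w_name * name_sim
            + w_inst * jaccard(c1["instances"], c2["instances"])
            + w_attr * jaccard(c1["attributes"], c2["attributes"])
            + w_rel * jaccard(c1["relations"], c2["relations"]))

# Hypothetical concepts from two ontologies.
car = {"name": "Car", "instances": {"golf", "civic"},
       "attributes": {"wheels", "engine"}, "relations": {"subClassOf:Vehicle"}}
auto = {"name": "Automobile", "instances": {"golf", "civic", "model3"},
        "attributes": {"wheels", "engine"}, "relations": {"subClassOf:Vehicle"}}
score = concept_similarity(car, auto)
```

Here the names differ, but the shared instances, attributes, and relations still push the score above 0.5, which shows why using more than the name can make a mapping more robust.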

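Peter Newmark borrowed the organon model of language functions from the German functional linguist Karl Bühler and divided texts into types according to language function and communicative purpose. For each type he proposed a different method, either semantic translation or communicative translation, thereby linking language function, communicative purpose, and translation method. His approach is widely recognized as offering direct guidance for translation studies, translation practice, and teaching.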

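Text rhythm has long been equated with linguistic rhythm, and the latter has been used as an important parameter in text analysis. This is a misreading of text rhythm. Unlike linguistic rhythm, which is traditionally equated with meter, text rhythm includes an external rhythm grasped directly through hearing and sight and an internal rhythm grasped indirectly through thought. Text rhythm governs both the form and the spirit of a text and embodies its value. Only with a full understanding of how a text presents its value can a translator reconstruct that value in the translation.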

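Text classification is the process of automatically determining the category of a text from its content under a given classification scheme. This paper studies and discusses the Bayesian network classification method, a key technique in .NET text classification and retrieval, proposes a vector-space-based structure for .NET text classification and retrieval, and gives an estimation method and experimental results.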
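The abstract gives no detail on its Bayesian network classifier, so the sketch below substitutes the simplest member of that family, a multinomial naive Bayes classifier with add-one smoothing, purely to illustrate classifying a text from its word counts. It is written in Python rather than .NET, and the class name, training documents, and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over tokenized documents, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / total)               # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in doc:                               # smoothed log likelihoods
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data.
train_docs = [["goal", "match", "team"], ["team", "coach"],
              ["stock", "market"], ["market", "bank", "stock"]]
train_labels = ["sports", "sports", "finance", "finance"]
model = NaiveBayes().fit(train_docs, train_labels)
label = model.predict(["team", "match"])
```

Working in log space avoids numerical underflow on long documents, and the smoothing keeps unseen words from zeroing out a class.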

The language of text messages speeds up the transmission of information, shows the richness of language, and carries all kinds of implications. Much research on text messages has been published, but the analysis of text message language in terms of Grice's Cooperative Principle remains open to investigation. This paper explores the language of text messages based on Grice's Cooperative Principle (CP) and its maxims, aiming to understand how the theory influences text message communication and creates humorous effects. Studying text messages as a linguistic phenomenon is of practical significance.

Reflective learning refers to a learner's purposeful and conscious manipulation of ideas toward meaningful learning. Blogs have been used to support reflective thinking, but the commonly seen blog software usually does not provide overt mechanisms for students' high-level reflections. A new tool was designed to support the reflective thinking process. Beyond writing blog posts, the tool allowed users to attach up to five keywords to each post and link the keywords on a concept map. This study aimed to seek evidence of reflective thinking in participants' keyword-attaching activities. Data analysis included producing mental maps of the blog texts, calculating nodes of high centrality (most talked-about nodes and most connected nodes) with the help of software including AutoMap and Organizational Risk Analyzer, and comparing student-generated keywords against mental map nodes. Results of keyword analyses revealed that two-thirds of the student-attached keywords matched mental map nodes. Results also indicate that the map analysis method can produce reliable indexes of a given text, which in turn could serve as anchor points for further content analysis. Other findings also uncovered some differences between participant-selected keywords and mental map nodes, indicating different levels of reflective activities.

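This study examines the role of pragmatic function in organizing a text during translation. It emphasizes the way of thinking a translator should adopt when translating a text: when drawing on linguistic resources, the translator should attend to the reader's response, the reader's psychological characteristics when receiving the text, and the overall relationship among reader, translator, and text.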

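The "horizon of expectations" is the core concept of reception aesthetics. It shapes how readers understand and interpret a text and determines the actual effect of a translation. The more fully the text merges with the reader's horizon of expectations, the more highly it is rated; otherwise, it will not win the reader's approval. In translating business texts, the translator can achieve a fusion of horizons between the translation and target-language readers only by fully considering their language habits, cultural psychology, aesthetic tastes, and other factors.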

Bilingual German fourth-graders are expected to develop greater linguistic awareness than monolingual children and therefore should habitually apply different text-processing strategies from German monolingual fourth-graders when comprehending and recalling a text. Bilingual children are expected to process texts from the bottom up, from the text base to the gist, whereas monolingual children should engage in top-down processing, which is indicated, for example, by more text intrusions and inferences. This research attempts to clarify whether bilinguals show this shift in direction of processing when they process cross-linguistic versus mono-linguistic texts. The results of Experiment 1 supported our main hypothesis: monolingual German fourth-graders had more intrusions than same-aged German-English (L1-L2) bilingual children. In Experiment 2, nearly balanced German-English and German-dominant children were tested separately in within-language free recall in both languages and in across-language text recall. For nearly balanced bilingual children, within- and cross-language recall was equally efficient in both languages. This was not the case for German-dominant bilingual children, whose L2 recall contained more intrusions. Top-down processing seems to increase in the weaker language, while bottom-up processing is apparently associated with cognitive functioning in L1.

In this work, we have investigated text readability in the Bangla language. Text readability indicates how suitable a given document is for a target reader group, so it has a large impact on the preparation of educational content. Advances in natural language processing have enabled the automatic identification of the reading difficulty of texts and contributed to the design and development of suitable educational materials. Although Bangla is one of the major languages of India and the official language of Bangladesh, research on text readability in Bangla is still at a nascent stage. In this paper, we present computational models that determine the readability of Bangla text documents based on syntactic properties. Because Bangla is poor in digital resources, we had to develop a novel dataset suitable for the automatic identification of text properties. Our initial experiments showed that existing English readability metrics are inapplicable to Bangla. Accordingly, we developed new models for analyzing text readability in Bangla that consider language-specific syntactic features. We identified the major structural contributors to text comprehensibility and then developed readability models for Bangla texts, using machine-learning methods such as regression, support vector machines (SVM), and support vector regression (SVR). The performance of the individual models has been compared against one another. We conducted a detailed user survey for data preparation, for identifying important structural parameters of texts, and for validating our proposed models. The work has further implications for educational research and for matching texts to readers.
