Similar documents
20 similar documents found.
1.
WU Guimei. Bulletin of Science and Technology (科技通报), 2012, 28(7): 148-151
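This paper studies fast and accurate text classification. The same word can carry different meanings in different linguistic contexts or when used by different people, yet such words have very similar descriptive features in text classification. Traditional text classification represents a text with the vector space model, which is built only from word occurrence frequencies; when a text contains polysemous words and synonyms, classification becomes noticeably slower and less accurate. To address this, a text topic-matching method based on semantic indexing is proposed. After keywords are extracted from the texts, a document-term matrix is constructed and factored with SVD, and an optimized balancing method is then used for dimensionality reduction and similarity computation, overcoming the drawbacks of the traditional approach. Experiments show that the method greatly reduces the influence of synonyms and polysemous words on text classification, making topic-based classification accurate and efficient, with clearly improved results.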
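The core of entry 1 is latent semantic indexing: factor a document-term matrix with SVD and compare documents in the reduced space, where synonyms that share contexts collapse together. The following is a minimal stdlib-only sketch, assuming a tiny hand-made term-by-document count matrix; power iteration stands in for a library SVD, the rank `k` is fixed by hand (the abstract's "optimized balancing" step is not specified), and all function names are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lsi_doc_vectors(A, k, iters=300):
    """Map each document (column of the term-by-document matrix A) into the
    k-dimensional latent space of the truncated SVD A ~ U_k S_k V_k^T.

    Right singular vectors are eigenvectors of A^T A; they are found one at a
    time by power iteration, re-orthogonalising against earlier ones.
    Assumption: k <= rank(A). Returns the columns of S_k V_k^T.
    """
    n_terms, n_docs = len(A), len(A[0])
    gram = [[sum(A[t][i] * A[t][j] for t in range(n_terms)) for j in range(n_docs)]
            for i in range(n_docs)]
    vs, sigmas = [], []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(n_docs)])  # fixed start vector
        for _ in range(iters):
            v = matvec(gram, v)
            for u in vs:  # deflation: strip components along earlier vectors
                proj = sum(a * b for a, b in zip(v, u))
                v = [a - proj * b for a, b in zip(v, u)]
            v = normalize(v)
        eigenvalue = sum(a * b for a, b in zip(v, matvec(gram, v)))  # Rayleigh quotient
        vs.append(v)
        sigmas.append(math.sqrt(max(eigenvalue, 0.0)))
    return [[sigmas[i] * vs[i][d] for i in range(k)] for d in range(n_docs)]


# Rows are terms, columns are documents d0..d3 (raw counts).
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
]
raw_docs = [list(col) for col in zip(*A)]
latent_docs = lsi_doc_vectors(A, k=2)
print(round(cosine(raw_docs[0], raw_docs[1]), 3))        # 0.5: only "engine" is shared
print(round(cosine(latent_docs[0], latent_docs[1]), 3))  # 1.0 in the 2-D latent space
```

In the raw space the "car" and "automobile" documents overlap only on "engine"; after projection onto two latent dimensions they coincide, while the "flower" documents stay orthogonal to them. A random start vector would be more robust than the fixed one for general matrices.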

2.
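Classification of imbalanced data has been a research hotspot in machine learning and data mining in recent years. On imbalanced data, standard classification algorithms perform poorly because they tend to focus on the majority class and neglect the minority class. This paper surveys imbalanced-data classification algorithms at three different levels of classification learning and points out possible future research directions in the field.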

3.
Research on Text Feature Selection and Weighting Based on Tongyici Cilin   Total citations: 1, self-citations: 0, citations by others: 1
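Feature selection and weighting are key problems in text classification, and noise and data sparsity are the main obstacles in feature selection. This paper introduces a text feature selection and weighting method based on the Tongyici Cilin (同义词词林) synonym thesaurus that combines statistics with semantics. The method first merges synonyms, lifting feature extraction from the word level to the level of topic concepts, and then weights features with TF*Ensu, a combination of term frequency and the redundancy of relative entropy, which strengthens the topic features that contribute most to classification. Experimental results show that, compared with traditional methods, this approach clearly improves feature selection and weighting and increases text classification accuracy.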
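Entry 3 merges synonyms into concepts and then weights each concept by term frequency times an entropy-based "redundancy". The abstract does not give the exact TF*Ensu formula, so this sketch assumes the textbook redundancy `1 - H(C | concept) / log |C|` over the concept's class distribution, and uses a small hypothetical synonym map in place of Tongyici Cilin; function names are illustrative only.

```python
import math
from collections import Counter, defaultdict


def to_concepts(tokens, synonyms):
    """Synonym merging: replace each token by its concept id (itself if unlisted)."""
    return [synonyms.get(t, t) for t in tokens]


def redundancy(class_counts, n_classes):
    """Assumed redundancy: 1 - H(C | concept) / log(n_classes).

    Equals 1.0 for a concept seen in one class only and 0.0 for a concept
    spread evenly over all classes. Requires n_classes >= 2.
    """
    total = sum(class_counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in class_counts.values() if c)
    return 1.0 - entropy / math.log(n_classes)


def tf_redundancy_weights(corpus, synonyms):
    """corpus: list of (tokens, label). Returns one {concept: tf * redundancy} per doc."""
    docs = [(to_concepts(tokens, synonyms), label) for tokens, label in corpus]
    n_classes = len({label for _, label in docs})
    per_class = defaultdict(Counter)  # concept -> Counter(label -> occurrences)
    for concepts, label in docs:
        for concept in concepts:
            per_class[concept][label] += 1
    red = {c: redundancy(counts, n_classes) for c, counts in per_class.items()}
    return [{c: tf * red[c] for c, tf in Counter(concepts).items()} for concepts, _ in docs]


# Hypothetical synonym map standing in for Tongyici Cilin.
synonyms = {"laptop": "computer", "pc": "computer", "match": "football"}
corpus = [
    (["laptop", "price", "cpu"], "tech"),
    (["pc", "price", "gpu"], "tech"),
    (["football", "price", "goal"], "sport"),
    (["match", "goal"], "sport"),
]
weights = tf_redundancy_weights(corpus, synonyms)
print(weights[0])  # "computer" keeps weight 1.0; "price" (seen in both classes) is damped
```

Merging "laptop" and "pc" into one concept pools their evidence, and the entropy term pushes down words such as "price" that occur across classes.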

4.
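With the rapid development of network technology, text classification has become a key technique for processing and organizing large volumes of documents. Texts are usually represented with the vector space model, each text being a vector in feature space, and features are weighted with TF·IDF. However, this weighting simply assumes that words with a low document frequency are important and words with a high document frequency are not, so it cannot properly reflect how useful a word is, which lowers classification accuracy. To address this weakness of TF·IDF, a feature weighting method based on the feature Gini index, TF·GINI, is proposed. Experimental results show that this weighting method achieves good classification performance.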
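The contrast in entry 4 is easy to see on a toy corpus: IDF rewards rare words regardless of class, whereas a Gini-style purity rewards words concentrated in one class. The abstract does not state which Gini variant TF·GINI uses, so this sketch assumes the simple class purity `Gini(t) = sum_c P(c | t)^2` estimated from document counts; the paper's formula may add further factors. Names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def gini_purity(docs):
    """docs: list of (tokens, label). Assumed Gini(t) = sum_c P(c | t)^2, where
    P(c | t) is the share of documents containing t that belong to class c."""
    doc_freq = defaultdict(Counter)  # term -> Counter(label -> #documents containing it)
    for tokens, label in docs:
        for term in set(tokens):
            doc_freq[term][label] += 1
    purity = {}
    for term, counts in doc_freq.items():
        n = sum(counts.values())
        purity[term] = sum((c / n) ** 2 for c in counts.values())
    return purity


def idf(term, docs):
    """Plain IDF, log(N / df), for comparison."""
    df = sum(1 for tokens, _ in docs if term in tokens)
    return math.log(len(docs) / df)


def tf_gini(tokens, purity):
    """TF·GINI weights for one document; terms unseen in training get weight 0."""
    return {t: tf * purity.get(t, 0.0) for t, tf in Counter(tokens).items()}


docs = [
    (["goal", "team", "today"], "sport"),
    (["goal", "team"], "sport"),
    (["goal", "coach"], "sport"),
    (["stock", "today"], "finance"),
]
purity = gini_purity(docs)
print(idf("goal", docs) < idf("today", docs))     # True: IDF prefers "today"
print(purity["goal"], purity["today"])             # 1.0 0.5: Gini prefers "goal"
print(tf_gini(["goal", "goal", "today"], purity))  # {'goal': 2.0, 'today': 0.5}
```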

5.
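Digital documents such as books and journal papers offer few textual features, so their feature vectors express semantics imprecisely and classification performs poorly. To address this, this paper proposes a digital-document classification method based on semantic feature expansion. The method first uses TF-IDF to obtain core feature words that represent a document well and have high TF-IDF values; it then expands the core feature set with semantic concepts from the HowNet semantic dictionary and the open knowledge base Wikipedia, building a low-dimensional, semantically rich concept vector space; finally, it builds classifiers with several algorithms, including MaxEnt and SVM, to classify digital documents automatically. Experimental results show that, compared with traditional short-text classification methods based on feature selection, the method effectively expands the semantics of short-text features and improves classification performance on digital documents.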
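The first two steps of entry 5 (pick high TF-IDF core words, then map them to concepts) can be sketched as follows. Real HowNet or Wikipedia look-ups are not modelled: a small hypothetical `concept_dict` stands in for both, `k` is chosen by hand, and the classifier stage is omitted.

```python
import math
from collections import Counter


def tfidf(doc, corpus):
    """TF-IDF of each term in `doc` (a token list that is also part of `corpus`)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    return {t: (c / len(doc)) * math.log(n_docs / sum(1 for d in corpus if t in d))
            for t, c in counts.items()}


def core_words(doc, corpus, k):
    """The k highest-scoring TF-IDF terms (ties broken alphabetically)."""
    scores = tfidf(doc, corpus)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def expand_to_concepts(words, concept_dict):
    """Replace core words by the concepts they map to; a word without an entry
    is kept as its own concept."""
    vector = Counter()
    for word in words:
        for concept in concept_dict.get(word, [word]):
            vector[concept] += 1
    return dict(vector)


corpus = [
    ["neural", "network", "training", "the"],
    ["deep", "learning", "model", "the"],
    ["library", "catalog", "the"],
]
# Hypothetical concept dictionary standing in for HowNet / Wikipedia look-ups.
concept_dict = {
    "neural": ["machine_learning"],
    "network": ["machine_learning", "graph"],
    "deep": ["machine_learning"],
    "learning": ["machine_learning"],
    "library": ["information_science"],
    "catalog": ["information_science"],
}
vectors = [expand_to_concepts(core_words(d, corpus, 2), concept_dict) for d in corpus]
print(vectors)
# [{'machine_learning': 2, 'graph': 1}, {'machine_learning': 2}, {'information_science': 2}]
```

The first two documents share no content word, yet after expansion both carry the `machine_learning` concept, which is the effect the abstract aims for with short texts.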

6.
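Natural language processing is a popular direction in artificial intelligence, and text classification, a key NLP technique, has attracted wide attention from researchers. With advances in machine learning, decision tree algorithms have achieved good results in text classification. For short-text classification, this paper extracts text features with TFIDF and then classifies the texts with a gradient boosting decision tree (GBDT), comparing the results with naive Bayes, logistic regression and support vector machines, and thereby verifies that GBDT is feasible for short-text classification.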
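To make the boosting step in entry 6 concrete, here is a minimal stdlib-only sketch of binary gradient boosting with depth-1 trees (stumps) and log loss over TF-IDF-style vectors. It is a simplification, not the paper's setup: leaves use the mean negative gradient rather than the Newton-style leaf update of common GBDT implementations, the feature values are hypothetical, and `n_rounds` and `lr` are arbitrary.

```python
import math


def fit_stump(X, residuals):
    """Depth-1 regression tree: choose the (feature, threshold) split that
    minimises squared error; each side predicts its mean residual."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # every feature is constant: fall back to a constant stump
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    f, thr, left_value, right_value = stump
    return left_value if row[f] <= thr else right_value


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gbdt_fit(X, y, n_rounds=20, lr=0.5):
    """Binary gradient boosting with log loss: each round fits a stump to the
    negative gradient y - p. Assumes both classes occur in y."""
    p0 = sum(y) / len(y)
    base = math.log(p0 / (1 - p0))  # initial log-odds of the positive class
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_predict(stump, row) for s, row in zip(scores, X)]
    return base, lr, stumps


def gbdt_proba(model, row):
    base, lr, stumps = model
    return sigmoid(base + lr * sum(stump_predict(st, row) for st in stumps))


# Hypothetical TF-IDF vectors over the vocabulary ["goal", "stock"]; label 1 = sport.
X = [[0.9, 0.0], [0.7, 0.1], [0.8, 0.0], [0.0, 0.8], [0.1, 0.9], [0.0, 0.7]]
y = [1, 1, 1, 0, 0, 0]
model = gbdt_fit(X, y)
print(gbdt_proba(model, [0.85, 0.05]) > 0.5, gbdt_proba(model, [0.05, 0.85]) > 0.5)  # True False
```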

7.
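Traditional feature selection algorithms ignore correlations between features and assume balanced classes, so on imbalanced problems they favor the majority class and neglect the minority class. To address these shortcomings, this paper considers feature correlation and class imbalance together and proposes CDHI, a feature selection algorithm for high-dimensional imbalanced data based on class discrimination. CDHI clusters the features with k-means, computes the class discrimination of each feature in a cluster, ranks the features within each cluster by class discrimination, and then selects the most discriminative features from each cluster to form the feature subset, thereby both removing redundancy among high-dimensional features and handling imbalanced data. Experimental results show that, compared with traditional feature selection methods, CDHI effectively reduces the dimensionality of the feature space and improves the recognition rate of the minority class.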
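The cluster-then-rank structure described for CDHI can be sketched as below. The abstract does not define "class discrimination", so a Fisher-style score computed separately per class is assumed here (so the majority class cannot dominate the statistics); the deterministic k-means initialisation, the toy data and all function names are likewise hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first initialisation.
    Returns a cluster label for each point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def class_discrimination(values, y, minority=1):
    """Assumed score: |mean_min - mean_maj| / (std_min + std_maj), with
    statistics computed per class."""
    a = [v for v, t in zip(values, y) if t == minority]
    b = [v for v, t in zip(values, y) if t != minority]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    std_a = math.sqrt(sum((v - mean_a) ** 2 for v in a) / len(a))
    std_b = math.sqrt(sum((v - mean_b) ** 2 for v in b) / len(b))
    return abs(mean_a - mean_b) / (std_a + std_b + 1e-9)


def cluster_then_rank(X, y, n_clusters, per_cluster=1):
    """Cluster features (columns of X) with k-means, then keep the
    `per_cluster` most class-discriminative features from each cluster."""
    features = [list(col) for col in zip(*X)]
    labels = kmeans(features, n_clusters)
    selected = []
    for j in range(n_clusters):
        members = [f for f, lab in enumerate(labels) if lab == j]
        members.sort(key=lambda f: -class_discrimination(features[f], y))
        selected.extend(members[:per_cluster])
    return sorted(selected)


# 8 samples (rows), 4 features (columns); class 1 is the minority.
X = [
    [0.0, 0.2, 3.0, 3.1],
    [0.1, 0.0, 2.9, 3.0],
    [0.0, 0.3, 3.1, 3.0],
    [0.1, 0.1, 3.0, 2.9],
    [0.0, 0.2, 2.8, 3.0],
    [0.1, 0.0, 3.2, 3.1],
    [5.0, 4.0, 3.0, 3.0],
    [5.1, 4.9, 3.1, 2.9],
]
y = [0, 0, 0, 0, 0, 0, 1, 1]
print(cluster_then_rank(X, y, n_clusters=2))  # [0, 3]: the best feature from each cluster
```

Features 0 and 1 are near-duplicates and land in the same cluster, so only the sharper one (feature 0) survives; this is how clustering removes redundancy before ranking.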

8.
ZHOU Yuan, LIU Huailan, DU Pengpeng, LIAO Ling. Information Science (情报科学), 2017, 35(5): 111-118
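[Purpose/Significance] Feature extraction strongly affects classification, yet the traditional TF-IDF feature extraction method ignores both the context of a feature word and its distribution across classes. [Method/Process] This paper proposes an improved TF-IDF feature extraction method: (1) node importance is computed from a text network with an improved PageRank algorithm, addressing TF-IDF's neglect of textual structure; (2) the variance of a feature's IDF values is added to measure how the feature word w is distributed across the text sets of different classes, addressing TF-IDF's neglect of between-class distribution. [Results/Conclusions] A text classification model built on the improved method was used in classification experiments on 3D-printing data. Comparing classification results before and after the improvement shows that the method effectively improves the accuracy of text feature-word extraction.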
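The two ingredients added in entry 8 can be sketched separately: standard PageRank on a word co-occurrence network, and the variance of a term's IDF computed within each class. The abstract names an "improved" PageRank and does not say how the two scores are combined with TF-IDF, so this sketch uses plain PageRank, an assumed add-one-smoothed IDF, and leaves the combination out; names and data are hypothetical.

```python
import math
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected word graph: link words that occur within `window` tokens
    of each other (window=2 links adjacent words only)."""
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbours[word].add(tokens[j])
                neighbours[tokens[j]].add(word)
    return neighbours


def pagerank(neighbours, damping=0.85, iters=100):
    """Standard PageRank on an undirected graph (every node has >= 1 edge)."""
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        rank = {v: (1 - damping) / n
                   + damping * sum(rank[u] / len(neighbours[u]) for u in neighbours[v])
                for v in nodes}
    return rank


def idf_variance(term, docs_by_class):
    """Variance of a term's IDF computed separately inside each class.
    Assumption: smoothed IDF log((N + 1) / (df + 1)) + 1 per class.
    docs_by_class maps label -> list of token sets."""
    idfs = []
    for docs in docs_by_class.values():
        df = sum(1 for d in docs if term in d)
        idfs.append(math.log((len(docs) + 1) / (df + 1)) + 1)
    mean = sum(idfs) / len(idfs)
    return sum((x - mean) ** 2 for x in idfs) / len(idfs)


tokens = ["printer", "nozzle", "printer", "resin", "printer"]
rank = pagerank(cooccurrence_graph(tokens))
print(max(rank, key=rank.get))  # printer: the hub of the co-occurrence graph

docs_by_class = {
    "3d_printing": [{"nozzle", "resin"}, {"nozzle"}],
    "other": [{"price"}, {"price", "resin"}],
}
print(idf_variance("nozzle", docs_by_class) > idf_variance("resin", docs_by_class))  # True
```

"nozzle" appears only in one class, so its per-class IDF values differ and the variance is high; "resin" is spread evenly and gets zero variance.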

9.
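In automatic text classification there are currently two ways of estimating probabilities: word-frequency statistics and document-frequency statistics, and whether the chosen estimate is appropriate directly affects the quality of feature extraction and the accuracy of classification. This paper implements a Chinese text classifier with the k-nearest-neighbor algorithm and runs training and classification experiments on balanced and imbalanced Chinese training corpora. The results show that the word-frequency estimate should be used with an imbalanced corpus and the document-frequency estimate with a balanced corpus; doing so extracts high-quality text features effectively and thereby improves classification accuracy.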
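The difference between the two estimates in entry 9 shows up clearly when one repetitive document dominates a class. The abstract gives no formulas, so this sketch assumes the common Laplace-smoothed forms: the word-frequency estimate counts token occurrences (multinomial style) and the document-frequency estimate counts documents containing the word (Bernoulli style). The KNN classifier itself is not shown, and the data are hypothetical.

```python
from collections import Counter


def p_word_tf(word, class_docs, vocab_size):
    """Word-frequency estimate of P(word | class): occurrences of the word among
    all tokens of the class, with add-one (Laplace) smoothing."""
    tokens = [t for doc in class_docs for t in doc]
    return (Counter(tokens)[word] + 1) / (len(tokens) + vocab_size)


def p_word_df(word, class_docs):
    """Document-frequency estimate of P(word | class): share of the class's
    documents containing the word, with Laplace smoothing for a binary event."""
    df = sum(1 for doc in class_docs if word in doc)
    return (df + 1) / (len(class_docs) + 2)


# One long, repetitive document dominates the token counts of this class.
sports_docs = [["ball"] * 5 + ["game"], ["score"], ["team"]]
vocab_size = 4  # {"ball", "game", "score", "team"}
for w in ["ball", "team"]:
    print(w, round(p_word_tf(w, sports_docs, vocab_size), 3), round(p_word_df(w, sports_docs), 3))
# ball 0.5 0.4
# team 0.167 0.4
```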

10.
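To address the missing-feature problem that is common in natural language processing, a text classification method based on a maximum entropy model with missing-feature compensation is proposed. To avoid overfitting when data are sparse, Gaussian prior smoothing is used for feature compensation, and a hybrid feature selection method is proposed that combines gain computed from conditional maximum entropy with feature frequency. In experiments the method was compared with the centroid method, nearest neighbor, Bayes, SVM and the maximum entropy classifier without smoothing; the results show that the classifier based on the missing-feature-compensated maximum entropy model outperforms all of these algorithms overall.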
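Gaussian prior smoothing for a maximum entropy model amounts to placing a zero-mean Gaussian with variance `sigma2` on every weight, which subtracts `||W||^2 / (2 * sigma2)` from the log-likelihood. The following minimal sketch trains a softmax (maximum entropy) classifier that way by full-batch gradient ascent; the missing-feature compensation and hybrid feature selection from entry 10 are not modelled, and the data, learning rate and names are hypothetical.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_maxent(X, y, n_classes, sigma2=1.0, lr=0.1, epochs=200):
    """Multiclass maximum-entropy (softmax) classifier with a Gaussian prior
    N(0, sigma2) on each weight: maximises log-likelihood - ||W||^2 / (2 * sigma2)."""
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    for _ in range(epochs):
        # Gradient of the prior term, then add observed-minus-expected feature values.
        grad = [[-W[c][f] / sigma2 for f in range(n_feat)] for c in range(n_classes)]
        for x, label in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[c], x)) for c in range(n_classes)])
            for c in range(n_classes):
                err = (1.0 if c == label else 0.0) - p[c]
                for f in range(n_feat):
                    grad[c][f] += err * x[f]
        W = [[W[c][f] + lr * grad[c][f] for f in range(n_feat)] for c in range(n_classes)]
    return W


def predict(W, x):
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max(range(len(W)), key=lambda c: scores[c])


def weight_norm(W):
    return math.sqrt(sum(w * w for row in W for w in row))


# Features: [bias, "goal", "stock", rare word seen in one document only].
X = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0]]
y = [0, 0, 1, 1]
W_strong = train_maxent(X, y, n_classes=2, sigma2=0.5)  # tight prior: strong smoothing
W_weak = train_maxent(X, y, n_classes=2, sigma2=10.0)   # loose prior: weak smoothing
print(predict(W_strong, [1, 1, 0, 0]), predict(W_strong, [1, 0, 1, 0]))  # 0 1
print(weight_norm(W_strong) < weight_norm(W_weak))                      # True
```

Because this toy data is linearly separable, an unsmoothed model would keep inflating its weights; the prior keeps them bounded, and a smaller `sigma2` bounds them more tightly.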

11.
This study examined how students who had no prior experience with videoconferencing would react to the use of videoconferencing as an instructional medium. Students enrolled in seven different courses completed a questionnaire at the beginning of the semester and again at the end of the semester. Students at the origination and remote sites did not differ in their reactions toward videoconferencing but there was a significant difference for gender. Women reacted less favorably to videoconferencing. Compared to the beginning of the semester, students reported significantly less positive attitudes toward taking a course through videoconferencing at the end of the semester. There were no significant differences in students' attitudes toward videoconferencing across courses at the beginning of the semester but there were significant differences across the courses at the end of the semester. The results suggest the need for better preparation for both students and instructors.

12.
A joint study by Prof. ZHANG Zhibin from the CAS Institute of Zoology and his co-workers from Norway, the US and Switzerland has indicated that historical outbreaks of migratory locusts in China were associated with cold spells, suggesting that China's projected climate warming could decrease the pest's numbers. The study was published in Proceedings of the National Academy of Sciences on 17 September 2007.

13.
A computer-mediated group is a complex entity whose members exchange many types of information via multiple means of communication in pursuit of goals specific to their environment. Over time, they coordinate technical features of media with locally enacted use to achieve a viable working arrangement. To explore this complex interaction, a case study is presented of the social networks of interactions and media use among members of a class of computer-supported distance learners. Results show how group structures associated with project teams dominated who communicated with whom, about what, and via which media over the term, and how media came to occupy their own communication niches: Webboard for diffuse class-wide communication; Internet Relay Chat more to named others but still for general communication across the class; and e-mail primarily for intrateam communication. Face-to-face interaction, occurring only during a short on-campus session, appears to have had a catalytic effect on social and emotional exchanges. Results suggest the need to structure exchanges to balance class-wide sharing of ideas with subgroup interactions that facilitate project completion, and to provide media that support these two modes of interaction.

14.
Electronic data interchange (EDI) provides means for interorganizational communication, creates network externalities, requires an advanced information technology (IT) infrastructure, and relies on standards. In the diffusion of such innovations, institutional involvement is imperative. Such institutions include governmental agencies, national and global standardization organizations, local government, and nonprofit private organizations such as industry associations. We call the last type of organization intermediating institutions: they intermediate or coordinate ("inscribe") the activities of a group of would-be adopters. Unfortunately, little is known about how these organizations shape the EDI diffusion trajectory. In this article we examine one specific type of intermediating organization, industry associations, and how they advanced the EDI diffusion process in the grocery sectors of Hong Kong, Denmark and Finland. We identify six institutional measures, placed into a matrix formed by the mode of involvement (influence vs. regulation) and the type of diffusion force (supply push vs. demand pull), that can be mobilized to further EDI diffusion. Industry associations were found to be active users of all these measures to varying degrees. Their role was critical especially in knowledge building, knowledge deployment, and standard setting. Furthermore, institutional involvement varied due to policy and cultural contingencies and power dependencies.

15.
The growing prospect of digital piracy has led electronic publishers to perceive a need to adopt technical systems of protection, and governments to reform their copyright laws. This article is a preliminary study of the management of intellectual property by electronic publishers, defined as those involved in the production of online databases and CD-ROMs. It focuses on three main issues: (1) how electronic publishers view the increasing threat of piracy; (2) the methods employed to protect intellectual property in digital format; and (3) the importance of technological protection of intellectual property in electronic publications. The analysis is based on a sample of 23 UK electronic publishers. The interviews revealed an interesting assortment of protection methods and did not show technological protection to be the preferred approach. Instead, in addition to copyright law, the means of protection comprised niche markets, pricing, trust, bad publicity, and nontechnical and technical means.

16.
Long-standing conflict between domain name registrants and trademark holders prompted the Internet Corporation for Assigned Names and Numbers (ICANN) to create a global, mandatory arbitration procedure known as the Uniform Dispute Resolution Policy (UDRP). The UDRP has been used in 2166 cases involving 3938 domain names as of 1 November 2000. The policy gives the initiator of a complaint, generally a trademark holder, the right to choose which ICANN-accredited dispute resolution service provider (RSP) will handle the case. During the preparation of the UDRP, some feared that complainant selection would lead to "forum shopping" that might bias the results. This article performs a statistical assessment of the forum-shopping thesis and finds support for it. There are statistically significant differences in the various RSPs' propensity to take away names from defendants; there are also major differences in the number of cases brought to each RSP. RSPs who take away names have the larger share of cases. The study examines other variables that might explain differences in market share, such as price, the plaintiff's nationality, or the time taken to decide a case. It finds that nationality and time are also correlated with market share. The study concludes that shopping for a favorable outcome is an important factor in the UDRP.

17.
Prof. Raymond C.K. Chen, a neuropsychologist with the CAS Institute of Psychology, has made novel progress in his studies of schizophrenia. His work was reported in a recent issue of Behavioural Neurology.

18.
In Xishuangbanna, one of China's most biodiverse regions, the landscape has changed dramatically during the past three decades due to the conversion of tropical rainforest to rubber plantations. In steep areas, terraces are often constructed before rubber trees are planted, which causes two important changes in the soil: the destabilization of soil in the bench terraces and the increased vulnerability of unvegetated riser faces to erosion. Few studies have documented the nature and intensity of erosion on bench terraces. Prof. LIU Wenjie and his colleagues from the Xishuangbanna Tropical Botanical Garden (XTBG) conducted a study in Menglun County (21°5′39″N, 101°15′55″E), Xishuangbanna to evaluate the influence

19.
Chinese scientists plan to apply Earth observation technologies to protect the critically endangered wild camel (Camelus ferus). With the help of remote sensing, satellite positioning, geographical information systems and wireless sensor networks, they will be able to assess the distribution and population of the wild camels and protect their habitats. The project will be carried out by the International Research Center for Wild Camel Conservation, which was jointly established earlier this year by the Institute of Remote Sensing and Digital Earth (RADI) of the Chinese

20.
The Paul Gerson Unna Research Group on Dermatogenomics was founded in October 2012 at the Partner Institute for Computational Biology. The ultimate goal of the group is to understand the biology of skin and skin appendages.

