An Analysis of Statistical Techniques Applying to Multi-feature Similarity Comparison between Corpora
(Chinese title: 语料库间多特征相似性比较的统计方法研究)
Cite this article: HAN Jin-long. An Analysis of Statistical Techniques Applying to Multi-feature Similarity Comparison between Corpora[J]. Modern Educational Technology (现代教育技术), 2010, 20(8): 83-87
Author: HAN Jin-long (韩金龙)
Affiliation: School of Foreign Languages, South China University of Technology, Guangzhou, Guangdong 510640
Abstract: Statistical techniques applicable to multi-feature similarity comparison between corpora include the chi-square test, the rank correlation test, and the chi by degrees of freedom (CBDF) test (卡方相似性检验, literally "chi-square similarity test"). A statistical experiment on corpora using 350 high-frequency words shows that, in multi-feature linguistic studies based on relatively large samples, the chi-square test readily concludes that the corpora differ significantly, and the rank correlation test just as readily concludes that the text varieties being compared are significantly correlated. The CBDF test, which bases its inference on the relative values of the statistic across corpora, yields a more fine-grained picture of the degree of similarity between them. However, this technique requires researchers to have strong skills in computerized text processing.

Keywords: corpus; multi-feature similarity comparison; CBDF test
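
The three statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the author's procedure: the function names (`chi_square_2xn`, `cbdf`, `spearman_rho`) and the toy counts are hypothetical, and the sketch assumes that CBDF means the Pearson chi-square statistic of a 2 x N word-frequency table divided by its N - 1 degrees of freedom, with every word occurring at least once across the two corpora. Only the statistics are computed; significance thresholds are left out.

```python
def chi_square_2xn(freq_a, freq_b):
    """Pearson chi-square statistic for a 2 x N table of word counts.

    Assumes each word occurs at least once in the two corpora combined,
    so that no expected count is zero.
    """
    total_a, total_b = sum(freq_a), sum(freq_b)
    grand = total_a + total_b
    chi2 = 0.0
    for a, b in zip(freq_a, freq_b):
        column = a + b
        expected_a = column * total_a / grand
        expected_b = column * total_b / grand
        chi2 += (a - expected_a) ** 2 / expected_a
        chi2 += (b - expected_b) ** 2 / expected_b
    return chi2


def cbdf(freq_a, freq_b):
    """Chi-square divided by degrees of freedom (N - 1 for a 2 x N table)."""
    return chi_square_2xn(freq_a, freq_b) / (len(freq_a) - 1)


def average_ranks(values):
    """Rank values from largest (rank 1) down, giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical counts of the same five words in two corpora (not from the paper).
corpus_a = [520, 310, 205, 150, 90]
corpus_b = [480, 350, 160, 170, 60]

print("chi-square:", chi_square_2xn(corpus_a, corpus_b))
print("CBDF:", cbdf(corpus_a, corpus_b))
print("Spearman rho:", spearman_rho(corpus_a, corpus_b))
```

Because CBDF is a relative value rather than a pass/fail decision, it can be computed for several corpus pairs and the pairs then ordered from most to least similar, which is the kind of graded comparison the abstract attributes to the CBDF test.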
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).