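An n-gram Statistics Algorithm for Arbitrary n in Large-Scale Chinese Corpora and a Knowledge Acquisition Method
Cite this article: Zhang Min, Li Sheng, Zhao Tiejun. An n-gram Statistics Algorithm for Arbitrary n in Large-Scale Chinese Corpora and a Knowledge Acquisition Method [J]. 情报学报 (Journal of the China Society for Scientific and Technical Information), 1997(1).
Authors: Zhang Min  Li Sheng  Zhao Tiejun
Affiliation: Department of Computer Science and Engineering, Harbin Institute of Technology
Abstract: This paper proposes and implements an algorithm for computing character-level and word-level n-gram statistics for arbitrary n over large-scale Chinese corpora. In a single pass, the algorithm counts all character-level and word-level n-grams up to any chosen maximum n (n = 256 in this paper). It reduces the exponential space cost of conventional n-gram counting to a linear one that is independent of the n-gram order being counted. Based on these n-gram statistics, the paper also computes the information entropy of Chinese and studies knowledge acquisition at the character and word levels. The algorithm and the results of this study have been applied in the machine translation system developed by the authors.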

Keywords: n-gram  statistics  information entropy  knowledge acquisition
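
The abstract does not say which data structure the authors use to reach linear space. As an illustration only, the sketch below uses a sorted array of suffix start positions, one well-known way to answer frequency queries for every n up to a chosen maximum from a single index whose size depends on the corpus length and not on n. The names `build_index`, `count`, and `grams_of_length` are hypothetical, and this should not be read as the paper's own algorithm.

```python
def build_index(text, max_n=256):
    """Sort all start positions by the (at most max_n long) unit sequence that follows.

    The result holds one integer per corpus position, so its size is linear
    in the corpus length whatever max_n is. Simplification: sorted() with a
    key materializes one slice per position while sorting; a
    memory-conscious version would compare the corpus in place.
    """
    return sorted(range(len(text)), key=lambda i: text[i:i + max_n])


def count(text, index, gram, max_n=256):
    """Frequency of `gram` = number of indexed suffixes that start with it."""
    n = len(gram)
    if n == 0 or n > max_n:
        raise ValueError("gram length must be between 1 and max_n")

    def bound(include_equal):
        # Binary search over the sorted suffixes, comparing n-unit prefixes.
        lo, hi = 0, len(index)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[index[mid]:index[mid] + n]
            if prefix < gram or (include_equal and prefix == gram):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return bound(True) - bound(False)


def grams_of_length(text, index, n):
    """Yield (gram, frequency) for every distinct n-gram of length n.

    Suffixes sharing the same n-unit prefix are adjacent in the sorted
    index, so a single linear scan with run-length counting is enough.
    """
    run, freq = None, 0
    for start in index:
        gram = text[start:start + n]
        if len(gram) < n:
            continue  # suffix too close to the end of the corpus
        if gram == run:
            freq += 1
        else:
            if run is not None:
                yield run, freq
            run, freq = gram, 1
    if run is not None:
        yield run, freq


corpus = "语料库中的语料"           # character-level toy corpus
idx = build_index(corpus)
print(count(corpus, idx, "语料"))   # frequency of one bigram
print(list(grams_of_length(corpus, idx, 2)))
```

Because the code relies only on slicing and comparison, a list of segmented words can stand in for the string to obtain word-level counts, provided the query gram is also a list.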

Algorithm of n-gram Statistics for Arbitrary n and Knowledge Acquisition Based on Statistics
Zhang Min, Li Sheng and Zhao Tiejun. Algorithm of n-gram Statistics for Arbitrary n and Knowledge Acquisition Based on Statistics [J]. Journal of the China Society for Scientific and Technical Information, 1997(1).
Authors: Zhang Min  Li Sheng and Zhao Tiejun
Abstract: This paper proposes and implements a new algorithm for n-gram statistics for arbitrary n at the character or word level, with which the n-grams for all n at the character or word level can be counted at the same time. Based on these n-gram statistics, the information entropy of Chinese and knowledge acquisition at the character or word level have also been studied. The algorithm and its results have been integrated into an MT system.
Keywords: n-gram  statistics  information entropy  knowledge acquisition
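
The abstract mentions computing information entropy from the n-gram statistics but gives no formula. As a hedged illustration, the sketch below estimates the zeroth-order Shannon entropy H = -Σ p(x) log2 p(x) from relative frequencies of units (characters or words). The function name `entropy_bits` is hypothetical, and the paper's actual estimator, which may condition on longer contexts, is not described here.

```python
import math
from collections import Counter


def entropy_bits(units):
    """Shannon entropy in bits per unit, estimated from relative frequencies.

    `units` is any iterable of hashable items: the characters of a string
    for a character-level estimate, or a list of words for a word-level one.
    """
    counts = Counter(units)
    total = sum(counts.values())
    # Sum p * log2(p) over the distinct units, then negate.
    return -sum(c / total * math.log2(c / total) for c in counts.values())


print(entropy_bits("aabb"))  # two equally likely symbols
```

The same function could be applied to tuples of consecutive units to estimate the entropy of higher-order n-grams, at the cost of needing a much larger corpus for reliable frequencies.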
This article is indexed in CNKI and other databases.