Researches of Automatic Part-of-speech Tagging for Pre-Qin Literature Based on Multi-feature Knowledge

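Cite this article:	Wang Dongbo, Huang Shuiqing, He Lin. Researches of Automatic Part-of-speech Tagging for Pre-Qin Literature Based on Multi-feature Knowledge[J]. Library and Information Service, 2017, 61(12): 64-70.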

Authors:	Wang Dongbo, Huang Shuiqing, He Lin

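Affiliations:	1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095; 2. Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095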

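Funding:	This article is one of the research outcomes of the following projects: the Major Project of the National Social Science Fund of China "Construction of a Classics Knowledge Base and Humanities Computing Research Based on the Harvard-Yenching Institute Sinological Index Series" (Project No. 15ZDB127); the Humanities and Social Sciences Fund of Nanjing Agricultural University (Project No. SKPT2016001); and the Youth Project of the National Social Science Fund of China "Research on the Harvard-Yenching Institute Sinological Index Series" (Project No. 12CTQ019).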

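Abstract:	[Purpose/significance] Pre-Qin classics hold an extremely important position among ancient Chinese classical texts. This paper proposes a method for automatic part-of-speech (POS) tagging of Pre-Qin classics, so that the latent knowledge in these texts can be mined more accurately.
[Method/process] A conditional random fields (CRF) model is used together with a combined feature template whose combinations are determined by statistical methods. The result is an automatic POS tagging model built specifically for Pre-Qin classics.
[Result/conclusion] Building on the complete automatic word segmentation pipeline for Pre-Qin classics, the authors obtain POS tagging models under both a simple feature template and a combined feature template. The model based on the combined feature template reaches an F-measure (the harmonic mean of precision and recall) of 94.79%, which suggests good potential for wider adoption and application. While the model was being built, adding feature knowledge about character and word structure, word pinyin, and character and word length effectively improved its precision and recall.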
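Keywords:	part-of-speech tagging; Pre-Qin classics; conditional random fields model; feature template; ancient Chinese information processing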
Received:	2017-02-13
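The abstract describes the method only at a high level. It uses a CRF tagger with two kinds of per-word features: simple templates, and combined templates that conjoin several features (character and word structure, pinyin, length). Results are evaluated by precision, recall and F-measure.

The Python sketch below illustrates those two ideas under stated assumptions. The following are all hypothetical and are not the authors' actual feature set:
- the column names;
- the `MH`/`S` structure labels;
- the numbered-pinyin format;
- the `<BOS>`/`<EOS>` padding;
- the particular template combinations.

The (offset, column) pairs play the role of the `%x[row,col]` macros in CRF++ template files, although the abstract does not say which CRF toolkit was used. Evaluation is assumed to be span-level, a common choice when tagging runs on automatically segmented text.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical token representation: each segmented word carries several
# feature columns. Column names and value formats are illustrative only.
Token = Dict[str, str]

# A template is a list of (relative offset, column) pairs. One pair gives an
# atomic feature; several pairs conjoin features into a combined feature.
Template = List[Tuple[int, str]]

SIMPLE_TEMPLATES: List[Template] = [
    [(-1, "word")], [(0, "word")], [(1, "word")],
    [(0, "length")], [(0, "pinyin")], [(0, "structure")],
]

# Combined templates = simple templates plus some conjunctions. These are
# chosen arbitrarily here; the paper selects its combinations statistically.
COMBINED_TEMPLATES: List[Template] = SIMPLE_TEMPLATES + [
    [(-1, "word"), (0, "word")],
    [(0, "word"), (1, "word")],
    [(0, "length"), (0, "structure")],
]


def column(tokens: Sequence[Token], i: int, col: str) -> str:
    """Return column `col` of token i, padding positions outside the sentence."""
    if i < 0:
        return "<BOS>"
    if i >= len(tokens):
        return "<EOS>"
    return tokens[i][col]


def expand(tokens: Sequence[Token], i: int, templates: List[Template]) -> List[str]:
    """Instantiate every template at position i as one string-valued feature."""
    feats = []
    for t_id, template in enumerate(templates):
        parts = [f"{col}[{off}]={column(tokens, i + off, col)}" for off, col in template]
        feats.append(f"T{t_id}:" + "|".join(parts))
    return feats


Span = Tuple[int, int, str]  # (start char, end char, POS tag)


def prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Span-level precision, recall and F1: a unit counts as correct only if
    both its boundaries and its tag match the gold standard."""
    correct = len(set(gold) & set(pred))
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


# Hypothetical example: 君子務本 segmented as 君子/務/本.
EXAMPLE_TOKENS: List[Token] = [
    {"word": "君子", "length": "2", "pinyin": "jun1zi3", "structure": "MH"},
    {"word": "務", "length": "1", "pinyin": "wu4", "structure": "S"},
    {"word": "本", "length": "1", "pinyin": "ben3", "structure": "S"},
]
GOLD: List[Span] = [(0, 2, "n"), (2, 3, "v"), (3, 4, "n")]
PRED: List[Span] = [(0, 2, "n"), (2, 3, "n"), (3, 4, "n")]  # 務 mis-tagged

if __name__ == "__main__":
    for feat in expand(EXAMPLE_TOKENS, 0, COMBINED_TEMPLATES):
        print(feat)
    print(prf(GOLD, PRED))
```

In practice, the feature strings would be passed to a CRF training toolkit as one list per word. The 94.79% F-measure reported in the abstract comes from the authors' experiments and is not reproduced by this toy example.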
