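Research on Chinese Error Correction Method Based on Search Engine Log
Cite this article: YANG Su-wen, ZHANG Xiao-ru. Research on Chinese Error Correction Method Based on Search Engine Log[J]. Introduction of Educational Technology, 2020, 19(6): 182-187.
Authors: YANG Su-wen, ZHANG Xiao-ru
Affiliation: College of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212003
Abstract: When users query a search engine by keyword, a wrong keyword caused by the input method or by careless typing can produce results that do not match what they were looking for. To address this problem, a Chinese error correction method based on search engine logs is proposed. User web logs are first studied and preprocessed, and common user errors are divided into two categories. The first category is errors caused by pinyin; for these, a Chinese fuzzy matching algorithm based on a pinyin index is adopted and improved for error correction. The second category is errors caused by extra characters, missing characters, transposed characters, and wrongly written characters; for these, a fuzzy matching method combined with the minimum edit distance is designed for error correction. Experiments verify the effectiveness of the method, which can improve the user experience to some extent and meet practical engineering needs.

Keywords: search engine log; Chinese error correction; fuzzy matching; minimum edit distance
Received: 2019-10-18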
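The pinyin-index step summarized in the abstract can be illustrated with a small sketch. It is not the authors' improved algorithm, whose details are not given here. Everything below is an assumption made for illustration: the toy character-to-pinyin table PINYIN, the fuzzy-pinyin rules (zh/z, ch/c, sh/s, ang/an, eng/en, ing/in), the sample log QUERY_LOG, and all function names are hypothetical. Logged queries are indexed by a toneless, fuzzy-normalized pinyin key, and an input whose key matches is replaced by the most frequent logged query with that key.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toneless pinyin table covering only the demo characters;
# a real system would load a complete pinyin dictionary.
PINYIN = {
    "江": "jiang", "苏": "su", "科": "ke", "技": "ji", "记": "ji",
    "大": "da", "学": "xue", "数": "shu", "据": "ju",
}

# Assumed fuzzy-pinyin rules: commonly confused initials and finals
# are collapsed onto one canonical spelling.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s")]
FUZZY_FINALS = [("ang", "an"), ("eng", "en"), ("ing", "in")]


def normalize(syllable: str) -> str:
    """Map a toneless syllable to its fuzzy-pinyin form."""
    for src, dst in FUZZY_INITIALS:
        if syllable.startswith(src):
            syllable = dst + syllable[len(src):]
            break
    for src, dst in FUZZY_FINALS:
        if syllable.endswith(src):
            syllable = syllable[: -len(src)] + dst
            break
    return syllable


def pinyin_key(text: str) -> tuple[str, ...] | None:
    """Return the fuzzy pinyin key of text, or None if a character is unknown."""
    syllables = []
    for ch in text:
        py = PINYIN.get(ch)
        if py is None:
            return None
        syllables.append(normalize(py))
    return tuple(syllables)


def build_index(query_log: dict[str, int]) -> dict[tuple[str, ...], list[tuple[str, int]]]:
    """Group logged queries, with their frequencies, by fuzzy pinyin key."""
    index = defaultdict(list)
    for query, freq in query_log.items():
        key = pinyin_key(query)
        if key is not None:
            index[key].append((query, freq))
    return dict(index)


def correct_by_pinyin(text: str, index: dict[tuple[str, ...], list[tuple[str, int]]]) -> str:
    """Replace text with the most frequent logged query sharing its pinyin key."""
    key = pinyin_key(text)
    if key is None or key not in index:
        return text  # no pinyin evidence: leave the query unchanged
    return max(index[key], key=lambda item: item[1])[0]


# Hypothetical query log: query -> number of occurrences.
QUERY_LOG = {"江苏科技大学": 120, "大数据": 80}
INDEX = build_index(QUERY_LOG)

if __name__ == "__main__":
    print(correct_by_pinyin("江苏科记大学", INDEX))  # homophone typo -> 江苏科技大学
    print(correct_by_pinyin("大苏据", INDEX))        # sh/s confusion -> 大数据
```

The homophone typo 江苏科记大学 has the same key as the logged query 江苏科技大学, while 大苏据 matches 大数据 only because the sh to s rule merges shu and su. A production system would also need a complete pinyin dictionary, handling of polyphonic characters, and segmentation of raw pinyin input, none of which this sketch attempts.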

Research on Chinese Error Correction Method Based on Search Engine Log
YANG Su-wen, ZHANG Xiao-ru. Research on Chinese Error Correction Method Based on Search Engine Log[J]. Introduction of Educational Technology, 2020, 19(6): 182-187.
Authors: YANG Su-wen, ZHANG Xiao-ru
Institution: College of Computer, Jiangsu University of Science and Technology, Zhenjiang 212003, China
Abstract: When users enter keywords into a search engine, the results may fail to meet their expectations because of the input method or carelessness. This paper proposes a Chinese error correction method based on search engine logs. It first studies users' web logs, preprocesses the data, and classifies common user errors into two categories. The first category consists of errors caused by pinyin; for these, the paper adopts and improves a Chinese fuzzy matching algorithm based on a pinyin index. The second category consists of errors caused by extra characters, missing characters, transposed characters, and wrongly written characters; for these, a fuzzy matching method combined with minimum edit distance is designed. Experimental results demonstrate the effectiveness of the proposed method, showing that it can improve the user experience to a certain extent and meet the needs of practical engineering.
Keywords: search engine log; Chinese error correction; fuzzy matching; minimum edit distance
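For the second error class (extra, missing, transposed, and wrongly written characters), one way to combine fuzzy matching with minimum edit distance is sketched below. The concrete choices are assumptions, not details from the paper: candidates are retrieved through a hypothetical character-level inverted index, distances use the optimal string alignment variant of edit distance so that swapping two adjacent characters costs 1 rather than 2, the cut-off max_dist=2 is arbitrary, and ties are broken by frequency in the hypothetical sample log QUERY_LOG.

```python
from __future__ import annotations

from collections import defaultdict


def osa_distance(a: str, b: str) -> int:
    """Edit distance in which insertion, deletion, substitution and swapping
    two adjacent characters each cost 1 (optimal string alignment)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # drop an extra character from a
                d[i][j - 1] + 1,         # add a character missing from a
                d[i - 1][j - 1] + cost,  # keep or replace a wrong character
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]


def build_char_index(queries) -> dict[str, set[str]]:
    """Hypothetical inverted index: character -> logged queries containing it."""
    index = defaultdict(set)
    for query in queries:
        for ch in set(query):
            index[ch].add(query)
    return dict(index)


def correct_by_edit_distance(text: str, query_log: dict[str, int],
                             char_index: dict[str, set[str]], max_dist: int = 2) -> str:
    """Return the closest logged query within max_dist edits, else text itself."""
    if text in query_log:
        return text  # already a known query
    # Fuzzy matching: only logged queries sharing at least one character are scored.
    candidates = set()
    for ch in set(text):
        candidates |= char_index.get(ch, set())
    best_rank, best = None, text
    for cand in candidates:
        dist = osa_distance(text, cand)
        if dist > max_dist:
            continue
        # Prefer fewer edits, then higher log frequency, then a stable order.
        rank = (dist, -query_log[cand], cand)
        if best_rank is None or rank < best_rank:
            best_rank, best = rank, cand
    return best


# Hypothetical query log: query -> number of occurrences.
QUERY_LOG = {"江苏科技大学": 120, "搜索引擎": 90, "教育技术导刊": 60}
CHAR_INDEX = build_char_index(QUERY_LOG)

if __name__ == "__main__":
    for typo in ["江苏科技大大学", "江苏科大学", "搜引索擎", "教育技木导刊"]:
        print(typo, "->", correct_by_edit_distance(typo, QUERY_LOG, CHAR_INDEX))
```

Restricting scoring to logged queries that share at least one character keeps the number of quadratic-time distance computations small. With plain Levenshtein distance, 搜引索擎 would be 2 edits away from 搜索引擎 instead of 1, so transposed characters would be penalized twice.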
The original abstract and a free full-text PDF are available from Introduction of Educational Technology.