
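False Health Information Recognition Based on Deep Learning
Cite this article: YU Zhang-xian, MAO Yu-qing, HU Kong-fa. False Health Information Recognition Based on Deep Learning[J]. Introduction of Educational Technology, 2020, 19(3): 16-20.
Authors: YU Zhang-xian  MAO Yu-qing  HU Kong-fa
Affiliation: School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, Jiangsu, China
Funding: National Natural Science Foundation of China (81674099, 81804219); National Key Research and Development Program of China (2017YFC1703500, 2017YFC1703501, 2017YFC1703503, 2017YFC1703506); Natural Science Foundation of Jiangsu Province (BK20180822); Jiangsu Province "Six Talent Peaks" High-Level Talent Project (2016-XYDXXJS-047)
Abstract: With the rapid development of the Internet, online health information is growing geometrically, and the large amount of false health information it contains seriously affects people's lives. Research on identifying false health information in text is nevertheless scarce; previous work has focused mainly on microblog rumors, fake product reviews, spam email, and fake news. This paper therefore uses deep neural network models based on word vectors and a language representation model based on bidirectional encoding to automatically classify health-related texts that circulate widely on the Internet and to identify the false health information among them. In the experiments, the deep network models outperform traditional machine learning models by 10%, and the deep neural network model fused with Word2vec improves classification performance by nearly 7% over the CNN or Att-BiLSTM model alone. The BERT model performs best, reaching an accuracy of 88.1%. The results show that deep learning can effectively identify false health information, and that a language representation model pre-trained on a large-scale corpus outperforms deep neural network models based on word vectors.

Keywords: health information  word vector  deep neural network model  language representation model  pre-trained model
Received: 2019-12-11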

False Health Information Recognition Based on Deep Learning
YU Zhang-xian, MAO Yu-qing, HU Kong-fa. False Health Information Recognition Based on Deep Learning[J]. Introduction of Educational Technology, 2020, 19(3): 16-20.
Authors: YU Zhang-xian  MAO Yu-qing  HU Kong-fa
Institution: School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
Abstract: With the rapid development of the Internet, online health information has been growing exponentially, and the large amount of fake health information it contains has a serious effect on people's daily lives. However, research on recognizing fake health information in text is scarce. Existing research focuses mainly on microblog rumors, fabricated product reviews, spam, and fake news. This paper uses deep neural network models based on word vectors and a language representation model based on a bidirectional encoder to classify health information automatically, so that fake health information can be recognized. In the experiments, the deep network models perform 10% better than traditional machine learning models. The deep neural network model integrated with Word2vec improves classification performance by nearly 7% compared with the CNN or Att-BiLSTM model alone. The BERT model performs best, with an accuracy of 88.1%. The experimental results show that deep learning can recognize fake health information effectively, and that a language representation model pre-trained on a large-scale corpus performs better than deep neural network models based on word vectors.
Keywords: health information  word vector  neural network model  language representation model  pre-trained model