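Research of Automatic Extraction of Entities of Data Science Recruitment and Analysis Based on Deep Learning (基于深度学习的数据科学招聘实体自动抽取及分析研究)
Cite this article: Wang Dongbo, Hu Haotian, Zhou Xin, Zhu Danhao. Research of Automatic Extraction of Entities of Data Science Recruitment and Analysis Based on Deep Learning[J]. Library and Information Service (图书情报工作), 2018, 62(13): 64-73.
Authors: Wang Dongbo  Hu Haotian  Zhou Xin  Zhu Danhao
Affiliations: 1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095; 2. School of Information Management, Nanjing University, Nanjing 210093; 3. Department of Computer Science and Technology, Nanjing University, Nanjing 210093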
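Funding: This paper is one of the research outcomes of the National Social Science Fund of China major project "Research on the Disciplinary Construction of Information Science and the Future Development Path of Information Work" (Grant No. 17ZDA291) and of the Jiangsu Province Research and Innovation Program for Academic Degree Graduate Students in Ordinary Universities project "Citation Content Analysis: Automatic Mining of Citation Semantic Information" (KYZZ16_0033).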
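Abstract: [Purpose/Significance] Data science is rapidly taking shape as an emerging interdisciplinary field that brings together many domains. Extracting entity knowledge from data science recruitment announcements not only helps to understand the development of data science from a market perspective, but also helps to improve the content of data science teaching. [Method/Process] Based on job announcements from major recruitment websites, and drawing on the data collection, annotation and organization methods of information science, a data science recruitment corpus is constructed and the corresponding entities are extracted from it for analysis and study. [Result/Conclusion] On a collected corpus of 11,000 annotated job announcements, the Bi-LSTM-CRF, CRF and Bi-LSTM models are compared on the task of extracting data science recruitment entities. The final automatic extraction model is then determined, an automatic extraction platform for data science recruitment entities is designed, and a data science recruitment entity network is constructed.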

Keywords: data science  conditional random field  deep learning  Bi-LSTM-CRF
Received: 2017-12-02

Research of Automatic Extraction of Entities of Data Science Recruitment and Analysis Based on Deep Learning
Wang Dongbo,Hu Haotian,Zhou Xin,Zhu Danhao.Research of Automatic Extraction of Entities of Data Science Recruitment and Analysis Based on Deep Learning[J].Library and Information Service,2018,62(13):64-73.
Authors:Wang Dongbo  Hu Haotian  Zhou Xin  Zhu Danhao
Institution:1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095; 2. School of Information Management, Nanjing University, Nanjing 210093; 3. Department of Computer Science and Technology, Nanjing University, Nanjing 210093
Abstract:[Purpose/significance] Data science is emerging as a new interdisciplinary field that draws on many disciplines. Extracting entity knowledge from data science recruitment announcements not only helps to understand the development of data science from a market perspective, but also helps to improve the content of data science teaching. [Method/process] Based on job announcements from recruitment websites, and combining the data collection, annotation and organization methods of information science, a data science recruitment corpus was constructed and the corresponding entities were extracted from it for analysis. [Result/conclusion] On a corpus of 11,000 annotated recruitment announcements, this paper compared the performance of the Bi-LSTM-CRF, CRF and Bi-LSTM models on extracting data science recruitment entities, determined the final automatic extraction model, designed an automatic extraction platform for data science recruitment entities, and built a data science recruitment entity network.
Keywords:data science  conditional random field  deep learning  Bi-LSTM-CRF  
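
The abstract says that Bi-LSTM-CRF, CRF and Bi-LSTM are compared on entity extraction, but it does not state the tagging scheme or the evaluation metric. As a rough illustration only, the sketch below assumes BIO-tagged sequences and entity-level precision, recall and F1 with exact span matching, a common convention for this kind of comparison. The helper names (`bio_to_spans`, `entity_prf`) and the entity types `SKILL` and `TOOL` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); layout is an assumption
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    An "I-X" tag that does not continue an open "X" entity is ignored.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the currently open entity
        if label is not None:
            spans.add((label, start, i))  # close the open entity
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # exact type-and-boundary matches
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data with hypothetical entity types, not drawn from the paper's corpus.
gold_tags = [["B-SKILL", "I-SKILL", "O", "B-TOOL"]]
pred_tags = [["B-SKILL", "I-SKILL", "O", "O"]]
print(entity_prf(gold_tags, pred_tags))
```

Running the same function on each model's predictions over a shared test set would give directly comparable scores. Exact span matching is strict: a prediction with the right type but a wrong boundary counts as both a false positive and a false negative.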