An ensemble model for classifying idioms and literal texts using BERT and RoBERTa
Institution:1. School of Economics and Management, Harbin Engineering University, Harbin 150001, China;2. Management School, Harbin University of Commerce, Harbin 150028, China;3. Department of Computer Science and Information Engineering, Asia University, Taichung, 41354, Taiwan;4. Department of Computer Science and Engineering, Kyung Hee University, Republic of Korea;1. Business School, Hohai University, Nanjing 211100, China;2. Foreign Language School, Hohai University, Nanjing 211100, China;1. Institute of Finance Engineering in School of Management/School of Emergency Management, Jinan University, Guangzhou 510632, China;2. School of Emergency Industry, Guangzhou Pearl-River College of Vocational Technology, Huizhou 516131, China;3. Guangdong Emergency Technology Research Center of Risk Evaluation and Prewarning on Public Network Security, Guangzhou 510632, China
Abstract:An idiom is a common phrase whose meaning differs from the literal meaning of its words. Detecting idioms automatically is a serious challenge in natural language processing (NLP) applications such as information retrieval (IR), machine translation, and chatbots, and it plays an important role in all of them. Text classification, also known as text labeling or categorization, is a fundamental NLP task that assigns text to predefined categories. This paper treats idiom identification as a text classification task. Pre-trained deep learning models have been used for several text classification tasks; however, models like BERT and RoBERTa have not been used specifically for idiom and literal classification. We propose a predictive ensemble model that classifies idioms and literals using BERT and RoBERTa, fine-tuned on the TroFi dataset. The model is tested on a newly created in-house dataset of 1470 idiomatic and literal expressions annotated by domain experts. Our model outperforms the baseline models on the metrics considered, such as F-score and accuracy, with a 2% improvement in accuracy.
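The abstract does not say how the outputs of the two fine-tuned models are combined. One common option is soft voting, sketched below under that assumption: each model's logits are converted to class probabilities and the probabilities are averaged. The label order, the function names, the `weight_a` parameter, and the example logits are all hypothetical and stand in for the outputs of fine-tuned BERT and RoBERTa classifiers.

```python
import math

# Hypothetical label order; the paper's actual encoding is not given.
LABELS = ("literal", "idiom")


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(logits_a, logits_b, weight_a=0.5):
    """Average the class probabilities of two classifiers.

    weight_a is an assumed mixing weight for the first model;
    0.5 gives both models equal say.
    """
    probs_a = softmax(logits_a)
    probs_b = softmax(logits_b)
    avg = [weight_a * pa + (1.0 - weight_a) * pb
           for pa, pb in zip(probs_a, probs_b)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return LABELS[best], avg


# Made-up logits for one sentence, standing in for model outputs.
bert_logits = [0.2, 1.1]      # this model leans "idiom"
roberta_logits = [1.5, 0.3]   # this model leans "literal"

label, probs = soft_vote(bert_logits, roberta_logits)
print(label, [round(p, 3) for p in probs])
```

When the two models disagree, as in the made-up example, the averaged probabilities let the more confident model decide. Hard (majority) voting or a learned meta-classifier would be alternative ways to combine the models.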
This article is indexed in ScienceDirect and other databases.