Advanced learning algorithms for cross-language patent retrieval and classification
Authors: Yaoyong Li, John Shawe-Taylor
Institutions: 1. Department of Computer Science, The University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK; 2. Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
Abstract: We study several machine learning algorithms for cross-language patent retrieval and classification. Most other studies that apply machine learning to cross-language information retrieval have used learning techniques mainly for monolingual sub-tasks. In contrast, our learning algorithms exploit bilingual training documents and learn a semantic representation from them. We study Japanese–English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method for finding maximally correlated linear projections of two sets of variables in kernel-defined feature spaces. The results are encouraging and significantly better than those obtained by other state-of-the-art methods. We also investigate learning algorithms for cross-language document classification based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining KCCA and SVM and find that one combination, called SVM_2k, achieves better results than the other learning algorithms on both bilingual and monolingual test documents.
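To make the KCCA retrieval idea more concrete, here is a minimal NumPy sketch. It is illustrative only and is not the authors' implementation. Several choices are assumptions rather than details taken from the abstract:

- the linear kernel and the toy "English"/"Japanese" data;
- the regularization variant, which normalizes each dual direction so that α^T (K_x + κI)^2 α = 1, so the problem reduces to a single SVD;
- cosine matching in the shared space.

The helper names center_train_kernel, center_test_kernel, fit_kcca and cross_language_retrieve, and the parameter kappa, are all hypothetical.

```python
import numpy as np


def center_train_kernel(K):
    """Double-center an n x n training kernel matrix (centering in feature space)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def center_test_kernel(K_test, K_train):
    """Center an m x n test-vs-train kernel block using the raw training kernel's statistics."""
    col_mean = K_train.mean(axis=0)                 # mean_l k(x_l, x_j)
    row_mean = K_test.mean(axis=1, keepdims=True)   # mean_l k(t_i, x_l)
    return K_test - col_mean[None, :] - row_mean + K_train.mean()


def fit_kcca(Kx, Ky, kappa=0.1, n_components=5):
    """Regularized KCCA on centered kernels (assumed variant, solved by SVD).

    Maximizes alpha^T Kx Ky beta subject to alpha^T (Kx + kappa I)^2 alpha = 1 and
    beta^T (Ky + kappa I)^2 beta = 1. Substituting u = (Kx + kappa I) alpha and
    v = (Ky + kappa I) beta turns this into the SVD of C below.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Rx_inv = np.linalg.inv(Kx + kappa * I)
    Ry_inv = np.linalg.inv(Ky + kappa * I)
    C = (Rx_inv @ Kx) @ (Ky @ Ry_inv)
    U, s, Vt = np.linalg.svd(C)                     # singular values sorted descending
    A = Rx_inv @ U[:, :n_components]                # dual weights for view x
    B = Ry_inv @ Vt[:n_components].T                # dual weights for view y
    return A, B, s[:n_components]                   # s: regularized canonical correlations


def cross_language_retrieve(Kq, Kd, A, B):
    """Rank target-language documents for each source-language query by cosine similarity."""
    P = Kq @ A                                      # queries projected into the shared space
    Q = Kd @ B                                      # documents projected into the shared space
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    return np.argsort(-(P @ Q.T), axis=1)           # best match first in each row


if __name__ == "__main__":
    # Hypothetical toy data: paired document vectors in two "languages" sharing latent topics.
    rng = np.random.default_rng(0)
    n_train, n_test, n_topics = 200, 40, 10
    Z = rng.normal(size=(n_train + n_test, n_topics))
    X_en = Z @ rng.normal(size=(n_topics, 60)) + 0.5 * rng.normal(size=(n_train + n_test, 60))
    X_ja = Z @ rng.normal(size=(n_topics, 80)) + 0.5 * rng.normal(size=(n_train + n_test, 80))
    tr, te = slice(0, n_train), slice(n_train, None)

    Kx_raw, Ky_raw = X_en[tr] @ X_en[tr].T, X_ja[tr] @ X_ja[tr].T   # linear kernels
    A, B, corr = fit_kcca(center_train_kernel(Kx_raw), center_train_kernel(Ky_raw),
                          kappa=10.0, n_components=10)
    Kq = center_test_kernel(X_en[te] @ X_en[tr].T, Kx_raw)   # English queries vs training docs
    Kd = center_test_kernel(X_ja[te] @ X_ja[tr].T, Ky_raw)   # Japanese docs vs training docs
    ranking = cross_language_retrieve(Kq, Kd, A, B)
    print("top canonical correlations:", np.round(corr[:3], 3))
    print("precision@1:", np.mean(ranking[:, 0] == np.arange(n_test)))
```

The regularizer kappa keeps the problem from overfitting. Without it, full-rank kernels make every correlation trivially approach 1. In practice, kappa and the number of retained components would be tuned on held-out bilingual pairs.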
Keywords: Machine learning; Cross-language patent retrieval; Cross-language document classification
This article is indexed in ScienceDirect and other databases.