Building parallel corpora by automatic title alignment using length-based and text-based approaches
Authors: Christopher C Yang, Kar Wing Li
Institution: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Ho Sin Hang Engineering Building, Shatin, NT, Hong Kong
Abstract: Cross-lingual semantic interoperability has drawn significant attention in recent digital library and World Wide Web research as the volume of information in languages other than English has grown exponentially. Cross-lingual information retrieval (CLIR) across different European languages, such as English, Spanish, and French, has been widely explored; however, CLIR between European and Oriental languages is still at an initial stage. For crossing language boundaries, the corpus-based approach is a promising way to overcome the limitations of the knowledge-based and controlled-vocabulary approaches, but collecting parallel corpora between European and Oriental languages is not an easy task. Length-based and text-based approaches are the two major approaches to aligning parallel documents. In this paper, we investigate several techniques using these approaches and compare their performance in aligning the English and Chinese titles of parallel documents available on the Web.
Keywords: Cross-lingual information retrieval; Parallel corpus; Sentence alignment; Covert translation
This article is indexed in ScienceDirect and other databases.