Web, Corpus and the Building of Bilingual Parallel Corpora



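Cite this article:	Xiong Wenxin (熊文新). Web, Corpus and the Building of Bilingual Parallel Corpora[J]. Library and Information Service (图书情报工作), 2013, 57(10): 128-135.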

Author:	Xiong Wenxin (熊文新)

Affiliation:	National Research Center for Foreign Language Education, Beijing Foreign Studies University

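Funding:	Ministry of Education Humanities and Social Sciences Research Project "A Study of Idiosyncratic English Combinations Based on Corpora and Corresponding Word Lists"; National Social Science Fund of China project "Natural Language in the Service of Information Retrieval"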

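Abstract (translated from the Chinese):	This paper examines the relationships among the Web, corpora, and multilingual corpora. Given the wealth of electronic texts of all kinds on the Web, and working from a language-engineering perspective, it proposes a "step by step, domain by domain" approach to building a large-scale bilingual parallel corpus: select texts that are domain-specific, linguistically reliable, and consistently formatted; build a corpus for one specific domain at a time; and finally merge these into a comprehensive bilingual parallel corpus that is high in quality, large in scale, and covers all domains. The paper also works through an example of how to use Web resources to build a domain-specific bilingual parallel corpus.
Keywords:	Web; corpus; sublanguage; bilingual parallel corpus; language resources
Received:	2013-01-28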
Web, Corpus and the Building of Bilingual Parallel Corpora

Xiong Wenxin. Web, Corpus and the Building of Bilingual Parallel Corpora[J]. Library and Information Service, 2013, 57(10): 128-135.

Authors:	Xiong Wenxin

Institution:	National Research Center for Foreign Language Education, Beijing Foreign Studies University, Beijing 100089

Abstract:	The notion of the Web as corpus is understood in different ways. We explore the relations among the Web, corpora, and bilingual parallel corpora. Inspired by the rich electronic texts available on the World Wide Web and by the sublanguage strategy in language engineering, we propose building a large-scale bilingual parallel corpus by accumulating homogeneous documents domain by domain. The large volume of high-quality texts collected for a restricted domain at each step eventually adds up to a massive, general-purpose, balanced data warehouse. An example is elaborated to show how to construct a domain-specific bilingual parallel corpus from the Web.

Keywords:	Web; corpus; sublanguage; bilingual parallel corpora; language resource
This article is indexed in Wanfang Data and other databases.