Two-stage statistical language models for text database selection
Authors: Hui Yang, Minjie Zhang
Institution: (1) School of Information Technology and Computer Science, University of Wollongong, Wollongong, 2500, Australia
Abstract: As the number and diversity of distributed Web databases on the Internet increase exponentially, it is difficult for users to know which databases are appropriate to search. Given database language models that describe the content of each database, database selection services can help users locate the databases relevant to their information needs. In this paper, we propose a database selection approach based on statistical language modeling. The basic idea is that, for databases categorized into a topic hierarchy, individual language models are estimated at different search stages, and the databases are then ranked by their similarity to the query under the estimated language models. Two-stage smoothed language models are presented to reduce the inaccuracy caused by word sparseness. Experimental results demonstrate that this language modeling approach is competitive with current state-of-the-art database selection approaches.
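The abstract does not give the estimators, so the sketch below only illustrates the general idea under stated assumptions. It assumes the common two-stage smoothing form, p(w|D) = (1 - λ)·(c(w,D) + μ·p(w|C)) / (|D| + μ) + λ·p(w|U), i.e. Dirichlet-prior smoothing of a database's term counts with a collection model C, followed by linear interpolation with a background model U, and it ranks databases by query log-likelihood. How the paper derives C from the topic hierarchy, its choice of U, and the values of μ (`mu`) and λ (`lam`) are assumptions; all function names, `EPS`, and the toy data are hypothetical.

```python
import math
from collections import Counter

# Floor probability for words unseen everywhere, so log() never sees 0 (assumption).
EPS = 1e-9


def unigram_model(counts):
    """Maximum-likelihood unigram model p(w) = c(w) / total from a Counter."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def two_stage_prob(word, db_counts, db_len, coll_model, bg_model, mu, lam):
    """Two-stage smoothed p(word | database) (hypothetical helper).

    Stage 1: Dirichlet-prior smoothing of the database counts with coll_model.
    Stage 2: linear (Jelinek-Mercer) interpolation with bg_model.
    """
    p_coll = coll_model.get(word, EPS)
    dirichlet = (db_counts.get(word, 0) + mu * p_coll) / (db_len + mu)
    return (1.0 - lam) * dirichlet + lam * bg_model.get(word, EPS)


def score_database(query_terms, db_counts, coll_model, bg_model, mu, lam):
    """Query log-likelihood log p(q | database) under the smoothed model."""
    db_len = sum(db_counts.values())
    return sum(
        math.log(two_stage_prob(w, db_counts, db_len, coll_model, bg_model, mu, lam))
        for w in query_terms
    )


def rank_databases(query, databases, coll_model, bg_model, mu=1000.0, lam=0.1):
    """Return database names sorted by decreasing query likelihood."""
    terms = query.lower().split()
    scores = {
        name: score_database(terms, counts, coll_model, bg_model, mu, lam)
        for name, counts in databases.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy term counts standing in for database language models (hypothetical data).
    databases = {
        "medline": Counter("gene protein cell gene disease".split()),
        "acm": Counter("query database index retrieval query".split()),
    }
    pooled = Counter()
    for counts in databases.values():
        pooled.update(counts)
    coll_model = unigram_model(pooled)  # e.g. a parent topic node's model (assumption)
    bg_model = coll_model               # background model choice is an assumption
    print(rank_databases("database query", databases, coll_model, bg_model))
```

In this toy run the database whose terms overlap the query ("acm") is ranked first. The first stage mainly handles sparse, unseen terms within a database, while the second stage blends in a broader background model; in a topic hierarchy, one plausible (unconfirmed) choice is to take C from the database's parent topic node.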
Keywords: Database language model; Text database selection; Distributed information retrieval; Hierarchical topics; Statistical language modeling; Query expansion
This article is indexed in SpringerLink and other databases.