Syntactic complexity of Web search queries through the lenses of language models, networks and users
Institution: 1. Databases and Information Systems Group, Max Planck Institute for Informatics, Saarbrücken, Germany; 2. Experian PLC, Cyberjaya, Malaysia; 3. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India; 4. Multilingual Systems Research Group, Microsoft Research India, Bangalore, India
Abstract: Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we attempt to understand and characterize the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify the perplexity of queries and compare it with that of natural language (NL). We then use complex network analysis to compare the topological properties of queries issued by real Web users with those of queries generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries when these are presented alongside model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than that of NL. Queries thus seem to represent an intermediate stage between syntactic and non-syntactic communication.
This article is indexed in ScienceDirect and other databases.