首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Transaction logs of NAVER, a major Korean Web search engine, were analyzed to track the information-seeking behavior of Korean Web users. These transaction logs include more than 40 million queries collected over 1 week. This study examines current transaction log analysis methodologies and proposes a method for log cleaning, session definition, and query classification. A term definition method which is necessary for Korean transaction log analysis is also discussed. The results of this study show that users behave in a simple way: they type in short queries with a few query terms, seldom use advanced features, and view few results' pages. Users also behave in a passive way: they seldom change search environments set by the system. It is of interest that users tend to change their queries totally rather than adding or deleting terms to modify the previous queries. The results of this study might contribute to the development of more efficient and effective Web search engines and services.  相似文献   

2.
The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs. In this paper, we propose to use a richer data structure, which is able to preserve most of the information available in the Web logs. This data structure consists of three groups of entities: users, documents and queries, which are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by the users’ in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks like discovering semantically relevant query suggestions and Web page categorization by topic.  相似文献   

3.
4.
It is known that users of internet search engines often enter queries with misspellings in one or more search terms. Several web search engines make suggestions for correcting misspelled words, but the methods used are proprietary and unpublished to our knowledge. Here we describe the methodology we have developed to perform spelling correction for the PubMed search engine. Our approach is based on the noisy channel model for spelling correction and makes use of statistics harvested from user logs to estimate the probabilities of different types of edits that lead to misspellings. The unique problems encountered in correcting search engine queries are discussed and our solutions are outlined.  相似文献   

5.
中国互联网用户网络使用行为分析   总被引:1,自引:0,他引:1  
我国目前有互联网用户6800万人,其网络使用行为大体可分为通信、信息浏览、下载信息和购物4大类。影响用户使用行为的因素主要有:用户特征、网络易用性及其娱乐性、实用性等。这些影响因素与用户网络行为间有着必然的联系。图1。表2。参考文献13。  相似文献   

6.
The Internet continues to grow as an information and entertainment medium. Internet growth has implications for the news industry. Twenty-four hour news networks such as CNN and MSNBC regularly encourage viewers of their television programs to visit their Web sites. While visiting news Web sites, visitors are invited to participate in opinion polls. Unfortunately, these online opinion polls are not scientific and have little real news value. In spite of these limitations, news Web sites' Internet polls are often treated as serious topics in broadcast news discussions. This article examines media organizations' Internet online polls and critiques them as instances of symbolic representation and pseudo-events that have arisen largely out of the integration of print, broadcast, and Internet media.  相似文献   

7.
User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper is aimed at mining the subtopics of a query either indirectly from the returned results of retrieval systems or directly from the query itself to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on the TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on the TREC10. The results conclude that the subtopic mining technique with the up-to-date users’ search query logs is the most effective way to generate the subtopics of a query, and the proposed subtopic-based diversification algorithm can select the documents covering various subtopics.  相似文献   

8.
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.  相似文献   

9.
This study examined ten, selected word pairs, each containing a word's full spelling and its abbreviation, to determine which form search engine users preferred in searching. Using seven search logs gathered from several Internet search engines with approximately 608 MB of data, the study measured the occurrences of the twenty terms. The selected words are important in library cataloging, for some are prescribed abbreviations in metadata content standards. The study found that in eight of the ten word pairs users preferred to search words’ full spellings over the abbreviations, often by a high margin.  相似文献   

10.
The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.  相似文献   

11.
12.
13.
Word form normalization through lemmatization or stemming is a standard procedure in information retrieval because morphological variation needs to be accounted for and several languages are morphologically non-trivial. Lemmatization is effective but often requires expensive resources. Stemming is also effective in most contexts, generally almost as good as lemmatization and typically much less expensive; besides it also has a query expansion effect. However, in both approaches the idea is to turn many inflectional word forms to a single lemma or stem both in the database index and in queries. This means extra effort in creating database indexes. In this paper we take an opposite approach: we leave the database index un-normalized and enrich the queries to cover for surface form variation of keywords. A potential penalty of the approach would be long queries and slow processing. However, we show that it only matters to cover a negligible number of possible surface forms even in morphologically complex languages to arrive at a performance that is almost as good as that delivered by stemming or lemmatization. Moreover, we show that, at least for typical test collections, it only matters to cover nouns and adjectives in queries. Furthermore, we show that our findings are particularly good for short queries that resemble normal searches of web users. Our approach is called FCG (for Frequent Case (form) Generation). It can be relatively easily implemented for Latin/Greek/Cyrillic alphabet languages by examining their (typically very skewed) nominal form statistics in a small text sample and by creating surface form generators for the 3–9 most frequent forms. We demonstrate the potential of our FCG approach for several languages of varying morphological complexity: Swedish, German, Russian, and Finnish in well-known test collections. Applications include in particular Web IR in languages poor in morphological resources.  相似文献   

14.
从Sogou查询日志中选取样本查询且进行人工标注,通过对标注后新闻查询的分析,提出能用于识别新闻意图的新特征,即查询表达式特征、查询随时间分布特征以及点击结果特征。根据这3个特征,利用决策树分类器实现查询中新闻意图的自动识别,结果发现:①新闻类查询的查询目标主要集中在特定主题信息以及娱乐类信息方面,其查询主题大多为娱乐、政治、体育与经济类信息;②相对非新闻查询,新闻查询具有更可能包含实体、随时间分布波动较大、点击结果之间相似度更高的特点;③本方法对查询中新闻意图的识别效果较好,其宏平均准确率、召回率、F值分别为 0.76、0.73、0、74。  相似文献   

15.
王若佳  李培 《图书情报工作》2015,59(11):111-118
[目的/意义] 针对当前我国网络用户的健康信息检索行为, 探索利用中文搜索引擎的健康信息检索规律, 为完善健康搜索引擎和网站建设提供参考。[方法/过程] 基于搜狗搜索引擎的大规模查询日志, 采用日志挖掘的方法, 从查询行为和点击行为两个角度对网络用户的健康信息检索行为进行研究。查询行为的研究指标包括会话层(会话长度、用户重复查询), 查询串层(查询串长度、重复查询)和词项层(高频词汇, 主题分类);点击行为的研究指标为点击位置和点击内容。[结果/结论] 健康相关查询的重复率较高, 提示相关网站可缓存高重复率查询串的返回结果;大众关注的热点领域为疾病、保健、母婴、医疗机构与美容整形, 提示网站的导航设计注意导航方向;用户更偏爱使用问答型平台, 提示网站设计者应更加关注与用户间问答型的互动模式。  相似文献   

16.
Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher’s current state.  相似文献   

17.
多会话网络购物商品信息搜寻行为研究   总被引:1,自引:0,他引:1  
[目的/意义] 研究用户在多会话网购过程中的信息浏览、检索行为及其行为序列特征,以期更好地理解用户的复杂网购行为,指导购物网站提高服务质量,改善用户体验。[方法/过程] 基于某电商网站1 993名用户的11 514个购物任务的网购访问日志,在识别多会话网购任务的基础上,对用户在经多个会话进行网购过程中的信息搜寻行为进行统计分析,并利用顺序分析和聚类分析方法挖掘其典型的行为模式。[结果/结论] 当会话数量为8个及以下时,用户的浏览和搜索行为呈现出明显的规律性变化,且前4个会话发生时是用户做出购物决策的关键阶段;用户在多会话网购过程中存在6种典型的信息搜寻行为模式,分别有不同的信息搜寻行为特征。理解用户的复杂网购行为,可为电商网站设计导航和推荐策略、制定营销方案提供依据。  相似文献   

18.
Users often issue all kinds of queries to look for the same target due to the intrinsic ambiguity and flexibility of natural languages. Some previous work clusters queries based on co-clicks; however, the intents of queries in one cluster are not that similar but roughly related. It is desirable to conduct automatic mining of queries with equivalent intents from a large scale search logs. In this paper, we take account of similarities between query strings. There are two issues associated with such similarities: it is too costly to compare any pair of queries in large scale search logs, and two queries with a similar formulation, such as “SVN” (Apache Subversion) and support vector machine (SVM), are not necessarily similar in their intents. To address these issues, we propose using the similarities of query strings above the co-click based clustering results. Our method improves precision over the co-click based clustering method (lifting precision from 0.37 to 0.62), and outperforms a commercial search engine’s query alteration (lifting \(F_1\) measure from 0.42 to 0.56). As an application, we consider web document retrieval. We aggregate similar queries’ click-throughs with the query’s click-throughs and evaluate them on a large scale dataset. Experimental results indicate that our proposed method significantly outperforms the baseline method of using a query’s own click-throughs in all metrics.  相似文献   

19.
文章介绍了四川师范大学图书馆外文期刊订购原则,分析了2003-2006年外文期刊订购情况,提出了2007年外文期刊订购的新策略。  相似文献   

20.
ABSTRACT

Although there is a proliferation of information available on the Web, and law professors, students, and other users have a variety of channels to locate information and complete their research activities, the law library catalog still remains an important source for offering users access to information that has been evaluated and cataloged by experts. The usability of the catalog needs to be effectively measured before any necessary improvements can be made. This study was undertaken to investigate the information retrieval patterns of users of the Rutgers Law Library Online Public Access Catalog and to develop the catalog into a more effective search tool for these users. This study used an experimental approach to measure the usability of our catalog by analyzing the transaction logs from the OPAC system and the results from Google Analytics. The findings provided not only important information on user demographics and their computer systems, but also more insight on the search behaviors of users. The specific findings included the following:
  1. As a Web-analytic tool Google Analytics provided extensive information on the OPAC and the navigational behaviors of users.

  2. Fifty-eight percent of our users visited the Web site regularly.

  3. The most popular search method, which was employed by 37% of our users, was by title.

  4. Most patrons used computer systems with a high resolution and color depth monitor and visited the catalog Web site with a high-speed Internet connection.

  5. Suggestions were made by the authors to improve the users’ search experience of the catalog Web site.

This study is significant to libraries with Web catalogs because it demonstrates the potential value of using Google Analytics as a Web analytics tool in combination with the OPAC transaction logs to measure catalog usability.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号