1.

Click data as implicit relevance feedback in web search

Seikyung Jung Jonathan L. Herlocker Janet Webster 《Information processing & management》2007

Search sessions consist of a person presenting a query to a search engine, followed by that person examining the search results, selecting some of those search results for further review, possibly following some series of hyperlinks, and perhaps backtracking to previously viewed pages in the session. The series of pages selected for viewing in a search session, sometimes called the click data, is intuitively a source of relevance feedback information to the search engine. We are interested in how that relevance feedback can be used to improve the quality of search results for all users, not just the current user. For example, the search engine could learn which documents are frequently visited when certain search queries are given.
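One simple way to turn click data into feedback shared across users, assuming a log of (query, clicked document) pairs, is to count clicks per query and promote frequently visited documents for later users who issue the same query. This is a hypothetical sketch, not the authors' method: the function names and the case-folding normalization are assumptions, and it ignores the position bias that raw clicks are known to carry.

```python
from collections import Counter, defaultdict


def aggregate_clicks(click_log):
    """Count how often each document was clicked for each (normalized) query.

    click_log: iterable of (query, doc_id) pairs, one pair per click.
    """
    counts = defaultdict(Counter)
    for query, doc_id in click_log:
        counts[query.strip().lower()][doc_id] += 1
    return counts


def rerank(query, results, counts):
    """Re-sort the engine's result list so frequently clicked documents come first."""
    clicks = counts.get(query.strip().lower(), Counter())
    # sorted() is stable, so documents with equal click counts keep their original order
    return sorted(results, key=lambda doc: -clicks[doc])


log = [("library hours", "d3"), ("library hours", "d3"),
       ("Library Hours", "d1"), ("campus map", "d7")]
counts = aggregate_clicks(log)
print(rerank("library hours", ["d1", "d2", "d3"], counts))  # ['d3', 'd1', 'd2']
```

For a query with no click history the original ranking is returned unchanged, so the feedback only ever reorders what the engine already retrieved.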

2.

Automatic new topic identification using multiple linear regression

Seda Ozmutlu 《Information processing & management》2006

The purpose of this study is to provide automatic new topic identification of search engine query logs, and to estimate the effect of statistical characteristics of search engine queries on new topic identification. By applying multiple linear regression and multi-factor ANOVA to a sample data log from the Excite search engine, we demonstrated that the statistical characteristics of Web search queries, such as time interval, search pattern, and position of a query in a user session, are effective predictors of a shift to a new topic. Multiple linear regression is also a successful tool for estimating topic shifts and continuations. The findings of this study provide statistical evidence of a relationship between the non-semantic characteristics of Web search queries and the occurrence of topic shifts and continuations.

3.

Children's query types and reformulations in Google search

Dania Bilal Jacek Gwizdka 《Information processing & management》2018,54(6):1022-1041

We investigated the searching behaviors of twenty-four children in grades 6, 7, and 8 (ages 11–13) in finding information on three types of search tasks in Google. Children conducted 72 search sessions and issued 150 queries. Children's phrase- and question-like queries combined were much more prevalent than keyword queries (70% vs. 30%, respectively). Fifty-two percent of the queries were reformulations (33 sessions). We classified children's query reformulation types into five classes based on the taxonomy by Liu et al. (2010). We found that most query reformulations were by Substitution and Specialization, and that children hardly repeated queries. We categorized children's queries by task facets and examined the way they expressed these facets in their query formulations and reformulations. The oldest children most frequently targeted the general topic of the search tasks in their queries, whereas younger children more often expressed one of the two facets. We assessed children's achieved task outcomes using the search task outcomes measure we developed. Children were generally more successful on the fact-finding and fully self-generated tasks, and only partially successful on the research-oriented task. Query type, reformulation type, achieved task outcomes, and expressing task facets varied by task type and grade level. There was no significant effect of query length in words or of the number of queries issued on search task outcomes. The study findings have implications for human intervention, digital literacy, search task literacy, as well as for system intervention to support children's query formulation and reformulation during interaction with Google.
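The study coded reformulations with Liu et al.'s taxonomy; an automatic approximation can compare the term sets of consecutive queries. The sketch below is a hypothetical heuristic, not the study's coding scheme: only Substitution and Specialization are named in the abstract, so the other labels (Generalization, Repeat, New) are assumptions about the remaining classes, and the whitespace tokenizer does no stemming.

```python
def reformulation_type(previous, current):
    """Label a query reformulation by comparing term sets (a crude heuristic)."""
    prev = set(previous.lower().split())
    curr = set(current.lower().split())
    if prev == curr:
        return "Repeat"          # same terms, possibly reordered
    if not prev & curr:
        return "New"             # no shared terms: treated as a new query
    if prev < curr:
        return "Specialization"  # terms only added
    if curr < prev:
        return "Generalization"  # terms only removed
    return "Substitution"        # some terms kept, some swapped


session = ["volcano", "volcano eruption", "volcano eruption 1980",
           "volcano eruption st helens", "volcano"]
for prev, curr in zip(session, session[1:]):
    print(f"{prev!r} -> {curr!r}: {reformulation_type(prev, curr)}")
```

Because there is no stemming, "volcano" and "volcanoes" count as different terms, which is one reason manual coding remains more reliable for a small sample like this one.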

4.

On the inclusiveness of systems for retrieval documents indexed by unweighted descriptors

Tadeusz Radecki 《Information processing & management》1981,17(5):227-237

5.

GTE-Rank: A time-aware search engine to answer time-sensitive queries

《Information processing & management》2016,52(2):273-298

In the web environment, most of the queries issued by users are temporally implicit by nature. Inferring the different temporal intents of this type of query enhances the overall temporal part of the web search results. Previous works tackling this problem usually focused on news queries, where retrieving the most recent results related to the query is usually sufficient to meet the user's information needs. However, few works have studied the importance of time in queries such as “Philip Seymour Hoffman”, where the results may require no recency at all. In this work, we focus on this type of query, which we call “time-sensitive queries”, where the results should preferably come from a diversified time span rather than necessarily the most recent one. Unlike related work, we follow a content-based approach to identify the most important time periods of the query and integrate time into a re-ranking model to boost the retrieval of documents whose contents match the query time period. For that purpose, we define a linear combination of topical and temporal scores, which reflects the relevance of any web document in both the topical and temporal dimensions, thus helping to improve the effectiveness of the ranked results across different types of queries. Our approach relies on a novel temporal similarity measure that is capable of determining the most important dates for a query while filtering out the non-relevant ones. Through extensive experimental evaluation over web corpora, we show that our model offers promising results compared to baseline approaches. As a result of our investigation, we publicly provide a set of web services and a web search interface so that the system can be graphically explored by the research community.
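The re-ranking step rests on a linear combination of a topical and a temporal score. A minimal sketch, assuming both scores are already computed and normalized to [0, 1], with a hypothetical weight `alpha` on the topical part, looks like this; the paper's temporal similarity measure itself is not reproduced.

```python
def combined_score(topical, temporal, alpha=0.5):
    """Linear combination of a topical and a temporal relevance score.

    alpha weights the topical score; (1 - alpha) weights the temporal score.
    Both scores are assumed to be normalized to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * topical + (1.0 - alpha) * temporal


def rerank(docs, alpha=0.5):
    """docs: list of (doc_id, topical_score, temporal_score) tuples."""
    ranked = sorted(docs, key=lambda d: combined_score(d[1], d[2], alpha), reverse=True)
    return [doc_id for doc_id, _, _ in ranked]


docs = [("a", 0.9, 0.1), ("b", 0.7, 0.8), ("c", 0.4, 0.9)]
print(rerank(docs, alpha=0.5))  # ['b', 'c', 'a']
print(rerank(docs, alpha=0.9))  # ['a', 'b', 'c']
```

Raising `alpha` toward 1 recovers a purely topical ranking; lowering it lets documents whose dates match the query's important time periods move up.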

6.

Efficient query-by-example spoken document retrieval combining phone multigram representation and dynamic time warping

Paula Lopez-Otero Javier Parapar Alvaro Barreiro 《Information processing & management》2019,56(1):43-60

7.

How doctors search: A study of query behaviour and the impact on search results

Marianne Lykke Susan Price Lois Delcambre 《Information processing & management》2012

Professional, workplace searching is different from general searching, because it is typically limited to specific facets and targeted to a single answer. We have developed the semantic component (SC) model, a search feature that allows searchers to structure the search and focus it on context-specific aspects of the main topic of the documents. We tested the model in an interactive searching study with family doctors, with the purpose of exploring doctors’ querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome. In general, the doctors were capable of exploiting system features and search tactics during the searching. Most searchers produced well-structured queries that contained appropriate search facets. When searches failed, it was not due to query structure or query length. Failures were mostly caused by the well-known vocabulary problem. The problem was exacerbated by using certain filters as Boolean filters. The best working queries were structured into 2–3 main facets out of 3–5 possible search facets, and expressed with terms reflecting the focal view of the search task. The findings both support and extend previous results about query structure and exhaustivity, showing the importance of selecting central search facets and expressing them from the perspective of the search task. The SC model was applied in all but one of the highest-performing queries. The findings suggest that the model might be a helpful feature for structuring queries into central, appropriate facets and for returning highly relevant documents.

8.

How are we searching the World Wide Web? A comparison of nine search engine transaction logs

Bernard J. Jansen Amanda Spink 《Information processing & management》2006

The Web, and especially major Web search engines, are essential tools in the quest to locate online information for many people. This paper reports results from research that examines characteristics and changes in Web searching from nine studies of five Web search engines based in the US and Europe. We compare interactions occurring between users and Web search engines from the perspectives of session length, query length, query complexity, and content viewed among the Web search engines. The results of our research show that (1) users are viewing fewer result pages, (2) searchers on US-based Web search engines use more query operators than searchers on European-based search engines, (3) there are statistically significant differences in the use of Boolean operators and result pages viewed, and (4) one cannot necessarily apply results from studies of one particular Web search engine to another Web search engine. The widespread use of Web search engines, employment of simple queries, and decreased viewing of result pages may have resulted from algorithmic enhancements by Web search engine companies. We discuss the implications of the findings for the development of Web search engines and design of online content.

9.

A Wikipedia powered state-based approach to automatic search query enhancement

Kyle Goslin Markus Hofmann 《Information processing & management》2018,54(4):726-739

This paper describes the development and testing of a novel Automatic Search Query Enhancement (ASQE) algorithm, the Wikipedia N Sub-state Algorithm (WNSSA), which utilises Wikipedia as the sole data source for prior knowledge. This algorithm is built upon the concept of iterative states and sub-states, harnessing Wikipedia’s data set and link information to identify and utilise recurring terms to aid term selection and weighting during enhancement. This algorithm is designed to prevent query drift by making callbacks to the user’s original search intent, persisting the original query between internal states alongside additional selected enhancement terms. The developed algorithm has been shown to improve both short and long queries by providing a better understanding of the query and available data. The proposed algorithm was compared against five existing ASQE algorithms that utilise Wikipedia as the sole data source, showing an average Mean Average Precision (MAP) improvement of 0.273 over the tested existing ASQE algorithms.
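The reported gain is expressed in Mean Average Precision (MAP). For readers less familiar with the metric, a minimal sketch of the standard definition follows; the example rankings and relevance judgements are made up for illustration and have nothing to do with the paper's test collections.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant doc.

    Relevant documents that never appear in `ranked` still count in the
    denominator, so failing to retrieve them lowers the score.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranked_list, relevant_docs) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [(["d1", "d2", "d3", "d4"], {"d1", "d3"}),   # AP = (1/1 + 2/3) / 2
        (["d5", "d6"], {"d6"})]                     # AP = (1/2) / 1
print(round(mean_average_precision(runs), 4))       # 0.6667
```

Since MAP lies in [0, 1], an average improvement of 0.273 is an absolute difference in MAP between systems, not a percentage.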

10.

Evaluating WordBars in exploratory Web search scenarios

Orland Hoeber Xue Dong Yang 《Information processing & management》2008

Web searchers commonly have difficulties crafting queries to fulfill their information needs; even after they are able to craft a query, they often find it challenging to evaluate the results of their Web searches. Sources of these problems include the lack of support for constructing and refining queries, and the static nature of the list-based representations of Web search results. WordBars has been developed to assist users in their Web search and exploration tasks. This system provides a visual representation of the frequencies of the terms found in the first 100 document surrogates returned from an initial query, in the form of a histogram. Exploration of the search results is supported through term selection in the histogram, resulting in a re-sorting of the search results based on the use of the selected terms in the document surrogates. Terms from the histogram can be easily added to or removed from the query, generating a new set of search results. Examples illustrate how WordBars can provide valuable support for query refinement and search results exploration, whether the initial query is vague or specific. User evaluations with both expert and intermediate Web searchers illustrate the benefits of the interactive exploration features of WordBars in terms of effectiveness as well as subjective measures. Although differences were found in the demographics of these two user groups, both were able to benefit from the features of WordBars.
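The histogram-and-re-sort interaction described above can be sketched in a few lines of Python. This is a hypothetical approximation, not the WordBars implementation: the tokenizer, the small stopword list, and the scoring rule (number of selected-term occurrences per surrogate) are all assumptions.

```python
import re
from collections import Counter

# Assumed minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "is"}


def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def term_histogram(surrogates, limit=100):
    """Term frequencies over the first `limit` result surrogates (title + snippet)."""
    counts = Counter()
    for surrogate in surrogates[:limit]:
        counts.update(tokenize(surrogate))
    return counts


def resort(surrogates, selected_terms):
    """Stable re-sort: surrogates using more of the selected terms come first."""
    selected = set(selected_terms)

    def score(surrogate):
        return sum(1 for t in tokenize(surrogate) if t in selected)

    return sorted(surrogates, key=score, reverse=True)


results = ["Jaguar car dealership prices",
           "Jaguar habitat in the rainforest",
           "Rainforest animals: jaguar and sloth"]
print(term_histogram(results).most_common(2))  # [('jaguar', 3), ('rainforest', 2)]
print(resort(results, ["rainforest"])[0])      # Jaguar habitat in the rainforest
```

A stable sort keeps the engine's original order among surrogates with equal scores, so selecting a term reorders the list without discarding the engine's own ranking signal.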

11.

Learning from homologous queries and semantically related terms for query auto completion

《Information processing & management》2016,52(4):628-643

Query auto completion (QAC) models recommend possible queries to web search users when they start typing a query prefix. Most of today’s QAC models rank candidate queries by popularity (i.e., frequency), and in doing so they tend to follow a strict query matching policy when counting the queries. That is, they ignore the contributions from so-called homologous queries, queries with the same terms but ordered differently or queries that expand the original query. Importantly, homologous queries often express a remarkably similar search intent. Moreover, today’s QAC approaches often ignore semantically related terms. We argue that users are prone to combine semantically related terms when generating queries. We propose a learning to rank-based QAC approach, where, for the first time, features derived from homologous queries and semantically related terms are introduced. In particular, we consider: (i) the observed and predicted popularity of homologous queries for a query candidate; and (ii) the semantic relatedness of pairs of terms inside a query and pairs of queries inside a session. We quantify the improvement of the proposed new features using two large-scale real-world query logs and show that the mean reciprocal rank and the success rate can be improved by up to 9% over state-of-the-art QAC models.
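Two pieces of this abstract are easy to make concrete: counting the popularity of homologous queries, and the mean reciprocal rank (MRR) used for evaluation. The matching rule below is an assumed reading of the abstract's definition (whitespace tokens, no stemming), and the query counts are made up; the paper's learning-to-rank model and semantic-relatedness features are not shown.

```python
def is_homologous(candidate, other):
    """True if `other` has the candidate's terms in a different order,
    or contains all of the candidate's terms plus extra ones (an expansion)."""
    c, o = candidate.lower().split(), other.lower().split()
    if c == o:
        return False                      # the candidate itself, not a homologue
    reordered = sorted(c) == sorted(o)
    expanded = set(c) < set(o)
    return reordered or expanded


def homologous_popularity(candidate, query_counts):
    """Sum the log frequencies of all queries homologous to the candidate."""
    return sum(n for q, n in query_counts.items() if is_homologous(candidate, q))


def mean_reciprocal_rank(ranked_lists, targets):
    """MRR: average of 1/rank of the submitted query (0 if it was not suggested)."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)


query_counts = {"new york weather": 50, "weather new york": 20,
                "new york weather tomorrow": 5, "new york times": 30}
print(homologous_popularity("new york weather", query_counts))          # 25
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], ["b", "z"]))  # 0.25
```

A strict-matching QAC model would credit "new york weather" with only its own 50 occurrences; the homologous feature adds the 25 occurrences of its reordered and expanded variants as a separate signal.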

12.

Analysis of large data logs: an application of Poisson sampling on excite web queries

《Information processing & management》2002,38(4):473-490

Search engines are the gateway for users to retrieve information from the Web. There is a crucial need for tools that allow effective analysis of search engine queries to provide a greater understanding of Web users' information-seeking behavior. The objective of the study is to develop an effective strategy for the selection of samples from large-scale data sets. Millions of queries are submitted to Web search engines daily, and new sampling techniques are required to bring these databases to a manageable size while preserving the statistically representative characteristics of the entire data set. This paper reports results from a study using data logs from the Excite Web search engine. We use Poisson sampling to develop a sampling strategy, and show that sample sets selected by Poisson sampling effectively represent the statistical characteristics of the entire data set. In addition, this paper discusses the use of Poisson sampling in continuous monitoring of stochastic processes, such as Web site dynamics.
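In Poisson sampling each record is included independently of all others, which is what makes it usable on a continuous stream of log entries. The sketch shows the equal-probability case (also called Bernoulli sampling); the abstract does not state the inclusion probabilities used in the study, so `p` and the toy query list are assumptions.

```python
import random


def poisson_sample(records, p, seed=None):
    """Keep each record independently with probability p.

    Works on any iterable (including a stream of log lines), so it needs no
    knowledge of the total data set size; the sample size is random with mean p * N.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    return [r for r in records if rng.random() < p]


queries = [f"query-{i}" for i in range(100_000)]
sample = poisson_sample(queries, p=0.01, seed=42)
print(len(sample))  # close to 1,000; the exact count depends on the seed
```

Unlike fixed-size sampling, the sample size is itself a random variable, which is the trade-off for never having to know N in advance.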

13.

Characteristics of question format web queries: an exploratory study

《Information processing & management》2002,38(4):453-471

Web queries in question format are becoming a common element of users' interactions with Web search engines. Web search services such as Ask Jeeves, a publicly accessible question and answer (Q&A) search engine, ask users to enter queries in question format. This paper reports results from a study examining queries in question format submitted to two different Web search engines: Ask Jeeves, which explicitly encourages queries in question format, and the Excite search service, which does not. We identify the characteristics of queries in question format, including their nature, length, and structure, in two data sets: 30,000 Ask Jeeves queries and 15,575 Excite queries. Findings include: (1) 50% of Ask Jeeves queries and less than 1% of Excite queries were in question format, (2) most users entered only one query in question format, with little query reformulation, (3) queries in question format used a limited range of formats, mainly “where”, “what”, or “how” questions, (4) the most common question format was “Where can I find…” for general information on a topic, and (5) non-question queries may be in request format. Overall, four types of user Web queries were identified: keyword, Boolean, question, and request. These findings provide an initial mapping of the structure and content of queries in question and request format. Implications for Web search services are discussed.
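The four-way typology (keyword, Boolean, question, request) can be approximated with surface rules. The word lists and precedence order below are hypothetical assumptions chosen for illustration, not the coding scheme used in the study.

```python
# Hypothetical cue lists; the study's own classification criteria are not given here.
QUESTION_WORDS = {"who", "what", "where", "when", "why", "how", "which",
                  "can", "is", "are", "do", "does"}
REQUEST_PREFIXES = ("i want", "i need", "i would like", "show me",
                    "looking for", "i am looking for", "find me")
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def query_format(query):
    """Assign a query to one of four formats: question, Boolean, request, keyword."""
    tokens = query.strip().split()
    if not tokens:
        raise ValueError("empty query")
    lowered = " ".join(tokens).lower()
    if lowered.endswith("?") or tokens[0].lower() in QUESTION_WORDS:
        return "question"
    # upper-case operators, or +term / -term inclusion and exclusion operators
    if any(t in BOOLEAN_OPERATORS or (len(t) > 1 and t[0] in "+-") for t in tokens):
        return "Boolean"
    if lowered.startswith(REQUEST_PREFIXES):
        return "request"
    return "keyword"


for q in ["Where can I find cheap flights to Paris?", "jaguar AND car",
          "I need information about jaguars", "jaguar habitat"]:
    print(q, "->", query_format(q))
```

Checking the question form first mirrors the study's focus; a query such as "how AND why" would be labeled a question, which is a deliberate precedence choice rather than a finding.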

14.

Forecasting inbound foreign tourist arrivals using Google Trends data

Shen Suyan (沈苏彦) Zhao Jin (赵锦) Xu Jian (徐坚) 《Resources Science》(资源科学) 2015,37(11):2111-2119

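Forecasting inbound tourist arrivals is an important basis for formulating tourism development plans and related policies and regulations. Using Google Trends search data for keywords related to the six elements of tourism activity (food, accommodation, transport, sightseeing, shopping, and entertainment), we computed correlation coefficients to identify the search keywords most closely correlated with the statistics on inbound foreign tourist arrivals to China published by the China National Tourism Administration for January 2004 to March 2015. Using inbound foreign tourist data from January 2004 to December 2012, we built a general seasonal multiplicative ARIMA model and a seasonal multiplicative ARIMA model with the search keywords as independent variables, used each to forecast inbound foreign tourist arrivals from January 2013 to March 2015, and compared the two models' goodness of fit and forecasting ability. The results show that the seasonal multiplicative ARIMA model with Google keywords as independent variables achieves a better fit and higher forecasting accuracy than the general seasonal multiplicative ARIMA model, and that China's visa policies and flight information both have a significant effect on inbound foreign tourist arrivals.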
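The keyword-screening step (correlating each keyword's search-volume series with the arrivals series) can be sketched with a plain Pearson correlation. The threshold of 0.7 and the toy monthly figures are made up for illustration and are not data from the study; the subsequent seasonal ARIMA step with keywords as independent variables is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def select_keywords(keyword_series, arrivals, threshold=0.7):
    """Return keywords whose |r| with arrivals meets the threshold, strongest first."""
    scores = {kw: pearson(series, arrivals) for kw, series in keyword_series.items()}
    kept = [kw for kw, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda kw: abs(scores[kw]), reverse=True)


# Toy monthly series (made up for illustration, not data from the study)
arrivals = [210, 190, 230, 250, 240, 280]
keyword_series = {
    "china visa": [42, 38, 46, 50, 48, 56],   # proportional to arrivals
    "hotel beijing": [5, 9, 2, 8, 3, 6],      # unrelated noise
}
print(select_keywords(keyword_series, arrivals))  # ['china visa']
```

Correlation only screens candidates; whether a keyword actually improves forecasts is decided by comparing the two ARIMA models out of sample, as the study does.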

15.

Answering recreational web searches with relevant things to do results

《Information processing & management》2020,57(2):102184

Recreational queries from users searching for places to go and things to do or see are very common in web and mobile search. Users specify constraints for what they are looking for, like suitability for kids, romantic ambiance, or budget. Queries like “restaurants in New York City” are currently served by static local results or the thumbnail carousel. More complex queries like “things to do in San Francisco with kids” or “romantic places to eat in Seattle” require the user to click on every element of the search engine result page and read articles from Yelp, TripAdvisor, or WikiTravel to satisfy their needs. Location data, which is an essential part of web search, is even more prevalent with location-based social networks (LBSNs) and offers new opportunities for satisfying many information-seeking scenarios. In this paper, we address the problem of recreational queries in information retrieval and propose a solution that combines search query logs with LBSN data to match user needs with possible options. At the core of our solution is a framework that combines social, geographical, and temporal information for a relevance model centered around the use of semantic annotations on Points of Interest, with the goal of addressing these recreational queries. A central part of the framework is a taxonomy derived from behavioral data that drives the modeling and user experience. We also describe in detail the complexity of assessing and evaluating Point of Interest data, a topic that is usually not covered in related work, and propose task design alternatives that work well. We demonstrate the feasibility and scalability of our methods using a data set of 1B check-ins and a large sample of real-world queries. Finally, we describe the integration of our techniques in a commercial search engine.

16.

Time-series classification with SAFE: Simple and fast segmented word embedding-based neural time series classifier

《Information processing & management》2022,59(5):103044

Dictionary-based classifiers are an essential group of approaches in the field of time series classification. Their distinctive characteristic is that they transform time series into segments made of symbols (words) and then classify time series using these words. Dictionary-based approaches are suitable for datasets containing time series of unequal length. The prevalence of dictionary-based methods inspired the research in this paper. We propose a new dictionary-based classifier called SAFE. The new approach transforms the raw numeric data into a symbolic representation using the Simple Symbolic Aggregate approXimation (SAX) method. We then partition the symbolic time series into a sequence of words. Finally, we employ a word-embedding neural model, as used in Natural Language Processing, to train the classifier. The proposed scheme was applied to classify 30 benchmark datasets and compared with a range of state-of-the-art time series classifiers. The name SAFE comes from our observation that this method is safe to use. Empirical experiments have shown that SAFE gives excellent results: it is always in the top 5%–10% when we rank the classification accuracy of state-of-the-art algorithms for various datasets. Our method ranks third in the list of state-of-the-art dictionary-based approaches (after the WEASEL and BOSS methods).
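The symbolization stage can be illustrated with standard SAX: z-normalize the series, reduce it by Piecewise Aggregate Approximation (PAA), map each segment mean to a letter using equiprobable Gaussian breakpoints, then cut the symbol string into words. This is a sketch under stated assumptions (equal-length segments, non-overlapping words, a 4-letter alphabet); SAFE's exact segmentation and its embedding-based classifier are not reproduced.

```python
import string
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series (a constant series maps to all zeros)."""
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of equal-width segments."""
    if len(series) % n_segments:
        raise ValueError("this sketch needs len(series) divisible by n_segments")
    size = len(series) // n_segments
    return [mean(series[i:i + size]) for i in range(0, len(series), size)]


def sax(series, n_segments, alphabet_size=4):
    """Map each PAA segment to a letter using equiprobable N(0, 1) breakpoints."""
    dist = NormalDist()
    breakpoints = [dist.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = string.ascii_lowercase[:alphabet_size]
    # the symbol index is the number of breakpoints the segment mean lies above
    return "".join(letters[sum(v > b for b in breakpoints)]
                   for v in paa(znorm(series), n_segments))


def to_words(symbols, word_len):
    """Split the symbol string into consecutive non-overlapping words."""
    return [symbols[i:i + word_len]
            for i in range(0, len(symbols) - word_len + 1, word_len)]


symbols = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(symbols, to_words(symbols, 2))  # abcd ['ab', 'cd']
```

The resulting words play the role of tokens in a sentence, which is what lets a word-embedding model from NLP be trained on them.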

17.

Multitasking during Web search sessions

Amanda Spink Minsoo Park Bernard J. Jansen Jan Pedersen 《Information processing & management》2006

A user’s single session with a Web search engine or information retrieval (IR) system may involve seeking information on a single topic or on multiple topics, and switching between tasks (multitasking information behavior). Most Web search sessions consist of two queries of approximately two words each. However, some Web search sessions consist of three or more queries. We present findings from two studies: first, a study of two-query search sessions on the AltaVista Web search engine, and second, a study of search sessions with three or more queries on the AltaVista Web search engine. We examine the degree of multitasking search and information task switching during these two sets of AltaVista Web search sessions. Samples of two-query sessions and of sessions with three or more queries were filtered from AltaVista transaction logs from 2002 and qualitatively analyzed. Sessions ranged in duration from less than a minute to a few hours. Findings include: (1) 81% of two-query sessions included multiple topics, (2) 91.3% of sessions with three or more queries included multiple topics, (3) there is a broad variety of topics in multitasking search sessions, and (4) sessions with three or more queries sometimes contained frequent topic changes. Multitasking is found to be a growing element in Web searching. This paper proposes an approach that places interactive information retrieval (IR) in context within a multitasking framework. The implications of our findings for Web design and further research are discussed.

18.

Automatic suggestion of phrasal-concept queries for literature search

Youngho Kim Jangwon Seo W. Bruce Croft David A. Smith 《Information processing & management》2014

Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), the initial queries are usually based on a draft paper or abstract, rather than short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions for this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases and (2) academic papers describe their key ideas via these subject-specific phrases, and thus phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal-concepts from pseudo-labeled documents and combines them with related phrases. Our proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach, in comparison to baseline query suggestion methods, and demonstrate the effectiveness of the technique with retrieval experiments.

19.

Document replication strategies for geographically distributed web search engines

Enver Kayaaslan B. Barla Cambazoglu Cevdet Aykanat 《Information processing & management》2013

Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.

20.

Querying XML documents using Prolog engines: When is this a good idea?

《Information processing & management》2019,56(5):1753-1770

XML has become a universal standard for information exchange over the Web due to features such as simple syntax and extensibility. Processing queries over these documents has been the focus of several research groups. In fact, there is a broad literature on efficient XML query processing that explores indexes, fragmentation techniques, and so on. However, for answering complex queries, existing approaches mainly analyze information that is explicitly defined in the XML document. A few works investigate the use of Prolog to extend query possibilities by allowing inference over the data content. This can significantly increase query possibilities and expressive power, giving access to non-obvious information. However, it requires translating the XML documents into Prolog facts. But for regular queries (which do not require inference), is this a good alternative? What kinds of queries could benefit from the Prolog translation? Can we always use Prolog engines to execute XML queries efficiently? Many questions are involved in adopting an alternative approach to running XML queries. In this work, we investigate this matter by translating XML queries into Prolog queries and comparing query processing times using Prolog and native XML engines. Our work contributes a set of heuristics that help users decide when to use Prolog engines to process a given XML query. In summary, our results show that queries that search elements by a key value or by position (simple search) are more efficient when run in Prolog than in native XML engines, whereas queries over large datasets, or queries that search for substrings, perform better when run by native XML engines.
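The prerequisite step, translating an XML document into Prolog facts, can be sketched with Python's standard `xml.etree.ElementTree`. The fact schema used here (`element/3`, `attribute/3`, `text/2`, with numeric ids assigned in document order) is a hypothetical choice for illustration, not the translation scheme used in the paper.

```python
import xml.etree.ElementTree as ET


def prolog_atom(text):
    """Quote text as a Prolog atom (backslashes escaped, quotes doubled)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def xml_to_facts(xml_text):
    """Translate an XML document into element/3, attribute/3 and text/2 facts.

    Each element gets a numeric id in document order; the root's parent is 0.
    Tail text and mixed content are ignored in this sketch.
    """
    facts = []
    next_id = 0

    def visit(elem, parent_id):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        facts.append(f"element({node_id}, {prolog_atom(elem.tag)}, {parent_id}).")
        for name, value in elem.attrib.items():
            facts.append(f"attribute({node_id}, {prolog_atom(name)}, {prolog_atom(value)}).")
        text = (elem.text or "").strip()
        if text:
            facts.append(f"text({node_id}, {prolog_atom(text)}).")
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), 0)
    return facts


for fact in xml_to_facts("<library><book id='b1'><title>Dune</title></book></library>"):
    print(fact)
```

Once such facts are loaded, a key-value lookup becomes a single goal such as `attribute(N, 'id', 'b1'), element(N, Tag, _)`, the kind of simple search that, according to the abstract, ran faster in Prolog than in native XML engines.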