Found 20 similar documents.
1.
Many user requests to search engines on the World Wide Web are searches for information about people by personal name. Current search engines only return sets of documents containing the queried name, but because several people usually share a personal name, the resulting sets often contain documents relevant to several people. It is necessary to disambiguate people in these result sets in order to help users find the person of interest more readily. In the task of name disambiguation, effective measurement of similarities between documents is a crucial step towards the final disambiguation. We propose a new method that uses web directories as a knowledge base to find common contexts in documents and uses a common-contexts measure to determine document similarities. Experiments conducted on documents mentioning real people on the web, together with several well-known web directory structures, suggest that using web directories to disambiguate people offers significant advantages over other conventional methods.
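A minimal sketch of the common-context idea, assuming each document has already been mapped to one or more web directory category paths. The mapping step, the category names, the Jaccard overlap, and the `threshold` value are illustrative assumptions, not the authors' measure. Each path is expanded into all of its ancestor nodes, so documents sharing only a broader context still score above zero, and documents are then grouped greedily by that score.

```python
def expand_paths(categories):
    """Return every directory node on each category path, e.g.
    'Science/CS/IR' -> {'Science', 'Science/CS', 'Science/CS/IR'}."""
    nodes = set()
    for path in categories:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            nodes.add("/".join(parts[:i]))
    return nodes


def common_context_similarity(cats_a, cats_b):
    """Jaccard overlap of the directory nodes two documents fall under."""
    a, b = expand_paths(cats_a), expand_paths(cats_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_documents(docs, threshold=0.5):
    """Greedy single pass: put a document in the first group containing a
    document at least `threshold` similar to it, else start a new group."""
    groups = []
    for doc_id, cats in docs.items():
        for group in groups:
            if any(common_context_similarity(cats, docs[other]) >= threshold
                   for other in group):
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


# Hypothetical pages returned for one ambiguous personal name.
docs = {
    "page1": ["Science/Computer_Science/Information_Retrieval"],
    "page2": ["Science/Computer_Science/Databases"],
    "page3": ["Sports/Football/Players"],
}
print(group_documents(docs))
```

The greedy pass is order-dependent; a fuller system would more likely run agglomerative clustering over the complete similarity matrix.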
2.
In this paper, we present a novel clustering algorithm that generates a number of candidate clusters from other web search results. The algorithm also establishes a semantic connective relation among the candidate clusters. Moreover, the algorithm has the following attractive properties: (1) it can be applied to multilingual web documents; (2) it improves the clustering performance of any search engine; (3) its unsupervised learning can automatically identify potentially relevant knowledge without using any corpus; and (4) clustering results are generated on the fly and fitted into search engines.
3.
We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. (This work is a revised and extended version of Esuli (2009b), presented at the 2009 LSDS-IR Workshop, held in Boston.)
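The abstract does not spell out the mechanism, so the following is only a rough sketch of the permutation-prefix idea behind such indexes: each object is represented by the order in which a fixed set of reference objects appear when sorted by distance from it, only the first `prefix_length` entries of that permutation are kept as a key, and a query is answered by re-ranking the objects whose keys share its prefix. Euclidean distance, the dictionary of buckets standing in for a prefix tree, and the prefix-shortening fallback are assumptions made for brevity.

```python
import math
from collections import defaultdict


def permutation_prefix(obj, references, length):
    """Indices of the `length` reference objects nearest to obj, closest first."""
    order = sorted(range(len(references)),
                   key=lambda i: math.dist(obj, references[i]))
    return tuple(order[:length])


class PermutationPrefixIndex:
    def __init__(self, references, prefix_length):
        self.references = references
        self.prefix_length = prefix_length
        # prefix -> objects; a plain dict stands in for the prefix tree.
        self.buckets = defaultdict(list)

    def add(self, obj):
        key = permutation_prefix(obj, self.references, self.prefix_length)
        self.buckets[key].append(obj)

    def search(self, query, k):
        """Approximate k-NN: candidates share the query's permutation prefix;
        the prefix is shortened until at least k candidates are found."""
        prefix = permutation_prefix(query, self.references, self.prefix_length)
        for length in range(self.prefix_length, -1, -1):
            candidates = [obj for key, objs in self.buckets.items()
                          if key[:length] == prefix[:length] for obj in objs]
            if len(candidates) >= k:
                break
        # Re-rank the (small) candidate set by true distance.
        return sorted(candidates, key=lambda obj: math.dist(query, obj))[:k]


references = [(0, 0), (10, 0), (0, 10), (10, 10)]
index = PermutationPrefixIndex(references, prefix_length=2)
for point in [(1, 1), (2, 1), (9, 9), (8, 9), (1, 9)]:
    index.add(point)
print(index.search((0.5, 0.5), k=2))
```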
4.
Search engine researchers typically depict search as the solitary activity of an individual searcher. In contrast, results from our critical-incident survey of 150 users on Amazon’s Mechanical Turk service suggest that social interactions play an important role throughout the search process. A second survey, also of 150 users but focused instead on difficulties encountered during searches, points to similar conclusions. These social interactions range from highly coordinated collaborations with shared goals to loosely coordinated collaborations in which only advice is sought. Our main contribution is the integration of models from previous work on sensemaking and information-seeking behavior into a canonical social model of user activities before, during, and after a search episode, suggesting where in the search process both explicitly and implicitly shared information may be valuable to individual searchers.
5.
An integrated information retrieval system generally contains multiple databases that are inconsistent in terms of their content and indexing. This paper proposes a rough set-based transfer (RST) model for integration of the concepts of document databases using various indexing languages, so that users can search through the multiple databases using any of the current indexing languages. The RST model aims to effectively create meaningful transfer relations between the terms of two indexing languages, provided a number of documents are indexed with them in parallel. In our experiment, the indexing concepts of two databases respectively using the Thesaurus of Social Science (IZ) and the Schlagwortnormdatei (SWD) are integrated by means of the RST model. Finally, this paper compares the results achieved with a cross-concordance method, a conditional probability based method and the RST model.
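The RST model itself is not specified in the abstract, but the conditional-probability baseline it is compared against can be sketched. Assuming documents indexed in parallel are given as pairs of term sets (one per indexing language), a transfer relation from a source term s to a target term t is scored as P(t | s), the fraction of documents indexed with s that are also indexed with t. The term strings and the `min_prob` cut-off below are hypothetical.

```python
from collections import Counter


def conditional_transfer(parallel_docs, min_prob=0.5):
    """parallel_docs: (source_terms, target_terms) pairs, one per document
    indexed with both vocabularies. Returns {(s, t): P(t | s)} for every
    pair whose conditional probability is at least min_prob."""
    source_counts = Counter()
    pair_counts = Counter()
    for source_terms, target_terms in parallel_docs:
        for s in set(source_terms):
            source_counts[s] += 1
            for t in set(target_terms):
                pair_counts[(s, t)] += 1
    return {
        (s, t): n / source_counts[s]
        for (s, t), n in pair_counts.items()
        if n / source_counts[s] >= min_prob
    }


# Hypothetical parallel indexing: (source-language terms, target-language terms).
docs = [
    ({"labour market"}, {"labour market", "employment"}),
    ({"labour market"}, {"labour market"}),
    ({"migration"}, {"immigration"}),
]
print(conditional_transfer(docs, min_prob=0.6))
```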
6.
Lori Lorigo, Bing Pan, Helene Hembrooke, Thorsten Joachims, Laura Granka, Geri Gay 《Information processing & management》2006
In the effort to improve search engine effectiveness, there has been increasing interest in gathering additional feedback about users’ information needs that goes beyond the queries they type in. Adaptive search engines use explicit and implicit feedback indicators to model users or search tasks. In order to create appropriate models, it is essential to understand how users interact with search engines, including the determining factors of their actions. Using eye tracking, we extend this understanding by analyzing the sequences and patterns with which users evaluate the query results returned to them when using Google. We find that the query result abstracts are viewed in the order of their ranking in only about one fifth of the cases, and only an average of about three abstracts per result page are viewed at all. We also compare search behavior variability with respect to different classes of users and different classes of search tasks to reveal whether user models or task models may be better predictors of behavior. We find that gender and task significantly influence different kinds of the search behaviors discussed here. The results suggest improvements to query-based search interface designs with respect to both their use of space and their workflow.
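The two statistics quoted above can be computed from recorded scanpaths roughly as follows. This sketch assumes each scanpath is the sequence of result ranks a user fixated on, and treats a page as viewed "in rank order" when the first visits to abstracts occur in strictly increasing rank; the study's own definition may differ, and the sample scanpaths are invented.

```python
def in_rank_order(scanpath):
    """True if abstracts are first viewed in strictly increasing rank order.
    scanpath: result ranks (1 = top) in the order they were fixated."""
    first_views = list(dict.fromkeys(scanpath))  # drop revisits, keep first-view order
    return all(a < b for a, b in zip(first_views, first_views[1:]))


def summarize(scanpaths):
    """(share of pages viewed in rank order, mean distinct abstracts viewed)."""
    n = len(scanpaths)
    ordered = sum(in_rank_order(path) for path in scanpaths) / n
    viewed = sum(len(set(path)) for path in scanpaths) / n
    return ordered, viewed


# Invented scanpaths, one per result page view.
paths = [[1, 2, 3], [1, 3, 2], [2, 1, 1, 3], [1, 1, 2]]
print(summarize(paths))
```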
7.
Rong Qu, Yongyi Fang, Wen Bai, Yuncheng Jiang 《Information processing & management》2018,54(6):1002-1021
Computing Semantic Similarity (SS) between concepts is one of the most critical issues in many domains such as Natural Language Processing and Artificial Intelligence. Over the years, several SS measurement methods have been proposed that exploit different knowledge resources. Wikipedia provides a large domain-independent encyclopedic repository and a semantic network for computing SS between concepts. Traditional feature-based measures rely on linear combinations of different properties and have two main limitations: insufficient information and loss of semantic information. In this paper, we propose several hybrid SS measurement approaches that use the Information Content (IC) and features of concepts, thereby avoiding these limitations. To integrate discrete properties into one component, we present two models of semantic representation, called CORM and CARM. We then compute SS based on these models and use the IC of categories as a supplement to the SS measurement. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, supports these intuitions with respect to human judgments. In summary, our approaches are more efficient in determining SS between concepts and correlate better with human judgments than previous methods such as Word2Vec and NASARI.
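For readers unfamiliar with IC-based measures, here is a sketch of one classical combination, assuming a category tree given as a child-to-parent mapping with a single root: a descendant-count ("intrinsic") IC, plugged into Lin's similarity. This is not the CORM or CARM model, Wikipedia's category graph is not actually a tree, and the toy categories are hypothetical.

```python
import math


def ancestors(node, parent):
    """The node itself followed by its ancestors up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def intrinsic_ic(node, parent):
    """Descendant-count IC: 1 - log(descendants + 1) / log(total categories)."""
    nodes = set(parent) | set(parent.values())
    descendants = sum(1 for n in nodes if n != node and node in ancestors(n, parent))
    return 1 - math.log(descendants + 1) / math.log(len(nodes))


def lin_similarity(a, b, parent):
    """Lin: 2 * IC(lowest common subsumer) / (IC(a) + IC(b)).
    Assumes a single root, so a common subsumer always exists."""
    ancestors_b = set(ancestors(b, parent))
    lcs = next(n for n in ancestors(a, parent) if n in ancestors_b)
    denominator = intrinsic_ic(a, parent) + intrinsic_ic(b, parent)
    return 2 * intrinsic_ic(lcs, parent) / denominator if denominator else 1.0


# Hypothetical category tree (child -> parent).
parent = {"Science": "Root", "Sports": "Root", "Physics": "Science",
          "Chemistry": "Science", "Football": "Sports"}
print(round(lin_similarity("Physics", "Chemistry", parent), 3))
print(lin_similarity("Physics", "Football", parent))
```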
8.
9.
《Information processing & management》2023,60(2):103202
Wikipedia links its articles by manually defined semantic relations called the Wikipedia hyperlink (link) structure. The existing Wikipedia link-based semantic similarity (SS) and semantic relatedness (SR) computation models, such as Wikipedia one-way link (WOLM) model and Wikipedia two-way link (WTLM) model, do not assess the strengths of the relationships between a candidate concept and its links (out-links or in-links). These models treat all the links as equally important even though some links are semantically more influential than others and should be given more importance. This phenomenon reduces the accuracy of these models. This paper presents the Wikipedia bi-linear link (WBLM) model that extends the previously proposed WOLM and WTLM models. The WBLM model explores the Wikipedia link structure as a semantic graph and discovers the strongly (bi-linear links) and weakly (out-links or in-links) connected links of a candidate concept. It improves the link-based vector representations of concepts by assigning weights to their connected links according to the strengths of their semantic associations. The experimental results demonstrate that the proposed WBLM model significantly improves the SS and SR computation accuracy of the WOLM model (6.9%, 8%, 24%, 17.3%, 31.2%, 30.6%, 26.5%, and 35.4%) and WTLM model (1.2%, 3.9%, 7.1%, 9.9%, 11%, 6.3%, 12.7%, and 13%), in terms of linear correlations with human judgments on gold standard benchmarks, including MC30, RG65, WS203, SimLex, 353All, MTurk287, MTurk771, and MEN3000, respectively. Moreover, this research offers a deep insight into the Wikipedia link structure and provides an adequate base for understanding it as a semantic graph.
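The key idea, giving bi-linear links more weight than one-directional ones, can be illustrated with a small sketch. It assumes the link graph is a mapping from each concept to its out-links, builds a weighted link vector per concept, and compares two concepts by cosine similarity. The weights `strong=2.0` and `weak=1.0` are placeholder values, not the weighting scheme of the WBLM paper.

```python
import math


def link_vector(concept, out_links, strong=2.0, weak=1.0):
    """Weighted link vector for `concept`: links that are both out-links and
    in-links (bi-linear) get `strong`, one-directional links get `weak`."""
    outs = out_links.get(concept, set())
    ins = {c for c, links in out_links.items() if concept in links}
    return {link: strong if (link in outs and link in ins) else weak
            for link in outs | ins}


def cosine(u, v):
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical link graph: concept -> set of articles it links to.
out_links = {
    "Car": {"Engine", "Wheel"},
    "Truck": {"Engine", "Wheel"},
    "Engine": {"Car"},
}
weighted = cosine(link_vector("Car", out_links), link_vector("Truck", out_links))
unweighted = cosine(link_vector("Car", out_links, strong=1.0),
                    link_vector("Truck", out_links, strong=1.0))
print(round(weighted, 3), round(unweighted, 3))
```

With equal weights the two concepts look identical; weighting the bi-linear Car/Engine link lowers their similarity, which is the kind of distinction a uniform-weight model cannot make.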
10.
Rocío L. Cecchini, Carlos M. Lorenzetti, Ana G. Maguitman, Nélida Beatríz Brignole 《Information processing & management》2008,44(6):1863
Systems for searching the Web based on thematic contexts can be built on top of a conventional search engine and benefit from the huge amount of content as well as from the functionality available through the search engine interface. The quality of the material collected by such systems is highly dependent on the vocabulary used to generate the search queries. In this scenario, selecting good query terms can be seen as an optimization problem where the objective function to be optimized is based on the effectiveness of a query in retrieving relevant material. Some characteristics of this optimization problem are: (1) the high dimensionality of the search space, where candidate solutions are queries and each term corresponds to a different dimension, (2) the existence of acceptable suboptimal solutions, (3) the possibility of finding multiple solutions, and in many cases (4) the quest for novelty. This article describes optimization techniques based on Genetic Algorithms to evolve “good query terms” in the context of a given topic. The proposed techniques place emphasis on searching for novel material that is related to the search context. We discuss the use of a mutation pool to allow the generation of queries with new terms, study the effect of different mutation rates on the exploration of query-space, and discuss the use of a specially developed fitness function that favors the construction of queries containing novel but related terms.
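A compact sketch of a genetic algorithm over fixed-length queries with a mutation pool. Assumptions: queries are lists of `query_len` terms, selection keeps the top half of the population (so the best query is never lost), crossover is one-point, and mutation replaces a term with one drawn from `mutation_pool` with probability `mutation_rate`. In the article, fitness reflects how well a query retrieves relevant material through a search engine; the `toy_fitness` function below is a hypothetical stand-in that rewards terms from a related-term set and gives a bonus to related terms absent from the seed vocabulary.

```python
import random


def evolve_queries(seed_terms, mutation_pool, fitness, query_len=3,
                   pop_size=8, generations=20, mutation_rate=0.3, seed=0):
    """Return (best query, best fitness recorded at each generation)."""
    rng = random.Random(seed)
    population = [rng.sample(seed_terms, query_len) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:pop_size // 2]  # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, query_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation draws from the pool, so queries can gain terms
            # that never appeared in the seed vocabulary.
            child = [rng.choice(mutation_pool) if rng.random() < mutation_rate else term
                     for term in child]
            children.append(child)
        population = parents + children
    population.sort(key=fitness, reverse=True)
    return population[0], history


# Hypothetical topic: the animal sense of "jaguar".
SEED = ["jaguar", "speed", "habitat", "car", "price"]
RELATED = {"jaguar", "speed", "habitat", "rainforest", "predator"}
POOL = ["rainforest", "predator", "dealer", "engine"]


def toy_fitness(query):
    related = set(query) & RELATED
    novel = related - set(SEED)  # related terms the seed vocabulary lacked
    return len(related) + 0.5 * len(novel)


best, history = evolve_queries(SEED, POOL, toy_fitness)
print(best, toy_fitness(best))
```

Raising `mutation_rate` pushes the search toward pool terms (more exploration of query-space) at the cost of disrupting good queries more often.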
11.
B. Barla Cambazoglu, Evren Karaca, Tayfun Kucukyilmaz, Ata Turk, Cevdet Aykanat 《Information processing & management》2007
Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to attack the page freshness problem by performing the search on the original pages residing on the Web, rather than on the previously fetched copies as done in the traditional search engines. SE4SEE also aims to obtain high download rates in Web crawling by making use of the geographically distributed nature of the grid. In this work, we present the architectural design issues and implementation details of this search engine. We conduct various experiments to illustrate performance results obtained on a grid infrastructure and justify the use of the search strategy employed in SE4SEE.
12.
13.
El-Sayed Atlam, Ghada Elmarhomy, Kazuhiro Morita, Masao Fuketa, Jun-ichi Aoe 《Information processing & management》2006
With the increasing popularity of the Internet and the tremendous amount of online text, automatic document classification is important for organizing huge amounts of data. Readers can tell the field of many documents by reading only some specific Field Association (FA) words. Document fields can be determined efficiently if there are many FA words and if their frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words, and the new FA words are then appended to an FA word dictionary. The experimental results show that the new system automatically appended around 44% new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is effective for extracting the relevant FA words needed by the system design for building FA words automatically.
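The concentration-ratio filter mentioned above can be sketched as follows. The definition used here, the share of a word's occurrences that fall in its most frequent field, is an assumption made for illustration (the paper may define the ratio relative to a field hierarchy), as are the field names and counts. Words whose ratio reaches 0.9 and that are not yet in the dictionary become new FA word candidates.

```python
def concentration_ratios(freq):
    """freq: {word: {field: count}} with positive counts.
    Returns {word: (top_field, share of the word's occurrences in that field)}."""
    ratios = {}
    for word, by_field in freq.items():
        total = sum(by_field.values())
        field, count = max(by_field.items(), key=lambda item: item[1])
        ratios[word] = (field, count / total)
    return ratios


def new_fa_words(freq, dictionary, threshold=0.9):
    """Words concentrated in one field that the FA dictionary does not yet hold."""
    return {word: field
            for word, (field, ratio) in concentration_ratios(freq).items()
            if ratio >= threshold and word not in dictionary}


# Hypothetical counts of candidate words per field.
freq = {
    "home run": {"baseball": 19, "football": 1},
    "goal": {"football": 12, "hockey": 8},
    "pitcher": {"baseball": 10},
}
print(new_fa_words(freq, dictionary={"pitcher"}))
```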
14.
In this research, we evaluate the effect of gender-targeted advertising on the performance of sponsored search advertising. We analyze nearly 7,000,000 records spanning 33 consecutive months of a keyword advertising campaign from a major US retailer. To determine the effect of demographic targeting, we classify the campaign’s key phrases by their probability of being targeted at a specific gender, and we then compare key performance indicators among these groupings using the critical sponsored search metrics of impressions, clicks, cost-per-click, sales revenue, orders, items, and return on advertising. Our findings show that the gender orientation of the key phrase is a significant determinant in predicting behaviors and performance, with statistically different consumer behaviors for all attributes as the probability of a male or female key phrase changes. However, gender-neutral phrases perform best overall, generating 20 times the return on advertising of any gender-targeted category. Insights from this research could help target sponsored advertising more effectively to searchers and potential consumers.
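The per-group metrics can be computed with a simple aggregation. The sketch below assumes each record carries a key phrase's estimated probability of being male-targeted, uses a hypothetical band of ±0.2 around 0.5 to label phrases male, female, or neutral, and takes return on advertising as revenue divided by ad cost. The study's actual grouping and formulas are not given in the abstract, and the sample records are invented.

```python
from collections import defaultdict

METRICS = ("impressions", "clicks", "cost", "revenue", "orders", "items")


def gender_group(p_male, band=0.2):
    """Hypothetical banding of a key phrase's probability of targeting men."""
    if p_male >= 0.5 + band:
        return "male"
    if p_male <= 0.5 - band:
        return "female"
    return "neutral"


def kpis_by_group(records):
    """Sum raw metrics per gender group, then derive cost-per-click and ROA."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for record in records:
        group = totals[gender_group(record["p_male"])]
        for metric in METRICS:
            group[metric] += record[metric]
    for group in totals.values():
        group["cpc"] = group["cost"] / group["clicks"] if group["clicks"] else 0.0
        group["roa"] = group["revenue"] / group["cost"] if group["cost"] else 0.0
    return dict(totals)


records = [
    {"p_male": 0.9, "impressions": 1000, "clicks": 50, "cost": 25.0,
     "revenue": 100.0, "orders": 2, "items": 3},
    {"p_male": 0.5, "impressions": 2000, "clicks": 80, "cost": 40.0,
     "revenue": 400.0, "orders": 8, "items": 10},
    {"p_male": 0.1, "impressions": 500, "clicks": 20, "cost": 12.0,
     "revenue": 36.0, "orders": 1, "items": 1},
]
result = kpis_by_group(records)
print(result["neutral"]["roa"], result["male"]["roa"])
```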
15.
In this study we attempt to answer two questions: Is there a natural way to classify projects and what are the specific factors that influence the success of various kinds of projects? Perhaps one of the major barriers to understanding the reasons behind the success of a project has been the lack of specificity of constructs applied in project management studies. Many studies of project success factors have used a universalistic approach, assuming a basic similarity among projects. Instead of presenting an initial construct, we have employed a linear discriminant analysis methodology in order to classify projects. Our results suggest that project success factors are not universal for all projects. Different projects exhibit different sets of success factors, suggesting the need for a more contingent approach in project management theory and practice. In the analysis we use multivariate methods which have been proven to be powerful in many ways, for example, enabling the ranking of different managerial factors according to their influence on project success.
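As a reminder of how linear discriminant analysis separates classes, here is a two-class Fisher discriminant on two features, written with the standard library only. The two project features and the sample values are hypothetical, and the study's own analysis (with more classes and variables) is not reproduced here.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def fisher_lda(class0, class1):
    """Fit a two-class Fisher discriminant on 2-D points; returns a classifier."""
    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class scatter matrix S_w (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class0, m0), (class1, m1)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]  # assumed non-zero
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = (dot(inv[0], diff), dot(inv[1], diff))  # w = S_w^-1 (m1 - m0)
    threshold = (dot(w, m0) + dot(w, m1)) / 2  # midpoint of projected means
    return lambda p: int(dot(w, p) > threshold)


# Hypothetical features per project: (technological uncertainty, system scope).
routine = [(1, 2), (2, 1), (1, 1), (2, 2)]
high_tech = [(5, 6), (6, 5), (5, 5), (6, 6)]
classify = fisher_lda(routine, high_tech)
print(classify((1.5, 1.0)), classify((5.0, 5.5)))
```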
16.
This paper addresses the problem of how to rank retrieval systems without the need for human relevance judgments, which are very resource intensive to obtain. Using TREC 3, 6, 7 and 8 data, it is shown how the overlap structure between the search results of multiple systems can be used to infer relative performance differences. In particular, the overlap structures for random groupings of five systems are computed, so that each system is selected an equal number of times. It is shown that the average percentage of a system’s documents that are only found by it and no other systems is strongly and negatively correlated with its retrieval performance effectiveness, such as its mean average precision or precision at 1000. The presented method uses the degree of consensus or agreement a retrieval system can generate to infer its quality. This paper also addresses the question of how many documents in a ranked list need to be examined to be able to rank the systems. It is shown that the overlap structure of the top 50 documents can be used to rank the systems, often producing the best results. The presented method significantly improves upon previous attempts to rank retrieval systems without the need for human relevance judgments. This “structure of overlap” method can be of value to communities that need to identify the best experts or rank them, but do not have the resources to evaluate the experts’ recommendations, since it does not require knowledge about the domain being searched or the information being requested.
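The overlap-structure computation can be sketched as follows, assuming each system's run is a ranked list of document IDs and that the number of systems is a multiple of the group size, so every system lands in exactly one group per round (which keeps selection counts equal). Systems are ranked by their accumulated fraction of documents that no other system in the group retrieved; lower is predicted to be better. Group size 5 and depth 50 follow the text; `rounds`, `seed`, and the toy runs are arbitrary.

```python
import random
from collections import Counter


def unique_fractions(runs, group, depth):
    """For each system in `group`, the fraction of its top-`depth` documents
    that no other system in the group retrieved."""
    tops = {s: set(runs[s][:depth]) for s in group}
    counts = Counter(doc for docs in tops.values() for doc in docs)
    return {s: sum(counts[doc] == 1 for doc in docs) / len(docs)
            for s, docs in tops.items()}


def rank_by_overlap(runs, group_size=5, rounds=20, depth=50, seed=0):
    """Rank systems best-first by accumulated unique-document fraction.
    Assumes len(runs) is a multiple of group_size, so every system is
    placed in exactly one group per round."""
    rng = random.Random(seed)
    systems = list(runs)
    totals = dict.fromkeys(systems, 0.0)
    for _ in range(rounds):
        rng.shuffle(systems)
        for start in range(0, len(systems), group_size):
            group = systems[start:start + group_size]
            for s, fraction in unique_fractions(runs, group, depth).items():
                totals[s] += fraction
    return sorted(totals, key=totals.get)


# Toy runs: E retrieves nothing the others agree on, B has one outlier.
runs = {
    "A": ["d1", "d2", "d3"],
    "B": ["d1", "d2", "d4"],
    "C": ["d1", "d3", "d2"],
    "D": ["d2", "d1", "d3"],
    "E": ["x1", "x2", "x3"],
}
print(rank_by_overlap(runs))
```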
17.
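Using changes in the search volume of web search keywords to analyze and predict the development trends of related phenomena is a research area that is attracting increasingly wide attention. This paper proposes that the temporal variation characteristics of web search keywords fall into three types: leading, synchronous, and lagging. By collecting keyword search-volume data from search websites and performing time-difference correlation analysis against the object being analyzed and predicted, the temporal variation characteristics of the relevant keywords can be identified. An experiment identifying the temporal characteristics of H7N9 avian influenza keywords demonstrates the feasibility of the method.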
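Time-difference correlation can be sketched as a lagged Pearson correlation: shift the keyword series against the target series, find the lag with the highest correlation, and call the keyword leading, synchronous, or lagging according to the sign of that lag. The equal-length series, the `max_lag` window, and the sample data are assumptions for illustration.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def lag_profile(keyword, target, max_lag):
    """Correlation of target[t] with keyword[t - lag], for lag in [-max_lag, max_lag].
    A positive lag means the keyword series moves first."""
    n = len(target)  # both series assumed to have the same length
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = keyword[:n - lag], target[lag:]
        else:
            x, y = keyword[-lag:], target[:n + lag]
        profile[lag] = pearson(x, y)
    return profile


def timing_feature(keyword, target, max_lag=3):
    profile = lag_profile(keyword, target, max_lag)
    best = max(profile, key=profile.get)
    if best > 0:
        return "leading", best
    if best < 0:
        return "lagging", best
    return "synchronous", 0


# Invented weekly series: case counts, and searches peaking two weeks earlier.
cases = [0, 0, 1, 3, 7, 4, 2, 1, 0, 0]
searches = [1, 3, 7, 4, 2, 1, 0, 0, 0, 0]
print(timing_feature(searches, cases))
```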
18.
Joy Iong-Zong Chen 《Journal of The Franklin Institute》2007,344(6):889-911
In this paper, the average level crossing rate (LCR) and average fading duration (AFD) criteria are applied to analyze selection combining (SC) diversity for wireless communication systems over correlated-Rayleigh and correlated-Rice fading. The fading channel models considered are four generalized and experimental distributions, namely Rayleigh, Rice, Nakagami-m, and Weibull statistics. Both independent and correlated branches are considered. To unify and clarify the average LCR and AFD performance formulas for SC diversity, the study not only presents its own results but also cites some previously published results and re-illustrates them by numerical evaluation.
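For the simplest case covered by such analyses, independent and identically distributed Rayleigh branches, the SC results have a closed form. With normalized threshold ρ = R/R_rms and maximum Doppler frequency f_D, one branch has LCR sqrt(2π)·f_D·ρ·exp(-ρ²) and CDF 1 - exp(-ρ²); the selected (largest) envelope of M branches crosses R at rate M·LCR_1·(1 - exp(-ρ²))^(M-1), and the AFD is the outage probability (1 - exp(-ρ²))^M divided by that rate. The sketch below only evaluates these textbook expressions; the correlated, Rice, Nakagami-m, and Weibull cases treated in the paper are not covered.

```python
import math


def sc_rayleigh_lcr_afd(rho, f_d, branches):
    """Average LCR (crossings/s) and AFD (s) of the selection-combined envelope
    at normalized threshold rho = R / R_rms, for `branches` independent,
    equal-power Rayleigh branches with maximum Doppler frequency f_d (Hz)."""
    cdf = 1.0 - math.exp(-rho ** 2)  # P(single-branch envelope < R)
    lcr_single = math.sqrt(2 * math.pi) * f_d * rho * math.exp(-rho ** 2)
    lcr = branches * lcr_single * cdf ** (branches - 1)
    afd = cdf ** branches / lcr  # outage probability / crossing rate
    return lcr, afd


for m in (1, 2, 4):
    lcr, afd = sc_rayleigh_lcr_afd(rho=0.5, f_d=100.0, branches=m)
    print(f"M={m}: LCR={lcr:.2f} /s, AFD={afd * 1e3:.3f} ms")
```

For M = 1 the AFD reduces to the familiar single-branch result (exp(ρ²) - 1)/(ρ·f_D·sqrt(2π)), which is a convenient sanity check.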
19.
Evaluation research on information retrieval (IR) systems has thus far been narrowly focused and disjointed. This research attempts to narrow the gap by providing a comprehensive and integrated multiple criteria decision-theoretic approach for the evaluation of IR systems. The approach, which is based on the Analytic Hierarchy Process (AHP), is illustrated in the context of a domain-specific IR system. The novelty of this approach lies in the focus on the user aspect and the application of decision-making theories in the IR field.
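An AHP evaluation ultimately reduces pairwise comparison matrices to priority weights. This sketch uses power iteration to approximate the principal eigenvector and Saaty's random index to compute the consistency ratio; the three criteria and the comparison values are hypothetical, not taken from the study.

```python
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_priorities(matrix, iterations=100):
    """Priority vector (principal eigenvector, normalized to sum 1) and lambda_max,
    approximated by power iteration on a positive reciprocal matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); 1x1 and 2x2 are always consistent."""
    n = len(matrix)
    if n < 3:
        return 0.0
    _, lambda_max = ahp_priorities(matrix)
    return (lambda_max - n) / (n - 1) / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons of three criteria:
# result relevance vs. ease of use vs. response time.
criteria = [[1, 2, 4],
            [1 / 2, 1, 2],
            [1 / 4, 1 / 2, 1]]
weights, _ = ahp_priorities(criteria)
print([round(w, 3) for w in weights], round(consistency_ratio(criteria), 3))
```

A CR below about 0.1 is conventionally taken as acceptably consistent; this toy matrix is perfectly consistent by construction.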
20.
Shou-Tao Peng 《Journal of The Franklin Institute》2004,341(4):343-360
This paper presents an approach to the modification of a class of Lyapunov-based robust controllers when the input constraint needs to be taken into account. The approach shows advantages in enhancing the input utilization and in retaining the stability and the robustness of the original control. The modification comprises two stages. The first stage is to reshape the original control for satisfying the constraint and preserving the original control direction. The second stage is to apply a structure for enhancing the input utilization and retaining the stability and robustness developed in the first stage. In addition, an estimate of the stabilization region is employed to select the design parameters for the local, semiglobal, and global stabilization.
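The first stage, reshaping the control to satisfy the constraint while keeping its direction, can be illustrated for one common constraint type: a bound on the Euclidean norm of the input vector. Under that assumption (the paper's actual constraint set and its second-stage structure are not reproduced here), scaling the whole vector preserves its direction, whereas clipping each component separately generally does not.

```python
import math


def reshape_control(u, u_max):
    """Scale control vector u so that ||u||_2 <= u_max without changing its direction."""
    norm = math.hypot(*u)
    if norm <= u_max:
        return list(u)
    scale = u_max / norm
    return [scale * component for component in u]


def clip_componentwise(u, u_max):
    """For comparison: clip each component to [-u_max, u_max] (may rotate the vector)."""
    return [max(-u_max, min(u_max, component)) for component in u]


u = [3.0, 4.0]
print(reshape_control(u, 1.0))     # direction of u kept
print(clip_componentwise(u, 1.0))  # [1.0, 1.0]: direction changed
```

Reshaping (3, 4) with u_max = 1 gives approximately (0.6, 0.8), still parallel to the original command, while componentwise clipping yields (1, 1).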