Similar literature
1.
A growing body of studies is developing approaches to evaluating human interaction with Web search engines, including the usability and effectiveness of Web search tools. This study explores a user-centered approach to the evaluation of the Web search engine Inquirus – a Web meta-search tool developed by researchers from the NEC Research Institute. The goal of the study reported in this paper was to develop a user-centered approach to the evaluation including: (1) effectiveness: based on the impact of users' interactions on their information problem and information seeking stage, and (2) usability: including screen layout and system capabilities for users. Twenty-two volunteers searched Inquirus on their own personal information topics. Data analyzed included: (1) user pre- and post-search questionnaires and (2) Inquirus search transaction logs. Key findings include: (1) Inquirus was rated highly by users on various usability measures, (2) all users experienced some level of shift/change in their information problem, information seeking, and personal knowledge due to their Inquirus interaction, (3) different users experienced different levels of change/shift, and (4) the search measure precision did not correlate with other user-based measures. Some users experienced major changes/shifts in various user-based variables, such as information problem or information seeking stage with a search of low precision and vice versa. Implications for the development of user-centered approaches to the evaluation of Web and information retrieval (IR) systems and further research are discussed.

2.
Indexing quality determines whether the information content of an indexed document is accurately represented. Indexing effectiveness measures whether an indexed document is correctly retrieved every time it is relevant to a query. Measurement of these criteria is cumbersome and costly; database producers therefore prefer inter-indexer consistency as a measure of indexing quality or effectiveness. The present article assesses the validity of this substitution in various environments.

3.
A general method is presented to construct ordered similarity measures (OS-measures), i.e., similarity measures for ordered sets of documents (as, e.g., being the result of an IR-process), based on classical, well-known similarity measures for ordinary sets (measures such as Jaccard, Dice, Cosine or overlap measures). To this end, we first present a review of these measures and their relationships. The method given here to construct OS-measures extends the one given by Michel in a previous paper so that it becomes applicable to any pair of ordered sets. Concrete expressions of this method, applied to the classical similarity measures, are given. Some of these measures are then tested in the IR-system Profil-Doc. The engine SPIRIT© extracts ranked document sets in three different contexts, each for 550 requests. The practical usability of the OS-measures is then discussed based on these experiments.
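To make the classical set measures named in this abstract concrete, here is a minimal Python sketch of Jaccard, Dice, cosine and overlap similarity for ordinary (unordered) sets. It does not implement the OS-measures of the paper, whose construction is not given in the abstract. The function names and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
import math


def _as_sets(a, b):
    """Convert both inputs to sets so duplicates and order are ignored."""
    return set(a), set(b)


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; returns 0.0 for two empty sets (assumed convention)."""
    a, b = _as_sets(a, b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|); returns 0.0 for two empty sets."""
    a, b = _as_sets(a, b)
    denom = len(a) + len(b)
    return 2 * len(a & b) / denom if denom else 0.0


def cosine(a, b):
    """|A ∩ B| / sqrt(|A| * |B|); returns 0.0 if either set is empty."""
    a, b = _as_sets(a, b)
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|); returns 0.0 if either set is empty."""
    a, b = _as_sets(a, b)
    denom = min(len(a), len(b))
    return len(a & b) / denom if denom else 0.0
```

For two retrieved document sets such as `{1, 2, 3}` and `{2, 3, 4, 5}`, all four functions share the same intersection size (2) and differ only in the normalising denominator, which is why the measures are so closely related.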

4.
The presentation of search results on the web has been dominated by the textual form of document representation. On the other hand, the document’s visual aspects such as the layout, colour scheme, or presence of images have been studied only in a limited context with regard to their effectiveness in search result presentation. This article presents a comparative evaluation of textual and visual forms of document representation as additional components of document surrogates. A total of 24 people were recruited for our task-based user study. The experimental results suggest that an increased level of document representation available in the search results can facilitate users’ interaction with a search interface. The results also suggest that the two forms of additional representations are likely beneficial to users’ information searching process in different contexts.

5.
Previous studies have repeatedly demonstrated that the relevance of a citing document is related to the number of times the source document is cited. Despite the ease with which electronic documents would permit the incorporation of this information into citation-based document search and retrieval systems, the possibilities of repeated citations remain untapped. Part of this under-utilization may be due to the fact that very little is known regarding the pattern of repeated citations in scholarly literature or how this pattern may vary as a function of journal, academic discipline or self-citation. The current research addresses these unanswered questions in order to facilitate the future incorporation of repeated citation information into document search and retrieval systems. Using data mining of electronic texts, the citation characteristics of nine different journals, covering three different academic fields (economics, computing, and medicine & biology), were characterized. It was found that the frequency (f) with which a reference is cited N or more times within a document is consistent across the sampled journals and academic fields. Self-citation causes an increase in frequency, and this effect becomes more pronounced for large N. The objectivity, automatability, and insensitivity of repeated citations to journal and discipline present powerful opportunities for improving citation-based document search.
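The quantity f described in this abstract can be illustrated with a short Python sketch. It assumes, since the abstract does not define it precisely, that f(N) is the fraction of distinct references in one document that are cited N or more times in its text. The function name and the input format (a flat list of reference keys, one per in-text citation) are hypothetical.

```python
from collections import Counter


def repeat_citation_frequency(citations, max_n):
    """Return {N: fraction of distinct references cited at least N times}.

    `citations` is a list of reference keys, one entry per in-text citation
    (assumed input format). An empty document yields an empty dict.
    """
    counts = Counter(citations)  # reference key -> number of in-text citations
    total = len(counts)          # number of distinct references
    if total == 0:
        return {}
    return {
        n: sum(1 for c in counts.values() if c >= n) / total
        for n in range(1, max_n + 1)
    }
```

By construction f(1) is 1.0 for any non-empty document, and f(N) can only stay level or fall as N grows, so curves from different journals can be compared on the same axes.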

6.
Methods for document clustering and topic modelling in online social networks (OSNs) offer a means of categorising, annotating and making sense of large volumes of user generated content. Many techniques have been developed over the years, ranging from text mining and clustering methods to latent topic models and neural embedding approaches. However, many of these methods deliver poor results when applied to OSN data as such text is notoriously short and noisy, and often results are not comparable across studies. In this study we evaluate several techniques for document clustering and topic modelling on three datasets from Twitter and Reddit. We benchmark four different feature representations derived from term-frequency inverse-document-frequency (tf-idf) matrices and word embedding models combined with four clustering methods, and we include a Latent Dirichlet Allocation topic model for comparison. Several different evaluation measures are used in the literature, so we provide a discussion and recommendation for the most appropriate extrinsic measures for this task. We also demonstrate the performance of the methods over data sets with different document lengths. Our results show that clustering techniques applied to neural embedding feature representations delivered the best performance over all data sets using appropriate extrinsic evaluation measures. We also demonstrate a method for interpreting the clusters with a top-words based approach using tf-idf weights combined with embedding distance measures.
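As a reminder of what a tf-idf feature representation contains, the following Python sketch computes one common variant: term frequency as the within-document relative count, and inverse document frequency as log(N / df). The abstract does not say which variant the study used, so the weighting scheme, the function name and the pre-tokenised input format are all assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute tf-idf weights for a list of tokenised documents.

    `docs` is a list of token lists. Returns one {term: weight} dict per
    document, using tf = count / document length and idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this unsmoothed idf, a term that occurs in every document gets weight 0, which partly explains why very short, noisy posts yield sparse and weak tf-idf vectors.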

7.
In information retrieval, cluster-based retrieval is a well-known attempt to resolve the problem of term mismatch. Clustering requires similarity information between the documents, which is difficult to calculate in a feasible time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical viewpoint has not been fully explored. In this regard, we provide a conceptual viewpoint of adaptive document clustering based on query-based similarities, by regarding the user’s query as a concept. As a result, the adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework, and evaluate them in the context of cluster-based retrieval, comparing with K-means clustering and full document expansion. Evaluation results show that retrievals based on query-based similarities significantly improve on the baseline, while being comparable to other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.

8.
Multimedia objects can be retrieved using their context, which can be, for instance, the text surrounding them in documents. This text may be either near or far from the searched objects. Our goal in this paper is to study the impact, in terms of effectiveness, of text position relative to searched objects. The multimedia objects we consider are described in structured documents such as XML ones. The document structure is therefore exploited to provide this text position in documents. Although structural information has been shown to be an effective source of evidence in textual information retrieval, only a few works have investigated its interest in multimedia retrieval. More precisely, the task of interest in this paper is to retrieve multimedia fragments (i.e. XML elements having at least one multimedia object). Our general approach is built on two steps: we first retrieve XML elements containing multimedia objects, and we then explore the surrounding information to retrieve relevant multimedia fragments. In both cases, we study the impact of the surrounding information using the document structure.

9.
Information systems for un-regimented domains such as museums, art and book collections, face representational and usability challenges that surpass the demands of traditional information systems for regimented domains. While the former require complex conceptual models supporting a set of dynamic and evolving qualitative properties of a small number of objects, the latter focus on the quantitative aspects of a possibly very large number of objects but with a relatively small and stable set of properties. In this paper we study the use of a non-monotonic knowledge-base system for the development of information systems for un-regimented domains. We discuss the ontological assumptions of the formalism, its structure and its inferential mechanisms through a simple example. Then we present an information system for a highly un-regimented domain in the digital humanities with promising results. The present study shows that the so-called extensible, flexible, dynamic or evolving information systems need the expressive power of non-monotonic knowledge-base systems, and that such phenomena should be addressed explicitly.

10.
This paper examines whether there are significant differences in private R&D investment performance between the EU and the US and, if so, why. The study is based on data from the 2008 EU Industrial R&D Investment Scoreboard. The investigation assesses the effects of three very distinct factors that can determine the relative size of the overall R&D intensities of the two economies: these are the influence of sector composition (structural effect) vis-à-vis the intensity of R&D in each sector (intrinsic effect) and company demographics. The paper finds that the lower overall corporate R&D intensity for the EU is the result of sector specialisation (structural effect) - the US has a stronger sectoral specialisation in the high R&D intensity (especially ICT-related) sectors than the EU does, and also has a much larger population of R&D investing firms within these sectors. Since aggregate R&D indicators are so closely dependent on industrial structures, many of the debates and claims about differences in comparative R&D performance are in effect about industrial structure rather than sectoral R&D performance. These have complex policy implications that are discussed in the closing section.
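The distinction between a structural and an intrinsic effect can be sketched with a simple shift-share style decomposition in Python. This is one common convention, not necessarily the paper's own method. It assumes that R&D intensity means R&D divided by sales, that the aggregate intensity is the sales-share-weighted sum of sector intensities, and that economy B's sector intensities weight the structural term. The function name and input format are hypothetical, and the company-demographics factor is not modelled.

```python
def decompose_intensity_gap(econ_a, econ_b):
    """Split the aggregate R&D intensity gap (A minus B) into two parts.

    Each argument maps sector -> (rd_spend, sales). Returns a tuple
    (structural, intrinsic, total) where
      structural = sum over sectors of (share_a - share_b) * intensity_b
      intrinsic  = sum over sectors of share_a * (intensity_a - intensity_b)
    and total = structural + intrinsic equals the aggregate gap.
    Both economies must have positive total sales.
    """
    sales_a = sum(sales for _, sales in econ_a.values())
    sales_b = sum(sales for _, sales in econ_b.values())

    structural = 0.0
    intrinsic = 0.0
    for sector in set(econ_a) | set(econ_b):
        rd_a, s_a = econ_a.get(sector, (0.0, 0.0))
        rd_b, s_b = econ_b.get(sector, (0.0, 0.0))
        share_a = s_a / sales_a
        share_b = s_b / sales_b
        # A sector absent from one economy contributes zero intensity there.
        intensity_a = rd_a / s_a if s_a else 0.0
        intensity_b = rd_b / s_b if s_b else 0.0
        structural += (share_a - share_b) * intensity_b
        intrinsic += share_a * (intensity_a - intensity_b)
    return structural, intrinsic, structural + intrinsic
```

If two economies have identical intensities in every sector but one has a larger share of sales in a high-intensity sector, the whole gap lands in the structural term and the intrinsic term is zero, which mirrors the kind of finding the abstract reports.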

11.
This paper presents a Foreign-Language Search Assistant that uses noun phrases as fundamental units for document translation and query formulation, translation and refinement. The system (a) supports the foreign-language document selection task by providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information displayed in the cross-language document summaries. Our results challenge two implicit assumptions in most cross-language Information Retrieval research: first, that once documents in the target language are found, Machine Translation is the optimal way of informing the user about their contents; and second, that in an interactive setting the optimal way of formulating and refining the query is helping the user to choose appropriate translations for the query terms.

12.
Research Policy, 2022, 51(8): 104170
Knowledge creation is widely considered the central driver for innovation, and accordingly, for creating competitive advantage. However, most measurement approaches have so far mainly focused on the quantitative dimension of knowledge creation, neglecting that not all knowledge has the same value (Balland and Rigby, 2017). The notion of knowledge complexity has come into use in this context just recently as an attempt to measure the quality of knowledge in terms of its uniqueness and its replicability. The central underlying assumption is that more complex knowledge is more difficult to replicate, and therefore provides a higher competitive advantage for firms, or at an aggregated level, regions and countries. The objective of this study is to advance and apply measures for regional knowledge complexity to a set of European regions, and to highlight its potential in a regional policy context. This is done by, first, characterising the spatial distribution of complex knowledge in Europe and its dynamics in recent years, second, establishing that knowledge complexity is associated with future regional economic growth, and third, illustrating the usefulness of the measures by means of some policy relevant example applications. We proxy the production of complex knowledge with a regional knowledge complexity index (KCI) that is based on regional patent data of European metropolitan regions from current EU and EFTA member countries. The results are promising as the regional KCI unveils knowledge creation patterns not observed by conventional measures. Moreover, regional complexity measures can be easily combined with relatedness metrics to support policy makers in a smart specialisation context.

13.
Learning low dimensional dense representations of the vocabularies of a corpus, known as neural embeddings, has gained much attention in the information retrieval community. While there have been several successful attempts at integrating embeddings within the ad hoc document retrieval task, no systematic study has yet been reported that explores the various aspects of neural embeddings and how they impact retrieval performance. In this paper, we perform a methodical study on how neural embeddings influence the ad hoc document retrieval task. More specifically, we systematically explore the following research questions: (i) do methods solely based on neural embeddings perform competitively with state of the art retrieval methods with and without interpolation? (ii) is there any statistically significant difference between the performance of retrieval models when based on word embeddings compared to when knowledge graph entity embeddings are used? and (iii) is there a significant difference between using locally trained neural embeddings compared to when globally trained neural embeddings are used? We examine these three research questions across both hard and all queries. Our study finds that word embeddings do not show competitive performance to any of the baselines. In contrast, entity embeddings show competitive performance to the baselines and when interpolated, outperform the best baselines for both hard and soft queries.

14.
Privacy-preserving collaborative filtering is an emerging web-adaptation tool to cope with the information overload problem without jeopardizing individuals’ privacy. However, collaborative filtering with privacy schemes commonly suffers from scalability and sparseness problems as the content in the domain proliferates. Moreover, applying privacy measures causes a distortion in collected data, which in turn degrades the accuracy of such systems. In this work, we propose a novel privacy-preserving collaborative filtering scheme based on bisecting k-means clustering in which we apply two preprocessing methods. The first preprocessing scheme deals with the scalability problem by constructing a binary decision tree through a bisecting k-means clustering approach, while the second produces clones of users by inserting pseudo-self-predictions into original user profiles to boost the accuracy of the scalability-enhanced structure. The sparse nature of collections is handled by transforming ratings into item features-based profiles. After analyzing our scheme with respect to privacy and supplementary costs, we perform experiments on benchmark data sets to evaluate it in terms of accuracy and online performance. Our empirical outcomes verify that the combined effects of the proposed preprocessing schemes relieve scalability problems and augment accuracy significantly.

15.
16.
The large amount of information available and the difficulty of processing it have made knowledge management a promising area of research. Several topics are related to it, for example distributed and intelligent information retrieval, information filtering and information evaluation, which have become crucial. In this paper, we focus our attention on the knowledge evaluation problem. With the aim of evaluating information coded in the standard non-proprietary format SGML (and also in XML), we propose some evaluation methods based on L-grammars, which are fuzzy grammars. In particular we apply these methods to the evaluation of documents in SGML-format and to the evaluation of HTML-pages in the World Wide Web. L-grammars generate recursively enumerable L-languages, as proved in Gerla (1991, Information Sciences 53), and so they can be used to generate fuzzy languages based on extensions of the document type definitions (DTD) involved by SGML. Given a DTD, we extend its associated language by adding a judgement label. By selecting a particular label and by taking the start symbol of the grammar associated to the DTD, we can generate any DTD-compliant document with a fuzzy degree of membership derived from the judgement label. In this way we fit the computational model underlying the recursively enumerable L-languages to the process of collecting different evaluations of the same document. Finally, we outline how the generalization of these methods of evaluation can be applied in different contexts and for different roles, for example for information filtering.

17.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where relevant documents to a given query might not be retrieved simply due to the use of different terminology for describing the same concepts. As such, semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not lead to improved retrieval performance over keyword-based search, its consideration enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work has proposed to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type, but these approaches face limitations such as increased query processing time. In this paper, we propose to use neural embeddings-based representations of term, semantic entity, semantic type and documents within the same embedding space to facilitate the development of a unified search index that would consist of these four information types. We perform experiments on standard and widely used document collections including Clueweb09-B and Robust04 to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives. Based on our experiments, we find that when neural embeddings are used to build inverted indices, hence relaxing the requirement to explicitly observe the posting list key in the indexed document: (a) retrieval efficiency increases compared to a standard inverted index, reducing the index size and query processing time, and (b) while retrieval efficiency, which is the main objective of an efficient indexing mechanism, improves using our proposed method, retrieval effectiveness also retains competitive performance compared to the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.

18.
Recent developments have shown that entity-based models that rely on information from the knowledge graph can improve document retrieval performance. However, given the non-transitive nature of relatedness between entities on the knowledge graph, the use of semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and empirically show that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) We introduce a graphical probability model that captures dependencies between entities within the query and documents. (2) We propose an unsupervised entity selection method based on the graphical model for query entity expansion and then for ad hoc retrieval. (3) We thoroughly evaluate our method and compare it with the state-of-the-art keyword and entity based retrieval methods. We demonstrate that the proposed retrieval model shows improved performance over all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on the reported evaluation metrics. We also show that the proposed method is most effective on the difficult queries. In addition, we compare our proposed entity selection with a state-of-the-art entity selection technique within the context of ad hoc retrieval using a basic query expansion method and illustrate that it provides more effective retrieval for all expansion weights and different numbers of expansion entities.

19.
Conventional information retrieval technology (i.e. VSM) faces many difficulties when implemented in complex P2P systems because of the lack of global statistical information (e.g. IDF) and central services. In this paper, we suggest a novel query optimization scheme (Semantic Dual Query Expansion, SDQE) that makes full use of the context information supplied by the local document collection. Latent Semantic Indexing (LSI) is used to explore the local context information. By comparing the different local context information hidden in different document collections, it is possible to solve the synonymy–polysemy problem in VSM. The experiments prove that our scheme is effective in improving the retrieval performance in P2P systems without knowing the global statistical information.

20.
Integrating useful input information is essential to provide efficient recommendations to users. In this work, we focus on improving item rating prediction by merging both multiple-context and multiple-criteria based research directions, which were addressed separately in most existing literature. Throughout this article, Criteria refer to the item attributes, while Context denotes the circumstances in which the user uses an item. Our goal is to capture more fine-grained preferences to improve item recommendation quality using users’ multiple criteria ratings under specific contextual situations. Therefore, we examine the recommenders’ data from the graph theory based perspective by representing three types of entities (users, contextual situations and criteria) as well as their relationships as a tripartite graph. Upon the assumption that contextually similar users tend to have similar interests for similar item criteria, we perform a high-order co-clustering on the tripartite graph for simultaneously partitioning the graph entities representing users in similar contextual situations and their evaluated item criteria. To predict cluster-based multi-criteria ratings, we introduce an improved rating prediction method that considers the dependency between users and their contextual situations, and also takes into account the correlation between criteria in the prediction process. The predicted multi-criteria ratings are finally aggregated into a single representative output corresponding to an overall item rating. To guide our investigation, we create a research hypothesis to provide insights about the tripartite graph partitioning and design clear and justified preliminary experiments including quantitative and qualitative analyses to validate it. Further thorough experiments on the two available context-aware multi-criteria datasets, TripAdvisor and Educational, demonstrate that our proposal exhibits substantial improvements over alternative recommendation approaches.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号