Similar literature
1.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where documents relevant to a given query might not be retrieved simply because they use different terminology to describe the same concepts. Semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while considering semantic information alone might not improve retrieval performance over keyword-based search, it enables the retrieval of relevant documents that keyword-based methods cannot retrieve. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work has proposed building either one unified index encompassing both textual and semantic information or separate yet integrated indices for each information type, but these approaches face limitations such as increased query processing time. In this paper, we propose to use neural embedding-based representations of terms, semantic entities, semantic types, and documents within the same embedding space to facilitate the development of a unified search index that consists of these four information types. We perform experiments on standard and widely used document collections, including ClueWeb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives.
Based on our experiments, we find that when neural embeddings are used to build inverted indices, thereby relaxing the requirement to explicitly observe the posting list key in the indexed document: (a) retrieval efficiency increases compared to a standard inverted index, with reduced index size and query processing time, and (b) while retrieval efficiency, which is the main objective of an efficient indexing mechanism, improves with our proposed method, retrieval effectiveness remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.
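
The idea of filling posting lists from a shared embedding space, rather than from literal occurrences of the key in a document, can be sketched roughly as follows. This is a toy illustration and not the authors' implementation: the cosine scoring, the `top_k` cut-off, the function names, and the two-dimensional vectors are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_index(key_vectors, doc_vectors, top_k=2):
    """Map each key (term, entity, or type) to its top_k nearest documents.

    A document can appear in a posting list even if the key never occurs
    in it, because membership is decided by embedding similarity alone.
    """
    index = {}
    for key, key_vec in key_vectors.items():
        scored = [(cosine(key_vec, doc_vec), doc_id)
                  for doc_id, doc_vec in doc_vectors.items()]
        # Highest similarity first; ties broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        index[key] = [(doc_id, score) for score, doc_id in scored[:top_k]]
    return index


def search(index, query_keys):
    """Rank documents by the sum of their posting-list scores over the query keys."""
    totals = {}
    for key in query_keys:
        for doc_id, score in index.get(key, []):
            totals[doc_id] = totals.get(doc_id, 0.0) + score
    return sorted(totals, key=lambda doc_id: (-totals[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical vectors: terms, entities, and documents share one space.
    keys = {"car": (1.0, 0.0), "entity:Automobile": (0.9, 0.1), "fruit": (0.0, 1.0)}
    docs = {"d1": (0.95, 0.05), "d2": (0.1, 0.9), "d3": (0.6, 0.4)}
    idx = build_index(keys, docs, top_k=2)
    print(search(idx, ["car", "entity:Automobile"]))
```

Capping each posting list at `top_k` entries is one way such an index could end up smaller than a conventional inverted index. The abstract does not say how the posting lists are bounded in practice.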

2.
The importance of query performance prediction has been widely acknowledged in the literature, especially for query expansion, refinement, and interpolating different retrieval approaches. This paper proposes a novel semantics-based query performance prediction approach based on estimating semantic similarities between queries and documents. We introduce three post-retrieval predictors, namely (1) semantic distinction, (2) semantic query drift, and (3) semantic cohesion, which are based, respectively, on (1) the semantic similarity of a query to the top-ranked documents compared to the whole collection, (2) the estimation of non-query related aspects of the retrieved documents using semantic measures, and (3) the semantic cohesion of the retrieved documents. We assume that queries and documents are modeled as sets of entities from a knowledge graph, e.g., DBpedia concepts, instead of bags of words. With this assumption, semantic similarities between two texts are measured based on the relatedness between entities, which is learned from the contextual information represented in the knowledge graph. We empirically illustrate these predictors’ effectiveness, especially when term-based measures fail to quantify query performance prediction hypotheses correctly. We report our findings on the proposed predictors’ performance and their interpolation on three standard collections, namely ClueWeb09-B, ClueWeb12-B, and Robust04. We show that the proposed predictors are effective across different datasets in terms of Pearson and Kendall correlation coefficients between the predicted performance and the average precision measured by relevance judgments.
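
The evaluation protocol named at the end of this abstract, correlating predicted performance with per-query average precision, can be sketched as below. The predictor values and average precision figures are made-up placeholders, and the Kendall variant shown is tau-a, which does not correct for ties. Both are assumptions for illustration only.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical values: one predictor score and one average precision per query.
    predicted = [0.42, 0.10, 0.77, 0.35]
    average_precision = [0.30, 0.05, 0.61, 0.40]
    print(pearson(predicted, average_precision))
    print(kendall_tau_a(predicted, average_precision))
```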

3.
The study of query performance prediction (QPP) in information retrieval (IR) aims to predict retrieval effectiveness. The specificity of the underlying information need of a query often determines how effectively a search engine can retrieve relevant documents at top ranks. The presence of ambiguous terms makes a query less specific to the sought information need, which in turn may degrade IR effectiveness. In this paper, we propose a novel word embedding-based pre-retrieval feature which measures the ambiguity of each query term by estimating how many ‘senses’ each word is associated with. Assuming each sense roughly corresponds to a Gaussian mixture component, our proposed generative model first estimates a Gaussian mixture model (GMM) from the word vectors that are most similar to the given query terms. We then use the posterior probabilities of generating the query terms themselves from this estimated GMM to quantify the ambiguity of the query. Previous studies have shown that post-retrieval QPP approaches often outperform pre-retrieval ones because they use additional information from the top-ranked documents. To achieve the best of both worlds, we formalize a linear combination of our proposed GMM-based pre-retrieval predictor with NQC, a state-of-the-art post-retrieval QPP method. Our experiments on the TREC benchmark news and web collections demonstrate that our proposed hybrid QPP approach (in linear combination with NQC) significantly outperforms a range of other existing pre-retrieval approaches that are likewise combined with NQC and used as baselines.
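
The linear combination described above can be sketched as follows. NQC is computed here as it is commonly defined, the standard deviation of the top-ranked retrieval scores divided by the magnitude of the collection's score. The pre-retrieval value is passed in as a plain number because the GMM-based estimate itself is not reproduced. The weight `alpha` and all names are assumptions, not the paper's notation.

```python
import math


def nqc(top_scores, collection_score):
    """Normalized query commitment: population std of top scores over |collection score|."""
    mean = sum(top_scores) / len(top_scores)
    variance = sum((s - mean) ** 2 for s in top_scores) / len(top_scores)
    return math.sqrt(variance) / abs(collection_score)


def hybrid_predictor(pre_retrieval, post_retrieval, alpha=0.5):
    """Linear interpolation of a pre-retrieval and a post-retrieval predictor."""
    return alpha * pre_retrieval + (1.0 - alpha) * post_retrieval


if __name__ == "__main__":
    # Hypothetical retrieval scores for the top documents of one query.
    post = nqc([12.1, 11.4, 9.8, 9.7], collection_score=6.5)
    # 0.3 stands in for the GMM-based ambiguity estimate, which is not computed here.
    print(hybrid_predictor(0.3, post, alpha=0.4))
```

In practice the two predictors would likely need to be put on a comparable scale before interpolation, and `alpha` would be tuned on held-out queries. The abstract does not specify either step.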

4.
The quality of feedback documents is crucial to the effectiveness of query expansion (QE) in ad hoc retrieval. Recently, machine learning methods have been adopted to tackle this issue by training classifiers from feedback documents. However, the lack of proper training data has prevented these methods from selecting good feedback documents. In this paper, we propose a new method, called AdapCOT, which applies co-training in an adaptive manner to select feedback documents for boosting QE’s effectiveness. Co-training is an effective technique for classification over limited training data, which is particularly suitable for selecting feedback documents. The proposed AdapCOT method makes use of a small set of training documents, and labels the feedback documents according to their quality through an iterative process. Two exclusive sets of term-based features are selected to train the classifiers. Finally, QE is performed on the labeled positive documents. Our extensive experiments show that the proposed method improves QE’s effectiveness, and outperforms strong baselines on various standard TREC collections.

5.
In this paper, we aim to improve query expansion for ad hoc retrieval by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold. First, we propose a novel query expansion mechanism on fields that combines the field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text REtrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.

6.
This paper examines the factors affecting the performance of global query expansion based on term co-occurrence data and suggests a way to maximize retrieval effectiveness. The major parameters to be optimized through experiments are the term similarity measure and the weighting scheme of additional terms. The evaluation of four similarity measures tested in query expansion reveals that mutual information and Yule's Y, which emphasize low-frequency terms, achieve better performance than the cosine and Jaccard coefficients, which have the reverse tendency. In the evaluation of three weighting schemes, similarity weight performs well only with short queries, whereas fixed weights of approximately 0.5 and similarity rank weights are effective with queries of any length. Furthermore, the optimal similarity rank weight achieving the best overall performance seems to be the least affected by test collections and the number of additional terms. For the efficiency of retrieval, the number of additional terms need not exceed 70 in our test collections, but the optimal number may vary according to the characteristics of the similarity measure employed.
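
The four term similarity measures compared above can all be computed from a 2x2 document co-occurrence table. The sketch below uses textbook formulations. The abstract does not state which variant of mutual information the paper uses, so the pointwise form here is an assumption, as are the function and parameter names.

```python
import math


def similarity_measures(both, only_x, only_y, neither):
    """Similarity of terms x and y from document counts.

    both    -- documents containing x and y
    only_x  -- documents containing x but not y
    only_y  -- documents containing y but not x
    neither -- documents containing neither term
    Assumes each term occurs in at least one document.
    """
    total = both + only_x + only_y + neither
    freq_x, freq_y = both + only_x, both + only_y

    cosine = both / math.sqrt(freq_x * freq_y)
    jaccard = both / (both + only_x + only_y)
    # Pointwise mutual information in bits; -inf when the terms never co-occur.
    pmi = math.log2(both * total / (freq_x * freq_y)) if both else float("-inf")
    # Yule's Y (coefficient of colligation), built from the odds ratio.
    agree = math.sqrt(both * neither)
    disagree = math.sqrt(only_x * only_y)
    yule_y = (agree - disagree) / (agree + disagree) if agree + disagree else 0.0

    return {"cosine": cosine, "jaccard": jaccard, "pmi": pmi, "yule_y": yule_y}


if __name__ == "__main__":
    # Two term pairs that always co-occur: one rare, one frequent.
    print(similarity_measures(both=2, only_x=0, only_y=0, neither=98))
    print(similarity_measures(both=50, only_x=0, only_y=0, neither=50))
```

In the two calls above, cosine and Jaccard both equal 1 for the rare and the frequent pair, while pointwise mutual information is higher for the rare pair. This is one way to see the low-frequency emphasis the abstract attributes to mutual information.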

7.
Social media platforms allow users to express their opinions on various topics online. Often, users’ opinions are not static but change over time under the influence of their neighbors in social networks, or are updated in response to arguments that undermine their beliefs. In this paper, we propose to use a Recurrent Neural Network (RNN) to model each user’s posting behaviors on Twitter and to incorporate their neighbors’ topic-associated context through an attention mechanism for user-level stance prediction. Moreover, our proposed model operates in an online setting in that its parameters are continuously updated with the Twitter stream data, and it can be used to predict a user’s topic-dependent stance. Detailed evaluation on two Twitter datasets, related to Brexit and the US General Election, demonstrates the superior performance of our neural opinion dynamics model over both static and dynamic alternatives for user-level stance prediction.

8.
Since previous studies in cognitive psychology show that individuals’ affective states can help analyze and predict their future behaviors, researchers have explored emotion mining for predicting online activities, firm profitability, and so on. Existing emotion mining methods are divided into two categories: feature-based approaches that rely on handcrafted annotations and deep learning-based methods that thrive on computational resources and big data. However, neither category can effectively detect emotional expressions captured in text (e.g., social media postings). In addition, the utilization of these methods in downstream explanatory and predictive applications is also rare. To fill the aforementioned research gaps, we develop a novel deep learning-based emotion detector named DeepEmotionNet that can simultaneously leverage contextual, syntactic, semantic, and document-level features and lexicon-based linguistic knowledge to bootstrap the overall emotion detection performance. Based on three emotion detection benchmark corpora, our experimental results confirm that DeepEmotionNet outperforms state-of-the-art baseline methods by 4.9% to 29.8% in macro-averaged F-score. For the downstream application of DeepEmotionNet to a real-world financial application, our econometric analysis highlights that top executives’ emotions of fear and anger embedded in their social media postings are significantly associated with corporate financial performance. Furthermore, these two emotions can significantly improve the predictive power of corporate financial performance when compared to sentiments. To the best of our knowledge, this is the first study to develop a deep learning-based emotion detection method and successfully apply it to enhance corporate performance prediction.

9.
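How to efficiently implement operations such as storing, querying, and updating XML data is an important area of native XML database management technology. Based on DTD and XPath tree models, this paper studies the optimization of XPath query expressions themselves. Through variable binding, the XPath query tree is reconstructed to reduce query complexity. Finally, the invalid parts of an XPath expression are eliminated using the DTD schema tree, and a new, valid XPath tree model is constructed.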

10.
This paper proposes a novel query expansion method to improve the accuracy of text retrieval systems. Our method uses minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. This structure is obtained from the relevance feedback through a word-pair selection method based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.

11.
Searching a large document collection for relevant material that satisfies a user’s information need is a critical activity for web search engines. Query expansion (QE) techniques are widely used by search engines to disambiguate the user’s information need and to improve information retrieval (IR) performance. Knowledge-based, corpus-based, and relevance feedback techniques are the main QE techniques; they employ different approaches to expand the user query with synonyms of the search terms (word synonymy) in order to retrieve more relevant documents, and to filter out documents that contain the search terms but with a different meaning than the user intended (the word polysemy problem). This work surveys existing query expansion techniques, highlights their strengths and limitations, and introduces a new method that combines the power of knowledge-based or corpus-based techniques with that of relevance feedback. Experimental evaluation on three information retrieval benchmark datasets shows that applying knowledge-based or corpus-based query expansion techniques to the results of the relevance feedback step improves information retrieval performance, with knowledge-based techniques providing significantly better results than their simple relevance feedback alternatives on all datasets.

12.
13.
14.
Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5), 312–318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283–289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21; Yu, C., & Salton, G. (1976). Precision weighting – an effective automatic indexing method. Journal of the Association for Computing Machinery (ACM), 23(1), 76–88], shallow syntactic fragments of high frequency generally correspond to lexical fragments of high content [Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In Proceedings of the international committee on computational linguistics and the association for computational linguistics (COLING/ACL 2006), Sydney, Australia]. We apply this finding to information retrieval as follows. We present a novel automatic query reformulation technique, which is based on shallow syntactic evidence induced from various language samples and is used to enhance the performance of an information retrieval system. Firstly, we draw shallow syntactic evidence from language samples of varying size, and compare the effect of language sample size upon retrieval performance when using our syntactically-based query reformulation (SQR) technique. Secondly, we compare SQR to a state-of-the-art probabilistic pseudo-relevance feedback technique. Additionally, we combine both techniques and evaluate their compatibility. We evaluate our proposed technique across two standard Text REtrieval Conference (TREC) English test collections, and three statistically different weighting models.
Experimental results suggest that SQR markedly enhances retrieval performance, and is at least comparable to pseudo-relevance feedback. Notably, the combination of SQR and pseudo-relevance feedback further enhances retrieval performance considerably. These collective experimental results confirm the tenet that high-frequency shallow syntactic fragments correspond to content-bearing lexical fragments.

15.
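This study explores the effects of human capital with different degrees of specificity on radical and incremental innovation capabilities, the effects of these two technological innovation capabilities on new product development performance, and the moderating role of human capital specificity in the relationship between innovation capability and performance. Using a sample of 157 Chinese firms, with firm size, R&D investment, environmental uncertainty, demand uncertainty, and competitive intensity as control variables, a theoretical model is constructed and tested. The results show that the more specific the human capital, the stronger the incremental, rather than radical, product innovation capability. However, weakly specific human capital does not negatively affect the impact of radical product innovation capability on new product performance; on the contrary, for incremental product innovation capability, weakly specific human capital benefits new product performance, whereas strongly specific human capital harms it.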

16.
Taxonomies enable organising information in a human–machine understandable form, but constructing them for reuse and maintainability remains difficult. The paper presents a formal underpinning to provide quality metrics for a taxonomy under development. It proposes a methodology for semi-automatic building of maintainable taxonomies and outlines key features of the knowledge engineering context where the metrics and methodology are most suitable. The strength of the approach is that it is applied during the actual construction of the taxonomy. Users provide terms to describe different domain elements, as well as their attributes, and the methodology uses metrics to assess the quality of this input. Changes according to given quality constraints are then proposed during the actual development of the taxonomy.

17.
18.
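This paper presents a computational model for the multi-join query optimization problem, analyzes the basic principles of the immune genetic algorithm, and proposes applying the immune genetic algorithm to multi-join query optimization. Tailored to the specific characteristics of the problem, the design of the immune genetic algorithm is given, including the affinity and fitness functions; the selection operator based on antibody concentration, the crossover operator, and the mutation operator; and the immune operator.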

19.
Modeling the temporal context efficiently and effectively is essential to provide useful recommendations to users. In this work, we focus on improving neighborhood-based approaches by integrating three different mechanisms to exploit temporal information. First, we present an improved version of a similarity metric between users that uses a temporal decay function. Second, we propose an adaptation of the Longest Common Subsequence algorithm to be used as a time-aware similarity metric. Third, we redefine the neighborhood-based recommenders to be interpreted as ranking fusion techniques, where the neighbor interaction sequence can be exploited by considering the last common interaction between the neighbor and the user. We demonstrate the effectiveness of these approaches by comparing them with other state-of-the-art recommender systems such as Matrix Factorization, Neural Networks, and Markov Chains under two realistic time-aware evaluation methodologies (per user and community-based). We use several evaluation metrics to measure both the quality of the recommendations (in terms of ranking relevance) and their temporal novelty or freshness. According to the obtained results, our proposals are highly competitive and obtain better results than the rest of the analyzed algorithms, producing improvements under the two evaluation dimensions tested consistently across three real-world datasets.
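
Two of the mechanisms mentioned above, a temporal decay weight and a Longest Common Subsequence similarity over time-ordered interaction sequences, can be sketched as below. The half-life form of the decay and the squared-length normalization of the LCS score are assumptions chosen for illustration. The abstract does not give the exact formulas.

```python
def decay_weight(age_days, half_life_days=30.0):
    """Exponential decay: an interaction loses half its weight every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence, using one row of the DP table at a time."""
    previous = [0] * (len(seq_b) + 1)
    for item_a in seq_a:
        current = [0]
        for j, item_b in enumerate(seq_b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_similarity(seq_a, seq_b):
    """LCS length squared over the product of sequence lengths, giving a value in [0, 1]."""
    if not seq_a or not seq_b:
        return 0.0
    return lcs_length(seq_a, seq_b) ** 2 / (len(seq_a) * len(seq_b))


if __name__ == "__main__":
    # Hypothetical item ids, each list ordered by interaction time (oldest first).
    user = ["i1", "i2", "i3", "i4"]
    neighbor = ["i2", "i4", "i1"]
    print(lcs_similarity(user, neighbor))
    print(decay_weight(age_days=60))
```

Because the LCS respects order, two users who consumed the same items in a different sequence score lower than two who consumed them in the same order, which is what makes the metric time-aware.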

20.
This paper presents a novel query expansion method, integrated into a graph-based algorithm for query-focused multi-document summarization, to address the limited information in the original query. Our approach uses both sentence-to-sentence and sentence-to-word relations to select query-biased informative words from the document set and uses them as query expansions to improve the sentence ranking result. Compared to previous query expansion approaches, our approach can capture more relevant information with less noise. We performed experiments on data from the Document Understanding Conference (DUC) 2005 and DUC 2006, and the evaluation results show that the proposed query expansion method can significantly improve system performance and make our system comparable to state-of-the-art systems.
