Similar Documents
20 similar documents found (search time: 15 ms).
1.
This paper examines the meaning of context in relation to ontology-based query expansion and reviews query expansion approaches, including relevance feedback, corpus-dependent knowledge models and corpus-independent knowledge models. Case studies detailing query expansion using domain-specific and domain-independent ontologies are also included. The penultimate section synthesises the information obtained from the review and identifies success factors for using an ontology for query expansion. Finally, further research on applying context from an ontology to query expansion within a newswire domain is outlined.

2.
In this paper, we aim to improve query expansion for ad-hoc retrieval by proposing a more fine-grained term reweighting process. This process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold: first, we propose a novel query expansion mechanism on fields that combines the field evidence available in a corpus; second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. The experimental results show a statistically significant improvement over the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.
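As a rough illustration of the field-based idea described in this abstract, expansion terms can be scored by summing field-weighted occurrences over the top-ranked documents. The field weights and the simple frequency-based scoring below are illustrative assumptions, not the paper's actual term-weighting model or its adaptive collection selection:

```python
from collections import Counter

# Illustrative field weights (assumed values, not the paper's tuned parameters)
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}

def field_based_expansion(pseudo_relevant_docs, query_terms, k=10):
    """Score candidate expansion terms by summing field-weighted term frequencies
    over the top-ranked (pseudo-relevant) documents.

    pseudo_relevant_docs -- list of dicts mapping a field name to a list of tokens
    """
    scores = Counter()
    for doc in pseudo_relevant_docs:
        for field, tokens in doc.items():
            w = FIELD_WEIGHTS.get(field, 1.0)
            for t in tokens:
                if t not in query_terms:
                    scores[t] += w
    return [t for t, _ in scores.most_common(k)]

# Toy example: one pseudo-relevant document with three fields
docs = [{"title": ["query", "expansion"], "anchor": ["retrieval"],
         "body": ["retrieval", "terms", "fields", "terms"]}]
print(field_based_expansion(docs, {"query"}, k=3))
```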

3.
Searching for relevant material that satisfies a user's information need within a large document collection is a critical activity for web search engines. Query Expansion (QE) techniques are widely used by search engines to disambiguate the user's information need and to improve information retrieval (IR) performance. Knowledge-based, corpus-based and relevance-feedback methods are the main QE techniques; they employ different approaches for expanding the user query with synonyms of the search terms (word synonymy) in order to retrieve more relevant documents, and for filtering out documents that contain the search terms but with a meaning different from the one the user intended (the word polysemy problem). This work surveys existing query expansion techniques, highlights their strengths and limitations, and introduces a new method that combines the power of knowledge-based or corpus-based techniques with that of relevance feedback. Experimental evaluation on three information retrieval benchmark datasets shows that applying knowledge-based or corpus-based query expansion techniques to the results of the relevance feedback step improves retrieval performance, with knowledge-based techniques providing significantly better results than the simple relevance feedback alternatives on all sets.

4.
This paper proposes a novel query expansion method to improve the accuracy of text retrieval systems. Our method makes use of minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. This structure is obtained from the feedback documents through a word-pair selection method based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods; evaluations on TREC-8 demonstrate the effectiveness of the proposed method with respect to the baselines.
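One plausible way to weight candidate word pairs under a topic model, sketched below, is to score a pair by the probability that both words are generated by the same topic of the feedback document. This is an assumption for illustration, not necessarily the paper's exact selection criterion; `phi`, `theta` and the toy numbers are hypothetical:

```python
from itertools import combinations
import numpy as np

def weighted_word_pairs(phi, theta, vocab, top_n=10):
    """Weight word pairs by the probability of being generated by the same topic.

    phi   -- topic-word matrix, shape (n_topics, n_words), rows sum to 1
    theta -- topic proportions of the feedback document, shape (n_topics,)
    vocab -- list of words aligned with phi's columns
    """
    pairs = []
    for i, j in combinations(range(len(vocab)), 2):
        # Sum over topics of P(topic) * P(word_i | topic) * P(word_j | topic)
        weight = float(np.sum(theta * phi[:, i] * phi[:, j]))
        pairs.append(((vocab[i], vocab[j]), weight))
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:top_n]

# Toy example: two topics over a three-word vocabulary
phi = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
theta = np.array([0.8, 0.2])
print(weighted_word_pairs(phi, theta, ["retrieval", "query", "banana"], top_n=2))
```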

5.
Recent studies suggest that significant improvement in information retrieval performance can be achieved by combining multiple representations of an information need. The paper presents a genetic approach that combines the results from multiple query evaluations. The genetic algorithm aims to optimise the overall relevance estimate by exploring different directions of the document space. We investigate ways to improve the effectiveness of the genetic exploration by combining appropriate techniques and heuristics known in genetic theory or in the IR field. Specifically, the approach uses a niching technique to solve the relevance multimodality problem, a relevance feedback technique to perform genetic transformations on query formulations, and evolution heuristics to improve the convergence conditions of the genetic process. The effectiveness of the global approach is demonstrated by comparing the retrieval results obtained by genetic multiple query evaluation and by classical single query evaluation on a subset of TREC-4 using the Mercure IRS. Moreover, experimental results show the positive effect of the various techniques integrated into our genetic algorithm model.
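A minimal sketch of evolving query formulations with a genetic algorithm is shown below. It omits the niching and relevance feedback components, uses an arbitrary toy fitness function, and does not reproduce the Mercure-specific relevance estimation:

```python
import random

def genetic_query_search(initial_query, fitness, n_generations=20,
                         pop_size=10, mutation_rate=0.2):
    """Evolve a population of query term-weight vectors.

    initial_query -- dict mapping term -> weight
    fitness       -- callable scoring a query dict (e.g. estimated relevance of its results)
    """
    terms = list(initial_query)

    def mutate(q):
        # Perturb each weight with small Gaussian noise, with some probability
        return {t: max(0.0, w + random.gauss(0, 0.1)) if random.random() < mutation_rate else w
                for t, w in q.items()}

    def crossover(a, b):
        # Uniform crossover over term weights
        return {t: random.choice((a[t], b[t])) for t in terms}

    population = [mutate(dict(initial_query)) for _ in range(pop_size)]
    for _ in range(n_generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

# Toy fitness: prefer weight vectors summing to roughly 1
best = genetic_query_search({"genetic": 0.5, "retrieval": 0.5},
                            fitness=lambda q: -abs(sum(q.values()) - 1.0))
print(best)
```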

6.
This paper proposes a synthetic aperture radar (SAR) image recognition method based on genetic programming. Five features are first extracted from the SAR image as original features; a genetic programming algorithm is then used to synthesise new features from these five original features; finally, a support vector machine is used for classification. Experimental results demonstrate the effectiveness of the algorithm.

7.
In this paper, we present a comparison of collocation-based similarity measures (Jaccard, Dice and Cosine) for the selection of additional search terms in query expansion. In addition, we consider two more similarity measures: average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean of the two conditional probabilities between a query term and an additional search term; NMI is a normalized value of the two terms' mutual information. All of these similarity measures are functions of the two terms' frequencies and their collocation frequency, but they differ in the method of measurement. The chosen measure changes the order of the additional search terms and their weights, and hence has a strong influence on retrieval performance. In our query expansion experiments with these five similarity measures, the additional search terms selected by the Jaccard, Dice and Cosine measures include more frequent terms with lower similarity values than those selected by ACP or NMI. In the overall assessment of query expansion, the Jaccard, Dice and Cosine similarity measures are better than ACP and NMI in terms of retrieval effectiveness, whereas NMI and ACP are better in terms of execution efficiency.
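A minimal sketch of how these five measures can be computed from raw co-occurrence counts follows. The exact normalisation of NMI used in the paper is not specified in the abstract, so the variant below is an assumption; zero co-occurrence counts would also need special handling:

```python
import math

def collocation_similarities(f_x, f_y, f_xy, n_docs):
    """Compute five collocation-based similarity measures between two terms.

    f_x, f_y -- document frequencies of term x and term y
    f_xy     -- number of documents in which x and y co-occur (collocation frequency)
    n_docs   -- total number of documents in the collection
    """
    jaccard = f_xy / (f_x + f_y - f_xy)
    dice = 2 * f_xy / (f_x + f_y)
    cosine = f_xy / math.sqrt(f_x * f_y)
    # ACP: mean of the two conditional probabilities P(y|x) and P(x|y)
    acp = 0.5 * (f_xy / f_x + f_xy / f_y)
    # NMI: pointwise mutual information normalised by -log of the joint probability
    # (one common normalisation; assumed here)
    p_x, p_y, p_xy = f_x / n_docs, f_y / n_docs, f_xy / n_docs
    nmi = math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)
    return {"jaccard": jaccard, "dice": dice, "cosine": cosine, "acp": acp, "nmi": nmi}

# Example: terms appearing in 120 and 80 of 10,000 documents, co-occurring in 30
print(collocation_similarities(120, 80, 30, 10_000))
```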

8.
The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, semantically related to the query terms. Word embeddings have recently been used in an attempt to address this problem. Nevertheless, query disambiguation is still necessary: the semantic relatedness of each word in the corpus is modeled, but choosing the right expansion terms with respect to the un-modeled semantics of the query as a whole remains an open issue. In this paper we propose a novel query expansion method using word embeddings that models the global query semantics from the standpoint of the prospect vocabulary terms. The proposed method explores query-vocabulary semantic closeness in such a way that new terms, semantically related to the more relevant topics, are elicited and added as a function of the query as a whole. The method includes candidate pooling strategies that address disambiguation issues without using exogenous resources. We tested our method with three topic sets over CLEF corpora, compared it across different Information Retrieval models, and also compared it against another expansion technique based on word embeddings. Our experiments indicate that our method achieves significant results that outperform the baselines, improving both recall and precision without relevance feedback.
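The sketch below illustrates the general flavour of embedding-based expansion driven by the query as a whole: candidates are ranked by cosine similarity to the centroid of the query term vectors. This is a simplification under assumed pre-trained embeddings; the paper's candidate pooling strategies and its exact notion of global query semantics are not reproduced:

```python
import numpy as np

def expand_query(query_terms, embeddings, k=5):
    """Select expansion terms whose embeddings are closest to the query as a whole.

    embeddings -- dict mapping vocabulary terms to numpy vectors (assumed pre-trained)
    """
    vecs = [embeddings[t] for t in query_terms if t in embeddings]
    if not vecs:
        return []
    centroid = np.mean(vecs, axis=0)
    centroid /= np.linalg.norm(centroid)

    def cos(v):
        # Cosine similarity to the (already normalised) query centroid
        return float(np.dot(v, centroid) / np.linalg.norm(v))

    candidates = ((t, cos(v)) for t, v in embeddings.items() if t not in query_terms)
    return sorted(candidates, key=lambda p: p[1], reverse=True)[:k]

# Toy example with 3-dimensional vectors
emb = {"car": np.array([1.0, 0.1, 0.0]), "vehicle": np.array([0.9, 0.2, 0.1]),
       "engine": np.array([0.7, 0.5, 0.1]), "banana": np.array([0.0, 0.1, 1.0])}
print(expand_query(["car"], emb, k=2))
```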

9.
This paper presents a computational model of the multi-join query optimization problem, analyses the basic principles of the immune genetic algorithm, and proposes applying the immune genetic algorithm to multi-join query optimization. Tailored to the specific characteristics of the problem, the design of the immune genetic algorithm is given, including the design of the affinity and fitness functions, the antibody-concentration-based selection operator, the crossover and mutation operators, and the immune operator.
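As an illustration of antibody-concentration-based selection, the sketch below assigns selection probabilities that trade off an antibody's fitness (affinity) against how concentrated it is in the population, which preserves diversity. The linear trade-off and the value of alpha are assumptions, not the paper's formulas:

```python
from collections import Counter

def concentration_based_selection_probs(population, fitness, alpha=0.7):
    """Selection probabilities that reward fitness but penalise antibody concentration.

    population -- list of hashable antibodies (e.g. join-order tuples)
    fitness    -- callable returning the affinity/fitness of an antibody
    alpha      -- trade-off between fitness and concentration (assumed form)
    """
    counts = Counter(population)
    n = len(population)
    raw = [alpha * fitness(ab) + (1 - alpha) * (1 - counts[ab] / n) for ab in population]
    total = sum(raw)
    return [r / total for r in raw]

# Join orders over relations A-D, with a toy fitness (e.g. 1 / estimated cost)
pop = [("A", "B", "C", "D"), ("A", "B", "C", "D"), ("B", "A", "D", "C")]
print(concentration_based_selection_probs(pop, fitness=lambda ab: 1.0))
```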

10.
This paper presents a novel query expansion method, combined with a graph-based algorithm for query-focused multi-document summarization, to address the limited information in the original query. Our approach makes use of both sentence-to-sentence relations and sentence-to-word relations to select query-biased informative words from the document set and uses them as query expansions to improve the sentence ranking. Compared to previous query expansion approaches, our approach captures more relevant information with less noise. We performed experiments on the Document Understanding Conference (DUC) 2005 and DUC 2006 data, and the evaluation results show that the proposed query expansion method significantly improves system performance and makes our system comparable to the state-of-the-art systems.
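A greatly simplified sketch of propagating query-biased sentence scores to words through sentence-to-word relations is shown below (a single propagation step with overlap-based sentence scoring; the paper's full graph-based mutual reinforcement is not reproduced):

```python
from collections import defaultdict

def query_biased_expansion(sentences, query_terms, k=5):
    """Propagate query-biased sentence scores to words and return the
    top-scoring non-query words as expansion terms.

    sentences -- list of token lists (the document set, already tokenised)
    """
    query_terms = set(query_terms)
    word_scores = defaultdict(float)
    for tokens in sentences:
        # Sentence-to-query relation: overlap with the original query terms
        s_score = len(query_terms & set(tokens)) / (len(tokens) or 1)
        # Sentence-to-word relation: pass the sentence score on to its words
        for t in set(tokens):
            if t not in query_terms:
                word_scores[t] += s_score
    return sorted(word_scores, key=word_scores.get, reverse=True)[:k]

sents = [["oil", "price", "rise"], ["oil", "market", "shock"], ["weather", "report"]]
print(query_biased_expansion(sents, {"oil"}, k=2))
```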

11.
12.
Addressing the problem of keyword queries over XML data, this paper examines the strengths and weaknesses of existing query techniques and proposes a semantics-based XML keyword retrieval algorithm. User-supplied keywords are classified into condition keywords and result keywords; condition keywords only restrict the query scope and do not appear in the result set. The concept of semantically related node pairs and a method for determining them are given, and an XML data query algorithm based on keyword classification and semantically related node pairs is proposed.

13.
Think tanks have proved helpful for decision-making in various communities. However, collecting information manually for think tank construction costs a great deal of time and labor and introduces inevitable subjectivity. A possible solution is to retrieve webpages of renowned experts and institutes similar to a given example, referred to as query by webpage (QBW). Considering users' searching behaviors, a novel QBW model based on webpages' visual and textual features is proposed. Specifically, a visual feature extraction module based on pre-trained neural networks and a heuristic pooling scheme is proposed, which addresses the limitation that existing extractors fail to extract high-level features from snapshots and are sensitive to noise introduced by images. Moreover, a textual feature extraction module is proposed that represents textual content at both the term and topic levels, whereas most existing extractors focus on the term level only. In addition, a series of similarity metrics is proposed, including a textual similarity metric based on feature bootstrapping to improve the model's robustness and an adaptive weighting scheme to balance the effect of different types of features. The proposed QBW model is evaluated on expert and institute introduction retrieval tasks in academic and medical scenarios, where the average MAP is improved by 10% compared to existing baselines. In practice, useful insights can be derived from this study for various applications involving webpage retrieval beyond think tank construction.
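As a sketch of how visual and textual similarities might be combined with per-query adaptive weights, the snippet below weights each feature type by the spread of its scores. This weighting choice is an illustrative assumption; the paper's actual adaptive weighting scheme and bootstrapped textual metric are not specified in the abstract:

```python
import numpy as np

def combined_similarity(visual_sims, textual_sims):
    """Combine visual and textual similarity scores for a set of candidate webpages,
    weighting each feature type by how discriminative it is for this query
    (here: the spread of its scores, an illustrative choice)."""
    v, t = np.asarray(visual_sims, float), np.asarray(textual_sims, float)
    w_v, w_t = v.std(), t.std()
    if w_v + w_t == 0:
        w_v = w_t = 0.5
    else:
        w_v, w_t = w_v / (w_v + w_t), w_t / (w_v + w_t)
    return w_v * v + w_t * t

# Three candidate pages scored against the query webpage
print(combined_similarity([0.9, 0.2, 0.4], [0.6, 0.5, 0.55]))
```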

14.
A Discussion of E-commerce Intelligent Terminals Based on Mobile Technology   Cited by: 1 (self-citations: 0, citations by others: 1)
梁立  宋玲  张翠 《大众科技》2014,(8):43-44
This article discusses e-commerce intelligent terminals based on mobile technology. It first introduces the concept of the e-commerce intelligent terminal, then describes the advantages for small and medium-sized enterprises of adopting e-commerce and the problems that remain, and finally proposes a model architecture for an e-commerce intelligent terminal platform based on GPRS/SMS mobile technology.

15.
16.
陈傲  孙兆刚 《科技管理研究》2006,26(9):208-211,221
This paper analyses how supply chains are transformed in an e-commerce environment and identifies new characteristics of supply-chain partner selection in that environment. On this basis, it systematically analyses some shortcomings in the design of traditional partner evaluation systems and, drawing on resource theory, incorporates e-commerce as an environmental variable into the evaluation indicators, constructing an evaluation system for e-commerce supply-chain partners. Finally, all input and output indicators are quantified.

17.
田毕飞  戴露露 《科研管理》2019,40(9):149-158
This paper constructs a model of international entrepreneurship paths based on cross-border e-commerce and uses a cross-case study method to analyse the international entrepreneurship paths of three firms, 鼎龙股份, 中瑞思创 and 青岛金王, proposing three paths: technology-driven, market-driven, and technology-market integrated. It argues that new Chinese ventures should choose the international entrepreneurship path that suits them according to the relative strength of their technological, market and financial capabilities, using cross-border e-commerce to strengthen their weaker elements of international entrepreneurship.

18.
How to efficiently store, query and update XML data is an important area of native XML database management. Based on DTD and XPath tree models, this paper studies the optimization of XPath query expressions themselves: by binding variables, the XPath query tree is reconstructed to reduce query complexity; finally, invalid parts of an XPath expression are eliminated with the aid of the DTD schema tree, and a new, valid XPath tree model is constructed.

19.
Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society of Information Science, 25(5), 312–318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283–289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21; Yu, C., & Salton, G. (1976). Precision weighting – an effective automatic indexing method. Journal of the Association for Computing Machinery (ACM), 23(1), 76–88], shallow syntactic fragments of high frequency generally correspond to lexical fragments of high content [Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In Proceedings of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL 2006), Sydney, Australia]. We apply this finding to Information Retrieval as follows. We present a novel automatic query reformulation technique, based on shallow syntactic evidence induced from various language samples, which is used to enhance the performance of an Information Retrieval system. Firstly, we draw shallow syntactic evidence from language samples of varying size and compare the effect of language sample size on retrieval performance when using our syntactically-based query reformulation (SQR) technique. Secondly, we compare SQR to a state-of-the-art probabilistic pseudo-relevance feedback technique. Additionally, we combine both techniques and evaluate their compatibility. We evaluate the proposed technique across two standard Text REtrieval Conference (TREC) English test collections and three statistically different weighting models. Experimental results suggest that SQR markedly enhances retrieval performance and is at least comparable to pseudo-relevance feedback. Notably, the combination of SQR and pseudo-relevance feedback further enhances retrieval performance considerably. These collective results confirm the tenet that high-frequency shallow syntactic fragments correspond to content-bearing lexical fragments.

20.
Student success is crucial to the process of building a cross-border e-commerce (CBEC) talent development platform. With the goal of enhancing students' academic outcomes, the important factors affecting performance are analysed and performance is predicted. To better forecast student outcomes, a logistic regression model with a penalty function is used for factor analysis; the tuning parameters are selected by K-fold cross-validation and the coefficients are estimated with the coordinate descent technique. Model validation showed that the Area Under the Curve (AUC) for the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalized logistic regression models was 0.772 and 0.771, respectively, with overall accuracies of 0.738 and 0.739, respectively. The correlation coefficient between the predicted and actual values of student performance was 0.99949 for MCP and 0.99958 for SCAD. Owing to their superior prediction accuracy, the MCP and SCAD penalized logistic regression models can serve as analytical tools in the development of the CBEC talent training platform.
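For reference, the standard form of the penalised logistic regression objective and of the MCP and SCAD penalties mentioned above is sketched below, with the usual tuning parameters gamma > 1 and a > 2; the abstract does not state the exact parameterisation or tuning values used:

```latex
% Penalised logistic regression objective (standard form):
\min_{\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\Bigl[y_i x_i^{\top}\beta - \log\bigl(1+e^{x_i^{\top}\beta}\bigr)\Bigr]
  \;+\; \sum_{j=1}^{p} p_{\lambda}\bigl(|\beta_j|\bigr)

% MCP penalty, with \gamma > 1:
p_{\lambda,\gamma}(t) =
\begin{cases}
  \lambda t - \dfrac{t^{2}}{2\gamma}, & 0 \le t \le \gamma\lambda,\\[4pt]
  \dfrac{\gamma\lambda^{2}}{2}, & t > \gamma\lambda.
\end{cases}

% SCAD penalty, with a > 2:
p_{\lambda}(t) =
\begin{cases}
  \lambda t, & t \le \lambda,\\[2pt]
  \dfrac{2a\lambda t - t^{2} - \lambda^{2}}{2(a-1)}, & \lambda < t \le a\lambda,\\[4pt]
  \dfrac{(a+1)\lambda^{2}}{2}, & t > a\lambda.
\end{cases}
```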
