Similar documents
 20 similar documents found
1.
The problem of results merging in distributed information retrieval environments has gained significant attention in recent years. Two generic approaches have been introduced in research. The first estimates the relevance of the documents returned from the remote collections through ad hoc methodologies (such as weighted score merging or regression), while the second downloads all the documents locally, completely or partially, in order to calculate their relevance. Both approaches have advantages and disadvantages. Download methodologies are more effective, but they impose a significant overhead on the process in terms of time and bandwidth. Approaches that rely solely on estimation, on the other hand, usually depend on document relevance scores being reported by the remote collections in order to achieve maximum performance. In addition, regression algorithms, which have proved to be more effective than weighted score merging algorithms, need a significant number of overlap documents in order to function effectively, which in practice requires multiple interactions with the remote collections. The new algorithm introduced here adaptively downloads a limited, selected number of documents from the remote collections and estimates the relevance of the rest through regression methodologies. It thus reconciles the two approaches, combining their strengths while minimizing their drawbacks: it achieves the limited time and bandwidth overhead of the estimation approaches and the increased effectiveness of the download approaches. The proposed algorithm is tested in a variety of settings, and its performance is found to be significantly better than that of the former while approximating that of the latter.
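
The regression step described above can be illustrated roughly as follows. This is a minimal sketch, not the algorithm from the paper: it assumes each remote collection reports a score per document, that a few documents per collection have been downloaded and scored locally, and that a simple least-squares line maps remote scores to local ones. The names `fit_line` and `merge_results` are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All remote scores identical: fall back to the mean local score.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def merge_results(
    collections: Dict[str, List[Tuple[str, float]]],
    local_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Merge per-collection result lists into one ranking.

    collections:  {collection name: [(doc_id, remote_score), ...]}
    local_scores: locally computed scores for the downloaded documents only.
    """
    merged = []
    for name, results in collections.items():
        # Training pairs: documents that were downloaded and scored locally.
        pairs = [(s, local_scores[d]) for d, s in results if d in local_scores]
        if len(pairs) < 2:
            raise ValueError(f"need at least two downloaded documents from {name}")
        slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
        for doc_id, remote_score in results:
            # Use the true local score when available, the estimate otherwise.
            estimate = slope * remote_score + intercept
            merged.append((doc_id, local_scores.get(doc_id, estimate)))
    return sorted(merged, key=lambda item: item[1], reverse=True)
```

Only the documents missing from `local_scores` receive estimated scores, which is what keeps the download overhead limited in this sketch. How the downloaded documents are chosen adaptively is not specified in the abstract and is left out here.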

2.
In this paper, we propose a re-ranking algorithm using post-retrieval clustering for content-based image retrieval (CBIR). In conventional CBIR systems, it is often observed that images visually dissimilar to a query image are ranked high in retrieval results. To remedy this problem, we utilize the similarity relationship of the retrieved results via post-retrieval clustering. In the first step of our method, images are retrieved using visual features such as color histograms. Next, the retrieved images are analyzed using hierarchical agglomerative clustering methods (HACM), and the rank of the results is adjusted according to the distance of a cluster from the query. In addition, we analyze the effects of clustering methods, query-cluster similarity functions, and weighting factors in the proposed method. We conducted a number of experiments using several clustering methods and cluster parameters. Experimental results show that the proposed method improves retrieval effectiveness by over 10% on average in the average normalized modified retrieval rank (ANMRR) measure.

3.
This paper investigates an evolving split-complex valued neuro-fuzzy (SCVNF) algorithm for Takagi–Sugeno–Kang (TSK) systems. To avoid the contradiction between boundedness and analyticity, a splitting technique is traditionally employed to process the real part and the imaginary part of the weight parameters independently, which doubles the weight dimension and causes an oversized structure. To improve the efficiency of structural optimization, previous studies have shown that the L1/2-norm regularizer can be effective in such sparse tasks, and it is therefore regarded as a representative of the Lq (0 < q < 1) regularizers. To eliminate the oscillation phenomenon and stabilize the training procedure, learning with a smoothed L1/2 regularizer is introduced by flexibly smoothing the original regularizer at the origin. It is rigorously proved that the real-valued cost function decreases monotonically during learning and that the sum of gradient norms tends to zero. Under some very general additional conditions, the weight sequence itself also converges to a fixed point. Experimental results for the SCVNF are presented, and they match the theoretical analysis.

4.
This paper solves a data-driven control problem for a flow-based distribution network with two objectives: resource allocation and a fair distribution of costs. These objectives represent both cooperative and competitive directions. A solution is proposed that uses either a centralized or a distributed cooperative game approach, based on the Shapley value, to determine a proper partitioning of the system and a fair distribution of communication costs. In addition, a decentralized non-cooperative game approach that computes the Nash equilibrium is used to achieve the control objective of resource allocation under a non-complete information topology. Furthermore, an invariant-set property is presented and the closed-loop system stability is analyzed for the non-cooperative game approach. Another contribution regarding the cooperative game approach is an alternative way to compute the Shapley value for the proposed specific characteristic function. Unlike the classical cooperative-game approach, which has limited application due to combinatorial explosion, the alternative method allows the Shapley value to be calculated in polynomial time and hence can be applied to large-scale problems.
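
For reference, the classical Shapley value mentioned above can be computed directly from its definition as the average marginal contribution of a player over all orderings. The sketch below does exactly that and therefore shows the combinatorial explosion (n! orderings); it is not the polynomial-time method of the paper, whose characteristic function is not given in the abstract. The function name `shapley_values` and the toy game in the usage note are hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every ordering of the players.

    value(S) is the characteristic function: the worth of coalition S.
    Cost grows as n!, so this is only practical for a handful of players.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal contribution of p when joining the players before it.
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}
```

As a toy example, take three players where a coalition is worth 1 only if it contains player "a" and at least one other player: `shapley_values(["a", "b", "c"], lambda s: 1.0 if "a" in s and len(s) >= 2 else 0.0)`. Player "a" then receives 2/3 and the other two receive 1/6 each, and the shares sum to the worth of the grand coalition.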

5.
Using the Krylov–Bogoliubov method for obtaining analytical solutions to systems with small non-linearities, a procedure is employed to determine the initial amplitude and phase in terms of the initial displacement and velocity. Equations representing the time rate of change of amplitude and phase are used directly. The same procedure can be applied whether the linear equations corresponding to the non-linear system have purely imaginary, complex conjugate or real roots. An example is given which demonstrates the initial amplitude and phase change for various higher-order approximations.

6.
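Sun Mingxuan (孙明轩). 《科技通报》 (Bulletin of Science and Technology), 1996, 12(3): 152-156
By smoothing the l1-norm criterion, a recursive identification algorithm is constructed under the smoothed approximate criterion. The algorithm has a small real-time computational burden and good algorithmic properties. Numerical simulation results show that it still retains the robust estimation properties of the original criterion.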

7.
Smartphones have gained significant popularity. With rising concerns about compulsive smartphone use, understanding how smartphone users develop compulsive behaviors is crucial. In this study, we aim to investigate the role of flow in the formation of compulsive smartphone use. Drawing on flow theory, we incorporate the psychological state of flow as a key factor in our research model. We identify its determinants based on the desirability–feasibility perspective and reinforcement sensitivity theory. We empirically test our model by conducting an online survey with 384 valid responses. We expect that our findings can provide noteworthy insights into the formation of compulsive smartphone use.

8.
Median-based smoothing algorithms have received considerable attention in the last few years. Their properties sometimes make them superior to linear smoothers. In this paper we develop an expression for the bivariate distribution of a median-smoothed Markov chain, and we illustrate one application of it by comparing the power spectra of the input and the output of a median smoother when the input is binary valued.
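
To make the operation concrete, a running-median smoother applied to a binary-valued input can be sketched as below. This is only an illustration of the smoother itself, not of the paper's distributional result; the window length of 3 and the edge handling (repeating the end values) are assumptions, and the name `median_smooth` is hypothetical.

```python
from statistics import median
from typing import List, Sequence


def median_smooth(signal: Sequence[int], window: int = 3) -> List[int]:
    """Running median with an odd window; ends are padded by repetition."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not signal:
        return []
    half = window // 2
    # Repeat the first and last samples so every position has a full window.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(signal))]
```

For a binary input, the median of an odd window is a majority vote, so isolated 0s and 1s are removed while longer runs are preserved: `median_smooth([0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1])` returns `[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]`.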

9.
As an information medium, video offers many possible retrieval and browsing modalities, far more than text, image or audio. Some of these, like searching the text of the spoken dialogue, are well developed; others, like keyframe browsing tools, are in their infancy; and others are not yet technically achievable. For the browsing and retrieval modalities that we cannot yet achieve, we can only speculate as to how useful they will actually be. In our work we have created a system to support multiple modalities for video browsing and retrieval, including text search through the spoken dialogue, image matching against shot keyframes and object matching against segmented video objects. For the last of these, automatic segmentation and tracking of video objects is a computationally demanding problem which is not yet solved for generic natural video material; once it is solved, it is expected to open up possibilities for user interaction with objects in video, including searching and browsing. In this paper we achieve object segmentation by working in a closed domain of animated cartoons. We describe an interactive user experiment on a medium-sized corpus of video where we were able to measure users’ use of video objects versus other modes of retrieval during multiple-iteration searching. Results of this experiment show that although object searching is used far less than text searching in the first iteration of a user’s search, it is a popular and useful search type once an initial set of relevant shots has been found.

10.
Semi-supervised document retrieval   (cited 2 times: 0 self-citations, 2 citations by others)
This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage unlabeled data in learning.
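
The traditional IR model named above, BM25, can be sketched as follows. This is a generic illustration of BM25 scoring, not the SSRank labeling procedure, which the abstract does not spell out. The idf form with the "+ 1" inside the logarithm is one commonly used variant, and the parameter defaults `k1 = 1.2` and `b = 0.75` are conventional choices assumed here rather than values from the paper. The name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query: Sequence[str],
    docs: Sequence[Sequence[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> List[float]:
    """Score every tokenized document in docs against the query terms."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for d in docs:
        doc_freq.update(set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = 1 - b + b * len(d) / avg_len if avg_len > 0 else 1.0
        score = 0.0
        for term in query:
            f = term_freq[term]
            if f == 0:
                continue
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores
```

In a semi-supervised setting such as the one described, scores like these could serve as one signal when assigning labels to unlabeled query-document pairs; how SSRank actually combines them with the learned model is not stated in the abstract.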

11.
Through the recent NTCIR workshops, patent retrieval has posed many challenging issues for the information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics make it necessary to reassess existing retrieval techniques, which have mainly been developed for short, structure-less documents such as newspaper articles. This study investigates cluster-based retrieval in the context of the invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters provide additional evidence for matching a user’s information need. Thus far, cluster-based retrieval approaches have relied on automatically created clusters. Fortunately, all patents carry manually assigned cluster information in the form of international patent classification codes. The international patent classification is a standard taxonomy for classifying patents, and it currently has about 69,000 nodes organized into a five-level hierarchical system. Patent documents could therefore provide the best test bed for developing and evaluating cluster-based retrieval techniques. Experiments using the NTCIR-4 patent collection showed that the cluster-based language model can improve on the cluster-less baseline language model.

12.
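To enable a computerized retrieval system to retrieve documents through multiple access routes and from multiple angles, this paper describes how to adjust the classification indexing rules for machine-readable data. The adjustment overcomes the drawback that indexing a document with only one classification number provides very limited retrieval access points, and it better reveals the multiple disciplinary or professional attributes of a document. Documents can then be retrieved through multiple routes and from different angles, which improves their utilization.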

13.
In a hierarchical XML structure, surrounding elements form the context of an XML element. In document-oriented XML, the context is a part of the semantics of the element and augments its textual information. The process of taking the context of the element into account in element scoring is called contextualization. This study extends the concept of contextualization and presents a classification of contextualization models. In an XML collection, elements are of different granularity, i.e. lower level elements are shorter and carry less textual information. Thus, it seems credible that contextualization interacts differently with diverse elements. Even if it is known that contextualization leads to improved effectiveness in element retrieval, the improvement on different granularity levels has not been investigated. This study explores the effect of contextualization on these levels. Further, a parameterized framework for testing contextualization is presented.

14.
Topic distillation is one of the main information needs when users search the Web. Previous approaches to topic distillation treat a single page as the basic searching unit, which does not fully utilize the structure information of the Web. In this paper, we propose a novel concept for topic distillation, named sub-site retrieval, in which the basic searching unit is a sub-site instead of a single page. A sub-site is a subset of a website, consisting of a structural collection of pages. The keys to sub-site retrieval are (1) extracting effective features for the representation of a sub-site using both content and structure information, and (2) delivering the sub-site-based retrieval results with a friendly and informative user interface. For the first point, we propose the Punished Integration algorithm, which is based on modeling the growth of websites. For the second point, we design a user interface to better illustrate the search results of sub-site retrieval. In tests on the topic distillation tasks of TREC 2003 and 2004, sub-site retrieval leads to a significant improvement in retrieval performance over previous methods based on single pages. Furthermore, time complexity analysis shows that sub-site retrieval can be integrated into the index component of search engines.

15.
16.
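Although the traditional vector space model has been praised as the most imaginative and creative of the retrieval models, it also has shortcomings, such as not taking document structure or document type into account. This paper analyzes these problems, gives corresponding improvements, and finally constructs an improved vector space model.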

17.
Facet-based opinion retrieval from blogs   (cited 1 time: 0 self-citations, 1 citation by others)
The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback–Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.

18.
Due to their ready availability, database management systems (DBMS) are being applied to bibliographic databases with increasing frequency. This is being done in spite of the fact that, although DBMS query languages tend to be very powerful, they are far too complex for the casual user. It is proposed that PSI, an existing virtual-system intermediary for document retrieval systems, be extended to include access to DBMS containing bibliographic data in order to circumvent the complexity problem for the casual user. PSI currently provides a common command language for access to multiple document retrieval systems. It is shown that PSI could be extended to provide this same command language for access to DBMS, whether the DBMS are relational or network.

19.
A method using the amount of semantic information of query terms as weight in a fuzzy relation of resemblance is presented. The relation can be used to partially order documents in decreasing order of resemblance with the query. Large operational bibliographic data bases are used to test the validity of the approach.

20.
Several papers have appeared that analyze recent developments in the problem of processing, in a document retrieval system, queries expressed as Boolean expressions. The purpose of this paper is to continue that analysis. We shall show that the concept of threshold values resolves the problems inherent in relevance weights. Moreover, we shall explore possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations.
