
1.

A results merging algorithm for distributed information retrieval environments that combines regression methodologies with a selective download phase

Georgios Paltoglou Michail Salampasis Maria Satratzemi 《Information processing & management》2008

The problem of results merging in distributed information retrieval environments has gained significant attention in recent years. Two generic approaches have been introduced in research. The first approach aims at estimating the relevance of the documents returned from the remote collections through ad hoc methodologies (such as weighted score merging, regression etc.), while the other is based on downloading all the documents locally, completely or partially, in order to calculate their relevance. Both approaches have advantages and disadvantages. Download methodologies are more effective, but they impose a significant overhead on the process in terms of time and bandwidth. Approaches that rely solely on estimation, on the other hand, usually depend on document relevance scores being reported by the remote collections in order to achieve maximum performance. In addition, regression algorithms, which have proved to be more effective than weighted score merging algorithms, need a significant number of overlap documents in order to function effectively, practically requiring multiple interactions with the remote collections. The new algorithm that is introduced is based on adaptively downloading a limited, selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies. Thus it reconciles the above two approaches, combining their strengths while minimizing their drawbacks, achieving the limited time and bandwidth overhead of the estimation approaches and the increased effectiveness of the download approaches. The proposed algorithm is tested in a variety of settings and its performance is found to be significantly better than the former, while approximating that of the latter.
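The general idea in the abstract above (score a few downloaded documents locally, then estimate the rest by regression) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: it assumes each remote collection reports a numeric score per document, it uses a plain least-squares line, and the function names `fit_line` and `merge_results` are hypothetical. The adaptive choice of which documents to download is not modeled.

```python
from typing import Dict, List, Tuple


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a * x + b over (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sampled documents per collection")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("sampled remote scores must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def merge_results(
    remote_scores: Dict[str, Dict[str, float]],
    local_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Merge per-collection result lists into one ranking.

    remote_scores: collection name -> {doc_id: score reported remotely}
    local_scores:  doc_id -> score computed locally for the few documents
                   that were downloaded (assumed input, chosen elsewhere)
    """
    merged: Dict[str, float] = {}
    for scores in remote_scores.values():
        # Downloaded documents give (remote score, local score) training pairs.
        sample = [(s, local_scores[d]) for d, s in scores.items() if d in local_scores]
        a, b = fit_line(sample)
        for doc_id, s in scores.items():
            # Use the true local score when available, else the regression estimate.
            merged[doc_id] = local_scores.get(doc_id, a * s + b)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

With two downloaded documents per collection, every other document in that collection is placed on the common local scale through the fitted line, so lists from collections with incomparable scoring ranges can be interleaved.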

2.

Re-ranking algorithm using post-retrieval clustering for content-based image retrieval

《Information processing & management》2005,41(2):177-194

In this paper, we propose a re-ranking algorithm using post-retrieval clustering for content-based image retrieval (CBIR). In conventional CBIR systems, it is often observed that images visually dissimilar to a query image are ranked high in retrieval results. To remedy this problem, we utilize the similarity relationship of the retrieved results via post-retrieval clustering. In the first step of our method, images are retrieved using visual features such as the color histogram. Next, the retrieved images are analyzed using hierarchical agglomerative clustering methods (HACM) and the rank of the results is adjusted according to the distance of a cluster from a query. In addition, we analyze the effects of clustering methods, query-cluster similarity functions, and weighting factors in the proposed method. We conducted a number of experiments using several clustering methods and cluster parameters. Experimental results show that the proposed method achieves an improvement in retrieval effectiveness of over 10% on average in the average normalized modified retrieval rank (ANMRR) measure.

3.

Smoothed L1/2 regularizer learning for split-complex valued neuro-fuzzy algorithm for TSK system and its convergence results

Yan Liu Dakun Yang Feng Li 《Journal of The Franklin Institute》2018,355(13):6132-6151

This paper investigates an evolving split-complex valued neuro-fuzzy (SCVNF) algorithm for the Takagi–Sugeno–Kang (TSK) system. To avoid the contradiction between boundedness and analyticity, a splitting technique is traditionally employed to process the real part and the imaginary part of the weight parameters in the system independently, which doubles the weight dimension and causes an oversized structure. To improve the efficiency of structural optimization, previous studies have revealed that the L_1/2-norm regularizer can be effective in such sparse tasks and is thus regarded as a representative of the L_q (0 < q < 1) regularizers. To eliminate the oscillation phenomenon and stabilize the training procedure, smoothed L_1/2 regularizer learning is introduced by smoothing the original regularizer at the origin. It is rigorously proved that the real-valued cost function decreases monotonically during learning, and that the sum of the gradient norms tends to zero. Under some very general additional conditions, the weight sequence itself also converges to a fixed point. Experimental results for the SCVNF are presented, and they match the theoretical analysis.

4.

Non-centralized control for flow-based distribution networks: A game-theoretical insight

Julian Barreiro-Gomez Carlos Ocampo-Martinez Nicanor Quijano José M. Maestre 《Journal of The Franklin Institute》2017,354(14):5771-5796

This paper solves a data-driven control problem for a flow-based distribution network with two objectives: resource allocation and a fair distribution of costs. These objectives represent both cooperation and competition directions. A solution is proposed that uses either a centralized or a distributed cooperative game approach, based on the Shapley value, to determine a proper partitioning of the system and a fair communication cost distribution. In addition, a decentralized non-cooperative game approach computing the Nash equilibrium is used to achieve the control objective of resource allocation under a non-complete information topology. Furthermore, an invariant-set property is presented and the closed-loop system stability is analyzed for the non-cooperative game approach. Another contribution regarding the cooperative game approach is an alternative way to compute the Shapley value for the proposed specific characteristic function. Unlike the classical cooperative-games approach, which has limited application due to combinatorial explosion, the alternative method allows the Shapley value to be calculated in polynomial time and hence can be applied to large-scale problems.
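For context on the combinatorial explosion mentioned in the abstract above, the following sketch computes the Shapley value directly from its classical definition, averaging each player's marginal contribution over every ordering of the players. It is not the paper's polynomial-time method, which depends on a specific characteristic function not given here; the function name `shapley_values` and the example characteristic function are assumptions for illustration only.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Classical Shapley value by enumerating all n! player orderings.

    value(S) is the characteristic function: the worth of coalition S.
    The cost grows factorially with the number of players, which is why
    this direct approach is limited to small games.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal contribution of p when joining the players before it.
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical symmetric game: a coalition's worth is its size squared.
    print(shapley_values(["a", "b", "c"], lambda s: float(len(s) ** 2)))
```

In a symmetric game such as the one in the usage example, every player receives the same share, and the shares always sum to the worth of the grand coalition.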

5.

A procedure to obtain the initial amplitude and phase for the Krylov–Bogoliubov method

R.A. Rink 《Journal of The Franklin Institute》1977,303(1):59-65

Using the Krylov–Bogoliubov method for obtaining analytical solutions to systems with small non-linearities, a procedure is employed to determine the initial amplitude and phase in terms of the initial displacement and velocity. Equations representing the time rate of change of amplitude and phase are used directly. Whether the corresponding linear equations of the non-linear system have purely imaginary, complex conjugate or real roots, the same procedure can be applied. An example is given which demonstrates the initial amplitude and phase change for various higher order approximations.

6.

A recursive identification algorithm based on the smoothed l1 norm

孙明轩《科技通报》1996,12(3):152-156

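By smoothing the l1-norm criterion, a recursive identification algorithm is constructed under the smoothed approximate criterion. The algorithm has a small real-time computational burden and good algorithmic properties, and numerical simulation results show that it retains the robust estimation property of the original criterion.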

7.

Understanding compulsive smartphone use: An empirical test of a flow-based model

《International Journal of Information Management》2017,37(5):438-454

Smartphones have gained significant popularity. With the rising concerns of compulsive smartphone use, understanding how smartphone users develop compulsive behaviors is crucial. In this study, we aim to investigate the role of flow in the formation of compulsive smartphone use. Drawing upon the flow theory, we incorporate the psychological state of flow as a key factor in our research model. We identify its determinants based on the desirability–feasibility perspective and reinforcement sensitivity theory. We empirically test our model by conducting an online survey with 384 valid responses. We expect that our findings can provide noteworthy insights on the formation of compulsive smartphone use.

8.

The Bivariate Distribution of a Median Smoothed Markov Chain

Federico Kuhlmann Gary L. Wise 《Journal of The Franklin Institute》1982,313(2):107-118

Median-based smoothing algorithms have received considerable attention in the last few years. Their properties sometimes make them superior to linear smoothers. In this paper we develop an expression for the bivariate distribution of a median-smoothed Markov chain, and we illustrate one application of it by comparing the power spectra of the input and the output of a median smoother when the input is binary valued.

9.

A usage study of retrieval modalities for video shot retrieval

Alan F. Smeaton Paul Browne 《Information processing & management》2006

As an information medium, video offers many possible retrieval and browsing modalities, far more than text, image or audio. Some of these, like searching the text of the spoken dialogue, are well developed; others, like keyframe browsing tools, are in their infancy; and others are not yet technically achievable. For those browsing and retrieval modalities that we cannot yet achieve, we can only speculate as to how useful they will actually be. In our work we have created a system to support multiple modalities for video browsing and retrieval, including text search through the spoken dialogue, image matching against shot keyframes and object matching against segmented video objects. For the last of these, automatic segmentation and tracking of video objects is a computationally demanding problem which is not yet solved for generic natural video material; once it is solved, it is expected to open up possibilities for user interaction with objects in video, including searching and browsing. In this paper we achieve object segmentation by working in a closed domain of animated cartoons. We describe an interactive user experiment on a medium-sized corpus of video in which we were able to measure users’ use of video objects versus other modes of retrieval during multiple-iteration searching. Results of this experiment show that although object searching is used far less than text searching in the first iteration of a user’s search, it is a popular and useful search type once an initial set of relevant shots has been found.

10.

Semi-supervised document retrieval (total citations: 2; self-citations: 0; citations by others: 2)

Ming Li Hang Li Zhi-Hua Zhou 《Information processing & management》2009

This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage unlabeled data in learning.

11.

Cluster-based patent retrieval

In-Su Kang Seung-Hoon Na Jungi Kim Jong-Hyeok Lee 《Information processing & management》2007

Through the recent NTCIR workshops, patent retrieval has posed many challenging issues for the information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics make it necessary to reassess existing retrieval techniques, which have mainly been developed for short, structure-less documents such as newspaper articles. This study investigates cluster-based retrieval in the context of the invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters provide additional evidence for matching the user’s information need. Thus far, cluster-based retrieval approaches have relied on automatically created clusters. Fortunately, all patents have manually assigned cluster information: international patent classification codes. International patent classification is a standard taxonomy for classifying patents, and currently has about 69,000 nodes organized into a five-level hierarchical system. Thus, patent documents could provide the best test bed for developing and evaluating cluster-based retrieval techniques. Experiments using the NTCIR-4 patent collection showed that the cluster-based language model can help improve on the cluster-less baseline language model.

12.

Document classification and indexing adapted to the machine-readable environment

任磊《中国科技信息》2010,(3):202-203

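To enable a computerized retrieval system to retrieve documents through multiple access routes and from multiple angles, this paper describes how to adjust the classification and indexing rules for machine-readable data. The adjustment overcomes the drawback that indexing with only a single classification number offers very limited retrieval routes, and it reveals more of a document's multiple disciplinary or professional attributes. Documents can then be retrieved through multiple routes and from different angles, which improves their utilization.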

13.

Contextualization models for XML retrieval

Paavo Arvola Jaana Kekäläinen Marko Junkkari 《Information processing & management》2011

In a hierarchical XML structure, surrounding elements form the context of an XML element. In document-oriented XML, the context is a part of the semantics of the element and augments its textual information. The process of taking the context of the element into account in element scoring is called contextualization. This study extends the concept of contextualization and presents a classification of contextualization models. In an XML collection, elements are of different granularity, i.e. lower level elements are shorter and carry less textual information. Thus, it seems credible that contextualization interacts differently with diverse elements. Even if it is known that contextualization leads to improved effectiveness in element retrieval, the improvement on different granularity levels has not been investigated. This study explores the effect of contextualization on these levels. Further, a parameterized framework for testing contextualization is presented.

14.

Topic distillation via sub-site retrieval

Tao Qin Tie-Yan Liu Xu-Dong Zhang Guang Feng De-Sheng Wang Wei-Ying Ma 《Information processing & management》2007

Topic distillation is one of the main information needs when users search the Web. Previous approaches to topic distillation treat a single page as the basic searching unit, which does not fully utilize the structure information of the Web. In this paper, we propose a novel concept for topic distillation, named sub-site retrieval, in which the basic searching unit is a sub-site instead of a single page. A sub-site is a subset of a website, consisting of a structural collection of pages. The key steps of sub-site retrieval are (1) extracting effective features for the representation of a sub-site using both the content and structure information, and (2) delivering the sub-site-based retrieval results with a friendly and informative user interface. For the first point, we propose the Punished Integration algorithm, which is based on the modeling of the growth of websites. For the second point, we design a user interface to better illustrate the search results of sub-site retrieval. In tests on the topic distillation task of TREC 2003 and 2004, sub-site retrieval leads to significant improvement of retrieval performance over the previous methods based on single pages. Furthermore, time complexity analysis shows that sub-site retrieval can be integrated into the index component of search engines.

15.

Library and information retrieval software

D. Ellis 《International Journal of Information Management》1986,6(4)


16.

Improving document information retrieval based on the VSM

焦玉英 宋晓晴 《情报理论与实践》2007,30(1):97-99,104

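Although the traditional vector space model has been praised as the most imaginative and creative of the retrieval models, it also has shortcomings, such as not taking document structure or document type into account. This paper analyzes these problems and gives corresponding improvements, and finally constructs an improved vector space model.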

17.

Facet-based opinion retrieval from blogs (total citations: 1; self-citations: 0; citations by others: 1)

Olga Vechtomova 《Information processing & management》2010,46(1):71-88

The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback–Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.

18.

A common interface for accessing document retrieval systems and DBMS for retrieval of bibliographic data

Michael A. Shepherd Carolyn Watters 《Information processing & management》1985,21(2):127-138

Due to their ready availability, database management systems (DBMS) are being applied to bibliographic databases with increasing frequency. This is being done in spite of the fact that, although DBMS query languages tend to be very powerful, they are far too complex for the casual user. It is proposed that PSI, an existing virtual-system intermediary for document retrieval systems, be extended to include access to DBMS containing bibliographic data in order to circumvent the complexity problem for the casual user. PSI currently provides a common command language for access to multiple document retrieval systems. It is shown that PSI could be extended to provide this same command language to access DBMS, whether the DBMS are relational or network.

19.

Relation of resemblance in information retrieval

Pirkko Pietiläinen 《Information processing & management》1982,18(2):55-59

A method using the amount of semantic information of query terms as weight in a fuzzy relation of resemblance is presented. The relation can be used to partially order documents in decreasing order of resemblance with the query. Large operational bibliographic data bases are used to test the validity of the approach.

20.

Threshold values and Boolean retrieval systems

Duncan A. Buell Donald H. Kraft 《Information processing & management》1981,17(3):127-136

Several papers have appeared that have analyzed recent developments in the problem of processing, in a document retrieval system, queries expressed as Boolean expressions. The purpose of this paper is to continue that analysis. We shall show that the concept of threshold values resolves the problems inherent with relevance weights. Moreover, we shall explore possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations.