1.

International Investments and Acquisitions in India: Tax and Regulatory Aspects

Sandeep Chaufla 《Publishing Research Quarterly》2008,24(3):187-201

A review and analysis of the rules and regulations including the tax aspects of making an investment in India is presented. The full range from Foreign Direct Investment to different forms of doing business with specific examples from the publishing industry is explored to help understand current policies and regulations.

2.

Regularizing query-based retrieval scores

Fernando Diaz 《Information Retrieval》2007,10(6):531-562

We adapt the cluster hypothesis for score-based information retrieval by claiming that closely related documents should have similar scores. Given a retrieval from an arbitrary system, we describe an algorithm which directly optimizes this objective by adjusting retrieval scores so that topically related documents receive similar scores. We refer to this process as score regularization. Because score regularization operates on retrieval scores, regardless of their origin, we can apply the technique to arbitrary initial retrieval rankings. Document rankings derived from regularized scores, when compared to rankings derived from un-regularized scores, consistently and significantly result in improved performance given a variety of baseline retrieval algorithms. We also present several proofs demonstrating that regularization generalizes methods such as pseudo-relevance feedback, document expansion, and cluster-based retrieval. Because of these strong empirical and theoretical results, we argue for the adoption of score regularization as a general design principle or post-processing step for information retrieval systems.
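
The idea that related documents should receive similar scores can be illustrated with a generic graph-smoothing sketch. This is not the algorithm from the paper, which the abstract does not spell out; the function name `regularize_scores`, the row-normalized similarity matrix, the mixing weight `alpha` and the fixed iteration count are all assumptions made for illustration.

```python
def regularize_scores(scores, similarity, alpha=0.5, iterations=50):
    """Pull each document's score toward the scores of similar documents.

    scores:     initial retrieval scores, one per document
    similarity: square matrix of non-negative document similarities
    alpha:      0 keeps the original scores, values near 1 smooth heavily
    """
    n = len(scores)
    # Row-normalize so that each document's neighbor weights sum to 1.
    weights = []
    for row in similarity:
        total = sum(row)
        weights.append([w / total if total else 0.0 for w in row])

    smoothed = list(scores)
    for _ in range(iterations):
        # Blend the original score with the weighted average of neighbor scores.
        smoothed = [
            (1 - alpha) * scores[i]
            + alpha * sum(weights[i][j] * smoothed[j] for j in range(n))
            for i in range(n)
        ]
    return smoothed


if __name__ == "__main__":
    # Two mutually similar documents with very different initial scores.
    print(regularize_scores([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]))
```

In this toy case the two scores move toward each other without becoming equal, because each document keeps a share of its original score.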

3.

Current research issues and trends in non-English Web searching

Fotis Lazarinis, Jesús Vilares, John Tait, Efthimis N. Efthimiadis 《Information Retrieval》2009,12(3):230-250

With increasing numbers of non-English-language Web searchers, the problems of efficiently handling non-English Web documents and user queries are becoming major issues for search engines. The main aim of this review paper is to make researchers aware of the existing problems in monolingual non-English Web retrieval by providing an overview of open issues. A significant number of papers are reviewed and the research issues investigated in these studies are categorized in order to identify the research questions and solutions proposed in these papers. Further research is proposed at the end of each section.

4.

Probabilistic relevance ranking for collaborative filtering

Jun Wang, Stephen Robertson, Arjen P. de Vries, Marcel J. T. Reinders 《Information Retrieval》2008,11(6):477-497

Collaborative filtering is concerned with making recommendations about items to users. Most formulations of the problem are specifically designed for predicting user ratings, assuming past data of explicit user ratings is available. However, in practice we may only have implicit evidence of user preference; furthermore, a better view of the task is that of generating a top-N list of items that the user is most likely to like. In this regard, we argue that collaborative filtering can be directly cast as a relevance ranking problem. We begin with the classic Probability Ranking Principle of information retrieval, proposing a probabilistic item ranking framework. In the framework, we derive two different ranking models, showing that despite their common origin, different factorizations reflect two distinctive ways to approach item ranking. For the model estimations, we limit our discussions to implicit user preference data, and adopt an approximation method introduced in the classic text retrieval model (i.e. the Okapi BM25 formula) to effectively decouple frequency counts and presence/absence counts in the preference data. Furthermore, we extend the basic formula by proposing Bayesian inference to estimate the probability of relevance (and non-relevance), which largely alleviates the data sparsity problem. Apart from a theoretical contribution, our experiments on real data sets demonstrate that the proposed methods perform significantly better than other strong baselines.
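
The way a BM25-style formula separates presence/absence from raw frequency can be seen in its saturating frequency component. The sketch below shows only that standard component from text retrieval, without length normalization, and not the item-ranking models derived in the paper; the function name `bm25_saturation` and the default `k1=1.2` are illustrative assumptions.

```python
def bm25_saturation(frequency, k1=1.2):
    """Saturating frequency weight used in BM25-style scoring.

    Returns 0 for an absent term, 1 for a single occurrence, and
    approaches (k1 + 1) as the frequency grows, so most of the weight
    comes from presence rather than from repeated occurrences.
    """
    if frequency <= 0:
        return 0.0
    return frequency * (k1 + 1) / (frequency + k1)


if __name__ == "__main__":
    for count in (0, 1, 2, 5, 50):
        print(count, round(bm25_saturation(count), 3))
```

With a small `k1` the curve flattens quickly, so the weight behaves almost like a presence indicator; a larger `k1` lets repeated occurrences count for more.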

5.

Evaluating the effectiveness of content-oriented XML retrieval methods (total citations: 1; self-citations: 0; citations by others: 1)

Norbert Gövert, Norbert Fuhr, Mounia Lalmas, Gabriella Kazai 《Information Retrieval》2006,9(6):699-722

Content-oriented XML retrieval approaches aim at a more focused retrieval strategy: instead of retrieving whole documents, document components that are exhaustive to the information need while at the same time being as specific as possible should be retrieved. In this article, we show that the evaluation methods developed for standard retrieval must be modified in order to deal with the structure of XML documents. More precisely, the size and overlap of document components must be taken into account. For this purpose, we propose a new effectiveness metric based on the definition of a concept space defined upon the notions of exhaustiveness and specificity of a search result. We compare the results of this new metric with the results obtained with the official metric used in INEX, the evaluation initiative for content-oriented XML retrieval.

6.

On rank-based effectiveness measures and optimization (total citations: 1; self-citations: 0; citations by others: 1)

Stephen Robertson, Hugo Zaragoza 《Information Retrieval》2007,10(3):321-339

Many current retrieval models and scoring functions contain free parameters which need to be set—ideally, optimized. The process of optimization normally involves some training corpus of the usual document-query-relevance judgement type, and some choice of measure that is to be optimized. The paper proposes a way to think about the process of exploring the space of parameter values, and how moving around in this space might be expected to affect different measures. One result, concerning local optima, is demonstrated for a range of rank-based evaluation measures.

7.

Consumer Magazines in Argentina: A Market to Recover

Ethel Alejandra Pis Diez 《Publishing Research Quarterly》2007,23(3):194-209

8.

Multilingual phrase-based concordance generation in real-time

Kumiko Tanaka-Ishii, Yuichiro Ishii 《Information Retrieval》2007,10(3):275-295

We present software that generates phrase-based concordances in real-time based on Internet searching. When a user enters a string of words for which he wants to find concordances, the system sends this string as a query to a search engine and obtains search results for the string. The concordances are extracted by performing statistical analysis on search results and then fed back to the user. Unlike existing tools, this concordance consultation tool is language-independent, so concordances can be obtained even in a language for which there are no well-established analytical methods. Our evaluation has revealed that concordances can be obtained more effectively than by only using a search engine directly.

9.

Modeling context through domain ontologies (total citations: 1; self-citations: 0; citations by others: 1)

Nathalie Hernandez, Josiane Mothe, Claude Chrisment, Daniel Egret 《Information Retrieval》2007,10(2):143-172

Traditional information retrieval systems aim at satisfying most users for most of their searches, leaving aside the context in which the search takes place. We propose to model two main aspects of context: the themes of the user's information need and the specific data the user is looking for to achieve the task that has motivated the search. Both aspects are modeled by means of ontologies. Documents are semantically indexed according to the context representation and the user accesses information by browsing the ontologies. The model has been applied to a case study that has shown the added value of such a semantic representation of context.

10.

Result merging methods in distributed information retrieval with overlapping databases (total citations: 5; self-citations: 0; citations by others: 5)

Shengli Wu, Sally McClean 《Information Retrieval》2007,10(3):297-319

In distributed information retrieval systems, document overlaps occur frequently among different component databases. This paper presents an experimental investigation and evaluation of a group of result merging methods, including the shadow document method and the multi-evidence method, in the environment of overlapping databases. We assume that, with the exception of the resultant document lists (either with rankings or scores), no extra information about retrieval servers and text databases is available, which is the usual case for many applications on the Internet and the Web. The experimental results show that the shadow document method and the multi-evidence method are the two best methods when overlap is high, while Round-robin is the best for low overlap. The experiments also show that [0,1] linear normalization is a better option than linear regression normalization for result merging in a heterogeneous environment.
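
Two of the building blocks named in this abstract, [0,1] linear normalization and Round-robin merging, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the handling of ties and duplicates, and the sample data are not taken from the paper, and the shadow document and multi-evidence methods are not shown.

```python
from itertools import zip_longest


def minmax_normalize(results):
    """Linearly map (doc_id, score) pairs from one server onto [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # All scores equal: return 1.0 for each (an arbitrary choice made here).
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lowest) / (highest - lowest)) for doc, score in results]


def round_robin_merge(ranked_lists):
    """Take one document per list per round, skipping documents already seen."""
    merged, seen = [], set()
    for round_docs in zip_longest(*ranked_lists):
        for doc in round_docs:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


if __name__ == "__main__":
    print(minmax_normalize([("a", 2.0), ("b", 4.0), ("c", 3.0)]))
    print(round_robin_merge([["a", "b", "c"], ["b", "d"]]))
```

Round-robin needs only the rankings, whereas score normalization makes scores from different servers comparable before any score-based merging.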

11.

Effect of OCR error correction on Arabic retrieval

Walid Magdy, Kareem Darwish 《Information Retrieval》2008,11(5):405-425

Arabic documents that are available only in print continue to be ubiquitous, and they can be scanned and subsequently OCR’ed to ease their retrieval. This paper explores the effect of context-based OCR correction on the effectiveness of retrieving Arabic OCR documents using different index terms. Different OCR correction techniques based on language modeling with different correction abilities were tested on real OCR and synthetic OCR degradation. Results show that the reduction of word error rates needs to pass a certain limit to get a noticeable effect on retrieval. If only moderate error reduction is available, then using short character n-grams for retrieval without error correction is not a bad strategy. Word-based correction in conjunction with language modeling had a statistically significant impact on retrieval even for character 3-grams, which are known to be among the best index terms for OCR-degraded Arabic text. Further, using a sufficiently large language model for correction can minimize the need for morphologically sensitive error correction.
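
Why short character n-grams tolerate OCR errors can be shown with a small sketch: a single misrecognized character changes only the n-grams that contain it. The helper names and the English toy words are assumptions chosen for readability; the paper works with Arabic text and its own indexing setup.

```python
def char_ngrams(word, n=3):
    """Return the overlapping character n-grams of a word."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_overlap(first, second, n=3):
    """Jaccard overlap between the n-gram sets of two words."""
    a, b = set(char_ngrams(first, n)), set(char_ngrams(second, n))
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # One character misread by a hypothetical OCR engine ("l" read as "1").
    print(char_ngrams("retrieval"))
    print(ngram_overlap("retrieval", "retrieva1"))
```

As whole words the two strings do not match at all, while most of their 3-grams still do, which is why n-gram indexing can retrieve degraded text without correction.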

12.

Features for image retrieval: an experimental comparison (total citations: 6; self-citations: 0; citations by others: 6)

Thomas Deselaers, Daniel Keysers, Hermann Ney 《Information Retrieval》2008,11(2):77-107

13.

Re-ranking search results using language models of query-specific clusters

Oren Kurland 《Information Retrieval》2009,12(4):437-460

To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to using single documents to this end.

14.

Exploring criteria for successful query expansion in the genomic domain (total citations: 1; self-citations: 0; citations by others: 1)

Nicola Stokes, Yi Li, Lawrence Cavedon, Justin Zobel 《Information Retrieval》2009,12(1):17-50

Query Expansion is commonly used in Information Retrieval to overcome vocabulary mismatch issues, such as synonymy between the original query terms and a relevant document. In general, query expansion experiments exhibit mixed results. Overall TREC Genomics Track results are also mixed; however, results from the top performing systems provide strong evidence supporting the need for expansion. In this paper, we examine the conditions necessary for optimal query expansion performance with respect to two system design issues: IR framework and knowledge source used for expansion. We present a query expansion framework that improves Okapi baseline passage MAP performance by 185%. Using this framework, we compare and contrast the effectiveness of a variety of biomedical knowledge sources used by TREC 2006 Genomics Track participants for expansion. Based on the outcome of these experiments, we discuss the success factors required for effective query expansion with respect to various sources of term expansion, such as corpus-based co-occurrence statistics, pseudo-relevance feedback methods, and domain-specific and domain-independent ontologies and databases. Our results show that the choice of document ranking algorithm is the most important factor affecting retrieval performance on this dataset. In addition, when an appropriate ranking algorithm is used, we find that query expansion with domain-specific knowledge sources provides an equally substantive gain in performance over a baseline system.

15.

Smoothing document language models with probabilistic term count propagation

Azadeh Shakery, ChengXiang Zhai 《Information Retrieval》2008,11(2):139-164

Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper, we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively propagate counts and achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the simple collection-based smoothing method. Compared with those other smoothing methods that also exploit local corpus structures, our method is especially effective in improving precision in top-ranked documents through “filling in” missing query terms in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine applications.
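
For orientation, a collection-based smoothing baseline of the kind the abstract compares against can be sketched with Jelinek-Mercer interpolation. The abstract does not say which collection-based method was used, so this choice, the function name and the weight `lam` are assumptions; the graph-based term count propagation proposed in the paper is not shown.

```python
from collections import Counter


def jelinek_mercer(doc_tokens, collection_tokens, lam=0.5):
    """Build p(term | document) interpolated with the collection model.

    lam is the weight given to the collection model; lam = 0 yields the
    unsmoothed maximum-likelihood document model.
    """
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    doc_len, coll_len = len(doc_tokens), len(collection_tokens)

    def probability(term):
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        return (1 - lam) * p_doc + lam * p_coll

    return probability


if __name__ == "__main__":
    p = jelinek_mercer(["a", "b"], ["a", "b", "c", "c"])
    # "c" is missing from the document but still gets non-zero probability.
    print(p("a"), p("c"))
```

Smoothing of this kind gives a query term that is absent from a document a small non-zero probability, which is the "filling in" effect the abstract attributes, in a more targeted form, to the proposed method.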

16.

Teaching mathematics for search using a tutorial style of delivery

Andrew MacFarlane 《Information Retrieval》2009,12(2):162-178

An understanding of mathematics is needed to underpin the process of search, either explicitly with Exact Match (Boolean logic, adjacency) or implicitly with Best Match natural language search. In this paper we outline some pedagogical challenges in teaching mathematics for information retrieval (IR) to postgraduate information science students. The aim is to take these challenges, found either by experience or in the literature, and identify both theoretical and practical ideas in order to improve the delivery of the material and positively affect the learning of the target audience by using a tutorial style of teaching. Results show that there is evidence to support the notion that a more pro-active style of teaching using tutorials yields benefits both in terms of assessment results and student satisfaction.

17.

Chinese Publishing Industry Going Global: Background and Performance

Lifang Xu, Qing Fang 《Publishing Research Quarterly》2008,24(1):64-72

To put an end to the large copyright trade deficit, both Chinese government agencies and publishing houses have been striving to enter the international publication market. The article analyzes the background of the going-global strategy, and sums up the performance of both Chinese administrations and publishers.

Qing Fang (corresponding author)

18.

An empirical study of gene synonym query expansion in biomedical information retrieval

Yue Lu, Hui Fang, Chengxiang Zhai 《Information Retrieval》2009,12(1):51-68

Due to the heavy use of gene synonyms in biomedical text, people have tried many query expansion techniques using synonyms in order to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge is that it is not trivial to assign appropriate weights to the added gene synonyms in the expanded query; under-weighting synonyms would not bring much benefit, while over-weighting some unreliable synonyms can hurt performance significantly. So far, there has been no systematic evaluation of various synonym query expansion strategies for biomedical text. In this work, we propose two different strategies to extend a standard language modeling approach for gene synonym query expansion and conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document retrieval. Our experimental results show that synonym expansion can significantly improve retrieval accuracy. However, different query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical for improving performance.

19.

How to manage an information state: Jean-Baptiste Colbert’s archives and the education of his son

Jacob Soll 《Archival Science》2007,7(4):331-342

This article examines the archival methods developed by Colbert to train his son in state administration. Based on Colbert’s correspondence with his son, it reveals the practices Colbert thought necessary to collect and manage information in his state encyclopedic archive during the last half of the 17th century.

20.

The Identification of Digital Book Content

Andy Weissberg 《Publishing Research Quarterly》2008,24(4):255-260

This article analyzes current industry practices toward the identification of digital book content. It highlights key technology trends, workflow considerations and supply chain behaviors, and examines the implications of these trends and behaviors on the production, discoverability, purchasing and consumption of digital book products.
