Similar articles
20 similar articles found
1.
This article analyzes current industry practices regarding the identification of digital book content. It highlights key technology trends, workflow considerations and supply chain behaviors, and examines the implications of these trends and behaviors for the production, discoverability, purchasing and consumption of digital book products.
Andy Weissberg

2.
We adapt the cluster hypothesis for score-based information retrieval by claiming that closely related documents should have similar scores. Given a retrieval from an arbitrary system, we describe an algorithm which directly optimizes this objective by adjusting retrieval scores so that topically related documents receive similar scores. We refer to this process as score regularization. Because score regularization operates on retrieval scores, regardless of their origin, we can apply the technique to arbitrary initial retrieval rankings. Document rankings derived from regularized scores, when compared with rankings derived from unregularized scores, consistently and significantly improve performance across a variety of baseline retrieval algorithms. We also present several proofs demonstrating that regularization generalizes methods such as pseudo-relevance feedback, document expansion, and cluster-based retrieval. Because of these strong empirical and theoretical results, we argue for the adoption of score regularization as a general design principle or post-processing step for information retrieval systems.
Fernando Diaz

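The score-regularization idea in entry 2 can be illustrated with a small graph-smoothing sketch. It is not claimed to be the paper's exact algorithm: it assumes a symmetric, non-negative document affinity matrix and uses the iterative update f ← α·S·f + (1 - α)·y with S = D^(-1/2)·W·D^(-1/2), a standard "local and global consistency" style smoother (Zhou et al.). The helper name `regularize_scores` and the default α = 0.5 are hypothetical choices.

```python
import math


def regularize_scores(scores, affinity, alpha=0.5, tol=1e-12, max_iter=1000):
    """Smooth retrieval scores over a document-affinity graph (hypothetical helper).

    scores   -- initial retrieval scores y, one per document
    affinity -- symmetric, non-negative n x n matrix W (W[i][j] = topical similarity)
    alpha    -- weight given to neighbours' scores versus the original score
    """
    n = len(scores)
    if n == 0:
        return []
    degree = [sum(row) for row in affinity]
    # Symmetrically normalized affinity S = D^-1/2 W D^-1/2; isolated documents get zero rows.
    s = [
        [
            affinity[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    f = list(scores)
    for _ in range(max_iter):
        # Each document keeps part of its own score and borrows from its topical neighbours.
        new_f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * scores[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


# Documents 0 and 1 are topically identical; document 2 is unrelated.
y = [1.0, 0.0, 0.0]
w = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(regularize_scores(y, w))
```

Here the scores of documents 0 and 1 move toward each other (to roughly 2/3 and 1/3), while the unrelated document is unaffected. Only the induced ranking matters, so the overall shrinkage by (1 - α) is harmless.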
3.
Understanding of mathematics is needed to underpin the process of search, either explicitly with Exact Match (Boolean logic, adjacency) or implicitly with Best Match natural language search. In this paper we outline some pedagogical challenges in teaching mathematics for information retrieval (IR) to postgraduate information science students. The aim is to take these challenges, whether found through experience or in the literature, and to identify both theoretical and practical ideas for improving the delivery of the material and positively affecting the learning of the target audience through a tutorial style of teaching. Results show that there is evidence to support the notion that a more proactive, tutorial-based style of teaching yields benefits in terms of both assessment results and student satisfaction.
Andrew MacFarlane

4.
On rank-based effectiveness measures and optimization (total citations: 1; self-citations: 0; citations by others: 1)
Many current retrieval models and scoring functions contain free parameters which need to be set—ideally, optimized. The process of optimization normally involves some training corpus of the usual document-query-relevance judgement type, and some choice of measure that is to be optimized. The paper proposes a way to think about the process of exploring the space of parameter values, and how moving around in this space might be expected to affect different measures. One result, concerning local optima, is demonstrated for a range of rank-based evaluation measures.
Hugo Zaragoza

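To make the idea of moving through parameter space concrete, the toy below (invented data, hypothetical names) scores three documents with a one-parameter linear combination w·x1 + (1 - w)·x2 and computes average precision (AP) at several values of w. Because AP depends only on the ranking, it stays constant between the points where two documents swap places and jumps at those points. This illustrates a general property of rank-based measures; it is not a reproduction of the paper's specific result on local optima.

```python
def average_precision(ranked_rels):
    """AP for a ranked list of booleans (True = relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant document
    n_rel = sum(ranked_rels)
    return total / n_rel if n_rel else 0.0


# Hypothetical documents: (feature x1, feature x2, relevant?)
DOCS = [(1.0, 0.0, True), (0.0, 1.0, False), (0.5, 0.4, True)]


def ap_at(w, docs=DOCS):
    """Rank by w*x1 + (1-w)*x2 (stable sort on ties) and return AP."""
    ranked = sorted(docs, key=lambda d: w * d[0] + (1 - w) * d[1], reverse=True)
    return average_precision([d[2] for d in ranked])


# Sweep w over [0, 1]: AP takes only a handful of distinct values.
print(sorted({round(ap_at(i / 100), 4) for i in range(101)}))  # [0.5833, 0.8333, 1.0]
```

Any optimizer exploring w sees flat plateaus separated by jumps, which is why gradient-based tuning does not apply directly to such measures.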
5.
Evaluating the effectiveness of content-oriented XML retrieval methods (total citations: 1; self-citations: 0; citations by others: 1)
Content-oriented XML retrieval approaches aim at a more focused retrieval strategy: instead of whole documents, they should retrieve document components that are exhaustive with respect to the information need while at the same time being as specific as possible. In this article, we show that the evaluation methods developed for standard retrieval must be modified in order to deal with the structure of XML documents. More precisely, the size and overlap of document components must be taken into account. For this purpose, we propose a new effectiveness metric based on a concept space defined upon the notions of exhaustiveness and specificity of a search result. We compare the results of this new metric with those obtained with the official metric used in INEX, the evaluation initiative for content-oriented XML retrieval.
Gabriella Kazai

6.
With a growing number of non-English-language Web searchers, the efficient handling of non-English Web documents and user queries is becoming a major issue for search engines. The main aim of this review paper is to make researchers aware of the existing problems in monolingual non-English Web retrieval by providing an overview of open issues. A significant number of papers are reviewed, and the research issues investigated in these studies are categorized in order to identify the research questions and solutions proposed in these papers. Further research is proposed at the end of each section.
Efthimis N. Efthimiadis

7.
A questionnaire was circulated to librarians and learning support staff in all 109 UK universities asking how they were dealing with material only available in print, accessing electronic copies of books for print-disabled students and whether they felt a change in the law was required to make publishers take greater responsibility for accessibility issues. At the same time publishers’ policies were retrieved and an interview was conducted with a senior manager at JISC TechDis, the disabilities section of the Joint Information Systems Committee. Findings: While some publishers are going to considerable lengths to be helpful, others are not, and many learning support staff are struggling, either through lack of time or finance or both, to deliver the level of service they aspire to provide. An overwhelming majority of respondents to the questionnaire believe on grounds of cost and morals that there should be a change in the law, either by way of amendment to existing legislation or through the creation of a separate Act.
Guy Whitehouse

8.
Arabic documents that are available only in print remain ubiquitous; they can be scanned and subsequently OCR’ed to ease their retrieval. This paper explores the effect of context-based OCR correction on the effectiveness of retrieving Arabic OCR documents using different index terms. Different OCR correction techniques based on language modeling, with different correction abilities, were tested on real OCR output and on synthetic OCR degradation. Results show that the reduction in word error rate needs to pass a certain threshold to have a noticeable effect on retrieval. If only moderate error reduction is available, then using short character n-grams for retrieval without error correction is not a bad strategy. Word-based correction in conjunction with language modeling had a statistically significant impact on retrieval even for character 3-grams, which are known to be among the best index terms for OCR-degraded Arabic text. Further, using a sufficiently large language model for correction can minimize the need for morphologically sensitive error correction.
Kareem Darwish

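Why short character n-grams tolerate OCR errors (entry 8) can be seen with a minimal indexing sketch. It assumes whitespace tokenization and overlapping per-word n-grams without padding, with words shorter than n kept whole; the paper's exact n-gram construction is not given in the abstract, and the misrecognized word below is a hypothetical OCR error.

```python
def char_ngrams(text, n=3):
    """Split text on whitespace and return overlapping character n-grams per word.

    Words no longer than n are kept whole (an assumption; other schemes pad them).
    """
    grams = []
    for word in text.split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


clean = char_ngrams("كتاب")  # "book"
ocr = char_ngrams("كتاث")    # same word with the final letter misrecognized (hypothetical)
print(clean, ocr, set(clean) & set(ocr))
```

A word-level index would fail to match the corrupted token at all, whereas the 3-gram index still shares the gram "كتا" with the correct form.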
9.
In distributed information retrieval systems, documents frequently overlap among different component databases. This paper presents an experimental investigation and evaluation of a group of result merging methods, including the shadow document method and the multi-evidence method, in an environment of overlapping databases. We assume that, apart from the resulting document lists (with either rankings or scores), no extra information about retrieval servers and text databases is available, which is the usual case for many applications on the Internet and the Web. The experimental results show that the shadow document method and the multi-evidence method are the two best methods when overlap is high, while Round-robin is the best for low overlap. The experiments also show that [0,1] linear normalization is a better option than linear regression normalization for result merging in a heterogeneous environment.
Sally McClean

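Two of the simpler techniques named in entry 9, [0,1] linear (min-max) normalization and Round-robin interleaving, can be sketched as follows. The shadow document and multi-evidence methods are not reproduced here. Keeping the maximum normalized score for a document returned by several servers is our assumption, as are the function names and the example result lists.

```python
def minmax_normalize(results):
    """Map one server's (doc_id, score) list linearly onto [0, 1]."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]  # all scores equal: treat as top
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]


def merge_by_normalized_score(result_lists):
    """Merge lists by normalized score; a duplicate keeps its best score (assumption)."""
    best = {}
    for results in result_lists:
        for doc, s in minmax_normalize(results):
            best[doc] = max(s, best.get(doc, s))
    return sorted(best, key=lambda doc: (-best[doc], doc))


def round_robin(result_lists):
    """Interleave lists rank by rank, skipping documents already taken."""
    merged, seen = [], set()
    depth = max((len(r) for r in result_lists), default=0)
    for rank in range(depth):
        for results in result_lists:
            if rank < len(results):
                doc = results[rank][0]
                if doc not in seen:
                    seen.add(doc)
                    merged.append(doc)
    return merged


a = [("d1", 10.0), ("d2", 5.0), ("d3", 0.0)]  # hypothetical server A
b = [("d4", 8.0), ("d5", 7.0), ("d2", 4.0)]   # hypothetical server B (d2 overlaps)
print(merge_by_normalized_score([a, b]))  # ['d1', 'd4', 'd5', 'd2', 'd3']
print(round_robin([a, b]))                # ['d1', 'd4', 'd2', 'd5', 'd3']
```

Round-robin ignores scores entirely, which is consistent with it doing well when servers return mostly disjoint documents; score-based merging lets overlap information influence the order.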
10.
Modeling context through domain ontologies (total citations: 1; self-citations: 0; citations by others: 1)
Traditional information retrieval systems aim at satisfying most users for most of their searches, leaving aside the context in which the search takes place. We propose to model two main aspects of context: The themes of the user's information need and the specific data the user is looking for to achieve the task that has motivated his search. Both aspects are modeled by means of ontologies. Documents are semantically indexed according to the context representation and the user accesses information by browsing the ontologies. The model has been applied to a case study that has shown the added value of such a semantic representation of context.
Daniel Egret

11.
12.
To put an end to the large copyright trade deficit, both Chinese government agencies and publishing houses have been striving to enter the international publishing market. The article analyzes the background of this going-global strategy and summarizes the performance of both Chinese administrations and publishers.
Qing Fang (corresponding author)

13.
This paper gives an overview of the archival issues that relate to digitally signed documents. First, by way of introduction, the advanced digital signature is presented briefly. In the second part, a number of problems are discussed that arise when a digital signature is used as proof of the authenticity and integrity of digital documents in general. In particular, the paper also investigates whether it makes any sense for the archivist to digitally sign all electronic records under his or her management. Problems relating to the (medium) long-term archiving of digitally signed documents are dealt with in the third part. After an overview of the sticking points for long-term validation (“Archival issues”), a number of possible solutions are discussed (“Solutions for long-term archiving”).
Filip Boudrez

14.
This article analyses the extent to which archival exemptions for historical, scientific and statistical research in privacy legislation support preservation in selected European Union countries, and comparable aspects of Australian, American and Canadian law within a legal, ethical and digital archival perspective. The authors recommend that the further processing of personal data under data protection law be given a wider scope of interpretation for archival preservation purposes in both the public and private sector, coupled with the use of researcher and archival codes in relation to access to personal data. They also recommend early appraisal and integration of privacy with freedom of information and archival regimes.
Malcolm Todd

15.
This article is a general introduction to the special issue of Archival Science on “archiving research data”. It summarizes the different contributions and gives an overview of the main issues in this special field of archiving. One of the leading questions is how and why research data archives differ from public record offices. In the past, the developments in these two worlds have been rather separate. There are, however, signs that they are converging in the digital world. In particular, this can be seen in the areas of metadata and Internet dissemination, as these are strongly influenced by the rapid changes in information technology. These changes have also led to important new developments in the infrastructure of research data, to which special attention is paid. New concepts such as collaboratories, data curation, Open Access and the Open Archives Initiative are discussed.
Heiko Tjalsma

16.
Due to the heavy use of gene synonyms in biomedical text, researchers have tried many query expansion techniques using synonyms in order to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge is that it is not trivial to assign appropriate weights to the added gene synonyms in the expanded query: underweighting synonyms brings little benefit, while overweighting some unreliable synonyms can hurt performance significantly. So far, there has been no systematic evaluation of the various synonym query expansion strategies for biomedical text. In this work, we propose two different strategies to extend a standard language modeling approach for gene synonym query expansion and conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document retrieval. Our experimental results show that synonym expansion can significantly improve retrieval accuracy. However, different query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical for improving performance.
Chengxiang Zhai

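The weighting problem described in entry 16 can be sketched with a query language model in which the original gene name keeps weight 1 - β and its synonyms share β equally, scored by negative cross-entropy against a Dirichlet-smoothed document model. This is a generic illustration, not necessarily either of the two strategies the paper proposes; β, μ = 1000, the collection probabilities, the floor value, and the helper names are all assumptions.

```python
import math
from collections import Counter


def expanded_query(gene, synonyms, beta):
    """Hypothetical query model: the gene name keeps 1 - beta, synonyms share beta."""
    if not synonyms:
        return {gene: 1.0}
    model = {gene: 1.0 - beta}
    for syn in synonyms:
        model[syn] = model.get(syn, 0.0) + beta / len(synonyms)
    return model


def dirichlet_score(query_model, doc_tokens, coll_prob, mu=1000.0, floor=1e-6):
    """Sum of weight * log p(term | doc), with Dirichlet-smoothed p(term | doc)."""
    counts = Counter(doc_tokens)
    dlen = len(doc_tokens)
    total = 0.0
    for term, weight in query_model.items():
        if weight == 0.0:
            continue  # a zero-weight term contributes nothing
        p_coll = coll_prob.get(term, floor)  # unseen terms get a small floor probability
        p_doc = (counts[term] + mu * p_coll) / (dlen + mu)
        total += weight * math.log(p_doc)
    return total


coll = {"tp53": 1e-4, "p53": 1e-4}        # invented collection statistics
d_syn = ["p53", "binds", "dna"]          # mentions only the synonym
d_none = ["binds", "dna", "protein"]     # mentions neither form
for beta in (0.0, 0.3):
    q = expanded_query("tp53", ["p53"], beta)
    print(beta, dirichlet_score(q, d_syn, coll) > dirichlet_score(q, d_none, coll))
```

With β = 0 the synonym-only document is indistinguishable from an irrelevant one; raising β rewards it, but a large β would also let an unreliable synonym dominate the query, which is the trade-off the abstract describes.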
17.
Negation recognition in medical narrative reports (total citations: 1; self-citations: 0; citations by others: 1)
Substantial medical data, such as discharge summaries and operative reports, are stored in electronic textual form. Databases containing free-text clinical narrative reports often need to be searched to find relevant information for clinical and research purposes. The context of negation (a negative finding) is of special importance, since many of the most frequently described findings are negative. When searching free-text narratives for patients with a certain medical condition, if negation is not taken into account, many of the documents retrieved will be irrelevant. Hence, negation is a major source of poor precision in medical information retrieval systems. Previous research has shown that negated findings may be difficult to identify if the words implying negation (negation signals) are more than a few words away from them. We present a new pattern learning method for the automatic identification of negative context in clinical narrative reports. We compare the new algorithm with previous methods proposed for the same task and show its advantages: improved accuracy compared with other machine learning methods, and matching accuracy while being much faster than manual knowledge engineering techniques. The new algorithm can also be applied to further context identification and information extraction tasks.
Lior Rokach

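The difficulty noted in entry 17, negation signals that sit more than a few words away from the finding, is easy to see with a NegEx-style fixed-window rule (NegEx is a well-known rule-based approach by Chapman et al.). The sketch below is that simple baseline, not the paper's pattern-learning method; the signal list, the punctuation handling, and the window size are hypothetical.

```python
NEGATION_SIGNALS = {"no", "not", "denies", "denied", "without"}  # hypothetical list


def is_negated(sentence, finding, window=5):
    """True if a negation signal occurs within `window` tokens before `finding`."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    target = finding.lower().split()
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            preceding = tokens[max(0, i - window):i]
            if any(t in NEGATION_SIGNALS for t in preceding):
                return True
    return False


print(is_negated("Patient denies chest pain.", "chest pain"))  # True
# The signal "no" lies 6 tokens back, outside the 5-token window, so this is missed:
print(is_negated("No evidence of acute or chronic inflammation was seen.",
                 "inflammation"))                               # False
```

Widening the window recovers such cases but also raises false positives, which is the gap that learned negation patterns aim to close.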
18.
This paper, based on PhD research, reflects upon the market for electronic books in the general trade sectors of UK and US publishers during the early years of the 21st century. The paper reports on interviews carried out with publishers between 2003 and 2005, and reflects upon four areas which presented and still present challenges to the uptake of e-books—negative perceptions from consumers; formats; pricing and issues regarding digital rights. The paper concludes that the development and uptake of electronic books has some way to go in the general trade/mass-market sectors.
Cliff McKnight

19.
A review and analysis of the rules and regulations including the tax aspects of making an investment in India is presented. The full range from Foreign Direct Investment to different forms of doing business with specific examples from the publishing industry is explored to help understand current policies and regulations.
Sandeep Chaufla

20.
A probability ranking principle for interactive information retrieval (total citations: 1; self-citations: 1; citations by others: 0)
The classical Probability Ranking Principle (PRP) forms the theoretical basis for probabilistic Information Retrieval (IR) models, which have dominated IR theory for about 20 years. However, the assumptions underlying the PRP often do not hold, and its view is too narrow for interactive information retrieval (IIR). In this article, a new theoretical framework for interactive retrieval is proposed. The basic idea is that during IIR, a user moves between situations. In each situation, the system presents the user with a list of choices, about which s/he has to decide, and the first positive decision moves the user to a new situation. Each choice is associated with a number of cost and probability parameters. Based on these parameters, an optimum ordering of the choices can be derived: the PRP for IIR. The relationship of this rule to the classical PRP is described, and issues for further research are pointed out.
Norbert Fuhr

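The flavor of an optimum ordering for interactive choices (entry 20) can be shown under a deliberately simplified user model that we assume here; the paper's own parameterization may differ. Each choice i has an acceptance probability p_i, a benefit a_i gained on acceptance, and an effort e_i spent examining it; the user scans the list and stops at the first accepted choice. An adjacent-swap argument then shows that sorting by decreasing a_i - e_i / p_i maximizes expected benefit. The code checks this by brute force on invented numbers.

```python
from itertools import permutations


def expected_benefit(order):
    """Expected benefit of a list order under the assumed model.

    Each choice is (p, a, e): examining it costs e; it is accepted with
    probability p, yielding a and ending the scan; otherwise the user moves on.
    """
    reach = 1.0  # probability that the user gets this far down the list
    total = 0.0
    for p, a, e in order:
        total += reach * (p * a - e)
        reach *= 1.0 - p
    return total


def iir_prp_order(choices):
    """Sort by decreasing a - e / p (requires p > 0), optimal under the model above."""
    return sorted(choices, key=lambda c: c[1] - c[2] / c[0], reverse=True)


choices = [(0.9, 1.0, 0.5), (0.3, 5.0, 0.2), (0.5, 2.0, 0.1)]  # invented (p, a, e)
best = max(expected_benefit(list(perm)) for perm in permutations(choices))
print(expected_benefit(iir_prp_order(choices)), best)  # the two values coincide
```

When all choices share the same benefit and effort, the key a - e / p grows with p, so the rule reduces to ranking by decreasing probability, mirroring the classical PRP.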