News-oriented bloggers have contributed much to the public sphere in recent years. Whether bloggers are considered journalists, and are thereby protected by shield laws, will be an important question for policy makers and the courts. This paper provides an overview of the law concerning the constitutional and statutory privileges accorded journalists. It then critiques proposals to create a federal journalist's privilege as applied to bloggers. Finally, the paper argues that the test articulated in Von Bulow v. Von Bulow should be adopted in federal legislation. Under Von Bulow, bloggers would be shielded from disclosing confidential sources and information when they function as journalists.

We consider text retrieval applications that assign query-specific relevance scores to documents drawn from particular collections. Such applications are a primary focus of the annual Text Retrieval Conference (TREC), where participants compare the empirical performance of different approaches. P(K), the proportion of the top K documents that are relevant, is a popular measure of retrieval effectiveness. Participants in the TREC Very Large Corpus track have observed that when the target is a random sample from a collection, P(K) is substantially smaller than when the target is the entire collection. Hawking and Robertson (2003) confirmed this finding in a number of experimental settings. Hawking et al. (1999) posed the cause of this phenomenon as an open research question and proposed five possible explanatory hypotheses. In this paper, we present a mathematical analysis that sheds some light on these hypotheses and complements the experimental work of Hawking and Robertson (2003). We also introduce C(L), contamination at L, the number of irrelevant documents amongst the top L relevant documents, and describe its properties. Our analysis shows that while P(K) will typically increase with collection size, the phenomenon is not universal: the asymptotic behavior of P(K) and C(L) depends on the score distributions and on the relative proportions of relevant and irrelevant documents in the collection. While this article went to press, Yehuda Vardi passed away. We dedicate the paper to his memory.
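As a concrete illustration of the two measures, the sketch below computes P(K) and C(L) from a ranked list of binary relevance judgments. It assumes one reading of C(L): walk down the ranking until L relevant documents have been seen and count the irrelevant documents encountered along the way. The function names are hypothetical and not taken from the paper.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """P(K): proportion of the top k ranked documents that are relevant.

    Positions beyond the end of the ranking count as non-relevant,
    which is the usual TREC convention.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def contamination_at_l(ranked_relevance: Sequence[bool], l: int) -> int:
    """C(L): irrelevant documents ranked above the L-th relevant document.

    Assumed interpretation: scan the ranking until l relevant documents
    have been seen, counting the irrelevant ones passed on the way.
    """
    if l <= 0:
        raise ValueError("l must be positive")
    relevant_seen = 0
    irrelevant_seen = 0
    for is_relevant in ranked_relevance:
        if is_relevant:
            relevant_seen += 1
            if relevant_seen == l:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    raise ValueError("ranking contains fewer than l relevant documents")


# Toy ranking: True = relevant, False = irrelevant.
ranking = [True, False, True, True, False, False, True]
print(precision_at_k(ranking, 4))      # 3 of the top 4 are relevant
print(contamination_at_l(ranking, 3))  # one irrelevant document precedes the 3rd relevant one
```

Because C(L) is anchored to relevant documents rather than to a fixed rank cutoff, it can be compared across a full collection and a random sample of it, which is the setting the abstract discusses.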

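New-type documents mainly refer to electronic and online (network) documents; traditional documents mainly refer to paper documents. Because the two differ in many respects, such as production, storage, transmission and price, libraries will hold a growing number of documents in the new media, and the dominance of traditional documents will come to an end. Paper documents will not disappear, however, and will continue to play a role within a certain scope. 6 references.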

Conclusion: Freelancers provide unique resources for publishers. Because of their special expertise, unique viewpoints, and even geographical location, they provide publishers with customized materials in disciplines and locations where the expense of full-time writers would not be justifiable. They will continue to be an important factor in the publishing industry. Publishers will continue to write contracts that allow them to republish articles in all known and future formats. Problems still exist with articles published before 1995. Publishers will have to decide how to respond to Tasini, whether through some version of the PRC or by deleting articles. It is hoped that a final agreement between authors and publishers, consistent with the Supreme Court's decision, will cause minimal inconvenience to the research public. New technology presents many challenges, as well as many opportunities, for freelance writers. As technology evolves, copyright law must ensure that writers' incentives to create are nurtured and that they are fairly compensated for their work. Authors whose work creates value should share in the revenue and opportunities created by those technologies.

Discussion about the value of electronic documents is often hampered by starting from what is usual in the paper world and attempting to impose that on an electronic environment. To grasp the impact of the current electronic revolution, and to formulate a policy for the future, we examine the aims and content of scientific communication. We then critically discuss the recommendations of an International Working Group [see Learned Publishing 2000:13(4) Oct. 251–8], and show the tension between these very reasonable recommendations and the reality of electronic publishing. We conclude that the scientific article will change considerably but that, in its new, more composite form as an ensemble of various textual and non-textual components, it will retain many of the current cultural and scientific requirements with regard to editing, quality and integrity.


This paper discusses the transformation of the book binding industry and the possibilities that binding technology affords today's authors, publishers and book designers. Book manufacturers will endure, although they will be challenged to offer better products with ever faster service.

In this paper, we evaluate a number of machine learning techniques for the task of ranking answers to why-questions. We use TF-IDF together with a set of 36 linguistically motivated features that characterize questions and answers. We experiment with a number of machine learning techniques (including several classifiers and regression techniques, Ranking SVM and SVMmap) in various settings. The purpose of the experiments is to assess how the different machine learning approaches cope with our highly imbalanced binary relevance data, with and without hyperparameter tuning. We find that with all machine learning techniques, we can obtain an MRR score that is significantly above the TF-IDF baseline of 0.25 and not significantly lower than the best score of 0.35. We provide an in-depth analysis of the effect of data imbalance and hyperparameter tuning, and we relate our findings to previous research on learning to rank for Information Retrieval.
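For readers unfamiliar with the evaluation measure, the sketch below shows how mean reciprocal rank (MRR) is computed: each question contributes 1/rank of its first relevant answer (0 if none is relevant), and MRR is the mean over questions. It assumes binary relevance judgments and no rank cutoff; the function names are illustrative only.

```python
from typing import Sequence


def reciprocal_rank(ranked_relevance: Sequence[bool]) -> float:
    """1/rank of the first relevant answer, or 0.0 if no answer is relevant."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """MRR over a set of questions, each with its own ranked answer list."""
    if not rankings:
        raise ValueError("need at least one ranking")
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Three toy questions: first relevant answer at ranks 2, 1 and 4.
questions = [[False, True], [True], [False, False, False, True]]
print(mean_reciprocal_rank(questions))  # (1/2 + 1 + 1/4) / 3
```

With imbalanced data (few relevant answers per question), MRR is driven almost entirely by where the single best answer lands, which is why a ranker can beat a 0.25 baseline while most of its ranking stays unchanged.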


This concept was initially developed by William Wilson, library consultant, PROVIDENCE Associates Inc., and was modified for a presentation at the 1996 Texas State Library Association Convention. It outlines a different way of thinking about library funding: an innovative framework intended to interest legislators in the plight of their under-funded library services. Wilson and Waters refer to this new structure as a “Library Authority.” The concept would allow the question of funding to go directly to the voters, who are most concerned with the quality of libraries. Waters also addresses quality, arguing that appeals to quantity and volume are a faulty basis for gathering support: “We make little mention of the long-term impact associated with introducing a four-year old to the world of books… or to the economic impact derived from helping a saleswoman locate a new market for her products.” Mr. Waters hopes others will help flesh out this concept for the good of all libraries.

Just as gold, silver and other precious metals are regarded as valuable commodities, so now is information. We call it the information age. But this recognition raises many questions. One of the more pressing is: “What are the needs of Librarians, Archivists, Records Managers and other information specialists so that they can ensure that information, like the precious metals, retains its reliability and accountability through time?” This paper presents several technology changes that are shaking the foundations of our previous “tried and tested” methods. It identifies the current problems with what has worked in the past. It then attempts to predict what will happen in the future, what effect this will have on a Global Information Society, and how we will need to go about our business in the coming decades.

This paper investigates the impact of three approaches to XML retrieval: using Zettair, a full-text information retrieval system; using eXist, a native XML database; and using a hybrid system that takes full article answers from Zettair and uses eXist to extract elements from those articles. For the content-only topics, we undertake a preliminary analysis of the INEX 2003 relevance assessments in order to identify the types of highly relevant document components. Further analysis identifies two complementary sub-cases of relevance assessments (General and Specific) and two categories of topics (Broad and Narrow). We develop a novel retrieval module that, for a content-only topic, uses the information from the answer list of a native XML database and dynamically determines the preferable units of retrieval, which we call Coherent Retrieval Elements. Our experiments show that when each of the three systems is evaluated against different retrieval scenarios (such as different cases of relevance assessments, different topic categories and different choices of evaluation metrics), the XML retrieval systems exhibit varying behaviour, and the best performance is reached for different values of the retrieval parameters. For the INEX 2003 relevance assessments of the content-only topics, our newly developed hybrid XML retrieval system is substantially more effective than either Zettair or eXist, and yields robust and highly effective XML retrieval.


Beginning in the 1990s, various academic units within our health sciences institution moved aggressively toward providing courses and programs via distance education. Without a centralized campus distance education office, distance library services from our campus evolved sporadically in response to individual needs. In 2001, the library hired its first distance services librarian, whose primary responsibility was to develop a written distance library services plan. In accordance with the ACRL Guidelines for Distance Learning Library Services, the library determined that formulating an effective plan required a formal needs assessment of the faculty providing distance education. In this paper, we discuss the process for developing this needs assessment, based on focus groups and a written survey instrument. We also address some of the challenges we faced with this approach. Preliminary data identified copyright clearance and lack of awareness of library services as the major barriers to distance faculty seeking course support from the library.

Automatic document classification can be used to organize documents in a digital library, construct on-line directories, improve the precision of web searching, or support interactions between users and search engines. In this paper we explore how linkage information inherent to different document collections can be used to enhance the effectiveness of classification algorithms. We have experimented with three link-based bibliometric measures, co-citation, bibliographic coupling and Amsler, on three different document collections: a digital library of computer science papers, a web directory and an on-line encyclopedia. Results show that both hyperlink and citation information can be used to learn reliable and effective classifiers based on a kNN classifier. In one of the test collections, we obtained improvements in macro-averaged F1 of up to 69.8% over the traditional text-based kNN classifier, which serves as the baseline in our experiments. We also present alternative ways of combining bibliometric-based classifiers with text-based classifiers. Finally, we studied the situations in which the bibliometric-based classifiers failed and show that in such cases it is hard to reach consensus on the correct classes, even for human judges.
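The three link-based measures can be sketched as set overlaps on a citation graph. Co-citation compares the sets of documents that cite two documents, bibliographic coupling compares their reference lists, and Amsler combines both neighbourhoods. The Jaccard-style normalization used below is an assumption for illustration; the paper may normalize differently, and the function names are hypothetical. Scores like these could then serve as the similarity function in a kNN classifier.

```python
from typing import Dict, Set

# citations[d] = set of documents that d cites (its references / out-links).
Citations = Dict[str, Set[str]]


def _jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cited_by(citations: Citations, doc: str) -> Set[str]:
    """Documents that cite `doc` (its in-links)."""
    return {src for src, refs in citations.items() if doc in refs}


def co_citation(citations: Citations, d1: str, d2: str) -> float:
    """Overlap of the sets of documents citing d1 and d2."""
    return _jaccard(cited_by(citations, d1), cited_by(citations, d2))


def bibliographic_coupling(citations: Citations, d1: str, d2: str) -> float:
    """Overlap of the reference lists of d1 and d2."""
    return _jaccard(citations.get(d1, set()), citations.get(d2, set()))


def amsler(citations: Citations, d1: str, d2: str) -> float:
    """Overlap of the combined in-link and out-link neighbourhoods."""
    n1 = citations.get(d1, set()) | cited_by(citations, d1)
    n2 = citations.get(d2, set()) | cited_by(citations, d2)
    return _jaccard(n1, n2)


# Toy graph: c and d cite a; c cites b; a and b share the reference y.
graph = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"a", "b"}, "d": {"a"}}
print(bibliographic_coupling(graph, "a", "b"))  # shared reference y out of {x, y, z}
print(co_citation(graph, "a", "b"))             # shared citer c out of {c, d}
print(amsler(graph, "a", "b"))                  # {y, c} out of {x, y, z, c, d}
```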

Those who have viewed the Times Mirror acquisition of New American Library through the lens of Victor Weybright's memoirs have seen a picture of corporate interference in editorial decision making. By studying the correspondence and memoranda exchanged at the time, Thomas Bonn has uncovered different causes for Weybright's dissatisfaction, and new insight into the managerial problems that occur when a large corporation acquires an independent publisher. Thomas L. Bonn is a librarian at the State University of New York at Cortland. His Heavy Traffic and High Culture: The New American Library as Literary Gatekeeper was published in July by Southern Illinois University Press.

The following seniority-independent Hirsch-type index has been defined. A scientist has index hpd if hpd of his/her papers have at least hpd citations per decade each, and his/her other papers have fewer than hpd + 1 citations per decade each. In contrast with the original h-index, which steadily increases over time, the hpd of a mature scientist is nearly constant over many years, and the hpd of an inactive scientist slowly declines. Therefore hpd is suitable for comparing the scientific output of scientists of different ages.
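The definition can be computed exactly like an ordinary h-index, but on citation rates instead of raw citation counts. The sketch below assumes that each paper's rate is its citation count scaled to a ten-year window by its age in years (10 × citations / age); how the original definition treats very young papers is not stated here, so the age handling is an assumption. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def citations_per_decade(citations: int, age_years: float) -> float:
    """Citation rate scaled to a ten-year window (assumed normalization)."""
    if age_years <= 0:
        raise ValueError("paper age must be positive")
    return 10.0 * citations / age_years


def h_pd(papers: Sequence[Tuple[int, float]]) -> int:
    """Largest n such that n papers each have at least n citations per decade.

    `papers` holds (citation_count, age_in_years) pairs. Sorting the rates in
    descending order guarantees the remaining papers fall below n + 1.
    """
    rates = sorted((citations_per_decade(c, a) for c, a in papers), reverse=True)
    h = 0
    for i, rate in enumerate(rates, start=1):
        if rate >= i:
            h = i
        else:
            break
    return h


# Rates: 60, 12, 20 and 1 citations per decade -> three papers reach at least 3.
print(h_pd([(30, 5), (12, 10), (4, 2), (1, 10)]))
```

Because old papers are divided by their age, their weight does not accumulate indefinitely, which is what keeps the index roughly flat for a mature scientist.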

This paper is concerned with Markov processes for computing page importance. Page importance is a key factor in Web search. Many algorithms, such as PageRank and its variations, have been proposed for computing this quantity in different scenarios, using different data sources, and under different assumptions. A question then arises as to whether these algorithms can be explained in a unified way, and whether there is a general guideline for designing new algorithms for new scenarios. To answer these questions, we introduce a General Markov Framework in this paper. Under the framework, a Web Markov Skeleton Process is used to model the random walk conducted by the web surfer on a given graph. Page importance is then defined as the product of two factors: page reachability, the average probability that the surfer arrives at the page, and page utility, the average value that the page gives to the surfer in a single visit. These two factors can be computed, respectively, as the stationary probability distribution of the corresponding embedded Markov chain and as the mean staying time on each page of the Web Markov Skeleton Process. We show that this general framework covers many existing algorithms, including PageRank, TrustRank, and BrowseRank, as special cases. We also show that the framework can help us design new algorithms to handle more complex problems, by constructing graphs from new data sources, employing new family members of the Web Markov Skeleton Process, and using new methods to estimate these two factors. In particular, we demonstrate the use of the framework with a new process, named the Mirror Semi-Markov Process. In this process, the staying time on a page, as a random variable, is assumed to depend on both the current page and its inlink pages. Our experimental results on both the user browsing graph and the mobile web graph validate that the Mirror Semi-Markov Process is more effective than previous models in several tasks, even in the presence of web spam and when the assumption of preferential attachment does not hold.
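A minimal numerical sketch of the two-factor definition: reachability is taken as the stationary distribution π of the embedded chain's transition matrix (found here by power iteration), utility as each page's mean staying time, and importance as their product. Normalizing the product to sum to 1 is an assumption made for readability. The sketch assumes an irreducible, aperiodic chain (for example one that already includes PageRank-style random jumps) so that power iteration converges, and it does not model the Mirror Semi-Markov Process, where staying time also depends on the inlink page.

```python
from typing import List, Sequence


def stationary_distribution(P: Sequence[Sequence[float]],
                            tol: float = 1e-12,
                            max_iter: int = 10_000) -> List[float]:
    """Solve pi = pi P for a row-stochastic matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


def page_importance(P: Sequence[Sequence[float]],
                    mean_staying_time: Sequence[float]) -> List[float]:
    """Reachability (stationary probability) times utility (mean staying time).

    The result is normalized to sum to 1 (an assumption for comparability).
    """
    pi = stationary_distribution(P)
    raw = [p * t for p, t in zip(pi, mean_staying_time)]
    total = sum(raw)
    return [r / total for r in raw]


# Two pages: page 1 is reached twice as often, but visitors stay 4x longer on page 0.
P = [[0.5, 0.5],
     [0.25, 0.75]]
print(stationary_distribution(P))          # approximately [1/3, 2/3]
print(page_importance(P, [4.0, 1.0]))      # approximately [2/3, 1/3]
```

Setting every mean staying time to the same constant reduces the importance to π alone, which is how a PageRank-style score fits in as a special case of this product form.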

In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these metrics are indicative of how the astronomical literature is used. We show how the use of the ADS has changed both quantitatively and qualitatively. We also show that different types of users access the system in different ways. Finally, we show how use of the ADS has evolved over the years in various regions of the world.

The ability to correctly classify sentences that describe events is important for many natural language applications, such as Question Answering (QA) and Text Summarisation. In this paper, we treat event detection as a sentence-level text classification problem. We compare the performance of discriminative and generative approaches to this task: namely, a Support Vector Machine (SVM) classifier and a Language Modeling (LM) approach. We also investigate a rule-based method that uses handcrafted lists of ‘trigger’ terms derived from WordNet. Two datasets are used in our experiments to test each approach on six different event types: Die, Attack, Injure, Meet, Transport and Charge-Indict. Our experimental results show that the trained SVM classifier significantly outperforms the simple rule-based system and the language modeling approach on both datasets: ACE (F1 66% vs. 45% and 38%, respectively) and IBC (F1 92% vs. 88% and 74%, respectively). We also provide a detailed error analysis framework for the task, which separates errors into different types: semantic, inference, continuous and trigger-less.
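To make the rule-based baseline concrete, the sketch below labels a sentence with every event type whose trigger list shares a token with the sentence. The trigger lists are hypothetical and purely illustrative (the paper derives its lists from WordNet), and simple lowercase alphabetic tokenization is an assumption. A lookup like this cannot use context, which is one reason a trained classifier can outperform it.

```python
import re
from typing import Dict, List, Set

# Hypothetical trigger lists for illustration only; not the paper's WordNet-derived lists.
TRIGGERS: Dict[str, Set[str]] = {
    "Die": {"die", "died", "killed", "dead"},
    "Attack": {"attack", "attacked", "bombed", "fired"},
    "Meet": {"meet", "met", "summit", "talks"},
}


def detect_events(sentence: str,
                  triggers: Dict[str, Set[str]] = TRIGGERS) -> List[str]:
    """Return the event types whose trigger terms occur in the sentence (sorted)."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return sorted(event for event, words in triggers.items() if tokens & words)


print(detect_events("Two soldiers died when the convoy was attacked."))
print(detect_events("The ministers held talks in Geneva."))
print(detect_events("Markets rose sharply."))
```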

