Similar Documents
20 similar documents found.
1.
The non-citation rate is the proportion of papers that attract no citations over a given period following publication. After reviewing the related papers indexed in Web of Science, Google Scholar and Scopus, we find that the literature on citation distributions focuses mainly on the distribution of citations among papers receiving at least one citation; far fewer studies address the time-dependent pattern of the percentage of never-cited papers, the distribution model that fits this pattern, or the factors influencing the non-citation rate. Here, we perform an empirical pilot analysis of the time-dependent distribution of the percentages of never-cited papers across a series of consecutive citation time windows following publication in six sample journals, and study the influence of paper length on a paper's chance of being cited. The analysis supports the following general conclusions: (1) a three-parameter negative exponential model fits the time-dependent distribution curve of the percentages of never-cited papers well; (2) in the initial citation time window, the percentage of never-cited papers in each journal is very high, but as the window widens, the percentage first drops rapidly and then more slowly, with a very large total decline for most journals; (3) for wide citation time windows, the percentage of never-cited papers in each journal approaches a stable value, which changes very little thereafter unless a large number of "Sleeping Beauty" papers emerge; (4) the length of a paper strongly influences whether it will be cited.
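A minimal sketch of the model-fitting step in conclusion (1), using hypothetical data points and an assumed parameterization y(t) = y0*exp(-t/tau) + c (Python/SciPy):

    # Fit a three-parameter negative exponential to the percentage of
    # never-cited papers as the citation window widens (illustrative data).
    import numpy as np
    from scipy.optimize import curve_fit

    def neg_exp(t, y0, tau, c):
        # y0: initial excess percentage, tau: decay constant,
        # c: stable percentage approached in wide citation windows
        return y0 * np.exp(-t / tau) + c

    windows = np.array([1, 2, 3, 4, 5, 6, 7, 8])              # years after publication
    never_cited = np.array([62, 38, 25, 18, 14, 12, 11, 10])  # percent (hypothetical)

    params, _ = curve_fit(neg_exp, windows, never_cited, p0=(60.0, 2.0, 10.0))
    y0, tau, c = params
    print(f"y0={y0:.1f}%, tau={tau:.1f} yr, stable level c={c:.1f}%")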

2.
Web search engines are increasingly deploying many features, combined using learning to rank techniques. However, various practical questions remain concerning the manner in which learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking of the sample by the learned model brings the relevant documents to the top. However, the properties of the document sample such as when to stop ranking—i.e. its minimum effective size—remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate choice of how to calculate the loss function—i.e. the choice of the learning evaluation measure and the rank depth at which this measure should be calculated—are as yet unclear. In this paper, we address all of these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set is dependent on the type of information need of the queries, the document representation used during sampling and the test evaluation measure. As the sample size is varied, the selected features markedly change—for instance, we find that the link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is not as effective as the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
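For reference, the two learning/evaluation measures compared here, NDCG@k and the ERR of Chapelle et al., can be sketched as follows (graded relevance labels are hypothetical):

    # NDCG@k and ERR@k over one ranked list of graded relevance labels.
    import math

    def dcg(labels, k):
        return sum((2**rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

    def ndcg(labels, k):
        ideal = dcg(sorted(labels, reverse=True), k)
        return dcg(labels, k) / ideal if ideal > 0 else 0.0

    def err(labels, k, max_grade=3):
        # Expected Reciprocal Rank: probability that the user stops at
        # rank i, discounted by 1/i (cascade user model).
        p_continue, score = 1.0, 0.0
        for i, rel in enumerate(labels[:k], start=1):
            r = (2**rel - 1) / 2**max_grade   # stop probability at this rank
            score += p_continue * r / i
            p_continue *= (1 - r)
        return score

    ranking = [3, 0, 2, 1, 0]                 # hypothetical graded labels
    print(ndcg(ranking, 5), err(ranking, 5))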

3.
Experimental data [Mansilla, R., Köppen, E., Cocho, G., & Miramontes, P. (2007). On the behavior of journal impact factor rank-order distribution. Journal of Informetrics, 1(2), 155–160] reveal that, if one ranks a set of journals (e.g. in a field) in decreasing order of their impact factors, the rank distribution of the logarithm of these impact factors has a typical S-shape: first a convex decrease, followed by a concave decrease. In this paper we give a mathematical formula for this distribution and explain the S-shape. The experimentally observed smaller convex part and larger concave part are also explained. If one studies the rank distribution of the impact factors themselves, we prove that the same S-shape appears, but with its inflection point at μ, the average of the impact factors. These distributions are valid for any type of impact factor (any publication period and any citation period). They are even valid for any sample average rank distribution.
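A quick numerical illustration of the inflection-at-μ result, assuming for simplicity a smooth, symmetric (normal) population of impact factors rather than the paper's general model:

    # Rank impact factors in decreasing order and locate the inflection of
    # the rank curve; for a symmetric unimodal population it falls at the
    # mean mu (illustrative check only).
    import numpy as np
    from scipy.stats import norm

    mu, sigma, N = 2.5, 0.8, 10001
    ranks = np.arange(1, N + 1)
    # Smooth "sample" via theoretical quantiles instead of noisy draws
    ifs = norm.ppf(1 - ranks / (N + 1), loc=mu, scale=sigma)

    second_diff = np.diff(ifs, 2)          # convex: > 0, concave: < 0
    flip = np.argmax(second_diff < 0)      # first sign change
    print(f"IF at inflection: {ifs[flip + 1]:.3f} (mu = {mu})")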

4.
Citation averages, and Impact Factors (IFs) in particular, are sensitive to sample size. Here, we apply the Central Limit Theorem to IFs to understand their scale-dependent behavior. For a journal of n randomly selected papers from a population of all papers, we expect from the Theorem that its IF fluctuates around the population average μ, and spans a range of values proportional to σ/√n, where σ² is the variance of the population's citation distribution. The 1/√n dependence has profound implications for IF rankings: the larger a journal, the narrower the range around μ where its IF lies. IF rankings therefore allocate an unfair advantage to smaller journals in the high IF ranks, and to larger journals in the low IF ranks. As a result, we expect a scale-dependent stratification of journals in IF rankings, whereby small journals occupy the top, middle, and bottom ranks; mid-sized journals occupy the middle ranks; and very large journals have IFs that asymptotically approach μ. We obtain qualitative and quantitative confirmation of these predictions by analyzing (i) the complete set of 166,498 IF & journal-size data pairs in the 1997–2016 Journal Citation Reports of Clarivate Analytics, (ii) the top-cited portion of 276,000 physics papers published in 2014–2015, and (iii) the citation distributions of an arbitrarily sampled list of physics journals. We conclude that the Central Limit Theorem is a good predictor of the IF range of actual journals, while sustained deviations from its predictions are a mark of true, non-random, citation impact. IF rankings are thus misleading unless one compares like-sized journals or adjusts for these effects. We propose the Φ index, a rescaled IF that accounts for size effects, and which can be readily generalized to account also for different citation practices across research fields. Our methodology applies to other citation averages that are used to compare research fields, university departments or countries in various types of rankings.
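The scale-dependent narrowing is easy to reproduce in a Monte Carlo sketch; the lognormal citation population below is an assumption for illustration, not the paper's data:

    # Journals of n randomly drawn papers from one citation population
    # have IFs that spread ~ sigma/sqrt(n) around the population mean.
    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.lognormal(mean=1.0, sigma=1.2, size=1_000_000)  # citations/paper
    mu, sigma = population.mean(), population.std()

    for n in (50, 500, 5000):
        ifs = [rng.choice(population, size=n).mean() for _ in range(2000)]
        print(f"n={n:5d}: sd of IF = {np.std(ifs):.3f} "
              f"(CLT predicts {sigma / np.sqrt(n):.3f}); mu = {mu:.3f}")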

5.
Journal metrics are employed for the assessment of scientific scholarly journals from a general bibliometric perspective. In this context, the Thomson Reuters journal impact factors (JIFs) are the most widely used citation-based indicators. The 2-year journal impact factor (2-JIF) counts citations to one- and two-year-old articles, while the 5-year journal impact factor (5-JIF) counts citations to one- to five-year-old articles. Nevertheless, these indicators are not comparable among fields of science for two reasons: (i) each field has a different impact maturity time, and (ii) there are systematic differences in publication and citation behavior across disciplines. In fact, the 5-JIF first appeared in the Journal Citation Reports (JCR) in 2007 with the purpose of making impact more comparable in fields where impact matures slowly. However, there is no optimal fixed impact maturity time valid for all fields: in some, two years gives good performance, whereas in others three or more years are necessary. There is therefore a problem when comparing a journal from a field in which impact matures slowly with a journal from a field in which impact matures rapidly. In this work, we propose the 2-year maximum journal impact factor (2M-JIF), a new impact indicator that considers the 2-year rolling citation time window of maximum impact instead of the previous 2-year time window. Finally, an empirical application comparing 2-JIF, 5-JIF, and 2M-JIF shows that the maximum rolling target window reduces the between-group variance with respect to the within-group variance in a random sample of about six hundred journals from eight different fields.
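One plausible reading of the 2M-JIF construction, sketched with hypothetical journal data (the exact window bounds are assumptions):

    # Slide a 2-year publication window back in time and take the maximum
    # 2-year impact, instead of fixing the window at years y-1 and y-2.
    def jif_2y(cites, items, census_year, offset):
        # citations received in census_year by items published in the two
        # consecutive years starting `offset` years before the census year
        y1, y2 = census_year - offset, census_year - offset - 1
        return (cites[y1] + cites[y2]) / (items[y1] + items[y2])

    def jif_2y_max(cites, items, census_year, max_offset=5):
        return max(jif_2y(cites, items, census_year, k) for k in range(1, max_offset))

    # Hypothetical journal: citations in 2016 to each publication year,
    # and number of citable items per publication year.
    cites = {2015: 120, 2014: 260, 2013: 310, 2012: 240, 2011: 150}
    items = {2015: 100, 2014: 110, 2013: 105, 2012: 95, 2011: 90}
    print(jif_2y(cites, items, 2016, 1))   # classic 2-JIF (2015 + 2014)
    print(jif_2y_max(cites, items, 2016))  # rolling-window maximum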

6.
As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank, retrieval systems can learn directly from implicit feedback inferred from user interactions. In such an online setting, algorithms must obtain feedback for effective learning while simultaneously utilizing what has already been learned to produce high quality results. We formulate this challenge as an exploration–exploitation dilemma and propose two methods for addressing it. By adding mechanisms for balancing exploration and exploitation during learning, each method extends a state-of-the-art learning to rank method, one based on listwise learning and the other on pairwise learning. Using a recently developed simulation framework that allows assessment of online performance, we empirically evaluate both methods. Our results show that balancing exploration and exploitation can substantially and significantly improve the online retrieval performance of both listwise and pairwise approaches. In addition, the results demonstrate that such a balance affects the two approaches in different ways, especially when user feedback is noisy, yielding new insights relevant to making online learning to rank effective in practice.
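The exploration-exploitation balance can be caricatured with a generic epsilon-greedy blend of an exploitative and an exploratory ranking; this is purely illustrative and is not the paper's listwise or pairwise algorithm:

    # Build the shown list by picking, at each rank, from the exploitative
    # ranking with probability 1 - epsilon and from an exploratory
    # (randomized) ranking otherwise.
    import random

    def blend_rankings(exploit, explore, epsilon, rng):
        shown, used = [], set()
        while len(shown) < len(exploit):
            source = explore if rng.random() < epsilon else exploit
            for doc in source:
                if doc not in used:
                    shown.append(doc)
                    used.add(doc)
                    break
        return shown

    rng = random.Random(42)
    exploit = ["d1", "d2", "d3", "d4", "d5"]        # current best ranking
    explore = rng.sample(exploit, k=len(exploit))   # randomized candidate
    print(blend_rankings(exploit, explore, epsilon=0.3, rng=rng))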

7.
Combination of multiple evidences (multiple query formulations, multiple retrieval schemes or systems) has been shown, mostly experimentally, to be effective for data fusion in information retrieval. However, the question of why and how combination should be done remains largely unanswered. In this paper, we provide a model for simulation and a framework for analysis in the study of data fusion in the information retrieval domain. A rank/score function is defined, and the concept of a Cayley graph is used in the design and analysis of our framework. The model and framework have led us to a better understanding of data fusion phenomena in information retrieval. In particular, by exploiting the graphical properties of the rank/score function, we show analytically and by simulation that combination using rank performs better than combination using score under certain conditions. Moreover, we demonstrate that the rank/score function can be used as a predictive variable for the effectiveness of combining multiple evidences.
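The score-versus-rank contrast can be made concrete with two standard fusion rules, CombSUM (score-based) and a Borda-style sum (rank-based); the run scores below are hypothetical:

    # Score fusion (CombSUM over min-max normalized scores) versus rank
    # fusion (Borda-style sum of rank positions) for two retrieval runs.
    def combsum(runs):
        docs = {d for run in runs for d in run}
        fused = {}
        for run in runs:
            lo, hi = min(run.values()), max(run.values())
            for d in docs:
                fused[d] = fused.get(d, 0.0) + (run.get(d, lo) - lo) / (hi - lo)
        return sorted(fused, key=fused.get, reverse=True)

    def borda(runs):
        fused = {}
        for run in runs:
            ranking = sorted(run, key=run.get, reverse=True)
            for rank, d in enumerate(ranking):
                fused[d] = fused.get(d, 0) + len(ranking) - rank
        return sorted(fused, key=fused.get, reverse=True)

    run_a = {"d1": 9.1, "d2": 3.3, "d3": 3.2}
    run_b = {"d2": 0.9, "d3": 0.8, "d1": 0.1}
    print("score fusion:", combsum([run_a, run_b]))
    print("rank fusion: ", borda([run_a, run_b]))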

8.
9.
This paper proposes to use random walk (RW) to discover the properties of deep web data sources that are hidden behind searchable interfaces. The properties, such as the average degree and population size of both documents and terms, are of interest to the general public and find applications in business intelligence, data integration and deep web crawling. We show that a simple RW can outperform uniform random (UR) sampling, even before taking the high cost of UR sampling into account. We prove that in the idealized case where the degrees follow Zipf's law, the sample size of UR sampling needs to grow in the order of O(N/ln²N) with the corpus size N, while the sample size of RW sampling grows only logarithmically. The Reuters corpus is used to demonstrate that term degrees resemble a power-law distribution, so RW is better than UR sampling. Document degrees, on the other hand, follow a lognormal distribution and exhibit a smaller variance, so UR sampling is slightly better there.
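A toy sketch of the RW sampler on a document-term bipartite graph, with a Hansen-Hurwitz-style 1/degree correction for the degree-proportional visiting probabilities (the corpus is hypothetical):

    # Alternate document -> random term -> random document containing that
    # term; the visited documents form the sample.
    import random

    docs = {
        "d1": ["data", "web", "deep"],
        "d2": ["web", "crawl"],
        "d3": ["data", "crawl", "web"],
        "d4": ["deep", "data"],
    }
    terms = {}                         # invert: term -> documents containing it
    for d, ts in docs.items():
        for t in ts:
            terms.setdefault(t, []).append(d)

    rng = random.Random(7)
    doc, sample = "d1", []
    for _ in range(1000):
        term = rng.choice(docs[doc])   # random term in current document
        doc = rng.choice(terms[term])  # random document with that term
        sample.append(doc)

    # RW visits documents proportionally to degree; weighting each visit by
    # 1/degree gives an approximately unbiased mean-degree estimate.
    est = len(sample) / sum(1 / len(docs[d]) for d in sample)
    true = sum(len(ts) for ts in docs.values()) / len(docs)
    print(f"estimated mean doc degree {est:.2f} vs true {true:.2f}")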

10.
Questions of definition and measurement continue to impede consensus on the measurement of interdisciplinarity, and using Rao-Stirling (RS) Diversity sometimes produces anomalous results. We argue that these unexpected outcomes can be related to the use of "dual-concept diversity", which combines "variety" and "balance" in the definition (ex ante). We propose to modify RS Diversity into a new indicator (DIV) which operationalizes "variety," "balance," and "disparity" independently and then combines them ex post; "balance" can be measured using the Gini coefficient. We apply DIV to the aggregated citation patterns of 11,487 journals covered by the Journal Citation Reports 2016 of the Science Citation Index and the Social Sciences Citation Index as an empirical domain and, in more detail, to the citation patterns of 85 journals assigned to the Web-of-Science category "information science & library science" in both the cited and citing directions. We compare the results of the indicators and show that DIV provides improved results in terms of distinguishing between interdisciplinary knowledge integration (citing references) and knowledge diffusion (cited impact). The new diversity indicator and the RS measure capture different features. A routine for the measurement of the various operationalizations of diversity (in any data matrix) is made available online.
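One plausible operationalization of DIV as described, with variety, Gini-based balance and disparity computed independently and then multiplied ex post; the normalizations below are assumptions:

    import numpy as np

    def gini(x):
        x = np.sort(np.asarray(x, dtype=float))
        n = x.size
        return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

    def div_indicator(counts, distance, n_classes_total):
        active = counts > 0
        c, D = counts[active], distance[np.ix_(active, active)]
        nc = c.size
        variety = nc / n_classes_total          # share of classes present
        balance = 1.0 - gini(c)                 # evenness of the counts
        disparity = D.sum() / (nc * (nc - 1)) if nc > 1 else 0.0
        return variety * balance * disparity    # combined ex post

    # Hypothetical: citations of one journal over 4 subject categories and
    # a (1 - cosine similarity) distance matrix between the categories.
    counts = np.array([50, 30, 15, 0])
    distance = np.array([[0.0, 0.4, 0.8, 0.9],
                         [0.4, 0.0, 0.6, 0.7],
                         [0.8, 0.6, 0.0, 0.5],
                         [0.9, 0.7, 0.5, 0.0]])
    print(div_indicator(counts, distance, n_classes_total=4))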

11.
The reform of Italian public administration, which started in the 1990s, shifted the consolidated paradigm towards results-oriented management of the res publica. The new regulatory framework emphasised the role of the evaluation process carried out by the designated audit authorities (OIV or NDV); legislators provided a new information system, principally making audit-related data and other information accessible via the institutional websites of Italian municipalities. In this context, the Minister of Public Administration promoted a platform called 'Bussola della Trasparenza', whose goal is to ensure easy access to institutional data of the municipalities and to evaluate the available information. However, we found the results provided by this platform to be unreliable: our study of 525 municipalities showed severe discrepancies with Bussola's evaluation, suggesting a lack of transparency. We therefore propose a logit model as an alternative framework for evaluating the probability that a municipal website complies with the new regulations, using a set of predictors that capture a broader and more complete definition of transparency. The model is intended as a practical tool for correctly evaluating the compliance of municipal websites.
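A hedged sketch of such a logit model with scikit-learn; the predictors and labels below are synthetic stand-ins for the paper's transparency features:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 525  # municipalities studied
    X = np.column_stack([
        rng.integers(0, 2, n),                       # audit data published (0/1)
        rng.integers(0, 2, n),                       # org chart published (0/1)
        np.log1p(rng.integers(1_000, 500_000, n)),   # log population
    ])
    # Synthetic compliance labels, for illustration only
    y = (0.8 * X[:, 0] + 0.6 * X[:, 1] + 0.2 * X[:, 2]
         + rng.normal(0, 1, n) > 2.2).astype(int)

    model = LogisticRegression().fit(X, y)
    print("coefficients:", model.coef_[0])
    print("P(compliant):", model.predict_proba(X[:3])[:, 1])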

12.
An expert ranking of forestry journals was compared with Journal Impact Factors and h-indices computed from the ISI Web of Science and internet-based data. Citations reported by Google Scholar offer an efficient way to rank all journals objectively, in a manner consistent with other indicators. The h-index computed from these citations exhibited a high correlation with the Journal Impact Factor (r = 0.92), but is not confined to journals selected by any particular commercial provider. A ranking of 180 forestry journals based on this index is presented.
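The journal h-index used here is the largest h such that h of a journal's papers each have at least h citations; the citation counts below are hypothetical:

    def h_index(citations):
        # Rank papers by citations; h is the deepest rank whose paper
        # still has at least that many citations.
        citations = sorted(citations, reverse=True)
        h = 0
        for rank, c in enumerate(citations, start=1):
            if c >= rank:
                h = rank
            else:
                break
        return h

    print(h_index([45, 33, 12, 9, 8, 5, 3, 1, 0]))  # -> 5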

13.
The distribution of cumulative citations L and contributed citations Lf to individual multiauthored papers published by selected authors working in different scientific disciplines is analyzed and discussed using a Langmuir-type function: y_n = y_0[1 - αKn/(1 + Kn)], where y_n denotes the total number of normalized cumulative citations l_n* or normalized contributed citations l_nf* received by the paper of rank n, y_0 is the maximum value of y_n at n = 0, α ≥ 1 is an effectiveness parameter, and K is the Langmuir constant, related to the dimensionless differential energy Q = ln(KN_c), with N_c the number of papers receiving citations. Relationships between the Langmuir constant K of the distribution function, the number N_c of an author's papers receiving citations, and the effectiveness parameter α, obtained from analysis of the rank-size distributions of the authors, are investigated. It was found that: (1) the quantity KN_c obtained from the real citation distribution of papers of various authors working in different disciplines is inversely proportional to (α - 1), with a proportionality constant (KN_c)_0 < 1; (2) the relation KN_c = (KN_c)_0/(α - 1) also holds for the citation distribution of journals published in countries of two different groups, investigated earlier (Sangwal, K. (2013). Journal of Informetrics, 7, 487–504); and (3) deviations of the real citation distribution from the curves predicted by the Langmuir-type function are associated with changing activity of the sources generating the items (citations).
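A sketch of fitting the Langmuir-type function to a synthetic rank-citation profile and recovering K, α and Q = ln(KN_c); the data are illustrative, not the paper's:

    import numpy as np
    from scipy.optimize import curve_fit

    def langmuir(n, y0, alpha, K):
        # y_n = y_0 * (1 - alpha*K*n / (1 + K*n))
        return y0 * (1 - alpha * K * n / (1 + K * n))

    rank = np.arange(0, 30)
    cites = langmuir(rank, 100.0, 1.1, 0.35) + np.random.default_rng(3).normal(0, 2, 30)

    (p_y0, p_alpha, p_K), _ = curve_fit(langmuir, rank, cites, p0=(90.0, 1.0, 0.5))
    Nc = 30   # papers receiving citations in this synthetic profile
    print(f"y0={p_y0:.1f}, alpha={p_alpha:.2f}, K={p_K:.3f}, "
          f"Q=ln(K*Nc)={np.log(p_K * Nc):.2f}")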

14.
When speaking of information retrieval, we often mean text retrieval, but many other forms of information retrieval application exist. A typical example is collaborative filtering, which suggests interesting items to a user by taking other users' preferences or tastes into account. Due to the uniqueness of the problem, it has been modeled and studied differently in the past, mainly from the preference-prediction and machine-learning viewpoints. A few attempts have been made to bring collaborative filtering back into information (text) retrieval modeling, and new, interesting collaborative filtering techniques have been derived as a result. In this paper, we show that from the algorithmic viewpoint there is an even closer relationship between collaborative filtering and text retrieval. Specifically, major collaborative filtering algorithms, such as the memory-based ones, essentially calculate the dot product between the user vector (the query vector in text retrieval) and the item rating vector (the document vector in text retrieval). Thus, if we properly structure the user preference data and employ the target user's ratings as the query input, major text retrieval algorithms and systems can be used directly, without modification. In this regard, we propose a unified formulation under a common notational framework for memory-based collaborative filtering, along with a technique for using any text retrieval weighting function with collaborative filtering preference data. Besides confirming the rationale of the framework, our preliminary experimental results demonstrate the effectiveness of the approach in using text retrieval models and systems to perform item ranking tasks in collaborative filtering.
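A minimal user-based memory sketch of this dot-product view: the target user's ratings serve as the query, and both the user-similarity and item-scoring steps are plain dot products (toy ratings matrix; the paper's unified formulation may differ in detail):

    import numpy as np

    # rows = users, cols = items; 0 = unrated (hypothetical ratings)
    R = np.array([[5, 4, 0, 1],
                  [4, 5, 1, 0],
                  [1, 0, 5, 4],
                  [0, 1, 4, 5]], dtype=float)

    query = R[0]                   # target user's ratings act as the query
    sims = R @ query               # dot product: query vs every user vector
    sims[0] = 0.0                  # exclude the target user herself
    scores = sims @ R              # aggregate neighbours' ratings per item
    scores[query > 0] = -np.inf    # mask items the target already rated
    print("recommend item", int(np.argmax(scores)))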

15.
The citation distribution of papers of selected individual authors was analyzed using five mathematical functions: power-law, stretched exponential, logarithmic, binomial and Langmuir-type. The first two functions have previously been proposed in the literature, whereas the remaining three are novel and are derived following the concepts of crystal growth kinetics in the presence of additives which act as growth inhibitors. Analysis of the citation distributions revealed that the goodness-of-fit parameter R² was highest for the empirical binomial relation, high and comparable for the stretched exponential and Langmuir-type functions, relatively low for the power law, and lowest for the logarithmic function. In the Langmuir-type function a parameter K, defined as the Langmuir constant and characterizing the citation behavior of the authors, has been identified. Based on the Langmuir-type function, an expression for cumulative citations L is also proposed, relating the extrapolated citation value l_0 at rank n = 0 for an author, his/her constant K, and the number N of papers receiving citations l ≥ 1.
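The goodness-of-fit comparison can be sketched by fitting two of the candidate functions to synthetic rank-citation data and comparing R²:

    import numpy as np
    from scipy.optimize import curve_fit

    def r_squared(y, y_hat):
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        return 1 - ss_res / ss_tot

    power = lambda n, a, b: a * n ** (-b)
    stretched = lambda n, a, b, c: a * np.exp(-(n / b) ** c)

    rank = np.arange(1, 26, dtype=float)
    cites = 80 * np.exp(-(rank / 8.0) ** 0.9)   # synthetic "observed" data

    for name, f, p0 in [("power law", power, (80, 1)),
                        ("stretched exp", stretched, (80, 8, 1))]:
        p, _ = curve_fit(f, rank, cites, p0=p0, maxfev=10000)
        print(f"{name}: R^2 = {r_squared(cites, f(rank, *p)):.4f}")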

16.
eCommerce, Brexit, and new safety and security concerns are only a few examples of the challenges that government organisations, in particular customs administrations, face today when controlling goods crossing borders. To deal with the enormous volumes of trade, customs administrations rely more and more on information technology (IT) and risk assessment, and are starting to explore the possibilities that data analytics (DA) can offer to support their supervision tasks. Driven by customs as our empirical domain, we explore the use of DA to support the supervision role of government. Although data analytics is considered a technological breakthrough, there is so far only a limited understanding of how governments can translate this potential into actual value, and of the barriers and trade-offs that need to be overcome for value realisation. The main question we explore in this paper is: how can the value of DA in a government supervision context be identified, and what barriers and trade-offs must be considered and overcome in order to realise this value? Building on leading models from the information systems (IS) literature, and using case studies from the customs domain, we develop the Value of Data Analytics in Government Supervision (VDAGS) framework. The framework can help managers and policy-makers gain a better understanding of the benefits and trade-offs of using DA when developing DA strategies or embarking on new DA projects. Future research can examine the applicability of the VDAGS framework in other domains of government supervision.

17.
The journal impact factor (JIF) has been questioned repeatedly over its half-century of development because of its inconsistency with scholarly reputation evaluations of scientific journals. This paper proposes a publication delay adjusted impact factor (PDAIF), which takes publication delay into consideration to reduce its negative effect on the quality of impact factor determination. Based on citation data collected from Journal Citation Reports and publication delay data extracted from the journals' official websites, PDAIFs for journals from business-related disciplines are calculated. The results show that PDAIF values are, on average, more than 50% higher than JIF results. Furthermore, journal rankings based on PDAIF show very high consistency with reputation-based journal rankings. Moreover, a case study of journals published by ELSEVIER and INFORMS shows that PDAIF brings a greater impact factor increase for journals with longer publication delays, because it removes that negative influence. Finally, insightful and practical suggestions for shortening publication delay are provided.

18.
In the present work we introduce a modification of the h-index for multi-authored papers with contribution-based author name ranking, denoted the hmc-index. It employs the framework of the hm-index, which in turn is a straightforward modification of the Hirsch index proposed by Schreiber. To retain the hm-index's merit of requiring no additional rearrangement of papers, while overcoming its shortcoming of benefiting secondary authors at the expense of primary authors, the hmc-index uses combined credit allocation (CCA) in place of the hm-index's fractionalized counting. The hm-index is a special case of the hmc-index and suits papers with equally important authors or alphabetically ordered authorship. Because an author who contributes less to the scientific community as a whole may still obtain a higher hmc-index, we also define a rational variant, the hmcr-index, which avoids this. A fictitious example as a model case and two empirical cases are analyzed, and the correlations of the hmcr-index with the h-index and several of its variants that consider multiple co-authorship are inspected using the citation data of 30 researchers. The results show that the hmcr-index is more reasonable for authors with different contributions: a researcher playing a more important role in significant work obtains a higher hmcr-index.
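For context, Schreiber's hm-index, which the hmc-index modifies, can be sketched as follows; the paper's CCA weights would replace the fractional 1/(number of authors) increment (the example data are fictitious):

    def hm_index(papers):
        # papers: list of (citations, n_authors); papers keep their
        # citation ranking but each advances the effective rank by only
        # 1/n_authors; hm is the largest effective rank r_eff with
        # citations >= r_eff.
        papers = sorted(papers, key=lambda p: p[0], reverse=True)
        r_eff, hm = 0.0, 0.0
        for citations, n_authors in papers:
            r_eff += 1.0 / n_authors       # fractional rank increment
            if citations >= r_eff:
                hm = r_eff
            else:
                break
        return hm

    print(hm_index([(20, 4), (15, 2), (9, 1), (6, 3), (2, 5)]))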

19.
In many probabilistic modeling approaches to information retrieval we are interested in estimating how well a document model "fits" the user's information need (query model). In statistics, on the other hand, goodness-of-fit tests are well-established techniques for assessing assumptions about the underlying distribution of a data set. Supposing that the query terms are randomly distributed over the documents of the collection, we want to know whether the occurrences of the query terms in a particular document are more frequent than chance would predict; this can be quantified by goodness-of-fit tests. In this paper, we present a new document ranking technique based on Chi-square goodness-of-fit tests. Given the null hypothesis that there is no association between the query terms q and the document d beyond chance occurrences, we perform a Chi-square goodness-of-fit test of this hypothesis and calculate the corresponding Chi-square values; our retrieval formula ranks the documents in the collection according to these values. The method was evaluated over the entire TREC test collection on disks 4 and 5, using the topics of the TREC-7 and TREC-8 conferences (50 topics each). It performs well, steadily outperforming the classical Okapi term-frequency weighting formula, though remaining below the KL-divergence approach from language modeling. Despite this, we believe the technique is an important non-parametric way of thinking about retrieval, offering the possibility of trying simple alternative retrieval formulas within the framework of goodness-of-fit statistical tests, by modeling the data in various ways and estimating or assigning arbitrary theoretical distributions to terms.
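A hedged sketch of the ranking idea: score a document by the Chi-square statistic of observed query-term counts against counts expected from collection-wide frequencies (the paper's exact estimation details may differ):

    def chi_square_score(doc_tf, doc_len, coll_prob):
        # Sum of (observed - expected)^2 / expected over the query terms,
        # with expected counts scaled to the document length.
        score = 0.0
        for term, p in coll_prob.items():
            observed = doc_tf.get(term, 0)
            expected = p * doc_len
            score += (observed - expected) ** 2 / expected
        return score

    # Hypothetical collection probabilities of the query terms
    coll_prob = {"neural": 0.002, "retrieval": 0.004}
    docs = {
        "d1": ({"neural": 6, "retrieval": 3}, 800),
        "d2": ({"retrieval": 1}, 900),
        "d3": ({}, 500),
    }
    ranked = sorted(docs, key=lambda d: chi_square_score(*docs[d]), reverse=True)
    print(ranked)   # documents whose term counts deviate most from chance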

20.
Knowledge Acquisition, 1992, 4(1): 89–108
There are two main problems in the development of a knowledge-based system (KBS). The first is modelling the domain expertise; the second is modelling the application of this knowledge to the tasks that future users want to perform. This paper discusses how the second problem can be addressed in a systematic way. Our argument is that the second problem is at least equally important, and that if it is given serious attention the first problem becomes simpler, because effort can be directed at the subset of expertise that is actually required. The resulting Analysis of Cooperation helps to arrive at a consistent set of functional requirements for a future KBS and a population of intended users. It comprises (i) a theoretical framework for system development, (ii) a technique for constructing a model of cooperation, and (iii) a recommendation to use the "Wizard of Oz" technique for validating a model of cooperation in experiments with future users. In such an experiment, users attempt to perform tasks with the help of a mock-up of the future system operating according to the model of cooperation.
