31.
Jia Zhu, Gabriel Pui Cheong Fung, Zeyang Lei, Min Yang, Ying Shen 《Information processing & management》2019,56(3):381-393
In recent decades, many similarity measures have been proposed, such as the Jaccard coefficient, cosine similarity, BM25, and the language model. Despite their effectiveness, we observe that none of them consistently outperforms the others in most typical situations. Choosing which similarity predicate to use is usually treated as an empirical question, answered by evaluating a particular task with a number of different similarity predicates, which is computationally inefficient and yields results that are not portable. In this paper, we propose a novel approach that combines different similarity predicates into a committee, so that no single predicate has to be chosen. Empirically, the committee obtains better results than any individual similarity predicate, which is quite meaningful in practice. Specifically, our method models committee generation as a 0–1 integer programming problem based on the confidence of the similarity predicates and the reliability of the attributes. We demonstrate the effectiveness of our model by applying it to three datasets with controlled errors. Experimental results show that our similarity predicate committee is more robust than, and superior to, existing individual similarity predicates.
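A minimal sketch of the committee idea, assuming two token-based predicates (Jaccard and cosine) and a plain average in place of the paper's 0–1 integer-programming selection, which is not reproduced here:

```python
# Illustrative sketch only: compares two strings with several similarity
# predicates and averages them as a naive "committee". The paper's actual
# committee is chosen via 0-1 integer programming over predicate confidence
# and attribute reliability, which is not reproduced here.
from collections import Counter
from math import sqrt

def jaccard(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a_tokens, b_tokens):
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def committee_score(s1, s2, predicates=(jaccard, cosine)):
    t1, t2 = s1.lower().split(), s2.lower().split()
    return sum(p(t1, t2) for p in predicates) / len(predicates)

print(committee_score("information retrieval ranking", "ranking for information retrieval"))
```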
32.
We evaluate article-level metrics along two dimensions. First, we analyse the metrics' ranking bias in terms of fields and time. Second, we evaluate their performance on test data consisting of (1) papers that have won high-impact awards and (2) papers that have won prizes for outstanding quality. We consider different citation impact indicators and indirect ranking algorithms in combination with various normalisation approaches (mean-based, percentile-based, co-citation-based, and post hoc rescaling). We execute all experiments on two publication databases that use different field categorisation schemes (author-chosen concept categories and categories based on papers' semantic information). In terms of bias, we find that citation counts are always less time-biased but always more field-biased than PageRank. Furthermore, rescaling paper scores by a constant number of similarly aged papers reduces time bias more effectively than normalising by calendar years. We also find that percentile citation scores are less field- and time-biased than mean-normalised citation counts. In terms of performance, time-normalised metrics identify high-impact papers better shortly after publication than their non-normalised variants; after 7 to 10 years, however, the non-normalised metrics perform better. A similar trend exists for the set of high-quality papers, where the performance cross-over points occur after 5 to 10 years. Lastly, we find that personalising PageRank with papers' citation counts reduces time bias but increases field bias. Similarly, using papers' associated journal impact factors to personalise PageRank increases its field bias. In terms of performance, PageRank should always be personalised with papers' citation counts and time-rescaled for citation windows shorter than 7 to 10 years.
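A hedged sketch of one of the normalisation families compared above: percentile citation scores computed within (field, publication-year) cells. The field labels, years, citation counts, and the exact percentile definition are illustrative assumptions, not the study's data or formula:

```python
# Hedged sketch: percentile-normalised citation scores within
# (field, publication-year) cells. All values below are hypothetical.
from collections import defaultdict

papers = [
    {"id": "p1", "field": "physics", "year": 2015, "cites": 40},
    {"id": "p2", "field": "physics", "year": 2015, "cites": 5},
    {"id": "p3", "field": "math",    "year": 2015, "cites": 12},
    {"id": "p4", "field": "math",    "year": 2015, "cites": 3},
]

groups = defaultdict(list)
for p in papers:
    groups[(p["field"], p["year"])].append(p["cites"])

def percentile_score(p):
    cell = groups[(p["field"], p["year"])]
    below = sum(c < p["cites"] for c in cell)
    ties = sum(c == p["cites"] for c in cell)
    return 100 * (below + 0.5 * ties) / len(cell)

for p in papers:
    print(p["id"], round(percentile_score(p), 1))
```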
33.
The most popular world university rankings are routinely taken at face value by media and social actors. While these rankings are politically influential, they are sensitive both to the conceptual framework (the set of indicators) and to the modelling choices made in their construction (e.g., weighting or type of aggregation). A robustness analysis based on a multi-modelling approach tests the validity of inferences drawn from the rankings produced in the Academic Ranking of World Universities (ARWU) of Shanghai Jiao Tong University and those produced by the UK's Times Higher Education Supplement (THES). Conclusions are drawn on the reliability of individual university ranks and on relative country or macro-regional performance (e.g., Europe versus the USA versus China) in terms of the number of top-performing institutions. We find that while university- and country-level statistical inferences are unsound, the inference on macro regions is more robust. The paper also proposes an alternative ranking that depends more on the framework than on the methodological choices.
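A minimal sketch of the multi-modelling idea, assuming a toy composite indicator: the ranking is recomputed under several plausible weight sets and the spread of each institution's rank is inspected. Institutions, indicator values, and weights are invented; they are not ARWU or THES data:

```python
# Recompute a composite ranking under alternative indicator weightings and
# observe how much each institution's rank moves (a simple robustness check).
scores = {          # normalised indicator values: (research, teaching, citations)
    "Univ A": (0.9, 0.6, 0.8),
    "Univ B": (0.7, 0.9, 0.7),
    "Univ C": (0.8, 0.7, 0.9),
}
weight_sets = [(0.5, 0.2, 0.3), (0.33, 0.33, 0.34), (0.2, 0.5, 0.3)]

ranks = {u: [] for u in scores}
for w in weight_sets:
    composite = {u: sum(wi * xi for wi, xi in zip(w, x)) for u, x in scores.items()}
    ordered = sorted(composite, key=composite.get, reverse=True)
    for pos, u in enumerate(ordered, start=1):
        ranks[u].append(pos)

for u, r in ranks.items():
    print(u, "ranks across models:", r, "spread:", max(r) - min(r))
```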
34.
Although the "America's Best Colleges" rankings released each September by U.S. News & World Report cover only American institutions of higher education, they enjoy a worldwide reputation. The ranking philosophy that runs through these rankings is their essence: the category-based rankings, the evaluation indicators that emphasise talent cultivation, the objectivity of the evaluation methods, and the consumer-oriented ranking philosophy are all worth learning from for China.
35.
To resolve the contradictions between the competition formats of the China Wushu Sanda King Championship and the national Sanda championships, a new format must keep Sanda King contestants as the main body of participants while also accommodating outstanding athletes from the championships; it must guarantee the appearance rate of star athletes, the level of the events, and television ratings, while continually bringing forward new talent. Taking the competition formats of the Sanda King tournament, the championships, and related events as the research object, a points-ranking calculation method is used to determine which athletes are eligible to compete. Building on an overall conception of the market characteristics of Sanda King competition, a new format structure and schedule are then designed, providing a theoretical basis for exploring a Sanda competition system that meets market demands.
36.
Balázs R. Sziklai 《Journal of Informetrics》2021,15(2):101133
We present a novel algorithm to rank smaller academic entities, such as university departments or research groups, within a research discipline. The Weighted Top Candidate (WTC) algorithm is a generalisation of an expert identification method. The axiomatic characterisation of WTC shows why it is especially suitable for scientometric purposes. The key axiom is stability: the selected institutions support each other's membership. Given an institution citation matrix, the WTC algorithm produces a list of institutions that can be deemed experts of the field. A parameter adjusts how exclusive the list should be. By completely relaxing the parameter, we obtain the largest stable set: the academic entities that qualify as experts under the mildest conditions. With a strict setup, we obtain a short list of the absolute elite. We demonstrate the algorithm on a citation database compiled from the game-theoretic literature published between 2008 and 2017. By plotting the size of the stable sets against exclusiveness, we obtain an overview of the competitiveness of the field; the diagram hints at how difficult it is for an institution to improve its position.
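A hedged illustration of the stable-set notion, not the paper's exact WTC procedure: starting from all institutions, any institution whose within-set citation support falls below a fraction alpha of the set maximum is dropped, until the selection stabilises. The citation matrix and the selection rule are assumptions for illustration only:

```python
# Not the actual WTC algorithm: a simplified stand-in for the idea that
# lowering the exclusiveness parameter yields a larger stable set.
def stable_set(citations, alpha):
    # citations[i][j] = citations from institution i to institution j (hypothetical)
    selected = set(range(len(citations)))
    changed = True
    while changed:
        received = {j: sum(citations[i][j] for i in selected if i != j) for j in selected}
        top = max(received.values()) if received else 0
        keep = {j for j in selected if top == 0 or received[j] >= alpha * top}
        changed = keep != selected
        selected = keep
    return selected

matrix = [
    [0, 5, 1, 0],
    [4, 0, 2, 0],
    [1, 3, 0, 1],
    [0, 0, 1, 0],
]
print(stable_set(matrix, alpha=0.5))   # stricter setting: short elite list
print(stable_set(matrix, alpha=0.1))   # relaxed setting: largest stable set
```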
37.
With ever-increasing information available to end users, search engines have become the most powerful tools for obtaining useful information scattered across the Web. However, even the most renowned search engines commonly return result sets containing pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, whose basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work explores different semantics-based relevancy ranking approaches considered appropriate for retrieving relevant information. Various pilot projects and their outcomes are examined in terms of the methodologies they adopt and their most distinctive ranking characteristics. An overview of selected approaches and a comparison based on classification criteria are presented; this comparison identifies common concepts and notable features.
38.
39.
This paper uses regression analysis to test whether universities that perform less well in Shanghai Jiao Tong University's world university league tables are able to catch up with the top performers, and to identify national and institutional factors that could affect this catching-up process. We constructed a dataset of 461 universities across 41 countries and found consistent evidence of a moderate degree of catching up, especially amongst non-US universities. Larger universities, as well as universities located in English-speaking countries, not only perform better on average but also catch up more over 2003–2009. Universities located in lower-income countries are also catching up more. The performance of private universities, compared with that of public universities, varies substantially between the US and the other countries.
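A hedged sketch of a catch-up (beta-convergence-style) regression consistent with the description above: the change in a university's league-table score is regressed on its initial score plus covariates, and a negative coefficient on the initial score is read as catching up. The data are synthetic and the covariates (size, English-speaking dummy) are illustrative; the paper's exact specification may differ:

```python
# Synthetic example of a catch-up regression; not the paper's data or model.
import numpy as np

rng = np.random.default_rng(0)
n = 200
initial_score = rng.uniform(10, 90, n)          # score in the first year
size = rng.uniform(0, 1, n)                     # e.g. rescaled enrolment (assumed covariate)
english = rng.integers(0, 2, n)                 # English-speaking country dummy (assumed)
change = -0.15 * initial_score + 5 * size + 3 * english + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), initial_score, size, english])
beta, *_ = np.linalg.lstsq(X, change, rcond=None)
print(dict(zip(["const", "initial_score", "size", "english"], beta.round(3))))
# A negative coefficient on initial_score indicates catching up.
```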
40.
《Information processing & management》2023,60(1):103135
Pre-trained language models (PLMs) such as BERT have been successfully employed in the two-phase ranking pipeline for information retrieval (IR). Meanwhile, recent studies have reported that the BERT model is vulnerable to imperceptible textual perturbations on quite a few natural language processing (NLP) tasks. For IR tasks, the established BERT re-ranker is mainly trained on large-scale and relatively clean datasets such as MS MARCO, whereas noisy text is more common in real-world scenarios such as web search. In addition, the impact of within-document textual noise (perturbations) on retrieval effectiveness remains to be investigated, especially on the ranking quality of the BERT re-ranker, given its contextualized nature. To close this gap, we carry out exploratory experiments on the MS MARCO dataset to examine whether the BERT re-ranker can still perform well when ranking noisy text. Unfortunately, we observe non-negligible effectiveness degradation of the BERT re-ranker across ten different types of synthetic within-document textual noise. To address these effectiveness losses, we propose a novel noise-tolerant model, De-Ranker, which is learned by minimizing the distance between the noisy text and its original clean version. Our evaluation on the MS MARCO and TREC 2019–2020 DL datasets demonstrates that De-Ranker handles synthetic textual noise more effectively, with a 3%–4% performance improvement over the vanilla BERT re-ranker. Meanwhile, extensive zero-shot transfer experiments on 18 widely used IR datasets show that De-Ranker can not only tackle natural noise in real-world text but also achieve a 1.32% average improvement in cross-domain generalization on the BEIR benchmark.
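An illustrative sketch of the noise-tolerant objective described for De-Ranker, under the assumption that it combines a standard ranking loss on clean text with a consistency term pulling the score of the noisy document toward the score of its clean version. The scorer, the loss weighting alpha, and the toy data are stand-ins, not the paper's actual training setup:

```python
# Assumed formulation: ranking loss on clean text plus an MSE consistency term
# between scores of the noisy and clean versions of the same document.
import torch
import torch.nn.functional as F

def noise_tolerant_loss(scorer, query, doc_clean, doc_noisy, label, alpha=1.0):
    s_clean = scorer(query, doc_clean)                    # relevance score on clean text
    s_noisy = scorer(query, doc_noisy)                    # score on perturbed text
    rank_loss = F.binary_cross_entropy_with_logits(s_clean, label)
    consistency = F.mse_loss(s_noisy, s_clean.detach())   # pull noisy score toward clean score
    return rank_loss + alpha * consistency

# Toy stand-in scorer so the snippet runs end to end; the real re-ranker is a
# BERT-style cross-encoder over (query, document) text.
class ToyScorer(torch.nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.lin = torch.nn.Linear(2 * dim, 1)
    def forward(self, q, d):
        return self.lin(torch.cat([q, d], dim=-1)).squeeze(-1)

scorer = ToyScorer()
q = torch.randn(4, 8)
d_clean = torch.randn(4, 8)
d_noisy = d_clean + 0.1 * torch.randn(4, 8)
label = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(noise_tolerant_loss(scorer, q, d_clean, d_noisy, label).item())
```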