Direct optimization of evaluation measures has become an important branch of learning to rank for information retrieval (IR). Since IR evaluation measures are difficult to optimize due to their non-continuity and non-differentiability, most direct optimization methods optimize surrogate functions instead, which we call surrogate measures. A critical issue regarding these methods is whether optimizing the surrogate measures really leads to the optimization of the original IR evaluation measures. In this work, we perform a formal analysis of this issue. We propose a concept named “tendency correlation” to describe the relationship between a surrogate measure and its corresponding IR evaluation measure. We show that when a surrogate measure has arbitrarily strong tendency correlation with an IR evaluation measure, optimizing it will lead to the effective optimization of the original IR evaluation measure. Then, we analyze the tendency correlations of the surrogate measures optimized in a number of direct optimization methods. We prove that the surrogate measures in SoftRank and ApproxRank can have arbitrarily strong tendency correlation with the original IR evaluation measures, regardless of the data distribution, when some parameters are appropriately set. However, the surrogate measures in SVM-MAP, DORM-NDCG, PermuRank-MAP, and SVM-NDCG cannot have arbitrarily strong tendency correlation with the original IR evaluation measures on certain distributions of data. Therefore SoftRank and ApproxRank are theoretically sounder than SVM-MAP, DORM-NDCG, PermuRank-MAP, and SVM-NDCG, and are expected to result in better ranking performance. Our theoretical findings can explain the experimental results observed on public benchmark datasets.
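
The non-differentiability mentioned in the abstract above comes from the fact that a rank-based measure depends on item positions, which change only in discrete jumps as scores vary. The sketch below illustrates one common way to smooth this: replacing each hard position with a sum of sigmoids. The abstract does not give the surrogate used by SoftRank or ApproxRank, so the functions `true_positions`, `smooth_positions`, and `dcg`, the parameter `alpha`, and the example data are all assumptions made for illustration only.

```python
import math

def true_positions(scores):
    """1-based position of each item when sorted by descending score (no ties assumed)."""
    return [1 + sum(1 for other in scores if other > s) for s in scores]

def smooth_positions(scores, alpha):
    """Sigmoid-based approximation of positions (hypothetical surrogate).

    Each other item contributes a value close to 1 if it scores higher than
    the current item and close to 0 otherwise; alpha controls the sharpness.
    """
    positions = []
    for i, s in enumerate(scores):
        pos = 1.0
        for j, other in enumerate(scores):
            if j != i:
                pos += 1.0 / (1.0 + math.exp(alpha * (s - other)))
        positions.append(pos)
    return positions

def dcg(relevance, positions):
    """Discounted cumulative gain computed from (possibly fractional) positions."""
    return sum((2 ** r - 1) / math.log2(1 + p) for r, p in zip(relevance, positions))

# Illustrative data: model scores and graded relevance labels for three documents.
scores = [2.0, 0.5, 1.2]
relevance = [1, 0, 2]

exact = dcg(relevance, true_positions(scores))
for alpha in (1.0, 10.0, 50.0):
    approx = dcg(relevance, smooth_positions(scores, alpha))
    print(f"alpha={alpha:5.1f}  surrogate DCG={approx:.6f}  exact DCG={exact:.6f}")
```

As `alpha` grows, the smoothed positions approach the true positions, so the surrogate value approaches the exact one; this is only meant to show why a sharpness parameter matters, not to reproduce the paper's analysis.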

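[Purpose/Significance] To address the needs of the new generation of users for information retrieval systems, this paper proposes a theoretical model of a gamified information retrieval system, with the goals of stimulating users' interest in using the retrieval system, supporting their information retrieval and interaction, and encouraging continued use. [Method/Process] Drawing on basic gamification theory, related frameworks, and the mechanisms of information retrieval systems, different game elements are combined and, taking into account the relationships between game elements and rules, modules with specific functions are designed so that game elements can be applied in a non-game context. [Results/Conclusions] To construct the theoretical model, 20 game elements are identified and combined by function into 12 types of game modules, comprising 5 simple modules and 7 compound modules, which give the information retrieval system game functionality. The proposed construction approach and theoretical model address a gap in current research on gamified information retrieval and provide a theoretical framework for developing gamified information retrieval systems and for subsequent related research.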

The Reviewer Assignment Problem (RAP) is a crucial problem for conferences due to time constraints and the inadequate availability of expert reviewers. A fair evaluation of a paper is key to an author's success, paper quality, conference reputation, and productive usage of funds. Recent studies reflect on the issue of reviewer bias in bids favoring authors belonging to top institutions and higher authority. Existing Conference Management Systems (CMS) depend solely on self-declared Conflicts of Interest (CoI) made by the authors and reviewers. In the literature, existing studies consider topic similarity, potential CoI, and reviewer workload as trivial factors for ensuring review quality. Other factors include the diversity and authority of a reviewer. Past studies propose several theoretical optimization models. In this paper, we first model the factors individually, using the best possible strategy in a constraint-based optimization framework. We propose a novel framework that can be practically implemented to improve upon the performance of existing CMS. We map the RAP to an equilibrium multi-job assignment problem. Moreover, we propose a meta-heuristic greedy solution that solves it using weighted matrix factorization. We redefine an assignment quality metric required to validate such assignments. A real conference assignment data set collected from EasyChair is used for a comparative study. The TPMS is used as a baseline because it uses similar factors and because it is integrated with the widely used Microsoft CMS. The results show that the mean assignment quality of the proposed method is superior to that of other benchmark RAP systems.

In the field of scientometrics, impact indicators and ranking algorithms are frequently evaluated using unlabelled test data comprising relevant entities (e.g., papers, authors, or institutions) that are considered important. The rationale is that the higher some algorithm ranks these entities, the better its performance. To compute a performance score for an algorithm, an evaluation measure is required to translate the rank distribution of the relevant entities into a single-value performance score. Until recently, it was simply assumed that taking the average rank (of the relevant entities) is an appropriate evaluation measure when comparing ranking algorithms or fine-tuning algorithm parameters. With this paper we propose a framework for evaluating the evaluation measures themselves. Using this framework the following questions can now be answered: (1) which evaluation measure should be chosen for an experiment, and (2) given an evaluation measure and corresponding performance scores for the algorithms under investigation, how significant are the observed performance differences? Using two publication databases and four test data sets we demonstrate the functionality of the framework and analyse the stability and discriminative power of the most common information retrieval evaluation measures. We find that there is no clear winner and that the performance of the evaluation measures is highly dependent on the underlying data. Our results show that the average rank is indeed an adequate and stable measure. However, we also show that relatively large performance differences are required to confidently determine if one ranking algorithm is significantly superior to another. Lastly, we list alternative measures that also yield stable results and highlight measures that should not be used in this context.
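
The average-rank measure described in the abstract above can be sketched in a few lines: each ranking algorithm is scored by the mean position at which it places the relevant entities, and a lower value is better. The function name `average_rank` and the toy paper identifiers are hypothetical and used for illustration only; the paper's own framework for comparing measures is not reproduced here.

```python
from statistics import mean

def average_rank(ranking, relevant):
    """Mean 1-based rank of the relevant entities in a ranking (lower is better)."""
    position = {entity: i for i, entity in enumerate(ranking, start=1)}
    return mean(position[entity] for entity in relevant)

# Two hypothetical algorithms ranking the same five papers.
ranking_a = ["p3", "p1", "p4", "p2", "p5"]
ranking_b = ["p4", "p5", "p3", "p1", "p2"]
relevant = {"p1", "p3"}  # test data: papers considered important

print("algorithm A:", average_rank(ranking_a, relevant))
print("algorithm B:", average_rank(ranking_b, relevant))
```

Here algorithm A places both relevant papers at the top and so receives the lower (better) score. As the abstract notes, whether such a difference is significant is a separate question that a single score cannot answer.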

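[Purpose/Significance] To broaden both the depth and the breadth of culture-tourism integration in China's public libraries, this paper constructs a framework for such integration along two dimensions: "in-library deepening" and "out-of-library connection". [Method/Process] Through literature review, web-based investigation, and case analysis, the paper systematically reviews theoretical research and practical results on culture-tourism integration in China's public libraries, analyzes in depth the meaning of the two dimensions and their subordinate integration types in the context of culture-tourism integration, and examines the forms/elements and applications of each integration type. [Results/Conclusions] The "in-library deepening" dimension concerns the development of the public library itself and, by depth of integration, can be divided into informational integration and immersive integration; the "out-of-library connection" dimension concerns cooperation between public libraries and other organizations and, by content and form of integration, can be divided into embedded integration, cultural-creative integration, and study-tour integration.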

Scientific journals are ordered by their impact factor, while countries, institutions or researchers can be ranked by their scientific production, impact or by other simple or composite indicators, as in the case of university rankings. In this paper, the theoretical framework proposed for football competitions by Criado, Garcia, Pedroche, and Romance (2013; "A new method for comparing rankings through complex networks: Model and analysis of competitiveness of major European soccer leagues", Chaos, 23, 043114) is used as a starting point to define a general index describing the dynamics of rankings or, conversely, their stability. Some characteristics for studying rankings, measures of ranking dynamics, and axioms for such indices are presented. Furthermore, the notion of volatility of elements in rankings is introduced. Our study includes rankings with ties, entrants and leavers. Finally, some worked-out examples are shown.

Knowledge flow between scientific disciplines has commonly been measured based on citation data. Previous studies using citing relationships have mostly considered direct citations but have paid little attention to indirect citations (IDC), which indicate how knowledge diffuses from one discipline to another via one or more intermediaries. In this study, we measured knowledge flow between disciplines from two perspectives: direct citations (DC) and discipline potential energy (DPE), which is proposed to combine both direct and indirect citations. Data were collected from the Web of Science (WoS) database. Findings include: (1) DPE goes beyond previous measures by considering not only direct citations but also indirect citations between disciplines, which were usually ignored in previous measures, and reveals that the knowledge contribution of some disciplines, such as Physics and Engineering, had been underestimated by previous measures. (2) The proportion of IDC contribution is close to that of direct knowledge contribution when the discipline scale is removed, which suggests that it is essential to consider IDC to distinguish the knowledge relationship (net outflow/inflow) between disciplines. (3) Both measurements show that Biology & Biochemistry has always been the top discipline with the highest net outflow of knowledge, which is inconsistent with the history of science, according to which Mathematics, Physics and Chemistry would be the disciplines with the highest net outflow. The results show that even considering IDC does not fully reveal the knowledge contribution and academic influence of disciplines. This paper also analyzes the potential reasons for citation bias in revealing the contribution of disciplinary knowledge from a citation perspective. Therefore, caution should be taken in the use of citations as a primary measure of knowledge flow.

In this paper, we present a framework that can process a user query for retrieval of information from documents of different properties across multiple domains, with specific application to patent laws and regulations. The framework has three basic components. The first component is ontology mapping and generation: the keywords entered by users are mapped into a subset of relevant keywords by looking up those words in an ontology database. The second component is the joint and cross search in various document domains; in our case, they are patents and scientific publications. The last component modifies the search results by applying user feedback statistics. The results of feedback are saved as metadata for future use. A case example is given to demonstrate how results from multiple domain searches can be combined using ontology and cross referencing. We use an example of well-known biotechnology patents on erythropoietin (EPO) and give a detailed analysis of each document domain with this keyword. Relationships between the domains are demonstrated. A user feedback mechanism is also discussed in this paper. The ability to take user feedback into the framework is important, since domain knowledge from expert or experienced users could be a very good complement to the proposed system. Both direct and indirect user feedback are discussed.

A structured document retrieval (SDR) system aims to minimize the effort users spend to locate relevant information by retrieving parts of documents. To evaluate the range of SDR tasks, from element to passage to tree retrieval, numerous task-specific measures have been proposed. This has resulted in SDR evaluation measures that cannot easily be compared with respect to each other and across tasks. In previous work, we defined the SDR task of tree retrieval where passage and element are special cases. In this paper, we look in greater detail into tree retrieval to identify the main components of SDR evaluation: relevance, navigation, and redundancy. Our goal is to evaluate SDR within a single probabilistic framework based on these components. This framework, called Extended Structural Relevance (ESR), calculates user expected gain in relevant information depending on whether it is seen via hits (relevant results retrieved), unseen via misses (relevant results not retrieved), or possibly seen via near-misses (relevant results accessed via navigation). We use these expectations as parameters to formulate evaluation measures for tree retrieval. We then demonstrate how existing task-specific measures, if viewed as tree retrieval, can be formulated, computed and compared using our framework. Finally, we experimentally validate ESR across a range of SDR tasks.

The number of topics that a test collection contains has a direct impact on how well the evaluation results reflect the true performance of systems. However, large collections can be prohibitively expensive, so researchers are bound to balance reliability and cost. This issue arises when researchers have an existing collection and they would like to know how much they can trust their results, and also when they are building a new collection and they would like to know how many topics it should contain before they can trust the results. Several measures have been proposed in the literature to quantify the accuracy of a collection to estimate the true scores, as well as different ways to estimate the expected accuracy of hypothetical collections with a certain number of topics. We can find ad-hoc measures such as Kendall tau correlation and swap rates, and statistical measures such as statistical power and indexes from generalizability theory. Each measure focuses on different aspects of evaluation, has a different theoretical basis, and makes a number of assumptions that are not met in practice, such as normality of distributions, homoscedasticity, uncorrelated effects and random sampling. However, how good these estimates are in practice remains a largely open question. In this paper we first compare measures and estimators of test collection accuracy and propose unbiased statistical estimators of the Kendall tau and tau AP correlation coefficients. Second, we detail a method for stochastic simulation of evaluation results under different statistical assumptions, which can be used for a variety of evaluation research where we need to know the true scores of systems. Third, through large-scale simulation from TREC data, we analyze the bias of a range of estimators of test collection accuracy. Fourth, we analyze the robustness of these estimators to statistical assumptions, in order to understand what aspects of an evaluation are affected by what assumptions and to guide the development of new collections and new measures. All the results in this paper are fully reproducible with data and code available online.
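
As a concrete reading of the Kendall tau correlation mentioned in the abstract above, the sketch below compares how two sets of scores order the same systems. It implements only the plain tau-a coefficient (concordant minus discordant pairs, divided by the number of pairs) and assumes no tied scores; it is not the unbiased estimator proposed in the paper. The system names and score values are made up for illustration.

```python
from itertools import combinations

def kendall_tau(scores_x, scores_y):
    """Plain Kendall tau-a between two score dictionaries over the same systems."""
    systems = list(scores_x)
    concordant = discordant = 0
    for a, b in combinations(systems, 2):
        # Positive product: both score sets order the pair the same way.
        agreement = (scores_x[a] - scores_x[b]) * (scores_y[a] - scores_y[b])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical mean scores of four systems on a large and a small topic set.
large_collection = {"A": 0.40, "B": 0.35, "C": 0.30, "D": 0.20}
small_collection = {"A": 0.38, "B": 0.41, "C": 0.25, "D": 0.22}

print(round(kendall_tau(large_collection, small_collection), 3))
```

In this toy case only the pair (A, B) is swapped between the two collections, so five of the six pairs agree. A value of 1 would mean the smaller collection reproduces the ordering exactly.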

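[Purpose/Significance] Optimizing the perceived quality of mobile library service platforms bears on the future development of mobile libraries. This paper seeks to provide case-based support for service quality optimization decisions and to make the optimization and improvement of mobile library service platforms more tangible. [Method/Process] With reference to the ITIL management framework and the life cycle of a mobile library service platform, an optimization framework for perceived quality is constructed by analogy. Taking the ML-1 mobile library app client as the empirical object, several quantitative indicators are proposed to analyze how the platform's users, technical implementers, and management decision-makers judge optimization, thereby verifying the implementation process of the framework. [Results/Conclusions] The paper constructs a progressively layered framework for optimizing the user-perceived quality of mobile library service platforms and, based on the empirical results, confirms the effectiveness of the service quality optimization implementation plan.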

A General Evaluation Framework for Topical Crawlers (total citations: 10; self-citations: 0; citations by others: 10)
Topical crawlers are becoming important tools to support applications such as specialized Web portals, online searching, and competitive intelligence. As the Web mining field matures, the disparate crawling strategies proposed in the literature will have to be evaluated and compared on common tasks through well-defined performance measures. This paper presents a general framework to evaluate topical crawlers. We identify a class of tasks that model crawling applications of different nature and difficulty. We then introduce a set of performance measures for fair comparative evaluations of crawlers along several dimensions including generalized notions of precision, recall, and efficiency that are appropriate and practical for the Web. The framework relies on independent relevance judgements compiled by human editors and available from public directories. Two sources of evidence are proposed to assess crawled pages, capturing different relevance criteria. Finally we introduce a set of topic characterizations to analyze the variability in crawling effectiveness across topics. The proposed evaluation framework synthesizes a number of methodologies in the topical crawlers literature and many lessons learned from several studies conducted by our group. The general framework is described in detail and then illustrated in practice by a case study that evaluates four public crawling algorithms. We found that the proposed framework is effective at evaluating, comparing, differentiating and interpreting the performance of the four crawlers. For example, we found the IS crawler to be most sensitive to the popularity of topics. Partially supported by National Science Foundation CAREER grant No. IIS-0133124/0348940.

This paper discusses the factors affecting the adoption of electronic tax-filing systems. Using the technology acceptance model (TAM) as a theoretical framework, this study introduces “perceived credibility” as a new factor that reflects the user's intrinsic belief in the electronic tax-filing systems, and examines the effect of computer self-efficacy on the intention to use an electronic tax-filing system. Based on a sample of 260 users from a telephone interview, the results strongly support the extended TAM in predicting the intention of users to adopt electronic tax-filing systems. The results also demonstrate the significant effect that computer self-efficacy has on behavioral intention through perceived ease of use, perceived usefulness, and perceived credibility. Based on the findings of this study, implications for electronic tax filing in particular and for e-government services in general are discussed. Finally, this paper concludes by discussing limitations that could be addressed in future studies.

A recent line of e-government research has emphasized the importance of interorganizational information sharing in the public domain. This research extends these information-sharing dimensions to explore information sharing relative to service performance. It utilizes a time-critical information services (TCIS) conceptual framework as an analytical lens. TCIS highlights multiple dimensions of information sharing, including operational, organizational, and governance factors as well as timeliness and quality as key performance metrics. A case study approach was employed to examine the exchange of performance-related information in a key time-critical information service: a county-wide emergency medical services (EMS) system. The paper first explains the theoretical foundations for the study, stemming from interorganizational systems (IOS) literature, e-government IOS, and even more specifically, IOS in EMS. The paper discusses performance measures in EMS, describes the TCIS analytical lens, the study methodology, and the case study under investigation. Case study findings are reported along operational, organizational, and governance dimensions. In general, the case study illustrates promising factors that can enhance information sharing across organizations, while noting that considerable gaps remain in achieving an end-to-end IT-enabled performance approach. Future research should aim to better understand how to overcome these gaps, including addressing the usability constraints that can confront professionals working in time-critical circumstances, such as trauma conditions.

Consumer health information studies in library and information science (LIS) are typically not grounded within a theoretical framework. This article explains the importance of theory to LIS research in general, and the specific value of using theories from other disciplines to study consumers' health information-seeking behavior. The argument is supported with two examples: Miller's psychological theory of blunting and monitoring behavior and Granovetter's sociological theory of the strength of weak ties. These theories can be applied by practitioner-researchers to investigate a variety of research problems.

This research investigates how disciplinary contexts, institutional settings, and individual motivations affect researchers' depositing of their articles into an institutional repository (IR). This study employed the Theory of Planned Behavior as its main theoretical framework and proposed six hypotheses to explain how disciplinary, institutional, and individual factors influence researchers' article depositing behaviors through an IR. This research utilized an online survey as its data collection method, and a total of 221 survey responses from researchers in U.S. academic institutions were collected. The hypothesized relationships were then tested by using multiple regression analysis. This research found that perceived community benefit, perceived institutional support, and perceived career benefit significantly increase researchers' article depositing behaviors through an IR, and that perceived career risk significantly decreases them. This research suggests that community benefit, institutional support, and career issues need to be considered to increase researchers' overall article sharing behaviors through an IR.

Lin Fang (林芳). Library and Information Service (《图书情报工作》), 2015, 59(20): 60-65
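[Purpose/Significance] This paper analyzes the main models for introducing Altmetrics into institutional repositories and the issues that need to be considered, as a reference for practice. [Method/Process] Using comparative analysis and case analysis, it examines the concrete practice of introducing Altmetrics at the HKU Scholars Hub of The University of Hong Kong (香港大学学术库) and at the University of Pittsburgh's institutional repository, and summarizes the models for introducing Altmetrics, their characteristics, and the situations to which each applies. [Results/Conclusions] There are three models for introducing Altmetrics into institutional repositories: embedded, in which an existing altmetrics application or code is embedded directly; integrated, in which altmetrics applications and data are integrated into the repository platform; and shared, in which a commercial altmetrics platform and the institutional repository share object metadata. Introducing Altmetrics is a trend in the development of institutional repositories, and the second and third models tend to converge. The most important issue at present is, at the level of metadata structure, to design an object metadata structure that covers the whole process of institutional knowledge production.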

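The study finds that knowledge discovery from unrelated literature rests on three theoretical foundations: retrieval theory, bibliometric theory, and logic. Among these, targeted retrieval strategies, co-occurrence theory, and syllogistic reasoning are the theories applied in practice. The overall principle of a targeted retrieval strategy is to narrow the scope and improve topical relatedness and precision, which determines the direction for improving the filtering and ranking methods used in the discovery process. Because logical reasoning based on syllogisms cannot be fully formalized, knowledge discovery from unrelated literature should move away from its current goal of full automation and instead optimize its filtering and ranking methods within a higher-order co-occurrence framework, yielding human-assisted knowledge discovery systems of greater practical value.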

Ensuring the quality of information is a critical ethical issue for any information system. Research Information Management Systems (RIMSs) need to engage researchers in sharing research information and knowledge, and ensuring its quality. This paper introduces a theoretical framework for researcher participation in RIMSs. The framework is grounded in empirical research and can guide the design of RIMSs by defining typologies of researcher activities in RIMSs, related motivations, levels of participation, and metadata profiles. In addition, the framework defines discipline- and seniority-specific priorities for the researcher's activities and motivations. RIMS managers and scholarly communications librarians can use the framework to assemble RIMS service and metadata profiles that are tailored to the researcher's context. Likewise, the framework can guide the construction of communication messages personalized to the researcher's priorities and her or his motivations for engaging in a specific activity, which will enhance the researcher's engagement with the RIMS.
