The query returned 20 similar documents (search time: 15 ms).
5.
Kazimierz Zielinski, Radoslaw Nielek, Adam Wierzbicki, Adam Jatowt. Information Processing & Management, 2018, 54(1): 14–36
Controversy is a complex concept that has been attracting the attention of scholars from diverse fields. In the era of the Internet and social media, detecting controversy and controversial concepts by means of automatic methods is especially important. Web searchers could be alerted when the content they consume is controversial or when they attempt to acquire information on disputed topics. Presenting users with indications and explanations of the controversy should offer them a chance to see the “wider picture” rather than letting them obtain one-sided views. In this work we first introduce a formal model of controversy as the basis of computational approaches to detecting controversial concepts. Then we propose a classification-based method for automatic detection of controversial articles and categories in Wikipedia. Next, we demonstrate how to use the obtained results to estimate the controversy level of search queries. The proposed method can be incorporated into search engines as a component responsible for detecting queries related to controversial topics. The method is independent of the search engine’s retrieval and search-results recommendation algorithms, and is therefore unaffected by a possible filter bubble. Our approach can also be applied in Wikipedia or other knowledge bases to support the detection of controversy and content maintenance. Finally, we believe that our results could be useful to social science researchers in understanding the complex nature of controversy and in fostering their studies.
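As a minimal sketch of how such a classification-based detector might look (the paper's actual feature set is richer; the training corpus, labels, and query below are invented placeholders):

```python
# Hedged sketch: a text classifier flags controversial Wikipedia articles,
# and a query's controversy level is then estimated from the articles it
# maps to. Examples and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

articles = [
    "the policy debate remains heavily disputed among experts",  # hypothetical
    "the mitochondrion is an organelle found in most cells",     # hypothetical
]
labels = [1, 0]  # 1 = controversial, 0 = non-controversial

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(articles, labels)

# Score a query by the probability assigned to the content it retrieves.
print(clf.predict_proba(["disputed policy debate"])[:, 1])
```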
6.
Yogesh Sankarasubramaniam, Krishnan Ramanathan, Subhankar Ghosh. Information Processing & Management, 2014
Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility for text summarization: they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence–concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization: users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that using incremental summarization can help in better understanding news articles.
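The core iteration can be pictured as HITS-style mutual reinforcement on the bipartite graph; a toy sketch under that assumption (the paper studies several graph models, and the affinity matrix below is invented):

```python
# Toy mutual-reinforcement ranking on a bipartite sentence-concept graph.
# W[i, j] is a hypothetical affinity between sentence i and concept j.
import numpy as np

W = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 1.0],
              [0.5, 0.0, 1.0]])

sent = np.ones(W.shape[0])
for _ in range(50):
    conc = W.T @ sent            # concepts scored by the sentences covering them
    conc /= np.linalg.norm(conc)
    sent = W @ conc              # sentences scored by the concepts they mention
    sent /= np.linalg.norm(sent)

print(np.argsort(-sent))         # top-ranked sentences would form the summary
```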
10.
Sesia J. Zhao, Kem Z.K. Zhang, Christian Wagner, Huaping Chen. International Journal of Information Management, 2013
The recent prevalence of wiki applications has demonstrated that wikis have high potential in facilitating knowledge creation, sharing, integration, and utilization. As wikis are increasingly adopted in contexts like business, education, research, government, and the public sphere, how to improve user contribution becomes an important concern for researchers and practitioners. In this research, we focus on the quality aspect of user contribution: contribution value. Building upon critical mass theory and research on editing activities in wikis, this study investigates whether user interests and resources can increase contribution value for different types of users. We develop our research model and empirically test it using survey and content analysis methods in Wikipedia. The results demonstrate that (1) for users who emphasize substantive edits, depth of interests and depth of resources play more influential roles in affecting contribution value; and (2) for users who focus on non-substantive edits, breadth of interests and breadth of resources are more important in increasing contribution value. The findings suggest that contribution value develops in different patterns for the two types of users. Implications for both theory and practice are discussed.
11.
Using two data sources, WOS (Web of Science) and Wikipedia, this study performs word-frequency statistics and text categorization analysis on big-data-related content, identifies the commonalities and differences in big data topics across the two sources, and further distills the topic categories of the big data field. The shared categories include the holistic perspective, the technical level, the application level, entities, and activities; the finer-grained topics include data and data sources, big data processing and analysis techniques, big data systems and applications, promotion by countries, regions, and enterprises, discussion of society and people, and changes in industries and disciplines. Finally, the paper draws on related data to discuss the research frontiers of the big data field.
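A toy illustration of the word-frequency comparison this abstract describes, assuming plain term counting over the two corpora (the sample texts are placeholders):

```python
# Count term frequencies in two corpora and compare their top terms.
from collections import Counter

wos_text = "big data analytics mining hadoop mapreduce data volume"   # placeholder
wiki_text = "big data society privacy data volume storage analytics"  # placeholder

wos_top = {t for t, _ in Counter(wos_text.split()).most_common(5)}
wiki_top = {t for t, _ in Counter(wiki_text.split()).most_common(5)}

print("shared topics:  ", wos_top & wiki_top)
print("WOS-specific:   ", wos_top - wiki_top)
print("Wikipedia-only: ", wiki_top - wos_top)
```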
13.
This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for categorization of documents consisting of a collection of fields, or of arbitrary tree-structured documents that can be adequately modeled with such a flat structure. The approaches range from trivial modifications of text modeling to more elaborate schemes specifically tailored to structured documents. We combine these methods with three different text classification algorithms and evaluate their performance on four standard datasets containing different types of semi-structured documents. The best results were obtained with stacking, an approach in which predictions based on different structural components are combined by a meta-classifier. A further improvement of this method is achieved by including the flat text model in the final prediction.
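A hedged sketch of that stacking scheme, assuming one text classifier per field plus the flat (concatenated) text model as base learners; field contents and labels are invented, and a real setup would derive meta-features from cross-validated predictions to avoid leakage:

```python
# Stacking over structural fields: per-field base classifiers feed a
# meta-classifier; the flat text model is included as one more base input.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

titles = ["stock markets rally", "new quantum algorithm", "election results in"]
bodies = ["shares rose sharply today", "qubits entangled in the lab",
          "votes were counted overnight"]
labels = [0, 1, 0]

# Base inputs: title field, body field, and the flat concatenated text.
fields = [titles, bodies, [t + " " + b for t, b in zip(titles, bodies)]]
base = [make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(f, labels)
        for f in fields]

# Meta-features: each base model's probability of class 1 per document
# (cross-validated predictions would be used in practice).
meta_X = np.column_stack([m.predict_proba(f)[:, 1] for m, f in zip(base, fields)])
meta = LogisticRegression().fit(meta_X, labels)
print(meta.predict(meta_X))
```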
14.
This paper describes the development and testing of a novel Automatic Search Query Enhancement (ASQE) algorithm, the Wikipedia N Sub-state Algorithm (WNSSA), which utilises Wikipedia as the sole data source for prior knowledge. The algorithm is built upon the concept of iterative states and sub-states, harnessing Wikipedia’s data set and link information to identify and utilise recurring terms to aid term selection and weighting during enhancement. It is designed to prevent query drift by referring back to the user’s original search intent: the original query is persisted between internal states alongside the additionally selected enhancement terms. The developed algorithm has been shown to improve both short and long queries by providing a better understanding of the query and the available data. The proposed algorithm was compared against five existing ASQE algorithms that utilise Wikipedia as the sole data source, showing an average Mean Average Precision (MAP) improvement of 0.273 over the tested existing ASQE algorithms.
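For reference, the MAP metric used in this evaluation: average precision is the mean of the precision values at the ranks where relevant documents occur, and MAP averages this over queries. A minimal computation (the rankings below are invented):

```python
# Mean Average Precision over a set of queries.
def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank          # precision at this relevant rank
    return score / max(len(relevant), 1)

queries = [(["d1", "d3", "d2"], {"d1", "d2"}),   # invented rankings
           (["d5", "d4"], {"d4"})]
mean_ap = sum(average_precision(r, rel) for r, rel in queries) / len(queries)
print(round(mean_ap, 3))                  # 0.667 for this toy example
```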
16.
尤洋 (You Yang). 科学技术与辩证法 (Science, Technology and Dialectics), 2013, (6): 17–20
Online collaboration, of which Wikipedia is the leading example, has created an entirely new mode of knowledge production and raises new epistemological questions and challenges. This article takes up these questions and develops an in-depth analysis of, and reflection on, the epistemology of collective collaboration. It identifies four differences in epistemic culture between Wikipedia and science, and argues that Wikipedia’s knowledge can be regarded as a form of collective testimony within epistemological research, a collective testimony with considerable defensibility. Wikipedia cannot replace the role of experts, but it can generate a model of epistemic egalitarianism that breaks the privileges of knowledge and enables the transfer and flow of epistemic rights.
17.
In this article we examine contributions to Wikipedia through the prism of two divergent critical theorists: Jürgen Habermas and Mikhail Bakhtin. We show that, in slightly dissimilar ways, these theorists came to consider an “aesthetic for democracy” (Hirschkop 1999), a template for deliberative relationships that privileges relatively free and unconstrained dialogue to which every speaker has equal access and which is without authoritative closure. We employ Habermas’s theory of “universal pragmatics” and Bakhtin’s “dialogism” to analyse contributions to Wikipedia’s entries on stem cells and transhumanism, and show that the decision to embrace either unified or pluralistic forms of deliberation is an empirical matter to be judged in sociohistorical context, as opposed to what normative theories insist on. We conclude by stressing the need to be attuned to the complexity and ambiguity of deliberative relations online.
18.
Rong Qu, Yongyi Fang, Wen Bai, Yuncheng Jiang. Information Processing & Management, 2018, 54(6): 1002–1021
Computing Semantic Similarity (SS) between concepts is one of the most critical issues in many domains, such as Natural Language Processing and Artificial Intelligence. Over the years, several SS measurement methods have been proposed that exploit different knowledge resources. Wikipedia provides a large domain-independent encyclopedic repository and a semantic network for computing SS between concepts. Traditional feature-based measures rely on linear combinations of different properties and have two main limitations: insufficient information and the loss of semantic information. In this paper, we propose several hybrid SS measurement approaches that use the Information Content (IC) and features of concepts, avoiding the limitations introduced above. To integrate discrete properties into one component, we present two models of semantic representation, called CORM and CARM. We then compute SS based on these models and take the IC of categories as a supplement to the SS measurement. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, sustains the intuitions with respect to human judgments. In summary, our approaches are more efficient in determining SS between concepts and correlate better with human judgments than previous methods such as Word2Vec and NASARI.
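The paper's CORM and CARM models are not reproduced here; as background, a sketch of a classic IC-based measure (Lin's similarity) of the kind such hybrid approaches build on, with invented corpus counts:

```python
# Lin's IC-based similarity: sim(c1, c2) = 2 * IC(lcs) / (IC(c1) + IC(c2)),
# where IC(c) = -log p(c) and lcs is the least common subsumer in the taxonomy.
import math

freq = {"mammal": 1000, "cat": 120, "dog": 150}  # invented corpus counts; a full
total = sum(freq.values())                       # implementation would fold each
                                                 # concept's descendants into p(c)

def ic(concept):
    # Rarer concepts carry more information content.
    return -math.log(freq[concept] / total)

def lin_similarity(c1, c2, lcs):
    return 2 * ic(lcs) / (ic(c1) + ic(c2))

print(lin_similarity("cat", "dog", lcs="mammal"))
```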
19.
Wikipedia is known as a free online encyclopedia. Wikipedia uses largely transparent writing and editing processes, which aim at providing the user with quality information through a democratic collaborative system. However, one aspect of these processes is not transparent: the identity of contributors, editors, and administrators. We argue that this particular lack of transparency jeopardizes the validity of the information being produced by Wikipedia. We analyze the social and ethical consequences of this lack of transparency in Wikipedia for all users, but especially students; we assess the corporate social performance issues involved; and we propose courses of action to compensate for the potential problems. We show that Wikipedia has the appearance, but not the reality, of responsible, transparent information production.
This paper’s authors are the same as those who authored Wood, D. J. and Queiroz, A. 2008. Information versus knowledge: Transparency and social responsibility issues for Wikipedia. In Antonino Vaccaro, Hugo Horta, and Peter Madsen (Eds.), Transparency, Information, and Communication Technology (pp. 261–283). Charlottesville, VA: Philosophy Documentation Center.
Adele has changed her surname from Queiroz to Santana.
20.
Edgardo Ferretti, Leticia Cagnina, Viviana Paiz, Sebastián Delle Donne, Rodrigo Zacagnini, Marcelo Errecalde. Information Processing & Management, 2018, 54(6): 1169–1181
In this work, we present the first quality flaw prediction study for articles containing the two most frequent verifiability flaws in Spanish Wikipedia: articles that do not cite any references or sources at all (termed Unreferenced) and articles that need additional citations for verification (so-called Refimprove). Based on the underlying characteristics of each flaw, different state-of-the-art approaches were evaluated. For articles not citing any references, a well-established rule-based approach was evaluated, and interesting findings show that some of these articles suffer from the Refimprove flaw instead. Likewise, for articles that need additional citations for verification, the well-known PU learning and one-class classification approaches were evaluated. In addition, new methods were compared, and a new feature was proposed to model this latter flaw. The results showed that new methods such as under-bagged decision trees with sum or majority voting rules, biased-SVM, and centroid-based balanced SVM perform best in comparison with the ones previously published.
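A hedged sketch of the biased-SVM idea named above, assuming asymmetric class weights penalise errors on the reliable positive (flawed) examples more heavily than on the unlabelled ones; feature vectors and weights are invented:

```python
# Biased SVM: heavier misclassification penalty on known positives than on
# unlabelled examples, a common device in PU-style learning.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]])  # invented features
y = np.array([1, 1, 0, 0])  # 1 = known flawed (e.g. Refimprove), 0 = unlabelled

clf = SVC(kernel="linear", class_weight={1: 10.0, 0: 1.0})
clf.fit(X, y)
print(clf.predict([[0.85, 0.15]]))  # expected: [1]
```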