We present an efficient document clustering algorithm that uses a term frequency vector for each document instead of a huge proximity matrix. The algorithm has the following features: (1) it requires a relatively small amount of memory and runs fast, (2) it produces a hierarchy in the form of a document classification tree, and (3) the hierarchy obtained by the algorithm explicitly reveals the structure of the collection. We confirm these features, and thus show the algorithm's feasibility, through clustering experiments on two collections of Japanese documents containing 83,099 and 14,701 documents. We also introduce an application of this algorithm to a document browser, which is used in our Japanese-to-English translation aid system. The browsing module of the system consists of a huge database of Japanese news articles and their English translations. The Japanese article collection is clustered into a hierarchy by our method. Since each node in the hierarchy corresponds to a topic in the collection, we can use the hierarchy to access articles directly by topic. A user can learn general translation knowledge for each topic by browsing the Japanese articles and their English translations. We also discuss techniques for presenting a large tree-shaped hierarchy on a computer screen.
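To illustrate why per-document term frequency vectors are lighter than a proximity matrix, here is a minimal Python sketch. A full pairwise matrix over 83,099 documents would have roughly 6.9 billion entries, whereas one sparse vector per document grows only linearly with the collection. The abstract does not name a similarity measure or a merging rule, so the cosine similarity and the summed cluster vector below are assumptions for illustration, and all function names are hypothetical.

```python
from collections import Counter
import math


def term_frequency(tokens):
    """Map each term to its count in one document (a sparse vector)."""
    return Counter(tokens)


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two sparse term-frequency vectors.

    Assumption: the abstract does not say which similarity is used.
    """
    dot = sum(count * tf_b[term] for term, count in tf_a.items() if term in tf_b)
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_vector(vectors):
    """Sum member vectors so a cluster is again a single term-frequency vector."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return total


# Toy documents, already tokenized (tokenization is out of scope here).
finance_1 = term_frequency(["stock", "market", "stock"])
finance_2 = term_frequency(["market", "stock"])
sports = term_frequency(["soccer", "goal"])

print(cosine_similarity(finance_1, finance_2))  # close to 1: shared vocabulary
print(cosine_similarity(finance_1, sports))     # no shared terms
```

Because a cluster is represented by one summed vector, comparing a document with a cluster costs the same as comparing two documents, so no pairwise matrix needs to be stored.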

The only way to guard against the risks and threats of Facebook, the most commonly used social networking site in the world and in Turkey, is to be a careful user who changes the default settings, or simply not to have a Facebook account. In Turkey, there is still no study in which personal information shared through social networking sites has been evaluated in terms of privacy. For this reason, the findings of this study are of great importance for describing the current situation and drawing attention to the risks in Turkey, where no legal arrangements effectively protect users of such sites. This study aims to investigate the Facebook privacy of information professionals who are members of KUTUP-L, and to determine the sensitivity and level of awareness of information professionals in Turkey. The Facebook user profiles of 400 information professionals, all KUTUP-L members, were analyzed across 32 different privacy settings. A privacy score was calculated for each user, and the relations between privacy results were analyzed. The findings show that information professionals in Turkey do pay attention to privacy, and that most of the users change the default settings in order to protect their personal information.

There is no doubt that scientific discoveries have always brought changes to society. New technologies help solve social problems in areas such as transportation and education, while research brings benefits such as curing diseases and improving food production. Despite the impact that science and society have on each other, this relationship is rarely studied, and the two are often seen as different universes. Previous literature focuses only on a single domain, for example detecting social demands or research fronts, without ever crossing the results for new insights. In this work, we create a system that is able to assess the relationship between social and scholarly data using the topics discussed in social networks and research topics. We use articles as science sensors and humans as social sensors via social networks. Topic modeling algorithms are used to extract and label social subjects and research themes, and topic correlation metrics are then used to create links between them if they have a significant relationship. The proposed system is based on topic modeling, labeling and correlation from heterogeneous sources, so it can be used in a variety of scenarios. We evaluate the approach using a large-scale Twitter corpus combined with a PubMed article corpus. In both, we work with data on the worldwide Zika epidemic, as this scenario provides topics and discussions in both domains. Our work was capable of discovering links between various topics of different domains, which suggests that some of the relationships can be automatically inferred by the sensors. The results can open new opportunities for forecasting social behavior, assessing community interest in a scientific subject, or directing research toward population welfare.

This paper investigates whether senders of large amounts of irrelevant or unsolicited information, commonly called “spammers”, distort the network structure of social networks. Two large social networks are analyzed, the first extracted from the Twitter discourse about a big telecommunication company, and the second obtained from three years of email communication of 200 managers working for a large multinational company. This work compares network robustness and the stability of centrality and interaction metrics, as well as the use of language, after removing spammers and the most and least connected nodes. The results show that, for most of the social indicators, spammers do not significantly alter the structure of the information-carrying network. The authors additionally investigate the correlation between email subject line and content by tracking language sentiment, emotionality, and complexity, addressing the cases where collecting email bodies is not permitted for privacy reasons. The findings extend the research on the robustness and stability of social network metrics after the application of graph simplification strategies. The results have practical implications for network analysts and for those company managers who rely on network analytics (applied to company emails and social media data) to support their decision-making processes.

Spiking neural networks (SNNs) raise the level of biological realism by using the timing of individual spikes, and are considered to be the third generation of artificial neural networks. In this work, a novel variable-structure-systems based approach for online learning of SNNs is developed and tested on the identification and speed control of a real-time servo system. In this approach, neurocontroller parameters are used to define a time-varying sliding surface that leads the control error signal to zero. The Lyapunov stability method is used to prove the convergence property of the developed algorithm. The results of the real-time experiments on the laboratory servo system, for a number of different load conditions including nonlinear and time-varying ones, indicate that the control structure exhibits highly robust behavior against disturbances and sudden changes in the command signal.

We propose a setting for two-phase opinion dynamics in social networks, where a node’s final opinion in the first phase acts as its initial biased opinion in the second phase. In this setting, we study the problem of two camps aiming to maximize adoption of their respective opinions, by strategically investing on nodes in the two phases. A node’s initial opinion in the second phase naturally plays a key role in determining the final opinion of that node, and hence also of other nodes in the network due to its influence on them. However, more importantly, this bias also determines the effectiveness of a camp’s investment on that node in the second phase. In order to formalize this two-phase investment setting, we propose an extension of Friedkin–Johnsen model, and hence formulate the utility functions of the camps. We arrive at a decision parameter which can be interpreted as two-phase Katz centrality. There is a natural tradeoff while splitting the available budget between the two phases. A lower investment in the first phase results in worse initial biases in the network for the second phase. On the other hand, a higher investment in the first phase spares a lower available budget for the second phase, resulting in an inability to fully harness the influenced biases. We first analyze the non-competitive case where only one camp invests, for which we present a polynomial time algorithm for determining an optimal way to split the camp’s budget between the two phases. We then analyze the case of competing camps, where we show the existence of Nash equilibrium and that it can be computed in polynomial time under reasonable assumptions. We conclude our study with simulations on real-world network datasets, in order to quantify the effects of the initial biases and the weightage attributed by nodes to their initial biases, as well as that of a camp deviating from its equilibrium strategy. 
Our main conclusion is that, if nodes attribute high weightage to their initial biases, it is advantageous to have a high investment in the first phase, so as to effectively influence the biases to be harnessed in the second phase.
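The two-phase idea can be sketched with the standard Friedkin–Johnsen update, in which each node repeatedly averages its neighbors' opinions while staying anchored to its own bias. This is a minimal Python sketch under stated assumptions: it omits the camps' investments and the abstract's extension entirely, the parameter name `stubbornness` (the weight a node gives to its own bias) is hypothetical, and the weight matrix is assumed to be row-stochastic.

```python
def friedkin_johnsen(weights, bias, stubbornness, iterations=200):
    """Iterate the standard Friedkin-Johnsen update.

    weights[i][j]   : influence of node j on node i (rows sum to 1)
    bias[i]         : node i's initial biased opinion
    stubbornness[i] : weight in (0, 1] that node i gives to its own bias
    """
    n = len(bias)
    opinions = list(bias)
    for _ in range(iterations):
        opinions = [
            stubbornness[i] * bias[i]
            + (1 - stubbornness[i])
            * sum(weights[i][j] * opinions[j] for j in range(n))
            for i in range(n)
        ]
    return opinions


def two_phase(weights, initial_bias, stubbornness):
    """Phase-one final opinions become the biases of phase two."""
    phase_one = friedkin_johnsen(weights, initial_bias, stubbornness)
    phase_two = friedkin_johnsen(weights, phase_one, stubbornness)
    return phase_one, phase_two


# Two nodes that listen only to each other and start with opposite biases.
weights = [[0.0, 1.0], [1.0, 0.0]]
phase_one, phase_two = two_phase(weights, [1.0, -1.0], [0.5, 0.5])
print(phase_one)  # opinions shrink toward each other
print(phase_two)  # and shrink again when phase one seeds phase two
```

In this toy run the magnitude of each opinion shrinks in both phases, which shows how strongly the second-phase outcome depends on the biases left behind by the first phase.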

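Identifying and controlling the risks associated with major land consolidation projects is an important basis for ensuring that project benefits are realized. Taking a full life-cycle perspective, this study first identifies the implementation risks of major projects, then introduces a social network model to analyze the relationships and influences among the risks, and finally evaluates the key risks using a case study. The results show that: (1) along the two dimensions of project stage and element integration, 39 key risk factors can be identified according to risk subject and risk type; (2) social network analysis shows that the risks are closely connected: the risk of insufficient government communication, which has the largest degree difference, has the greatest influence on other risks; the risk of insufficient communication by the construction contractor, which has the highest betweenness centrality, has the strongest control over other risks; and the risk of insufficient government communication, which has the highest brokerage value, holds an advantageous position in reconciling conflicts with external risks; (3) combining cohesion and brokerage characteristics, insufficient communication by the construction contractor is the risk with the largest influence weight. The case study shows that controlling the ten key risk factors with the highest influence weights, through measures such as improving mechanisms for multiple stakeholders to express their needs, building a multidimensional funding guarantee system, strengthening engineering quality management, and implementing evaluation mechanisms, can reduce the composite risk index of major land consolidation projects by at least 7.12%. It is therefore necessary to strengthen risk management during the implementation of major land consolidation projects in order to safeguard the realization of project benefits.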

Research typically focuses on one medium. But in today's digital media environment, people use and are influenced by their experience with multiple systems. Building on media ecology research, we introduce the notion of integrated media effects. We draw on resource dependence and homophily theories to analyze the mechanisms that connect media systems. To test the integrated media effects, we examine the relationships between news media visibility and social media visibility and hyperlinking patterns among 410 nongovernmental organization (NGO) websites in China. NGOs with greater news media visibility and more social media followers receive significantly more hyperlinks. Further, NGOs with a similar number of social media followers prefer to hyperlink to each other. The results suggest that both news media and social media systems are related to the configuration of hyperlink networks, providing support for the integrated media effects described. Implications for the study of hyperlink networks, online behaviors of organizations, and public relations are drawn from the results.

The research field of crisis informatics examines, among other things, the potentials and barriers of social media use during disasters and emergencies. Social media allow emergency services to receive valuable information (e.g., eyewitness reports, pictures, or videos). However, the vast amount of data generated during large-scale incidents can lead to information overload. Research indicates that supervised machine learning techniques are suitable for identifying relevant messages and filtering out irrelevant ones, thus mitigating information overload. Still, they require a considerable amount of labeled data, clear criteria for relevance classification, a usable interface to facilitate the labeling process and a mechanism to rapidly deploy retrained classifiers. To overcome these issues, we present (1) a system for social media monitoring, analysis and relevance classification, (2) abstract and precise criteria for relevance classification in social media during disasters and emergencies, (3) the evaluation of a well-performing Random Forest algorithm for relevance classification incorporating metadata from social media into a batch learning approach (e.g., 91.28%/89.19% accuracy, 98.3%/89.6% precision and 80.4%/87.5% recall with a fast training time with feature subset selection on the European floods/BASF SE incident datasets), as well as (4) an approach and preliminary evaluation for relevance classification including active, incremental and online learning to reduce the amount of required labeled data and to correct misclassifications of the algorithm by feedback classification. Using the latter approach, we achieved a well-performing classifier on the European floods dataset while requiring only a quarter of the labeled data needed by the traditional batch learning approach. Although the effect was smaller on the BASF SE incident dataset, a substantial improvement could still be observed.
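For readers unfamiliar with the three figures reported above, the following minimal Python sketch shows how accuracy, precision and recall are conventionally derived from predicted and true relevance labels. The labels in the example are made up for illustration and are not taken from the European floods or BASF SE datasets; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary relevance task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: of the messages flagged relevant, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly relevant messages, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative labels only: 1 = relevant message, 0 = irrelevant message.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

A classifier with high precision but lower recall, as in the first reported dataset, rarely flags irrelevant messages but misses some relevant ones.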

Online games have created significant opportunities for electronic commerce managers. The degree to which online gamers regard their avatars (their gaming representations) as themselves is known to influence gamers’ behavior, but little is known about how such identification impacts online gamer loyalty (i.e., gamers’ continued intention to play). This study fills this gap by developing its research framework from the perspective of social identity theory and social capital theory. Responses from 1384 online gamers were collected, and structural equation modeling was used to test the hypotheses. The analytical results indicate that avatar identification is positively related to participation in gaming communities and to social presence (the degree of awareness of other persons and interpersonal relationships). These were in turn positively related to online gamer loyalty. This study is the first to use the two theoretical perspectives, social identity and social capital, to clarify the mechanism underlying the impact of avatar identification on online gamer loyalty, assisting electronic commerce managers in creating a loyal user base.

This paper presents a survey of previous studies done on the problem of tracking community evolution over time in dynamic social networks. This problem is of crucial importance in the field of social network analysis. The goal of our paper is to classify existing methods dealing with the issue. We propose a classification of various methods for tracking community evolution in dynamic social networks into four main approaches using as a criterion the functioning principle: the first one is based on independent successive static detection and matching; the second is based on dependent successive static detection; the third is based on simultaneous study of all stages of community evolution; finally, the fourth and last one concerns methods working directly on temporal networks. Our paper starts by giving basic concepts about social networks, community structure and strategies for evaluating community detection methods. Then, it describes the different approaches, and exposes the strengths as well as the weaknesses of each.

Probabilistic topic models are unsupervised generative models which model document content as a two-step generation process, that is, documents are observed as mixtures of latent concepts or topics, while topics are probability distributions over vocabulary words. Recently, a significant research effort has been invested into transferring the probabilistic topic modeling concept from monolingual to multilingual settings. Novel topic models have been designed to work with parallel and comparable texts. We define multilingual probabilistic topic modeling (MuPTM) and present the first full overview of the current research, methodology, advantages and limitations in MuPTM. As a representative example, we choose a natural extension of the omnipresent LDA model to multilingual settings called bilingual LDA (BiLDA). We provide a thorough overview of this representative multilingual model from its high-level modeling assumptions down to its mathematical foundations. We demonstrate how to use the data representation by means of output sets of (i) per-topic word distributions and (ii) per-document topic distributions coming from a multilingual probabilistic topic model in various real-life cross-lingual tasks involving different languages, without any external language pair dependent translation resource: (1) cross-lingual event-centered news clustering, (2) cross-lingual document classification, (3) cross-lingual semantic similarity, and (4) cross-lingual information retrieval. We also briefly review several other applications present in the relevant literature, and introduce and illustrate two related modeling concepts: topic smoothing and topic pruning. In summary, this article encompasses the current research in multilingual probabilistic topic modeling. By presenting a series of potential applications, we reveal the importance of the language-independent and language pair independent data representations by means of MuPTM. 
We provide clear directions for future research in the field by providing a systematic overview of how to link and transfer aspect knowledge across corpora written in different languages via the shared space of latent cross-lingual topics, that is, how to effectively employ learned per-topic word distributions and per-document topic distributions of any multilingual probabilistic topic model in various cross-lingual applications.

Political polarization remains perhaps the “greatest barrier” to effective COVID-19 pandemic mitigation measures in the United States. Social media has been implicated in fueling this polarization. In this paper, we uncover the network of COVID-19 related news sources shared to 30 politically biased and 2 neutral subcommunities on Reddit. We find, using exponential random graph modeling, that news sources associated with highly toxic (“rude, disrespectful”) content are more likely to be shared across political subreddits. We also find homophily according to toxicity levels in the network of online news sources. Our findings suggest that news sources associated with high toxicity are rewarded with prominent positions in the resultant network. The toxicity in COVID-19 discussions may fuel political polarization by denigrating ideological opponents and politicizing responses to the COVID-19 pandemic, all to the detriment of mitigation measures. Public health practitioners should monitor toxicity in public online discussions to familiarize themselves with emerging political arguments that threaten adherence to public health crisis management. We also recommend, based on our findings, that social media platforms algorithmically promote neutral and scientific news sources to reduce toxic discussion in subcommunities and encourage compliance with public health recommendations in the fight against COVID-19.

The Internet, together with the large amount of textual information available in document archives, has increased the relevance of tools related to information retrieval. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool allows users to structure the set of documents hierarchically (using a fuzzy hierarchical structure) and to represent this structure in a graphical interface (a 3D sphere) over which the user can navigate. Gambal allows the analysis of the documents and the computation of their similarity not only on the basis of the syntactic similarity between words but also based on a dictionary (WordNet 1.7) and latent semantic analysis.

Research into hyperlink interaction patterns has been particularly interested in whether they integrate the online space or segregate it into “echo chambers.” Concentrating on contentious politics in national settings, the existing studies have mainly examined the relationships between domestic actors, mostly bloggers. This study seeks to expand the focus by including several actor types, allowing their connective actions to reach beyond national borders, and employing a comparative approach that contrasts high- with low-contentious contexts. Analyzing climate change hyperlink networks originating in the US and Switzerland, the results show that their transnational dimension plays a crucial role in polarizing the discourse, regardless of the specific political context. We find similar patterns that segregate climate advocates from skeptics and lead to distinct transnational relationships within the camps. The results demonstrate that countermovement actors in particular are able to forge strong transnational alliances.

The impact of research work is related to a scholar's reputation and future promotions. Greater research impact not only inspires scholars to continue their research, but also increases the possibility of a larger research budget from sponsors. Given the importance of research impact, this study proposes that utilizing social capital embedded in a social structure is an effective way to achieve more research impact. The contribution of this study is to define six indicators of social capital (degree centrality, closeness centrality, betweenness centrality, prolific co-author count, team exploration, and publishing tenure) and investigate how these indicators interact and affect citations for publications. A total of 137 Information Systems scholars from the Social Science Citation Index database were selected to test the hypothesized relationships. The results show that betweenness centrality plays the most important role in taking advantage of non-redundant resources in a co-authorship network, thereby significantly affecting citations for publications. In addition, we found that prolific co-author count, team exploration, and publishing tenure all have indirect effects on citation count. Specifically, co-authoring with prolific scholars helps researchers develop centralities and, in turn, generate higher numbers of citations. Researchers with longer publishing tenure tend to have higher degree centrality. When they collaborate more with different scholars, they achieve more closeness and betweenness centralities, but risk being distrusted by prolific scholars and losing chances to co-author with them. Finally, implications of findings and recommendations for future research are discussed.
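As a rough illustration of two of the indicators named above, the Python sketch below computes degree centrality and betweenness centrality on a tiny, made-up co-authorship graph. It uses Brandes' algorithm for unweighted, undirected graphs; the abstract does not say how the study computed its centralities, so this is an assumption, and the author labels are hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other authors each author has co-written with."""
    n = len(graph)
    return {v: len(neighbors) / (n - 1) for v, neighbors in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                         # nodes in order of discovery
        preds = {v: [] for v in graph}     # predecessors on shortest paths
        paths = {v: 0 for v in graph}      # number of shortest paths from source
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):          # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical co-authorship graph: "bridge" links two otherwise separate pairs.
coauthors = {
    "a1": ["a2", "bridge"],
    "a2": ["a1", "bridge"],
    "bridge": ["a1", "a2", "b1", "b2"],
    "b1": ["b2", "bridge"],
    "b2": ["b1", "bridge"],
}
print(degree_centrality(coauthors))
print(betweenness_centrality(coauthors))
```

In this toy graph only the bridging author lies on shortest paths between the two groups, which is the kind of position the abstract associates with access to non-redundant resources.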

This paper analyses how the knowledge shared between employees and suppliers within a private enterprise social network affects process improvement. Data was collected from internal documents and from the internal and external enterprise social networks used by an international insurance company; the average cycle time for handling 8494 claims and 3240 messages posted on the internal and external social networks were analysed. Social network analysis techniques were combined with principal component analysis and structural equation modeling, and the results demonstrate that the knowledge shared within the internal and external social network can explain 35.10% of process improvement variability, while the knowledge shared within the internal social network explains 89.90% of external social network variability. The analysis also demonstrates that: (i) the knowledge shared among employees positively affects process improvement; (ii) the knowledge shared among suppliers negatively affects process improvement; and (iii) the knowledge shared among employees positively affects the knowledge shared among supply chain members. These findings have theoretical and practical implications. They extend the literature in the knowledge management and information management field by offering empirical evidence of how the knowledge shared through an enterprise social network affects business process improvement, using the objective data provided by Yammer. They also provide a strategic tool for managers that will allow them to better understand how they can use the enterprise social network for business process improvement.

Gamification planning has been a topic of discussion in recent years, since it can be used to increase the performance, engagement, and motivation of end users. When properly applied in educational settings, gamification can lead to better learning. Furthermore, its effect can be boosted when it is tied to social networks. However, according to the literature, there are three main concerns regarding this topic: (a) instructors and teachers do not have the resources to plan and develop gamification strategies for their classes; (b) gamification needs a systematic approach to achieve the desired positive results; and (c) there is a lack of systematic approaches that connect and support the design of gamification and social network tasks within these contexts. Thus, this work proposes a solution to help instructors and teachers plan and deploy gamification concepts with social network features in learning environments. In this paper, we detail our approach, describing the set of items used to analyze it and compare it with other solutions focused on education. We then conducted a case study on a programming course (N = 40) to analyze the planning and deployment phases. Our results demonstrate that our approach is the first to consider the stakeholders (i.e., instructors and teachers) as part of the process. Moreover, even though there are still some obstacles to overcome, the gamified strategies that were created achieved positive acceptance among the students and the professor.

This paper describes an applied document filtering system embedded in an operational watch center that monitors disease outbreaks worldwide. At the initial time of this writing, the system effectively supported monitoring of 23 geographic regions by filtering documents in several thousand daily news sources in 11 different languages. This paper describes the filtering algorithm and the statistical procedures for estimating Precision and Recall in an operational environment, summarizes operational performance data, and suggests lessons learned for other applications of document filtering technology. Overall, the results are interpreted as supporting the general utility of document filtering and information retrieval technology, and the paper offers recommendations for future applications of this technology.

