Similar literature
1.
“We the Media” networks are real-time and open, and they lack a gatekeeper system. As netizens’ comments on emergency events are disseminated, negative public opinion topics and confrontations concerning those events also spread widely on “We the Media” networks. This phenomenon has gradually attracted attention from scholars and from society at large. In existing topic detection studies, a topic is mainly defined as an “event” from the perspective of news-media information flow, but in the “We the Media” era there are often many different views or topics surrounding a specific public opinion event. This paper presents a study on the detection of public opinion topics in “We the Media” networks, starting with the characteristics of the elements found in public opinions on these networks; such public opinions are multidimensional, multilayered and possess multiple attributes. By categorizing the elements’ attributes using social psychology and system science categories as references, we build a multidimensional network model oriented toward the topology of public opinions on “We the Media” networks. Based on the real process by which multiple topics concerning the same event are generated and disseminated, we design a topic detection algorithm that works on these multidimensional public opinion networks. As a case study, the “Explosion in Tianjin Port on August 12, 2015” accident was selected for an empirical analysis of the algorithm's effectiveness. The theoretical and empirical findings of this paper are summarized in the following three aspects. (1) The multidimensional network model can effectively characterize how multiple topics are communicated on “We the Media” networks, and it provides the modeling ideas for the present paper and for other related studies on “We the Media” public opinion networks. (2) Using the multidimensional topic detection algorithm, 70% of the public opinion topics concerning the case study event were effectively detected, which shows that the algorithm is effective at detecting topics from the information flow on “We the Media” networks. (3) By defining the psychological scores of single and paired Chinese keywords in public opinion information, the topic detection algorithm can also be used to judge the sentiment tendency of each topic, which can facilitate a timely understanding of public opinion and reveal negative topics under discussion on “We the Media” networks.

2.
Emerging topic detection is a vital research area for researchers and scholars interested in searching for and tracking new research trends and topics. The text mining and data mining methods currently used for this purpose focus only on the frequency with which subjects are mentioned and ignore the novelty of the subject, which is also critical but beyond the scope of a frequency study. This work addresses that inadequacy by proposing a new set of indices for emerging topic detection: the novelty index (NI) and the published volume index (PVI). The indices are based on time, volume and frequency, and are intended to provide a more precise set of prediction indices. They are then used to determine the detection point (DP) of new emerging topics. Following the detection point, the intersection decides the worth of a new topic. The algorithms presented in this paper can be used to decide the novelty and life span of an emerging topic in a specific field. The entire collection of the ACM Digital Library is examined in the experiments. The application of the NI and PVI gives a promising indication of emerging topics in conferences and journals.

3.
In this paper, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, “novelty” is redefined based on the proposed information patterns, and several different types of information patterns are given corresponding to different types of users’ information needs. Second, a thorough analysis of sentence level information patterns is elaborated using data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, a unified information-pattern-based approach to novelty detection (ip-BAND) is presented for both specific NE topics and more general topics. Experiments on novelty detection on data from the TREC 2002, 2003 and 2004 novelty tracks show that the proposed approach significantly improves the performance of novelty detection in terms of precision at top ranks. Future research directions are suggested.

4.
In this paper we introduce the HEMOS (Humor-EMOji-Slang-based) system for fine-grained sentiment classification of the Chinese language using a deep learning approach. We investigate the importance of recognizing the influence of humor, pictograms and slang on the task of affective processing of social media. In the first step, we collected 576 frequent Internet slang expressions as a slang lexicon; then, we converted 109 Weibo emojis into textual features, creating a Chinese emoji lexicon. In the next step, by performing two polarity annotations with new “optimistic humorous type” and “pessimistic humorous type” categories added to the standard “positive” and “negative” sentiment categories, we applied both lexicons to an attention-based bi-directional long short-term memory recurrent neural network (AttBiLSTM) and tested its performance on undersized labeled data. Our experimental results show that the proposed method can significantly improve on state-of-the-art methods in predicting sentiment polarity on Weibo, the largest Chinese social network.

5.
《Research Policy》2022,51(1):104393
In this paper we draw a parallel between the insights developed within the framework of the current COVID-19 health crisis and the views and insights developed with respect to the long-term environmental crisis, whose implications for science, technology and innovation (STI) policy Christopher Freeman analyzed as early as the early 1990s. With the COVID-19 pandemic, at the time of writing, entering a third wave in many countries and with vaccination being implemented very unevenly across rich and poor countries, drawing such a parallel remains of course a relatively speculative exercise. Nevertheless, based on the available evidence from the first wave of the pandemic, we feel confident that some lessons from the current health crisis and its parallels with the long-term environmental crisis can be drawn. The COVID-19 pandemic has also been described as a “syndemic”: a term popular in medical anthropology which marries the concept of ‘synergy’ with ‘epidemic’ and provides a conceptually interesting background for these posthumous Freeman reflections on crises. The COVID-19 crisis affects citizens in very different and disproportionate ways. It results not only in rising structural inequalities among social groups and classes, but also among generations. In the paper, we focus on the growing inequality within two particular groups: youngsters, through the impact of COVID-19 on learning and the organization of education; and, as a mirror image, the elderly, many of whom experienced high mortality following the COVID-19 outbreak despite strict confinement in long-term care facilities. From a Freeman perspective, these inequality consequences of the current COVID-19 health crisis call for new social STI policies: for a new “corona version” of inclusion versus exclusion.

6.
Knowledge representation learning (KRL) transforms a knowledge graph (KG) from symbol space to vector space. However, KRL under the open world assumption (OWA) is deeply trapped in a dilemma of lacking labels, because labeling is difficult or costly. To address this problem, we propose KRL_MLCCL: a Multi-Label Classification based on Contrastive Learning (CL) Knowledge Representation Learning method. Specifically, (1) we formalize the problem of solving true knowledge graph object (KGO) matchings (KGOMs) under the OWA in the original KGOM sample space (KGOMSS) (multi-label classification with one known true matching, i.e., one positive example). (2) We solve the problem in a new KGOMSS, generated by augmenting the true matching according to the idea of CL (multi-label classification with multiple known true matchings). (3) We score the true matchings based on the Hermitian inner product and softmax, and minimize a negative log-likelihood loss to establish the KRL_MLCCL model preliminarily. (4) We migrate the learned model back to the original KGOMSS to solve the true matching problem. We design and apply a CL positive-example augmentation scheme that gives KRL_MLCCL its back-migration ability: “pulling KGOs in true matching close and pushing KGOs in false matching away”, which helps KRL out of the label shortage dilemma faced in modeling. We also propose a negative-example noise filtering algorithm to enhance this ability. The open world entity prediction (OWEP) experiment on the dataset FB15K-237-OWE shows that the performance of KRL_MLCCL is increased by 3% in Hits@10 and 1.32% in MRR compared with the state of the art among the baselines. The OWEP experiments on the KG also show that KRL_MLCCL has a better back-migration ability.
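The scoring step named in the abstract above (Hermitian inner product, softmax, negative log-likelihood) can be illustrated with a minimal sketch. This is not the authors' KRL_MLCCL model: the function names, the toy complex-valued embeddings, and the choice to take the real part of the inner product as the score are all assumptions made for illustration.

```python
import math

def hermitian_score(u, v):
    """Real part of the Hermitian inner product <u, v> = sum(u_i * conj(v_i)).

    Taking the real part as the score is an assumption of this sketch.
    """
    return sum(a * b.conjugate() for a, b in zip(u, v)).real

def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(query, candidates, true_index):
    """Negative log-likelihood of the true candidate under the softmax."""
    scores = [hermitian_score(query, c) for c in candidates]
    return -math.log(softmax(scores)[true_index])

# Hypothetical toy embeddings (complex vectors), not taken from the paper.
query = [1 + 1j, 0.5 - 0.5j]
candidates = [
    [1 + 1j, 0.5 - 0.5j],    # identical to the query, so it scores highest
    [-1 - 1j, 0.5 + 0.5j],
    [0j, 0j],
]

loss_if_first_is_true = nll_loss(query, candidates, 0)
loss_if_second_is_true = nll_loss(query, candidates, 1)
```

In this toy setup the loss is smaller when the best-aligned candidate is labeled as the true matching, which is the behavior a training loop would exploit when minimizing the loss.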

7.
Trends change rapidly in today’s world, prompting this key question: What is the mechanism behind the emergence of new trends? By representing real-world dynamic systems as complex networks, the emergence of new trends can be symbolized by vertices that “shine.” That is, at a specific time interval in a network’s life, certain vertices become increasingly connected to other vertices. This process creates new high-degree vertices, i.e., network stars. Thus, to study trends, we must look at how networks evolve over time and determine how the stars behave. In our research, we constructed the largest publicly available network evolution dataset to date, which contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations resulted: (a) links are most prevalent among vertices that join a network at a similar time; (b) the rate that new vertices join a network is a central factor in molding a network’s topology; and (c) the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied our learnings to develop a flexible network-generation model based on large-scale, real-world data. This model gives a better understanding of how stars rise and fall within networks, and is applicable to dynamic systems both in nature and society.

8.
With the proliferation of big data technology, organizations find themselves in an overwhelming ocean of data. The sheer volume leaves them at a loss in the face of frequent data breaches, because they fail to manage data security efficiently. Data classification, which categorizes information types and distinguishes protective measures at different classification levels, has become a hot topic as a cornerstone of data protection in recent years, especially in China. Both the text and the tables of the promulgated data classification-related regulations (for simplicity, laws, regulations, policies, and standards are collectively referred to as “regulations”) contain a wealth of valuable information that can guide the work of data classification. To best assist data practitioners, in this paper we automatically “grasp” expert experience on how to classify data by analyzing such regulations. We design a framework, GENONTO, that automatically extracts data classification practices (DCPs), such as information types and their corresponding sensitivity levels, to construct an information type lexicon and to encode a generic ontology on top of 38 real-world regulations promulgated in China. GENONTO employs machine learning techniques and natural language processing (NLP) to parse unstructured text and tables. To our knowledge, GENONTO is the first work that extracts critical information such as the category and the sensitivity of information types from regulations and organizes it in the structured form of an ontology, characterizing the subsumptive relations between different information types. Our research helps provide a well-defined integrated view across regulations and bridges the gap between what experts say and what data practitioners do.

9.
周玉芳 《现代情报》2012,32(6):25-28,32
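Using bibliometric methods and keyword co-occurrence analysis, this paper statistically analyzes research papers on novelty search published in core journals indexed in the China Academic Journals Full-text Database, by publication date, author, high-frequency keywords, and research content. It examines the status, development, hot topics, and trends of research on sci-tech novelty search over the past 21 years.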

10.
The era of big data has promoted the vigorous development of many industries and unlocked the potential of holistic data-driven analysis, yet it has also been accompanied by continual data breaches. In recent years, especially in China, data security laws and regulations have been promulgated continuously, and many of them set clear requirements for data classification. As the foundation of data security initiatives, data classification has received considerable attention across sectors. The issued regulations contain a great deal of valuable information, which has already been well exploited in research on privacy policy compliance verification, whereas few scholars have drawn on such information to guide data classification for security and compliance. As a step in this direction, in this paper we define two information types: “regulated data”, mentioned in external laws and regulations, and “non-regulated data”, meaning internal business data produced within a given organization. We then develop a novel generalization-enhanced decision tree classification algorithm called Gen-DT to classify data. In this way, data covered by the relevant data security regulatory mandates can be quickly identified and handled in full compliance. Furthermore, we evaluate the proposed compliance-driven data classification scheme using datasets collected from two well-known universities in China and validate that our approach achieves better performance than existing popular machine learning techniques.

11.
In a previous issue of Knowledge Management Research & Practice (KMRP), we analysed the content and keywords of all articles published in the first decade of KMRP. With this article, we extend our preliminary analysis to the citations and co-citations made by these articles. The study covers all 256 articles published. The most cited article was “A dynamic theory of organisational knowledge creation” by Nonaka. The most cited KMRP article was by Nonaka and Toyama: “The knowledge-creating theory revisited: knowledge creation as a synthesizing process”. The co-citation analysis of the 100 most cited articles in KMRP publications showed that four groups of topics emerged: one around communities and situated learning; a second around networks, knowledge transfer and research methods; a third around the foundations of knowledge management; and a fourth around intellectual capital.
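For readers unfamiliar with the co-citation analysis mentioned in the abstract above, a minimal sketch of the counting step follows. Two references are co-cited whenever they appear together in the reference list of the same citing article. The function name and the placeholder reference labels are hypothetical and are not data from the KMRP study.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count, for each unordered pair of references, how many citing
    articles include both of them in their reference list."""
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical reference lists of four citing articles.
citing_articles = [
    ["ref_A", "ref_B", "ref_C"],
    ["ref_A", "ref_B"],
    ["ref_B", "ref_C"],
    ["ref_A"],
]

pairs = cocitation_counts(citing_articles)
strongest_pairs = pairs.most_common(2)
```

Clustering the resulting pair counts (for example, with a community detection method) is what typically produces topic groups like the four reported above; that step is left out of this sketch.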

12.
POSIE (POSTECH Information Extraction System) is an information extraction system which uses multiple learning strategies, i.e., SmL, user-oriented learning, and separate-context learning, in a question answering framework. POSIE replaces laborious annotation with automatic instance extraction by the SmL from structured Web documents, and places the user at the end of the user-oriented learning cycle. Information extraction as question answering simplifies the extraction procedures for a set of slots. We introduce the techniques verified on the question answering framework, such as domain knowledge and instance rules, into an information extraction problem. To incrementally improve extraction performance, a sequence of the user-oriented learning and the separate-context learning produces context rules and generalizes them in both the learning and extraction phases. Experiments on the “continuing education” domain initially show that the F1-measure becomes 0.477 and recall 0.748 with no user training. However, as the size of the training documents grows, the F1-measure reaches beyond 0.75 with recall 0.772. We also obtain F-measure of about 0.9 for five out of seven slots on “job offering” domain.
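The abstract above reports F1 and recall but not precision. Assuming the standard definition F1 = 2PR / (P + R), precision can be recovered as P = F1·R / (2R − F1). The short sketch below works through that arithmetic for the reported no-training figures; the derived precision (about 0.35) is an inference from the formula, not a number stated in the source.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_from_f1(f1, recall):
    """Invert F1 = 2PR / (P + R) for P, given F1 and R."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denominator

# Figures reported in the abstract for the run with no user training.
implied_precision = precision_from_f1(0.477, 0.748)

# Round-trip check: plugging the implied precision back in recovers F1.
recovered_f1 = f1_score(implied_precision, 0.748)
```

Read this way, the initial system favors recall heavily over precision, which is consistent with the abstract's description of later training raising F1 while recall changes little.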

13.
Stress and depression detection on social media aims at analysing stress and identifying depression tendencies from social media posts, which assists the early detection of mental health conditions. Existing methods mainly model the mental states of the post speaker implicitly. They also lack the ability to mentalise for complex mental state reasoning. Besides, they are not designed to explicitly capture class-specific features. To resolve the above issues, we propose a mental state Knowledge-aware and Contrastive Network (KC-Net). In detail, we first extract mental state knowledge from the commonsense knowledge base COMET, and infuse the knowledge using Gated Recurrent Units (GRUs) to explicitly model the mental states of the speaker. Then we propose a knowledge-aware mentalisation module based on dot-product attention to attend to the most relevant knowledge aspects. A supervised contrastive learning module is also utilised to fully leverage label information for capturing class-specific features. We test the proposed methods on a depression detection dataset, Depression_Mixed, with 3165 Reddit and blog posts; a stress detection dataset, Dreaddit, with 3553 Reddit posts; and a stress factors recognition dataset, SAD, with 6850 SMS-like messages. The experimental results show that our method achieves new state-of-the-art results on all datasets: F1 scores of 95.4% on Depression_Mixed, 83.5% on Dreaddit and 77.8% on SAD, with a 2.07% average improvement. Factor-specific analysis and an ablation study demonstrate the effectiveness of all proposed modules, while UMAP analysis and a case study visualise their mechanisms. We believe our work facilitates detection and analysis of depression and stress on social media data, and shows potential for applications to other mental health conditions.
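The dot-product attention that the abstract above names as the basis of its mentalisation module can be sketched generically as follows. This is plain dot-product attention over a handful of vectors, not the KC-Net architecture; the function names and the toy "post" and "knowledge aspect" vectors are assumptions for illustration only.

```python
import math

def dot(u, v):
    """Dot product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))

def dot_product_attention(query, keys, values):
    """Return (weights, context): softmax of query-key dot products,
    and the weighted sum of the value vectors under those weights."""
    scores = [dot(query, k) for k in keys]
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(dim)]
    return weights, context

# Hypothetical post representation and three knowledge-aspect vectors.
post_vector = [1.0, 0.0]
aspect_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights, context = dot_product_attention(
    post_vector, aspect_vectors, aspect_vectors)
```

The aspect most aligned with the post vector receives the largest weight, which is the sense in which attention "attends to the most relevant knowledge aspects".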

14.
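[Purpose/Significance] This study aims to link the fragmented answers in social Q&A communities and to provide users with high-quality answers on different topics and with better knowledge services. [Method/Process] First, the Doc2vec algorithm is used to compute the semantic similarity between answers and to build an answer semantic network. Second, the Louvain algorithm partitions the answer semantic network into communities, the TextRank algorithm extracts keywords from the documents under each topic, and word clouds visualize each topic. Finally, the PageRank algorithm ranks the clustered answer semantic network, which achieves topic aggregation and ranking of the answer documents. [Result/Conclusion] An empirical study was conducted on Q&A data from Zhihu (“知乎”). The results show that the proposed answer aggregation and ranking method not only shows users intuitively how strongly answers are associated and what the answers under each topic mainly contain, but also provides per-topic answer rankings and automatically filters high-quality answers for users. [Innovation/Limitation] The study proposes the answer semantic network and, based on it, an answer knowledge organization method that integrates aggregation, topic visualization, and ranking.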
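Two of the steps in the pipeline above, building a similarity network and ranking its nodes with PageRank, can be sketched in plain Python. The toy two-dimensional vectors stand in for Doc2vec embeddings, the similarity threshold is an arbitrary assumption, and the Louvain and TextRank steps are omitted; this is an illustration of the general technique, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two real vectors (0.0 if either is zero)."""
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u))
           * math.sqrt(sum(b * b for b in v)))
    return num / den if den else 0.0

def similarity_graph(vectors, threshold):
    """Symmetric weight matrix keeping only edges with similarity >= threshold."""
    n = len(vectors)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                weights[i][j] = weights[j][i] = s
    return weights

def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; isolated nodes spread
    their rank uniformly over all nodes."""
    n = len(weights)
    rank = [1.0 / n] * n
    out = [sum(row) for row in weights]
    for _ in range(iterations):
        dangling = sum(rank[i] for i in range(n) if out[i] == 0) / n
        rank = [
            (1 - damping) / n + damping * (dangling + sum(
                rank[i] * weights[i][j] / out[i]
                for i in range(n) if out[i] > 0))
            for j in range(n)
        ]
    return rank

# Hypothetical answer embeddings; the last one is unrelated to the others.
answer_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]]
graph = similarity_graph(answer_vectors, threshold=0.5)
ranks = pagerank(graph)
```

In a fuller pipeline the ranking would be run within each community found by the clustering step, so that each topic gets its own ordered list of answers.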

15.
In the last 20 years, microfinance has moved from a promise to reality, although with ups and downs. This paper reviews 1874 papers published from 1997 to 2017 to perform a scientometric analysis of the microfinance field. The literature review is based on bibliometric data: keyword co-occurrence networks and citation networks were exploited for knowledge mapping. Data analysis shows the two research traditions: papers focusing on clients (welfarists) and papers focusing on microfinance entities themselves (institutionalists). Institutionalism, which had little presence in the early research in microfinance, now exhibits great strength. A chronological analysis reveals the evolution of the topics most interesting to researchers: the first stage described the innovations of the microcredit practices and their impact; the second and very expansive stage in which microfinance institutions’ peculiarities were analyzed; and nowadays the sector is mature but with negative aspects arising, such as mission drift. The keywords analysis discovers emerging research topics, shows the use of sophisticated techniques, and recognizes an emerging trend of the sector: achieving financial inclusion.

16.
Blogging has become an emerging medium for people to express themselves. However, the presence of spam blogs (also known as splogs) may reduce the value of blogs and blog search engines. Hence, splog detection has recently attracted much research attention. Most existing work on splog detection identifies splogs using their content and link features, and targets spam filters that protect a blog search engine's index from spam. In this paper, we propose a splog detection framework that monitors online search results. The novelty of our approach is that it capitalizes on the results returned by search engines. The proposed method is therefore particularly useful for detecting splogs that have slipped through the spam filters and are actively generating spam posts. More specifically, our method monitors the top-ranked results of a sequence of temporally ordered queries and detects splogs based on blogs’ temporal behavior. The temporal behavior of a blog is maintained in a blog profile. Given blog profiles, splog detection functions are proposed and evaluated using real data collected from a popular blog search engine. Our experiments demonstrate that splogs can be detected with high accuracy. The proposed method can be implemented on top of any existing blog search engine without intruding on it.

17.
Design of a Novelty-Search Auxiliary Analysis System Based on the LDA Topic Model
马林山  郭磊 《现代情报》2018,38(2):111-115
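The article outlines the computational principles and methods of the LDA probabilistic topic model, and the processing flow by which the lda package for the open-source R language analyzes a corpus using a fast collapsed Gibbs sampling algorithm. It designs a functional framework for a novelty-search auxiliary analysis system based on the LDA model and describes the system's functions, implementation approach, and workflow. Finally, using a real project novelty-search case, it details how the LDA model is applied to the keywords of related literature to mine latent topics, compare them with the project's research content, and give an objective evaluation of the project. The results show that the topic-model-based auxiliary analysis system can quickly and effectively mine the topics of related literature, reduce the difficulty novelty searchers face in analyzing that literature, and improve the objectivity of project evaluation; the overall auxiliary analysis effect is good.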

18.
This paper focuses on temporal retrieval of activities in videos via sentence queries. Given a sentence query describing an activity, temporal moment retrieval aims at localizing the temporal segment within the video that best matches the textual query. This is a general yet challenging task, as it requires comprehension of both video and language. Existing research predominantly employs coarse frame-level features as the visual representation, obscuring the specific details (e.g., the desired objects “girl”, “cup” and action “pour”) within the video which may provide critical cues for localizing the desired moment. In this paper, we propose a novel Spatial and Language-Temporal Tensor Fusion (SLTF) approach to resolve those issues. Specifically, the SLTF method first takes advantage of object-level local features and attends to the most relevant local features (e.g., the local features “girl”, “cup”) by spatial attention. Then we encode the sequence of the local features on consecutive frames with an LSTM network, which can capture the motion information and interactions among these objects (e.g., the interaction “pour” involving these two objects). Meanwhile, language-temporal attention is utilized to emphasize the keywords based on moment context information. Thereafter, a tensor fusion network learns both the intra-modality and inter-modality dynamics, which can enhance the learning of the moment-query representation. Therefore, our proposed two attention sub-networks can adaptively recognize the most relevant objects and interactions in the video, and simultaneously highlight the keywords in the query for retrieving the desired moment. Experimental results on three public benchmark datasets (obtained from TACOS, Charades-STA, and DiDeMo) show that the SLTF model significantly outperforms current state-of-the-art approaches, and demonstrate the benefits produced by the new technologies incorporated into SLTF.

19.
朱光  潘高枝  李凤景 《情报科学》2022,40(4):127-137
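[Purpose/Significance] To identify hot topics in information privacy research and trace their evolution paths. [Method/Process] To address problems such as semantically cluttered topic identification, a topic evolution analysis method is proposed from the perspectives of temporal association and structural representation. First, the LDA (Latent Dirichlet Allocation) model identifies literature topics under multiple time windows, and co-word analysis is then used to draw cohesive topic subgroups whose semantics are more independent. On this basis, from the temporal-association dimension, the similarity between topics in adjacent windows is computed to trace evolution paths; from the structural-representation dimension, metrics such as topic novelty, centrality, and influence are designed to explore how frontier and hot topics in information privacy evolve. [Result/Conclusion] Empirical results show that the method can mine research topics in the information privacy field in depth and trace topic evolution paths comprehensively at both the macro and micro levels. The study helps detect the frontiers of information privacy research. [Innovation/Limitation] LDA topic modeling and co-word analysis are combined to draw cohesive topic subgroups, and topic evolution paths are explored along the two dimensions of temporal evolution and structural representation. Future research should introduce multiple data sources to compare topic differences and multi-word terms to improve topic identification.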
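The temporal-association step above, computing the similarity between topics in adjacent time windows, can be sketched as follows. The abstract does not say which similarity measure is used; cosine similarity over topic-word distributions, the 0.5 threshold, and the toy distributions below are all assumptions of this sketch.

```python
import math

def topic_cosine(p, q):
    """Cosine similarity between two topics given as word -> weight dicts."""
    num = sum(p.get(w, 0.0) * q.get(w, 0.0) for w in set(p) | set(q))
    den = (math.sqrt(sum(v * v for v in p.values()))
           * math.sqrt(sum(v * v for v in q.values())))
    return num / den if den else 0.0

def link_topics(earlier, later, threshold=0.5):
    """Return (earlier_topic, later_topic, similarity) for every pair of
    topics in adjacent windows whose similarity reaches the threshold."""
    links = []
    for name_a, dist_a in earlier.items():
        for name_b, dist_b in later.items():
            s = topic_cosine(dist_a, dist_b)
            if s >= threshold:
                links.append((name_a, name_b, s))
    return links

# Hypothetical topic-word distributions for two adjacent time windows.
window_1 = {
    "t1": {"privacy": 0.6, "disclosure": 0.4},
    "t2": {"encryption": 0.7, "protocol": 0.3},
}
window_2 = {
    "u1": {"privacy": 0.5, "disclosure": 0.3, "trust": 0.2},
    "u2": {"blockchain": 0.8, "encryption": 0.2},
}

evolution_links = link_topics(window_1, window_2)
```

Chaining such links across all consecutive windows yields the evolution paths; topics in a later window with no incoming link would be candidates for newly emerging topics.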

20.
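The emergence of controversial new technologies such as nanotechnology, genetic engineering, and artificial intelligence has prompted discussion in research and practice about the governance of emerging technologies. Responsible innovation, as a new perspective on science and technology innovation, provides a theoretical paradigm for the governance challenges raised by the negative externalities and risks of emerging technologies. Focusing on emerging technology governance, this study presents a systematic review from the perspective of responsible innovation and offers implications for systematic theoretical research and policy practice. The review shows that existing research on emerging technology governance rests on three main foundations: the process perspective, the action perspective, and the structural perspective on governance systems. Under the responsible innovation paradigm, discussion of emerging technology governance has developed five research branches: goal setting, participation of actors, coordination of value principles, process responsiveness, and institution building.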
