Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
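This paper proposes a method for analyzing topics in science and technology news based on the LDA topic model. Using polar scientific expedition news from China, Australia, the United Kingdom, and the United States from 2009 to 2018, it analyzes topic evolution in terms of topic type and topic strength. In the Chinese-language news, topics such as polar surveying and mapping rose in popularity while polar glacier expeditions declined; in the English-language news, the most popular topics were polar glacier expeditions and polar ocean expeditions; the popularity of the remaining topics was relatively stable. The results show that the method can effectively identify topics in science and technology news and reveal how they evolve, and that it can raise the degree of automation of science and technology intelligence analysis in a networked environment.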
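The topic-strength idea can be sketched by averaging per-document topic proportions for each year. This is a minimal illustration only: the abstract does not give the authors' strength formula, so the yearly mean is an assumption, and the function name `topic_strength_by_year` and the toy data are hypothetical.

```python
from collections import defaultdict


def topic_strength_by_year(doc_years, doc_topic):
    """Average each topic's proportion over the documents of each year.

    doc_years: publication year of each document.
    doc_topic: per-document topic distributions (each sums to 1), for
               example the document-topic matrix produced by an LDA model.
    Returns {year: [mean proportion of topic 0, topic 1, ...]}.
    Assumption: "strength" is the yearly mean proportion.
    """
    sums = {}
    counts = defaultdict(int)
    for year, theta in zip(doc_years, doc_topic):
        if year not in sums:
            sums[year] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[year][k] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


# Hypothetical toy data: two topics, four documents.
years = [2009, 2009, 2018, 2018]
theta = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
strength = topic_strength_by_year(years, theta)
```

Comparing `strength[2009]` with `strength[2018]` shows which topic gained or lost share between the two years.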

2.
“We the Media” networks are real-time and open, and they lack a gatekeeper system. As netizens’ comments on emergency events are disseminated, negative public opinion topics and confrontations concerning those events also spread widely on “We the Media” networks. This phenomenon has gradually attracted scholarly attention, and all social circles attach importance to it as well. In existing topic detection studies, a topic is mainly defined as an “event” from the perspective of news-media information flow, but in the “We the Media” era there are often many different views or topics surrounding a specific public opinion event. This paper presents a study on the detection of public opinion topics in “We the Media” networks, starting with the characteristics of the elements found in public opinions on these networks; such public opinions are multidimensional, multilayered and possess multiple attributes. By categorizing the elements’ attributes using social psychology and system science categories as references, we build a multidimensional network model oriented toward the topology of public opinions on “We the Media” networks. Based on the real process by which multiple topics concerning the same event are generated and disseminated, we design a topic detection algorithm that works on these multidimensional public opinion networks. As a case study, the “Explosion in Tianjin Port on August 12, 2015” accident was selected for empirical analysis of the algorithm's effectiveness. The theoretical and empirical findings of this paper are summarized in the following three points. (1) The multidimensional network model can effectively characterize how multiple topics are communicated on “We the Media” networks, and it provides the modeling ideas for the present paper and for other related studies on “We the Media” public opinion networks. (2) Using the multidimensional topic detection algorithm, 70% of the public opinion topics concerning the case study event were effectively detected, which shows that the algorithm is effective at detecting topics from the information flow on “We the Media” networks. (3) By defining the psychological scores of single and paired Chinese keywords in public opinion information, the topic detection algorithm can also judge the sentiment tendency of each topic, which facilitates a timely understanding of public opinion and reveals negative topics under discussion on “We the Media” networks.

3.
Social media data have recently attracted considerable attention as an emerging voice of the customer, because social media have rapidly become a channel for exchanging and storing large-scale, unregulated, customer-generated opinions about products. Although product planning studies based on social media data have used systematic methods, those methods have limitations: latent product features are hard to identify when only term-level analysis is used, and the opportunity potential of the identified features is not analyzed sufficiently. This study therefore proposes an opportunity mining approach that identifies product opportunities through topic modeling and sentiment analysis of social media data. For a multifunctional product, the approach uses topic modeling to identify latent product topics discussed by customers in social media and thereby quantifies the importance of each product topic. Next, the satisfaction level of each product topic is evaluated using sentiment analysis. Finally, an opportunity algorithm based on the importance and satisfaction of each product topic identifies its opportunity value and improvement direction from a customer-centered view. We expect that our approach will contribute to the systematic identification of product opportunities from large-scale customer-generated social media data and will serve as a real-time monitoring tool for analyzing changing customer needs in rapidly evolving product environments.
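The abstract does not spell out its opportunity algorithm. As a hedged sketch, the code below assumes the widely cited formulation from outcome-driven innovation, opportunity = importance + max(importance - satisfaction, 0), with both inputs on a common 0 to 10 scale. The topic names and scores are hypothetical.

```python
def opportunity_score(importance, satisfaction):
    """Opportunity value of one product topic.

    Assumed formula (not confirmed by the abstract):
    importance + max(importance - satisfaction, 0).
    A topic that is important but poorly satisfied scores highest.
    """
    return importance + max(importance - satisfaction, 0.0)


# Hypothetical (importance, satisfaction) pairs on a 0-10 scale.
topics = {
    "battery": (8.0, 3.0),
    "camera": (6.0, 7.5),
    "screen": (4.0, 3.0),
}

# Rank topics from highest to lowest opportunity value.
ranked = sorted(topics, key=lambda t: opportunity_score(*topics[t]), reverse=True)
```

In this toy data "battery" ranks first because it combines high importance with low satisfaction, while "camera" gains nothing from the gap term because customers are already satisfied.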

4.
With the emergence and development of deep generative models such as variational auto-encoders (VAEs), research on topic modeling has extended to a new area: neural topic modeling, which aims to learn disentangled topics to understand the data better. However, the original VAE framework has been shown to be limited in disentanglement performance, and it passes this inherent defect on to a neural topic model (NTM). In this paper, we argue that the optimization objectives of contrastive learning are consistent with two important goals (alignment and uniformity) of well-disentangled topic learning. The optimization objectives of contrastive learning are also consistent with two key evaluation measures for topic models, topic coherence and topic diversity. We therefore conclude that the alignment and uniformity of disentangled topic learning can be quantified with topic coherence and topic diversity. Accordingly, we propose the Contrastive Disentangled Neural Topic Model (CNTM). By representing both words and topics as low-dimensional vectors in the same embedding space, we apply contrastive learning to neural topic modeling to produce factorized and disentangled topics in an interpretable manner. We compare CNTM with strong baseline models on widely used metrics. Our model achieves the best topic coherence scores under the most general evaluation setting (100% of topics selected), with improvements of 25.0%, 10.9%, 24.6%, and 51.3% over the second-best models' scores on the four datasets 20 Newsgroups, Web Snippets, Tag My News, and Reuters, respectively. Our method also achieves the second-best topic diversity scores on 20 Newsgroups and Web Snippets. The experimental results show that CNTM can effectively leverage the disentanglement ability of contrastive learning to address the inherent defect of neural topic modeling and obtain better topic quality.
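Topic diversity, one of the two evaluation measures named above, is often computed as the share of unique words among the top-k words of all topics. The sketch below uses that common definition as an assumption, since the abstract does not state which variant the authors use. The function name and toy topics are hypothetical.

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    topics: list of topics, each a list of words sorted by probability.
    Returns a value in (0, 1]; 1.0 means no word is shared between topics.
    Assumption: this is the "unique top words" variant of the measure.
    """
    top_words = [w for topic in topics for w in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics; "polar" appears in both.
toy_topics = [["ice", "glacier", "polar"], ["ocean", "polar", "ship"]]
td = topic_diversity(toy_topics, top_k=3)
```

A score close to 1 indicates that topics rarely repeat each other's top words; a low score indicates redundant topics.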

5.
Efficient topic modeling is needed to support applications that aim to identify the main themes in a collection of documents. In the present paper, a reduced vector embedding representation and particle swarm optimization (PSO) are combined into a topic modeling strategy that can identify representative themes in a large collection of documents. Documents are encoded using a reduced, contextual vector embedding from a general-purpose pre-trained language model (sBERT). A modified PSO algorithm (pPSO) that tracks particle fitness on a dimension-by-dimension basis is then applied to these embeddings to create clusters of related documents. The proposed methodology is demonstrated on two datasets. The first consists of posts from the online health forum r/Cancer; the second is a standard topic modeling benchmark consisting of messages posted to 20 different newsgroups. Compared with state-of-the-art generative document models (i.e., ETM and NVDM), pPSO is able to produce interpretable clusters. The results indicate that pPSO captures both common topics and emergent topics. Moreover, the topic coherence of pPSO is comparable to that of ETM, and its topic diversity is comparable to that of NVDM. The assignment parity of pPSO on a document completion task exceeded 90% for the 20NewsGroups dataset. This rate drops to approximately 30% when pPSO is applied to the same Skip-Gram embedding, derived from a limited, corpus-specific vocabulary, that is used by ETM and NVDM.

6.
Inferring users’ interests from their activities on social networks has been an emerging research topic in recent years. Most existing approaches rely heavily on the explicit contributions (posts) of a user and overlook users’ implicit interests, i.e., potential interests that the user did not explicitly mention but might have. Given a set of active topics present in a social network in a specified time interval, our goal is to build an interest profile for a user over these topics by considering both the explicit and the implicit interests of the user. The reason is that the interests of free-riders and cold-start users, who constitute a large majority of social network users, cannot be directly identified from their explicit contributions to the social network. Specifically, to infer users’ implicit interests, we propose a graph-based link prediction schema that operates over a representation model consisting of three types of information: users’ explicit contributions to topics, relationships between users, and the relatedness between topics. Through extensive experiments on different variants of our representation model, considering both homogeneous and heterogeneous link prediction, we investigate how topic relatedness and users’ homophily relation affect the quality of inferring users’ implicit interests. Comparison with state-of-the-art baselines on a real-world Twitter dataset demonstrates the effectiveness of our model in inferring users’ interests, both in terms of perplexity and in a retweet prediction application. Moreover, we show that the impact of our work is especially meaningful for free-riders and cold-start users.

7.
Topic models are widely used for thematic structure discovery in text. However, traditional topic models often require dedicated inference procedures for the specific task at hand, and they are not designed to generate word-level semantic representations. To address these limitations, we propose a neural topic modeling approach based on Generative Adversarial Nets (GANs), called the Adversarial-neural Topic Model (ATM). To the best of our knowledge, this work is the first attempt to use adversarial training for topic modeling. The proposed ATM models topics with a Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. The generator can also produce word-level semantic representations. In addition, to illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM to open-domain event extraction. To validate the effectiveness of the proposed ATM, two topic modeling benchmark corpora and an event dataset are used in the experiments. Our experimental results on the benchmark corpora show that ATM generates more coherent topics (considering five topic coherence measures), outperforming a number of competitive baselines. Moreover, the experiments on the event dataset validate that the proposed approach is able to extract meaningful events from news articles.

8.
Information management is the management of the organizational processes, technologies, and people that collectively create, acquire, integrate, organize, process, store, disseminate, access, and dispose of information. It is a vast, multidisciplinary domain that brings together various subdomains and overlaps with other domains. This study aims to provide a comprehensive overview of the information management domain from 1970 to 2019. Drawing upon methodology from statistical text analysis research, this study summarizes the evolution of knowledge in this domain by examining publication trends by author, institution, country, and so on. Further, this study proposes a probabilistic generative model based on structural topic modeling to understand and extract the latent themes from research articles related to information management. The study also graphically visualizes the variation in topic prevalence over the period from 1970 to 2019. The results highlight that the most common themes are data management, knowledge management, environmental management, project management, service management, and mobile and web management. The findings also identify themes such as knowledge management, environmental management, project management, and social communication as academic hotspots for future research.

9.
Song Kai (宋凯), Ran Congjing (冉从敬). 情报科学 (Information Science), 2022, 40(7): 136-144
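【Purpose/significance】Classifying the development level of research topics is a fundamental problem in information organization research, and it is also important work for researchers and research management departments when selecting research topics and providing discipline services. Efficiently classifying the development level of a discipline's research topics and forecasting their trends helps researchers and institutions grasp the state of research in a field and make sound research decisions.【Method/process】Combining topic models, Sen's slope estimation, the Mann-Kendall test, and exponential smoothing, this paper proposes a method for classifying the development level of a discipline's research topics and forecasting their trends. First, on the basis of topic identification, two indicators are formed, topic publication degree and topic citation degree, and the development level of research topics is classified with reference to the Boston matrix. Then the publication, citation, and download counts of research topics are fused into a topic heat indicator, and exponential smoothing is used to forecast the future development of research topics.【Result/conclusion】Experiments on China's "smart library" research show that the proposed method can classify development levels and forecast trends for research topics in a discipline comprehensively and at a fine granularity.【Innovation/limitation】The proposed method is generally applicable to the analysis of research topics in other disciplines and offers a new perspective for dynamic intelligence analysis. Its limitations are that the interpretability of the topic modeling needs to improve and that the trend forecasting method needs further optimization.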
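Two of the named building blocks, the Mann-Kendall trend test and simple exponential smoothing, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the heat series is hypothetical, the Mann-Kendall variance omits the correction for tied values, and the smoothing constant `alpha=0.5` is an arbitrary choice.

```python
import math


def mann_kendall_z(series):
    """Mann-Kendall Z statistic for a monotonic trend.

    S counts increasing pairs minus decreasing pairs. The variance
    formula below assumes no tied values (simplification).
    |Z| > 1.96 suggests a trend at the 5% significance level.
    """
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def exp_smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Hypothetical yearly "topic heat" values for one research topic.
heat = [10, 12, 15, 19, 24]
z = mann_kendall_z(heat)
forecast = exp_smooth_forecast(heat, alpha=0.5)
```

Simple exponential smoothing lags behind a steadily rising series, so for trending topics a trend-aware variant such as Holt's method may fit better; the abstract does not say which variant the authors use.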

10.
E-petitions have become a popular vehicle for political activism, but studying them has been difficult because efficient methods for analyzing their content are lacking. Researchers have used topic modeling for content analysis, but current practices carry some serious limitations. While modeling may be more efficient than manually reading each petition, it generally relies on unsupervised machine learning and so requires a dependable training and validation process. This paper therefore describes a framework to train and validate Latent Dirichlet Allocation (LDA), the simplest and most popular topic modeling algorithm, using e-petition data. With rigorous training and evaluation, 87% of LDA-generated topics made sense to human judges. Topics also aligned well with results from an independent content analysis by the Pew Research Center and were strongly associated with corresponding social events. Computer-assisted content analysts can benefit from our guidelines for supervising every step of LDA training and evaluation. Software developers can benefit from learning what social scientists require when using LDA for content analysis. These findings have significant implications for developing LDA tools and for assuring the validity and interpretability of LDA content analysis. In addition, LDA topics can have some advantages over subjects extracted by manual content analysis: they reflect multiple themes expressed in texts, extract new themes that are not highlighted by human coders, and are less prone to human bias.

11.
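[Research purpose] To identify research fronts by measuring multidimensional feature indicators of research fronts, and to analyze the links between scientific frontier topics and technological frontier topics and their evolution. [Research method] First, topics are mined from paper and patent data; starting from the characteristics of frontier topics, research fronts are identified using four indicators (novelty, growth, impact, and interdisciplinarity), and the links between scientific and technological frontier topics are analyzed. Second, topic pairs with an evolutionary relationship are identified through topic similarity calculation, topic filtering, and related methods, and the content evolution of the frontier topics is visualized. [Research conclusion] Taking the field of solid oxide fuel cells as an example, four scientific frontier topics, including research on solid oxide fuel cell stacks, and four technological frontier topics, including composite electrode materials, were identified. Scientific research and technological research advance each other in a double-helix pattern of development.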
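The step of pairing topics by similarity can be illustrated as follows. The abstract names only "topic similarity calculation" and "topic filtering"; the choice of cosine similarity over topic-word weights and the threshold of 0.5 are assumptions, and all topic names and weights are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(p.get(w, 0.0) * q.get(w, 0.0) for w in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def evolution_pairs(earlier, later, threshold=0.5):
    """Return (earlier topic, later topic, similarity) for similar pairs.

    Pairs below the threshold are filtered out (assumed filtering rule).
    """
    pairs = []
    for a_name, a in earlier.items():
        for b_name, b in later.items():
            sim = cosine(a, b)
            if sim >= threshold:
                pairs.append((a_name, b_name, sim))
    return pairs


# Hypothetical topic-word weights from two consecutive time windows.
earlier = {"stack": {"cell": 0.5, "stack": 0.5}}
later = {
    "stack_design": {"cell": 0.5, "stack": 0.3, "design": 0.2},
    "electrode": {"electrode": 0.6, "composite": 0.4},
}
pairs = evolution_pairs(earlier, later)
```

Only the "stack" to "stack_design" pair survives the filter here, because "electrode" shares no words with the earlier topic.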

12.
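Academic inventors are an important bridge and medium for the transfer of science to technology. Starting from the identification of high-impact academic inventors, this paper studies the persistent research topics in their papers and patents in order to identify and compare the differences and links between scientific and technological frontier topics in high-tech fields, and to identify the research topics with greater market potential in academic inventors' output. The study finds that academic inventors in different countries have different research emphases; some focus on scientific research...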

13.
Ruan Guangce (阮光册), Xia Lei (夏磊). 情报科学 (Information Science), 2020, 38(12): 152-157
【Purpose/significance】In scientific research, interdisciplinarity is becoming more and more widespread. Recognizing interdisciplinary topics can reveal the intrinsic relationships between disciplines and is the basis for promoting interdisciplinary cooperation.【Method/process】This paper combines cluster analysis, the LDA model, co-occurrence topic knowledge networks, and other methods to discover interdisciplinary topics and reveal the social networks of these topics in their respective disciplines. In the experiment, this paper selected 56 CSSCI journal papers in 2017 for Library and Information Science and Education for cross-topic identification and knowledge network construction.【Result/conclusion】Compared with the keyword co-occurrence method, the approach in this paper gives better results.

14.
Emerging topic detection is a vital research area for researchers and scholars interested in searching for and tracking new research trends and topics. The text mining and data mining methods currently used for this purpose focus only on the frequency with which subjects are mentioned and ignore the novelty of the subject, which is also critical but beyond the scope of a frequency study. This work addresses that gap by proposing a new set of indices for emerging topic detection: the novelty index (NI) and the published volume index (PVI). The indices are based on time, volume, and frequency and are intended to provide more precise prediction indices. They are then used to determine the detection point (DP) of new emerging topics. Following the detection point, the intersection decides the worth of a new topic. The algorithms presented in this paper can be used to decide the novelty and life span of an emerging topic in a specific field. The entire collection of the ACM Digital Library is examined in the experiments. The application of the NI and PVI gives a promising indication of emerging topics in conferences and journals.

15.
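Strategic analysis of the priority of research topics in a discipline helps researchers and research management decision-makers quickly understand the state of research in the field and discover scientific frontiers, and it actively supports and promotes research output. Taking research topics in library and information science as an example, this paper combines topic extraction with trend analysis: after extracting the discipline's topics, it plots strategic coordinates of research topics in China's library and information science field along two dimensions, publication trend and citation trend, with four regions: research-poor, hot, cold, and overheated. The study shows that the proposed trend strategic coordinates can effectively display the development stages of different research topics in a discipline and present their development levels comprehensively and in detail.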

16.
Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High-frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, this paper investigates how to weight words and develops a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it automatically rewards informative words. For more robust word weights, we further suggest a combined form of EW (CEW) with two existing weighting schemes. In essence, CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to the Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering, and classification tasks on eight real-world datasets. Experimental results show that weighting words can effectively improve topic modeling performance on both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.

17.
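Selecting a research topic is the starting point of original innovation. Ordinary universities generally have relatively weak capacity for original innovation, start from a low level in topic selection, and lack effective integration in their scientific research. To improve original innovation capacity, topic selection should be grounded in the intersection and integration of disciplines, follow market demand, and strengthen the integration of science and technology. Regional strengths and characteristics should be emphasized, so that the cultivation of original innovation capacity is built in at the source, in topic selection.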

18.
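【Purpose/significance】The research fronts of a discipline are the focus of scientific research. Given that current research-front identification rarely combines user demand information with publication trends, this paper proposes a research-front identification method based on citation counts and publication counts that uses Z-scores and Sen's slope.【Method/process】The LDA model is used to extract the research topics of a discipline; the Z-score represents a topic's activity, and Sen's slope represents its publication trend. Taking library science as an example, the publication and citation counts of its research topics from 2012 to 2017 are analyzed to identify the field's research fronts.【Result/conclusion】Frontier topics in library science include library networking and automation, reading promotion, public cultural undertakings, and information resource development and knowledge management. Comparison with CiteSpace burst detection shows that the proposed method is more comprehensive in identifying a discipline's research fronts.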
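The two indicators named in the method can be computed as sketched below. The counts are hypothetical, and using the population standard deviation for the Z-score is an assumption, since the abstract does not specify it. Sen's slope is the median of all pairwise slopes, which makes it robust to a single unusual year.

```python
import statistics


def z_scores(values):
    """Standardize values: (x - mean) / population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def sens_slope(series):
    """Sen's slope: median of slopes over all pairs of time points.

    Assumes equally spaced observations (one value per year).
    """
    slopes = [
        (series[j] - series[i]) / (j - i)
        for i in range(len(series) - 1)
        for j in range(i + 1, len(series))
    ]
    return statistics.median(slopes)


# Hypothetical citation counts for three topics (activity via Z-score).
citations = {"reading promotion": 120, "knowledge management": 80, "cataloguing": 40}
activity = dict(zip(citations, z_scores(list(citations.values()))))

# Hypothetical yearly publication counts for one topic, 2012 to 2017.
publications = [5, 7, 8, 12, 13, 17]
trend = sens_slope(publications)
```

A topic with a high activity score and a positive trend would be a candidate research front under this reading of the method.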

19.
This paper examines how alternative food networks (AFNs) cultivate engagement on a social media platform. Using the method proposed in Kar and Dwivedi (2020) and Berente et al. (2019), we contribute to theory by combining exploratory text analysis with model testing. Using the theoretical lens of relationship cultivation and social media engagement, we collected 55,358 original Weibo posts by 90 farms and other AFN participants in China and used Latent Dirichlet Allocation (LDA) modeling for topic analysis. We then used the literature to map the topics to constructs and developed a theoretical model. To validate the theoretical model, a panel dataset was constructed at the Weibo account-year level, with Chinese city-level yearly economic data included as control variables, and a fixed effects panel data regression analysis was performed. The empirical results revealed that posts centered on openness/disclosure, sharing of tasks, and knowledge sharing result in positive levels of social media engagement. Posting irrelevant information and advertising that uses repetitive wording in multiple posts had negative effects on engagement. Our findings suggest that cultivating engagement requires different relationship strategies, and that social media platforms should be leveraged according to the context and the purpose of the social cause. Our research is also among the early studies that use both big data analysis of large quantities of textual data and model validation for theoretical insights.

20.
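Gaining a deep understanding of the internal structure of the scientific system and analyzing how it operates and where it is heading are important topics in science and technology evaluation research. The knowledge networks in the scientific literature are of many types; they are both interrelated and distinct, and different combinations of elements form various network structures, such as citation networks, co-word networks, and collaboration networks. This paper explores the use of social network analysis methods, such as the power index, out-degree and eigenvector centrality, co-word network analysis, co-authorship network analysis, and the E-I index, to evaluate, analyze, and measure scientific literature, scientific journals, disciplinary research hotspots, author and institutional collaboration, and the degree of interdisciplinarity in a scientific, reasonable, objective, and effective way.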
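Of the measures listed, the E-I index is the simplest to show in code. It is defined as (E - I) / (E + I), where E is the number of ties between groups and I the number of ties within groups, so it ranges from -1 (all ties internal) to +1 (all ties external). The co-authorship edges and discipline labels below are hypothetical.

```python
def ei_index(edges, group_of):
    """E-I index of an undirected network partitioned into groups.

    edges: iterable of (node, node) pairs.
    group_of: mapping from node to its group label (e.g. discipline).
    """
    edges = list(edges)
    external = sum(1 for u, v in edges if group_of[u] != group_of[v])
    internal = len(edges) - external
    if not edges:
        raise ValueError("E-I index is undefined for a network with no ties")
    return (external - internal) / len(edges)


# Hypothetical co-authorship ties between authors from two disciplines.
discipline = {"a": "LIS", "b": "LIS", "c": "CS", "d": "CS"}
coauthorships = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "d")]
ei = ei_index(coauthorships, discipline)
```

Here three of the five ties cross disciplines, giving a slightly positive index, which would indicate a modest outward (interdisciplinary) orientation in this toy network.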
