Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Textual data are a major form in which internet users convey content, and discovering the latent topics in such data effectively and efficiently has both theoretical and practical value. Recently, neural topic models (NTMs), especially those based on the variational auto-encoder, have proved to be a successful approach for mining meaningful and interpretable topics. However, they usually suffer from two major issues: (1) Posterior collapse: the KL divergence rapidly reaches zero, resulting in a low-quality representation in the latent distribution; (2) Unconstrained topic generative models: the topic generative models are unconstrained, which can lead to the discovery of redundant topics. To address these issues, we propose the Autoencoding Sinkhorn Topic Model, based on the Sinkhorn Auto-encoder (SAE) and the Sinkhorn divergence. SAE uses the Sinkhorn divergence rather than the problematic KL divergence to optimize the difference between posterior and prior, and is therefore free of posterior collapse. Then, to reduce topic redundancy, Sinkhorn Topic Diversity Regularization (STDR) is presented. STDR uses the proposed Salient Topic Layer and the Sinkhorn divergence to measure the distance between salient topic features, and serves as a penalty term in the loss function that encourages the discovery of diversified topics during training. Several experiments have been conducted on 2 popular datasets to verify our contribution. The experimental results demonstrate the effectiveness of the proposed model.
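
To make the Sinkhorn divergence mentioned in this abstract concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes discrete histograms on one-dimensional support points, a squared-distance ground cost, and the transport-cost variant of the debiased divergence (the entropy term is left out); the function names and the default `eps` are illustrative choices only.

```python
import math


def sinkhorn_cost(a, b, cost, eps=1.0, n_iters=200):
    """Transport cost <P, C> of the entropy-regularised plan between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-C[i][j] / eps)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale so the plan's row sums match a, then its column sums match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan P = diag(u) K diag(v); return its cost under C.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def sq_cost(xs, ys):
    """Squared-distance ground cost between two lists of 1-D support points (an assumption)."""
    return [[(x - y) ** 2 for y in ys] for x in xs]


def sinkhorn_divergence(a, xs, b, ys, eps=1.0):
    """Debiased form: OT(a, b) - 0.5 * OT(a, a) - 0.5 * OT(b, b)."""
    ab = sinkhorn_cost(a, b, sq_cost(xs, ys), eps)
    aa = sinkhorn_cost(a, a, sq_cost(xs, xs), eps)
    bb = sinkhorn_cost(b, b, sq_cost(ys, ys), eps)
    return ab - 0.5 * aa - 0.5 * bb


if __name__ == "__main__":
    support = [0.0, 1.0]
    print(sinkhorn_divergence([0.5, 0.5], support, [0.9, 0.1], support))
```

Unlike the KL divergence, this quantity is computed from a ground cost between support points, so it stays informative when the two distributions barely overlap; subtracting the two self-terms makes the divergence of a distribution from itself zero.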

2.
There is no doubt that scientific discoveries have always brought changes to society. New technologies help solve social problems in areas such as transportation and education, while research brings benefits such as curing diseases and improving food production. Although science and society affect each other, this relationship is rarely studied, and the two are often seen as different universes. Previous literature focuses on a single domain, for example detecting social demands or research fronts, without crossing the results to gain new insights. In this work, we create a system that assesses the relationship between social and scholarly data using the topics discussed in social networks and research topics. We use the articles as science sensors and humans as social sensors via social networks. Topic modeling algorithms are used to extract and label social subjects and research themes, and topic correlation metrics are then used to create links between them where they have a significant relationship. The proposed system is based on topic modeling, labeling and correlation from heterogeneous sources, so it can be used in a variety of scenarios. We evaluate the approach using a large-scale Twitter corpus combined with a PubMed article corpus. In both, we work with data on the worldwide Zika epidemic, as this scenario provides topics and discussions in both domains. Our system discovered links between various topics from the different domains, which suggests that some of the relationships can be inferred automatically by the sensors. The results can open new opportunities for forecasting social behavior, assessing community interest in a scientific subject, or directing research toward the welfare of the population.

3.
邢湘萍  宁广德  徐斌 《现代情报》2010,30(8):145-148
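Topic maps are an emerging knowledge organization technology that extends traditional indexing techniques to the networked environment. This paper introduces topic map technology into the organization of Handan local literature resources. It presents the theory behind Topic Maps, discusses the subjects and types of Handan local literature, and analyzes the subject associations of these resources as well as the characteristics that distinguish them from other specialized literature, with the aim of providing users with knowledge navigation that matches the way they think.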

4.
Topic models are widely used to discover thematic structure in text. However, traditional topic models often require dedicated inference procedures for the specific task at hand, and they are not designed to generate word-level semantic representations. To address these limitations, we propose a neural topic modeling approach based on Generative Adversarial Nets (GANs), called the Adversarial-neural Topic Model (ATM). To the best of our knowledge, this work is the first attempt to use adversarial training for topic modeling. The proposed ATM models topics with a Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. The generator can also produce word-level semantic representations. In addition, to illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM to open-domain event extraction. To validate the effectiveness of the proposed ATM, two topic modeling benchmark corpora and an event dataset are used in the experiments. Our experimental results on the benchmark corpora show that ATM generates more coherent topics (under five topic coherence measures), outperforming a number of competitive baselines. Moreover, the experiments on the event dataset show that the proposed approach is able to extract meaningful events from news articles.

5.
杜晖 《人天科学研究》2014,(12):132-134
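Uses the Google Map API to visualize the geographical distribution of experts in a given field and to build a domain expert map. The map is the core module of an expert database system based on a knowledge network. It presents the geographical distribution of the experts in a discipline, together with related information, in a visual and intuitive way, and offers a new route for knowledge retrieval.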

6.
With the emergence and development of deep generative models such as variational auto-encoders (VAEs), research on topic modeling has successfully extended to a new area: neural topic modeling, which aims to learn disentangled topics in order to understand the data better. However, the original VAE framework has been shown to be limited in disentanglement performance, and it carries this inherent defect over to a neural topic model (NTM). In this paper, we argue that the optimization objectives of contrastive learning are consistent with two important goals of well-disentangled topic learning, alignment and uniformity. The optimization objectives of contrastive learning are also consistent with two key evaluation measures for topic models, topic coherence and topic diversity. We therefore conclude that the alignment and uniformity of disentangled topic learning can be quantified with topic coherence and topic diversity. Accordingly, we propose the Contrastive Disentangled Neural Topic Model (CNTM). By representing both words and topics as low-dimensional vectors in the same embedding space, we apply contrastive learning to neural topic modeling to produce factorized and disentangled topics in an interpretable manner. We compare the proposed CNTM with strong baseline models on widely used metrics. Our model achieves the best topic coherence scores under the most general evaluation setting (100% of topics selected), with improvements of 25.0%, 10.9%, 24.6%, and 51.3% over the second-best models' scores on the four datasets 20 Newsgroups, Web Snippets, Tag My News, and Reuters, respectively. Our method also achieves the second-best topic diversity scores on the 20 Newsgroups and Web Snippets datasets. Our experimental results show that CNTM can effectively use the disentanglement ability of contrastive learning to address the inherent defect of neural topic modeling and obtain better topic quality.
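
The abstract above does not say how topic diversity is computed. One widely used definition is the fraction of unique words among the top-k words of all topics, and the sketch below assumes that definition; the function name and the default `top_k` are illustrative.

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    `topics` is a list of word lists, each already sorted by descending
    topic-word probability. A value near 1.0 means the topics share few
    top words; a value near 0.0 means they are largely redundant.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


if __name__ == "__main__":
    toy_topics = [["game", "team", "season"], ["team", "court", "law"]]
    print(topic_diversity(toy_topics, top_k=3))
```

Topic coherence, the other measure named in the abstract, needs word co-occurrence statistics from a reference corpus and is therefore not sketched here.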

7.
Topic models often produce unexplainable topics that are filled with noisy words. The reason is that all words carry equal weight in topic modeling. High-frequency words dominate the lists of top topic words, but most of them are meaningless, e.g., domain-specific stopwords. To address this issue, we investigate how to weight words and develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured from word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it automatically rewards informative words. For more robust word weights, we further suggest a combined form of EW (CEW) that incorporates two existing weighting schemes. In essence, CEW assigns lower weights to meaningless words and higher weights to informative words, leading to more coherent topics during topic modeling inference. We apply CEW to the Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it on topic quality, document clustering and classification tasks using 8 real-world data sets. Experimental results show that weighting words can effectively improve topic modeling performance on both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it additionally considers which words are informative.

8.
In this paper, we propose a new language model, the dependency structure language model, for topic detection and tracking (TDT) to compensate for the weaknesses of unigram and bigram language models. The dependency structure language model is based on the Chow expansion theory and the dependency parse tree generated by a linguistic parser, so it can naturally capture long-distance dependencies. We carried out extensive experiments to verify the proposed model on topic tracking and link detection in TDT. In both cases, the dependency structure language models perform better than strong baseline approaches.

9.
In this paper, we present a topic discovery system that aims to reveal the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics and subtopics, where each topic contains the set of documents related to it and a summary extracted from those documents. The summaries built in this way are useful for browsing the generated hierarchies and selecting topics of interest. Our proposal consists of a new incremental hierarchical clustering algorithm that combines partitional and agglomerative approaches and takes the main benefits of each. In addition, a new summarization method based on Testor Theory is proposed to build the topic summaries. Experimental results on the TDT2 collection demonstrate the usefulness and effectiveness of the system not only for topic detection, but also as a classification and summarization tool.

10.
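The origin of human mind-reading has long been a "hard problem" for developmental psychology and the philosophy of mind. Matching theory, a meta-theory intended to explain this problem, proposes that the experiences of self and other are matched, but it has run into difficulty for want of a core mechanism. With the rise of embodied cognitive science and the discovery of mirror neurons, the assumptions of classical matching theory can be tested from the perspective of embodied cognition. Embodied matching theory holds that human mind-reading arises from the "shared body representations" of the mirror system. This embodied process directly matches the experiences of self and other, and ultimately achieves the identity of self and other.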

11.
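A study of basic concepts related to ontology integration   Total citations: 1 (self-citations: 0, citations by others: 1)
Drawing on a synthesis of representative views from China and abroad, this paper analyzes the meaning of the concepts of ontology matching (本体匹配), ontology mapping (本体映射), ontology linking (本体联结), ontology merging (本体融合), ontology integration (本体集成) and ontology coordination (本体协同). It describes in detail the differences and relationships between similar concepts, proposes more suitable Chinese translations for concepts that are easily confused, and gives more precise definitions of the important concepts.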

12.
严贝妮 《情报科学》2005,23(4):594-596
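Just as SGML/XML describes data structure, a Topic Map describes the structure of a semantic link network. SGML/XML markup is applied to raw data to create information, and topic maps are applied to collections of information to create knowledge structures. This paper gives a brief introduction to the basics of topic maps, topic map templates, and the automatic generation of topic maps.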

13.
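Based on the characteristics of customer collaborative product innovation, this paper proposes a task-grouping strategy for matching product innovation tasks with collaborating customers. On that basis, it introduces the concept of a fuzzy matching degree that measures how well a customer matches a task, builds a matching model whose objective is to maximize the fuzzy matching degree, and solves the model with a ranking method. This addresses the problem of how to devise a task-customer matching scheme that maximizes the degree of matching between customers and tasks under constraints such as time and cost. Finally, a case study shows that the proposed model and method are reasonable, feasible and easy to apply, and that the conclusions can serve as a reference for enterprise decision makers who need to assign suitable customers to product innovation tasks.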

14.
This paper conducts a comparative literature survey of Open Government Data (OGD) and Freedom of Information (FOI), with a view to tracking the central themes in the two civil society campaigns. Despite the apparent similarities between the two movements and their growing popularity in research, the major themes framing research on them have not clearly emerged. Topic modelling, text mining and document analysis methods are used to extract the themes as well as key named entities. The topics are subsequently labeled and, with expert guidance, their semantic meanings are provided. The results indicate that the major themes in FOI research concern disclosure, publishing, access and the cost of requests. Themes in OGD research, on the other hand, have largely centered on technology and related concepts. The approach also helped to determine key similarities and differences between the two campaigns as reported in research.

15.
This paper proposes a novel query expansion method to improve the accuracy of text retrieval systems. Our method uses minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. This structure is obtained from the relevance feedback through a method for selecting pairs of words based on the Probabilistic Topic Model. We compared our method with baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.

16.
The amount of heterogeneous data available to organizations today has made information management a seriously complicated yet crucial task, since this data can be a valuable asset for business intelligence. Ontologies can act as a semantically rich knowledge base in systems that specialize in information management. The present work investigates the potential of ontologies to support the information lifecycle within a corporate environment for business intelligence. The paper demonstrates the use of Heraclitus II, a framework that employs ontology management and evolution in the context of information management systems. The capabilities of the framework in facilitating information management and business intelligence are evaluated through a real-life case study from the life sciences industry.

17.
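Building the Semantic Web requires a shared, standard conceptual system, namely an ontology. Ontologies are still constructed mainly by hand, with very low efficiency and accuracy, which easily creates a knowledge acquisition bottleneck. This paper presents a semi-automatic ontology learning architecture that requires human intervention and uses a balanced cooperative modeling approach to construct ontologies for the Semantic Web. It describes the ontology learning process built on this architecture and discusses key techniques such as domain concept extraction and the extraction of relations between concepts.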

18.
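Proposes a two-sided matching decision method, with knowledge services as the main consideration, for matching knowledge service suppliers (cross-border e-commerce platforms) with knowledge service demanders (cross-border e-commerce enterprises) in the cross-border e-commerce supply chain. First, the knowledge services involved in the cross-border e-commerce supply chain are introduced and the need for matching supply and demand is analyzed. An indicator system is then built for mutual evaluation by the supply and demand sides. Next, given the uncertainty and fuzziness of cross-border markets, the evaluation information is expressed as interval numbers and linguistic variables, and the corresponding satisfaction formulas are given. A multi-objective optimization model is then built from the two-sided satisfaction degrees, together with a solution method. Finally, a numerical example shows that the decision method is effective and reasonable.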

19.
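To address the problem of matching buyers and sellers in e-commerce transactions, this article converts the multi-granularity linguistic evaluation information given by the agents into triangular fuzzy numbers, uses the technique for order preference by similarity to an ideal solution (TOPSIS) to compute the relative closeness of each agent's overall evaluation to the ideal evaluation, and uses that closeness to represent the matching agent's satisfaction. An optimization model is built that jointly considers the consistency and the maximization of the matching agents' satisfaction, and solving the model yields the matching result. An example illustrates the feasibility and effectiveness of the proposed method.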
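
The closeness-based satisfaction described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' formulas: it assumes evenly spaced linguistic scales, a common index-to-triangular-fuzzy-number conversion, the vertex distance between fuzzy numbers, and fixed ideal and anti-ideal points of (1, 1, 1) and (0, 0, 0). All function names are hypothetical.

```python
import math


def to_tfn(index, n_terms):
    """Map term `index` (0-based) of an evenly spaced n_terms scale to a triangular fuzzy number (l, m, u)."""
    g = n_terms - 1
    return (max((index - 1) / g, 0.0), index / g, min((index + 1) / g, 1.0))


def tfn_distance(p, q):
    """Vertex distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)) / 3.0)


def relative_closeness(ratings, weights):
    """TOPSIS-style closeness in [0, 1]; higher means closer to the ideal evaluation.

    `ratings` holds (term_index, n_terms) pairs, so each criterion may use a
    linguistic scale of a different granularity.
    """
    ideal, anti_ideal = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
    tfns = [to_tfn(index, n_terms) for index, n_terms in ratings]
    d_pos = sum(w * tfn_distance(t, ideal) for t, w in zip(tfns, weights))
    d_neg = sum(w * tfn_distance(t, anti_ideal) for t, w in zip(tfns, weights))
    return d_neg / (d_pos + d_neg)


if __name__ == "__main__":
    # A buyer rates one seller on two criteria, using a 5-term and a 7-term scale.
    print(relative_closeness([(4, 5), (3, 7)], [0.6, 0.4]))
```

Converting every rating to a fuzzy number on [0, 1] is what lets scales of different granularity be combined in one score. The resulting closeness values would then feed the matching optimization model, which is not sketched here.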

20.
Automatic review assignment can significantly improve the productivity of many people, such as conference organizers, journal editors and grant administrators. A general setup of the review assignment problem involves assigning a set of reviewers on a committee to a set of documents to be reviewed, under a review quota constraint, so that the reviewers assigned to a document collectively cover its multiple topic aspects. No previous work has addressed this setup of committee review assignment while also considering the matching of multiple aspects of topics and expertise. In this paper, we tackle the problem of committee review assignment with multi-aspect expertise matching by casting it as an integer linear programming problem. The proposed algorithm can naturally accommodate any probabilistic or deterministic method for modeling multiple aspects to automate committee review assignments. Evaluation on a multi-aspect review assignment test set constructed from ACM SIGIR publications shows that the proposed algorithm is effective and efficient for committee review assignments based on multi-aspect expertise matching.
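
The constraints and objective described in this abstract can be illustrated on a toy instance. The sketch below is not the paper's integer linear program: it assumes set-valued aspects, a fixed number of reviewers per paper and a per-reviewer quota, and it finds the optimum by exhaustive search, which is only workable for tiny inputs. A realistic system would pass the same objective and constraints to an ILP solver. All names and data are hypothetical.

```python
from itertools import combinations, product


def coverage(paper_aspects, team, expertise):
    """Number of the paper's aspects covered by at least one reviewer in the team."""
    covered = set().union(*(expertise[r] for r in team))
    return len(paper_aspects & covered)


def assign(papers, expertise, per_paper, quota):
    """Return (best_score, assignment) maximising total aspect coverage.

    Constraints: every paper gets exactly `per_paper` reviewers and no
    reviewer handles more than `quota` papers. Returns (-1, None) if no
    feasible assignment exists.
    """
    reviewers = sorted(expertise)
    names = sorted(papers)
    teams = list(combinations(reviewers, per_paper))
    best_score, best = -1, None
    for choice in product(teams, repeat=len(names)):
        load = {r: 0 for r in reviewers}
        for team in choice:
            for r in team:
                load[r] += 1
        if max(load.values()) > quota:
            continue  # quota constraint violated
        score = sum(coverage(papers[p], team, expertise)
                    for p, team in zip(names, choice))
        if score > best_score:
            best_score, best = score, dict(zip(names, choice))
    return best_score, best


if __name__ == "__main__":
    papers = {"p1": {"ir", "ml"}, "p2": {"nlp", "ml"}}
    expertise = {"alice": {"ir"}, "bob": {"ml"}, "carol": {"nlp"}}
    print(assign(papers, expertise, per_paper=2, quota=2))
```

Scoring each team by the aspects it covers jointly, rather than summing per-reviewer relevance, is what rewards complementary reviewers on the same paper.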
