Similar Documents
1.
Predicting information cascade popularity is a fundamental problem in social networks. Capturing temporal attributes and cascade role information (e.g., cascade graphs and cascade sequences) is necessary for understanding information cascades. Current methods rarely unify this information for popularity prediction, which prevents them from effectively modeling the full properties of cascades and achieving satisfactory prediction performance. In this paper, we propose an explicit Time embedding based Cascade Attention Network (TCAN) as a novel popularity prediction architecture for large-scale information networks. TCAN integrates temporal attributes (i.e., periodicity, linearity, and non-linear scaling) into node features via a general time embedding approach (TE), and then employs a cascade graph attention encoder (CGAT) and a cascade sequence attention encoder (CSAT) to fully learn representations of cascade graphs and cascade sequences. We use two real-world datasets (i.e., Weibo and APS) with tens of thousands of cascade samples to validate our methods. Experimental results show that TCAN achieves mean squared logarithmic errors (MSLE) of 2.007 and 1.201 and running times of 1.76 h and 0.15 h on the two datasets, respectively. Furthermore, TCAN outperforms other representative baselines by 10.4%, 3.8%, and 10.4% on average in terms of MSLE, MAE, and R-squared while maintaining good interpretability.
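For readers unfamiliar with the reported metrics, here is a minimal sketch of MSLE and R-squared. It assumes log2(x + 1), a form common in cascade-popularity work; the abstract does not state the log base or whether R-squared is computed on log-scaled sizes, so treat both as assumptions.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error using log2(x + 1) (assumed base; papers vary)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((math.log2(p + 1) - math.log2(t + 1)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# One exact prediction, and one whose (size + 1) is off by a factor of two.
print(msle([1, 3], [1, 7]))  # 0.5
```

Because the error is taken in log space, over-predicting a cascade of 3 as 7 costs the same as over-predicting 300 as roughly 600, which suits heavy-tailed cascade sizes.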

2.
Research trends are key to how researchers set their research agendas. However, only a few works have tried to quantify how scholars follow research trends. We address this question by proposing a novel measurement for quantifying how a scientific entity (paper or researcher) follows the hot topics in a research field. Based on extended dynamic topic modeling, the degree of hotness tracing of papers and scholars is explored from three perspectives: mainstream, short-term direction, and long-term direction. By analyzing a large-scale dataset containing more than 270,000 papers and 45,000 authors in Computer Vision (CV), we found that authors orient themselves more toward the established mainstream than toward incremental directions, and that they show little difference in their choice between long-term and short-term directions. Moreover, by clustering research behavior we identified six groups of researchers in the CV community, which differed significantly in their patterns of orientation, topic selection, and impact. This study provides a new quantitative method for analyzing topic trends and scholars’ research interests, capturing the diversity of research behavior patterns to address the phenomenon of canonical and ubiquitous progress in research fields.
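The abstract does not give the formula for "mainstream" tracing. As a hedged illustration only, the sketch below scores a paper by the cosine similarity between its topic distribution and the average topic distribution of all papers from the same year; `mainstream_score` is a hypothetical name and this is not the authors' measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mainstream_score(paper_topics, same_year_papers):
    """Hypothetical mainstream-tracing score: similarity of one paper's topic mix
    to the average topic mix of all papers published in the same year."""
    k = len(paper_topics)
    n = len(same_year_papers)
    mainstream = [sum(p[i] for p in same_year_papers) / n for i in range(k)]
    return cosine(paper_topics, mainstream)


year_papers = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
print(mainstream_score([0.7, 0.2, 0.1], year_papers))  # close to 1: follows the mainstream
print(mainstream_score([0.0, 0.1, 0.9], year_papers))  # much lower: off the mainstream
```

Short-term and long-term direction scores could be built the same way by comparing against the change in topic proportions over one or several years, again as an assumption.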

3.
Signed networks have become an important research topic because they can reflect more complex real-world relationships than traditional networks, especially in social networks. However, most signed network methods that achieve excellent performance through structure information learning neglect neutral links, which carry unique information in social networks. At the same time, previous approaches for neutral links cannot exploit graph structure information, which has proved useful in node embedding. In this paper, we therefore propose the Signed Graph Convolutional Network with Neutral Links (NL-SGCN) to address the problem of learning structure information from neutral links in signed networks, which sheds new light on signed network embedding. In NL-SGCN, we learn two representations for each node in each layer, from an inner-character aspect and an outward-attitude aspect, and propagate their information according to balance theory. Among the three link types (positive, negative, and neutral), information from neutral links is propagated only to a limited extent, controlled by a learned coefficient matrix. To verify the performance of the proposed model, we conduct experiments on several classical datasets in this field. The results show that NL-SGCN significantly outperforms existing state-of-the-art baselines for link prediction in signed networks with neutral links, which supports the efficacy of learning structure information from neutral links.
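Balance theory, which NL-SGCN uses to route information, can be illustrated in a few lines: the sign implied along a path is the product of its edge signs, and a triangle is balanced when that product is positive. This is a minimal sketch of the classic rule only; it deliberately rejects neutral links, which are exactly what the paper handles with a learned coefficient matrix that is not modeled here.

```python
def path_sign(signs):
    """Sign implied by structural balance along a path: the product of its +1/-1 edge signs."""
    result = 1
    for s in signs:
        if s not in (1, -1):
            raise ValueError("this sketch only handles positive (+1) and negative (-1) links")
        result *= s
    return result


def is_balanced_triad(s_ab, s_bc, s_ca):
    """A signed triangle is balanced when the product of its three edge signs is positive."""
    return path_sign([s_ab, s_bc, s_ca]) > 0


print(path_sign([-1, -1]))           # 1: the enemy of my enemy is my friend
print(path_sign([1, -1]))            # -1: the enemy of my friend is my enemy
print(is_balanced_triad(1, 1, -1))   # False
```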

4.
5.
Efficient topic modeling is needed to support applications that aim to identify the main themes in a collection of documents. In this paper, a reduced vector embedding representation and particle swarm optimization (PSO) are combined into a topic modeling strategy that can identify representative themes in a large collection of documents. Documents are encoded using a reduced, contextual vector embedding from a general-purpose pre-trained language model (sBERT). A modified PSO algorithm (pPSO) that tracks particle fitness on a dimension-by-dimension basis is then applied to these embeddings to create clusters of related documents. The proposed methodology is demonstrated on two datasets. The first consists of posts from the online health forum r/Cancer; the second is a standard topic modeling benchmark consisting of messages posted to 20 different newsgroups. Compared with state-of-the-art generative document models (i.e., ETM and NVDM), pPSO produces interpretable clusters. The results indicate that pPSO captures both common and emergent topics. Moreover, the topic coherence of pPSO is comparable to that of ETM, and its topic diversity is comparable to that of NVDM. The assignment parity of pPSO on a document completion task exceeded 90% for the 20NewsGroups dataset. This rate drops to approximately 30% when pPSO is applied to the same Skip-Gram embedding, derived from a limited, corpus-specific vocabulary, that is used by ETM and NVDM.
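For context on the optimizer being modified, the sketch below is the standard global-best PSO update (inertia plus pulls toward each particle's own best and the swarm's best), minimizing a toy function. It is not pPSO: the per-dimension fitness tracking and the document-clustering fitness are not modeled, and the parameter values are common defaults chosen as assumptions.

```python
import random


def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Standard global-best PSO that minimizes f over the box [-bound, bound]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = min(bound, max(-bound, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

In a clustering setting, each particle would instead encode candidate cluster centroids in embedding space and f would be a clustering-quality criterion.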

6.
Dynamic link prediction is a critical task in network research that seeks to predict future network links based on the relative behavior of prior network changes. However, most existing methods overlook mutual interactions between neighbors as well as long-distance interactions, and their predictions lack interpretability. To tackle these issues, we propose a temporal group-aware graph diffusion network (TGGDN). First, we construct a group affinity matrix to describe mutual interactions between neighbors, i.e., group interactions. Then, we merge the group affinity matrix into graph diffusion to form a group-aware graph diffusion, which simultaneously captures group interactions and long-distance interactions in dynamic networks. Additionally, we present a transformer block that models the temporal information of dynamic networks using self-attention, allowing TGGDN to pay greater attention to task-related snapshots while also providing interpretability for understanding network evolution patterns. We compare TGGDN with state-of-the-art methods on five real-world datasets of different sizes, ranging from 1k to 20k nodes. Experimental results show that TGGDN achieves average improvements of 8.3% in ACC and 3.8% in AUC across all datasets, demonstrating its superiority in dynamic link prediction.
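Graph diffusion is what lets a model see beyond direct neighbors. A common instance is personalized-PageRank diffusion, S = α Σ_k ((1 − α)T)^k with T the row-normalized adjacency matrix. The sketch below computes a truncated version in plain Python; the choice of PPR weights, α = 0.15 and 100 terms are assumptions, and TGGDN's group affinity matrix is not included.

```python
def ppr_diffusion(adj, alpha=0.15, k_max=100):
    """Truncated personalized-PageRank diffusion S = alpha * sum_{k=0..k_max} ((1 - alpha) T)^k,
    with T = D^-1 A the row-normalized adjacency matrix (isolated nodes get zero rows)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    T = [[adj[i][j] / deg[i] if deg[i] else 0.0 for j in range(n)] for i in range(n)]
    term = [[alpha if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    S = [row[:] for row in term]
    for _ in range(k_max):
        # next term = current term @ ((1 - alpha) T)
        term = [[(1 - alpha) * sum(term[i][m] * T[m][j] for m in range(n))
                 for j in range(n)] for i in range(n)]
        S = [[S[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return S


adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
S = ppr_diffusion(adj)
for row in S:
    print([round(x, 3) for x in row])
```

Nodes 0 and 3 are two hops apart, yet S[0][3] is positive: diffusion assigns them an affinity that a one-hop adjacency matrix would miss. Each row of S sums to (almost exactly) 1 for non-isolated nodes.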

7.
Tian Yadan (田亚丹), Information Science (情报科学), 2021, 39(6): 123-133
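[Purpose/Significance] Because existing topic evolution methods struggle to serve prediction purposes, this paper builds a knowledge-topic evolution prediction model from the perspective of dynamic knowledge development, providing a method for exploring the development trajectory and research trends of scientific fields. [Method/Process] Knowledge topics are extracted with an LDA model, and Markov and hidden Markov models are used to build evolution prediction models for topic steady state and topic popularity. [Results/Conclusions] An empirical analysis of the scientific literature on cloud computing shows that the model can predict the steady-state distribution of knowledge topics and their future popularity trends from historical data, and that its popularity prediction is more accurate than that of a grey model. [Innovation/Limitations] The paper only considers changes in popularity within each topic (the horizontal dimension) and does not compare knowledge topics with one another along a vertical dimension.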
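The "topic steady state" part rests on a standard Markov-chain idea: estimate a transition matrix from a topic's observed state history, then find the stationary distribution π = πP. A minimal sketch follows; the three states (cooling, stable, rising) and the history are hypothetical, and the hidden Markov popularity model is not shown.

```python
def transition_matrix(states, n_states):
    """Row-normalized transition counts estimated from one observed sequence of state indices.
    Rows with no observations fall back to a uniform distribution."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        total = sum(row)
        P.append([c / total if total else 1.0 / n_states for c in row])
    return P


def stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution pi = pi P by power iteration (assumes an aperiodic, irreducible chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical yearly states of one topic: 0 = cooling, 1 = stable, 2 = rising.
history = [2, 2, 1, 1, 1, 0, 1, 2, 1, 1]
P = transition_matrix(history, 3)
print(stationary(P))
```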

8.
We propose a topic-dependent attention model for sentiment classification and topic extraction. Our model assumes that a global topic embedding is shared across documents and employs an attention mechanism to derive local topic embeddings for words and sentences. These are subsequently incorporated into a modified Gated Recurrent Unit (GRU) for sentiment classification and for extracting topics bearing different sentiment polarities. Those topics emerge from the words’ local topic embeddings learned by the internal attention of the GRU cells within a multi-task learning framework. In this paper, we present the hierarchical architecture, the new GRU unit, and experiments on user reviews that demonstrate sentiment classification performance on a par with state-of-the-art methods and topic coherence that outperforms current approaches to supervised topic extraction. In addition, our model is able to extract coherent aspect-sentiment clusters despite using no aspect-level annotations for training.
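The step "derive local topic embeddings via attention over a global topic embedding" can be pictured as a word vector querying K shared topic vectors and taking the attention-weighted sum. The sketch assumes plain dot-product scores and no learned projections; the paper's exact scoring and its integration inside the GRU cell are not reproduced.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_topic_embedding(word_vec, topic_embs):
    """Dot-product attention of one word vector over K global topic embeddings.
    Returns the attention weights and the weighted sum (the word's local topic embedding)."""
    scores = [sum(w * t for w, t in zip(word_vec, topic)) for topic in topic_embs]
    weights = softmax(scores)
    dim = len(topic_embs[0])
    local = [sum(weights[k] * topic_embs[k][d] for k in range(len(topic_embs)))
             for d in range(dim)]
    return weights, local


topics = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]   # K = 3 hypothetical global topics
weights, local = local_topic_embedding([2.0, 0.1], topics)
print([round(w, 3) for w in weights], [round(x, 3) for x in local])
```

The attention weights double as a soft topic assignment for the word, which is what makes topic extraction possible alongside classification.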

9.
Song Kai (宋凯), Ran Congjing (冉从敬), Information Science (情报科学), 2022, 40(7): 136-144
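[Purpose/Significance] Classifying the development levels of research topics is a fundamental problem in information organization research and an important task for researchers and research administrators when choosing research topics and providing discipline services. Efficiently classifying the development levels of a discipline's research topics and predicting their trends can help researchers and institutions grasp the research landscape of a field and make sound research decisions. [Method/Process] Combining topic models, Sen's slope estimator, the Mann-Kendall test, and exponential smoothing, this paper proposes a method for classifying the development levels of disciplinary research topics and predicting their trends. First, building on topic identification, two indicators are constructed, topic publication degree and topic citation degree, and the development levels of research topics are classified with reference to the Boston (BCG) matrix. Then, each topic's publication count, citation count, and download count are combined into a topic heat indicator, and exponential smoothing is used to predict the future development of each research topic. [Results/Conclusions] Experiments on Chinese research on "smart libraries" show that the proposed method can classify the development levels of, and predict trends for, a discipline's research topics comprehensively and at a fine granularity. [Innovation/Limitations] The proposed method is generally applicable to the analysis of research topics in other disciplines and offers a new perspective for dynamic intelligence analysis. Its limitations are that the interpretability of the topic modeling needs to be improved and the trend prediction method needs further optimization.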
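The three statistical tools named in the method are standard and can be sketched directly: Sen's slope (median of pairwise slopes), the Mann-Kendall S statistic with its normal-approximation Z, and simple exponential smoothing. The heat series and α are hypothetical, ties are not corrected for, and the paper may use a higher-order smoothing variant.

```python
import math
from statistics import median


def sen_slope(y):
    """Sen's slope: median of all pairwise slopes (y[j] - y[i]) / (j - i), i < j."""
    slopes = [(y[j] - y[i]) / (j - i) for i in range(len(y)) for j in range(i + 1, len(y))]
    return median(slopes)


def mann_kendall(y):
    """Mann-Kendall S statistic and its normal-approximation Z score (no tie correction)."""
    n = len(y)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = y[j] - y[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def exp_smoothing_forecast(y, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = y[0]
    for x in y[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


heat = [3, 5, 4, 8, 9, 12]  # hypothetical yearly heat values for one topic
print(sen_slope(heat), mann_kendall(heat), exp_smoothing_forecast(heat, alpha=0.6))
```

Sen's slope gives the size of the trend, the sign and magnitude of Z (|Z| > 1.96 at the 5% level) tell whether it is significant, and smoothing supplies the short-term forecast.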

10.
Representing words with pretrained word embeddings has recently proved successful in many natural language processing tasks. Depending on their objective functions, different word embedding models capture different aspects of linguistic properties. The Semantic Textual Similarity task, however, which evaluates the similarity/relation between two sentences, requires taking all of these linguistic aspects into account. This research therefore aims to encode various characteristics from multiple sets of word embeddings into one embedding and then learn the similarity/relation between sentences via this novel embedding. Representing each word by multiple word embeddings, the proposed MaxLSTM-CNN encoder generates a novel sentence embedding. We then learn the similarity/relation between our sentence embeddings via multi-level comparison. Our method, M-MaxLSTM-CNN, consistently shows strong performance on several tasks (i.e., measuring textual similarity, identifying paraphrases, and recognizing textual entailment). Our model neither uses hand-crafted features (e.g., alignment features, n-gram overlaps, dependency features) nor requires the pre-trained word embeddings to have the same dimension.
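The claim that embeddings need not share a dimension follows naturally if each word's vectors from several tables are concatenated. The sketch below shows that idea with element-wise max pooling and a single cosine comparison; it replaces the paper's LSTM-CNN encoder and multi-level comparison with these much simpler stand-ins, and the toy tables are hypothetical.

```python
import math


def word_vector(word, embedding_sets):
    """Concatenate a word's vectors from several embedding tables; dimensions may differ.
    Out-of-vocabulary words get a zero vector of the table's dimension."""
    vec = []
    for table in embedding_sets:
        dim = len(next(iter(table.values())))
        vec.extend(table.get(word, [0.0] * dim))
    return vec


def sentence_vector(tokens, embedding_sets):
    """Element-wise max over the concatenated word vectors (max pooling)."""
    vecs = [word_vector(t, embedding_sets) for t in tokens]
    return [max(column) for column in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


emb_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}   # 2-d table
emb_b = {"cat": [0.5], "dog": [0.4], "car": [-0.5]}                 # 1-d table
s1 = sentence_vector(["the", "cat"], [emb_a, emb_b])
s2 = sentence_vector(["a", "dog"], [emb_a, emb_b])
print(cosine(s1, s2))
```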

11.
Inferring users’ interests from their activities on social networks has been an emerging research topic in recent years. Most existing approaches rely heavily on a user's explicit contributions (posts) and overlook implicit interests, i.e., potential interests that the user has not explicitly mentioned but might have. Given a set of active topics present in a social network during a specified time interval, our goal is to build an interest profile for a user over these topics by considering both the user's explicit and implicit interests. This matters because the interests of free-riders and cold-start users, who constitute a large majority of social network users, cannot be identified directly from their explicit contributions. Specifically, to infer users’ implicit interests, we propose a graph-based link prediction schema that operates over a representation model consisting of three types of information: users’ explicit contributions to topics, relationships between users, and relatedness between topics. Through extensive experiments on different variants of our representation model, considering both homogeneous and heterogeneous link prediction, we investigate how topic relatedness and users’ homophily relations affect the quality of the inferred implicit interests. Comparison with state-of-the-art baselines on a real-world Twitter dataset demonstrates the effectiveness of our model in inferring users’ interests, both in terms of perplexity and in a retweet prediction application. Moreover, we show that the impact of our work is especially meaningful for free-riders and cold-start users.
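The abstract does not name its link-scoring function. As a hedged illustration, the sketch puts users and topics into one undirected graph (explicit contributions as user-topic edges, relationships as user-user edges, relatedness as topic-topic edges) and scores a candidate user-topic link with the classic Adamic-Adar index; all node names are hypothetical.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Undirected graph as {node: set(neighbours)}; users and topics share one node space."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def adamic_adar(graph, u, v):
    """Adamic-Adar score: sum of 1 / log(degree) over common neighbours of u and v."""
    common = graph[u] & graph[v]
    return sum(1.0 / math.log(len(graph[z])) for z in common if len(graph[z]) > 1)


edges = [
    ("user:u1", "topic:ai"),     # explicit contribution
    ("user:u1", "user:u2"),      # user-user relationship
    ("user:u2", "topic:ml"),     # u2's explicit contribution
    ("topic:ai", "topic:ml"),    # topic relatedness
]
g = build_graph(edges)
print(adamic_adar(g, "user:u1", "topic:ml"))
```

u1 never posted about `topic:ml`, yet the link gets a positive score through both a followed user and a related topic, which is the kind of implicit interest the abstract targets for cold-start users.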

12.
In recent years, fake news detection has become a significant task attracting much attention. However, most current approaches use features from a single modality, such as text or image, and ignore comprehensive fusion of features from different modalities. To address this problem, we propose a novel model named Bidirectional Cross-Modal Fusion (BCMF), which comprehensively integrates textual and visual representations in a bidirectional manner. Specifically, the proposed model is decomposed into four submodules: input embedding, image2text fusion, text2image fusion, and prediction. We conduct extensive experiments on four real-world datasets: Weibo, Twitter, Politi, and Gossip. The results show improvements in classification accuracy of 2.2, 2.5, 4.9, and 3.1 percentage points over state-of-the-art methods on Weibo, Twitter, Politi, and Gossip, respectively. The experimental results suggest that the proposed model better captures integrated information from different modalities and generalizes well across datasets. Further experiments show that the bidirectional fusions, the number of attention heads, and the aggregation function all affect the performance of cross-modal fake news detection. The research sheds light on the role of bidirectional cross-modal fusion in leveraging multi-modal information to improve fake news detection.
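Bidirectional fusion can be pictured as two cross-attention passes: text tokens query image regions, and image regions query text tokens. The sketch below uses single-head scaled dot-product attention with no learned projections and mean pooling as the aggregation function; it assumes both modalities are already projected to a shared dimension d and is not BCMF itself.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key/value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(w[i] * values[i][j] for i in range(len(values)))
                    for j in range(len(values[0]))])
    return out


def mean_pool(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def bidirectional_fusion(text_feats, image_feats):
    """Text tokens attend to image regions and image regions attend to text tokens;
    the two mean-pooled results are concatenated into one fused vector."""
    text_to_image = cross_attention(text_feats, image_feats, image_feats)
    image_to_text = cross_attention(image_feats, text_feats, text_feats)
    return mean_pool(text_to_image) + mean_pool(image_to_text)


text = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.3, 0.1, 0.0], [0.0, 0.4, 0.8, 0.2]]  # 3 tokens, d = 4
image = [[0.5, 0.5, 0.0, 0.1], [0.1, 0.0, 0.9, 0.3]]                      # 2 regions, d = 4
print(len(bidirectional_fusion(text, image)))  # 8
```

The fused vector would then feed a classifier; swapping `mean_pool` for another pooling function is the kind of aggregation choice the abstract reports as mattering.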

13.
Methods for document clustering and topic modelling in online social networks (OSNs) offer a means of categorising, annotating and making sense of large volumes of user-generated content. Many techniques have been developed over the years, ranging from text mining and clustering methods to latent topic models and neural embedding approaches. However, many of these methods deliver poor results when applied to OSN data, as such text is notoriously short and noisy, and results are often not comparable across studies. In this study we evaluate several techniques for document clustering and topic modelling on three datasets from Twitter and Reddit. We benchmark four different feature representations derived from term-frequency inverse-document-frequency (tf-idf) matrices and word embedding models, combined with four clustering methods, and we include a Latent Dirichlet Allocation topic model for comparison. Because several different evaluation measures are used in the literature, we discuss and recommend the most appropriate extrinsic measures for this task. We also demonstrate the performance of the methods on datasets with different document lengths. Our results show that clustering techniques applied to neural embedding feature representations delivered the best performance over all datasets under appropriate extrinsic evaluation measures. We also demonstrate a method for interpreting the clusters with a top-words approach that uses tf-idf weights combined with embedding distance measures.
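The tf-idf half of the cluster-interpretation step can be sketched as: weight each term by tf × log(N / df), sum the weights over a cluster's documents, and report the heaviest terms. Cluster labels are assumed to be given, the idf variant is one common choice among several, and the embedding-distance component the abstract mentions is omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of non-empty token lists. Returns one {term: tf * log(N / df)} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def top_words(cluster_weights, k=3):
    """Sum tf-idf weights over the documents in one cluster and return the k heaviest terms."""
    total = Counter()
    for w in cluster_weights:
        total.update(w)
    return [t for t, _ in total.most_common(k)]


docs = [["flu", "vaccine", "shot"], ["vaccine", "side", "effects"],
        ["election", "vote"], ["vote", "count", "election"]]
w = tfidf(docs)
clusters = {0: [w[0], w[1]], 1: [w[2], w[3]]}   # cluster labels assumed given
for label, members in clusters.items():
    print(label, top_words(members, k=2))
```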

14.
Information filtering has long been a major subject of study in information retrieval (IR), focusing on filtering well-formed documents such as news articles. Recently, more interest has been directed towards applying filtering to user-generated content such as microblogs. Several earlier studies investigated microblog filtering for focused topics. Another vital filtering scenario in microblogs targets the detection of posts that are relevant to long-standing broad and dynamic topics, i.e., topics spanning several subtopics that change over time. This type of filtering is essential for many applications, such as social studies of large events and news tracking of temporal topics. In this paper, we introduce an adaptive microblog filtering task that focuses on tracking topics of a broad and dynamic nature. We propose an entirely unsupervised approach that adapts to new aspects of the topic to retrieve relevant microblogs. We evaluated our filtering approach using 6 broad topics, each tested on 4 different time periods over 4 months. Experimental results showed that, on average, our approach achieved an 84% increase in recall relative to the baseline approach, while maintaining acceptable precision, which dropped by about 8%. Our filtering method is currently implemented in TweetMogaz, a news portal generated from tweets. The website compiles the stream of Arabic tweets and detects tweets relevant to different regions in the Middle East, presenting them as comprehensive reports that include the top stories and news for each region.
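One classic way an unsupervised filter can "adapt to new aspects of a topic" is pseudo-relevance feedback: retrieve posts with the current query, then add the most frequent new terms from those posts before processing the next time window. The sketch assumes that technique, whitespace tokenization and a simple term-overlap filter; the paper's actual adaptation method may differ.

```python
from collections import Counter


def expand_query(query_terms, retrieved_posts, n_new=3, stopwords=frozenset()):
    """Pseudo-relevance feedback: append the most frequent new terms found in posts
    retrieved for the current query."""
    known = set(query_terms)
    counts = Counter(
        term
        for post in retrieved_posts
        for term in post.lower().split()
        if term not in known and term not in stopwords
    )
    return list(query_terms) + [t for t, _ in counts.most_common(n_new)]


def filter_posts(posts, query_terms, min_hits=1):
    """Keep posts sharing at least min_hits terms with the query."""
    q = set(query_terms)
    return [p for p in posts if len(set(p.lower().split()) & q) >= min_hits]


query = ["election"]
window_1 = ["Election vote turnout", "vote turnout election tonight",
            "early vote in the election", "football score"]
hits = filter_posts(window_1, query)
query = expand_query(query, hits, n_new=2, stopwords={"tonight", "in", "the"})
print(query)  # ['election', 'vote', 'turnout']
```

Running the expanded query on the next window lets the filter pick up posts that mention "turnout" without the original keyword, which is where recall gains typically come from; drift toward off-topic terms is the usual precision cost.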

15.
We deal with the task of authorship attribution, i.e., identifying the author of an unknown document, and propose the use of Part-Of-Speech (POS) tags as features for language modeling. Experiments are carried out on corpora atypical for the task, i.e., with documents written by non-professional writers, such as movie reviews or tweets. The former corpus is homogeneous with respect to topic, which makes the task more challenging. The latter corpus places language models in a setting of continuously and rapidly evolving language, unique and noisy writing styles, and the limited length of social media messages. While we find that language models based on POS tags are competitive on only one of the corpora (movie reviews), they generally provide efficiency benefits and robustness against data sparsity. Furthermore, we experiment with model fusion, where language models based on different modalities are combined. By linearly combining three language models, based on characters, words, and POS trigrams respectively, we achieve the best generalization accuracy of 96% on movie reviews, while the combination of language models based on characters and POS trigrams provides 54% accuracy on the Twitter corpus. In fusion, POS language models prove to be essential and effective components.
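A minimal sketch of trigram-LM authorship attribution with fusion: one add-one smoothed trigram model per author per modality, and a linear combination of each modality's average log-probability. Combining at the score level (rather than interpolating token probabilities) is an assumption, since character, word and POS sequences have different lengths; the POS tags here are hand-written, not produced by a real tagger.

```python
import math
from collections import Counter


class TrigramLM:
    """Add-one smoothed trigram model over any token sequence (characters, words or POS tags)."""

    def __init__(self, tokens):
        padded = ["<s>", "<s>"] + list(tokens)
        trigrams = list(zip(padded, padded[1:], padded[2:]))
        self.tri = Counter(trigrams)
        self.ctx = Counter((a, b) for a, b, _ in trigrams)
        self.v = len(set(padded)) + 1  # +1 reserves probability mass for unseen tokens

    def avg_logprob(self, tokens):
        padded = ["<s>", "<s>"] + list(tokens)
        trigrams = list(zip(padded, padded[1:], padded[2:]))
        total = sum(math.log((self.tri[t] + 1) / (self.ctx[t[:2]] + self.v)) for t in trigrams)
        return total / max(1, len(trigrams))


def fused_score(models, views, weights):
    """Linear combination of per-modality average log-probabilities."""
    return sum(w * m.avg_logprob(x) for m, x, w in zip(models, views, weights))


def attribute(candidates, views, weights):
    """candidates: {author: [one TrigramLM per modality]}; views: the unknown text per modality."""
    return max(candidates, key=lambda a: fused_score(candidates[a], views, weights))


train = {
    "A": ("the cat sat on the mat", ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]),
    "B": ("wow!!! so cool lol", ["INTJ", "PUNCT", "ADV", "ADJ", "INTJ"]),
}
candidates = {a: [TrigramLM(chars), TrigramLM(pos)] for a, (chars, pos) in train.items()}
unknown = ("the dog sat", ["DET", "NOUN", "VERB"])
print(attribute(candidates, unknown, weights=[0.5, 0.5]))
```

The POS model has a tiny vocabulary, which is one intuition for the robustness to data sparsity the abstract reports.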

16.
Li Yifan (李一帆), Wang Yu (王玙), Information Science (情报科学), 2022, 40(6): 115-123
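[Purpose/Significance] As interdisciplinary research and disciplinary convergence deepen, research work increasingly requires collaboration among multiple scholars. Identifying potential collaborative relationships and recommending suitable collaborators to scholars can effectively improve research efficiency. [Method/Process] Scholar collaboration prediction is studied on the basis of a dynamic network representation learning model. First, a dynamic network representation learning model, DynNE_Atten, is proposed. Second, a dynamic research collaboration network and a dynamic keyword co-occurrence network are built from library and information science literature data; DynNE_Atten is used to obtain vector representations of authors and keywords, and author affiliation features are extracted as well. Finally, authors' collaboration, topic, and affiliation features are fused to predict collaborations that may occur in the future. [Results/Conclusions] Experimental results show that the proposed dynamic network representation learning model achieves high accuracy on temporal link prediction with relatively little input data, and that, compared with scholar representations without feature fusion, the fused model shows a clear advantage in collaboration prediction. [Innovation/Limitations] A new dynamic network representation learning model is proposed, and topic features and author affiliation features are fused for research collaboration prediction, with good results. Currently, the model's feature fusion considers heterogeneity only at the data level, not at the network level.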

17.
In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries against human summaries (the Pyramid method), our system ranked first out of 22 systems in overall mean Pyramid score; in the human evaluation of summary responsiveness to the topic, it ranked third out of 35 systems.
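To make "topic focusing" and "lexical expansion of topic words" concrete, the sketch scores each sentence by the fraction of its tokens that match the topic terms or their expansions, keeps the top sentences, and restores document order. The expansion dictionary is hypothetical (a real system might draw it from a thesaurus), and the generic summarizer and sentence simplification are not modeled.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def topic_focused_summary(sentences, topic_terms, expansions=None, max_sentences=2):
    """Rank sentences by overlap with topic terms plus (hypothetical) lexical expansions,
    keep the best max_sentences, and return them in their original order."""
    query = {t.lower() for t in topic_terms}
    for t in topic_terms:
        query.update(e.lower() for e in (expansions or {}).get(t, []))
    scored = []
    for idx, sentence in enumerate(sentences):
        toks = tokenize(sentence)
        score = sum(1 for tok in toks if tok in query) / (len(toks) or 1)
        scored.append((score, idx))
    best = sorted(scored, key=lambda x: (-x[0], x[1]))[:max_sentences]
    return [sentences[i] for _, i in sorted(best, key=lambda x: x[1])]


sentences = [
    "Solar power capacity grew fast last year.",
    "The committee met on Tuesday.",
    "Wind and solar energy expanded together.",
]
print(topic_focused_summary(sentences, ["solar"], {"solar": ["wind", "energy"]}, max_sentences=2))
```

Without the expansion, the third sentence would score only 1/6 instead of 3/6, which shows how lexical expansion can shift which sentences are judged on-topic.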

18.
Scientific knowledge dynamics and relatedness in biotech cities   Total citations: 4, self-citations: 0, citations by others: 4
This paper investigates the impact of scientific relatedness on knowledge dynamics in biotech at the city level during the period 1989–2008. We assess the extent to which the emergence of new research topics and the disappearance of existing topics in cities depend on their degree of scientific relatedness to existing topics in those cities. We use the rise and fall of title words in biotech publications to identify major cognitive developments within the field. We determined the degree of relatedness between 1028 scientific topics in biotech from the co-occurrence of pairs of topics in journal articles. We combined this relatedness indicator with the scientific portfolio of cities (i.e., the topics on which they had previously published) to determine how cognitively close a potential new topic (or an existing topic) is to a city's scientific portfolio. We analyzed knowledge dynamics at the city level by looking at the entry and exit of topics in the scientific portfolios of 276 cities worldwide. We found strong and robust evidence that new scientific topics in biotech tend to emerge systematically in cities where scientifically related topics already exist, while existing scientific topics had a higher probability of disappearing from a city when they were weakly related to the city's scientific portfolio.
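The combination of a co-occurrence relatedness indicator with a city's portfolio can be sketched as a "relatedness density": the share of a topic's total relatedness that points to topics the city already publishes on. This density formula and the use of raw co-occurrence counts (rather than a normalized measure) are assumptions for illustration; topic names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(articles):
    """Count how often each unordered pair of topics appears together in one article."""
    pairs = Counter()
    for topics in articles:
        for a, b in combinations(sorted(set(topics)), 2):
            pairs[(a, b)] += 1
    return pairs


def relatedness(pairs, a, b):
    return pairs.get(tuple(sorted((a, b))), 0)


def relatedness_density(pairs, topic, portfolio, all_topics):
    """Share of a topic's total relatedness that points to topics already in a city's portfolio."""
    total = sum(relatedness(pairs, topic, t) for t in all_topics if t != topic)
    inside = sum(relatedness(pairs, topic, t) for t in portfolio if t != topic)
    return inside / total if total else 0.0


articles = [["crispr", "gene editing"], ["crispr", "gene editing"],
            ["crispr", "stem cells"], ["stem cells", "biofuels"]]
pairs = cooccurrence(articles)
topics = {"crispr", "gene editing", "stem cells", "biofuels"}
city_portfolio = {"gene editing"}
print(relatedness_density(pairs, "crispr", city_portfolio, topics))    # about 0.667
print(relatedness_density(pairs, "biofuels", city_portfolio, topics))  # 0.0
```

Under the paper's finding, a topic like "crispr" with high density would be a likely entrant in this city, while "biofuels" would not.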

19.
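Construction and Analysis of an Evaluation System for the Scientific and Technological Innovation Capability of Universities   Total citations: 5, self-citations: 1, citations by others: 4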
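Universities are an important component of the national science and technology innovation system, and how to evaluate their scientific and technological innovation capability has become an important research topic in academia. Based on an analysis of the connotation and composition of universities' scientific and technological innovation capability, an evaluation system for this capability is constructed using the analytic hierarchy process (AHP), and each evaluation indicator is analyzed in depth, laying a foundation for further evaluation of universities' scientific and technological innovation capability.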
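AHP derives indicator weights from a reciprocal pairwise-comparison matrix: the weights are its principal eigenvector, and the consistency ratio CR = CI / RI with CI = (λ_max − n) / (n − 1) checks whether the judgments are coherent (CR < 0.1 is the usual rule of thumb). The sketch uses power iteration and Saaty's random index values; the three criteria and their comparisons are hypothetical, not the paper's indicator system.

```python
# Saaty's random consistency index RI for matrix orders 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp(matrix, iters=100):
    """Priority weights of a reciprocal pairwise-comparison matrix (principal eigenvector
    by power iteration), plus lambda_max and the consistency ratio CR = CI / RI."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch supports matrices of order 1 to 10")
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0
    return w, lambda_max, cr


# Hypothetical criteria: research input, research output, knowledge transfer.
pairwise = [[1, 2, 4],
            [1 / 2, 1, 2],
            [1 / 4, 1 / 2, 1]]
weights, lam, cr = ahp(pairwise)
print([round(x, 3) for x in weights], round(lam, 3), round(cr, 3))
```

This example matrix is perfectly consistent (2 × 2 = 4), so λ_max equals n = 3 and CR is zero; real expert judgments usually give a small positive CR.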

20.
While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today’s massive document collections (e.g., ClueWeb12’s 700M+ web pages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method that reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). Prior work on intelligent topic selection has never been evaluated against shallow judging baselines, while prior work on deep vs. shallow judging has largely argued for shallow judging, but assuming random topic selection. We argue that when evaluating any topic selection method, one must ultimately ask whether it is actually useful to select topics, or whether one should simply perform shallow judging over many topics. In seeking a rigorous answer to this overarching question, we conduct a comprehensive investigation of a set of relevant factors never previously studied together: 1) the method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resulting quality of judgments.
Experiments on the NIST TREC Robust 2003 and Robust 2004 test collections show not only that we can reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom by showing that deep judging is often preferable to shallow judging when topics are selected intelligently.
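"Reliable evaluation with fewer topics" is commonly checked by comparing the system ranking produced by a reduced topic set against the ranking from the full set, often with Kendall's tau. The sketch computes tau-a (ties counted as neither concordant nor discordant); the MAP scores are hypothetical, and the abstract does not say which correlation measure the authors used.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two system rankings given as {system: score} dicts.
    Tied pairs count as neither concordant nor discordant."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        prod = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical MAP scores of four systems on all topics vs. on a selected topic subset.
full = {"sysA": 0.31, "sysB": 0.28, "sysC": 0.22, "sysD": 0.19}
subset = {"sysA": 0.33, "sysB": 0.25, "sysC": 0.26, "sysD": 0.18}
print(kendall_tau(full, subset))
```

Here one of six system pairs (B vs. C) flips, giving tau = 4/6. A widely used rule of thumb in IR treats tau of at least 0.9 as indicating effectively equivalent rankings.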


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417