Many existing systems for analyzing and summarizing customer reviews of products or services are built around a number of prominent review aspects. Conventionally, the prominent review aspects of a product type are determined manually. This costly approach cannot scale to large, cross-domain services such as Amazon.com, Taobao.com or Yelp.com, which carry a large number of product types and where new products emerge almost every day. In this paper, we propose a novel method, empowered by knowledge sources such as Probase and WordNet, for extracting the most prominent aspects of a given product type from textual reviews. The proposed method, ExtRA (Extraction of Prominent Review Aspects), (i) extracts aspect candidates from text reviews using a data-driven approach, (ii) builds an aspect graph using Probase to narrow the aspect space, (iii) separates the space into reasonable aspect clusters by employing a set of proposed algorithms, and finally (iv) generates from those clusters, automatically and without supervision, the K most prominent aspect terms or phrases that do not overlap semantically. By exploring knowledge sources, ExtRA extracts high-quality prominent aspects as well as aspect clusters with little semantic overlap. ExtRA can extract not only words but also phrases as prominent aspects. Furthermore, it is general-purpose and can be applied to almost any type of product or service. Extensive experiments show that ExtRA is effective and achieves state-of-the-art performance on a dataset consisting of different product types.

Information management is the management of organizational processes, technologies, and people which collectively create, acquire, integrate, organize, process, store, disseminate, access, and dispose of information. Information management is a vast, multi-disciplinary domain that brings together various subdomains and intermingles with other domains. This study aims to provide a comprehensive overview of the information management domain from 1970 to 2019. Drawing upon the methodology of statistical text analysis research, this study summarizes the evolution of knowledge in this domain by examining publication trends by author, institution, country, and so on. Further, this study proposes a probabilistic generative model based on structural topic modeling to understand and extract the latent themes from research articles related to information management. Furthermore, this study graphically visualizes the variations in topic prevalence over the period from 1970 to 2019. The results highlight that the most common themes are data management, knowledge management, environmental management, project management, service management, and mobile and web management. The findings also identify themes such as knowledge management, environmental management, project management, and social communication as academic hotspots for future research.

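Building on data described with topic models, this paper proposes the concept of topic-based Web services for digital libraries, designs a new Web service model for digital library websites, and offers a new line of thinking for the development of digital library websites.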

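To address the low accuracy and unsatisfactory results of existing tag recommendation methods, this paper proposes a social tag recommendation method based on the LDA topic model. The method uses LDA topic modeling to extend traditional recommendation based on relationships between objects into a unified recommendation that fuses those relationships with the content features of resources. Experimental results show that the method achieves the expected results and significantly improves the quality and effectiveness of tag recommendation.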

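This paper examines the classroom teaching model and procedure for intensive reading courses for English majors under the theme-based CBI (content-based instruction) model, with the aim of improving the intensive reading classroom teaching model and raising the quality of teaching.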

Reviewer assignment is an important task in many research-related activities, such as conference organization and grant-proposal adjudication. The goal is to assign each submitted artifact to a set of reviewers who can thoroughly evaluate all aspects of the artifact’s content while, at the same time, balancing the workload of the reviewers. In this paper, we focus on textual artifacts such as conference papers, where both (aspects of) the submitted papers and (expertise areas of) the reviewers can be described with terms and/or topics extracted from the text. We propose a method for automatically assigning a team of reviewers to each submitted paper, based on clusters of the reviewers’ publications that serve as latent research areas. Our method extends the definition of the relevance score between reviewers and papers with this latent research area information to find a team of reviewers for each paper, such that each individual reviewer and the team as a whole cover as many aspects of the paper as possible. To solve the constrained problem in which each reviewer has a limited reviewing capacity, we use a greedy algorithm that starts with a group of reviewers for each paper and iteratively evolves it to improve the coverage of the papers’ topics by the reviewers’ expertise. We experimentally demonstrate that our method outperforms state-of-the-art approaches with respect to several standard quality measures.
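
The capacity-constrained coverage idea in the abstract above can be illustrated with a much simpler greedy pass. This is only a sketch under stated assumptions, not the authors' algorithm: it builds each team from scratch rather than evolving an initial group, it scores coverage by plain set overlap rather than the paper's relevance score, and the function name `assign_reviewers` and its parameters are hypothetical.

```python
def assign_reviewers(paper_topics, reviewer_topics, team_size, capacity):
    """Greedy, set-cover style assignment (illustrative simplification).

    paper_topics:    dict paper -> set of topic labels
    reviewer_topics: dict reviewer -> set of topic labels
    team_size:       reviewers wanted per paper
    capacity:        maximum number of papers per reviewer
    """
    load = {r: 0 for r in reviewer_topics}
    assignment = {}
    for paper in sorted(paper_topics):
        uncovered = set(paper_topics[paper])
        team = []
        for _ in range(team_size):
            # Only reviewers with spare capacity who are not yet on the team.
            candidates = [r for r in sorted(reviewer_topics)
                          if load[r] < capacity and r not in team]
            if not candidates:
                break
            # Pick the reviewer who covers the most still-uncovered topics;
            # ties go to the alphabetically first reviewer.
            best = max(candidates,
                       key=lambda r: len(reviewer_topics[r] & uncovered))
            team.append(best)
            load[best] += 1
            uncovered -= reviewer_topics[best]
        assignment[paper] = team
    return assignment
```

Processing papers in a fixed order keeps the sketch deterministic, but it also means early papers get first pick of reviewers; an iterative refinement step, as the abstract describes, is one way to reduce that bias.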

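[Purpose/Significance] As industrial transformation accelerates, technological innovation has become key to driving socio-economic development and raising the scientific and technological competitiveness of countries and enterprises; how to identify and forecast frontier technologies has become a focus of national science and technology policy research and of corporate technological innovation. [Method/Process] Taking artificial intelligence as the focal research field, technology topics are first extracted with the LDA model, and patent texts are clustered with the K-means algorithm. On this basis, the Z-score is used to express the innovativeness of a technology topic and Sen's slope to estimate its patent grant trend; the two indicators together form the frontier degree of a technology topic and are mapped onto a two-dimensional space to identify frontier technology topics and classify topic types. Next, the novelty and attention of the frontier technology topics are computed and fused into a topic trend indicator. Finally, triple exponential smoothing is used to forecast the development trend of the frontier technology topics. [Results/Conclusions] The frontier technology topics in the artificial intelligence field are "smart home", "electric vehicles" and "automated control systems"; "smart home" shows a declining trend over the next three years, whereas "electric vehicles" and "automated control systems" show a clear upward trend.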
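
The two indicators named in the abstract above, the Z-score and Sen's slope, are standard statistics and can be sketched in a few lines. The abstract does not say which quantities they are computed over, so the inputs here (one innovativeness value per topic, one yearly grant count series per topic) are assumptions, as are the function names.

```python
from statistics import mean, median, pstdev


def sens_slope(series):
    """Sen's slope: the median of the slopes between all pairs of points.

    series: values observed at equally spaced time steps (at least two).
    """
    slopes = [(series[j] - series[i]) / (j - i)
              for i in range(len(series))
              for j in range(i + 1, len(series))]
    return median(slopes)


def z_scores(values):
    """Standardise values to zero mean and unit (population) deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no value deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Because it takes a median over pairwise slopes, Sen's slope is far less sensitive to a single outlier year than a least-squares trend line, which is a plausible reason to prefer it for short, noisy patent series.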

Tables in documents are a widely available and rich source of information, but they are not yet well utilised computationally because of the difficulty of automatically extracting their structure and data content. A plethora of systems have been proposed to solve the problem, but current methods offer low usability and accuracy and lack precision in detecting data from diverse layouts. We propose a component-based design and implementation of table processing concepts which can offer flexibility and re-usability as well as high performance on a wide range of table types. In this paper, we describe TEXUS, a fully automated table processing system that takes a PDF document and detects tables in a layout-independent manner. We introduce TEXUS’s own table-processing-specific document model and its two-phase processing pipeline design. Through an extensive evaluation on a dataset composed of complex financial tables, we show the performance of the system on different table types.

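[Purpose/Significance] Patents are an important outcome of corporate technological innovation, and analyzing patent data supports an objective evaluation of a company's technological innovation capability. [Method/Process] Alongside a bibliometric analysis of corporate patent data, machine learning is applied: the LDA model is used to mine the content of patent abstracts and construct evaluation indicators based on patent text content, yielding a technological innovation evaluation system composed of indicators in three areas: patent quantity, patent trend and patent content. [Results/Conclusions] The entropy method is used to determine the weight of each indicator's influence on corporate technological innovation, and an experiment evaluating the technological innovation of domestic own-brand manufacturing companies illustrates the practical significance of the evaluation method.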
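
The entropy method mentioned in the abstract above assigns larger weights to indicators whose values differ more across the evaluated companies. The following is a minimal sketch of the textbook formulation, assuming a matrix of strictly positive, already comparable indicator values; the abstract does not give the authors' normalisation, and the function name is hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method (textbook form).

    matrix: list of rows, one row per company, one column per indicator.
            Assumes at least two rows and positive values in every column.
    Returns one weight per indicator; the weights sum to 1.
    """
    m = len(matrix)
    n = len(matrix[0])
    k = 1.0 / math.log(m)  # scales each column entropy into [0, 1]
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for x in column:
            p = x / total  # share of this company in the indicator
            if p > 0:
                entropy -= k * p * math.log(p)
        # Low entropy means the indicator discriminates well.
        divergences.append(1.0 - entropy)
    spread = sum(divergences)
    if spread == 0:
        # Every indicator is uniform: fall back to equal weights.
        return [1.0 / n for _ in range(n)]
    return [d / spread for d in divergences]
```

An indicator on which every company scores the same has maximal entropy and receives a weight of (approximately) zero, since it cannot separate the companies.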

Graph-based recommendation approaches use a graph model to represent the relationships between users and items, and exploit the graph structure to make recommendations. Recent graph-based recommendation approaches have focused on capturing users’ pairwise preferences and have used a graph model to exploit the relationships between different entities in the graph. In this paper, we focus on the impact of pairwise preferences on the diversity of recommendations. We propose a novel graph-based, ranking-oriented recommendation algorithm that exploits both explicit and implicit user feedback. The algorithm uses a user-preference-item tripartite graph model and a modified resource allocation process to match the target user with users who share similar preferences, and to make personalized recommendations. The additional preference layer captures users’ pairwise preferences and provides detailed information about users for further recommendations. Empirical analysis on four benchmark datasets demonstrates that our proposed algorithm performs better in most situations than other graph-based and ranking-oriented benchmark algorithms.
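
For readers unfamiliar with resource allocation on graphs, the sketch below shows the standard two-step mass diffusion process on a plain user-item bipartite graph. It is not the modified process of the abstract above, which operates on a tripartite graph with a preference layer; the function name and input layout are assumptions made for illustration.

```python
from collections import defaultdict


def mass_diffusion_scores(user_items, target):
    """Standard bipartite resource allocation (mass diffusion).

    user_items: dict user -> set of items the user has collected
    target:     the user to recommend for
    Returns a dict item -> score for items the target has not collected.
    """
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Step 1: each of the target's items holds one unit of resource and
    # splits it equally among all users who collected that item.
    user_resource = defaultdict(float)
    for item in user_items[target]:
        share = 1.0 / len(item_users[item])
        for user in item_users[item]:
            user_resource[user] += share

    # Step 2: each user splits the received resource equally among
    # all of their own items.
    item_resource = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(user_items[user])
        for item in user_items[user]:
            item_resource[item] += share

    return {item: score for item, score in item_resource.items()
            if item not in user_items[target]}
```

Items are then ranked by score. Resource is conserved across both steps, so the scores over all items (including those the target already holds) sum to the number of items the target started with.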

Narratives are composed of stories that provide insight into social processes. To make the analysis of narratives more efficient, natural language processing (NLP) methods have been employed to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor’s actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained using a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach combining semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Attribution extraction, meanwhile, was addressed with the aid of a dependency parser and Levin’s verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles, in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that, relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach improves recall by 12.22–14.20 percentage points (reaching as high as 92.60% on one data set). Meanwhile, the use of Levin’s verb classes in attribution extraction obtains the best performance in terms of F-score, outperforming a baseline method by 7.64–11.96 percentage points. Our proposed approach was applied to news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.

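[Purpose/Significance] Emerging industries incubated by frontier technologies evolve quickly, but lagging statistical data make them difficult to monitor, which has drawn considerable attention from researchers. [Method/Process] Using blockchain news published on the 36Kr website from 2014 to 2019 as the data sample, this study proposes a method for monitoring emerging-industry news text that combines a structural topic model (STM) incorporating covariates with deep-learning sentiment analysis. By monitoring changes in the intensity of industry news hotspots reported by the media, and the effect over time of text sentiment on that intensity, the method discovers and tracks hotspots and trends in emerging industries. [Results/Conclusions] From 2014 to 2019, 69% of blockchain news topics focused on industrial applications of blockchain and on the issuance and trading of digital tokens such as Bitcoin. Semantic and sentiment analysis of the texts shows that, since 2017, the development of China's blockchain industry has exhibited a degree of media hype, but the shift in media sentiment toward the issuance and trading of digital tokens, from positive to negative, can serve as an early warning of the risks latent in blockchain. [Innovation/Value] The proposed news text monitoring method is near real-time and complements traditional monitoring based on after-the-fact statistical indicators.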

Trust and justice play an important role in the process of organizational change to build dynamic capabilities for sustainable competitive advantage. This study investigated the interaction effects of management's benevolence trustworthiness and integrity trustworthiness on employees’ perception of procedural justice during innovation or organizational change. Both the scenario and the field study showed that the patterns of the interaction effects of these two components of trustworthiness are further influenced in a complementary manner by different innovation approaches. The study indicated that the relationships between benevolence trustworthiness and integrity trustworthiness are far more complex than expected and thus need more research efforts.

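[Purpose/Significance] Existing academic communities built on collaboration or citation reveal the more explicit scholarly ties, but they do not directly expose the characteristics the community shares, and they inevitably introduce favoritism arising from personal relationships. [Method/Process] Taking abstracts of CNKI literature as the research object, the article uses a hybrid LDA and Word2vec model to mine the topics of each document, where a topic comprises topic words and their expansion words. Using these as the basis for relating topics to document authors, a scholar-topic two-mode network is constructed; visualizing the two-mode network and the projected one-mode scholar network shows directly how scholars in the field cluster by research direction. [Results/Conclusions] The hybrid LDA and Word2vec model can mine document topics in depth, and the two-mode network can represent both kinds of entity. With this approach it is possible to find groups of scholars who may never have collaborated in practice but whose research topics potentially overlap. Taking the domestic library and information science field as an example, its core academic communities are identified. The "scholar-topic" two-mode network incorporates information about the groups scholars belong to: it not only summarizes, from a global view, the specific topics of the field in terms of their constituent words, but also, through the expanded word set of each topic obtained from vector distances, further explains the deeper characteristics of the groups the scholar communities belong to. This effectively reduces the influence of personal relationships and supports peer review.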

The aim of this study was to investigate videos as potential triggers of behavior. Therefore, we applied the theories of triggers and media richness to learn about the triggering efficiency of mobile marketing videos on participants’ behavioral intentions. The experiment involved three distinct test groups, each comprising 41 student participants. From the perspective of media richness theory, we observed that the different kinds of videos had quite similar effects in terms of triggering behavioral changes. However, the mechanisms explaining why triggers were present differed for each video. Further, the results reveal that the consumer's position in the information search process was the most significant reason for the triggering of any kind of effect. In addition, the instructionally designed videos were able to exert an affective triggering effect: the more participants liked the video, the more it affected their participation intention and recall scores. This study extends the media richness research by demonstrating that the effects of media richness can vary within technically similar videos, as they form different logical connections among non-verbal visual cues related to a video's storyline.

This study tackles the problem of extracting health claims from health research news headlines in order to support veracity checking. A health claim can be formally defined as a triplet consisting of an independent variable (IV – namely, what is being manipulated), a dependent variable (DV – namely, what is being measured), and the relation between the two. In this study, we develop HClaimE, an information extraction tool for identifying health claims in news headlines. Unlike existing open information extraction (OpenIE) systems, which rely on verbs as relation indicators, HClaimE focuses on finding relations between nouns and draws on the linguistic characteristics of news headlines. HClaimE uses a Naïve Bayes classifier that combines syntactic and lexical features to identify IV and DV nouns, and recognizes relations between IV and DV through a rule-based method. We conducted an evaluation on a set of health news headlines from ScienceDaily.com, and the results show that HClaimE outperforms current OpenIE systems: the F-measure for identifying headlines without health claims is 0.60 and that for extracting IV-relation-DV is 0.69. Our study shows that nouns can provide more clues than verbs for identifying health claims in news headlines. Furthermore, it shows that dependency relations and bag-of-words features can distinguish IV-DV noun pairs from other noun pairs. In practice, HClaimE can serve as a helpful tool for identifying health claims in news headlines, which can then be compared against authoritative health claims to assess veracity. Given the linguistic similarity between health claims and other causal claims, e.g., impacts of pollution on the environment, HClaimE may also be applicable to extracting claims in other domains.

Human collaborative relationship inference is a meaningful task for online social networks and is called link prediction in network science. Real-world networks contain multiple types of interacting components and can be modeled naturally as heterogeneous information networks (HINs). Current link prediction algorithms for HINs fail to extract training samples effectively from snapshots of HINs; moreover, they underutilise the differences between nodes and between meta-paths. Therefore, we propose a meta-circuit machine (MCM) that can learn and fuse node and meta-path features efficiently, and we use these features to infer collaborative relationships in question-and-answer and bibliographic networks. We first use meta-circuit random walks to obtain training samples; the basic idea is to perform biased meta-path random walks on the input and target networks successively and then connect them. Then, a meta-circuit recurrent neural network (mcRNN) is designed for link prediction, which represents each node and meta-path by a dense vector and leverages an RNN to fuse the features of node sequences. Experiments on two real-world networks demonstrate the effectiveness of our framework. This study promotes the investigation of potential evolutionary mechanisms for collaborative relationships and offers practical guidance for designing more effective recommendation systems for online social networks.
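
As background for the meta-circuit random walks mentioned in the abstract above, the sketch below shows a plain meta-path-guided random walk on a heterogeneous graph. It is a simplified illustration, not the authors' method: the step is uniform rather than biased, there is no connection between an input and a target network, and the function name, graph layout and type labels are all hypothetical.

```python
import random


def meta_path_walk(graph, node_type, start, meta_path, length, rng):
    """Uniform random walk constrained to follow a cyclic meta-path.

    graph:     dict node -> list of neighbouring nodes
    node_type: dict node -> type label
    start:     start node, whose type must equal meta_path[0]
    meta_path: list of type labels whose first and last entries match,
               e.g. ["author", "paper", "author"]
    length:    maximum number of nodes in the walk
    rng:       a random.Random instance, passed in for reproducibility
    """
    cycle = meta_path[:-1]  # the last type repeats the first
    walk = [start]
    step = 0
    while len(walk) < length:
        wanted = cycle[(step + 1) % len(cycle)]
        # Only neighbours of the type the meta-path expects next.
        candidates = [v for v in graph[walk[-1]] if node_type[v] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk
```

Walks generated this way yield node sequences whose types repeat the meta-path pattern; sequences of this kind are the sort of input a sequence model such as an RNN could consume.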

In addressing persistent gaps in existing theories, recent advances in data-driven research approaches offer novel perspectives and exciting insights across a spectrum of scientific fields concerned with technological change and its socio-economic impact. The present investigation suggests a novel approach to identifying and analyzing the evolution of technology sectors, in this case information and communications technology (ICT), considering international collaboration patterns and knowledge flows and spillovers via information inputs derived from patent documents. The objective is to use and explore information regarding inventors’ geo-location, technology sector classifications, and patent citation records to construct various types of networks. This, in turn, will open up avenues to discover the nature of evolutionary pathways in ICT trajectories and will also provide evidence of how the overall ICT knowledge space, as well as directional knowledge flows within the ICT space, have evolved differently. It is expected that this data-driven inquiry will deliver intuitive results for decision makers who seek evidence for future resource allocation and who are interested in identifying well-suited collaborators for the development of potential next-generation technologies. Further, it will equip researchers in technology management, economic geography, or similar fields with a systematic approach to analyzing evolutionary pathways of technological advancement, and will further enable the exploitation and development of new theories regarding technological change and its socio-economic consequences.

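Computer-aided decision systems for new product development based on marketing remain a little-studied area. This paper explores a computer-aided platform system for generating new product ideas, proposes a hypermatrix model to build the core of the platform, and discusses its structure, working principle, analogy and association.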

Topic evolution has been described by many approaches, from the macro level to the detail level, by extracting topic dynamics from text in the literature and other media types. However, why the evolution happens is less studied. In this paper, we focus on whether and how keyword semantics can invoke or affect topic evolution. We assume that the semantic relatedness among keywords can affect topic popularity during the literature surveying and citing process, thus invoking evolution. However, this assumption needs to be confirmed with an approach that fully considers the semantic interactions among topics. Traditional topic evolution analyses in scientometric domains cannot provide such support because they use limited semantic meanings. To address this problem, we apply Google's Word2Vec, a deep learning language model, to enhance the keywords with more complete semantic information. We further treat the semantic space as an urban geographic space. We analyze topic evolution geographically using measures of spatial autocorrelation, as if keywords were the changing lands in an evolving city. Keyword citations (a keyword's citation count increases by one when a paper containing the keyword obtains a citation) are used as an indicator of keyword popularity. Using bibliographical datasets from the geographical natural hazard field, experimental results demonstrate that in some local areas, the popularity of keywords affects that of the surrounding keywords. However, there are no significant impacts on the evolution of all keywords. The spatial autocorrelation analysis identifies the interaction patterns (including High-High leading and High-Low suppressing) among the keywords in local areas. This approach can be regarded as an analysis framework borrowed from geospatial modeling. Moreover, the prediction results in local areas are shown to be more accurate when the spatial autocorrelations are considered.
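
Spatial autocorrelation, the measure the abstract above relies on, is commonly summarised with Moran's I. The sketch below computes the global statistic only; the High-High and High-Low patterns in the abstract come from local indicators, which are not shown here. The weight matrix is an assumption (for keywords it might be derived from distances in the embedding space), and the function name is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a set of observations.

    values:  list of n observations (e.g. keyword citation counts);
             must not all be equal
    weights: n x n matrix, weights[i][j] > 0 when i and j are neighbours,
             with a zero diagonal
    Values near +1 indicate that similar values cluster together,
    values near -1 that neighbours tend to be dissimilar.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring pairs.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / weight_total) * cross / variance_sum
```

For four keywords arranged in a chain where each is a neighbour of the next, popularity values that cluster (two low, then two high) give a positive statistic, while strictly alternating values give a negative one.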

