Similar Documents
20 similar documents found (search time: 31 ms)
1.
2.
胡玉宁  韩玺  朱学芳 《情报科学》2021,39(11):21-29
[Purpose/Significance] From the perspective of fusing multiple feature data of document entities, this study constructs a theoretical model of data fusion based on topic fingerprint-citation coupling and empirically analyzes the fusion process. [Method/Process] The logic of the topic fingerprint-citation fusion method is explained theoretically and analyzed mathematically; lobular breast cancer is taken as a case to present the data fusion process for document feature items, and comparing the category membership of topic fingerprints with the JCR subject categories of the cited journals reveals how each represents knowledge. [Result/Conclusion] A 2-mode knowledge network fusing topic fingerprints and citations lets the two jointly reveal disciplinary topics and knowledge structures: citation information represents stable knowledge-structure information such as a study's disciplinary foundation and background, while topic fingerprints represent dynamic topic information such as research fronts, burst topics, and emerging trends. [Innovation/Limitation] The proposed model and analysis method are an effective attempt to combine content analysis with citation analysis at the data-fusion level; future research will focus on multi-mode knowledge network construction, network structure analysis, and quantitative measurement to further improve the model's rigor and generality in knowledge services.

3.
Patent citation is an important mechanism of technology and knowledge spillover, and network topology analysis helps to understand the structural characteristics of patent citation networks and to reveal how technology and knowledge flow through patent citations. Using data retrieved from the USPTO on Chinese and foreign firms' citations of Tsinghua University patents, this study builds the network of Tsinghua University patents cited by firms and applies social network analysis to examine structural features such as characteristic path length, clustering coefficient, and centrality. The results show that the network has a short characteristic path length and a relatively high clustering coefficient, exhibiting a clear small-world effect; its centrality distribution shows a pronounced pattern in which a few nodes hold most of the links while most nodes hold few, consistent with a power-law distribution.
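As an illustrative sketch only (a toy graph, not the Tsinghua/USPTO data used in the study, and with the directed citation links projected to an undirected graph), the structural measures named above can be computed with networkx:

```python
import networkx as nx

# Toy undirected citation network: nodes are patents, edges are citation links.
# A hub with four spokes plus two shortcut edges stands in for real data.
G = nx.Graph()
G.add_edges_from([
    ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
    ("a", "b"),
    ("c", "d"),
])

# Characteristic path length: mean shortest-path distance over all node pairs.
path_len = nx.average_shortest_path_length(G)

# Clustering coefficient: how often a node's neighbours are themselves linked.
clust = nx.average_clustering(G)

# Degree centrality: fraction of other nodes each node is directly tied to.
centrality = nx.degree_centrality(G)

print(f"path length={path_len:.2f}, clustering={clust:.2f}")
print("most central node:", max(centrality, key=centrality.get))
```

A short path length combined with a high clustering coefficient is the usual operational signature of the small-world effect the abstract reports.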

4.
5.
Patent documents are an ample source of technical and commercial knowledge, and patent analysis has therefore long been considered a useful vehicle for R&D management and techno-economic analysis. Citation analysis has been the most frequently adopted technique for patent analysis. Noting that citation analysis suffers from some crucial drawbacks, this research proposes a network-based analysis as an alternative. Using an illustrative data set, the overall process of developing a patent network is described. New indexes such as a technology centrality index, a technology cycle index, and technology keyword clusters are suggested for in-depth quantitative analysis. Although network analysis shares some common ground with conventional citation analysis, its relative advantage is substantial: it shows the overall relationship among patents as a visual network, and it provides richer information and enables deeper analysis because it takes more diverse keywords into account and produces more meaningful indexes. These visuals and indexes can be used to analyze up-to-date trends in high technology and identify promising avenues for new product development.
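A hedged sketch of the keyword-based patent network idea: the patent IDs, keywords and Jaccard threshold below are all invented for illustration, and the "technology centrality index" is reduced to a simple link count.

```python
from itertools import combinations

# Hypothetical patents described by keyword sets (names invented).
patents = {
    "P1": {"battery", "anode", "lithium"},
    "P2": {"battery", "cathode", "lithium"},
    "P3": {"anode", "coating"},
    "P4": {"display", "oled"},
}

# Link two patents when their keyword sets overlap enough (Jaccard >= 0.25).
edges = []
for a, b in combinations(patents, 2):
    ka, kb = patents[a], patents[b]
    jaccard = len(ka & kb) / len(ka | kb)
    if jaccard >= 0.25:
        edges.append((a, b))

# A crude centrality index: the number of network links per patent.
centrality = {p: sum(p in e for e in edges) for p in patents}
print(edges)
print(centrality)
```

Keyword-based links are what let such a network connect patents that never cite each other, which is one of the advantages over pure citation analysis that the abstract claims.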

6.
Taking publications on subject knowledge portals indexed in the Web of Science from 1981 to 2012 as the research object, this study uses the information visualization tool CiteSpace to perform document co-citation analysis and cluster analysis, mapping the evolution path of subject knowledge portal research. Combining indicators such as node frequency, centrality, and burst terms, it identifies the main contributing countries and institutions, research hotspots, the knowledge base, research fronts in different periods, and development trends.
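The document co-citation counting that underlies CiteSpace-style cluster maps can be sketched in a few lines; the reference lists here are invented, whereas the study's real input would come from Web of Science records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: each citing paper lists the works it cites.
reference_lists = [
    ["R1", "R2", "R3"],
    ["R1", "R2"],
    ["R2", "R3"],
]

# Two references are co-cited when they appear together in one reference
# list; co-citation frequency drives the clustering in tools like CiteSpace.
cocitation = Counter()
for refs in reference_lists:
    for a, b in combinations(sorted(set(refs)), 2):
        cocitation[(a, b)] += 1

print(cocitation.most_common(2))
```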

7.
The paper describes a technique developed as automatic support to subject heading indexing at BIOSIS. The technique is based on the use of a formalized language for semantic representation of biological texts and subject headings—the language of Concept Primitives. The structure of the language is discussed as well as the structure of the Semantic Vocabulary, in which natural language words from biological texts are described by Concept Primitives. The Semantic Vocabulary is being constructed. Approximately 8,000 entries corresponding to high frequency significant words have been compiled, comprising at least three-quarters of the final number. Results of experiments checking the approach are given, and journal/subject heading and author/subject heading correlation data are analyzed for use as a supporting technique.

8.
Analysis and Study of Citations in Doctoral Dissertations   Cited by: 5 (self-citations: 0, other citations: 5)
郭万慧 《情报科学》1999,17(3):299-302
This paper comprehensively analyzes the citations of doctoral dissertations completed in 1996 and 1997 at the School of Chemical Engineering, Dalian University of Technology, covering citation counts, document types, citation languages, citation time spans and year distributions, and self-citation, and on this basis offers suggestions for the work of information service institutions.
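The kind of tally such citation studies perform can be sketched as follows; the citation records are invented placeholders, not data from the dissertations studied.

```python
from collections import Counter

# Invented sample of dissertation citation records (type, language, year).
citations = [
    {"type": "journal", "lang": "en", "year": 1995},
    {"type": "journal", "lang": "zh", "year": 1993},
    {"type": "book",    "lang": "zh", "year": 1990},
    {"type": "journal", "lang": "en", "year": 1996},
]

# Tallies by document type and language, plus the citation time span.
by_type = Counter(c["type"] for c in citations)
by_lang = Counter(c["lang"] for c in citations)
span = max(c["year"] for c in citations) - min(c["year"] for c in citations)
print(by_type, by_lang, f"time span: {span} years")
```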

9.
[Purpose/Significance] Journal academic influence is a focus of both academia and the journal community. Many studies have analyzed the structural relationships among its measurement indicators, but few have examined the interactions among the indicators as a whole. [Method/Process] Taking 632 journals in the "comprehensive humanities and social sciences" category as the sample, this study builds an indicator system for measuring journal academic influence, applies a BP-neural-network DEMATEL model to compute each indicator's centrality and cause degree, and uses the cause-effect diagram to analyze the importance of, and interactions among, the indicators. [Result/Conclusion] The results show that the model accurately reflects the structural relationships among the indicators: the other-citation impact factor and composite total citations are strongly driving indicators; the average number of citations and the impact factor are the top two driving indicators; and the 5-year impact factor is the most prominent feature-type indicator.
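Leaving aside the BP-neural-network component, the DEMATEL step (centrality = D + R, cause degree = D - R) can be sketched with an invented 3-indicator direct-influence matrix:

```python
import numpy as np

# Hypothetical 3x3 direct-influence matrix among indicators (values invented);
# the paper couples DEMATEL with a BP neural network, which is omitted here.
X = np.array([
    [0.0, 2.0, 1.0],
    [1.0, 0.0, 3.0],
    [2.0, 1.0, 0.0],
])

# Normalise by the largest row sum, then compute the total-relation matrix
# T = N (I - N)^(-1), which accumulates direct and indirect influence.
N = X / X.sum(axis=1).max()
T = N @ np.linalg.inv(np.eye(3) - N)

D = T.sum(axis=1)      # influence given by each indicator
R = T.sum(axis=0)      # influence received by each indicator
centrality = D + R     # overall importance of the indicator
cause = D - R          # positive = driving indicator, negative = feature-type
print(np.round(centrality, 3), np.round(cause, 3))
```

By construction the cause degrees sum to zero, so every indicator set splits into drivers and feature-type indicators, which is how the abstract's classification of indicators arises.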

10.
The dynamic nature and size of the Internet can make it difficult to find relevant information. Most users express their information need via short queries to search engines and then must physically sift through the results ranked by the engines, making relevance judgement time-consuming. In this paper, we describe a novel representation technique that uses Web structure together with summarisation techniques to better represent the knowledge in Web documents. We name the proposed technique Semantic Virtual Document (SVD). We discuss how SVDs can be used with a suitable clustering algorithm to achieve automatic content-based categorization of similar Web documents. This auto-categorization facility, together with a "tree-like" graphical user interface (GUI) for post-retrieval document browsing, enhances the relevance judgement process for Internet users. Furthermore, we introduce a cluster-biased automatic query expansion technique to overcome the ambiguity of the short queries typically given by users. We outline our experimental design for evaluating the effectiveness of the proposed SVD representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel knowledge representation techniques that enhance Web content mining.
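The cluster-biased query expansion idea can be sketched minimally as follows; the clusters, terms and selection rule are invented stand-ins for the paper's actual technique.

```python
# Hypothetical clusters of web documents, each with representative terms.
clusters = {
    "finance": ["bank", "loan", "credit", "interest"],
    "rivers": ["bank", "river", "water", "erosion"],
}

def expand(query_terms, clusters, k=2):
    """Cluster-biased expansion: add top terms from the best-matching cluster.

    The cluster overlapping the query most disambiguates short, ambiguous
    queries (here "bank" resolves to its financial sense via "loan").
    """
    best = max(clusters, key=lambda c: len(set(query_terms) & set(clusters[c])))
    extra = [t for t in clusters[best] if t not in query_terms][:k]
    return query_terms + extra

print(expand(["bank", "loan"], clusters))
```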

11.
The exponential growth of information available on the World Wide Web, and retrievable by search engines, has made it necessary to develop efficient and effective methods for organizing relevant content. Document clustering plays an important role here and remains an interesting and challenging problem in web computing. In this paper we present a document clustering method that takes into account both the content and the hyperlink structure of a web page collection, where a document is viewed as a set of semantic units. We exploit this representation to determine the strength of the relation between two linked pages and to define a relational clustering algorithm based on a probabilistic graph representation. The experimental results show that the proposed approach, called RED-clustering, outperforms two of the best-known clustering algorithms, k-Means and Expectation Maximization.

12.
[Purpose/Significance] Effectively fusing heterogeneous data in citation networks, such as citation links and textual attributes, strengthens the semantic associations among document nodes and thereby supports tasks such as data mining and knowledge discovery. [Method/Process] A knowledge representation method for citation networks is proposed: a neural network model first learns the k-order proximity structure of the citation network; a doc2vec model then learns textual attributes such as titles and abstracts; finally, a cross-learning mechanism based on vector sharing fuses the heterogeneous data. [Result/Conclusion] Tests on a CNKI citation dataset in the stem-cell field show good link-prediction performance, demonstrating the method's effectiveness.
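A rough sketch of the structural half of the method: accumulating normalised powers of the citation adjacency matrix approximates k-order proximity, and the result is naively concatenated with stand-in text vectors. The paper instead trains a neural model and doc2vec with a vector-sharing mechanism; everything below is a simplification.

```python
import numpy as np

# Toy citation adjacency (4 papers); 1 at (i, j) means paper i cites paper j.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
], dtype=float)

# k-order proximity: accumulate normalised adjacency powers so that papers
# connected through short citation chains get similar structural vectors.
k = 2
P = np.zeros_like(A)
Ai = np.eye(len(A))
for _ in range(k):
    Ai = Ai @ A
    P += Ai / (Ai.sum() or 1.0)

# Stand-in text vectors (doc2vec output in the paper; random here).
rng = np.random.default_rng(0)
text = rng.normal(size=(len(A), 3))

# Naive fusion by concatenation of structural and textual features.
fused = np.hstack([P, text])
print(fused.shape)
```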

13.
Deep multi-view clustering (MVC) mines and employs the complex relationships among views to learn compact data clusters with deep neural networks in an unsupervised manner. Recent deep contrastive learning (CL) methods have shown promising performance in MVC by learning cluster-oriented deep feature representations, realized by contrasting positive and negative sample pairs. However, most existing deep contrastive MVC methods focus on one-sided contrastive learning, such as feature-level or cluster-level contrast, failing to integrate the two or to bring in further important aspects of contrast. Additionally, most of them work in a separate two-stage manner, i.e., first feature learning and then data clustering, so the two stages cannot mutually benefit each other. To address these challenges, in this paper we propose a novel joint contrastive triple-learning framework that learns multi-view discriminative feature representations for deep clustering. It is threefold: feature-level alignment-oriented CL, feature-level commonality-oriented CL, and cluster-level consistency-oriented CL. The first two submodules contrast the encoded feature representations of data samples at different feature levels, while the last contrasts the data samples in the cluster-level representations. The triple contrast yields more discriminative representations of the views. Meanwhile, a view weight learning module is designed to learn and exploit the quantitative complementary information across the learned discriminative features of each view. The contrastive triple-learning module, the view weight learning module and the data clustering module with the fused features are thus performed jointly, so that the modules are mutually beneficial.
Extensive experiments on several challenging multi-view datasets show the superiority of the proposed method over many state-of-the-art methods, with large improvements of 15.5% and 8.1% in accuracy on Caltech-4V and CCV. Given its promising performance on visual datasets, the method can be applied to practical visual applications such as visual recognition and analysis. The source code is provided at https://github.com/ShizheHu/Joint-Contrastive-Triple-learning.
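The feature-level contrast in such frameworks is typically an NT-Xent-style loss. The following numpy sketch is not the authors' implementation; it only shows the core mechanism, with matched views scoring a lower loss than mismatched ones.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Feature-level contrastive (NT-Xent-style) loss between two views.

    z1[i] and z2[i] embed the same sample under two views: each such pair
    is pulled together while all other cross-view pairs are pushed apart.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                       # scaled cosine similarities
    # Cross-entropy where the matching index is the positive pair.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - np.diag(sim)))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned = nt_xent(z, z)          # identical views: positives are maximal
shuffled = nt_xent(z, z[::-1])   # mismatched pairs: strictly higher loss
print(f"{aligned:.3f} < {shuffled:.3f}")
```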

14.
This paper analyzes the citations of the 43 publicly released doctoral dissertations completed at the Nanjing Army Command College from 2004 to 2011, covering citation counts, citation years, citation languages, and citation types, and summarizes the literature needs of the college's doctoral students. On this basis, it offers suggestions for the library's collection development; the results may also provide new ideas for corresponding work at similar libraries.

15.
A Statistical Analysis of Citations in Journal Papers   Cited by: 1 (self-citations: 0, other citations: 1)
朱晓红 《情报科学》1998,16(5):441-443
This paper presents statistics and an analysis of the citations in papers published in 《长春邮电学院学报》 from 1990 to 1997, covering citation counts, document types, citation languages, citation year distribution, self-citation, and author collaboration.

16.
Social network analysis can be carried out at different levels. This article first describes the principle of the triad census method, applies it to the study of citation network structure, and analyzes the specific meaning of the various triad types in a citation network. Taking the papers published over the past ten years in 18 library and information science journals indexed in SSCI as the data source, it uses the triad census to reveal the internal structure of the citation network among high-yield authors in the field, and studies the structural tendencies of the whole network by computing different types of clustering coefficients.
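For directed graphs, networkx ships a triad census directly; a toy example with one transitive triad (030T, the classic citation hierarchy) and one cyclic triad (030C):

```python
import networkx as nx

# Toy directed citation network among six authors.
G = nx.DiGraph([
    ("A", "B"), ("B", "C"), ("A", "C"),   # a transitive triad (030T)
    ("D", "E"), ("E", "F"), ("F", "D"),   # a cyclic triad (030C)
])

# Count all 16 triad isomorphism classes over every 3-node subset.
census = nx.triadic_census(G)
nonzero = {t: n for t, n in census.items() if n}
print(nonzero)
```

The remaining triads in this toy graph are single-edge 012 triads; in a real citation network the relative frequencies of types like 030T versus 030C are what reveal hierarchical versus reciprocal citing behaviour.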

17.
A Statistical Analysis of Chinese Citation-Research Papers, 1989-1999   Cited by: 5 (self-citations: 1, other citations: 5)
徐佳宁  高淑琴 《情报科学》2001,19(7):702-705
Through a statistical analysis of the sources, authorship, years, and research topics of citation-research papers published between 1989 and 1999, this paper surveys the development of citation research over the decade.

18.
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent presence of noise in such representation obviously degrades the performance of most of these approaches. In this paper we investigate an unsupervised dimensionality reduction technique for document clustering. This technique is based upon the assumption that terms co-occurring in the same context with the same frequencies are semantically related. On the basis of this assumption we first find term clusters using a classification version of the EM algorithm. Documents are then represented in the space of these term clusters and a multinomial mixture model (MM) is used to build document clusters. We empirically show on four document collections, Reuters-21578, Reuters RCV2-French, 20Newsgroups and WebKB, that this new text representation noticeably increases the performance of the MM model. By relating the proposed approach to the Probabilistic Latent Semantic Analysis (PLSA) model we further propose an extension of the latter in which an extra latent variable allows the model to co-cluster documents and terms simultaneously. We show on these four datasets that the proposed extended version of the PLSA model produces statistically significant improvements with respect to two clustering measures over all variants of the original PLSA and the MM models.
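The co-occurrence assumption can be sketched crudely: below, a simple correlation threshold stands in for the paper's EM-based term clustering, and documents are then re-represented in the lower-dimensional term-cluster space. All data and the threshold are invented.

```python
import numpy as np

# Toy document-term counts; columns: [ball, goal, team, stock, market, bank]
X = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 1, 2, 3, 2],
], dtype=float)

# Group terms whose co-occurrence profiles correlate; a greedy threshold
# pass stands in for the paper's classification-EM term clustering.
C = np.corrcoef(X.T)
term_clusters = []
assigned = set()
for t in range(X.shape[1]):
    if t in assigned:
        continue
    cluster = [u for u in range(X.shape[1])
               if u not in assigned and C[t, u] > 0.5]
    assigned.update(cluster)
    term_clusters.append(cluster)

# Re-represent documents over term clusters (sports vs. finance here);
# the paper then fits a multinomial mixture in this reduced space.
Z = np.stack([X[:, c].sum(axis=1) for c in term_clusters], axis=1)
print(len(term_clusters), Z.shape)
```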

19.
There are several recent studies that propose search output clustering as an alternative representation method to ranked output. Users are provided with cluster representations instead of lists of titles and invited to make decisions on groups of documents. This paper discusses the difficulties involved in representing clusters for users’ evaluation in a concise but easily interpretable form. The discussion is based on findings and user feedback from a user study investigating the effectiveness of search output clustering. The overall impression created by the experiment results and users’ feedback is that clusters cannot be relied on to consistently produce meaningful document groups that can easily be recognised by the users. They also seem to lead to unrealistic user expectations.

20.
Representation learning has recently been used to remove sensitive information from data and improve the fairness of machine learning algorithms in social applications. However, previous works based on neural networks are opaque and poorly interpretable, as it is difficult to determine intuitively whether the representations are independent of the sensitive information. The internal correlation among data features has not been fully explored, and it may be the key to improving the interpretability of neural networks. Based on this conjecture, a novel fair representation algorithm, FRC, is proposed. It shows how representations independent of multiple sensitive attributes can be learned by applying specific correlation constraints on the representation dimensions. Specifically, the dimensions of the representation and the sensitive attributes are treated as statistical variables. The representation variables are divided into a part related to the sensitive variables and an unrelated part by adjusting their absolute correlation coefficients with the sensitive variables. The potential impact of sensitive information on the representations is concentrated in the related part, while the unrelated part can be used in downstream tasks to yield fair results. FRC treats the correlation between dimensions as the key to fair representation. Empirical results show that the learned representations improve the fairness of neural networks and achieve better fairness-accuracy tradeoffs than state-of-the-art works.
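A minimal sketch of the correlation-based split on synthetic data: the real FRC learns the representation under correlation constraints during training, whereas here a representation with known sensitive-attribute leakage is simply simulated and partitioned after the fact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: a binary sensitive attribute and a 4-dim representation
# where dims 0-1 leak the sensitive attribute and dims 2-3 do not.
s = rng.integers(0, 2, size=n).astype(float)
Z = rng.normal(size=(n, 4))
Z[:, 0] += 2.0 * s
Z[:, 1] -= 1.5 * s

# Absolute Pearson correlation of each representation dim with s.
corr = np.array([abs(np.corrcoef(Z[:, d], s)[0, 1]) for d in range(Z.shape[1])])

# Split dims into a sensitive-related and an unrelated part; only the
# unrelated part would be passed on to downstream tasks.
threshold = 0.2
related = np.where(corr >= threshold)[0]
unrelated = np.where(corr < threshold)[0]
print("related dims:", related, "unrelated dims:", unrelated)
```

The appeal of the correlation criterion is its interpretability: unlike an adversarial fairness objective, each dimension's relation to the sensitive attribute is a single inspectable number.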
