Similar Literature
1.
Knowledge representation learning (KRL) maps a knowledge graph (KG) from symbol space to vector space. However, KRL under the open-world assumption (OWA) suffers from a severe shortage of labels, because labeling is difficult or costly. To address this problem, we propose KRL_MLCCL, a Knowledge Representation Learning method based on Multi-Label Classification and Contrastive Learning (CL). Specifically, (1) we formalize the problem of finding true matchings (KGOMs) between knowledge graph objects (KGOs) under the OWA in the original KGOM sample space (KGOMSS) as multi-label classification with one known true matching (positive example); (2) we solve the problem in a new KGOMSS, generated by augmenting the true matching following the idea of CL, as multi-label classification with multiple known true matchings; (3) we score the true matchings with a Hermitian inner product and softmax, and minimize a negative log-likelihood loss to establish a preliminary KRL_MLCCL model; and (4) we migrate the learned model back to the original KGOMSS to solve the true-matching problem. We design and apply a CL-style positive-example augmentation that gives KRL_MLCCL this back-migration ability, "pulling KGOs in true matchings close and pushing KGOs in false matchings away", which helps KRL escape the label-shortage dilemma in modeling. We also propose a negative-example noise filtering algorithm to strengthen this ability. In open-world entity prediction (OWEP) experiments on the FB15K-237-OWE dataset, KRL_MLCCL improves Hits@10 by 3% and MRR by 1.32% over the best baseline. The OWEP experiments on the KG also show that KRL_MLCCL has better back-migration ability.
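A minimal sketch of the scoring in step (3), assuming a ComplEx-style Hermitian score Re(sum_k h_k * r_k * conj(t_k)) over complex embeddings and a loss that averages the negative log-softmax over every known true matching (the CL-augmented positives). The function names, the toy embeddings, and this exact loss form are illustrative assumptions, not KRL_MLCCL's published formulas.

```python
import math

def hermitian_score(h, r, t):
    """Real part of the Hermitian trilinear product sum_k h_k * r_k * conj(t_k)."""
    return sum(hk * rk * tk.conjugate() for hk, rk, tk in zip(h, r, t)).real

def log_softmax(scores):
    """Numerically stable log-softmax over a list of candidate scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def multi_positive_nll(h, r, candidates, positive_ids):
    """Average negative log-likelihood of all known true matchings among the candidates."""
    scores = [hermitian_score(h, r, t) for t in candidates]
    log_probs = log_softmax(scores)
    return -sum(log_probs[i] for i in positive_ids) / len(positive_ids)

# Toy 1-dimensional complex embeddings (hypothetical values).
head, relation = [1 + 0j], [1 + 0j]
candidates = [[1 + 0j], [-1 + 0j]]
print(multi_positive_nll(head, relation, candidates, [0]))  # low loss: the positive scores highest
```

Averaging over several positives is what separates the augmented, multiple-positive setting of step (2) from the single-positive setting of step (1); with one positive the loss reduces to ordinary softmax cross-entropy.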

2.
Adversarial training is effective for training robust image classification models. To improve robustness, existing approaches often use many propagations to generate adversarial examples, which is time-consuming. In this work, we propose an efficient adversarial training method with loss-guided propagation (ATLGP) to accelerate the adversarial training process. ATLGP uses the loss value of the generated adversarial examples to control the number of propagations for each training instance at different training stages, which reduces computation while preserving the strength of the generated adversarial examples. As a result, our method achieves robustness comparable to that of traditional training methods in less time. It also generalizes well and can easily be combined with other efficient training methods. We conduct comprehensive experiments on CIFAR10 and MNIST, standard datasets for several benchmarks. The results show that ATLGP reduces training time by 30% to 60% compared with other baseline methods while achieving similar robustness against various adversarial attacks. Combining ATLGP with ATTA (an efficient adversarial training method) yields greater acceleration when high robustness is required. Statistics of the propagations at different training stages and ablation studies confirm the effectiveness of applying loss-guided propagation to each training instance. This acceleration technique makes it easier to extend adversarial training to large-scale datasets and more diverse model architectures such as vision transformers.
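The core idea, letting the loss decide how many propagations each instance receives, can be sketched on a toy logistic-regression model. This is an assumption-laden illustration: the stopping rule (stop once the adversarial loss reaches loss_target), the sign-gradient step, and all names are hypothetical and not taken from ATLGP itself.

```python
import math

def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def loss_and_input_grad(x, y, w, b):
    """Logistic loss log(1 + exp(-y * (w.x + b))) and its gradient w.r.t. x, with y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = math.log1p(math.exp(-margin))
    dloss_dz = -y / (1.0 + math.exp(margin))  # derivative w.r.t. the logit w.x + b
    return loss, [dloss_dz * wi for wi in w]

def loss_guided_attack(x0, y, w, b, eps, alpha, max_steps, loss_target):
    """Sign-gradient attack that stops as soon as the adversarial loss reaches loss_target.

    Returns the adversarial input and the number of gradient propagations actually used.
    """
    x = list(x0)
    for step in range(max_steps):
        loss, grad = loss_and_input_grad(x, y, w, b)
        if loss >= loss_target:  # strong enough already: skip the remaining propagations
            return x, step
        # Ascend the loss, then project back into the L-infinity ball of radius eps around x0.
        x = [min(max(xi + alpha * sign(g), x0i - eps), x0i + eps)
             for xi, g, x0i in zip(x, grad, x0)]
    return x, max_steps

print(loss_guided_attack([2.0], 1, [1.0], 0.0, eps=1.0, alpha=0.25, max_steps=10, loss_target=0.25))
# ([1.25], 3)
```

Instances whose loss is already high return after few steps, which is where the time savings of such a scheme would come from.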

3.
Node classification and link prediction are both popular supervised learning tasks on graph data, but previous works seldom combine them to capture their complementary information. In this paper, we propose a Multi-Task and Multi-Graph Convolutional Network (MTGCN) that jointly performs node classification and link prediction in a unified framework. Specifically, MTGCN consists of multiple multi-task learning modules, each of which learns the complementary information between node classification and link prediction. Each module takes a different input to produce representations of the graph data. Moreover, the parameters of one module initialize those of the next, so that useful information learned by the former is propagated to the latter. As a result, the information is augmented to ensure the quality of the representations by exploring the complex structure inherent in the graph data. Experimental results on six datasets show that MTGCN outperforms the compared methods on both node classification and link prediction.
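For readers unfamiliar with the building blocks, here is a sketch of one standard GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W), with an inner-product link-prediction head and a summed multi-task loss. The abstract does not specify MTGCN's layers, so the propagation rule, the sigmoid link score, and the loss weighting are assumptions; the parameter hand-off between multi-task modules is not shown.

```python
import math

def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}, the symmetric normalization used by standard GCN layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(normalized_adjacency(A) @ H @ W)."""
    z = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in z]

def link_probability(embeddings, i, j):
    """Link-prediction head: sigmoid of the inner product of two node embeddings."""
    score = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
    return 1.0 / (1.0 + math.exp(-score))

def joint_loss(node_loss, link_loss, weight=1.0):
    """Multi-task objective: node-classification loss plus weighted link-prediction loss."""
    return node_loss + weight * link_loss

# Two connected nodes with 1-dimensional features (hypothetical values).
h = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
print(h, link_probability(h, 0, 1))
```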

4.
Multi-label text categorization is the problem of assigning each document to a subset of categories by means of multi-label learning algorithms. Unlike for English and most other languages, the unavailability of Arabic benchmark datasets has prevented the evaluation of multi-label learning algorithms for Arabic text categorization. As a result, only a few recent studies have dealt with multi-label Arabic text categorization, and they used non-benchmark, inaccessible datasets. This work therefore aims to promote multi-label Arabic text categorization by (a) introducing “RTAnews”, a new benchmark dataset of multi-label Arabic news articles for text categorization and other supervised learning tasks, publicly available in several formats compatible with existing multi-label learning tools such as MEKA and Mulan; and (b) conducting an extensive comparison of most of the well-known multi-label learning algorithms for Arabic text categorization, to provide baseline results and show the effectiveness of these algorithms on RTAnews. The evaluation involves four transformation-based multi-label algorithms (Binary Relevance, Classifier Chains, Calibrated Ranking by Pairwise Comparison, and Label Powerset) with three base learners (Support Vector Machine, k-Nearest Neighbors, and Random Forest), and four adaptation-based algorithms (Multi-label kNN, Instance-Based Learning by Logistic Regression Multi-label, Binary Relevance kNN, and RFBoost). The baseline results show that RFBoost and Label Powerset with a Support Vector Machine base learner outperformed the other algorithms. The results also show that adaptation-based algorithms are faster than transformation-based algorithms.
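The two simplest transformation-based strategies named above can be sketched directly: Binary Relevance produces one binary target per label, while Label Powerset maps each distinct label subset to a single class. The label names are hypothetical; in practice, tools such as MEKA and Mulan perform these transformations together with the base learners.

```python
def binary_relevance(label_sets, labels):
    """Binary Relevance: one 0/1 target vector per label, so one binary classifier per label."""
    return {label: [1 if label in s else 0 for s in label_sets] for label in labels}

def label_powerset(label_sets):
    """Label Powerset: every distinct label subset becomes one class of a multi-class problem."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)  # order-independent identity of the subset
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids

# Hypothetical news-article label sets.
docs = [{"sports"}, {"politics", "economy"}, {"economy", "politics"}, {"sports", "economy"}]
print(binary_relevance(docs, ["economy", "politics", "sports"]))
print(label_powerset(docs)[0])  # [0, 1, 1, 2]
```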

5.
Classical supervised machine learning (ML) follows the closed-world assumption. However, this assumption does not hold in open-world, dynamic environments, so automated systems must be able to discover and identify unseen instances. Open-world ML handles unseen instances and classes through a two-step process: (1) discovering and classifying unseen instances and (2) identifying the novel classes discovered in step (1). Most existing research on open-world machine learning (OWML) focuses only on step 1, but step 2 is also required to build intelligent systems. The proposed framework comprises three distinct but interconnected modules that discover and identify unseen classes. Our in-depth performance evaluation shows that the framework improves open accuracy by up to 8% compared with state-of-the-art models.
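A toy sketch of the two-step process, assuming step 1 flags instances whose top softmax probability is below a threshold and step 2 groups the flagged instances with simple leader clustering. Both rules and all names are illustrative assumptions; the paper's three modules are not described in the abstract.

```python
import math

def detect_unseen(probabilities, threshold):
    """Step 1 (sketch): flag instances whose top class probability falls below a threshold."""
    return [i for i, p in enumerate(probabilities) if max(p) < threshold]

def group_novel(points, radius):
    """Step 2 (sketch): leader clustering; each flagged point joins the first leader within radius."""
    leaders, assignment = [], []
    for p in points:
        for k, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                assignment.append(k)
                break
        else:  # no leader close enough: this point starts a new candidate class
            leaders.append(p)
            assignment.append(len(leaders) - 1)
    return assignment

probs = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5]]
print(detect_unseen(probs, threshold=0.7))  # [1, 2]
print(group_novel([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], radius=1.0))  # [0, 0, 1, 1]
```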

6.
Text classification, or categorization, is the process of automatically tagging a textual document with the most relevant labels or categories. When the number of labels is restricted to one, the task is single-label text categorization; the multi-label version is more challenging. For Arabic, both tasks (especially the latter) are even more challenging because large, free, rich, and well-constructed Arabic datasets are lacking. We therefore introduce new rich and unbiased datasets for single-label (SANAD) and multi-label (NADiA) Arabic text categorization. Both corpora are freely available to the Arabic computational linguistics research community. We also present an extensive comparison of several deep learning (DL) models for Arabic text categorization to evaluate their effectiveness on SANAD and NADiA. Unlike existing work, our approach requires no pre-processing phase and is based entirely on deep learning models. We also study the impact of using word2vec embedding models on classification performance. Our experiments show solid performance of all models on the SANAD corpus, with a minimum accuracy of 91.18%, achieved by convolutional-GRU, and a top accuracy of 96.94%, achieved by attention-GRU. On NADiA, attention-GRU achieved the highest overall accuracy, 88.68%, for a maximum subset of 10 categories on the “Masrawy” dataset.

7.
Researchers have become aware that emotion is not one-hot encoded in emotion-related classification tasks: multiple emotions can coexist in a given sentence. Several recent works have leveraged a distribution label or a grayscale label of emotions in the classification model, which enhances the one-hot label with additional information such as the intensity of other emotions and the correlations between emotions. By introducing a distribution learning component into the objective function, this approach has been shown to alleviate overfitting and improve model robustness. However, the benefit of distribution learning cannot be fully realized, because it can reduce the model's ability to discriminate between similar emotion categories; for example, “Sad” and “Fear” are both negative emotions. To address this problem, we proposed a novel emotion extension scheme in our prior work (Li, Chen, Xie, Li, and Tao, 2021). That work incorporated fine-grained emotion concepts to build an extended label space, identifying a mapping function between coarse-grained emotion categories and fine-grained emotion concepts. For example, sentences labeled “Joy” can convey various emotions such as enjoy, free, and leisure. The model can further benefit from the extended space by extracting dependencies among fine-grained emotions when making predictions in the original label space. The prior work showed that distribution learning is better applied in the extended label space than in the original space. This paper proposes a novel sparse connection method, Leaky Dropout, to refine the dependency-extraction step, which further improves classification performance. In addition to multiclass emotion classification, we experimented extensively on sentiment analysis and multilabel emotion prediction tasks to investigate the effectiveness and generality of the label extension scheme.
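A sketch of the two ingredients discussed above, assuming a loss that adds a weighted KL divergence to a distribution label on top of one-hot cross-entropy, and a fine-to-coarse mapping that sums extended-space probabilities back into the original categories. The weight lam, the concept "grief", and the mapping are hypothetical; Leaky Dropout is not reproduced here.

```python
import math

def cross_entropy(target_index, probs):
    return -math.log(probs[target_index])

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def combined_loss(target_index, distribution_label, probs, lam=0.5):
    """One-hot cross-entropy plus a weighted distribution-learning (KL) term."""
    return cross_entropy(target_index, probs) + lam * kl_divergence(distribution_label, probs)

def to_coarse(fine_probs, fine_to_coarse):
    """Sum fine-grained concept probabilities into their coarse emotion categories."""
    totals = {}
    for concept, p in fine_probs.items():
        coarse = fine_to_coarse[concept]
        totals[coarse] = totals.get(coarse, 0.0) + p
    return totals

# Hypothetical extended label space: three "Joy" concepts and one "Sad" concept.
mapping = {"enjoy": "Joy", "free": "Joy", "leisure": "Joy", "grief": "Sad"}
print(to_coarse({"enjoy": 0.3, "free": 0.2, "leisure": 0.1, "grief": 0.4}, mapping))
```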

8.
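Establishing a reasonable incentive mechanism for tacit knowledge sharing within a company's top management team helps improve organizational responsiveness and innovation capability. Based on principal-agent theory, this paper proposes design ideas for an incentive mechanism for tacit knowledge sharing in top management teams: it establishes a reasonable compensation strategy for tacit knowledge sharing and adjusts team members' interests so that they can anticipate the positive effects of knowledge sharing, thereby realizing the flow, transformation, sharing, and innovation of the company's tacit knowledge.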

9.
Recently, sentiment classification has received considerable attention within the natural language processing research community. However, because most work on sentiment classification has been done in English, sentiment resources in other languages are scarce, and manually constructing reliable sentiment resources is difficult and time-consuming. Cross-lingual sentiment classification aims to use annotated sentiment resources in one language (typically English) to classify the sentiment of text documents in another language. Most existing work relies on automatic machine translation services to project information directly from one language to another. However, using machine translation alone raises two main problems: the term distributions of original and translated documents differ, and translations contain errors. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training that incorporates unlabelled data from the target language into the learning process in a bi-view framework. The model iteratively enriches the training data by adding the most confident automatically labelled examples, as well as a few of the most informative manually labelled examples, drawn from the unlabelled data. It also takes the density of the unlabelled data into account, selecting more representative examples so as to avoid selecting outliers during active learning. The proposed model was applied to book review datasets in three languages. Experiments show that it effectively improves cross-lingual sentiment classification performance and reduces labelling effort compared with several baseline methods.
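The per-iteration selection described above can be sketched as follows, assuming confidence is the classifier's top-class probability and density is the mean similarity 1 / (1 + distance) to the other unlabelled points. In a bi-view co-training setup, each view (for example, the original and the machine-translated text) would run this selection on its own classifier's outputs. All names and scoring rules are assumptions.

```python
import math

def density(points, i):
    """Average similarity 1 / (1 + distance) of point i to all other unlabelled points."""
    sims = [1.0 / (1.0 + math.dist(points[i], q)) for j, q in enumerate(points) if j != i]
    return sum(sims) / len(sims) if sims else 0.0

def select_examples(confidences, densities, n_auto, n_manual):
    """Return (indices to self-label, indices to send for manual labelling)."""
    by_confidence = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    auto = by_confidence[:n_auto]
    remaining = [i for i in range(len(confidences)) if i not in auto]
    # Informativeness = uncertainty weighted by density, so isolated outliers are deprioritized.
    informative = sorted(remaining, key=lambda i: (1.0 - confidences[i]) * densities[i], reverse=True)
    return auto, informative[:n_manual]

# Index 4 is the least confident but sits in a sparse region, so index 1 is chosen instead.
print(select_examples([0.95, 0.6, 0.55, 0.9, 0.5], [0.5, 0.9, 0.2, 0.5, 0.1], 2, 1))  # ([0, 3], [1])
```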

10.
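With the rapid development of embedded systems, the Nucleus real-time operating system has become a trend and direction in embedded applications. This paper proposes a scheme for implementing a communication management device for railway substations on the NucleusPLUS embedded real-time multitasking operating system, presents the structure of the NucleusPLUS embedded real-time multitasking operating system, and describes the hardware system and software design of the scheme. Practical operation shows that the scheme is feasible.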

11.
This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Integrating the acquired terminology into the ontology extends semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in the documents. The semantically rich representations produced by the first module serve as input to the document classification module, which uses deep learning to find the most appropriate category for each document. Three deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations were carried out with various deep neural network configurations, and our findings reveal that a three-hidden-layer feedforward network with 1024 neurons obtains the highest document classification performance on the INFUSE dataset. For the same network configuration, the F1 score increases by almost five percentage points, to 78.10%, when the relevant terminology integrated into the ontology is used to enrich the document representation. Furthermore, we conducted a comparative performance evaluation against various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.
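A minimal sketch of the enrichment idea, assuming each token found in the ontology's terminology adds a count for its concept alongside the ordinary bag-of-words counts, and that the result is vectorized over a fixed vocabulary before being fed to a feedforward network. The terminology mapping, the CONCEPT: prefix, and unigram matching are hypothetical simplifications.

```python
from collections import Counter

def enrich(tokens, term_to_concept):
    """Bag of words plus one extra 'CONCEPT:<name>' count per token matched in the terminology."""
    features = Counter(tokens)
    for token in tokens:
        concept = term_to_concept.get(token)
        if concept is not None:
            features["CONCEPT:" + concept] += 1
    return features

def to_vector(features, vocabulary):
    """Fixed-length count vector suitable as input to a feedforward network."""
    return [features.get(term, 0) for term in vocabulary]

# Hypothetical mini-terminology mapping surface terms to ontology concepts.
terminology = {"bond": "FixedIncome", "coupon": "FixedIncome", "dividend": "Equity"}
doc = enrich(["bond", "coupon", "rates", "rose"], terminology)
print(to_vector(doc, ["bond", "dividend", "CONCEPT:FixedIncome", "CONCEPT:Equity"]))  # [1, 0, 2, 0]
```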

12.
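To improve the speed and robustness of face detection, a face detection method based on cascaded classifiers and expectation-maximization principal component analysis (EM PCA) is proposed. In the training stage, the method trains two Fisher linear classifiers on training samples at different resolutions, and then trains a nonlinear support vector machine (SVM) on features extracted by EM PCA. In the detection stage, the two Fisher linear classifiers first quickly filter out a large number of background regions, and the nonlinear SVM then verifies the remaining candidate regions to confirm whether they are faces. Experimental results demonstrate the effectiveness and correctness of the method.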
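The detection-stage cascade can be sketched as follows: a Fisher linear discriminant (computed here for 2-D features, w = S_w^-1 (m_pos - m_neg)) provides a cheap scoring direction, and a generic cascade function passes only surviving windows to the expensive verifier. The SVM on EM PCA features is represented by a placeholder callable; all names and the 2-D restriction are assumptions made for brevity.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def fisher_direction_2d(pos, neg):
    """Fisher discriminant w = S_w^{-1} (m_pos - m_neg) for 2-D feature vectors."""
    m1, m0 = mean(pos), mean(neg)
    s = [[0.0, 0.0], [0.0, 0.0]]  # within-class scatter matrix
    for group, m in ((pos, m1), (neg, m0)):
        for v in group:
            d = (v[0] - m[0], v[1] - m[1])
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1], inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def cascade(windows, cheap_stages, verifier):
    """Cheap stages reject most background windows; only survivors reach the expensive verifier."""
    survivors = list(windows)
    for stage in cheap_stages:
        survivors = [w for w in survivors if stage(w)]
    return [w for w in survivors if verifier(w)]

# Hypothetical 2-D features for face (pos) and background (neg) windows.
w = fisher_direction_2d([(1, 0), (3, 0), (2, 1), (2, -1)], [(-1, 0), (-3, 0), (-2, 1), (-2, -1)])

def fisher_stage(x):
    return w[0] * x[0] + w[1] * x[1] > 0.0

def svm_placeholder(x):  # stands in for the nonlinear SVM on EM PCA features
    return x[1] >= 0

print(cascade([(2.0, 0.5), (-2.0, 0.0), (1.5, -0.3)], [fisher_stage], svm_placeholder))  # [(2.0, 0.5)]
```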

13.
We study several machine learning algorithms for cross-language patent retrieval and classification. Unlike most other studies of machine learning for cross-language information retrieval, which mainly applied learning techniques to monolingual sub-tasks, our learning algorithms exploit bilingual training documents and learn a semantic representation from them. We study Japanese–English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method for finding correlated linear relationships between two variables in kernel-defined feature spaces. The results are encouraging and significantly better than those of other state-of-the-art methods. We also investigate learning algorithms for cross-language document classification based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining KCCA and SVM and find that one combination, SVM_2k, achieves better results than the other learning algorithms on both bilingual and monolingual test documents.

14.
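An Image Classification Algorithm Based on Invariant Moments   Total citations: 1, self-citations: 0, citations by others: 1
Feature extraction is a key problem in image classification and recognition. This paper implements a method that extracts features from image targets with three kinds of invariant moments and classifies them: the region central moments, region central invariant moments, and radial moments of a target are extracted as feature vectors, and their classification performance is compared with that of the region covariance feature operator. A support vector machine classifier is trained on these features and then used to classify targets. Experimental results show that the region radial moment operator performs well for vehicle target classification.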
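The standard definitions behind central and invariant moments can be sketched for a binary region given as pixel coordinates: central moments mu_pq are taken about the centroid, normalized moments are eta_pq = mu_pq / mu_00^(1 + (p+q)/2), and the first Hu invariant is eta_20 + eta_02. The radial moments and the exact feature vectors used in the paper are not reproduced; function names are illustrative.

```python
def raw_moment(pixels, p, q):
    return sum((x ** p) * (y ** q) for x, y in pixels)

def central_moment(pixels, p, q):
    """mu_pq: moment about the region centroid (translation invariant)."""
    m00 = raw_moment(pixels, 0, 0)
    xc, yc = raw_moment(pixels, 1, 0) / m00, raw_moment(pixels, 0, 1) / m00
    return sum(((x - xc) ** p) * ((y - yc) ** q) for x, y in pixels)

def normalized_central_moment(pixels, p, q):
    """eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), which adds (approximate) scale invariance."""
    return central_moment(pixels, p, q) / central_moment(pixels, 0, 0) ** (1 + (p + q) / 2)

def hu_phi1(pixels):
    """First Hu invariant: eta_20 + eta_02 (also rotation invariant)."""
    return normalized_central_moment(pixels, 2, 0) + normalized_central_moment(pixels, 0, 2)

square = [(x, y) for x in range(2) for y in range(2)]
print(hu_phi1(square))  # 0.125
```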

15.
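Research on Journal Classification Based on Co-citation Rate Analysis   Total citations: 3, self-citations: 0, citations by others: 3
WANG Xianwen, LIU Zeyuan. Science Research Management (《科研管理》), 2009, 30(5): 187-195
Abstract: This paper uses the Cited Reference Search function of Web of Science to retrieve the journal co-citation count matrix across the entire database, which preserves data completeness as far as possible. With the method proposed here for computing a journal co-citation rate matrix, journal co-citation counts are standardized, which reduces data error. The authors then randomly selected a number of journals from four JCR subject categories to test the method empirically; the clustering results agreed completely with the JCR subject classification of the journals. Further, taking the 78 management journals indexed in SSCI as the research object, the authors retrieved and computed their co-citation rate matrix and used the social network analysis tool Netdraw to analyze the network structure, studying the internal knowledge structure and knowledge exchange of the management discipline.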
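The paper's standardization formula is not given in the abstract. One common choice, assumed here for illustration, is the Salton cosine: divide each co-citation count c_ij by sqrt(C_i * C_j), where C_i is the total citation count of journal i. The counts below are hypothetical.

```python
import math

def cocitation_rate(cocitations, citations):
    """Salton-cosine normalization r_ij = c_ij / sqrt(C_i * C_j); the diagonal is set to 1."""
    n = len(citations)
    return [[1.0 if i == j else cocitations[i][j] / math.sqrt(citations[i] * citations[j])
             for j in range(n)] for i in range(n)]

# Hypothetical co-citation counts and total citation counts for three journals.
cocit = [[0, 20, 5], [20, 0, 12], [5, 12, 0]]
totals = [100, 400, 225]
rates = cocitation_rate(cocit, totals)
print(rates[0][1])  # 0.1
```

Normalizing this way keeps highly cited journals from dominating the matrix, so clustering reflects relative rather than absolute co-citation strength.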

16.
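Evaluation of Urban Land Grading and Ranking Based on Neural Networks   Total citations: 1, self-citations: 0, citations by others: 1
Based on an analysis of China's methods for grading and ranking urban land, this paper constructs an evaluation index system for grading and ranking urban land within state-owned land, together with a corresponding dynamic neural network model, and finally discusses how to obtain learning samples.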

17.
This paper presents a binary classification of entrepreneurs in British historical data, made possible by the recent availability of big data from the I-CeM dataset. The main task is to attribute an employment status to individuals who did not fully report entrepreneur status in earlier censuses (1851–1881). The paper assesses the accuracy of different classifiers and machine learning algorithms, including Deep Learning, for this classification problem. We first use a ground-truth dataset from the later censuses to train a Logistic Regression model (the standard approach in the literature for this kind of binary classification) to distinguish entrepreneurs from non-entrepreneurs (i.e., workers). The accuracy of this baseline is 0.74. We compare Logistic Regression with ten optimized machine learning algorithms: Nearest Neighbors, Linear and Radial Support Vector Machines, Gaussian Process, Decision Tree, Random Forest, Neural Network, AdaBoost, Naive Bayes, and Quadratic Discriminant Analysis. Boosting and ensemble methods give the best results, with AdaBoost reaching an accuracy of 0.95. Deep Learning, as a standalone category of algorithms, further improves accuracy to 0.96 without using the rich text data of the OccString feature, a string of up to 500 characters containing each individual's full occupational statement as collected in the earlier censuses. Finally, using the OccString feature, we implement both shallow learning (a bag-of-words algorithm) and Deep Learning (a Recurrent Neural Network with a Long Short-Term Memory layer). All of these methods achieve accuracies above 0.99, and the Deep Learning Recurrent Neural Network is the best model, with an accuracy of 0.9978. The results show that standard classification algorithms can be outperformed by other machine learning algorithms, which confirms the value of extending the techniques traditionally used in the literature for this type of classification problem.
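As an illustration of the shallow bag-of-words approach mentioned above, a minimal sketch: occupational strings become word counts, and a logistic regression is fitted with plain stochastic gradient descent. The example strings, labels, and hyperparameters are hypothetical and not drawn from I-CeM; the paper's best model (an LSTM-based RNN) is not shown.

```python
import math
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def score(weights, bias, features):
    return bias + sum(weights.get(t, 0.0) * c for t, c in features.items())

def train_logistic_regression(texts, labels, epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss over bag-of-words counts."""
    weights, bias = {}, 0.0
    data = [(bag_of_words(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for features, y in data:
            p = 1.0 / (1.0 + math.exp(-score(weights, bias, features)))
            g = p - y  # gradient of the logistic loss w.r.t. the score
            bias -= lr * g
            for t, c in features.items():
                weights[t] = weights.get(t, 0.0) - lr * g * c
    return weights, bias

def predict(weights, bias, text):
    return 1 if score(weights, bias, bag_of_words(text)) > 0 else 0

# Hypothetical occupational statements: 1 = entrepreneur, 0 = worker.
texts = ["master baker employing 3 men", "baker journeyman",
         "farmer of 200 acres employing 5 labourers", "agricultural labourer"]
labels = [1, 0, 1, 0]
weights, bias = train_logistic_regression(texts, labels)
print(predict(weights, bias, "baker journeyman"))  # 0
```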

18.
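Traditional deep learning algorithms for classifying steel plate surface defect images require large amounts of labeled data. To address this, an efficient classification method based on active learning is proposed. The method comprises a lightweight convolutional neural network and an uncertainty-based active learning sample selection strategy. The network uses a simplified convolutional base for feature extraction and replaces the hidden layer of the traditional densely connected classifier with a global pooling layer to mitigate overfitting. To better measure the model's uncertainty about which class an unlabeled image sample belongs to, the unlabeled samples are first fed into the model trained on labeled samples, yielding the model's probability distribution over classes (PDC) for each unlabeled sample; the model is then used to predict the labeled samples, yielding the average PDC for each label. The KL divergence between the two distributions serves as the uncertainty indicator for selecting unlabeled images for manual annotation. In comparative experiments on the open-source NEU-CLS defect dataset, the method achieves 97% accuracy using 44% of the labeled data, greatly reducing annotation cost.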
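A sketch of the KL-based uncertainty criterion under one possible reading of the abstract: each unlabeled sample's PDC is compared with the average PDC of the labeled samples of its predicted class, and the samples with the largest KL divergence are sent for annotation. The pairing by predicted class, the KL direction, and all names are assumptions.

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_pdc_per_class(labeled_pdcs, labels, n_classes):
    """Average predicted distribution of the labeled samples of each class (None if a class is absent)."""
    sums = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for pdc, y in zip(labeled_pdcs, labels):
        counts[y] += 1
        for k in range(n_classes):
            sums[y][k] += pdc[k]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None for c in range(n_classes)]

def uncertainty(pdc, class_means):
    """KL(pdc || mean PDC of the predicted class); larger means less like labeled samples of that class."""
    predicted = max(range(len(pdc)), key=lambda k: pdc[k])
    reference = class_means[predicted]
    return math.inf if reference is None else kl(pdc, reference)

def select_for_annotation(unlabeled_pdcs, class_means, budget):
    """Indices of the `budget` most uncertain unlabeled samples."""
    order = sorted(range(len(unlabeled_pdcs)),
                   key=lambda i: uncertainty(unlabeled_pdcs[i], class_means), reverse=True)
    return order[:budget]

means = mean_pdc_per_class([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]], [0, 0, 1], 2)
print(select_for_annotation([[0.85, 0.15], [0.55, 0.45]], means, 1))  # [1]
```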

19.
In this work, we develop new journal classification methods based on the h-index. The introduction of the h-index for research evaluation has attracted much attention in bibliometrics and research quality evaluation. The main purpose of the h-index is to compare different research units (e.g., researchers or journals) and differentiate their research performance. However, because the h-index is defined by comparing only the citation counts of a unit's own publications, it is doubtful whether the h-index alone supports reliable comparisons among different research units such as researchers or journals. In this paper, we propose a new global h-index (Gh-index), in which the publications in the core are selected by comparison with all the publications of the units being evaluated. We also introduce variants of the Gh-index to address discrimination power. We show that, together with the original h-index, they can be used to evaluate and classify academic journals with distinct advantages; in particular, they produce an automatic classification into a number of categories without arbitrary cut-off points. We then carry out an empirical study classifying operations research and management science (OR/MS) journals with this index and compare the results with well-known journal rankings such as the Association of Business Schools (ABS) Journal Quality Guide and the Committee of Professors in OR (COPIOR) ranking lists.
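The classical h-index is easy to compute; the sketch below also includes a hypothetical global variant in which the h-index of all pooled publications sets a single shared citation threshold, and each unit is scored by how many of its papers reach it. This is one possible reading of "selected in comparison with all the publications of the units", not necessarily the paper's Gh-index definition; the journal data are invented.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def global_core_counts(units):
    """Hypothetical global variant: the pooled h-index is one shared threshold for all units."""
    pooled = [c for cites in units.values() for c in cites]
    threshold = h_index(pooled)
    return threshold, {name: sum(1 for c in cites if c >= threshold) for name, cites in units.items()}

print(h_index([10, 8, 5, 4, 3]))  # 4
print(global_core_counts({"A": [10, 8, 5, 4, 3], "B": [1, 1, 0]}))  # (4, {'A': 4, 'B': 0})
```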

20.
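Against the background of the global knowledge economy, the depth and breadth of interdisciplinary research have become important factors in the innovation process and strongly affect national socioeconomic development and academic growth. Building on a review of current domestic and international research on measuring interdisciplinarity, and using the Web of Science categories of Thomson Reuters' Science Citation Index (SCI) and Social Sciences Citation Index (SSCI), this paper measures the interdisciplinarity of research along three dimensions (a disciplinary specialization index, a disciplinary integration index, and a disciplinary diffusion index) and uses science-map visualization to show the disciplinary distribution of interdisciplinary research. Finally, taking the publications of the 2005 to 2014 Nobel laureates in physics as an empirical case, the constructed indicators are used to study the interdisciplinary characteristics of these top scholars' papers. The results show that innovative achievements in physics have distinct disciplinary characteristics: the laureates' research output is, on the whole, concentrated in physics, and the degrees of disciplinary integration and diffusion remain low.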
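The abstract does not define the three indices. As an assumption for illustration, the sketch takes specialization as the share of the dominant category and measures integration (over cited categories) or diffusion (over citing categories) with Rao-Stirling diversity, the sum over i != j of p_i * p_j * (1 - s_ij), a common choice in interdisciplinarity studies. Category names and similarities are hypothetical.

```python
from collections import Counter

def proportions(categories):
    """Share of each subject category among a paper's cited (or citing) references."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def specialization(props):
    """Share of the dominant category (1.0 = fully specialized)."""
    return max(props.values())

def rao_stirling(props, similarity):
    """Sum over category pairs i != j of p_i * p_j * (1 - s_ij); unknown pairs get similarity 0."""
    cats = list(props)
    total = 0.0
    for i in cats:
        for j in cats:
            if i != j:
                s = similarity.get((i, j), similarity.get((j, i), 0.0))
                total += props[i] * props[j] * (1.0 - s)
    return total

p = proportions(["Physics", "Physics", "Astronomy", "Astronomy"])
print(specialization(p), rao_stirling(p, {}))  # 0.5 0.5
```

Supplying a high similarity between related categories (for example, physics and astronomy) lowers the diversity score, so closely related references count as less interdisciplinary.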


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417