Similar literature
20 similar documents found (search time: 46 ms)
1.
《Journal of The Franklin Institute》2023,360(14):10564-10581
In this work, we investigate consensus problems for discrete-time (DT) multi-agent systems (MASs) with completely unknown dynamics using a reinforcement learning (RL) technique. Unlike policy iteration (PI) based algorithms, which require admissible initial control policies, this work proposes a value iteration (VI) based model-free algorithm that achieves consensus of DTMASs with optimal performance and does not require an admissible initial control policy. First, in order to apply the RL method, the consensus problem is modeled as an optimal control problem for the tracking error system of each agent. Then, we introduce a VI algorithm for consensus of DTMASs and give a novel convergence analysis for this algorithm that does not require an admissible initial control input. To implement the proposed VI algorithm without information about the dynamics, we construct actor-critic networks that estimate the value functions and optimal control inputs online in real time. Finally, we give simulation results to show the validity of the proposed algorithm.
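To make the value-iteration idea above concrete, the following minimal sketch runs the recursion on a scalar linear tracking-error system with a quadratic cost. It is a model-based, single-agent simplification and not the model-free actor-critic scheme of the paper; the system x_{k+1} = a*x_k + b*u_k, the weights q and r, and the function name are assumptions chosen only for illustration. What it shows is that the iteration can start from a zero value function instead of an admissible (stabilizing) policy, even though the open-loop system here is unstable.

```python
def value_iteration(a=1.2, b=1.0, q=1.0, r=1.0, iters=200):
    """Value iteration for V(x) = p * x**2, started from p = 0.

    Hypothetical scalar example: x_next = a*x + b*u, stage cost q*x^2 + r*u^2.
    """
    p = 0.0  # zero initial value function; no admissible initial policy is needed
    for _ in range(iters):
        # Bellman backup: p <- min over u of [q*x^2 + r*u^2 + p*(a*x + b*u)^2] / x^2
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gain = a * b * p / (r + b * b * p)  # greedy feedback u = -gain * x
    return p, gain


p, gain = value_iteration()
print("value coefficient:", p, "feedback gain:", gain)
```

With these assumed numbers the closed-loop coefficient a - b*gain ends up inside the unit interval, so the greedy policy obtained at convergence is stabilizing.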

2.
Unmanned surface vehicles (USVs) are a promising marine robotic platform for numerous potential applications in ocean space due to their small size, low cost, and high autonomy. Modelling and control of USVs is a challenging task due to their intrinsic nonlinearities, strong couplings, high uncertainty, under-actuation, and multiple constraints. Well-designed motion controllers may not be effective when exposed to the complex and dynamic sea environment. The paper presents a fully data-driven, learning-based motion control method for a USV based on model-based deep reinforcement learning. Specifically, we first train a data-driven prediction model for the USV, based on a deep network, using recorded input and output data. Based on the learned prediction model, model predictive motion controllers are presented for trajectory tracking and path following tasks. It is shown that, after learning with random data collected from the USV, the proposed data-driven motion controller is able to follow trajectories or parameterized paths accurately with excellent sample efficiency. Simulation results are given to illustrate the proposed deep reinforcement learning scheme for fully data-driven motion control without any a priori model information of the USV.

3.
This paper focuses on the optimal tracking control problem (OTCP) for unknown multi-input systems using a reinforcement learning (RL) scheme and nonzero-sum (NZS) game theory. First, a generic method for the OTCP of multi-input systems is formulated with steady-state controls and optimal feedback controls based on NZS game theory. Then a three-layer neural network (NN) identifier is introduced to approximate the unknown system, and the input dynamics are obtained from the derivative of the identifier. To transform the OTCP into an optimal regulation problem, an augmented multi-input system is constructed from the tracking error and the commanded trajectory. Moreover, we use an NN-based RL method to learn the optimal value functions of all the inputs online, which are then used directly to calculate the optimal tracking controls. All the NN weights are tuned synchronously online with a newly introduced adaptation law based on the estimation error. The convergence of the NN weights and the stability of the closed-loop system are analyzed. Finally, a two-motor driven servo system and another nonlinear system are presented to illustrate the feasibility of the algorithm for both linear and nonlinear multi-input systems.

4.
This study presents a new framework for merging Adaptive Fuzzy Sliding-Mode Control (AFSMC) with an off-policy Reinforcement Learning (RL) algorithm to control nonlinear under-actuated agents. In particular, a near-optimal leader-follower consensus is considered, and a new method is proposed using the framework of graphical games. In the proposed technique, the sliding variables' coefficients are treated as adaptively tuned policies to achieve an optimal compromise between satisfactory tracking performance and the allowable control effort. Contrary to conventional off-policy RL algorithms for consensus control of multi-agent systems, the proposed method does not require partial knowledge of the system dynamics to initialize the RL process. Furthermore, an actor-critic fuzzy methodology is employed to approximate optimal policies using the measured input/output data. Using the tuned sliding vector, the control input for each agent is then generated, which includes a fuzzy term, a robust term, and a saturation compensating term. In particular, the fuzzy system approximates a nonlinear function, and the robust part of the input compensates for any possible mismatches. Furthermore, the saturation compensating gain prevents instability due to any possible actuator saturation. Based on the local sliding variables, the fuzzy singletons, the bounds of the approximation errors, and the compensating gains are adaptively tuned. Closed-loop asymptotic stability is proved using the second Lyapunov theorem and Barbalat's lemma. The method's efficacy is verified by consensus control of multiple REMUS AUVs in the vertical plane.

5.
In this paper, the target tracking control problem is investigated for an underactuated autonomous underwater vehicle (AUV) in the presence of actuator faults and external disturbances, based on an event-triggered mechanism. First, five degrees-of-freedom kinematic and dynamic models are constructed for the underactuated AUV, and the backstepping method is introduced as the main control framework. Then, a radial basis function neural network (RBFNN) and an adaptive control method are used to estimate and compensate for the influence of uncertain information and actuator faults. In addition, a relative threshold event-triggered strategy is integrated into the tracking control to further reduce the communication burden from the controller to the actuator. Moreover, through Lyapunov analysis, it is proved that the designed controllers guarantee that the tracking error variables of the underactuated AUV are uniformly ultimately bounded and converge to a small neighborhood of the origin. Finally, the effectiveness and reasonableness of the designed tracking controllers are illustrated by comparative simulations.
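A relative threshold event-triggered strategy of the kind mentioned above can be sketched as follows. This is a generic illustration, not the controller from the paper: the triggering rule |v - u_held| >= delta*|u_held| + offset, the parameter values, and the function name are assumptions. The actuator keeps the last transmitted command and receives a new one only when the rule fires, which is how the controller-to-actuator traffic is reduced.

```python
import math


def relative_threshold_trigger(signal, delta=0.2, offset=0.05):
    """Hold the last transmitted command until the relative-threshold rule fires.

    signal: controller outputs v[k] computed at every sample (hypothetical data).
    Returns the commands actually applied by the actuator and the trigger indices.
    """
    held = signal[0]          # the first sample is always transmitted
    applied = [held]
    events = [0]
    for k in range(1, len(signal)):
        v = signal[k]
        # Transmit only if the deviation exceeds a threshold that scales with |held|
        if abs(v - held) >= delta * abs(held) + offset:
            held = v
            events.append(k)
        applied.append(held)
    return applied, events


commands = [math.sin(0.01 * k) for k in range(1000)]
applied, events = relative_threshold_trigger(commands)
print("transmissions:", len(events), "out of", len(commands), "samples")
```

The positive offset keeps the threshold away from zero when the held command is near zero, which avoids transmitting at every sample in that region.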

6.
In precision motion systems, well-designed feedforward control can effectively compensate for the reference-induced error. This paper develops a novel data-driven iterative feedforward control approach for precision motion systems that execute varying reference tasks. The feedforward controller is parameterized with rational basis functions, and the optimal parameters are sought by minimizing the tracking error. The key difficulty associated with the rational parametrization lies in the non-convexity of the parameter optimization problem. Hence, a new iterative parameter optimization algorithm is proposed such that the controller parameters can be solved optimally from the data measured in each task alone, irrespective of reference variations. Two simulation cases are presented to illustrate the enhanced performance of the proposed approach for varying tasks compared to pre-existing results.

7.
This paper investigates a Q-learning scheme for the optimal consensus control of discrete-time multiagent systems. The Q-learning algorithm is carried out by reinforcement learning (RL) using system data instead of information about the system dynamics. In the multiagent system, the agents interact with each other and at least one agent can communicate with the leader directly, which is described by an algebraic graph structure. The objective is to make all the agents synchronize with the leader and make the performance indices reach a Nash equilibrium. On the one hand, the solutions of the optimal consensus control problem for multiagent systems are obtained by solving the coupled Hamilton–Jacobi–Bellman (HJB) equations, but analytical solutions of the discrete-time HJB equations are difficult to obtain directly. On the other hand, accurate mathematical models of most real-world systems are hard to obtain. To overcome these difficulties, a Q-learning algorithm is developed that uses system data rather than an accurate system model. We formulate the performance index and the corresponding Bellman equation of each agent i. Then, the Q-function Bellman equation is derived on the basis of the Q-function. Policy iteration is adopted to calculate the optimal control iteratively, and the least squares (LS) method is employed to facilitate the implementation. A stability analysis of the proposed policy-iteration Q-learning algorithm for multiagent systems is given. Two simulation examples are presented to verify the effectiveness of the proposed scheme.
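The combination of policy iteration with a least squares fit of the Q-function can be sketched on a single scalar agent. This is a deliberately reduced illustration, not the multiagent algorithm of the paper: the system x_{k+1} = a*x_k + b*u_k, the quadratic Q-function Q(x,u) = h11*x^2 + 2*h12*x*u + h22*u^2, the initial gain, and all function names are assumptions. The constants a and b are used only to generate data; the learner itself sees only samples (x, u, x_next). Unlike value iteration, policy iteration needs an initial stabilizing gain.

```python
import random


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def features(x, u):
    """Quadratic basis so that Q(x, u) = features(x, u) . [h11, h12, h22]."""
    return [x * x, 2.0 * x * u, u * u]


def q_learning_pi(a=1.2, b=1.0, q=1.0, r=1.0, K=1.0, iters=15, samples=50, seed=0):
    """Policy-iteration Q-learning; K must initially stabilize the system."""
    rng = random.Random(seed)
    for _ in range(iters):
        AtA = [[0.0] * 3 for _ in range(3)]
        Atb = [0.0] * 3
        for _ in range(samples):
            x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)  # exploratory data
            x_next = a * x + b * u                          # measured next state
            # Q-function Bellman equation: Q(x,u) - Q(x', -K x') = q x^2 + r u^2
            row = [f - g for f, g in zip(features(x, u), features(x_next, -K * x_next))]
            target = q * x * x + r * u * u
            for i in range(3):
                Atb[i] += row[i] * target
                for j in range(3):
                    AtA[i][j] += row[i] * row[j]
        h11, h12, h22 = solve3(AtA, Atb)  # policy evaluation via normal equations
        K = h12 / h22                     # policy improvement: u = -K x
    return K


print("learned gain:", q_learning_pi())
```

Solving the normal equations directly keeps the sketch dependency-free; a production implementation would more likely use a numerically robust least squares solver.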

8.
Text-enhanced and implicit reasoning methods have been proposed for answering questions over incomplete knowledge graphs (KGs), but prior studies either rely on external resources or lack the necessary interpretability. This article extends the line of reinforcement learning (RL) methods to obtain better interpretability and dynamically augments the original KG action space with additional actions. To this end, we propose an RL framework with a dynamic completion mechanism, namely the Dynamic Completion Reasoning Network (DCRN). DCRN consists of an action space completion module and a policy network. The action space completion module uses three sub-modules (relation selector, relation pruner and tail entity predictor) to enrich the options for decision making. The policy network calculates a probability distribution over the joint action space and selects promising next-step actions. In addition, we employ a beam search-based action selection strategy to alleviate delayed and sparse rewards. Extensive experiments conducted on WebQSP, CWQ and MetaQA demonstrate the effectiveness of DCRN. Specifically, under the 50% KG setting, the Hits@1 improvements of DCRN on MetaQA-1H and MetaQA-3H are 2.94% and 1.18%, respectively. Moreover, under the 30% and 10% KG settings, DCRN outperforms all baselines by 0.9% and 1.5% on WebQSP, indicating robustness to sparse KGs.

9.
To accurately regulate hydrogen flow and guarantee satisfactory output voltage control performance, this paper proposes an optimal fractional-order proportional-integral-derivative (FOPID) controller for the output voltage of a proton exchange membrane fuel cell (PEMFC), taking advantage of the high adaptability and robustness of large-scale deep reinforcement learning. In addition, an optimal trajectory exploration large-scale multi-delay deep deterministic policy gradient (OTEL-MD3PG) algorithm, which naturally considers the baseline FOPID coefficients in the design objective and provides online coefficient adjustment through learning, is designed as the tuner of the controller to improve adaptability and robustness. This algorithm adopts the optimal trajectory exploration policy, whereby a new agent (demonstrator) generates demonstration samples that instruct the agent to learn, and another agent (tracker) adds noise to the action of the demonstrator to explore the limits of its control trajectory, thereby obtaining a more robust control strategy. The simulation results show that the proposed algorithm offers a rapid response, strong anti-interference capability, and excellent control performance.

10.
Both traditional learning methods and deep learning methods have been widely applied to early Alzheimer’s disease (AD) diagnosis, but these methods often suffer from training set bias and lack interpretability. To address these issues, this paper proposes a two-phase framework that iteratively assigns weights to samples and features. Specifically, the first phase automatically distinguishes clean samples from training samples. Training samples are regarded as noisy data and are thus assigned different weights as a penalty, while clean samples are of high quality and are thus used to learn the feature weights. In the second phase, our method iteratively assigns sample weights to the training samples and feature weights to the clean samples. Because these updates are iterative, the proposed framework deals with the training set bias issue and also provides interpretability for both samples and features. Experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset show that our method achieves the best classification performance on binary classification tasks and has better interpretability than state-of-the-art methods.

11.
This paper focuses on the problem of chaos control for a permanent magnet synchronous motor with chaotic oscillation, unknown dynamics and time-varying delay, using adaptive sliding mode control based on dynamic surface control. To reveal the mechanism of the motor system and facilitate controller design, the dynamic behavior of the system is investigated. The nonlinear terms of the system model, the upper bounds of the time delays and their derivatives are treated as unknown throughout. An RBF neural network with an adaptive law, which removes the need for an accurate model and parameters, is employed to cope with the unknown dynamics. To address chaotic oscillation, the ‘explosion of complexity’ of backstepping, and the chattering associated with sliding mode control, a sliding mode controller is developed within the framework of dynamic surface control by combining adaptive techniques with the RBF neural network. In addition, an appropriate Lyapunov function is employed to demonstrate the system stability. Finally, the feasibility of the proposed scheme is verified by simulation.

12.
闫盛枫 《情报科学》2021,39(9):146-154
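[Purpose/Significance] To detect the semantic topics of policy texts in a specific domain and reveal the areas of China's policy deployment and future development trends. [Method/Process] A time-series modeling and visualization method for public policy texts is proposed that combines word-vector semantic enhancement with the DTM model. The DTM model is used to slice the policy texts by time and model their topics, and the Skip-gram word embedding technique of the deep learning Word2vec algorithm, which can effectively predict context words, is used to enhance semantic expressiveness and policy interpretability, so as to reveal the priorities of China's public policy deployment more accurately. [Result/Conclusion] Experiments show that the proposed method offers better knowledge extraction and semantic expression for public policy topic identification and policy text quantification, and that it is effective for mining China's public policies and revealing the information they contain. [Innovation/Limitation] A time-series modeling method for public policy texts that combines word-vector semantic enhancement with the DTM model is proposed, which improves the topical semantic expression of policy texts to some extent. Future work will consider using deep learning techniques such as the LSTM algorithm and the BERT model to identify domain knowledge units and grammatical structures in policies.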
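The Skip-gram technique mentioned above trains a model to predict the context words around each center word. As a minimal sketch of just the data preparation step (not the authors' pipeline, and without the embedding training itself), the following generates the (center, context) pairs from a tokenized sentence; the function name, the window size, and the sample tokens are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized fragment of a policy text
print(skipgram_pairs(["policy", "text", "topic"], window=1))
# prints [('policy', 'text'), ('text', 'policy'), ('text', 'topic'), ('topic', 'text')]
```

A larger window captures broader topical context, while a smaller one emphasizes closer syntactic neighbors.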

13.
Although deep learning breakthroughs in NLP are based on learning distributed word representations with neural language models, these methods suffer from a classic drawback of unsupervised learning techniques. Furthermore, the performance of general-purpose word embeddings has been shown to be heavily task-dependent. To tackle this issue, recent studies have proposed learning sentiment-enhanced word vectors for sentiment analysis. However, a common limitation of these approaches is that they require external sentiment lexicon sources, and the construction and maintenance of these resources involve a set of complex, time-consuming, and error-prone tasks. In this regard, this paper proposes a sentiment lexicon embedding method that represents the semantic relationships of sentiment words better than existing word embedding techniques, without a manually annotated sentiment corpus. The major distinguishing features of the proposed framework are the joint encoding of morphemes and their POS tags, and the training of only important lexical morphemes in the embedding space. To verify the effectiveness of the proposed method, we conducted experiments comparing it with two baseline models. The results show that the revised embedding approach mitigates the problem of the conventional context-based word embedding method and, in turn, improves the performance of sentiment classification.

14.
In recent years, reasoning over knowledge graphs (KGs) has been widely adopted to empower retrieval systems, recommender systems, and question answering systems, generating a surge in research interest. Recently developed reasoning methods usually perform poorly when applied to incomplete or sparse KGs, due to the lack of evidential paths that can reach target entities. To solve this problem, we propose a hybrid multi-hop reasoning model with reinforcement learning (RL) called SparKGR, which implements dynamic path completion and iterative rule guidance strategies to improve reasoning performance over sparse KGs. First, the model dynamically completes the missing paths using rule guidance to augment the action space of the RL agent; this strategy effectively reduces the sparsity of KGs and thus increases path search efficiency. Second, an iterative optimization of rule induction and fact inference is designed to incorporate global information from KGs to guide the exploration of the RL agent; this optimization iteratively improves overall training performance. We further evaluated the SparKGR model on different tasks using five real-world datasets extracted from Freebase, Wikidata and NELL. The experimental results indicate that SparKGR outperforms state-of-the-art baseline models without losing interpretability.

15.
In this paper, an integrated design of data-driven fault-tolerant tracking control is addressed, relying on Markov parameter sequence identification and adaptive dynamic programming techniques. For systems with unknown models, the sequence of Markov parameters, together with the covariance of the innovation signal, is first estimated by the least squares method. After a transformation of the value function from stochastic to deterministic, a policy iteration adaptive dynamic programming algorithm is then formulated to find the optimal tracking control law. In order to eliminate the influence of unpredicted faults, an active fault-tolerant supervisory control strategy is further constructed by synthesizing fault detection, isolation, estimation and compensation. All these designs are performed in a data-driven manner and thus avoid the need for information about the system drift dynamics. From the perspective of system operation management, the above integrated control scheme provides a framework for achieving tracking performance optimization, monitoring and maintenance simultaneously. The effectiveness of these conclusions is finally verified via two case studies.

16.
This paper proposes a new method for semi-supervised clustering of data that contains only pairwise relational information. Specifically, our method simultaneously learns two similarity matrices, one in feature space and one in label space: the similarity matrix in feature space is learned with an adaptive neighbor strategy, while the one in label space is obtained through a label propagation approach. The two learned matrices explore the local structure (learned from feature space) and the global structure (learned from label space) of the data, respectively. Furthermore, because most existing clustering methods do not fully consider the graph structure, they cannot achieve optimal clustering performance. Our method therefore forces the data into c clusters by adding a low-rank constraint on the graph Laplacian matrix. Finally, an alignment constraint between the two similarity matrices is imposed, all terms are combined into a unified framework, and an iterative optimization strategy is used to solve the proposed model. Experiments on real-world data show that our method achieves strong performance compared with other state-of-the-art methods.

17.
In this paper, a numerical method is presented for solving nonlinear optimal control problems with terminal state constraints, control inequality constraints and simple bounds on the state variables. The method converts the optimal control problem into a sequence of quadratic programming problems. To this end, the quasilinearization method is used to replace the nonlinear optimal control problem with a sequence of constrained linear-quadratic optimal control problems, and each of the state variables is then approximated by a finite-length Chebyshev series with unknown parameters. The method gives the data of the quadratic programming problem explicitly (the Hessian, the gradient of the cost function and the Jacobian of the constraints). To show the effectiveness of the proposed method, simulation results for two constrained nonlinear optimal control problems are presented.
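The finite-length Chebyshev series used above to approximate each state variable has the form x(t) = c_0*T_0(t) + c_1*T_1(t) + ... + c_n*T_n(t) on the interval [-1, 1]. As a small illustration of how such a series is evaluated once its coefficients are known, the sketch below uses the standard Clenshaw recurrence; it does not reproduce the paper's quadratic programming step, and the function name and sample coefficients are assumptions.

```python
def chebyshev_series(coeffs, t):
    """Evaluate sum_k coeffs[k] * T_k(t) for t in [-1, 1] via the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        # b_k = c_k + 2 t b_{k+1} - b_{k+2}
        b1, b2 = c + 2.0 * t * b1 - b2, b1
    return coeffs[0] + t * b1 - b2


# Hypothetical coefficients: 1*T_0 + 0.5*T_1 - 0.25*T_2
print(chebyshev_series([1.0, 0.5, -0.25], 0.3))
```

Because the series is linear in its coefficients, treating them as the unknowns is what makes each linear-quadratic subproblem a finite-dimensional quadratic program.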

18.
This paper addresses the problem of state space model identification for multirate processes with unknown time delay. The aim is to identify a multirate state space model that approximates the parameter-varying time-delay system. The identification problem is formulated under the framework of the expectation maximization algorithm. By introducing two hidden variables, a new expectation maximization algorithm is derived to estimate the unknown model parameters and the time delays simultaneously. The effectiveness of the proposed algorithm is validated by a simulation example.

19.
An adaptive dynamic programming controller based on the backstepping method is designed for the optimal tracking control of hypersonic flight vehicles. The control input is divided into two parts, namely stable control and optimal control. First, the backstepping method is applied, with neural networks (NNs) used to estimate the unknown functions. Then, the computational load is reduced by the minimal-learning-parameter (MLP) scheme. To avoid the problem of “explosion of terms”, a first-order filter is adopted. Next, the optimal controller is designed based on adaptive dynamic programming. In order to solve the Hamiltonian equation, NN estimators are introduced to approximate the performance indicators, achieving approximately optimal control of hypersonic flight vehicles. Finally, the effectiveness and advantages of the control method are verified by simulation results.
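The first-order filter used against the “explosion of terms” passes each virtual control through tau*dz/dt + z = alpha, so that the derivative needed at the next backstepping step is read from the filter instead of being computed analytically. The sketch below is a generic forward-Euler discretization of that filter, not the controller from the paper; the time constant, step size, and function name are assumptions.

```python
def first_order_filter(alpha, tau=0.05, dt=0.01, z0=0.0):
    """Forward-Euler discretization of tau * dz/dt + z = alpha.

    alpha: sampled virtual control signal (hypothetical data).
    Returns the filtered signal and the filter-based derivative estimate.
    """
    z = z0
    filtered, derivative = [], []
    for a in alpha:
        dz = (a - z) / tau   # stands in for the analytic derivative of alpha
        derivative.append(dz)
        z += dt * dz
        filtered.append(z)
    return filtered, derivative


filtered, derivative = first_order_filter([1.0] * 200)
print("final filtered value:", filtered[-1])
```

A smaller tau tracks the virtual control more tightly but amplifies noise in the derivative estimate, and the Euler step requires dt to be well below tau for stability.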

20.
Graph neural networks have been frequently applied in recommender systems due to their powerful representation abilities for irregular data. However, these methods still suffer from difficulties such as inflexible graph structures, sparse and highly imbalanced data, and relatively shallow networks, which limit their rating prediction ability for recommendations. This paper presents a novel deep dynamic graph attention framework based on influence and preference relationship reconstruction (DGA-IPR) for recommender systems to learn optimal latent representations of users and items. The entire framework involves a user branch and an item branch. An influence-based dynamic graph attention (IDGA) module, a preference-based dynamic graph attention (PDGA) module, and an adaptive fine feature extraction (AFFE) module are constructed for each branch. Concretely, the first two attention modules concentrate on reconstructing the influence and preference relationship graphs, breaking the imbalanced and fixed constraints of graph structures. Then a deep feature aggregation block and an adaptive feature fusion operation are built, increasing the network depth and capturing potential high-order information. In addition, AFFE is designed to acquire finer latent features for users and items. The DGA-IPR architecture is formed by integrating IDGA, PDGA, and AFFE for users and items, respectively. Experiments show the superiority of DGA-IPR over existing recommendation models.
