Similar literature
 20 similar documents found
1.
This paper investigates a Q-learning scheme for the optimal consensus control of discrete-time multiagent systems. The Q-learning algorithm is carried out by reinforcement learning (RL) using system data instead of system dynamics information. In the multiagent system, the agents interact with each other and at least one agent can communicate with the leader directly; this communication topology is described by an algebraic graph. The objective is to make all agents synchronize with the leader and to make the performance indices reach a Nash equilibrium. On the one hand, the solution of the optimal consensus control problem for multiagent systems is obtained by solving the coupled Hamilton–Jacobi–Bellman (HJB) equation, but analytical solutions of the discrete-time HJB equation are difficult to obtain directly. On the other hand, accurate mathematical models of most real-world systems are hard to obtain. To overcome these difficulties, a Q-learning algorithm is developed that uses system data rather than an accurate system model. We formulate the performance index and the corresponding Bellman equation of each agent i. Then, the Q-function Bellman equation is derived from the Q-function. Policy iteration is adopted to calculate the optimal control iteratively, and the least squares (LS) method is employed to facilitate implementation. A stability analysis of the proposed policy-iteration Q-learning algorithm for multiagent systems is given. Two simulation examples verify the effectiveness of the proposed scheme.
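As a rough illustration of the policy-iteration and least-squares steps mentioned in this abstract, the following minimal sketch applies Q-function policy iteration to a single scalar linear system with quadratic cost. It is not the paper's multiagent algorithm: the single-agent setting, the quadratic Q-function parameterization, and all names (`solve_linear`, `features`, `q_policy_iteration`, `step`) are assumptions made for illustration. The learner only sees sampled transitions, never the model coefficients.

```python
import random


def solve_linear(a_mat, b_vec):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b_vec)
    m = [row[:] + [b_vec[i]] for i, row in enumerate(a_mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def features(x, u):
    """Basis for a quadratic Q-function: Q(x, u) = h11*x^2 + 2*h12*x*u + h22*u^2."""
    return [x * x, 2.0 * x * u, u * u]


def q_policy_iteration(step, k0, q=1.0, r=1.0, n_samples=50, n_iter=10, seed=0):
    """Hypothetical sketch: Q-function policy iteration for the policy u = -k*x.

    `step(x, u)` returns the next state and is treated as a black box.
    `k0` must be a stabilizing (admissible) initial gain.
    """
    rng = random.Random(seed)
    k = k0
    for _ in range(n_iter):
        # Policy evaluation: least-squares fit of the Q-function Bellman equation
        #   Q(x, u) - Q(x', -k*x') = q*x^2 + r*u^2
        ata = [[0.0] * 3 for _ in range(3)]
        atb = [0.0] * 3
        for _ in range(n_samples):
            x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            x_next = step(x, u)
            phi = features(x, u)
            phi_next = features(x_next, -k * x_next)
            row = [p - pn for p, pn in zip(phi, phi_next)]
            cost = q * x * x + r * u * u
            for i in range(3):
                atb[i] += row[i] * cost
                for j in range(3):
                    ata[i][j] += row[i] * row[j]
        h11, h12, h22 = solve_linear(ata, atb)
        # Policy improvement: minimizing Q over u gives u = -(h12 / h22) * x
        k = h12 / h22
    return k


# Example plant (an assumption, hidden from the learner): x' = 1.2*x + u
gain = q_policy_iteration(lambda x, u: 1.2 * x + u, k0=1.0)
print(gain)
```

For this example plant the learned gain can be compared with the gain obtained from the scalar discrete-time Riccati equation, which is what the check below does.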

2.
In the present paper, we study stochastic boundary control problems where the system dynamics is a controlled stochastic parabolic equation with Neumann boundary control and boundary noise. Under some assumptions, the continuity and differentiability of the value function are proved. We also define a new type of Hamilton–Jacobi–Bellman (HJB) equation and prove that the value function is a viscosity solution of this HJB equation.

3.
In this work, we consider an optimal control problem for a class of stochastic differential equations driven by additive noise with aftereffect appearing in the control. We develop a semigroup theory of the driving deterministic neutral system and identify explicitly the adjoint operator of the corresponding infinitesimal generator. Using the adjoint theory established, we reformulate the time-delay equation under consideration as an infinite-dimensional stochastic control system without time lag. Consequently, we can deal with the associated optimal control problem through the study of a Hamilton–Jacobi–Bellman (HJB) equation. Finally, we present an example whose optimal control can be determined explicitly to illustrate our theory.

4.
In this paper, a novel backstepping-based adaptive dynamic programming (ADP) method is developed to solve the problem of intercepting a maneuvering target in the presence of full-state and input constraints. To address the state constraints, a barrier Lyapunov function is introduced at every backstepping step. An auxiliary design system is employed to compensate for the input constraints. First, an adaptive backstepping feedforward control strategy is designed, by which the tracking problem for strict-feedback systems is reduced to an equivalent optimal regulation problem for affine nonlinear systems. Second, an adaptive optimal controller is developed using the ADP technique, in which a critic network is constructed to approximate the solution of the associated Hamilton–Jacobi–Bellman (HJB) equation. The whole control scheme therefore consists of an adaptive feedforward controller and an optimal feedback controller. By Lyapunov's direct method, all signals in the closed-loop system are guaranteed to be uniformly ultimately bounded (UUB). Finally, the effectiveness of the proposed strategy is demonstrated on a simple nonlinear system and a nonlinear two-dimensional missile-target interception system.

5.
In this paper, based on the Smith iteration (Smith, 1968), an inner-outer (IO) iteration algorithm for solving the coupled Lyapunov matrix equations (CLMEs) is presented. First, the IO iteration algorithm for solving the Sylvester matrix equation is proposed, and its convergence is analyzed in detail. Second, the IO iteration algorithm for solving the CLMEs is constructed. By utilizing the latest estimates, a current-estimation-based IO iteration algorithm and two weighted IO iteration algorithms are also given for solving the CLMEs. Convergence analyses indicate that the iterates generated by these algorithms always converge to the unique solutions of the CLMEs for any initial conditions. Finally, several numerical examples are provided to show the superiority of the proposed numerical algorithms.
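For orientation, the basic Smith-type fixed-point iteration that this abstract builds on can be sketched for a single discrete-time Lyapunov (Stein) equation X = A X A^T + Q. This is only the plain iteration X_{k+1} = A X_k A^T + Q, assumed here to be the starting point; it is not the inner-outer algorithm or the coupled-equation solver proposed in the paper, and the helper names are hypothetical. The iteration converges when the spectral radius of A is below 1.

```python
def matmul(a, b):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def smith_iteration(a, q, tol=1e-12, max_iter=10_000):
    """Solve X = A X A^T + Q by iterating X_{k+1} = A X_k A^T + Q from X_0 = Q."""
    n = len(q)
    at = transpose(a)
    x = [row[:] for row in q]
    for _ in range(max_iter):
        axat = matmul(matmul(a, x), at)
        x_new = [[axat[i][j] + q[i][j] for j in range(n)] for i in range(n)]
        diff = max(abs(x_new[i][j] - x[i][j]) for i in range(n) for j in range(n))
        x = x_new
        if diff < tol:
            break
    return x


# Example data (assumed for illustration); the spectral radius of A is 0.5.
A = [[0.5, 0.1], [0.0, 0.3]]
Q = [[1.0, 0.0], [0.0, 1.0]]
X = smith_iteration(A, Q)
print(X)
```

In the scalar case with A = 0.5 and Q = 1 the solution is 1 / (1 - 0.25) = 4/3, which gives a quick sanity check of the iteration.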

6.
This paper develops a new dual ML-ADHDP method to solve the optimal consensus problem (OCP) of a class of heterogeneous discrete-time nonlinear multi-agent systems (MASs) with unknown dynamics and time delay. A hierarchical and distributed control strategy is used to transform the original problem into nonlinear model reference adaptive control (MRAC) problems and an OCP of virtual linear MASs. For the nonlinear MRAC problems, a new multi-layer action-dependent heuristic dynamic programming (ML-ADHDP) method is developed to overcome the unknown dynamics and neural network estimation errors, yielding higher control accuracy. To solve the OCP of the virtual linear MASs and improve the convergence speed, a new multi-layer performance index is proposed. The ML-ADHDP method is then used to solve the coupled Hamilton–Jacobi–Bellman equation and obtain the optimal virtual control. Theoretical analysis proves that the original MASs can achieve Nash equilibrium, and simulation results show that the developed dual ML-ADHDP method gives faster convergence and higher control accuracy for the original MASs.

7.
This paper focuses on the numerical solution of a class of generalized coupled Sylvester-conjugate matrix equations, which are general and contain many significant matrix equations as special cases, such as the coupled discrete-time/continuous-time Markovian jump Lyapunov matrix equations and the stochastic Lyapunov matrix equation. By introducing the modular operator, a cyclic gradient based iterative (CGI) algorithm is provided. Unlike some previous iterative algorithms, the proposed algorithm uses less information in each iteration update, which saves memory and improves efficiency. The convergence of the proposed algorithm is discussed, and it is verified that the algorithm converges for any initial matrices under certain assumptions. Finally, the effectiveness and superiority of the proposed algorithm are verified with some numerical examples.

8.
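Comparison and application of parameter optimization algorithms for a distributed water cycle model   Cited by: 1 (self-citations: 0, citations by others: 1)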
孙波扬  张永勇  门宝辉  张士锋 《资源科学》2013,35(11):2217-2223
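The strength of distributed hydrological models lies in reproducing the spatio-temporal variability of hydrological processes: they can simulate and reflect well the uneven spatio-temporal distribution of hydrological elements and underlying-surface factors. This also leads to a large number of model parameters. When there are many sub-basins, manual parameter tuning is tedious and complex, so automatic parameter calibration with optimization algorithms becomes the preferred choice. Using measured runoff data for 1988-2005 from the Jiutiaoling station in the Shiyang River basin, this paper applies the SCE-UA algorithm, the genetic algorithm (GA), and particle swarm optimization (PSO) to calibrate the parameters of a distributed water cycle model (the time-variant gain model), and compares the convergence speed, required number of iterations, and stability of the three algorithms. The results show that after optimization by SCE-UA, GA, and PSO, the water balance coefficient of the model is held at about 0.0 in every case, while the correlation coefficient and efficiency coefficient exceed 0.90 and 0.84, respectively, indicating good simulation accuracy. However, PSO has better global search ability and convergence speed than SCE-UA and GA, requires the fewest iterations, is less sensitive to initial values, and is better suited to parameter optimization of the time-variant gain model, with strong extensibility and potential for improvement.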
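To make the particle swarm step concrete, here is a minimal global-best PSO sketch. It does not reproduce the study's setup: the objective is a hypothetical two-parameter stand-in for a calibration error (squared distance from assumed "true" parameters), and the inertia weight, acceleration coefficients, swarm size, and all names are illustrative assumptions.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimizer (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep each parameter inside its search interval
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def calibration_error(params):
    """Hypothetical stand-in for a model-vs-observation error; optimum at (1.5, -0.5)."""
    return (params[0] - 1.5) ** 2 + (params[1] + 0.5) ** 2


best, best_val = pso(calibration_error, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

In a real calibration the objective would run the hydrological model for each candidate parameter set and return an error measure against the observed runoff, which makes each evaluation far more expensive than in this toy problem.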

9.
This paper is devoted to constructing a modified conjugate gradient (MCG) iterative algorithm to solve the generalized periodic multiple coupled Sylvester matrix equations. It is proved that the proposed approach finds the solution within a finite number of iteration steps in the absence of round-off errors. Furthermore, we provide a method for choosing the initial matrices to obtain the least Frobenius norm solution of the system. Some numerical examples illustrate the performance of the proposed approach and its superiority over the existing CG method.

10.
In this paper, a novel tracking control scheme for continuous-time nonlinear affine systems with actuator faults is proposed, using a policy iteration (PI) based adaptive control algorithm. From the controlled system and the desired reference trajectory, a novel augmented tracking system is constructed, and the tracking control problem is converted into the stabilization problem of the corresponding error dynamic system. The PI algorithm, widely used in optimal control and intelligent techniques, is an important reinforcement learning method that solves for the performance function, which satisfies the Lyapunov equation, by critic neural network (NN) approximation. For the augmented tracking error system with actuator faults, an online PI based fault-tolerant control law is proposed, in which a new tuning law for the adaptive parameter is designed to tolerate four common kinds of actuator faults. The stability of the tracking error dynamics with actuator faults is guaranteed by Lyapunov theory, and the tracking errors are uniformly bounded as the adaptive parameters converge. Finally, the designed fault-tolerant feedback control algorithm for nonlinear tracking systems with actuator faults is applied in two cases to track the desired reference trajectory, and the simulation results demonstrate the effectiveness and applicability of the proposed method.

11.
This paper focuses on constructing a conjugate gradient-based (CGB) method to solve the generalized periodic coupled Sylvester matrix equations in complex space. The presented method is developed from the viewpoint of conjugate gradient methods. It is proved by theoretical derivation that the presented method finds the solution of the considered matrix equations within a finite number of iteration steps in the absence of round-off errors. Some numerical examples are provided to verify the convergence performance of the presented method, which is superior to some existing numerical algorithms in both iteration steps and computation time.

12.
In this paper, a novel fault-tolerant control scheme based on a complete model-free integral reinforcement learning (CMFIRL) algorithm is proposed to solve the tracking problem of a steer-by-wire (SBW) system. We begin with the recognition that the reference errors can eventually converge to zero based on the command generator model. Then an augmented tracking system is constructed with a corresponding performance index which is considered as a type of actuator failure. Using the reinforcement learning (RL) technique, three novel online update strategies are developed for three cases: model-based, partially model-free, and completely model-free. In particular, the RL algorithm for the completely model-free case removes the requirement of known system dynamics in fault-tolerant tracking control. The system stability and the convergence of the CMFIRL iteration algorithm are also rigorously proved. Finally, a simulation example is given to illustrate the effectiveness of the proposed approach.

13.
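This article constructs an evaluation index system for the coupled and coordinated development of China's ecology-economy-technology system, uses the entropy method for objective weighting, and then introduces a functional model of coupling coordination to measure the coupling degree and coordination degree of the provincial ecology-economy-technology systems in 2014. Finally, a spatial econometric model is used for a regression analysis of the factors that influence the coupling coordination degree. The results show that: (1) the coupling degree of China's provincial ecology-economy-technology systems is mostly in the antagonistic stage, characterized by gradual damage to the ecological environment and a shrinking carrying capacity; (2) the coupling coordination degree of these systems is basically at the barely coordinated level; (3) economic structure, economic efficiency, and the level of commercialization of scientific and technological achievements are the prominent factors influencing the coupling coordination degree.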
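The abstract does not give its formulas, so the sketch below uses one commonly used form of the entropy weight method and of the coupling coordination model as an assumption: weights proportional to 1 minus the normalized Shannon entropy of each indicator, coupling degree C = n (u_1 ... u_n)^(1/n) / (u_1 + ... + u_n), comprehensive index T as a weighted sum, and coordination degree D = sqrt(C T). The function names and the sample numbers are hypothetical.

```python
import math


def entropy_weights(data):
    """Entropy weights for indicators; data[i][j] is the non-negative value
    of indicator j for sample i (already scaled to a comparable range)."""
    m, n = len(data), len(data[0])
    diversities = []
    for j in range(n):
        col = [row[j] for row in data]
        total = sum(col)
        probs = [v / total for v in col]
        # Shannon entropy normalized to [0, 1]; zero shares contribute nothing
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        diversities.append(1.0 - entropy)
    s = sum(diversities)
    return [d / s for d in diversities]


def coupling_coordination(scores, alphas):
    """Return (C, T, D) for subsystem scores in (0, 1] and contribution weights alphas."""
    n = len(scores)
    c = n * math.prod(scores) ** (1.0 / n) / sum(scores)  # coupling degree
    t = sum(a * u for a, u in zip(alphas, scores))        # comprehensive index
    return c, t, math.sqrt(c * t)                         # coordination degree


# Illustrative numbers only: three regions, two indicators
weights = entropy_weights([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
print(weights)

# Ecology, economy, and technology scores for one hypothetical region
print(coupling_coordination([0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3]))
```

An indicator that takes the same value in every sample carries no information and receives essentially zero weight, while equal subsystem scores give the maximum coupling degree of 1.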

14.
This paper is concerned with the resilient dynamic output-feedback (DOF) distributed model predictive control (DMPC) problem for discrete-time polytopic uncertain systems under synchronous Round-Robin (RR) scheduling. To alleviate the computational burden and improve the robustness of the system against uncertainties, the global system is decomposed into several subsystems, where the subsystems under synchronous RR scheduling communicate with each other via a network. RR scheduling is adopted to avoid data collisions; however, it reduces the information updated at each time instant, and the underlying RR schedules of the subsystems are deeply coupled. The main purpose of this paper is to design a set of resilient DOF-based DMPC controllers for systems subject to polytopic uncertainties and synchronous RR scheduling, such that the desired performance is obtained at a low computational cost. A novel distributed performance index dependent on the synchronous RR scheduling is constructed, where the last-iteration information from the neighboring subsystems is used to deal with the various couplings. Then, by resorting to a distributed RR-dependent Lyapunov-like approach and inequality analysis techniques, an upper bound of the objective is put forward to establish a solvable auxiliary optimization problem (AOP). Moreover, by using the Jacobi iteration algorithm to solve this problem online, the distributed feedback gains are obtained directly to guarantee the convergence of the system states. Finally, two examples, a distillation process example and a numerical example, are employed to show the effectiveness of the proposed resilient DMPC strategy.

15.
This paper analyzes finite-time synchronization for two types of coupled delayed Cohen–Grossberg neural networks (CDCGNNs). In the first type, linearly coupled Cohen–Grossberg neural networks with and without coupling delays are considered. In the second type, nonlinearly coupled Cohen–Grossberg neural networks with and without coupling delays are discussed. By designing suitable controllers and using some inequality techniques, several criteria ensuring finite-time synchronization of the CDCGNNs with linear coupling and nonlinear coupling are derived. Moreover, the settling times of finite-time synchronization for the considered networks are also estimated. Finally, the validity of the obtained finite-time synchronization conditions is confirmed by two numerical examples.

16.
《Journal of The Franklin Institute》2022,359(17):10172-10205
Recently, the sparsity-aware sign subband adaptive filter algorithm with individual-weighting-factors (S-IWF-SSAF) was devised. To enhance performance, the variable parameter S-IWF-SSAF (VP-S-IWF-SSAF) algorithm was developed by optimizing the step-size and penalty factor, respectively. In contrast to that optimization scheme, we devise a family of variable step-size S-IWF-SSAF (VSS-S-IWF-SSAF) algorithms based on the transient model of the algorithms, by minimizing the mean-square deviation (MSD) at each iteration under some reasonable and frequently adopted assumptions and Price's theorem. To enhance the tracking capability, an effective reset mechanism is also incorporated into the proposed algorithms. Compared with the VP-S-IWF-SSAF algorithm, the presented algorithms have lower computational requirements, clearly higher steady-state estimation accuracy, and acceptable tracking characteristics. In addition, the stable step-size range in the mean and mean-square sense and the steady-state performance are derived, and the computational requirements are presented. Monte-Carlo simulations for system identification and adaptive echo cancellation confirm that the proposed algorithms outperform other related algorithms for various system inputs under impulsive interference.

17.
To perform repetitive tasks, this paper proposes an adaptive boundary iterative learning control (ILC) scheme for a two-link rigid–flexible manipulator with parametric uncertainties. Using Hamilton's principle, the coupled ordinary differential equation and partial differential equation (ODE–PDE) dynamic model of the system is established. To drive the joints to follow the desired trajectory and simultaneously eliminate the deformation of the flexible beam, a boundary control strategy is added to the conventional joint torque control. The adaptive iterative learning algorithm for the boundary control scheme includes a proportional-derivative (PD) feedback structure and an iterative term. This novel controller is designed to deal with unmodeled dynamics and other unknown external disturbances. Numerical simulations in MATLAB are provided to verify the performance of the proposed controller.

18.
In this paper, by combining the multi-step Smith-inner-outer (MSIO) iteration framework with some tunable parameters, a relaxed MSIO iteration method is proposed for solving the Sylvester matrix equation and the coupled Lyapunov matrix equations (CLMEs) in discrete-time jump linear systems with Markovian transitions. The convergence properties of the relaxed MSIO iteration method are investigated, and the choice of the parameters is also discussed. To accelerate the convergence of the relaxed MSIO iteration method for solving the CLMEs, a current-estimation-based relaxed MSIO iteration algorithm and a weighted relaxed MSIO iteration algorithm are presented. Finally, several numerical examples are given to verify the superiority of the proposed relaxed algorithms.

19.
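To explore the internal mechanism linking lean construction techniques and project performance, a coupling model based on BP and SVM variable screening is built between seven lean construction techniques (including 6S, visual management, and the Last Planner) and five individual project performance indicators (including knowledge capability, finance, and owner) together with a composite indicator. The simulation results show that, in the coupling model between lean construction technique characteristics and the individual performance indicators, the GA-BP prediction model is more accurate than the standard BP neural network model, while in the coupling model between lean construction technique characteristics and the composite performance indicator, the SVM-based prediction model is more accurate than the GA-BP prediction model. In addition, BP and SVM combined with the MIV algorithm are used to further examine how strongly different lean construction techniques affect each project performance indicator and the composite indicator. The results provide decision support for project stakeholders seeking to improve project management performance.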

20.
《Journal of The Franklin Institute》2023,360(14):10564-10581
In this work, we investigate consensus of discrete-time (DT) multi-agent systems (MASs) with completely unknown dynamics by using the reinforcement learning (RL) technique. Unlike policy iteration (PI) based algorithms, which require admissible initial control policies, this work proposes a value iteration (VI) based model-free algorithm for consensus of DTMASs that achieves optimal performance without requiring an admissible initial control policy. First, in order to apply the RL method, the consensus problem is modeled as an optimal control problem of the tracking error system for each agent. Then, we introduce a VI algorithm for consensus of DTMASs and give a novel convergence analysis for this algorithm, which does not require an admissible initial control input. To implement the proposed VI algorithm and achieve consensus of DTMASs without information about the dynamics, we construct actor-critic networks to estimate the value functions and optimal control inputs online in real time. Finally, we give some simulation results to show the validity of the proposed algorithm.
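The contrast with policy iteration can be illustrated on a toy problem. The sketch below runs value iteration for a single scalar linear system with quadratic cost, starting from a zero value function so that no stabilizing initial policy is needed. It is a model-based, single-agent simplification chosen for brevity and is an assumption throughout: the paper's algorithm is model-free, multi-agent, and implemented with actor-critic networks, none of which is reproduced here.

```python
def value_iteration(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Value iteration for x' = a*x + b*u with stage cost q*x^2 + r*u^2.

    The value function is V_j(x) = p_j * x^2. Each sweep applies the
    Bellman optimality operator, which for this problem is the scalar
    Riccati recursion. Returns (p, gain) with control u = -gain * x.
    """
    p = 0.0  # V_0 = 0: no admissible initial policy is required
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    gain = a * b * p / (r + b * b * p)
    return p, gain


# Open-loop unstable example (a = 1.2), chosen for illustration
p, gain = value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
print(p, gain, 1.2 - gain)  # last value is the closed-loop pole a - b*gain
```

Policy iteration applied to the same system would have to start from a gain that already stabilizes it, which is the requirement the value-iteration approach avoids.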
