Reinforcement Learning and Approximate Dynamic Programming for Feedback Control
Format: PDF / Kindle (mobi) / ePub
Reinforcement learning (RL) and adaptive dynamic programming (ADP) have been among the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human-engineered systems, covering both single-player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making.
Define the error in the cost function as (11.55), whose dynamics are given by (11.56). Next, we define an auxiliary cost error vector as (11.57), where $Y(k-1) = [r(k-1)\ r(k-2)\ \ldots\ r(k-1-j)]$ and with $\Delta\sigma(k) = \sigma(k) - \sigma(k-1)$, $0 < j < k-1 \in \mathbb{N}$, and $\mathbb{N}$ being the set of natural numbers. It is useful to observe that (11.57) can be rewritten as $E_c(k) = [e_c(k|k)\ e_c(k|k-1)\ \ldots\ e_c(k|k-j)]$, where the notation $e_c(k|k-1)$ means the cost error $e_c(k-1)$ re-evaluated at time $k$ using the actual.
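As a rough illustration of the indexing only, the sketch below builds the history vector Y(k − 1) = [r(k − 1) r(k − 2) ... r(k − 1 − j)] and the first difference Δσ(k) = σ(k) − σ(k − 1) from stored sequences. The function names `past_window` and `first_difference` and the sample values are hypothetical, chosen for this example. Equations (11.55) to (11.57) are not reproduced in the excerpt, so the sketch does not attempt to implement them.

```python
from typing import List, Sequence


def past_window(signal: Sequence[float], k: int, j: int) -> List[float]:
    """Return [signal[k-1], signal[k-2], ..., signal[k-1-j]].

    Mirrors the layout of Y(k-1) in the text: j + 1 past samples,
    newest first. The constraint 0 < j < k - 1 is taken from the text.
    """
    if not 0 < j < k - 1:
        raise ValueError("expected 0 < j < k - 1")
    if k - 1 >= len(signal):
        raise ValueError("signal has no sample at index k - 1")
    return [signal[k - 1 - i] for i in range(j + 1)]


def first_difference(signal: Sequence[float], k: int) -> float:
    """Return signal[k] - signal[k-1], i.e. the difference written as Δσ(k)."""
    if not 1 <= k < len(signal):
        raise ValueError("expected 1 <= k < len(signal)")
    return signal[k] - signal[k - 1]


# Hypothetical sample data, indexed by time step 0..5 (assumption for illustration).
r = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
sigma = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

Y = past_window(r, k=5, j=2)            # [r[4], r[3], r[2]] -> [16.0, 9.0, 4.0]
d_sigma = first_difference(sigma, k=5)  # sigma[5] - sigma[4] -> 8.0
print(Y, d_sigma)
```

The window is stored newest-first so that its entries line up with the ordering used for Y(k − 1) in the text.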
This chapter will not discuss the continuous-time versions, in part because Frank Lewis will address that in his sections. In my own work, I have been motivated most of all by the ultimate goal of understanding and replicating the optimization and prediction capabilities of the mammal brain [2, 13]. The higher levels of the brain like the cerebral cortex and the limbic system are tightly controlled by regular “clock signals” broadcast from the nonspecific thalamus, enforcing basic rhythms of.
Computer Science and Engineering, University of Washington, Seattle, WA, USA
Michael Fairbank, School of Informatics, City University, London, UK
Vivek Farias, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
Silvia Ferrari, Laboratory for Intelligent Systems and Control (LISC), Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
Rafael Fierro, MARHES Lab, Department of Electrical & Computer Engineering, University.
Of Business, Columbia University, New York, NY, USA
Remi Munos, SequeL team, INRIA Lille – Nord Europe, France
Zhen Ni, Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA
Warren B. Powell, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA
L.A. Prashanth, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
Danil Prokhorov, Toyota Research.