Neural Comput. 2017 Oct;29(10):2800-2824. doi: 10.1162/neco_a_01004. Epub 2017 Aug 4.

Nonconvex Policy Search Using Variational Inequalities.

Author information

1. School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163, U.S.A. yusen.zhan@wsu.edu.
2. Department of Computer Science, American University of Beirut, 1107 2020, Lebanon. hb71@aub.edu.lb.
3. School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163, U.S.A. taylorm@eecs.wsu.edu.

Abstract

Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that could potentially damage hardware units. Motivated by such constraints, projection-based methods have been proposed for keeping policies safe; these methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, a Mann iteration and a two-step iteration, to solve these problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method outperforms previous methods under a variety of settings.
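
To make the projection-based iteration concrete, the sketch below shows a minimal Mann-type projection iteration for a variational inequality VI(F, C): each step averages the current iterate with its projected gradient-mapped image, theta <- (1 - a_k) * theta + a_k * P_C(theta - lam * F(theta)). This is an illustrative sketch only, not the authors' implementation: the map F (here a noisy gradient of a toy quadratic, standing in for a stochastic policy-gradient oracle), the nonconvex constraint set C (a unit sphere, chosen because it has a closed-form projection), and the step sizes lam and a_k are all assumptions made for illustration.

```python
import numpy as np

def project_sphere(x):
    """Euclidean projection onto the unit sphere {x : ||x|| = 1},
    a simple nonconvex constraint set with a closed-form projection."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.eye(len(x))[0]

def F(theta, rng):
    """Stand-in stochastic oracle (assumption): a noisy gradient of a
    toy quadratic objective, playing the role of a policy-gradient map."""
    return (theta - 0.5) + 0.1 * rng.standard_normal(theta.shape)

def mann_iteration(theta0, steps=200, lam=0.1, seed=0):
    """Mann-type projection iteration for VI(F, C):
    theta <- (1 - a_k) * theta + a_k * P_C(theta - lam * F(theta))."""
    rng = np.random.default_rng(seed)
    theta = project_sphere(theta0)
    for k in range(1, steps + 1):
        a_k = 1.0 / (k + 1)  # diminishing averaging weights (assumption)
        y = project_sphere(theta - lam * F(theta, rng))
        theta = (1.0 - a_k) * theta + a_k * y
    return theta

theta = mann_iteration(np.ones(4))
print(theta)
```

A two-step variant would, roughly, evaluate F again at the projected point y before forming the average; in either case, it is the conditions the letter places on F, the constraint set, and the step sizes that make such iterations convergent in the nonconvex stochastic setting.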

PMID: 28777726
DOI: 10.1162/neco_a_01004
