L1 Regularized Value Function Approximation for Reinforcement Learning

Christopher Painter-Wakefield and Ronald Parr. L1 Regularized Linear Temporal Difference Learning. Technical Report CS-2012-01.

Several recent efforts in reinforcement learning have focused attention on the importance of regularization, but the techniques for incorporating regularization into reinforcement learning algorithms, and the effects of regularization on the convergence of these algorithms, remain ongoing areas of research. In particular, little has been written about the use of regularization in online reinforcement learning. In this paper, we describe a novel online stochastic approximation algorithm for L1 regularized reinforcement learning. We prove convergence of the online algorithm and show that the L1 regularized linear fixed point of LARS-TD and LC-TD is an equilibrium fixed point of the algorithm.
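For orientation, the L1 regularized linear fixed point referred to above (introduced with LARS-TD and reused by LC-TD) is, roughly, the weight vector that reproduces itself under an L1-penalized least-squares fit to its own Bellman backup. In the usual notation, with feature matrices \Phi and \Phi' for current and successor states, reward vector R, discount \gamma, and regularization parameter \beta, it is the solution of

    \theta = \arg\min_{w} \tfrac{1}{2} \lVert R + \gamma \Phi' \theta - \Phi w \rVert_2^2 + \beta \lVert w \rVert_1 .

This statement follows the standard LARS-TD presentation and is included here only as a reminder; the report itself should be consulted for the precise definition it uses.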

Jeff Johns, Christopher Painter-Wakefield, and Ronald Parr. Linear Complementarity for Regularized Policy Evaluation and Improvement. Advances in Neural Information Processing Systems 23 (NIPS 2010).

Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to stronger convergence and uniqueness results, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a "greedy" homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
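A linear complementarity problem LCP(q, M) asks for a vector z >= 0 such that M z + q >= 0 and z^T (M z + q) = 0; casting the regularized fixed point in this form is what lets off-the-shelf LCP solvers be applied. As a rough, self-contained illustration of the warm-start idea (not the solver, nor the specific M and q, used in the paper), the Python sketch below implements a basic projected Gauss-Seidel LCP solver that accepts an initial point; the random test problem and the solver choice are assumptions for illustration, and this simple iteration is only guaranteed to converge under suitable conditions on M (for example, symmetric positive definite).

import numpy as np

def solve_lcp_pgs(M, q, z0=None, max_iters=5000, tol=1e-10):
    """Projected Gauss-Seidel for LCP(q, M): find z >= 0 with
    M z + q >= 0 and z . (M z + q) = 0.  z0 is an optional warm start."""
    n = q.shape[0]
    z = np.zeros(n) if z0 is None else np.asarray(z0, dtype=float).copy()
    for _ in range(max_iters):
        z_prev = z.copy()
        for i in range(n):
            # Coordinate update for z_i, projected onto the constraint z_i >= 0.
            r = q[i] + M[i] @ z - M[i, i] * z[i]
            z[i] = max(0.0, -r / M[i, i])
        if np.max(np.abs(z - z_prev)) < tol:
            break
    return z

# Warm-start illustration: solve one LCP, perturb q slightly (much as a policy
# update perturbs the evaluation problem), and re-solve from the old solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))
M = A @ A.T + 1e-3 * np.eye(20)   # symmetric positive definite test matrix
q = rng.standard_normal(20)
z_cold = solve_lcp_pgs(M, q)
z_warm = solve_lcp_pgs(M, q + 0.01 * rng.standard_normal(20), z0=z_cold)

Because the second solve starts near the new solution, it typically needs far fewer sweeps, which is the intuition behind the speedups from warm starts described in the abstract.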