Category: Sequential Decision Problems
-
Understanding PPO from first principles
The Proximal Policy Optimization (PPO) algorithm is arguably the default choice in modern reinforcement learning (RL) libraries. In this post, we derive PPO from first principles. First, we refresh our memory of the underlying Markov Decision Process (MDP) model. 1. Preliminaries on Markov Decision Processes (MDPs): In an MDP, an agent (say,…
-
Retirement, Stopping times and Bandits: The Gittins index
“A colleague of high repute asked an equally well-known colleague: — What would you say if you were told that the multi-armed bandit problem had been solved? — Sir, the multi-armed bandit problem is not of such a nature that it can be solved.” (Peter Whittle) In our busy daily life, while multi-tasking we are constantly faced…
-

Is Reinforcement Learning all you need?
When attacking a new problem, the algorithm designer typically follows three main steps. When reporting their work, the designer will proudly focus on step 3, briefly mention step 2, and likely sweep step 1 under the carpet. Yet screening out alternatives is a crucial step that inevitably affects (positively or negatively) months of hard work on…