Reinforcement Learning 1 - Value Iteration and Policy Iteration
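Value iteration: start from an initial value $v_{0}$. Step 1: Policy update (choose the policy that is greedy with respect to the current value $v_{k}$), Step 2: Value update (compute $v_{k+1}$ by one Bellman backup under that policy)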
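The two steps of value iteration can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two-state MDP `P`, the discount `gamma = 0.9`, the stopping tolerance, and the helper names `q_value` and `value_iteration` are all hypothetical and do not come from the notes above.

```python
# Hypothetical toy MDP (an assumption for illustration only):
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(P, v, s, a, gamma):
    """Expected return of taking action a in state s, then following values v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-10, max_iters=10_000):
    v = {s: 0.0 for s in P}  # initial value v_0 (all zeros is an arbitrary choice)
    pi = {}
    for _ in range(max_iters):
        # Step 1: policy update -- greedy policy with respect to the current v.
        pi = {s: max(P[s], key=lambda a: q_value(P, v, s, a, gamma)) for s in P}
        # Step 2: value update -- one Bellman backup under that greedy policy.
        v_new = {s: q_value(P, v, s, pi[s], gamma) for s in P}
        delta = max(abs(v_new[s] - v[s]) for s in P)
        v = v_new
        if delta < tol:  # stop once the values have (numerically) settled
            break
    return v, pi


v_star, pi_star = value_iteration(P)
```

On this toy problem the greedy policy picks action 1 in both states, and the values approach 19 and 20, since staying in state 1 earns a reward of 2 per step and $2 / (1 - 0.9) = 20$.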
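Policy iteration: start from an initial policy $\pi_{0}$. Step 1: Policy evaluation (compute the state value $v_{\pi_{k}}$ of the current policy $\pi_{k}$), Step 2: Policy improvement (choose $\pi_{k+1}$ that is greedy with respect to $v_{\pi_{k}}$)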
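Policy iteration can be sketched the same way. Again this is a rough illustration rather than a reference implementation: the two-state MDP `P`, the discount factor, the initial policy, and the names `evaluate_policy` and `policy_iteration` are hypothetical. Policy evaluation is done here by repeated sweeps until the values stop changing, which is one common choice; solving the linear Bellman equation directly would also work.

```python
# Hypothetical toy MDP (an assumption for illustration only):
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(P, v, s, a, gamma):
    """Expected return of taking action a in state s, then following values v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, pi, gamma, tol=1e-10, max_sweeps=10_000):
    """Step 1: policy evaluation by iterating the Bellman equation for a fixed pi."""
    v = {s: 0.0 for s in P}
    for _ in range(max_sweeps):
        v_new = {s: q_value(P, v, s, pi[s], gamma) for s in P}
        delta = max(abs(v_new[s] - v[s]) for s in P)
        v = v_new
        if delta < tol:
            break
    return v


def policy_iteration(P, pi0, gamma=0.9, max_iters=1_000):
    pi = dict(pi0)  # initial policy pi_0
    v = {s: 0.0 for s in P}
    for _ in range(max_iters):
        # Step 1: policy evaluation -- value of the current policy.
        v = evaluate_policy(P, pi, gamma)
        # Step 2: policy improvement -- act greedily with respect to that value.
        new_pi = {s: max(P[s], key=lambda a: q_value(P, v, s, a, gamma)) for s in P}
        if new_pi == pi:  # policy is stable, so stop
            break
        pi = new_pi
    return v, pi


v_pi, pi_final = policy_iteration(P, pi0={0: 0, 1: 0})
```

Starting from the policy that always takes action 0, a single improvement step already switches both states to action 1 on this toy problem, and the second evaluation confirms that the policy no longer changes.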