Regarding 2.4, 2.6 — they don't consider stochastic policies as such, so there's some work needed.

*Prove that for MDPs, stochastic policies are not better than deterministic ones. Namely, show that if a stochastic policy $\pi^*(a|s)$ obtains an optimal value function $V^*(s)$, then there is a deterministic policy $\hat\pi(a)$ that achieves the same value*

- Is this a typo? I'm pretty sure $\hat\pi$ should be a function of $s$, not $a$ — i.e., $\hat\pi(s)$.
- Regarding notation: $\pi^*(a|s)$ is the probability of taking action $a$ in state $s$, whereas $\hat\pi(s)$ is the single action picked by the deterministic policy?
- If I understand correctly, Propositions 2.4 and 2.6 (from the scribes) almost solve this question entirely. Am I missing anything, or is it really that simple?
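To convince myself of the claim, I put together a toy sketch (my own example, not taken from the scribes): a 2-state, 2-action MDP where state 0 has two tied optimal actions, so there is a genuinely stochastic optimal policy $\pi^*$. Picking any single action from the support of $\pi^*$ in each state gives a deterministic $\hat\pi$, and policy evaluation shows both achieve $V^*$:

```python
import numpy as np

# Toy 2-state, 2-action MDP (my own example, not from the course material).
# Transitions are deterministic for simplicity: P[s, a] = next state, R[s, a] = reward.
gamma = 0.9
P = np.array([[0, 1],
              [0, 1]])
R = np.array([[1.0, 0.1],   # rewards chosen so both actions in state 0 tie at Q* = 10
              [2.0, 0.0]])

# Value iteration to get V* (and Q*).
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * V[P]      # Q[s, a] = R[s, a] + gamma * V[next state]
    V = Q.max(axis=1)

def evaluate(pi):
    """Iterative policy evaluation for a stochastic policy pi[s, a]."""
    Vp = np.zeros(2)
    for _ in range(1000):
        Vp = (pi * (R + gamma * Vp[P])).sum(axis=1)
    return Vp

# A genuinely stochastic optimal policy: mix the two tied actions in state 0.
pi_star = np.array([[0.5, 0.5],
                    [1.0, 0.0]])
# Deterministic policy: pick one action from the support of pi_star in each state.
pi_hat = np.array([[1.0, 0.0],
                   [1.0, 0.0]])

print(np.allclose(evaluate(pi_star), V))  # True: the stochastic policy achieves V*
print(np.allclose(evaluate(pi_hat), V))   # True: so does the deterministic one
```

This matches my reading of the argument: since every action in the support of $\pi^*(\cdot|s)$ must itself attain $Q^*(s,a) = V^*(s)$, greedily committing to one of them loses nothing.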

Thanks in advance!
