In question 2 of the homework, we have:
> Prove that for MDPs, stochastic policies are not better than deterministic ones. Namely, show that if a stochastic policy $\pi^*(a|s)$ obtains an optimal value function $V^*(s)$, then there is a deterministic policy $\hat\pi(a)$ that achieves the same value.
- Is this a typo? I'm pretty sure $\hat\pi$ should be a function of $s$, not of $a$.
- Regarding notation: $\pi^*(a|s)$ is the probability of taking action $a$ in state $s$, whereas $\hat\pi(s)$ is the action picked by the deterministic policy in state $s$?
- If I understand correctly, Propositions 2.4 and 2.6 (from the scribe notes) almost solve this question entirely; see the sketch below. Am I missing anything, or is it really that simple?
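For concreteness, here is the argument I have in mind (just a sketch, assuming the propositions give the Bellman consistency equation for $V^{\pi^*}$ and a policy-improvement-style bound; I'm writing $r(s,a)$, $P(s' \mid s,a)$, and $\gamma$ for the reward, transition kernel, and discount, which may differ from the scribes' notation):

$$
\hat\pi(s) \in \arg\max_{a} Q^{*}(s,a),
\qquad
Q^{*}(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s').
$$

Since $V^{*}(s) = \sum_{a} \pi^{*}(a \mid s)\, Q^{*}(s,a) \le \max_{a} Q^{*}(s,a) = Q^{*}(s, \hat\pi(s))$, a policy-improvement step should give $V^{\hat\pi}(s) \ge V^{*}(s)$, while $V^{\hat\pi}(s) \le V^{*}(s)$ holds because $V^{*}$ is already optimal.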
Thanks in advance!