policy gradient theorem
http://adv-ml-2017.wikidot.com/forum/t-2306503/policy-gradient-theorem
Posts in the discussion thread "policy gradient theorem" (feed dated Sat, 17 Apr 2021 05:36:58 +0000)

(no title)
http://adv-ml-2017.wikidot.com/forum/t-2306503/policy-gradient-theorem#post-2862315
Daniel Carmon, Fri, 16 Jun 2017 21:43:17 +0000
You can give an optimizer the gradients directly, using methods such as Optimizer.apply_gradients.
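The split that Optimizer.apply_gradients exposes (compute the gradients yourself, then hand them to the optimizer for the update step) can be sketched in plain Python. This is a toy stand-in, not TensorFlow code: the loss function and the SGD update below are hypothetical examples chosen for illustration.

```python
# Toy illustration of the compute-gradients / apply-gradients split.
# In TensorFlow you would get `grads` from tf.gradients and pass them to
# Optimizer.apply_gradients; here both halves are simple stand-ins.

def compute_gradients(params):
    # Hypothetical loss loss(p) = sum over p of (p - 3)^2,
    # so dloss/dp = 2 * (p - 3) for each parameter.
    return [2.0 * (p - 3.0) for p in params]

def apply_gradients(params, grads, lr=0.1):
    # Stand-in for Optimizer.apply_gradients: one plain SGD step.
    return [p - lr * g for p, g in zip(params, grads)]

params = [0.0, 5.0]
for _ in range(200):
    grads = compute_gradients(params)
    params = apply_gradients(params, grads)

print(params)  # both parameters converge toward the minimum at 3.0
```

The point is that the optimizer never sees a loss here; it only consumes gradients, which is exactly what makes this pattern usable when the "loss" you differentiate is a surrogate rather than the true objective.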
(no title)
http://adv-ml-2017.wikidot.com/forum/t-2306503/policy-gradient-theorem#post-2853017
another guest, Wed, 14 Jun 2017 19:27:04 +0000
This issue is still not completely clear to me. I thought of using tf.gradients, but the assignment specifically says to use an optimizer, e.g. Adam, and as the previous poster said, the optimizer takes the true cost, whereas tf.gradients only returns the gradients.
Re: policy gradient theorem
http://adv-ml-2017.wikidot.com/forum/t-2306503/policy-gradient-theorem#post-2841672
Daniel Carmon, Sun, 04 Jun 2017 15:43:28 +0000
In the right-hand side of eq. 4, we use the derivative of the log-probabilities of the agent's actions. These log-probabilities come from the specific network we choose, and you should let TensorFlow differentiate them for you automatically (e.g. using tf.gradients).
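For a concrete sense of what differentiating the log-probabilities gives you, here is a plain-Python sketch (no TensorFlow; tf.gradients would produce this derivative automatically). It assumes a softmax policy over logits theta, where d/dtheta log pi(a) = onehot(a) - pi, so the gradient of the surrogate loss L = -G * log pi(a) is G * (pi - onehot(a)); the sketch checks that closed form against finite differences.

```python
import math

def softmax(theta):
    # Numerically stable softmax over a list of logits.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def surrogate_loss(theta, action, G):
    # L = -G * log pi(action); minimizing L ascends the policy gradient.
    return -G * math.log(softmax(theta)[action])

def analytic_grad(theta, action, G):
    # Closed form for softmax: dL/dtheta_i = G * (pi_i - 1[i == action]).
    pi = softmax(theta)
    return [G * (pi[i] - (1.0 if i == action else 0.0)) for i in range(len(theta))]

# Arbitrary example values (illustrative only).
theta, action, G = [0.2, -1.0, 0.5], 2, 1.7
grad = analytic_grad(theta, action, G)

# Finite-difference check of each gradient component.
eps = 1e-6
for i in range(len(theta)):
    bumped = list(theta)
    bumped[i] += eps
    fd = (surrogate_loss(bumped, action, G) - surrogate_loss(theta, action, G)) / eps
    assert abs(fd - grad[i]) < 1e-4
```

This is why handing an optimizer the surrogate loss (or its gradients) works: the derivative of -log pi(a) times the return is exactly the term appearing in the policy gradient theorem.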
policy gradient theorem
http://adv-ml-2017.wikidot.com/forum/t-2306503/policy-gradient-theorem#post-2840786
student, Sat, 03 Jun 2017 16:48:19 +0000
On the one hand, we are instructed to use the policy gradient theorem. On the other hand, the TensorFlow optimizers take the true cost as a parameter and compute the derivatives themselves. Am I missing something?