policy gradient theorem
Fri, 16 Jun 2017 Daniel Carmon
You can hand precomputed gradients to an optimizer with methods such as Optimizer.apply_gradients, which takes a list of (gradient, variable) pairs.
Wed, 14 Jun 2017 another guest
This issue is still not completely clear to me. I thought of using tf.gradients, but the assignment said specifically to use an optimizer, e.g. Adam. And as the previous guest said, the optimizer takes the true cost, while tf.gradients returns the gradients.
Sun, 04 Jun 2017 Daniel Carmon
On the right-hand side of eq. 4, we're using the derivative of the log probabilities of the agent's actions. These log probabilities come from the specific network we choose, and you should let TensorFlow differentiate them for you automatically (e.g. using tf.gradients).
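To make the idea concrete, here is a minimal sketch in plain Python rather than TensorFlow. Everything in it is an assumption made for illustration: a three-action softmax policy whose parameters are just the logits, a single sampled action with a made-up return, a hypothetical numeric_grad helper standing in for automatic differentiation, and a hypothetical apply_gradients helper that does one plain gradient-descent step on (gradient, variable) pairs. It is not the assignment's network and not TensorFlow's API; it only shows the pattern of differentiating the log probability, weighting it by the return, and handing the result to an update step.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob(logits, action):
    """Log probability of `action` under the softmax policy."""
    return math.log(softmax(logits)[action])


def numeric_grad(f, params, eps=1e-6):
    """Central finite differences; a stand-in for automatic differentiation."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def apply_gradients(grads_and_vars, learning_rate=0.1):
    """Hypothetical helper: one gradient-descent step on (grad, var) pairs."""
    return [v - learning_rate * g for g, v in grads_and_vars]


# Assumed toy setup: uniform policy, one sampled action, one observed return.
theta = [0.0, 0.0, 0.0]
action, ret = 1, 2.0

# Step 1: differentiate the log probability of the action that was taken.
grad_log_pi = numeric_grad(lambda p: log_prob(p, action), theta)

# Step 2: single-sample policy-gradient estimate, return * grad log pi.
pg_estimate = [ret * g for g in grad_log_pi]

# Step 3: the update step minimizes, so pass the negated estimate
# in order to ascend the expected return.
negated = [-g for g in pg_estimate]
new_theta = apply_gradients(list(zip(negated, theta)))

print("gradient of log pi:", grad_log_pi)
print("updated logits:", new_theta)
print("P(action) before:", softmax(theta)[action])
print("P(action) after: ", softmax(new_theta)[action])
```

For a softmax over raw logits, the gradient of the log probability is the one-hot vector of the action minus the probability vector, so the numeric result can be checked by hand. With a positive return, the update should raise the probability of the sampled action.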
Sat, 03 Jun 2017 student
On the one hand, we are instructed to use the policy gradient theorem. On the other hand, the TensorFlow optimizers take the true cost as a parameter and calculate the derivatives themselves. Am I missing something?
