How does the policy gradient theorem reduce the variance?
I think I understand the derivation of the policy gradient theorem presented in the scribe rl_class1, section 2.3. However, I do not understand why it reduces the variance of the gradient estimate. Is it because the new expression for the gradient has fewer terms?
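
To make the question concrete, here is a minimal sketch of the comparison I have in mind. It is not taken from the scribe. It assumes the "new expression" is the reward-to-go form, where each score term is multiplied only by the rewards from that time step onward, and it compares this against the estimator that multiplies the summed score by the full return. The toy problem (a stateless two-action task with reward equal to the action), the single parameter `theta`, and the function names are all hypothetical choices for illustration.

```python
import math
import random
import statistics


def sample_gradients(rng, theta=0.0, horizon=10):
    """Sample one episode and return two single-episode gradient estimates.

    Toy assumptions (not from the scribe): no state, two actions, the policy
    picks action 1 with probability sigmoid(theta), and the reward at each
    step equals the action taken.
    """
    p = 1.0 / (1.0 + math.exp(-theta))
    actions = [1 if rng.random() < p else 0 for _ in range(horizon)]
    # d/dtheta log pi(a) for a Bernoulli(sigmoid(theta)) policy is a - p.
    scores = [a - p for a in actions]
    rewards = [float(a) for a in actions]

    # Estimator 1: (sum of scores) * (full return). Every score is paired
    # with every reward, including rewards earned before that action.
    g_full = sum(scores) * sum(rewards)

    # Estimator 2: each score is paired only with rewards from its own
    # time step onward (reward-to-go).
    g_rtg = 0.0
    reward_to_go = 0.0
    for score, reward in zip(reversed(scores), reversed(rewards)):
        reward_to_go += reward
        g_rtg += score * reward_to_go
    return g_full, g_rtg


def compare(num_episodes=20000, seed=0):
    """Estimate the mean and variance of both estimators by Monte Carlo."""
    rng = random.Random(seed)
    pairs = [sample_gradients(rng) for _ in range(num_episodes)]
    full = [g for g, _ in pairs]
    rtg = [g for _, g in pairs]
    return {
        "mean_full": statistics.fmean(full),
        "mean_rtg": statistics.fmean(rtg),
        "var_full": statistics.pvariance(full),
        "var_rtg": statistics.pvariance(rtg),
    }
```

Calling `compare()` returns the sample mean and variance of both estimators. In this toy setting the exact gradient is `horizon * p * (1 - p)`, so both means should land near the same value, while the variances can be compared directly. The terms dropped in the second estimator are products of a score with rewards earned before that action; these have zero expectation, so removing them does not change the mean.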
