Hi,

The performance of our agent is a function of its parameters. This "performance" could be something we want to maximize, or to minimize; since we can always multiply it by $-1$, it doesn't matter which, as long as we're consistent.

Now let's recall why we're looking at the gradient in the first place.

From Calculus I, if $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is a differentiable function and $h$ is a unit vector in $\mathbb{R}^n$, then for small $\epsilon > 0$ we have:

(1)
\begin{align} f(x+\epsilon h) = f(x)+\epsilon \langle \nabla f(x), h \rangle +O(\epsilon^2) \end{align}

If $\epsilon$ is small enough, the third term on the right is negligible, so the unit vector $h$ that maximizes the second term is also the direction that maximizes $f(x+\epsilon h)$: the direction of steepest ascent.

Likewise, the opposite direction will be the direction of steepest descent.
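A quick numerical sketch of this claim (the particular $f$ and the names `f`, `grad_f` here are just illustrative): stepping a small $\epsilon$ along the normalized gradient increases $f$ at least as much as stepping along any of a batch of random unit directions, up to second-order effects.

```python
import numpy as np

# An illustrative differentiable f: R^2 -> R and its gradient.
def f(x):
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2

def grad_f(x):
    return np.array([-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 0.5)])

rng = np.random.default_rng(0)
x = np.array([0.3, 0.7])
eps = 1e-3

# Compare the normalized gradient direction against random unit directions.
g = grad_f(x)
h_star = g / np.linalg.norm(g)
best_random = max(
    f(x + eps * h / np.linalg.norm(h))
    for h in rng.normal(size=(1000, 2))
)
# Small slack absorbs the O(eps^2) term from Eq. (1).
assert f(x + eps * h_star) >= best_random - 1e-9
```

The slack in the final assertion is exactly the $O(\epsilon^2)$ remainder from Eq. (1): a random direction extremely close to $h$ can win by a second-order amount, but never by more.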

From Linear Algebra I (Cauchy–Schwarz), we know that for a fixed nonzero vector $x_0$, the unit vector $x$ that maximizes $\langle x_0, x \rangle$ is $x_0$ normalized, i.e. $x_0/\|x_0\|$; in particular, the maximizing direction is that of $x_0$ itself.
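You can check this fact numerically (the vector `x0` below is just an example): no random unit vector beats $x_0/\|x_0\|$, and the maximal inner product equals $\|x_0\|$.

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = np.array([3.0, -4.0])          # fixed vector, norm 5

# Among unit vectors, x0/||x0|| maximizes <x0, x>.
best = x0 / np.linalg.norm(x0)
for h in rng.normal(size=(1000, 2)):
    h /= np.linalg.norm(h)
    assert x0 @ best >= x0 @ h - 1e-12

print(round(x0 @ best, 10))  # 5.0, i.e. ||x0||
```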

Going back to Eq. (1), we see that if we are at a point $x$ and want a point $x'$ for which $f(x) \leq f(x')$, then we can use $x' = x+\epsilon \nabla{f(x)}$ for a small enough $\epsilon$.

Likewise, if we want to find a point $x'$ for which $f(x) \geq f(x')$, then we can use $x' = x-\epsilon \nabla{f(x)}$.

In gradient ascent/descent, we try to find a point $x_{opt}$ that maximizes/minimizes $f$ by starting at some random point $x_0$ and iteratively defining $x_{i+1} = x_i \pm \epsilon_i \nabla f(x_i)$, where $+$ is for ascent and $-$ is for descent.
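The whole iteration fits in a few lines. A minimal descent sketch on a toy quadratic (the function and step size are illustrative, not the HW's):

```python
import numpy as np

def grad_f(x):
    # Gradient of f(x) = (x1 - 3)^2 + (x2 + 1)^2, minimized at (3, -1).
    return np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])

x = np.zeros(2)      # x_0: the starting point
eps = 0.1            # fixed step size, for simplicity
for _ in range(200):
    x = x - eps * grad_f(x)   # minus the gradient: descent

print(np.round(x, 4))  # → [ 3. -1.], the minimizer
```

Flipping the sign of the update (and of $f$'s definition of "good") turns the same loop into gradient ascent.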

This is what we do in the algorithm described in the HW, except that we use an elaborate estimate of the gradient instead of the exact one, use the Adam optimizer to choose a good step size $\epsilon_i$ for us, and apply some other tricks.
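For intuition on the Adam part: this is a sketch of the textbook Adam update (Kingma & Ba's rule with the usual defaults), not the HW's exact code. It rescales the raw gradient by running moment estimates, so the effective step size adapts per coordinate.

```python
import numpy as np

def adam_step(x, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One textbook Adam update; the signs here are for descent."""
    m = b1 * m + (1 - b1) * g           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias correction
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Minimize the toy objective f(x) = ||x - c||^2 with Adam.
c = np.array([2.0, -1.0])
x = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    g = 2.0 * (x - c)                   # exact gradient of f at x
    x, m, v = adam_step(x, g, m, v, t)
print(np.round(x, 2))
```

In the HW setting the exact gradient `g` would be replaced by the estimated one; the update rule itself is unchanged.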

So in conclusion: if you take your update steps in the direction of the gradient, the function whose gradient you take will increase; if you take them in the opposite direction, it will decrease.