The instructions say that bi has dimension ni.

Shouldn't it be ni-1? In each layer of the network we multiply x by Wi, and the dimensions of Wi are niX(ni-1), i.e. the result vector of that layer is smaller by 1.

Also, I don't understand from the description how we end up with a vector of 4 dimensions. We have 3 layers, and each layer shrinks the vector by 1 according to the description, since each Wi is of size niX(ni-1). But n0=8 and 0<=i<=3, so n4 should be 5.
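To spell out my arithmetic, here is a small sketch of how I am reading the description. The function name `output_dims` is just mine, and it assumes x is a row vector multiplied by a Wi with n rows and n-1 columns, which may not be what the instructions intend:

```python
def output_dims(n0, num_layers):
    """Track the vector dimension layer by layer under my reading of the text."""
    dims = [n0]
    for _ in range(num_layers):
        n = dims[-1]
        # Assumption: Wi has shape n x (n - 1), so x (1 x n) times Wi gives 1 x (n - 1).
        w_shape = (n, n - 1)
        dims.append(w_shape[1])
    return dims


dims = output_dims(8, 3)
print(dims)  # [8, 7, 6, 5]
```

Under this reading the final vector has 5 dimensions, not 4, which is the mismatch I am asking about.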

Does that mean we need to choose one of the layers to be niX(ni-2)? Or did you mean something else?