How can we find its Hessian, or even its gradient? How do we take its derivative with respect to every coordinate? ]]>

At the beginning of the semester we said that:

$p(x) = \frac{1}{Z} \sum_{ij \in E} \phi_{ij}(x_i, x_j)$

And in the approximate inference we said that:

$p(x) = \frac{1}{Z} \left( \sum_{ij \in E} \phi_{ij}(x_i, x_j) + \sum_{i \in E} \phi_i(x_i) \right)$

Which is the correct definition? Which one of them should be used in Q4?

If the second is correct, why do we add the singleton part?

ido
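
For what it's worth, here is a small sketch of why the two forms need not conflict. It assumes only that every node appears in at least one edge; the graph, the tables, and the names `phi_pair`, `phi_single`, and `folded` are made up for illustration and are not from the course material. Each singleton table is folded into one edge that touches its node, and the total score is checked to be identical for every assignment, so any p(x) built from that score is unchanged.

```python
import itertools
import random

rng = random.Random(0)

# Hypothetical toy model: 3 binary variables on a chain 0 - 1 - 2.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]

# Arbitrary pairwise and singleton tables (illustrative values only).
phi_pair = {e: {(a, b): rng.random() for a in (0, 1) for b in (0, 1)} for e in edges}
phi_single = {i: {a: rng.random() for a in (0, 1)} for i in nodes}

def score_with_singletons(x):
    """Second form: pairwise terms plus singleton terms."""
    pair = sum(phi_pair[(i, j)][(x[i], x[j])] for (i, j) in edges)
    single = sum(phi_single[i][x[i]] for i in nodes)
    return pair + single

# Fold each singleton table into the first edge that contains its node.
folded = {e: dict(table) for e, table in phi_pair.items()}
for i in nodes:
    e = next(e for e in edges if i in e)
    for (a, b) in folded[e]:
        xi = a if e[0] == i else b
        folded[e][(a, b)] += phi_single[i][xi]

def score_pairwise_only(x):
    """First form: pairwise terms only, using the folded tables."""
    return sum(folded[(i, j)][(x[i], x[j])] for (i, j) in edges)

# Largest difference between the two scores over all assignments.
max_gap = max(
    abs(score_with_singletons(x) - score_pairwise_only(x))
    for x in itertools.product((0, 1), repeat=len(nodes))
)
print(max_gap)
```

Under that assumption the singleton part adds no expressive power; it only makes the parameterization more convenient to write down. Which form Q4 expects is still a question for the course staff.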

]]>I don't understand why the conditions bound the solution.

For theta_i and theta_j I can choose any positive integer I like, for example 10 and 10, and for theta_ij I can choose 9.

It seems this assignment satisfies all the conditions, and it seems we can choose any larger positive integers and still get a valid assignment.

If there is no bound, then there is no solution that maximizes f(theta), right?

]]>How does transforming the theta matrix into a matrix that has zeros in every cell but the bottom right not lose information about the distribution, given that we consider only p(1,1) for every ij in E when maximizing?

Thanks!

]]>In our case it is $n$, right?

Thanks

]]>It's not clear from the question.

Thanks!

]]>I did not understand something about the importance sampling.

In the question it is written that Eq[Z]=Ep[f(X)], but f does not appear on the Eq[Z] side.

In class we used an f that is an indicator function. Is that what is meant here as well? ]]>
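
A minimal sketch of the usual importance sampling identity, in case it helps. It assumes Z is defined as Z = f(X) p(X) / q(X) with X drawn from q, which is the standard construction but may not be the exact definition used in the question. The distributions `p` and `q` and the indicator `f` below are toy choices, not taken from the assignment. Under that assumption f is hidden inside Z, which would explain why it does not show up explicitly on the Eq[Z] side.

```python
import random

# Toy target p and proposal q over the values 0..3 (illustrative only).
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
q = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

def f(x):
    """Indicator of the event x >= 2 (an assumed example of f)."""
    return 1.0 if x >= 2 else 0.0

def z(x):
    """Importance-weighted value: f(x) times the weight p(x) / q(x)."""
    return f(x) * p[x] / q[x]

# Both expectations computed exactly by summing over the support.
exact_ep_f = sum(p[x] * f(x) for x in p)
exact_eq_z = sum(q[x] * z(x) for x in q)

def estimate(num_samples, seed=0):
    """Monte Carlo estimate of E_q[Z] using samples drawn from q."""
    rng = random.Random(seed)
    xs = rng.choices(list(q), weights=list(q.values()), k=num_samples)
    return sum(z(x) for x in xs) / num_samples

print(exact_ep_f, exact_eq_z, estimate(20000))
```

The two exact sums agree because q(x) cancels inside each term, and the sample average approaches the same value as the number of samples grows.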

Only the

$\lambda = \min_{i:\, \tau_{ij} > 0} \tau_{ij}$

instead of $\lambda = \min_{i:\, \tau_i > 0} \tau_i$

Otherwise we may get negative values in z.

]]>p(x) that has maximum entropy among all distributions that satisfy Ep [fi(x)] for all i."

What is the question here? What do we need to show? This is really unfathomable…

]]>If for some i,j we have si>0 and sj>0, then we can have τi=τj=τij all go to infinity, and f goes to infinity with them.

Is there a mistake in the constraints? Or am I missing something?

]]>