# Variational methods

Variational methods are very important in statistical mechanics, since they provide a tool to formulate mean field theories which are valid in any temperature range and with order parameters of essentially arbitrary complexity. Their central idea is what one would expect: if ${\displaystyle {\mathcal {H}}}$ is the Hamiltonian of a physical system and ${\displaystyle \psi _{\alpha }}$ is a set of arbitrary trial states, then we can estimate the energy of the ground state of the system by minimizing ${\displaystyle \left\langle {\mathcal {H}}\right\rangle _{\psi _{\alpha }}}$ with respect to ${\displaystyle \psi _{\alpha }}$; since every ${\displaystyle \psi _{\alpha }}$ will in general be a function or an even more complex object, ${\displaystyle \left\langle {\mathcal {H}}\right\rangle _{\psi _{\alpha }}}$ is in general a functional, so its minimization must be understood in the sense of functional analysis. We will see, however, that the mean value of the Hamiltonian alone is not sufficient, since we know that the equilibrium configurations of the system are given by the minima of the free energy. In other words, we will compute the free energy using some trial states ${\displaystyle \psi _{\alpha }}$ and then minimize it in the ways we will explain.

Such variational methods are also used in quantum mechanics when a system is too complex and its Schrödinger equation cannot be solved exactly: in this case one introduces a set ${\displaystyle |\psi _{\alpha }\rangle }$ of trial wave functions and minimizes the functional ${\displaystyle E_{\alpha }=\langle \psi _{\alpha }|{\mathcal {H}}|\psi _{\alpha }\rangle }$ with respect to ${\displaystyle |\psi _{\alpha }\rangle }$, so that both the wave function and the energy of the ground state of the system can be estimated. In statistical mechanics variational methods are formulated in terms of the phase space equilibrium probability density of the system. In particular, the approach of variational methods in statistical mechanics is based upon two inequalities which we now show.

Theorem

Let ${\displaystyle \varphi }$ be a random variable (either discrete or continuous) with probability density ${\displaystyle \rho }$; for any function ${\displaystyle f}$ of ${\displaystyle \varphi }$ the mean value of ${\displaystyle f}$ is defined as:

${\displaystyle \left\langle f(\varphi )\right\rangle _{\rho }:=\operatorname {Tr} (\rho (\varphi )f(\varphi ))}$

• If ${\displaystyle f}$ is the exponential function then this inequality holds:

${\displaystyle \left\langle e^{-\lambda \varphi }\right\rangle _{\rho }\geq e^{-\lambda \left\langle \varphi \right\rangle _{\rho }}\quad \forall \rho ,\quad \forall \lambda \in \mathbb {R} }$

• If ${\displaystyle {\mathcal {H}}(\varphi )}$ is the Hamiltonian of a system and ${\displaystyle F}$ its free energy, then:

${\displaystyle F\leq \operatorname {Tr} (\rho {\mathcal {H}})+k_{B}T\operatorname {Tr} (\rho \ln \rho )\qquad \forall \rho }$

Proof
• Supposing ${\displaystyle \varphi }$ a real number, from the convexity of the exponential (its graph lies above its tangent line at the origin) we have ${\displaystyle e^{-\varphi }\geq 1-\varphi }$ and so (we omit the subscript ${\displaystyle \rho }$ on mean values for simplicity):

${\displaystyle e^{-\lambda \varphi }=e^{-\lambda (\varphi +\left\langle \varphi \right\rangle -\left\langle \varphi \right\rangle )}=e^{-\lambda \left\langle \varphi \right\rangle }e^{-\lambda (\varphi -\left\langle \varphi \right\rangle )}\geq e^{-\lambda \left\langle \varphi \right\rangle }\left[1-\lambda (\varphi -\left\langle \varphi \right\rangle )\right]}$
and therefore taking the mean value of both sides:
${\displaystyle \left\langle e^{-\lambda \varphi }\right\rangle \geq \left\langle e^{-\lambda \left\langle \varphi \right\rangle }\right\rangle -\lambda e^{-\lambda \left\langle \varphi \right\rangle }\left\langle (\varphi -\left\langle \varphi \right\rangle )\right\rangle =e^{-\lambda \left\langle \varphi \right\rangle }}$

• The canonical partition function of the system can be written as:

${\displaystyle Z=\operatorname {Tr} e^{-\beta {\mathcal {H}}}=\operatorname {Tr} \left(\rho e^{-\beta {\mathcal {H}}-\ln \rho }\right)=\left\langle e^{-\beta {\mathcal {H}}-\ln \rho }\right\rangle \geq e^{-\beta \left\langle {\mathcal {H}}\right\rangle -\left\langle \ln \rho \right\rangle }}$
where the last step comes from the first inequality. Since ${\displaystyle Z=e^{-\beta F}}$ with ${\displaystyle F}$ the free energy of the system, we have:
${\displaystyle e^{-\beta F}\geq e^{-\beta \left\langle {\mathcal {H}}\right\rangle -\left\langle \ln \rho \right\rangle }}$
and taking the logarithm and dividing by ${\displaystyle -\beta =-1/(k_{B}T)}$, which reverses the inequality:
${\displaystyle F\leq \left\langle {\mathcal {H}}\right\rangle +k_{B}T\left\langle \ln \rho \right\rangle =\operatorname {Tr} (\rho {\mathcal {H}})+k_{B}T\operatorname {Tr} (\rho \ln \rho )}$
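Both inequalities can be checked numerically on a small discrete system. The following is only a minimal sketch: the six values of ${\displaystyle \varphi }$ (reused as energy levels), the random trial densities, the value of `kT` and all function names are hypothetical choices made for illustration, with ${\displaystyle \operatorname {Tr} }$ reduced to a finite sum.

```python
import math
import random

def mean(values, rho):
    """Mean value <f> = Tr(rho f) for a discrete random variable."""
    return sum(p * v for p, v in zip(rho, values))

def random_density(n, rng):
    """Random normalized probability vector with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def jensen_gap(phi, rho, lam):
    """<exp(-lam*phi)> - exp(-lam*<phi>); the first inequality says this is >= 0."""
    lhs = mean([math.exp(-lam * x) for x in phi], rho)
    rhs = math.exp(-lam * mean(phi, rho))
    return lhs - rhs

def free_energy_gap(energies, rho, kT):
    """Tr(rho H) + kT Tr(rho ln rho) - F; the second inequality says this is >= 0."""
    Z = sum(math.exp(-e / kT) for e in energies)
    F = -kT * math.log(Z)
    bound = mean(energies, rho) + kT * sum(p * math.log(p) for p in rho)
    return bound - F

rng = random.Random(0)
# Hypothetical values of phi, also reused as the energy levels of H.
phi = [rng.uniform(-2.0, 2.0) for _ in range(6)]
gaps_1 = [jensen_gap(phi, random_density(6, rng), rng.uniform(-3.0, 3.0))
          for _ in range(1000)]
gaps_2 = [free_energy_gap(phi, random_density(6, rng), kT=0.7)
          for _ in range(1000)]

# Smallest gaps found; both should be non-negative up to rounding errors.
print(min(gaps_1), min(gaps_2))
```

A random scan of this kind cannot prove the inequalities, of course; it only shows that no trial density in the sample violates them.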

Remember also that since ${\displaystyle \rho }$ is a probability distribution it must satisfy:

${\displaystyle \rho (\varphi )\geq 0\quad \qquad \operatorname {Tr} \rho =1}$

Now, the free energy ${\displaystyle F}$ of the system is a functional of the probability density ${\displaystyle \rho }$ and from what we have just seen we can set an upper bound to ${\displaystyle F}$:

${\displaystyle F\leq \left\langle {\mathcal {H}}\right\rangle +k_{B}T\left\langle \ln \rho \right\rangle =\operatorname {Tr} (\rho {\mathcal {H}})+k_{B}T\operatorname {Tr} (\rho \ln \rho ):=F_{\text{m.f.}}}$
where "m.f." stands for "mean field". In other words, in mean field theories we can estimate the free energy of the system as ${\displaystyle \left\langle {\mathcal {H}}\right\rangle +k_{B}T\left\langle \ln \rho \right\rangle }$, and the best approximation of the real free energy is given by the minimum of ${\displaystyle F_{\text{m.f.}}}$. In particular, the equilibrium configuration of the system is given by the form of ${\displaystyle \rho }$ that minimizes this functional, which can be easily determined in general:
${\displaystyle {\frac {\delta F}{\delta \rho }}_{|\rho _{\text{eq}}}=0\quad \Rightarrow \quad \rho _{\text{eq}}={\frac {A}{e}}e^{-\beta {\mathcal {H}}}}$
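with ${\displaystyle A}$ a generic constant (it comes from the Lagrange multiplier that enforces normalization, while the factor ${\displaystyle 1/e}$ comes from differentiating ${\displaystyle \rho \ln \rho }$), and since ${\displaystyle \rho _{\text{eq}}}$ must satisfy the constraint ${\displaystyle \operatorname {Tr} \rho _{\text{eq}}=1}$ we find ${\displaystyle A=e/Z}$, so that: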
${\displaystyle \rho _{\text{eq}}={\frac {1}{Z}}e^{-\beta {\mathcal {H}}}}$
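As a quick numerical illustration, we can verify on a toy system that the canonical density saturates the upper bound, and that moving probability away from it raises the value of the functional. The four energy levels, the temperature and the size of the perturbation below are arbitrary assumptions, and the function names are hypothetical.

```python
import math

def trial_free_energy(energies, rho, kT):
    """Tr(rho H) + kT Tr(rho ln rho) for a discrete set of energy levels."""
    return sum(p * (e + kT * math.log(p)) for p, e in zip(rho, energies))

def boltzmann(energies, kT):
    """Canonical density exp(-H/kT)/Z and the exact free energy -kT ln Z."""
    weights = [math.exp(-e / kT) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights], -kT * math.log(Z)

energies = [0.0, 0.5, 1.3, 2.0]   # hypothetical energy levels, arbitrary units
kT = 0.8                          # hypothetical temperature, same units

rho_eq, F_exact = boltzmann(energies, kT)
F_at_eq = trial_free_energy(energies, rho_eq, kT)

# Move a little probability from the lowest to the highest level,
# keeping Tr(rho) = 1, and evaluate the functional again.
eps = 0.01
rho_shifted = [rho_eq[0] - eps, rho_eq[1], rho_eq[2], rho_eq[3] + eps]
F_shifted = trial_free_energy(energies, rho_shifted, kT)

# F_at_eq should agree with F_exact, while F_shifted should be larger.
print(F_exact, F_at_eq, F_shifted)
```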
This result is entirely reasonable: since we have only required the minimization of the free energy, the probability density we obtain is the one we would expect from ensemble theory. However, so far the computation is still exact: if we want to determine ${\displaystyle \rho _{\text{eq}}}$ we must compute ${\displaystyle Z}$, which in general is not feasible. Within this variational approach the mean field approximation comes into play by choosing the following form of the trial probability density:
${\displaystyle \rho _{\text{m.f.}}=\prod _{\alpha }\rho _{\alpha }}$
where again "m.f." stands for "mean field", ${\displaystyle \alpha }$ labels the degrees of freedom of the system and ${\displaystyle \rho _{\alpha }}$ is the probability distribution of the ${\displaystyle \alpha }$-th degree of freedom alone. In other words we are approximating the probability distribution so that the degrees of freedom are statistically independent[1], namely:
${\displaystyle \left\langle f_{1}(\varphi _{1})f_{2}(\varphi _{2})\right\rangle _{\rho _{\text{m.f.}}}=\left\langle f_{1}(\varphi _{1})\right\rangle _{\rho _{\text{m.f.}}}\left\langle f_{2}(\varphi _{2})\right\rangle _{\rho _{\text{m.f.}}}\quad \forall f_{1},f_{2}}$
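To make the factorized ansatz concrete, the sketch below builds a product density for a small system and checks both the statistical independence of two degrees of freedom and the upper bound on the free energy. Everything specific here is an assumption made for illustration: the model (a ring of six Ising spins with coupling `J` and field `h`), the single-spin density ${\displaystyle \rho _{\alpha }(\pm 1)=(1\pm m)/2}$, the numerical values and the function names.

```python
import itertools
import math

J, h, kT, N = 1.0, 0.2, 1.5, 6   # hypothetical ring of N Ising spins

def energy(spins):
    """H = -J sum_i s_i s_{i+1} - h sum_i s_i with periodic boundary conditions."""
    bonds = sum(spins[i] * spins[(i + 1) % N] for i in range(N))
    return -J * bonds - h * sum(spins)

def product_density(m):
    """rho_mf(state) = prod_alpha rho_alpha(s_alpha), rho_alpha(+-1) = (1 +- m)/2."""
    rho1 = {+1: (1 + m) / 2, -1: (1 - m) / 2}
    return {s: math.prod(rho1[x] for x in s)
            for s in itertools.product((+1, -1), repeat=N)}

def mean(f, rho):
    """<f> = Tr(rho f), summed over all 2^N spin configurations."""
    return sum(p * f(s) for s, p in rho.items())

m = 0.4                          # hypothetical single-spin mean value
rho = product_density(m)

# Statistical independence: <s_0 s_1> should equal <s_0><s_1>.
corr = mean(lambda s: s[0] * s[1], rho)
prod_of_means = mean(lambda s: s[0], rho) * mean(lambda s: s[1], rho)

# The product density is still a valid trial density, so the bound on F applies.
F_trial = mean(energy, rho) + kT * sum(p * math.log(p) for p in rho.values())
F_exact = -kT * math.log(sum(math.exp(-energy(s) / kT) for s in rho))

print(corr, prod_of_means, F_trial, F_exact)
```

The exact free energy is computed here by brute-force enumeration, which is only possible because the system is tiny; the point of the product ansatz is precisely to avoid this sum.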
This way the free energy of the system has the form:
${\displaystyle F_{\rho _{\text{m.f.}}}=\left\langle {\mathcal {H}}\right\rangle _{\rho _{\text{m.f.}}}+k_{B}T\sum _{\alpha }\left\langle \ln \rho _{\alpha }\right\rangle _{\rho _{\text{m.f.}}}=\left\langle {\mathcal {H}}\right\rangle _{\rho _{\text{m.f.}}}+k_{B}T\sum _{\alpha }\operatorname {Tr} (\rho _{\alpha }\ln \rho _{\alpha })}$
and must be minimized with respect to ${\displaystyle \rho _{\alpha }}$. This can be done with two different approaches:

• The most commonly used one consists in parametrizing ${\displaystyle \rho _{\alpha }}$ with an appropriately defined order parameter ${\displaystyle \left\langle \varphi _{\alpha }\right\rangle }$ that can describe a possible phase transition; in this way ${\displaystyle F}$ becomes a (real) function of ${\displaystyle \left\langle \varphi _{\alpha }\right\rangle }$, and the minimization becomes simpler since it reduces to minimizing an ordinary function.

The parametrization must of course satisfy the constraints:

${\displaystyle \operatorname {Tr} \rho _{\alpha }=1\quad \qquad \operatorname {Tr} (\rho _{\alpha }\varphi _{\alpha })=\left\langle \varphi _{\alpha }\right\rangle }$
The advantage of such an approach is that the variational parameter ${\displaystyle \left\langle \varphi _{\alpha }\right\rangle }$ coincides with the order parameter.

• Another possible approach consists in considering ${\displaystyle \rho _{\alpha }}$ itself as the variational object, and minimizing ${\displaystyle F}$ with respect to it. This is a more general approach, but this time it is harder to establish a connection between ${\displaystyle F}$ as a functional of ${\displaystyle \rho }$ and ${\displaystyle F}$ as a function of the order parameter that describes a phase transition.
We will analyse these two different approaches applying them to two different models.
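As a minimal sketch of how the two approaches relate, consider a hypothetical example that is not worked out in this section: Ising spins ${\displaystyle s=\pm 1}$ with coupling `J`, field `h` and `z` nearest neighbours, with the single-spin density parametrized as ${\displaystyle \rho _{\alpha }(\pm 1)=(1\pm m)/2}$ so that ${\displaystyle m=\left\langle s\right\rangle }$. Under these assumptions the mean field free energy per spin is an ordinary function of ${\displaystyle m}$ (first approach), while the stationarity condition with respect to ${\displaystyle \rho _{\alpha }}$ leads to a self-consistency equation for ${\displaystyle m}$ (second approach). The parameter values and names below are illustrative.

```python
import math

# Hypothetical coordination number, coupling, field and temperature.
z, J, h, kT = 4, 1.0, 0.1, 3.0

def f_mf(m):
    """Mean field free energy per spin as a function of the order parameter m."""
    p_up, p_down = (1 + m) / 2, (1 - m) / 2
    entropy_term = p_up * math.log(p_up) + p_down * math.log(p_down)
    return -0.5 * z * J * m * m - h * m + kT * entropy_term

# Approach 1: minimize the ordinary function f_mf(m) on a grid inside (-1, 1).
grid = [i / 1000 for i in range(-999, 1000)]
m_grid = min(grid, key=f_mf)

# Approach 2: treat rho_alpha as the variational object. Stationarity gives
# rho_alpha(s) proportional to exp((z*J*m + h) * s / kT), and therefore
# m = tanh((z*J*m + h) / kT), solved here by fixed-point iteration.
m_fp = 0.5
for _ in range(200):
    m_fp = math.tanh((z * J * m_fp + h) / kT)

# The two estimates of m should agree to within the grid spacing.
print(m_grid, m_fp)
```

With these values the system is below the mean field critical temperature ${\displaystyle k_{B}T_{c}=zJ}$, so the minimum sits at a non-zero ${\displaystyle m}$; the fixed-point iteration is started at a positive value so that it converges to the same branch selected by the positive field.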
1. This is physically equivalent to what we did in the Weiss mean field theory for the Ising model, where we neglected the correlations between spins.