# Ergodicity

Let us note that Liouville's theorem does not prevent the microstate of a system from being confined in a particular region of the constant-energy hypersurface of phase space, or from moving "mostly" in that region[1]. In such cases we clearly have:

${\displaystyle {\overline {O}}(\mathbb {Q} ,\mathbb {P} )=\lim _{T\to \infty }{\frac {1}{T}}\int _{0}^{T}O(\mathbb {Q} (t),\mathbb {P} (t))dt\neq \left\langle O\right\rangle }$
Such phenomena do occur in physical systems, in particular in those with few degrees of freedom. Planetary systems are an example: the orbits of the planets are approximately stable, so the representative point of the system always remains in the same region of phase space; if it "spanned" the whole accessible constant-energy hypersurface, the orbits would become completely disordered and the system would not be stable. There are, however, also many-particle systems whose representative point remains almost always in the same region of phase space. An example is the Fermi-Pasta-Ulam system, a chain of ${\displaystyle N}$ anharmonic oscillators (namely, every particle of the system is subject to the potential[2] ${\textstyle U(x)=kx^{2}+ux^{4}}$). In this case it turns out that if we give some energy to a single particle, this energy is not distributed throughout the whole system, as it would be if the representative point spanned the entire accessible region of phase space. However, this kind of system is a bit "pathological": it has been shown that in the continuum limit the Fermi-Pasta-Ulam system has an infinite number of conserved quantities, not just the energy (as we require).
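The behaviour of a Fermi-Pasta-Ulam-type chain can be explored numerically. The following sketch (our own construction; the chain size, coupling ${\textstyle \beta }$ and time step are arbitrary illustrative choices) excites the lowest normal mode of an anharmonic chain and integrates Hamilton's equations with a velocity Verlet scheme; projecting the resulting displacements onto the normal modes would show the energy remaining concentrated in the low modes instead of equipartitioning.

```python
import numpy as np

def fpu_accel(x, beta):
    """Accelerations for an FPU-beta chain with fixed ends and unit masses."""
    xp = np.concatenate(([0.0], x, [0.0]))   # pad with the fixed walls
    d = np.diff(xp)                          # bond stretches
    f = d + beta * d**3                      # bond tension U'(d), with U(d) = d^2/2 + beta*d^4/4
    return f[1:] - f[:-1]                    # net force on each interior particle

def fpu_energy(x, v, beta):
    """Total energy (kinetic + bond potential) of the chain."""
    xp = np.concatenate(([0.0], x, [0.0]))
    d = np.diff(xp)
    return 0.5 * v @ v + np.sum(0.5 * d**2 + 0.25 * beta * d**4)

def simulate(N=16, beta=0.1, dt=0.02, steps=2500):
    """Excite the lowest normal mode and integrate with velocity Verlet."""
    n = np.arange(1, N + 1)
    x = np.sin(np.pi * n / (N + 1))          # lowest-mode displacement profile
    v = np.zeros(N)
    a = fpu_accel(x, beta)
    for _ in range(steps):
        v += 0.5 * dt * a
        x += dt * v
        a = fpu_accel(x, beta)
        v += 0.5 * dt * a
    return x, v
```

The total energy is conserved to high accuracy along the run, which is the single conserved quantity the microcanonical construction relies on; the non-equipartition among modes is the FPU phenomenon described above.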

We are interested, however, in systems where such phenomena do not occur. In particular, in order to establish a link with the microcanonical ensemble, we would like to discuss systems where the representative point "spans" the whole accessible phase space, namely spends on average the same time in different regions of the constant-energy hypersurface. This property of dynamical systems is known as ergodicity. Its most intuitive definition (although not the most useful, as we will shortly see) is the following:

Definition (Ergodicity, I)

A dynamical system is said to be ergodic if on the hypersurface of constant energy the time evolution of almost every point eventually passes arbitrarily close to any other point.

The expression "almost every point" means that we are considering the hypersurface up to a set of null measure. This is needed to avoid problems with strange or unusual configurations: in this way, for example, we exclude the possibility that all the particles move at precisely the same velocity in neat rows. As we have said, this definition is not very useful, nor is it really clear, since it does not follow explicitly from it that the representative point covers different regions of phase space in the same time on average.

In order to give a much more useful definition of ergodicity we must first introduce the concept of an ergodic component of a set (which we think of as the constant-energy hypersurface):

Definition (Ergodic component)

Let ${\displaystyle S}$ be a subset of the phase space. Then a set ${\displaystyle \Sigma \subset S}$ is called an ergodic component of ${\displaystyle S}$ if it is invariant under time evolution, namely:

${\displaystyle (\mathbb {Q} (0),\mathbb {P} (0))\in \Sigma \quad \Longrightarrow \quad (\mathbb {Q} (t),\mathbb {P} (t))\in \Sigma \quad \forall t}$

Intuitively, different ergodic components of the same set are subsets that are not "mixed" together by the time evolution. We can now give another definition of ergodicity:

Definition (Ergodicity, II)

A dynamical system is said to be ergodic if the measure of every ergodic component ${\displaystyle \Sigma }$ of the hypersurface of constant energy ${\displaystyle S}$ is either zero or equal to the measure of ${\displaystyle S}$. In other words, if ${\displaystyle \mu }$ is the measure defined on phase space then the system is ergodic if for any ergodic component ${\displaystyle \Sigma }$ of ${\displaystyle S}$ we have ${\displaystyle \mu (\Sigma )=0}$ or ${\displaystyle \mu (\Sigma )=\mu (S)}$.

This means that a dynamical system is ergodic if the constant-energy hypersurface ${\displaystyle S}$ contains no invariant subset of intermediate measure: up to sets of null measure, the only ergodic component of ${\displaystyle S}$ is ${\displaystyle S}$ itself.

Although it is not immediately evident, these two definitions are equivalent; we will not prove this here. What we want to do now is show that if a system is ergodic according to this last definition, then the time average and the microcanonical ensemble average of an observable coincide.
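Before the proof, the content of the second definition can be illustrated on a toy dynamical system (a circle rotation rather than a Hamiltonian flow; the code and all names in it are our own sketch). The map ${\textstyle x\mapsto x+\alpha {\bmod {1}}}$ is ergodic for irrational ${\textstyle \alpha }$, and time averages then reproduce the uniform phase-space average; for rational ${\textstyle \alpha }$ the circle splits into invariant sets of intermediate measure and the time average depends on the starting point.

```python
import math

def time_average(f, x0, alpha, n_steps=200_000):
    """Time average of observable f along the orbit of x -> x + alpha (mod 1)."""
    x, total = x0, 0.0
    for _ in range(n_steps):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / n_steps

# Observable whose uniform ("phase-space") average over the circle is 1/2
f = lambda x: math.cos(2 * math.pi * x) ** 2

# Irrational rotation: ergodic, so the time average is ~1/2 for almost every start
avg_irrational = time_average(f, x0=0.0, alpha=math.sqrt(2) - 1)

# Rational rotation alpha = 1/2: the set [0, 1/4) U [1/2, 3/4) is invariant with
# measure 1/2, so the system is not ergodic; the orbit of 0 is just {0, 1/2}
avg_rational = time_average(f, x0=0.0, alpha=0.5)
```

Here `avg_irrational` approaches the uniform average 1/2, while `avg_rational` equals 1, showing a time average that disagrees with the phase-space average precisely because invariant components of intermediate measure exist.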

Theorem

If a system is ergodic according to the second definition and ${\displaystyle O(\mathbb {Q} ,\mathbb {P} )}$ is an observable, then time average and ensemble average coincide:

${\displaystyle {\overline {O}}(\mathbb {Q} ,\mathbb {P} )=\left\langle O\right\rangle }$

Proof

First of all, let us note that:

${\displaystyle {\overline {O}}(\mathbb {Q} (0),\mathbb {P} (0))={\overline {O}}(\mathbb {Q} (t_{0}),\mathbb {P} (t_{0}))}$
namely the time average does not depend on the initial instant we choose. In fact we have:
{\displaystyle {\begin{aligned}{\overline {O}}(\mathbb {Q} (0),\mathbb {P} (0))=\lim _{T\to \infty }{\frac {1}{T}}\int _{0}^{T}O(\mathbb {Q} (t),\mathbb {P} (t))dt=\\=\lim _{T\to \infty }\left[{\frac {1}{T}}\int _{0}^{t_{0}}O(\mathbb {Q} (t),\mathbb {P} (t))dt+{\frac {1}{T}}\int _{t_{0}}^{T}O(\mathbb {Q} (t),\mathbb {P} (t))dt\right]\end{aligned}}}
The first integral is a constant (it does not depend on ${\displaystyle T}$), so in the limit ${\displaystyle T\to \infty }$ the first term vanishes. Therefore, multiplying and dividing by ${\displaystyle T-t_{0}}$:
${\displaystyle {\overline {O}}(\mathbb {Q} (0),\mathbb {P} (0))=\lim _{T\to \infty }{\frac {T-t_{0}}{T}}{\frac {1}{T-t_{0}}}\int _{t_{0}}^{T}O(\mathbb {Q} (t),\mathbb {P} (t))dt={\overline {O}}(\mathbb {Q} (t_{0}),\mathbb {P} (t_{0}))}$
(since ${\textstyle \lim _{T\to \infty }(T-t_{0})/T=1}$).

We now define:

${\displaystyle R_{a}=\left\lbrace (\mathbb {Q} ,\mathbb {P} ):\quad {\overline {O}}(\mathbb {Q} ,\mathbb {P} )\leq a\right\rbrace \quad a\in \mathbb {R} }$
By what we have noted above, this is an ergodic component of the hypersurface ${\displaystyle S}$, since the time evolution maps ${\displaystyle R_{a}}$ into itself. Therefore, either ${\displaystyle \mu (R_{a})=0}$ or ${\displaystyle \mu (R_{a})=\mu (S)}$; note also that if ${\displaystyle a<a'}$ then ${\displaystyle R_{a}\subset R_{a'}}$, and ${\displaystyle \mu (R_{a})=\mu (S)\Rightarrow \mu (R_{a'})=\mu (S)}$. We now call ${\displaystyle a^{*}}$ the smallest value of ${\displaystyle a}$ such that ${\displaystyle \mu (R_{a})=\mu (S)}$:
${\displaystyle a^{*}=\inf \left\lbrace a\in \mathbb {R} :\quad \mu (R_{a})=\mu (S)\right\rbrace }$
and we want to show that ${\displaystyle \mu (R_{a^{*}})=\mu (S)}$ and that ${\displaystyle a^{*}={\overline {O}}(\mathbb {Q} ,\mathbb {P} )}$ for almost all ${\displaystyle (\mathbb {Q} ,\mathbb {P} )}$.

Let us therefore consider a sequence ${\displaystyle a_{n}}$ (with of course ${\displaystyle n\in \mathbb {N} }$) monotonically increasing and such that ${\displaystyle \lim _{n\to \infty }a_{n}=\infty }$, and call ${\displaystyle R_{a_{n}}=R_{n}}$. Then:

${\displaystyle R_{n}\subset R_{n+1}\quad \qquad \bigcup _{n}R_{n}=S}$
By the continuity from below of the measure, this means that ${\displaystyle \mu (R_{n})\to \mu (S)}$. Since the ${\displaystyle R_{n}}$ are ergodic components of ${\displaystyle S}$, each ${\displaystyle \mu (R_{n})}$ is either ${\displaystyle 0}$ or ${\displaystyle \mu (S)}$, and therefore:
${\displaystyle \exists n_{0}\in \mathbb {N} :\quad \mu (R_{n})=\mu (S)\quad \forall n\geq n_{0}}$
and so there exist finite values ${\displaystyle a\in \mathbb {R} }$ such that ${\displaystyle \mu (R_{a})=\mu (S)}$; we thus deduce that ${\displaystyle a^{*}<\infty }$. Let us now consider another sequence of ergodic components:
${\displaystyle R'_{n}=R_{a^{*}+1/n}}$
Then:
${\displaystyle R'_{n}\supset R'_{n+1}\quad \qquad \mu (R'_{n})=\mu (S)}$
and the last equality is true because, by the definition of ${\displaystyle a^{*}}$ as an infimum, for every ${\displaystyle n}$ there exists ${\displaystyle a^{*}\leq a'_{n}<a^{*}+1/n}$ such that ${\displaystyle \mu (R_{a'_{n}})=\mu (S)}$, and ${\displaystyle R_{a'_{n}}\subset R'_{n}}$. Again, from theorems of measure theory this means that:
${\displaystyle \mu \left(\bigcap _{n}R'_{n}\right)=\lim _{n\to \infty }\mu (R'_{n})}$
but since ${\displaystyle \mu (R'_{n})=\mu (S)}$:
${\displaystyle \mu \left(\bigcap _{n}R'_{n}\right)=\lim _{n\to \infty }\mu (S)=\mu (S)}$
Therefore, in order to show that ${\displaystyle \mu (R_{a^{*}})=\mu (S)}$ we must show that ${\displaystyle \bigcap _{n}R'_{n}=\bigcap _{a>a^{*}}R_{a}=R_{a^{*}}}$. We surely have ${\displaystyle R_{a^{*}}\subset R_{a}}$ for ${\displaystyle a>a^{*}}$, therefore:
${\displaystyle R_{a^{*}}\subset \bigcap _{a>a^{*}}R_{a}}$
If we now consider ${\displaystyle (\mathbb {Q} ,\mathbb {P} )\in \bigcap _{a>a^{*}}R_{a}}$, then:
${\displaystyle {\overline {O}}(\mathbb {Q} ,\mathbb {P} )\leq a\quad \forall a>a^{*}\quad \Rightarrow \quad {\overline {O}}(\mathbb {Q} ,\mathbb {P} )\leq a^{*}\quad \Rightarrow \quad (\mathbb {Q} ,\mathbb {P} )\in R_{a^{*}}}$
This means that:
${\displaystyle R_{a^{*}}\supset \bigcap _{a>a^{*}}R_{a}}$
and so:
${\displaystyle R_{a^{*}}=\bigcap _{a>a^{*}}R_{a}}$

We thus have found that:

${\displaystyle \mu (R_{a})={\begin{cases}0&a<a^{*}\\\mu (S)&a\geq a^{*}\end{cases}}}$
Therefore, if ${\displaystyle a>a^{*}}$ then ${\displaystyle R_{a^{*}}\subset R_{a}}$ and ${\displaystyle \mu (R_{a}\setminus R_{a^{*}})=\mu (R_{a})-\mu (R_{a^{*}})=0}$. This means that ${\displaystyle {\overline {O}}(\mathbb {Q} ,\mathbb {P} )=a^{*}}$ everywhere except on the points ${\displaystyle (\mathbb {Q} ,\mathbb {P} )}$ of the sets:
${\displaystyle R'=\left\lbrace (\mathbb {Q} ,\mathbb {P} ):\quad {\overline {O}}(\mathbb {Q} ,\mathbb {P} )<a^{*}\right\rbrace \quad \qquad \left\lbrace (\mathbb {Q} ,\mathbb {P} ):\quad {\overline {O}}(\mathbb {Q} ,\mathbb {P} )>a^{*}\right\rbrace =S\setminus R_{a^{*}}}$
However, ${\displaystyle \mu (R')=0}$ (this can be shown similarly to what we have done above, using the sequence of ergodic components ${\displaystyle R''_{n}=R_{a^{*}-1/n}}$), and the second set also has null measure, since ${\displaystyle \mu (S\setminus R_{a^{*}})=\mu (S)-\mu (R_{a^{*}})=0}$.

It remains to show that from the fact that ${\displaystyle {\overline {O}}(\mathbb {Q} ,\mathbb {P} )=a^{*}}$ almost everywhere it follows that ${\displaystyle \left\langle O\right\rangle =a^{*}}$. We have[3]:

${\displaystyle \left\langle O\right\rangle ={\frac {1}{\Delta \cdot \Omega (E)}}\int _{E\leq {\mathcal {H}}\leq E+\Delta }O(\mathbb {Q} ,\mathbb {P} )d\Gamma }$
If we call ${\displaystyle (\mathbb {Q} (t),\mathbb {P} (t))}$ the time evolution of ${\displaystyle (\mathbb {Q} ,\mathbb {P} )}$ so that ${\displaystyle (\mathbb {Q} ,\mathbb {P} )=(\mathbb {Q} (t=0),\mathbb {P} (t=0))}$ and ${\displaystyle \Gamma _{0}=\left\lbrace (\mathbb {Q} ,\mathbb {P} ):{\mathcal {H}}(\mathbb {Q} ,\mathbb {P} )\in [E,E+\Delta ]\right\rbrace }$, then by definition we have:
${\displaystyle \left\langle O(t)\right\rangle ={\frac {1}{\Delta \cdot \Omega (E)}}\int _{\Gamma _{0}}O(\mathbb {Q} (t),\mathbb {P} (t))d\Gamma }$
Note that the integral is made over ${\displaystyle \Gamma _{0}}$ and so we are integrating over the initial conditions (in fact, in general ${\displaystyle \mathbb {Q} (t)}$ and ${\displaystyle \mathbb {P} (t)}$ will be functions of time and of the initial conditions, namely ${\displaystyle \mathbb {Q} (t)={\hat {\mathbb {Q} }}(t;\mathbb {Q} ,\mathbb {P} )}$ and ${\displaystyle \mathbb {P} (t)={\hat {\mathbb {P} }}(t;\mathbb {Q} ,\mathbb {P} )}$). Therefore, changing variables to ${\displaystyle (\mathbb {Q} (t),\mathbb {P} (t))}$:
${\displaystyle \left\langle O(t)\right\rangle ={\frac {1}{\Delta \cdot \Omega (E)}}\int _{\Gamma _{t}}\left({\text{Jac}}(t)\right)^{-1}O(\mathbb {Q} (t),\mathbb {P} (t))d\Gamma _{t}}$
where we have written the Jacobian of the change of coordinates in this way for convenience. However, since ${\displaystyle {\mathcal {H}}(\mathbb {Q} (t),\mathbb {P} (t))={\mathcal {H}}(\mathbb {Q} ,\mathbb {P} )}$ we have that ${\displaystyle \Gamma _{t}=\Gamma _{0}}$ and so:
${\displaystyle \left\langle O(t)\right\rangle ={\frac {1}{\Delta \cdot \Omega (E)}}\int _{\Gamma _{0}}\left({\text{Jac}}(t)\right)^{-1}O(\mathbb {Q} ,\mathbb {P} )d\Gamma }$
where we have renamed the integration variables to ${\displaystyle (\mathbb {Q} ,\mathbb {P} )}$. Now:
${\displaystyle \left({\text{Jac}}(t)\right)^{-1}=\det {\frac {\partial (\mathbb {Q} ,\mathbb {P} )}{\partial (\mathbb {Q} (t),\mathbb {P} (t))}}\quad \Rightarrow \quad {\text{Jac}}(t)=\det {\frac {\partial (\mathbb {Q} (t),\mathbb {P} (t))}{\partial (\mathbb {Q} ,\mathbb {P} )}}}$
namely:
${\displaystyle {\text{Jac}}(t)={\begin{vmatrix}{\frac {\partial q_{1}(t)}{\partial q_{1}}}&\cdots &{\frac {\partial p_{3N}(t)}{\partial q_{1}}}\\\vdots &\ddots &\vdots \\{\frac {\partial q_{1}(t)}{\partial p_{3N}}}&\cdots &{\frac {\partial p_{3N}(t)}{\partial p_{3N}}}\end{vmatrix}}}$
For Hamiltonian systems, ${\displaystyle {\text{Jac}}(t)=1}$ for all ${\displaystyle t}$ as a consequence of Hamilton's equations (this is essentially Liouville's theorem). In fact, calling for brevity ${\displaystyle (\mathbb {Q} (t),\mathbb {P} (t)):=x}$ and ${\displaystyle (\mathbb {Q} ,\mathbb {P} ):=y}$, then:
${\displaystyle {\frac {d}{dt}}{\text{Jac}}(t)=\sum _{i=1}^{6N}J_{i}\quad \qquad J_{i}=\det {\frac {\partial (x_{1},\dots ,{\dot {x}}_{i},\dots ,x_{6N})}{\partial (y_{1},\dots ,y_{6N})}}}$
and we have:
${\displaystyle {\dot {x}}_{i}={\begin{cases}{\frac {\partial {\mathcal {H}}}{\partial p_{i}}}&{\text{ if }}x_{i}=q_{i}\\-{\frac {\partial {\mathcal {H}}}{\partial q_{i}}}&{\text{ if }}x_{i}=p_{i}\end{cases}}}$
Furthermore, in general:
${\displaystyle {\frac {\partial {\dot {x}}_{i}}{\partial y_{k}}}=\sum _{j=1}^{6N}{\frac {\partial {\dot {x}}_{i}}{\partial x_{j}}}{\frac {\partial x_{j}}{\partial y_{k}}}}$
and thus:
${\displaystyle {\frac {d}{dt}}{\text{Jac}}(t)=\sum _{i=1}^{6N}\sum _{j=1}^{6N}{\frac {\partial {\dot {x}}_{i}}{\partial x_{j}}}\det {\frac {\partial (x_{1},\dots ,x_{i-1},x_{j},x_{i+1},\dots ,x_{6N})}{\partial (y_{1},\dots ,y_{6N})}}}$
This determinant vanishes if ${\displaystyle i\neq j}$ (because the determinant of a matrix with two equal columns vanishes), while for ${\displaystyle i=j}$ it is just the Jacobian itself, which we call ${\displaystyle J}$. Therefore:
${\displaystyle \det {\frac {\partial (x_{1},\dots ,x_{i-1},x_{j},x_{i+1},\dots ,x_{6N})}{\partial (y_{1},\dots ,y_{6N})}}=J\delta _{ij}}$
and so:
${\displaystyle {\frac {d}{dt}}{\text{Jac}}(t)=J\sum _{i=1}^{6N}{\frac {\partial {\dot {x}}_{i}}{\partial x_{i}}}=J\sum _{i=1}^{3N}\left({\frac {\partial {\dot {q}}_{i}}{\partial q_{i}}}+{\frac {\partial {\dot {p}}_{i}}{\partial p_{i}}}\right)=J\sum _{i=1}^{3N}\left({\frac {\partial ^{2}{\mathcal {H}}}{\partial q_{i}\partial p_{i}}}-{\frac {\partial ^{2}{\mathcal {H}}}{\partial p_{i}\partial q_{i}}}\right)=0}$
Therefore ${\displaystyle {\text{Jac}}(t)={\text{const.}}={\text{Jac}}(0)=1}$, since for ${\displaystyle t=0}$ the change of coordinates is the identity.
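The unit Jacobian of the Hamiltonian flow can also be checked numerically. The sketch below (our own construction; the pendulum Hamiltonian, step size and base point are arbitrary choices) integrates ${\textstyle {\mathcal {H}}=p^{2}/2-\cos q}$ with a symplectic leapfrog scheme and estimates the Jacobian of the time-${\textstyle t}$ flow map by central finite differences.

```python
import numpy as np

def leapfrog(q, p, dt, steps):
    """Symplectic (leapfrog) integration of a pendulum, H = p^2/2 - cos(q)."""
    for _ in range(steps):
        p -= 0.5 * dt * np.sin(q)   # half kick
        q += dt * p                 # drift
        p -= 0.5 * dt * np.sin(q)   # half kick
    return q, p

def flow_jacobian(q0, p0, t=5.0, dt=0.01, eps=1e-6):
    """Finite-difference Jacobian of the time-t flow map (q0, p0) -> (q(t), p(t))."""
    steps = int(round(t / dt))
    J = np.empty((2, 2))
    for k, (dq, dp) in enumerate([(eps, 0.0), (0.0, eps)]):
        qp, pp = leapfrog(q0 + dq, p0 + dp, dt, steps)
        qm, pm = leapfrog(q0 - dq, p0 - dp, dt, steps)
        J[0, k] = (qp - qm) / (2 * eps)
        J[1, k] = (pp - pm) / (2 * eps)
    return J
```

The determinant of `flow_jacobian` comes out equal to 1 up to finite-difference error; leapfrog is itself a composition of shear maps, so it preserves phase-space area exactly, mirroring the analytic result above.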

This means that:

${\displaystyle \left\langle O(t)\right\rangle ={\frac {1}{\Delta \cdot \Omega (E)}}\int _{\Gamma _{0}}O(\mathbb {Q} ,\mathbb {P} )d\Gamma }$
namely ${\displaystyle \left\langle O(t)\right\rangle }$ doesn't depend on time. Therefore:
{\displaystyle {\begin{aligned}\left\langle O\right\rangle =\left\langle O(t)\right\rangle =\lim _{T\to \infty }{\frac {1}{T}}\int _{0}^{T}\left\langle O(t)\right\rangle dt=\left\langle \lim _{T\to \infty }{\frac {1}{T}}\int _{0}^{T}O(\mathbb {Q} (t),\mathbb {P} (t))dt\right\rangle =\\=\left\langle {\overline {O}}(\mathbb {Q} ,\mathbb {P} )\right\rangle =\left\langle a^{*}\right\rangle =a^{*}\end{aligned}}}
This ultimately implies that ${\displaystyle \left\langle O\right\rangle ={\overline {O}}}$, which is what we wanted to show.
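The equality of the two averages can be checked explicitly on the simplest example, a single harmonic oscillator with ${\textstyle {\mathcal {H}}=(q^{2}+p^{2})/2}$: its energy surface is the circle ${\textstyle q^{2}+p^{2}=2E}$, on which the microcanonical measure is uniform in the angle, and both averages of ${\textstyle q^{2}}$ equal ${\textstyle E}$. The script below is our own sketch of this comparison (the value of ${\textstyle E}$ and the discretizations are arbitrary).

```python
import math

E = 1.3                      # energy of the oscillator, H = (q^2 + p^2)/2
A = math.sqrt(2 * E)         # amplitude: the energy surface is the circle of radius A

# Time average of q(t)^2 along the trajectory q(t) = A cos(t)
T, n = 1000.0, 200_000
dt = T / n
time_avg = sum((A * math.cos(k * dt)) ** 2 for k in range(n)) * dt / T

# Microcanonical average: uniform measure in the angle on the energy circle
m = 100_000
ens_avg = sum((A * math.cos(2 * math.pi * j / m)) ** 2 for j in range(m)) / m
```

Both quantities converge to ${\textstyle E}$, as the theorem predicts; for one degree of freedom the trajectory covers the whole energy curve, so the system is trivially ergodic.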

So, we now know that if a system is ergodic then the microcanonical ensemble is well defined. But how can we tell whether a given system is ergodic? Unfortunately, in general we cannot: this is to date an open problem. We can, however, mention two other important classes of systems that can fail to be ergodic: magnets and glasses.

A magnet (as also shown throughout Statistical mechanics of phase transitions) can be considered as composed of small orientable magnetic dipoles (the spins of the atoms); at high temperatures the system is "disordered" and the dipoles are not aligned, but when the temperature drops below the so-called "critical" temperature ${\displaystyle T_{c}}$ these dipoles align along one of the possible directions in space. The system thus spontaneously breaks its internal symmetry, and such phenomena lead to ergodicity breaking: in fact, when ${\displaystyle T<T_{c}}$ it can be shown that the time it takes the system to spontaneously rearrange its magnetization along another direction grows with the size of the system. This means that in the thermodynamic limit the system will always remain in the same configurations, and so its representative point will not visit all the available regions of phase space (note that the configuration of the system is now given by the spin configuration, not by the positions of the particles). The same argument applies to other kinds of phase transitions that break a symmetry of the system, for example the solidification of a fluid.

Glasses are much more complicated systems, and many of their properties are still unknown. Their main characteristic is that they are neither crystalline solids nor fluids, so strictly speaking they are not at equilibrium: they tend to approach a crystalline configuration, but the process takes extremely long times (glass dynamics is often referred to as "sluggish dynamics").
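A finite-size shadow of the ergodicity breaking in magnets can be seen in a minimal Metropolis simulation of the 2D Ising model (our own sketch; the lattice size, temperature, number of sweeps and random seed are arbitrary choices). Below ${\displaystyle T_{c}\approx 2.27}$ (in units with ${\textstyle J=k_{B}=1}$), a system started fully magnetized keeps the sign of its magnetization for the whole run instead of sampling both symmetry-related sectors.

```python
import numpy as np

def metropolis_sweeps(spins, T, sweeps, rng):
    """Single-spin-flip Metropolis dynamics for a 2D Ising model (J = 1, periodic)."""
    L = spins.shape[0]
    for _ in range(sweeps):
        for _ in range(L * L):
            i, j = rng.integers(0, L, size=2)
            # energy cost of flipping spin (i, j)
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * spins[i, j] * nn
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] *= -1
    return spins

rng = np.random.default_rng(0)
L = 16
spins = np.ones((L, L), dtype=int)                             # start fully "up"
spins = metropolis_sweeps(spins, T=1.5, sweeps=200, rng=rng)   # T < Tc ~ 2.27
m = spins.mean()                                               # stays close to +1
```

On a truly infinite lattice the time to reverse the magnetization diverges, which is the ergodicity breaking discussed above; on this small lattice the reversal time is merely astronomically long compared to the run.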

1. This means that the time evolution of the representative point is such that it is much more probable to find it in certain regions of phase space than in others.
2. A purely harmonic potential would be too simple: with a proper change of coordinates, in fact, the system could be described as a set of ${\displaystyle N}$ independent oscillators (its normal modes).
3. We consider a thin energy shell instead of a hypersurface because it makes things simpler.