# The microcanonical ensemble

Now that we have laid out the general framework, we can study the properties of the microcanonical ensemble: we consider a system with fixed values of energy ${\displaystyle E}$, volume ${\displaystyle V}$ and number of particles ${\displaystyle N}$. We have just seen that the ensemble of such a system consists of a multitude of equivalent microstates; since we have no additional information, we can assume that all these microscopic configurations are equally probable. In other words, we introduce statistics in our treatment by formulating the so-called a priori equal probability postulate:

If a system is in a given macroscopic configuration it can be found with equal probability in any of the microstates of its ensemble.

Mathematically this means introducing a probability density ${\displaystyle \rho (\mathbb {Q} ,\mathbb {P} )}$ that is constant on the ensemble of the system and zero elsewhere, namely:

${\displaystyle \rho (\mathbb {Q} ,\mathbb {P} )={\frac {\delta ({\mathcal {H}}-E)}{\Omega (E,V,N)}}}$
where ${\displaystyle \Omega (E,V,N)}$ is the volume occupied by the ensemble in phase space (i.e. the volume of the set of points corresponding to all the microstates of a given ensemble, which is a hypersurface of constant energy ${\displaystyle E}$):
${\displaystyle \Omega (E,V,N)=\int _{{\mathcal {H}}=E}d\Gamma =\int d\Gamma \delta ({\mathcal {H}}-E)}$
where ${\displaystyle d\Gamma }$ is a short notation for ${\textstyle d\mathbb {Q} d\mathbb {P} =\prod _{i=1}^{N}d{\vec {q}}_{i}d{\vec {p}}_{i}}$. If we now divide both sides by ${\displaystyle \Omega (E,V,N)}$ we see that ${\displaystyle \rho (\mathbb {Q} ,\mathbb {P} )}$ as we have defined it has indeed the meaning of a probability density[1]:

${\displaystyle 1=\int d\Gamma {\frac {\delta ({\mathcal {H}}-E)}{\Omega (E,V,N)}}:=\int d\Gamma \rho (\mathbb {Q} ,\mathbb {P} )}$
In particular the Dirac delta is needed to make ${\displaystyle \rho }$ vanish everywhere but on the hypersurface of energy ${\displaystyle E}$ in phase space, and ${\displaystyle \Omega }$ in order to correctly normalize ${\displaystyle \rho }$.

Therefore since we have introduced a probability density in phase space, if in general we define an observable ${\displaystyle O(\mathbb {Q} ,\mathbb {P} )}$ as a function of all the positions and momenta of the particles we can define its mean value in the ensemble as:

${\displaystyle \left\langle O(\mathbb {Q} ,\mathbb {P} )\right\rangle =\int \rho (\mathbb {Q} ,\mathbb {P} )O(\mathbb {Q} ,\mathbb {P} )d\Gamma }$
This is the value of ${\displaystyle O}$ that we actually measure: since the microstate of the system moves continuously through the ensemble, and since macroscopic measurements take many orders of magnitude longer than the time scales typical of microscopic dynamics, we can only measure ensemble averages[2]. Now, the mean value of an observable is significant only if its variance is small; otherwise repeated measurements of the same quantity under the same conditions would fluctuate over a wide range, making the observable essentially meaningless. For example, since we know that a system in equilibrium has a constant value of energy, we expect that a good theory of statistical mechanics will show that the statistical fluctuations of energy are (at least for macroscopic systems) very small and thus negligible; of course the same consideration extends to any observable, like the number of particles of a system[3]. We will show that (fortunately!) this is always the case.
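
As an illustration of the ensemble average defined above, the following minimal sketch estimates one such average by random sampling. It rests on assumptions that are not part of the text: the Hamiltonian is taken to be that of free particles, ${\displaystyle {\mathcal {H}}=\sum _{i}{\vec {p}}_{i}^{\,2}/2m}$, so that the energy hypersurface is a sphere in momentum space and positions play no role; units are arbitrary; and the function names are hypothetical. By symmetry, each of the ${\displaystyle 3N}$ momentum components should carry on average an energy ${\displaystyle E/3N}$.

```python
import math
import random


def sample_energy_shell(n_particles, energy, mass, rng):
    """Draw one momentum configuration uniformly on the shell sum(p^2)/(2m) = E.

    Assumption: free particles, so the shell is a sphere of radius sqrt(2 m E)
    in the 3N-dimensional momentum space.
    """
    dim = 3 * n_particles
    # The direction of a vector of independent Gaussians is uniformly distributed.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    radius = math.sqrt(2.0 * mass * energy)
    return [radius * x / norm for x in g]


def microcanonical_average(observable, n_particles, energy, mass, n_samples, seed=0):
    """Monte Carlo estimate of <O> as a plain mean over shell samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += observable(sample_energy_shell(n_particles, energy, mass, rng))
    return total / n_samples


N, E, M = 10, 30.0, 1.0  # illustrative values in arbitrary units


def kinetic_x1(p):
    """Kinetic energy carried by the x-component of the first particle."""
    return p[0] ** 2 / (2.0 * M)


avg = microcanonical_average(kinetic_x1, N, E, M, n_samples=20000)
print("sampled average:", avg, " symmetry prediction E/(3N):", E / (3 * N))
```

The sampled value agrees with ${\displaystyle E/3N}$ only up to statistical noise, which shrinks as the number of samples grows.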

We can however already get a taste of that in a very simple situation: consider a gas of ${\displaystyle 2N}$ particles in a cubic box of side ${\displaystyle L}$ and let us mentally divide this box into two halves, the right and the left one. We ask: what is the probability to find ${\displaystyle N-m}$ particles in the right half and ${\displaystyle N+m}$ in the left one? Since intuitively the probability to find a single particle in one half of the box is ${\displaystyle 1/2}$ we will have:

${\displaystyle P(m)={\frac {1}{2^{2N}}}{\begin{pmatrix}2N\\N+m\end{pmatrix}}={\frac {1}{2^{2N}}}{\frac {(2N)!}{(N+m)!\cdot (N-m)!}}}$
where we have introduced the binomial factor because all the configurations which differ only by an exchange of particles are equivalent, since the particles are identical. The original question we asked can now be rephrased as: how does ${\displaystyle P(m)}$ change with ${\displaystyle m}$? If we call ${\displaystyle p}$ the probability to find a particle in the left half and ${\displaystyle q}$ the probability to find it in the right one (we will later set both equal to ${\displaystyle 1/2}$, but let's distinguish them for now), we have:
${\displaystyle P(m)=p^{N+m}q^{N-m}{\begin{pmatrix}2N\\N+m\end{pmatrix}}}$
and then:
${\displaystyle \left\langle N+m\right\rangle =\sum _{m}(N+m)P(m)=p{\frac {\partial }{\partial p}}\underbrace {\sum _{m}P(m)} _{(p+q)^{2N}}=p\cdot 2N(p+q)^{2N-1}}$
Setting ${\displaystyle p=q=1/2}$ we get:
${\displaystyle \left\langle N+m\right\rangle =N\quad \Rightarrow \quad \left\langle m\right\rangle =0}$
which is rather reasonable. Therefore, we expect that the configuration where ${\displaystyle N}$ particles are in the right half and ${\displaystyle N}$ in the left one is the most probable for our system. However, how much more probable is this configuration with respect to the other ones? In order to understand that let us also compute the standard deviation from the mean value ${\textstyle \left\langle N+m\right\rangle }$. We have:
${\displaystyle p^{2}{\frac {\partial ^{2}}{\partial p^{2}}}\sum _{m}P(m)=\sum _{m}(N+m)(N+m-1)P(m)=\left\langle (N+m)^{2}\right\rangle -\left\langle N+m\right\rangle \quad \Rightarrow }$
${\displaystyle {\begin{aligned}\Rightarrow \quad \left\langle (N+m)^{2}\right\rangle =p^{2}{\frac {\partial }{\partial p}}\left[2N(p+q)^{2N-1}\right]+p2N(p+q)^{2N-1}=\\=p^{2}2N(2N-1)(p+q)^{2N-2}+p2N(p+q)^{2N-1}\end{aligned}}}$
and setting ${\displaystyle p=q=1/2}$:
${\displaystyle \left\langle (N+m)^{2}\right\rangle =N^{2}+{\frac {N}{2}}}$
Therefore:
${\displaystyle \sigma _{\left\langle N+m\right\rangle }^{2}=\left\langle (N+m)^{2}\right\rangle -\left\langle N+m\right\rangle ^{2}={\frac {N}{2}}\quad \Rightarrow \quad \sigma _{\left\langle N+m\right\rangle }={\sqrt {\frac {N}{2}}}}$
This means that the relative fluctuation is:
${\displaystyle {\frac {\sigma _{\left\langle N+m\right\rangle }}{\left\langle N+m\right\rangle }}={\frac {1}{\sqrt {2N}}}}$
which turns out to be astonishingly small: in fact if ${\displaystyle N\sim 10^{23}}$ this relative fluctuation is of the order of ${\displaystyle 10^{-12}}$. We can therefore conclude that the fluctuations of the number of particles in the two halves from their mean values are absolutely negligible (we never observe the gas spontaneously occupying only one half of the system!).
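
The mean, variance and relative fluctuation found above can be checked by summing the binomial distribution ${\displaystyle P(m)=p^{N+m}q^{N-m}{\tbinom {2N}{N+m}}}$ directly. The sketch below does this for a few modest values of ${\displaystyle N}$; the function name `occupation_moments` and the chosen values of ${\displaystyle N}$ are illustrative assumptions, and the sum is only practical for small systems.

```python
import math


def occupation_moments(n_half, p=0.5):
    """Exact mean and variance of the left-half occupation N+m for 2N particles."""
    total = 2 * n_half
    q = 1.0 - p
    mean = 0.0
    second = 0.0
    for k in range(total + 1):  # k plays the role of N + m
        prob = math.comb(total, k) * p**k * q ** (total - k)
        mean += k * prob
        second += k * k * prob
    return mean, second - mean**2


for n_half in (10, 100, 500):
    mean, var = occupation_moments(n_half)
    relative = math.sqrt(var) / mean
    # Compare with the analytic results N, N/2 and 1/sqrt(2N).
    print(n_half, mean, var, relative, 1.0 / math.sqrt(2 * n_half))
```

The last two columns should agree, and both decrease as ${\displaystyle N^{-1/2}}$ when ${\displaystyle N}$ grows.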

We can also obtain the same result in a slightly more complicated way, but which allows us to extract some more interesting information on the system. In order to do that let us consider the logarithm of ${\displaystyle P(m)}$:

${\displaystyle \ln P(m)=-2N\ln 2+\ln(2N)!-\ln(N-m)!-\ln(N+m)!}$
Using Stirling's approximation (see the appendix Stirling's approximation) for large ${\displaystyle N}$ we get:
${\displaystyle {\begin{aligned}\ln P(m)\sim -2N\ln 2+2N\ln(2N)+\ln {\sqrt {2\pi \cdot 2N}}-(N-m)\ln(N-m)-\\-\ln {\sqrt {2\pi (N-m)}}-(N+m)\ln(N+m)-\ln {\sqrt {2\pi (N+m)}}\end{aligned}}}$
and with some algebraic reshuffling we obtain:
${\displaystyle {\begin{aligned}\ln P(m)\sim -{\frac {1}{2}}\ln(N\pi )-N\left[\left(1-{\frac {m}{N}}\right)\ln \left(1-{\frac {m}{N}}\right)+\left(1+{\frac {m}{N}}\right)\ln \left(1+{\frac {m}{N}}\right)\right]-\\-{\frac {1}{2}}\left[\ln \left(1-{\frac {m}{N}}\right)+\ln \left(1+{\frac {m}{N}}\right)\right]\end{aligned}}}$
(note that ${\displaystyle \ln P(m)}$ is even in ${\displaystyle m}$, as we could have expected). If we now suppose that ${\displaystyle m\ll N}$, and the previous computation showed that this is indeed the case, then:
${\displaystyle \ln \left(1\pm {\frac {m}{N}}\right)\sim \pm {\frac {m}{N}}-{\frac {1}{2}}{\frac {m^{2}}{N^{2}}}+\cdots }$
and plugging this approximation to the second order in ${\displaystyle \ln P(m)}$ we get:
${\displaystyle \ln P(m)=-{\frac {1}{2}}\ln(N\pi )-{\frac {m^{2}}{N}}+O\left({\frac {m^{2}}{N^{2}}}\right)+O\left({\frac {m^{4}}{N^{3}}}\right)}$
and exponentiating:
${\displaystyle P(m)\approx {\frac {1}{\sqrt {\pi N}}}e^{-{\frac {m^{2}}{N}}}}$

Therefore, we learn the interesting fact that for macroscopic systems, if ${\displaystyle m\ll N}$, the excess ${\displaystyle m}$ of particles in one half of the system over the other follows a Gaussian distribution with standard deviation ${\textstyle \sigma _{N}={\sqrt {N/2}}}$; we have therefore found the same result as before, since the relative fluctuation is again of the order ${\displaystyle N^{-1/2}}$.
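
To see how good the Gaussian approximation is, one can compare it with the exact binomial probability for ${\displaystyle p=q=1/2}$. The following is a minimal sketch; the function names and the value ${\displaystyle N=1000}$ are illustrative assumptions. Exact integer arithmetic is used for the binomial coefficient so that no overflow occurs.

```python
import math


def exact_probability(n_half, m):
    """P(m) for 2N particles with p = q = 1/2, using exact integers."""
    return math.comb(2 * n_half, n_half + m) / 4**n_half


def gaussian_probability(n_half, m):
    """Gaussian approximation exp(-m^2 / N) / sqrt(pi N)."""
    return math.exp(-m * m / n_half) / math.sqrt(math.pi * n_half)


n_half = 1000
for m in (0, 10, 30, 60):
    exact = exact_probability(n_half, m)
    approx = gaussian_probability(n_half, m)
    print(m, exact, approx, approx / exact)
```

The ratio in the last column stays close to one as long as ${\displaystyle m\ll N}$ and drifts away from it in the far tails, where the expansion of the logarithm is no longer accurate.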

1. The expression for ${\displaystyle \Omega (E,V,N)}$ could also have been derived in a different but equivalent way. Let us suppose that the energy of the system, instead of being exactly equal to ${\displaystyle E}$, can belong to the interval ${\displaystyle [E,E+\Delta E]}$ with ${\displaystyle \Delta E\ll E}$. This means that in phase space the system will occupy the region enclosed by the two hypersurfaces of energy ${\displaystyle E}$ and ${\displaystyle E+\Delta E}$; the volume of this region can be written as:
${\displaystyle \Omega (E,V,N)\Delta E=\int d\Gamma \left\lbrace \Theta \left[{\mathcal {H}}(\mathbb {Q} ,\mathbb {P} )-E\right]-\Theta \left[{\mathcal {H}}(\mathbb {Q} ,\mathbb {P} )-(E+\Delta E)\right]\right\rbrace }$
where ${\displaystyle \Theta }$ is the Heaviside step function. Within the theory of distributions it can be shown that, formally, the derivative of ${\displaystyle \Theta }$ is the Dirac ${\displaystyle \delta }$ function, i.e. ${\displaystyle d\Theta (x)/dx=\delta (x)}$, so that:
${\displaystyle {\begin{aligned}\Theta \left[{\mathcal {H}}-E\right]-\Theta \left[{\mathcal {H}}-(E+\Delta E)\right]=-{\frac {d}{dE}}\Theta ({\mathcal {H}}-E)\Delta E=-\Delta E\left[-\delta ({\mathcal {H}}-E)\right]=\\=\Delta E\cdot \delta ({\mathcal {H}}-E)\end{aligned}}}$
and thus:
${\displaystyle \Omega (E,V,N)=\int d\Gamma \delta ({\mathcal {H}}-E)}$
On the other hand, we could have equivalently obtained the expression of ${\displaystyle \rho (\mathbb {Q} ,\mathbb {P} )}$ from the general definition of the mean value of an observable ${\displaystyle O}$ over this ensemble:
${\displaystyle {\begin{aligned}\left\langle O(\mathbb {Q} ,\mathbb {P} )\right\rangle ={\frac {1}{\Omega (E,V,N)\Delta E}}\int d\Gamma \left\lbrace \Theta \left[{\mathcal {H}}(\mathbb {Q} ,\mathbb {P} )-E\right]-\right.\\\left.-\Theta \left[{\mathcal {H}}(\mathbb {Q} ,\mathbb {P} )-(E+\Delta E)\right]\right\rbrace O(\mathbb {Q} ,\mathbb {P} )\end{aligned}}}$
and proceeding in the same way.
2. In other words, it is impossible to measure a macroscopic quantity relative to a single microstate of the system: in the time the measurement takes, the system will have passed through many different microscopic configurations, and what we measure is the average (of course weighted with the probability density of the ensemble) over all the configurations visited. This also relates to what we will see in The foundations of statistical mechanics and in the appendix A more convincing foundation of statistical mechanics.
3. Of course energy and particle number cannot fluctuate in the microcanonical ensemble, since both ${\displaystyle E}$ and ${\displaystyle N}$ are fixed; such fluctuations will become possible in the other ensembles.