# Entropy as ignorance: information entropy

We now proceed to analyse the most general interpretation that can be given of entropy: a "measure" of our "ignorance" about a system. In fact, all the previous interpretations of entropy can be connected to this one: a system in equilibrium maximizes its entropy because we have lost all the information about the initial conditions except for the conserved quantities; maximizing the entropy therefore means maximizing our ignorance about the details of the system. On the other hand, the entropy of a mixture of different gases is a measure of the number of possible configurations of the system, given our ignorance about it.

Therefore, entropy can be regarded as a property not of the system itself but of our ignorance about it, represented by the ensemble of its possible configurations[1]. We have also always restricted ourselves to systems where our ignorance is maximal, namely where all the allowed configurations are equally probable; but what about systems where we have partial information and some configurations are more probable than others? As we will now see, the definition of entropy can be generalized to arbitrary probability distributions, and it finds applications in many other fields of physics and science.

In the microcanonical ensemble we have seen that the number of allowed states for a system of energy ${\displaystyle E}$ is the volume ${\displaystyle \Omega (E)}$ in phase space of the hypersurface of constant energy ${\displaystyle E}$, and that the phase space probability density is ${\displaystyle \rho =1/\Omega (E)}$ on this surface. From this we have defined the entropy as ${\displaystyle S=k_{B}\ln \Omega (E)=-k_{B}\ln \rho }$; formally we can write[2]:

${\displaystyle {\frac {S}{k_{B}}}=-\sum _{i=1}^{\Omega (E)}\rho \ln \rho =-\left\langle \ln \rho \right\rangle }$
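As a quick numerical sanity check (a Python sketch; the value of Ω is made up for illustration), for a uniform distribution ρ = 1/Ω each of the Ω terms in the sum contributes equally, and -Σ ρ ln ρ reduces to ln Ω:

```python
import math

# Hypothetical number of equally probable microstates (for illustration only)
omega = 1000

# Uniform probability on each microstate, as in the microcanonical ensemble
rho = 1.0 / omega

# S / k_B = -sum_i rho ln rho, a sum of omega identical terms
entropy_over_kB = -sum(rho * math.log(rho) for _ in range(omega))

print(entropy_over_kB)   # equals ln(omega) up to floating point error
print(math.log(omega))   # ln Omega(E)
```

The sum collapses to Ω · (1/Ω) ln Ω = ln Ω, which is exactly the statement S = k_B ln Ω(E) = -k_B ⟨ln ρ⟩.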

We could therefore argue that if a system is described by a general phase space probability density ${\displaystyle \rho (\mathbb {Q} ,\mathbb {P} )}$ (which can also describe a system out of equilibrium, so in general ${\displaystyle \rho }$ can depend explicitly on time), its entropy can be defined as:

${\displaystyle S=-k_{B}\left\langle \ln \rho \right\rangle =-k_{B}\int {\frac {d\Gamma }{h^{3N}N!}}\rho (\mathbb {Q} ,\mathbb {P} )\ln \rho (\mathbb {Q} ,\mathbb {P} )}$

(where we are implicitly assuming that ${\displaystyle \rho }$ has been made dimensionless so that ${\displaystyle \ln \rho }$ makes sense). We can immediately see that we recover the original definition of entropy in the microcanonical ensemble by substituting ${\displaystyle \rho ={\text{const.}}=1/\Omega (E)}$:

${\displaystyle S=k_{B}\int {\frac {d\Gamma }{h^{3N}N!}}{\frac {\ln \Omega (E)}{\Omega (E)}}=k_{B}{\frac {\ln \Omega (E)}{\Omega (E)}}\underbrace {\int {\frac {d\Gamma }{h^{3N}N!}}} _{\Omega (E)}=k_{B}\ln \Omega (E)}$
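To illustrate numerically that the dimensionless entropy -⟨ln ρ⟩ is largest when all configurations are equally probable, i.e. when our ignorance is maximal, here is a Python sketch comparing a uniform distribution with two hypothetical distributions that encode partial and complete information (the probability values are made up for illustration):

```python
import math

def entropy(p):
    """Dimensionless entropy S/k_B = -sum_i p_i ln p_i (with 0 ln 0 := 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
uniform = [1.0 / n] * n           # maximal ignorance: all states equally probable
peaked  = [0.7, 0.1, 0.1, 0.1]    # partial information: one state is favoured
certain = [1.0, 0.0, 0.0, 0.0]    # complete information: the state is known

print(entropy(uniform))   # ln 4, approximately 1.386: the maximum for 4 states
print(entropy(peaked))    # smaller: partial information lowers the entropy
print(entropy(certain))   # 0: no ignorance left
```

Any information that makes some configurations more probable than others lowers the entropy below its uniform value ln Ω, consistent with reading entropy as a measure of ignorance.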
1. Note that entropy can indeed be regarded as a measure of ignorance or, equivalently, of information, but it does not distinguish how useful that information is. In other words, having a lot of information from the point of view of entropy does not mean that we have useful information about the system.
2. As we will see in A niftier framework for the statistical ensembles, this definition of entropy allows one to derive the canonical and grand canonical ensembles from the microcanonical one in a systematic way.