The way we have obtained the canonical and grand canonical partition functions from the microcanonical ensemble in The canonical ensemble and The grand canonical ensemble is a rather "classic" one, and maybe also the most intuitive.
However, this is not the only possible route: in fact, as we now want to show, it is possible to obtain all the ensembles (including the microcanonical one) from the principle of maximum entropy, where the entropy is defined in the most general way, i.e.:

$$S = -\sum_i p_i \ln p_i$$
In other words, what we want to show is that by maximizing the entropy of a system, written as above, with appropriately chosen constraints, it is possible to determine both the canonical and the grand canonical ensembles.
Let us first consider a rather different but simple and intuitive example to understand how this is possible.
Suppose we have a normal six-sided die; if we know nothing about it (namely, we don't know whether it has been fixed or not) then all the possible rolls have the same probability, i.e. $p_i = 1/6$ for $i = 1, \dots, 6$. This fact can also be obtained from the maximization of Shannon's entropy (we remove any proportionality constant for simplicity):

$$S = -\sum_{i=1}^{6} p_i \ln p_i$$

with the constraint

$$\sum_{i=1}^{6} p_i = 1$$

In fact (as must be done for constrained optimization problems like this one) the maximization of $S$ with this constraint is equivalent to the unconstrained maximization of

$$\widetilde{S} = -\sum_{i=1}^{6} p_i \ln p_i - \lambda \left( \sum_{i=1}^{6} p_i - 1 \right)$$

Requiring $\partial \widetilde{S} / \partial p_i = 0$ gives

$$-\ln p_i - 1 - \lambda = 0 \quad \Rightarrow \quad p_i = e^{-(1+\lambda)} = e^{-\lambda'}$$

where we have simply relabelled the constant in the last step (note that $\lambda'$ doesn't depend on $i$); therefore from $\sum_i p_i = 1$ we have exactly

$$p_i = \frac{1}{6}$$
Now, suppose that the die has been fixed so that the mean outcome is $\langle i \rangle = \sum_{i=1}^{6} i \, p_i = m$, with $m \neq 7/2$; in order to find the new probabilities we now have to maximize $S$ with the additional constraint $\sum_i i \, p_i = m$. Therefore:

$$\widetilde{S} = -\sum_{i=1}^{6} p_i \ln p_i - \lambda \left( \sum_{i=1}^{6} p_i - 1 \right) - \mu \left( \sum_{i=1}^{6} i \, p_i - m \right)$$

and requiring that $\partial \widetilde{S} / \partial p_i = 0$ we get:

$$p_i = e^{-(1+\lambda) - \mu i} = \frac{e^{-\mu i}}{\sum_{j=1}^{6} e^{-\mu j}}$$

where in the last step we have used the normalization condition; the remaining multiplier $\mu$ is fixed by the constraint on $\langle i \rangle$.
So we see that we indeed managed to reconstruct the entire probability distribution of the system only from the maximization of its entropy, with the appropriate constraints.
Let us now see this more in general: suppose we have a system which can be found in different states labelled by an index $i$ (for simplicity we now consider the discrete case), each with probability $p_i$. Let us also suppose that we have put some constraints on the mean values of some observables defined on this system, i.e.:

$$\langle f_k \rangle = \sum_i p_i f_k(i) = F_k \qquad k = 0, 1, \dots, K$$

where the $f_k$ are some functions depending on the state $i$ (considering the previous example of the die, with our notation we have $f_1(i) = i$) and the $F_k$ are some given values of the observables. We have also put the normalization condition in the same form as the other constraints (with $f_0(i) = 1$ and $F_0 = 1$) in order to have a more general notation.
As we know, the entropy of the system will be given by (we again drop any constant in front of $S$):

$$S = -\sum_i p_i \ln p_i$$

Let us therefore see what happens if we maximize $S$ with the constraints $\langle f_k \rangle = F_k$.
What we have to find is:

$$\max_{\{p_i\}} \left[ -\sum_i p_i \ln p_i - \sum_{k=0}^{K} \lambda_k \left( \sum_i p_i f_k(i) - F_k \right) \right]$$

where $\{p_i\}$ is a short notation to indicate the set of the probabilities. Setting to zero the derivative with respect to $p_i$, and using $f_0 = 1$, we get:

$$p_i = e^{-1-\lambda_0} \, e^{-\sum_{k=1}^{K} \lambda_k f_k(i)}$$

i.e. the probabilities seen as functions of the multipliers $\lambda_k$. From the normalization condition we have:

$$1 = \sum_i p_i = e^{-1-\lambda_0} \sum_i e^{-\sum_{k=1}^{K} \lambda_k f_k(i)}$$
If we define:

$$Z(\lambda_1, \dots, \lambda_K) := \sum_i e^{-\sum_{k=1}^{K} \lambda_k f_k(i)}$$

then the normalization condition fixes $e^{-1-\lambda_0} = 1/Z$, and therefore:

$$p_i = \frac{1}{Z} e^{-\sum_{k=1}^{K} \lambda_k f_k(i)}$$

which has a very familiar form (the one of the canonical and grand canonical probability densities).
Now, in order to solve the problem we still have to impose all the other constraints: $\langle f_k \rangle = F_k$ for $k = 1, \dots, K$. These can be written as:

$$F_k = \sum_i p_i f_k(i) = \frac{1}{Z} \sum_i f_k(i) \, e^{-\sum_{l=1}^{K} \lambda_l f_l(i)}$$

From the definition of $Z$ we see that:

$$-\frac{\partial}{\partial \lambda_k} \ln Z = \frac{1}{Z} \sum_i f_k(i) \, e^{-\sum_{l=1}^{K} \lambda_l f_l(i)} = \langle f_k \rangle$$

which has exactly the same form as the equation that defines the mean value of the energy in the canonical ensemble, for example.
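The identity $-\partial_{\lambda_k} \ln Z = \langle f_k \rangle$ is easy to check numerically; here is a minimal sketch (not from the text) that verifies it by finite differences for a single observable $f(i) = i$ on the six states of the die:

```python
import math

# Six-state system (the die) with a single observable f(i) = i and
# multiplier lam; we check -d(ln Z)/d(lam) = <f> by finite differences.

f = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

def log_Z(lam):
    # ln Z(lam) = ln sum_i exp(-lam * f(i))
    return math.log(sum(math.exp(-lam * fi) for fi in f))

def mean_f(lam):
    # <f> = sum_i f(i) p_i with p_i = exp(-lam * f(i)) / Z
    Z = sum(math.exp(-lam * fi) for fi in f)
    return sum(fi * math.exp(-lam * fi) for fi in f) / Z

lam, h = 0.3, 1e-6
# central finite difference for -d(ln Z)/d(lam)
deriv = -(log_Z(lam + h) - log_Z(lam - h)) / (2 * h)
```

The finite-difference derivative agrees with the ensemble average $\langle f \rangle$ up to the discretization error.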
Therefore, the values $\lambda_k^*$ of the parameters $\lambda_k$ that maximize the entropy with the constraints $\langle f_k \rangle = F_k$ are the solutions of the equations:

$$-\frac{\partial}{\partial \lambda_k} \ln Z(\lambda_1, \dots, \lambda_K) = F_k \qquad k = 1, \dots, K$$
These equations are in general very difficult to solve analytically, but there is a rather simple method, which we now briefly describe, that allows one to determine the parameters $\lambda_k^*$ numerically.
Let us begin by noting that:

$$\frac{\partial^2}{\partial \lambda_k \partial \lambda_l} \ln Z = \langle f_k f_l \rangle - \langle f_k \rangle \langle f_l \rangle = \big\langle (f_k - \langle f_k \rangle)(f_l - \langle f_l \rangle) \big\rangle$$

where the last term is the mean value of the product of two fluctuations: this is the covariance of $f_k$ and $f_l$. In general, the $(k,l)$-th element of the covariance matrix $C$ is exactly defined as the covariance between $f_k$ and $f_l$. Therefore, we have that:

$$C_{kl} = \frac{\partial^2}{\partial \lambda_k \partial \lambda_l} \ln Z$$
The covariance matrix is positive (semi)definite; in fact, if $v$ is a generic vector, then:

$$\sum_{k,l} v_k C_{kl} v_l = \Big\langle \Big( \sum_k v_k \left( f_k - \langle f_k \rangle \right) \Big)^2 \Big\rangle \geq 0$$
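The positivity of the quadratic form $v \cdot C v$ can also be checked directly; here is a small sketch (an illustrative choice of probabilities and observables, not from the text) with two observables $f_1(i) = i$, $f_2(i) = i^2$ on six states:

```python
import random

# Six states with some fixed probabilities p_i and two observables
# f_1(i) = i, f_2(i) = i^2 (an arbitrary illustrative choice).
p = [0.1, 0.25, 0.15, 0.2, 0.1, 0.2]
obs = [[float(i), float(i * i)] for i in range(1, 7)]

def cov(k, l):
    # C_kl = <f_k f_l> - <f_k><f_l>
    mk = sum(pi * o[k] for pi, o in zip(p, obs))
    ml = sum(pi * o[l] for pi, o in zip(p, obs))
    return sum(pi * o[k] * o[l] for pi, o in zip(p, obs)) - mk * ml

C = [[cov(k, l) for l in range(2)] for k in range(2)]

# the quadratic form v.C.v is non-negative for any vector v
random.seed(0)
quad_forms = [
    sum(v[k] * C[k][l] * v[l] for k in range(2) for l in range(2))
    for v in ([random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(100))
]
```

All the sampled quadratic forms come out non-negative, as the fluctuation formula above guarantees.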
We needed these observations because if we now define the function:

$$F(\lambda_1, \dots, \lambda_K) := \ln Z(\lambda_1, \dots, \lambda_K) + \sum_{k=1}^{K} \lambda_k F_k$$

we have that:

$$\frac{\partial F}{\partial \lambda_k} = \frac{\partial \ln Z}{\partial \lambda_k} + F_k = F_k - \langle f_k \rangle$$

so $F$ has an extremum in $\lambda_k = \lambda_k^*$, i.e. exactly where the constraints are satisfied; moreover, its Hessian matrix is

$$\frac{\partial^2 F}{\partial \lambda_k \partial \lambda_l} = \frac{\partial^2}{\partial \lambda_k \partial \lambda_l} \ln Z = C_{kl}$$

which is positive (semi)definite, and so this extremum is a minimum: $F$ is minimized by the values $\lambda_k^*$ of the parameters which maximize the entropy of the system.
Therefore, in this way we can simply determine the values $\lambda_k^*$ by finding the minima of $F$, which is a rather straightforward computational problem.
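As an illustration, here is a minimal numerical sketch (not from the text) for the fixed-die example: $F(\mu) = \ln Z(\mu) + \mu m$ is convex, so its minimum can be found by locating the zero of $F'(\mu) = m - \langle i \rangle$ by bisection; the target mean $m = 4.5$ is an assumed value.

```python
import math

# Fixed-die example: find the multiplier mu such that p_i ∝ exp(-mu*i)
# reproduces a prescribed mean roll m (here m = 4.5, an assumed value).

def mean_roll(mu):
    # <i> under p_i = exp(-mu*i)/Z, for i = 1..6
    w = [math.exp(-mu * i) for i in range(1, 7)]
    return sum(i * wi for i, wi in zip(range(1, 7), w)) / sum(w)

def solve_mu(m, lo=-20.0, hi=20.0, n_iter=100):
    # F(mu) = ln Z(mu) + mu*m is convex; its minimum sits where
    # F'(mu) = m - <i> = 0.  Since <i> is strictly decreasing in mu,
    # we can find the root of F' by bisection.
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mean_roll(mid) > m:
            lo = mid   # mean still too high: increase mu
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = solve_mu(4.5)
Z = sum(math.exp(-mu * i) for i in range(1, 7))
p = [math.exp(-mu * i) / Z for i in range(1, 7)]
```

Since $m = 4.5 > 7/2$, the solver returns a negative $\mu$, i.e. a distribution tilted towards the high rolls.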
Let us now briefly see what happens in the continuous case, so that we can use what we have seen in the framework of ensemble theory.
Since we are now dealing with continuous probability densities $\rho$, they will not depend on the discrete index $i$ but on the "continuous index" given by the point $(\mathbb{Q}, \mathbb{P})$ in phase space, and of course the summations over $i$ must be replaced with integrations in phase space. In other words, the entropy of the system will be:

$$S = -\int \rho(\mathbb{Q}, \mathbb{P}) \ln \rho(\mathbb{Q}, \mathbb{P}) \, d\Gamma$$

and the constraints are:

$$\int \rho(\mathbb{Q}, \mathbb{P}) \, f_k(\mathbb{Q}, \mathbb{P}) \, d\Gamma = F_k \qquad k = 0, 1, \dots, K$$
The probability density will be of the form:

$$\rho = \frac{1}{Z} e^{-\sum_{k=1}^{K} \lambda_k f_k} \qquad Z = \int e^{-\sum_{k=1}^{K} \lambda_k f_k} \, d\Gamma$$

and the values $\lambda_k^*$ of the parameters will again be the solutions of the equations:

$$-\frac{\partial}{\partial \lambda_k} \ln Z = F_k \qquad k = 1, \dots, K$$
Let us now apply all this to the ensemble theory.
In the microcanonical ensemble we only have the normalization constraint:

$$\int \rho \, d\Gamma = 1$$

where the integration is done over the phase space points that satisfy $\mathcal{H}(\mathbb{Q}, \mathbb{P}) = E$, with $\mathcal{H}$ the Hamiltonian of the system and $E$ the fixed value of the energy. In this case, therefore, the only non-null "observable" is $f_0 = 1$, which as we have seen is a "fictitious" one (defined so that also the normalization condition can be put in the form of a constraint on the mean value of a given observable). In other words, referring to our notation we have $K = 0$, and the probability density has indeed the form:

$$\rho = \frac{1}{Z} = \text{const.}$$

where we have called $Z$ the normalization factor.
The value of $Z$ can be obtained intuitively as we have done in The microcanonical ensemble, i.e. since $\rho$ must be zero everywhere except on the phase space hypersurface of constant energy $E$, whose volume is $\Omega(E)$, we have:

$$\rho = \frac{1}{\Omega(E)}$$
In the canonical ensemble we have a new constraint, i.e. we require the mean value of the energy to be fixed:

$$\langle \mathcal{H} \rangle = \int \rho \, \mathcal{H} \, d\Gamma = E$$

With our previous notation we have $f_1 = \mathcal{H}$ and $\lambda_1 = \beta$, so that:

$$\rho = \frac{e^{-\beta \mathcal{H}}}{Z} \qquad Z = \int e^{-\beta \mathcal{H}} \, d\Gamma$$
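As a minimal concrete instance (a two-level system with energies $0$ and $\varepsilon$, an illustrative choice not discussed in the text), the constraint equation $-\partial_\beta \ln Z = E$ can be solved explicitly:

$$Z = 1 + e^{-\beta \varepsilon} \qquad -\frac{\partial}{\partial \beta} \ln Z = \frac{\varepsilon \, e^{-\beta \varepsilon}}{1 + e^{-\beta \varepsilon}} = E \quad \Rightarrow \quad \beta = \frac{1}{\varepsilon} \ln \frac{\varepsilon - E}{E}$$

so for $0 < E < \varepsilon/2$ the multiplier $\beta$ is positive, consistently with its interpretation as an inverse temperature.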
In the grand canonical ensemble, then, we have the additional constraint of having the mean value of the number of particles fixed, namely $\langle N \rangle = \overline{N}$. Explicitly, we have that the entropy of the system is:

$$S = -\sum_{N=0}^{\infty} \int \rho_N \ln \rho_N \, d\Gamma_N$$

and the constraints are:

$$\sum_{N=0}^{\infty} \int \rho_N \, d\Gamma_N = 1 \qquad \sum_{N=0}^{\infty} \int \rho_N \, \mathcal{H} \, d\Gamma_N = E \qquad \sum_{N=0}^{\infty} N \int \rho_N \, d\Gamma_N = \overline{N}$$

In this case, we will have:

$$\rho_N = \frac{e^{-\beta (\mathcal{H} - \mu N)}}{\mathcal{Z}} \qquad \mathcal{Z} = \sum_{N=0}^{\infty} \int e^{-\beta (\mathcal{H} - \mu N)} \, d\Gamma_N$$

where $\beta$ and $-\beta\mu$ are the Lagrange multipliers relative to the energy and particle number constraints.
We conclude with an observation.
We have determined the properties of the ensembles by fixing the values of the first moments of the observables (i.e., $\langle \mathcal{H} \rangle$ and $\langle N \rangle$); we can ask: why haven't we also fixed other moments (namely $\langle \mathcal{H}^2 \rangle$, $\langle \mathcal{H}^3 \rangle$, etc.)?
In general it can happen that those additional moments are redundant; let us see a simple example in order to understand this.
Suppose $x$ is a stochastic variable distributed according to a probability distribution $\rho(x)$; imagine that we are given $M$ values $x_1, \dots, x_M$ of $x$ without knowing $\rho$, and that we want to understand what $\rho$ is from the $x_i$-s. How can we proceed? We could try to guess $\rho$ with a procedure similar to what we have seen now. For example, we could compute the $n$-th moments of $x$ with $n = 1, 2, 3$, namely $\langle x \rangle$, $\langle x^2 \rangle$ and $\langle x^3 \rangle$. Then, our guess for $\rho$ would be:

$$\rho(x) = \frac{1}{Z} e^{-\lambda_1 x - \lambda_2 x^2 - \lambda_3 x^3}$$

and the values $\lambda_n^*$ which give the correct expression of $\rho$ are given by the solutions of:

$$-\frac{\partial}{\partial \lambda_n} \ln Z = \langle x^n \rangle \qquad n = 1, 2, 3$$

where each $\langle x^n \rangle$ is computed from the given set of data as $\frac{1}{M} \sum_{i=1}^{M} x_i^n$.
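A small sketch of how the empirical moments on the right-hand sides would be computed from the data (the exponential distribution used here is an assumed example, sampled by inverse-transform sampling):

```python
import math
import random

# Assumed example: the (unknown) distribution is rho(x) = e^{-x}, x >= 0,
# sampled via inverse-transform sampling x = -ln(1 - u), u uniform in [0,1).
# For this rho the exact moments <x^n> are n!, i.e. 1, 2 and 6.
random.seed(42)
M = 100_000
xs = [-math.log(1.0 - random.random()) for _ in range(M)]

# empirical moments <x^n> = (1/M) * sum_i x_i^n, for n = 1, 2, 3
moments = {n: sum(x ** n for x in xs) / M for n in (1, 2, 3)}
```

With $M = 100{,}000$ samples the empirical moments land close to the exact values, and they converge to them as $M$ grows.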
If we determine the $\lambda_n^*$ with an increasing number $M$ of data, we expect (or at least hope) that the parameters will tend to some definite values; what happens is that the redundant ones often tend to zero when $M \to \infty$. For example, if in reality $\rho(x) \propto e^{-x}$, then repeating the computations with higher values of $M$ we would find $\lambda_2^*, \lambda_3^* \to 0$: the second and third moments of $x$ are useless if we want to determine $\rho$.
Let us note, however, that the use of the moments of $x$ can be useless in some cases: if in fact, for example,

$$\rho(x) \propto \frac{1}{1 + x^2}$$

then we will never be able to express it as a product of exponentials of powers of $x$, so the parameters $\lambda_n^*$ will not tend to definite values. What can we do in this case? We can use the moments of other functions of $x$; in fact, if we compute for example $\langle \ln(1 + x^2) \rangle$, then our guess for $\rho$ would be:

$$\rho(x) = \frac{1}{Z} e^{-\lambda \ln(1 + x^2)} = \frac{1}{Z} \frac{1}{(1 + x^2)^{\lambda}}$$

and so in this case we would expect $\lambda^* \to 1$; we would also see that if we included some higher moments of $\ln(1 + x^2)$, their relative parameters would all have gone to zero, and so all the $n$-th moments with $n \geq 2$ are redundant.
Therefore, we see that depending on the kind of probability distribution we must use different "recipes" in order to determine $\rho$.
However, in ensemble theory something slightly different happens. Suppose in fact that we have fixed the first two moments of the energy in the canonical ensemble; then we would have:

$$\rho \propto e^{-\lambda_1 \mathcal{H} - \lambda_2 \mathcal{H}^2}$$

Now, the energy is extensive, so $\mathcal{H} \sim N$ and $\mathcal{H}^2 \sim N^2$, and thus in the thermodynamic limit the leading term in the exponential would be $\mathcal{H}^2$; in general, if we fixed an arbitrary number $n$ of moments of $\mathcal{H}$, the leading one would be the last, $\mathcal{H}^n$. This, however, doesn't really make sense from a physical point of view, since it would imply that the only significant contribution is that of the $n$-th moment, with $n \to \infty$.
Therefore, in the case of statistical mechanics we are (although implicitly) assuming that the moments different from the first are actually not significant, since strictly speaking there is nothing that would prevent us from fixing also their values. It is the remarkable agreement of the predictions made by statistical mechanics with the experimental results that confirms that this is a reasonable assumption.
- ↑ At this point there is no way to justify this identification, and we will encounter something similar in the grand canonical ensemble. This is the disadvantage of this way of deducing the statistical ensembles: it is elegant and mathematically consistent, but not really physically intuitive. The "classical" way we have used to derive the ensembles is surely less formal and rigorous, but it allows one to understand physically what happens.
- ↑ It could then be asked what we can do if we know absolutely nothing about $\rho$. In this case there is nothing that can help besides experience; in such cases, in fact, one tries to get some insight into the problem and then tries different "recipes", from which something new can be learned about $\rho$.