In this blog, I will introduce the Generalised Maximum Entropy Principle (GMEP) as presented in (Kesavan and Kapur 1989). To generalise MaxEnt, the paper makes its three probabilistic entities explicit, namely the entropy measure, the set of moment constraints and the probability distribution, and then examines the consequences, e.g., the inverse MaxEnt problems. The paper also links MaxEnt to the Minimum Discrimination Information Principle (Kullback-Leibler divergence).
The general form of MaxEnt
The MaxEnt principle essentially proposes a procedure for inferring a particular probability distribution by maximising the entropy subject to constraints. Maximising the entropy selects the most unbiased probability distribution, while the imposed constraints ensure that the inferred distribution is consistent with the data.
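Following (Kesavan and Kapur 1989), we first give the formalism of the MaxEnt principle, which in essence is a process of constrained maximisation using Lagrange multipliers. For a discrete distribution $p = (p_1, \dots, p_n)$ we maximise
$$H(p) = -\sum_{i=1}^{n} p_i \ln p_i \tag{1}$$
subject to the constraints:
$$\sum_{i=1}^{n} p_i = 1 \tag{2}$$
$$\sum_{i=1}^{n} p_i\, g_r(x_i) = a_r, \quad r = 1, \dots, m \tag{3}$$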
where the first constraint (equation (2)) is the normalisation constraint. The second one (equation (3)) ensures the inferred probability distribution is consistent with the data. In equation (3), $g_r(x_i)$ is a function or property (e.g., energy) of a microscopic variable $x_i$ (e.g., a particle), such that $\sum_i p_i\, g_r(x_i)$ is the theoretical expectation value of the property over the whole system (e.g., an idealised gas), and $a_r$ can be a moment (e.g., the mean) of the observed property of the whole system.
Equation (3) essentially represents the so-called testable information, i.e., “a statement about a probability distribution whose truth or falsity is well-defined”. It is worth mentioning that the moment constraints can also be inequality constraints, e.g.,
$$\sum_{i=1}^{n} p_i\, g_r(x_i) \ge a_r$$
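The Lagrangian is:
$$L = -\sum_{i=1}^{n} p_i \ln p_i - (\lambda_0 - 1)\Big(\sum_{i=1}^{n} p_i - 1\Big) - \sum_{r=1}^{m} \lambda_r \Big(\sum_{i=1}^{n} p_i\, g_r(x_i) - a_r\Big) \tag{4}$$
We then maximise equation (4) by setting $\partial L / \partial p_i = -\ln p_i - \lambda_0 - \sum_{r=1}^{m} \lambda_r\, g_r(x_i) = 0$, yielding:
$$p_i = \exp\Big(-\lambda_0 - \sum_{r=1}^{m} \lambda_r\, g_r(x_i)\Big), \quad i = 1, \dots, n \tag{5}$$
Substituting (5) into the normalisation constraint (2), we have
$$\exp(\lambda_0) = \sum_{i=1}^{n} \exp\Big(-\sum_{r=1}^{m} \lambda_r\, g_r(x_i)\Big) \tag{6}$$
To determine the Lagrange multipliers $\lambda_0$, $\lambda_r$, we likewise substitute equation (5) into the moment constraints (3) to get
$$a_r \exp(\lambda_0) = \sum_{i=1}^{n} g_r(x_i) \exp\Big(-\sum_{j=1}^{m} \lambda_j\, g_j(x_i)\Big), \quad r = 1, \dots, m \tag{7}$$
Substituting equation (6) into (7), we obtain
$$a_r = \frac{\sum_{i=1}^{n} g_r(x_i) \exp\big(-\sum_{j=1}^{m} \lambda_j\, g_j(x_i)\big)}{\sum_{i=1}^{n} \exp\big(-\sum_{j=1}^{m} \lambda_j\, g_j(x_i)\big)}, \quad r = 1, \dots, m \tag{8}$$
We then solve equations (8) using any root solver to obtain the desired values of $\lambda_1, \dots, \lambda_m$; $\lambda_0$ then follows from (6).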
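Before reaching for a root solver, it can help to check the derivation in the forward direction. The sketch below assumes the usual exponential form $p_i = \exp(-\lambda_0 - \sum_r \lambda_r g_r(x_i))$; the support `xs`, the moment functions in `g` and the multiplier values in `lams` are hypothetical choices made purely for illustration. It picks the multipliers first, derives $\lambda_0$ from normalisation, and confirms that the moments computed from the resulting distribution agree with the ratio-of-sums expression that a root solver would be asked to match.

```python
import math

# Hypothetical setup (not from the paper): six support points and two
# moment functions, g_1(x) = x and g_2(x) = x**2.
xs = [1, 2, 3, 4, 5, 6]
g = [lambda x: x, lambda x: x * x]
lams = [0.3, -0.05]  # illustrative values for lambda_1, lambda_2


def unnormalised_weights(lams):
    """exp(-sum_r lambda_r * g_r(x_i)) for every support point x_i."""
    return [math.exp(-sum(l * g_r(x) for l, g_r in zip(lams, g))) for x in xs]


weights = unnormalised_weights(lams)
Z = sum(weights)

# Normalisation fixes lambda_0: exp(lambda_0) equals the sum of the weights.
lam0 = math.log(Z)

# Probabilities in exponential form.
p = [math.exp(-lam0) * w for w in weights]

# Moments implied by the multipliers, written as a ratio of sums.
a_from_ratio = [sum(g_r(x) * w for x, w in zip(xs, weights)) / Z for g_r in g]

# The same moments computed directly as expectations under p.
a_from_p = [sum(g_r(x) * p_i for x, p_i in zip(xs, p)) for g_r in g]
```

Solving the direct problem amounts to running this map backwards: given observed moments, search for the multipliers whose implied moments match them.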
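For a continuous random variate $x$ with a probability density function $f(x)$ whose support is a set $S$, the continuous MaxEnt can be written as:
$$\max_{f} \; -\int_S f(x) \ln f(x)\, dx$$
subject to
$$\int_S f(x)\, dx = 1, \qquad \int_S f(x)\, g_r(x)\, dx = a_r, \quad r = 1, \dots, m$$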
MDI and Relative Entropy Maximisation
The Minimum Discrimination Information (MDI) principle is based on the Kullback-Leibler (KL) divergence, which is a measure that discriminates the probability distribution $p$ from $q$:
$$D(p \,\|\, q) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}$$
which is always $\ge 0$, and attains its global minimum value of zero when the two distributions are identical.
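The two properties just stated are easy to check numerically. The following is a minimal sketch for discrete distributions; the function name `kl_divergence` and the example vectors are illustrative, and the convention $0 \ln 0 = 0$ is assumed. It also shows that the divergence is not symmetric in its arguments.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * ln(p_i / q_i), using the convention 0 * ln 0 = 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                # p puts mass where q has none: the divergence is infinite.
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


# Illustrative distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d_pq = kl_divergence(p, q)  # strictly positive because p differs from q
d_qp = kl_divergence(q, p)  # also positive, but a different number
d_pp = kl_divergence(p, p)  # zero for identical distributions
```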
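Based on the KL divergence, Kullback proposed the Principle of Minimum Discrimination Information (MDI): given new facts, a new distribution $p$ should be chosen that is as close to the reference distribution $q$ as possible, so that the new data produce as small an information gain as possible. This can be formulated with the same Lagrangian form and the same constraints as MaxEnt, except that the objective is now minimised:
$$L = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i} + (\lambda_0 - 1)\Big(\sum_{i=1}^{n} p_i - 1\Big) + \sum_{r=1}^{m} \lambda_r \Big(\sum_{i=1}^{n} p_i\, g_r(x_i) - a_r\Big)$$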
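KL divergence is also commonly referred to as relative entropy. In the maximisation setting it is taken with a negative sign, so that maximising the relative entropy is equivalent to minimising the KL divergence:
$$H(p \,\|\, q) = -\sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}$$
The relative entropy maximisation principle essentially introduces an a priori probability distribution $q$. When we do not have any knowledge of the a priori probability, we can assume $q$ is uniform, that is, $q_i = 1/n$, $i = 1, \dots, n$.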
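With a uniform prior the KL divergence reduces to $\ln n$ minus the Shannon entropy, so minimising the divergence from a uniform prior selects the same distribution as maximising the Shannon entropy. The sketch below checks this identity numerically; the helper names and the example distribution are illustrative assumptions.

```python
import math


def shannon_entropy(p):
    """H(p) = -sum_i p_i * ln(p_i), skipping zero-probability outcomes."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0.0)


def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * ln(p_i / q_i), assuming q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


# Illustrative distribution over four outcomes and the uniform prior on them.
p = [0.1, 0.2, 0.3, 0.4]
n = len(p)
uniform = [1.0 / n] * n

lhs = kl_divergence(p, uniform)
rhs = math.log(n) - shannon_entropy(p)  # D(p || uniform) = ln(n) - H(p)
```

Since $\ln n$ is a constant, the two optimisation problems differ only by a sign and an offset in the objective.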
The generalised MaxEnt
The generalised version essentially aims to determine any one of the three entities when the other two are specified. The GMEP first generalises the MEP by relaxing the restriction of the entropy measure to the Shannon entropy, as in Jaynes’ MEP, and to the Kullback-Leibler measure, as in Kullback’s MDIP. Following the paper, we introduce two versions based on MEP and MDIP, respectively.
GMEP (The MEP version)
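We introduce a convex function $\phi(\cdot)$ such that the generalised entropy measure is defined as:
$$H_\phi(p) = -\sum_{i=1}^{n} \phi(p_i)$$
(the Shannon entropy is recovered with $\phi(p) = p \ln p$), and the constraints are defined as
$$\sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i\, g_r(x_i) = a_r, \quad r = 1, \dots, m \tag{20}$$
Similarly, we use the Lagrange method to maximise the generalised entropy measure subject to the constraints (20), and set the derivative of the Lagrangian with respect to each $p_i$ to zero, which (with signs absorbed into the multipliers) yields the expression:
$$\phi'(p_i) = \lambda_0 + \sum_{r=1}^{m} \lambda_r\, g_r(x_i), \quad i = 1, \dots, n \tag{21}$$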
The Direct Problem
If we know the entropy measure $\phi$ and the constraint mean values $a_r$ of $g_r(x)$, we can determine the probability distribution $p$ that maximises the entropy measure. This is done by substituting (21) into (20) and solving for the Lagrange multipliers, which in turn yield the probabilities, as in equation (5).
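As a small worked instance of the direct problem with a non-Shannon measure, the sketch below assumes the convex function $\phi(p) = p^2$, a single moment function $g(x) = x$ on the faces 1 to 6, and a target mean of 4. All three choices are hypothetical and are not taken from the paper. With this $\phi$, the stationarity condition $\phi'(p_i) = \lambda_0 + \lambda_1 x_i$ makes $p_i$ linear in $x_i$, so the two constraints reduce to a 2-by-2 linear system that can be solved exactly.

```python
from fractions import Fraction

# Hypothetical problem data: die faces and a target mean.
xs = [1, 2, 3, 4, 5, 6]
a = Fraction(4)

# With phi(p) = p**2, phi'(p_i) = 2 * p_i = lam0 + lam1 * x_i,
# i.e. p_i = (lam0 + lam1 * x_i) / 2. Substituting into the constraints:
#   sum_i p_i       = 1  ->  lam0 * n  + lam1 * S1 = 2
#   sum_i p_i * x_i = a  ->  lam0 * S1 + lam1 * S2 = 2 * a
n = len(xs)
S1 = sum(xs)
S2 = sum(x * x for x in xs)

# Solve the 2-by-2 system with Cramer's rule.
det = n * S2 - S1 * S1
lam0 = (2 * S2 - 2 * a * S1) / det
lam1 = (2 * a * n - 2 * S1) / det

p = [(lam0 + lam1 * x) / 2 for x in xs]

# The stationarity condition does not enforce p_i >= 0 by itself,
# so non-negativity has to be checked for the chosen target mean.
is_valid = all(p_i >= 0 for p_i in p)
```

For target means far from the centre of the support the linear solution can go negative, in which case the non-negativity constraints would have to be handled explicitly; this sketch does not attempt that.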
The First Inverse Problem: Determination of Constraints
The first inverse problem is essentially to find the most unbiased constraints. Given the entropy measure $\phi$ and the probability distribution $p$, we determine one or more constraints that yield the given probability distribution, assuming that the entropy is maximised subject to these constraints.
Since we know $p_i$ and $\phi$, we can evaluate $\phi'(p_i)$, which fixes the value that the right hand side of equation (21) must take. We can then identify the functions $g_r(x_i)$ by matching terms, which yields one of the most unbiased sets of constraints. We shall use an example to illustrate how to do this.
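The term-matching step can be sketched numerically. The example below is hypothetical: it assumes the Shannon measure, $\phi(p) = p \ln p$, so that $\phi'(p) = 1 + \ln p$, and a given geometric distribution $p_i \propto b^{x_i}$ on the faces 1 to 6 with an illustrative value of `b`. It evaluates $\phi'(p_i)$ and checks that it is affine in $x_i$, which identifies $g(x) = x$ as the constraint function, so the recovered constraint is a prescribed mean.

```python
import math

# Hypothetical given distribution: p_i proportional to b**x_i.
xs = [1, 2, 3, 4, 5, 6]
b = 0.7
weights = [b ** x for x in xs]
c = 1.0 / sum(weights)
p = [c * w for w in weights]

# Shannon measure: phi(p) = p * ln(p), hence phi'(p) = 1 + ln(p).
phi_prime = [1.0 + math.log(p_i) for p_i in p]

# Match phi'(p_i) against lam0 + lam1 * g(x_i) with the candidate g(x) = x:
# fit a straight line through the first two points ...
lam1 = (phi_prime[1] - phi_prime[0]) / (xs[1] - xs[0])
lam0 = phi_prime[0] - lam1 * xs[0]

# ... and check that every remaining point lies on it.
residual = max(abs(fp - (lam0 + lam1 * x)) for fp, x in zip(phi_prime, xs))

# If the residual vanishes, the recovered constraint is sum_i p_i * x_i = a.
a = sum(p_i * x for p_i, x in zip(p, xs))
```

Here the slope comes out as $\ln b$, as expected from $1 + \ln(c\, b^{x}) = (1 + \ln c) + x \ln b$. A non-zero residual would indicate that $g(x) = x$ alone is not enough and that further functions are needed to match the remaining terms.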
The Second Inverse Problem: Determination of the Entropy Measure
Given the constraints, i.e., $g_r(x_i)$ and $a_r$, and the probability distribution $p$, determine the most unbiased entropy measure that, when maximised subject to the given constraints, yields the given probability distribution.
We can obtain a differential equation for $\phi$ by substituting the given values of $p_i$ and $g_r(x_i)$ into (21). After solving the differential equation, we obtain $\phi$, and then we can determine the entropy measure $-\sum_{i} \phi(p_i)$.
Example: Generalised Jaynes’ dice
Suppose we have a die with $n$ sides, where the $i$-th side carries a unique number $x_i$, and the probability distribution over the sides is $p = (p_1, \dots, p_n)$. We shall denote an entropy measure as $H(p)$.
Solution of the direct problem
For the direct problem, we assume we know the entropy measure and the constraints. The problem is to obtain the probability distribution. Formally, we define the entropy measure as Shannon’s entropy:
$$H(p) = -\sum_{i=1}^{n} p_i \ln p_i$$
We also define the normalisation constraint and the constraint that ensures the inferred probability distribution is consistent with the observed mean value $\bar{x}$ from the dice rolls:
$$\sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i\, x_i = \bar{x}$$
Using the Lagrange method, we can obtain the solution for the probability distribution (see my previous blog for a detailed derivation, with some notation differences):
$$p_i = \frac{\exp(-\lambda x_i)}{\sum_{j=1}^{n} \exp(-\lambda x_j)}$$
while $\lambda$ can be obtained by using any root solver to solve the following equation:
$$\frac{\sum_{i=1}^{n} x_i \exp(-\lambda x_i)}{\sum_{j=1}^{n} \exp(-\lambda x_j)} = \bar{x}$$
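The root-finding step can be sketched with plain bisection. The example assumes the exponential solution $p_i \propto \exp(-\lambda x_i)$, a six-sided die with faces 1 to 6, and an observed mean of 4.5; the function names, the bracketing interval and the iteration count are illustrative choices. Bisection works here because the mean under $p$ is strictly decreasing in $\lambda$ (its derivative is minus the variance).

```python
import math


def mean_for(lam, faces):
    """Mean of the distribution p_i proportional to exp(-lam * x_i)."""
    weights = [math.exp(-lam * x) for x in faces]
    return sum(x * w for x, w in zip(faces, weights)) / sum(weights)


def solve_lambda(faces, target, lo=-20.0, hi=20.0, iters=200):
    """Bisection on lam; relies on mean_for being decreasing in lam."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_for(mid, faces) > target:
            lo = mid  # mean still too large, so lam must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative data: a six-sided die with an observed mean of 4.5.
faces = [1, 2, 3, 4, 5, 6]
target_mean = 4.5

lam = solve_lambda(faces, target_mean)
weights = [math.exp(-lam * x) for x in faces]
Z = sum(weights)
p = [w / Z for w in weights]
```

Because the target mean exceeds the uniform mean of 3.5, the solver returns a negative $\lambda$ under this sign convention, and the probabilities increase with the face value.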
The First Inverse Problem: Determination of Constraints
For this problem, we have defined the entropy measure as Shannon’s entropy:
$$H(p) = -\sum_{i=1}^{n} p_i \ln p_i$$
and also know the probability distribution as
where is a constant. Noting the normalisation constraint is obvious, now the problem is to determine a most unbiased set of constraints in the form of
From equation () we have
Substituting (xx) into (xxx), we have
Equating terms, we obtain
Therefore, we can derive the constraints:
The Second Inverse Problem: Determination of the Entropy Measure
In this problem, we know the probability distribution as defined in equation (xxx) and the constraints as defined in (xxxx); we need to determine the most unbiased entropy measure. From equation (21)