In this blog post, I introduce the Generalised Maximum Entropy Principle (GMEP) as presented in (Kesavan and Kapur 1989). To generalise MaxEnt, the paper makes its three probabilistic entities explicit, namely the entropy measure, the set of moment constraints and the probability distribution, and then examines the consequences, e.g., the inverse MaxEnt problems. The paper also links MaxEnt to the Minimum Discrimination Information Principle (MDIP), which is based on the Kullback-Leibler divergence.

# The general form of MaxEnt

The MaxEnt principle essentially proposes a procedure for inferring a particular probability distribution by maximising the entropy subject to constraints. Maximising the entropy selects the most unbiased probability distribution, while the imposed constraints ensure that the inferred distribution is consistent with the data.

Following (Kesavan and Kapur 1989), we first give the formalism of the MaxEnt principle, which in essence is a process of constrained maximisation using Lagrange multipliers. We maximise the Shannon entropy

$$
H(\mathbf{p}) = -\sum_{i=1}^{n} p_i \ln p_i \tag{1}
$$

subject to the constraints:

$$
\sum_{i=1}^{n} p_i = 1 \tag{2}
$$

$$
\sum_{i=1}^{n} p_i \, g_r(x_i) = a_r, \quad r = 1, \dots, m \tag{3}
$$

where the first constraint (equation (2)) is the normalisation constraint. The second one (equation (3)) ensures the inferred probability distribution is consistent with the data. In equation (3), $g_r(x_i)$ is a function or property (e.g., energy) of a microscopic variable $x_i$ (e.g., a particle) such that $\sum_i p_i \, g_r(x_i)$ is the theoretical expectation value of the property of the whole system (e.g., an idealised gas); and $a_r$ can be a moment (e.g., the mean) of the observed property of the whole system.

Equation (3) essentially represents the so-called testable information, i.e., “*a statement about a probability distribution whose truth or falsity is well-defined*”. It is worth mentioning that the constraints in equation (3) can be inequality constraints as well, e.g.,

$$
\sum_{i=1}^{n} p_i \, g_r(x_i) \ge a_r, \quad r = 1, \dots, m
$$

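The Lagrangian is:

$$
L = -\sum_{i=1}^{n} p_i \ln p_i - (\lambda_0 - 1)\left(\sum_{i=1}^{n} p_i - 1\right) - \sum_{r=1}^{m} \lambda_r \left(\sum_{i=1}^{n} p_i \, g_r(x_i) - a_r\right) \tag{4}
$$

We then maximise equation (4) with respect to each $p_i$, yielding:

$$
\frac{\partial L}{\partial p_i} = -\ln p_i - \lambda_0 - \sum_{r=1}^{m} \lambda_r \, g_r(x_i)
$$

Let $\partial L / \partial p_i = 0$, we have

$$
p_i = \exp\left(-\lambda_0 - \sum_{r=1}^{m} \lambda_r \, g_r(x_i)\right), \quad i = 1, \dots, n \tag{5}
$$

To determine the Lagrange multipliers $\lambda_0$ and $\lambda_r$, $r = 1, \dots, m$, we substitute equation (5) into equations (2) and (3) to get

$$
e^{\lambda_0} = \sum_{i=1}^{n} \exp\left(-\sum_{r=1}^{m} \lambda_r \, g_r(x_i)\right) \tag{6}
$$

and

$$
a_r \, e^{\lambda_0} = \sum_{i=1}^{n} g_r(x_i) \exp\left(-\sum_{j=1}^{m} \lambda_j \, g_j(x_i)\right), \quad r = 1, \dots, m \tag{7}
$$

Substituting equation (6) into (7), we obtain

$$
a_r = \frac{\sum_{i=1}^{n} g_r(x_i) \exp\left(-\sum_{j=1}^{m} \lambda_j \, g_j(x_i)\right)}{\sum_{i=1}^{n} \exp\left(-\sum_{j=1}^{m} \lambda_j \, g_j(x_i)\right)}, \quad r = 1, \dots, m \tag{8}
$$

We then solve equations (8) using any root solver to obtain the desired values of $\lambda_r$, $r = 1, \dots, m$, and recover $\lambda_0$ from equation (6).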
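The quantities a root solver works with can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `maxent_moments` and `residuals`, the six-point grid `x`, and the choice $g_1(x) = x$, $g_2(x) = x^2$ are all assumptions made here. Given trial multipliers, it evaluates the exponential-form probabilities, the value of $\lambda_0$ implied by normalisation, and the mismatch between the implied moments and the targets $a_r$.

```python
import math

def maxent_moments(g_values, lambdas):
    """Evaluate p_i and the implied moments for given multipliers.

    g_values[r][i] holds g_r(x_i); lambdas[r] is the multiplier of g_r.
    lambda_0 is not passed in: it follows from the normalisation constraint.
    """
    n = len(g_values[0])
    exponents = [-sum(lam * g[i] for lam, g in zip(lambdas, g_values))
                 for i in range(n)]
    z = sum(math.exp(e) for e in exponents)   # this sum equals e^{lambda_0}
    lambda0 = math.log(z)
    p = [math.exp(e) / z for e in exponents]  # exponential-form probabilities
    moments = [sum(pi * gi for pi, gi in zip(p, g)) for g in g_values]
    return lambda0, p, moments

def residuals(g_values, lambdas, targets):
    """What a root solver would drive to zero: implied moment minus a_r."""
    _, _, moments = maxent_moments(g_values, lambdas)
    return [m - a for m, a in zip(moments, targets)]

# Illustrative setup (assumed here): six outcomes, two moment functions.
x = [1, 2, 3, 4, 5, 6]
g = [x, [xi ** 2 for xi in x]]  # g_1(x) = x, g_2(x) = x^2

lambda0, p, moments = maxent_moments(g, [0.0, 0.0])
print(lambda0, p, moments)
```

With all multipliers set to zero the distribution is uniform, so the residuals vanish only if the targets happen to be the uniform moments; for any other targets, a root solver adjusts `lambdas` until `residuals` returns zeros.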

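For a continuous random variate $x$ with a probability density function $f(x)$ whose support is a set $\mathcal{X}$, the continuous MaxEnt can be written as:

$$
\max_{f} \; H(f) = -\int_{\mathcal{X}} f(x) \ln f(x) \, dx
$$

subject to

$$
\int_{\mathcal{X}} f(x) \, dx = 1, \qquad \int_{\mathcal{X}} f(x) \, g_r(x) \, dx = a_r, \quad r = 1, \dots, m
$$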
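As a worked illustration of the continuous case, consider a density $f(x)$ on the half-line $[0, \infty)$ with a single mean constraint, $g_1(x) = x$ and mean $a > 0$. The support, the single constraint and the symbol $a$ are assumptions chosen here for the example. The same Lagrange argument as in the discrete case gives an exponential-form density, and the two constraints fix its multipliers:

$$
\begin{aligned}
f(x) &= \exp(-\lambda_0 - \lambda_1 x), \quad x \ge 0 \\
\int_0^\infty f(x) \, dx &= \frac{e^{-\lambda_0}}{\lambda_1} = 1 \\
\int_0^\infty x \, f(x) \, dx &= \frac{e^{-\lambda_0}}{\lambda_1^2} = a \\
\Rightarrow \quad \lambda_1 &= \frac{1}{a}, \quad e^{-\lambda_0} = \frac{1}{a}, \quad f(x) = \frac{1}{a} e^{-x/a}
\end{aligned}
$$

Under these assumptions, the MaxEnt density is the exponential distribution with mean $a$.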

# MDI and Relative Entropy Maximisation

The Minimum Discrimination Information (MDI) principle is based on the Kullback-Leibler (KL) divergence, which is a measure that discriminates the probability distribution $\mathbf{p}$ from $\mathbf{q}$:

$$
D(\mathbf{p} \| \mathbf{q}) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}
$$

which is always $\ge 0$, and the global minimum value is zero when the two distributions are identical.
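The two properties just stated (non-negativity, and zero for identical distributions) are easy to check numerically. The sketch below assumes the standard discrete form $\sum_i p_i \ln(p_i / q_i)$ in nats; the function name `kl_divergence` and the example vectors `p` and `q` are illustrative choices made here.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence sum_i p_i * ln(p_i / q_i), in nats."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # the term 0 * ln(0 / q_i) is taken as 0
        if qi == 0.0:
            return math.inf   # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.3, 0.2]
q = [1 / 3, 1 / 3, 1 / 3]

print(kl_divergence(p, q))  # strictly positive, since p differs from q
print(kl_divergence(p, p))  # zero for identical distributions
print(kl_divergence(q, p))  # generally differs from kl_divergence(p, q)
```

The last call is a reminder that the divergence is not symmetric in its arguments, which is why it matters which distribution plays the role of the reference.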

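Based on the KL divergence, Kullback proposed the Principle of Minimum Discrimination Information (MDI): given new facts, a new distribution $\mathbf{p}$ should be chosen which is as close to the reference distribution $\mathbf{q}$ as possible, so that the new data produces as small an information gain as possible. This can be formulated with a Lagrangian of the same form and with the same constraints (2) and (3) as MaxEnt, now minimised rather than maximised:

$$
L = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i} + (\lambda_0 - 1)\left(\sum_{i=1}^{n} p_i - 1\right) + \sum_{r=1}^{m} \lambda_r \left(\sum_{i=1}^{n} p_i \, g_r(x_i) - a_r\right)
$$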

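The KL divergence is also commonly referred to as **relative entropy**. For the maximisation setting, the relative entropy is taken with the opposite sign and is defined as

$$
H(\mathbf{p} \| \mathbf{q}) = -\sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}
$$

so that

$$
H(\mathbf{p} \| \mathbf{q}) = -D(\mathbf{p} \| \mathbf{q}) \le 0
$$

and

$$
\arg\max_{\mathbf{p}} H(\mathbf{p} \| \mathbf{q}) = \arg\min_{\mathbf{p}} D(\mathbf{p} \| \mathbf{q})
$$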

The relative entropy maximisation principle essentially introduces an a priori probability distribution $\mathbf{q}$. When we do not have any knowledge of the a priori probability, we can assume $\mathbf{q}$ is uniform, that is, $q_i = 1/n$, $i = 1, \dots, n$.
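A short worked step shows why a uniform prior recovers ordinary MaxEnt. This sketch assumes the relative entropy is taken with a negative sign, $-\sum_i p_i \ln(p_i / q_i)$, and writes $H(\mathbf{p})$ for the Shannon entropy; both are notational assumptions made here. Using $\sum_i p_i = 1$:

$$
-\sum_{i=1}^{n} p_i \ln \frac{p_i}{1/n} = -\sum_{i=1}^{n} p_i \ln p_i - \ln n \sum_{i=1}^{n} p_i = H(\mathbf{p}) - \ln n
$$

Since $\ln n$ does not depend on $\mathbf{p}$, maximising the relative entropy against a uniform prior selects the same distribution as maximising the Shannon entropy under the same constraints.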

# The generalised MaxEnt

The generalised version essentially aims to determine any one of the three entities when the other two are specified. The GMEP first generalises the MEP by relaxing the restriction of the entropy measure to the Shannon entropy, as in Jaynes' MEP, and to the Kullback-Leibler measure, as in Kullback's MDIP. Following the paper, we introduce two versions, based on the MEP and the MDIP respectively.

### GMEP (The MEP version)

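We introduce a convex function $\phi(\cdot)$ such that the generalised entropy measure is defined as:

$$
H(\mathbf{p}) = -\sum_{i=1}^{n} \phi(p_i) \tag{18}
$$

and the constraints are defined as

$$
\sum_{i=1}^{n} p_i = 1 \tag{19}
$$

$$
\sum_{i=1}^{n} p_i \, g_r(x_i) = a_r, \quad r = 1, \dots, m \tag{20}
$$

Similarly, we use the Lagrange method to maximise (18) subject to the constraints in equations (19)-(20), and let $\partial L / \partial p_i = 0$, which, after absorbing signs into the multipliers, yields the expression:

$$
\phi'(p_i) = \lambda_0 + \sum_{r=1}^{m} \lambda_r \, g_r(x_i), \quad i = 1, \dots, n \tag{21}
$$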

#### The Direct Problem

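If we know the entropy measure, i.e., the function $\phi$, and the constraint mean values $a_r$ of $g_r(x_i)$, we can determine the probability distribution that maximises the entropy measure. This is done by substituting (21) into (20), together with the normalisation constraint, and solving for the Lagrange multipliers, which in turn yield the probabilities. For the Shannon entropy, $\phi(p) = p \ln p$, this gives the exponential form of equation (5).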

#### The First Inverse Problem: Determination of Constraints

The first inverse problem is essentially to find the most unbiased constraints. Given the entropy measure $H(\mathbf{p})$ and the probability distribution $\mathbf{p}$, we can determine one or more constraints that yield the given probability distribution, assuming that the entropy is maximised subject to these constraints.

Since we know $\phi$ and $p_i$, we can derive $\phi'(p_i)$, so that we know what the right-hand side of equation (21) must equal. We can then identify the functions $g_r(x_i)$ by matching terms, which yields one of the most unbiased sets of constraints. We shall use an example to illustrate how to do this.

#### The Second Inverse Problem: Determination of the Entropy Measure

Given the constraints, $g_r(x_i)$ and $a_r$, and the probability distribution $\mathbf{p}$, determine the most unbiased entropy measure that, when maximised subject to the given constraints, yields the given probability distribution.

We can obtain a differential equation by substituting the given values of $g_r(x_i)$ and $p_i$ into (21). After solving the differential equation, we obtain $\phi$ and then we can determine the entropy measure $H(\mathbf{p})$.

## Example: Generalised Jaynes’ dice

Suppose we have a dice with $n$ sides, where the $i$-th side carries a unique number $x_i$ and shows up with probability $p_i$, $i = 1, \dots, n$. We shall denote an entropy measure as $H(\mathbf{p})$.

### Solution of the direct problem

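For the direct problem, we assume we know the entropy measure and the constraints. The problem is to obtain the probability distribution. Formally, we define the entropy measure as Shannon's entropy:

$$
H(\mathbf{p}) = -\sum_{i=1}^{n} p_i \ln p_i
$$

We also define the normalisation constraint and the constraint that ensures the inferred probability distribution is consistent with the observed mean value $\bar{x}$ from dice rolls:

$$
\sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i \, x_i = \bar{x}
$$

Using the Lagrange method, we can obtain the solution for the probability distribution (see my previous blog post for a detailed derivation, but with some notation differences):

$$
p_i = \frac{e^{-\lambda x_i}}{\sum_{j=1}^{n} e^{-\lambda x_j}}, \quad i = 1, \dots, n
$$

while $\lambda$ can be obtained by using any root solver to solve the following equation:

$$
\frac{\sum_{i=1}^{n} x_i \, e^{-\lambda x_i}}{\sum_{i=1}^{n} e^{-\lambda x_i}} = \bar{x}
$$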
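The root-solving step for the dice can be sketched with plain bisection. This is a minimal sketch under stated assumptions: the distribution has the exponential form $p_i \propto e^{-\lambda x_i}$, the function name `maxent_dice`, the bracket `[-50, 50]` and the target mean 4.5 are illustrative choices made here, and bisection stands in for "any root solver".

```python
import math

def maxent_dice(faces, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Find lambda so that p_i ~ exp(-lambda * x_i) has the requested mean."""
    if not min(faces) < target_mean < max(faces):
        raise ValueError("target_mean must lie strictly between the extreme faces")

    def probs(lam):
        exps = [-lam * x for x in faces]
        shift = max(exps)                       # shift exponents to avoid overflow
        weights = [math.exp(e - shift) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(probs(lam), faces))

    # The mean decreases monotonically as lambda grows, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid                            # mean too large: increase lambda
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, probs(lam)

faces = [1, 2, 3, 4, 5, 6]
lam, p = maxent_dice(faces, 4.5)
print(lam)
print(p)
```

For a target mean above 3.5 the solved $\lambda$ is negative, so the probabilities increase with the face value; a target of exactly 3.5 gives $\lambda \approx 0$ and the uniform distribution.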

### The First Inverse Problem: Determination of Constraints

For this problem, we have defined the entropy measure as Shannon's entropy:

$$
H(\mathbf{p}) = -\sum_{i=1}^{n} p_i \ln p_i
$$

and also know the probability distribution as

where is a constant. Noting the normalisation constraint is obvious, now the problem is to determine a most unbiased set of constraints in the form of

From equation () we have

Substitute (xx) into (xxx), we have

Equating terms, we obtain

Therefore, we can derive the constraints:
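To make the matching-terms step concrete, here is a hedged sketch under an assumed form of the given distribution. The exponential form below, the constant $\mu$ and the normaliser $Z$ are illustrative assumptions, not necessarily the distribution intended in this example. With Shannon's entropy, $\phi(p) = p \ln p$ and $\phi'(p) = 1 + \ln p$, and the stationarity condition is taken as $\phi'(p_i) = \lambda_0 + \sum_r \lambda_r \, g_r(x_i)$ with signs absorbed into the multipliers:

$$
\begin{aligned}
p_i &= \frac{e^{-\mu x_i}}{Z}, \qquad Z = \sum_{j=1}^{n} e^{-\mu x_j} \\
\phi'(p_i) &= 1 + \ln p_i = (1 - \ln Z) - \mu x_i \\
(1 - \ln Z) - \mu x_i &= \lambda_0 + \lambda_1 \, g_1(x_i) \\
\Rightarrow \quad \lambda_0 &= 1 - \ln Z, \quad \lambda_1 = -\mu, \quad g_1(x_i) = x_i
\end{aligned}
$$

Under this assumption, one valid choice is a single mean constraint $\sum_i p_i x_i = a_1$ alongside normalisation, with $a_1 = \sum_i x_i e^{-\mu x_i} / Z$ fixed by $\mu$. The match is only determined up to an affine rescaling of $g_1$, which is why the result is one of the most unbiased sets of constraints rather than a unique one.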

### The Second Inverse Problem: Determination of the Entropy Measure

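In this problem, we know the probability distribution as defined in equation (xxx) and the constraints as defined in (xxxx), and we need to determine the most unbiased entropy measure. Starting from equation (21), we substitute the known $p_i$ and $g_r(x_i)$ and solve the resulting differential equation for $\phi$.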
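As a hedged sketch of how this goes, assume a single mean constraint with $g_1(x_i) = x_i$ and an exponential-form distribution $p_i = e^{-\mu x_i} / Z$ with $\mu \neq 0$. Both are assumptions made here for illustration, as are the constants $A$, $B$ and $C$. The stationarity condition is taken as $\phi'(p_i) = \lambda_0 + \lambda_1 x_i$. Eliminating $x_i$ through $x_i = -(\ln p_i + \ln Z)/\mu$ gives a differential equation in $p$ alone:

$$
\begin{aligned}
\phi'(p) &= A + B \ln p, \qquad B = -\frac{\lambda_1}{\mu}, \quad A = \lambda_0 + B \ln Z \\
\phi(p) &= B \, p \ln p + (A - B) \, p + C \\
H(\mathbf{p}) &= -\sum_{i=1}^{n} \phi(p_i) = -B \sum_{i=1}^{n} p_i \ln p_i - (A - B) - nC
\end{aligned}
$$

Convexity of $\phi$ requires $\phi''(p) = B/p > 0$, i.e., $B > 0$. Under these assumptions the recovered measure is the Shannon entropy up to a positive scale factor and an additive constant, neither of which changes the maximising distribution.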

# Reference
