# Inference of Gene Regulatory Networks using MaxEnt

In my first blog, I introduced the principle of maximum entropy (MaxEnt). Here I shall use the example of inferring gene regulatory networks (GRNs), as proposed in [1], to see how to apply MaxEnt.

Recall that with MaxEnt, any problem of interest is a problem of inferring the least biased, or most uncertain, probability distribution from limited data. To do so, MaxEnt provides a constrained optimisation procedure: maximise the entropy, which quantifies the uncertainty of the probability distribution, subject to constraints that force the distribution to be consistent with the data. In essence, unlike many methods that assume a model, MaxEnt infers the least biased probabilistic model directly from the data.

From the above perspective, the goal of GRN inference is to infer the least biased probabilistic model of gene interactions that could give rise to the macrostate of a genome. To depict the macrostate of the genome of interest, we assume we conduct microarray or RNA-seq experiments to measure the expression levels of its $N$ genes, denoted as a state vector $\vec{x} = (x_1, \cdots, x_N)$. We assume we have repeated the experiment $T$ times, generating $T$ state vectors $\vec{x}^1, \dots, \vec{x}^T$. Let $\rho(\vec{x})$ denote the probability distribution of $\vec{x}$, i.e., the probability that the genome is in an arbitrary state $\vec{x}$. Based on MaxEnt, the aim of GRN inference is to find the most unbiased probability distribution $\rho(\vec{x})$ by maximising the Shannon entropy of the system, i.e., the amount of information in the whole genome, or (the logarithm of) the number of microscopic states, subject to some constraints:

\begin{aligned} - \sum_{\vec{x}} \rho(\vec{x}) \ln \rho(\vec{x}) . \end{aligned} \ \ \ \ (8)

The constraints represent the observed macroscopic state of the genome, which include:

1) The normalisation constraint, i.e., the probabilities of all states sum to 1:

\begin{aligned} \sum_{\vec{x}} \rho(\vec{x}) = 1 \end{aligned} \ \ \ \ (9)

2) The first moments, $\langle x_i \rangle$, coincide with those derived from the expression data:

\begin{aligned} \langle x_i \rangle = \sum_{\vec{x}} \rho(\vec{x}) x_i = \frac{1}{T} \sum_{k=1}^{T} x_{i}^{k} \end{aligned} \ \ \ \ (10)

3) The second moments, $\langle x_i x_j \rangle$, coincide with those derived from the expression data:

\begin{aligned} \langle x_i x_j \rangle = \sum_{\vec{x}} \rho(\vec{x}) x_i x_j = \frac{1}{T} \sum_{k=1}^{T} x_i^k x_j^k \end{aligned}. \ \ \ \ (11)

Constraints 2 and 3 ensure that the probability distribution $\rho(\vec{x})$ preserves the mean expression level of each gene (constraint 2) and the correlations between genes (constraint 3), respectively.
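As an illustrative sketch, the empirical moments on the right-hand sides of Equations (10) and (11) can be computed directly from a $T \times N$ expression matrix; the NumPy code below uses synthetic data as a stand-in for real measurements (the names `X`, `T`, `N` are placeholders, not from the original paper).

```python
import numpy as np

# Synthetic stand-in for a real T x N expression matrix
# (T experiments as rows, N genes as columns).
rng = np.random.default_rng(0)
T, N = 50, 4
X = rng.normal(size=(T, N))

# First moments <x_i> = (1/T) sum_k x_i^k (Eq. 10)
first_moment = X.mean(axis=0)

# Second moments <x_i x_j> = (1/T) sum_k x_i^k x_j^k (Eq. 11)
second_moment = (X.T @ X) / T

# Covariances C_ij = <x_i x_j> - <x_i><x_j>, used later in the derivation
C = second_moment - np.outer(first_moment, first_moment)
```

With real data, `X` would simply hold the measured expression levels instead of random draws.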

Using Lagrange multipliers, we combine the objective function and the constraints into a single unconstrained optimisation problem, which essentially links the microscopic and macroscopic scales. By solving this problem, we maximise the number of microscopic realisations of the system, while the constraints ensure the inferred probability distribution is consistent with the data, i.e., the observations of the macroscopic state. The Lagrangian can be written as:

\begin{aligned} L = - \sum_{\vec{x}} \rho(\vec{x}) \ln \rho(\vec{x}) - \lambda \left[ \sum_{\vec{x}} \rho(\vec{x}) - 1 \right] - \sum_{i=1}^{N} \mu_i \left[ \sum_{\vec{x}} \rho(\vec{x}) x_i - \langle x_i \rangle \right] - \sum_{i,j=1}^{N} \eta_{ij} \left[ \sum_{\vec{x}} \rho(\vec{x}) x_i x_j - \langle x_i x_j \rangle \right] \end{aligned} . \ \ \ \ (12)

To maximise $L$, we set its first-order derivative with respect to $\rho(\vec{x})$ to zero, i.e., $\partial L / \partial \rho(\vec{x}) = 0$, and obtain:

\begin{aligned} \rho(\vec{x}) = \exp (-1 - \lambda - \vec{\mu} \cdot \vec{x} - \frac{1}{2} \vec{x} \mathbf{M} \vec{x} ) = A \exp (-\frac{1}{2} \vec{y} \mathbf{M} \vec{y}) \end{aligned} . \ \ \ \ (13)

where the matrix $\mathbf{M} \equiv 2 \eta$, $\vec{y} = \vec{x} + \mathbf{M}^{-1} \vec{\mu}$, and the constant $A = \exp (\frac{1}{2} \vec{\mu} \mathbf{M}^{-1} \vec{\mu} - 1 - \lambda)$.
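The shift $\vec{y}$ arises from completing the square in the exponent; as a quick check of the algebra (using the same shorthand for quadratic forms as the equations above):

\begin{aligned} \vec{\mu} \cdot \vec{x} + \frac{1}{2} \vec{x} \mathbf{M} \vec{x} = \frac{1}{2} \left( \vec{x} + \mathbf{M}^{-1} \vec{\mu} \right) \mathbf{M} \left( \vec{x} + \mathbf{M}^{-1} \vec{\mu} \right) - \frac{1}{2} \vec{\mu} \mathbf{M}^{-1} \vec{\mu} = \frac{1}{2} \vec{y} \mathbf{M} \vec{y} - \frac{1}{2} \vec{\mu} \mathbf{M}^{-1} \vec{\mu} \end{aligned}

so all $\vec{x}$-independent factors collect into the constant $A$, leaving the Gaussian form in Equation (13).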

Equation (13) depicts a Boltzmann-like distribution, i.e., $\rho(\vec{x}) \sim \exp(-H)$, where $H = \frac{1}{2} \vec{y} \mathbf{M} \vec{y}$ plays the role of the energy function in conventional statistical mechanics. The constant $A$ is the inverse of the partition function, which normalises the distribution.

To calculate the elements of $\mathbf{M}$, we assume the discrete states of the genome $\vec{x}$ can be approximated by a continuum, so the constraints (Equations 9-11) become:

\begin{aligned} 1 = \int d^{N}x \, \rho (\vec{x}) = \int d^{N} y \, A \exp \left(- \frac{1}{2} \vec{y} \mathbf{M} \vec{y}\right) \end{aligned} . \ \ \ \ (14)

\begin{aligned} \langle x_i \rangle & = \int d^{N}x \, \rho (\vec{x}) \, x_i \\ &= \int d^{N} y \, A \exp \left(- \frac{1}{2} \vec{y} \mathbf{M} \vec{y}\right) \left(y_i - \sum_{j} M_{ij}^{-1} \mu_j\right) \\ &= - \sum_{j} M_{ij}^{-1} \mu_{j} \end{aligned}. \ \ \ \ (15)

\begin{aligned} \langle x_i x_j \rangle & = \int d^{N}x \, \rho (\vec{x}) \, x_i x_j \\ &= \int d^{N} y \, A \exp \left(- \frac{1}{2} \vec{y} \mathbf{M} \vec{y}\right) \left[ \langle x_i \rangle \langle x_j \rangle + y_i \langle x_j \rangle + y_j \langle x_i \rangle + y_i y_j \right] \\ &= \langle x_i \rangle \langle x_j \rangle + \int d^N y \, A \exp \left(- \frac{1}{2} \vec{y} \mathbf{M} \vec{y}\right) y_i y_j \end{aligned}, \ \ \ \ (16)

where the terms linear in $y_i$ and $y_j$ vanish because the Gaussian is even in $\vec{y}$.

Rearranging Equation (16), we have

\begin{aligned} \langle y_i y_j \rangle = \int d^{N} y \, A \exp \left(- \frac{1}{2} \vec{y} \mathbf{M} \vec{y}\right) y_i y_j = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle = C_{ij} \end{aligned}. \ \ \ \ (17)

Evaluating this standard Gaussian integral yields:

\begin{aligned} \langle y_i y_j \rangle = M_{ij}^{-1} \end{aligned}. \ \ \ \ (18)

By substituting Equation (18) into Equation (17), we obtain

\begin{aligned} M_{ij}^{-1} = C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \end{aligned}, \ \ \ \ (19)

or

\begin{aligned} \mathbf{M}^{-1} = \mathbf{C} \end{aligned}. \ \ \ \ (20)

which means the interaction matrix is the inverse of the covariance matrix; its element $M_{ij}$ represents the strength of the interaction between genes $i$ and $j$. This also means we can obtain $\mathbf{M}$ by inverting the covariance matrix computed from the gene expression data.
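As a minimal numerical sketch of Equation (20), using synthetic data in the well-sampled regime $T \gg N$ (where $\mathbf{C}$ is invertible and a plain matrix inverse suffices):

```python
import numpy as np

# Synthetic, well-sampled data: T >> N, so C is non-singular.
rng = np.random.default_rng(1)
T, N = 500, 5
X = rng.normal(size=(T, N))

C = np.cov(X, rowvar=False, bias=True)  # covariance matrix of the data
M = np.linalg.inv(C)                    # interaction matrix M = C^{-1} (Eq. 20)
# M[i, j] is the inferred interaction between genes i and j
```

Real expression datasets are rarely in this regime, which is exactly the problem the next paragraph addresses.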

However, because the number of samples $T$ in a gene expression dataset is usually much smaller than the number of genes $N$, $\mathbf{C}$ is singular and the inversion problem is underdetermined. To address this, we use spectral decomposition to invert $\mathbf{C}$ in its non-zero eigenspace, i.e., the subspace spanned by the gene expression data:

\begin{aligned} C_{ij} = \sum_{k=1}^N \omega_{k} v_{i}^{k} v_{j}^{k} \end{aligned}, \ \ \ \ (21)

where $\omega_k$ is the $k$-th eigenvalue of $\mathbf{C}$ and $\vec{v}^k$ is the corresponding eigenvector, yielding:

\begin{aligned} M_{ij} = C_{ij}^{-1} = \sum_{k=1}^{T-1} \omega_{k}^{-1} v_{i}^{k} v_{j}^{k} \end{aligned}, \ \ \ \ (22)

where the summation is over the non-zero eigenvalues.
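A sketch of Equations (21)-(22) in NumPy, with synthetic data standing in for an undersampled ($T < N$) expression matrix; note that inverting $\mathbf{C}$ only on its non-zero eigenspace is equivalent to taking a Moore-Penrose pseudoinverse:

```python
import numpy as np

# Undersampled regime: fewer samples T than genes N, so C is singular
# (rank at most T - 1 after mean subtraction).
rng = np.random.default_rng(2)
T, N = 10, 30
X = rng.normal(size=(T, N))
C = np.cov(X, rowvar=False, bias=True)

# Spectral decomposition C = sum_k w_k v^k (v^k)^T (Eq. 21)
w, V = np.linalg.eigh(C)            # eigenvalues w[k], eigenvectors V[:, k]
keep = w > 1e-10 * w.max()          # retain only the non-zero eigenvalues

# Invert C on its non-zero eigenspace (Eq. 22)
M = (V[:, keep] / w[keep]) @ V[:, keep].T
```

The last line is equivalent to NumPy's built-in pseudoinverse, `np.linalg.pinv(C)`.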

The above method was published in 2006, and since then several improved methods have been proposed, e.g., a method based on the Ising-model approximation [2]. Interested readers are referred to two recent review papers [3,4] (it is worth mentioning that paper [4] also reviews the application of MaxEnt to protein contact prediction). However, all the state-of-the-art methods reviewed in these two papers can only infer undirected networks, which is not enough for gaining insight into the real regulatory mechanisms. Another drawback is that these MaxEnt methods cannot incorporate prior probability distributions derived from previous data, which might not be efficient. In future blogs, I shall discuss my thoughts on addressing these drawbacks.

1.