Using trees or forests to model biomolecular networks

Edmund Jones and Vanessa Didelez


Abstract:

Gene regulation networks are often modelled by Gaussian graphical models. The Bayesian approach to learning the network structure from data is extremely computerintensive, because of the large number of possible graphs and the need for timeconsuming approximations on the class of non-decomposable graphs. It is common to restrict attention to decomposable graphs, for which the posterior probability can be calculated easily and exactly.

Another possibility is to restrict attention even further, to forests, which are graphs that have no cycles, or trees, which are connected forests. With forests and trees the calculations are especially simple, and if the graph prior distribution has a certain property then the graph with highest posterior probability can easily be found. It is widely believed that biomolecular networks are sparse, and one justification for considering only forests or trees is the claim that sparse graphs are locally tree-like. This poster will firstly show how this claim can be made rigorous using results from Erdős–Rényi random graph theory.

Even with the restriction to trees and only 15 or 20 nodes, checking all the graphs is still too time-consuming. MCMC methods can be used to sample from the posterior distribution on the set of graphs, though proposals that simply add or delete single edges give poor mixing. If only trees are considered, the simplest proposal is instead to move an edge, but choosing a move uniformly at random requires information about the graph that is difficult to maintain. The poster will present ideas on how to store and manipulate forests and trees, and a discussion of other issues that arise.