The INI has a new website!

This is a legacy webpage. Please visit the new site to ensure you are seeing up to date information.

An Isaac Newton Institute Workshop

Recent Advances in Statistical Genetics and Bioinformatics

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles

Author: Peter Green (Bristol)


We consider methodology for Bayesian model-based clustering of gene expression profiles, that is, measurements of expression levels of a large number of genes, typically from microarray assays, across a number of different experimental conditions and/or biological subjects. We follow a familiar approach using Dirichlet-process-based models to cluster the genes implicitly, but depart from standard practice in several ways. First, we incorporate regression on covariate information at the condition/subject level by modelling regression coefficients, not the expectations of the data directly. More importantly, we replace the Dirichlet process by one of a richer family of models, generated from a stick-colouring-and-breaking construction, under which cluster identities are not exchangeable: this allows modelling a 'background' cluster, for example. Finally, we follow a formal decision-theoretic approach to point estimation of the clustering, using a pairwise coincidence loss function. This is joint work with John Lau at Bristol.