Hybrid Dirichlet processes for functional data
Seminar Room 1, Newton Institute
I discuss Bayesian modeling of random effects and clustering in functional data; examples include functional regression and spatial data with individual heterogeneity. In recent years there has been enormous growth of interest in statistical applications of Bayesian nonparametric procedures for modeling heterogeneity and clustering structure in data. Indeed, the Dirichlet process, and more generally species sampling priors, have proved extremely fruitful for modeling cluster allocation and the appearance of new clusters, or species, in samples from a population of (potentially infinitely many) species. Each individual is allocated to one of the observed species or to a new one according to a probability law that is implicit in the choice of the prior (e.g., the well-known Pólya urn scheme for the Dirichlet process). For functional data, however, this implies that a new species is created even if a curve differs from the previously observed ones at only some coordinates. This often produces as many species as there are observations, thus defeating the clustering purpose of the model. A more effective description of the data can be obtained by allowing hybrid species, in which portions of a curve may belong to different species. This can model local mutations of the curves from one species to a new one. In other words, the Dirichlet process implies a probability law on global random partitions, while for multivariate or functional data new notions of dependent local partitions arise.
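As an illustration of the global allocation rule referred to above, the following is a minimal Python sketch of the Pólya urn scheme for the Dirichlet process. It is not taken from the talk: the function name `polya_urn_labels` and the concentration parameter `alpha` are assumptions introduced here. Under the standard urn, the (i+1)-th individual starts a new species with probability alpha / (alpha + i) and joins an existing species k with probability n_k / (alpha + i), where n_k is the current size of species k.

```python
import random


def polya_urn_labels(n, alpha, seed=0):
    """Sample species labels for n individuals from the Polya urn.

    `alpha` (> 0) is the Dirichlet process concentration parameter.
    Returns (labels, counts), where counts[k] is the size of species k.
    """
    rng = random.Random(seed)
    labels = []
    counts = []
    for i in range(n):
        # Total urn weight before the (i+1)-th draw is alpha + i.
        u = rng.random() * (alpha + i)
        if u < alpha:
            # Open a new species with probability alpha / (alpha + i).
            counts.append(1)
            labels.append(len(counts) - 1)
        else:
            # Join existing species k with probability counts[k] / (alpha + i).
            u -= alpha
            k = 0
            while k < len(counts) - 1 and u >= counts[k]:
                u -= counts[k]
                k += 1
            counts[k] += 1
            labels.append(k)
    return labels, counts
```

In this sketch each label applies to an individual as a whole. For functional data the individual is an entire curve, so the urn either reuses a previously seen curve-level species in full or opens a new one; this is the global behaviour that hybrid species are meant to relax.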
In my talk, I will first consider (finite or infinite) mixture models of Gaussian processes where the mixing distribution is a (finite-dimensional) functional Dirichlet process. However, functional Dirichlet processes imply a global effect for the mixing variables that induce the clustering, whereas for functional data local random effects along the curve are often more sensible. We therefore propose a generalized family of species sampling priors well suited to modeling hybrid species and local random partitions. This class of priors is appealing in that it provides a general and directly interpretable Bayesian mixture model for functional data, and it includes as special cases models with global or local effects proposed very recently in the literature. Theoretical properties of the proposed priors will be developed, including a weak limit result for the finite-dimensional case. Applications to simulated data and to image classification will illustrate the procedure.