The ultrametric topology perspective on analysis of massive, very high dimensional data stores
Seminar Room 1, Newton Institute
An ultrametric topology formalizes the notion of hierarchical structure. A hierarchical embedding of a data set implies an ultrametric embedding, a property referred to here as ultrametricity. Such hierarchical structure can be global across the data set, or local. By quantifying the degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss application to the segmentation and modeling of very high frequency time series. Other applications will be described, in particular in the area of textual data mining.
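As an illustration of what quantifying ultrametricity could look like, the sketch below uses the standard characterization that a metric is ultrametric when d(x,z) <= max(d(x,y), d(y,z)), which is equivalent to every triangle being isosceles with its two longest sides equal. The function names, the 5% relative tolerance, the uniform random data, and the choice of dimensions are all assumptions made for this example; this is not necessarily the coefficient used in the papers listed below.

```python
import itertools
import math
import random


def ultrametric_fraction(points, rel_tol=0.05):
    """Fraction of triangles whose two longest sides agree within rel_tol.

    rel_tol is an illustrative threshold, not a value from the source.
    """
    n = len(points)
    # Precompute pairwise Euclidean distances.
    dist = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        dist[i][j] = dist[j][i] = math.dist(points[i], points[j])

    hits = total = 0
    for i, j, k in itertools.combinations(range(n), 3):
        a, b, c = sorted((dist[i][j], dist[i][k], dist[j][k]))
        # Ultrametric inequality: the longest side may not exceed the
        # second longest, so the two longest sides must be (nearly) equal.
        if c - b <= rel_tol * c:
            hits += 1
        total += 1
    return hits / total


def uniform_cloud(n_points, dim, rng):
    """Hypothetical test data: points drawn uniformly from the unit cube."""
    return [[rng.random() for _ in range(dim)] for _ in range(n_points)]


rng = random.Random(0)
low = ultrametric_fraction(uniform_cloud(30, 2, rng))
high = ultrametric_fraction(uniform_cloud(30, 1000, rng))
print(f"dim=2: {low:.2f}  dim=1000: {high:.2f}")
```

On uniform random data of this kind, the fraction of triangles that respect the ultrametric inequality should be noticeably larger in 1000 dimensions than in 2, since pairwise distances concentrate around a common value as dimensionality grows.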
 F. Murtagh, On ultrametricity, data coding, and computation, Journal of Classification, 21, 167-184, 2004.
 F. Murtagh, G. Downs and P. Contreras, Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding, SIAM Journal on Scientific Computing, in press, 2007.
 F. Murtagh, The remarkable simplicity of very high dimensional data: application of model-based clustering, submitted, 2007.
 F. Murtagh, Symmetry in data mining and analysis: a unifying view based on hierarchy, submitted, 2007.
- http://www.cs.rhul.ac.uk/home/fionn/papers - Copies of papers