


Text mining and high dimensional statistical analysis

Said, YH (George Mason)
Wednesday 13 February 2008, 14:00-15:00

Seminar Room 2, Newton Institute Gatehouse


Text mining can be thought of as a synthesis of information retrieval, natural language processing and statistical data mining. The document collections under consideration can number in the hundreds of thousands, and the associated lexicon can contain a million or more words. Analysis is often based on a term-document matrix or even a bigram-document matrix, so the dimensionality of the term vector can easily reach a million or more. In this talk I will describe some of the approaches to text mining on which we have been working. This is joint work with Dr Edward Wegman.
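To illustrate the data structures mentioned above, here is a minimal sketch of building term-document and bigram-document count matrices. It assumes naive lowercase whitespace tokenization, and the function names are hypothetical; it is not the speakers' method. Each document is stored as a sparse column (a `Counter` of row index to count), because with a lexicon of a million words the dense matrix would be mostly zeros, and the space of possible bigrams grows as the square of the lexicon size.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    # Real systems typically add stemming, stop-word removal, etc.
    return text.lower().split()


def ngram_document_matrix(docs, n=1):
    """Build a sparse n-gram-by-document count matrix.

    Returns (vocab, columns): vocab maps each n-gram (a tuple of tokens)
    to a row index; columns[j] is a Counter {row_index: count} holding
    only the nonzero entries for document j.
    """
    vocab = {}
    columns = []
    for doc in docs:
        tokens = tokenize(doc)
        col = Counter()
        # Slide a window of length n over the token sequence.
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            row = vocab.setdefault(gram, len(vocab))  # assign new rows on first sight
            col[row] += 1
        columns.append(col)
    return vocab, columns


docs = ["text mining is data mining", "mining text data"]
vocab1, tdm = ngram_document_matrix(docs, n=1)  # term-document matrix
vocab2, bdm = ngram_document_matrix(docs, n=2)  # bigram-document matrix
print(len(vocab1), len(vocab2))  # 4 6
```

Even on this toy corpus the bigram vocabulary (6 rows) is larger than the term vocabulary (4 rows), which hints at why bigram-document matrices push dimensionality up so quickly.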

