July to December 1998

**Organisers**: W J Fitzgerald (*Cambridge*), R L Smith (*University of North Carolina*), A Walden (*Imperial College, London*) and P C Young (*Lancaster University*)

The classical theory of signal processing is based on models which are stationary, linear and in many cases also Gaussian. Recent advances in time series and the theory of signal processing have drawn attention to many new models and methods. Among these are nonlinear autoregressive and nonlinear state-space models, state-space models with time-varying or state-dependent coefficients as models for nonstationary and nonlinear series, linear non-Gaussian processes which pose some specific problems not encountered with Gaussian processes, methods derived from the theory of dynamical systems, and many others. There has been a parallel growth in the applications of signal processing in many areas of modern engineering and in other areas such as financial time series, the environmental sciences and physiology. In many instances, methods have developed in an ad hoc way, being designed for specific applications rather than fitted into a general framework. In particular, similar problems have very often been examined by different groups of workers with minimal contact between the groups. The purpose of this programme was to bring together statisticians, engineers and other researchers who use signal processing methodology to unify existing methods and to identify areas of research where new methodology is required.

The programme was structured around five major themes: Bayesian Statistics, Environmental Science, Dynamical Systems Theory, Econometric and Financial Applications, and Extreme Value Theory.

Each of these themes had a major workshop associated with it and, in
addition, several special theme weeks were concerned with mathematical
developments in nonstationary signal processing (*eg* Gabor analysis
and wavelet analysis), extreme value statistics, and the statistics of transmissible
spongiform encephalopathies (TSEs) and how these could possibly explain
nvCJD.

The recent advances in numerical approaches to Bayesian inference have enabled complex problems to be solved that were impossible to consider previously and an important area considered in our programme was how these methods could be unified and applied to the field of data analysis. This was a great success and is forming the basis for several future collaborations.

Another important area to come out of the programme was the application
of novel statistical methods to, *eg*, global warming time
series analysis and to areas relevant to the insurance industry, where
extreme value statistics are appropriate.

The overall planning of the programme was undertaken by Bill Fitzgerald, with the other organisers (as well as some participants) being involved in the arrangement of specific workshops, conferences and seminars.

The programme was the largest international event ever to take place addressing the problems associated with Nonlinear and Nonstationary Signal Processing and was unique in bringing together statisticians, mathematicians, engineers, econometricians and environmental scientists. Many of the world leaders in particular subject areas either spent extended periods of time at the Institute or attended the various workshops.

Weekly seminars were organised throughout the programme and formed a very useful framework which enabled people in different fields to understand the common threads that bind the subject together.

A balance was struck between the number of workshops and seminars and the amount of time that the programme participants could spend on their own research and in discussions with other participants.

The atmosphere was extremely good and very conducive to interactions - so much so that many new collaborations were formed which will be ongoing in the future.

Social events, ranging from cricket matches to receptions, were held throughout the programme and these enabled very good interactions to take place between the participants of the programme.

**Bayesian Signal Processing**

The workshop on Bayesian Signal Processing was held in July, and was the start of the six-month programme. This workshop was organised by Bill Fitzgerald.

Bayesian inference is the methodology which underpins most of modern signal processing and data analysis and it was felt that a workshop dedicated to Bayesian methodology would be most appropriate.

In recent years there has been an explosion of interest in Bayesian statistics. The development of new algorithms for Bayesian computation, together with the ready availability of computing resources, has made feasible statistical methods which until the present decade were limited to small data sets and restricted classes of models. This "revolution" in statistical methodology has affected every area where statistics is applied, including medical statistics, the social sciences, analysis of financial data and the modelling of large environmental systems.

Signal processing refers to the class of methodologies available for
handling data produced sequentially in time. Most of the methods originated
in engineering but are now applied in many other areas as well, *eg*
communications, econometrics, geophysics, physiology, image processing.
Classical methods of signal processing, such as the Kalman filter, were
based on stochastic processes which are linear and Gaussian. One reason
for this restriction was that the computations required for such systems
are relatively simple and could be readily implemented using the computing
resources of the 1960s and 1970s. However, such assumptions are inadequate
for many modern applications and more flexible models have emerged. Examples
include wavelet methods for time-frequency analysis, Kalman filters with
time-varying or state-dependent coefficients, new techniques for non-Gaussian
processes, and dynamical systems and chaos as mathematical models for time
series. However, in many instances the new methodology has developed in
an ad hoc way, being designed for specific applications rather than fitted
into a unifying framework.
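To make the classical linear Gaussian case concrete, the following is a minimal sketch of a scalar Kalman filter for a random-walk-plus-noise model. The model, the function names (`kalman_local_level`, `simulate_local_level`, `rmse`) and the noise variances `q` and `r` are illustrative assumptions and are not taken from any of the work presented at the workshop.

```python
import random


def kalman_local_level(observations, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q and r are the (assumed known) variances of w_t and v_t.
    Returns the filtered state means.
    """
    m, p = m0, p0
    means = []
    for y in observations:
        # Predict: the state is a random walk, so the mean carries over
        # and the variance grows by q.
        p_pred = p + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p_pred / (p_pred + r)
        m = m + gain * (y - m)
        p = (1.0 - gain) * p_pred
        means.append(m)
    return means


def simulate_local_level(n, q, r, seed=0):
    """Simulate hidden states and noisy observations from the same model."""
    rng = random.Random(seed)
    x, states, obs = 0.0, [], []
    for _ in range(n):
        x += rng.gauss(0.0, q ** 0.5)
        states.append(x)
        obs.append(x + rng.gauss(0.0, r ** 0.5))
    return states, obs


def rmse(a, b):
    """Root mean squared difference between two equal-length sequences."""
    return (sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)) ** 0.5


states, obs = simulate_local_level(n=500, q=0.05, r=1.0, seed=1)
filtered = kalman_local_level(obs, q=0.05, r=1.0)
print("filter RMSE:", rmse(filtered, states))
print("raw observation RMSE:", rmse(obs, states))
```

The recursion needs only a handful of arithmetic operations per observation, which is consistent with the remark above that such systems could be implemented with the computing resources of the 1960s and 1970s. Time-varying or state-dependent coefficients would replace the fixed `q`, `r` and unit transition used here.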

The main focus and thrust of this workshop was the fact that Bayesian methods provide a unifying methodology whereby different kinds of mathematical models may be examined within a common statistical framework. The workshop brought together the statistical and computational expertise of leading statisticians and the modelling expertise of mathematicians and subject matter specialists, with the broad objective of developing new signal processing tools which make efficient use of modern computational resources while combining the most up-to-date research of both groups of specialists.

Specific topics that were covered included:

1) Bayesian methods in general, and numerical methods in particular,

2) Nonlinear and nonstationary time series estimation,

3) Forecasting and changepoint modelling,

4) Nonlinear signal processing in econometrics and financial time series,

5) Dynamical systems and statistics,

6) Environmental applications and spatial data analysis.

The workshop had around 100 participants from 18 different countries. Many of the talks were aimed at young research workers and this was combined with more advanced ‘state-of-the-art’ approaches. The younger scientists had the opportunity to show their work during several poster sessions.

The workshop was a huge success and has formed the basis for several collaborations.

**Gabor Analysis workshop**

This workshop, which involved around 30 participants and was held in
the first week of August, was organised by Hans Feichtinger and Bill Fitzgerald.
For many years the Fourier transform has been one of the main tools in
applied mathematics and signal processing. However, due to the large diversity
of problems with which science is confronted on a regular basis, it is
clear that there does not exist a single universal method which is well
adapted to all the problems. In this workshop leading experts in the field
of Gabor analysis, an increasingly accepted area of research that
is both theoretically appealing and has had many application successes,
were brought together to discuss current developments. One aim of Gabor
analysis is to be able to represent one-dimensional signals in two dimensions,
namely time and frequency. One goal is to find simple elements, the *atoms*,
of a function space and the assembly rules that allow the reconstruction
of all the elements of the function space using these atoms. There are
certain similarities between the Gabor approach and the current interest
in time-frequency distributions and several talks were aimed at this similarity.
The new book, edited by Hans Feichtinger and Thomas Strohmer, entitled *Gabor
Analysis and Algorithms*, Birkhäuser, 1998, discusses many of the areas
addressed during the workshop and most of the contributing authors attended
the workshop.
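The time-frequency representation described above can be illustrated with a small sketch that computes Gabor analysis coefficients, that is, inner products of a signal with time- and frequency-shifted copies of a Gaussian window (the atoms). The window length, width `sigma`, hop size and test signal are arbitrary choices for illustration. Only the analysis step is shown; reconstruction from the coefficients requires a suitable dual window, which is not attempted here.

```python
import cmath
import math


def gaussian_window(length, sigma):
    """Gaussian window centred on the middle of the frame."""
    centre = (length - 1) / 2.0
    return [math.exp(-0.5 * ((n - centre) / sigma) ** 2) for n in range(length)]


def gabor_coefficients(signal, window, hop, n_freq):
    """Inner products of the signal with shifted and modulated windows.

    coeffs[m][k] corresponds to time shift m * hop samples and
    frequency k / n_freq cycles per sample.
    """
    coeffs = []
    for start in range(0, len(signal) - len(window) + 1, hop):
        row = []
        for k in range(n_freq):
            acc = 0j
            for j, g in enumerate(window):
                n = start + j
                acc += signal[n] * g * cmath.exp(-2j * math.pi * k * n / n_freq)
            row.append(acc)
        coeffs.append(row)
    return coeffs


def peak_bin(row):
    """Index of the largest-magnitude coefficient among non-negative frequencies."""
    half = row[: len(row) // 2]
    return max(range(len(half)), key=lambda k: abs(half[k]))


# Test signal: the frequency jumps from 4/32 to 10/32 cycles per sample halfway through.
n_freq = 32
signal = [
    math.cos(2 * math.pi * (4 if n < 128 else 10) * n / n_freq) for n in range(256)
]
window = gaussian_window(length=32, sigma=5.0)
coeffs = gabor_coefficients(signal, window, hop=16, n_freq=n_freq)
print([peak_bin(row) for row in coeffs])
```

A single Fourier transform of the whole signal would show both frequencies but not when each occurs; the two-dimensional array of coefficients localises the change in time as well.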

**Environmental Signal Processing**

This workshop, which involved around 60 participants, was organised by Richard Smith and Peter Young and took place in the middle of August.

A major area of application of signal processing is in the analysis
of environmental data, interpreted here to include areas such as hydrology
and oceanography, climatology and studies related to pollution. Characteristic
features of such applications include constructing models for dependence
and trends, but it is also a common feature that data are collected spatially.
Therefore, a part of our objective here was to extend signal processing
techniques to the analysis of spatial data. In this we concentrated on
themes closely related to those developed elsewhere in the programme since
to consider all aspects of spatial data (*eg* image analysis) would
be going too far from our dominant theme.

A key issue in some environmental applications is the relationship between
purely statistical approaches to data analysis and those which use physically
motivated models. A general theme, developed over many years by PC Young,
is data-based mechanistic (DBM) modelling, in which transfer-function models
are fitted to data and then reinterpreted in physical terms. The models
fitted are very often nonlinear and/or nonstationary. Specific environmental
applications include the "active mixing volume" approach (Young/Lees) to
the dispersion of a pollutant in water, and models for rainfall-flow (Young/Beven).
Other environmental applications of signal processing include distinguishing
between trends and various forms of time series dependence (*eg* long-memory)
in climatic data series, ozone etc, and models based on stochastic time
transformations in geological and paleoclimatological problems. There are
also a number of researchers looking at climatological problems from a
dynamical systems point of view, *eg* the "singular spectral analysis"
approach, and Milan Palus from the Czech Republic gave several presentations
on these approaches.

Extensions to spatial-temporal models are important because many of
the environmental problems involve spatially collected data. The two main
approaches to spatial data are the geostatistical approach, based on Gaussian
models with parametrically specified covariance functions, and approaches
stemming from the fundamental work of Besag which are based on families
of conditional probabilities. Current applications to spatial-temporal
data tend to rely on simple product-form combinations of spatial and temporal
operators. Our general approach to nonlinear and nonstationary signal processing
suggests many extensions of non-product form, with potential applications
including calibration of weather-radar data, modelling of ground-level ozone, spatial-temporal
models in climatology, and many others.

**IEEE Neural Networks workshop**

The Eighth IEEE Workshop on Neural Networks for Signal Processing was held at the Isaac Newton Institute and Robinson College, 31 August - 2 September 1998 and formed a very interesting addition to the programme that was already taking place at the Institute. There was obviously a lot of overlap between the participants and many people who were attending the Newton programme gave papers at the IEEE workshop. Also, many delegates from the IEEE workshop stayed on afterwards to continue interactions with the programme. We also hosted a very successful reception and poster session for workshop delegates and this enabled detailed discussions to take place in a relaxed environment.

The proceedings have already appeared as a published IEEE book (ISBN 0-7803-5060-X).

**Dynamics and Statistics**

This workshop, which involved around 60 participants, was organised by Richard Smith and Alistair Mees and took place in the middle of September.

The "signals" used in signal processing have physical origins, and processing them will be greatly assisted if the underlying dynamics are known or can be inferred. During the past 15 years, workers in the dynamical systems community (Eckmann/Ruelle, Farmer/Sidorowich, Casdagli, Mees, Sugihara/May) have developed signal processing from a different point of view. They were often motivated by the desire to identify whether a system is chaotic or noisy, but in large measure their work is applicable even in the absence of chaos. There are numerous examples of signals that look random but are not, and that cannot be analysed by linear methods. Early successes of these methods encouraged applications which are more controversial, such as deterministic modelling of financial time series.

Initially, statisticians and dynamicists worked almost independently with little exchange of ideas, and both groups suffered as a result.

There have been comparatively few statisticians working in this field,
and much of the best work is being done by L Smith and H Abarbanel, who
both participated in the workshop and for longer periods during
the programme as a whole. L Smith discussed *surrogate data analysis*
which is closely related to bootstrapping; comparisons with the statistical
theory of bootstrapping shed much light on the area.
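The link between surrogate data and resampling can be seen in a minimal sketch of the simplest surrogate scheme, in which the series is randomly shuffled to generate realisations under the null hypothesis of no temporal structure. This is an illustration only and not the method presented at the workshop; surrogates for other null hypotheses (for instance phase-randomised surrogates for a linear Gaussian null) are not shown, and the function names and the AR(1) example are assumptions.

```python
import random


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation, used here as the discriminating statistic."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def shuffle_surrogate_test(series, statistic, n_surrogates=199, seed=0):
    """Monte Carlo p-value for the null hypothesis that the series is i.i.d.

    Each surrogate is a random permutation of the data, which preserves the
    marginal distribution but destroys any dependence on time order.
    """
    rng = random.Random(seed)
    observed = abs(statistic(series))
    exceed = 0
    for _ in range(n_surrogates):
        surrogate = list(series)
        rng.shuffle(surrogate)
        if abs(statistic(surrogate)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_surrogates + 1)


# Illustrative data: an AR(1) series with strong serial dependence.
rng = random.Random(1)
ar1 = [0.0]
for _ in range(299):
    ar1.append(0.7 * ar1[-1] + rng.gauss(0.0, 1.0))

p_value = shuffle_surrogate_test(ar1, lag1_autocorrelation)
print("p-value against the i.i.d. null:", p_value)
```

As with the bootstrap, the reference distribution of the statistic is built from resampled versions of the data themselves rather than from asymptotic theory. The statistic is interchangeable, so a nonlinear measure could be passed in place of `lag1_autocorrelation`.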

Other areas where statisticians and dynamicists benefited from interaction included: the estimation of fractal dimensions, Lyapunov exponents and other dynamical invariants; the shadowing lemma of dynamical systems theory and its relation with statistical techniques such as extended Kalman filtering; embeddings both as a theoretical property of dynamical systems and the practical estimation of embedding dimension; geometric filtering theory; and the often underestimated question of what happens with all of these techniques when statistical noise is added to a nonlinear dynamical system. In all of these areas, both dynamicists and statisticians have key ideas to contribute and both groups benefited from the interaction.

**Econometrics and Financial workshop**

This workshop, which involved around 60 participants, was organised by Neil Shephard and Ruey Tsay and took place in the middle of October.

Signal processing has played an important role in economics for many years. Classical signal extraction results have influenced recent generations of economists.

Nonlinear signal processing problems come up in tackling many models
suggested in various aspects of economics and finance. Leading cases include
the stochastic volatility generalisation of the well-known
Black-Scholes option pricing formula, the nonlinearity in Tobin’s *q* for optimal
investment at the firm level, and the Cox-Ingersoll-Ross model of the term structure
of interest rates. These models involve the use of stochastic differential
equations with unobserved state variables and must take into consideration
nonlinearity and nonstationarity of the underlying signal.

Econometricians have recently turned their attention to developing methods for tackling these types of models. In stochastic volatility studies, many authors have applied numerical methods and Markov Chain Monte Carlo methods to develop nonlinear models capable of describing observed features such as volatility clustering and long-range dependence. In addition, several authors have considered nonlinear diffusions, continuous time models and nonparametric methods in mathematical finance with considerable success.

A different, but closely related, approach to volatility modelling in the econometric literature is to consider duration models which combine point processes with time series analysis to describe the evolution of the underlying signal or information. In financial applications, this approach would consider jointly the transaction events and the associated volumes and prices. This research is also closely related to generalised linear models, and nonlinearity and nonstationarity again play an important role.

Considerable efforts have been devoted to the study of business cycles in economics. For instance, Markov switching models are widely used to study the status of the economy and the transition between recession and expansion. From the signal processing point of view, such studies can be formulated into a nonlinear state-space model and estimated effectively by MCMC methods.

**Extreme Value Statistics and Insurance**

This workshop, which involved around 60 participants, was organised by Richard Smith and Bill Fitzgerald and took place in the middle of August.

*Extreme value theory* is a branch of statistics concerned particularly
with the most extreme values (maxima and minima) of a sample of random
events. It has applications in many areas where the most extreme values
are important, such as strength of materials, structural reliability, extreme
values of environmental pollutants such as ozone and sulphur dioxide, and
climatological extremes. Its relevance to the subject of insurance stems
from the obvious point that a small number of very large claims may have
serious consequences for an insurance company. Extreme value theory provides
a set of mathematical tools both for characterising probabilities of very
large claims and for assessing their consequences for the solvency of a
company. During the programme, many data sets were analysed and the results
presented at the various workshops. The findings are to be published shortly.

Traditional actuarial theory relies on standard probability distributions
such as normal and gamma. Such *short-tailed* distributions have the
advantage of being easy to handle statistically, and of leading to a nice
mathematical theory of long-term loss probabilities. The trouble is that
real data on insurance claims tend to show occasional very large claims
which are much more consistent with *long-tailed* distributions such
as the Pareto, and which lie outside the scope of the usual theory. However,
during the past fifteen years, a new set of statistical and mathematical tools
has been developed to handle such distributions. These tools are now at
the stage where they are being applied to real data from insurance companies.
They are particularly relevant when applied to problems of *reinsurance*,
which is where the most critical problems associated with large claims
arise.
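The contrast between short-tailed and long-tailed claim distributions can be illustrated with the Hill estimator of the Pareto tail index, one standard tool from extreme value statistics. The sketch below uses simulated claims only; the sample sizes, the number of upper order statistics `k` and the function name `hill_tail_index` are illustrative assumptions, and no data from the programme are reproduced.

```python
import math
import random


def hill_tail_index(claims, k):
    """Hill estimate of the tail index alpha from the k largest observations.

    Assumes all claims are positive. Smaller alpha means a heavier tail.
    """
    ordered = sorted(claims, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(claims)")
    threshold = ordered[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(42)
# Long-tailed claims: Pareto with true tail index 2.
pareto_claims = [rng.paretovariate(2.0) for _ in range(20000)]
# Short-tailed claims: gamma with shape 2 and scale 1.
gamma_claims = [rng.gammavariate(2.0, 1.0) for _ in range(20000)]

print("Pareto claims:", hill_tail_index(pareto_claims, k=1000))
print("gamma claims: ", hill_tail_index(gamma_claims, k=1000))
```

For the Pareto sample the estimate should sit near the true index, while for the gamma sample it comes out much larger, reflecting a tail that decays faster than any power law. In practice the choice of `k` is delicate and is usually examined over a range of values.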

The application of Bayesian numerical methods to these problems formed a central focus to the analysis of the data during the programme.

Related to concern over the effects of large claims is concern over
their causes. The 1990s have seen a series of environmentally related insurance
disasters, mostly in the USA, *eg* Hurricane Andrew, the Northridge
earthquake, the Mississippi floods. Although there is no direct evidence
to suppose that these are linked with environmental trends such as global
warming, nevertheless it is natural to ask whether there is any association.
Statistical analyses of climatological data *have* demonstrated a
rise in the frequency of extreme events. It was therefore an important
research question to establish whether these extreme events have arisen
just by chance or can be associated in a causal fashion with known "signals"
such as the enhanced greenhouse gas effect. The findings from our analysis
are soon to be published in the scientific literature and in the three
books that will be published based on the six-month programme.

**Transmissible Spongiform Encephalopathies (TSEs) and nvCJD**

This workshop, which involved around 30 participants, was organised by Noah Linden, Frank Kelly, Bill Fitzgerald and Sheila Gore and took place in November.

The principal purpose of this meeting was to bring together a group of biomathematicians, a group who have been following the BSE and new variant CJD epidemics, and experimental workers in the field of spongiform encephalopathies, in order to assess the present situation with regard to the BSE and CJD epidemics and to explore the possibilities of modelling their progress and predicting their future course.

An account of the back calculation techniques that have been used to map the BSE epidemic was given.

Estimates of the associated parameters can be obtained from the data that are available on the British bovine herds. The observed incidence has followed the predicted decline fairly closely.

The question of under-reporting was discussed and it was pointed out that while there may be some under-reporting its extent has probably been exaggerated in the past.

A session was devoted to the modelling of the new variant CJD epidemic.

**Data Analysis**

This final workshop of the programme was organised by Andrew Walden and Richard Smith and took place in early December.

Topics covered included image denoising, teletraffic analysis, car engine
diagnostics, analyses of Ulysses spacecraft data and recognition of fax-corrupted
words. Analyses of some common data sets, analysed during the six-month
programme, featured in a number of talks. Several methodological themes
recurred during the presentations, notably state-space modelling, wavelet
analyses, and Polya tree structures. Also the necessity of defining good
metrics and loss functions for measuring disparities in trees, images *etc*
was considered a rich area for future research.

As evidenced by intriguing and searching questions, a main achievement of the workshop was to make members of the engineering and statistical communities aware of each other’s methodological and applied work, in particular in the areas of Polya trees and wavelets, and to help define the interesting and important questions for future research.

Amongst the many successes of the programme, one of the most notable was the stimulation of new interdisciplinary collaborations between the groups of statisticians, mathematicians and engineers who participated in the programme.

This was apparent at each of the workshops and within the programme as a whole. During the course of the programme various data sets were made available to participants, representing many different aspects of signal and data analysis. For example, some of the data sets were temperature records going back many years, previously thought by some people to show evidence of global warming; other data sets were taken from the insurance industry and are examples of extreme value problems. Various participants analysed these data sets from their own points of view, and some of the seminars and workshops were aimed at comparing the different approaches taken by workers from different backgrounds.

As a direct result of these different approaches many collaborative ventures have started and the results should be very revealing.

For example, Peter Young and Bill Fitzgerald are trying to fuse together the approaches previously taken using Dynamic Harmonic Regression and Bayesian approaches to parameter estimation and model selection. This will have many interesting applications in time series analysis.

Hans R Künsch worked on the asymptotics of estimation of mutual information between lagged variables in a time series. This is often used as a diagnostic tool since, in contrast to correlations, mutual information captures nonlinear relations between variables. Until now, no results on the asymptotic properties of this estimator were known. Considerable insight into this estimator has now been achieved.
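The diagnostic use of lagged mutual information can be illustrated with a simple plug-in histogram estimator. This is a generic textbook estimator chosen for illustration and is not claimed to be the estimator studied during the programme; the bin count, the logistic-map example and the function names are assumptions.

```python
import math


def lagged_pairs(series, lag):
    """Pairs (x_t, x_{t+lag}) from a single time series."""
    return list(zip(series[:-lag], series[lag:]))


def correlation(pairs):
    """Sample Pearson correlation of the paired values."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def histogram_mutual_information(pairs, bins=10):
    """Plug-in mutual information (in nats) from an equal-width 2-D histogram."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    joint, px, py = {}, [0] * bins, [0] * bins
    for x, y in pairs:
        i, j = bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    n = len(pairs)
    # Sum of p(i,j) * log(p(i,j) / (p(i) * p(j))) over occupied cells.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items()
    )


# Illustrative series: the logistic map, where x_{t+1} is a deterministic
# nonlinear function of x_t but the lag-1 correlation is close to zero.
series = [0.2]
for _ in range(4999):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))

pairs = lagged_pairs(series, lag=1)
print("lag-1 correlation:       ", correlation(pairs))
print("lag-1 mutual information:", histogram_mutual_information(pairs))
```

The plug-in estimate depends on the number of bins and is biased upward in finite samples, which is one reason the asymptotic behaviour of such estimators is of interest.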

Hans R Künsch also presented his work on the so-called particle filter at the Data Analysis Workshop. The particle filter is the basis for the current interest in sequential Markov Chain Monte Carlo Bayesian methods. The work presented by Mike Pitt and Neil Gordon in this area formed a very useful opportunity to discuss this very important approach from different perspectives.

Raquel Prado from the University Simon Bolivar, Venezuela, presented her work on latent structure in nonstationary time series and her work on time-varying autoregressions (TVAR) caused much interest since it forms a very useful potential way of dealing with multivariate environmental and biological time series that are currently being investigated using alternative approaches.

Richard G Baraniuk from Rice University was elected as Rosenbaum Fellow. During his stay he started several new areas of research: tree-structured lacunary wavelet series, tree-structured optimisations for wavelet-based signal processing, multifractal wavelet series and empirical mode decomposition. Interactions with statisticians (EI George and WJ Fitzgerald) also showed that many of the new techniques of Markov Chain Monte Carlo have applications in time-frequency analysis which have yet to be fully explored. Several INI technical reports will be forthcoming. He also started a book on time-frequency representation together with Doug Jones, wrote papers during the programme with Patrick Flandrin and AJEM Janssen, and gave several seminars at various universities in Europe.

Ed George (Austin) developed, together with AP Dawid, a new approach for putting prior probability distributions over models in Bayesian analyses. The goal was to avoid the problem of putting too much probability on sets of similar models. The idea was to assign probability to Kullback-Leibler neighborhoods in model space thereby diluting probability across similar models. Work was also undertaken to prove asymptotic consistency of estimators for empirical Bayes methods for variable selection and this showed that such estimators are vastly superior to competing methods.

Work undertaken by Ed George and Bill Fitzgerald concerning model complexity penalisation has shed new light on the problems associated with Bayesian approaches to model selection.

Work was undertaken by Richard Smith, Peter Young and Bill Fitzgerald on certain data sets supplied by the insurance industry and various different approaches were used to very good effect.

These results were presented at the workshop concerning extreme value statistics.

Work undertaken by Don Percival (Washington), who was an EPSRC Senior Visiting Fellow at the Institute, and Andrew Walden concentrated on wavelet methods applied to time series, and half of a CUP book was written on this subject during the programme. Work was also undertaken using tree structures, and Polya trees in particular, for representing correlation structures arising in wavelet analysis.

David Thomson from Bell Labs visited the programme twice and spoke about multitaper spectral estimation. This formed the basis for a lot of interaction with applications to particular data sets.

Steve McLaughlin (Edinburgh) focused his work on two main areas: higher order statistics for the detection of quadratic phase coupling in time series, and the development of parsimonious models for teletraffic data which exhibit long range dependence.

The programme benefited from sponsorship by Schlumberger Cambridge Research, the US Office of Naval Research, Barclaycard, BP, the TSUNAMI initiative and the NSF.

We are very grateful to Cambridge Control for supplying us with copies of MATLAB for the full duration of the programme.

This programme would really not have been possible without the super-excellent support of the staff of the Isaac Newton Institute.