MR/2: Multiscale Entropy and Applications
The term "entropy" is due to Clausius (1865), and the concept of
entropy was introduced by Boltzmann into statistical mechanics, in
order to measure the number of microscopic ways that a given
macroscopic state can be realized. Shannon (1948) founded the
mathematical theory of communication when he suggested that the
information gained in a measurement depends on the number of possible
outcomes out of which one is realized. Shannon also suggested that
entropy can be used to maximize the bit transfer rate under a quality
constraint. Jaynes (1957) proposed to use the entropy measure for
radio interferometric image deconvolution, in order to select, from a
set of possible solutions, the one that contains the minimum of
information or, following his entropy definition, the one that has
maximum entropy. In principle, the solution satisfying such a
condition should be the most reliable. Much work has been carried out
over the last 30 years on the use of entropy for the general problem
of data filtering and deconvolution.
Traditionally, information and entropy are determined from events and
the probability of their occurrence. Signal and noise are basic
building blocks of signal and data analysis in the physical and
communication sciences. Instead of the probability of an event, we are
led to consider the probabilities of our data being either signal or
noise.
Consider any data signal with interpretative value. Now consider a
uniform "scrambling" of the same data signal. (Starck et al., 1998,
illustrate this with the widely-used Lena test image.) Any traditional
definition of entropy, the main idea of which is to establish a relation
between the received information and the probability of the observed
event, would give the same entropy for these two cases. A good
definition of entropy should instead satisfy the following criteria:
- The information in a flat signal is zero.
- The amount of information in a signal is independent of the
background.
- The amount of information is dependent on the noise. A given
signal Y (Y = X + Noise) does not furnish the same information when
the noise is high as when it is low.
- The entropy must work in the same way for a signal value B + epsilon
(B being the background) as for a signal value B - epsilon (see the
sketch after this list).
- The amount of information is dependent on the correlation in the
signal. If a signal S presents large features above the noise, it
contains a lot of information. By generating a new dataset from S,
taking its values in random order, the large features will evidently
disappear, and this new signal will contain less information, even
though the data values are the same as in S.
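The second and fourth criteria can be illustrated with a short sketch,
anticipating the wavelet-based definition given below. It uses numpy
and the PyWavelets package (pywt) as a stand-in for the wavelet
transforms of MR/2; the data and numbers are invented for the example.

    import numpy as np
    import pywt  # PyWavelets, standing in for the MR/2 wavelet transforms

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, 512)      # a hypothetical 1D signal

    # Background independence: a constant offset leaves the Haar detail
    # coefficients, and hence any entropy built from them, unchanged.
    _, d_plain = pywt.dwt(x, "haar")
    _, d_offset = pywt.dwt(x + 100.0, "haar")
    print(np.allclose(d_plain, d_offset))          # True

    # B +/- epsilon symmetry: the Gaussian term w^2 / (2 sigma^2)
    # introduced below is even in w, so fluctuations of +epsilon and
    # -epsilon about the background carry the same information.
    w, sigma = 3.0, 1.0
    print(w**2 / (2 * sigma**2) == (-w)**2 / (2 * sigma**2))   # True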
To cater for background, we introduce the concept of multiresolution
into our entropy. We will consider that the information contained in
some dataset is the sum of the information at different resolution
levels, j. A wavelet transform is one choice for such a multiscale
decomposition of our data. We define the information of a wavelet
coefficient wj(k) at position k and at scale j as I = - ln (p(wj(k))),
where p(wj(k)) is the probability of the wavelet coefficient. The
entropy, commonly denoted H, is then the sum of I over all positions k
and all scales j.
For Gaussian noise we continue in this direction, using Gaussian
probability distributions, and find that the entropy H is the sum over
all positions k and all scales j of wj(k)^2 / (2 sigma_j^2), i.e. the
coefficient squared, divided by twice the variance of the noise at the
given scale. Here sigma_j, the noise standard deviation at scale j, is
the (Gaussian) measure of the noise. We see that the information is
proportional to the energy of the wavelet coefficients: the higher a
wavelet coefficient, the lower its probability, and the greater the
information furnished by this coefficient.
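A minimal sketch of this calculation, assuming white Gaussian noise
and an orthonormal wavelet transform so that sigma_j = sigma at every
scale. PyWavelets' decimated transform is used here in place of MR/2's
own transform, and the function name multiscale_entropy, the Haar
wavelet and the test data are illustrative only.

    import numpy as np
    import pywt

    def multiscale_entropy(data, sigma, wavelet="haar", levels=4):
        """Sum of wj(k)^2 / (2 sigma_j^2) over all detail coefficients.

        With an orthonormal transform and white Gaussian noise of
        standard deviation sigma, sigma_j = sigma at every scale j.
        """
        coeffs = pywt.wavedec(data, wavelet, level=levels)
        details = coeffs[1:]               # drop the smoothed array
        return sum(np.sum(w**2) for w in details) / (2.0 * sigma**2)

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 1024)
    noise = rng.normal(0.0, 1.0, t.size)
    feature = 10.0 * np.exp(-((t - 0.5) ** 2) / 0.001)

    print(multiscale_entropy(noise, sigma=1.0))            # noise only: low
    print(multiscale_entropy(feature + noise, sigma=1.0))  # structure present: higher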
Our entropy definition is completely dependent on the noise modeling.
If we consider a signal S and assume that the noise is Gaussian with a
standard deviation equal to sigma, we will not measure the same
information as when we assume that the noise has another standard
deviation value, or that it follows another distribution.
Returning to our example of a signal with interpretative value, and a
scrambled version of it, we can plot an information versus scale curve
(e.g. the logarithm of the entropy at each scale, using the above
definition, against the multiresolution scale). For the scrambled
signal the curve is flat; for the original signal it increases with
scale. We can use such an entropy versus scale plot to investigate
differences between encrypted and unencrypted signals, to study
typical versus atypical cases, and to single out atypical or
interesting signals.
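A sketch of such a curve, under the same assumptions as above
(PyWavelets' decimated transform instead of the undecimated transform
used in MR/2, hence the averaging per coefficient at each scale; the
test signal and the name entropy_per_scale are illustrative):

    import numpy as np
    import pywt

    def entropy_per_scale(data, sigma, wavelet="haar", levels=8):
        """Mean Gaussian term w^2 / (2 sigma^2) per coefficient at each scale."""
        coeffs = pywt.wavedec(data, wavelet, level=levels)
        details = coeffs[:0:-1]            # detail arrays, finest scale first
        return [np.mean(w**2) / (2.0 * sigma**2) for w in details]

    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 4096)
    original = 20.0 * np.sin(2 * np.pi * 16 * t) + rng.normal(0.0, 1.0, t.size)
    scrambled = rng.permutation(original)  # same values, correlations destroyed

    for j, (h_orig, h_scr) in enumerate(
            zip(entropy_per_scale(original, 1.0),
                entropy_per_scale(scrambled, 1.0)), 1):
        print(f"scale {j}: log H original = {np.log(h_orig):6.2f}, "
              f"scrambled = {np.log(h_scr):6.2f}")
    # The curve for the original rises towards the coarse scales that hold
    # the sinusoid, while the curve for the scrambled signal stays roughly
    # flat.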
To read further:
- J.L. Starck and F. Murtagh, "Astronomical Image and Signal
Processing", IEEE Signal Processing Magazine, 18, 2, pp. 30-40, 2001.
- J.L. Starck, F. Murtagh, P. Querre and F. Bonnarel, "Entropy and
Astronomical Data Analysis: perspectives from multiresolution
analysis", Astronomy and Astrophysics, 368, pp. 730-746, 2001.
- J.L. Starck and F. Murtagh, "Multiscale Entropy Filtering", Signal
Processing, 76, 2, pp. 147-165, 1999.
- J.L. Starck, F. Murtagh and R. Gastaud, "A new entropy measure based
on the wavelet transform and noise modeling", IEEE Transactions on
Circuits and Systems II: Analog and Digital Signal Processing, 45,
pp. 1118-1124, 1998.
Applications in MR/2 are:
- Multiscale entropy calculation.
- 1D and 2D data filtering using the multiscale entropy.
- Image deconvolution.
- Image background fluctuation.