The main statistical application of wavelets is in signal denoising (also known as smoothing or nonparametric function estimation). If you want to try it yourself, try the ksmooth function in R. The extracts below come from Wavelet Methods in Statistics with R by G. P. Nason.
Wavelet methods have recently undergone a rapid period of development. The book's objectives include (i) providing an introduction to wavelets and their uses in statistics. Hence, the book is centred around the freeware R and WaveThresh software packages, which will enable readers to learn about statistical wavelet methods.
In other words, it includes information from 2. Before we do this, we need to introduce some further notation. Now we are about to introduce coarser-scale detail. Later, we will go on to introduce detail at successively coarser scales. Hence, we need some way of keeping track of the scale of the detail. We do this by introducing another subscript, j (which some authors represent by a superscript). The original sequence y consisted of $2^J$ observations. To obtain the next coarsest detail we repeat the operation of 2.
From a quick glance of 2.
At this point, we feel the need to issue a warning over terminology. However, depending on the context, we sometimes use scale to mean level.
Now nothing can stop us! We can repeat the averaging Formula 2. Writing 2. The latter point also tells us when the algorithm stops: These kinds of diagrams are used extensively in the literature and are useful for showing the main features of multiscale algorithms. Since Figure 2. Example 2. Suppose that we begin with the following sequence of numbers: 1, 1, 7, 9, 2, 8, 8, 6. First apply Formula 2. It is useful to write down these computations in a graphical form such as that depicted by Figure 2.
Figure caption: Graphical depiction of a multiscale transform. The solid arrows indicate addition, and numbers set in the upright font correspond to the $c_{j,k}$.
The algorithm that we described above is an example of a pyramid algorithm. The 4 indicates that the sum of the last quarter of the data minus the sum of the third quarter is four.
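The pyramid algorithm described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the book's R/WaveThresh code; the function name `haar_pyramid` is hypothetical, and the sketch assumes the unnormalised convention implied by the text (detail is the second element of a pair minus the first, smooth is their sum), applied to the sequence 1, 1, 7, 9, 2, 8, 8, 6 that the text identifies as the example input.

```python
def haar_pyramid(y):
    """Unnormalised Haar pyramid: pairwise differences (detail) and sums (smooth)."""
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []                      # finest level first
    c = list(y)
    while len(c) > 1:
        half = len(c) // 2
        d = [c[2 * k + 1] - c[2 * k] for k in range(half)]   # detail at this level
        c = [c[2 * k + 1] + c[2 * k] for k in range(half)]   # coarser smooth
        details.append(d)
    return details, c[0]


details, total = haar_pyramid([1, 1, 7, 9, 2, 8, 8, 6])
print(details)  # [[0, 2, 6, -2], [14, 4], [6]]
print(total)    # 42
```

The second entry of the middle level is the 4 mentioned above: (8 + 6) - (2 + 8), the sum of the last quarter of the data minus the sum of the third quarter.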
So far we have avoided using the word wavelet in our description of the multiscale algorithm above. The algorithm that we have derived is one kind of discrete wavelet transform (DWT), and the general pyramid algorithm for wavelets is due to Mallat b. The wavelets underlying the transform above are called Haar wavelets, after Haar. Welcome to Wavelets! For example, the inverse formulae to the simple ones in 2. For example, suppose we started with the input sequence 1, 1, 1, 1, 2, 2, 2, 2.
If we processed this sequence with the algorithm depicted by Figure 2. This behaviour is characteristic of wavelets: The vector we chose was actually piecewise constant, an extreme example of piecewise smooth.
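The sparsity claim can be checked directly. The following sketch, again in plain Python with a hypothetical helper name `haar_details` and the same assumed unnormalised convention (difference and sum within each pair), compares the piecewise constant sequence with the rougher example sequence.

```python
def haar_details(y):
    """Return every unnormalised Haar detail coefficient, finest level first."""
    c, out = list(y), []
    while len(c) > 1:
        pairs = [(c[i], c[i + 1]) for i in range(0, len(c), 2)]
        out.extend(b - a for a, b in pairs)   # detail: difference within each pair
        c = [a + b for a, b in pairs]         # smooth: sum within each pair
    return out


flat = haar_details([1, 1, 1, 1, 2, 2, 2, 2])
rough = haar_details([1, 1, 7, 9, 2, 8, 8, 6])
nonzero_flat = sum(1 for d in flat if d != 0)
nonzero_rough = sum(1 for d in rough if d != 0)
print(flat, nonzero_flat)    # [0, 0, 0, 0, 0, 0, 4] 1
print(rough, nonzero_rough)  # [0, 2, 6, -2, 14, 4, 6] 6
```

Only the coarsest detail survives for the piecewise constant input, because that is the only scale at which a pair of neighbouring averages differs.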
In the example above the input sequence was 1, 1, 7, 9, 2, 8, 8, 6. See Section B. Hence the norm, or energy, of the output sequence is much larger than that of the input. We address this in the next section. Thus 2. Mostly we keep this normalization throughout, although it is sometimes convenient to use other normalizations.
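To make the energy remark concrete, here is a small sketch that runs the same Haar pyramid twice: once with unit scaling and once with each sum and difference multiplied by $1/\sqrt{2}$. The helper names `haar_dwt` and `energy` are hypothetical, and the $1/\sqrt{2}$ factor is assumed to be the normalization the text refers to, since it is the one that makes the Haar transform orthonormal.

```python
import math


def haar_dwt(y, scale):
    """Haar pyramid with each sum and difference multiplied by `scale`."""
    c, coeffs = [float(v) for v in y], []
    while len(c) > 1:
        pairs = [(c[i], c[i + 1]) for i in range(0, len(c), 2)]
        coeffs = [scale * (b - a) for a, b in pairs] + coeffs   # prepend coarser details
        c = [scale * (a + b) for a, b in pairs]
    return c + coeffs   # [c0, coarsest detail, ..., finest details]


def energy(v):
    """Sum of squares of a sequence."""
    return sum(x * x for x in v)


y = [1, 1, 7, 9, 2, 8, 8, 6]
e_in = energy(y)                                   # 300
e_raw = energy(haar_dwt(y, 1.0))                   # 2056.0
e_orth = energy(haar_dwt(y, 1 / math.sqrt(2)))     # 300 up to rounding error
print(e_in, e_raw, round(e_orth, 9))
```

With unit scaling the output energy is almost seven times the input energy; with the $1/\sqrt{2}$ scaling the two agree.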
For example, see the normalization for the Haar-Fisz transform in Section 6. We can rewrite 2. Let us perform the transform described in Example 2. Why is this? However, this raises a good point: In all these circumstances, one still obtains the same kind of analysis. Other resolution levels in the wavelet decomposition object can be obtained using the accessD function with the level argument set to one and zero.
The level is indicated by the left-hand axis. Note that the zero $d_{2,1}$ is not plotted. Produced by f. Other interesting information about the ywd object can be obtained by simply typing the name of the object. The printed summary identifies it as a discrete wavelet transform object (a list with 8 components) and reports, among other things, the creation date, the filter (Haar wavelet), and the boundary handling. Since the output has been computed from the input using a series of simple additions, subtractions, and constant scalings, it is no surprise that one can compute the output from the input using a matrix multiplication.
It is instructive to see the structure of the previous equations contained within the matrix. Not all wavelets are orthogonal and there are uses for non-orthogonal wavelets. For example, with non-orthogonal wavelets it is possible to adjust the relative resolution in time and scale e. Most of the wavelets we will consider in this book are orthogonal, although sometimes we shall use collections which do not form orthogonal systems, for example, the non-decimated wavelet transform described in Section 2.
Repeating this for each of the n rows of W results in $n^2$ operations in total. Here, log n is small for even quite large n. WaveThresh contains functionality to produce the matrix representations of various wavelet transforms. To produce the matrix W shown in 2. One can verify the orthogonality of W using WaveThresh. It is perfectly possible to extend the following ideas to other intervals, the whole line $\mathbb{R}$, or d-dimensional Euclidean space.
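The matrix view can be illustrated without WaveThresh. The sketch below builds a Haar transform matrix by transforming each unit vector and then checks that $W W^T$ is the identity. It is plain Python with hypothetical helper names; the row ordering (smooth coefficient first, then coarsest to finest detail) and the sign convention are assumptions and may differ from the matrix printed in the book or produced by WaveThresh.

```python
import math


def haar_dwt(y):
    """Orthonormal Haar transform: [c0, coarsest detail, ..., finest details]."""
    s = 1 / math.sqrt(2)
    c, coeffs = [float(v) for v in y], []
    while len(c) > 1:
        pairs = [(c[i], c[i + 1]) for i in range(0, len(c), 2)]
        coeffs = [s * (b - a) for a, b in pairs] + coeffs
        c = [s * (a + b) for a, b in pairs]
    return c + coeffs


def haar_matrix(n):
    """Matrix W with W y == haar_dwt(y); column j is the transform of unit vector e_j."""
    cols = [haar_dwt([1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


W = haar_matrix(8)
# Gram matrix W W^T; orthogonality means it equals the identity.
gram = [[sum(W[i][k] * W[j][k] for k in range(8)) for j in range(8)] for i in range(8)]
max_error = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
                for i in range(8) for j in range(8))

y = [1, 1, 7, 9, 2, 8, 8, 6]
via_matrix = [sum(W[i][k] * y[k] for k in range(8)) for i in range(8)]   # n^2 multiplications
via_pyramid = haar_dwt(y)                                                # order n operations
print(max_error < 1e-12)
```

The matrix product costs $n^2$ multiplications, whereas the pyramid algorithm reaches the same coefficients with a number of operations proportional to n.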
For Haar, involving any more than pairs automatically means a larger-scale Haar wavelet. With complete knowledge of a function, f(x), one can, in principle, investigate it at any scale that one desires. We have not answered the question about how to obtain such a discrete sequence from a function. This is an important consideration and there are 2. However, until then suppose that such a sequence, derived from f(x), is available.
For the Haar wavelet transform on functions we derive a similar notion which involves subtracting integrals of the function over consecutive pairs of intervals. At this point, it is worth explaining what the $c_{J,k}$ represent. Using 2. Plot a in Figure 2. In each plot the horizontal label is time in seconds, and the vertical axis is millivolts. Plots b, c, and d in Figure 2. These Haar approximations are reminiscent of the staircase approximation useful, for example, in measure theory for proving, among other things, the monotone convergence theorem; see Williams or Kingman and Taylor. We could compute the local average over these dyadic intervals $I_{j,k}$ for any j and k.
Equation 2. The formula for general wavelets is 2. Using this two-scale relation it is easy to see how 2. Of course, the computation in 2. The approximation $f_1(x)$ is of the form 2. This again generalizes to all scales. The two functions we choose are the Blocks and Doppler test functions introduced by Donoho and Johnstone b and further discussed in Section 3. These functions can be produced using the DJ.EX function in WaveThresh.
The code that produced Figure 2. The ones at coarse levels are actually bigger. In Figure 2. The other point to note about Figure 2. This is because, in Haar terms, two neighbours, identical in value, were subtracted as in 2. As Figure 2. The ones that are extremely small e. This turns out to happen for a wide range of signals decomposed with the right kind of wavelets.
Such a property is of great use for compression purposes, see e.g. Taubman and Marcellin, and for statistical nonparametric regression, which we will elaborate on in Chapter 3. Finally, in Figure 2.
From such a plot one can clearly appreciate that there is a direct, but reciprocal, relationship between scale and frequency e. We will elaborate on this in Chapter 5. This section will concentrate on introducing and explaining concepts. We shall quote some results without proof. Full, comprehensive, and mathematical accounts can be found in several texts such as Mallat a,b, Meyer b, and Daubechies. These spaces could possibly contain functions with less detail, but there would be some absolute maximum level of detail.
This means that the spaces form a ladder: $\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$. As j becomes large and negative we include fewer and fewer functions, and detail is progressively lost. We refer to this as interscale linkage. Finally, we have not said much about the contents of any of these $V_j$ spaces. We saw this representation previously in 2. Here, it is valid for more general father wavelet functions, but the result is similar.
The dilation equation is fundamental in the theory of wavelets as its solution enables one to begin building a general MRA, not just for Haar wavelets. The dilation equation controls how the scaling functions relate to each other for two consecutive scales.
The general dilation equation in 2. Daubechies provides a key result that establishes the existence and construction of the wavelets Theorem 1 Daubechies , p. The representations given in 2. We will discuss this more in Section 4. This property has important consequences for data compression.
Similar remarks apply to many statistical estimation problems. Taking the wavelet transform of an object is often advantageous as it results in a sparse representation of that object. The wvmoments function in WaveThresh calculates the moments of wavelets numerically. Daubechies constructed such wavelets by an ingenious solution of the dilation equation 2. Each member of each family is indexed by a number N, which refers to the number of vanishing moments (although in some references N denotes the length of $h_n$, which is twice the number of vanishing moments).
WaveThresh contains two families of Daubechies wavelets which, in the package at least, are called the least-asymmetric and extremal-phase wavelets respectively. The least-asymmetric wavelets are sometimes known as symmlets. A real-valued, compactly supported orthonormal wavelet cannot be symmetric or antisymmetric unless it is the Haar wavelet; see Daubechies, Theorem 8.
However, both compactly supported complex-valued and biorthogonal wavelets can be symmetric, see Sections 2. Wavelets in these families possess members with higher numbers of vanishing moments, but they are not stored within WaveThresh. As another example, we choose the wavelet with filter. It is easy to draw pictures of wavelets within WaveThresh.
The following draw. The generic function, draw, can be used directly on objects produced by other functions such as wd so as to produce a picture of the wavelet that resulted in a particular wavelet decomposition. Lawton further noted that, apart from the Haar wavelet, the only compactly supported wavelets which are symmetric are CVDWs with an odd number of vanishing moments (other, asymmetric complex-valued wavelets are possible for higher N).
For example, the plot or, more precisely, the plot. We show how complex-valued wavelets can be used for denoising purposes, including some WaveThresh examples, in Section 3.
Figure caption: The real part is drawn as a solid black line and the imaginary part as a dotted line.
We refer the reader to Chui, 3. In a sense, it is the Fourier equivalent of the Haar wavelet, and hence certain pedagogical statements about wavelets could be made equally about Shannon as about Haar.
However, since Haar is easier to convey in the time domain (and possibly because it is older), it is usually Haar that is used. However, the Shannon wavelet is occasionally used in statistics in a theoretical setting. Meyer wavelets are used extensively in the analysis of statistical inverse problems. For an important recent work that combines fast Fourier and wavelet transforms, and a comprehensive overview of the area, see Johnstone et al.
We discuss statistical inverse problems further in Section 4. On taking Fourier transforms since convolutions turn into products, 2. We could use 2.
Hence using 2. Hence since the cardinal B-splines are compactly supported, the cardinal spline B-wavelet is also compactly supported. However, these spline wavelets are not orthogonal functions, which makes them less attractive for some applications such as nonparametric regression.
The connections to wavelets and development of compactly supported wavelets are described by Cohen et al. Instead of using the scaling function dilation equation, we use the analogous Equation 2. For example, we can achieve the same result as 2.
This latter operation is known as dyadic decimation or downsampling by an integer factor of 2. Hence the operations described by Formulae 2. Remember $d_j$ and $c_j$ here are vectors of length $2^j$ for periodized wavelet transforms. However, the notation is mathematically liberating and of great use when developing more complex algorithms such as the non-decimated wavelet transform, the wavelet packet transform, or combinations of these. We describe wavelet packets in Section 2.
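The filter-then-decimate view can be sketched as follows. This is an illustration in plain Python under stated assumptions: the names `filt`, `d0`, `d1`, `H`, and `G` are hypothetical, the filters are the Haar pair with a sign chosen so that the detail is the second element of a pair minus the first, boundaries are handled periodically, and indices start at 0, so the "even" decimation keeps positions 0, 2, 4, ....

```python
import math

S = 1 / math.sqrt(2)
H = [S, S]     # assumed Haar smoothing (low-pass) filter
G = [-S, S]    # assumed Haar detail (high-pass) filter; the sign is a convention


def filt(x, h):
    """Periodic filtering: out[i] = sum over m of h[m] * x[(i + m) mod n]."""
    n = len(x)
    return [sum(h[m] * x[(i + m) % n] for m in range(len(h))) for i in range(n)]


def d0(x):
    """Even dyadic decimation: keep positions 0, 2, 4, ... (0-based)."""
    return x[0::2]


def d1(x):
    """Odd dyadic decimation: keep positions 1, 3, 5, ... (0-based)."""
    return x[1::2]


c3 = [1, 1, 7, 9, 2, 8, 8, 6]
c2 = d0(filt(c3, H))   # smooth coefficients, length 4
d2 = d0(filt(c3, G))   # detail coefficients, length 4
print([round(v / S) for v in c2])  # [2, 16, 10, 14]
print([round(v / S) for v in d2])  # [0, 2, 6, -2]
```

Replacing `d0` by `d1` pairs each observation with its other neighbour, which is the choice that the non-decimated and wavelet packet constructions exploit.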
We outline two approaches. A deterministic approach is described in Daubechies , Chapter 5, Note Suppose the information about our function comes to us as samples, i.
To do this, rearrange 2. This can be checked by drawing a picture of this scaling function. For example, using the WaveThresh function: So, one might claim that one only needs to initialize the wavelet transform using the original function samples.
However, it can be seen that the above results in a massive approximation, which is prone to error. A stochastic approach. A somewhat more familiar approach can be adopted in statistical situations. For example, in density estimation, one might be interested in collecting independent observations, $X_1, \ldots, X_n$. Then an unbiased estimator of the corresponding population quantity is given by the equivalent sample quantity, i. Further details on this algorithm and its use in density estimation can be found in Herrick et al.
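The stochastic approach can be illustrated with a small sketch. The exact estimator is not preserved in this extract, so the code assumes the usual choice: the coefficient of a density with respect to a scaling function $\phi_{j,k}$ is an expectation, and its sample analogue is the average of $\phi_{j,k}(X_i)$. The Haar scaling function and the helper names are assumptions made only for illustration.

```python
import math


def haar_phi_jk(x, j, k):
    """Haar scaling function 2^(j/2) * phi(2^j x - k), with phi the indicator of [0, 1)."""
    u = (2 ** j) * x - k
    return 2 ** (j / 2) if 0 <= u < 1 else 0.0


def estimate_coefficient(sample, j, k):
    """Sample mean of phi_jk(X_i): an unbiased estimate of E[phi_jk(X)]."""
    return sum(haar_phi_jk(x, j, k) for x in sample) / len(sample)


sample = [0.1, 0.3, 0.6, 0.9]                # toy data, not from the book
c10 = estimate_coefficient(sample, 1, 0)     # uses the two points in [0, 0.5)
c11 = estimate_coefficient(sample, 1, 1)     # uses the two points in [0.5, 1)
print(round(c10, 4), round(c11, 4))
```

Coefficients estimated this way at a fine scale can then be fed into the pyramid algorithm in place of function samples.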
One can solve the equations in 2. Earlier, in Section 2. This implies that the inverse transform to the Haar wavelet transform is just $W^T$. For example, the transpose of 2. Let us continue Example 2. The inverse transform is performed using the wr function as follows: For more general Daubechies wavelets, one has to treat the issue of boundaries more carefully. It is, approximately, 0. WaveThresh implements two types of boundary extension for some routines: periodic extension and symmetric end reflection. The function wd possesses both options, but many other functions just have the periodic extension.
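For the Haar case the inverse can be written out directly by solving the pairwise sum and difference equations. The sketch below is plain Python with hypothetical function names and an assumed coefficient layout (coarsest detail level first); it is not the WaveThresh wr function, only an illustration that the forward transform followed by the inverse returns the original data.

```python
import math

S = 1 / math.sqrt(2)


def haar_forward(y):
    """Return (c0, levels), where levels[0] is the coarsest detail vector."""
    c, levels = [float(v) for v in y], []
    while len(c) > 1:
        half = len(c) // 2
        d = [S * (c[2 * k + 1] - c[2 * k]) for k in range(half)]
        c = [S * (c[2 * k + 1] + c[2 * k]) for k in range(half)]
        levels.insert(0, d)
    return c[0], levels


def haar_inverse(c0, levels):
    """Undo haar_forward: each (smooth, detail) pair yields two finer values."""
    c = [c0]
    for d in levels:
        finer = []
        for ck, dk in zip(c, d):
            finer += [S * (ck - dk), S * (ck + dk)]   # first and second element of the pair
        c = finer
    return c


y = [1, 1, 7, 9, 2, 8, 8, 6]
c0, levels = haar_forward(y)
recovered = haar_inverse(c0, levels)
print([round(v, 9) for v in recovered])
```

Because the transform is orthogonal, no matrix inversion is needed: the inverse uses the same filter weights as the forward step.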
Periodic extension is sometimes also known as being equivalent to using periodized wavelets for the discrete case. The formula works for both ends of the function, i. In the above we have talked about adapting the data so as to handle boundaries. The other possibility is to leave the data alone and to modify the wavelets themselves.
The other possibility is to modify the wavelet so that it always remains on 2. Recall that the dyadic decimation step, $D_0$, essentially picked every even element from a vector. The answer is that it could be. Nason and Silverman further point out that, at each level, one could choose either to use $D_0$ or $D_1$, and a particular orthogonal basis could be labelled using the zeroes or ones implicit in the choice of particular $D_0$ or $D_1$ at each stage.
Such a transform is termed the $\epsilon$-decimated wavelet transform. Inversion can be handled in a similar way. An important point is, therefore, that the standard DWT is dependent on choice of origin. For some statistical purposes, e. Indeed, typically we would prefer our method to be invariant to the origin choice, i.
The standard decimated DWT is orthogonal and transforms information from one basis to another. The Parseval relation shows that the total energy is conserved after transformation. However, there are several applications where it might be useful to retain and make use of extra information.
For example, in Examples 2. Now suppose we follow the recipe for the $\epsilon$-decimated transform given in the previous section. If the original sequence had been rotated cyclically by one position, then we would obtain the sequence $y_8, y_1, \ldots, y_7$. However, one can immediately see that keeping extra information destroys the orthogonal structure and the new transformation is redundant.
More precisely. The idea of the non-decimated wavelet transform (NDWT) is to retain both the odd and even decimations at each scale and continue to do the same at each subsequent scale.
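A non-decimated Haar transform can be sketched by simply not discarding anything: at each scale every position keeps a coefficient, and the filter reaches twice as far as at the previous scale. This is a minimal plain-Python illustration with a hypothetical function name, periodic boundaries, and time-ordered output assumed; it is not the WaveThresh implementation.

```python
import math

S = 1 / math.sqrt(2)


def haar_ndwt(y):
    """Time-ordered non-decimated Haar transform with periodic boundaries.

    Returns (details, smooth); details[0] is the finest scale and every
    level holds len(y) coefficients.
    """
    n = len(y)
    c, step, details = [float(v) for v in y], 1, []
    while step < n:
        d = [S * (c[(i + step) % n] - c[i]) for i in range(n)]
        c = [S * (c[(i + step) % n] + c[i]) for i in range(n)]
        details.append(d)
        step *= 2
    return details, c


y = [1, 1, 7, 9, 2, 8, 8, 6]
details, smooth = haar_ndwt(y)

# Rotate the input cyclically by one position: y8, y1, ..., y7.
rotated = [y[-1]] + y[:-1]
details_rot, _ = haar_ndwt(rotated)
shift_equivariant = all(
    all(abs(a - b) < 1e-12 for a, b in zip(lev_rot, [lev[-1]] + lev[:-1]))
    for lev, lev_rot in zip(details, details_rot)
)
print(shift_equivariant)  # True
```

The even-indexed entries of `details[0]` are the finest-scale coefficients of the ordinary decimated transform, and the odd-indexed entries are those of the transform started one position later, which is why the redundant transform no longer depends on the choice of origin.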
So, start with the input vector $y_1, \ldots$ See, for example, Holschneider et al. Also, Pesquet et al. One of the earliest statistical mentions of the NDWT is known as the maximal-overlap wavelet transform developed by Percival and Guttorp; Percival. We discuss this further in Section 5. We describe TI-denoising in more detail in Section 3. Nason and Silverman highlight the possibility for using non-decimated wavelets for determining the spectrum of a nonstationary or evolving time series.
This latter idea was put on a sound theoretical footing by Nason et al. This turns out not to be a good name because the NDWT is actually useful for studying nonstationary time series, see Section 5. However, some older works occasionally refer to the older name.
Let us again return to our simple example of y1 , y2 ,. One can continue in either fashion for coarser scales, and this results in a time-ordered NDWT or a packet-ordered one. Let us return again to our simple example. Let y1 ,. Positions 1, 3, 5, 7 are actually odd, but in the C programming language—which much of the low level of WaveThresh is written in—the positions are actually 0, 2, 4, 6.
C arrays start at 0 and not 1. Now let us apply the packet-ordered transform. This is carried out using the wst function: This extraction can be carried out using the getpacket function.
What about packets at coarser levels? For example, to obtain the 10 packet type: Hence, it should be possible to easily convert one type of object into another. This is indeed the case. In WaveThresh, the conversion between one object and another is carried out using the convert function. Used on a wst class object it produces the wd class object and vice versa.
Thus, to check: Let us end this series of examples with a more substantial one. The length of x is A plot of x, y is shown in Figure 2. Reproduced with permission from Nason and Silverman.
Figure placeholder: standard transform of the chirp signal (Daub cmpct on ext.); horizontal axis: Translate; vertical axis: Resolution Level.
This is shown in Figure 2. In comparing Figure 2. The chirp signal is an example of a However, the NDWT is useful in the modelling and analysis of stochastic time series as described further in Chapter 5.
Finally, we also compute and plot the packet-ordered NDWT. This is achieved with the following commands: The bottom curve in Figure 2. The packets are separated by a short vertical dotted line.
Section 2. Wavelet packets can also be extended to produce a non-decimated version, which we describe in Section 2. The next chapter explains how the NDWT can be a useful tool for nonparametric regression problems. Section 3. Chapter 5 describes how non-decimated wavelets can be used for the modelling and analysis of time series.
The number of mother wavelets is often denoted by L, and for simplicity of 2. In this section we base our exposition on, and borrow notation from, Downie and Silverman , which draws on work on multiple wavelets by Geronimo et al.
An orthonormal multiple wavelet basis admits the following representation, which is a multiple version of 2. The basis functions are orthonormal, i. Again, the idea is similar to before: The inverse formula is similar to the single wavelet case.
The rationale for multiple wavelet bases as given by Strang and Strela is that (i) multiple wavelets can be symmetric, (ii) they can possess short support, (iii) they can have higher accuracy, and (iv) they can be orthogonal.
Strang and Strela recall Daubechies to remind us that no single wavelet can possess these four properties simultaneously. In most statistical work, the multiple wavelet transform has been proposed for denoising of univariate signals. However, there is immediately a problem with this.
Hence, a way has to be found to transform a univariate input sequence into a sequence of 2D vectors. More on these issues will be discussed in our section on multiple wavelet denoising in Section 3. Let us continue our previous example and compute the multiple wavelet transform of the chirp signal introduced in Example 2. The multiple wavelet code within WaveThresh was introduced by Downie The main functions are: The multiple wavelet transform of the chirp signal can be obtained by the following commands: One might reasonably ask the question: However, it is not the only possible basis.
Other bases for such function spaces are orthogonal polynomials and the Fourier basis. Indeed, there are many such bases, and it is possible to organize some of them into collections called basis libraries.
One such library is the wavelet packet library, which we will describe below and is described in detail by Wickerhauser; see also Coifman and Wickerhauser and Hess-Nielsen and Wickerhauser. Here j and k are the scale and translation numbers respectively and n is a new kind of parameter called the number of oscillations. To form an orthonormal basis they cite the following proposition.
This operation is depicted by Figure 2.
Figure caption: Illustration of wavelet packet transform applied to eight data points (bottom to top). Reproduced with permission from Nason and Sapatinas.
In Section 2. Hence, the non-decimated wavelets are also a basis library and usage usually depends on selecting a basis element or averaging over the results of many.
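The wavelet packet idea for eight data points can be sketched as a table in which both the smoothing and the detail step are applied to every packet, not only to the smooth one. The code is a plain-Python illustration with hypothetical names, Haar filters assumed, and packets stored in the natural order in which they are generated (other orderings are used in practice).

```python
import math

S = 1 / math.sqrt(2)


def step1d(x):
    """One Haar step: return (smooth, detail), each half the length of x."""
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    return [S * (a + b) for a, b in pairs], [S * (b - a) for a, b in pairs]


def wavelet_packet_table(y):
    """table[r] holds the 2**r packets obtained after r rounds of filtering."""
    table = [[[float(v) for v in y]]]
    while len(table[-1][0]) > 1:
        next_row = []
        for packet in table[-1]:
            smooth, detail = step1d(packet)   # split every packet, not just the smooth one
            next_row.extend([smooth, detail])
        table.append(next_row)
    return table


table = wavelet_packet_table([1, 1, 7, 9, 2, 8, 8, 6])
packet_counts = [len(row) for row in table]                        # [1, 2, 4, 8]
row_energy = [sum(v * v for p in row for v in p) for row in table]
print(packet_counts, [round(e, 6) for e in row_energy])
```

Every row of the table carries the same energy as the data, and the ordinary wavelet transform corresponds to one particular selection of packets from the table.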
For wavelet packets, selection is the predominant mode of operation. Basis averaging could be considered but has received little attention.
Figure caption: Four wavelet packets derived from Daubechies least-asymmetric mother wavelet with ten vanishing moments. These four wavelet packets are actually orthogonal and drawn by the drawwp. The vertical scale is exaggerated by ten times.
So, for statistical purposes, how does selection work? In principle, it is simple for nonparametric regression. One selects a particular wavelet packet basis, obtains a representation of the noisy data with respect to that, thresholds (to reduce noise, see Chapter 3), and then inverts the packet transform with respect to that basis.
This task can be carried out rapidly using fast algorithms. Hence, an interesting question arises: Again, not much attention has been paid to this problem. For an example of basis selection followed by denoising see Ghugre et al. The reason for this is that many basis selection techniques are based on the Coifman and Wickerhauser best-basis algorithm, which is a method that was originally designed to work on deterministic functions.
Of course, if the denoising is not good, then the basis selection might not work anyhow. We say a little more on denoising with wavelet packets in Section 3. A possible motivation for the best-basis method is signal compression.
The Shannon entropy is suggested as a measure of sparsity. For example, the WaveThresh function Shannon. Both these vectors have unit norm. These computations suggest that the Shannon entropy is minimized by sparse vectors. Here is a proof for a very simple case. The Shannon entropy is more usually computed on probabilities.
For the negative Shannon entropy it is a maximum. To summarize, the Shannon entropy can be used to measure the sparsity of a vector, and the Coifman-Wickerhauser algorithm searches for the basis that minimizes the overall negative Shannon entropy (actually, Coifman and Wickerhauser is more general than this and admits more general cost functions).
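The entropy-as-sparsity idea can be tried on two unit-norm vectors. The sketch assumes the cost is $-\sum_i v_i^2 \log v_i^2$ with the convention $0 \log 0 = 0$; the function name `entropy_cost` and the two test vectors are illustrative choices, not taken from the book or from WaveThresh.

```python
import math


def entropy_cost(v, tol=1e-300):
    """Assumed cost: -sum(v_i^2 * log(v_i^2)), treating 0 * log(0) as 0."""
    total = 0.0
    for x in v:
        p = x * x
        if p > tol:
            total -= p * math.log(p)
    return total


sparse = [0.0, 0.0, 1.0]              # all energy in one coefficient
spread = [1 / math.sqrt(3)] * 3       # energy spread evenly; also unit norm
cost_sparse = entropy_cost(sparse)
cost_spread = entropy_cost(spread)
print(cost_sparse, round(cost_spread, 6))  # 0.0 1.098612
```

The evenly spread vector attains $\log 3$, the largest value possible for three coefficients, while the sparse vector attains the smallest, which is the behaviour a best-basis search relies on.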
Then this operation is applied recursively if required. It takes a dyadic-length vector to transform and requires the filter. For example, suppose we wished to compute the wavelet packet transform of a vector of iid Gaussian random variables.
The time series at the bottom of the plot, scale eight, depicts the original data, z. The default plot arguments in plot. We shall replace the fourth packet (packet 3) at scale six by a packet consisting of all zeroes and a single value of We can investigate the current values of packet (6, 3) (index packet 3 is the fourth at scale six, the others are indexed 0, 1, 2) by again using the generic getpacket function: Let us create a new wavelet packet object, zwp2, which is identical to zwp in all respects except it contains the new sparse packet: We can then examine the basis selected merely by typing the name of the node vector: The representation can be inverted with respect to the new selected basis contained within zwp2.
If the inversion is plotted, one sees a very large spike near the beginning of the series. More information on the usage of wavelet packets in statistical problems in regression and time series can be found in Sections 3. One generalization of the wavelet transform, the non-decimated transform, pointed out that the odd dyadic decimation operator, $D_1$, was perfectly valid and both could be used at each step of the wavelet transform.
Although this may sound complicated, the result is that we obtain wavelet packets that are non-decimated. Just as non-decimated wavelets are useful for time series analysis, so are non-decimated wavelet packets.
See Section 5. In its simplest form one applies both the $D_0 H$ and $D_0 G$ operators from 2. Then both operators are again applied but to both the columns of H and G. The basic algorithmic step for the 2D separable transform is depicted in Figure 2. For a more detailed description see Mallat. The 2D transform of an image is shown in Figure 2.
Figure caption: Schematic diagram of the central step of the 2D discrete wavelet transform. After Mallat b.
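One step of the separable 2D transform can be sketched by filtering the rows and then the columns of each result. The code is plain Python with Haar filters assumed; the function names and the labels `ll`, `lh`, `hl`, `hh` for the four sub-images are hypothetical, and naming conventions for the sub-images differ between references.

```python
import math

S = 1 / math.sqrt(2)


def step1d(x):
    """One Haar step: return (smooth, detail), each half the length of x."""
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    return [S * (a + b) for a, b in pairs], [S * (b - a) for a, b in pairs]


def transpose(m):
    return [list(r) for r in zip(*m)]


def step_columns(m):
    """Apply step1d to every column of m; return (smooth, detail) images."""
    s_cols, d_cols = zip(*(step1d(col) for col in transpose(m)))
    return transpose(s_cols), transpose(d_cols)


def step2d(img):
    """One separable 2D Haar step: rows first, then columns of each result."""
    low, high = [], []
    for row in img:
        s, d = step1d(row)
        low.append(s)
        high.append(d)
    ll, lh = step_columns(low)    # smooth rows: smooth / detail down the columns
    hl, hh = step_columns(high)   # detail rows: smooth / detail down the columns
    return ll, lh, hl, hh


flat_image = [[1.0] * 4 for _ in range(4)]
ll, lh, hl, hh = step2d(flat_image)
print([[round(v, 9) for v in row] for row in ll])  # [[2.0, 2.0], [2.0, 2.0]]
```

For the constant image only the smooth sub-image is non-zero; repeating `step2d` on `ll` gives the next coarser level.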
Both wd and imwd can perform the time-ordered non-decimated transform, and wst and wst2D can perform 2D packet-ordered non-decimated transforms. Then we demonstrated that the idea could be generalized to wavelets that are smoother than Haar wavelets and, for some applications, more useful. In many mathematical presentations, e.g. Daubechies (whose development we will follow here), the starting point is the continuous wavelet transform (CWT). The plan here stops at the fourth iteration level 0 whereas the one in Figure 2.
The function f can be recovered from its CWT, $F(a, b)$. Antoniadis and Gijbels refer to this as the continuous discrete wavelet transform (CDWT) and mention a fast computational algorithm by Abry which is equivalent to the non-decimated wavelet transform from Section 2.
"It is a book I will heartily recommend to statisticians looking for an entry point into the field of wavelets." (Morris, Biometrics, June)
The author asks two pertinent questions: Why use wavelets? And why use wavelets in statistics? He proceeds to provide answers, together with illustrative examples of the main uses of wavelets. The text is interspersed with snippets of R code to illustrate the techniques presented and provides the basis of an excellent text for private study.