
GEOPHYSICS, VOL. 80, NO. 6 (NOVEMBER-DECEMBER 2015); P. WD45–WD57, 12 FIGS., 2 TABLES.

10.1190/GEO2015-0047.1

Seismic data denoising through multiscale and sparsity-promoting dictionary learning

Lingchen Zhu1, Entao Liu1, and James H. McClellan1

ABSTRACT

Seismic data comprise many traces that provide a spatiotemporal sampling of the reflected wavefield. However, such information may suffer from ambient and random noise during acquisition, which could possibly limit the use of seismic data in reservoir locating. Traditionally, fixed transforms are used to separate the noise from the data by exploiting their different characteristics in a transform domain. However, their performance may not be satisfactory due to their lack of adaptability to changing data structures. We have developed a novel seismic data denoising method based on a parametric dictionary learning scheme. Unlike previous dictionary learning methods that had to learn unconstrained atoms, our method exploits the underlying sparse structure of the learned atoms over a base dictionary and significantly reduces the number of dictionary elements that need to be learned. By combining the advantages of multiscale representations with the power of dictionary learning, more degrees of freedom can be provided to the sparse representation, and therefore the characteristics of seismic data can be captured efficiently in sparse coefficients for denoising. The dictionary learning and denoising are performed on all overlapping patches of the given noisy seismic data, which maintains low complexity. Numerical experiments on synthetic seismic data indicate that our scheme achieves the best denoising performance in terms of peak signal-to-noise ratio and minimizes visual distortion.

INTRODUCTION

Seismic data quality is vital to geophysical applications. The ideal noiseless data acquisition environment may only exist in a synthetic model. However, real seismic data suffer from different sources of noise in marine and land acquisition. In a seismic survey, which is a primary tool of exploration geophysics, noise caused by ground roll, drill rigs, and seismic vessels, to name a few sources, can degrade the subsurface imaging quality of migration. To maximize the contribution of seismic imaging to hydrocarbon reservoir locating, characterizing, and recovering, one of the key components of seismic data processing is noise suppression or removal. A more challenging scenario is microseismic monitoring, where very low amplitude data are typically acquired with strong coherent and random noise. The low signal-to-noise ratio (S/N) here invalidates some traditional detection and location methods, such as algorithms based on arrival-time picking.

Traditionally, seismic data noise is segregated from the recorded data in various domains, in which the signal and noise have different characteristics (Chen and Ma, 2014). Typical time-domain methods include stacking (Liu et al., 2009), polynomial fitting (Liu et al., 2011), and prediction (Abma and Claerbout, 1995). Frequency-domain methods, such as prediction (Canales, 1984) and filtering (Gulunay, 1986), exploit the predictability of the seismic signal in the spatiotemporal domain.

Although the aforementioned methods are capable of attenuating the noise to a certain degree, it is also desirable for the denoising outputs to have minimal signal distortion, especially in low S/N situations (Harris and White, 1997). In the past decade, the study of sparse signal representation has evolved rapidly to provide methods that capture the useful characteristics of the signal in various domains, as well as providing a new perspective for denoising. The stationary wavelet transform was used in Chanerley and Alexander (2002) as an alternative to band-pass filtering, and it removed corrupting signals for a better estimate of true ground motion. A new wavelet frame based on the characteristics of seismic data was proposed by Zhang and Ulrych (2003) to suppress noise. The 2D isotropic wavelets can represent point-like features with sparse coefficients, although the lack of directional selectivity limits their ability to sparsely represent edges and curved features in 2D signals, which are evidently contained in 2D seismic data.

Manuscript received by the Editor 31 January 2015; revised manuscript received 9 May 2015; published online 28 August 2015; corrected version published online 19 October 2015.
1Georgia Institute of Technology, Center for Energy and Geo Processing and KFUPM, Atlanta, Georgia, USA. E-mail: lczhu@gatech.edu; entao.liu@ece.gatech.edu; jim.mcclellan@gatech.edu.
© 2015 Society of Exploration Geophysicists. All rights reserved.

Toward this end, the hyperbolic Radon transform has been adopted to represent seismic reflections with sparse coefficients (Ji, 2006), and it has been used for deblending seismic data in blended-source acquisition (Ibrahim and Sacchi, 2014a, 2014b). Furthermore, a family of directional multiscale transforms, such as the curvelet (Candès and Donoho, 2004; Candès and Demanet, 2005; Candès et al., 2006) and contourlet (Do and Vetterli, 2003, 2005) transforms, has been introduced to exploit directional details and geometric smoothness along 2D curves and 3D surfaces. Their basis functions are localized in different scales, positions, and orientations. Because seismic data are known to have smooth, anisotropic contours, particularly in reflected wavefronts, these multiscale geometric analysis methods use substantially fewer coefficients than wavelets to represent seismic data at a given accuracy, and they have attracted a great deal of attention for seismic data denoising (Hennenfent and Herrmann, 2006; Neelamani et al., 2008; Hennenfent et al., 2010).

The analytic transforms mentioned above are model-driven processes based on a formulated mathematical model of the data, leading to an implicit dictionary described by a structured algorithm. Alternatively, a data-driven process learns its dictionary as an unstructured and explicit matrix from a training set, in such a way that each data signal can be represented as a linear combination of only a few columns (atoms) of the matrix. Typical dictionary learning algorithms range from principal component analysis (PCA) (Jolliffe, 2002) and generalized PCA (Vidal et al., 2005) to the method of optimal directions (Engan et al., 1999) and the K-singular value decomposition (K-SVD) (Aharon et al., 2006). Dictionary learning methods avoid choosing a fixed dictionary in which some atoms might be of limited use, and they therefore offer refined dictionaries that adapt to the structure of the data. This approach yields better performance in many image-based applications, such as image denoising (Elad and Aharon, 2006; Mairal et al., 2008) and superresolution (Yang et al., 2008). The dictionary learning method has recently been applied successfully to seismic data denoising (Tang et al., 2012; Beckouche and Ma, 2014), as well as to seismic data deblending (Zhou et al., 2014). However, these applications come at the price of a high overhead, including the explicit storage and multiplication of unstructured dictionary matrices.

In this paper, we propose a novel denoising scheme for seismic data based on its sparse representation over a learned dictionary. The dictionary is trained by a variant of the K-SVD algorithm, named sparse K-SVD (Rubinstein et al., 2010), over a set of seismic data patches. The motivation of sparse K-SVD is that the dictionary atoms learned by K-SVD may still share some underlying sparse pattern over a generic dictionary. Therefore, Rubinstein et al. (2010) suggest constructing the effective learned dictionary D = ΦA as the product of a base dictionary Φ corresponding to a fixed analytic transform (e.g., the discrete cosine transform [DCT] in Rubinstein et al., 2010) and a sparse matrix A that is actually to be learned. By relieving the need to learn all elements in D, such a parametric model of the learned dictionary strikes a good balance among complexity, adaptivity, and performance. The algorithm we propose is summarized as follows: We start from a base dictionary Φ and learn a sparse matrix A from the noisy seismic data patch set. For each overlapping patch of seismic data, a sparse coding problem with respect to the effective learned dictionary D = ΦA is solved to perform denoising. Then, all patch results are reconstructed, tiled, and averaged to assemble the denoised seismic data. We also compare our scheme with those using sparse approximations for seismic denoising based on curvelets and contourlets. Experimental results indicate that our scheme achieves the best performance in terms of peak signal-to-noise ratio (PSNR) and produces the least visual distortion.

The rest of this paper is organized as follows: In the second section, we introduce the motivation of the dictionary model with double sparsity and the details of the sparse K-SVD algorithm. The following section describes patch-based seismic data denoising using the learned dictionary; learning separate multiscale dictionaries and performing denoising in different subbands of a multiscale transform are also presented in this section. Numerical experiments are given in the next section, followed by a discussion and the conclusion in the final two sections.

SIGNAL REPRESENTATION WITH DICTIONARY LEARNING

A dictionary model with double sparsity

Given a training set $Y = [y_1, y_2, \ldots, y_R] \in \mathbb{R}^{N \times R}$, in which each element is a column vector of length $N$, the goal of the dictionary learning process is to find a matrix $D \in \mathbb{R}^{N \times L}$ that is able to represent the training set $Y$ with a set of sparse coefficients $X = [x_1, x_2, \ldots, x_R] \in \mathbb{R}^{L \times R}$ by solving the following optimization problem:

$$\{\hat{D}, \hat{X}\} = \arg\min_{D,X} \|Y - DX\|_F^2, \quad \text{subject to } \forall i: \|x_i\|_0 \le t; \ \forall j: \|d_j\|_2 = 1, \tag{1}$$

where each column vector $x_i$ of length $L$ is the sparse representation of $y_i$ over $D$, and $\|\cdot\|_0$ is the $\ell_0$-norm that counts the nonzero entries of a vector. The atom normalization constraint $\|d_j\|_2 = 1$ is added to increase the robustness of the dictionary; however, it does not essentially change the problem. Although the $\ell_0$-norm optimization problem 1 is generally NP-hard and cannot be tackled directly, it can be relaxed into the following tractable problem by replacing the $\ell_0$-norm with the $\ell_1$-norm:

$$\{\hat{D}, \hat{X}\} = \arg\min_{D,X} \|Y - DX\|_F^2, \quad \text{subject to } \forall i: \|x_i\|_1 \le t; \ \forall j: \|d_j\|_2 = 1. \tag{2}$$

Such a convex relaxation yields an exact solution to the $\ell_0$-norm optimization problem 1 under certain conditions specified in Donoho (2006), and hereafter we use the $\ell_1$-norm to measure the sparsity level.

For applications using natural images, the dimension $N$ of the training signals $\{y_i\}$ and the dimension $L$ of the sparse coefficients $\{x_i\}$ are large. Any attempt to learn a full-size dictionary requires a huge number $R$ of training signals as well, yielding an intractable computational complexity for solving the $\ell_1$-norm optimization problem 2. This difficulty also applies to seismic applications, where the recorded data from a single shot may include hundreds of traces and thousands of samples per trace. To learn the dictionary $D$ with moderate computational effort, patch-based processing (with dimensions on the order of $10^2$) is commonly used. This means that the training set must be composed of small patches from the input data, and the patches should be overlapped to avoid blocking artifacts.
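As a concrete illustration of the sparse coding subproblem in problems 1 and 2, the following Python sketch approximates the $\ell_0$-constrained fit with orthogonal matching pursuit. This is illustrative only, not the solver used in this paper, and the dictionary size and sparsity level are arbitrary choices.

```python
import numpy as np

def omp(D, y, t):
    """Greedy approximation of min ||y - D x||_2 subject to ||x||_0 <= t.

    D is an (N, L) dictionary with unit-norm columns; y is a length-N signal.
    Returns a length-L coefficient vector with at most t nonzero entries.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(t):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares refit on the support, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Toy usage: recover a 3-sparse code over a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(128)
x_true[[3, 40, 99]] = [1.0, -0.5, 2.0]
x_hat = omp(D, D @ x_true, t=3)
```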

So, what could be an appropriate dictionary to represent seismic data? Before attempting to answer this question, we must first understand seismic data. A seismic data set is a collection of data traces, each of which is a continuous wave recorded by a seismic recorder, such as a geophone, from a seismic source, either a man-made seismic event or a naturally occurring earthquake. Many traces together provide a spatiotemporal sampling of the reflected wavefield, which contains a straight line and hyperbolas that correspond to the direct ray and reflections with normal moveouts, respectively. Figure 1b demonstrates an example of a dictionary of 100 atoms learned by the K-SVD algorithm (Aharon et al., 2006) on a set of 16 × 16 patches from the synthetic seismic data set shown in Figure 1a. Although no constraints are posed by the algorithm, we can notice the strong resemblance among atoms in the resulting dictionary, which suggests that the sparse representation can be extended to each atom itself over some predefined base dictionary. Therefore, we can express the dictionary as

$$D = \Phi A, \tag{3}$$

where $\Phi \in \mathbb{R}^{N \times L}$ is the base dictionary, generally chosen to have a quick implicit implementation, and $A = [a_1, a_2, \ldots, a_L] \in \mathbb{R}^{L \times L}$ is a sparse matrix to be learned, in which each column satisfies $\|a_i\|_1 \le p$ for some sparsity level $p$. There is no doubt that the selection of $\Phi$ will affect the success of the dictionary model; therefore, transforms that already use some prior knowledge about the data are preferred. Meanwhile, $A$ can be regarded as an extension of the existing analytic transform, adding a new layer of adaptivity on top of the base dictionary $\Phi$. Compared with the regular dictionary $D$, which is fully explicit, the dictionary model 3 is significantly more efficient because only a few elements in $A$ need to be learned, stored, and transmitted. More importantly, due to its fewer degrees of freedom, such a dictionary model reduces the chance of overfitting the noise in the training set and produces robust results even with limited training examples. These properties are particularly advantageous for denoising. As a result, the dictionary learning process with the sparsity-promoting dictionary model shown in equation 3 becomes

$$\{\hat{A}, \hat{X}\} = \arg\min_{A,X} \|Y - \Phi A X\|_F^2, \quad \text{subject to } \forall i: \|x_i\|_1 \le t; \ \forall j: \|a_j\|_1 \le p, \ \|\Phi a_j\|_2 = 1. \tag{4}$$

Because the actually learned dictionary $A$ and the representation coefficients $X$ are both sparse matrices, such a dictionary learning model has a property called double sparsity.

Sparse K-SVD algorithm

Starting from an initial matrix $A_0$, sparse K-SVD (Rubinstein et al., 2010) is a dictionary learning algorithm specifically designed to learn the sparse dictionary $A$ by solving the optimization problem 4 for a fixed number of iterations. Here, we briefly review the algorithm and show how it iteratively improves the dictionary by sparse coding, atom removal, and atom updating operations.

The sparse K-SVD algorithm is presented in detail in Algorithm 1. In each iteration, the first step for sparse K-SVD is sparse coding the training examples in $Y$, given the current dictionary $D = \Phi A$. Because the sparse coding is coupled with the dictionary model, an effective dictionary learning process requires each atom $d_j = \Phi a_j$ to be updated individually. Denoting by $I$ the index set of the signals in $Y$ whose representations use $d_j$, found by locating the nonzero column elements in the $j$th row of $X$, the objective function to update $a_j$ is equivalent to

$$\|Y_I - \Phi A X_I\|_F^2 = \Big\|Y_I - \sum_{i \neq j} \Phi a_i X_{i,I} - \Phi a_j X_{j,I}\Big\|_F^2 = \|E_j - \Phi a_j X_{j,I}\|_F^2, \tag{5}$$

where $E_j = Y_I - \sum_{i \neq j} \Phi a_i X_{i,I}$ is the residual matrix without the contribution of $\Phi a_j$. Therefore, the resulting problem to update $a_j$ and $X_{j,I}$ is reduced to the following rank-1 approximation:

$$\{a_j, X_{j,I}^T\} = \arg\min_{a,x} \|E_j - \Phi a x^T\|_F^2, \quad \text{subject to } \|a\|_1 \le p. \tag{6}$$
Figure 1. (a) Synthetic seismic data set and (b) learned dictionary atoms trained from 16 × 16 patches of (a) using the K-SVD algorithm.

Due to the nonconvexity of problem 6, solving it directly might be computationally intensive. A workaround can greatly reduce the complexity by applying a single iteration of alternating optimization (Bezdek and Hathaway, 2002). Setting $x = X_{j,I}^T / \|X_{j,I}^T\|_2$, the atom $a_j$ is optimized by solving the following sparse coding problem:

$$a_j = \arg\min_a \|E_j x - \Phi a\|_2^2, \quad \text{subject to } \|a\|_1 \le p, \tag{7}$$

according to lemma 1 in Rubinstein et al. (2010). Then, $X_{j,I}$ is updated by

$$X_{j,I}^T = \frac{E_j^T \Phi a_j}{\|\Phi a_j\|_2}. \tag{8}$$

Such a process guarantees a reduction of the objective function.

Algorithm 1. Sparse K-SVD algorithm.
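A rough sketch of this atom update (equations 7 and 8) is given below. It assumes a square, orthonormal Φ, for which the sparse coding of equation 7 reduces to keeping the largest transform coefficients; keeping p nonzeros serves as an ℓ0 stand-in for the ℓ1 ball used in the paper.

```python
import numpy as np

def update_atom(E_j, x, Phi, p):
    """One alternating-optimization pass for equations 7 and 8.

    E_j : (N, |I|) residual matrix without atom j's contribution
    x   : length-|I| normalized row of X restricted to I (unit l2-norm)
    Phi : (N, N) base dictionary, assumed orthonormal in this sketch
    p   : number of nonzeros kept in the updated column a_j
    """
    v = E_j @ x                        # target direction for the new atom
    c = Phi.T @ v                      # exact coefficients when Phi is orthonormal
    a = np.zeros_like(c)
    keep = np.argsort(np.abs(c))[-p:]  # p largest-magnitude coefficients
    a[keep] = c[keep]                  # equation 7 (l0 stand-in for the l1 ball)
    a /= np.linalg.norm(Phi @ a)       # renormalize so ||Phi a_j||_2 = 1
    x_new = E_j.T @ (Phi @ a)          # equation 8 with unit-norm denominator
    return a, x_new
```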

Atom replacing techniques

Usually, a dictionary learning algorithm is prone to local minima. In the sparse K-SVD algorithm, all the dictionary atoms are presumed to be of equal importance, but some atoms can be replaced according to certain criteria, which we describe below. Such procedures can effectively avoid local minima or overfitting and therefore improve the adaptability of the learned dictionary.

The representation ability of the learned dictionary will be reduced if some atoms happen to be very similar. A useful measurement of the similarity among the atoms in a dictionary matrix $D$ is the mutual coherence, which is defined by

$$\mu(D) = \max_{i \neq j} \frac{|d_i^T d_j|}{\|d_i\|_2 \|d_j\|_2}. \tag{9}$$

We examine $\mu(D) = \mu(\Phi A)$ after all $L$ columns of matrix $A$ have been updated in an iteration. If a pair $(a_i, a_j)$ that makes $\mu(D)$ exceed some threshold (say 0.99) is indeed found, one element should be replaced with the representation of $y_k$ over $\Phi$ that satisfies

$$a = \arg\min_a \|y_k - \Phi a\|_2^2, \quad \text{subject to } \|a\|_1 \le p, \tag{10}$$

where $k$ refers to the index of the signal in $Y$ that exhibits the largest approximation error after taking account of the current sparse matrix $A$; i.e.,

$$k = \arg\max_k \|y_k - \Phi A x_k\|_2^2. \tag{11}$$

Because the number of training patches is always much larger than the number of atoms, such a replacement prevents similar atoms from appearing again.

Besides the replacement of similar atoms, we also identify and replace those atoms that are used infrequently. The number of nonzero elements in the $i$th row of $X$ indicates how many training signals in $Y$ use $d_i$ in their representations. If an atom is used by fewer than a threshold number (say four) of training signals, it is said to be "less representative" and can be replaced with another atom that represents more training signals. This atom replacement within $A$ can be done by solving optimization problem 10 as well.

DENOISING WITH LEARNED DICTIONARY

Problem formulation

Let $s \in \mathbb{R}^{UV}$ denote a vectorized noise-free seismic data set that collects $V$ traces, each with $U$ time samples, and let $n \in \mathbb{R}^{UV}$ be uncorrelated random noise whose elements are independent and identically distributed as $\mathcal{N}(0, \sigma^2)$. Then, the received noisy seismic data set is denoted as

$$y = s + n, \tag{12}$$

due to the additive noise contamination.

As we have pointed out above, dictionary learning algorithms tend to work with small data patches rather than full-size data to avoid the prohibitive computational complexity caused by high dimensionality and large training sets. Therefore, we break up the seismic data $y$ into $P$ overlapping patches $\{y_i\}_{i=1}^P$ (in vectorized form) of the same size and perform denoising on each patch independently. For each noisy patch $y_i \in \mathbb{R}^N$, the denoising problem can be stated as estimating a clean patch $s_i \in \mathbb{R}^N$ via the $\ell_1$-norm regularization form of the basis pursuit denoising (BPDN) problem (Chen et al., 1998):

$$\hat{x}_i = \arg\min_x \|\Phi A x - y_i\|_2^2 + \mu \|x\|_1, \qquad \hat{s}_i = \Phi A \hat{x}_i, \tag{13}$$

given the learned dictionary $\Phi A$. Generalizing optimization problem 13 to consider all patch indices $i = 1, 2, \ldots, P$, the global denoising problem for the entire noisy seismic data set $y$ can be formulated as

$$\{\hat{x}_i, \hat{s}\} = \arg\min_{x_i, s} \sum_i \|\Phi A x_i - P_i s\|_2^2 + \sum_i \mu_i \|x_i\|_1 + \lambda \|s - y\|_2^2, \tag{14}$$

where $P_i \in \{0,1\}^{N \times UV}$ is a projection matrix that selects $s_i$ from $s$ as $s_i = P_i s$. Its numerical solution is provided in the next subsection.
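The replacement rules of equations 9-11 can be sketched compactly as follows; the thresholds mirror the illustrative values in the text (coherence 0.99, fewer than four uses), and flagging only one atom of a coherent pair is a simplification of the sketch.

```python
import numpy as np

def mutual_coherence(D):
    """Equation 9: largest normalized inner product between distinct atoms."""
    G = D / np.linalg.norm(D, axis=0)
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)        # exclude the i == j terms
    return gram.max()

def atoms_to_replace(D, X, mu_max=0.99, min_use=4):
    """Flag near-duplicate atoms (equation 9 threshold) and rarely used
    atoms (row-usage count); each flagged atom would then be reset by
    solving problem 10 for the worst-represented training signal."""
    flagged = set()
    G = D / np.linalg.norm(D, axis=0)
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)
    if gram.max() > mu_max:
        _, j = np.unravel_index(np.argmax(gram), gram.shape)
        flagged.add(int(j))            # drop one atom of the coherent pair
    usage = np.count_nonzero(X, axis=1)
    flagged.update(int(i) for i in np.flatnonzero(usage < min_use))
    return flagged
```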

Figure 2. (a) Synthetic seismic data set and (b) wavelet-decomposed subbands of (a).

Numerical solution

When a suitable analytic transform $\Phi$ is chosen and the sparse matrix $A$ has been learned by running the sparse K-SVD algorithm described in Algorithm 1, there are still two unknowns in the global denoising problem 14: the global denoised seismic data set $\hat{s}$ and the sparse representation coefficients $x_i$ of its $i$th patch. Similar to the decoupling approach in the sparse K-SVD algorithm, we initialize one parameter as $s = y$ and estimate $x_i$ via a BPDN problem reduced from the global denoising problem 14:

$$\hat{x}_i = \arg\min_x \|\Phi A x - P_i s\|_2^2 + \mu \|x\|_1, \tag{15}$$

whose first term can be regarded as a constraint $\|\Phi A x - P_i s\|_2^2 \le N^2 \sigma^2$ that estimates the representation error. Such a sparse coding process works in a sliding-window manner until all patches in $\{P_i s\}_{i=1}^P$ have been coded.

After all $\hat{x}_i$ have been obtained, we turn to update $s$ by solving another reduced problem of the global denoising problem 14:

$$\hat{s} = \arg\min_s \sum_i \|\Phi A \hat{x}_i - P_i s\|_2^2 + \lambda \|s - y\|_2^2, \tag{16}$$

whose closed-form solution is

$$\hat{s} = \Big(\lambda I + \sum_i P_i^T P_i\Big)^{-1} \Big(\lambda y + \sum_i P_i^T \Phi A \hat{x}_i\Big). \tag{17}$$

Because the matrix $\lambda I + \sum_i P_i^T P_i$ is diagonal, this solution can be interpreted as a weighted sum of the tiling result assembled from all reconstructed patches and the original noisy data, followed by a pixel-by-pixel weighted averaging process.

Multiscale dictionary learning

If the base dictionary $\Phi$ corresponds to some multiscale synthesis operator, such as the inverse wavelet or curvelet transform, optimization problem 4 can be modified into the following equivalent form:

$$\{\hat{A}, \hat{X}\} = \arg\min_{A,X} \|\Phi^\dagger Y - A X\|_F^2, \quad \text{subject to } \forall i: \|x_i\|_1 \le t; \ \forall j: \|a_j\|_1 \le p, \ \|\Phi a_j\|_2 = 1, \tag{18}$$

where, assuming that $\Phi$ corresponds to an orthogonal transform, $\Phi^\dagger$ denotes the analysis operator of this transform. Optimization problem 18 suggests that the sparse matrix $A$ can be learned not only in the raw data domain, but also in a transformed domain, in which the seismic data are decomposed into multiscale subbands. Because multiscale transforms can capture the directional details of seismic wavefronts in different subbands, the coefficients tend to be highly correlated across directions and scales, as demonstrated in Figure 2. It is essential to learn this structural similarity through adaptive dictionaries. Therefore, in the multiscale dictionary learning process, each subband can be treated separately. Separate sparse subdictionaries are first trained for each subband and are then applied to denoise the subband coefficients using the patch-based approach. As we can see from Figure 2b, deeper decomposition scales have a smaller subband size. This enables the patch-based approach to have a global perspective because even a small patch in the deeper scale represents a large area in the data domain. Algorithm 2 presents the complete process of multiscale dictionary learning.

Algorithm 2. Multiscale sparse K-SVD algorithm.

Again, let $y$ denote the vectorized seismic data set contaminated by noise; then, its multiscale transform result is a collection of coefficient subbands $z^{(b)} = (\Phi^\dagger y)^{(b)}$, where $b$ is the subband index. For an orthogonal wavelet transform with $S$ decomposition levels, $b = 1, \ldots, B = 3S + 1$.
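The sketch below illustrates how such subbands and their transform-domain training sets can be assembled, assuming the PyWavelets package; the wavelet name ('bior4.4' as a common stand-in for the Daubechies 9/7 filters used later), the level, and the patch geometry are assumptions of the sketch.

```python
import numpy as np
import pywt

data = np.random.randn(384, 240)        # stand-in for one shot record
S = 2
coeffs = pywt.wavedec2(data, 'bior4.4', level=S)

# Flatten the pyramid into B = 3S + 1 subbands, as in the text.
subbands = [coeffs[0]]                  # coarse approximation
for detail in coeffs[1:]:
    subbands.extend(detail)             # horizontal, vertical, diagonal
assert len(subbands) == 3 * S + 1

def subband_patches(z, n=16, step=8):
    """Columns of Z^(b): vectorized n-by-n patches drawn from subband z."""
    cols = [z[r:r + n, c:c + n].ravel()
            for r in range(0, z.shape[0] - n + 1, step)
            for c in range(0, z.shape[1] - n + 1, step)]
    return np.stack(cols, axis=1)

Z = [subband_patches(z) for z in subbands if min(z.shape) >= 16]
```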



After breaking up each subband into patches and grouping them together in the columns of training sets $Z^{(b)}$, the multiscale dictionary learning problem can be expressed as

$$\forall b: \{\hat{A}^{(b)}, \hat{X}^{(b)}\} = \arg\min_{A^{(b)}, X^{(b)}} \|Z^{(b)} - A^{(b)} X^{(b)}\|_F^2, \quad \text{subject to } \forall i: \|x_i^{(b)}\|_1 \le t; \ \forall j: \|a_j^{(b)}\|_1 \le p, \ \|a_j^{(b)}\|_2 = 1. \tag{19}$$

Similar to the global denoising problem 14, the sparse coding of $z^{(b)}$ over the subdictionary $A^{(b)}$, as well as the denoising of $z^{(b)}$, can be formulated as follows:

$$\forall b: \{\hat{x}_i^{(b)}, \hat{u}^{(b)}\} = \arg\min_{x_i^{(b)}, u^{(b)}} \sum_i \|A^{(b)} x_i^{(b)} - P_i^{(b)} u^{(b)}\|_2^2 + \sum_i \mu_i \|x_i^{(b)}\|_1 + \lambda \|u^{(b)} - z^{(b)}\|_2^2, \tag{20}$$

where $\hat{u}^{(b)}$ is the denoised version of $z^{(b)}$, and its closed-form solution is

$$\hat{u}^{(b)} = \Big(\lambda I + \sum_i P_i^{(b)T} P_i^{(b)}\Big)^{-1} \Big(\lambda z^{(b)} + \sum_i P_i^{(b)T} A^{(b)} \hat{x}_i^{(b)}\Big). \tag{21}$$

Finally, the denoised seismic data $\hat{s}$ can be obtained by applying the inverse multiscale transform after all subbands across the different scales have been denoised:

$$\hat{s} = \Phi\Big(\bigcup_b \hat{u}^{(b)}\Big). \tag{22}$$

NUMERICAL EXPERIMENTS ON SYNTHETIC DATA

In this section, we use the dictionary learning method to attenuate the noise in seismic data, and we compare the performance of our proposed method with that of other denoising methods using fixed contourlet and curvelet transforms. The seismic data sets used in the experiments are synthesized 2D prestack shot records that are available for download from public seismic data repositories, such as those by SEG and the Madagascar Development Team (2014). Assuming that the seismic noise is caused by a diversity of different, spatially distributed, mostly uncorrelated, but low-frequency sources, it can be modeled as zero-mean white additive Gaussian noise with standard deviation σ, low-pass filtered at a stopband frequency (30 Hz in our experiments).
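A sketch of this noise model, together with the per-trace normalization described next, follows; it assumes SciPy's Butterworth filtering, and the sampling rate fs, the filter order, and the seed are assumptions of the sketch that are not stated in this section.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def add_lowpass_noise(s, sigma=0.1, f_stop=30.0, fs=250.0, seed=0):
    """Zero-mean Gaussian noise low-pass filtered along the time axis.
    fs (Hz) is an assumed sampling rate; the paper does not state one."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal(s.shape)
    b, a = butter(4, f_stop / (0.5 * fs), btype='low')
    return s + filtfilt(b, a, noise, axis=0)

def normalize_traces(y):
    """Subtract each trace's mean, then rescale the section into (-1, 1)."""
    y = y - y.mean(axis=0, keepdims=True)
    return y / (np.abs(y).max() + 1e-12)
```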

Figure 3. (a) Contourlet atoms on different scales and directions in the spatial domain and (b) curvelet atoms on different scales and in different directions in the spatial domain.

Figure 4. (a) The original seismic data set for reference and (b) the noisy seismic data set for denoising (PSNR = 22.62 dB).

For the sake of numerical stability and better performance, a standard normalization step is applied to rescale the data into the range (−1, 1) after subtracting the mean of each trace in the data set. Two different base dictionaries $\Phi$ are used in the experiments to learn the sparse matrix $A$ and thereafter the overall dictionary $D = \Phi A$: one represents the single-scale DCT, and the other the multiscale discrete wavelet transform (DWT).

As one of the most commonly used quality metrics for the comparison of denoising performance, PSNR is used in our experiments; it is defined as

$$\mathrm{PSNR} \triangleq 20 \log_{10} \frac{\sqrt{UV}\, s_{\mathrm{MAX}}}{\|s - \hat{s}\|_2}, \tag{23}$$

where $s_{\mathrm{MAX}} = 1$ is the maximum possible value of the seismic data after normalization.
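Equation 23 translates directly into a few lines; the sketch assumes the data have been normalized as described, so that s_MAX = 1.

```python
import numpy as np

def psnr(s, s_hat, s_max=1.0):
    """Equation 23: PSNR in dB for a U-by-V section (s.size == U * V)."""
    return 20 * np.log10(np.sqrt(s.size) * s_max / np.linalg.norm(s - s_hat))
```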

Figure 5. Denoised results based on BPDN without the dictionary learning method: (a) denoised result by the contourlet-based BPDN method (PSNR = 29.02 dB), (b) the difference between (a) and the original data, (c) denoised result by the curvelet-based BPDN method (PSNR = 29.58 dB), and (d) the difference between (c) and the original data.

Figure 6. (a) Base dictionary Φ: the DCT matrix, (b) the learned matrix A by the sparse K-SVD algorithm, and (c) the overall learned dictionary D = ΦA.

Denoising with fixed transforms

This subsection investigates prominent multiscale directional transforms, namely the contourlet and curvelet transforms, for seismic data denoising. The contourlet transform (Do and Vetterli, 2003, 2005) can capture smooth contours in a seismic data set based on a Laplacian pyramid decomposition followed by directional filter banks applied on each band-pass subband. Its atom elements are depicted in Figure 3a. Based on a frequency partition technique, the curvelet transform is able to represent curved singularities more precisely with needle-shaped atom elements, which are shown in Figure 3b. We use the fast discrete curvelet transform (Candès et al., 2006), which is the latest implementation, in our experiments. Because both transforms are able to represent a seismic data set with sparse coefficients, the following BPDN formulation can be used for denoising:

$$\hat{x} = \arg\min_x \|\Phi x - y\|_2^2 + \mu \|x\|_1, \qquad \hat{s} = \Phi \hat{x}, \tag{24}$$

where $\Phi$ refers to the dictionary of the contourlet/curvelet synthesis operator.

The public seismic data set we use in the following experiments is provided by BP (Etgen and Regone, 1998) as part of the Madagascar software (Madagascar Development Team, 2014). It has 240 traces, and each trace contains 384 time samples. Figure 4a is the original noise-free data, and Figure 4b is its (low-pass-filtered) noisy version with σ = 0.1 and PSNR = 22.62 dB. The BPDN results based on the contourlet and curvelet transforms are provided in Figure 5a and 5c with PSNR = 29.02 and 29.58 dB, respectively, whereas the error panels that show the difference between the reconstructed data and the original data are given in Figure 5b and 5d, respectively. It is obvious that most of the random noise is attenuated after many small coefficients are suppressed by the BPDN algorithm. However, many pseudo-Gibbs artifacts are produced along the wavefronts, especially in the contourlet case, whose atom elements have a less clear directional character than curvelets. These artifacts, which do not exist in the original data set, may be harmful for further processing, such as migration and full-waveform tomography.

Denoising with single-scale dictionary learning

In the following experiments, we compare the denoising performance of the dictionary learning method based on the sparse K-SVD algorithm with that of traditional denoising methods based on fixed transforms. Our dictionary learning method starts by dividing the noisy data set into overlapping patches of size 16 × 16 each and randomly choosing 10,000 of them as a training set for the sparse K-SVD algorithm.
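The patch-based procedure of equations 15-17 (with the BPDN step of equations 13 and 24 inside) can be sketched as follows. The ISTA solver here is a simple runnable stand-in for SPGL1, which, as noted in the discussion, is the solver actually used; the dictionary, μ, λ, patch size, and stride are illustrative assumptions.

```python
import numpy as np

def soft(v, tau):
    """Soft thresholding, the proximal operator of the l1-norm."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def bpdn_ista(D, y, mu, n_iter=100):
    """Minimize ||D x - y||_2^2 + mu ||x||_1 by iterative shrinkage (ISTA)."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft(x - step * 2.0 * D.T @ (D @ x - y), step * mu)
    return x

def denoise_section(y, D, mu=0.1, lam=0.5, n=16, stride=4):
    """Equations 15-17: code every overlapping n-by-n patch over D
    (D must have n * n rows), then average with pixel-wise weights."""
    acc = lam * y.copy()                   # the lambda * y term of equation 17
    weight = np.full(y.shape, lam)         # the lambda * I term
    U, V = y.shape
    for r in range(0, U - n + 1, stride):
        for c in range(0, V - n + 1, stride):
            patch = y[r:r + n, c:c + n].ravel()
            x_hat = bpdn_ista(D, patch, mu)
            acc[r:r + n, c:c + n] += (D @ x_hat).reshape(n, n)
            weight[r:r + n, c:c + n] += 1.0   # P_i^T P_i counts coverage
    return acc / weight                    # the diagonal system of equation 17
```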

Figure 7. (a) Denoised result of the dictionary learning method using the DCT matrix as the base dictionary (PSNR = 32.00 dB) and (b) the difference between (a) and the original data.

Figure 8. Zoomed-in denoised results for pseudo-Gibbs artifact comparison: (a) pseudo-Gibbs artifacts introduced by the curvelet transform and (b) no obvious artifacts after using the dictionary learning method.

Figure 9. PSNR versus noise level σ with 20,000 training patches for dictionary learning: (a) PSNR performance comparison of the sparse K-SVD dictionary learning method and the curvelet-based BPDN method, (b) error bars for sparsity level p = 25, and (c) error bars for sparsity level p = 100.

Figure 10. PSNR versus training iterations for different numbers of training patches, sparsity levels p, and noise levels σ: (a) dictionary trained with 5000 training patches, (b) dictionary trained with 10,000 training patches, and (c) dictionary trained with 20,000 training patches.

We choose a single-scale 256 × 256 DCT matrix as the base dictionary for sparse K-SVD and initialize the sparse matrix A to the identity. Therefore, the size of the overall learned dictionary D = ΦA is 256 × 256. The number of nonzero coefficients in each column of A, i.e., the atom sparsity level p, is set to 10, implying that the overall dictionary atoms are linear combinations of a small number of arbitrary DCT atoms.

The DCT base dictionary is shown in Figure 6a. Figure 6b depicts the learned sparse matrix A obtained by running the sparse K-SVD algorithm for 20 iterations, and Figure 6c shows the overall learned dictionary D = ΦA. We can see that some primary directional features in the seismic wavefronts are characterized by the dictionary atoms. These atoms are more adaptive to the data set when compared with the fixed directional transforms, such as the contourlet or curvelet, which exhibit atoms in all directions. Therefore, the denoising result shown in Figure 7a is improved to PSNR = 32.00 dB, and the corresponding error panel is shown in Figure 7b. In particular, because all atoms in the learned dictionary are useful and well representative for sparse coding and patch denoising, the problem of pseudo-Gibbs artifacts is solved. Zoomed-in denoising results are shown in Figure 8: the result of BPDN with the curvelet transform in Figure 8a introduces pseudo-Gibbs artifacts that cannot be ignored, whereas the dictionary learning method based on the sparse K-SVD algorithm avoids them, as shown in Figure 8b.

Figure 9a compares the performance of our method to the curvelet BPDN method versus different noise levels σ, where 20,000 training patches are used to learn the sparse matrix A with atom sparsity levels of p = 25 and 100, indicating a significant improvement by an adaptive dictionary built on a fundamental transform. Our method performs quite stably for different settings, as can be seen from the means and error bars in Figure 9b and 9c.

Figure 10 compares the PSNR results of denoised seismic data during training iterations of the sparse K-SVD algorithm under different parameter settings. We use 5000, 10,000, and 20,000 training patches randomly selected from the noisy data set to learn dictionaries, with each dictionary parameterized with different atom sparsity levels p ranging from 25 to 100. As the number of training iterations or training patches increases, we find that the denoising performance stably improves. The performance of the dictionary that incorporates the sparsest matrix A (p = 25) is worse than those with p = 50, 75, and 100, motivating a heuristic way to choose the parameters.

Denoising with multiscale dictionary learning

In this experiment, the multiscale sparse K-SVD algorithm (Algorithm 2) is used to learn separate sparse subdictionaries for the sparse coding and denoising of transform coefficients in different subbands. The final denoised seismic data are obtained by applying the inverse transform to the denoised transform coefficients.

Daubechies 9/7 wavelets (with dictionary Φ) are used to decompose the seismic data set with S = 2 levels, thus giving a number of subbands equal to B = 3S + 1 = 7. To make a fair comparison without losing generality, the patch size is fixed to 16 × 16 for all subbands b = 1, …, 7. For subbands b = 1, 2, 3, and 4 at the coarse decomposition level, 16 representation coefficients are sufficient in the dictionary learning domain for each training patch; therefore, the subdictionaries A^(b), as well as the overall learned dictionaries D^(b) = ΦA^(b), are of size 256 × 16. We extract 312 training patches from each subband at this decomposition level to train A^(b), and the atom sparsity p^(b) is set to 25. Similarly, for subbands b = 5, 6, and 7 at the fine decomposition level, A^(b) and the overall learned dictionaries D^(b) = ΦA^(b) are of size 256 × 64. Then 1248 training patches are extracted from each subband at this decomposition level to train A^(b), with the atom sparsity p^(b) set to 50. Accordingly, a total of roughly 5000 training patches is used in the dictionary learning process, and each subdictionary is learned after 20 iterations.

Figure 11a shows the Daubechies 9/7 wavelets for the different frequency subbands from both scales. Figure 11b stacks all subdictionaries A^(b) together, yielding an overall sparse dictionary A, and Figure 11c depicts the stacked overall dictionary D = ΦA. The denoising result in Figure 12a and the corresponding error panel in Figure 12b show that this scheme outperforms the single-scale method by approximately 1 dB under a similar combination of parameters. Compared with the results shown in Figure 10, this result achieves better performance using far fewer training patches. Therefore, our result benefits from choosing a multiscale base dictionary, which is able to provide more degrees of freedom to the sparse representation without changing the overall complexity.
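End to end, the multiscale scheme of equations 20-22 has the following shape; the soft threshold is only a runnable stand-in for the per-subband patch coding with the learned A^(b), and the wavelet name and threshold are assumptions of the sketch.

```python
import numpy as np
import pywt

def denoise_subband(z, thresh=0.05):
    # Stand-in for the per-subband patch coding of equations 20-21 with the
    # learned A^(b); a soft threshold keeps this sketch self-contained.
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def multiscale_denoise(y, wavelet='bior4.4', S=2):
    """Equation 22: denoise every subband, then invert the transform."""
    coeffs = pywt.wavedec2(y, wavelet, level=S)
    out = [denoise_subband(coeffs[0])]
    for cH, cV, cD in coeffs[1:]:
        out.append(tuple(denoise_subband(c) for c in (cH, cV, cD)))
    return pywt.waverec2(out, wavelet)

s_hat = multiscale_denoise(np.random.randn(384, 240))
```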

Figure 11. (a) Base dictionary Φ: the DWT matrix of Daubechies 9/7 wavelet atoms, (b) the learned matrix A by the sparse K-SVD algorithm, and (c) the overall learned dictionary D = ΦA.

Figure 12. (a) Denoised result of the dictionary learning method using the DWT matrix as the base dictionary (PSNR = 33.61 dB) and (b) the difference between (a) and the original data.

DISCUSSION

The seismic data denoising results obtained so far show that the dictionary learning method with double sparsity constraints can achieve the best denoising quality. Compared with the other methods using fixed transforms, our method introduces adaptability via the sparse matrix A, which can be learned efficiently from training patches of the seismic data set. Meanwhile, compared with the traditional scenarios that learn a fully explicit dictionary, our method is significantly more efficient, depending mainly on the choices of the base dictionary and the atom sparsity level. Our method also eliminates the pseudo-Gibbs artifacts commonly seen after denoising with fixed transforms by using overlapped training patches and demanding proximity between the noisy and denoised data sets. The denoising performance depends on multiple parameters, such as the number of training iterations and the total number of training patches, which can be optimized empirically. That said, we cannot neglect the computational cost of obtaining the adaptive dictionary for denoising. Our method still requires iterative learning of a dictionary and a denoising process, although only a sparse set of elements in matrix A needs to be learned. One important step is to compute the sparse representations of the data patches, for which the computation time can vary greatly depending on the choice of sparse optimization algorithm. In our experiments, SPGL1 (Van den Berg and Friedlander, 2007, 2008) is chosen to solve the l1-norm optimization problems due to its fast speed. In contrast, fixed transforms, such as the contourlet and curvelet transforms, offer fast and algorithmic sparse representation, yet they sacrifice the flexibility to adapt to a specific signal.

CONCLUSION

In this paper, we have presented a novel denoising technique for seismic data sets based on a sparsity-promoting dictionary learning method. Unlike previous methods that train fully explicit dictionary matrices but sacrifice efficiency, this method only requires learning a sparse matrix after choosing a base dictionary that corresponds to an efficient transform and incorporates some prior knowledge about the data. Moreover, motivated by the underlying structural similarity among dictionary atoms, our method imposes a constraint that each atom in the learned dictionary is itself a linear combination of atoms in the base dictionary. Such a method improves the efficiency and stability of dictionary learning and provides a new layer of adaptivity to the existing efficient transforms.

Our experiments indicate that the denoising results can be improved significantly by the sparsity-promoting dictionary learning method, beyond traditional sparse approximation techniques based on fixed transforms. Because our method uses a dictionary that is adaptive to the seismic data set, no pseudo-Gibbs artifacts appear in the denoising result. Furthermore, the dictionary can also be learned in the multiscale transform domain, allowing more degrees of freedom in the sparse representation and better visual quality in the denoising results.

REFERENCES

Abma, R., and J. Claerbout, 1995, Lateral prediction for noise attenuation by t-x and f-x techniques: Geophysics, 60, 1887–1896, doi: 10.1190/1.1443920.
Aharon, M., M. Elad, and A. Bruckstein, 2006, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation: IEEE Transactions on Signal Processing, 54, 4311–4322, doi: 10.1109/TSP.2006.881199.
Beckouche, S., and J. Ma, 2014, Simultaneous dictionary learning and denoising for seismic data: Geophysics, 79, no. 3, A27–A31, doi: 10.1190/geo2013-0382.1.
Bezdek, J. C., and R. J. Hathaway, 2002, Some notes on alternating optimization: Advances in Soft Computing — AFSS 2002: Springer, Lecture Notes in Computer Science 2275, 288–300.
Canales, L. L., 1984, Random noise reduction: 54th Annual International Meeting, SEG, Expanded Abstracts, 525–527.
Candès, E. J., and L. Demanet, 2005, The curvelet representation of wave propagators is optimally sparse: Communications on Pure and Applied Mathematics, 58, 1472–1528, doi: 10.1002/cpa.20078.
Candès, E., L. Demanet, D. Donoho, and L. Ying, 2006, Fast discrete curvelet transforms: Multiscale Modeling and Simulation, 5, 861–899, doi: 10.1137/05064182X.
Candès, E. J., and D. L. Donoho, 2004, New tight frames of curvelets and optimal representations of objects with piecewise C2 singularities: Communications on Pure and Applied Mathematics, 57, 219–266, doi: 10.1002/cpa.10116.
Chanerley, A. A., and N. A. Alexander, 2002, An approach to seismic correction which includes wavelet de-noising, in Proceedings of the Sixth Conference on Computational Structures Technology: Civil-Comp Press, 107–108.
Chen, S. S., D. L. Donoho, and M. A. Saunders, 1998, Atomic decomposition by basis pursuit: SIAM Journal on Scientific Computing, 20, 33–61, doi: 10.1137/S1064827596304010.
Chen, Y., and J. Ma, 2014, Random noise attenuation by f-x empirical-mode decomposition predictive filtering: Geophysics, 79, no. 3, V81–V91, doi: 10.1190/geo2013-0080.1.
Do, M., and M. Vetterli, 2003, Contourlets: Beyond wavelets: Academic Press, 83–105.
Do, M., and M. Vetterli, 2005, The contourlet transform: An efficient directional multiresolution image representation: IEEE Transactions on Image Processing, 14, 2091–2106, doi: 10.1109/TIP.2005.859376.
Donoho, D. L., 2006, For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution: Communications on Pure and Applied Mathematics, 59, 797–829, doi: 10.1002/cpa.20132.
Elad, M., and M. Aharon, 2006, Image denoising via sparse and redundant representations over learned dictionaries: IEEE Transactions on Image Processing, 15, 3736–3745, doi: 10.1109/TIP.2006.881969.
Engan, K., S. Aase, and J. Hakon Husoy, 1999, Method of optimal directions for frame design, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 5, IEEE, 2443–2446.
Etgen, J., and C. Regone, 1998, Strike shooting, dip shooting, wide-patch shooting — Does prestack migration care? A model study: 68th Annual International Meeting, SEG, Expanded Abstracts, 66–69.
Gulunay, N., 1986, Fxdecon and complex Wiener prediction filter: 56th Annual International Meeting, SEG, Expanded Abstracts, 279–281.
Harris, P., and R. White, 1997, Improving the performance of f-x prediction filtering at low signal-to-noise ratios: Geophysical Prospecting, 45, 269–302, doi: 10.1046/j.1365-2478.1997.00347.x.
Hennenfent, G., L. Fenelon, and F. Herrmann, 2010, Nonequispaced curvelet transform for seismic data reconstruction: A sparsity-promoting approach: Geophysics, 75, no. 6, WB203–WB210, doi: 10.1190/1.3494032.
Hennenfent, G., and F. Herrmann, 2006, Seismic denoising with nonuniformly sampled curvelets: Computing in Science & Engineering, 8, 16–25, doi: 10.1109/MCSE.2006.49.
Ibrahim, A., and M. Sacchi, 2014a, Eliminating blending noise using fast apex shifted hyperbolic Radon transform: 76th Annual International Conference and Exhibition, EAGE, Extended Abstracts, EL12 01.
Ibrahim, A., and M. D. Sacchi, 2014b, Simultaneous source separation using a robust Radon transform: Geophysics, 79, no. 1, V1–V11, doi: 10.1190/geo2013-0168.1.
Ji, J., 2006, CGG method for robust inversion and its application to velocity-stack inversion: Geophysics, 71, no. 4, R59–R67, doi: 10.1190/1.2209547.
Jolliffe, I., 2002, Principal component analysis, 2nd ed.: Springer.
Liu, G.-C., X.-H. Chen, J.-Y. Li, J. Du, and J.-W. Song, 2011, Seismic noise attenuation using nonstationary polynomial fitting: Applied Geophysics, 8, 18–26, doi: 10.1007/s11770-010-0244-2.
Liu, G., S. Fomel, L. Jin, and X. Chen, 2009, Stacking seismic data using local correlation: Geophysics, 74, no. 3, V43–V48, doi: 10.1190/1.3085643.
Madagascar Development Team, 2014, Madagascar software, version 1.6.5, http://www.ahay.org/, accessed 10 November 2014.
Mairal, J., M. Elad, and G. Sapiro, 2008, Sparse representation for color image restoration: IEEE Transactions on Image Processing, 17, 53–69, doi: 10.1109/TIP.2007.911828.
Neelamani, R., A. I. Baumstein, D. G. Gillard, M. T. Hadidi, and W. L. Soroka, 2008, Coherent and random noise attenuation using the curvelet transform: The Leading Edge, 27, 240–248, doi: 10.1190/1.2840373.
Rubinstein, R., M. Zibulevsky, and M. Elad, 2010, Double sparsity: Learning sparse dictionaries for sparse signal approximation: IEEE Transactions on Signal Processing, 58, 1553–1564, doi: 10.1109/TSP.2009.2036477.
Tang, G., J.-W. Ma, and H.-Z. Yang, 2012, Seismic data denoising based on learning-type overcomplete dictionaries: Applied Geophysics, 9, 27–32, doi: 10.1007/s11770-012-0310-z.
Van den Berg, E., and M. P. Friedlander, 2007, SPGL1: A solver for large-scale sparse reconstruction, http://www.cs.ubc.ca/labs/scl/spgl1, accessed 1 March 2013.
Van den Berg, E., and M. P. Friedlander, 2008, Probing the Pareto frontier for basis pursuit solutions: SIAM Journal on Scientific Computing, 31, 890–912, doi: 10.1137/080714488.
Vidal, R., Y. Ma, and S. Sastry, 2005, Generalized principal component analysis (GPCA): IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1945–1959, doi: 10.1109/TPAMI.2005.244.
Yang, J., J. Wright, T. Huang, and Y. Ma, 2008, Image super-resolution as sparse representation of raw image patches: Presented at the IEEE Conference on Computer Vision and Pattern Recognition, 1–8.
Zhang, R., and T. Ulrych, 2003, Physical wavelet frame denoising: Geophysics, 68, 225–231, doi: 10.1190/1.1543209.
Zhou, Y., W. Chen, and J. Gao, 2014, Separation of seismic blended data by sparse inversion over dictionary learning: Journal of Applied Geophysics, 106, 146–153, doi: 10.1016/j.jappgeo.2014.04.010.