10.1190/GEO2015-0047.1
Downloaded 10/20/15 to 143.215.112.164. Redistribution subject to SEG license or copyright; see Terms of Use at http://library.seg.org/
Manuscript received by the Editor 31 January 2015; revised manuscript received 9 May 2015; published online 28 August 2015; corrected version published online 19 October 2015.
1Georgia Institute of Technology, Center for Energy and Geo Processing and KFUPM, Atlanta, Georgia, USA. E-mail: lczhu@gatech.edu; entao.liu@ece.gatech.edu; jim.mcclellan@gatech.edu.
© 2015 Society of Exploration Geophysicists. All rights reserved.
WD46 Zhu et al.
Toward this end, the hyperbolic Radon transform has been adopted to represent seismic reflections with sparse coefficients (Ji, 2006), and it has been used for deblending seismic data in blended-source acquisition (Ibrahim and Sacchi, 2014a, 2014b). Furthermore, a family of directional multiscale transforms, such as the curvelet (Candès and Donoho, 2004; Candès and Demanet, 2005; Candès et al., 2006) and contourlet (Do and Vetterli, 2003, 2005) transforms, has been introduced to exploit directional details and geometric smoothness along 2D curves and 3D surfaces. Their basis functions are localized in different scales, positions, and orientations. Because seismic data are known to have smooth, anisotropic contours, particularly in reflected wavefronts, these multiscale geometric analysis methods use substantially fewer coefficients than wavelets to represent seismic data at a given accuracy, and they have attracted a great deal of attention for seismic data denoising (Hennenfent and Herrmann, 2006; Neelamani et al., 2008; Hennenfent et al., 2010).

The analytic transforms mentioned above are model-driven processes based on a formulated mathematical model of the data, leading to an implicit dictionary described by a structured algorithm. Alternatively, a data-driven process learns its dictionary as an unstructured and explicit matrix from a training set, in such a way that each data signal can be represented as a linear combination of only a few columns (atoms) of the matrix. Typical dictionary learning algorithms range from principal component analysis (PCA) (Jolliffe, 2002) and generalized PCA (Vidal et al., 2005) to the method of optimal directions (Engan et al., 1999) and the K-singular value decomposition (K-SVD) (Aharon et al., 2006). Dictionary learning methods avoid choosing a fixed dictionary in which some atoms might be of limited use, and they therefore offer refined dictionaries that adapt to the structure of the data. This approach yields better performance in many image-based applications, such as image denoising (Elad and Aharon, 2006; Mairal et al., 2008) and superresolution (Yang et al., 2008). The dictionary learning method has recently been applied successfully to seismic data denoising (Tang et al., 2012; Beckouche and Ma, 2014), as well as to seismic data deblending (Zhou et al., 2014). However, these applications come at the price of a high overhead, including explicit storage and multiplication of unstructured dictionary matrices.

In this paper, we propose a novel denoising scheme for seismic data based on its sparse representation over a learned dictionary. The dictionary is trained by a variant of the K-SVD algorithm, named sparse K-SVD (Rubinstein et al., 2010), over a set of seismic data patches. The motivation of sparse K-SVD is that the dictionary atoms learned by K-SVD may still share some underlying sparse pattern over a generic dictionary. Therefore, Rubinstein et al. (2010) suggest constructing the effective learned dictionary $D = \Phi A$ as the product of a base dictionary $\Phi$, corresponding to a fixed analytic transform (e.g., the discrete cosine transform [DCT] in Rubinstein et al., 2010), and a sparse matrix $A$ that is actually learned. By relieving the need to learn all elements in $D$, such a parametric model of the learned dictionary strikes a good balance among complexity, adaptivity, and performance. The algorithm we propose is summarized as follows: We start from a base dictionary $\Phi$ and learn a sparse matrix $A$ from the noisy seismic data patch set. For each overlapping patch of seismic data, a sparse coding problem with respect to the effective learned dictionary $D = \Phi A$ is solved to perform denoising. Then, all patch results are reconstructed, tiled, and averaged to assemble the denoised seismic data. We also compare our scheme with those using sparse approximations for seismic denoising based on curvelets and contourlets. Experimental results indicate that our scheme achieves the best performance in terms of peak signal-to-noise ratio (PSNR) and produces the least visual distortion.

The rest of this paper is organized as follows: In the second section, we introduce the motivation of the dictionary model with double sparsity and the details of the sparse K-SVD algorithm. The following section describes patch-based seismic data denoising using the learned dictionary. Learning separate multiscale dictionaries and performing denoising in different subbands of a multiscale transform are also presented in this section. Numerical experiments are given in the next section, followed by a discussion and the conclusion in the final two sections.

SIGNAL REPRESENTATION WITH DICTIONARY LEARNING

A dictionary model with double sparsity

Given a training set $Y = [y_1, y_2, \ldots, y_R] \in \mathbb{R}^{N \times R}$, in which each element is a column vector of length $N$, the goal of the dictionary learning process is to find a matrix $D \in \mathbb{R}^{N \times L}$ that is able to represent the training set $Y$ with a set of sparse coefficients $X = [x_1, x_2, \ldots, x_R] \in \mathbb{R}^{L \times R}$ by solving the following optimization problem:

\[
\{\hat{D}, \hat{X}\} = \arg\min_{D,X} \|Y - DX\|_F^2, \quad \text{subject to} \quad \forall i:\ \|x_i\|_0 \le t; \quad \forall j:\ \|d_j\|_2 = 1, \qquad (1)
\]

where each column vector $x_i$ of length $L$ is the sparse representation of $y_i$ over $D$, and $\|\cdot\|_0$ is the $\ell_0$-norm that counts the nonzero entries of a vector. The atom normalization constraint $\|d_j\|_2 = 1$ is added to increase the robustness of the dictionary; however, it does not essentially change the problem. Although the $\ell_0$-norm optimization problem 1 is generally NP-hard and cannot be tackled directly, it can be relaxed into the following tractable problem by replacing the $\ell_0$-norm with the $\ell_1$-norm:

\[
\{\hat{D}, \hat{X}\} = \arg\min_{D,X} \|Y - DX\|_F^2, \quad \text{subject to} \quad \forall i:\ \|x_i\|_1 \le t; \quad \forall j:\ \|d_j\|_2 = 1. \qquad (2)
\]

Such a convex relaxation yields an exact solution to the $\ell_0$-norm optimization problem 1 under certain conditions specified in Donoho (2006), and we thereafter use the $\ell_1$-norm to measure the sparsity level.

For applications using natural images, the dimension $N$ of the training signals $\{y_i\}$ and the dimension $L$ of the sparse coefficients $\{x_i\}$ are large. Any attempt to learn a full-size dictionary requires a huge number $R$ of training signals as well, yielding an intractable computational complexity for solving the $\ell_1$-norm optimization problem 2. This difficulty also applies to seismic applications, where the recorded data from a single shot may include hundreds of traces and thousands of samples per trace. To learn the dictionary $D$ with moderate computational effort, patch-based processing (with dimensions on the order of $10^2$) is commonly used. This means that the
Data denoising by dictionary learning WD47
training set must be composed of small patches from the input data, and the patches should be overlapped to avoid blocking artifacts.

So, what could be an appropriate dictionary to represent seismic data? Before attempting to answer this question, we must first understand seismic data. A seismic data set is a collection of data traces, each of which is a continuous wave recorded by a seismic recorder, such as a geophone, from a seismic source, either a man-made seismic event or a naturally occurring earthquake. Many traces together provide a spatiotemporal sampling of the reflected wavefield, which contains a straight line and hyperbolas that correspond to the direct ray and to reflections with normal moveouts, respectively. Figure 1b demonstrates an example of a dictionary of 100 atoms learned by the K-SVD algorithm (Aharon et al., 2006) on a set of $16 \times 16$ patches from the synthetic seismic data set shown in Figure 1a. Although no such constraint is posed by the algorithm, we can notice the strong resemblance among atoms in the resulting dictionary, which suggests that the sparse representation can be extended to each atom itself over some predefined base dictionary. Therefore, we can express the dictionary as

\[
D = \Phi A, \qquad (3)
\]

where $\Phi \in \mathbb{R}^{N \times L}$ is the base dictionary, generally chosen to have a quick implicit implementation, and $A = [a_1, a_2, \ldots, a_L] \in \mathbb{R}^{L \times L}$ is a sparse matrix to be learned, in which each column satisfies $\|a_i\|_1 \le p$ for some sparsity level $p$. There is no doubt that the selection of $\Phi$ will affect the success of the dictionary model; therefore, transforms that already use some prior knowledge about the data are preferred. Meanwhile, $A$ can be regarded as an extension to the existing analytic transform, adding a new layer of adaptivity on top of the base dictionary $\Phi$. Compared with the regular dictionary $D$, which is fully explicit, the dictionary model 3 is significantly more efficient because only a few elements in $A$ need to be learned, stored, and transmitted. More importantly, due to its fewer degrees of freedom, such a dictionary model reduces the chance of overfitting the noise in the training set and produces robust results even with limited training examples. These properties are particularly advantageous for denoising. As a result, the dictionary learning process with the sparsity-promoting dictionary model shown in equation 3 becomes

\[
\{\hat{A}, \hat{X}\} = \arg\min_{A,X} \|Y - \Phi A X\|_F^2, \quad \text{subject to} \quad \forall i:\ \|x_i\|_1 \le t; \quad \forall j:\ \|a_j\|_1 \le p,\ \|\Phi a_j\|_2 = 1. \qquad (4)
\]

Because the actual learned dictionary $A$ and the representation coefficients $X$ are sparse matrices, such a dictionary learning model has a property called double sparsity.

Sparse K-SVD algorithm

Starting from an initial matrix $A_0$, sparse K-SVD (Rubinstein et al., 2010) is a dictionary learning algorithm specifically designed to learn the sparse dictionary $A$ by solving the optimization problem 4 for a fixed number of iterations. Here, we briefly review the algorithm and show how it iteratively improves the dictionary by sparse coding, atom removal, and atom updating operations.

The sparse K-SVD algorithm is presented in Algorithm 1 with details. In each iteration, the first step of sparse K-SVD is sparse coding the training examples in $Y$, given the current dictionary $D = \Phi A$. Because the sparse coding is coupled with the dictionary model, an effective dictionary learning process requires each atom $d_j = \Phi a_j$ to be updated individually. Denoting by $I$ the index group of the signals in $Y$ whose representations use $d_j$, found by locating the nonzero column elements in the $j$th row of $X$, the objective function to update $a_j$ is equivalent to

\[
\|Y_I - \Phi A X_I\|_F^2 = \Big\|Y_I - \sum_{i \ne j} \Phi a_i X_{i,I} - \Phi a_j X_{j,I}\Big\|_F^2 = \|E_j - \Phi a_j X_{j,I}\|_F^2, \qquad (5)
\]

where $E_j = Y_I - \sum_{i \ne j} \Phi a_i X_{i,I}$ is the residual matrix without the contribution of $\Phi a_j$. Therefore, the resulting problem to update $a_j$ and $X_{j,I}$ is reduced to the following rank-1 approximation:

\[
\{a_j, X_{j,I}^T\} = \arg\min_{a,x} \|E_j - \Phi a x^T\|_F^2, \quad \text{subject to} \quad \|a\|_1 \le p. \qquad (6)
\]
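The structure imposed by equations 3 and 4 can be illustrated numerically with an orthonormal DCT base $\Phi$, a column-sparse $A$, and the effective dictionary $D = \Phi A$. The sizes are illustrative, and the fixed per-column support is an $\ell_0$-style stand-in for the $\ell_1$ sparsity constraint on $a_j$:

```python
import numpy as np

N = 64  # atom length (illustrative)

# Orthonormal DCT-II matrix whose columns serve as the base dictionary Phi.
n = np.arange(N)
Phi = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / N)
Phi[0, :] *= np.sqrt(1.0 / N)
Phi[1:, :] *= np.sqrt(2.0 / N)
Phi = Phi.T  # columns are DCT atoms

# Sparse A: each effective atom d_j = Phi @ a_j mixes only p base atoms.
rng = np.random.default_rng(0)
p = 4
A = np.zeros((N, N))
for j in range(N):
    support = rng.choice(N, size=p, replace=False)
    A[support, j] = rng.standard_normal(p)
    A[:, j] /= np.linalg.norm(Phi @ A[:, j])  # enforce ||Phi a_j||_2 = 1

D = Phi @ A  # effective learned dictionary, stored via the sparse A only
```

Only the few nonzeros of A need to be stored, while D itself never has to be formed explicitly when Phi has a fast implicit transform.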
Figure 1. (a) Synthetic seismic data set (time sample index versus trace index) and (b) learned dictionary atoms trained from 16 × 16 patches of (a) using the K-SVD algorithm.
Due to its nonconvexity, solving this problem directly might be computationally intensive. A workaround can greatly reduce the complexity by applying a single iteration of alternating optimization (Bezdek and Hathaway, 2002). Assuming $x = X_{j,I}^T / \|X_{j,I}^T\|_2$, $a_j$ is optimized by solving the following sparse coding problem:

\[
a_j = \arg\min_{a} \|E_j - \Phi a x^T\|_F^2, \quad \text{subject to} \quad \|a\|_1 \le p. \qquad (7)
\]

Given the updated $a_j$, the coefficient row $X_{j,I}$ is then updated as

\[
X_{j,I}^T = \frac{E_j^T \Phi a_j}{\|\Phi a_j\|_2}. \qquad (8)
\]

Such a process guarantees a reduction of the objective function. In addition, atoms may be removed and replaced according to certain criteria, which we will describe below. Such procedures can effectively avoid local minima or overfitting and therefore improve the adaptability of the learned dictionary.

The representation ability of the learned dictionary will be reduced if some atoms happen to be very similar. A useful measurement of the similarity among the atoms in a dictionary matrix $D$ is called the mutual coherence, which is defined by

\[
\mu(D) = \max_{i \ne j} \frac{|d_i^T d_j|}{\|d_i\|_2 \|d_j\|_2}. \qquad (9)
\]

We examine $\mu(D) = \mu(\Phi A)$ after all $L$ columns of matrix $A$ have been updated in an iteration. If a pair $(a_i, a_j)$ that makes $\mu(D)$ exceed some threshold (say 0.99) is found, one element should be replaced with the representation of $y_k$ over $\Phi$ that satisfies

\[
a = \arg\min_{a} \|y_k - \Phi a\|_2^2, \quad \text{subject to} \quad \|a\|_1 \le p, \qquad (10)
\]

where $k$ refers to the index of the signal in $Y$ that exhibits the largest approximation error after taking account of the current sparse matrix $A$; i.e.,

\[
k = \arg\max_{k} \|y_k - \Phi A x_k\|_2^2. \qquad (11)
\]

Because the number of training patches is always much larger than the number of atoms, such a replacement prevents similar atoms from appearing again.

Besides the replacement of similar atoms, we also identify and replace those atoms that are used infrequently. The number of nonzero elements in the $i$th row of $X$ indicates how many training signals in $Y$ use $d_i$ in their representations. If an atom is used by fewer than a threshold number (say four) of training signals, it is said to be "less representative" and can be replaced with another atom that represents more training signals. This atom replacement within $A$ can also be done by solving optimization problem 10.

DENOISING WITH LEARNED DICTIONARY

Problem formulation

Let $s \in \mathbb{R}^{UV}$ denote a vectorized noise-free seismic data set that collects $V$ traces, each with $U$ time samples, and let $n \in \mathbb{R}^{UV}$ be an uncoupled random noise vector whose elements are independent and identically distributed as $\mathcal{N}(0, \sigma^2)$. Then, the received noisy seismic data set is denoted as

\[
y = s + n, \qquad (12)
\]

due to the additive noise contamination.

As we have pointed out above, dictionary learning algorithms tend to work with small data patches rather than full-size data to avoid the prohibitive computational complexity caused by high dimensionality and large training sets. Therefore, we break up the seismic data $y$ into $P$ overlapping patches $\{y_i\}_{i=1}^{P}$ (in vectorized form) with the same size and perform denoising on each patch independently. For each noisy patch $y_i \in \mathbb{R}^N$, the denoising problem can be stated as estimating a clean patch $s_i \in \mathbb{R}^N$ via the $\ell_1$-norm regularized form of the basis pursuit denoising (BPDN) problem (Chen et al., 1998):

\[
\hat{x}_i = \arg\min_{x} \|\Phi A x - y_i\|_2^2 + \mu \|x\|_1; \qquad \hat{s}_i = \Phi A \hat{x}_i, \qquad (13)
\]

given the learned dictionary $\Phi A$. Generalizing optimization problem 13 to consider all patch indices $i = 1, 2, \ldots, P$, the global denoising problem for the entire noisy seismic data set $y$ can be formulated as

\[
\{\hat{x}_i, \hat{s}\} = \arg\min_{x_i, s} \sum_i \|\Phi A x_i - P_i s\|_2^2 + \sum_i \mu_i \|x_i\|_1 + \lambda \|s - y\|_2^2, \qquad (14)
\]

where $P_i \in \{0,1\}^{N \times UV}$ is a projection matrix that selects $s_i$ from $s$ as $s_i = P_i s$. Its numerical solution is provided in the next subsection.
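As a concrete illustration, the patchwise BPDN problem 13 can be approached with iterative soft-thresholding (ISTA). The paper does not prescribe this particular solver (SPGL1 appears in its references), so this is only a sketch with an illustrative dictionary:

```python
import numpy as np

def ista(D, y, mu, n_iter=200):
    """Iterative soft-thresholding for min_x ||D x - y||_2^2 + mu ||x||_1."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = x - (D.T @ (D @ x - y)) / L  # gradient step on the quadratic term
        x = np.sign(g) * np.maximum(np.abs(g) - mu / (2 * L), 0.0)  # shrinkage
    return x

# Toy check: with an identity dictionary the solution is soft-thresholded y.
y = np.array([2.0, 0.1, -1.0])
x = ista(np.eye(3), y, mu=0.5)
print(np.round(x, 3))  # [ 1.75  0.   -0.75]
```

Small coefficients are driven exactly to zero by the shrinkage step, which is what suppresses the incoherent noise in each patch.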
Figure 2. (a) Synthetic seismic data set (time sample index versus trace index) and (b) wavelet-decomposed subbands of (a).
\[
\{\hat{A}^{(b)}, \hat{X}^{(b)}\} = \arg\min_{A^{(b)}, X^{(b)}} \|Z^{(b)} - A^{(b)} X^{(b)}\|_F^2, \quad \text{subject to} \quad \forall i:\ \|x_i^{(b)}\|_1 \le t; \quad \forall j:\ \|a_j^{(b)}\|_1 \le p,\ \|a_j^{(b)}\|_2 = 1. \qquad (19)
\]

Similar to the global denoising problem 14, the sparse coding of $z^{(b)}$ over the subdictionary $A^{(b)}$, as well as the denoising process of $z^{(b)}$, can be formulated as follows:

\[
\forall b:\ \{\hat{x}_i^{(b)}, \hat{u}^{(b)}\} = \arg\min_{x_i^{(b)}, u^{(b)}} \sum_i \|A^{(b)} x_i^{(b)} - P_i u^{(b)}\|_2^2 + \sum_i \mu_i \|x_i^{(b)}\|_1 + \lambda \|u^{(b)} - z^{(b)}\|_2^2, \qquad (20)
\]

where $\hat{u}^{(b)}$ is the denoised version of $z^{(b)}$, with the closed-form solution

\[
\hat{u}^{(b)} = \Big(\lambda I + \sum_i P_i^T P_i\Big)^{-1} \Big(\lambda z^{(b)} + \sum_i P_i^T A^{(b)} \hat{x}_i^{(b)}\Big). \qquad (21)
\]

Finally, the denoised seismic data $\hat{s}$ can be obtained by applying the inverse multiscale transform after all subbands across different scales have been denoised:

\[
\hat{s} = \Phi \bigcup_b \hat{u}^{(b)}. \qquad (22)
\]

NUMERICAL EXPERIMENTS ON SYNTHETIC DATA

In this section, we use the dictionary learning method to attenuate the noise in seismic data, and we compare the performance of our proposed method with that of other denoising methods using the fixed contourlet and curvelet transforms. The seismic data sets used in the experiments are synthesized 2D prestack shot records that are available for download from public seismic data directories, such as those provided by SEG and the Madagascar Development Team (2014). Assuming that the seismic noise is caused by a diversity of different, spatially distributed, mostly uncorrelated, but low-frequency sources, it can
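Because $\sum_i P_i^T P_i$ is a diagonal matrix that simply counts how often each sample is covered by a patch, the closed-form minimization over the data term reduces to an elementwise weighted average of the overlapping patch estimates and the noisy input. A 1D sketch with illustrative sizes:

```python
import numpy as np

def average_patches(z, patch_estimates, positions, patch, lam):
    """Closed-form minimizer of
    sum_i ||q_i - P_i u||_2^2 + lam * ||u - z||_2^2,
    where q_i are denoised patch estimates placed at the given positions:
    an elementwise weighted average of overlapping patches and the data."""
    num = lam * z.copy()
    den = lam * np.ones_like(z)
    for q, pos in zip(patch_estimates, positions):
        num[pos:pos + patch] += q    # accumulate P_i^T q_i
        den[pos:pos + patch] += 1.0  # accumulate diag(P_i^T P_i)
    return num / den

# Toy check: constant patch estimates over a constant signal stay constant.
z = np.ones(8)
patches = [np.ones(4), np.ones(4)]
u = average_patches(z, patches, positions=[0, 4], patch=4, lam=0.5)
print(u)  # all ones
```

The same diagonal structure is what makes the global problems 14 and 20 cheap to solve once the per-patch sparse codes are available.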
Figure 4. (a) The original seismic data set for reference and (b) the noisy seismic data set for denoising (PSNR = 22.62 dB).
be modeled as zero-mean white additive Gaussian noise with standard deviation $\sigma$, low-pass filtered at a stopband frequency (30 Hz in our experiments). For the sake of numerical stability and better performance, a standard normalization step is applied to rescale the data into the range $(-1, 1)$ after subtracting the mean of each trace in the data set. Two different base dictionaries $\Phi$ are used in the experiments to learn the sparse matrix $A$ and thereafter the overall dictionary $D = \Phi A$: one is the single-scale DCT, and the other is the multiscale discrete wavelet transform (DWT). As one of the most commonly used quality metrics for comparing denoising performance, PSNR is used in our experiments; it is defined as
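The normalization step just described (per-trace demeaning, then rescaling into (−1, 1)) can be sketched as follows; the exact rescaling convention (dividing by the peak absolute amplitude) is an assumption:

```python
import numpy as np

def normalize_traces(data):
    """Subtract the mean of each trace (column), then rescale the whole
    record by its maximum absolute amplitude."""
    demeaned = data - data.mean(axis=0, keepdims=True)
    peak = np.abs(demeaned).max()
    return demeaned / peak if peak > 0 else demeaned

record = np.array([[1.0, 10.0], [3.0, 30.0]])  # 2 time samples x 2 traces
out = normalize_traces(record)
print(out.max(), out.min())  # 1.0 -1.0
```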
Figure 6. (a) Base dictionary Φ: DCT matrix, (b) the learned matrix A by the sparse K-SVD algorithm, and (c) the overall learned dictionary D = ΦA.
\[
\mathrm{PSNR} \triangleq 20 \log_{10} \frac{\sqrt{UV}\, s_{\mathrm{MAX}}}{\|s - \hat{s}\|_2}, \qquad (23)
\]

where $s_{\mathrm{MAX}} = 1$ is the maximum possible value of the seismic data after normalization.

Denoising with fixed transforms

This subsection investigates the prominent multiscale directional transforms, including the contourlet and curvelet transforms, for seismic data denoising. The contourlet transform (Do and Vetterli, 2003, 2005) can capture smooth contours in a seismic data set based on a Laplacian pyramid decomposition followed by directional filter banks applied on each band-pass subband. Its atom elements are depicted in Figure 3a. Based on its frequency partition technique, the curvelet transform is able to represent curved singularities more precisely with needle-shaped atom elements, which are shown in Figure 3b. We use the fast discrete curvelet transform (Candès et al., 2006), which is the latest implementation, in our experiments. Because both transforms are able to represent a seismic data set with sparse coefficients, the following BPDN formulation can be used for denoising:

\[
\hat{x} = \arg\min_{x} \|\Phi x - y\|_2^2 + \mu \|x\|_1; \qquad \hat{s} = \Phi \hat{x}, \qquad (24)
\]

where $\Phi$ refers to the dictionary of the contourlet/curvelet synthesis operator.

The public seismic data set we use in the following experiments is provided by BP (Etgen and Regone, 1998) as part of the Madagascar software (Madagascar Development Team, 2014). It has 240 traces, and each trace contains 384 time samples. Figure 4a is the original noise-free data, and Figure 4b is its (low-pass-filtered) noisy version with $\sigma = 0.1$ and PSNR = 22.62 dB. The BPDN results based on the contourlet and curvelet transforms are provided in Figure 5a and 5c with PSNR = 29.02 and 29.58 dB, respectively, whereas the error panels that show the difference between the reconstructed data and the original data are given in Figure 5b and 5d, respectively. It is obvious that most of the random noise is attenuated after many small coefficients are suppressed by the BPDN algorithm. However, many pseudo-Gibbs artifacts are produced along the wavefronts, especially in the contourlet case, whose atom elements have a less clear directional feature than curvelets. These artifacts, which do not exist in the original data set, may be harmful for further processing, such as migration and full-waveform tomography.

Denoising with single-scale dictionary learning

In the following experiments, we compare the denoising performance of the dictionary learning method based on the sparse K-SVD algorithm with that of the traditional denoising methods based on fixed transforms. Our dictionary learning method starts by dividing the noisy data set into overlapping patches of size 16 × 16 each and randomly choosing 10,000 among them as a training set for the sparse K-SVD algorithm. We choose a single-scale 256 × 256 DCT matrix as the base dictionary for sparse K-SVD and initialize the sparse matrix $A$ to the identity. Therefore, the size of the overall learned dictionary $D = \Phi A$ is 256 × 256. The number of nonzero coefficients

Figure 9. PSNR versus noise level σ with 20,000 training patches for dictionary learning: (a) PSNR performance comparison of the sparse K-SVD dictionary learning method and the curvelet-based BPDN method, (b) error bar for sparsity level p = 25, and (c) error bar for sparsity level p = 100.

Figure 10. PSNR versus training iterations with different numbers of training patches, sparsity level p, and noise level σ: (a) dictionary trained with 5000 training patches, (b) dictionary trained with 10,000 training patches, and (c) dictionary trained with 20,000 training patches.

The performance of the dictionary that incorporates the sparsest matrix $A$ ($p = 25$) is worse than that of those with $p = 50$, 75, and 100, motivating a way to choose parameters heuristically.

Denoising with multiscale dictionary learning

Figure 11. (a) Base dictionary Φ: DWT matrix of Daubechies 9/7 wavelet atoms, (b) the learned matrix A by the sparse K-SVD algorithm, and (c) the overall learned dictionary D = ΦA.
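The PSNR metric of equation 23 and the white Gaussian noise model used in these experiments can be sketched as follows; the low-pass filtering step applied to the noise in the experiments is omitted here for brevity:

```python
import numpy as np

def psnr(s, s_hat, s_max=1.0):
    """PSNR = 20 log10( sqrt(UV) * s_max / ||s - s_hat||_2 ), equation 23."""
    err = np.linalg.norm(s - s_hat)
    return 20.0 * np.log10(np.sqrt(s.size) * s_max / err)

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(384, 240))     # U x V, as in the BP data set
noisy = s + 0.1 * rng.standard_normal(s.shape)  # sigma = 0.1 additive noise
print(psnr(s, noisy))  # approximately 20 dB for sigma = 0.1
```

With normalized data, PSNR depends only on the noise residual, so sigma = 0.1 gives roughly 20 dB regardless of the record size.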
Do, M., and M. Vetterli, 2005, The contourlet transform: An efficient directional multiresolution image representation: IEEE Transactions on Image Processing, 14, 2091–2106, doi: 10.1109/TIP.2005.859376.
Donoho, D. L., 2006, For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution: Communications on Pure and Applied Mathematics, 59, 797–829, doi: 10.1002/cpa.20132.
Elad, M., and M. Aharon, 2006, Image denoising via sparse and redundant representations over learned dictionaries: IEEE Transactions on Image Processing, 15, 3736–3745, doi: 10.1109/TIP.2006.881969.
Engan, K., S. Aase, and J. Hakon Husoy, 1999, Method of optimal directions for frame design, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 5, IEEE, 2443–2446.
Etgen, J., and C. Regone, 1998, Strike shooting, dip shooting, wide-patch shooting — Does prestack migration care? A model study: 68th Annual International Meeting, SEG, Expanded Abstracts, 66–69.
Gulunay, N., 1986, Fxdecon and complex Wiener prediction filter: 56th Annual International Meeting, SEG, Expanded Abstracts, 279–281.
Harris, P., and R. White, 1997, Improving the performance of f-x prediction filtering at low signal-to-noise ratios: Geophysical Prospecting, 45, 269–302, doi: 10.1046/j.1365-2478.1997.00347.x.
Hennenfent, G., L. Fenelon, and F. Herrmann, 2010, Nonequispaced curvelet transform for seismic data reconstruction: A sparsity-promoting approach: Geophysics, 75, no. 6, WB203–WB210, doi: 10.1190/1.3494032.
Hennenfent, G., and F. Herrmann, 2006, Seismic denoising with nonuniformly sampled curvelets: Computing in Science Engineering, 8, 16–25, doi: 10.1109/MCSE.2006.49.
Ibrahim, A., and M. Sacchi, 2014a, Eliminating blending noise using fast apex shifted hyperbolic radon transform: 76th Annual International Conference and Exhibition, EAGE, Extended Abstracts, EL12 01.
Ibrahim, A., and M. D. Sacchi, 2014b, Simultaneous source separation using a robust radon transform: Geophysics, 79, no. 1, V1–V11, doi: 10.1190/geo2013-0168.1.
Ji, J., 2006, CGG method for robust inversion and its application to velocity-stack inversion: Geophysics, 71, no. 4, R59–R67, doi: 10.1190/1.2209547.
Jolliffe, I., 2002, Principal component analysis, 2nd ed.: Springer.
Liu, G.-C., X.-H. Chen, J.-Y. Li, J. Du, and J.-W. Song, 2011, Seismic noise attenuation using nonstationary polynomial fitting: Applied Geophysics, 8, 18–26, doi: 10.1007/s11770-010-0244-2.
Liu, G., S. Fomel, L. Jin, and X. Chen, 2009, Stacking seismic data using local correlation: Geophysics, 74, no. 3, V43–V48, doi: 10.1190/1.3085643.
Madagascar Development Team, 2014, Madagascar software, version 1.6.5, http://www.ahay.org/, accessed 10 November 2014.
Mairal, J., M. Elad, and G. Sapiro, 2008, Sparse representation for color image restoration: IEEE Transactions on Image Processing, 17, 53–69, doi: 10.1109/TIP.2007.911828.
Neelamani, R., A. I. Baumstein, D. G. Gillard, M. T. Hadidi, and W. L. Soroka, 2008, Coherent and random noise attenuation using the curvelet transform: The Leading Edge, 27, 240–248, doi: 10.1190/1.2840373.
Rubinstein, R., M. Zibulevsky, and M. Elad, 2010, Double sparsity: Learning sparse dictionaries for sparse signal approximation: IEEE Transactions on Signal Processing, 58, 1553–1564, doi: 10.1109/TSP.2009.2036477.
Tang, G., J.-W. Ma, and H.-Z. Yang, 2012, Seismic data denoising based on learning-type overcomplete dictionaries: Applied Geophysics, 9, 27–32, doi: 10.1007/s11770-012-0310-z.
Van den Berg, E., and M. P. Friedlander, 2007, SPGL1: A solver for large-scale sparse reconstruction, http://www.cs.ubc.ca/labs/scl/spgl1, accessed 1 March 2013.
Van den Berg, E., and M. P. Friedlander, 2008, Probing the Pareto frontier for basis pursuit solutions: SIAM Journal on Scientific Computing, 31, 890–912, doi: 10.1137/080714488.
Vidal, R., Y. Ma, and S. Sastry, 2005, Generalized principal component analysis (GPCA): IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1945–1959, doi: 10.1109/TPAMI.2005.244.
Yang, J., J. Wright, T. Huang, and Y. Ma, 2008, Image super-resolution as sparse representation of raw image patches: Presented at the IEEE Conference on Computer Vision and Pattern Recognition, 1–8.
Zhang, R., and T. Ulrych, 2003, Physical wavelet frame denoising: Geophysics, 68, 225–231, doi: 10.1190/1.1543209.
Zhou, Y., W. Chen, and J. Gao, 2014, Separation of seismic blended data by sparse inversion over dictionary learning: Journal of Applied Geophysics, 106, 146–153, doi: 10.1016/j.jappgeo.2014.04.010.