Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Article history: Received 5 December 2015; Received in revised form 5 July 2016; Accepted 14 July 2016; Available online 27 July 2016. Communicated by Huaping Liu.

Keywords: Multi-focus image fusion; Sparse representation; Dictionary learning; Batch-OMP

Abstract

In this paper, a novel multi-focus image fusion approach is presented. Firstly, a joint dictionary is constructed by combining several sub-dictionaries which are adaptively learned from the source images using the K-singular value decomposition (K-SVD) algorithm. The proposed dictionary constructing method does not need any prior knowledge, and no external pre-collected training image data is required either. Secondly, sparse coefficients are estimated by the batch orthogonal matching pursuit (batch-OMP) algorithm, which effectively accelerates the sparse coding process. Finally, a maximum weighted multi-norm fusion rule is adopted to accurately reconstruct the fused image from the sparse coefficients and the joint dictionary, enabling the fused image to contain the most important information of the source images. To comprehensively evaluate the performance of the proposed method, comparison experiments are conducted on several multi-focus images and manually blurred images. Experimental results demonstrate that the proposed method outperforms many state-of-the-art techniques in terms of visual and quantitative evaluations.

© 2016 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.neucom.2016.07.039
H. Yin et al. / Neurocomputing 216 (2016) 216–229 217
content of an image is often complex and changeable.
The transform-domain-based fusion approach takes image details and direction coefficients into account, and it is successfully and widely used in the image fusion field. However, how to select an optimal transform basis remains a challenging problem. Obviously, effectively and completely extracting the underlying information of the original images would make the fused image more accurate. To effectively extract the underlying information of the source images, sparse-representation-based techniques have been widely studied in the image fusion field [19–22]. The sparse representation algorithm adopts an over-complete dictionary that contains prototype signal atoms to describe signals by sparse linear combinations of these atoms [23]. Sparse-representation-based techniques are attracting increasing attention in computer vision due to their state-of-the-art performance in many applications, such as image classification [24], face recognition [25], action recognition [26], and object recognition [27].
In the sparse model, the over-complete dictionary plays an essential role. There are two main approaches to obtaining a dictionary. The first is pre-constructing a dictionary based on analytical methods such as DCT, wavelets, and curvelets. The second is learning a dictionary from a large number of example image patches using a training algorithm such as the method of optimal directions (MOD) or K-SVD. Yang [28] was the first to apply sparse representation theory to the image fusion field; in his method, the image is decomposed over a redundant DCT dictionary. In [29], sparse representation is conducted with two typical kinds of over-complete dictionaries: over-complete DCT bases and a hybrid dictionary consisting of DCT bases, wavelet bases, Gabor bases, and ridgelet bases. Based on the use of sparse representations, a novel framework for simultaneous image fusion and super-resolution is adopted in [30]; six thousand patches taken from six images are used to learn the dictionaries. Liu [31] proposes a multi-focus image fusion method based on sparse representation in which a database of forty high-quality natural images is utilized to learn the dictionary. Aharon presents an image fusion method based on the K-SVD algorithm in which the redundant dictionary is trained on an image set. Yin [32] proposes a novel multimodal image fusion scheme based on the joint sparsity model; similarly, the dictionary is trained on the USC-SIPI image database (http://sipi.usc.edu/database/) using the K-SVD algorithm.
The pre-constructed analytic dictionary has the advantage of fast implementation. However, this category of dictionary is restricted to signals of a certain type, prior knowledge is needed when choosing the analytical bases, and it cannot be used for an arbitrary family of signals of interest. Compared with the pre-constructed ones, the learned dictionary contains much richer feature information, leading to a better representative ability in image restoration and reconstruction. However, training a dictionary usually requires external pre-collected training image data, and in practice collecting a proper image set is not always feasible. Furthermore, image contents vary significantly across different images, so it is not surprising that the performance of typical learning-based methods varies significantly with the dictionary learned. Thus, how to construct an over-complete dictionary adaptive to the input image data is a crucial problem in sparse-representation-based image fusion schemes.
Exploiting the content diversity of images and the advantages of sparse representation theory, this paper proposes a novel sparse-representation-based multi-focus image fusion approach that addresses the aforementioned problems. Firstly, a joint dictionary is constructed from several sub-dictionaries which are adaptively learned directly from the source images. The dictionary constructing method does not need any prior knowledge, and no external pre-collected training image data is required either. Secondly, sparse coefficients are estimated by applying the batch orthogonal matching pursuit algorithm, which effectively accelerates the sparse coding process. The fused image is accurately reconstructed from the sparse coefficients and the combined dictionary using a maximum weighted multi-norm fusion rule, which can preserve and combine the most important information of the source images into the extended depth-of-focus fused image.
The main contribution of this work is twofold. (1) An innovative dictionary constructing strategy is designed to construct a joint dictionary. Unlike previous dictionary constructing methods, the proposed method does not need any prior knowledge, and no external pre-collected training image data is required either. Simultaneously, the sub-dictionaries that constitute the joint dictionary are directly learned from the source images, which improves the adaptability of the constructed dictionary to the input image data. Furthermore, the combined dictionary enforces that each source image can be constructed with the same subset of dictionary atoms. (2) A weighted multi-norm-based activity measure is utilized to calculate the activity-level of each source image patch comprehensively. This improved activity measure rule can preserve detail information such as edges and lines more effectively than other methods, since it is not feasible to comprehensively calculate the activity-level using only a single measurement such as the ℓ1- or ℓ0-norm.
The rest of the paper is organized as follows. Section 2 presents the framework of the proposed image fusion approach. Comparative experimental results are presented in Section 3 to verify the performance of the proposed method. Finally, Section 4 concludes this work and discusses future research.

2. The framework of the sparse-representation-based multi-focus image fusion approach

The framework of the proposed sparse-representation-based multi-focus image fusion approach is shown in Fig. 1. The proposed algorithm mainly consists of three parts: dictionary constructing, image representation, and integrating and reconstruction. To make the dictionary adaptive to the input image data, a joint dictionary is constructed by combining several sub-dictionaries that are adaptively learned from source image patches using the K-SVD algorithm, as shown in Fig. 1(a). After constructing the joint dictionary, which ensures that each source image signal can be constructed with the same subset of dictionary atoms, the coefficient vectors for each source image are estimated by applying the batch-OMP algorithm. For the coefficient fusion rule, a maximum weighted multi-norm-based fusion rule is utilized to obtain the fused coefficients. After all sparse coefficients are fused using the proposed fusion rule, the result image is reconstructed using the fused coefficients and the combined dictionary. Fig. 1(b) gives an overview of the proposed method for the case of two source images. The following subsections describe the above-mentioned steps in detail.

2.1. Dictionary constructing

In this section, the dictionary constructing algorithm is illustrated in detail. The over-complete dictionary determines the signal representation ability of sparse coding. Generally, there are two main categories of offline approaches to obtaining a dictionary. The first is directly using analytical models such as over-complete wavelets, curvelets, and contourlets. The second is applying machine learning techniques to obtain a dictionary from a large number of training image patches. The former is relatively simple, but not adaptive to the complex and changeable structure of images; the latter has better adaptability.
Dictionary learning is a training process based on a series of sample data. Typical dictionary learning algorithms include PCA [33],
Fig. 1. The framework of the sparse-representation-based multi-focus image fusion approach. (a) Procedure of the proposed dictionary learning method. (b) Overview of the proposed multi-focus image fusion approach.
MOD [34], and K-SVD [35]. K-SVD is a standard unsupervised dictionary learning algorithm which has been widely investigated [36–38]. It is the combination of K-means clustering and sparsity constraints. The K-SVD training method for a sparse dictionary includes two steps: (1) sparse reconstruction: using the given dictionary to solve the sparse coefficients of the image under the current dictionary; (2) dictionary updating: updating the atoms of the dictionary sequentially. In this paper the K-SVD algorithm is used to learn sub-dictionaries from the source images because of its simplicity and efficiency for this task.
For the case of two source images, assume that I_A and I_B denote two registered source images of size M × N. Generally, a natural image contains complicated and non-stationary information as a whole, while a local small patch appears simple and has a consistent structure. For this reason, a sliding window technique is adopted to achieve better performance in capturing local salient features. As shown in Fig. 1(a), the sliding window technique is first utilized to divide each source image, from top-left to bottom-right with a step length of one pixel, into patches of size n × n. Then all the patches are transformed into vectors via lexicographic ordering, and all the vectors constitute one matrix V_A (taking source image I_A for example), in which each column corresponds to one patch of the source image I_A. The size of V_A is J × L (J = n × n, L = (M − n + 1) × (N − n + 1)).
Assume D_A ∈ R^{J×S} and α_A ∈ R^{S×L} denote the dictionary and the matrix of sparse representation coefficients of the training samples, respectively; the objective function is

min_{D_A, α_A} ∥V_A − D_A α_A∥_F^2   s.t.  ∀i, ∥α_A^i∥_0 ≤ T     (1)

where ∥·∥_F denotes the Frobenius norm, defined as ∥M∥_F = √(Σ_ij M_ij^2), and T is the sparsity constraint: each sparse representation is to contain no more than T nonzero coefficients. The above formula can be solved by alternating the sparse coding stage and the dictionary updating stage. In the sparse coding stage, the dictionary D_A is kept fixed and the sparse coefficient matrix α_A is efficiently computed by

min_{α_A} ∥V_A − D_A α_A∥_F^2   s.t.  ∀i, ∥α_A^i∥_0 ≤ T     (2)

In the dictionary updating stage, keeping the sparse coefficient matrix α_A fixed, the dictionary is updated sequentially by

min_{D_A} ∥V_A − D_A α_A∥_F^2     (3)

After updating the dictionary, all the samples are encoded again with the new dictionary. If the maximum number of iterations is reached or the sparsity requirement is met, the iteration
ends. Otherwise, the algorithm returns to the sparse coding stage. Once the sub-dictionaries for all input source images are obtained, they are united into a single dictionary D ∈ R^{J×2S} as follows:

D = [D_A, D_B]     (4)

2.2. Sparse representation

Since sparse representation handles an image globally, it cannot be used directly for image fusion, which depends on the local information of the source images. A sliding window technique which divides the source images into small patches is adopted to solve this problem. Let I_A, I_B ∈ R^{M×N} represent the source images to be fused. By the sliding window technique, each image is divided into n × n patches from upper left to lower right with a step length of one pixel. There are L (L = (M − n + 1) × (N − n + 1)) patches, denoted {p_A^i} and {p_B^i}, in I_A and I_B, respectively. To facilitate the analysis, the ith patches {p_A^i, p_B^i} are lexicographically ordered as vectors {v_A^i, v_B^i}. Then {v_A^i, v_B^i} can be expressed as follows:

v_A^i = D α_A^i     (5)

v_B^i = D α_B^i     (6)

2.3. Integrating and reconstruction

Establishing fusion rules requires solving two key issues. One is how to measure the activity-level, which recognizes the salience of the sparse representation coefficients of the source images. The other is how to integrate the coefficients into the counterparts of the fused image. As to the first issue, it is considered that the ℓ1-norm of a sparse coefficient vector reflects how much detail information it carries: the larger the ℓ1-norm, the greater the significance of the corresponding image patch. Meanwhile, the ℓ0-norm of a coefficient vector expresses the concentration of the detail information: the larger the ℓ0-norm, the more detail information is contained in the image patch. However, it is not feasible to comprehensively calculate the activity-level using only a single measurement. Motivated by the recent work [39] of Mertens et al., in this paper a weighted multi-norm-based activity measure is adopted to calculate the activity-level of each source image patch comprehensively. For each source image patch, the information from the ℓ1-norm and ℓ0-norm measures is combined into a scalar weight map using multiplication. Similar to the weighted terms of a linear combination, the influence of each measure can be controlled using a power function:

M_A^i = (∥α_A^i∥_1)^{ω_1} × (∥α_A^i∥_0)^{ω_0}     (7)

Fig. 2. Running time of batch-OMP versus standard OMP for an explicit dictionary.

V_F^i = D α_F^i     (10)
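To make the patch-level pipeline concrete, the coding (Eqs. (5)–(6)), activity measure (Eq. (7)) and reconstruction (Eq. (10)) steps can be sketched in numpy as below. This is a minimal illustration rather than the paper's implementation: it uses a plain greedy OMP in place of batch-OMP, assumes unit-norm dictionary atoms, fixes ω_1 = ω_0 = 1, and stands in for the full maximum weighted multi-norm rule with a simple choose-the-more-active-patch decision; the names `omp`, `activity` and `fuse_patch` are ours.

```python
import numpy as np

def omp(D, v, T):
    """Greedy orthogonal matching pursuit: approximate v with at most
    T atoms of D (columns assumed to have unit norm)."""
    support, alpha = [], np.zeros(D.shape[1])
    residual = v.astype(np.float64)
    for _ in range(T):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit all selected coefficients jointly by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], v, rcond=None)
        residual = v - D[:, support] @ coef
    alpha[support] = coef
    return alpha

def activity(a, w1=1.0, w0=1.0):
    """Weighted multi-norm activity measure in the spirit of Eq. (7)."""
    return (np.abs(a).sum() ** w1) * (np.count_nonzero(a) ** w0)

def fuse_patch(D, vA, vB, T=4):
    """Code both source patches over the joint dictionary D = [D_A, D_B],
    keep the more active coefficient vector, and reconstruct the fused
    patch as in Eq. (10)."""
    aA, aB = omp(D, vA, T), omp(D, vB, T)
    aF = aA if activity(aA) >= activity(aB) else aB
    return D @ aF
```

Coding both patches over the same joint dictionary guarantees that they are expressed with a common pool of atoms, which is what makes their activity measures directly comparable.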
Fig. 3. Five source image sets. (a) and (b) Pepsi; (c) and (d) Clock; (e) and (f) Leaf; (g) and (h) Lena; (i) and (j) Barche.
quantitative assessments are presented in Table 1.
The values of AG, EI, FD, Q^{AB/F}, MI, CE, RW and SSIM of Fig. 5(c)–(l) are listed in Table 1. The best results are indicated in bold. From Table 1, one can see that the proposed fusion scheme has four best values, two second-best values, and two third-best values. As shown in Table 1, although the proposed fusion approach does not attain all the largest quality indexes, its quality metrics are generally close to all the best values. On the other hand, the fused result provided by the Max-based method has the best value of MI; however, severe ringing artifacts can be observed around the can and label. Obviously, it can be concluded that the fused image obtained by the proposed fusion approach contains much more abundant information, such as shapes and edges, from the source images. At this point, it can be concluded that the proposed scheme works well and exhibits excellent fusion ability visually and quantitatively.

Fig. 4. Average value of fusion quality metrics with respect to the sparsity level and dictionary size.

3.3.2. Fusion on multi-focus images “Clock”
The second experiment is realized on the multi-focus images “Clock”. As shown in Fig. 6(a) and (b), the large clock in Clock A is out-of-focus and blurred, while the small clock is in focus and clear. Contrary to Clock A, in Clock B the large clock is defocused and the small clock is in focus. In addition, Fig. 6(c)–(l) depict the fusion results obtained by the different methods to offer a direct view.
Compared with the source images Fig. 6(a) and (b), the large clock and small clock are equally clear in the fused images. As shown in Fig. 6, it can be observed that the Max-based fusion method blurs not only the top and bottom edges of the large clock, but also the top edge of the small clock. The CVT-based method produces a sharp image but shows serious artifacts around edges, e.g. the clock borders. The fused results (i)–(l), which are obtained by SR-DCT, SR-ℓ1, SR-PRE and the proposed fusion approach, are clearer than the other fused results. To further compare the performance of the different fusion methods, quantitative assessments are presented in Table 2. From Table 2, the further objective comparison shows that the results of the Average method are much smaller than those of the other methods. Numerically, the proposed method has the best values for AG, EI, FD, Q^{AB/F} and SSIM, and the second-best values for MI and RW. On the whole, the proposed method has more comprehensive fusion performance compared with the other methods.

3.3.3. Fusion on multi-focus images “Leaf”
In this section, the corresponding experiments are implemented on a pair of multi-focus “Leaf” images with a size of 268 × 204. As shown in Fig. 7(a) and (b), each “Leaf” source image focuses on different regions. In the source image Leaf A, the front leaves are in focus and clear, while the back leaves are out-of-focus and blurred. On the contrary, the source image Leaf B focuses on the back leaves, and the front leaves are fuzzy. Fig. 7(c)–(l) present the fusion results obtained by the different fusion methods.
From the perspective of the human visual perception mechanism, the Max method produces a fused image with lots of halo, especially in the region of the leaf veins, as shown in Fig. 7(b). The Average method, PCA method and GP method generate better fused images than the Max method, but they also severely lose luminance information compared with the images acquired by the LP method, CVT method, SR-DCT method, SR-ℓ1 method, SR-PRE method and the proposed method. Fig. 7(f) is the fused image of the LP method, which has some artifacts in edge regions, and Fig. 7(e) shows some blurring around the edges. Visually, as shown in Fig. 7(g)–(j), the fused images obtained by the SR-DCT method, SR-ℓ1 method, SR-PRE method and the proposed method perform well in luminance and detail information, and it is difficult for the human eye to find the difference subjectively. Thus, further quantitative assessments are required for an objective comparison. The results of the quantitative criteria are shown in Table 3.
The values of AG, EI, FD, Q^{AB/F}, MI, CE, RW and SSIM are listed in Table 3. The best values are indicated in bold. As shown in Table 3, the proposed fusion approach achieves better fusion results with six best values and two second-best values, illustrating that the proposed fusion approach can capture much more information from the source images. In brief, the results of the subjective and objective evaluations demonstrate the superiority of the proposed fusion approach compared with many state-of-the-art methods.
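Of the metrics above, the average gradient (AG) is the most direct sharpness measure. The paper does not spell out its exact formula, so the sketch below uses one common definition (the mean magnitude of the local intensity gradient); treat it as an illustrative variant rather than the authors' exact implementation.

```python
import numpy as np

def average_gradient(img):
    """Average gradient (AG): mean local gradient magnitude over the
    image; larger values indicate a sharper, more detailed image."""
    img = np.asarray(img, dtype=np.float64)
    dx = np.diff(img, axis=1)[:-1, :]  # horizontal first differences
    dy = np.diff(img, axis=0)[:, :-1]  # vertical first differences
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))
```

A blurred copy of an image scores lower than the original, which is why AG rewards fused results that keep the in-focus regions crisp.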
Fig. 5. The “Pepsi” source images and fusion results by different fusion methods: (a) Source image Pepsi A; (b) Source image Pepsi B; (c) The fused image obtained by Average
method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image
obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method; (k)
The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.
3.3.4. Fusion on multi-focus images “Lena”
Two “Lena” source images with different blurred regions are used to evaluate the fusion performance in the fourth experiment. Fig. 8(a) and (b) are the source images Lena A and Lena B. In Fig. 8(a), the left object is in focus and clearly depicted, while the right object is out of focus and blurred. The circumstance of Fig. 8(b) is contrary to that of Fig. 8(a). To show the fusion results more explicitly, Fig. 8(c)–(l) present the fused images acquired by the Average, Max, PCA, GP, LP, CVT, SR-DCT, SR-ℓ1, SR-PRE and the proposed method, respectively.
Compared with the source images Fig. 8(a) and (b), the fused images successfully preserve the focused parts of each source image and combine them together to generate a clearer picture of the whole scene. As shown in Fig. 8, it can be observed that there are some losses of luminance in Fig. 8(c)–(f), which are obtained by the Average method, Max method, PCA method, and GP method. Moreover, the fused image obtained by the Max method suffers from ringing effects to some degree. It also loses edge contrast, particularly in edge regions of the source images such as Lena's hair. Fig. 8(e) shows some blurring around the person
Table 1
Quantitative assessments of the compared methods and the proposed method.
Methods | AG | EI | FD | Q^{AB/F} | MI | CE | RW | SSIM
Fig. 6. The “Clock” source images and fusion results by different fusion methods: (a) source image Clock A; (b) Source image Clock B; (c) The fused image obtained by Average
method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image
obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method; (k)
The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.
provided by the PCA method. Visually, the fused images of the LP method, SR-DCT method, SR-ℓ1, SR-PRE and the proposed method behave better in both light intensity and detail information, including shapes and edges, and it is difficult for the human eye to find the difference subjectively. Thus, a further objective comparison is required. The results of the quantitative assessments are shown in Table 4.
Table 4 presents a quantitative comparison of the various methods in terms of the metrics AG, EI, FD, Q^{AB/F}, MI, CE, RW and SSIM. The best results for each metric are labeled in bold. As shown in Table 4, the proposed method has the best values for all adopted quality metrics except MI, demonstrating that the proposed fusion approach can capture much more significant information from the multi-focus images to integrate an “ideal” result. Based on these results, it can be concluded that the proposed method consistently outperforms the other methods visually and quantitatively.
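The MI scores discussed above can be illustrated with a histogram-based estimate of the mutual information between a fused image and a source image. The exact normalisation used in the paper's MI metric is not given here, so the version below (joint-histogram MI in bits, with an assumed bin count) is an illustrative sketch.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate the mutual information (in bits) between two equally
    sized images from their joint gray-level histogram."""
    hist, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    pxy = hist / hist.sum()              # joint distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b
    nz = pxy > 0                         # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```

An image shares maximal information with itself and almost none with an unrelated image, so a fused result that tracks its sources closely earns a higher MI score.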
Table 2
Quantitative assessments of the conventional methods and the proposed method.
Methods | AG | EI | FD | Q^{AB/F} | MI | CE | RW | SSIM
Fig. 7. The “Leaf” source images and fusion results by different fusion methods: (a) Source image Leaf A; (b) Source image Leaf B; (c) The fused image obtained by Average
method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image
obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method; (k)
The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.
3.3.5. Fusion on multi-focus images “Barche”
In order to further evaluate the fusion performance, the fifth experiment is performed on blurred versions of “Barche”. The reference image Barche is blurred diagonally to build the source images Barche A and Barche B, as illustrated in Fig. 9(a) and (b). The upper right corner of Barche A is in focus while its lower left corner is out of focus. Different from Barche A, Barche B focuses on the lower left corner and defocuses on the upper right corner. Moreover, the fused images obtained by the different methods are depicted in Fig. 9(c)–(l).
Similar to the previous examples, the focused parts of each source image are preserved in the fused results of the different fusion methods. However, it can be confirmed that there are some losses of luminance or local information in Fig. 9(c)–(f). Evidently, more or less artifacts appear in the images obtained by the Max method and PCA method, particularly in edge regions such as the boats' masts. Obviously, as shown in Fig. 9(g)–(l), the fused images obtained by the LP, SR-DCT, SR-ℓ1, SR-PRE and the proposed approach are much clearer than the results of the other methods. For better comparison, the quantitative assessments of the different methods for “Barche” are shown in Table 5, where the best values are shown in bold. It can be seen clearly from the table that the proposed approach has the best values for all quality metrics except Q^{AB/F} and MI, indicating that the fused image obtained by the proposed approach is more similar to the reference image. Evidently, the proposed approach can preserve more comprehensive information from the source images to produce a satisfactory fused result. Overall, it can be concluded from this experiment, both by visual comparison and by objective assessment, that the proposed method shows competitive fusion performance compared with the other tested methods.
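The “Barche” setup of blurring a reference image on complementary sides of the diagonal can be mimicked as follows. The paper does not specify the blur kernel or the exact split, so the k × k mean filter and the straight main-diagonal mask here are illustrative assumptions, and `make_diagonal_pair` is our name for the helper.

```python
import numpy as np

def box_blur(img, k=5):
    """Simple k-by-k mean filter used as a stand-in for defocus blur."""
    pad = k // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def make_diagonal_pair(ref, k=5):
    """Blur the reference on complementary sides of the main diagonal to
    build two pseudo multi-focus source images A and B."""
    h, w = ref.shape
    rows, cols = np.mgrid[0:h, 0:w]
    lower_left = rows / h >= cols / w           # True below the diagonal
    blurred = box_blur(ref, k)
    src_a = np.where(lower_left, blurred, ref)  # A: lower-left defocused
    src_b = np.where(lower_left, ref, blurred)  # B: upper-right defocused
    return src_a, src_b
```

Because the ground-truth all-in-focus image is the reference itself, reference-based metrics such as SSIM can then score each fusion result directly.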
Table 3
Quantitative assessments of the conventional methods and the proposed method.
Methods | AG | EI | FD | Q^{AB/F} | MI | CE | RW | SSIM
Fig. 8. The “Lena” source images and fusion results by different fusion methods: (a) Source image Lena A; (b) Source image Lena B; (c) The fused image obtained by Average
method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image
obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method;
(k) The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.
Table 4
Quantitative assessments of the compared methods and the proposed method.
Methods | AG | EI | FD | Q^{AB/F} | MI | CE | RW | SSIM
Fig. 9. The “Barche” source images and fusion results by different fusion methods: (a) Source image Barche A; (b) Source image Barche B; (c) The fused image obtained by
Average method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused
image obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1
method; (k)The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.
learning and sparse representation stage, which makes sparse-representation-based fusion techniques time-consuming.
From Table 6, one can see clearly that the spatial-domain-based fusion methods (Av, Max and PCA) take less time to form a fused image, since pixels or regions are directly selected and combined in a linear or non-linear way. However, these methods are incapable of guaranteeing the details of the source images, as specifically illustrated in Section 3.3. Taking image details into account, the transform-domain-based fusion methods (GP, LP and CVT) can achieve better fusion performance using relatively more time compared with the spatial-domain-based fusion methods. However, the fusion ability of these methods is limited, as shown in Tables 1–5, since there is no single transform which can completely represent all features. To effectively extract the underlying information of the source images, sparse-representation-based fusion methods are preferred to achieve high-quality fused images at the cost of running time. Compared with the other sparse-representation-based fusion methods (SR-DCT, SR-ℓ1 and SR-PRE), the proposed method can achieve better fused images with better quantitative assessments, as shown in Tables 1–5. Furthermore, in
Table 5
Quantitative assessments of the conventional methods and the proposed method.
Methods AG EI FD Q AB / F MI CE RW SSIM
Table 6 Acknowledgments
The running time (Time/s) required to fuse Pepsi, Clock, Leaf, Lena and Barche using
different fusion methods.
We would like to thank the support by National Natural Science
Methods Images Foundation of China (61374135 and 61203321), China Postdoctoral
Science Foundation (2012M521676), China Central Universities
Pepsi Clock Leaf Lena Barche Foundation (106112015CDJXY170003 and 106112016CDJZR175511),
Chongqing Natural Science Foundation of China (cstc2015jcyjB0569)
Av 0.0019 0.0020 0.0005 0.0003 0.0006
Max 0.0107 0.0063 0.0015 0.0015 0.0017 and Chongqing Graduate Student Research Innovation Project
PCA 0.0087 0.0079 0.0011 0.0022 0.0021 (CYB14023).
GP 0.1434 0.1447 0.0306 0.0308 0.0311
LP 0.0132 0.02067 0.0058 0.0066 0.0060
CVT 4.6369 3.4458 1.1937 1.2140 1.1256
SR-DCT 439.537 332.3798 106.4618 112.5463 122.3114 References
SR-ℓ1 743.3922 678.2953 460.4978 506.4105 522.0846
SR-PRE 1214.2324 951.9228 726.7140 717.5696 723.4113 [1] J. Duan, G. Meng, S. Xiang, Multifocus image fusion via focus segmentation and
proposed 747.0962 680.6142 460.8295 506.8173 523.1878 region reconstruction, Neurocomputing 140 (2014) 193–209.
[2] B. Zhang, X. Lu, H. Pei, Multi-focus image fusion algorithm based on focused
region extraction, Neurocomputing 130 (2014) 44–51.
[3] Z.D. Liu, H.P. Yin, B. Fang, A novel fusion scheme for visible and infrared images
based on compressive sensing, Opt. Commun. 335 (2015) 168–177.
the proposed method, the sub-dictionaries are simultaneously [4] A.P. James, B.V. Dasarathy, Medical image fusion: a survey of the state of the
learned from source images, which can effectively enhance the art, Inf. Fusion 19 (2014) 4–19.
speed of dictionary constructing procedure in comparison to SR- [5] C.L. Chien, W.H. Tsai, Image fusion with no gamut problem by improved
nonlinear IHS transforms for remote sensing, IEEE Trans. Geosci. Remote Sens.
PRE model. 52 (1) (2014) 651–663.
In brief, the proposed fusion method can get better fused [6] V. Aslantas, A.N. Toprak, A pixel based multi-focus image fusion method, Opt.
Commun. 332 (2014) 350–358.
images at the cost of running time, which is extremely important
[7] B. Yu, B. Jia, L. Ding, Hybrid dual-tree complex wavelet transform and support
to deal with images with comprehensive information. vector machine for digital multi-focus image fusion, Neurocomputing 182
(2016) 1–9.
[8] N. Wang, Y. Ma, K. Zhan, Spiking cortical model for multifocus image fusion,
Neurocomputing 174 (2016) 733–748.
[9] Y. Jiang, M.H. Wang, Image fusion with morphological component analysis, Inf.
4. Conclusions and discussions Fusion 18 (2014) 107–118.
Multi-focus image fusion plays a crucial role in military surveillance, medical imaging, remote sensing, and machine vision. Sparse-representation-based techniques are attracting increasing attention in the multi-focus image fusion field. However, how to construct an over-complete dictionary adaptive to the input image data is a crucial problem in sparse-representation-based image fusion schemes. In this paper, a novel sparse-representation-based multi-focus image fusion approach is presented to overcome this problem. A joint dictionary is constructed by combining several sub-dictionaries which are adaptively learned from the source images using the K-SVD algorithm. The Batch-OMP algorithm is utilized to estimate the sparse coefficients. Furthermore, a maximum weighted multi-norm fusion rule is exploited to reconstruct the fused all-in-focus image. According to the fusion results and objective measures, the proposed fusion approach achieves competitive results compared with a number of state-of-the-art fusion methods.
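The pipeline summarized in this paper — per-image sub-dictionaries, a concatenated joint dictionary, sparse coding, and a norm-based selection rule — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the K-SVD learning step is replaced by a cheap patch-sampling stand-in, Batch-OMP by plain OMP, and the maximum weighted multi-norm rule by a simple max-l1-norm selection; all function names here are our own.

```python
import numpy as np

def extract_patches(img, size=8, stride=4):
    """Slide a window over the image; return flattened patches as columns."""
    h, w = img.shape
    cols = [img[i:i + size, j:j + size].ravel()
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]
    return np.array(cols).T                      # shape: (size*size, n_patches)

def sub_dictionary(img, n_atoms=64, seed=0):
    """Stand-in for the per-image K-SVD step: sample patches and normalize
    them into unit-norm atoms (K-SVD would refine these atoms iteratively)."""
    patches = extract_patches(img)
    rng = np.random.default_rng(seed)
    idx = rng.choice(patches.shape[1], min(n_atoms, patches.shape[1]), replace=False)
    atoms = patches[:, idx].astype(float)
    norms = np.linalg.norm(atoms, axis=0)
    norms[norms == 0] = 1.0
    return atoms / norms

def omp(D, y, sparsity=4):
    """Plain orthogonal matching pursuit (Batch-OMP is a faster batched variant)."""
    residual, support = y.astype(float), []
    x = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ residual)))   # best-correlated atom
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef          # re-fit, update residual
    x[support] = coef
    return x

def fuse_patches(img_a, img_b, sparsity=4):
    """Joint dictionary = concatenated sub-dictionaries; per patch, keep the
    sparse code with the larger l1-norm (a simplified stand-in for the paper's
    maximum weighted multi-norm rule), then reconstruct from the dictionary."""
    D = np.hstack([sub_dictionary(img_a, seed=0), sub_dictionary(img_b, seed=1)])
    pa, pb = extract_patches(img_a), extract_patches(img_b)
    fused = []
    for ya, yb in zip(pa.T, pb.T):
        xa, xb = omp(D, ya, sparsity), omp(D, yb, sparsity)
        fused.append(D @ (xa if np.abs(xa).sum() >= np.abs(xb).sum() else xb))
    return np.array(fused).T   # fused patch columns; averaging overlapping
                               # patches would rebuild the full image
```

The point the sketch makes concrete is the joint dictionary: `np.hstack` lets every patch, whichever source it comes from, be coded against atoms learned from both images, so the selection rule compares codes in a common representation.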
However, there is still much work worth doing in follow-up studies. Firstly, the current learned dictionary is not computationally efficient because it still contains a large number of redundant atoms; a compact but informative dictionary construction method may be considered to further improve the developed dictionary-constructing scheme. Secondly, since color acts as a basic factor in the human visual system, how to extend the current fusion framework to color image fusion is another issue that requires further investigation. In addition, there appears to be a serious need for further research on the evaluation of different fusion methods. The ground truth is usually not known in practice, yet many of the currently used performance measures require knowledge of it. One potential solution is to develop so-called objective performance measures, i.e., measures that are independent of the ground truth and of human subjective evaluation. These problems will be further investigated in future work.
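One example of a ground-truth-free measure, long used in the fusion literature, is spatial frequency; it is only one candidate metric, not the evaluation protocol of this paper, but it illustrates what an objective performance measure looks like:

```python
import numpy as np

def spatial_frequency(img):
    """Reference-free sharpness score: root-mean-square of horizontal (row
    frequency) and vertical (column frequency) pixel differences. A fused
    image that preserves more detail scores higher; no ground truth needed."""
    img = np.asarray(img, dtype=float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)
```

A perfectly flat image scores 0, while any edge or texture raises the score, so candidate fusion results can be ranked against each other directly.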
Yi Chai received the B.E. degree from the National University of Defense Technology in 1982, and the M.Sc. and Ph.D. degrees from Chongqing University in 1994 and 2001, respectively. He is the associate dean of the College of Automation, Chongqing University. His research interests include information processing, integration and control, and computer network and system control.

Zhaodong Liu received the B.E. degree from the College of Automation, Chongqing University, China. He is currently working towards the Ph.D. degree in the College of Automation, Chongqing University. His research interests include intelligent image processing and machine vision.

Zhiqin Zhu received the B.E. degree in electronic engineering from Chongqing University in 2010. He is currently a Ph.D. candidate in the College of Automation, Chongqing University. His research interests include image processing and machine learning.

H. Yin received the B.E., M.E., and Ph.D. degrees from the College of Automation, Chongqing University, China. Prof. Yin joined the College of Automation at Chongqing University in 2009. His research interests include intelligent image processing and machine vision. Prof. Yin serves as an Associate Editor of the International Journal of Complex Systems, and as an invited reviewer for Transactions of the ASABE, Neurocomputing, and IEEE Transactions on Cybernetics.