
Author’s Accepted Manuscript

Multi-focus image fusion based on depth extraction with inhomogeneous diffusion equation

Jinsheng Xiao, Tingting Liu, Yongqin Zhang, Baiyu Zou, Junfeng Lei, Qingquan Li

www.elsevier.com/locate/sigpro

PII: S0165-1684(16)00031-1
DOI: http://dx.doi.org/10.1016/j.sigpro.2016.01.014
Reference: SIGPRO6040
To appear in: Signal Processing
Received date: 28 July 2015
Revised date: 12 January 2016
Accepted date: 14 January 2016

Cite this article as: Jinsheng Xiao, Tingting Liu, Yongqin Zhang, Baiyu Zou, Junfeng Lei and Qingquan Li, Multi-focus image fusion based on depth extraction with inhomogeneous diffusion equation, Signal Processing, http://dx.doi.org/10.1016/j.sigpro.2016.01.014

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Multi-focus image fusion based on depth extraction with inhomogeneous diffusion equation

Jinsheng Xiao a,b, Tingting Liu a, Yongqin Zhang c, Baiyu Zou a, Junfeng Lei a, Qingquan Li d,e

a School of Electronic Information, Wuhan University, Wuhan, Hubei 430072, China
b Department of Computer Science, University of California, Santa Barbara, CA 93106, USA
c Institute of Computer Science and Technology, Peking University, Beijing 100871, China
d State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei 430072, China
e President Office, Shenzhen University, Shenzhen, Guangdong 518060, China

Abstract
The defocus of imaging can be modeled as a heat diffusion process and represented mathematically by a diffusion equation, where image blur corresponds to the diffusion of heat. To improve the quality of observed images, we propose a multi-focus image fusion algorithm based on depth extraction. The optical imaging of two multi-focus images is simulated by heat equations defined on the positive and negative diffusion regions, and the scene depth is estimated from the inhomogeneous diffusion equation. An adaptive initialization of the image depth estimate is proposed to improve the simulation accuracy of the inhomogeneous diffusion process. The image depth is approximated by an iterative solution of the partial differential equation. According to the depth information, the target images are adaptively divided into three types of regions: clear regions, fuzzy regions and transition regions. Finally, the fusion of multi-focus images is achieved by extracting the pixels of the clear regions and merging the pixels of the transition regions. Theoretical analysis and experimental results show that the proposed algorithm avoids blocking artifacts and outperforms the state-of-the-art methods both subjectively and objectively in most cases.
Keywords: Image fusion, multi-focus, depth extraction, partial differential equation

Email addresses: xiaojs@whu.edu.cn (Jinsheng Xiao), zhangyongqin@pku.edu.cn (Yongqin Zhang)

Preprint submitted to Elsevier January 28, 2016


1. Introduction
When a scene is imaged through a lens, objects at different depths are captured with different amounts of blur. Imaging cameras, especially those with long focal lengths, usually have only a finite depth of field. For an image captured by a camera, only the objects within the depth of field are in focus, whereas other objects are blurred [1]. The goal of multi-focus image fusion (MFIF) is to merge multiple images with different focus settings so that all objects appear sharp in the output image. Multi-focus image fusion has been widely used in target identification, microscopic imaging, military operations, machine vision and so on. After a brief review, the methods of multi-focus image fusion can be broadly divided into two categories [2]: spatial-domain methods and frequency-domain methods.
The spatial-domain image fusion methods mainly depend on characteristics of the spatial information of image pixels. Recently, patch-based methods have been widely used in place of pixel-based methods in many applications, because a single pixel cannot effectively express the spatial structures of an image [3]. Although this type of method works well in textured regions, it often leads to misjudgment in smooth regions. In general, patch-based methods cause serious blocking effects at image edges. Another drawback is that it is difficult to determine the patch size that gives the best performance, and patch-based methods may produce artifacts at block boundaries that greatly reduce the quality of the fused image. In view of these shortcomings, Haghighat et al. [4] put forward a multi-focus image fusion algorithm based on the combination of block division and the discrete cosine transform (DCT). De and Chanda [5] proposed a multi-focus image fusion method using a morphology-based focus measure in a quad-tree structure. Gao et al. [6] presented a multi-focus image fusion algorithm based on an iterative multistage block division method. Hua et al. [7] assessed the focus score of each pixel locally for each shallow depth-of-field input image and then optimized the focus score and color coherence globally in a random walk framework. Focus detection is a vitally important step in multi-focus image fusion, but focus measures tend to produce incorrect results when they are applied to smooth regions. Zhang et al. [8] presented a new focus detection rule that treats image pixels differently based on their classification. Mitianoudis et al. [9] proposed an image fusion method based on self-trained Independent Component Analysis (ICA) bases. In that fusion method, the input images are segmented into three groups: edges, texture and constant background. The max-abs rule is used for edges, whereas the mean rule is used for the background. Local variance, entropy and Fourier energy are introduced to measure the texture properties of the texture patches in the spatial domain. Although the methods mentioned above perform better than the traditional block-segmentation-based fusion methods, they still produce blocking effects to some extent.
The frequency-domain image fusion methods combine multiple images in a transform domain to form a fused image. Multi-focus image fusion methods based on pyramid transforms have received great attention in recent years [10, 11]; the DCT or the singular value decomposition (SVD) is used in these methods [10, 12]. Patil and Mudengudi [13] proposed a unified framework of pyramid transform and principal component analysis (PCA) for image fusion without a reference image, but this method has a high computational complexity and tends to lose details. Owing to their good local and multi-resolution properties, multi-scale transforms have been widely applied to image fusion [14], including Ridgelets, Curvelets, Contourlets [15] and Shearlets [16]. Since the wavelet transform has good multi-scale properties in the temporal and frequency domains, Tian and Chen [17] proposed a statistical sharpness measure that exploits the distribution of the wavelet coefficients to perform adaptive image fusion in the wavelet domain. Lewis et al. [18] used the dual-tree complex wavelet transform, which offers better shift invariance and directional sensitivity than the DWT, but at added computational cost due to its two fully decimated trees. The wavelet transform has a restricted ability to extract image features because of its limited data representation. The frequency-domain fusion methods avoid the blocking effects caused by the spatial-domain fusion methods, but they cannot directly extract clear pixels from the multi-focus source images. Since the fused image must be reconstructed by the inverse transform, some information from the clear source images is lost. Pseudo-Gibbs phenomena arise from the inconsistent sources of the fused coefficients over the multi-scale transform. Therefore, these existing image fusion methods often cause artifacts and distortions in the fused image, such as false contours, textures and edges.
To address these problems, and in contrast to the above methods, we propose a new focus-detection-based spatial-domain algorithm for the fusion of multi-focus images. The existing focus detection methods generally provide results consistent with human visual perception; that is, the basic assumption of the multi-focus fusion methods mentioned above is correct in most cases. However, these methods tend to give incorrect results when the focus measures are applied to smooth regions that are close to edges. The reason is that out-of-focus regions influenced by edge information are assigned higher scores by the focus detection measures than regions far from edges. This phenomenon forces focus-detection-based image fusion algorithms to make incorrect decisions on which part to select [8]. Our approach is inspired by the fact that objects away from the focal plane appear blurred, and the amount of blur increases with the relative distance from the object in focus. Defocus can be mathematically modeled as a diffusion process governed by a diffusion equation, where image blur corresponds to the diffusion of heat [19].
Based on the optical imaging principle [20], this paper aims to acquire the 3D information of the scene and use the relative distance to detect the focus region. The proposed algorithm can extract the correct focus regions of multi-focus images and avoid misjudgments at both edges and smooth regions. In this paper, the source images are assumed to have already been registered. Firstly, the depth information of the image is extracted by solving the inhomogeneous diffusion equation of scene reconstruction. Secondly, the spatial point spread model of imaging is established to estimate the image depth. Thirdly, the focus regions and the defocus regions are identified separately from the depth information, and the clear pixels in the focus regions of each image are extracted. Finally, the pixels in the edge and transition regions are fused to generate the final clear result.
The remainder of this paper is organized as follows. Section 2 gives a brief introduction to depth extraction from multi-focus images and our improvement in solving the diffusion equation. Section 3 describes the proposed algorithm in detail. Section 4 presents experimental results and analysis. Finally, the conclusion is drawn in Section 5.

2. Related Work
2.1. Optical imaging model
The light passing through the lens focuses on the imaging plane of the optical imaging system [20]. If an object is not in the focus region, it diffuses into a blur spot on the image plane. The schematic diagram of the optical imaging system is shown in Figure 1, where F is the focal point, f is the focal length of the lens, A is an object in focus, B is an object out of focus, C is the imaging plane, v is the image distance, and s is the object distance, which is represented by the depth map.
The well-known thin lens equation is
\[
\frac{1}{f} = \frac{1}{v} + \frac{1}{s} \tag{1}
\]
Since the optical imaging system is lossless, the total energy of the light spreading into the blur spot is equal to the energy of the corresponding object point. In the focused region, the energy of each point on the object is concentrated onto a single point through the optical lens (such as A'); outside the focused region, the energy of each point on the object spreads into a blur spot through the optical lens (such as B').

Figure 1: The schematic diagram of the optical imaging system

Because the aperture is taken to be circular, the blurred image of a point is also a circle, with uniform brightness inside the circle and zero outside. This is called the blur circle. Let b be the radius of the blur circle. The blur spots generated by different points overlap each other, which makes objects outside the focused region appear blurred.
The optical imaging model can be approximated by the following equation [21]:
\[
I(y) = \int_{\mathbb{R}^2} h(y, x, b)\, r(x)\, dx, \quad \forall y \in \Omega \subset \mathbb{R}^2 \tag{2}
\]
where h(y, x, b) is the point spread function (PSF), b is the blur radius, r(x) is the brightness radiance function of the spatial coordinates x, and I(y) is the pixel value at position y. In our proposed algorithm, we set
\[
h(y, x, b) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{\|x-y\|^2}{2\sigma^2}} \tag{3}
\]
where the standard deviation is σ = γb for a certain constant γ > 0, and the blur radius b satisfies
\[
b = \frac{D v}{2}\left|\frac{1}{f} - \frac{1}{v} - \frac{1}{s}\right| \tag{4}
\]
where D is the radius of the lens.
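As a concrete illustration of Eqs. (1) and (4), the following short Python sketch computes the image distance and the blur-circle radius for hypothetical camera parameters close to the defaults quoted in Section 4; it is an added example under these assumptions, not part of the original implementation.

```python
import numpy as np

def thin_lens_image_distance(f, s):
    """Image distance v from the thin lens equation 1/f = 1/v + 1/s (Eq. (1))."""
    return 1.0 / (1.0 / f - 1.0 / s)

def blur_radius(D, v, f, s):
    """Blur-circle radius b = (D*v/2) * |1/f - 1/v - 1/s| (Eq. (4))."""
    return 0.5 * D * v * np.abs(1.0 / f - 1.0 / v - 1.0 / s)

# Hypothetical optics: focal length 12 mm, lens radius 6 mm, focused at 0.52 m.
f, D = 0.012, 0.006
v = thin_lens_image_distance(f, 0.52)
print(blur_radius(D, v, f, s=0.85))   # blur of an object at 0.85 m (out of focus)
```

An object exactly at the focused distance yields b = 0, and the blur radius grows with the distance from the focal plane, which is the behaviour exploited by the depth estimation below.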

2.2. Depth extraction based on the anisotropic diffusion
The depth information of the image scene determines the focused region of the image and causes objects at different depths to appear sharp or blurred. The depth information of the scene can in turn be recovered from the focused region, and the degree of blur at different points can be calculated from it [22]. Thus we can fuse the blurred input images to produce a clear image.
Multi-focus images have recently been exploited for depth information extraction in the fields of image processing and computer vision. Ziou et al. [23] proposed a dense computation of the blur difference between two images to estimate the depth information. Favaro et al. [1] employed the Poisson distribution as the point spread function and information divergence as the optimization criterion to obtain depth information. Rajagopalan et al. [24] estimated the depth of images captured with a real aperture camera by fusing defocus and stereo cues. Under the assumption of a Gaussian distribution, Favaro and Soatto [19] computed the depth by minimizing the Euclidean norm of the difference between the estimated images and the observed images.
Subsequently, Favaro et al. [21] applied thermal diffusion theory to the extraction of image depth information. Substituting the point spread function of Eq. (3) into Eq. (2), the observed image I is given by
\[
I(y, b) = \int \frac{1}{2\pi(\gamma b)^2}\, e^{-\frac{\|x-y\|^2}{2(\gamma b)^2}}\, r(x)\, dx \tag{5}
\]
Then, for the input multi-focus images I1(y) and I2(y), their imaging models can be written separately as
\[
I_1(y) = I(y, b_1), \qquad I_2(y) = I(y, b_2) \tag{6}
\]
By substituting the expression of r(x) in terms of the image I1(y), we can rewrite I2(y) as follows [21]:
\[
I(y, b_2) = \int \frac{1}{2\pi\Delta\sigma^2}\, e^{-\frac{\|x-y\|^2}{2\Delta\sigma^2}}\, I(x, b_1)\, dx \tag{7}
\]
where Δσ² = σ₂² − σ₁² = γ²(b₂² − b₁²) represents the relative blur between I1(y) and I2(y).
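For intuition about Eq. (7), the sketch below (our illustrative Python code, not the authors') approximates the more defocused image by convolving the sharper one with a Gaussian whose variance equals the relative blur Δσ².

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def relative_blur(i1, delta_sigma2):
    """Approximate Eq. (7): blur the sharper image i1 with a Gaussian whose
    variance equals the (non-negative) relative blur delta_sigma2."""
    return gaussian_filter(i1, sigma=np.sqrt(delta_sigma2))

# Toy check on a synthetic step edge (illustrative only):
i1 = np.zeros((64, 64)); i1[:, 32:] = 1.0   # sharp edge image
i2 = relative_blur(i1, delta_sigma2=4.0)    # its defocused counterpart
```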
The coordinate domain of the whole image is denoted by Ω. Let Ω+ denote the regions where I1(y) is clearer than I2(y), i.e., Δσ² > 0, and let Ω− denote the remaining regions, where Δσ² < 0. Starting from the initialization at t = 0, I1(y) inside Ω+ diffuses towards I2(y), whereas I2(y) inside Ω− diffuses towards I1(y). The relative inhomogeneous diffusion equation of the thermal diffusion model is then established on the images I1(y) and I2(y) as follows [21]:
\[
\begin{cases}
\dfrac{\partial u(y,t)}{\partial t} =
\begin{cases}
\nabla \cdot \big(c(y)\,\nabla u(y,t)\big), & y \in \Omega^+ \\
\nabla \cdot \big(-c(y)\,\nabla u(y,t)\big), & y \in \Omega^-
\end{cases} & t \in (0, \infty) \\[2ex]
u(y, 0) =
\begin{cases}
I_1(y), & \forall\, y \in \Omega^+ \\
I_2(y), & \forall\, y \in \Omega^-
\end{cases}
\end{cases} \tag{8}
\]
and
\[
u(y, \Delta t) =
\begin{cases}
I_2(y), & \forall\, y \in \Omega^+ \\
I_1(y), & \forall\, y \in \Omega^-
\end{cases} \tag{9}
\]
where ∇ is the gradient operator \(\big(\tfrac{\partial}{\partial y_1}, \tfrac{\partial}{\partial y_2}\big)^T\) and ∇· is the divergence operator \(\sum_{i=1}^{2} \tfrac{\partial}{\partial y_i}\). The solution u : ℝ² × [0, ∞) → [0, ∞) plays the role of an image I(y) = u(y, t), ∀y ∈ Ω, captured with a certain focus setting related to t; u(y, Δt) is the resultant image of the diffusion at t = Δt. c(y) is the diffusion coefficient between I2(y) and I1(y), which can be written as [21]
\[
c(y) = \frac{\Delta\sigma^2}{2\Delta t} = \frac{\gamma^2(b_2^2 - b_1^2)}{2\Delta t} \tag{10}
\]
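The following minimal sketch shows one explicit (forward Euler) time step of Eq. (8) on a pixel grid, assuming a simple finite-difference discretization of the divergence term and a sign map encoding Ω+ (+1) and Ω− (−1); it is an added illustration under these assumptions, not the authors' solver.

```python
import numpy as np

def diffusion_step(u, c, sign, dt=0.1):
    """One explicit step of du/dt = div(sign * c * grad u) on a regular grid.
    u, c and sign are 2-D arrays of the same shape; sign is +1 on Omega+,
    -1 on Omega-. Borders are treated with replicated values (zero gradient)."""
    ux = np.diff(u, axis=1, append=u[:, -1:])    # forward difference du/dx
    uy = np.diff(u, axis=0, append=u[-1:, :])    # forward difference du/dy
    k = sign * c                                  # signed diffusion coefficient
    # Backward differences of the fluxes approximate the divergence.
    div = (np.diff(k * ux, axis=1, prepend=(k * ux)[:, :1])
           + np.diff(k * uy, axis=0, prepend=(k * uy)[:1, :]))
    return u + dt * div
```

In practice dt must be kept small enough for the explicit scheme to remain stable.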
Since I1(y) and I2(y) are images of the same scene, they share the same depth s(y). When solving the inhomogeneous diffusion equation, it is necessary to initialize the diffusion coefficient c(y). The calculation of c(y) is related to the depth map s(y), so each point in the image must be initialized with a depth value s₀. By iterating with successive revisions of the depth map, the solution of Eq. (8) is obtained as Eq. (9).
In this algorithm, the initial thermal diffusion coefficient satisfies c(y) = 0. Substituting Eq. (4) into Eq. (10), it is easy to solve for s(y), and the initial depth of the images is given by
\[
s_0 = \frac{(v_1 + v_2)\, f}{v_1 + v_2 - 2f} \tag{11}
\]
The diffusion areas Ω+ and Ω− are then obtained separately as
\[
\begin{cases}
\Omega^+ = \Big\{\, y \;\Big|\; 0 < s(y) < f \ \text{or}\ s(y) > \dfrac{(v_1 + v_2)\, f}{v_1 + v_2 - 2f} \,\Big\} \\[2ex]
\Omega^- = \Big\{\, y \;\Big|\; f < s(y) < \dfrac{(v_1 + v_2)\, f}{v_1 + v_2 - 2f} \,\Big\}
\end{cases} \tag{12}
\]
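The sketch below (an added example, with v1 and v2 derived from the thin lens law for hypothetical object distances) computes the initial depth s₀ of Eq. (11) and the region masks of Eq. (12).

```python
import numpy as np

def initial_depth(v1, v2, f):
    """s0 of Eq. (11): the depth at which both images are equally blurred."""
    return (v1 + v2) * f / (v1 + v2 - 2.0 * f)

def diffusion_regions(s, v1, v2, f):
    """Boolean masks for Omega+ and Omega- according to Eq. (12)."""
    s0 = initial_depth(v1, v2, f)
    omega_plus = ((s > 0) & (s < f)) | (s > s0)
    omega_minus = (s > f) & (s < s0)
    return omega_plus, omega_minus

# Example using the default optics of Section 4:
f = 0.012
v1 = 1.0 / (1.0 / f - 1.0 / 0.52)       # image distance when focused at 0.52 m
v2 = 1.0 / (1.0 / f - 1.0 / 0.85)       # image distance when focused at 0.85 m
s = np.full((4, 4), 0.70)               # hypothetical depth map (metres)
print(diffusion_regions(s, v1, v2, f))
```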
The solution of Eq. (8) can now be converted into the minimization of the following cost function [21]:
\[
\hat{s} = \arg\min_{s} \int_{\Omega^+} \big(u(y, \Delta t) - I_2(y)\big)^2 dy + \int_{\Omega^-} \big(u(y, \Delta t) - I_1(y)\big)^2 dy \tag{13}
\]
It is found that problems like Eq. (13) are ill-posed, so a Tikhonov penalty is incorporated to regularize the optimization problem in the following form:
\[
\hat{s} = \arg\min_{s} \int_{\Omega^+} \big(u(y, \Delta t) - I_2(y)\big)^2 dy + \int_{\Omega^-} \big(u(y, \Delta t) - I_1(y)\big)^2 dy + \alpha \|\nabla s\|^2 + \alpha\kappa \|s\|^2 \tag{14}
\]

where the third term on the right-hand side of Eq. (14) imposes a smoothness constraint on the depth map s and the last term is added to ensure that s is bounded. Both α > 0 and κ > 0 control the strength of the regularization terms. Since κ is very small, the last term of Eq. (14) has no practical influence on the remaining energy terms [21]. To minimize the above cost function, a flow of depth maps s is constructed and indexed by a pseudo-time variable so that s moves along the direction opposite to the gradient of the cost function. This flow is discretized in time, and a suitable forward time integration such as the forward Euler scheme is used to solve the optimization problem in Eq. (14) [21].

3. Proposed multi-focus fusion algorithm

The basic idea of the proposed multi-focus image fusion is to design a novel fusion strategy based on the inhomogeneous thermal diffusion equation and image depth partition. First of all, the image depth information is extracted through the inhomogeneous diffusion equation that simulates the optical imaging system. Then the focus regions of the two images and the transition region are obtained according to the depth information. Finally, the clear pixels in the focus regions of the two images are extracted, and the pixels in the transition region are merged to generate the final fused image. The flow chart of our proposed algorithm is shown in Figure 2.

3.1. Brief steps of proposed algorithm

The proposed algorithm briefly consists of the following five steps:
(1) The two input multi-focus images are I1(y) and I2(y). For a calibrated camera, the proposed model is established with the optical parameters of the camera, and the image distance is obtained from the optical imaging system.
(2) Estimate an accurate image depth s(i, j). The depth is initialized by the self-adapting scheme, and the depth information is then estimated by solving the inhomogeneous diffusion equation from the input multi-focus images I1(y) and I2(y).
(3) Determine the focus regions of the two pictures from the depth information. Owing to the continuity of scene changes and the multilayered construction of image depth, the image gradually spreads out from the center of the focused area towards the defocused area.
(4) Layer the depth map s(i, j). A benchmark depth value is determined to divide s(i, j) into three fusion template regions: the foreground regions, the transition regions and the background regions.
(5) Perform the multi-focus image fusion based on the depth extraction. According to the preliminary fusion template of the previous step, the transition region is handled with a smooth post-processing procedure to obtain a more continuous multilevel fusion template image. The fusion is carried out on the RGB channels.

Figure 2: The flow chart of our proposed algorithm

The specific procedures of our proposed algorithm are described in detail in the following subsections.

3.2. The improvement of inhomogeneous diffusion depth estimation

For the initial estimation of image depth, the fixed initial depth s₀ used in the literature [21] cannot represent the real scene. An inaccurate initialization of the inhomogeneous diffusion requires more iterations to reach the solution, and errors in the initial value can lead to non-convergence or inaccurate results. We therefore propose an adaptive initialization of the image depth estimate to improve the simulation accuracy of the inhomogeneous diffusion process. Given the constraints, the diffusion step size of each pixel is adaptively adjusted according to the constraint conditions. In this way, the number of iterations for each pixel is effectively reduced, and the final result is obtained with the most suitable step size. The simulation of heat diffusion is performed on the input multi-focus images with the diffusion coefficient c(y) = 1. The cost functions Cost₁ and Cost₂ of each time step are computed as follows:
\[
Cost_1 = \big(u_2(y, \Delta t) - I_1(y)\big)^2, \qquad
Cost_2 = \big(u_1(y, \Delta t) - I_2(y)\big)^2 \tag{15}
\]
where u₁(y, Δt) and u₂(y, Δt) are the resultant images of diffusion, and I1(y), I2(y) are the input multi-focus images. The integrals of the cost functions, J₁ and J₂, are given separately by
\[
J_1 = \int_{\Omega^+} \big(u_2(y, \Delta t) - I_1(y)\big)^2 dy, \qquad
J_2 = \int_{\Omega^-} \big(u_1(y, \Delta t) - I_2(y)\big)^2 dy \tag{16}
\]

The time step Δt is then obtained by finding the minimum of the integrated cost over the time sequence. The relative blur Δσ(y)² is computed as
\[
\Delta\sigma(y)^2 = 2\Delta t \cdot c(y) \tag{17}
\]
The values of each point in the two images are compared to obtain the minimum value, and the smaller value at each point is chosen to correct the normalized relative blur acquired in the previous step. The parameter β is used to modify the value of the relative blur; in practice, we choose 0.5 < β < 1 to ensure that the image depth converges quickly. If the smaller value lies in image I1(y), the sign is positive; otherwise, it is negative. The formula for the relative blur Δσ² is therefore
\[
\Delta\sigma^2 = \operatorname{sgn}(\hat{s}_2 - \hat{s}_1) \cdot \beta\, \Delta\sigma^2 \tag{18}
\]

According to Eq. (4) and Eq. (10), and setting c(y) = 1, we get Δσ² = 2Δt and
\[
\frac{4\Delta\sigma^2}{\gamma^2 D^2} = \left(\frac{v_2}{f} - 1 - \frac{v_2}{s(y)}\right)^2 - \left(\frac{v_1}{f} - 1 - \frac{v_1}{s(y)}\right)^2 \tag{19}
\]
so that
\[
\frac{v_2^2 - v_1^2}{s^2(y)} + \frac{2(v_2 - v_1)}{s(y)} - \frac{2(v_2^2 - v_1^2)}{f \cdot s(y)} + \frac{v_2^2 - v_1^2}{f^2} - \frac{2(v_2 - v_1)}{f} - \frac{4\Delta\sigma^2}{\gamma^2 D^2} = 0 \tag{20}
\]
Then we obtain the solution for s(y)⁻¹ as follows:
\[
s(y)^{-1} = \frac{1}{f} - \frac{1}{v_2 + v_1} \pm \frac{\sqrt{\Delta^*}}{2(v_2^2 - v_1^2)} \tag{21}
\]
where
\[
\Delta^* = \left(2(v_2 - v_1) - \frac{2(v_2^2 - v_1^2)}{f}\right)^2 - 4(v_2^2 - v_1^2)\left(\frac{v_2^2 - v_1^2}{f^2} - \frac{2(v_2 - v_1)}{f} - \frac{4\Delta\sigma^2}{\gamma^2 D^2}\right)
= 4(v_2 - v_1)^2 + \frac{16\Delta\sigma^2}{\gamma^2 D^2}\,(v_2^2 - v_1^2) \tag{22}
\]
So it is easy to obtain
\[
s(y)^{-1} = \frac{1}{f} - \frac{1}{v_2 + v_1} \pm \frac{1}{2(v_2^2 - v_1^2)}\sqrt{4(v_2 - v_1)^2 + \frac{16\Delta\sigma^2}{\gamma^2 D^2}(v_2^2 - v_1^2)}
= \frac{1}{f} - \frac{1}{v_2 + v_1} \pm \frac{1}{v_2 + v_1}\sqrt{1 + \frac{4\Delta\sigma^2}{\gamma^2 D^2}\cdot\frac{v_2 + v_1}{v_2 - v_1}} \tag{23}
\]
The initialized depth information s(y) can therefore be written as
\[
s(y) = \left(\frac{1}{f} - \frac{1}{v_2 + v_1} \pm \frac{1}{v_2 + v_1}\sqrt{1 + \frac{4\Delta\sigma^2}{\gamma^2 D^2}\cdot\frac{v_2 + v_1}{v_2 - v_1}}\right)^{-1} \tag{24}
\]

The initialized diffusion coefficient c(y) can be computed with the initial-
ized depth information s(y). Thus the final depth information is obtained
by the iterative solution of the optimized diffusion equation.
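As an illustration of Eq. (24), the following hedged Python sketch returns both candidate depths implied by the ± sign; which root is retained, and how a negative relative blur is handled, is a design choice of the full algorithm that we deliberately leave to the caller.

```python
import numpy as np

def initial_depth_candidates(delta_sigma2, v1, v2, f, D, gamma=1.0):
    """Both candidate initial depths s(y) from Eq. (24).
    delta_sigma2 is assumed such that the radicand stays non-negative."""
    root = np.sqrt(1.0 + (4.0 * delta_sigma2 / (gamma**2 * D**2))
                   * (v2 + v1) / (v2 - v1))
    inv_plus = 1.0 / f - 1.0 / (v2 + v1) + root / (v2 + v1)
    inv_minus = 1.0 / f - 1.0 / (v2 + v1) - root / (v2 + v1)
    return 1.0 / inv_plus, 1.0 / inv_minus
```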

3.3. The comparison of depth extraction methods

To verify the performance improvement, the proposed algorithm is compared with the depth extraction methods based on least squares [19], a clarity measure [25], information divergence [1], and the original inhomogeneous thermal diffusion equation [21]. The depth information is shown as a 256-level grayscale image, with the convention that the smaller (darker) the gray value, the closer the object is to the camera. The depth extraction results of these different methods for the test multi-focus images are shown in Figure 3.
As can be seen from Figure 3(e,j), the least squares method produces many small erroneous areas on the wall and inside the book. For the book and pepper images in Figure 3(f,k), the method based on clarity measures can obtain only two depth layers; it misclassifies large parts of the foreground and background, the red books in the foreground lose their integrity, and the result is poor. As for the depth extraction method based on information divergence, the contrast between foreground and background shown in Figure 3(g,l) is weak and many details are lost. Figure 3(h,m) shows considerable small-scale noise for the depth extraction method based on the original diffusion. The depth estimation of the proposed method in Figure 3(i,n) exhibits a clearer sense of depth layering: the depth estimates of the foreground and background are more accurate, and the outline area of the two books has a better visual appearance.

Figure 3: Compared results of depth extraction methods: (a)∼(d) the multi-focus images of ’book’ and ’pepper’, (e,j) least squares, (f,k) clarity measures, (g,l) information divergence, (h,m) original diffusion, (i,n) proposed method.
Since no ground-truth depth information of the real scenes is available, and considering the accuracy limitations of depth extraction obtained by pixel matching, it is difficult to estimate the error between the actual object distance and the depth extracted by our proposed algorithm. Instead, taking manually extracted foreground pixels as the reference, we binarized the depth maps and calculated the correct judgment rate of the foreground and background pixels. For each method, the selected threshold is the one that maximizes the correct judgment rate. The rate is calculated as P = (S / M) × 100%, where M is the number of pixels for which the foreground and background are extracted manually, and S is the number of pixels correctly partitioned into foreground and background by the depth extraction method.
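The sketch below shows how the correct judgment rate P = S/M × 100% can be computed from a binarized depth map and a manually extracted foreground mask, under the convention stated above that smaller depth values correspond to closer (foreground) objects; it is our own illustrative code, not the evaluation script used by the authors.

```python
import numpy as np

def correct_judgment_rate(depth, manual_fg, threshold):
    """P = S / M * 100 (%): fraction of pixels whose thresholded depth label
    matches the manually extracted foreground/background label."""
    predicted_fg = depth < threshold                      # foreground = closer = smaller depth
    s = np.count_nonzero(predicted_fg == manual_fg)       # correctly partitioned pixels
    m = manual_fg.size                                    # all manually labelled pixels
    return 100.0 * s / m

def best_threshold(depth, manual_fg, candidates):
    """Select the threshold that maximizes the correct judgment rate."""
    return max(candidates, key=lambda t: correct_judgment_rate(depth, manual_fg, t))
```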

Figure 4: The binarization of the depth map. From left to right, top to bottom: (a) the foreground of ’book’ extracted manually; (b) the binarization result of (a); (c) the foreground of ’pepper’ extracted manually; (d) the binarization result of (c); (e) to (i) the binarization of the depth maps for ’book’ of Figure 3(e)∼(i) with thresholds 10, 62, 68, 78, 108; (j) to (n) the binarization of the depth maps for ’pepper’ of Figure 3(j)∼(n) with thresholds 10, 94, 88, 128, 176.

Table 1 shows the correct judgment rates calculated for the group of test images in Figure 4. As can be seen from the results, our proposed depth extraction algorithm has a lower misjudgment rate and higher accuracy than the other competing methods. Here, CM denotes clarity measures, LS least squares, ID information divergence, OD original diffusion, and PM the proposed method.

Table 1: The compared pixel matching results of the different depth extraction methods (correct judgment rate, %)

              Book                      Pepper
Rate(%)   FG      BG     Total      FG      BG     Total
CM       98.01   90.82   93.97     91.36   97.41   95.58
LS       89.54   94.99   92.60     80.25   93.31   89.35
ID       96.64   92.89   94.54     44.90   84.58   72.54
OD       95.91   94.05   94.87     93.53   99.24   97.51
PM       98.08   95.45   96.69     93.13   99.53   97.59

3.4. Multi-focus image fusion based on depth extraction

With the final depth information obtained in the previous subsection, the focus regions of the two pictures can be determined. Owing to the continuity of scene changes and the multilayered construction of image depth, the image gradually spreads out from the center of the focused area towards the defocused area. According to the preliminary fusion template, the transition region is handled with a smooth post-processing procedure to obtain a more continuous multilevel fusion template image, and the fusion is then carried out on the RGB channels. The details are given below.
To determine the focus regions of the two pictures from the depth information, a preliminary image fusion approach is given as follows:
\[
I(i, j) = M(i, j) \cdot I_1(i, j) + \big(1 - M(i, j)\big) \cdot I_2(i, j) \tag{25}
\]
where M(i, j) is the image fusion weight at pixel (i, j). We divide the template into three types of regions: clear regions where M(i, j) = 1, fuzzy regions where M(i, j) = 0, and transition regions where 0 < M(i, j) < 1. That is, these regions are defined by
\[
M(i, j) =
\begin{cases}
1, & s(i, j) < T_{low}, \ (i, j) \in \text{clear region} \\
0, & s(i, j) > T_{high}, \ (i, j) \in \text{fuzzy region} \\
\dfrac{s(i, j) - T_{low}}{T_{high} - T_{low}}, & \text{else}, \ (i, j) \in \text{transition region}
\end{cases} \tag{26}
\]

where T_low and T_high are two discrimination thresholds, defined as follows:
\[
T_{low} = s_{avg} - \frac{s_{max}}{s_{avg}}, \qquad
T_{high} = s_{avg} + \frac{s_{max}}{s_{avg}} \tag{27}
\]
where s_avg is the average of all the pixel values of the depth map, and s_max is the maximum pixel value of the depth map.
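A direct transcription of Eqs. (26) and (27) into Python (added here only for illustration) is:

```python
import numpy as np

def fusion_template(depth):
    """Preliminary fusion weight M(i, j) of Eq. (26) with the thresholds of Eq. (27)."""
    s_avg, s_max = depth.mean(), depth.max()
    t_low = s_avg - s_max / s_avg
    t_high = s_avg + s_max / s_avg
    m = (depth - t_low) / (t_high - t_low)        # transition region weights
    m = np.where(depth < t_low, 1.0, m)           # clear region
    m = np.where(depth > t_high, 0.0, m)          # fuzzy region
    return m
```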
According to the preliminary fusion template obtained above, the transition region is handled with a smooth post-processing procedure to obtain a more continuous multilevel fusion template image:
\[
M_s(i, j) = M(i, j) * G_{rs} \tag{28}
\]
where G_rs is a bilateral filtering kernel with a small window. G_rs smooths the fusion weights at image edges while preserving the edge structures and details. A fast bilateral filtering method [26] is used to improve the computational efficiency. M_s(i, j) is the fusion weight after smoothing.
The fusion is carried out on the R, G and B channels separately:
\[
I^k_{fusion}(i, j) = M_s(i, j) \cdot I^k_1(i, j) + \big(1 - M_s(i, j)\big) \cdot I^k_2(i, j) \tag{29}
\]
where k ∈ {R, G, B} denotes the three channels of the color image and I^k_fusion(i, j) is the final fusion result of each channel. The multi-focus image fusion is carried out according to Eq. (29) using the smoothed multilevel fusion template.
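The final blending step can be sketched as follows; a Gaussian filter is used here only as a simple stand-in for the small-window bilateral filter G_rs of Eq. (28), so this is an approximation of the described procedure rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_multifocus(i1_rgb, i2_rgb, m, sigma=2.0):
    """Smooth the fusion template (Eq. (28), Gaussian stand-in for the bilateral
    kernel) and blend the two color images channel by channel (Eq. (29))."""
    m_s = gaussian_filter(m, sigma=sigma)           # smoothed weight map M_s
    m_s = np.clip(m_s, 0.0, 1.0)[..., None]         # broadcast over the RGB channels
    return m_s * i1_rgb + (1.0 - m_s) * i2_rgb      # weighted per-channel fusion
```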

3.5. Method Summary

To clarify our proposed algorithm further, we summarize it here. Let N denote the total number of iterations. The specific implementation of the proposed algorithm is shown in Algorithm 1.

Algorithm 1 Pseudo code of the DEIDE-based multi-focus image fusion
Input: image I1(y) focused on the foreground, I2(y) focused on the background.
Output: a fused image I_fusion(y).
I. Set the initial parameters F, f, D, s1 and s2.
II. Obtain the diffusion areas Ω+, Ω− using Eq. (12).
III. Compute the initial depth s(0) by Eq. (24).
IV. Main loop for the iterative solution: for i = 1 to N do
  1) Get the thermal diffusion result of Eq. (8) with the current depth map.
  2) Calculate the cost function of Eq. (13) and solve the optimization problem in Eq. (14) using the forward Euler scheme.
  3) Obtain the depth map s(i) by the iterative solution.
  4) If the cost function is small enough, break.
V. Compute the initial image fusion template M by Eq. (26).
VI. Smooth the multilevel fusion template M with the bilateral filter G_rs by Eq. (28).
VII. Aggregate the multi-focus images to obtain the final fused image I_fusion(y) by Eq. (29).

4. Experimental results
In the proposed algorithm, the relative depth of the image is obtained from the relative blur of the input images, and the iterative method is used to estimate the image depth. Therefore, the proposed algorithm has some robustness to the parameters of specific cameras. In practice, the default parameter values of our proposed algorithm are used for image fusion, unless otherwise specified beforehand: the aperture F (= 2.0), the focal length f (= 12 mm), the radius of the lens D = f/F (= 6 mm), and the object distances s1 (= 0.52 m) and s2 (= 0.85 m). The experiments show that the proposed algorithm generally achieves satisfactory results for a wide range of images.

4.1. Subjective evaluation

To verify the performance of our proposed algorithm, we select well-known competing methods for the performance evaluation. These baseline methods include DCT (block segmentation combined with the DCT transform) [4], PCA (hierarchical PCA) [13], LP (Laplacian Pyramid) [10], DTCWT (Dual-Tree Complex Wavelet Transform) [18] and ICA (Independent Component Analysis) [9]. Six groups of multi-focus images (book, disk, pepper, toy, clock and lab), each consisting of one image focused on the foreground and one focused on the background, are used to test the multi-focus image fusion methods in the experiments. The results of the proposed algorithm compared with the other image fusion methods are shown in Figure 5 to Figure 10, and Figure 11 and Figure 12 show the differences between the fusion results of these methods and the source images.
Figure 5: The comparison of fusion results of the proposed algorithm and the compet-
ing methods for the ‘book’ test images and fragments. From left to right: (a) focus on
foreground, (b) focus on background, (c) depth map, (d) template map, (e) the proposed
algorithm, (f)DCT[4], (g)PCA[13], (h)LP[10], (i)DTCWT[18], and (j)ICA[9].

Figure 5 shows the results for the group of ’book’ images. As can be seen from the results, Figure 5(f) has blocking effects at the angled boundaries of the ’book’ image; Figure 5(g) has a certain blur in the background, aliasing artifacts at the edges and a great loss of details; Figure 5(h) and (i) still have aliasing artifacts at the edges between the foreground and the background, even though both maintain good definition; and Figure 5(j) has aliasing artifacts at the edges. The proposed algorithm (Figure 5(e)) not only effectively retains the contour details and suppresses blocking artifacts, but also produces a very good fusion result for color images. Besides extracting most of the pixels from the clear regions of the input images, the proposed algorithm also treats the edges specially.
Figure 6 gives the results for the group of ’disk’ images. It is found that the DCT-based fusion method causes obvious blocking effects in the internal region of the clock in Figure 6(f); Figure 6(g) is too blurred; Figure 6(h) and (i) reduce the contrast between the foreground and the background to some extent; and Figure 6(j) still exhibits blocking artifacts. The proposed algorithm (Figure 6(e)) achieves better contrast, better detail recovery and clearer edge contours, and also performs well on grayscale images.

Figure 6: The comparison of fusion results of the proposed algorithm and the competing methods for the ‘disk’ image and fragments. From left to right: (a) focus on foreground, (b) focus on background, (c) depth map, (d) template map, (e) the proposed algorithm, (f)DCT[4], (g)PCA[13], (h)LP[10], (i)DTCWT[18], and (j)ICA[9].
As can be seen from the results in Figure 7 to Figure 10, the DCT method reduces the contrast or blurs the fused image and also causes blocking artifacts. The PCA, LP and DTCWT algorithms, as pyramid- or wavelet-based methods, suffer from ringing artifacts, while the results of the ICA method show blocking artifacts. The proposed algorithm avoids the blocking artifacts that often appear in traditional spatial-domain methods and does not produce spurious textures or artifacts. Furthermore, the proposed algorithm provides clear contours and sharp edges in the results.
In addition, the pixel differences between the fused image and the clear areas of the original images are minimal. These differences can be seen in the parts marked with black boxes in Figure 11 and Figure 12. The proposed algorithm recovers the pixel information of the source images as much as possible. Every competing method shows obvious differences at the contour edges and in the originally focused (clear) regions, whereas the proposed algorithm has the smallest differences.

4.2. Objective evaluation

For the objective evaluation, no standard reference image is available. Four indicators, namely information entropy (IE), mutual information (MI), edge strength (ES) and the spatial structure similarity (Q^{AB/F}) [27], are used to evaluate the experimental results of these methods. These evaluation indexes are explained below.
(1) Information entropy (IE) is an important indicator of how much information an image contains, and is defined by
\[
IE = -\sum_{i=0}^{L-1} P(i)\, \log_2 P(i) \tag{30}
\]
where L is the number of gray levels of the image and P(i) is the probability density distribution of the image gray levels. A larger entropy of the fused image means that it contains more information and has better fusion quality.
(2) Mutual information (MI) is defined as
\[
MI = \sum_{i=1}^{L}\sum_{j=1}^{L} P_{RF}(i, j)\, \log_2 \frac{P_{RF}(i, j)}{P_R(i)\, P_F(j)} \tag{31}
\]
where P_RF(i, j) is the joint probability density distribution of the gray levels of R and F, and P_R(i) and P_F(j) are the probability density distributions of the gray levels of R and F, respectively. The MI score is computed as the sum of the mutual information between each input image and the fused image. A larger mutual information value indicates that the fused image contains more information from the original images, which implies a more ideal fusion effect.
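For reference, these two histogram-based indicators can be estimated as in the following added sketch for 8-bit images; the paper's MI score is then the sum of the mutual information between each source image and the fused image.

```python
import numpy as np

def information_entropy(img, levels=256):
    """IE of Eq. (30) from the normalized gray-level histogram."""
    p, _ = np.histogram(img, bins=levels, range=(0, levels), density=True)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(ref, fused, levels=256):
    """MI of Eq. (31) between one source image and the fused image."""
    joint, _, _ = np.histogram2d(ref.ravel(), fused.ravel(),
                                 bins=levels, range=[[0, levels], [0, levels]])
    p_rf = joint / joint.sum()
    p_r = p_rf.sum(axis=1, keepdims=True)
    p_f = p_rf.sum(axis=0, keepdims=True)
    nz = p_rf > 0
    return np.sum(p_rf[nz] * np.log2(p_rf[nz] / (p_r @ p_f)[nz]))

# Paper-style score: mutual_information(A, F) + mutual_information(B, F)
```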
(3) Edge strength (ES) is defined as
\[
ES = \frac{1}{M \cdot N}\sum_{x=0}^{M}\sum_{y=0}^{N}\sqrt{\Big(\sum_{\Omega} I(x, y)\, W(x, y)\Big)^2 + \Big(\sum_{\Omega} I(x, y)\, W'(x, y)\Big)^2} \tag{32}
\]
where M and N are the height and width of the image, I(x, y) is the pixel value, W(x, y) is the Sobel matrix, W'(x, y) is its transpose, and Ω is the 3×3 neighborhood. A larger edge strength value shows that the fused image retains more edge information of the source images and achieves a better fusion effect.
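A compact approximation of this measure (mean Sobel gradient magnitude, which is our reading of Eq. (32)) is:

```python
import numpy as np
from scipy.ndimage import convolve

def edge_strength(img):
    """Mean Sobel gradient magnitude, in the spirit of Eq. (32)."""
    w = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]], dtype=float)     # horizontal Sobel kernel W
    gx = convolve(img.astype(float), w)          # response with W
    gy = convolve(img.astype(float), w.T)        # response with its transpose W'
    return np.sqrt(gx**2 + gy**2).mean()
```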
(4) The spatial structure similarity Q^{AB/F} is defined as
\[
Q^{AB/F} = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N}\big(Q^{AF}(m, n)\, w^{A}(m, n) + Q^{BF}(m, n)\, w^{B}(m, n)\big)}{\sum_{m=1}^{M}\sum_{n=1}^{N}\big(w^{A}(m, n) + w^{B}(m, n)\big)} \tag{33}
\]
where Q^{AF} and Q^{BF} represent the retention of edge information from the source images A and B in the fused image, respectively, and the weights w^{A}(m, n) and w^{B}(m, n) are usually functions of the edge strength. The Petrovic metric Q^{AB/F} measures the relative amount of edge information that is transferred from the source images (A and B) into the fused image (F). A larger Q^{AB/F} means that the fused image preserves more of the edges and structures of the source images. The objective measures of these different methods are shown in Figure 13, with the specific data listed in Table 2.

Table 2: The objective comparison of these different methods.

Image    Metric    DCT      PCA      LP       DTCWT    ICA      Proposed
Disk     IE        7.091    6.992    7.050    7.139    7.095    7.070
         MI        7.997    6.000    7.854    5.709    5.960    8.035
         ES        45.170   28.571   42.638   45.301   42.815   44.052
         QAB/F     0.712    0.531    0.532    0.663    0.679    0.729
Book     IE        7.204    7.286    7.294    7.298    7.276    7.215
         MI        8.617    7.293    8.350    6.923    7.259    9.254
         ES        53.15    34.731   50.266   54.397   52.917   54.19
         QAB/F     0.715    0.500    0.540    0.682    0.688    0.731
Pepper   IE        7.574    7.536    7.562    7.577    7.718    7.586
         MI        8.160    6.603    8.023    7.166    6.684    8.902
         ES        46.224   33.631   45.412   45.605   50.684   48.700
         QAB/F     0.686    0.612    0.565    0.690    0.692    0.720
Toy      IE        6.158    6.164    6.181    6.146    6.268    6.140
         MI        7.815    6.984    7.816    6.379    6.678    7.851
         ES        35.301   29.506   33.794   36.201   37.613   34.878
         QAB/F     0.693    0.659    0.606    0.671    0.692    0.711

The objective measure results of these different methods suggest that the proposed algorithm preserves more of the edge and structural information (such as local variance) of the source images, because the source-image pixels in the clear areas are extracted directly. Consequently, the mutual information (MI) and the spatial structure similarity Q^{AB/F} of our proposed algorithm are generally higher than those of the competing methods. Figure 13 and Table 2 show that the information entropy (IE) and the edge strength (ES) of the frequency-domain methods are slightly higher than those of the other methods; the main reason is that the frequency-domain methods generate different degrees of false contours in the fused image. The proposed algorithm not only reduces spurious textures and halo artifacts, but also retains clear contour and texture information in the fused images. The experimental results show that the proposed algorithm outperforms the current competing methods both subjectively and objectively in most cases.

5. Conclusions
In this paper, we proposed an optimized anisotropic thermal diffusion equation to estimate the image depth for multi-focus image fusion. The spatial point spread model of optical imaging and the anisotropic thermal diffusion equation are established to simulate the variation of focus. The depth information of the image scene is estimated by solving the optimization problem of the anisotropic diffusion equation. According to the scene depth estimate, the final merged image is constructed by the multi-level image fusion approach. Experimental results demonstrate that the proposed algorithm eliminates the blocking effect caused by the traditional spatial-domain methods and overcomes the shortcoming of the frequency-domain methods, which cannot extract clear pixels directly from the source images. Our proposed algorithm, which produces fused images with clearer edges, outlines and details of the source images, performs better than the existing state-of-the-art methods both qualitatively and quantitatively.

Acknowledgements
This work was supported by National Natural Science Foundation of
China (Grant Nos. 61471272 and 61201442), and the State Scholarship Fund
of China. The Titan X used for this research was donated by the NVIDIA
Corporation.

References
[1] P. Favaro, A. Mennucci, S. Soatto, Observing shape from defocused
images. International Journal of Computer Vision, 2003, 52(1): 25-43.

[2] R. Garg, P. Gupta, H. Kaur, Survey on multi-focus image fusion algorithms. 2014 Recent Advances in Engineering and Computational Sciences (RAECS), IEEE, 2014: 1-5.

[3] V. Aslantas, A.N. Toprak. A pixel based multi-focus image fusion


method. Optics Communications, 2014, 332: 350-358.

[4] M.B.A. Haghighat, A. Aghagolzadeh, H. Seyedarabi, Multi-focus image


fusion for visual sensor networks in DCT domain. Computers & Electri-
cal Engineering, 2011, 37(5): 789-797.

[5] I. De, B. Chanda, Multi-focus image fusion using a morphology-based


focus measure in a quad-tree structure. Information Fusion, 2013,
14(2):136-146.

[6] X.-N. Gao, Z.-M. Yu, J. Zhang, T.-S. Li, Multi-focus Image Fusion
Based on Multi-Level and Iterative Method. Acta Electronica Sinica,
2011, 39(3):690-694.

[7] K.L. Hua, H.C. Wang, A.H. Rusdi, S.Y. Yang, A novel multi-focus image
fusion algorithm based on random walks. Journal of Visual Communi-
cation and Image Representation, 2014, 25(5): 951-962.

[8] X. Zhang, X. Li, Z. Liu, Y.C. Feng, Multi-focus image fusion using
image-partition-based focus detection. Signal Processing, 2014, 102: 64-
76.

[9] N. Mitianoudis, S.A. Antonopoulos, T. Stathaki, Region-based ICA image fusion using textural information. 18th International Conference on Digital Signal Processing (DSP), IEEE, 2013: 1-6.

[10] V.P.S. Naidu, B. Elias, A Novel Image Fusion Technique using DCT
based Laplacian Pyramid. International Journal of Inventive Engineer-
ing and Sciences (IJIES) ISSN, 2013: 2319-9598.

[11] A. Toet, Image fusion by a ratio of low-pass pyramid. Pattern Recogni-


tion Letters, 1989, 9(4): 245-253.

[12] V.P.S. Naidu. Image Fusion Technique using Multi-resolution Singular


Value Decomposition. Defence Science Journal, 2011, 61(5): 479-484.

[13] U. Patil, U. Mudengudi, Image fusion using hierarchical PCA. 2011 International Conference on Image Information Processing (ICIIP), IEEE, 2011: 1-6.

[14] L.-C. Jiao, S. Tan, Development and Prospect of Image Multiscale Ge-
ometric Analysis , Acta Electronica Sinica, 2003, 31(12A):1975-1981.

[15] Q. Zhang, B.L. Guo, Multi-focus image fusion using the nonsubsampled contourlet transform. Signal Processing, 2009, 89(7): 1334-1346.

[16] Q. Miao, C. Shi, P. Xu, et al. A novel algorithm of image fusion using
shearlets. Optics Communications, 2011, 284(6): 1540-1547.

[17] J. Tian, L. Chen, Adaptive multi-focus image fusion using a wavelet-based statistical sharpness measure. Signal Processing, 2012, 92(9): 2137-2146.

[18] J.J. Lewis, R.J. O'Callaghan, S.G. Nikolov, et al. Pixel- and region-based image fusion with complex wavelets. Information Fusion, 2007, 8(2): 119-130.

[19] P. Favaro, S. Soatto, A geometric approach to shape from defocus. Pat-


tern Analysis and Machine Intelligence, IEEE Transactions on, 2005,
27(3): 406-417.

[20] M. Subbarao, T.S. Choi and A. Nikzad, Focusing Techniques. Optical


Engineering, 1993, 32(11):2824- 2836.

[21] P. Favaro, S. Soatto, M. Burger, S.J. Osher, Shape from defocus via
diffusion. Pattern Analysis and Machine Intelligence, IEEE Transactions
on, 2008,30(3): 518-531.

[22] Y. Zhang, Y. Ding, J. Xiao, et al. Visibility enhancement using an image filtering approach. EURASIP Journal on Advances in Signal Processing, 2012, 2012(220): 1-6.

[23] D. Ziou, F. Deschênes, Depth from defocus estimation in spatial domain.


Computer Vision and Image Understanding, 2001, 81(2): 143-165.

[24] A.N. Rajagopalan, S. Chaudhuri, U. Mudenagudi. Depth estimation and


image restoration using defocused stereo pairs. Pattern Analysis and
Machine Intelligence, IEEE Transactions on, 2004, 26(11): 1521-1525.

[25] Z. Huang, M. Zhang, X. Zhang, W. Wang, Multi-focus image fusion algorithm based on relative activity level. Microcomputer Information, 2009, 25(18): 289-291.

[26] J. Xiao, W. Li, G. Liu, et al. Hierarchical tone mapping based on image color appearance model. IET Computer Vision, 2014, 8(4): 358-364.

[27] J. Liu, H. Wang, and Q. Wen, A new fusion image quality assessment
based on edge and structure similarity. IEEE International Conference
on Cyber Technology in Automation, Control, and Intelligent Systems.
IEEE, 2011: 112-115

Figure 7: The comparison of fusion results of the proposed algorithm and the competing
methods for the ‘pepper’ test images. From left to right: (a) focus on foreground, (b) focus
on background, (c) depth map, (d) template map,(e) the proposed algorithm, (f)DCT[4],
(g)PCA[13], (h)LP[10], (i)DTCWT[18], and (j)ICA[9].

Figure 8: The comparison of fusion results of the proposed algorithm and the competing
methods for the ‘toy’ test images. From left to right: (a) focus on foreground, (b) focus
on background, (c) depth map, (d) template map, (e) the proposed algorithm, (f)DCT[4],
(g)PCA[13], (h)LP[10], (i)DTCWT[18], and (j)ICA[9].

Figure 9: The comparison of fusion results of the proposed algorithm and the competing
methods for the ’clock’ test images. From left to right: (a) focus on foreground, (b) focus
on background, (c) depth map, (d) template map, (e) the proposed algorithm, (f)DCT[4],
(g)PCA[13], (h)LP[10], (i)DTCWT[18], and (j)ICA[9].

Figure 10: The comparison of fusion results of the proposed algorithm and the competing
methods for the ‘lab’ test images. From left to right: (a) focus on foreground, (b) focus
on background, (c) depth map, (d) template map, (e) the proposed algorithm, (f)DCT[4],
(g)PCA[13], (h)LP[10], (i)DTCWT[18], and (j)ICA[9].

Figure 11: The difference between fusion results and the original focus foreground picture
of the ‘book’ image. From left to right: (a)DCT[4], (b) ICA[9], (c) PCA[13], (d) LP[10],
(e) DTCWT[18], (f) Proposed.

Figure 12: The difference between fusion results and the original focus foreground picture
of the ‘disk’ image. From left to right: (a)DCT[4], (b) ICA[9], (c) PCA[13], (d) LP[10],
(e) DTCWT[18], (f) Proposed.


Figure 13: The objective measures of fusion results of these different methods. From left
to right: (a) IE, (b) MI, (c) ES, (d) QAB/F

