
Information Sciences 361–362 (2016) 84–99

Contents lists available at ScienceDirect

Information Sciences
journal homepage: www.elsevier.com/locate/ins

A novel image hashing scheme with perceptual robustness using block truncation coding

Chuan Qin a,∗, Xueqin Chen a, Dengpan Ye b, Jinwei Wang c, Xingming Sun c

a Shanghai Key Lab of Modern Optical System, and Engineering Research Center of Optical Instrument and System, Ministry of Education, University of Shanghai for Science and Technology, Shanghai 200093, China
b School of Computer, Wuhan University, Wuhan 430072, Hubei, China
c School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, Jiangsu, China

a r t i c l e   i n f o

Article history:
Received 9 December 2015
Revised 30 March 2016
Accepted 26 April 2016
Available online 11 May 2016

Keywords:
Image hashing
Block truncation coding
Local binary pattern
Principal component analysis

a b s t r a c t

In this paper, we propose a novel perceptual image hashing scheme based on block truncation coding (BTC). In the proposed scheme, pre-processing is first applied to the original image through bilinear interpolation, Gaussian low-pass filtering, and singular value decomposition (SVD) to construct a secondary image for regularization. Then, BTC is conducted on the secondary image to extract perceptual image features; the low and high reconstruction levels and the feature matrix of the corresponding binary map, obtained after computing the center-symmetrical local binary pattern (CSLBP), are compressed with quantization and PCA-based dimensionality reduction to produce the final compact binary hash. Experimental results demonstrate that the proposed scheme achieves satisfactory performance in terms of robustness, anti-collision, and security.

© 2016 Elsevier Inc. All rights reserved.

1. Introduction

With the widespread application of image processing tools, the contents of digital images can easily be modified. Therefore, verifying image authenticity has become an important and crucial issue in many practical scenarios [14,30]. In recent years, perceptual image hashing has emerged as a new technique in multimedia security and has attracted considerable research attention [10,15,21]. The output of a perceptual image hashing scheme can be considered a compact summary of the principal content of an image, which can be applied in many fields, such as image authentication, copy detection, and image retrieval [11,13,17,28,29]. Traditional cryptographic hash functions, such as MD5 and SHA-1, can also compress an input message into a short string, but they are highly sensitive to even a one-bit change; therefore, traditional cryptographic hash functions are not suitable for digital-image applications that must tolerate the changes caused by content-preserving operations, such as JPEG compression, filtering, and scaling.
In general, a typical image hashing scheme contains three main stages, i.e., pre-processing, feature extraction, and hash generation, as illustrated in Fig. 1. Denote a given original image as Io, an image that is visually similar to Io as Is, and an image that is visually distinct from Io as It. The calculation procedure of a perceptual image hashing scheme is represented as HK(·), where K is the secret key used in the scheme. In general, a perceptual image hashing scheme should satisfy the following three requirements.


∗ Corresponding author. Tel.: +86 21 55272562, +86 13918588359; fax: +86 21 55272982.
E-mail addresses: qin@usst.edu.cn (C. Qin), cxqmm0818@163.com (X. Chen), yedp@whu.edu.cn (D. Ye), wjwei_2004@163.com (J. Wang),
sunnudt@163.com (X. Sun).

http://dx.doi.org/10.1016/j.ins.2016.04.036
0020-0255/© 2016 Elsevier Inc. All rights reserved.

Fig. 1. Flowchart of image hashing scheme.

(1) Perceptual robustness: Pr{HK(Io) = HK(Is)} ≥ 1 − ε1. Visually similar images should have very similar hashes. It means that the calculation of the image hash should be robust to content-preserving operations on images.
(2) Anti-collision capability: Pr{HK(Io) = HK(It)} ≤ ε2. Visually distinct images should have significantly different hashes. In other words, the probability that two hashes generated from two visually distinct images are similar should be very small.
(3) Key-dependent security: Pr{HK1(Io) = HK2(Io)} ≤ ε3. The output of the image hashing scheme cannot be predicted without knowledge of the secret key, which means different secret keys generate significantly different hashes.

Here, Pr{·} denotes probability, and ε1, ε2, and ε3 are very small positive numbers close to zero.
Recently, a lot of image hashing schemes have been proposed in the field of multimedia security [2,5,8,12,16,19,20,24–
27]. Observing that dominant DCT coefficients can represent principal image features, Tang et al. exploited the dominant
DCT coefficients to construct robust hash against content-preserving digital manipulations [25]. The scheme was robust to
popular content-preserving operations, but its discrimination capability was limited. Venkatesan et al. proposed to extract statistical features of wavelet coefficients to generate the hash [27]. This method was resilient to JPEG compression; however, it was sensitive to contrast adjustment. In [26], an image hashing algorithm was presented using the entropies of image blocks and applying 2D-DWT to perform feature compression, but the robustness of this scheme toward rotation was not very satisfactory. Qin et al. introduced another image hashing scheme [20], which first generated a secondary image by
DFT and then utilized a non-uniform sampling strategy to extract robust salient image features from the magnitude ma-
trix of Fourier coefficients. This scheme had good performances of robustness and anti-collision, but was only resistant to
rotation with small angles. Kozat et al. viewed images as a sequence of linear operators and presented an image hashing
scheme, which applied singular value decomposition (SVD) twice [12]. This hashing method was robust to rotation at the
cost of increased misclassification. Choi and Park presented a global-to-local (GLOCAL) image hashing method utilizing a hierarchical histogram, which depends on the populations of histogram bins [2]. Its robustness against the rotation operation was fairly good, but this method depended on the size of the images and was sensitive to the scaling operation.
In [5], Davarzani et al. extracted the features of center-symmetrical local binary pattern from non-overlapping image blocks
and then obtained the final image hash. This scheme produced satisfactory performance of perceptual robustness, but its
discrimination was not good enough. As an optimal algorithm that can retain the essence of original image matrix, non-
negative matrix factorization (NMF) was widely used in the areas of image processing. Monga and Mihcak utilized NMF
twice through extracting the relevant coefficients to generate hashes [16]. The scheme was robust against several common
digital operations but was sensitive to watermark embedding. Fridrich and Goljan proposed a typical image hashing method
for the authentication of both video data and visual images [8]. Although the image features of this method were selected
from the low-frequency DCT coefficients, the robustness and anti-collision capability can be further improved. Local linear
embedding (LLE) has been widely adopted in digital signal processing, such as face recognition, data clustering and image
identification. Tang et al. proposed an LLE-based image hashing scheme by investigating the embedding vector variance of LLE [24], which changed approximately linearly with content-preserving modifications. This scheme can achieve good perceptual robustness and discrimination; however, the length of the final hash is relatively long.
Even though the above reported methods realize the fundamental functionalities of image hashing, some problems and shortcomings remain to be solved. For example, the anti-collision performance of the schemes [8,12,16,27] was not satisfactory. In this work, we propose an image hashing scheme using block truncation coding (BTC), which can effectively resist content-preserving operations, such as JPEG compression, filtering, scaling, and rotation by small angles. Furthermore, it can achieve a better trade-off between anti-collision capability and perceptual robustness. During the stage of hash generation, a secret key is utilized, which guarantees the security of the proposed scheme.
The rest of this paper is organized as follows. Section 2 describes the proposed perceptual image hashing scheme in detail, including the pre-processing, perceptual feature extraction, and hash generation. Experimental results and analysis are given in Section 3. Section 4 concludes the paper.

2. Proposed image hashing scheme

The proposed image hashing scheme consists of the three main stages, i.e., pre-processing, feature extraction, and hash
generation. The schematic diagram of the proposed scheme is given in Fig. 2.

Fig. 2. Schematic diagram of the proposed image hashing scheme.

2.1. Pre-processing

Pre-processing is indispensable to the image hashing scheme, which can generate a consistent secondary version from
different input images and also can be beneficial to the performance of perceptual robustness. The details of pre-processing
on the original input image Io are described as follows.
In order to obtain a fixed-length hash, image normalization by bilinear interpolation is first conducted, which resizes the input original image Io of arbitrary size into Io′ with the size of M × M. Then, Gaussian low-pass filtering, which is achieved by a convolution mask, is applied on the normalized Io′ in order to alleviate the influence of content-preserving operations, such as noise contamination, on the image. Denote G(i, j) as the element of the convolution mask for Gaussian low-pass filtering at the coordinate (i, j), which can be realized through Eqs. (1) and (2):

G(i, j) = g(i, j) / Σ_{i=1}^{M} Σ_{j=1}^{M} g(i, j),   (1)

g(i, j) = exp( −(i² + j²) / (2δ²) ),   (2)

where δ is the standard deviation of all elements in the convolution mask. After conducting Gaussian low-pass filtering on Io′, the filtered image sized M × M, i.e., Io″, is obtained.
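The construction of the normalized mask in Eqs. (1) and (2) can be sketched in Python with NumPy. This is only an illustrative sketch (the authors' implementation was in Matlab, and the exact mask size is not specified here); it uses the common convention of a small window with indices centered on the mask:

```python
import numpy as np

def gaussian_mask(size, delta):
    """Normalized Gaussian low-pass convolution mask of Eqs. (1)-(2).

    g(i, j) = exp(-(i^2 + j^2) / (2 * delta^2)); G divides by the sum of all
    g(i, j) so that the mask elements sum to 1 (brightness-preserving).
    Indices are centered on the window, the usual convention for a filter mask.
    """
    half = size // 2
    i = np.arange(-half, half + 1).reshape(-1, 1)
    j = np.arange(-half, half + 1).reshape(1, -1)
    g = np.exp(-(i ** 2 + j ** 2) / (2.0 * delta ** 2))
    return g / g.sum()

mask = gaussian_mask(3, 1.0)
```

Because the mask sums to one, convolving with it smooths the image without changing its overall brightness.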

Finally, we divide the image Io″ sized M × M into a series of non-overlapping blocks sized k1 × k1, and for each divided block Bi,j (i, j = 1, 2, …, M/k1), the SVD operation is conducted, see Eqs. (3) and (4):

Io″ = ⎡ B1,1       B1,2       ⋯  B1,M/k1     ⎤
      ⎢ B2,1       B2,2       ⋯  B2,M/k1     ⎥
      ⎢ ⋮          ⋮          ⋱  ⋮           ⎥ ,   (3)
      ⎣ BM/k1,1    BM/k1,2    ⋯  BM/k1,M/k1  ⎦

Bi,j = Si,j · Vi,j · Di,j,   (4)

where Vi,j is the diagonal matrix sized k1 × k1 with k1 singular values, and Si,j and Di,j are two orthogonal matrices sized k1 × k1 corresponding to Bi,j. We denote the first ten column vectors of the diagonal matrix Vi,j as the new matrix V′i,j sized k1 × 10 and the first ten row vectors of the orthogonal matrix Di,j as the new matrix D′i,j sized 10 × k1. Thus, we can construct a new image block B′i,j sized k1 × k1 corresponding to Bi,j through the three matrices Si,j, V′i,j, and D′i,j, see Eq. (5):

B′i,j = Si,j · V′i,j · D′i,j.   (5)

After all (M/k1)² image blocks Bi,j have been processed by the operations in Eqs. (4) and (5), the (M/k1)² new corresponding blocks B′i,j are collected, and the secondary image I sized M × M can be acquired according to Eq. (6) as the output of pre-processing:

I = ⎡ B′1,1      B′1,2      ⋯  B′1,M/k1     ⎤
    ⎢ B′2,1      B′2,2      ⋯  B′2,M/k1     ⎥
    ⎢ ⋮          ⋮          ⋱  ⋮            ⎥ .   (6)
    ⎣ B′M/k1,1   B′M/k1,2   ⋯  B′M/k1,M/k1  ⎦

Fig. 3. Flowchart of perceptual feature extraction.
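The block-wise SVD regularization of Eqs. (3)–(6) can be sketched as follows. This is a Python/NumPy sketch rather than the authors' code; note that keeping the first ten columns of V and the first ten rows of D amounts to a rank-10 approximation of each block:

```python
import numpy as np

def svd_regularize_block(B, r=10):
    """Rebuild one k1 x k1 block from its first r singular directions (Eqs. (4)-(5))."""
    U, s, Vt = np.linalg.svd(B, full_matrices=True)
    # S . V' . D' with V' = first r columns of diag(s) and D' = first r rows of D,
    # which equals the best rank-r approximation of the block.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def secondary_image(img, k1=16, r=10):
    """Apply the regularization to every non-overlapping k1 x k1 block (Eq. (6))."""
    M = img.shape[0]
    out = np.empty((M, M), dtype=float)
    for y in range(0, M, k1):
        for x in range(0, M, k1):
            out[y:y + k1, x:x + k1] = svd_regularize_block(
                img[y:y + k1, x:x + k1].astype(float), r)
    return out
```

Discarding the smallest singular values removes fine-grained detail that content-preserving operations tend to disturb, which is exactly the regularization effect the pre-processing aims for.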

2.2. Perceptual feature extraction

After pre-processing, the perceptual image features are extracted from the secondary image I with the assistance of block truncation coding (BTC) and the center-symmetrical local binary pattern (CSLBP). The flowchart of perceptual feature extraction is presented in Fig. 3.

2.2.1. Reconstruction levels and binary map with BTC

The secondary image I sized M × M is divided into a series of non-overlapping blocks sized k2 × k2, and each divided block is denoted as Fi,j (i, j = 1, 2, …, M/k2), see Eq. (7):

I = ⎡ F1,1       F1,2       ⋯  F1,M/k2     ⎤
    ⎢ F2,1       F2,2       ⋯  F2,M/k2     ⎥
    ⎢ ⋮          ⋮          ⋱  ⋮           ⎥ ,   (7)
    ⎣ FM/k2,1    FM/k2,2    ⋯  FM/k2,M/k2  ⎦

All divided blocks are processed with the BTC algorithm in raster-scanning order. For each image block Fi,j sized k2 × k2, a binary block F′i,j with the same size can be obtained, in which the pixel values are zeros or ones corresponding to those pixels of Fi,j smaller and not smaller than the mean value of Fi,j, respectively. Two reconstruction levels, i.e., αi,j and βi,j (αi,j ≤ βi,j), are then calculated, see Eqs. (8) and (9):

αi,j = μi,j − σi,j · √( ni,j / (k2² − ni,j) ),   (8)

βi,j = μi,j + σi,j · √( (k2² − ni,j) / ni,j ),   (9)
where μi,j and σi,j are the mean value and the standard deviation of the block Fi,j, respectively, and ni,j is the number of ones in the binary block F′i,j. During the decoding of the BTC algorithm, the zeros and ones in each binary block F′i,j are respectively replaced with αi,j and βi,j to produce a decoded image block that is visually similar to the block Fi,j [4,18]. Note that, with the guarantee of Eqs. (8) and (9), the first moment μi,j and the second moment νi,j of Fi,j are kept the same as those of the decoded block, see Eqs. (10) and (11). In other words, the binary block F′i,j and the two corresponding reconstruction levels αi,j and βi,j can reflect the perceptual features of the image block Fi,j.

k2² · μi,j = (k2² − ni,j) · αi,j + ni,j · βi,j,   (10)

k2² · νi,j = (k2² − ni,j) · α²i,j + ni,j · β²i,j.   (11)

Fig. 4. An example of the BTC algorithm.

Fig. 4 illustrates an example of the BTC algorithm. The left block is an image block sized 4 × 4, and the mean value μi,j and the standard deviation σi,j of the pixels in the block are 115 and 78, respectively. After binarization by the mean value 115, the middle binary block can be obtained, and the number ni,j of ones in the binary block is 8. According to Eqs. (8) and (9), the low and high reconstruction levels αi,j and βi,j can be calculated as 37 and 193, respectively. The right block is the decoded version obtained by substituting the zeros and ones in the binary block with the low and high reconstruction levels, respectively.
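The BTC encoding just illustrated can be sketched in Python. This is only an illustrative sketch, not the authors' Matlab code; the guard for flat blocks is our own addition, since Eqs. (8) and (9) are undefined when ni,j is 0 or k2²:

```python
import numpy as np

def btc_encode(F):
    """BTC-encode one block: binary map plus the reconstruction levels of Eqs. (8)-(9)."""
    F = np.asarray(F, dtype=float)
    mu, sigma = F.mean(), F.std()     # population std, matching Eqs. (10)-(11)
    bmap = (F >= mu).astype(int)      # 1 where the pixel is not smaller than the mean
    n, k2sq = bmap.sum(), F.size
    if n == 0 or n == k2sq:           # flat block: both levels collapse to the mean
        return bmap, mu, mu
    alpha = mu - sigma * np.sqrt(n / (k2sq - n))
    beta = mu + sigma * np.sqrt((k2sq - n) / n)
    return bmap, alpha, beta
```

Substituting α and β back for the zeros and ones preserves the block's first and second moments, which is exactly the property stated in Eqs. (10) and (11).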
After all (M/k2)² blocks Fi,j have been processed by the encoding of BTC, a matrix L of the low reconstruction levels αi,j and a matrix H of the high reconstruction levels βi,j, both sized M/k2 × M/k2, can be acquired:

L = ⎡ α1,1       α1,2       ⋯  α1,M/k2     ⎤
    ⎢ α2,1       α2,2       ⋯  α2,M/k2     ⎥
    ⎢ ⋮          ⋮          ⋱  ⋮           ⎥ ,   (12)
    ⎣ αM/k2,1    αM/k2,2    ⋯  αM/k2,M/k2  ⎦

H = ⎡ β1,1       β1,2       ⋯  β1,M/k2     ⎤
    ⎢ β2,1       β2,2       ⋯  β2,M/k2     ⎥
    ⎢ ⋮          ⋮          ⋱  ⋮           ⎥ .   (13)
    ⎣ βM/k2,1    βM/k2,2    ⋯  βM/k2,M/k2  ⎦

In addition, the binary map R sized M × M, consisting of all (M/k2)² blocks F′i,j of the whole image I, can also be obtained:

R = ⎡ F′1,1      F′1,2      ⋯  F′1,M/k2     ⎤
    ⎢ F′2,1      F′2,2      ⋯  F′2,M/k2     ⎥
    ⎢ ⋮          ⋮          ⋱  ⋮            ⎥ .   (14)
    ⎣ F′M/k2,1   F′M/k2,2   ⋯  F′M/k2,M/k2  ⎦

2.2.2. CSLBP for binary map

After applying BTC on the secondary image I to acquire the three matrices, i.e., L, H, and R, the CSLBP algorithm [9] is conducted on the binary map R to further extract the perceptual features, which can effectively reflect the information of the neighboring relationships in R. In the CSLBP operation, each pixel of the binary map R is compared with its center-symmetrical pixels within a q-neighborhood, see Fig. 5.
For each given pixel pc in R, its corresponding CSLBP value can be calculated through Eqs. (15) and (16):

CSLBP(pc) = Σ_{i=0}^{q/2−1} 2^i × ϕ(pi − pi+q/2),   (15)

ϕ(x) = 1, if x > ξ,
       0, otherwise,   (16)

where q is the number of pixels in the circular neighborhood around the center pixel pc for the CSLBP operation, and ξ is a threshold value for binarization that increases the robustness of the CSLBP features. Obviously, the pixel value after the CSLBP operation belongs to [0, 2^(q/2) − 1].

Fig. 5. Illustration of the CSLBP operation (q = 8).

Fig. 6. Flowchart of hash generation.

After all pixels in the binary map R sized M × M have been processed with the CSLBP operation described above, the matrix R of the binary map is converted into a new feature matrix R′ with the same size.
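A minimal CSLBP implementation for q = 8 (the 3 × 3 neighborhood, comparing the four center-symmetric pixel pairs of Eq. (15)) might look like the following Python sketch; border pixels, for which the full neighborhood is unavailable, are simply left at zero here:

```python
import numpy as np

def cslbp(img, xi=0):
    """CSLBP with q = 8: one bit per center-symmetric neighbor pair (Eqs. (15)-(16))."""
    M, N = img.shape
    out = np.zeros((M, N), dtype=int)
    # offsets of p0..p3; the opposite neighbor p_{i+q/2} sits at the negated offset
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1)]
    for y in range(1, M - 1):
        for x in range(1, N - 1):
            v = 0
            for i, (dy, dx) in enumerate(offs):
                if img[y + dy, x + dx] - img[y - dy, x - dx] > xi:
                    v |= 1 << i          # phi(p_i - p_{i+q/2}) weighted by 2^i
            out[y, x] = v
    return out
```

With q = 8 only four comparisons are made per pixel, so the output values lie in [0, 2^(q/2) − 1] = [0, 15], half the dynamic range of a standard LBP code.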

2.3. Hash generation

In the following, the final image hash is generated based on the three extracted feature matrices, i.e., the two reconstruction-level matrices L and H, and the feature matrix R′ of the binary map. The flowchart of hash generation is given in Fig. 6.

2.3.1. Compression for L and H

Firstly, L and H are both divided into non-overlapping blocks sized k3 × k3. Then, the mean value of each k3 × k3 divided block in H and L is calculated; thus, we obtain (M/k2/k3)² block mean values for both L and H, respectively. We traverse the (M/k2/k3)² mean values of L and H in raster-scanning order, respectively, and rearrange them into two one-dimensional vectors, i.e., l and h, see Eqs. (17) and (18):

l = [l1, l2, …, l(M/k2/k3)²],   (17)

h = [h1, h2, …, h(M/k2/k3)²],   (18)

where li and hi denote the mean values of the ith k3 × k3 block in raster-scanning order for L and H, respectively (i = 1, 2, …, (M/k2/k3)²). Finally, with a quantization step ψ, l and h are quantized through Eqs. (19) and (20):

l′ = Γ(⌊l/ψ⌋),   (19)

h′ = Γ(⌊h/ψ⌋),   (20)

where ⌊·⌋ returns a vector in which the components are the nearest integers no greater than the corresponding components of the input vector, and Γ(·) is a function that converts each decimal component of the input vector into binary bits with a fixed length and concatenates them into a one-dimensional binary sequence. Because the mean values li and hi in l and h can both be represented with 8 binary bits, each decimal component in ⌊l/ψ⌋ and ⌊h/ψ⌋ can be converted into ⌈8 − log2 ψ⌉ binary bits, and there are in total ⌈8 − log2 ψ⌉ · (M/k2/k3)² bits in both l′ and h′. After that, the two reconstruction-level matrices L and H, both consisting of M/k2 × M/k2 integers, are compressed into two binary sequences l′ and h′, respectively.
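The quantization and fixed-length binary coding of Eqs. (19) and (20) can be sketched as follows (an illustrative Python sketch; with 8-bit block means and ψ = 10, each value is coded with ⌈8 − log2 ψ⌉ = 5 bits, so the 64 block means of each matrix yield a 320-bit sequence):

```python
import math
import numpy as np

def quantize_to_bits(means, psi=10):
    """Quantize 8-bit block means by step psi and emit fixed-length binary codes."""
    nbits = math.ceil(8 - math.log2(psi))     # bits per quantized value (5 for psi = 10)
    q = np.floor(np.asarray(means, dtype=float) / psi).astype(int)
    return ''.join(format(v, '0{}b'.format(nbits)) for v in q)
```

Quantizing before the binary conversion both shortens the hash and absorbs small feature perturbations, which benefits perceptual robustness.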


2.3.2. Compression for R′

The feature matrix of the binary map R′ sized M × M is first reshaped into a new matrix R″ sized M/8 × 8M. Denote each column vector consisting of M/8 components in R″ as ri = [r1,i, r2,i, …, rM/8,i]T, i = 1, 2, …, 8M. Then, principal component analysis (PCA) is applied on R″. The covariance matrix CR sized 8M × 8M of R″ is first calculated:

CR = ⎡ cov(r1, r1)    cov(r1, r2)    ⋯  cov(r1, r8M)  ⎤
     ⎢ cov(r2, r1)    cov(r2, r2)    ⋯  cov(r2, r8M)  ⎥
     ⎢ ⋮              ⋮              ⋱  ⋮             ⎥ ,   (21)
     ⎣ cov(r8M, r1)   cov(r8M, r2)   ⋯  cov(r8M, r8M) ⎦

where the function cov(·) calculates the covariance of ri with itself and the covariance between ri and rj (i ≠ j). Then, the covariance matrix CR is decomposed with Eq. (22):

CR = Φ × Λ × Φ⁻¹,   (22)

where Λ is the diagonal matrix including all 8M eigenvalues, i.e., λ1, λ2, …, λ8M, of CR, and Φ sized 8M × 8M is the eigenvector matrix of CR. Denote Φ = [φ1, φ2, …, φ8M], where φi = [ρ1,i, ρ2,i, …, ρ8M,i]T is the ith eigenvector corresponding to the ith eigenvalue λi in Λ (i = 1, 2, …, 8M). The first N (≤ 8M) eigenvectors in the eigenvector matrix Φ, i.e., [φ1, φ2, …, φN], are taken out to form a new matrix Φ′ sized 8M × N, and a matrix Ω sized (M/8) × N can be obtained:

Ω = R″ × Φ′,   (23)

where Ω contains the principal components reflecting the correlations among different elements within the main features of R′, thanks to the series of linear transformations above. Then, Ω is divided into a series of non-overlapping blocks sized k4 × k4. The mean value mj of each block in Ω is compared with the mean value m′ of the matrix Ω to generate a binary sequence r′, see Eq. (24). As a result, the feature matrix of the binary map R′, consisting of M × M integers, is compressed into a binary sequence r′ with the length of (M × N)/(8k4²) bits.

r′(j) = 1, if mj ≤ m′,
        0, otherwise,    j = 1, 2, …, (M × N)/(8k4²).   (24)
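The whole compression step of Eqs. (21)–(24) can be sketched in Python/NumPy as follows. This is an illustrative sketch, not the authors' code; NumPy's eigendecomposition conventions differ superficially from the Φ Λ Φ⁻¹ notation of Eq. (22), but the projection onto the first N eigenvectors is equivalent:

```python
import numpy as np

def compress_binary_map(R_feat, N=16, k4=4):
    """PCA-based compression of the CSLBP feature matrix (sketch of Eqs. (21)-(24))."""
    M = R_feat.shape[0]
    R2 = R_feat.reshape(M // 8, 8 * M).astype(float)   # M x M  ->  (M/8) x 8M
    C = np.cov(R2, rowvar=False)                       # covariance of the 8M columns
    eigvals, eigvecs = np.linalg.eigh(C)               # C is symmetric: eigh is stable
    Phi_p = eigvecs[:, np.argsort(eigvals)[::-1][:N]]  # first N eigenvectors, 8M x N
    Omega = R2 @ Phi_p                                 # principal components, (M/8) x N
    m_all = Omega.mean()                               # global mean m'
    bits = []
    for y in range(0, Omega.shape[0], k4):             # block means vs. global mean, Eq. (24)
        for x in range(0, Omega.shape[1], k4):
            bits.append(1 if Omega[y:y + k4, x:x + k4].mean() <= m_all else 0)
    return np.array(bits, dtype=int)
```

With the paper's parameters (M = 256, N = 16, k4 = 4) this produces (M × N)/(8k4²) = 32 bits; the test below uses a smaller M only to keep the covariance matrix cheap to decompose.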

2.3.3. Encryption

After the above operations, we obtain three binary sequences of perceptual features, i.e., l′, h′, and r′, and then concatenate them to produce a binary sequence Z = [l′, h′, r′]. The total length of Z is the sum of the lengths of l′, h′, and r′, i.e., 2 · ⌈8 − log2 ψ⌉ · (M/k2/k3)² + (M × N)/(8k4²) bits. Finally, in order to ensure the security of the hashing scheme, a secret key K is utilized to scramble the bits in the sequence Z for encryption, and the scrambled binary sequence Z′ is the final image hash of the original image Io.
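The text does not specify the scrambling algorithm driven by K; a common choice, assumed here purely for illustration, is a key-seeded pseudo-random bit permutation:

```python
import numpy as np

def scramble(Z, key):
    """Permute the hash bits with a permutation derived from the secret key K."""
    perm = np.random.RandomState(key).permutation(len(Z))
    return [Z[p] for p in perm]

def unscramble(Zs, key):
    """Invert the key-dependent permutation."""
    perm = np.random.RandomState(key).permutation(len(Zs))
    out = [0] * len(Zs)
    for dst, src in enumerate(perm):
        out[src] = Zs[dst]
    return out
```

Any keyed permutation preserves the Hamming distance between hashes computed with the same key, so robustness and anti-collision are unaffected while wrong keys yield uncorrelated bit orders.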
As a summary, the proposed image hashing scheme consists of three main procedures, i.e., pre-processing, feature extraction, and hash generation. The operations in pre-processing of the original image, including image normalization, Gaussian low-pass filtering, and SVD, can effectively increase the capability of our scheme to resist content-preserving manipulations, such as noise contamination and various kinds of filtering, which favors the perceptual robustness of the scheme. In the procedure of perceptual feature extraction, the BTC operation is conducted on the secondary image to obtain the reconstruction levels and the binary map, which keeps the first and second moments of the original image features invariant, and with the assistance of the CSLBP operator on the binary map, perceptual features of the original image can be further extracted. This series of processing steps in feature extraction guarantees that the proposed scheme can reach a good compromise between perceptual robustness and anti-collision performance. Finally, the operations of quantization, PCA, and encryption in the stage of hash generation give the generated secure hash a reasonable length. Based on the above effective processing, the proposed image hashing scheme can achieve satisfactory performance in terms of perceptual robustness, anti-collision, and security.

3. Experimental results and comparisons

Experiments were conducted to demonstrate the superior performance of the proposed scheme with respect to robustness, anti-collision, and key-dependent security. In our experiments, the normalized Hamming distance was adopted to

Table 1
Parameter settings of the proposed hashing scheme.

Stages              Parameters               Values
Pre-processing      Image size M × M         256 × 256
                    Block size k1 × k1       16 × 16
Feature extraction  Block size k2 × k2       4 × 4
                    CSLBP pixel number q     8
                    Threshold value ξ        0
Hash generation     Block size k3 × k3       8 × 8
                    Quantization step ψ      10
                    Eigenvector number N     16
                    Block size k4 × k4       4 × 4

measure the similarity between two hashes:

Dis(Z1′, Z2′) = (1/S) Σ_{i=1}^{S} |Z1′(i) − Z2′(i)|,   (25)

where Z1′ and Z2′ are two hash sequences of two images, S is the hash length, and Z1′(i) and Z2′(i) denote the ith bits in Z1′ and Z2′, respectively. For color images, the luminance components were used for testing. All experiments were implemented on a computer with a 2.50 GHz Intel i5 processor, 4.00 GB memory, and the Windows 7 operating system; the programming environment was Matlab 2010b.
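Eq. (25) is straightforward to implement; a Python equivalent of the Matlab computation, for hashes given as 0/1 lists, is:

```python
def hamming_dist(z1, z2):
    """Normalized Hamming distance between two equal-length bit sequences (Eq. (25))."""
    assert len(z1) == len(z2), "hashes must have the same length S"
    # For 0/1 bits, |a - b| is 1 exactly where the sequences disagree.
    return sum(abs(a - b) for a, b in zip(z1, z2)) / len(z1)
```

A distance of 0 means identical hashes and 0.5 is the expected distance between two independent random bit sequences, which is why the anti-collision histogram later centers near 0.456.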

3.1. Parameter settings

There are several parameters in the three main stages of our image hashing scheme, and we describe the parameter
settings used in our experimental results in the following.

3.1.1. Pre-processing
In the pre-processing stage, all input images are resized as M × M for image normalization through bilinear interpolation,
and then, the Gaussian low-pass filtered, resized image is divided into a series of non-overlapping blocks sized k1 × k1 for
SVD operation. In the experiments, we set the parameter M = 256 as a standard resolution, and k1 was equal to 16 for block
division.

3.1.2. Feature extraction

In the stage of feature extraction, the secondary image is divided into a number of non-overlapping blocks sized k2 × k2 for the BTC operation, and each pixel of the binary map generated from BTC is compared with its q-neighborhood pixels for the CSLBP operation, in which a threshold ξ is used to binarize the CSLBP-based features and to increase the robustness. In the experiments, we set the block size k2 × k2 for BTC as 4 × 4, the number q of pixels in the circular neighborhood for CSLBP as 8, and the threshold ξ as 0.

3.1.3. Hash generation

In the stage of hash generation, the two reconstruction-level matrices are both divided into k3 × k3 blocks, and the mean value of each block is quantized by ψ for compact representation and better robustness. During compression of the feature matrix of the binary map, the first N eigenvectors of the covariance matrix of the reshaped feature matrix are utilized. After the PCA operation, the dimension-reduced feature matrix of the binary map is divided into k4 × k4 blocks to produce a binary sequence by comparing the mean values of each block and the whole matrix. In the experiments, we set the block size k3 × k3 as 8 × 8, the quantization step ψ as 10, the number N of utilized eigenvectors as 16, and the block size k4 × k4 as 4 × 4.
  
A summary of the parameter settings of our scheme is also listed in Table 1. Consequently, the lengths of l′, h′, and r′ are 320 bits, 320 bits, and 32 bits, respectively; therefore, the length S of the final image hash in our scheme is 672 bits.

3.2. Performance of perceptual robustness

An ideal image hashing scheme should have perceptual robustness toward content-preserving manipulations. That is to
say, the normalized Hamming distance between the two hashes of an original image and its processed version of content-
preserving manipulation should be small enough. In order to test the perceptual robustness of the proposed scheme, twenty
standard test images in Fig. 7 and several common content-preserving manipulations with different parameters, including JPEG compression, Gaussian filtering, average filtering, median filtering, scaling, and rotation, were utilized; details can be found in Table 2. Also, in order to demonstrate the superiority of the proposed scheme, three typical reported schemes,

Fig. 7. Twenty standard test images.

Table 2
Content-preserving manipulations for robustness testing.

Names                Descriptions             Parameters
JPEG compression     Quality factor Qf        50, 55, …, 100
Gaussian filtering   Standard deviation Sd    0.4, 0.6, …, 1.8
Average filtering    Window size Wsa          3, 5, …, 15
Median filtering     Window size Wsm          3, 5, …, 15
Scaling              Scaling ratio Sr         0.2, 0.4, …, 2.0
Rotation             Rotation angle Ra        −1.0, −0.75, …, 1.0

i.e., Tang et al.’s scheme [26], Choi and Park’s scheme [2], and Davarzani et al.’s scheme [5], were used for performance
comparisons.
In Fig. 8(a–f), the ordinate is the mean value of the twenty normalized Hamming distances between the hash pairs of the original images in Fig. 7 and their corresponding attacked versions. It can be observed from Fig. 8 that the normalized Hamming distances of the proposed scheme are generally smaller than those of the schemes [2,5,26] against JPEG compression, Gaussian filtering, average filtering, median filtering, and scaling, and are only slightly larger than those of the scheme [2] for rotation. Therefore, generally speaking, the average Hamming distance of the proposed scheme against these six kinds of common content-preserving manipulations is smaller than those of the schemes [2,5,26]. That is to say, the proposed scheme has better perceptual robustness than the schemes [2,5,26]. Table 3 presents the mean, maximum, and minimum of the normalized Hamming distances for the proposed scheme.

Fig. 8. Results of performance comparison for perceptual robustness. (a) JPEG compression, (b) Gaussian filtering, (c) Average filtering, (d) Median filtering, (e) Scaling, (f) Rotation.

Table 3
Normalized Hamming distances of the proposed scheme.

Manipulations                  Mean     Max.     Min.
JPEG compression Qf = 50       0.0257   0.0521   0.0089
JPEG compression Qf = 80       0.0225   0.0402   0.0060
Gaussian filtering Sd = 0.6    0.0257   0.0551   0.0104
Gaussian filtering Sd = 1.6    0.0280   0.0639   0.0119
Average filtering Wsa = 3      0.0369   0.0814   0.0179
Average filtering Wsa = 7      0.0307   0.0833   0.0744
Median filtering Wsm = 3       0.0342   0.0640   0.0119
Median filtering Wsm = 7       0.0560   0.0997   0.0104
Scaling Sr = 0.6               0.0144   0.0268   0.0045
Scaling Sr = 2.0               0.0123   0.0253   0
Rotation Ra = −1°              0.1792   0.2485   0.1235
Rotation Ra = 0.5°             0.1112   0.1592   0.0625

Fig. 9. Distribution of the 894,453 normalized Hamming distances between hash pairs of the 1338 different images.

Table 4
Collision probabilities with different thresholds T of the proposed scheme and [2,5,26].

Threshold T   Scheme in [26]    Scheme in [2]     Scheme in [5]     Proposed scheme
0.26          1.4580 × 10⁻⁴     0.0110            2.1125 × 10⁻⁶     1.5412 × 10⁻⁵
0.24          3.1671 × 10⁻⁵     0.0058            1.1043 × 10⁻⁶     2.2805 × 10⁻⁶
0.22          6.0063 × 10⁻⁶     0.0029            5.6748 × 10⁻⁷     2.2853 × 10⁻⁷
0.20          9.9362 × 10⁻⁷     0.0014            2.8665 × 10⁻⁷     3.0222 × 10⁻⁸
0.18          1.4328 × 10⁻⁷     6.3888 × 10⁻⁴     1.4233 × 10⁻⁷     2.7022 × 10⁻⁹
0.16          1.7999 × 10⁻⁸     2.7669 × 10⁻⁴     6.9462 × 10⁻⁸     2.0399 × 10⁻¹⁰
0.14          1.9688 × 10⁻⁹     1.1388 × 10⁻⁴     3.3320 × 10⁻⁸     1.2995 × 10⁻¹¹
0.12          1.8743 × 10⁻¹⁰    4.4532 × 10⁻⁵     1.5710 × 10⁻⁸     6.9822 × 10⁻¹³
0.10          1.5524 × 10⁻¹¹    1.6540 × 10⁻⁵     7.2801 × 10⁻⁸     3.1634 × 10⁻¹⁴

3.3. Performance of anti-collision

Anti-collision performance of image hashing, also called uniqueness or discrimination, means that two perceptually distinct images should have a very small probability of generating two similar hashes. In other words, the normalized Hamming distance between two hashes generated from two perceptually distinct images should be greater than a pre-determined threshold T; otherwise, a collision happens.
In the experiments, we utilized the well-known uncompressed color image database (UCID) [22] to evaluate the anti-
collision performance. The UCID database contains 1338 various images with the sizes of 512 × 384 and 384 × 512. We gen-
erated the 1338 hashes of all the images in UCID with the proposed image hashing scheme, and calculated the 894,453
normalized Hamming distances between the hash pairs of different images. The histogram of the normalized Hamming
distances is illustrated in Fig. 9. Through a parameter estimation method, we find that the distribution of the normalized Hamming distances for the proposed scheme approximates a normal distribution with mean value μ = 0.456 and standard deviation σ = 0.048. Thus, the collision probability Pc for two perceptually distinct images is the probability that the normalized Hamming distance is smaller than the pre-determined threshold T, see Eq. (26):

Pc(T) = (1 / (√(2π) · σ)) ∫_{−∞}^{T} exp( −(x − μ)² / (2σ²) ) dx
      = (1/2) · erfc( −(T − μ) / (√2 · σ) ),   (26)
where erfc(·) is the complementary error function. Table 4 gives the collision probabilities at different threshold values T for the three schemes [2,5,26] and the proposed scheme. The comparison results in Table 4 show that, under the same threshold value, the collision probability of the proposed scheme is generally smaller than those of the schemes in [2,5,26], which demonstrates that our scheme achieves better anti-collision performance than the schemes in [2,5,26].
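Eq. (26) can be evaluated directly with the complementary error function. The sketch below (an illustrative helper, not the authors' code) uses the fitted μ = 0.456 and σ = 0.048; note that the far-tail values are sensitive to rounding of μ and σ, so the printed figures need not match Table 4 exactly.

```python
import math

MU, SIGMA = 0.456, 0.048  # fitted normal parameters of the distance distribution

def collision_probability(t, mu=MU, sigma=SIGMA):
    """P_c(T) from Eq. (26): the lower tail of N(mu, sigma^2) below T."""
    return 0.5 * math.erfc(-(t - mu) / (math.sqrt(2) * sigma))

# The collision probability shrinks rapidly as the threshold T decreases.
for t in (0.32, 0.20, 0.10):
    print(f"T = {t:.2f}: Pc = {collision_probability(t):.4e}")
```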

Fig. 10. Normalized Hamming distances between hash pairs with the correct secret key and 1000 wrong secret keys.

Table 5
Comparisons of execution time and hash length for the proposed scheme and [2,5,26].

Performance          Scheme in [26]   Scheme in [2]   Scheme in [5]   Proposed scheme
Time (seconds)       1.2356           0.1592          1.3358          0.8164
Hash length (bits)   512              109             640             672

Obviously, the smaller the threshold T is set, the smaller the collision probability becomes. On the other hand, a smaller threshold may degrade the perceptual robustness, because the normalized Hamming distance between the hash pair of two visually similar images should fall below the threshold T. It can be observed from Fig. 8 that the normalized Hamming distances of our scheme under the six kinds of common content-preserving manipulations are all below 0.2. Moreover, when T = 0.2, the collision probability of our scheme is 3.0222 × 10−8, which is sufficiently small. Therefore, we set the threshold T = 0.2 for our scheme to achieve satisfactory perceptual robustness and anti-collision performance simultaneously.

3.4. Performance of key-dependent security

As a secure image hashing scheme, different secret keys should produce significantly different hashes. In our scheme, there is one secret key K, which is utilized in the stage of hash generation to scramble the binary sequence of perceptual features, i.e., Z′ = [l′, h′, r′]. Fig. 10 demonstrates the key-dependent security of the proposed scheme. The abscissa of Fig. 10 is the index of 1000 randomly generated wrong secret keys, and the ordinate is the mean of twenty normalized Hamming distances between the hash pairs of the images in Fig. 7 obtained with the correct and the wrong secret keys. It can be observed that almost all normalized Hamming distances in Fig. 10 lie in the vicinity of 0.5. As a result, it is extremely difficult for an adversary to generate or estimate the same hash without the correct key. Furthermore, an adversary cannot forge an image and its corresponding hash to cheat other users, since he or she does not possess the users' secret keys. Therefore, the security of the proposed scheme depends entirely on the secret key and satisfies the security requirement in the cryptographic sense.
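The key-dependent scrambling step can be illustrated as a key-seeded permutation of the feature bit sequence. This is a sketch under the assumption that K seeds a pseudo-random permutation; the paper's exact scrambling rule may differ.

```python
import random

def scramble(bits, key):
    """Permute the binary feature sequence with a permutation derived from key."""
    rng = random.Random(key)      # the secret key K seeds the PRNG
    perm = list(range(len(bits)))
    rng.shuffle(perm)             # key-dependent permutation of positions
    return [bits[i] for i in perm]

z = [0, 1, 1, 0, 1, 0, 0, 1]     # toy feature bit sequence
print(scramble(z, key=1234))     # correct key: reproducible ordering
print(scramble(z, key=9999))     # wrong key: a different ordering
```

Because the permutation is fully determined by K, the same key always reproduces the same hash, while a wrong key yields a hash that is essentially uncorrelated with the correct one, matching the ~0.5 distances observed in Fig. 10.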
We also compared the execution time of our image hashing scheme with the schemes in [2,5,26]. As given in Table 5, the average execution times of Tang et al.'s scheme [26], Choi and Park's scheme [2] and Davarzani et al.'s scheme [5] are 1.2356 s, 0.1592 s and 1.3358 s, respectively, while that of our hashing scheme is 0.8164 s, which is faster than the schemes in [5,26] and slower than the scheme in [2]. In addition, the hash lengths of the schemes in [26], [2], [5] and our scheme are 512 bits, 109 bits, 640 bits and 672 bits, respectively. Therefore, according to the results in Sections 3.2–3.4, it can be concluded that the proposed scheme not only has high computational efficiency and a reasonable hash length with satisfactory security, but also achieves better perceptual robustness and anti-collision performance than the schemes in [2,5,26].

3.5. Application

We also applied the proposed scheme to image classification and retrieval, which effectively demonstrates the overall performance of perceptual robustness and anti-collision. Besides the UCID database, which consists of 1338 visually distinct images, the USC-SIPI image database [31] was utilized to obtain 200 pairs of visually similar images based on the randomly chosen content-preserving manipulations in Table 2. Thus, two image datasets Ω1 and Ω2 for visually similar and distinct

Fig. 11. ROC curve comparisons of image classification: (a) Tang et al.'s scheme [26], (b) Choi and Park's scheme [2], (c) Davarzani et al.'s scheme [5], (d) proposed scheme.

images can be constructed. Here, we define two metrics, the true positive rate PT and the false positive rate PF, to represent, respectively, the percentage of actually similar image pairs correctly classified as similar and the percentage of actually distinct image pairs falsely classified as similar, see Eqs. (27) and (28).
P_T = \frac{\tau_1}{\zeta_1},   (27)

P_F = \frac{\tau_2}{\zeta_2},   (28)
where ζ1 and ζ2 are the numbers of image pairs in Ω1 and Ω2, respectively, τ1 is the number of pairs in Ω1 that were correctly judged as visually similar by the image hashing schemes, and τ2 is the number of pairs in Ω2 that were falsely judged as visually similar. In Fig. 11, the receiver operating characteristic (ROC) curves illustrate the image classification comparison between the proposed scheme and the schemes [2,5,26]. The abscissa and ordinate of the ROC curves in Fig. 11 represent PF and PT, respectively, and the twelve points on each ROC curve correspond to threshold values T from 0.10 to 0.32 with an interval of 0.02. Obviously, the closer the ROC curve lies to the upper-left region, the better the overall robustness and anti-collision performance the corresponding image hashing scheme achieves. We can observe that the PT values of our hashing scheme are generally close to 1 and greater than those of the three schemes [2,5,26]. Meanwhile, the PF values of our scheme are considerably smaller than those of the schemes in [2,5,26]. Therefore, it can be concluded that our scheme achieves better image classification performance, in terms of perceptual robustness and anti-collision, than the schemes [2,5,26].
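For a given threshold T, the ROC point (PF, PT) of Eqs. (27) and (28) is obtained by counting how many similar and distinct pairs fall below T. A minimal sketch, with made-up distance lists standing in for the measured hash distances:

```python
def roc_point(similar_dists, distinct_dists, t):
    """True/false positive rates (Eqs. (27)-(28)) at threshold t."""
    tp = sum(d < t for d in similar_dists)    # tau_1: similar pairs judged similar
    fp = sum(d < t for d in distinct_dists)   # tau_2: distinct pairs judged similar
    return fp / len(distinct_dists), tp / len(similar_dists)  # (P_F, P_T)

similar = [0.05, 0.11, 0.18, 0.22]   # hypothetical distances of similar pairs
distinct = [0.35, 0.41, 0.15, 0.52]  # hypothetical distances of distinct pairs
print(roc_point(similar, distinct, t=0.2))  # → (0.25, 0.75)
```

Sweeping t over 0.10 to 0.32 and plotting the resulting (PF, PT) points traces out one ROC curve of Fig. 11.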

Fig. 12. Results of image retrieval based on USC-SIPI image database [31].

Besides image classification, we also utilized our image hashing scheme in the application of image retrieval. A total of 69 consecutive frame images from four different video sequences in the USC-SIPI image database [31] were used to construct the dataset for the image retrieval test. We first chose an image in the dataset as the query image, shown as the top-left image of Fig. 12, and calculated the hashes of all images in the dataset. Then, we conducted image retrieval for the query image according to the normalized Hamming distances between the hash pairs of the query image and the other images in the dataset. Fig. 12 presents the retrieval results: the first two rows list the ten images most similar to the query image, and the last row lists five images visually distinct from it. The corresponding normalized Hamming distance between the query image and each of these fifteen listed images is given below each image. It can be clearly observed from Fig. 12 that, through image hashes, the ten images in the dataset similar to the query image are all successfully retrieved, and the normalized Hamming distances of these ten retrieved images are all smaller than 0.2, significantly lower than those of the dissimilar images. Consequently, our hashing scheme is well suited to the application of image retrieval.
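The retrieval procedure described above amounts to ranking the dataset by hash distance and keeping the images below the threshold. A sketch with hypothetical frame names and toy hashes, using the normalized Hamming distance as the metric:

```python
def retrieve(query_hash, dataset, t=0.2):
    """Return (name, distance) pairs of dataset images within distance t of the query."""
    def dist(h1, h2):
        # Normalized Hamming distance between two equal-length binary hashes.
        return sum(a != b for a, b in zip(h1, h2)) / len(h1)
    ranked = sorted(((name, dist(query_hash, h)) for name, h in dataset.items()),
                    key=lambda pair: pair[1])
    return [(name, d) for name, d in ranked if d < t]

dataset = {"frame1": [0, 1, 1, 0, 1, 0, 0, 1],
           "frame2": [0, 1, 1, 0, 1, 0, 0, 0],
           "other":  [1, 0, 0, 1, 0, 1, 1, 0]}
query = [0, 1, 1, 0, 1, 0, 0, 1]
print(retrieve(query, dataset))  # → [('frame1', 0.0), ('frame2', 0.125)]
```

Because hashes are short binary strings, this comparison is far cheaper than matching the images themselves.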

4. Conclusions

In this work, a secure image hashing scheme with perceptual robustness and anti-collision performance is proposed. To make the final hash more robust to content-preserving manipulations, pre-processing, including image normalization via bilinear interpolation, Gaussian low-pass filtering and SVD, is applied to the original image to construct the secondary image. Then, BTC and CSLBP are conducted to obtain the perceptual image features, consisting of the low and high reconstruction levels and the corresponding binary-map matrix, which effectively reflect the principal contents of the image. During the stage of hash generation, compression by quantization and PCA is applied to the extracted feature matrices to produce the compact binary sequence of the final image hash. Experimental results show that the perceptual robustness of the proposed scheme is superior to that of other reported schemes. In addition, our scheme also achieves satisfactory anti-collision and security performance.
An effective technique for feature extraction and description is important to the performance of an image hashing scheme. In our current scheme, the CSLBP-based image feature descriptor is adopted for its invariance to monotonic gray-scale changes and its low computational complexity. In recent years, several powerful new descriptors have been proposed [1,3,6,7,23], such as the dual-cross pattern (DCP) based on encoded second-order discriminative feature information [3,7] and the transition local binary pattern (tLBP) based on neighboring pixel comparisons in the clockwise direction [23]. In future work, the advantages of these new descriptors will be studied and incorporated into the process of perceptual feature extraction to further enhance the performance of the image hashing scheme. In addition, how to generalize multimedia hashing to audio, image and video data also deserves in-depth investigation.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61303203), the Natural Science Foundation of Shanghai, China (13ZR1428400), the Innovation Program of Shanghai Municipal Education Commission (14YZ087), the Open Project Program of Shenzhen Key Laboratory of Media Security, the Open Project Program of the National Laboratory of Pattern Recognition (201600003), Shanghai Engineering Center Project of Massive Internet of Things Technology for Smart Home (GCZX14014), Hujiang Foundation of China (C14001, C14002), the PAPD Fund, and the CICAEET Fund.

References

[1] Z. Akhtar, A. Rattani, A. Hadid, M. Tistarelli, Face recognition under ageing effect: a comparative analysis, Lect. Notes Comput. Sci. 8157 (2013) 309–318.
[2] Y.S. Choi, J.H. Park, Image hash generation method using hierarchical histogram, Multimed. Tools Appl. 61 (1) (2012) 181–194.
[3] C. Ding, J. Choi, D. Tao, L.S. Davis, Multi-directional multi-level dual-cross patterns for robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell.
38 (3) (2016) 518–531.
[4] E.J. Delp, O.R. Mitchell, Image compression using block truncation coding, IEEE Trans. Commun. 27 (9) (1979) 1335–1342.
[5] R. Davarzani, S. Mozaffari, K. Yaghmaie, Perceptual image hashing using center-symmetric local binary patterns, Multimed. Tools Appl. 75 (8) (2016)
4639–4667.
[6] C. Ding, D. Tao, A comprehensive survey on pose-invariant face recognition, ACM Trans. Intell. Syst. Technol. 7 (3) (2016) 37:1–37:42.
[7] C. Ding, C. Xu, D. Tao, Multi-task pose-invariant face recognition, IEEE Trans. Image Process. 24 (3) (2015) 980–993.
[8] J. Fridrich, M. Goljan, Robust hash function for digital watermarking, in: Proceedings of the 2000 International Conference on Information Technology: Coding and Computing, 2000, pp. 178–183.
[9] M. Heikkila, M. Pietikainen, C. Schmid, Description of interest regions with local binary patterns, Pattern Recognit. 42 (3) (2009) 425–436.
[10] A. Hadmi, W. Puech, B.A.E. Said, A.A. Quahman, Perceptual image hashing, in: M.D. Gupta (Ed.), Computer and Information Science: Watermarking,
vol. 2, Rijeka, Croatia: InTech, 2012, doi:10.5772/37435.
[11] Y.H. Kuo, K.T. Chen, C.H. Chiang, W.H. Hsu, Query expansion for hash-based image object retrieval, in: Proceedings of the Seventeenth ACM Interna-
tional Conference on Multimedia, 2009, pp. 65–74.
[12] S.S. Kozat, R. Venkatesan, M.K. Mihcak, Robust perceptual image hashing via matrix invariants, in: Proceedings of the 2004 IEEE International Confer-
ence on Image Processing, 2004, pp. 3443–3446.
[13] C.S. Lu, C.Y. Hsu, Geometric distortion-resilient image hashing scheme and its application on copy detection and authentication, Multimed. Syst. 11 (2)
(2005) 159–173.
[14] J. Li, X.L. Li, B. Yang, X.M. Sun, Segmentation-based image copy-move forgery detection scheme, IEEE Trans. Inf. Forensics Secur. 10 (3) (2015) 507–518.
[15] C. Liu, H.F. Ling, F.H. Zou, M. Sarem, L.Y. Yan, Nonnegative sparse locality preserving hashing, Inf. Sci. 281 (2014) 714–725.
[16] V. Monga, M.K. Mihcak, Robust and secure image hashing via non-negative matrix factorizations, IEEE Trans. Inf. Forensics Secur. 2 (3) (2007) 376–390.
[17] J.L. Ouyang, G. Coatrieux, H.Z. Shu, Robust hashing for image authentication using quaternion discrete Fourier transform and log-polar transform, Digit.
Signal Process. 41 (2015) 98–109.
[18] G.P. Qiu, Color image indexing using BTC, IEEE Trans. Image Process. 12 (1) (2003) 93–101.
[19] C. Qin, C.C. Chang, P.L. Tsou, Perceptual image hashing based on the error diffusion halftone mechanism, Int. J. Innov. Comput. Inf. Control 8 (9) (2012)
6161–6172.
[20] C. Qin, C.C. Chang, P.L. Tsou, Robust image hashing using non-uniform sampling in discrete Fourier domain, Digit. Signal Process. 23 (2) (2013) 578–585.
[21] A. Swaminathan, Y. Mao, M. Wu, Robust and secure image hashing, IEEE Trans. Inf. Forensics Secur. 1 (2) (2006) 215–230.
[22] G. Schaefer, M. Stich, UCID – an uncompressed color image database, in: Proceedings of SPIE, Storage and Retrieval Methods and Applications for Multimedia 2004, vol. 5307, 2004, pp. 472–480.
[23] J. Trefný, J. Matas, Extended set of local binary patterns for rapid object detection, in: Proceedings of the Fifteenth Computer Vision Winter Workshop,
2010, pp. 1–7.
[24] Z.J. Tang, L.L. Ruan, C. Qin, X.Q. Zhang, C.Q. Yu, Robust image hashing with embedding vector variance of LLE, Digit. Signal Process. 43 (2015) 17–27.
[25] Z.J. Tang, F. Yang, L.Y. Huang, X.Q. Zhang, Robust image hashing with dominant DCT coefficients, Opt. Int. J. Light Electron Opt. 125 (18) (2014)
5102–5107.
[26] Z.J. Tang, X.Q. Zhang, Y.M. Dai, W.W. Lan, Perceptual image hashing using local entropies and DWT, Imaging Sci. J. 61 (2) (2013) 241–251.
[27] R. Venkatesan, S.M. Koon, M.H. Jakubowski, P. Moulin, Robust image hashing, in: Proceedings of the 2001 IEEE International Conference on Image
Processing, 2001, pp. 664–666.
[28] K. Wang, J. Tang, N. Wang, L. Shao, Semantic boosting cross-modal hashing for efficient multimedia retrieval, Inf. Sci. 330 (2016) 199–210.
[29] L. Xie, L. Zhu, P. Pan, Y.S. Lu, Cross-modal self-taught hashing for large-scale image retrieval, Signal Process. 124 (2016) 81–92.
[30] C.P. Yan, C.M. Pun, X.C. Yuan, Multi-scale image hashing using adaptive local feature extraction for robust tampering detection, Signal Process. 121
(2016) 1–16.
[31] USC-SIPI Image Database: http://sipi.usc.edu/database/, 2007.

Chuan Qin received the B.S. degree in electronic engineering and the M.S. degree in signal and information processing from Hefei University of Technology, Anhui, China, in 2002 and 2005, respectively, and the Ph.D. degree in signal and information processing from Shanghai University, Shanghai, China, in 2008. Since 2008, he has been with the faculty of the School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, where he is currently an Associate Professor. He was with Feng Chia University, Taiwan, as a Postdoctoral Researcher and Adjunct Assistant Professor from July 2010 to July 2012. He is also a visiting researcher with the Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China. His research interests include image processing and multimedia security. He has published more than 70 papers in these research areas.

Xueqin Chen received the B.S. degree in electronic and computer engineering from Anhui Science and Technology University, Chuzhou, Anhui, China, in 2014. She is currently pursuing the M.S. degree in signal and information processing at the University of Shanghai for Science and Technology, China. Her research interests include image processing and data hiding.

Dengpan Ye was born in Hubei, China. He received the B.A.Sc. degree in automatic control from SCUT in 1996 and the Ph.D. degree from NJUST in 2005. He worked as a Post-Doctoral Fellow in the School of Information Systems, Singapore Management University. Since 2012, he has been a professor in the School of Computer Science at Wuhan University. His research interests include machine learning and multimedia security. He is the author or co-author of more than 30 refereed journal and conference papers.

Jinwei Wang was born in Inner Mongolia, China, in 1978. He received the Ph.D. degree in information security from Nanjing University of Science and Technology in 2007 and was a visiting scholar in the Service Anticipation Multimedia Innovation (SAMI) Lab of the France Telecom R&D Center (Beijing) in 2006. He worked as a senior engineer at the 28th Research Institute, CETC, from 2007 to 2010, and as a visiting scholar at the New Jersey Institute of Technology, NJ, USA, from 2014 to 2015. He is now an associate professor at Nanjing University of Information Science and Technology. His research interests include multimedia copyright protection, image forensics, image encryption and data authentication. He has published more than 30 papers and has led or participated in more than 10 research projects.

Xingming Sun is currently a Professor with the College of Computer and Software, Nanjing University of Information Science and
Technology, Nanjing, China. He was a Professor with the College of Computer and Communication, Hunan University, Changsha,
China. He was a Visiting Professor with University College London, London, U.K., and the University of Warwick, Coventry, U.K. He
received the B.S. degree in mathematics from Hunan Normal University, Changsha, in 1984, the M.S. degree in computing science
from the Dalian University of Science and Technology, Dalian, China, in 1988, and the Ph.D. degree in computer science from
Fudan University, Shanghai, China, in 2001. His research interests include network and information security, digital watermarking,
cloud computing security, and wireless network security.
