Email: {satyanarayana.reddy@snu.edu.in}
Abstract—This paper proposes a method for gesture classification based on Graph Fourier transform (GFT) coefficients. GFT coefficients are the projection of an image pixel block onto the eigenvectors of a Laplacian matrix. This Laplacian matrix is generated from an undirected graph representing the spatial connectedness between the pixels within an image block. This work proposes a method for generating the undirected graph by using edge information of the image. Edge information of the image is obtained from the average sum of absolute differences between the current pixel and its neighboring pixels by using an appropriate threshold. The resulting GFT based feature vector is formed by concatenating the GFT coefficients of each block. The resultant feature vector is applied to a linear Support Vector Machine (SVM) classifier to predict the gesture class. For the NTU and Massey hand gesture datasets, a threshold value of 30 gives maximum prediction accuracy. We compare the results of the proposed GFT based descriptor approach with Karhunen-Loeve transform (K-LT) and Discrete Cosine transform (DCT) based descriptors on three different gesture datasets: NTU, Cambridge and Massey. Simulation results show that the proposed GFT based descriptor gives results comparable with the K-LT and DCT based descriptors for gesture classification.
I. INTRODUCTION
In the recent past, researchers have renewed their interest in applying graph theory concepts to discrete signal processing, termed Graph Signal Processing. Graph signal processing uses techniques of algebraic graph theory and computational harmonic analysis to process signals/data generated from areas such as transportation, social, and sensor networks [1]-[2]. The growing interest of researchers in graph based discrete signal processing has encouraged the study of transform approaches for signals residing on graphs, termed the Graph Fourier transform.
In the literature, GFT has found applications such as image compression [3], image interpolation, image denoising [4] and attitude analysis [5].
A. Motivation and Contributions
Despite active research over the last two decades, image classification still faces challenges such as illumination conditions, different shapes, size and speed variation, background details and issues due to occlusion [6]. The prediction accuracy of image classification algorithms relies on the choice of a suitable feature descriptor.
For images, graphs have the ability to model local relations between image pixels and edges. In Graph Signal Processing, converting an image into a graph considers the spatial as well as the structural information: an image is mapped onto a structural graph where nodes represent the pixels and edges depict the relationship between nodes.
In this work, we focus on the discriminative strength of the GFT based method. GFT is the projection of an image block onto the eigenvectors of a Laplacian matrix obtained from the proposed undirected graph representation of the image block. DCT and Discrete Fourier transform (DFT) are fixed transforms with no signal adaptation, i.e., if the size of the block is fixed, then the DFT and DCT transformation matrices are fixed. In GFT, by contrast, different gesture sequences contain different numbers of nodes and different edges for the proposed graph structure. This leads to a different set of graph Laplacian eigenvectors, i.e., distinct graph Fourier transform bases. Thus, the graph frequency representations for different gesture sequences will be different.
In the classical graph construction approach, when the neighboring pixels are close or when pixels i and j belong to the same region, the weight in the adjacency matrix will be large. If pixels i and j belong to different regions, the weight will be small [7]. In this work, however, we construct an undirected graph based on the edge information of the image. In general, the sum of absolute differences (SAD) is used to measure similarity between an original image and a reference image [8].
We use the Average of SAD (ASAD) to obtain edge information of the image. In order to find the edge information of image I, we construct a binary matrix, called the “image edge information matrix A(I)”, with the same size as image I. We then construct the adjacency matrix A using the image edge information matrix A(I) based on the k-nearest neighborhood method.
This work presents a new image classification method based on the GFT feature descriptor. The main contributions of this work are:
1) Proposed a method for gesture classification based on Graph Fourier Transform (GFT).
2) Proposed a method for generating an undirected graph using an appropriate threshold to preserve local neighborhood information.
B. Related Work
In the application areas of sign language recognition, human-computer interaction, virtual reality and computer graphics, hand gesture recognition has become of great importance [9]. Despite active research over the last two decades, vision based hand gesture recognition methods [10]-[12] still face challenges such as illumination conditions, different shapes, size and speed variation, background details and issues due to occlusion [6]. Extensive literature on hand gesture recognition systems in particular, and gesture recognition in general, is given in [10]-[12]. In the literature, the feature descriptors used for classification in vision based methods are mainly based on shape, i.e., the contour or region of the object. S. Belongie et al. [13] presented a method that measured similarity between shapes and used this descriptor for object recognition. H. Ling et al. [14] proposed a method based on part structure. They made shape descriptors which capture the part structure and are robust to articulation using the inner-distance. X. Bai et al. [15] presented a graph matching algorithm and applied it to shape recognition based on object silhouettes.
In other feature extraction methods, features (HOG, DFT, DCT, K-LT, 3D model based) of training images are used to model the visual appearance, and these features are compared with the features of test images [16]. In 2005, Dalal et al. [17] used Histogram of Oriented Gradient (HOG) for human detection. Liu et al. [18] chose to use a Gabor filter as a feature descriptor followed by principal component analysis for dimensionality reduction and used a Support Vector Machine as a classifier for different illumination conditions. Z. Ren et al. [19] proposed a robust part based hand gesture recognition system using the Kinect sensor on the NTU hand gesture dataset.
C. Organization of the Paper
The GFT feature descriptor is described in Section II, including the method of construction. Section III explains the proposed method for classification. Section IV discusses the experimental setup followed by results and discussion. Section V presents the comparison of results between the GFT, DCT and K-L transform for gesture images. Concluding remarks and future directions are given in Section VI.
II. GRAPH FOURIER TRANSFORM FOR IMAGES
A directed graph (in short, digraph), denoted G = (V, E), consists of the vertex set V and the edge set E ⊂ V × V. If e = (u, v) ∈ E, then the edge e is said to be incident with vertices u and v. An edge e = (u, u) is called a loop. A digraph is called a graph if for any two elements u, v ∈ V, (u, v) ∈ E whenever (v, u) ∈ E and in this case, we write e = {u, v}. We also state it by saying that the vertices u and v are adjacent. Let v be a vertex of a graph G, then the degree of v, denoted d(v), means the number of edges incident with v. For any finite set S, let |S| denote the number of elements in S. Then a graph/digraph is said to be finite if |V| and |E| are finite. A graph is called simple if it has no loops. In this paper, graph means finite, simple and undirected graph.
An adjacency matrix of a graph G = (V, E) with |V| = N is an N × N matrix, denoted A(G) = [aij] (or A), with aij = 1 if the i-th vertex is adjacent to the j-th vertex and 0 otherwise.
Another matrix associated with the graph G is the Laplacian matrix, denoted L(G) or simply L, and defined as
L = D − A, (1)
where A is the adjacency matrix and D is the degree matrix, which is a diagonal matrix whose diagonal entries are given by
$D_{ii} = \sum_{j=1}^{N} A_{ij}.$ (2)
It is easy to check that L is a symmetric and positive semidefinite matrix, hence its eigenvalues are real and nonnegative and can be ordered as 0 = λ0 ≤ λ1 ≤ λ2 ≤ λ3 ≤ · · · ≤ λN−1. U. von Luxburg also explains these concepts in a tutorial on spectral clustering [20]. In 2003, He and Niyogi [21] found that the Laplacian of a graph incorporates neighborhood information of the dataset. Hence, in this work, we construct GFT coefficients based on the Laplacian matrix of the graph, which preserves the neighborhood information. A graph signal S is a real-valued function defined on the vertices of the graph, i.e., S : V → R with v ↦ S(v). A signal S can also be represented as a vector, i.e., $S \in \mathbb{R}^{N}$, where the i-th component of the vector S denotes the signal value at the i-th vertex in V.
The classical Fourier transform is the expansion of a signal f in terms of the eigenfunctions of the Laplacian operator, i.e., $\hat{f}(\zeta) = \langle f, e^{2\pi i \zeta t} \rangle$. Analogously, the GFT of a signal $S \in \mathbb{R}^{N}$ on the vertices of G is the expansion of S in terms of the eigenvectors of the graph Laplacian. GFT coefficients are the inner products of S with the eigenvectors of the graph Laplacian matrix, i.e.,
$X_{GFT} = T^{H} S,$ (3)
where T is the matrix formed with the eigenvectors of the Laplacian matrix of G as columns, $T^{H}$ denotes the conjugate transpose of T, and S is the vector of pixel values of the image block. These GFT coefficients are used as feature descriptors for classification.
A. GFT Feature Descriptor Computation
Consider an N × N image I whose (i, j)-th pixel value is denoted by I(i, j). To find the edge information of the image, for each pixel I(i, j) we calculate the ASAD, denoted θi,j, from its 8 neighboring pixels as follows:
2017 Fourth International Conference on Image Information Processing (ICIIP)
• For the corner pixels, i.e., the first and last pixels of the first row and the first and last pixels of the last row, θi,j is calculated as given in equation 4.
$\theta_{i,j} = \frac{1}{3} \sum_{k=0}^{1} \sum_{l=0}^{1} |I(i+k, j+l) - I(i,j)|$ (4)
• For the pixels on the boundary but not on the corner, θi,j is calculated as given in equation 5.
$\theta_{i,j} = \frac{1}{5} \sum_{k=0}^{1} \sum_{l=-1}^{1} |I(i+k, j+l) - I(i,j)|$ (5)
Results of adjacency matrix A, degree matrix D, Laplacian matrix L, and matrix of eigenvectors T made from the binary matrix A as given in Example II.1 are shown in Figure 1.
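The ASAD rules in equations 4 and 5 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `asad` and `edge_matrix` are hypothetical, the interior-pixel case (averaging over all 8 neighbours) is inferred from the phrase "8 neighboring pixels" because its equation is not shown in this excerpt, and marking a pixel as an edge when its ASAD is greater than or equal to the threshold is an assumed comparison direction. The threshold of 30 is the value reported in the abstract.

```python
def asad(image):
    """Average sum of absolute differences between each pixel and its
    in-bounds neighbours: 3 at a corner (eq. 4), 5 on a boundary (eq. 5),
    and, by assumption, 8 in the interior. Needs at least a 2 x 2 image."""
    rows, cols = len(image), len(image[0])
    theta = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            diffs = [
                abs(image[i + k][j + l] - image[i][j])
                for k in (-1, 0, 1)
                for l in (-1, 0, 1)
                if (k, l) != (0, 0)
                and 0 <= i + k < rows
                and 0 <= j + l < cols
            ]
            theta[i][j] = sum(diffs) / len(diffs)
    return theta


def edge_matrix(image, threshold=30):
    """Binary edge information matrix. ASSUMPTION: a pixel is marked 1
    when its ASAD is >= threshold; the excerpt does not state the rule."""
    return [[1 if t >= threshold else 0 for t in row] for row in asad(image)]


# Made-up 3 x 4 image with a vertical intensity step between columns 2 and 3.
image = [
    [10, 10, 10, 200],
    [10, 10, 10, 200],
    [10, 10, 10, 200],
]
theta = asad(image)
edges = edge_matrix(image)
```

In this toy image the flat region on the left has an ASAD of zero, while the pixels on either side of the intensity step exceed the threshold and are marked in `edges`.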
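The matrices A, D, L and T from Section II, together with the GFT of equation (3), can be sketched with NumPy as follows. This is a minimal illustration, not the paper's implementation: the function name `graph_fourier_transform`, the 4-vertex path graph, and the pixel values are all made up for the example, and the graph is not built with the edge-based construction proposed in this work.

```python
import numpy as np


def graph_fourier_transform(adjacency, signal):
    """Return (eigenvalues, T, coefficients) for the unnormalized
    graph Laplacian L = D - A of an undirected graph."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))      # equation (2): D_ii = sum_j A_ij
    L = D - A                       # equation (1)
    # eigh is meant for symmetric matrices: eigenvalues are returned in
    # ascending order and the eigenvectors are the columns of T.
    eigenvalues, T = np.linalg.eigh(L)
    coefficients = T.conj().T @ np.asarray(signal, dtype=float)  # equation (3)
    return eigenvalues, T, coefficients


# Hypothetical path graph 0-1-2-3 (not a graph taken from the paper).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
S = np.array([12.0, 15.0, 200.0, 210.0])  # made-up pixel values

eigenvalues, T, X_gft = graph_fourier_transform(A, S)
S_reconstructed = T @ X_gft  # T is orthogonal, so this inverts the transform
```

Because L is real and symmetric, T is real and orthogonal, so the conjugate transpose reduces to an ordinary transpose and the transform preserves the norm of the signal. The smallest eigenvalue is zero, in line with the ordering 0 = λ0 ≤ λ1 ≤ · · · stated in Section II.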
[Bar chart; horizontal axis: Datasets (NTU, Cambridge, Massey)]
Fig. 4: Comparison of prediction accuracy when using all GFT, DCT and K-LT coefficients for each block.
VI. CONCLUSION AND FUTURE DIRECTION
This paper presented a novel method for hand gesture classification based on Graph Fourier Transform (GFT). For efficient calculation of transform coefficients, the image is first divided into blocks of subimages. We proposed a method for generating an undirected graph using an appropriate threshold to preserve local neighborhood information. The graph Laplacian is used to define the connectivity of the underlying graph. The inner product of the eigenvector matrix of the unnormalized graph Laplacian and the image is called the Graph Fourier Transform. To determine the appropriate threshold for construction of the undirected graph, different threshold values are used. GFT coefficients based on these threshold values are used to determine the prediction accuracy. A threshold of 30 gives the maximum prediction accuracy using 10-fold cross validation. Although the features generated by the K-LT are optimal for image reconstruction, they also perform well for gesture classification. Similarly, DCT coefficients are also used for classification. For the Cambridge and Massey datasets, the performance of GFT and DCT is the same. For the Massey and NTU datasets, which have a larger number of classes, GFT performs better than the K-L transform. The main drawbacks of the proposed method are its computational complexity and the size of the feature vector. Practical software requires $O(n^{3})$ time to compute all the eigenvalues and eigenvectors of an n × n symmetric matrix. In the proposed method, the size of the feature vector is (number of blocks in an image) × 64. Hence, the future direction of the proposed method is to reduce the computational complexity and the size of the feature vector.
[7] F. Zhang and E. R. Hancock, “Graph spectral image smoothing using the heat kernel,” Pattern Recogn., vol. 41, pp. 3328-3342, Nov. 2008.
[8] B. Zitova, J. Flusser and F. Sroubek, “Image Registration: A survey and recent advances,” in Proc. of IEEE International Conference on Image Processing (ICIP’2005), pp. 1-55, Los Alamitos.
[9] J. P. Wachs, M. Kölsch, H. Stern, and Y. Edan, “Vision-based hand gesture applications,” Commun. ACM, vol. 54, pp. 60-71, 2011.
[10] A. Erol, G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, “Vision-based hand pose estimation: A review,” Comput. Vision Image Understand., vol. 108, pp. 52-73, 2007.
[11] S. Mitra and T. Acharya, “Gesture recognition: A survey,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, pp. 311-324, 2007.
[12] G. R. S. Murthy and R. S. Jadon, “A review of vision based hand gesture recognition,” Int. Journal. Inf. Technol. Knowl. Manage., vol. 2, pp. 405-410, 2009.
[13] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape context,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, pp. 509-522, 2002.
[14] H. Ling and D. W. Jacobs, “Shape classification using the inner distance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, pp. 286-299, 2007.
[15] X. Bai and L. J. Latecki, “Path similarity skeleton graph matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, pp. 1-11, 2008.
[16] S. Ding, H. Zhu, W. Jia, and C. Su, “A survey on feature extraction for pattern recognition,” Artificial Intelligence Review, vol. 37, no. 3, pp. 169-180.
[17] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.
[18] C. Liu, “Gabor-based kernel PCA with fractional power polynomial models for face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 572-581, 2004.
[19] Z. Ren, J. Yuan, J. Meng, and Z. Zhang, “Robust part-based hand gesture recognition using Kinect sensor,” IEEE Transactions on Multimedia, vol. 15, no. 5, pp. 1110-1120, Aug. 2013.
[20] U. von Luxburg, “A tutorial on spectral clustering,” Springer Statistics and Computing, vol. 17, 2007.
[21] X. He and P. Niyogi, “Locality preserving projections,” Proc. Conf. Advances in Neural Information Processing Systems, 2003.
[22] J. Shukla and A. Dwivedi, “A method for hand gesture recognition,” Conference of Communication Systems and Network Technologies, pp. 919-923, April 2014.
[23] T-K. Kim and R. Cipolla, “Canonical correlation analysis of video volume tensor for action categorization and detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 8, pp. 1415-1428, 2009.
[24] A. L. C. Barczak, N. H. Reyes, M. Abastillas, A. Piccio, and T. Susnjak, “A new 2D static hand gesture color image dataset for ASL gestures,” Research Lett. Information and Mathematical Sciences, vol. 15, pp. 12-20, 2011.
[25] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 27, pp. 1-27, 2011.
[26] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete Cosine transform,” IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, Jan. 1974.