
ICCAD’17, Hammamet - Tunisia, January 19-21, 2017

Parallel implementation of Sobel filter using CUDA
Hana Ben Fredj
Faculty of Science Monastir, University of Monastir, Tunisia
ben.fredj.hanaa@gmail.com

Mouna Ltaif
National School of Engineers, University of Sousse, Tunisia
ltf.mouna@gmail.com

Anis Ammar
National School of Engineers, University of Sousse, Tunisia
anis.ammarr@gmail.com

Chokri Souani
Laboratory of Microelectronics and Instrumentation, Faculty of Sciences of Monastir, Tunisia
Higher Institute of Applied Sciences & Technology, University of Sousse, Tunisia
chokri.souani@gmail.com

Abstract: Efficient solutions are needed to meet the intensive computing demands of image processing applications and to achieve real-time performance. The graphics processing unit (GPU) is an effective and recent means of accelerating computation-intensive algorithms: it reduces execution time by exploiting parallel programming techniques. In this paper, we present a parallel GPU implementation of an edge detection algorithm with a Sobel operator using the CUDA (Compute Unified Device Architecture) environment. Furthermore, we analyze and demonstrate the high performance of the GPU implementation by also running the algorithm on a standard central processing unit (CPU) and comparing the computational efficiency of the two systems. Our experimental results show the effectiveness of the GPU implementation through its higher performance compared to sequential computation.

Keywords: CUDA, edge detection, GPU, parallel implementation.

I. Introduction

The Sobel operator is extensively used in image and video processing, mainly in edge detection algorithms for various applications. In medical science, edge detection is a useful method for developing secure medical imaging information [1]. It is also a well-known first step in several computer vision and pattern recognition applications [2] because it captures useful information and gives accurate results.

Edge detection algorithms using the Sobel detector have been described in several previous papers, such as [3][4][5]. This detector is a simple and efficient operator that uses convolution matrices. Many image processing algorithms require intensive computation, and traditional methods for processing large-resolution images cannot meet real-time requirements. The GPU, however, has become one of the ideal solutions for accelerating data-parallel computing; it is used for general-purpose computing because of its high computational power. Thus, several works in the literature have been implemented on the GPU with interesting results. As an illustration, Yang et al. [6] implemented several classical image processing algorithms on the GPU with CUDA. Also, OpenVIDIA [7] included a collection of implementations of computer vision algorithms on a graphics processor using the CUDA environment.

In [8], the authors described a GPU-based approach to segmenting images with gradient-based edge detection. The authors of [9] proposed a GPU implementation based on the wavelet method. The GPU has also been applied in the medical field to the segmentation and reconstruction of MRI images [10][11]. In addition, some works have developed real-time face detection and recognition applications on the GPU that are much faster than their CPU counterparts, such as [12][13].

In this category, we choose to implement the Sobel detector on a parallel graphics processor.

The architecture selected for our parallel implementation is CUDA, a general-purpose parallel computing architecture that exposes the parallelism of NVIDIA graphics processing units (GPUs) to solve many demanding computational problems more efficiently than on a CPU.

In this paper, we study an implementation of the Sobel filter using the CUDA environment. We propose a simple and detailed model of the CUDA implementation, which reduces execution time compared to the same program executed on a sequential processor (CPU).

The remainder of this paper is organized as follows: Section II provides a brief review of the Sobel filter. Section III describes the programming model and the CUDA architecture. Section IV presents our experimental results, and Section V concludes.

II. Algorithm Background

A. The Sobel Filter

The Sobel operator measures the gradient of an input grayscale 2D image. Two 3 × 3 convolution masks are used to compute the horizontal gradient $G_x$ and the vertical gradient $G_y$.

978-1-5090-5987-4/17/$31.00 ©2017 IEEE


With $I$ denoting the source image, the gradient approximations are calculated as follows, where $*$ denotes 2D convolution:

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * I \tag{1}$$

$$G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * I \tag{2}$$

At each point, the approximate horizontal and vertical gradients can be combined to get an approximation of the gradient $G$. The gradient magnitude is given by:

$$G = \sqrt{G_x^2 + G_y^2} \tag{3}$$

III. CUDA Programming Model

A. CUDA programming model

Graphics processors have a massively parallel multi-core structure. In order to exploit the graphics processor efficiently by providing faster access to data, we have used the NVIDIA CUDA environment.

CUDA is a parallel processing architecture developed by NVIDIA. It is widely used in general-purpose computation and uses a high-level language compatible with C/C++. Each CUDA program includes two main parts: host (CPU) code that calls the GPU, and CUDA code, known as a kernel, that is executed in parallel on the GPU. CUDA functions allow the programmer to define the number of threads that execute a kernel on the GPU.

Moreover, the threads executed on the GPU are grouped into blocks, and the set of blocks forms the computing grid of the GPU, as shown in Fig. 1. The input images processed in this study are 2D images, so we chose 2D blocks for our implementation.

Fig. 1. CUDA Programming Model [14]

B. Proposed Model

In this section, we present a model for image processing on a GPU that performs edge detection using the Sobel filter. The model follows a hybrid CPU-GPU approach based on CUDA parallel processing and the OpenCV library, which is used to read the input image and display the results. Hence, this model allows loading, processing and displaying images on graphics processors, as shown in Fig. 2.

This model comprises four essential steps. First, the host (CPU) loads the input image and stores it in host memory. Then, the input data is transferred from CPU memory to GPU memory. Next, threads are allocated to process the data and the main kernel of the edge detection function is executed on the GPU. Finally, the results are returned to host memory so that the output computed on the graphics processor can be displayed.
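
As a minimal illustration of how 2D blocks tile a 2D image, the following C++ sketch emulates a grid launch on the CPU. It is not the authors' code: the names `Dim2`, `grid_for` and `count_covered_pixels` are hypothetical, and the nested loops stand in for the `blockIdx`, `blockDim` and `threadIdx` values that CUDA supplies to each thread of a real kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a 2D extent (x = columns, y = rows).
struct Dim2 {
    int x;
    int y;
};

// Number of blocks needed so that block.x * grid.x >= width and
// block.y * grid.y >= height (ceiling division on each axis).
Dim2 grid_for(int width, int height, Dim2 block) {
    assert(width > 0 && height > 0 && block.x > 0 && block.y > 0);
    return Dim2{(width + block.x - 1) / block.x,
                (height + block.y - 1) / block.y};
}

// Emulates the launch: every (block, thread) pair derives the pixel it owns,
// as a CUDA kernel would from blockIdx * blockDim + threadIdx.
// Returns how many pixels were visited exactly once.
int count_covered_pixels(int width, int height, Dim2 block) {
    const Dim2 grid = grid_for(width, height, block);
    std::vector<int> visits(static_cast<std::size_t>(width) * height, 0);
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx) {
                    const int x = bx * block.x + tx;  // column owned by this thread
                    const int y = by * block.y + ty;  // row owned by this thread
                    // Threads that fall outside the image must do nothing.
                    if (x >= width || y >= height) continue;
                    ++visits[static_cast<std::size_t>(y) * width + x];
                }
    int covered = 0;
    for (int v : visits) covered += (v == 1);
    return covered;
}
```

For a 256 × 256 image and 32 × 32 blocks, `grid_for` gives an 8 × 8 grid. When the image size is not a multiple of the block size (for example 100 × 60, which needs a 4 × 2 grid), some threads land outside the image, which is why the bounds check is needed.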

Fig. 2. Flowchart of CUDA implementation
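
For reference, a sequential baseline can be sketched directly from equations (1) to (3). The following C++ function is a minimal sketch and not the authors' implementation: the name `sobel_magnitude`, the row-major 8-bit buffer layout, the zeroed one-pixel border and the clamping to 255 are all assumptions made for illustration, and the standard Sobel masks are used. In a CUDA version, the body of the two outer loops would become the work done by one thread per pixel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential Sobel gradient magnitude on a row-major 8-bit grayscale image.
// Border pixels are left at 0 (an assumption; the paper does not state its
// border policy). Magnitudes above 255 are clamped.
std::vector<std::uint8_t> sobel_magnitude(const std::vector<std::uint8_t>& src,
                                          int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    // Standard Sobel masks for the horizontal and vertical gradients.
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    std::vector<std::uint8_t> dst(src.size(), 0);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int gx = 0;
            int gy = 0;
            // Apply both 3x3 masks to the neighbourhood centred on (x, y).
            // The masks are applied without flipping; flipping them only
            // changes the sign of gx and gy, not the magnitude.
            for (int j = -1; j <= 1; ++j) {
                for (int i = -1; i <= 1; ++i) {
                    const int p =
                        src[static_cast<std::size_t>(y + j) * width + (x + i)];
                    gx += kx[j + 1][i + 1] * p;
                    gy += ky[j + 1][i + 1] * p;
                }
            }
            // G = sqrt(Gx^2 + Gy^2), clamped to the 8-bit range.
            const double g = std::sqrt(static_cast<double>(gx) * gx +
                                       static_cast<double>(gy) * gy);
            dst[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>(g > 255.0 ? 255.0 : g);
        }
    }
    return dst;
}
```

A uniform image produces an all-zero output, while a vertical step from 0 to 10 produces a magnitude of 40 along the step, since only the horizontal mask responds there.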

IV. Experimental Results

The experiments were carried out on an Intel Core i7-5500U CPU with 8 GB of main memory. The GPU was an NVIDIA GeForce 920M with 384 CUDA cores and 4 GB of graphics memory. The operating system was 64-bit Linux with OpenCV 3.0.0 for image and video manipulation. The CUDA driver/runtime version was 7.5. Both implementations were developed using Qt Creator. The CPU implementation was written in C++.

Fig. 3 shows the original image and the resulting Sobel edge detection image computed by the parallel GPU program.

Fig. 3. Sobel edge detection result
(Panels: original test image and Sobel edge detection result; size 256 × 256 pixels.)

To assess the performance of this work, we give a comparative analysis of the execution time of the edge detection function on the GPU and the CPU for varying image sizes. Fig. 4 shows that the GPU outperforms the CPU. We also notice that, as the image size increases, the CPU execution time increases rapidly for the 512 × 512 and 1024 × 1024 images, whereas the GPU execution time increases only moderately.

Fig. 4. Execution time for edge detection function on GPU and CPU
(Plot of execution time in ms against image size in pixels, for the GPU and the CPU.)

The following presents the execution time of the different steps of the GPU implementation:

Step 1: Copy input data from CPU memory to GPU memory.
Step 2: Load the GPU program and execute the kernel.
Step 3: Copy results from GPU memory to CPU memory.

Fig. 5 shows that reading and writing data from/to the GPU memory (the two copy steps, with shares of 44% and 37%) occupies 81 percent of the total filter time, leaving 19 percent for kernel execution. The memory transfers therefore dominate the execution time.

Fig. 5. Percentage of execution time for every step of the Sobel operator function on the GPU
(Pie chart with slices of 44%, 37% and 19% for steps 1 to 3.)

Moreover, the block size is an important parameter of the computation time. We therefore study the influence of the block size on the measured execution time. CUDA functions allow the programmer to define the number of kernel threads that run on the GPU, and this number can largely exceed the number of processing units.

We define square blocks of n × n threads. Fig. 6 presents the execution time as a function of the block size. The lowest execution time is obtained for 32 × 32 blocks when running the Sobel edge detection function on the GPU. The block size must therefore be chosen carefully to achieve high performance.

Fig. 6. Influence of block size on the GPU execution time.
(Plot of execution time in ms against block size: 1 × 1, 2 × 2, 4 × 4, 8 × 8, 16 × 16 and 32 × 32.)
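
One plausible contributor to the influence of block size is how small blocks map onto the hardware. NVIDIA GPUs schedule the threads of a block in warps of 32, so a block with fewer than 32 threads leaves part of its single warp idle, and the block size also determines how many blocks the grid needs. The following C++ sketch tabulates both quantities for a given square block size. It is an illustrative calculation under those assumptions, not a measurement: the names `BlockStats` and `block_stats` are hypothetical, and memory behaviour and scheduling overheads are not modelled.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp on NVIDIA GPUs

struct BlockStats {
    int threads_per_block;    // n * n
    long long blocks;         // blocks needed to cover the image
    double lane_utilization;  // threads per block / warp lanes reserved per block
};

// Statistics for covering a width x height image with n x n blocks.
BlockStats block_stats(int width, int height, int n) {
    assert(width > 0 && height > 0 && n > 0);
    const int threads = n * n;
    const long long bx = (width + n - 1) / n;   // blocks along x (ceiling)
    const long long by = (height + n - 1) / n;  // blocks along y (ceiling)
    const int warps = (threads + kWarpSize - 1) / kWarpSize;
    return BlockStats{threads, bx * by,
                      static_cast<double>(threads) / (warps * kWarpSize)};
}
```

For a 256 × 256 image such as the test image of Fig. 3, 1 × 1 blocks require 65,536 blocks that each use 1 of 32 warp lanes, 4 × 4 blocks use half of a warp, and blocks of 8 × 8 and larger fill their warps completely. With 32 × 32 blocks, only 64 blocks of 1,024 threads are needed, which is also the maximum block size on GPUs of compute capability 2.0 and later.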

V. Conclusion

In this paper, we have presented a GPU implementation of the Sobel algorithm that achieves higher performance than the sequential CPU implementation. We used the CUDA environment to exploit the power of the GPU and the OpenCV library to read the input image and display the results. Our experimental results indicate that our model performs the computation in parallel in much less time and provides a significant acceleration over the CPU. We conclude that the GPU provides a novel and efficient acceleration technique for image processing. As further research, we will implement more complex image processing algorithms on the GPU, using the suitable memory type (global, shared and texture) to reduce memory access. Furthermore, we will focus on optimizing the CUDA program through better exploitation of the resources provided by the GPU to achieve higher performance.

References
[1] H. Al-Dmour and A. Al-Ani, “Quality optimized medical image information hiding algorithm that employs edge detection and data coding,” Computer Methods and Programs in Biomedicine, vol. 127, pp. 24-43, 2016.
[2] L. Junyan, T. Qingju, W. Yang, L. Yumei, and Z. Zhiping, “Defects’ geometric feature recognition based on infrared image edge detection,” Infrared Physics & Technology, vol. 67, pp. 387-390, 2014.
[3] Z. Jin-Yu, C. Yan, and H. Xiang, “Edge Detection of Images Based on Improved Sobel Operator and Genetic Algorithms,” IEEE International Conference on Image Analysis and Signal Processing (IASP), pp. 31-35, April 2009.
[4] C. Topal and C. Akinlar, “Edge Drawing: A combined real-time edge and segment detector,” Journal of Visual Communication and Image Representation, vol. 23, pp. 862-872, 2012.
[5] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.
[6] Z. Yang, Y. Zhu, and Y. Pu, “Parallel Image Processing Based on CUDA,” International Conference on Computer Science and Software Engineering, pp. 198-201, 2008.
[7] J. Fung and S. Mann, “OpenVIDIA: Parallel GPU computer vision,” in Proceedings of the 13th annual ACM international conference on Multimedia, ACM, pp. 849-852, 2005.
[8] D. Diaz-Pernil, A. Berciano, F. Pena-Cantillana, and M.A. Gutierrez Naranjo, “Segmenting images with gradient-based edge detection using Membrane Computing,” Pattern Recognition Letters, vol. 34, pp. 846-855, 2013.
[9] W.J. Van der Laan, A.C. Jalba, and J.B. Roerdink, “Accelerating Wavelet Lifting on Graphics Hardware Using CUDA,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, pp. 132-146, 2011.
[10] A. De, Y. Zhang, and Ch. Guo, “A parallel adaptive segmentation method based on SOM and GPU with application to MRI image processing,” Neurocomputing, vol. 198, pp. 180-189, 2016.
[11] S.S. Stone, J.P. Haldar, S.C. Tsao, B.P. Sutton, and Z.P. Liang, “Accelerating advanced MRI reconstructions on GPUs,” Journal of Parallel and Distributed Computing, vol. 68, pp. 1307-1318, 2008.
[12] S.J. Bhutekar and A.K. Manjaramkar, “Parallel face Detection and Recognition on GPU,” International Journal of Computer Science and Information Technologies, vol. 5, pp. 2013-2018, 2014.
[13] T. Nguyen, D. Hefenbrock, J. Oberg, R. Kastner, and S. Baden, “A software-based dynamic-warp scheduling approach for load-balancing the Viola-Jones face detection algorithm on GPUs,” Journal of Parallel and Distributed Computing, vol. 73, pp. 677-685, 2013.
[14] NVIDIA CUDA Programming Guide, http://www.developer.nvidia.com/nvidia-gpu-computing-documentation.
