
Design and Implementation of ConvNet for Handwritten Digits Classification on Graphical Processing Unit

Humera Shaziya, Research Scholar, Department of CSE, UCE, Osmania University
Prof. K. Shyamala, Head, Department of CSE, UCE, Osmania University
Raniah Zaheer, Lecturer, Department of CS, Najran University

Abstract—Convolutional Neural Network (CNN) or ConvNet is a leading-edge deep learning model that has achieved phenomenal success in the tasks of image classification, object recognition, speech recognition and natural language processing. ConvNets are inherently complex architectures, and training them requires a significant amount of computation. There is a need to determine whether the GPU or the CPU provides the more effective platform for implementing ConvNets, yet very few studies have compared ConvNet implementations on both. The present work examines the impact of the GPU on the implementation of ConvNets. A ConvNet is trained on the MNIST dataset to perform classification of handwritten digits. The experiments have been performed on both CPU and GPU, and a performance improvement of 5 times in terms of training time speedup is observed on the GPU. The proposed work also investigates the effect of regularization, and the results show that regularization indeed reduces the problem of overfitting.

Index Terms—Convolutional Neural Networks (ConvNet), Deep Learning, Graphical Processing Unit (GPU), Handwritten Digits Classification

I. INTRODUCTION

Deep learning is a revolutionary technology that has brought about several transformations in the way a problem is attacked and its solution is framed. ConvNets are one of the phenomenal models of deep learning. They have been applied to a wide range of applications and have achieved remarkable results; their use cases include image classification, object detection, speech recognition, and natural language processing. The ConvNet breakthrough came in the ILSVRC challenge [1] in 2012, whose results showed a significant reduction in the error rate [2]. VGGNet [3], GoogleNet [4] and Overfeat [5] are some of the important ConvNet architectures that have been successful in improving accuracy and reducing the error rate. The number of parameters and the depth of the model contribute to the complexity of ConvNets, and thus implementing them on CPUs makes the model inefficient in terms of training time and overall speed of execution. ConvNets can leverage the computing power of GPUs to accelerate their performance by reducing the training time. Similar work is carried out in [6], where several parallel implementations of ConvNets on GPUs are compared; the authors provide insights and suggestions for convolution optimization on the GPU, memory usage and shape limitations during GPU kernel execution. Performance profiling is conducted to study the intrinsic characteristics of various ConvNet implementations on the GPU, and no single implementation is found to be best for all scenarios. Another study [7] investigates the power behavior and energy efficiency of numerous well-known ConvNet frameworks and provides a detailed workload characterization to facilitate energy-efficient deep learning solutions. The authors study the power behavior and energy efficiency of ConvNets on both CPU and GPU, with experiments conducted on an Intel Xeon CPU and Nvidia Kepler and Maxwell GPUs. Their observation is that network topology and batch size affect power and energy consumption, and they claim that their results can be utilized to design energy-efficient deep ConvNet frameworks and architectures.

Overfitting is a major problem in ConvNets that occurs when the model is trained on some set of features and recognizes the training data but does not generalize well to unseen data. To reduce this problem, a regularization technique called dropout is used. To investigate the effect of dropout, ConvNets are trained first without and then with dropout. Finally the model is trained on both CPU and GPU and the training times are compared. The contributions of the present work can be summarized as follows: a ConvNet is designed and implemented using the TensorFlow and Keras libraries on both CPU and GPU; the effect of dropout on the overfitting problem of the ConvNet is examined; a comparison of the training time of the ConvNet on CPU and GPU is performed; and the performance improvement obtained by executing the ConvNet on the GPU is analyzed.

The paper is organized as follows. In Section II, an overview of CNNs and of CNNs on the GPU is presented. In Section III, the experimental methodology is described. Section IV describes the evaluation methodology. Section V discusses the results. Lastly, Section VI concludes the paper.
II. CONVOLUTIONAL NEURAL NETWORKS (CONVNETS)
A ConvNet [8] is a sequence of layers that accepts the raw pixels of an image as input and computes class scores that predict the class to which the image belongs. ConvNets are similar to regular neural networks except that they are not fully connected in all layers; this makes them well suited to images because far fewer parameters are required. ConvNets are inspired by the biological visual cortex and, like the visual cortex, learn the features of an image hierarchically. The first layer learns about edges, lines and points, followed by the next layer, which understands combinations of these individual elements such as circles, rectangles and so on. This process continues until an entire image is formed from these smaller units. A special characteristic of ConvNets is their ability to learn the weights through the back propagation algorithm. ConvNets therefore do not require hand-crafted features, which was a major milestone in the machine learning field.

ConvNets are organized into the following layers. The input layer accepts the raw pixel values of the image. The conv layer performs a dot product of the filter and the image values. A filter is created for each feature; typical filter sizes are 3x3 or 5x5. The filter convolves, or slides, over the image and a dot product is computed between the weights of the filter and the pixel values of the image. The non-linearity layer applies an activation function; the most commonly used activation functions in ConvNets are ReLU and Leaky ReLU, and the more recent ELU has also been used. The subsampling or pooling layer reduces the size of the input volume, which makes the input more manageable and also decreases the number of parameters. Max pooling is widely used as the subsampling technique because it performs well in practice. The fully connected or dense layer classifies the image into one of the specified classes. All the previous layers are used to learn features of the image, and this layer uses the learned features together with a softmax activation function to perform the classification. Softmax assigns a probability value to each class for the input, and the sum of all output probabilities is equal to one. Dropout [9] regularization is included in the design after the pooling layer. Essentially, dropout converts the state of a few neurons to inactive, or drops them, so that they do not contribute to the learning process.
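To make the convolution and pooling operations above concrete, the following is a minimal NumPy sketch added here for illustration (it is not code from the implementation described in this paper): it slides a single 3x3 filter over a gray-scale image, computes the dot product at each position, applies a ReLU-style non-linearity, and then performs 2x2 max pooling. It uses 'valid' sliding for simplicity, whereas the network in Section III keeps the 28x28 spatial size after convolution.

import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product of filter weights and pixel values
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each size x size window, halving the spatial dimensions."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = feature_map[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

image = np.random.rand(28, 28)            # a toy gray-scale image
kernel = np.random.rand(3, 3)             # one 3x3 filter
fmap = conv2d_valid(image, kernel)        # 26x26 feature map ('valid' sliding)
pooled = max_pool2d(np.maximum(fmap, 0))  # ReLU-style non-linearity, then 2x2 max pooling
print(fmap.shape, pooled.shape)           # (26, 26) (13, 13)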
Figure 1 shows LeNet-5, a convolutional neural network architecture developed by Yann LeCun in 1998. It was used to recognize handwritten characters. The input to the network was a character image of size 32x32 and the output was a class score. There were 7 layers in the network containing trainable weights or parameters [10].

Fig. 1: LeNet-5 - Convolutional Neural Network.
A. ConvNets on GPU

The training time of ConvNets is high, which can be attributed to their complex architecture and also to massive datasets. AlexNet, the winner of ILSVRC 2012, has 8 layers (5 conv layers and 3 dense layers) and more than 60 million parameters. VGGNet has 19 layers (16 conv layers and 3 dense layers) and over 144 million parameters. GoogleNet, winner of ILSVRC 2014, comprises 22 layers with about 6.8 million parameters. The CIFAR-10 dataset consists of 50,000 color images in the training set and 10,000 in the test set, and ImageNet [11], a larger dataset, contains almost 15 million high resolution images. The challenge of training such massive datasets on complicated networks can be addressed by leveraging GPU-accelerated computing. A CPU consists of a few cores optimized for processing serial tasks, whereas a GPU embodies many cores optimized for handling massively parallel applications. Numerous deep learning frameworks are emerging with support for GPUs through the CUDA [12] and cuDNN [13] programming interfaces. Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia in 2006; CUDA programs execute on the GPU and therefore show significant performance improvement. The Nvidia CUDA Deep Neural Network (cuDNN) library is a GPU-accelerated library of primitives for deep neural networks. In order to implement ConvNets on the GPU using the tensorflow-gpu library, it is essential to install the CUDA and cuDNN libraries. The deep learning frameworks [14] include TensorFlow [15], Caffe [16], Torch [17], Theano [18] and Keras [19]. TensorFlow is an open source library for numerical computation; it represents operations as a data flow graph in which the nodes are operations and the edges are tensors, where a tensor is an array of any dimension. Keras is a high level API for neural networks, written in Python, that runs on top of TensorFlow.
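As a sanity check that a tensorflow-gpu installation actually sees the CUDA/cuDNN stack, a short TensorFlow 1.x-style sketch such as the following can be used. It is a generic check added here for illustration, not something taken from the paper.

import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can see; with tensorflow-gpu plus CUDA and cuDNN
# installed correctly, a '/device:GPU:0' entry should appear alongside the CPU.
print([d.name for d in device_lib.list_local_devices()])

# Optionally log where each operation is placed (TensorFlow 1.x session API).
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    print(sess.run(tf.matmul(a, b)))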
III. EXPERIMENTAL METHODOLOGY

A. Experimental Environment

The ConvNet model is implemented on a hybrid system consisting of both a CPU and a GPU. The CPU is an Intel i5 7th generation with a clock speed of 2.7 GHz; main memory is 8GB and hard disk capacity is 1TB. A single Nvidia GeForce 940MX GPU is used in this experiment. The GPU has 512 CUDA cores with a graphics clock of 795 MHz and 2GB of dedicated GDDR5 memory. Windows 10 64-bit is the operating system. The CPU-based ConvNets have been implemented primarily in Python 3.6 using TensorFlow 1.2 (CPU version), and the GPU-based ConvNets are implemented on TensorFlow-gpu 1.2 (GPU version). The dependencies for TensorFlow-gpu, CUDA 8.0 and cuDNN 6.0, have also been installed. The Keras 2.0 library and the Anaconda 4.0 64-bit distribution are also used.

The ConvNet is trained and tested on the MNIST dataset [20]. MNIST is a popular dataset that is widely used to demonstrate the effectiveness of deep learning. The dataset consists of images of handwritten digits 0 through 9, with 60,000 training and 10,000 testing images. Essentially there are 10 classes [0-9], and each image is associated with a label indicating the class to which it belongs. Each image is 28x28 pixels in gray scale; for processing, the image dimensions are converted into a vector of 784 values.
B. ConvNet Architecture

The design of the ConvNet consists of 5 blocks. Each block is composed of three layers: a convolutional (conv) layer, a non-linearity layer and a subsampling layer; when dropout is included in the model it becomes the fourth layer in the block. Out of the 5 blocks, 3 are conv blocks and 2 are dense blocks. The model is organized sequentially as a linear stack of layers of neurons. The model configuration is shown in Table I. The first column of the table denotes the block number, followed by the type of layer, the total number of filters used in each layer, and the dimensions of the output of each layer in the format height x width x depth. The final column indicates the total number of parameters generated in the corresponding layer. The conv layer, non-linearity layer and subsampling layer are grouped together into units called blocks. When a layer is fully connected it is called dense and is used instead of a conv layer, so in this design the last two blocks are composed of a dense layer and a non-linearity layer. The primary functionality of learning is carried out in the conv layer, and hence the operations of dot product and parameter generation are performed in this layer; the other two layers merely apply non-linearity and reduce the size of the conv layer's output. The flatten layer maps the dimensions to a single vector, which is fed into the dense layer of the network.

TABLE I: ConvNet configuration without dropout

Block | Layer      | Filters | Output (H x W x D) | Params
1     | Conv       | 32      | 28 x 28 x 32       | 320
1     | Leaky Relu | 32      | 28 x 28 x 32       | 0
1     | Max Pool   | 32      | 14 x 14 x 32       | 0
2     | Conv       | 64      | 14 x 14 x 64       | 18496
2     | Leaky Relu | 64      | 14 x 14 x 64       | 0
2     | Max Pool   | 64      | 7 x 7 x 64         | 0
3     | Conv       | 128     | 7 x 7 x 128        | 73856
3     | Leaky Relu | 128     | 7 x 7 x 128        | 0
3     | Max Pool   | 128     | 4 x 4 x 128        | 0
4     | Flatten    | -       | 2048               | 0
4     | Dense      | -       | 128                | 262272
4     | Leaky Relu | -       | 128                | 0
5     | Dense      | -       | 10                 | 1290
Block 1: The conv layer uses 32 filters of size 3x3 each. The total number of parameters generated is 320 (3x3x1x32 weights + 32 bias values). The result of this layer is a feature map of size 28x28x32. The non-linearity function used in this model is leaky ReLU. Subsampling of the feature map is then performed using a max-pooling operation of size 2x2, giving a reduced volume of 14x14x32.

Block 2: The conv layer uses 64 filters of size 3x3 each. The total number of parameters generated is 18496 (3x3x32x64 weights + 64 bias values). A feature map of volume 14x14x64 is the output of this layer. Leaky ReLU is applied next, and subsampling (size 2x2) using max-pooling reduces the volume to 7x7x64.

Block 3: The conv layer uses 128 filters of size 3x3 each. The total number of parameters generated is 73856 (3x3x64x128 weights + 128 bias values). The result is a feature map of size 7x7x128. Leaky ReLU is applied, and subsampling (size 2x2) using max-pooling results in a 4x4x128 volume.

Block 4: Composed of a dense layer and a leaky ReLU layer. The output of the previous layer, a 4x4x128 volume, is flattened to 2048 values and given as input to the dense layer. The 2048 flattened values are fully connected to 128 neurons, resulting in 262272 parameters (2048x128 weights + 128 bias values). Leaky ReLU is applied in this block as well.

Block 5: The dense layer here uses the softmax activation function to output the probability values. Since there are 10 classes, the output of this layer is 10 probability values whose sum is equal to one. The maximum probability value determines the predicted class of the image.
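The parameter counts in Table I follow directly from filter size x input depth x number of filters plus one bias per filter, and, for dense layers, inputs x units plus one bias per unit. A quick arithmetic check:

def conv_params(k, in_depth, filters):
    # k x k weights per input channel per filter, plus one bias per filter
    return k * k * in_depth * filters + filters

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return inputs * units + units

print(conv_params(3, 1, 32))           # 320    (block 1)
print(conv_params(3, 32, 64))          # 18496  (block 2)
print(conv_params(3, 64, 128))         # 73856  (block 3)
print(dense_params(4 * 4 * 128, 128))  # 262272 (block 4)
print(dense_params(128, 10))           # 1290   (block 5)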
The ConvNet configuration with dropout is identical to the configuration given in Table I except that dropout regularization is included in each block after the pooling layer. The dropout parameter in blocks one and two has a value of 0.25, the block-three dropout value is 0.4, and the last dropout has a value of 0.3. The architectures of the ConvNet on CPU and GPU are identical; the idea is to run the same network on both processors to determine the difference in their training time.
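The configuration described above can be expressed as a short Keras Sequential sketch. This is a reconstruction consistent with the output shapes and parameter counts reported in Table I, not the authors' exact code: it assumes 'same' padding (needed to keep 28x28 after the first conv and to obtain the 4x4x128 volume after the third pooling layer), an assumed LeakyReLU slope of 0.1 (the paper does not report it), and places the last dropout after the dense block since block 4 has no pooling layer.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, LeakyReLU

def build_convnet(use_dropout=False):
    """ConvNet matching Table I; optional dropout with the rates described above."""
    model = Sequential()
    # Block 1: 28x28x1 -> 28x28x32 -> 14x14x32 (320 params)
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 1)))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    if use_dropout:
        model.add(Dropout(0.25))
    # Block 2: -> 14x14x64 -> 7x7x64 (18496 params)
    model.add(Conv2D(64, (3, 3), padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    if use_dropout:
        model.add(Dropout(0.25))
    # Block 3: -> 7x7x128 -> 4x4x128 (73856 params)
    model.add(Conv2D(128, (3, 3), padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    if use_dropout:
        model.add(Dropout(0.4))
    # Block 4: flatten to 2048, dense 128 (262272 params)
    model.add(Flatten())
    model.add(Dense(128))
    model.add(LeakyReLU(alpha=0.1))
    if use_dropout:
        model.add(Dropout(0.3))
    # Block 5: 10-way softmax (1290 params)
    model.add(Dense(10, activation='softmax'))
    return model

build_convnet().summary()  # layer shapes and parameter counts should match Table I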
Before feeding the images to the ConvNet, they are preprocessed. Since the images are in gray shades, the depth of the images is specified as one, so the height, width and depth of each image become 28x28x1. The one-hot encoding method is used to assign the class label to the images. One-hot encoding is a process that converts categorical data into binary form: the class to which an image belongs gets a value of one and all the other classes receive a zero. For instance, suppose an image in the MNIST dataset has a class label of five and there are 10 classes; the one-hot encoding of that particular image would be [0,0,0,0,0,1,0,0,0,0]. In other words, a vector of binary numbers is defined to represent the class of the image. Subsequently the training data is split into a train set and a validation set. Typically in machine learning the dataset is divided into two sets, a training set and a test set, in the proportion of 80% and 20%; the training set is then further split into a train set and a validation set, so that altogether the data is partitioned into three parts of 60% train set, 20% validation set and 20% test set. This partitioning of the training set into two parts is done to avoid overfitting by adjusting the parameters on the validation set. After preprocessing the data using the reshaping, one-hot encoding and splitting operations, it is given as input to the ConvNet. The ConvNet is then trained, and finally the model is tested on the test set to determine its accuracy and loss.
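The preprocessing steps just described (reshape to 28x28x1, one-hot encode the labels, split off a validation set) can be sketched as follows. The use of keras.datasets.mnist, sklearn's train_test_split and the scaling of pixel values to [0, 1] are conveniences assumed here, not necessarily the authors' exact pipeline.

import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# MNIST ships pre-split into 60,000 training and 10,000 test images.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape to height x width x depth = 28x28x1; scaling to [0, 1] is an added convenience.
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode the labels: class 5 becomes [0,0,0,0,0,1,0,0,0,0].
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Hold out 20% of the training images for validation: 48,000 train / 12,000 validation.
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.2, random_state=42)

print(x_train.shape, x_val.shape, x_test.shape)
# (48000, 28, 28, 1) (12000, 28, 28, 1) (10000, 28, 28, 1)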
IV. EVALUATION METHODOLOGY
The ConvNets are trained and tested on the MNIST dataset, on the CPU using TensorFlow and on the GPU using tensorflow-gpu. To evaluate the model, six sets of experiments are conducted:
• Training the ConvNet on CPU without dropout.
• Training the ConvNet on CPU with dropout.
• Testing the ConvNets on CPU.
• Training the ConvNet on GPU without dropout.
• Training the ConvNet on GPU with dropout.
• Testing the ConvNets on GPU.

The training dataset of 60,000 images is divided into a training set of 48,000 images and a validation set of 12,000 images. The model is trained on the training and validation sets to determine the accuracy and loss. Figures 2 and 3 show the number of training epochs on the x-axis and accuracy or loss on the y-axis; training accuracy and loss are represented as dotted lines and validation accuracy and loss as solid lines. Figures 2(a) and 2(b) present the training accuracy and training loss of the ConvNet on CPU for ten epochs without the use of dropout, and Figures 2(c) and 2(d) show the training accuracy and training loss respectively on CPU with dropout. Accuracy is determined as the number of correctly classified instances divided by the total number of instances. The categorical crossentropy loss function of Keras has been invoked for computing the loss associated with the training set and the validation set. Crossentropy is a log likelihood loss function that computes the loss of the network; the network reduces this loss by learning the parameters. Figures 3(a) and 3(b) present the training accuracy and training loss respectively of the ConvNet on GPU for ten epochs without the use of dropout, and Figures 3(c) and 3(d) show the training accuracy and training loss on GPU with dropout. All eight figures show an increase in accuracy and a decrease in loss with each epoch; the difference lies in the training time, which is much larger on the CPU than on the GPU. To update the network weights during the training process, the adam optimization algorithm [21] is chosen. In adam the learning rate is adapted as the iterations progress; the name is derived from the phrase adaptive moment estimation. An alternative to the adam optimizer is stochastic gradient descent (SGD). Adam differs from SGD in that it adapts the learning rate as learning progresses, whereas SGD uses a fixed learning rate throughout the training process. A fixed learning rate may not account for the variations that arise as learning unfolds, and adapting the learning rate helps the model converge quickly; thus the learning process improves with changing learning rates.

Fig. 2: CPU accuracy and loss. (a) Accuracy without dropout, (b) Loss without dropout, (c) Accuracy with dropout, (d) Loss with dropout.

Fig. 3: GPU accuracy and loss. (a) Accuracy without dropout, (b) Loss without dropout, (c) Accuracy with dropout, (d) Loss with dropout.
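Putting the pieces together, the training loop described above (adam optimizer, categorical crossentropy loss, batch size 64, 10 epochs) with per-epoch timing can be sketched as below. It builds on build_convnet and the preprocessed arrays from the earlier sketches; the EpochTimer callback is a hypothetical helper added here to record per-epoch times of the kind reported in Tables III and IV, not something from the original code.

import time
from keras.callbacks import Callback

class EpochTimer(Callback):
    """Record the wall-clock time of each training epoch."""
    def on_train_begin(self, logs=None):
        self.epoch_times = []
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.time() - self._start)

model = build_convnet(use_dropout=True)  # from the sketch in Section III-B
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

timer = EpochTimer()
history = model.fit(x_train, y_train,
                    batch_size=64, epochs=10,
                    validation_data=(x_val, y_val),
                    callbacks=[timer])

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print('per-epoch seconds:', [round(t, 1) for t in timer.epoch_times])
print('test loss / accuracy:', test_loss, test_acc)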
V. RESULTS AND DISCUSSIONS
Table II shows the performance improvement gained when the model is implemented on the GPU. The Mode column takes the values TWOD, TWD, VWOD, VWD, TeWOD and TeWD, which denote training without dropout, training with dropout, validation without dropout, validation with dropout, testing without dropout and testing with dropout respectively.

TABLE II: Accuracy and loss of ConvNets

Mode  | Metric   | CPU    | GPU
TWOD  | Accuracy | 0.9978 | 0.9977
TWOD  | Loss     | 0.0079 | 0.0073
TWD   | Accuracy | 0.9903 | 0.9904
TWD   | Loss     | 0.0305 | 0.0311
VWOD  | Accuracy | 0.9905 | 0.9904
VWOD  | Loss     | 0.0449 | 0.0423
VWD   | Accuracy | 0.9927 | 0.9928
VWD   | Loss     | 0.0270 | 0.0288
TeWOD | Accuracy | 0.9910 | 0.9918
TeWOD | Loss     | 0.0344 | 0.0319
TeWD  | Accuracy | 0.9932 | 0.9937
TeWD  | Loss     | 0.0206 | 0.0213

The results show that the use of dropout has reduced the problem of overfitting. The validation loss on CPU without dropout was 0.0449, which is reduced to 0.0270 with the use of dropout; likewise the loss on GPU without dropout is 0.0423 and with dropout is 0.0288. Hence dropout reduces the overfitting problem. The performance of training the ConvNet on CPU and GPU is compared in Table III, in which the time taken to execute each epoch and the speedup achieved are presented. The maximum speedup realized in the model without dropout is about 5 times and the average speedup is 4.4. The average training time per epoch on CPU without dropout is 148.1 secs and with dropout is 168.3 secs. Table IV presents the performance when the model is implemented with dropout; it is observed that GPU execution is faster by 5.06 times on average. The average training time per epoch on GPU without dropout is 34 secs and with dropout is 33 secs. The results show that dropout increases the training time on the CPU while decreasing it on the GPU.
The hyperparameter settings are the same for all the models. The batch size is 64; when the batch size increases, the memory requirement also increases, so a typical batch size is 64 or 128. The experiments have been conducted for 10 epochs, for which the accuracy achieved is over 99 percent. Figures 4 and 5 plot the training time in seconds as a function of the number of epochs: Figure 4 presents the training time taken for the corresponding epochs when the model is trained without the use of dropout, and Figure 5 shows the training time with respect to epochs with the use of dropout. The accuracy achieved when the models are trained on CPU is similar to the accuracy achieved when the models are trained on GPU. Empirical results show that the training time improves significantly when the models are trained on GPU, and dropout regularization plays a vital role in reducing the problem of overfitting.

Fig. 4: Performance comparison for 10 epochs without dropout on CPU and GPU.

Fig. 5: Performance comparison for 10 epochs with dropout on CPU and GPU.

TABLE III: Comparison of ConvNet implementation without dropout

Epoch | CPU (secs) | GPU (secs) | Speedup (x)
1     | 145        | 46         | 3.1
2     | 147        | 31         | 4.7
3     | 147        | 31         | 4.7
4     | 146        | 32         | 4.5
5     | 148        | 37         | 4.0
6     | 149        | 37         | 4.0
7     | 147        | 36         | 4.0
8     | 153        | 30         | 5.1
9     | 149        | 30         | 4.9
10    | 150        | 30         | 5.0
Avg   | 148.1      | 34         | 4.4

TABLE IV: Comparison of ConvNet implementation with dropout

Epoch | CPU (secs) | GPU (secs) | Speedup (x)
1     | 170        | 34         | 5.0
2     | 170        | 32         | 5.3
3     | 166        | 33         | 5.0
4     | 164        | 33         | 4.9
5     | 173        | 33         | 5.2
6     | 167        | 33         | 5.0
7     | 165        | 33         | 5.0
8     | 168        | 33         | 5.0
9     | 172        | 33         | 5.2
10    | 168        | 33         | 5.06
Avg   | 168.3      | 33         | 5.06
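The speedup figures in Tables III and IV are simply the ratio of CPU to GPU time for each epoch; the Avg rows can be reproduced from the per-epoch times, assuming the per-epoch speedups are kept to one decimal place as they appear in the tables.

# Per-epoch training times (seconds) from Table III (without dropout) and Table IV (with dropout).
cpu_wo = [145, 147, 147, 146, 148, 149, 147, 153, 149, 150]
gpu_wo = [46, 31, 31, 32, 37, 37, 36, 30, 30, 30]
cpu_w = [170, 170, 166, 164, 173, 167, 165, 168, 172, 168]
gpu_w = [34, 32, 33, 33, 33, 33, 33, 33, 33, 33]

def speedups(cpu_secs, gpu_secs):
    """Per-epoch speedup = CPU time / GPU time, kept to one decimal as in the tables."""
    return [int(10 * c / g) / 10 for c, g in zip(cpu_secs, gpu_secs)]

s_wo = speedups(cpu_wo, gpu_wo)
s_w = speedups(cpu_w, gpu_w)
print(round(sum(cpu_wo) / 10, 1), round(sum(gpu_wo) / 10, 1), round(sum(s_wo) / 10, 2))
# 148.1 34.0 4.4   -> the Avg row of Table III
print(round(sum(cpu_w) / 10, 1), round(sum(gpu_w) / 10, 1), round(sum(s_w) / 10, 2))
# 168.3 33.0 5.06  -> the Avg row of Table IV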
VI. CONCLUSION AND FUTURE WORK

Deep learning is solving problems in almost every domain of the real world, and ConvNets are among its most phenomenal models. In this study, an empirical work to design and implement ConvNets to classify images of handwritten digits is presented. The ConvNets have been implemented on both CPU and GPU to comprehend the difference in performance. It is observed from the results that there is no difference in terms of accuracy; however, there is a significant training time speedup on the GPU, which turns out to be about 5 times. The primary effect of reducing overfitting, from 0.0449 to 0.0270 on CPU and from 0.0423 to 0.0288 on GPU when dropout is used, is shown through the experiments; hence the use of dropout has reduced the problem of overfitting. The impact of dropout on training time is also examined on both CPU and GPU, and the results show an increase in the training time on the CPU and a decrease in the training time
on the GPU when dropout is included in the model. Finally, the results of this study demonstrate that ConvNets are more suitable for implementation on the GPU. To further delve into the performance characterization of ConvNets on CPU and GPU, different deep learning frameworks and libraries can be considered, larger datasets like CIFAR-10 or CIFAR-100 can be experimented with, and various parameters such as stride, filter size, number of filters, or padding can be changed and their effect measured.

REFERENCES

[1] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "Imagenet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," preprint arXiv:1409.1556, 2014.
[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich et al., "Going deeper with convolutions," in CVPR, 2015.
[5] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "Overfeat: Integrated recognition, localization and detection using convolutional networks," preprint arXiv:1312.6229, 2013.
[6] X. Li, G. Zhang, H. H. Huang, Z. Wang, and W. Zheng, "Performance analysis of gpu-based convolutional neural networks," in 45th International Conference on Parallel Processing. IEEE, 2016, pp. 67–76.
[7] D. Li, X. Chen, M. Becchi, and Z. Zong, "Evaluating the energy efficiency of deep convolutional neural networks on cpus and gpus," in IEEE International Conferences on Big Data and Cloud Computing, Social Computing and Networking, Sustainable Computing and Communications. IEEE, 2016, pp. 477–484.
[8] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision. Springer, 2014, pp. 818–833.
[9] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[11] ImageNet, "ImageNet," http://www.image-net.org/.
[12] NVIDIA, "Cuda zone," https://developer.nvidia.com/cuda-zone.
[13] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, "cudnn: Efficient primitives for deep learning," preprint arXiv:1410.0759, 2014.
[14] S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah, "Comparative study of deep learning software frameworks," preprint arXiv:1511.06435, 2015.
[15] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "Tensorflow: A system for large-scale machine learning," in OSDI, vol. 16, 2016, pp. 265–283.
[16] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675–678.
[17] R. Collobert, C. Farabet, K. Kavukcuoglu et al., "Torch," in Workshop on Machine Learning Open Source Software, NIPS, vol. 113, 2008.
[18] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio, "Theano: new features and speed improvements," preprint arXiv:1211.5590, 2012.
[19] F. Chollet et al., "Keras: Deep learning library for theano and tensorflow," https://keras.io/, 2015.
[20] Y. LeCun, C. Cortes, and C. Burges, "MNIST handwritten digit database," http://yann.lecun.com/exdb/mnist/.
[21] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," preprint arXiv:1412.6980, 2014.
