arXiv:1803.08375v1 [cs.NE] 22 Mar 2018

ABSTRACT

We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU serves as an activation function in DNNs, with the softmax function as the classification function.

2 METHODOLOGY

2.1 Machine Intelligence Library
2.4.1 Softmax. Deep learning solutions to classification problems usually employ the softmax function as their classification function (last layer). The softmax function specifies a discrete probability distribution for K classes, denoted by $\sum_{k=1}^{K} p_k$.

If we take x as the activation at the penultimate layer of a neural network, and θ as its weight parameters at the softmax layer, we have o as the input to the softmax layer,

    o = \sum_{i}^{n-1} \theta_i x_i    (2)

Consequently, we have

    p_k = \frac{\exp(o_k)}{\sum_{k=0}^{n-1} \exp(o_k)}    (3)

Hence, the predicted class ŷ would be

    \hat{y} = \arg\max_{i \in 1, \ldots, N} p_i    (4)

… prediction units (see Eq. 6). The θ parameters are then learned by backpropagating the gradients from the ReLU classifier. To accomplish this, we differentiate the ReLU-based cross-entropy function (see Eq. 7) w.r.t. the activation of the penultimate layer,

    \ell(\theta) = -\sum y \cdot \log\big(\max(0, \theta x + b)\big)    (6)

Let the input x be replaced by the penultimate activation output h,

    \frac{\partial \ell(\theta)}{\partial h} = -\frac{\theta \cdot y}{\max(0, \theta h + b) \cdot \ln 10}    (7)

The backpropagation algorithm (see Eq. 8) is the same as in a conventional softmax-based deep neural network,

    \frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i} \left[ \frac{\partial \ell(\theta)}{\partial p_i} \left( \sum_{k} \frac{\partial p_i}{\partial o_k} \frac{\partial o_k}{\partial \theta} \right) \right]    (8)

Algorithm 1 shows the rudimentary gradient-descent algorithm for a DL-ReLU model, with the parameter update

    \theta = \theta - \alpha \cdot \nabla_{\theta}\, \ell(\theta; x^{(i)})

Any standard gradient-based learning algorithm may be used; we used adaptive momentum estimation (Adam) in our experiments.
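To make the contrast concrete, the softmax classifier (Eqs. 2–4) and the ReLU classifier with its cross-entropy loss (Eq. 6) and gradient-descent update can be sketched in NumPy. This is an illustrative sketch, not code from the paper's repository: the array shapes, the eps guard against log(0), and the plain SGD step (the paper used Adam in its experiments) are our own assumptions.

```python
import numpy as np

def softmax(o):
    # Eq. 3: p_k = exp(o_k) / sum_k exp(o_k); shift by the max for stability
    e = np.exp(o - o.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def relu_classifier_forward(x, theta, b):
    # Prediction units of the ReLU classifier: max(0, theta x + b), as in Eq. 6
    return np.maximum(0.0, x @ theta + b)

def relu_cross_entropy(x, y, theta, b, eps=1e-8):
    # Eq. 6: l(theta) = -sum y * log(max(0, theta x + b));
    # eps (an assumption) keeps log finite when a unit is inactive (output 0)
    p = relu_classifier_forward(x, theta, b)
    return -np.sum(y * np.log(p + eps))

def sgd_step(theta, grad, alpha=0.01):
    # Rudimentary gradient-descent update: theta <- theta - alpha * grad
    return theta - alpha * grad

# Toy example: one sample, 4 penultimate activations, 3 classes (hypothetical sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))        # penultimate-layer activation
theta = rng.normal(size=(4, 3))    # last-layer weights
b = np.zeros(3)
y = np.array([[0.0, 1.0, 0.0]])    # one-hot label

p_softmax = softmax(x @ theta + b)                  # conventional classifier (Eq. 3)
p_relu = relu_classifier_forward(x, theta, b)       # proposed classifier
y_hat = int(np.argmax(p_relu, axis=-1)[0])          # predicted class, as in Eq. 4
loss = relu_cross_entropy(x, y, theta, b)
```

Note that unlike the softmax outputs, the ReLU prediction units are not normalized to sum to one; the predicted class is still taken as the argmax over the units.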
… Keras[4]) used in the experiments. The last layer, dense_2, used either the softmax classifier or the ReLU classifier, depending on the experiment. The softmax- and ReLU-based models had the same hyper-parameters, which may be seen in the Jupyter Notebook found in the project repository: https://github.com/AFAgarap/relu-classifier.

Table 1: Architecture of VGG-like CNN from Keras[4].

Layer (type)                   Output Shape        Param #
conv2d_1 (Conv2D)              (None, 14, 14, 32)  320
conv2d_2 (Conv2D)              (None, 12, 12, 32)  9248
max_pooling2d_1 (MaxPooling2)  (None, 6, 6, 32)    0
dropout_1 (Dropout)            (None, 6, 6, 32)    0
conv2d_3 (Conv2D)              (None, 4, 4, 64)    18496
conv2d_4 (Conv2D)              (None, 2, 2, 64)    36928
max_pooling2d_2 (MaxPooling2)  (None, 1, 1, 64)    0
dropout_2 (Dropout)            (None, 1, 1, 64)    0
flatten_1 (Flatten)            (None, 64)          0
dense_1 (Dense)                (None, 256)         16640
dropout_3 (Dropout)            (None, 256)         0
dense_2 (Dense)                (None, 10)          2570

Table 3: MNIST Classification. Comparison of FFNN-Softmax and FFNN-ReLU models in terms of % accuracy. The training cross validation is the average cross validation accuracy over 10 splits. Test accuracy is on unseen data. Precision, recall, and F1-score are on unseen data.

Metrics / Models           FFNN-Softmax  FFNN-ReLU
Training cross validation  ≈ 99.29%      ≈ 98.22%
Test accuracy              97.98%        97.77%
Precision                  0.98          0.98
Recall                     0.98          0.98
F1-score                   0.98          0.98
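As an illustrative sketch (not the repository's code), the architecture in Table 1 can be rebuilt in Keras. The layer widths and kernel sizes follow directly from the table's parameter counts (e.g. 320 = 3·3·1·32 + 32 implies 3×3 kernels on a single-channel input); the (16, 16, 1) input shape is inferred from the listed output shapes under 'valid' 3×3 convolutions, and the dropout rates (0.25, 0.25, 0.5) are assumptions borrowed from the Keras VGG-like example, since dropout layers contribute no parameters to the table.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_vgg_like_cnn(classifier="softmax"):
    """Rebuild Table 1; `classifier` is 'softmax' or 'relu' for dense_2."""
    return keras.Sequential([
        # (16, 16, 1) input is inferred from Table 1's output shapes (assumption)
        keras.Input(shape=(16, 16, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),  # -> (14, 14, 32), 320 params
        layers.Conv2D(32, (3, 3), activation="relu"),  # -> (12, 12, 32), 9248 params
        layers.MaxPooling2D((2, 2)),                   # -> (6, 6, 32)
        layers.Dropout(0.25),                          # rate assumed
        layers.Conv2D(64, (3, 3), activation="relu"),  # -> (4, 4, 64), 18496 params
        layers.Conv2D(64, (3, 3), activation="relu"),  # -> (2, 2, 64), 36928 params
        layers.MaxPooling2D((2, 2)),                   # -> (1, 1, 64)
        layers.Dropout(0.25),                          # rate assumed
        layers.Flatten(),                              # -> (64,)
        layers.Dense(256, activation="relu"),          # 16640 params
        layers.Dropout(0.5),                           # rate assumed
        # dense_2: softmax in the baseline, ReLU in the DL-ReLU variant
        layers.Dense(10, activation=classifier),       # 2570 params
    ])

softmax_model = build_vgg_like_cnn("softmax")
relu_model = build_vgg_like_cnn("relu")
```

Swapping only the `activation` of dense_2 mirrors the experimental setup, in which the softmax- and ReLU-based models shared all other hyper-parameters.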
5 ACKNOWLEDGMENT
We appreciate the VGG-like convnet source code from Keras[4], as it was the CNN model used in this study.
REFERENCES
[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org.
[2] Abien Fred Agarap. 2017. A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection in Network Traffic Data. arXiv preprint arXiv:1709.03082 (2017).
[3] Abdulrahman Alalshekmubarak and Leslie S. Smith. 2013. A novel approach combining recurrent neural network and support vector machines for time series classification. In Innovations in Information Technology (IIT), 2013 9th International Conference on. IEEE, 42–47.
[4] François Chollet et al. 2015. Keras. https://github.com/keras-team/keras. (2015).
[5] Jan K. Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances