Abstract—To improve the visual quality of low-resolution license plate images in video surveillance systems, this paper proposes a new super-resolution method, namely the multi-scale super-resolution convolutional neural network (MSRCNN), inspired by the Inception architecture of GoogLeNet. The proposed method applies filters of different sizes in parallel convolutions to extract different features of the low-resolution license plate image, and the features are then fused by a concatenation layer. Finally, the high-resolution image is reconstructed through non-linear mapping. Experimental results show that the proposed method clearly improves the quality of low-resolution license plate images.

Keywords-super-resolution; convolutional neural network; license plate image

I. INTRODUCTION

With the rapid increase in the number of motor vehicles, the license plate plays an important role in the field of public security. However, in real-world video surveillance, due to the camera's location, lighting conditions and other factors, the license plate images obtained are mostly of poor visual quality and low resolution. This limits the use of license plate images for public security. Hence, improving the resolution of license plate images is important in practical applications.

To obtain high-resolution license plate images, the most direct way is to upgrade the hardware, but the cost is high. Super-resolution (SR) techniques have therefore become the main alternative. SR uses signal-processing techniques to reconstruct a high-resolution (HR) image from a low-resolution (LR) image [1]. SR methods are usually divided into three categories: interpolation-based methods [2] [3], reconstruction-based methods [4] [5] and learning-based methods [6] [7]. Interpolation-based methods are simple and fast, but the results are over-smoothed and suffer from severe ringing artifacts. Reconstruction-based methods use mathematical models to reconstruct high-resolution images, but the computation is complicated. Learning-based methods make full use of the inherent prior knowledge of images; they preserve image details well and are suitable for special classes of images, such as spectral images [8], infrared images [9] and medical images [10]. However, learning-based methods require a large number of training samples, and their training time cost is high.

In this paper, we propose a new super-resolution method for license plate images, namely the multi-scale super-resolution convolutional neural network (MSRCNN). The method consists of three convolutional layers: the first layer extracts and fuses features of the LR image, the second layer maps features from the LR image to the HR image, and the third layer reconstructs the HR image. Experiments show that the proposed model effectively improves the quality of the super-resolved image, especially the reconstruction of edge information. The rest of the paper is organized as follows. Section II explains the proposed SR method in detail, Section III provides our experimental results, and Section IV concludes the paper.

II. PROPOSED METHOD

A. GoogLeNet Convolutional Network

Convolutional neural networks keep getting bigger, and their performance keeps improving. In general, the performance of a neural network can be improved by adding layers and increasing the number of filters per layer. GoogLeNet [11], the champion of ImageNet 2014, follows a network-in-network design. Its main module, the Inception architecture, is itself a tiny network in which the image is simultaneously convolved with filters of different sizes to obtain different feature maps; these feature maps are then concatenated into a new fused feature map. This architecture reduces the influence of filter size on network performance and extracts more features, which is why GoogLeNet performs extraordinarily well. Inspired by the Inception architecture, the proposed method uses different filter sizes in the first layer to extract more features and improve image reconstruction performance.

B. Multi-scale Super-resolution Convolutional Neural Network

The super-resolution convolutional neural network (SRCNN) [12] [13] learns an end-to-end mapping between LR images and the corresponding HR images. SRCNN experimentally verified that adjusting the number of layers, the filter size, and the number of filters in each layer can affect the reconstruction performance.
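As a rough illustration (not the authors' code), the Inception-style first layer — parallel convolutions at several filter sizes followed by ReLU and channel concatenation — can be sketched in NumPy. The filter sizes (3, 5, 7) and the filter count of 4 per branch below are placeholder assumptions, since the paper has not fixed them at this point:

```python
import numpy as np

def conv2d_same(img, kernels):
    """Naive 'same'-padded 2D convolution of a single-channel image
    with a bank of square kernels; returns an (H, W, n_filters) array."""
    n, f, _ = kernels.shape
    pad = f // 2
    padded = np.pad(img, pad, mode="edge")
    H, W = img.shape
    out = np.empty((H, W, n))
    for k in range(n):
        for i in range(H):
            for j in range(W):
                out[i, j, k] = np.sum(padded[i:i + f, j:j + f] * kernels[k])
    return out

def multi_scale_features(img, banks, biases):
    """Multi-scale first layer: convolve the image in parallel with
    several filter banks of different sizes, apply ReLU, then
    concatenate the resulting feature maps channel-wise."""
    maps = [np.maximum(0.0, conv2d_same(img, W) + B)
            for W, B in zip(banks, biases)]
    return np.concatenate(maps, axis=-1)

# Hypothetical usage: three branches with filter sizes 3, 5, 7
# and 4 filters each, so the fused map has 12 channels.
np.random.seed(0)
img = np.random.rand(16, 16)
banks = [np.random.randn(4, f, f) * 0.1 for f in (3, 5, 7)]
biases = [np.zeros(4) for _ in range(3)]
F1 = multi_scale_features(img, banks, biases)
print(F1.shape)  # (16, 16, 12)
```

Because every branch uses 'same' padding, the three feature maps share the input's spatial size and can be stacked directly, mirroring the concatenation step of the Inception module.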
Figure 1. Architecture of the proposed MSRCNN: the low-resolution image (input) is convolved in parallel with filters of different sizes, the resulting feature maps are fused by concatenation (Concat), and the network outputs the high-resolution image (output).
1) Feature extraction and representation

In the first layer, the image is simultaneously convolved by three different filters $f_{11}$, $f_{12}$, and $f_{13}$ (with paddings $pad_{11}$, $pad_{12}$, $pad_{13}$, respectively). The three resulting feature maps are concatenated into a new fused feature map, which is the input of the next layer. Feature extraction and representation are expressed as:

$$F_1(Y) = \max(0, W_{11} * Y + B_{11}) \oplus \max(0, W_{12} * Y + B_{12}) \oplus \max(0, W_{13} * Y + B_{13}) \quad (1)$$

where $f_{11} \times f_{11}$, $f_{12} \times f_{12}$, $f_{13} \times f_{13}$ are the filter sizes, and $W_{11}$, $W_{12}$, $W_{13}$, $B_{11}$, $B_{12}$, $B_{13}$, $n_{11}$, $n_{12}$, $n_{13}$ are the filter weights, filter biases, and numbers of filters, respectively. Here $B_{11}$, $B_{12}$, $B_{13}$ are $n_1$-dimensional vectors, assuming $n_{11} = n_{12} = n_{13} = n_1$; '$*$' denotes the convolution operation and '$\oplus$' denotes concatenation.

The final HR image is obtained by convolving the $n_2$-dimensional high-resolution features with the reconstruction filter. Image reconstruction is expressed as:

$$F(Y) = W_3 * F_2(Y) + B_3 \quad (3)$$

where $W_3$ is the filter weight corresponding to an $f_3 \times f_3$ filter and $B_3$ is a scalar.

C. Loss function

SRCNN [12] [13] uses the Mean Squared Error (MSE) as the loss function for measuring the accuracy of the training model and updating its parameters. In this paper we also use MSE as the loss function of the neural network; its expression is:

$$L(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \left\| F(Y_i; \Theta) - X_i \right\|^2 \quad (4)$$
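As a minimal sketch of Eq. (4) — assuming the sum runs over the n training pairs and the norm is the Euclidean norm over all pixels — the loss can be written as:

```python
import numpy as np

def mse_loss(preds, targets):
    """Eq. (4): L(Theta) = (1/n) * sum_i ||F(Y_i; Theta) - X_i||^2,
    averaged over the n training pairs."""
    n = len(preds)
    return sum(float(np.sum((f - x) ** 2))
               for f, x in zip(preds, targets)) / n

# Example: one 2x2 prediction of all ones against an all-zero target
# gives a squared error of 4 over a single pair.
loss = mse_loss([np.ones((2, 2))], [np.zeros((2, 2))])
print(loss)  # 4.0
```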
set is 10 (Set10), five of which are the same as Set5. The upscale factor is 3 in this paper.

Figure 2. Testing set of Dataset1 (Set5)

This experiment uses Dataset1 and Dataset2 to train the SRCNN model, respectively. Set5 is tested with the SRCNN model [12], the SRCNN Dataset1 model, and the SRCNN Dataset2 model, as shown in Fig. 3. Note that the SRCNN model [12] was trained by its authors on ImageNet, so only its final reconstruction results can be obtained; in order to compare with our experimental results, we draw it as a line in Fig. 3. The experiment shows that the more training data, the better the performance. Moreover, for license plate image reconstruction, license plate training sets perform better than natural image sets. Therefore, the parameters of MSRCNN2 are adjusted in the next section to further improve the reconstruction quality.

C. Model Parameters

1) The number of filters

In MSRCNN2, the number of filters in each layer is n11 = n12 = n13 = 32, n2 = 32, n3 = 1. This experiment changes the number of filters in the first and second layers, giving MSRCNN4 (n11 = n12 = n13 = 64, n2 = 64, n3 = 1) and MSRCNN5 (n11 = n12 = n13 = 16, n2 = 16, n3 = 1). The testing results on Dataset2 are shown in Fig. 5.
MSRCNN2 model is still the best choice. The PSNR and SSIM values of the models are shown in Table I.
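For reference, PSNR — the fidelity metric reported alongside SSIM in Table I — can be computed as follows. This is the standard definition assuming 8-bit images with a peak value of 255; the paper does not spell out its exact variant:

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher means a closer match)."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(reconstructed, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical example: a uniform error of 16 gray levels on an
# 8-bit image yields an MSE of 256, i.e. about 24.05 dB.
ref = np.zeros((8, 8))
print(psnr(ref, ref + 16.0))
```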
(Figure: visual comparison of reconstruction results — Original / PSNR, Bi-cubic / 26.4929)
Model (training set)   PSNR / SSIM on the seven test images
MSRCNN6 (Dataset2)     31.4126 / 0.9115   28.7354 / 0.9107   31.8657 / 0.9108   30.9261 / 0.9298   33.6527 / 0.9467   31.3185 / 0.9219   31.2830 / 0.8916
MSRCNN7 (Dataset2)     32.7722 / 0.9375   29.2593 / 0.9217   33.1222 / 0.9380   32.4821 / 0.9423   34.5336 / 0.9560   32.4388 / 0.9391   32.4116 / 0.9152
MSRCNN8 (Dataset2)     32.8434 / 0.9362   29.4160 / 0.9240   33.4530 / 0.9354   32.7366 / 0.9451   34.4259 / 0.9542   32.5750 / 0.9389   32.4876 / 0.9170