Neurocomputing
A R T I C L E  I N F O

Communicated by X. Li

Keywords: Binarization; Thresholding; Document image

A B S T R A C T

Document image binarization refers to the conversion of a document image into a binary image. For broken and severely degraded document images, binarization is a very challenging process. Unlike the traditional methods that separate the foreground from the background, this paper presents a new framework for binarizing broken and degraded document images and restoring their quality. In our approach, the non-local means method is extended and used to remove noise from the input document image in the pre-processing step. The proposed method then binarizes the document image, taking advantage of the quick adaptive thresholding proposed by Pierre D. Wellner. Finally, to get more pleasing binarization results, the binarized document image is post-processed. There are three measures in the post-processing step: de-speckling, preserving stroke connectivity and improving the quality of text regions. Experimental results show significant improvement in the binarization of broken and degraded document images collected from various sources including degraded and broken books, magazines and document files.
1. Introduction

Historical and ancient document collections available in libraries throughout the world are of great cultural and scientific importance. The transformation of such documents into digital form is essential for maintaining the quality of the originals while providing scholars with full access to that information [1]. However, document images may be broken and degraded because of the poor quality of paper, the printing process, ink blots and fading, document aging, extraneous marks, noise from scanning, etc. [2]. Restoring and enhancing the quality of broken and degraded document images is a common and essential requirement in libraries.

Compared with traditional methods, e.g. median filtering [3], Wiener filtering [4] and Bayesian filtering [5], document image binarization plays a key role in document processing, since its performance quite critically affects the degree of success of subsequent character segmentation and recognition [6]. The commonly used binarization approaches can be classified into two types: global methods and local adaptive methods. Global document image binarization methods find a single threshold for the whole document image; they deal well with background noise but usually over-threshold the document image, resulting in broken character strokes. Unlike global approaches, local adaptive document image binarization [7,1] determines the threshold for each pixel adaptively and behaves better than global methods on degraded and broken document images. Adaptive binarization can also reduce noise while segmenting text from the background.

Recent efforts on adaptive methods include [8,9]. For broken document images specifically (see Fig. 1), Stubberud et al. [10] trained an adaptive restoration filter and applied it to distorted text images that the OCR system could not recognize. Adrian et al. [11] proposed linking broken character borders with variable-sized masks to improve recognition accuracy. Banerjee et al. [2] proposed a contextual restoration which learns the text model from the degraded document itself; touching and broken characters were corrected by the algorithm. Lazzara et al. [12] recently improved Sauvola's method using a multiscale scheme. Milyaev et al. [13] proposed a binarization method for accurate scene text understanding. Recently, Ntirogiannis et al. [14] proposed a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images.

Fig. 1. A broken document image (a) and the result (c) of our proposed binarization. (b) is the direct OCR result of (a), and (d) is the corresponding OCR result of (c).

However, these methods cannot prevent missing punctuation marks. In addition to broken document images, various methods for degraded document images have been proposed. Bal [15] proposed a language-independent semi-automated system for enhancing degraded document images that is capable of exploiting inter- and intra-document coherence. Likforman-Sulem et al. [16] compared two image restoration approaches for the pre-processing of printed documents, arguing that high-pass filtering may remove stains and holes. Lelore et al. [17] introduced a technique based on a Markov Random Field (MRF) model of the handwritten document, whose observation model was estimated by an expectation maximization (EM) algorithm; the method showed greater sensitivity to thin lines than other techniques. In recent efforts, Biswas et al. [18] proposed a global-to-local approach for this issue and Singh et al. [19] proposed a method for severely degraded and non-uniformly illuminated documents.

To overcome the shortcomings mentioned above, we propose a framework that contains three steps. First, the document images are pre-processed using the non-local means denoising method. Secondly, we extend Wellner's quick adaptive thresholding method based on the histogram and use Rosenfeld's method to determine the threshold for each pixel. Finally, to get pleasing binarization results, three post-processing measures are applied to the thresholded document images: de-speckling, preserving strokes and improving the quality of text regions. Experiments and evaluations are performed to test the proposed method.

The structure of the paper is organized as follows. Section 2 describes the quick adaptive thresholding. Section 3 presents the framework of our proposed method in detail. The experimental results and discussions are described in Section 4. Section 5 contains conclusions and our future work.

⁎ Corresponding author at: Department of Computer Science, School of Information Science and Engineering, Xiamen University, Xiamen, China.
http://dx.doi.org/10.1016/j.neucom.2016.12.058
Received 17 February 2015; Received in revised form 2 September 2016; Accepted 26 December 2016; Available online 06 January 2017
0925-2312/ © 2017 Elsevier B.V. All rights reserved.
Y. Chen, L. Wang, Neurocomputing 237 (2017) 272–280

2. Quick adaptive thresholding

Fig. 3. Traverses the image alternately from the left and from the right.

h(n) = (g(n) + g(n − width)) / 2    (4)

where width is the width of the image. g(n) (in Eq. (3)) and h(n) (in Eq. (4)) are further approximations of the sum f(n). h(n) rids the algorithm of the every-other-line effect and produces consistently good results across a wide range of images. The method works well because comparing a pixel to the average of its neighborhood pixels preserves hard contrast lines and ignores soft gradient changes. The advantage of the method is that it needs just one pass through the image. However, there are several problems with Wellner's method. First, the method is very sensitive to the scanning order. Second, the moving average is not a good representation of the neighborhood pixels, since similar pixels are not evenly distributed in all directions.

3. Our proposed method

The proposed method for broken and degraded document image binarization is illustrated in Fig. 4. All steps of the method are described in the following subsections.
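To make the moving-average scheme of Section 2 concrete, the following is a minimal sketch of a Wellner-style quick adaptive thresholding pass (Eqs. (3)-(4) with a serpentine scan, as in Fig. 3). It is an illustration, not the authors' implementation; the window fraction s_frac and the darkness percentage t are assumed values.

```python
def quick_adaptive_threshold(img, s_frac=8, t=15):
    """Sketch of Wellner-style quick adaptive thresholding.

    img: 2D list of grayscale values in 0..255.
    s_frac: moving-average window = width // s_frac (assumed value).
    t: a pixel becomes foreground (0) when it is t percent darker
       than the local moving average, else background (255).
    """
    height, width = len(img), len(img[0])
    s = max(1, width // s_frac)           # window size of the moving average
    g = 127.0 * s                         # running approximation of the sum f(n)
    g_prev = [127.0 * s] * width          # g(n - width): previous row's values
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        # serpentine scan: alternate direction on every row (Fig. 3)
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        g_cur = [0.0] * width
        for x in xs:
            g = g - g / s + img[y][x]     # Eq. (3): approximate moving sum
            g_cur[x] = g
            h = (g + g_prev[x]) / 2.0     # Eq. (4): average with previous row
            mean = h / s                  # local moving average of intensity
            if img[y][x] < mean * (100 - t) / 100.0:
                out[y][x] = 0             # foreground (ink)
        g_prev = g_cur
    return out
```

Averaging the current and previous rows in h(n), as described above, is what removes the every-other-line effect of a purely row-wise moving average.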
Table 1
Test document image portions and their corresponding OCR results. Underlined words indicate erroneous OCR results.

Table 2
PSNR results of the corresponding document image binarizations in Table 1.

Method                   PSNR
Niblack [28]             12.7642
Sauvola et al. [29]      12.3087
Kim et al. [30]          14.9765
Gatos et al. [6]         16.6665
Chou et al. [27]         16.9498
Wagdy et al. [31]        17.2156
Our proposed             18.2381

Table 3
Quantitative evaluation of the binarization results of different methods over 150 documents extracted from a digital issue of a French magazine.

Method           Precision   Recall   Fm     Time (s)
Niblack          0.70        0.75     73.2   165
Sauvola et al.   0.89        90.4     91     180
Kim et al.       0.88        0.88     87.1   190
Gatos et al.     0.90        0.91     90.5   202
Chou et al.      0.92        0.91     92.8   190
Wagdy et al.     0.93        0.92     93.3   175
Ours             0.97        0.95     95.7   156
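The PSNR values in Table 2 compare a binarized image against its ground truth. For reference, a generic PSNR computation (the standard definition, not necessarily the exact formulation used here) looks like:

```python
import math

def psnr(img, ref, max_val=255.0):
    """Peak signal-to-noise ratio between two images of equal size,
    given as 2D lists of pixel values (generic definition)."""
    h, w = len(img), len(img[0])
    mse = sum((img[y][x] - ref[y][x]) ** 2
              for y in range(h) for x in range(w)) / (h * w)
    if mse == 0:
        return float("inf")      # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A higher PSNR indicates fewer pixel-level disagreements with the ground-truth binarization.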
Fig. 10. Binarization of a severely broken document image: (a) original document image; (b) Niblack's method [28]; (c) Sauvola et al. method [29]; (d) Kim et al. method [30]; (e) Gatos et al. method [6]; (f) Chou et al. method [27]; (g) Wagdy et al. method [31]; (h) our proposed method.
c(q) = 1 / (1 + (q − q0)²)    (10)

where q is the instantaneous coefficient of variation; the definition of q can be found in [23]. q0 is defined by

q0 = (C / 2) · MAD(∥∇ log I_{x,y}^t∥)    (11)

where MAD(∥∇ log I_{x,y}^t∥) is the median absolute deviation, defined as

MAD(∥∇ log I_{x,y}^t∥) = median_I{ |∥∇ log I_{x,y}^t∥ − median_I[∥∇ log I_{x,y}^t∥]| }    (12)

and C is the constant C = 1.4826. The symbols ∇, ∥·∥ and |·| represent the gradient, the gradient magnitude and the absolute value, respectively.

ρ(x) = log{1 + (1/2)(x/ρ_L)²}    (13)

where ρ_L is called the contrast parameter, which controls the shape of the edge-stopping function [26], and x is the pixel in the binary image. Fig. 8 demonstrates the result after our post-processing measure.
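The diffusion coefficient and the MAD-based estimate of q0 in Eqs. (10)-(12) can be sketched as below. This is a minimal illustration: the gradient magnitudes of the log-image are assumed to be supplied by the caller, and the fraction in Eq. (11) is read literally as C/2 from the text.

```python
C = 1.4826  # the constant stated alongside Eq. (12)

def diffusion_coefficient(q, q0):
    """Eq. (10): c(q) = 1 / (1 + (q - q0)^2)."""
    return 1.0 / (1.0 + (q - q0) ** 2)

def mad(values):
    """Eq. (12): median absolute deviation (sketch; exact for
    odd-length lists)."""
    med = sorted(values)[len(values) // 2]
    deviations = sorted(abs(v - med) for v in values)
    return deviations[len(deviations) // 2]

def q0_estimate(grad_log_mags):
    """Eq. (11): q0 = (C / 2) * MAD of the gradient magnitudes of the
    log-image (fraction taken literally from the extracted text)."""
    return (C / 2.0) * mad(grad_log_mags)
```

The coefficient c(q) approaches 1 in homogeneous regions (q near q0) and falls off quadratically near edges, which is what makes the diffusion edge-stopping.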
Fig. 12. Binarization of an historical handwritten document image: (a) original document image; (b) Niblack's method [28]; (c) Sauvola et al. method [29]; (d) Kim et al. method [30]; (e) Gatos et al. method [6]; (f) Chou et al. method [27]; (g) Wagdy et al. method [31]; (h) our proposed method.
[27]. The dataset contains 122 document images photographed by an ORITE I-CAM 1300 one-chip color camera. The authors set the camera lens at about 6 cm from each document and captured a rectangular region of approximately 4.9 × 3.4 cm, resulting in a 320 × 240 pixel grayscale image. Thus, the resolution of each captured image is about 166 dots per inch. Another 28 document images were collected from the internet for testing. The sensitivity of the proposed method to the parameter settings of q, p1 and p2 is shown in Fig. 9. The results suggest that the parameter values q = 0.6, p1 = 0.5 and p2 = 0.8 achieve the highest precision in the experiments.

The performance of our proposed method is compared with six state-of-the-art binarization algorithms: Niblack's adaptive thresholding method [28], the Sauvola et al. adaptive method [29], the Kim et al. adaptive method [30], the Gatos et al. adaptive method [6], the Chou et al. learning-built method [27] and the Wagdy et al. global thresholding method [31].

Fig. 13. Binarization of an historical handwritten document image: (a) original document image; (b) Niblack's method [28]; (c) Sauvola et al. method [29]; (d) Kim et al. method [30]; (e) Gatos et al. method [6]; (f) Chou et al. method [27]; (g) Wagdy et al. method [31]; (h) our proposed method.

4.1. Quantitative evaluation of performance

There are two goals for the performance evaluation: effectiveness in practice and effectiveness compared with other algorithms. Our evaluation measures consist of OCR performance and the PSNR value. To verify the efficiency of the proposed binarization method, experiments are performed on several well-known OCR systems including Tesseract-2.01 OCR from Google, Microsoft Office Document Scanning and ABBYY FineReader. Table 1 shows the binarization
document image portions and the OCR results. In terms of OCR performance, our method achieves the same promising results as Gatos et al. and outperforms the other five methods for the example document image. It also produces 100% quality for OCR detection under the inadequate illumination condition. The corresponding PSNR values are shown in Table 2. Throughout all the experiments, the proposed method achieves a better improvement in the binarization of broken and degraded document images compared to the other six approaches.

The performance of our method in the different cases is quantitatively analyzed with three metrics. According to common evaluation protocols [32], we used the F-measure (Fm) to compare our method with the other approaches. Fm is expressed in percentage:

Fm = (2 × Recall × Precision) / (Recall + Precision)    (14)

where Recall = TP / (TP + FN) and Precision = TP / (TP + FP), with TP, FP and FN standing, respectively, for true positives (the total number of well-classified foreground pixels), false positives (the total number of misclassified foreground pixels in the binarization results compared to the ground truth) and false negatives (the total number of misclassified background pixels). The quantitative evaluation results are shown in Table 3. They demonstrate that our method outperforms the other six binarization methods in terms of average precision, recall and time cost over the whole set of 150 document images.

Figs. 10, 11, 12 and 13 show example documents that have severe breaks/scratches on the original documents. In summary, we notice: (1) Niblack's method suffers from a great amount of background noise,
(2) the Sauvola et al. method has no background noise, but many characters are broken, (3) the Kim et al. approach achieves good results, but much noise and many broken characters remain, (4) Gatos et al. performs well, but leaves many speckles, (5) the Chou et al. method may miss some strokes because it highly depends on learned features, (6) the Wagdy et al. method may fail to retrieve characters because it uses a global threshold, and (7) compared with the other methods, our proposed algorithm performs superiorly on general degraded document images. In particular, the current method outperforms all algorithms on document images that are badly broken and have high noise.

5. Conclusions and future work

We present a new framework for degraded document image binarization and for restoring the quality of document images. The document images are pre-processed using the non-local means denoising method in the first step. Then, we extend Wellner's quick adaptive thresholding method based on the histogram and use Rosenfeld's method to determine the threshold for each pixel. To get pleasing binarization results, three measures are employed to post-process the thresholded document images in the final step: de-speckling, preserving strokes and improving the quality of the text regions. Experimental results and evaluation analysis show significant improvement in the binarization of document images collected from various sources including broken and degraded books, magazines and document files. The improvement in recognition is especially high for broken document images stuck together from several torn pieces.

The proposed method primarily uses a non-local means method to remove noise from the document image. Its computational cost is still expensive. Integration of the non-local means denoising with the adaptive threshold could further improve the binarization performance.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant nos. 61601392, 61671399), the Research Fund for the Doctoral Program of Higher Education (20130121120045) and the Fundamental Research Funds for the Central Universities (Grant no. 20720150110).

References

[13] S. Milyaev, O. Barinova, T. Novikova, P. Kohli, V. Lempitsky, Fast and accurate scene text understanding with image binarization and off-the-shelf OCR, Int. J. Doc. Anal. Recognit. (IJDAR) (2015) 1–14.
[14] K. Ntirogiannis, B. Gatos, I. Pratikakis, Performance evaluation methodology for historical document image binarization, IEEE Trans. Image Process. 22 (2) (2013) 595–609.
[15] G. Bal, G. Agam, O. Frieder, G. Frieder, Interactive degraded document enhancement and ground truth generation, in: Electronic Imaging 2008, International Society for Optics and Photonics, 2008, pp. 68150Z–68150Z.
[16] L. Likforman-Sulem, J. Darbon, E.H.B. Smith, Pre-processing of degraded printed documents by non-local means and total variation, in: Proceedings of the 10th IEEE International Conference on Document Analysis and Recognition, ICDAR'09, 2009, pp. 758–762.
[17] T. Lelore, F. Bouchara, Document image binarisation using Markov field model, in: ICDAR, 2009, pp. 551–555.
[18] B. Biswas, U. Bhattacharya, B.B. Chaudhuri, A global-to-local approach to binarization of degraded document images, in: Proceedings of the 22nd IEEE International Conference on Pattern Recognition (ICPR), 2014, pp. 3008–3013.
[19] B.M. Singh, R. Sharma, D. Ghosh, A. Mittal, Adaptive binarization of severely degraded and non-uniformly illuminated documents, Int. J. Doc. Anal. Recognit. (IJDAR) 17 (4) (2014) 393–412.
[20] P. Wellner, Adaptive thresholding for the DigitalDesk, Xerox, EPC1993-110.
[21] A. Buades, B. Coll, J. Morel, A non-local algorithm for image denoising, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, Citeseer, 2005, p. 60.
[22] A. Rosenfeld, D. La Torre, Histogram concavity analysis as an aid in threshold selection (in image processing), IEEE Trans. Syst., Man, Cybern. 13 (1983) 231–235.
[23] G. Liu, X. Zeng, F. Tian, Z. Li, K. Chaibou, Speckle reduction by adaptive window anisotropic diffusion, Signal Processing.
[24] F. Lin, X. Tang, Off-line handwritten Chinese character stroke extraction, in: Proceedings of International Conference on Pattern Recognition, Vol. 16, 2002, pp. 249–252.
[25] L. Lam, S. Lee, C. Suen, Thinning methodologies-a comprehensive survey, IEEE Trans. Pattern Anal. Mach. Intell. 14 (9) (1992) 869–885.
[26] A. Pizurica, I. Vanhame, H. Sahli, W. Philips, A. Katartzis, A Bayesian approach to nonlinear diffusion based on a Laplacian prior for ideal image gradient, in: Proceedings of the 13th Workshop on Statistical Signal Processing, IEEE/SP 2005, 2005, pp. 477–482.
[27] C.-H. Chou, W.-H. Lin, F. Chang, A binarization method with learning-built rules for document images produced by cameras, Pattern Recognit. 43 (4) (2010) 1518–1530.
[28] W. Niblack, An Introduction to Digital Image Processing, Strandberg Publishing Company, Birkeroed, Denmark, 1985.
[29] J. Sauvola, M. Pietikainen, Adaptive document image binarization, Pattern Recognit. 33 (2) (2000) 225–236.
[30] I. Kim, D. Jung, R. Park, Document image binarization based on topographic analysis using a water flow model, Pattern Recognit. 35 (1) (2002) 265–277.
[31] M. Wagdy, I. Faye, D. Rohaya, Document image binarization using retinex and global thresholding, ELCVIA Electron. Lett. Comput. Vis. Image Anal. 14 (1).
[32] N.R. Howe, A Laplacian energy for document binarization, in: Proceedings of IEEE International Conference on Document Analysis and Recognition (ICDAR), 2011, pp. 6–10.