
IMPROVEMENTS ON HANDWRITTEN DIGIT RECOGNITION BY COOPERATION OF MODULAR NEURAL NETWORKS

Claudio A Perez

Patricio A Galdames

Carlos A Holzmann

Department of Electrical Engineering, University of Chile. Av. Tupper 2007, Santiago, Chile

ABSTRACT

In this paper modular neural networks are used to improve handwritten digit recognition. To evaluate the performance of modular networks, a comparison is made with a global neural network on the same database. Two basic kinds of modular networks are considered. In the first one, seven expert modular networks are used. Five of them are provided for digits 0, 1, 2, 5, 6, 7. The other two modular networks are for the pairs of digits 3-8 and 4-9, respectively. The second kind of modular neural network considers an expert module for each feature extracted from the handwritten digit image. The cooperation is among modules extracting slope and radial projection from each digit. Two types of cooperation among modular networks are considered: a neural network and a weighted combination of the modules' outputs. The models were trained with a set of 1,837 handwritten digits and tested on a different set of 918 digits, where the best weight set was selected for each neural network; finally, results were validated on a different set of 919 digits. Results show that by using modular networks for features, it is possible to improve classification performance on handwritten digits from 91.0% in the case of global networks to 93.5% for modular networks.

1. INTRODUCTION

Specialization is found in the nervous system as modules dedicated to processing specific functions such as vision, touch, hearing, etc. [1,2]. The notion of specialization or modularity was implemented in expert systems for decision making in the past decade. In these applications problems were decomposed into subproblems and then solved by specialist modules [3,4]. This notion was also implemented in neural networks to create expert modules for subproblems. It has been stated that global neural networks have disadvantages compared to modular networks in plasticity, in difficulties learning heterogeneous tasks and in training time when large networks are needed [5]. Since the early 90's, the combination of multiple classifiers has been proposed as a new direction for the development of character recognition systems [5,6]. Several forms of cooperation between modules, such as voting, Bayes and confusion matrices, were considered in [7,8]. Jacobs et al. [9] considered a model composed of modular and

0-7803-4778-1/98 $10.00 © 1998 IEEE

gating networks to combine their outputs. For each input the gating network determines stochastically the appropriate expert module to respond [9]. Hashem [10] considered a weighted combination of the outputs of the expert networks. This approach was applied to approximate functions of one variable. A similar cooperation scheme was presented by Tresp [11], applied to handwritten digits. Fuzzy logic has also been applied to the cooperation among experts [12].

In the recognition of handwritten digits, Sebire [13] defined a number of experts less than or equal to the total number of classes. The cooperation among experts is performed using perceptrons and by a winner-takes-all technique. In other applications one feature has been extracted as input to each classifier [14,15]. Cao and Ahmadi [16] used principal components in the cooperation of multiple experts. The dimension of the input was reduced significantly and results were better than those obtained with a backpropagation network. Error rates were less than 1% for rejection rates of 10-15%. For rejection rates higher than 18% both classifiers yield similar results.

In the literature, results on handwritten digit recognition rates vary between 68% [17] and 97.7% [14]. It is not possible to compare different systems only on the basis of the correct classification rate, since most systems have been tested on different data sets and under different conditions [14]. Finding a method to determine the optimum architecture for a modular neural network and the best form of cooperation among modules are among the problems being addressed by the scientific community [8,9,18].

Improvements in the recognition rates of handwritten digits by standard neural networks (fully connected, feed-forward, backpropagation) were introduced by augmenting the training set by shifting and magnification [19-21]. In [20-22] genetic algorithms were applied to select the appropriate number of hidden units of the network. In this paper, modular networks are used to improve handwritten digit recognition. Two types of cooperation schemes are used: one with a neural network and another with adjustable weights.


2. COOPERATION AMONG MODULAR NETWORKS

Two types of modular networks are considered. The first one considers a specialist module for each digit. Therefore, the number of modules of the modular network will be equal to the defined number of classes. The second type of modular network uses modules specialized in one feature extracted from the handwritten digit image. One of the features is the slope, which is extracted by a gradient operator. The other feature is the radial projection with respect to the geometrical center.

Experts in Digits

An expert module is created for each digit (or subset of digits) and trained to recognize only that digit (or subset of digits). For the pairs of digits 3-8 and 4-9, where most confusions are produced, only one module was defined for each pair. Module i should have a high output only when digit i appears in the input. Modules for subsets of digits have a number of outputs equal to the number of digits in the subset. Figure 1 shows a general scheme for expert modules per digit. In Fig. 1(a) cooperation is restricted to the outputs of the modules. In Fig. 1(b), cooperation is extended to the input patterns.
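A minimal sketch of how the outputs of the digit experts could be assembled into a single 10-component vector before cooperation. The module layout (single-digit modules for 0, 1, 2, 5, 6, 7 plus one module for each of the pairs 3-8 and 4-9), the name `assemble_outputs` and the example values are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical layout: each module is listed with the digits it covers.
MODULE_DIGITS = [(0,), (1,), (2,), (5,), (6,), (7,), (3, 8), (4, 9)]


def assemble_outputs(module_outputs):
    """Place each module's outputs at the positions of the digits it covers."""
    combined = [0.0] * 10
    for digits, outputs in zip(MODULE_DIGITS, module_outputs):
        if len(digits) != len(outputs):
            raise ValueError("a module must emit one output per digit it covers")
        for digit, value in zip(digits, outputs):
            combined[digit] = value
    return combined


# Made-up example: the 3-8 module responds strongly on its second output (digit 8).
example = [[0.1], [0.0], [0.2], [0.1], [0.0], [0.1], [0.2, 0.9], [0.1, 0.3]]
vector = assemble_outputs(example)
winner = max(range(10), key=lambda d: vector[d])
```

With this layout a pair module simply contributes two components, so the cooperation stage always sees one value per digit.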

Figure 1: (a) Modular network with expert modules per digit and with cooperation among its outputs. (b) Cooperation is extended to the input patterns.

Figure 2: (a) Modular network with expert modules per feature and cooperation among its outputs. (b) Cooperation is extended to the input patterns.


Expert Modules in Features

Cooperation schemes for the expert modules in one feature are shown in Figure 2. In Fig. 2(a) there is cooperation only among the outputs of the modular networks. In Fig. 2(b) cooperation is extended to the input patterns.

Slope Detection

Slopes are detected in several directions for each character. The operation is implemented by convolution of the 23x15 handwritten digit image with a 3x3 Prewitt gradient operator [23] rotated for the desired directions. Four directions are considered for the gradient: 0°, 45°, 90° and 135°. The rotated Prewitt operators are shown in Figure 3. The result of the convolution, $C_{ij}$, between the gradient operator, $G_{kl}$, and the image, $A_{ij}$, is represented in equation (1). For the case of a 3x3 operator, n=m=3.

$$C_{ij} = \sum_{k=1}^{n} \sum_{l=1}^{m} G_{kl} \, A_{i+k-1,\, j+l-1} \qquad (1)$$

Figure 4 shows the result of convolving the gradient operator with the image of a handwritten 5, considering (a) the original image and (b-d) the four rotations.
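The slope extraction described above can be sketched as a sliding-window sum of products between a 3x3 operator and the image. The kernel-to-angle assignment below follows one common Prewitt compass convention and is an assumption, as is the choice to keep only positions where the operator fits entirely inside the image (so a 23x15 input yields a 21x13 map); the paper does not state its border handling.

```python
# Assumed Prewitt compass kernels; the angle labels follow one common convention.
PREWITT = {
    0: [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    45: [[-1, -1, 0], [-1, 0, 1], [0, 1, 1]],
    90: [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
    135: [[0, -1, -1], [1, 0, -1], [1, 1, 0]],
}


def apply_operator(image, kernel):
    """Sum of products of the kernel with every window that fits in the image."""
    rows, cols = len(image), len(image[0])
    n, m = len(kernel), len(kernel[0])
    out = []
    for i in range(rows - n + 1):
        out_row = []
        for j in range(cols - m + 1):
            total = 0
            for k in range(n):
                for l in range(m):
                    total += kernel[k][l] * image[i + k][j + l]
            out_row.append(total)
        out.append(out_row)
    return out


def slope_features(image):
    """Return one gradient map per direction, keyed by angle in degrees."""
    return {angle: apply_operator(image, kernel) for angle, kernel in PREWITT.items()}
```

For a binary image with a vertical edge, the 0° map responds along the edge while the 90° map stays at zero there, which is the directional selectivity the feature relies on.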

Radial Projections Relative to the Geometric Center

The radial projection relative to the geometric center of the handwritten digit is obtained by determining the radial distribution of the digit's pixels with respect to its geometric center. First, the geometric center of the character is obtained. A convex region is formed to include the mass of the character by segments tangential to its contour, as shown in Figure 5. The average vertical and horizontal lengths of the convex region define the geometric center of the character. Second, the radial projection is obtained by integrating the digit's mass in the direction of the segment joining the geometric center with one of the pixels on the perimeter of the 23x15 image. Therefore, the resulting vector of radial projections has 76 elements.
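A rough sketch of the radial projection, under two loudly stated assumptions: the geometric center is approximated by the center of the digit's bounding box (a stand-in for the convex-region construction), and the perimeter is walked side by side with corners counted on both adjacent sides, which gives 2 x (23 + 15) = 76 rays. The function names and the line-sampling scheme are hypothetical.

```python
def geometric_center(image):
    """Assumed stand-in: center of the bounding box of the non-zero pixels."""
    points = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("image contains no digit pixels")
    rs = [p[0] for p in points]
    cs = [p[1] for p in points]
    return (min(rs) + max(rs)) / 2.0, (min(cs) + max(cs)) / 2.0


def perimeter_points(rows, cols):
    """Walk the four sides; corners appear twice, giving 2 * (rows + cols) points."""
    top = [(0, c) for c in range(cols)]
    right = [(r, cols - 1) for r in range(rows)]
    bottom = [(rows - 1, c) for c in reversed(range(cols))]
    left = [(r, 0) for r in reversed(range(rows))]
    return top + right + bottom + left


def radial_projection(image, samples=32):
    """Sum the digit pixels met along each segment from the center to the perimeter."""
    rows, cols = len(image), len(image[0])
    cr, cc = geometric_center(image)
    projection = []
    for pr, pc in perimeter_points(rows, cols):
        visited = set()
        for s in range(samples + 1):
            t = s / samples
            visited.add((int(round(cr + t * (pr - cr))), int(round(cc + t * (pc - cc)))))
        projection.append(sum(image[r][c] for r, c in visited))
    return projection
```

Each pixel is counted at most once per ray because visited positions are collected in a set before summing.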

Figure 3: Prewitt operator rotated for (a) 0°, (b) 45°, (c) 90°, (d) 135°.

Figure 4: (a) Original image of a handwritten 5, (b-d) Result of the convolution of the Prewitt operator for 0°, 45°, 90° and 135°.

Figure 5: (a) Original handwritten 3 digitized in a 23x15 image. (b) Convex region containing the digit.

Types of Cooperation

Two types of cooperation are considered. In the first type, the cooperation is performed by a neural network receiving the outputs of the modular networks. Additionally, the input patterns could also be considered as inputs for this network. In the second type of cooperation, adjustable weights are used to combine the modular networks' outputs. The weights are adjusted iteratively to maximize the overall recognition rate. The cooperation scheme provides 10 outputs (one for each digit) and the maximum is selected as the system response.
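The first type of cooperation can be sketched as follows, assuming three feature modules with 10 outputs each and a single sigmoid layer as the cooperation network. The paper does not specify the architecture of that network, so the single layer, the function names and the hand-set weights in the usage note are illustrative assumptions only.

```python
import math


def cooperation_input(module_outputs, pattern=None):
    """Concatenate the module outputs, optionally followed by the raw input pattern."""
    features = [value for outputs in module_outputs for value in outputs]
    if pattern is not None:
        features.extend(pattern)
    return features


def forward(features, weights, biases):
    """One sigmoid layer (assumed): one output per row of weights."""
    outputs = []
    for w_row, bias in zip(weights, biases):
        z = bias + sum(w * f for w, f in zip(w_row, features))
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def decide(outputs):
    """Select the digit with the maximum output as the system response."""
    return max(range(len(outputs)), key=lambda d: outputs[d])
```

With three 10-output modules the cooperation network sees 30 inputs, or 375 when the 345-pixel pattern is appended, which is the increase in dimensions that has to be justified by a gain in recognition rate.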

3. TRAINING AND TESTING

Network Dimensions

According to previous work [19,21] using global neural networks for the problem of handwritten digit recognition, the dimensions of a two-hidden-layer network are 345xN1xN2x10. Each network is trained by backpropagation using a training set augmented by shifting the patterns in the input.

Training, Testing and Validation Sets

A database of 3,674 handwritten digits, obtained from university students, is used for training, testing and validation. The database was segmented into three subsets. The training set is composed of 1,837 patterns and is used for training global and modular networks. Augmentation of the training set was performed by shifting the input patterns [19,21], which is in part equivalent to artificially expanding the training set to 9,185 patterns. The testing set is formed by 918 patterns, different from those of the training set. The testing set is used to adjust the cooperation scheme after the modular networks have been trained individually. The validation set is composed of 919 different patterns and is used only to determine the generalization performance of the network. Figure 6 shows a sample of the handwritten digit database: in (a), 110 digits used in training and in (b), a set of 110 digits used for validation.
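The figure of 9,185 patterns equals 5 x 1,837, which is consistent with each training pattern appearing once unshifted plus four shifted copies. The sketch below assumes one-pixel shifts up, down, left and right with zero fill; the actual shift directions and magnitudes are not given in the text, and the names `shift`, `SHIFTS` and `augment` are hypothetical.

```python
def shift(image, dr, dc):
    """Shift a 2D image by (dr, dc) pixels, filling vacated positions with zeros."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr, sc = r - dr, c - dc
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = image[sr][sc]
    return out


# Assumed shift set: identity plus one pixel in each of four directions.
SHIFTS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def augment(patterns):
    """Return every pattern under every shift in SHIFTS."""
    return [shift(p, dr, dc) for p in patterns for dr, dc in SHIFTS]
```

The three subsets also add up as stated: 1,837 + 918 + 919 = 3,674 patterns.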

Figure 6: Sample of the handwritten digit database. (a) 110 digits used in training. (b) A set of 110 digits used for validation.

Weight Adjustment

Based on work by Hashem [10] and Tresp [11], an algorithm to adjust the weights was developed to maximize the total recognition rate on a testing database. The performance of the system is measured on the validation set. The algorithm does not guarantee the optimum weight set because the method performs a random search to find the best set [24]. For K modular networks, the ith output of the classifier $e_k(x)$ is denoted as $s_i^k(x)$. The cooperation consists of a linear combination of the modules' outputs.

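$$P_i(x) = \sum_{k=1}^{K} \alpha_i^k \, s_i^k(x) \qquad (2)$$

with $P_i$ the ith output of the classification system and $s_i^k(x)$ the ith output of the kth classifier, $\forall i \in \Lambda$, $1 \le k \le K$, where $\Lambda = \{1, 2, \ldots, 10\}$ is the set of symbols identifying each digit. The weights should be computed so that each component of the output vector is normalized to a maximum of 1. Therefore the weights are normalized according to equation (3).

$$\sum_{k=1}^{K} \alpha_i^k = 1, \quad \forall i \in \Lambda \qquad (3)$$

The weights $\alpha_i^k$ are computed using an algorithm similar to [24].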
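A minimal sketch of cooperation by adjustable weights. It uses a plain random search over weight sets rather than the simulated-annealing-style procedure of [24], and it assumes that normalization means the weights for each digit sum to 1 across modules. All function names, the iteration count and the seed are hypothetical choices for illustration.

```python
import random


def combine(module_outputs, weights):
    """Linear combination per digit: sum over modules of weight times output."""
    n_modules = len(module_outputs)
    return [
        sum(weights[k][i] * module_outputs[k][i] for k in range(n_modules))
        for i in range(10)
    ]


def normalize(weights):
    """Scale the weights so that, for each digit, they sum to 1 across modules."""
    n_modules = len(weights)
    out = [[0.0] * 10 for _ in range(n_modules)]
    for i in range(10):
        total = sum(weights[k][i] for k in range(n_modules))
        for k in range(n_modules):
            out[k][i] = weights[k][i] / total
    return out


def recognition_rate(samples, weights):
    """Fraction of (module_outputs, label) samples whose maximum output is the label."""
    hits = 0
    for module_outputs, label in samples:
        combined = combine(module_outputs, weights)
        if max(range(10), key=lambda d: combined[d]) == label:
            hits += 1
    return hits / len(samples)


def random_search(samples, n_modules, iterations=200, seed=0):
    """Keep the best normalized weight set found among random candidates."""
    rng = random.Random(seed)
    best_w = normalize([[1.0] * 10 for _ in range(n_modules)])
    best_rate = recognition_rate(samples, best_w)
    for _ in range(iterations):
        candidate = normalize(
            [[rng.random() + 1e-9 for _ in range(10)] for _ in range(n_modules)]
        )
        rate = recognition_rate(samples, candidate)
        if rate > best_rate:
            best_w, best_rate = candidate, rate
    return best_w, best_rate
```

In the setting described here the `samples` would come from the testing set, with the validation set held back to measure the resulting weight set. Because the search starts from uniform weights and only accepts improvements, the returned rate is never below that of simple averaging, but, as the text notes for the original algorithm, the optimum is not guaranteed.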

Recognition Rate

All networks were trained for at least 100 epochs, selecting the set of weights that maximizes the recognition rate of the network on the testing set. Once the weight set was chosen, the network was applied to the validation set to determine the generalization performance of the network.

To show with some level of confidence that the improvements in recognition rate are not due to local minima, all networks were trained from ten different random starting weight sets. Results are presented as the average recognition rate of the ten runs and the standard deviation. The Student t-test was used to determine if differences between models were statistically significant. A p<0.05 was considered to be statistically significant.
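The comparison between two models over ten runs each can be sketched with a pooled-variance two-sample Student t statistic. The equal-variance form is an assumption (the text only names the Student t-test), and the run values in the usage note are made up for illustration. Instead of computing a p-value, the sketch compares |t| against 2.101, the two-tailed 5% critical value for 18 degrees of freedom.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(pooled * (1.0 / na + 1.0 / nb)), na + nb - 2


# Two-tailed critical value at the 5% level for 18 degrees of freedom.
T_CRITICAL_DF18 = 2.101
```

For example, with hypothetical recognition rates `a = [93.1, 93.3, 92.9, 93.0, 93.2, 93.4, 92.8, 93.1, 93.3, 92.9]` and `b = [90.0, 90.2, 89.8, 90.1, 89.9, 90.3, 89.7, 90.0, 90.2, 89.8]`, `student_t(a, b)` returns 18 degrees of freedom and a statistic far above the critical value, so the difference would be declared significant.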

4. RESULTS

Table 1 shows the results of the classification on the validation set, considering a rejection threshold of 0.5. The first column shows the average recognition rate in % for the 10 simulations. The second column shows the recognition rate in % for the best trained network. The rows correspond to the following architectures: GNI = global network with an image as input; GNS = global network with slopes as inputs; GNP = global network with radial projection; MNCN = modular network for image, radial projection and slope with cooperation performed by a neural network; MNCW = modular network for image, radial projection and slope with cooperation by weights; MNDN = modular networks for digits and cooperation by a neural network; MNDP = modular networks for digits and cooperation by weights.


In the case of MNCN, no significant improvement was measured if the input patterns were included in the cooperation in addition to the outputs of the modular networks. Therefore, the increase in the dimensions of the network for cooperation is not justified. It is observed from Table 1 that MNCW, with cooperation by weights, presents a higher recognition rate than global networks (p<0.001).

Table 2 has the same organization as Table 1. In Table 2, no rejection was considered. It is observed that modular networks for features achieve the best classification performance. The differences in classification rate between global and modular neural networks are highly significant (p<0.0001). There were no significant differences between cooperation with the neural network and that obtained by the weights (p=0.068). There were no significant differences when input patterns were included in the cooperation in addition to the outputs from the modular networks. Figure 7 shows the error rate as a function of the rejection rate for global networks and for modular networks for features. It is observed that the lowest error rates are obtained for the modular network and for the global network with slopes as inputs. The highest error rates correspond to the case of the global network with radial projection as input.
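A sketch of how a rejection threshold trades rejections against errors. It assumes a pattern is rejected when the maximum of the 10 outputs falls below the threshold and that all three rates are expressed as fractions of the full set; the paper does not spell out either convention, and the function names are hypothetical.

```python
def classify(outputs, threshold):
    """Return the winning digit, or None when the maximum output is below the threshold."""
    best = max(range(len(outputs)), key=lambda d: outputs[d])
    return best if outputs[best] >= threshold else None


def rates(samples, threshold):
    """Return (recognition, error, rejection) as fractions of all samples."""
    correct = wrong = rejected = 0
    for outputs, label in samples:
        decision = classify(outputs, threshold)
        if decision is None:
            rejected += 1
        elif decision == label:
            correct += 1
        else:
            wrong += 1
    n = len(samples)
    return correct / n, wrong / n, rejected / n
```

Sweeping `threshold` from 0 upward and recording the error and rejection fractions at each step gives a curve of the kind shown in Figure 7.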

Table 1: Results of classification performance on the validation set, considering a rejection threshold of 0.5. The first column shows the average recognition rate in % for the 10 simulations. The second column shows the recognition rate in % for the best trained network. The rows correspond to the following architectures: GNI = global network with an image as input; GNS = global network with slopes as inputs; GNP = global network with radial projection; MNCN = modular network for image, radial projection and slope with cooperation performed by a neural network; MNCW = modular network for image, radial projection and slope with cooperation by weights; MNDN = modular networks for digits and cooperation by a neural network; MNDP = modular networks for digits and cooperation by weights.

Rejection threshold = 0.5

Architecture          Mean ± STD [%]   Best result [%]
                      90.0 ± 0.2
GNP                   71.9 ± 0.6
MNCN without Image    92.0 ± 0.8       92.7
MNCN with Image       91.9 ± 0.3       92.4
MNCW                  93.1 ± 0.2       93.5
MNDN                  90.4 ± 0.5       91.2
MNDP                  90.4 ± 0.9       91.9

Table 3 shows the confusion matrix obtained for the modular network for image, radial projection and slope with cooperation by weight adjustment when applied to the validation set. Each row shows the number of confusions of the network for the digit specified on that row. From this table it is possible to identify the cases with the largest number of confusions and therefore to develop strategies to eliminate the sources of confusion. It is observed that the largest numbers of confusions are for digits 3-5, 4-9 and 5-6. They account for 16 cases of confusion.
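A small sketch of how a confusion matrix with a rejection column can be tallied and how the most confused digit pairs can be read from it. It assumes rows index the true digit, columns 0 to 9 the assigned digit and column 10 the rejected patterns; the function names and the symmetric pairing of confusions are illustrative choices.

```python
def confusion_matrix(decisions, labels):
    """Rows: true digit 0-9. Columns 0-9: assigned digit. Column 10: rejected."""
    matrix = [[0] * 11 for _ in range(10)]
    for decision, label in zip(decisions, labels):
        column = 10 if decision is None else decision
        matrix[label][column] += 1
    return matrix


def largest_confusions(matrix, top=3):
    """Most frequent confusions between unordered pairs of distinct digits."""
    pairs = []
    for a in range(10):
        for b in range(a + 1, 10):
            pairs.append((matrix[a][b] + matrix[b][a], (a, b)))
    pairs.sort(reverse=True)
    return [(pair, count) for count, pair in pairs[:top] if count > 0]
```

Counting a pair in both directions (a taken for b, and b taken for a) matches the way confusions are quoted in the text as digit pairs such as 3-5 or 4-9.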

Table 2: Results of classification performance on the validation set with no rejection. Rows and columns are organized as in Table 1.

Rejection threshold = 0

Architecture          Mean ± STD [%]   Best result [%]
GNI                   92.3 ± 0.3       92.7
GNS                   92.7 ± 0.4       93.1
GNP                   76.8 ± 0.8       78.1
MNCN without Image    93.7 ± 0.5       94.0
MNCN with Image       93.3 ± 0.5       94.7
MNCW                  94.4 ± 0.2       94.9
MNDN                  91.? ± 0.4       92.3
MNDP                  90.9 ± 0.7       92.2

Figure 7: Error rate as a function of the rejection rate for global networks and for modular networks for features. (Vertical axis: error rate, 0% to 8%. Horizontal axis: rejection rate [%], 0% to 100%.)

Table 3: Confusion matrix for the modular network considering image, radial projection and slope with cooperation by weight adjustment on the validation set. Column number 10 indicates the number of patterns rejected for each digit. Each row shows the confusions per digit.

5. CONCLUSIONS

In this work two types of cooperation among modular neural networks were presented and results were compared to global neural networks for the problem of handwritten digit recognition. Results show that by using modular networks for features, it is possible to improve classification performance on handwritten digits from 91.0% in the case of global networks to 93.5% for modular networks. This improvement was achieved for a rejection threshold of 0.5. In addition, when no rejection is applied, it is possible to improve recognition rates from 93.1%, for the case of the global network, to 94.9% for the case of a modular network.

ACKNOWLEDGEMENTS

This research has been funded by FONDECYT, grant no. 1960921, and by the Dept. of Electrical Engineering, U. of Chile.

REFERENCES

[1] Fischler MA, Firschein O, "The Brain and the Computer", in Intelligence: The Eye, the Brain and the Computer, Addison-Wesley, pp. 23-58, 1987.

[2] Churchland PM, "Representation and High-Speed Computation in Neural Networks", in The Foundations of Artificial Intelligence: A Sourcebook, Partridge & Wilks eds., Cambridge Univ. Press, pp. 337-372, 1990.

[3] Erman LD, Hayes-Roth F, Lesser VR, Reddy RD, "The Hearsay-II Speech Understanding System: Integrating Knowledge to Resolve Uncertainty", ACM Computing Surveys 12(2), 1980.

[4] Rich E, Knight K, "Distributed Reasoning Systems", in Artificial Intelligence, McGraw-Hill, pp. 433-446, 1991.

[5] Ronco E, Gawthrop P, "Modular Neural Networks: A State of the Art", Technical Report CSC-95026, University of Glasgow, May 12, 22p, 1995.

[6] Tumer K, Ghosh J, "Analysis of Decision Boundaries in Linearly Combined Neural Classifiers", Pattern Recognition, 29(2): 341-348, February 1996.

[7] Xu L, Krzyzak A, Suen C, "Methods of Combining Multiple Classifiers and their Applications to Handwriting Recognition", IEEE Transactions on Systems, Man and Cybernetics, Vol. 22, No. 1, pp. 418-435, May/June 1992.

[8] Ho KT, Hull J, "Decision Combination in Multiple Classifiers Systems", IEEE Trans. on Pattern Anal. and Machine Intelligence, Vol. 16, No. 1, pp. 66-75, Jan 1994.

[9] Jacobs R, Jordan M and Barto A, "Task Decomposition through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks", Cognitive Science, 15: pp. 219-250, 1991.

[10] Hashem S and Schmeiser B, "Improving Model Accuracy Using Linear Combinations of Trained Neural Networks", IEEE Transactions on Neural Networks, 6(3), pp. 792-794, 1995.

[11] Tresp V, Taniguchi M, "Combining Estimators Using Non-Constant Weighting Functions", NIPS 7, MIT Press, Cambridge MA, pp. 419-426, 1995.

[12] Cho S and Kim J, "Combining Multiple Neural Networks by Fuzzy Integral for Robust Classification", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 25, pp. 380-384, 1995.

[13] Sebire P, Dorizzi B, "MLP Modular Networks for Multiclass Recognition", Proceedings ESANN'93, Brussels, Belgium, pp. 111-116, 7-9 April 1993.

[14] Cho SB, "Neural-Network Classifiers for Recognizing Totally Unconstrained Handwritten Numerals", IEEE Transactions on Neural Networks, Vol. 8, No. 1, pp. 43-53, 1997.

[15] Knerr S, Personnaz L and Dreyfus G, "Handwritten Digit Recognition by Neural Networks with Single-Layer Training", IEEE Transactions on Neural Networks, Vol. 3, No. 6, pp. 962-968, Nov. 1992.

[16] Cao J, Ahmadi M, Shridhar M, "A Hierarchical Neural Network Architecture for Handwritten Numeral Recognition", Pattern Recognition, Vol. 30, No. 2, pp. 289-294, 1997.

[17] Lee DS, Srihari SN and Pawlicki T, "Experiments with Neural Network Models for Handwritten Digit Recognition", in Systems and Signal Processing, R.N. Madan, N. Viswanadham, R.L. Kashyap (eds.), pp. 757-774, 1991.

[18] Huang Y, Suen C, "A Method of Combining Multiple Experts for the Recognition of Unconstrained Handwritten Numerals", IEEE Trans. on Pattern Anal. & Machine Intelligence, Vol. 17, No. 1, pp. 90-94, Jan 1995.

[19] Perez CA, Holzmann CA, MorelX IR, "Optimization of One and Two Hidden Layer Neural Networks Architecture for Handwritten Digit Recognition", Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics, Vancouver, Canada, Oct. 22-25, pp. 2795-2799, 1995.

[20] Perez CA, Holzmann CA, Diaz E, "Genetic Selection of Multilayer Neural Networks for Handwritten Digit Recognition to Aid the Blind", 18th Annual International Conference IEEE-EMBS, Amsterdam, The Netherlands, Oct. 31-Nov. 3, 3p., 1996.

[21] Perez CA, Holzmann CA, "Improvements on Handwritten Digit Recognition by Genetic Selection of Neural Network Topology and by Augmented Training", 1997 IEEE International Conference on Systems, Man and Cybernetics, Orlando, USA, Oct. 12-15, pp. 1487-1491, 1997.

[22] Perez CA, Holzmann CA, Diaz E, "Genetic Selection of NN's Hidden Units Improving Handwritten Digit Recognition Aiding the Blind to Read", Med. & Biol. Eng. & Computing, Abstracts of the World Congress on Medical Physics and Biomedical Engineering, Nice, France, Sept. 14-19, V.135, pp. 513, 1997.

[23] Jain AK, "Edge Detection", in Fundamentals of Digital Image Processing, Prentice Hall, 1989, pp. 347-356.

[24] Rutenbar R, "Simulated Annealing Algorithms: An Overview", IEEE Circuits and Devices Magazine, pp. 19-26, January 1989.