ISSN 1549-3636
2007 Science Publications
Abstract: Secure buildings are currently protected from unauthorized access by a variety of devices.
Although many kinds of devices are available to guarantee system safety, such as PIN pads, keys
both conventional and electronic, identity cards, cryptographic and dual control procedures, a person's
voice can also be used. The ability to verify the identity of a speaker by analyzing speech, or speaker
verification, is an attractive and relatively unobtrusive means of providing security for admission into
an important or secured place. An individual's voice cannot be stolen, lost, forgotten, guessed, or
impersonated with accuracy. Due to these advantages, this paper describes the design and prototyping of a
voice-based door access control system for building security. In the proposed system, access may
be authorized simply by means of an enrolled user speaking into a microphone attached to the system.
The proposed system then decides whether to accept or reject the user's identity claim, or possibly
reports insufficient confidence and requests additional input before making the decision. Furthermore,
an intelligent system approach is used to develop authorized person models based on their voices.
In particular, Adaptive-Network-based Fuzzy Inference Systems are used in the proposed system to
identify authorized and unauthorized people. Experimental results confirm the effectiveness of the
proposed intelligent voice-based door access control system based on the false acceptance rate and
false rejection rate.
proposed system features are extracted from the person's voice data and then an Adaptive-Network-based Fuzzy Inference System (ANFIS) is used to develop models of the authorized persons based on the features extracted from the authorized persons' voices.
First, the prototype of the door access control is described. Next, the speaker verification process used in the proposed system is discussed in detail. Finally, the performance of the proposed intelligent voice-based door access control is evaluated experimentally for door access control in the Intelligent Mechatronics System Laboratory, Faculty of Engineering, International Islamic University Malaysia.

and closing. The electromagnetic lock works on a 12 volt DC power supply and is set in the normally closed (NC) condition. Therefore, without a command signal from the verification system, the lock is always switched on and the door remains closed. If a person is verified by the proposed voice-based verification as an authorized user, access is granted. The parallel port sends a signal to the electromagnetic lock driver, which is shown in Fig. 3, so that the electromagnetic lock is demagnetized. As a result, the door can be opened by that authorized person for a certain period of time.
[Figure: block diagram of the Intelligent Voice-based Door Access Control. Labels: Microphone, A/D, Voice-based Verification System, Port Driver, Magnetic lock]
the rate at which the access control accepts an unauthorized person is called the false acceptance rate (FAR). Mathematically, both rates are expressed as percentages using the following simple calculations[5]:

\[ \mathrm{FRR} = \frac{N_{FR}}{N_{AA}} \times 100\% \tag{1} \]

\[ \mathrm{FAR} = \frac{N_{FA}}{N_{IA}} \times 100\% \tag{2} \]

N_{FR} and N_{FA} are the numbers of false rejections and false acceptances respectively, while N_{AA} and N_{IA} are the number of authorized person attempts and the number of impostor attempts. To achieve high security of the door access control system, it is expected that the proposed system will have both low FRR and low FAR.

is required to enter the claimed identity and his/her voice. Furthermore, the entered voice is processed and compared with the claimed person's model to verify his/her claim. In this phase, there is a decision process in which the system decides whether the features extracted from the given voice match the model of the claimed person. In order to give a definite answer of access acceptance or rejection, a threshold is set. When the degree of similarity between a given voice and the model is greater than the threshold, the system accepts the access; otherwise the system rejects the person's access to the building/room.

[Figure labels: Voice, Claimed Identity]
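As an illustration of Eqs. (1) and (2) together with the threshold decision, the following minimal sketch counts false rejections and false acceptances for a given threshold. The similarity scores, the threshold value of 0.5 and the function names are hypothetical and are not taken from the paper.

```python
def accept(similarity, threshold):
    """Accept the identity claim when the similarity exceeds the threshold."""
    return similarity > threshold


def error_rates(authorized_scores, impostor_scores, threshold):
    """Return (FRR, FAR) in percent, following Eqs. (1) and (2)."""
    # N_FR: attempts by authorized persons that were rejected.
    n_fr = sum(1 for s in authorized_scores if not accept(s, threshold))
    # N_FA: attempts by impostors that were accepted.
    n_fa = sum(1 for s in impostor_scores if accept(s, threshold))
    frr = 100.0 * n_fr / len(authorized_scores)  # N_FR / N_AA
    far = 100.0 * n_fa / len(impostor_scores)    # N_FA / N_IA
    return frr, far


# Hypothetical similarity scores, for illustration only.
authorized = [0.91, 0.62, 0.85, 0.40, 0.77]
impostors = [0.10, 0.55, 0.30, 0.72, 0.20, 0.05, 0.45, 0.15, 0.60, 0.25]

frr, far = error_rates(authorized, impostors, threshold=0.5)
print(f"FRR = {frr:.1f}%, FAR = {far:.1f}%")
```

Raising the threshold in such a sketch lowers the FAR at the cost of a higher FRR, which is the trade-off between security and convenience.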
method the spectrum is warped according to the Bark scale. The PLP uses an all-pole model to smooth the modified power spectrum. The output cepstral coefficients are then computed based on this model[6].

Equal-loudness pre-emphasis: Firstly, an equal-loudness curve is constructed. An approximation of this curve for frequencies up to 5 kHz is

\[ E(\omega) = \frac{\omega^{4}\,(\omega^{2} + 56.8 \times 10^{6})}{(\omega^{2} + 6.3 \times 10^{6})^{2}\,(\omega^{2} + 0.38 \times 10^{9})} \tag{6} \]
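The equal-loudness approximation of Eq. (6) can be evaluated directly. The sketch below assumes that ω is the angular frequency in rad/s (ω = 2πf), as in the PLP formulation of [6]; the function name and the sample frequencies are chosen for illustration only.

```python
import math


def equal_loudness(omega):
    """Equal-loudness weight E(omega) of Eq. (6); omega in rad/s (assumed)."""
    w2 = omega * omega
    numerator = w2 * w2 * (w2 + 56.8e6)
    denominator = (w2 + 6.3e6) ** 2 * (w2 + 0.38e9)
    return numerator / denominator


# Print the weight at a few frequencies below 5 kHz.
for f_hz in (100, 500, 1000, 2000, 4000):
    omega = 2.0 * math.pi * f_hz
    print(f"{f_hz:5d} Hz -> E = {equal_loudness(omega):.4f}")
```

The weight is zero at ω = 0 and stays below one, so low frequencies are attenuated more strongly than frequencies in the upper part of the band.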
where B is the number of the sample. The sampling intervals are chosen so that when the critical bands are added together they equally represent the frequency scale.

Fig. 6. A rule set of a first order Sugeno fuzzy system has the following form:
Rule i: If x is A_i and y is B_i then f_i = p_i x + q_i y + r_i.
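A rule set of this form can be evaluated as a weighted average of the rule outputs, where each weight is the product of the membership degrees of x and y. The sketch below is a minimal two-rule example under stated assumptions: Gaussian membership functions are assumed, and all membership and consequent parameter values are hypothetical rather than taken from the paper.

```python
import math


def gaussian(value, center, sigma):
    """Gaussian membership degree of `value` (assumed membership shape)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def sugeno_output(x, y, rules):
    """Evaluate a first order Sugeno rule set for the inputs x and y."""
    weights, outputs = [], []
    for rule in rules:
        # Firing strength: product of the membership degrees of x and y.
        w = gaussian(x, *rule["A"]) * gaussian(y, *rule["B"])
        # Rule consequent: f_i = p_i * x + q_i * y + r_i.
        f = rule["p"] * x + rule["q"] * y + rule["r"]
        weights.append(w)
        outputs.append(f)
    total = sum(weights)
    # Weighted average of the consequents (normalized firing strengths).
    return sum(w * f for w, f in zip(weights, outputs)) / total


# Hypothetical parameters: "A" and "B" hold (center, sigma) pairs.
rules = [
    {"A": (0.0, 1.0), "B": (0.0, 1.0), "p": 1.0, "q": 0.0, "r": 0.0},
    {"A": (1.0, 1.0), "B": (1.0, 1.0), "p": 0.0, "q": 1.0, "r": 1.0},
]

print(sugeno_output(0.5, 0.5, rules))
```

At x = y = 0.5 both rules fire equally in this example, so the output is the plain average of the two consequents.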
J. Computer Sci., 3 (5): 274-280, 2007
Fig. 6: ANFIS architecture[8]

From the perspective of artificial neural networks, it is a feedforward network consisting of 5 layers. Every node i in the first layer is an adaptive node with the following node function

\[ O_{1,i} = \begin{cases} \mu_{A_i}(x), & i = 1,2 \\ \mu_{B_{i-2}}(y), & i = 3,4 \end{cases} \tag{10} \]

where x (or y) is the input to node i and A_i (or B_{i-2}) is a linguistic label associated with this node. In other words, O_{1,i} is the membership degree of the input x (or y) in the fuzzy set A (or B). The membership function for A (or B) can be a Gaussian function, a triangular membership function or others. The parameters of the membership functions used in this layer are termed premise parameters.
The second layer combines the outputs of the first layer so that it has the following output:

\[ O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y) \tag{11} \]

Here each output represents the firing strength of a rule. The next layer, which is the third layer, normalizes the output of the previous layer as follows:

\[ O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1,2 \tag{12} \]

In the fourth layer, the following output is calculated based on the third layer outputs:

\[ O_{4,i} = \bar{w}_i f_i = \bar{w}_i\,(p_i x + q_i y + r_i) \tag{13} \]

where f_i is the function used in the first order Sugeno type fuzzy system. The parameters in this node (p_i, q_i and r_i) are referred to as consequent parameters. Finally, the final output of the ANFIS is the last layer output, and it is given as

\[ O_{5,1} = z = \sum_i \bar{w}_i f_i \tag{14} \]

training data. A hybrid learning algorithm is a popular learning algorithm used to train the ANFIS for this purpose. In summary, the steps of building a person model based on the voice data using ANFIS are as follows:
* Voice data collection and feature extraction of the voice data.
* Determining the premise parameters.
* Training of the ANFIS using the input pattern and desired output to obtain the consequent parameters.
* Validation of the trained ANFIS using training data.

RESULTS

Experimental setup: In order to evaluate the effectiveness of the proposed intelligent voice-based door access control, the proposed system is installed at the Intelligent Mechatronics System Laboratory, Faculty of Engineering, International Islamic University Malaysia. Voices of nine (9) speakers from the YOHO database are used in the experiment. Three (3) speakers are considered as the authorized persons to access the laboratory and the other six (6) speakers are assumed to be outside impostors. Each speaker who is assumed to be an authorized person says the word "seven" 70 times, where 20 voice data are used as training data and the other 50 voice data are used as testing data. This means that text-dependent speaker verification is used in the proposed system.
An example of the raw voice signal of the word "seven" for an authorized person is shown in Fig. 7. To obtain the PLP coefficients, 17 critical-band filters are used, which cover a 17 Bark frequency range. These filters are simulated by integrating the FFT spectrum of 20-ms Hamming-windowed speech segments in which the frame rate is 10 ms. Figure 8 shows the 13 PLP coefficients extracted from the voice signal shown in Fig. 7.

[Fig. 7: raw voice signal of the word "seven" (vertical axis: Magnitude)]
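The framing step of the feature extraction can be sketched as follows. This is a minimal illustration, not the authors' implementation: it reads the 10 ms frame rate as the shift between successive 20 ms frames, and the 8 kHz sampling rate, the FFT length of 256 and the synthetic test signal are assumptions made for the example.

```python
import numpy as np


def frame_signal(signal, sample_rate, frame_ms=20.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // shift
    window = np.hamming(frame_len)
    return np.stack([
        signal[i * shift:i * shift + frame_len] * window
        for i in range(n_frames)
    ])


def power_spectrum(frames, n_fft=256):
    """Short-term power spectrum of each frame (n_fft is an assumed value)."""
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2


# Synthetic one-second 440 Hz tone at an assumed 8 kHz sampling rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)

frames = frame_signal(tone, sample_rate)
spectra = power_spectrum(frames)
print(frames.shape, spectra.shape)
```

The critical-band integration, equal-loudness weighting and all-pole modelling of PLP would then operate on each row of the resulting power spectra.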
classification rate. Moreover, the effectiveness of the proposed system in the testing (operational) phase is evaluated based upon FRR and FAR. The FAR is calculated based on the close set and the open set. In the close set test, the voice of one authorized person is used as the disguised voice against another authorized person. On the other hand, tests on the two students who are regarded as outside impostors constitute the open set test.

[Fig. 8: Extracted 13 PLP coefficients (horizontal axis: Coefficient Number, 1 to 13; vertical axis: Magnitude)]

Training of the ANFIS-based speaker models: The ANFIS-based speaker model is developed using the Fuzzy Logic Toolbox of MATLAB. In order to allow the ANFIS to learn from the available input-output data so that the consequent parameters are obtained, the structure of the ANFIS first has to be designed. The design of the ANFIS structure is done by determining the premise parameters. Here the subtractive clustering method is used with different radius parameters. Once the premise parameters are obtained, the ANFIS model is trained using the hybrid learning algorithm for 10 iterations.
Table 1 shows the training time and the classification rate for all of the ANFIS-based speaker models for different subtractive clustering parameters. As shown in the table, all of the speaker models give perfect classification rates. There are no errors in identifying the authorized persons based on the voice data used in the training phase. However, the training time differs significantly for different radii. A smaller radius causes a longer training time. This is due to the fact that a smaller cluster radius will usually yield more, smaller clusters in the data and hence more rules. More rules in the ANFIS result in a larger number of consequent parameters. As a consequence, a longer time is needed in the training process to optimize the parameters. Hence, it can be concluded that a larger radius is preferable to shorten the training time.

Table 1: Training performances
Model                 Authorized Person   Training Time (sec)   Classification Rate (%)
ANFIS 1 (r = 0.25)    Person 1            51                    100
                      Person 2            45                    100
                      Person 3            46                    100
                      Average             47                    100
ANFIS 2 (r = 0.50)    Person 1            25.8                  100
                      Person 2            28.9                  100
                      Person 3            27.8                  100
                      Average             27.8                  100
ANFIS 3 (r = 1.00)    Person 1            0.28                  100
                      Person 2            0.27                  100
                      Person 3            0.41                  100
                      Average             0.32                  100

Table 2: Testing performances of ANFIS 1, Radius of 0.25
Authorized Person   FRR (%)   FAR (%), Close Set   FAR (%), Open Set
Person 1            16        5                    6
Person 2            20        9                    4
Person 3            14        11                   7
Overall             16        8.3                  5.7

Table 3: Testing performances of ANFIS 2, Radius of 0.50
Authorized Person   FRR (%)   FAR (%), Close Set   FAR (%), Open Set
Person 1            18        5                    6
Person 2            10        7                    6
Person 3            14        8                    7
Overall             14        6.7                  6.3

Table 4: Testing performances of ANFIS 3, Radius of 1.00
Authorized Person   FRR (%)   FAR (%), Close Set   FAR (%), Open Set
Person 1            36        2                    4
Person 2            6         14                   3
Person 3            16        11                   10
Overall             19.3      9                    5.6

Testing of the ANFIS-based speaker models: Tables 2-4 show the performances of the ANFIS models when the testing voice data are used. In terms of both FAR and FRR, ANFIS 2 produces a better performance than the other models. Hence it can be concluded that ANFIS 2 is the best candidate as the voice-based model in the proposed system. From the security point of view, ANFIS 2 is the best model for protecting the laboratory from unauthorized persons (impostors) since it gives the lowest FAR. The overall FAR of ANFIS 2 is smaller than 10%, which is good enough for a common security system. If a high level of security is needed, further improvement has to be done so that the proposed system produces a small FAR, that is, smaller than 1%.
However, although the FRR of ANFIS 2 is also the smallest, its FRR is larger than 10%. Although this does not influence the level of security, a rather large value of FRR makes the access control system inconvenient for the authorized persons. Further improvement needs to be done to improve the usability of the ANFIS-based model for the access control system.
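The relation between cluster radius and rule count noted in the training discussion can be illustrated with a simplified subtractive clustering sketch, in which each cluster center would correspond to one fuzzy rule. This is not the MATLAB toolbox routine used in the paper: the version below omits the accept/reject test of the full method, the squash factor of 1.25 and the reject ratio of 0.15 are assumed values, and the one-dimensional data are a toy example.

```python
import numpy as np


def subtractive_clustering(data, radius, squash=1.25, reject_ratio=0.15):
    """Simplified subtractive clustering; returns the selected centers."""
    data = np.asarray(data, dtype=float)
    # Scale every dimension to [0, 1] so that the radius is relative.
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    x = (data - lo) / span
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (squash * radius) ** 2
    # Squared distances between all pairs of points.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    # Potential of each point: density of its neighbourhood.
    potential = np.exp(-alpha * d2).sum(axis=1)
    first_peak = potential.max()
    centers = []
    while True:
        k = int(potential.argmax())
        if potential[k] < reject_ratio * first_peak:
            break  # remaining potentials are too small to form a cluster
        centers.append(data[k])
        # Reduce the potential of points close to the new center.
        potential = potential - potential[k] * np.exp(-beta * d2[k])
    return np.array(centers)


# Toy data: two groups of three points each.
toy = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
for r in (0.1, 1.0):
    print(r, len(subtractive_clustering(toy, r)))
```

With a first order Sugeno system, each additional rule adds one set of consequent parameters (p_i, q_i, r_i in the two-input case), which is why more clusters lead to more parameters to optimize during training.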
CONCLUSION

This study has documented the development of an intelligent voice-based door access control for building security. The proposed system adopted Perceptual Linear Prediction (PLP) coefficients as the features of the person's voice and used Adaptive-Network-based Fuzzy Inference Systems (ANFIS) to develop authorized person models based on their voices. Experimental results showed that the proposed system produced a good security performance; in particular, it gave a good false rejection rate (FRR) and a good false acceptance rate (FAR) in the close set condition. However, further study has to be done to improve its FRR.

REFERENCES

1. Kung, S.Y., M.W. Mak and S.H. Lin, 2004. Biometric Authentication: Machine Learning Approach. Prentice Hall.
2. Zhang, D.D., 2000. Automated Biometrics: Technologies and Systems. Kluwer Academic Publisher.
3. Osadciw, L., P. Varshney and K. Veeramachaneni, 2002. Improving Personal Identification Accuracy Using Multisensor Fusion for Building Access Control Application. In Proceedings the Fifth Intl. Conf. Information Fusion, pp: 1176-1183.
4. Anonymous, 2004. Door-access-control System Based on Finger-vein Authentication. Hitachi Review. Available online at http://www.hitachi.com/rev/archive/2004.
5. Campbell, J.P., 1997. Speaker Recognition: a Tutorial. In Proc. IEEE., pp: 1437-1462.
6. Hermansky, H., 1990. Perceptual linear predictive (PLP) analysis for speech. J. Acoustics Society American, pp: 1738-1752.
7. Jang, J.S., 1993. Adaptive-network-based fuzzy inference system. IEEE Trans. on System, Man and Cybernetics, pp: 665-685.
8. Jang, J.S.R., C.T. Sun and E. Mizutani, 1997. Neuro-Fuzzy and Soft Computing. Prentice Hall, Upper Saddle River, NJ, USA.