
Chapter 1: INTRODUCTION

1.1 Problem Definition

Expression of feelings through facial emotions has been an object of interest since
the time of Aristotle. The topic gained real momentum only after 1960, when a list of
universal emotions was established and several parametrized coding systems were
proposed. Facilitated by advances in Machine Learning and Computer Vision, the idea
of building automated recognition systems has received considerable attention within
Computer Science.

Facial expressions are estimated to carry about 55% of emotional communication.


This means that if a computer could capture and understand the emotions of its user,
communication would become more natural and appropriate, especially in scenarios
where the computer plays the role of a tutor. Developing such a Facial Expression
Recognition system (also referred to as a FER system) is not a trivial task, because
of the high variability of the data: images differ in resolution, quality,
illumination and size. All of these constraints have to be taken into account when
selecting methods, in order to deliver a system that is robust, person-independent
and, ideally, works in real-time scenarios. Facial recognition is also a growing
biometric, since people generally prefer having their face scanned to undergoing iris
or retina scans or giving blood samples for DNA, and face recognition is one of the
few biometric methods that combines this convenience with high accuracy. The
information age is quickly changing the way transactions are completed: access to
buildings, bank accounts and computer systems often relies on PINs for identification
and security clearance. FER has therefore become an important topic in many
applications, such as security systems, criminal identification and credit-card
verification. Although humans are naturally good at reading faces, automated FER has
come a long way in proving its worth in today's technology. The study of facial
expressions has many aspects, from computer analysis, emotion recognition, lie
detection, airport security and nonverbal communication to the role of expressions
in art.

Improving the ability to read expressions is an important step towards successful
interaction. This project achieves acceptable performance in recognizing faces within
its intended limits, and the system is also capable of detecting and recognizing
multiple faces in live acquired images.

1.2 Project Overview


This report proposes a system capable of performing automatic recognition of five
expressions: happiness, anger, sadness, calmness and yawning. The system analyzes the
image of a face and produces a calculated prediction of the expression. The approach
integrates a module for automatic face detection. Given the cropped face, the system
extracts discriminant features using Local Binary Patterns Histograms (LBPH) and the
MobileNet architecture (used for CNN learning), chosen for their robustness to
illumination changes and speed of computation. Lastly, the solution performs
expression classification by incorporating widely used Machine Learning models,
trained on a standard dataset of images.

Figure 1: Steps of a facial emotion recognizer system and its applications: acquire
image → face detection → face recognition → person identity → facial expression
analysis

1.3 Hardware Specifications

• Processor: 1 GHz

• RAM: 512 MB

• HDD: 500 MB

• Webcam: embedded or extended

1.4 Software Requirements

• Python 3.6

• OpenCV module

• Haar cascades

• TensorFlow

1.3.1 Processor Speed

The minimum processor required to run a Python script on a system is an Intel Atom®
or Intel® Core™ i3 processor. The project will run comfortably in such an
environment.

1.3.2 Hard Disk

A system with a bare minimum of 500 MB of disk space can run the script, but the
Intel Distribution for Python recommends at least 1 GB. Since the project can be used
to train new images, for which an extension of the dataset might be needed, the extra
space on the hard disk will be beneficial.

1.3.3 Camera

The project receives its computer-vision input through a camera. This camera can be
embedded in the system or attached externally, for example through a mobile device or
a USB webcam. The camera provides the basic input unit of the project.

Chapter 2: LITERATURE SURVEY

2.1 Existing System

Many mobile deep-learning tasks are currently performed in the cloud: when we want to
classify an image, it is sent to a web service, classified on a remote server, and
the result is sent back to our device.

2.2 Proposed System

The computational power of our portable devices is increasing rapidly, while the
network complexity required for computer vision is shrinking (thanks to architectures
like MobileNet).

This project is intended to be lightweight and able to recognise a person's facial
expressions from live video streams through an embedded or an extended webcam. Since
the project makes use of TensorFlow, OpenCV, etc., Python 3.6 is best suited. The
project requires only a minimalistic user interface; hence, the Tk GUI toolkit is
used.

2.3 Feasibility Study

2.3.1 Technical Feasibility

A technical feasibility study is the complete study of the project in terms of input,
processes, output, fields, programs and procedures.

2.3.1.1 Project Technique

Deep Learning, by training the MobileNet architecture (a variant of Convolutional
Neural Networks) to generate a classification graph.

2.3.1.2 Project Requirement

The proposed system would be able to recognize facial expressions. In order to do so,
it makes use of the following:
2.3.1.2.1 Python

Python was the first client language supported by TensorFlow and currently supports
the most features. Python is one of those rare languages that can claim to be both
simple and powerful.

The official introduction to Python is:

Python is an easy to learn, powerful programming language. It has efficient
high-level data structures and a simple but effective approach to object-oriented
programming. Python's elegant syntax and dynamic typing, together with its
interpreted nature, make it an ideal language for scripting and rapid application
development in many areas on most platforms.

2.3.1.2.2 PyCharm

PyCharm is an integrated development environment (IDE) used in computer programming,
specifically for the Python language. It is developed by the Czech company JetBrains.
It provides code analysis, a graphical debugger, an integrated unit tester,
integration with version control systems (VCSes), and supports web development with
Django.

PyCharm is cross-platform, with Windows, macOS and Linux versions. The Community
Edition is released under the Apache License, and there is a Professional Edition
with extra features, released under a proprietary license.

2.3.1.2.3 OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly
aimed at real-time computer vision. Originally developed by Intel, it was later
supported by Willow Garage and then Itseez (which was later acquired by Intel). The
library is cross-platform and free for use under the open-source BSD license.

OpenCV supports the deep learning frameworks TensorFlow, Torch etc.

2.3.1.2.4 Haar-like Features

A simple rectangular Haar-like feature can be defined as the difference of the sums
of pixels over areas inside a rectangle, which can be at any position and scale
within the original image. This feature set is called the 2-rectangle feature; Viola
and Jones also defined 3-rectangle and 4-rectangle features. The values indicate
certain characteristics of a particular area of the image: each feature type can
indicate the existence (or absence) of characteristics such as edges or changes in
texture. For example, a 2-rectangle feature can indicate where the border lies
between a dark region and a light region.
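As a worked illustration, a 2-rectangle feature can be evaluated in constant time per
window using an integral image. The sketch below is our own illustrative NumPy code
(not OpenCV's implementation); it computes a vertical 2-rectangle feature as the sum
over the left half of a window minus the sum over the right half:

```python
import numpy as np

def two_rect_feature(gray, x, y, w, h):
    """Left-half minus right-half pixel sums over a (w x h) window at (x, y)."""
    # Integral image with a zero row/column prepended, so that
    # ii[r, c] = sum of gray[:r, :c].
    ii = np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))

    def rect_sum(r, c, rh, cw):
        # Sum over gray[r:r+rh, c:c+cw] from four integral-image lookups.
        return ii[r + rh, c + cw] - ii[r, c + cw] - ii[r + rh, c] + ii[r, c]

    half = w // 2
    return rect_sum(y, x, h, half) - rect_sum(y, x + half, h, half)
```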

OpenCV already contains many pre-trained classifiers for face, eyes, smile etc. Those
XML files are stored in opencv/data/haarcascades/ folder.
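A minimal face-detection sketch using one of these bundled cascades (the image path
is a hypothetical example; cv2.data.haarcascades points at the folder mentioned
above):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("sample.jpg")            # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:                # draw a box around each detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imshow("Faces", img)
cv2.waitKey(0)
```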

2.3.1.2.5 MobileNet Architecture

MobileNets are based on a streamlined architecture that uses depthwise separable
convolutions to build lightweight deep neural networks.

MobileNet, proposed by Google, is well suited to mobile and embedded vision
applications, where compute power is scarce. The MobileNet model is based on
depthwise separable convolutions, a form of factorized convolution that splits a
standard convolution into a depthwise convolution and a 1×1 convolution called a
pointwise convolution.

A few things about MobileNets:

1. They are insanely small

2. They are insanely fast

3. They are remarkably accurate

4. They are easy to tune for resources vs. accuracy

2.3.1.2.6 TensorFlow

TensorFlow is an open-source library for fast numerical computing. Google created and
maintains it, and it is released under the Apache 2.0 open-source license. The API is
nominally for the Python programming language, although there is access to the
underlying C++ API.

2.3.1.2.7 Tkinter

Tkinter is a Python binding to the Tk GUI toolkit. It is the standard Python
interface to Tk and is Python's de facto standard GUI. Tkinter is included with
standard Linux, Microsoft Windows and Mac OS X installs of Python. The name Tkinter
comes from “Tk interface”.

Tkinter is free software released under a Python license.

2.3.1.2.8 Dataset: The Karolinska Directed Emotional Faces (KDEF)

A dataset of images of various people expressing disgust, sadness, happiness, fear,
anger, surprise and calmness.

2.3.2 Legal Feasibility

Every tool used in this project is either open source or available for
non-commercial research purposes, like ‘The Karolinska Directed Emotional Faces’
(KDEF) dataset.

2.3.3 Operational Feasibility

The proposed system would recognize expressions with at least 70% confidence.

Since we will train a MobileNet on the KDEF dataset, our software will be lightweight
and faster than software using ‘traditional’ neural networks.

The challenge will be accuracy: MobileNets are not usually as accurate as bigger,
more resource-intensive networks. However, finding the right resource/accuracy
trade-off is the strong suit of MobileNets.

2.3.4 Time Feasibility

The project should take 4-5 months to reach a satisfactory working state. Since we
have yet to gain knowledge about neural networks, deep learning, TensorFlow, etc.,
the mandatory deadline, i.e. the duration of the current semester, is reasonable.

2.3.5 Financial Feasibility

We will develop this project for research purposes on our personal computers; hence,
there will be no financial expenses incurred.

Chapter 3: SYSTEM ANALYSIS AND DESIGN

3.1 Requirement Specification

3.1.1 Functional and non-functional requirements

Functional requirements are those that the system needs to deliver, while
non-functional requirements take into account constraints and how the system should
behave.

Functional requirements:

• The system should classify an image into one of the five emotion classes.

• The system should include an automatic face detection algorithm.

• The system should include techniques for the extraction of meaningful facial
features.

• The system should deliver a trained classifier.

• The system should take test images from the video stream through the webcam.

• The system's GUI should be simple and clear.

Non-functional requirements:

• The system should be implemented in Python.

• The system should be lightweight and portable.

• The system should be quick and responsive.
3.2 Flowcharts and DFDs

Figure 2: Flowchart for the Emotional Analysis System

Figure 3: Data Flow Diagram for the Distinct Emotion Recogniser

Figure 4: Entity Relation Diagram of the Smart Human Scanner

The data flow in the above diagram explicitly defines how an image is extracted from
a database or a live feed to compute the expression of the user. The Facial
Expression Recognition System performs face detection and feature extraction (Haar
cascade), and then classifies the emotion of the user.

Figure 5: DFD Level 0 diagram of the proposed system. The USER provides real-time
graphical input to the SYSTEM, which returns the analysed facial expression.

Figure 5.1: DFD Level 1 diagram of the proposed system. The USER's input first passes
through FACIAL DETECTION (using Haar cascades) and then through FACIAL EXPRESSION
RECOGNITION (using a MobileNet-trained CNN).

3.3 Design and Test Steps

Our facial expression recognition system is built up from a set of basic steps. First
and foremost is face detection, the most fundamental step in the process. We perform
it with the help of OpenCV, which is used to detect Haar-like features in the images
in the dataset directory. The Haar classifier extracts the facial features, which are
then used to detect the face on a 24x24-pixel image of the face.

Figure 6: Method of applying the rectangle (Haar-like) features

Once the program is able to detect faces in the feed or in images, we are ready to
move on to the most integral part of the program: emotion classification. We achieve
emotion analysis with the help of deep learning (neural networks) and OpenCV,
training a Convolutional Neural Network for the task. Once the dataset is filled with
directories of different expressions, we are ready to retrain the network with the
next emotion that we want to classify. We use the MobileNet architecture to retrain
the network because it is fast and effective and allows the network to be used on
devices with low computational power.

3.3.1 Neural Network

The standard convolutional layer is parameterized by a convolution kernel K of size
DK × DK × M × N, where DK is the spatial dimension of the kernel (assumed square),
M is the number of input channels and N is the number of output channels. For an
input feature map F, the output feature map G is computed as:

G(k, l, n) = Σ_{i,j,m} K(i, j, m, n) · F(k + i − 1, l + j − 1, m)

Standard convolutions therefore have a computational cost of:

DK · DK · M · N · DF · DF

where DF is the spatial dimension of the (square) output feature map.

Our MobileNet model gives us an edge over standard convolutional networks. MobileNet
relies on the combination of a depthwise convolution and a 1 × 1 (pointwise)
convolution, called a depthwise separable convolution.

Depthwise separable convolutions cost:

DK · DK · M · DF · DF + M · N · DF · DF

which is the sum of the costs of the depthwise and the 1 × 1 pointwise convolutions.
By expressing convolution as a two-step process of filtering and combining, we get a
reduction in computation of:

(DK · DK · M · DF · DF + M · N · DF · DF) / (DK · DK · M · N · DF · DF) = 1/N + 1/DK²

MobileNet uses 3 × 3 depthwise separable convolutions, which need between 8 and 9
times less computation than standard convolutions at only a small reduction in
accuracy: with DK = 3, the reduction factor is 1/N + 1/9, which approaches 1/9 for
large N.
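This saving is easy to verify empirically. The sketch below (an illustration assuming
TensorFlow's tf.keras; the channel counts are arbitrary) builds a standard 3 × 3
convolution and its depthwise separable factorization and compares their weight
counts, which differ by roughly the predicted factor of 8-9:

```python
import tensorflow as tf
from tensorflow.keras import layers

M, N = 64, 128                      # input / output channels (arbitrary choice)
inp = layers.Input(shape=(56, 56, M))

# Standard 3x3 convolution: ~ 3*3*M*N weights.
standard = tf.keras.Model(inp, layers.Conv2D(N, 3, padding="same")(inp))

# Depthwise separable factorization: one 3x3 filter per channel (~3*3*M weights),
# followed by a 1x1 pointwise combination (~M*N weights).
dw = layers.DepthwiseConv2D(3, padding="same")(inp)
pw = layers.Conv2D(N, 1)(dw)
separable = tf.keras.Model(inp, pw)

print("standard:  ", standard.count_params())   # 73,856 with these M, N
print("separable: ", separable.count_params())  # 8,960, roughly an 8x reduction
```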

Figure 7: The standard convolutional filters in (a) are replaced by two layers:
depthwise convolution in (b) and pointwise convolution in (c), to build a depthwise
separable filter

3.3.2 Test Cases

TC001: Facial Detection
Description: The frontal face is detected.
Expected result: Proper recognition of the frontal face.
Actual result: Face recognised in a medium- to well-lit environment.
Status: Passed.

TC002: Dataset Extension
Description: Extending the KDEF dataset by adding personalised images.
Expected result: Cleaning of the downloaded images (with feature segmentation).
Actual result: Features segmented with medium accuracy.
Status: Passed.

TC003: Expression Recognition
Description: Determining the accuracy and operating conditions using the MobileNet
CNN.
Expected result: Proper analysis of the emotion through the OpenCV window.
Actual result: Frames dropped; medium accuracy.
Status: Passed.
Comments: The frame rate needs to be improved, perhaps by using a different
architecture.

TC004: Expression Recognition on a Mobile Device
Description: Using the system to recognise expressions on a mobile device.
Expected result: Intermediate recognition accuracy with low to medium lag.
Status: Not tested.
Comments: Next development stage.

TC005: Alternative Algorithm for Neural Network Training
Description: MobileNet is a fairly accurate and fast architecture for analysing
expressions, but we will consider alternatives too.
Expected result: Analysis of other CNNs for retraining the network.
Status: Not tested.
Chapter 4: RESULTS AND OUTPUTS

Figure 8: Code for recognition

Figure 9: Output of facial recognition.

4.1 Dataset Used

The training dataset we used for facial recognition is a self-created dataset: when
the OpenCV window opens, it captures multiple photos of the example subject, which
are later used in the recognizer script to identify the person.
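A minimal sketch of such a capture script, assuming OpenCV's bundled frontal-face
cascade and a hypothetical dataset/ output folder and user label:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cam = cv2.VideoCapture(0)
user_id, count = 1, 0               # hypothetical user label and sample counter

while count < 30:                   # capture 30 face samples for this user
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        count += 1
        # Save the grayscale face crop under a per-user naming scheme.
        cv2.imwrite("dataset/user.%d.%d.jpg" % (user_id, count),
                    gray[y:y + h, x:x + w])
cam.release()
```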

Figure 10: The dataset contains images of users 1, 2 and so on.

4.2 Face Emotion Analysis

4.2.1 Creating a Dataset

The dataset for our next activity, facial expression recognition, consists of various
expressions fed to the computer so that it can, for example, identify a smile on a
face as happy. For this we downloaded the Yale human faces dataset, which features
clean and distinct photos of happy human faces and classifies them as a group
(supervised learning). For this activity we create several directories, each
featuring hundreds of images of a different expression.

Figure 11: Pie chart showing the share of images per emotion class (Angry, Calm,
Happy, Sad and Yawning), with slices of 33%, 31%, 26%, 5% and 5%.

4.2.2 Cleaning the Dataset

The images that were downloaded in raw form from the Google Images website need to be
cropped (frontal face segmentation). For that we have a face-crop segmentation script
that runs on the specific directories and processes the images used to train the
neural network.
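A sketch of such a cleaning pass is shown below; the per-expression directory names
and the output size are illustrative assumptions:

```python
import os
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

for emotion in ("angry", "calm", "happy", "sad", "yawning"):  # names assumed
    for name in os.listdir(emotion):
        img = cv2.imread(os.path.join(emotion, name))
        if img is None:                       # skip non-image files
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.3, 5)
        if len(faces):
            x, y, w, h = faces[0]             # keep the first detected face
            crop = cv2.resize(gray[y:y + h, x:x + w], (224, 224))
            cv2.imwrite(os.path.join(emotion, "clean_" + name), crop)
```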

4.2.3 Retraining the Network

Once we have only cleaned images, we are ready to retrain the network. For this
purpose the MobileNet model is used, which is fast and accurate while adapting the
weights to the dataset. MobileNets are a family of convolutional neural networks;
Google open-sourced the MobileNet architecture and released 16 ImageNet checkpoints,
each corresponding to a different parameter configuration. This gives us an excellent
starting point for training our own classifiers that are insanely small and insanely
fast.

The main difference between the MobileNet architecture and a “traditional” CNN is
that, instead of a single 3x3 convolution layer followed by batch norm and ReLU,
MobileNets split the convolution into a 3x3 depthwise convolution and a 1x1 pointwise
convolution.
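The sketch below illustrates the retraining step with tf.keras transfer learning. It
is a simplified stand-in for the project's retraining script, not the script itself;
the dataset path, class count and hyperparameters are assumptions:

```python
import tensorflow as tf

# Start from ImageNet weights, freeze the convolutional base, and train a new
# classification head for the five expression classes.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet",
    pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # angry/calm/happy/sad/yawning
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# One subdirectory per emotion class under dataset/ (path assumed).
train = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255) \
    .flow_from_directory("dataset/", target_size=(224, 224), batch_size=32)
model.fit(train, epochs=10)
model.save("expression_mobilenet.h5")
```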

4.2.4 Importing the Retrained Model and Setting Everything Up

Once the network has been trained, the model is imported into a script that runs an
OpenCV window and consults the model to report which expression is on the face being
captured through the window. This completes the expression analysis phase.
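A minimal sketch of such an inference script, assuming the Keras-format model file
and the label ordering from the retraining sketch above (both hypothetical):

```python
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("expression_mobilenet.h5")  # hypothetical path
labels = ["angry", "calm", "happy", "sad", "yawning"]          # assumed class order
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cam = cv2.VideoCapture(0)
while True:
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        # Classify each detected face crop and overlay the predicted label.
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224)) / 255.0
        probs = model.predict(face[np.newaxis])[0]
        cv2.putText(frame, labels[int(np.argmax(probs))], (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Expression", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cam.release()
```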


Figure 12: Screenshot of output showing happy emotion

Figure 13: Screenshot of output showing angry emotion

Figure 14: Screenshot of output showing yawning emotion

Figure 15: Screenshot of output showing calm emotion

4.3 Graphical User Interface (GUI)

A graphical user interface includes graphical elements such as windows, icons and
buttons. Our GUI has been implemented using Tkinter, although there are various GUI
modules for Python, such as PyQt, Tkinter, Kivy, Pyforms, PyGObject and PyGUI. We
have created a GUI which makes it easy for the user to access our system and adds
efficiency and smoothness to our project.
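A minimal Tkinter sketch of such an interface; the window title matches the project
name, while the callbacks are illustrative placeholders for the actual scripts:

```python
import tkinter as tk

# Hypothetical callbacks; in the project these would launch the OpenCV windows.
def start_detection():
    print("launch face detection ...")

def start_expression_analysis():
    print("launch expression analysis ...")

root = tk.Tk()
root.title("Smart Human Scanner")
tk.Label(root, text="Facial Expression Recognition").pack(pady=10)
tk.Button(root, text="Detect Faces",
          command=start_detection).pack(fill="x", padx=20)
tk.Button(root, text="Analyse Expression",
          command=start_expression_analysis).pack(fill="x", padx=20, pady=5)
root.mainloop()
```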

Figure 16: GUI of the project Smart Human Scanner

Chapter 5: CONCLUSION

5.1 Chapter Overview

This final chapter summarises the achievements of the project as well as the
challenges faced during its development. It also outlines possible improvements and
their applicability.

5.2 Project Achievement

The proposed solution delivers a recognizer system for facial expressions. The most
important achievement consists of the integrated functionalities and the obtained
results. The system includes an automatic face detection mechanism and implements
feature extraction techniques tailored to the problem at hand. A model is trained on
examples of faces, successfully equipping the system with the capability of
classifying all five emotions and ultimately achieving an accuracy of 86%. The
functionality can be easily accessed by the user, who can upload an image and request
a classification.

5.3 Challenges

Prior to implementing the system, one of the first challenges of the project was
choosing the algorithms for each individual module, because the selection had to
consider the integration of techniques, the time allowed for project development, the
speed of computation and, ultimately, good overall system performance. Another
challenge was increasing the accuracy of the models during facial emotion
recognition.

5.4 Concluding Remarks

Finally, this report demonstrates the achievements of the project and presents an
assessment of its performance and reliability. Overall, the proposed solution
delivers a system capable of classifying the five target emotions with an average
accuracy of 86%. It makes extensive use of Image Processing and Machine Learning
techniques to evaluate still images and derive suitable features, such that, when
presented with a new example, it is able to recognize the expressed emotion. On a
personal level, this project contributed significantly to improving our knowledge of
Computer Vision and Machine Learning methodologies and to understanding the
challenges and limitations of image interpretation.

To conclude, we believe that the current solution has met the project's requirements
and deliverables. Even though it has a series of limitations, it allows for further
extensions, which would enable a more in-depth analysis and understanding of human
behavior through facial emotions.

REFERENCES:

WEBSITES:

1. https://opencv.org/

2. https://www.tensorflow.org/

3. https://www.python.org/

4. https://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html

5. https://www.pyimagesearch.com/2018/04/09/how-to-quickly-build-a-deep-learning-image-dataset/

PERSONAL DETAILS:

1. ANOUSHKA SHARMA

Enrolment No. - 161B036

Branch - CSE

Ph. No. - 9650010265

Email - anoushka444@gmail.com

2. AYUSH GUPTA

Enrolment No. - 161B059

Branch - CSE

Ph. No. - 8959933198

Email - ayush.gupta398@gmail.com

3. CHAITANYA

Enrolment No. - 161B067

Branch - CSE

Ph. No. - 9755677266

Email - chaitanya.2298@gmail.com

