
Chapter 1: INTRODUCTION

1.1 Problem Definition

Expression of feelings through facial emotions has been an object of interest since
the time of Aristotle. The topic gained real momentum only after 1960, when a list of
universal emotions was established and several parametrized coding systems were
proposed. Facilitated by advances in Machine Learning and Computer Vision, the idea
of building automated recognition systems has received considerable attention within
Computer Science.

Facial expressions are estimated to carry about 55% of emotional communication.


This means that if a computer could capture and understand the emotions of its user,
communication would become more natural and appropriate, especially in scenarios
where the computer plays the role of a tutor. Developing such a Facial Expression
Recognition system (also referred to as a FER system) is not a trivial task, because
of the high variability of the data: images differ in resolution, quality,
illumination and size. All of these constraints have to be taken into account when
selecting methods, in order to deliver a system that is robust, person-independent
and, ideally, works in real-time scenarios. Facial recognition is also a growing
biometric, since people generally prefer having their face scanned to undergoing iris
or retina scans or giving blood samples for DNA, and face recognition is one of the
few biometric methods that combines this convenience with high accuracy. The
information age is quickly changing the way transactions are completed: access to
buildings, bank accounts and computer systems often relies on PINs for identification
and security clearance. FER has therefore become an important topic in many
applications, such as security systems, criminal identification and credit-card
verification. Although humans are naturally good at reading faces, automated FER has
come a long way in proving its worth in today's technology. The study of facial
expressions has many aspects, from computer analysis, emotion recognition, lie
detection, airport security and nonverbal communication to the role of expressions
in art.

Improving the ability to read expressions is an important step towards successful
interaction. This project achieves acceptable performance in recognizing faces within
its intended limits, and the system is also capable of detecting and recognizing
multiple faces in live acquired images.

1.2 Project Overview


This report proposes a system capable of performing automatic recognition of five
expressions: happiness, anger, sadness, calmness and yawning. The system analyzes the
image of a face and produces a calculated prediction of the expression. The approach
integrates a module for automatic face detection. Given the cropped face, the system
extracts discriminant features using Local Binary Patterns Histograms (LBPH) and the
MobileNet architecture (used for CNN learning), chosen for their robustness to
illumination changes and speed of computation. Lastly, the solution performs
expression classification by incorporating widely used Machine Learning models,
trained on a standard dataset of images.

Figure 1: Steps of a facial emotion recognizer system and its applications: acquire
image → face detection → face recognition → person identity → facial expression
analysis

1.3 Hardware Specifications

• Processor: 1 GHz

• RAM: 512 MB

• HDD: 500 MB

• Webcam: embedded or extended

1.4 Software Requirements

• Python 3.6

• OpenCV module

• Haar cascades

• TensorFlow

1.3.1 Processor Speed

The minimum processor required to run a Python script on a system is an Intel Atom®
or Intel® Core™ i3 processor. The project will run comfortably in such an
environment.

1.3.2 Hard Disk

A system with a bare minimum of 500 MB of disk space can run the script, but the
Intel Distribution for Python recommends at least 1 GB. Since the project can be used
to train new images, for which an extension of the dataset might be needed, the extra
space on the hard disk will be beneficial.

1.3.3 Camera

The project receives its computer-vision input through a camera. This camera can be
embedded in the system or attached externally, for example through a mobile device or
a USB webcam. The camera provides the basic input unit of the project.

Chapter 2: LITERATURE SURVEY

2.1 Existing System

Many mobile deep-learning tasks are currently performed in the cloud: when we want to
classify an image, it is sent to a web service, classified on a remote server, and
the result is sent back to our device.

2.2 Proposed System

The computational power of our portable devices is increasing rapidly, while the
network complexity required for computer vision is shrinking (thanks to architectures
like MobileNet).

This project is intended to be lightweight and able to recognise a person's facial
expressions from live video streams through an embedded or an extended webcam. Since
the project makes use of TensorFlow, OpenCV, etc., Python 3.6 is best suited. The
project requires only a minimalistic user interface; hence, the Tk GUI toolkit is
used.

2.3 Feasibility Study

2.3.1 Technical Feasibility

A technical feasibility study is the complete study of the project in terms of input,
processes, output, fields, programs and procedures.

2.3.1.1 Project Technique

Deep Learning, by training the MobileNet architecture (a variant of Convolutional
Neural Networks) to generate a classification graph.

2.3.1.2 Project Requirement

The proposed system would be able to recognize facial expressions. In order to do so,
it makes use of the following:
2.3.1.2.1 Python

Python was the first client language supported by TensorFlow and currently supports
the most features. Python is one of those rare languages that can claim to be both
simple and powerful.

The official introduction to Python is:

Python is an easy to learn, powerful programming language. It has efficient
high-level data structures and a simple but effective approach to object-oriented
programming. Python's elegant syntax and dynamic typing, together with its
interpreted nature, make it an ideal language for scripting and rapid application
development in many areas on most platforms.

2.3.1.2.2 PyCharm

PyCharm is an integrated development environment (IDE) used in computer programming,
specifically for the Python language. It is developed by the Czech company JetBrains.
It provides code analysis, a graphical debugger, an integrated unit tester,
integration with version control systems (VCSes), and supports web development with
Django.

PyCharm is cross-platform, with Windows, macOS and Linux versions. The Community
Edition is released under the Apache License, and there is a Professional Edition
with extra features, released under a proprietary license.

2.3.1.2.3 OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly
aimed at real-time computer vision. Originally developed by Intel, it was later
supported by Willow Garage and then Itseez (which was later acquired by Intel). The
library is cross-platform and free for use under the open-source BSD license.

OpenCV supports the deep learning frameworks TensorFlow, Torch etc.

2.3.1.2.4 Haar-like Features

A simple rectangular Haar-like feature can be defined as the difference of the sums
of pixels over areas inside a rectangle, which can be at any position and scale
within the original image. This feature set is called the 2-rectangle feature; Viola
and Jones also defined 3-rectangle and 4-rectangle features. The values indicate
certain characteristics of a particular area of the image: each feature type can
indicate the existence (or absence) of characteristics such as edges or changes in
texture. For example, a 2-rectangle feature can indicate where the border lies
between a dark region and a light region.
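As a worked illustration, a 2-rectangle feature can be evaluated in constant time per
window using an integral image. The sketch below is our own illustrative NumPy code
(not OpenCV's implementation); it computes a vertical 2-rectangle feature as the sum
over the left half of a window minus the sum over the right half:

```python
import numpy as np

def two_rect_feature(gray, x, y, w, h):
    """Left-half minus right-half pixel sums over a (w x h) window at (x, y)."""
    # Integral image with a zero row/column prepended, so that
    # ii[r, c] = sum of gray[:r, :c].
    ii = np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))

    def rect_sum(r, c, rh, cw):
        # Sum over gray[r:r+rh, c:c+cw] from four integral-image lookups.
        return ii[r + rh, c + cw] - ii[r, c + cw] - ii[r + rh, c] + ii[r, c]

    half = w // 2
    return rect_sum(y, x, h, half) - rect_sum(y, x + half, h, half)
```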

OpenCV already contains many pre-trained classifiers for face, eyes, smile etc. Those
XML files are stored in opencv/data/haarcascades/ folder.
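A minimal face-detection sketch using one of these bundled cascades (the image path
is a hypothetical example; cv2.data.haarcascades points at the folder mentioned
above):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("sample.jpg")            # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:                # draw a box around each detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imshow("Faces", img)
cv2.waitKey(0)
```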

2.3.1.2.5 MobileNet Architecture

MobileNets are based on a streamlined architecture that uses depthwise separable
convolutions to build lightweight deep neural networks.

MobileNet, proposed by Google, is well suited to mobile and embedded vision
applications, where compute power is scarce. The MobileNet model is based on
depthwise separable convolutions, a form of factorized convolution that splits a
standard convolution into a depthwise convolution and a 1×1 convolution called a
pointwise convolution.

A few things about MobileNets:

1. They are insanely small

2. They are insanely fast

3. They are remarkably accurate

4. They are easy to tune for resources vs. accuracy

2.3.1.2.6 TensorFlow

TensorFlow is an open-source library for fast numerical computing. Google created and
maintains it, and it is released under the Apache 2.0 open-source license. The API is
nominally for the Python programming language, although there is access to the
underlying C++ API.

2.3.1.2.7 Tkinter

Tkinter is a Python binding to the Tk GUI toolkit. It is the standard Python
interface to Tk and is Python's de facto standard GUI. Tkinter is included with
standard Linux, Microsoft Windows and Mac OS X installs of Python. The name Tkinter
comes from “Tk interface”.

Tkinter is free software released under a Python license.

2.3.1.2.8 Dataset: The Karolinska Directed Emotional Faces (KDEF)

A dataset of images of various people expressing disgust, sadness, happiness, fear,
anger, surprise and calmness.

2.3.2 Legal Feasibility

Every tool used in this project is either open source or available for
non-commercial research purposes, like ‘The Karolinska Directed Emotional Faces’
(KDEF) dataset.

2.3.3 Operational Feasibility

The proposed system would recognize expressions with at least 70% confidence.

Since we will train a MobileNet on the KDEF dataset, our software will be lightweight
and faster than software using ‘traditional’ neural networks.

The challenge will be accuracy: MobileNets are not usually as accurate as bigger,
more resource-intensive networks. However, finding the right resource/accuracy
trade-off is the strong suit of MobileNets.

2.3.4 Time Feasibility

The project should take 4-5 months to reach a satisfactory working state. Since we
have yet to gain knowledge about neural networks, deep learning, TensorFlow, etc.,
the mandatory deadline, i.e. the duration of the current semester, is reasonable.

2.3.5 Financial Feasibility

We will develop this project for research purposes on our personal computers; hence,
there will be no financial expenses incurred.

Chapter 3: SYSTEM ANALYSIS AND DESIGN

3.1 Requirement Specification

3.1.1 Functional and non-functional requirements

Functional requirements are those that the system needs to deliver, while
non-functional requirements take into account constraints and how the system should
behave.

Functional requirements:

• The system should classify an image into one of the five emotion classes.

• The system should include an automatic face detection algorithm.

• The system should include techniques for the extraction of meaningful facial
features.

• The system should deliver a trained classifier.

• The system should take test images from the video stream through the webcam.

• The system's GUI should be simple and clear.

Non-functional requirements:

• The system should be implemented in Python.

• The system should be lightweight and portable.

• The system should be quick and responsive.
3.2 Flowcharts and DFDs

Figure 2: Flowchart for the Emotional Analysis System

Figure 3: Data Flow Diagram for the Distinct Emotion Recogniser

Figure 4: Entity Relation Diagram of the Smart Human Scanner

The data flow in the above diagram explicitly defines how an image is extracted from
a database or a live feed to compute the expression of the user. The Facial
Expression Recognition System performs face detection and feature extraction (Haar
cascade), and then classifies the emotion of the user.

Figure 5: DFD Level 0 diagram of the proposed system. The USER provides real-time
graphical input to the SYSTEM, which returns the analysed facial expression.

Figure 5.1: DFD Level 1 diagram of the proposed system. The USER's input first passes
through FACIAL DETECTION (using Haar cascades) and then through FACIAL EXPRESSION
RECOGNITION (using a MobileNet-trained CNN).

3.3 Design and Test Steps

Our facial expression recognition system is built up from a set of basic steps. First
and foremost is face detection, the most fundamental step in the process. We perform
it with the help of OpenCV, which is used to detect Haar-like features in the images
in the dataset directory. The Haar classifier extracts the facial features, which are
then used to detect the face on a 24x24-pixel image of the face.

Figure 6: Method of applying the rectangle (Haar-like) features

Once the program is able to detect faces in the feed or in images, we are ready to
move on to the most integral part of the program: emotion classification. We achieve
emotion analysis with the help of deep learning (neural networks) and OpenCV,
training a Convolutional Neural Network for the task. Once the dataset is filled with
directories of different expressions, we are ready to retrain the network with the
next emotion that we want to classify. We use the MobileNet architecture to retrain
the network because it is fast and effective and allows the network to be used on
devices with low computational power.

3.3.1 Neural Network

The standard convolutional layer is parameterized by a convolution kernel K of size
DK × DK × M × N, where DK is the spatial dimension of the kernel (assumed square),
M is the number of input channels and N is the number of output channels. For an
input feature map F, the output feature map G is computed as:

G(k, l, n) = Σ_{i,j,m} K(i, j, m, n) · F(k + i − 1, l + j − 1, m)

Standard convolutions therefore have a computational cost of:

DK · DK · M · N · DF · DF

where DF is the spatial dimension of the (square) output feature map.

Our MobileNet model gives us an edge over standard convolutional networks. MobileNet
relies on the combination of a depthwise convolution and a 1 × 1 (pointwise)
convolution, called a depthwise separable convolution.

Depthwise separable convolutions cost:

DK · DK · M · DF · DF + M · N · DF · DF

which is the sum of the costs of the depthwise and the 1 × 1 pointwise convolutions.
By expressing convolution as a two-step process of filtering and combining, we get a
reduction in computation of:

(DK · DK · M · DF · DF + M · N · DF · DF) / (DK · DK · M · N · DF · DF) = 1/N + 1/DK²

MobileNet uses 3 × 3 depthwise separable convolutions, which need between 8 and 9
times less computation than standard convolutions at only a small reduction in
accuracy: with DK = 3, the reduction factor is 1/N + 1/9, which approaches 1/9 for
large N.
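This saving is easy to verify empirically. The sketch below (an illustration assuming
TensorFlow's tf.keras; the channel counts are arbitrary) builds a standard 3 × 3
convolution and its depthwise separable factorization and compares their weight
counts, which differ by roughly the predicted factor of 8-9:

```python
import tensorflow as tf
from tensorflow.keras import layers

M, N = 64, 128                      # input / output channels (arbitrary choice)
inp = layers.Input(shape=(56, 56, M))

# Standard 3x3 convolution: ~ 3*3*M*N weights.
standard = tf.keras.Model(inp, layers.Conv2D(N, 3, padding="same")(inp))

# Depthwise separable factorization: one 3x3 filter per channel (~3*3*M weights),
# followed by a 1x1 pointwise combination (~M*N weights).
dw = layers.DepthwiseConv2D(3, padding="same")(inp)
pw = layers.Conv2D(N, 1)(dw)
separable = tf.keras.Model(inp, pw)

print("standard:  ", standard.count_params())   # 73,856 with these M, N
print("separable: ", separable.count_params())  # 8,960, roughly an 8x reduction
```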

Figure 7: The standard convolutional filters in (a) are replaced by two layers:
depthwise convolution in (b) and pointwise convolution in (c), to build a depthwise
separable filter

3.3.2 Test Cases

TC001: Facial Detection
Description: The frontal face is detected.
Expected result: Proper recognition of the frontal face.
Actual result: Face recognised in a medium- to well-lit environment.
Status: Passed.

TC002: Dataset Extension
Description: Extending the KDEF dataset by adding personalised images.
Expected result: Cleaning of the downloaded images (with feature segmentation).
Actual result: Features segmented with medium accuracy.
Status: Passed.

TC003: Expression Recognition
Description: Determining the accuracy and operating conditions using the MobileNet
CNN.
Expected result: Proper analysis of the emotion through the OpenCV window.
Actual result: Frames dropped; medium accuracy.
Status: Passed.
Comments: The frame rate needs to be improved, perhaps by using a different
architecture.

TC004: Expression Recognition on a Mobile Device
Description: Using the system to recognise expressions on a mobile device.
Expected result: Intermediate recognition accuracy with low to medium lag.
Status: Not tested.
Comments: Next development stage.

TC005: Alternative Algorithm for Neural Network Training
Description: MobileNet is a fairly accurate and fast architecture for analysing
expressions, but we will consider alternatives too.
Expected result: Analysis of other CNNs for retraining the network.
Status: Not tested.
Chapter 4: RESULTS AND OUTPUTS

Figure 8: Code for recognition

Figure 9: Output of facial recognition.

4.1 Dataset Used

The training dataset we used for facial recognition is a self-created dataset: when
the OpenCV window opens, it captures multiple photos of the example subject, which
are later used in the recognizer script to identify the person.
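A minimal sketch of such a capture script, assuming OpenCV's bundled frontal-face
cascade and a hypothetical dataset/ output folder and user label:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cam = cv2.VideoCapture(0)
user_id, count = 1, 0               # hypothetical user label and sample counter

while count < 30:                   # capture 30 face samples for this user
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        count += 1
        # Save the grayscale face crop under a per-user naming scheme.
        cv2.imwrite("dataset/user.%d.%d.jpg" % (user_id, count),
                    gray[y:y + h, x:x + w])
cam.release()
```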

Figure 10: The dataset contains images of users 1, 2 and so on.

4.2 Face Emotion Analysis

4.2.1 Creating a Dataset

The dataset for our next activity, facial expression recognition, consists of various
expressions fed to the computer so that it can, for example, identify a smile on a
face as happy. For this we downloaded the Yale human faces dataset, which features
clean and distinct photos of happy human faces and classifies them as a group
(supervised learning). For this activity we create several directories, each
featuring hundreds of images of a different expression.

Figure 11: Pie chart showing the share of images per emotion class (Angry, Calm,
Happy, Sad and Yawning), with slices of 33%, 31%, 26%, 5% and 5%.

4.2.2 Cleaning the Dataset

The images that were downloaded in raw form from the Google Images website need to be
cropped (frontal face segmentation). For that we have a face-crop segmentation script
that runs on the specific directories and processes the images used to train the
neural network.
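A sketch of such a cleaning pass is shown below; the per-expression directory names
and the output size are illustrative assumptions:

```python
import os
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

for emotion in ("angry", "calm", "happy", "sad", "yawning"):  # names assumed
    for name in os.listdir(emotion):
        img = cv2.imread(os.path.join(emotion, name))
        if img is None:                       # skip non-image files
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.3, 5)
        if len(faces):
            x, y, w, h = faces[0]             # keep the first detected face
            crop = cv2.resize(gray[y:y + h, x:x + w], (224, 224))
            cv2.imwrite(os.path.join(emotion, "clean_" + name), crop)
```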

4.2.3 Retraining the Network

Once we have only cleaned images, we are ready to retrain the network. For this
purpose the MobileNet model is used, which is fast and accurate while adapting the
weights to the dataset. MobileNets are a family of convolutional neural networks;
Google open-sourced the MobileNet architecture and released 16 ImageNet checkpoints,
each corresponding to a different parameter configuration. This gives us an excellent
starting point for training our own classifiers that are insanely small and insanely
fast.

The main difference between the MobileNet architecture and a “traditional” CNN is
that, instead of a single 3x3 convolution layer followed by batch norm and ReLU,
MobileNets split the convolution into a 3x3 depthwise convolution and a 1x1 pointwise
convolution.
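The sketch below illustrates the retraining step with tf.keras transfer learning. It
is a simplified stand-in for the project's retraining script, not the script itself;
the dataset path, class count and hyperparameters are assumptions:

```python
import tensorflow as tf

# Start from ImageNet weights, freeze the convolutional base, and train a new
# classification head for the five expression classes.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet",
    pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # angry/calm/happy/sad/yawning
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# One subdirectory per emotion class under dataset/ (path assumed).
train = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255) \
    .flow_from_directory("dataset/", target_size=(224, 224), batch_size=32)
model.fit(train, epochs=10)
model.save("expression_mobilenet.h5")
```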

4.2.4 Importing the Retrained Model and Setting Everything Up

Once the network has been trained, the model is imported into a script that runs an
OpenCV window and consults the model to report which expression is on the face being
captured through the window. This completes the expression analysis phase.
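A minimal sketch of such an inference script, assuming the Keras-format model file
and the label ordering from the retraining sketch above (both hypothetical):

```python
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("expression_mobilenet.h5")  # hypothetical path
labels = ["angry", "calm", "happy", "sad", "yawning"]          # assumed class order
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cam = cv2.VideoCapture(0)
while True:
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        # Classify each detected face crop and overlay the predicted label.
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224)) / 255.0
        probs = model.predict(face[np.newaxis])[0]
        cv2.putText(frame, labels[int(np.argmax(probs))], (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Expression", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cam.release()
```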


Figure 12: Screenshot of output showing happy emotion

Figure 13: Screenshot of output showing angry emotion

Figure 14: Screenshot of output showing yawning emotion

Figure 15: Screenshot of output showing calm emotion

4.3 Graphical User Interface (GUI)

A graphical user interface includes graphical elements such as windows, icons and
buttons. Our GUI has been implemented using Tkinter, although there are various GUI
modules for Python, such as PyQt, Tkinter, Kivy, Pyforms, PyGObject and PyGUI. We
have created a GUI which makes it easy for the user to access our system and adds
efficiency and smoothness to our project.
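A minimal Tkinter sketch of such an interface; the window title matches the project
name, while the callbacks are illustrative placeholders for the actual scripts:

```python
import tkinter as tk

# Hypothetical callbacks; in the project these would launch the OpenCV windows.
def start_detection():
    print("launch face detection ...")

def start_expression_analysis():
    print("launch expression analysis ...")

root = tk.Tk()
root.title("Smart Human Scanner")
tk.Label(root, text="Facial Expression Recognition").pack(pady=10)
tk.Button(root, text="Detect Faces",
          command=start_detection).pack(fill="x", padx=20)
tk.Button(root, text="Analyse Expression",
          command=start_expression_analysis).pack(fill="x", padx=20, pady=5)
root.mainloop()
```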

Figure 16: GUI of the project Smart Human Scanner

Chapter 5: CONCLUSION

5.1 Chapter Overview

This final chapter summarises the achievements of the project as well as the
challenges faced during its development. It also outlines possible improvements and
their applicability.

5.2 Project Achievement

The proposed solution delivers a recognizer system for facial expressions. The most
important achievement consists of the integrated functionalities and the obtained
results. The system includes an automatic face detection mechanism and implements
feature extraction techniques tailored to the problem at hand. A model is trained on
examples of faces, successfully equipping the system with the capability of
classifying all five emotions and ultimately achieving an accuracy of 86%. The
functionality can be easily accessed by the user, who can upload an image and request
a classification.

5.3 Challenges

Prior to implementing the system, one of the first challenges of the project was
choosing the algorithms for each individual module, because the selection had to
consider the integration of techniques, the time allowed for project development, the
speed of computation and, ultimately, good overall system performance. Another
challenge was increasing the accuracy of the models during facial emotion
recognition.

5.4 Concluding Remarks

Finally, this report demonstrates the achievements of the project and presents an
assessment of its performance and reliability. Overall, the proposed solution
delivers a system capable of classifying the five target emotions with an average
accuracy of 86%. It makes extensive use of Image Processing and Machine Learning
techniques to evaluate still images and derive suitable features, such that, when
presented with a new example, it is able to recognize the expressed emotion. On a
personal level, this project contributed significantly to improving our knowledge of
Computer Vision and Machine Learning methodologies and to understanding the
challenges and limitations of image interpretation.

To conclude, we believe that the current solution has met the project's requirements
and deliverables. Even though it has a series of limitations, it allows for further
extensions, which would enable a more in-depth analysis and understanding of human
behavior through facial emotions.

REFERENCES:

WEBSITES:

1. https://opencv.org/

2. https://www.tensorflow.org/

3. https://www.python.org/

4. https://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html

5. https://www.pyimagesearch.com/2018/04/09/how-to-quickly-build-a-deep-learning-image-dataset/

PERSONAL DETAILS:

1. ANOUSHKA SHARMA

Enrolment No. - 161B036

Branch - CSE

Ph. No. - 9650010265

Email - anoushka444@gmail.com

2. AYUSH GUPTA

Enrolment No. - 161B059

Branch - CSE

Ph. No. - 8959933198

Email - ayush.gupta398@gmail.com

3. CHAITANYA

Enrolment No. - 161B067

Branch - CSE

Ph. No. - 9755677266

Email - chaitanya.2298@gmail.com

