
Facial Expression Recognition in E-Learning Environment using

Deep Learning
A Project report submitted in partial fulfillment

For the Award of Degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
[2019-2020]

Submitted by
S. Chandra Sekhar (16A51A0536)    M. Suresh (16A51A0553)

Ch. Khageswara Rao (16A51A0537)    B. Mamatha (16A51A0528)

Under the esteemed Guidance of


K. Prasada Rao, M. Tech.
Sr. Assistant Professor, Department of CSE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ADITYA INSTITUTE OF TECHNOLOGY AND MANAGEMENT


[Autonomous]

Approved by AICTE, Permanently Affiliated to JNTU, Kakinada,

Accredited by NBA & NACC K. Kotturu, Tekkali-532201, Srikakulam dist. (A.P)


ADITYA INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(Approved by AICTE, Permanently Affiliated to JNTU, Kakinada)

(Accredited by NBA & NAAC)

K. Kotturu, Tekkali-532201, Srikakulam dist. (A.P)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CERTIFICATE
This is to certify that the project work entitled “Facial Expression Recognition in
E-Learning Environment using Deep Learning”, carried out by S. Chandra Sekhar (16A51A0536),
M. Suresh (16A51A0553), Ch. Khageshwara Rao (16A51A0537) and B. Mamatha (16A51A0528)
and submitted in partial fulfillment of the requirements for the award of Bachelor of Technology
in COMPUTER SCIENCE AND ENGINEERING during the year 2019-20 to the Jawaharlal Nehru
Technological University, Kakinada, is a record of bonafide work carried out by them under my
guidance and supervision.

​Signature of Project Guide​ ​Signature of the Head of the Department

K. Prasada Rao, M. Tech, Dr. G.S.N. Murthy, M. Tech, Ph. D,

Sr.Asst. Professor, Head of the Department,

Department of CSE. Department of CSE.


​ACKNOWLEDGEMENT

We take this opportunity to express our sincere gratitude to our Director
Prof. V.V. NAGESWARA RAO and our Principal Dr. A.S. Srinivasa Rao for their wholehearted
and kind cooperation, without which this project would not have been possible.

We are also very much thankful to Dr. G.S.N. Murthy, Head of Computer Science &
Engineering, AITAM, Tekkali, for his help and encouragement.

We have great pleasure in acknowledging our sincere gratitude to our project guide Sri
K. Prasada Rao, Sr. Asst. Prof., Department of Computer Science and Engineering, AITAM,
Tekkali, for his help and guidance during the project. His valuable suggestions and
encouragement helped us a lot in carrying out this project work as well as in bringing the
project to this form.

We are extremely grateful to our department staff members, lab technicians and non-teaching
staff members for their help throughout our project.

Finally, we express our heartfelt thanks to all our friends who helped us in the successful
completion of this project.

PROJECT ASSOCIATES

S. Chandra sekhar(16A51A0536)
M. Suresh Kumar(16A51A0553)
CH. Khageshwara Rao (16A51A0537)
B.Mamatha(16A51A0528)
DECLARATION

We hereby declare that the project titled "Facial Expression Recognition in E-Learning
Environment using Deep Learning" is a bonafide work done by us at AITAM, Tekkali,
affiliated to JNTU, Kakinada, towards the partial fulfillment of the requirements for the award
of the Degree of Bachelor of Technology in Computer Science and Engineering during the period 2019-20.

Project Associates

S. Chandra sekhar(16A51A0536)
M. Suresh Kumar(16A51A0553)
CH. Khageshwara Rao (16A51A0537)
B.Mamatha(16A51A0528)
ABSTRACT

Face recognition has become an attractive field in computer-based application development
over the last few decades. The e-learning system is becoming more and more popular among
students nowadays; however, the emotion of students is usually neglected in e-learning
systems. This project is mainly concerned with using facial expressions to detect emotion in
the e-learning system. OpenCV and Keras provide many algorithms for facial recognition
and emotion capturing. The captured facial expression is used in the e-learning environment
for analyzing the learner's mood. Finally, we propose the design of an experiment to evaluate
the performance of this method in a real e-learning system.

The main aim of this project:

This project is to determine the emotion and mood of the learner in an E-Learning system
and to use the detected emotion to make the learning process effective. The emotion has to be
predicted based on the learner's visual features.

INDEX

TABLE OF CONTENTS Page No’s


ABSTRACT i
INDEX ii
LIST OF FIGURES iv
LIST OF TABLES v

CHAPTERS
1. INTRODUCTION
1.1 Introduction 2

1.2 Potential Applications 3

1.3 Deep Learning ​5


2. LITERATURE SURVEY
2.1 Existing system 8

2.2 Proposed system 8

3. REQUIREMENT AND TECHNICAL DESCRIPTION


3.1 System Configuration 11

3.2 Technical Description 12

4. METHODOLOGY
4.1 Image Acquisition 15

4.2 Face Detection 15

4.3 Image Preprocessing 17

4.4 Feature Extraction 17

4.5 Classification 19

5. DESIGN
5.1 UML Diagrams 23

5.2 Activity Diagram 23

5.3 Usecase Diagram 27

5.4 Sequence Diagram 28

6. DATASETS
6.1 FER Datasets for standard emotion 31

6.2 FER Datasets for E-Learning 32

7. IMPLEMENTATION
7.1 Loading and splitting dataset 34

7.2 Extracting Features and classification 34


8. TESTING
8.1 Introduction 40

8.2 Unit Testing 41

8.3 Integrated Testing 41

8.4 Test cases 42

9. RESULTS AND DISCUSSION


9.1 Results obtained for emotions related to E-Learning 44
using proposed Architecture
9.2 Results obtained for standard emotions using proposed Architecture 45

9.3 Results obtained for emotions related to E-Learning 47


using Hand crafted features

10. CODING
10.1 Load and split the dataset 49

10.2 Train the model 50

11. CONCLUSION AND FUTURE SCOPE ​52


12. BIBLIOGRAPHY ​54

LIST OF FIGURES

Figure No Figure Name Page No

1 Basic structure of facial expression analysis systems 2

2 E-learning hype curve 4

3 Neural network organisation 5

4 Proposed Hybrid Architecture for emotion detection 9

5 Illustrating feature points on human face 18

6 Depicting the hyperplane and support vectors 20


7 Illustrating the softmax layer 21

8 Activity Diagram - For training handcrafted model 24

9 Activity Diagram - For training CNN model 25

10 Activity Diagram - For testing Handcrafted model 26

11 Activity Diagram - For testing CNN model 27

12 Use case Diagram 28

13 Sequence Diagram 29

14 CNN Architecture 35

15 Confusion matrix for DAiSEE using proposed architecture 44

16 Confusion matrix for CK+ using proposed architecture 45

17 Accuracy and loss plot for CK+ using only CNN 45

18 Confusion matrix for JAFFE using proposed architecture 46

19 Accuracy and loss plot for JAFFE using CNN 46

20 Calculating distance between center to all feature points 47

21 Confusion matrix for DAiSEE using Handcrafted Features 47

LIST OF TABLES

Table No Table Name Page No

1 Software Used 11

2 Dependencies Used 11

3 Hardware Used 12

4 Publicly available datasets for FER 31

5 Obtained accuracies for each cognitive state for DAiSEE 44

CHAPTER-1
INTRODUCTION

Page 1
1. INTRODUCTION

1.1 Introduction:

Facial expressions are the facial changes in response to a person’s internal emotional states,
intentions, or social communications. Facial emotion recognition is the process of detecting
human emotions from facial expressions. The human brain recognizes emotions
automatically, and software has now been developed that can recognize emotions as well.
This technology is becoming more accurate all the time, and will eventually be able to read
emotions as well as our brains do. AI can detect emotions by learning what each facial
expression means and applying that knowledge to the new information presented to it.
Emotional artificial intelligence, or emotion AI, is a technology that is capable of reading,
imitating, interpreting, and responding to human facial expressions and emotions.

1.1.1 Basic Structure of Facial Expression Analysis Systems


Facial expression analysis includes both measurement of facial motion and recognition of
expression. The general approach to automatic facial expression analysis (AFEA) consists of
three steps (Fig. 1): face acquisition, facial data extraction and representation, and facial
expression recognition. Face acquisition is a processing stage to automatically find the face
region for the input images or sequences.

​Fig. 1​ Basic structure of facial expression analysis systems

It can be a detector to detect faces for each frame or just detect faces in the first frame and
then track the face in the remainder of the video sequence. To handle large head motion, the
head finder, head tracking, and pose estimation can be applied to a facial expression analysis
system. After the face is located, the next step is to extract and represent the facial changes
caused by facial expressions. In facial feature extraction for expression analysis, there are
mainly two types of approaches: geometric feature-based methods and appearance-based
methods. The geometric facial features present the shape and locations of facial components
(including mouth, eyes, brows, nose, etc.). The facial components or facial feature points are

Page 2
extracted to form a feature vector that represents the face geometry. With appearance-based
methods, image filters, such as Gabor wavelets, are applied to either the whole-face or
specific regions in a face image to extract a feature vector. Depending on the different facial
feature extraction methods, the effects of in-plane head rotation and different scales of the
faces can be eliminated by face normalization before the feature extraction or by feature
representation before the step of expression recognition.

1.2 Potential Applications

1.2.1 FER in some other Areas:

Making Cars Safer and Personalized

Car manufacturers around the world are increasingly focusing on making cars more personal and
safe for us to drive. In their pursuit of smarter car features, it makes sense for makers to use AI to
help them understand human emotions. Using facial emotion detection, smart cars can alert the
driver when he is feeling drowsy.

Facial Emotion Detection in Interviews

A candidate-interviewer interaction is susceptible to many categories of judgment and


subjectivity. Such subjectivity makes it hard to determine whether a candidate's personality is a
good fit for the job. Identifying what a candidate is trying to say is out of our hands because of
the multiple layers of language interpretation, cognitive biases, and context that lie in between.
That's where AI comes in, which can measure a candidate's facial expressions to capture their
moods and further assess their personality traits.

Market Research

Traditional market research companies have mostly employed verbal methods, usually in the form
of surveys, to find consumers' wants and needs. However, such methods assume that consumers can
formulate their preferences verbally and that the stated preferences correspond to future actions,
which may not always be true. Detecting emotions with technology is a challenging task, yet one
where machine learning algorithms have shown great promise. Using ParallelDots' Facial
Emotion Detection API, customers can process images and videos in real time for monitoring
video feeds or automating video analytics, thus saving costs and making life better for their users.
The API is priced on a pay-as-you-go model, allowing you to test out the technology before
scaling up.

Page 3
Facial emotion detection is only a subset of what visual intelligence could do to analyze videos
and images automatically.

1.2.2 FER in e-Learning System​:

As soon as the idea of e-learning was introduced, expectations of converting formal education to
e-learning increased. However, in the early 2000s, web technologies could not meet these
high expectations in terms of the e-learning system and course materials (Fig. 2). As
personal computer usage and internet bandwidth increase, e-learning systems are also
spreading widely. Although e-learning has some advantages in terms of information
accessibility and time and place flexibility compared to formal learning, it does not provide
enough face-to-face interactivity between an educator and learners. In this project, we
propose a hybrid information system, which combines computer vision and machine
learning technologies for visual and interactive e-learning systems.

Fig. 2. E-learning hype curve

E-learning has become more and more common among universities and colleges because of its
advantages over traditional approaches, where students are able to study and learn anytime.
E-learning enables students to access educational materials easily at any time and from anywhere.
In addition, with personalized learning technologies, productivity in education for student
groups with a heterogeneous structure is also increasing. Today, e-learning is not limited to
computer-aided systems; it is also accessible from mobile devices. Thus, students can take
virtual courses whether from their personal computer or mobile devices, at home or in a cafe.
This flexibility allows the student to feel comfortable, which has led to increasing
efficiency and facilitating the learning process.

Page 4
1.3 Deep Learning
Deep learning​[1] is a machine learning technique that teaches computers to do what comes
naturally to humans: learn by example. Deep learning is a key technology behind driverless
cars, enabling them to recognize a stop sign, or to distinguish a pedestrian from a lamppost. It
is the key to voice control in consumer devices like phones, tablets, TVs, and hands-free
speakers. Deep learning is getting lots of attention lately and for good reason. It’s achieving
results that were not possible before.

How Deep Learning Works


Most deep learning methods use ​neural network architectures, which is why deep learning
models are often referred to as deep neural networks. The term “deep” usually refers to the
number of hidden layers in the neural network. Traditional neural networks only contain 2-3
hidden layers, while deep networks can have as many as 150. Deep learning models are
trained by using large sets of labeled data and neural network architectures that learn features
directly from the data without the need for manual feature extraction. The organisation of
neurons in deep learning is illustrated in Fig. 3. One of the most popular types of deep
neural networks is known as the convolutional neural network (CNN or ConvNet). A CNN[8]
convolves learned features with input data, and uses 2D convolutional layers, making this
architecture well suited to processing 2D data, such as images.

Fig 3: Neural networks, which are organized in layers consisting of a set of interconnected
nodes. Networks can have tens or hundreds of hidden layers.

CNNs eliminate the need for manual ​feature extraction​, so you do not need to identify
features used to classify images. CNN works by extracting features directly from images. The
relevant features are not pretrained; they are learned while the network trains on a collection
of images. This automated feature extraction makes deep learning models highly accurate for
computer vision tasks such as object classification.

Page 5
Activation Function:

An activation function decides whether a neuron should be activated or not by calculating the
weighted sum of its inputs and further adding a bias to it. The purpose of the activation function is to
introduce non-linearity into the output of a neuron.

Variants of Activation Functions :-

1) Linear Function :- No matter how many layers we have, if all of them are linear in nature, the
final activation function of the last layer is nothing but a linear function of the input of
the first layer. Issue: if we differentiate a linear function, the result no longer depends on the
input x and becomes a constant, so it won't introduce any ground-breaking behavior into our
algorithm.

Equation : y = ax

2) Sigmoid Function :- Non-linear. For x values between -2 and 2, the curve is very steep, which
means small changes in x bring about large changes in the value of y. It is usually used in the
output layer of binary classification, where the result is either 0 or 1: since the value of the
sigmoid function lies between 0 and 1 only, the result can easily be predicted as 1 if the value is
greater than 0.5 and 0 otherwise.

Equation : A = 1 / (1 + e^(-x))

3) Tanh Function :- The activation that almost always works better than the sigmoid function is
the Tanh function, also known as the hyperbolic tangent function. It is actually a mathematically
shifted version of the sigmoid function; both are similar and can be derived from each other.

Equation :- f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1

4) ReLU :- Stands for Rectified Linear Unit. It is the most widely used activation function,
chiefly implemented in the hidden layers of neural networks. It gives an output of x if x is positive
and 0 otherwise. It is non-linear, which means we can easily backpropagate the errors and have
multiple layers of neurons being activated by the ReLU function. ReLU is less
computationally expensive than tanh and sigmoid because it involves simpler mathematical
operations.

Equation :- A(x) = max(0,x)

5) Softmax Function :- The softmax function is also a type of sigmoid function but is handy
when we are trying to handle classification problems. Usually used when trying to handle
multiple classes. The softmax function would squeeze the outputs for each class between 0
and 1 and would also divide by the sum of the outputs. The softmax function is ideally used
in the output layer of the classifier where we are actually trying to attain the probabilities to
define the class of each input.
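
The behaviour of these activation functions can be sketched in a few lines of NumPy; the sample
input vector below is purely illustrative and not part of the project code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # squashes values into the range (0, 1)

def tanh(x):
    return np.tanh(x)                         # shifted/scaled sigmoid, range (-1, 1)

def relu(x):
    return np.maximum(0, x)                   # A(x) = max(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))                 # subtract the max for numerical stability
    return e / e.sum()                        # outputs are positive and sum to 1

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])     # illustrative input values
print(sigmoid(x), tanh(x), relu(x), softmax(x), sep="\n")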

Page 6
CHAPTER-2
LITERATURE SURVEY

Page 7
2 . Literature Survey

2.1 Existing System:

There is wide usage of facial emotion recognition in various fields, such as the medical field
(e.g., studying impairment in chronic temporal lobe epilepsy), studies on the effect of yoga
therapy on facial emotion recognition, and much more. However, these systems use either a CNN
model or an SVM classifier alone for prediction.

​2.1.1 Facial emotion recognition using CNN


The CNN consists of two components, for feature extraction and classification. It is a
regular approach where the CNN takes care of both the feature extraction process and the
classification. It is a straightforward process that involves designing an architecture and
loading the dataset into it. The model is then trained using the CNN and can be used to predict
the class labels.

2.1.2 Facial emotion recognition using Handcrafted Features

Handcrafted features are those which are manually extracted from a given image. A
facial image consists of feature points which represent all the facial parts (such as the nose,
eyes, lips, etc.). These feature points are then forwarded to a classifier (such as SVM,
Random Forest, etc.) to train the model. The trained model is then used to predict the
class labels.

2.2 Proposed System:

In this project, we perform face detection and emotion recognition using some of the
packages available in Python (such as Keras and OpenCV). The seven universal emotions are
happiness, sadness, anger, surprise, contempt, fear and disgust. However, these emotions do not
fit the e-learning environment, so the new cognitive emotions are: Boredom, Confusion,
Engaged, Frustration. These emotions best summarize the learner's mental mood in an
e-environment. Here, we propose a system where facial emotions are recognised in e-learning
systems. In this system we use a hybrid architecture[2] (Fig. 4) where the features are extracted
using a CNN and the extracted features are then passed into an SVM classifier. The layers
following the fully connected layer are popped out of the CNN architecture to produce the
feature vector. This hybrid model leads to better performance than the regular CNN
architecture. All the results are included in section 9.

Page 8
Fig.4 : Proposed Hybrid Architecture for emotion detection

Page 9
CHAPTER-3
REQUIREMENTS AND TECHNICAL DESCRIPTION

Page 10
3. REQUIREMENTS AND TECHNICAL DESCRIPTION
3.1 ​System Configuration
A ​system configuration ​(​SC) ​defines the computers, processes, and devices that compose
the system and its boundary. More generally, the system configuration is the specific
definition of the elements that define and/or prescribe what a system is composed of.
Alternatively, the term "system configuration" can be used to relate to a ​model (​ declarative)
for abstract generalized systems. In this sense, the usage of the configuration information is
not tailored to any specific usage, but stands alone as a data set.

Software Used:

Software Name Version

Python 3.6

Table 1: Software Used

Dependencies Used:

Requirements Version

Keras 2.2.1

Tensorflow 2.2.0

Numpy 1.18.1

DLib 19.9.0

Sklearn 0.21.1

Table 2: Dependencies Used

Page 11
Hardware Used:

HardDisk 1 TB

RAM 8 GB

Processor Intel core I3

Table 3: Hardware Used

3.2 Technical Description

3.2.1 Python
Python is a general-purpose interpreted, interactive, object-oriented, high-level
programming language. It was created by Guido van Rossum during 1985-1990. It
provides many packages for image processing (such as OpenCV and Pillow) and deep
learning (such as Keras, Theano and TensorFlow).

Keras:

Keras is a deep learning framework. Keras is a central part of the tightly-connected
TensorFlow 2.0 ecosystem, covering every step of the machine learning workflow, from
data management to hyperparameter tuning to deployment solutions. Keras is a high-level
library intended to streamline the process of building deep learning networks.

Inside Convnets​:

There exists a filter (also called a neuron or kernel) which lies over some of the pixels of the input
image, depending on the dimensions of the kernel. The kernel slides over the
input image, multiplying the values in the filter with the original pixel values of the
image.

Kernel:

The kernel is nothing but a filter that is used to extract the features from the images. The
kernel is a matrix that moves over the input data, performs the dot product with the
sub-region of input data, and gets the output as the matrix of dot products. Kernel moves on
the input data by the stride value.

Stride in Convnets​:

Stride denotes how many steps the filter moves at each step of the convolution; by default it is
one. Stride controls how the filter convolves around the input volume and is normally set
so that the output volume is an integer and not a fraction.

Page 12
Padding:

We can observe that the size of the output is smaller than the input. To keep the dimensions of
the output the same as the input, we use padding. Padding is the process of adding zeros to the
input matrix symmetrically.
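
A minimal Keras sketch of these layer parameters is given below; the 48 x 48 grayscale input
shape matches the one used later in this project, while the filter count here is only illustrative.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

demo = Sequential()
# A 3x3 kernel slides over the 48x48 input with stride 1; 'same' padding adds zeros so the
# output keeps the 48x48 spatial size.
demo.add(Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                activation='relu', input_shape=(48, 48, 1)))
# 2x2 max pooling with stride 2 halves the spatial dimensions to 24x24.
demo.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
demo.summary()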

3.2.2 Google Colab


Colaboratory​ is a Google research project created to help disseminate machine learning
education and research. It's a Jupyter notebook environment that requires no setup to use and
runs entirely in the cloud​.

● Zero configuration required


● Free access to GPUs
● Easy sharing

Colab is a good tool for running the more time-consuming modules.
Colab​[3]​ is used extensively in the machine learning community with applications including:

● Getting started with TensorFlow


● Developing and training neural networks
● Experimenting with TPUs
● Disseminating AI research
● Creating tutorials

Page 13
CHAPTER-4
METHODOLOGY

Page 14
​4 Methodology

4.1. ​Image Acquisition:


Image acquisition is the creation of a digitally encoded representation of the visual
characteristics of an object, such as a physical scene or the interior structure of an object.
The general aim of Image Acquisition is to transform an optical image (Real World Data)
into an array of numerical data which could be later manipulated on a computer. Images
used for facial expression recognition are static images or image sequences. Images of
faces can be captured using a camera. OpenCV has sophisticated tools to read a raw
image.

4.2 Face detection:


​ Generally Face Detection​[4]​ can be carried out in three ways
● Haar Cascade Classifiers using OpenCv
● Histogram of Oriented Gradients using Dlib
● Convolutional Neural Networks using Dlib

​4.2.1 Haar Cascade Classifiers:

There are some common features that we find on most common human faces :

● a dark eye region compared to upper-cheeks


● a bright nose bridge region compared to the eyes
● some specific location of eyes, mouth, nose…

The characteristics are called Haar Features.

​4.2.2 Histogram of Oriented Gradients using Dlib:

The idea behind HOG is to extract features into a vector, and feed it into a classification
algorithm like a Support Vector Machine for example that will assess whether a face (or any
object you train it to recognize actually) is present in a region or not. The features extracted
are the distribution (histograms) of directions of gradients (oriented gradients) of the image.
Gradients are typically large around edges and corners and allow us to detect those regions.
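
A minimal sketch of HOG-based face detection with dlib is shown below; the image file name
is a hypothetical example.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()        # HOG features + a linear SVM classifier
image = cv2.imread("learner.jpg")                   # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = detector(gray, 1)                           # 1 = upsample the image once before detecting
for face in faces:
    print(face.left(), face.top(), face.right(), face.bottom())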

4.2.3 Convolutional Neural Networks using Dlib:

Convolutional Neural Networks (CNN) are feed-forward neural networks that are mostly
used for computer vision. They offer an automated image pre-treatment as well as a dense

Page 15
neural network part. CNNs are special types of neural networks for processing data with a
grid-like topology. The architecture of the CNN is inspired by the visual cortex of animals.

In the codelet below, the face is detected using a Haar cascade and highlighted with a green
rectangle.

# face.xml is a Haar cascade file describing frontal faces
faceCascade = cv2.CascadeClassifier("face.xml")
# returns the bounding boxes of the detected faces in the given image
faces = faceCascade.detectMultiScale(
    img,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30),
)

# Crop each face region and draw a rectangle around it
for (x, y, w, h) in faces:
    img = gray[y:y+h, x:x+w]                                   # crop the face from the grayscale frame
    cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)   # (0, 255, 0) is green in BGR

Page 16
4.3 Image Pre-processing:
Image pre-processing includes the removal of noise and normalization against the
variation of pixel position or brightness.

​4.3.1 Histogram Equalization

Histogram equalization is a computer image processing technique used to improve contrast in
images. It accomplishes this by effectively spreading out the most frequent intensity values,
i.e., stretching out the intensity range of the image. This method usually increases the global
contrast of images when the usable data is represented by close contrast values, allowing areas
of lower local contrast to gain a higher contrast. We perform Contrast Limited Adaptive
Histogram Equalization (CLAHE) using the OpenCV package. CLAHE uses adaptive histogram
equalization rather than global histogram equalization: it divides the given image into blocks
and then performs the equalization process on each block.

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))   # equalize 8x8 tiles, clipping the histogram at 2.0
clahe_image = clahe.apply(gray)                               # 'gray' is the grayscale face image from the previous step
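
For comparison, a minimal sketch contrasting global histogram equalization with CLAHE is given
below; the image file names are hypothetical examples.

import cv2

gray = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2GRAY)

global_eq = cv2.equalizeHist(gray)                            # global equalization over the whole image
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))   # adaptive, block-wise equalization
clahe_eq = clahe.apply(gray)

cv2.imwrite("global_eq.jpg", global_eq)
cv2.imwrite("clahe_eq.jpg", clahe_eq)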

4.4 Feature Extraction:


Selection of the feature vector is the most important part of a pattern classification problem.
The preprocessed face image is then used for extracting the important features. The
inherent problems related to image classification include scale, pose and translation, and
variations in illumination level.

4.4.1 Feature Extraction using Dlib shape predictor:

It is a facial landmark detector with a pre-trained model; dlib is used to estimate the
locations of the 68 (x, y) coordinates that map the facial points on a person's face (Fig. 5, marked
with green dots).

Page 17
Fig. 5 Feature points detected using dlib shape predictor

Detecting the feature points is quite straightforward, because it just uses a pre-trained model:

# load the pre-trained shape predictor model
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
shape = predictor(image, face)   # the image and the region of interest (face) are the arguments
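
As a small follow-up sketch, the 68 landmarks returned above can be converted into plain
(x, y) pairs; this assumes 'shape' is the object returned by the predictor call.

import numpy as np

points = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
print(points.shape)     # (68, 2): one (x, y) coordinate per facial feature point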

4.4.2 Feature Extraction using CNN:

A convolutional neural network (CNN) is a type of artificial neural network usually designed
to extract features from given high-dimensional data. A CNN is designed specifically to
recognize two-dimensional shapes with a high degree of invariance to translation, scaling,
skewing and other forms of distortion. The structure includes feature extraction, feature
mapping and subsampling layers. A CNN model can be thought of as a combination of two
components: feature extraction part and the classification part. The convolution + pooling
layers perform feature extraction. For example, given an image, the convolution layer detects
features such as two eyes, long ears, four legs, a short tail and so on. The fully connected
layers then act as a classifier on top of these features, and assign a probability for the input
image being a dog. The convolution layers are the main powerhouse of a CNN model.
Automatically detecting meaningful features given only an image and a label is not an easy
task. The convolution layers learn such complex features by building on top of each other.
The first layers detect edges, the next layers combine them to detect shapes, and the
following layers merge this information to infer that this is a nose. To be clear, CNN doesn’t
know what a nose is. By seeing a lot of them in images, it learns to detect that as a feature.

Page 18
The fully connected layers learn how to use these features produced by convolutions in order
to correctly classify the images.

Pooling:
Pooling progressively reduces the size of the input representation. It makes it possible to
detect objects in an image no matter where they are located. Pooling helps to reduce the
number of required parameters and the amount of computation required, and it also helps
control overfitting. If pooling is not done periodically, the output representation stays large
and the computation in later layers grows accordingly. There are two types of pooling that can
be applied in convnets: global average pooling and max pooling. In global average pooling the
given matrix is replaced by its average, and in max pooling it is replaced by the maximum value.
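
A tiny numeric example (not taken from the project code) of the two pooling types on a
4 x 4 feature map:

import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 3, 2]])

# 2x2 max pooling: keep the largest value of every non-overlapping 2x2 block
max_pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))   # -> [[6, 2], [2, 7]]
# global average pooling: replace the whole feature map by its mean value
global_avg = fmap.mean()                                  # -> 2.4375

print(max_pooled)
print(global_avg)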

4.5 Classification:
The dimensionality of the data obtained from the feature extraction method is very high, so it is
reduced during classification. Features should take different values for objects belonging to
different classes, so classification is done using the Support Vector Machine algorithm.

4.5.1 Classification using SVM:

SVM offers very high accuracy compared to other classifiers such as logistic regression, and
decision trees. It is known for its kernel trick to handle nonlinear input spaces. It is used in a
variety of applications such as face detection, intrusion detection, classification of emails,
news articles and web pages, classification of genes, and handwriting recognition. It can
easily handle multiple continuous and categorical variables. SVM constructs a hyperplane in
multidimensional space to separate different classes(Fig. 6). SVM generates optimal
hyperplanes in an iterative manner, which is used to minimize an error. The core idea of
SVM is to find a maximum marginal hyperplane(MMH) that best divides the dataset into
classes. Support vectors are the data points, which are closest to the hyperplane. These points
will define the separating line better by calculating margins. These points are more relevant
to the construction of the classifier.
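
The idea can be illustrated with a minimal scikit-learn sketch on a toy two-class dataset (the
points below are made up); the support vectors reported are the samples closest to the
separating hyperplane.

from sklearn.svm import SVC
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])   # toy feature vectors
y = np.array([0, 0, 0, 1, 1, 1])                                  # two class labels

clf_demo = SVC(kernel='linear')
clf_demo.fit(X, y)
print(clf_demo.support_vectors_)     # the points closest to the separating hyperplane
print(clf_demo.predict([[4, 4]]))    # classify a new, unseen sample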

Page 19
Fig.6: Depicting the hyperplane and support vectors

4.5.2 Classification using CNN:

The convolutional neural network (CNN) is a class of deep learning neural networks. CNNs
represent a huge breakthrough in image recognition. They’re most commonly used to analyze
visual imagery and are frequently working behind the scenes in image classification. They
can be found at the core of everything from Facebook’s photo tagging to self-driving cars.
They’re working hard behind the scenes in everything from healthcare to security. Image
classification is the process of taking an input (like a picture) and outputting a class (like
“cat”) or a probability that the input is a particular class (“there’s a 90% probability that this
input is a cat”).

Softmax Layer:
A Softmax function is a type of squashing function. Squashing functions limit the output of
the function into the range 0 to 1. This allows the output to be interpreted directly as a
probability​. Similarly, softmax functions are multi-class sigmoids, meaning they are used in
determining probability of multiple classes at once. Since the outputs of a softmax function
can be interpreted as a probability (i.e., they must sum to 1), a softmax layer is typically the
final layer used in neural network functions. It is important to note that a softmax layer must
have the same number of nodes as the output layer. A softmax layer (Fig. 7) allows the neural
network to run a multi-class function. In short, the neural network will now be able to
determine the probability that the dog is in the image, as well as the probability that
additional objects are included as well.

Page 20
Fig. 7: Illustrating the softmax layer

Page 21
CHAPTER-5
DESIGN

Page 22
5. DESIGN
5.1 UML Diagrams

The Unified Modeling Language (UML) is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems. The UML represents a collection of best
engineering practices that have proven successful in the modeling of large and complex
systems. The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects, explore potential designs, and validate the architectural design
of the software.

5.1.1 The conceptual model of the UML:

A conceptual model can be defined as a model which is made of concepts and their
relationships.

A conceptual model is the first step before drawing UML diagrams. It helps to understand
the entities in the real world and how they interact with each other.

To understand how the UML works, we need to know the three elements:

● UML basic building blocks

● Rules to connect the building blocks (Rules for how these building blocks may be put
together).

● Common mechanisms that apply throughout in the UML.

5.2 Activity Diagram:

Activity diagram is another important diagram in UML to describe the dynamic aspects of
the system.

Activity diagram is basically a flowchart to represent the flow from one activity to another
activity. The activity can be described as an operation of the system.

The control flow is drawn from one operation to another. This flow can be sequential,
branched, or concurrent. Activity diagrams deal with all types of flow control by using
different elements such as fork, join, etc.
The basic purpose of activity diagrams is to capture the dynamic behavior of the system.
The other four diagrams are used to show the message flow from one object to another, but the
activity diagram is used to show the flow from one activity to another.

Activity is a particular operation of the system. Activity diagrams are not only used for
visualizing the dynamic nature of a system, but they are also used to construct the executable

Page 23
system by using forward and reverse engineering techniques. The only missing thing in the
activity diagram is the message part.

It does not show any message flow from one activity to another. An activity diagram is
sometimes considered a flowchart; although the diagrams look like flowcharts, they
are not. It shows different flows such as parallel, branched, concurrent, and single.

​Activity Diagram - Training the model

Fig. 8 : Activity Diagram - For training handcrafted model

Page 24
Fig. 9 : Activity Diagram - For training CNN model

Page 25
Activity Diagram - Testing the model:

Fig. 10 : Activity Diagram - For testing Handcrafted model

Page 26
Fig. 11 : Activity Diagram - For testing CNN model

5.3 Use case Diagram:

A use case diagram at its simplest is a representation of a user‘s interaction with the system
and depicting the specifications of a use case. A use case diagram can portray the different
types of users of a system and the various ways that they interact with the system. This type
of diagram is typically used in conjunction with the textual use case and will often be
accompanied by other types of diagrams as well. While a use case itself might drill into a lot
of detail about every possibility, a use-case diagram can help provide a higher-level view of
the system. It has been said before that "use case diagrams are the blueprints for your
system". They provide a simplified and graphical representation of what the system must
actually do.

In its simplest form, a use case can be described as a specific way of using the system from a
user‘s (actor‘s) perspective. A use case is a set of scenarios that describe an interaction
between a user and a system. A use case diagram displays the relationship among actors and
use cases. The two main components of a use case diagram are use cases and actors.

Page 27
Fig 12: Use case Diagram

5.4 Sequence Diagram

A sequence diagram is a graphical view of a scenario that shows object interaction in a time
based sequence: what happens first, what happens next. Sequence diagrams establish the
roles of objects and help provide essential information to determine class responsibilities and
interfaces. Sequence diagrams are normally associated with use cases. Sequence diagrams
are closely related to collaboration diagrams, and both are alternate representations of an
interaction. There are two main differences between sequence and collaboration diagrams:
sequence diagrams show time-based object interaction while collaboration diagrams show
how objects associate with each other. A sequence diagram has two dimensions: typically,
vertical placement represents time and horizontal placement represents different objects. A
sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects
that live simultaneously, and, as horizontal arrows, the messages exchanged between them,
in the order in which they occur. This allows the specification of simple runtime scenarios in
graphical manner.

Page 28
Fig 13: Sequence Diagram

Page 29
CHAPTER-6
DATASETS

Page 30
6. Datasets:

6.1 FER Datasets for standard emotion:

A facial expression database is a collection of images or video clips with ​facial expressions of
a range of ​emotions​. Well-annotated (​emotion​-tagged) media content of facial behavior is
essential for training, testing, and validation of ​algorithms for the development of ​expression
recognition systems​. The emotion annotation can be done in ​discrete emotion labels or on a
continuous scale. Most of the databases are usually based on the ​basic emotions theory (by
Paul Ekman​) which assumes the existence of six discrete basic emotions (anger, fear, disgust,
surprise, joy, sadness). However, some databases include the emotion tagging in continuous
arousal-valence scale. In posed expression databases, the participants are asked to display
different basic emotional expressions, while in spontaneous expression databases, the
expressions are natural. Spontaneous expressions differ from posed ones remarkably in terms
of intensity, configuration, and duration. Apart from this, synthesis of some AUs is barely
achievable without undergoing the associated emotional state. Therefore, in most cases, the
posed expressions are exaggerated, while the spontaneous ones are subtle and differ in
appearance. Some of the publicly available datasets are tabulated in Table 4.

Database | Facial expressions | Number of subjects | Number of images/videos | Gray/Color | Type

Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) | Speech: calm, happy, sad, angry, fearful, surprise, disgust, and neutral. Song: calm, happy, sad, angry, fearful, and neutral. Each expression at two levels of emotional intensity. | 24 | 7356 video and audio files | Color | Posed

Extended Cohn-Kanade Dataset (CK+) | Neutral, sadness, surprise, happiness, fear, anger, contempt and disgust | 123 | 593 image sequences | Mostly gray | Posed; spontaneous smiles

Japanese Female Facial Expressions (JAFFE) | Neutral, sadness, surprise, happiness, fear, anger, and disgust | 10 | 213 static images | Gray | Posed

MMI Database | Neutral, sadness, surprise, happiness, fear, anger, and disgust | 43 | 1280 videos and over 250 images | Color | Posed and spontaneous

Table 4: Publicly available datasets for FER

Page 31
6.2 FER Dataset for E-learning:

6.2.1 DAiSEE:

The difference between real and virtual worlds is shrinking at an astounding pace. With more
and more users working on computers to perform a myriad of tasks from online learning to
shopping, interaction with such systems is an integral part of life. In such cases, recognizing a
user’s engagement level with the system (s)he is interacting with can change the way the
system interacts back with the user. This will lead not only to better engagement with the
system but also pave the way for better human-computer interaction. Hence, recognizing user
engagement can play a crucial role in several contemporary vision applications including
advertising, healthcare, autonomous vehicles, and e-learning. However, the lack of any
publicly available dataset to recognize user engagement severely limits the development of
methodologies that can address this problem. To facilitate this, we introduce DAiSEE, the
first multi-label video classification dataset comprising 9068 video snippets captured from
112 users for recognizing the user's affective states of boredom, confusion, engagement, and
frustration “in the wild”. The dataset has four levels of labels namely - very low, low, high,
and very high for each of the affective states, which are crowd annotated and correlated with
a gold standard annotation created using a team of expert psychologists. We have also
established benchmark results on this dataset using state-of-the-art video classification
methods that are available today. We believe that DAiSEE​[5] will provide the research
community with challenges in feature extraction, context-based inference, and development
of suitable machine learning methods for related tasks, thus providing a springboard for
further research.

DAiSEE Features:

● Database information - 9,068 video sequences
● # of subjects - 112
● Condition - Wild
● Affect modelling - Engagement, Boredom, Confusion, Frustration

This work uses the above dataset (DAiSEE) to find the learner's emotion.

Page 32
CHAPTER-7
IMPLEMENTATION

Page 33
7. Implementation

7.1 Loading and Splitting The dataset:

Our dataset consists of 9,068 video sequences from 112 subjects. The videos present in the
dataset are labelled with one of the four cognitive states introduced in section 2.2. The
implementation starts with preprocessing the dataset: the video sequences are cut into sets of
frames. Each frame is then resized to 48 x 48 pixels and contrast limited adaptive histogram
equalization is applied. The dataset is then split into a training set and a validation set with a
weightage of 80% and 20% respectively, and stored in pickle format to start the training process.
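
A minimal sketch of this preprocessing step is given below; the clip file name, the placeholder
label and the use of train_test_split are illustrative assumptions rather than the exact project
code (the full loading code is in section 10.1).

import cv2
import pickle
from sklearn.model_selection import train_test_split

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

def video_to_frames(path):
    # cut one video clip into preprocessed 48x48 grayscale frames
    frames = []
    cap = cv2.VideoCapture(path)
    ok, frame = cap.read()
    while ok:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, (48, 48))
        frames.append(clahe.apply(gray))       # contrast limited adaptive histogram equalization
        ok, frame = cap.read()
    cap.release()
    return frames

frames = video_to_frames("daisee_clip.avi")    # hypothetical clip file name
labels = [0] * len(frames)                     # placeholder label taken from the clip's annotation
X_train, X_val, y_train, y_val = train_test_split(frames, labels, test_size=0.2)
with open("dataset.pkl", "wb") as f:           # store the 80/20 split in pickle format
    pickle.dump((X_train, X_val, y_train, y_val), f)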

7.2 Extracting features and classification:

Feature Extraction and classification with Convnets:

Convolutional Neural Networks are neural networks inspired by the visual cortex of animals.
The CNN consists of two components, for feature extraction and classification. Initially we
used a CNN model to extract features and classify them into the respective class labels. The
architecture takes an image of size 48 x 48 with a grayscale channel and consists of 24 layers in
total (Fig. 14). The fully connected layer provides the feature vector with 3200 features, which is
sent to the softmax layer to classify the images according to the labels. However, the performance
was not up to the mark. The performance of the CNN can be improved by replacing the softmax
layer with an SVM classifier.

Page 34
Fig. 14. CNN Architecture ​

Page 35
Feature Extraction using Convnets and classification with SVM:

The last five layers of the above-mentioned architecture are popped out so that only the
features are extracted. The new model is created up to the fully connected layer, with the same
weights.

# preparing the feature-extractor model with the same weights
model_1 = Model(inputs=model.input, outputs=model.layers[-6].output)

The features are obtained using the predict method, and the obtained features are then
forwarded to the SVM[10] classifier. The performance of classification can be increased
by using an SVM instead of the two fully connected layers[6].

from sklearn.svm import SVC   # import the SVM classifier

clf = SVC(kernel='linear', verbose=True, probability=True, tol=1e-3)
clf.fit(model_1.predict(X_train), train_labs)   # train the classifier on the CNN features

Summary of the CNN Model ("sequential_1"):

_________________________________________________________________

Layer (type) Output Shape Param #

==========================================================

conv2d_1 (Conv2D) (None, 46, 46, 32) 320

_________________________________________________________________

conv2d_2 (Conv2D) (None, 46, 46, 32) 9248

_________________________________________________________________

batch_normalization_1 (Batch (None, 46, 46, 32) 128

_________________________________________________________________

max_pooling2d_1 (MaxPooling2 (None, 23, 23, 32) 0

_________________________________________________________________

dropout_1 (Dropout) (None, 23, 23, 32) 0

_________________________________________________________________

Page 36
conv2d_3 (Conv2D) (None, 23, 23, 64) 18496

_________________________________________________________________

batch_normalization_2 (Batch (None, 23, 23, 64) 256

_________________________________________________________________

conv2d_4 (Conv2D) (None, 23, 23, 64) 36928

_________________________________________________________________

batch_normalization_3 (Batch (None, 23, 23, 64) 256

_________________________________________________________________

max_pooling2d_2 (MaxPooling2 (None, 11, 11, 64) 0

_________________________________________________________________

dropout_2 (Dropout) (None, 11, 11, 64) 0

_________________________________________________________________

conv2d_5 (Conv2D) (None, 11, 11, 128) 73856

_________________________________________________________________

batch_normalization_4 (Batch (None, 11, 11, 128) 512

_________________________________________________________________

conv2d_6 (Conv2D) (None, 11, 11, 128) 147584

_________________________________________________________________

batch_normalization_5 (Batch (None, 11, 11, 128) 512

_________________________________________________________________

max_pooling2d_3 (MaxPooling2 (None, 5, 5, 128) 0

_________________________________________________________________

dropout_3 (Dropout) (None, 5, 5, 128) 0

_________________________________________________________________

flatten_1 (Flatten) (None, 3200) 0

_________________________________________________________________

Page 37
These 3200-element feature vectors are fed to the SVM, which creates hyperplanes with
support vectors to classify the features. The model results in 51% accuracy overall. The
complete implementation code is available in the public repository[11].

Using the model to test the cognitive state:

The obtained model is then used to detect the state of the learner in a real-time
environment. These emotion states are used to analyze the learner's mood and behaviour.
An Engaged cognitive state indicates that the course contents are good and there is no
need to change them. A Boredom state indicates that the tutor has to change the way the
course is delivered. The other two states (Confusion and Frustration) indicate that the
course has to be modified.
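
A hedged sketch of such real-time use is given below; it assumes the Haar cascade file from
section 4.2, and the feature extractor model_1 and SVM clf from section 7.2, are already
available in the session, and that frames are preprocessed the same way as during training.

import cv2

faceCascade = cv2.CascadeClassifier("face.xml")              # Haar cascade from section 4.2
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

cap = cv2.VideoCapture(0)                                    # learner's webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in faceCascade.detectMultiScale(gray, 1.1, 5):
        # apply the same 48x48 resize and CLAHE preprocessing used during training
        face = clahe.apply(cv2.resize(gray[y:y+h, x:x+w], (48, 48)))
        features = model_1.predict(face.reshape(1, 48, 48, 1))
        state = clf.predict(features)[0]                     # label as used when training the SVM
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, str(state), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("Learner state", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):                    # press q to stop monitoring
        break
cap.release()
cv2.destroyAllWindows()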

Page 38
CHAPTER-8
TESTING

Page 39
8.TESTING

8.1 Introduction
Testing is the process of detecting errors. Testing performs a very critical role for
quality assurance and for ensuring the reliability of software. The results of testing are used
later on during maintenance also.
Testing should be performed at every level of the task; it reveals our mistakes, so it
is a primary activity. There are different types of testing methods, which are divided into two
types.
1. White box testing
2. Black box testing

1. White Box Testing


White box testing requires access to the source code. White box testing requires
knowing what makes software secure or insecure, how to think like an attacker, and how to
use different testing tools and techniques. The first step in white box testing is to comprehend
and analyze source code, so knowing what makes software secure is a fundamental
requirement. Second, to create tests that exploit software, a tester must think like an attacker.
Third, to perform testing effectively, testers need to know the different tools and techniques
available for white box testing.
In this testing only the output is checked for correctness; the logical flow of the data
is not checked. In this project we tested the source code so that all independent paths were
executed and all loops were exercised at their boundaries and within their operational bounds.

2. Black Box Testing


Black box testing treats the software as a “black box”, examining functionality without
any knowledge of internal implementation. The tester is only aware of what the software is
supposed to do, not how it does it. Black box testing methods include equivalence partitioning,
boundary value analysis, all-pairs testing, state transition tables, decision table testing, fuzz
testing, model-based testing, use case testing, exploratory testing and specification-based
testing.

Testing objectives
The main objective of testing is to uncover a host of errors, systematically and with
minimum effort and time. Stating formally, we can say,
● Testing is a process of executing a program with the intent of finding an error.
● A good test case is one that has a high probability of finding an error, if it exists.
● The tests are inadequate to detect possibly present errors.
We perform different types of testing at every stage. Among so many testing levels
we choose some testing levels which are involved. They are:

Page 40
1. Unit testing
2. Integration testing

8.2 Unit Testing


Unit testing focuses verification effort on the smallest unit of software i.e., the
module. Using the detailed design and the process specifications testing is done to uncover
errors within the boundary of the module. All modules must be successful in the unit test
before the start of the integration testing begins.
In every stage of development of the project, testing was done after writing the code
for every module/page. In the first execution we got many errors; by correcting the errors
line by line we were able to overcome them and make the project successful.

8.3 Integration Testing


After the unit testing we have to perform integration testing. The goal here is to see if
modules can be integrated properly, the emphasis being on testing interfaces between
modules. This testing activity can be considered as testing the design and hence the emphasis
on testing module interactions. We checked whether the integration affects the working of any
of the services by giving different combinations of inputs with which the two services ran
perfectly before integration.

Page 41
8.4 Test Cases:

S.No | Test Case Title | Description | Expected Outcome

1 | Testing an image with Frustration state | The actual emotion is Frustration | The predicted cognitive state of the user is Frustration

2 | Testing an image with Engaged state | The actual emotion is Engaged | The predicted cognitive state of the user is Engaged

3 | Testing an image with Boredom state | The actual emotion is Boredom | The predicted cognitive state of the user is Boredom

4 | Testing an image with Confusion state | The actual emotion is Confusion | The predicted cognitive state of the user is Confusion

Page 42
CHAPTER-9
RESULTS AND DISCUSSIONS

Page 43
9.1 Results obtained for emotions related to E-Learning using proposed architecture:

The architecture described in section 7.2 results in 51.97% overall accuracy. The
accuracy for every cognitive state is listed below.

Cognitive state    Obtained accuracy

Boredom 54%

Confusion 54%

Engaged 63%

Frustration 28%
Table 5: Obtained accuracies for each cognitive state for DAiSEE

These results are satisfactory and notable for the e-learning field, since the dataset was
collected in the wild (not posed). The confusion matrix (Fig. 15) is drawn for the obtained
accuracy; it depicts the correct and incorrect predictions made by the model.

Fig.15 Confusion matrix for DAiSEE using proposed architecture
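
A confusion matrix such as Fig. 15 can be computed with scikit-learn (already a project
dependency); 'X_val' and 'val_labs' below are assumed to be the held-out validation frames
and their labels.

from sklearn.metrics import confusion_matrix

val_preds = clf.predict(model_1.predict(X_val))      # CNN features -> SVM predictions
print(confusion_matrix(val_labs, val_preds))         # rows: true states, columns: predicted states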

Key Observations:

● Most of the images with the Frustration cognitive state are predicted as Confusion
(although both emotion states result in a change of the course contents)
● The accuracy is up to the mark for each emotion state except for the Frustration state

Page 44
9.2 .Results obtained for standard emotions using proposed architecture:

9.2.1 Results obtained for CK+ dataset:

The proposed architecture results in an accuracy of up to 98% for the CK+[7] dataset. The
obtained results are extremely good. This level of accuracy was obtained because the CK+
dataset contains images taken in a posed environment. The confusion matrix (Fig. 16) depicts
the false and true predictions. Using only the CNN on the same dataset results in 58%
accuracy (Fig. 17).

Fig.16 Confusion matrix for CK+ using proposed architecture

Fig.17 accuracy and loss plot for CK+ using only CNN

Page 45
9.2.2 Results obtained for JAFFE dataset:

The proposed architecture results in an accuracy of up to 76% for the FER2013[9] dataset. The
obtained results are extremely good compared to the plain CNN: the CNN model, trained for
100 epochs, results in 22% accuracy (Fig. 19). The confusion matrix (Fig. 18) is drawn for our
hybrid model to depict the false and true predictions.

Fig. 18 Confusion matrix for JAFFE using proposed hybrid architecture

Fig. 19 accuracy and loss plot for JAFFE using CNN

Page 46
9.3 .Results obtained for emotions related to E-Learning using Handcrafted Features

Here, we used the dlib shape predictor to extract feature points from a given face image (Fig. 5).
The mean point of all 68 feature points is then calculated. For every feature point we
calculate the distance from the center (Fig. 20), the tangent angle to the center, and its
coordinates in 2-D space. The total size of the vector is therefore 272 (68 x 4) for a single
image. There are 3219 images in the training set.

Fig. 20 Calculating distance between center to all feature points
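
A minimal sketch of building this 272-value feature vector is given below; it assumes 'points'
is the (68, 2) array of landmark coordinates from section 4.4.1, and the ordering of the
concatenated values is an illustrative choice.

import numpy as np

center = points.mean(axis=0)                         # mean of the 68 feature points
diffs = points - center
dists = np.linalg.norm(diffs, axis=1)                # distance from each point to the center
angles = np.arctan2(diffs[:, 1], diffs[:, 0])        # angle of each point relative to the center
feature_vector = np.concatenate([dists, angles, points[:, 0], points[:, 1]])
print(feature_vector.shape)                          # (272,) = 68 points x 4 values per image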

The obtained feature matrix of size 875568 (272 x 3219) is then used to train the SVM
classifier. The SVM classifier classifies the feature vectors with an accuracy of 40%. The true
and false predictions made by the SVM with these handcrafted features are illustrated in the
confusion matrix (Fig. 21).

Fig. 21 Confusion matrix for Daisee using Handcrafted Features

Page 47
CHAPTER - 10
CODING

Page 48
10. Coding:
The whole work above is divided into two modules: loading and splitting the data (followed by
some image preprocessing), and training the model using the proposed hybrid architecture.

10.1 Load and split the dataset:

import glob
import random
import cv2
import numpy as np
import pandas as pd
from PIL import Image

# function to get the file list for an emotion, shuffle it randomly and split 80/20
def get_files(emotion):
    files = glob.glob("PATH/%s/*" % emotion)     # PATH is the dataset root directory
    random.shuffle(files)
    training = files[:int(len(files) * 0.8)]     # first 80% of the file list
    prediction = files[-int(len(files) * 0.2):]  # last 20% of the file list
    return training, prediction

# load the images, perform the CLAHE operation and return the data
def get_img(ems):
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    pixel = []
    usage = []
    em = []
    for xx in range(len(ems)):
        tr, te = get_files(ems[xx])
        for item in tr:
            image = cv2.imread(item)                          # open image
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # convert to grayscale
            gray = np.asarray(Image.fromarray(gray).resize((48, 48)))
            clahe_image = clahe.apply(gray)
            clahe_image = clahe_image.reshape(48 * 48)
            s = ""
            for i in clahe_image:
                s = s + str(i) + " "
            em.append(xx)
            pixel.append(s)
            usage.append('train')
        for item in te:
            image = cv2.imread(item)                          # open image
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # convert to grayscale
            gray = np.asarray(Image.fromarray(gray).resize((48, 48)))
            clahe_image = clahe.apply(gray)
            clahe_image = clahe_image.reshape(48 * 48)
            s = ""
            for i in clahe_image:
                s = s + str(i) + " "
            em.append(xx)
            pixel.append(s)
            usage.append('test')
        print('done', ems[xx])
    df = {'emotion': pd.Series(em), 'pixels': pd.Series(pixel), 'usage': pd.Series(usage)}
    return df

10.2 Train the model:

from keras.models import Sequential, Model
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Dropout, Flatten, Dense
from keras.regularizers import l2
from sklearn.svm import SVC

# num_features = 32, width = height = 48, num_classes = 4 (values implied by the model
# summary in section 7.2 and the four DAiSEE cognitive states)

# Defining and compiling the model
model = Sequential()
model.add(Conv2D(num_features, kernel_size=(3, 3), activation='relu',
                 input_shape=(width, height, 1), data_format='channels_last',
                 kernel_regularizer=l2(0.01)))
model.add(Conv2D(num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(2*2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(2*2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(2*2*num_features, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(2*num_features, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Training the model over a set of epochs
nb_epoch = 100
batch_size = 64
history = model.fit(X_train, y_train, epochs=nb_epoch, class_weight=class_Weights,
                    validation_data=(X_val, y_val), shuffle=True, verbose=1)

# Recreating the model up to the fully connected layer and performing classification using SVM
model_1 = Model(inputs=model.input, outputs=model.layers[-6].output)
clf = SVC(kernel='linear', verbose=True, probability=True, tol=1e-3)
clf.fit(model_1.predict(X_train), train_labs)

Page 51
CHAPTER – 11
​CONCLUSION AND FUTURE SCOPE

Page 52
11. CONCLUSION AND FUTURE SCOPE:

In this project, we have proposed a facial emotion recognition system for the e-learning
environment. The mood of learners is usually neglected in the e-environment, and there is no
proper system to monitor the cognitive mood of the learner. This model can classify the
learner's mood based on the learner's visual features.

Future Work:

In the future, this work can be extended into an application placed on the client side to
monitor the mood of the learner continuously. Based on the mood of the user, the course
contents can be developed.

This work can also be extended to analyse the mood of the user based on their interaction
(such as assessments completed, feedback posted and active participation) along with the
visual features. This may increase the accuracy of the proposed system.

Page 53
BIBLIOGRAPHY

[1] ​https://machinelearningmastery.com/what-is-deep-learning/

[2] Xiao-Xiao Niu, Ching Y. Suen,A novel hybrid CNN–SVM classifier for recognizing
handwritten digits, Pattern Recognition

[3] Pessoa, Tiago & Medeiros, Raul & Nepomuceno, Thiago & Bian, Gui-Bin & Albuquerque,
V.H.C. & Filho, Pedro Pedrosa. (2018). Performance Analysis of Google Colaboratory as a Tool
for Accelerating Deep Learning Applications. IEEE Access. PP. 1-1.
10.1109/ACCESS.2018.2874767

[4] Maël Fabien, A Guide to Face Detection in Python.

[5]Abhay Gupta and Arjun D'Cunha and Kamal Awasthi and Vineeth Balasubramanian,
DAiSEE: Towards User Engagement Recognition in the Wild, https://arxiv.org/abs/1609.01885

[6]Ahmad, Mubashir. (2019). Re: In CNN, can we replace fully connected layers with SVM as a
classifier?.

[7]P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, "The Extended
Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified
expression," 2010 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition - Workshops, San Francisco, CA, 2010, pp. 94-101, doi:
10.1109/CVPRW.2010.5543262.

[8]Pramerdorfer, Christopher & Kampel, Martin. (2016). Facial Expression Recognition using
Convolutional Neural Networks: State of the Art.

[9]Wolfram Research, "FER-2013" from the Wolfram Data Repository (2018)

[10]Yujun Yang, Jianping Li and Yimei Yang, "The research of the fast SVM classifier method,"
2015 12th International Computer Conference on Wavelet Active Media Technology and
Information Processing (ICCWAMTIP), Chengdu, 2015, pp. 121-124, doi:
10.1109/ICCWAMTIP.2015.7493959.

[11]This public GitHub repository includes all modules used in this work,
https://github.com/chandrasekhar36/FER-for-E-Environment

Page 54
