
PROBLEM STATEMENT:

CODES:
Run the Test.py file to check that the Naïve Bayes text classifier
works.
1. Class2Movie.pkl/Class2Movie_new.pkl – Pickle file of a dictionary with the
class as key and the movie name as value.
2. Classifier_image.py – PyTorch implementation of the image classifier.
3. Classifier_text.py – Training code for the Naïve Bayes text classifier.
4. Combine_cells.py – Python code to create New_plot.csv
5. Config-RealTime.txt – Configuration file to run image_classifier_neuro.py
6. coref_plot.csv – Dataset CSV file provided for plot-text-to-movie recognition.
7. Corpus_Create.py/Create_Corpus_New.py – Python file to create the corpus for
the text classifier.
8. CountVector.pkl – Pickle file used to vectorize the corpus data.
9. Create_csv.py – Python file to create image_train.csv
10. image_classifier_neuro.py – Neuroevolution implementation of training the
image classifier.
11. image_train.csv – CSV containing the path to each image on the system and
the corresponding movie name.
12. IMGClass2Movie.pkl – Class-to-movie conversion dictionary for the image classifier.
13. NaiveBayes_50%.pkl – Pickle file of the text classifier.
14. New_plot.csv – Combined form of all cells of the coref_plot.csv file, where the cells
contain the movie name and the movie's entire plot.
15. Plot_corpus.pkl/Plot_corpus_new.pkl – Pickle file containing the corpus for
data conversion.
16. Test.py – Contains the combined code to run the text classifier as per the
problem statement above.
17. y_values.pkl/y_values_new.pkl – Pickle file containing the corresponding required
movie names.
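As a minimal sketch of how a class-to-movie dictionary such as Class2Movie.pkl might be saved and used, the example below pickles a small dictionary and maps a predicted class index back to a movie name. The dictionary contents, the in-memory buffer, and the helper name class_to_movie are assumptions for illustration only; the real files are read from disk and their exact layout is not shown in this document.

```python
import io
import pickle

# Hypothetical stand-in for the contents of Class2Movie.pkl:
# key = class index, value = movie name.
class2movie = {0: "Movie A", 1: "Movie B", 2: "Movie C"}

# Serialize to an in-memory buffer instead of a real file on disk.
buffer = io.BytesIO()
pickle.dump(class2movie, buffer)

# Load it back, the way a script would after open(path, "rb").
buffer.seek(0)
loaded = pickle.load(buffer)


def class_to_movie(class_index, mapping):
    """Return the movie name for a predicted class index (hypothetical helper)."""
    return mapping.get(class_index, "unknown")


print(class_to_movie(1, loaded))
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.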
Learning:
• Implemented a natural language processing (NLP) model for the
first time.
• Learnt how to create the bag-of-words model, which converts
text to vectors that machine learning algorithms can be trained
on.
• Learnt about basic preprocessing techniques for NLP.
• Learnt how Linear Discriminant Analysis and Principal
Component Analysis help reduce the vector dimensions by
projecting the data onto the few components that carry the
most statistically relevant information in the corpus.
• Implemented different ML algorithms for text classification.
• Learnt a little about using pretrained models like ULMFiT and
GPT-2. These models help achieve high accuracy with low
training time.
• Tried to implement the image classifier in PyTorch first, before
trying the neuroevolution model.
• The PyTorch implementation had a dimension mismatch, as
PyTorch expects training data to be passed in batches and not
one sample at a time.
• Learnt how to use Google Colab to implement ML models on the
cloud.
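
The bag-of-words and Naïve Bayes ideas mentioned above can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the code in Classifier_text.py: the plots, the movie labels, and all function names (tokenize, build_vocabulary, vectorize, train_naive_bayes, predict) are hypothetical, and the project itself may rely on a library vectorizer and classifier instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Basic preprocessing: lowercase and keep alphabetic tokens only."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def build_vocabulary(documents):
    """Map each distinct token in the corpus to a column index."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocab):
    """Bag-of-words vector: count of each vocabulary word in the text."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # words outside the vocabulary are ignored
            vec[vocab[tok]] += 1
    return vec


def train_naive_bayes(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    n_features = len(vectors[0])
    class_counts = Counter(labels)
    word_counts = defaultdict(lambda: [0] * n_features)
    for vec, label in zip(vectors, labels):
        for i, count in enumerate(vec):
            word_counts[label][i] += count
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(word_counts[label]) + alpha * n_features
        log_prior = math.log(n_docs / len(labels))
        log_likelihood = [math.log((c + alpha) / total) for c in word_counts[label]]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(vec, model):
    """Return the label with the highest log-posterior score."""
    scores = {
        label: log_prior + sum(c * ll for c, ll in zip(vec, log_likelihood))
        for label, (log_prior, log_likelihood) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy plots and movie labels, for illustration only.
plots = [
    "A young wizard attends a school of magic",
    "A hobbit carries a ring to a volcano",
    "The wizard fights a dark lord at the school",
    "The ring must be destroyed in the volcano",
]
movies = ["Wizard Movie", "Ring Movie", "Wizard Movie", "Ring Movie"]

vocab = build_vocabulary(plots)
model = train_naive_bayes([vectorize(p, vocab) for p in plots], movies)
print(predict(vectorize("a school for a young wizard", vocab), model))
```

Working with log-probabilities avoids numerical underflow on long plots, and the smoothing term keeps a word that never appeared in one class from zeroing out that class's score.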