
EdYoda

Data Scientist Program

Program Curriculum
Learning outcomes:
• Learn to implement Machine Learning techniques using Python
• Learn data visualization techniques
• Learn to analyze raw data
• Learn Big Data and Spark

Python

1. Introduction to Python

• Useful Python Resources
• Python Tools and Utilities
• Python Features

2. Python Environment

• Local Environment Setup
• Downloads and Installations
• Setting up Environment Path

3. Executing Python

• Interactive Mode
• Scripting Mode
• Integrated Development Environment

4. Python Basic Syntax

• Python Identifiers
• Reserved Words
• Lines and Indentation

www.edyoda.com hello@edyoda.com
5. Python Variable Types

• Assigning Values to Variables
• Multiple Assignment
• Standard Data Types
• Data Type Conversion

6. Python Basic Operators

• Arithmetic Operators
• Comparison Operators
• Assignment Operators
• Bitwise Operators
• Logical Operators
• Membership Operators
• Identity Operators
• Operators Precedence

7. Python Decision Making

• IF statements
• IF...ELIF...ELSE Statements
• Nested IF statements

8. Python Loops

• While loop
• For loop
• Nested loop
• Break control statement
• Continue statement
• Pass statement

9. Python Numbers

• Number type conversion
• Mathematical function
• Random number function
• Trigonometric function

10. Python Strings

• String special operators
• String formatting operator
• Built-in string methods

11. Python Lists

• Basic list operations
• Indexing and slicing
• Built-in functions and methods

12. Python Tuples

• Basic tuple operations
• Indexing and slicing
• Built-in functions

13. Python Dictionary

• Basic Dictionary operations
• Built-in Functions and Methods
• Use cases

14. Python Functions

• Pass by reference and value
• Function Arguments
• Scope of variables
• Default Argument Values
• Keyword Arguments
• Arbitrary Argument Lists
• Unpacking Argument Lists
• Lambda Expressions
• Documentation Strings

15. Python Modules

• Importing Modules
• Namespaces and scoping
• Packages

16. Python Files I/O

• Writing and Parsing Text Files
• Parsing Text Using Regular Expressions
• Writing and Parsing XML Files
• Writing and Parsing JSON Files
• Writing and Parsing CSV Files

17. Python Exceptions

• The except clause with multiple exceptions
• The try-finally clause
• Argument of an Exception
• Raising an exception
• User-Defined Exceptions

18. Python Classes and Objects

• Creating Classes
• Creating instance objects
• Destroying Objects (Garbage Collection)
• Custom Classes
• Attributes and Methods
• Inheritance and Polymorphism
• Using Properties to Control Attribute Access

19. Functional Programming

• Lambda
• Filter
• Map
• Functools

20. Iterators and Generators

• Itertools
• Generators
• Decorators

21. Collections

• Deque
• Counter
• OrderedDict
• ChainMap

23. Debugging, Testing

• Pdb
• Breakpoints

24. Regular Expressions

• Characters and Character Classes
• Quantifiers
• Grouping and Capturing
• Assertions and Flags
• The Regular Expression Module

25. Deploying Python Applications

• Pip
• Virtualenv
• The __init__.py files
• The setup.py file
• Installing the package
• Software deployment in Python

Data Wrangling

1. Black Box Introduction to Machine Learning

• What is not Machine Learning
• What is Machine Learning
• Types of ML - Supervised, Unsupervised
• Supervised - Classification, Regression
• Unsupervised - Clustering, Association
• Machine Learning Pipeline

2. Essential NumPy

• Introduction to NumPy
• Creation
• Access
• Stacking and Splitting
• Methods
• Broadcasting

3. Pandas for Machine Learning

• Introduction to Pandas
• Understanding Series & DataFrames
• Loading CSV, JSON
• Connecting databases
• Descriptive Statistics
• Accessing subsets of data - Rows, Columns, Filters
• Handling Missing Data
• Dropping rows & columns
• Handling Duplicates
• Function Application - map, apply, groupby, rolling, str
• Merge, Join & Concatenate
• Stacking, Unstacking & Melting
• Pivot tables
• Normalizing JSON
• Application - EDA on Employee data, sales data

4. Understanding Visualization

• Introduction to matplotlib & seaborn
• Basic Plotting
• Title, Labels, Legends, Grid, colormap, xticks, yticks
• Color, linewidth
• Sub Plotting
• Scatter plot
• Histogram
• Bar Graphs
• Plotting distributions
• Plotting 3D data
• Fundamentals of Tableau

Mathematics Fundamentals

1. Essential Maths & Statistics

• Essential Linear Algebra
• Matrix Operations
• Understanding distributions
• Probability Concepts
• Calculus
• Mean, Median, Mode, Quantile
• Other Statistics Concepts
• Sampling Techniques

Machine Learning

1. Linear Models for Classification & Regression

• Simple Linear Regression using Ordinary Least Squares
• Gradient Descent Algorithm
• Regularized Regression Methods - Ridge, Lasso, Elastic Net
• Logistic Regression for Classification
• Online Learning Methods - Stochastic Gradient Descent & Passive Aggressive
• Robust Regression - Dealing with outliers & Model errors
• Polynomial Regression
• Bias-Variance Tradeoff
• Application - House Price, Cancer Prediction, Insurance Prediction

2. Preprocessing for Machine Learning

• Introduction to Preprocessing
• StandardScaler
• MinMaxScaler
• RobustScaler
• Normalization
• Binarization
• Encoding Categorical (Ordinal & Nominal) Features
• Imputation
• Polynomial Features
• Custom Transformer
• Text Processing
• CountVectorizer
• TF-IDF
• HashingVectorizer
• Image Processing using skimage

3. Decision Trees

• Introduction to Decision Trees
• The Decision Tree Algorithms
• Decision Tree for Classification
• Decision Tree for Regression
• Advantages & Limitations of Decision Trees
• Application - Cloth Prediction

4. Naive Bayes

• Introduction to Bayes' Theorem
• Naive Bayes Classifier
• Gaussian Naive Bayes
• Multinomial Naive Bayes
• Bernoulli Naive Bayes
• Naive Bayes for out-of-core learning
• Application - Text Classification, Sentiment Analysis and Spam & Non-spam classification

5. Composite Estimators using Pipelines & FeatureUnions

• Introduction to Composite Estimators
• Pipelines
• Transformed Target Regressor
• FeatureUnions
• ColumnTransformer
• GridSearch on pipeline
• Application - Author classification

6. Model Selection & Evaluation

• Cross Validation
• Hyperparameter Tuning
• Model Evaluation
• Model Persistence
• Validation Curves
• Learning Curves

7. Feature Selection & Dimensionality Reduction

• Introduction to Feature Selection
• Variance Threshold
• Chi-squared stats
• ANOVA using f_classif
• Univariate Linear Regression Tests using f_regression
• F-score vs Mutual Information
• Mutual Information for discrete values
• Mutual Information for continuous values
• SelectKBest
• SelectPercentile
• SelectFromModel
• Recursive Feature Elimination
• PCA
• SVD
• Application - Credit Risk Prediction

8. Nearest Neighbors

• Fundamentals of Nearest Neighbor Algorithm
• Unsupervised Nearest Neighbors
• Nearest Neighbors for Classification
• Nearest Neighbors for Regression
• Nearest Centroid Classifier
• Application - Nearest Neighbors for face inpainting

9. Clustering Techniques

• Introduction to Unsupervised Learning
• Clustering
• Similarity or Distance Calculation
• Clustering as an Optimization Function
• Types of Clustering Methods
• Partitioning Clustering - KMeans & MeanShift
• Hierarchical Clustering - Agglomerative
• Density Based Clustering - DBSCAN
• Measuring Performance of Clusters
• Comparing all clustering methods
• Application - Grouping similar customers

10. Anomaly Detection

• What are Outliers?
• Statistical Methods for Univariate Data
• Using Gaussian Mixture Models
• Fitting an elliptic envelope
• Isolation Forest
• Local Outlier Factor
• Using clustering methods like DBSCAN
• Application - Anomaly detection for credit risk prediction

11. Support Vector Machines

• Introduction to Support Vector Machines
• Maximal Margin Classifier
• Soft Margin Classifier
• SVM Algorithm for Classification
• SVM for Regression
• Hyper-parameters in SVM
• Application - Face recognition and breast cancer classification

12. Dealing with Imbalanced Classes

• What are imbalanced classes & their impact?
• OverSampling
• UnderSampling
• Connecting Sampler to pipelines
• Making classification algorithms aware of imbalance
• Anomaly Detection
• Application - Fraud detection

13. Ensemble Methods

• Introduction to Ensemble Methods
• RandomForest
• AdaBoost
• Gradient Boosting Tree
• VotingClassifier
• XGBoost
• Application - Malicious data detection

14. Recommendation Engine

• Understanding vector distance calculation - cosine, Euclidean, Manhattan
• Types of Recommendation Engines
• Recommendation based on similarity
• Application - Grouping videos based on description, user rating prediction

15. Time Series Modeling

• Simple Average & Moving Average
• Single Exponential Smoothing
• Holt’s linear trend method
• Holt-Winters seasonal method
• ARIMA

16. Packaging & Deployment

• Creating Python Package
• Deploy trained model behind REST interface
• Deploy model behind API call
• Deploy on AWS cloud (optional)

Big Data Ecosystem

1. Introduction to Big Data

• Big Data
• Understanding distributed computing
• Introduction to Hadoop
• HDFS, YARN, MapReduce
• Limitations of Hadoop
• Introduction to Spark
• Introduction to Kafka
• Hive
• Cassandra

2. Internal Details of Spark

• Driver
• Executors
• Partitions
• Jobs
• Stages
• Tasks
• Resilient Distributed Dataset (RDD)
• DataFrames as a High-Level Data Structure

3. Foundations of Spark using RDD

• Basics of Distributed Computing
• Resilient Distributed Dataset
• Simple Transformations - map, filter, groupBy
• Actions - collect, count, foreach
• Complex API - combineByKey
• Caching, Debugging
• Important Configuration

4. Data Wrangling using DataFrames

• Creating DataFrames from collections
• Creating a DataFrame from CSV, JSON, etc.
• DataFrame Row
• DataFrame Column
• Creating tables from DataFrames
• SQL query
• DataFrame Grouping
• DataFrame Functions
• User Defined Functions (UDF)

5. Packaging & Deployment of Spark Applications

• The spark-submit command
• Command line parameters
• Deploying the app programmatically
• Configuring your SparkSession
• Modularizing code
• Structure of the module
• Building an egg
• User defined functions in Spark
• Submitting a job
• Monitoring execution
