NYC Data Science Academy

12-Week Data Science Bootcamp Curriculum

Week 1
Data Science Toolkit - Linux, Git, Bash, and SQL
Data Science with R - Data Analytics Part I
Linux system
o Introduce Linux environment
o Learn Linux commands
o I/O redirection and pipes
o Introduce server-side Linux usage
Git
o Introduce modern source code management
o Learn common git operations
o Set up GitHub and a personal portfolio page
Other server related topics
o Text editors and IDEs
o ssh: how to communicate with a remote server
o Linux environment variables
SQL
o Introduction to relational databases
o Introduction to Structured Query Language
o SQL major commands and examples
Programming foundation in R I
o Syntax
o Data object: Vectors, Matrices, Data Frames, and Lists
o Common functions
o RStudio environment and package management
o Local data input/output
o Introduction to R data visualization
Programming foundation in R II
o Data sorting and merging
o String manipulation
o Dates and times
o Connecting to an external database
Data manipulation with dplyr
o Tables in R
o Join
o Subset
o Advanced manipulations with dplyr
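The SQL commands listed for Week 1 can be tried without installing a database server. The sketch below is a minimal illustration, not course material: it assumes Python's built-in `sqlite3` module with an in-memory database, and the `students` and `projects` tables are hypothetical. It shows `CREATE TABLE`, `INSERT`, and a `LEFT JOIN` with `GROUP BY`, the SQL counterpart of the joins listed under dplyr.

```python
import sqlite3


def student_project_counts():
    """Build two small hypothetical tables and count projects per student."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("CREATE TABLE projects (student_id INTEGER, title TEXT)")
    cur.executemany("INSERT INTO students VALUES (?, ?)",
                    [(1, "Ada"), (2, "Grace"), (3, "Alan")])
    cur.executemany("INSERT INTO projects VALUES (?, ?)",
                    [(1, "Web scraping"), (1, "Shiny app"), (2, "Kaggle")])
    # LEFT JOIN keeps students with no projects; COUNT(p.title) skips NULLs,
    # so such students get a count of 0 instead of disappearing.
    rows = cur.execute(
        """
        SELECT s.name, COUNT(p.title) AS n_projects
        FROM students AS s
        LEFT JOIN projects AS p ON p.student_id = s.id
        GROUP BY s.id, s.name
        ORDER BY s.name
        """
    ).fetchall()
    conn.close()
    return rows


print(student_project_counts())
```

Switching `LEFT JOIN` to an inner `JOIN` would drop the student who has no projects, which is the same distinction dplyr draws between `left_join()` and `inner_join()`.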

Updated January 12, 2016

Week 2
Data Science with R - Data Analytics Part II
Data Visualization with "ggplot2"
o Histogram
o Point graphics
o Columnar graphics
o Line charts
o Pie charts
o Box plots
o Scatter plots
o Visualizing multivariate data
o Matrix-based visualizations
o Maps
Introduction to Shiny
o Shiny introduction
o Design the user interface
o Control widgets
o Build reactive output
o Use data tables in Shiny apps
o Use R scripts, data, and packages
o UI and server for the app
o Make Shiny perform quickly
o Matrix-based visualizations
o Use reactive expressions
o Share and deploy Shiny apps
Lab: Moneyball
Project 1 Due: Exploratory Data Visualization

Week 3
Data Science with R - Machine Learning Part I

Foundations of Statistics
o Descriptive Statistics
   - Measures of Centrality
   - Measures of Variability
   - Frequency, Proportion & Contingency Tables
   - Correlation
o Hypothesis Testing
   - One Sample t-test
   - Two Sample t-test
   - F-test
   - One-way ANOVA
   - χ² Test of Independence
o Introduction to Machine Learning
   - Supervised Learning
      - Regression
      - Classification
   - Unsupervised Learning
      - Clustering
      - Dimension Reduction
Missingness & Imputation
o Types of Missingness
   - MCAR (Missing Completely at Random)
   - MAR (Missing at Random)
   - MNAR (Missing Not at Random)
o Basic Methods of Imputation
   - Mean Value Imputation
   - Simple Random Imputation
   - Regression Prediction
o K-Nearest Neighbors
   - Voronoi Tessellations
   - KNN for Classification
   - KNN for Regression
   - Distance Measures
Linear Regression I
o Simple Linear Regression
   - From a Mathematical Standpoint
   - Accuracy of the Coefficient Estimates
   - Performing Hypothesis Tests
   - Constructing Confidence Intervals
o Assumptions & Diagnostics
o Transformations
   - Power Transformation
   - Box-Cox Transformation
o The Coefficient of Determination R²
Linear Regression II
o Multiple Linear Regression
   - From a Mathematical Standpoint
o Assumptions & Diagnostics
o Potential Problems
o Research Questions
o Variable Selection
o Factors
o Interactions
o Higher-Order Terms
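The simple linear regression topics above come down to a few closed-form quantities: the least-squares slope is b1 = Sxy / Sxx, the intercept is b0 = ȳ - b1·x̄, and the coefficient of determination is R² = 1 - SSE/SST. A minimal plain-Python sketch of those formulas follows; the function name is hypothetical, and it assumes x and y are not constant (otherwise Sxx or SST is zero). In R this is the fit that `lm(y ~ x)` computes.

```python
def simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares.

    Returns (b0, b1, r_squared). Assumes len(x) == len(y) >= 2 and that
    neither x nor y is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy and Sxx: centred cross-product and centred sum of squares of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx             # slope
    b0 = y_bar - b1 * x_bar      # fitted line passes through (x_bar, y_bar)
    # R^2 = 1 - SSE / SST: share of the variance in y explained by the line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst
```

For points that lie exactly on y = 1 + 2x, such as x = [1, 2, 3, 4] and y = [3, 5, 7, 9], the sketch recovers b0 = 1, b1 = 2, and R² = 1.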

Week 4
Data Science with R - Machine Learning Part II
Lab: Building Bridges
Generalized Linear Models
o Logistic Regression
The Curse of Dimensionality
o Ridge Regression
o Lasso Regression
o Cross-Validation
o Bias/Variance Tradeoff
o Density
o Principal Component Analysis
Guest Lecture
Project 2 Due: R Shiny Interactive Applications
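Ridge regression and cross-validation are usually used together: the ridge penalty λ is chosen by cross-validation. The sketch below is a minimal illustration under simplifying assumptions: a single predictor, no intercept (as if x and y were already centred), and contiguous, unshuffled folds. For one predictor the ridge estimate has the closed form β = Σxᵢyᵢ / (Σxᵢ² + λ), which shrinks toward 0 as λ grows. All function names are hypothetical.

```python
def ridge_1d(x, y, lam):
    """Ridge slope for y ~ beta * x (no intercept).

    Minimises sum((y - beta * x)^2) + lam * beta^2; lam = 0 gives plain least squares.
    """
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_error(x, y, lam, k=5):
    """Average held-out mean squared error of ridge_1d over k folds (assumes k <= len(x))."""
    total = 0.0
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        x_train = [x[i] for i in range(len(x)) if i not in held_out]
        y_train = [y[i] for i in range(len(y)) if i not in held_out]
        beta = ridge_1d(x_train, y_train, lam)  # fit on the other k - 1 folds
        total += sum((y[i] - beta * x[i]) ** 2 for i in test_idx) / len(test_idx)
    return total / k


def choose_lambda(x, y, candidates, k=5):
    """Return the candidate penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(x, y, lam, k))
```

On noise-free data the unpenalised fit (λ = 0) wins. On noisy, high-dimensional data a positive λ trades a little bias for lower variance, which is the bias/variance tradeoff listed above. In practice the rows would be shuffled before splitting into folds.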

Week 5
Data Science with Python - Data Analytics Part I
Data Science with R - Machine Learning (Continued)
Python Programming Language I
o Overview of syntax
o Built-in functions
o Data structures
o Standard libraries
o Object-oriented programming
Python Programming Language II
o List comprehensions
o Data copy
o Introduction to algorithm concepts
String Processing / Regular Expressions
o Regular expressions
o Web scraping: Ajax, XPath, Beautiful Soup
o Accessing APIs
Time Series Analysis
o Smoothing
o Seasonal Decomposition
o ARIMA
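Smoothing, the first Time Series Analysis topic, can be as simple as a moving average. The sketch below is a minimal illustration that assumes a trailing window: each output averages the current value and the window - 1 values before it. Centred windows are an equally common choice. The function name is hypothetical.

```python
def trailing_moving_average(series, window):
    """Smooth a series with a trailing simple moving average.

    out[i] is the mean of series[i - window + 1 : i + 1]. The first
    window - 1 positions have no full window and are returned as None.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0  # sum of the values currently inside the window
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out
```

Keeping a running sum makes each step O(1) instead of re-summing the whole window. A larger window gives a smoother curve that lags the raw series by more steps.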

Week 6
Data Science with Python - Data Analytics Part II
Data Science with R - Machine Learning (Continued)
Numpy / Scipy
Matplotlib / Data Structures and Visualization in Pandas / Seaborn
Data Manipulation in Pandas
Lab: Oil Boilers
Cluster Analysis
o K-Means Clustering
o Agglomerative Clustering
o Hierarchical Clustering
Project 3 Due: Python Web Scraping
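K-means clustering, listed under Cluster Analysis, is usually computed with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. The sketch below is a minimal 2-D version. It assumes the caller supplies the initial centroids so that the result is deterministic; library implementations typically pick them at random or with k-means++. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm for k-means on 2-D points; returns (labels, centroids)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index)
        new_labels = []
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids
```

Because the result depends on the starting centroids, k-means is usually run several times from different starts and the solution with the lowest within-cluster sum of squares is kept.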

Week 7
Data Science with R - Machine Learning (Continued)
Data Science with Python - Data Analytics (Continued)
Classification
o Feature Selection
o Decision Trees
o Pruning
o Purity
o Entropy
o Gini Index
o Random Forests
o Bagging
o Boosting
o Support Vector Machines
o Neural Networks
Lab: Simple Linear Regression from Scratch
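Purity, entropy, and the Gini index are the measures a decision tree uses to score candidate splits. The sketch below is a minimal illustration of the standard definitions, Gini = 1 - Σ pₖ² and entropy = -Σ pₖ log₂ pₖ, where pₖ is the share of class k in a node, and of the impurity decrease a split achieves. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Share of the node's samples that belong to each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini index (impurity): 1 - sum_k p_k^2; 0.0 for a pure node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: -sum_k p_k * log2(p_k); 0.0 for a pure node."""
    return sum(-p * log2(p) for p in class_proportions(labels))


def split_gain(parent, left, right, impurity=gini):
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted
```

A tree-growing algorithm evaluates `split_gain` for every candidate split and keeps the largest. With `impurity=entropy` the gain is the information gain.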

Week 8
Data Science with R - Machine Learning (Continued)
Introduction to Natural Language Processing
Case Study: Spam Detection
Association Rules
o Market Basket Analysis
Naïve Bayes Analysis
Introduction to Natural Language Processing Part I
Introduction to Natural Language Processing Part II
Guest Lecture
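Market basket analysis, under Association Rules, scores a rule "A ⇒ C" with three ratios: support (how often the items occur together), confidence (an estimate of P(C | A)), and lift (confidence divided by the support of C). The sketch below is a minimal illustration that assumes transactions are given as Python sets of item names. Algorithms such as Apriori add an efficient search for frequent itemsets on top of these definitions. The function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by support(consequent); above 1 suggests a positive association."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))
```

With the made-up baskets {bread, milk}, {bread, butter}, {bread, milk, butter}, and {milk}, the rule butter ⇒ bread has confidence 1.0 and lift 4/3, because every basket with butter also contains bread.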


Week 9
Data Science with Python - Machine Learning
Machine Learning Recap / Linear Regression
Naive Bayes Classifiers / KNN / Logistic Regression / LDA
Cross-validation / Bootstrap / Feature Selection / Regularization / Model Selection
SVM / Decision Trees / Random Forest
Principal Components Analysis / K-Means / Hierarchical Clustering
Project 4 Due: Machine Learning Project (this can be a Kaggle competition, a hiring-partner project, or a nonprofit project from our partners)

Week 10
Big Data
Machine Learning Review
Parallel Processing: Parallel Computing in Python / Parallel Computing in R
Introduction to Hadoop:
o Hadoop Ecosystem
o Hadoop Data Flow
o Introduction to the origin and functions of Hadoop
o The principal operations of the Hadoop Distributed File System (HDFS)
Python for MapReduce:
o The architecture and working mechanisms of MapReduce
o MapReduce Programming
o MapReduce with Streaming
Advanced Hadoop Applications: Hive
Spark
Machine Learning Theory Interview Questions Review Session
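MapReduce with Streaming means the mapper and reducer are ordinary programs that read lines and write tab-separated key/value lines, while Hadoop sorts the mapper output by key before the reducer sees it. The sketch below is a minimal local imitation of a word count. It assumes the mapper and reducer are written as Python generators over lists, whereas in a real Streaming job they would be separate scripts reading `sys.stdin` and printing to stdout. `sorted()` stands in for Hadoop's shuffle-and-sort phase, and the function names are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce step: input is sorted by key, so records for the same word are adjacent."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Imitate the Streaming pipeline: map, sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

The reducer only works because its input is sorted. It never holds more than one key's group in memory, which is what lets the real job scale to files larger than any single machine's RAM.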

Week 11
Big Data (Continued)
Python Computer Science
Spark: MLlib
Introduction to Algorithms / Data Structures
Big-O notation
Sorting and Searching
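Big-O notation and searching meet in binary search. On a sorted list each comparison halves the remaining interval, so a lookup takes O(log n) comparisons instead of the O(n) of a linear scan. The sketch below is a minimal illustration; the function name is hypothetical.

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1 if it is absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1
```

For a sorted list of one million items this needs at most about 20 comparisons (2²⁰ ≈ 10⁶). The input must be sorted first, and an O(n log n) sort only pays off when the same list is searched many times.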

Week 12
Capstone Project Presentations
From the beginning of the bootcamp, you will work on hands-on projects. Now your Capstone
Project lets you create your own data product that showcases your interests and talents.
Students are free to use anything covered in class on this project.

