Week 1
Data Science Toolkit - Linux, Git, Bash, and SQL
Data Science with R - Data Analytics Part I
Linux system
o Introduce the Linux environment
o Learn Linux commands
o I/O redirection and pipes
o Introduce server-side Linux usage
Git
o Introduce modern source code management
o Learn common Git operations
o Set up GitHub and a personal portfolio page
Other server-related topics
o Text editors and IDEs
o ssh: how to communicate with a remote server
o Linux environment variables
SQL
o Introduction to relational databases
o Introduction to Structured Query Language
o Major SQL commands and examples
Programming foundation in R I
o Syntax
o Data objects: Vectors, Matrices, Data Frames, and Lists
o Common functions
o RStudio environment and package management
o Local data input/output
o Introduction to R data visualization
Programming foundation in R II
o Data sorting and merging
o String manipulation
o Dates and times
o Connecting to an external database
Data manipulation with dplyr
o Tables in R
o Join
o Subset
o Advanced manipulations with dplyr
Week 2
Data Science with R - Data Analytics Part II
Data Visualization with "ggplot2"
o Histograms
o Point graphics
o Columnar graphics
o Line charts
o Pie charts
o Box plots
o Scatter plots
o Visualizing multivariate data
o Matrix-based visualizations
o Maps
Introduction to Shiny
o Shiny introduction
o Design the user interface
o Control widgets
o Build reactive output
o Use data tables in Shiny apps
o Use R scripts, data, and packages
o UI and server for the app
o Make Shiny perform quickly
o Matrix-based visualizations
o Use reactive expressions
o Share and deploy Shiny apps
Lab: Moneyball
Project 1 Due: Exploratory Data Visualization
Week 3
Data Science with R - Machine Learning Part I
Foundations of Statistics
o Descriptive Statistics
    Measures of Centrality
    Measures of Variability
    Correlation
o Hypothesis Testing
    F-test
    One-way ANOVA
    χ² Test of Independence
o Introduction to Machine Learning
    Supervised Learning
        Regression
        Classification
    Unsupervised Learning
        Clustering
        Dimension Reduction
MCAR / MAR / MNAR
o Basic Methods of Imputation
    Regression Prediction
o K-Nearest Neighbors
    Voronoi Tessellations
    Distance Measures
Linear Regression I
o Simple Linear Regression
    Power Transformation
    Box-Cox Transformation
o The Coefficient of Determination R²
Linear Regression II
o Multiple Linear Regression
    Interactions
    Higher-Order Terms
Updated January 12, 2016
Week 4
Data Science with R - Machine Learning Part II
Lab: Building Bridges
Generalized Linear Models
o Logistic Regression
The Curse of Dimensionality
o Ridge Regression
o Lasso Regression
o Cross-Validation
o Bias/Variance Tradeoff
o Density
o Principal Component Analysis
The Curse of Dimensionality
o Density
o Principal Components Analysis
Guest Lecture
Project 2 Due: R Shiny Interactive Applications
Week 5
Data Science with Python - Data Analytics Part I
Data Science with R - Machine Learning (Continued)
Python Programming Language I
o Overview of syntax
o Built-in functions
o Data structures
o Standard libraries
o Object-oriented programming
Python Programming Language II
o List comprehensions
o Data copying
o Introduction to algorithm concepts
String Processing / Regular Expressions
o Regular expressions
o Web scraping: Ajax, XPath, Beautiful Soup
o Accessing APIs
Time Series Analysis
o Smoothing
o Seasonal Decomposition
o ARIMA
Week 6
Data Science with Python - Data Analytics Part II
Data Science with R - Machine Learning (Continued)
NumPy / SciPy
Matplotlib / Data Structures and Visualization in Pandas / Seaborn
Data Manipulation in Pandas
Lab: Oil Boilers
Cluster Analysis
o K-Means Clustering
o Agglomerative Clustering
o Hierarchical Clustering
Project 3 Due: Python Web Scraping
Week 7
Data Science with R - Machine Learning (Continued)
Data Science with Python - Data Analytics (Continued)
Classification
o Feature Selection
o Decision Trees
o Pruning
o Purity
o Entropy
o Gini
o Random Forests
o Bagging
o Boosting
o Support Vector Machines
o Neural Networks
Lab: Simple Linear Regression from Scratch
Week 8
Data Science with R - Machine Learning (Continued)
Introduction to Natural Language Processing
Case Study: Spam Detection
Association Rules
o Market Basket Analysis
Naive Bayes Analysis
Introduction to Natural Language Processing Part I
Introduction to Natural Language Processing Part II
Guest Lecture
Week 9
Data Science with Python - Machine Learning
Machine Learning Recap / Linear Regression
Naive Bayes Classifiers / KNN / Logistic Regression / LDA
Cross-validation / Bootstrap / Feature Selection / Regularization / Model Selection
SVM / Decision Trees / Random Forest
Principal Components Analysis / K-Means / Hierarchical Clustering
Project 4 Due: Machine Learning Project (it can be a Kaggle competition, a hiring partner project, or a non-profit project from our partners)
Week 10
Big Data
Machine Learning Review
Parallel Processing: Parallel Computing in Python / Parallel Computing in R
Introduction to Hadoop:
o Hadoop Ecosystem
o Hadoop Data Flow
o Introduction to the origin and functions of Hadoop
o The principal operations of the Hadoop Distributed File System (HDFS)
Python for MapReduce:
o The principles and working mechanisms of MapReduce
o MapReduce Programming
o MapReduce with Streaming
Advanced Hadoop Applications: Hive
Spark
Machine Learning Theory
Interview Questions Review Session
Week 11
Big Data (Continued)
Python Computer Science
Spark: MLlib
Introduction to Algorithms / Data Structures
Big-O notation
Sorting and Searching
Week 12
Capstone Project Presentations
From the beginning of the bootcamp, you will work on hands-on projects. Your Capstone Project now lets you create your own data product that showcases your interests and talents.