Syllabus
Week 1: Python Ecosystem for Data Analytics
- 1. What is a data scientist?
- 2. NumPy, SciPy, pandas, IPython notebook
- 3. Basic analytics using pandas
- 4. Case study using pandas
Machine Learning Algorithm-1: Brief introduction to machine learning algorithms

Week 2: Data Exploration & Visualization
- 1. Basic visualization in Python
- 2. OOP concepts and plotting principles
- 3. Advanced visualization
- 4. Exploratory data analysis
Machine Learning Algorithm-2: SVM classifiers
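As an illustration of the "basic analytics using pandas" topic in Week 1, the split-apply-combine pattern behind pandas' `DataFrame.groupby` can be sketched in plain Python. This is a stdlib-only approximation, not pandas itself; the `group_mean` helper and the sales records are hypothetical. With pandas installed, the equivalent one-liner would be `df.groupby("region")["amount"].mean()`.

```python
from collections import defaultdict
from statistics import mean


def group_mean(rows, key, value):
    """Average `value` per distinct `key` (split, apply mean, combine)."""
    groups = defaultdict(list)
    for row in rows:                      # split: bucket rows by key
        groups[row[key]].append(row[value])
    # apply + combine; sorted keys mirror pandas' default groupby(sort=True)
    return {k: mean(v) for k, v in sorted(groups.items())}


# Hypothetical toy records standing in for the rows of a DataFrame
sales = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 100.0},
    {"region": "south", "amount": 60.0},
]
print(group_mean(sales, "region", "amount"))  # {'north': 110.0, 'south': 70.0}
```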

Week 3: Scikit-learn Ecosystem for Machine Learning
- 1. Introduction to machine learning
- 2. The scikit-learn package
- 3. Basic regression models
- 4. Cross-validation & model evaluation (regression)
Machine Learning Algorithm-3: Artificial neural networks (ANN)

Week 4: Regression & Classification
- 1. Regularization (Lasso, Ridge, Elastic-Net)
- 2. Basic classification models (logistic regression, decision tree, SVM)
- 3. Model evaluation (classification)
- 4. Bias-variance trade-off
Machine Learning Algorithm-4: Decision trees
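Week 3's "cross-validation & model evaluation (regression)" topic can be illustrated with a from-scratch k-fold loop. This is a minimal stdlib-only sketch, assuming a single-feature straight-line model fitted by ordinary least squares and mean squared error (MSE) as the score; the helper names are hypothetical. In scikit-learn the same idea is packaged as `KFold` and `cross_val_score`, whose default regression score differs from the MSE used here.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold CV.

    As in scikit-learn's KFold(shuffle=False), the first
    n_samples % n_splits folds receive one extra sample.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one feature."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cv_mse(xs, ys, n_splits=5):
    """Average over folds of the held-out MSE of the straight-line model."""
    scores = []
    for train, test in kfold_indices(len(xs), n_splits):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / len(scores)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # noise-free line, so held-out error should be ~0
print(cv_mse(xs, ys))
```

Each sample is held out exactly once, so the averaged score estimates error on unseen data rather than on the data the line was fitted to.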

Week 5: Dimensionality Reduction & Unsupervised Learning
- 1. Model feature selection
- 2. Principal component analysis
- 3. Clustering analysis (K-means, DBSCAN, etc.)
- 4. Other ML methods (ensemble methods, random forest, etc.)
Data Analysis using Python-1: Machine learning algorithm implementation

Week 6: Data Analysis using Hadoop Hive
- 1. Introduction to the Hadoop ecosystem
- 2. HDFS
- 3. Data analysis with Hive
Data Analysis using Python-2: Web crawler
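For the clustering topic in Week 5, K-means can be sketched with Lloyd's algorithm in plain Python. This is a minimal illustration, assuming the caller supplies the initial centroids (scikit-learn's `KMeans` uses k-means++ initialization by default) and Euclidean distance; `kmeans` here is a hypothetical helper, not a library function, and the points are toy data.

```python
import math


def kmeans(points, init_centroids, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


pts = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8, 9)]
labels, centroids = kmeans(pts, init_centroids=[pts[0], pts[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, which is why library implementations use smarter seeding and multiple restarts.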

Week 7: Data Analysis using Apache Pig
- 1. Introduction to Pig
- 2. The Pig Latin language

Week 8: Data Processing using Spark SQL and DataFrames
- 1. Spark SQL
- 2. Spark DataFrames
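The group-and-aggregate dataflow that Week 7 (Pig Latin's `GROUP ... BY` followed by `COUNT`) and Week 8 (a Spark DataFrame `groupBy(...).count()`) express declaratively can be sketched as the map, shuffle and reduce phases of a word count. This is a single-machine, stdlib-only illustration of the idea, not Pig or Spark code; the function names and input lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["the quick brown fox", "the lazy dog", "The end"]
counts = reduce_phase(shuffle_phase(chain.from_iterable(map_phase(l) for l in lines)))
# Most frequent first, ties broken alphabetically
print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
```

On a cluster, the map calls run in parallel on separate data blocks and the shuffle moves each key's values to a single reducer; Pig and Spark generate that plan from the higher-level script.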