BMI 704 – Machine Learning

Lab
Topics
• Types of Machine Learning
• Supervised - you have data in which both the input variables
(features) and the outcome have been observed
• Unsupervised - you have data but no observed outcome
• Introduction to Supervised Learning
• Introduction to Unsupervised Learning
• ML Pipeline
• Algorithms and Packages
Supervised Learning
• Outcome (what you are trying to predict and have observed, i.e., the labels):
• Continuous (regression) or Categorical (classification)
• Features - the variables you use to make the prediction
• Measuring how well your algorithm did:
• Regression
• R² - proportion of variance explained: R² = 1 - RSS/TSS
• On the training data, RSS never increases as more variables are added, so R² never
decreases as more variables are added
• Generally you want to adjust for the number of features selected
• Adjusted R² = 1 - (RSS/(n-d-1)) / (TSS/(n-1)), where n is the number of observations
and d is the number of features
• AIC/BIC/Cp
• Classification
• ROC and AUC
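
The regression and classification measures above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not code from the course: the function names (`rss_tss`, `r_squared`, `adjusted_r_squared`, `auc`) and the toy numbers are made up for the example, and AUC is computed through its pairwise-ranking interpretation (the fraction of positive/negative pairs that the scores order correctly, ties counting one half) rather than by tracing the ROC curve.

```python
def rss_tss(y_true, y_pred):
    """Return (RSS, TSS) for observed values and predictions."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return rss, tss


def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS/TSS."""
    rss, tss = rss_tss(y_true, y_pred)
    return 1 - rss / tss


def adjusted_r_squared(y_true, y_pred, d):
    """Adjusted R^2 = 1 - (RSS/(n-d-1)) / (TSS/(n-1)); d = number of features."""
    n = len(y_true)
    rss, tss = rss_tss(y_true, y_pred)
    return 1 - (rss / (n - d - 1)) / (tss / (n - 1))


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(r_squared(y_true, y_pred), 4))              # 0.99
print(round(adjusted_r_squared(y_true, y_pred, 1), 4))  # 0.9867
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))         # 0.75
```

With one feature the adjustment is small; it grows as d approaches n, which is the penalty for adding variables that the plain R² does not apply.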
Unsupervised Learning
• The data given to the model is not labeled
• General process: represent the patterns, define a pattern proximity
measure, cluster, and validate the clusters
• Clustering
• Hierarchical clustering – build a hierarchy of clusters
• Agglomerative: A “bottom up” approach. You start with each element in a separate
cluster, then repeatedly merge the closest clusters according to a chosen linkage criterion.
• Divisive: A “top down” approach. All elements start in one all-inclusive cluster, then you
split recursively.
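
The agglomerative ("bottom up") procedure can be sketched as follows. This is a minimal sketch, assuming single linkage (cluster distance = distance between the two closest members) and Euclidean distance; the slides do not say which merge criterion is intended, and the function name `single_linkage` and the sample points are hypothetical.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns clusters as sorted lists of indices into `points`.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy 2-D points, invented for illustration only.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(single_linkage(points, 3))  # [[0, 1], [2, 3], [4]]
print(single_linkage(points, 2))  # [[0, 1, 2, 3], [4]]
```

Recording the order of merges, rather than stopping at a fixed number of clusters, would give the full hierarchy; the brute-force pair search here is only suitable for small data sets.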
Unsupervised Learning (cont’d)
• Clustering
• Partitional methods
• K-means: partition {x1, …, xn} into K clusters, where K is
predefined.
• Start from K initial centroids.
• Build a new partition by associating each point with the nearest
centroid.
• Recompute the centroid (mean point) of each cluster. Repeat the last two
steps until convergence (the assignments stop changing).
• “kmeans” function in R.
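
The assign-then-recompute loop of K-means can be sketched in plain Python. This is an illustrative sketch, not a substitute for the R function named above: the initialisation (sampling K data points at random as starting centroids), the function name `kmeans_sketch`, its parameters, and the sample points are all assumptions made for the example.

```python
import math
import random


def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Return (assignment, centroids) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Two well-separated toy groups, invented for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
assignment, centroids = kmeans_sketch(points, k=2)
print(assignment)
print(centroids)
```

The cluster labels (0 or 1) depend on the random start, so only the grouping is meaningful, not which group receives which label. On less cleanly separated data, different starts can converge to different partitions, which is why implementations commonly offer several restarts.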
ML Pipeline
• Split your data set into training and test sets (e.g., 80/20 or 70/30)
• Build your model using the training data set (with cross-validation)
• Test your model using the test data set
• Report results
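
The data-splitting steps of the pipeline can be sketched as follows. This is a minimal sketch covering only the splits, with model fitting left out; the helper names `split_train_test` and `k_fold_indices` are hypothetical, and the fold layout (every k-th index goes to the same fold) is one simple choice among several.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]          # leave the caller's list untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over range(n)."""
    folds = [list(range(i, n, k)) for i in range(k)]  # fold i holds i, i+k, i+2k, ...
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, val_idx


# Toy data: 100 row ids, invented for illustration only.
rows = list(range(100))
train, test = split_train_test(rows, test_fraction=0.2)
print(len(train), len(test))  # 80 20

# Cross-validation is run on the training rows only; the test set stays held out.
for train_idx, val_idx in k_fold_indices(len(train), k=5):
    print(len(train_idx), len(val_idx))  # 64 16 on each of the 5 folds
```

Keeping the test rows out of the cross-validation loop is the point of the design: model selection sees only the training portion, so the final test score is not influenced by tuning.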
Algorithms and Packages
• ML Algorithms (many, many, many!)
• Basics: linear and logistic regression
• Shrinkage Methods
• Lasso and Ridge regression
• ElasticNet
• Non-linear methods
• Splines
• Support Vector Machines
• Tree based methods
• Decision trees
• Random Forests
• Packages in R
• Individual packages for each algorithm - e.g., glmnet (lasso, ridge, elastic net)
• Meta packages - e.g., caret
