International Management Institute, Delhi

Course outline: Data Preprocessing for Analytics (IM 503)


Credit: 3, Core
Area: Information Management
PGDM
Term II (September–December 2019)
_________________________________________________________________________________
Instructor(s) Name: Prof. Prerna Lal; Prof. Himanshu Joshi; Prof. Santanu Das
Room no.: 207; 312
Email: prernalal@imi.edu; himanshu@imi.edu; santanu@imibh.edu.in
Phone (Extn no.): 47194127/127; 47194119/119
Consultation Hours: Thursday, 2:00 pm – 4:00 pm; Monday, 2:00 pm – 4:00 pm
_________________________________________________________________________________
1. Course Description

Data pre-processing is an important step in analytics. Data gathered from various sources in an
organization are often loosely controlled, resulting in out-of-range values, impossible data
combinations, missing values, etc. Some of the data may be irrelevant, redundant, noisy, or
unreliable. Analysing such data can produce misleading results. Thus, the representation and
quality of the data must be addressed first, before any machine learning technique is applied.
Data pre-processing includes cleaning, instance selection, normalization, transformation,
feature extraction and selection, etc. The product of data pre-processing is the final training set.
Data pre-processing may affect the way in which outcomes of the final data processing can be
interpreted.
There are different methods used to identify trends and patterns in the processed data. One of
the most valuable tools is data visualization. Regardless of industry or size, businesses of all
types increasingly use data visualization to make sense of their data. Thus, data visualization
is seen as an important skill for all managers.
2. Course Objectives

The objective of the Data Preprocessing for Analytics course is to ensure that students are
able to:
1. Utilize data visualization tools to uncover insights and communicate them as a story
2. Identify the importance of data preparation in analytics
3. Utilize the R programming language for data pre-processing
4. Identify the meaning and aspects of feature engineering and apply them using R
programming
3. Mapping between Course Objectives and Program Objectives

Program Objectives    Course Objective 1    Course Objective 2    Course Objective 3    Course Objective 4

Objective 1.1
Student should be able to write well organized and
grammatically correct business reports and letters.

Objective 1.2
Student should be able to make effective oral presentations.

Objective 2.1 √ √ √ √
Student should be able to demonstrate critical thinking skills by
understanding the issues, evaluating alternatives on the basis of
multiple perspectives and presenting a solution including
conclusions and implications.

Objective 2.2 √ √ √ √
Student should be able to demonstrate problem solving skills by
understanding and defining the problem, analyzing it and solving
it by applying appropriate theories, tools and techniques from
various functional areas of management.

Objective 3.1
Student should be able to illustrate the role of responsible
leadership in management.

Objective 3.2
Student should be able to identify social concerns and ethical
issues in management.

Objective 4.1
Student should be able to identify challenges faced by the
organization at the global level.

Objective 4.2 √
Student should be able to take decisions in the global business
environment.

4. Pedagogy

The course will be taught through a blend of presentations, interactive lectures and discussions. It will be
supplemented by assignments and practical exercises.

5. Evaluation criteria

The final grade will be calculated as follows:

Assignment : 15 %
Quiz : 15 %
Project : 30 %
End Term exam : 40 %
6. Resources

Text Book
• R for Everyone: Advanced Analytics and Graphics (Addison-Wesley Data & Analytics
Series) 2nd Edition, by Jared P. Lander

References:
• https://cran.r-project.org/
• https://www.computerworld.com/article/2497143/business-intelligence-beginner-s-guide-to-r-introduction.html
• R for Data Science: Import, Tidy, Transform, Visualize, and Model Data 1st Edition, by
Hadley Wickham and Garrett Grolemund, https://r4ds.had.co.nz/
• Hands-On Programming with R: Write Your Own Functions and Simulations 1st Edition, by
Garrett Grolemund
• The Art of R Programming: A Tour of Statistical Software Design 1st Edition, by Norman
Matloff
• Machine Learning with R: Expert techniques for predictive modeling to solve all your data
analysis problems, 2nd Edition, by Brett Lantz
• Beginning R: The Statistical Programming Language 1st Edition, by Mark Gardener
• R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O'Reilly
Cookbooks) 1st Edition, by Paul Teetor

7. Session Plan

Session    Topic    Content

1    Story Telling Using Data
     • What is story telling?
     • Understanding the context
     • Choosing an appropriate visual
     • Tell a story – Communicate it to the audience

2-3-4    Data Visualization using SAP Discovery and Tableau
     • Understanding data sources
     • Sourcing and Joining Data
     • Use of Filters and Formatting
     • Understanding types of visualization (Pie charts, Bar charts, Treemaps,
       Scatterplots, Geo-maps, Storylines…)
     • Building dashboards using Tableau

5    Publishing your Visualization
     • Understanding Tableau Public
     • Avoiding Common Pitfalls and Sharing Visualization

6    Framing a Business Problem
     • Identify Business problem
     • Analyze whether the business problem is amenable to an analytics solution
     • Identify data available for the analytics (data source, data type)

7-10    Introduction to R
     • Downloading and installing R
     • Running R program
     • Running and Manipulating Command packages
     • Understanding data structures in R
       o Vectors, Matrices, Arrays, Data frames, Factors, Lists
     • Various operations on different data structures

11
     • Conditional statements
     • Loops

12    Data Creation
     Importing data from various formats:
     • CSV, Delimited text files, Excel etc.

13-16    Data Management
     • Creating new variables
     • Recoding variables
     • Renaming variables
     • Data type conversion
     • Sorting data
     • Merging datasets
       o Adding columns
       o Adding rows
     • Subsetting datasets
       o Selecting variables
       o Selecting observations
       o Random samples

17-18    Exploratory Data Analysis
     • Missing value
     • Outliers Analysis
     • Descriptive statistics

19-20    Graphical Analysis (Continuous data, Discrete Data)
     Types
     • Strip Charts
     • Scatter Plot
     • Histograms
     • Box Plots
     • Density plots
     • Bar Plot
     • Mosaic Plot
     Features
     • Multiple datasets on one plot
     • Multiple graphs on one image
     • Pairwise relationship

8. Academic Integrity

a) Plagiarism is the use or presentation of ideas or works that are not one’s own and that are
not common knowledge, without giving credit to the originator. Plagiarism is unacceptable
at IMI and will invite a penalty. The type and extent of the penalty will be at the discretion of
the concerned faculty.

b) Cheating means using written, verbal or electronic sources of aid during an examination, quiz or
assignment, or providing such assistance to other students (except in cases where it is expressly
permitted by the faculty). It also includes providing false data, or references or lists of sources that
either do not exist or have not been used, having another individual write your paper or
assignment, or purchasing a paper for one’s own submission. Cheating is strictly prohibited at
IMI and will invite a penalty as per the policies of the Institute.
