Data pre-processing is an important step in analytics. Data gathered from various sources in an
organization are often loosely controlled, resulting in out-of-range values, impossible data
combinations, missing values, etc. Some of the data may be irrelevant, redundant, noisy, or
unreliable. Analysing such data can produce misleading results. Thus, the representation and
quality of the data must be addressed first, before applying any machine learning
technique.
Data pre-processing includes cleaning, instance selection, normalization, transformation,
feature extraction and selection, etc. The product of data pre-processing is the final training set.
Data pre-processing may affect the way in which outcomes of the final data processing can be
interpreted.
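As a rough illustration of the cleaning and normalization steps named above, the following R sketch works on a small made-up data frame. The column names, the 0 to 120 valid age range, and the choice of median imputation are assumptions made for this example, not methods prescribed by the course.

```r
# Hypothetical toy data: one impossible age (-3) and one missing age (NA).
raw <- data.frame(
  id     = 1:5,
  age    = c(25, 40, -3, NA, 61),
  income = c(30000, 52000, 48000, 61000, 75000)
)

clean <- raw

# Cleaning: flag out-of-range ages (assumed valid range: 0 to 120) as missing.
out_of_range <- !is.na(clean$age) & (clean$age < 0 | clean$age > 120)
clean$age[out_of_range] <- NA

# Missing values: impute with the median of the observed ages.
clean$age[is.na(clean$age)] <- median(clean$age, na.rm = TRUE)

# Normalization: min-max rescaling of income to the interval [0, 1].
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
clean$income_scaled <- min_max(clean$income)

print(clean)
```

Whether to impute, drop, or flag such records depends on the analysis; the sketch only shows that each choice changes the training set that later steps see.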
There are different methods used to identify trends and patterns in the processed data. One of
the most valuable tools is data visualization. Regardless of industry or size, data visualization
is emerging as an important concept in all types of businesses to help make sense of their data.
Thus, data visualization is seen as an important skill for all managers.
2. Course Objectives
The objective of the Data Pre-processing for Analytics course is to ensure that students are
able to:
1. Utilize data visualization tools to uncover insights and communicate them as a story
2. Identify the importance of data preparation in Analytics
3. Utilize R Programming Language for data pre-processing
4. Identify the meaning and aspects of feature engineering and apply it using R
programming
3. Mapping between Course Objectives and Program Objectives
Program Objectives | Course Objective 1 | Course Objective 2 | Course Objective 3 | Course Objective 4
Objective 1.1
Student should be able to write well-organized and
grammatically correct business reports and letters.
Objective 1.2
Student should be able to make effective oral presentations.
Objective 2.1 √ √ √ √
Student should be able to demonstrate critical thinking skills by
understanding the issues, evaluating alternatives on the basis of
multiple perspectives and presenting a solution including
conclusions and implications.
Objective 2.2 √ √ √ √
Student should be able to demonstrate problem solving skills by
understanding and defining the problem, analyzing it and solving
it by applying appropriate theories, tools and techniques from
various functional areas of management.
Objective 3.1
Student should be able to illustrate the role of responsible
leadership in management.
Objective 3.2
Student should be able to identify social concerns and ethical
issues in management.
Objective 4.1
Student should be able to identify challenges faced by the
organization at the global level.
Objective 4.2 √
Student should be able to take decisions in the global business
environment.
4. Pedagogy
The course will be taught with a blend of presentations, interactive lectures and discussions. It will be
supplemented by assignments and practical exercises.
5. Evaluation criteria
Assignment : 15 %
Quiz : 15 %
Project : 30 %
End Term exam : 40 %
6. Resources
Textbook:
• R for Everyone: Advanced Analytics and Graphics (Addison-Wesley Data & Analytics
Series) 2nd Edition, by Jared P. Lander
References:
• https://cran.r-project.org/
• https://www.computerworld.com/article/2497143/business-intelligence-beginner-s-guide-to-r-introduction.html
• R for Data Science: Import, Tidy, Transform, Visualize, and Model Data 1st Edition, by
Hadley Wickham and Garrett Grolemund, https://r4ds.had.co.nz/
• Hands-On Programming with R: Write Your Own Functions and Simulations 1st Edition, by
Garrett Grolemund
• The Art of R Programming: A Tour of Statistical Software Design 1st Edition, by Norman
Matloff
• Machine Learning with R: Expert techniques for predictive modeling to solve all your data
analysis problems, 2nd Edition, by Brett Lantz
• Beginning R: The Statistical Programming Language 1st Edition, by Mark Gardener
• R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O'Reilly
Cookbooks) 1st Edition, by Paul Teetor
7. Session Plan
Session 11
• Conditional statements
• Loops
Session 12: Data Creation
Importing data from various formats:
• CSV, delimited text files, Excel, etc.
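The topics listed for sessions 11 and 12 can be sketched together in a few lines of base R. The file contents, column names, and the pass mark of 50 are hypothetical values chosen for illustration. Excel files are not shown because they need an add-on package such as readxl.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Asha,82", "Ravi,47", "Meera,65"), csv_path)

# Import the CSV. For other delimiters, read.delim() or
# read.table(sep = ...) work in the same way.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)

# Loop over the rows and use a conditional statement to label each score.
scores$result <- character(nrow(scores))
for (i in seq_len(nrow(scores))) {
  if (scores$score[i] >= 50) {   # assumed pass mark
    scores$result[i] <- "pass"
  } else {
    scores$result[i] <- "fail"
  }
}

print(scores)

# Remove the temporary file.
unlink(csv_path)
```

In everyday R code the same labelling is usually written without a loop, for example with ifelse(scores$score >= 50, "pass", "fail"); the explicit loop is kept here only to show the control-flow constructs.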
8. Academic Integrity
a) Plagiarism is the use or presentation of ideas or works that are not one’s own and which are
not common knowledge, without giving credit to the originator. Plagiarism is unacceptable
at IMI and will invite a penalty. The type and extent of the penalty will be at the discretion of the
concerned faculty.
b) Cheating means using written, verbal or electronic sources of aid during an examination, quiz or
assignment, or providing such assistance to other students (except in cases where it is expressly
permitted by the faculty). It also includes providing false data or references/lists of sources which
either do not exist or have not been used, having another individual write one’s paper or
assignment, or purchasing a paper for one’s own submission. Cheating is strictly prohibited at
IMI and will invite a penalty as per the policies of the Institute.