(ID6020 Module)
Rahul R. Marathe
Department of Management Studies
Introduction: Why?
Numbers are everywhere! Last year, ID6020 had 386 students registered; this year the number is 405. The average time required to complete a typical catalysis experiment under laboratory conditions is 34.7. Successful professionals are those who can make sense of these numbers. In today's world, the problem is more often one of information overload: too much data! It is our job to make this data tell us a story. Sort out what is important and what is not!
Introduction: Why?
Whether you will be audited by the income tax authorities depends a lot on the sampling techniques used by the Income Tax department, and also on whether you hit certain numerical signals. Urban traffic planning is done using data collected from various locations in a city. Market research firms use statistical techniques on point-of-sale data to understand buyer behavior. The suitability of a drug is decided by analyzing the field data collected from the trials conducted. That's why every professional should know these techniques.
Introduction: Why?
Data analysis was traditionally done through statistical techniques; in recent times, we call this data analytics. Today, data analytics encompasses areas such as statistics (uni- and multivariate), probability theory, stochastic processes, computational methods, optimization techniques, data mining, artificial intelligence, econometrics, numerical techniques, and simulation. Data analysis: understanding the story told by the numbers!
Introduction: Why?
Very likely, your research will involve data collection and analysis. The data could be experimental (most engineering applications) or secondary (from surveys, as in the humanities and management). Data collection and analysis require a deep understanding of the theory and techniques of data analytics. Your research area itself could be data analytics. You certainly require a good understanding of the theory and techniques!
Introduction: Data
Data: any related observations. A collection of data is a data set, and a single observation is a data point.
Data can be collected by:
1. Observing incidences as they occur (direct recording)
2. Surveys (and sampling)
3. Conducting experiments, etc.
Data collection is the most important step because, if the collected data are not correct, the analyses and conclusions are incorrect and
Data collection
Before relying on any data, test the data by asking: Where did the data come from? Is the source biased? Do the data support or contradict other evidence we have? Is evidence missing that might cause us to come to a different conclusion? How many observations do we have? Do they represent all the groups we wish to study? Are the conclusions logical? Have we drawn conclusions that are not supported by the data?
Scale      Type           Description
Ordinal    Qualitative
Interval   Quantitative   Rank and distance from arbitrary zero
Ratio      Quantitative   Interval + ratio with a meaning
Quick check
Can variables with a nominal scale be quantitative? Yes or No. No. A nominal scale has categories, and categories are for qualitative data.
Can variables with an ordinal scale be qualitative? Yes or No. Could be qualitative; could be quantitative. So yes!
Can a nominal or ordinal scale be continuous? Yes or No. No! Nominal and ordinal scales are for categorical data, and categorical variables are discrete.
Can an interval scale be continuous and/or discrete? Yes or No.
Example
Pressure   Current
12.1       4
12.5       3.9
12.9       4.11
13.4       4.4
14.9       2.01
Example (cont.)
Pressure   Current
12.1       4
12.5       3.9
12.9       4.11
13.4       4.4
14.9       2.01
14         3.7
14.8       2.75
11.8       3.45
14.65      2.68
14.2       2.9
Summary of data
Describe the data in a graphical or statistical way.
Some commonly used graphical tools: frequency distribution tables; line charts; histograms; higher-dimensional plots; scatter plots.
Use of summary statistics: measures of central tendency (measures of location). Examples? Measures of dispersion (extent of scatter). Examples? Measures of symmetry (skewness), etc.
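As one possible answer to the "Examples?" prompts, here is a minimal Python sketch that computes a few common summary statistics for the ten pressure/current readings from the example slide. The choice of statistics (mean and median for location, sample standard deviation and range for dispersion, moment coefficient of skewness for symmetry) and the helper name `summarize` are illustrative assumptions, not part of the original slides.

```python
import statistics

# Ten readings taken from the "Example (cont.)" slide.
pressure = [12.1, 12.5, 12.9, 13.4, 14.9, 14, 14.8, 11.8, 14.65, 14.2]
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]


def summarize(values):
    """Return basic summary statistics (hypothetical helper for illustration)."""
    n = len(values)
    mean = statistics.mean(values)
    # Population central moments, used for the moment coefficient of skewness.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "mean": mean,                          # central tendency
        "median": statistics.median(values),   # central tendency
        "stdev": statistics.stdev(values),     # dispersion (sample, n - 1)
        "range": max(values) - min(values),    # dispersion
        "skewness": m3 / m2 ** 1.5,            # symmetry (0 for symmetric data)
    }


for name, values in (("pressure", pressure), ("current", current)):
    print(name)
    for stat, value in summarize(values).items():
        print(f"  {stat}: {value:.3f}")
```

A mean well away from the median, or a skewness far from zero, hints that the data are not symmetric, which is worth knowing before choosing an analysis technique.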
[Table: Bangalore, Ahmedabad, Mumbai, Bareily, Gangtok; entries: Central, Multiple, Multiple, Beams, Central]
Questions to ask
We want to know: the reasons for failure, and also the factors that may contribute to failure. Is the data valid? Is the data sufficient? Can the conclusions be extrapolated? Possible methodology: clustering algorithms. The interpretation depends on whether you look at this problem as a civil engineer, a management researcher, or a computer scientist!
Example: Regression
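As a minimal sketch of what a regression analysis involves, the following Python code fits a straight line by ordinary least squares to the ten pressure/current readings from the earlier example. Treating current as the response and pressure as the predictor is an assumption made purely for illustration, as is the helper name `fit_line`; the slides do not say which variable depends on which.

```python
# Ten readings taken from the "Example (cont.)" slide.
pressure = [12.1, 12.5, 12.9, 13.4, 14.9, 14, 14.8, 11.8, 14.65, 14.2]
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration; returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(pressure, current)
print(f"current = {intercept:.3f} + ({slope:.3f}) * pressure")
print(f"R^2 = {r_squared:.3f}")
```

The fitted line describes only the observed pressure range (11.8 to 14.9); using it to predict current outside that range would be an extrapolation the data do not support.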
Data analyses
Dos: Apply the correct analysis technique. Understand the assumptions of the method. Enter the data into the selected technique correctly. Use the correct equations/software. Be very careful about the conclusions you draw.
Don'ts: Try each and every technique to decide which looks good. Get fooled by jazzy graphs and colors. Extrapolate results and conclusions.
Final word
Data analysis skills are extremely important and useful. Every researcher is going to require these skills at some point or other. Equip yourself with these techniques and you are better prepared for the battle of logic. These weapons in your armory have to be used carefully, and only after knowing their capabilities (and limitations). Don't make the mistake of beating everything with the same stick: different demons require different tools!
rrmarathe_at_iitm.ac.in