
Business Statistics SIM Semester 1 2019: Welcome - Lecture 1


• Course Coordinator
Associate Professor Ashton De Silva
Email: ashton.desilva@rmit.edu.au
• Offering Coordinator
Professor Michael Kidd
Email: michael.kidd@rmit.edu.au

• Administrative Enquiries
Ms. Kathryn Bendell
Email: busstats@rmit.edu.au

Recommended Textbook*

• Basic Business Statistics: Concepts and Applications by Berenson et al., Fourth Edition.
• ISBN: 978-1-4860-1895-6
• Note this edition is available online at no cost.

Assessments:
• Assessment 1: 20%
• Eight online quizzes throughout the semester, with the best 5 counting towards the 20%.
Further details will be provided on the subject webpage.
• Assessment 2: 40%
• An authentic Individual/Group assignment comprising multiple parts forms 40% of
this course’s assessment. Details will be made available on the subject website in
due course.
• Assessment 3: 40%
• The final exam may consist of a mix of multiple-choice and short-answer questions
and will be 2 hours long, with a 15-minute reading period. The questions will be
drawn from the course material covered in lectures, workshops/tutorials/computer
workshops and the relevant parts of the textbook. The workshop/tutorial/computer
workshop questions provide the best insight into the nature of the exam questions.

**Important to note: Assessment will examine interpretation and intuition as well as
your ability to use formulas correctly to calculate various tests etc. Interpretation and
intuition are higher-level skills and will be useful long after you leave University.

Teaching Schedule
• Please note, this is a guide only. Topics may vary in
order/combination. (Topics in bold are the ones I will cover.)
• Basic Concepts & Numerical Descriptive Statistics
• Basic Probability & Decision Making
• Discrete and Continuous Probability Distributions
• Sampling Distributions
• Confidence Interval Estimation
• Hypothesis Testing
• Relationships between 2 variables, Simple Linear
Regression
• Multiple Regression
• Introduction to Time Series and Forecasting

Chapter 1: Introduction and data collection

PowerPoint to accompany:

Cover illustration: © Raw Pixel/Shutterstock.com


Chapter 2: Presenting data in tables and charts



Chapter 3: Numerical descriptive measures



These 3 chapters are very introductory

• I suggest you skim through all 3 chapters.


• The 5 key ideas in Chapters 1 to 3 are:
(1) The distinction between a Sample and the
Population. This is possibly THE most important
idea in the entire course.
(2) This course focusses on inferential statistics.
In essence, this is the branch of statistics that draws
conclusions about the population based on a
sample of data. We do this via hypothesis testing.
Key ideas continued:
(3) Types of data: Categorical versus Numerical,
and Discrete versus Continuous
• <ignore levels of measurement and scale>
(4) Chapter 2: how can we
summarise/present/graph different types of data?
This can be done with or without Excel. <practice in 1st
workshop>
(5) Chapter 3: numerical descriptive statistics of a
single variable. The key measures refer to central
tendency and variation.
So let’s take a look at the key ideas:

Key Definitions
• A population consists of all the members of a
group about which you want to draw a conclusion
• A sample is the portion of the population
selected for analysis
• A parameter is a numerical measure that
describes a characteristic of a population
• A statistic is a numerical measure that describes a
characteristic of a sample

1. Population vs. Sample

Population: all people aged over 18 who are resident in Singapore.
Sample: a sample of drinkers on Orchard Road.

Measures used to describe a population are called parameters.
Measures calculated from sample data are called statistics.
Let’s say we want to work out the average adult
income of Singapore residents
• So we are interested in the population mean of
adult income of Singapore residents
• How would you go about finding that population
mean (µ, the Greek letter pronounced "mu")?
• Send out a census form to all Singaporean
residents and ask them their income.
• Calculate: µ = (sum of each resident’s income)/N,
where N is the number of residents
In reality we are virtually never going to have
access to the population

• That’s why we want to take a sample (preferably
a random sample) and find the sample mean

$$\bar{X} = \frac{\sum X}{n}$$

where n is the sample size.
• The sample mean will inform us about the
population mean. What this really means will be
explained later. Note the symbol ∑ just means
add up the terms. (some quick example)
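A quick sketch of the ∑ idea in Python. The five income values are made up purely for illustration (they are not course data), and the variable names are my own.

```python
# Hypothetical sample of five incomes (in $000s); values invented for illustration.
sample = [42, 55, 38, 61, 49]

n = len(sample)            # n = sample size
total = sum(sample)        # the sigma symbol: add up every X in the sample
sample_mean = total / n    # X-bar = (sum of X) / n

print(f"sum = {total}, n = {n}, sample mean = {sample_mean}")
```

Here the sum is 245 and n is 5, so the sample mean is 49.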
What if we are interested in the average
adult female wage in Singapore (that is, µ)?

Sample: visit 10 top companies in Singapore and
interview each female employee.
Again, this is not a random sample: it is not likely to be
representative of the Singapore female population.

2. Inferential statistics

• THE Key idea: Draw conclusions about a
population based on sample results.
• Estimation
e.g. Estimate the population mean weight (parameter) for
females in Singapore using the sample mean weight (statistic)

• Hypothesis test:
Critical Question:
Will you test something about the sample mean or the
population mean weight? What do we know, and what
don’t we know?
• Hypothesis test: Test the claim/hypothesis that
the population mean female Singaporean weight
is 120 kilos. What do you think?
• Intuition for the future: if your observed sample
mean is a long way away from the hypothesised
population mean, we will likely reject the
hypothesis! That is a key idea in the course.
• (The obvious question to ask, though, is what do we
mean by "a long way away"? Much later!)
3. Types of Data

Data
• Categorical (Defined categories)
• Numerical (Quantitative)
  • Discrete (Counted items)
  • Continuous (Measured characteristics)
Types of Data (continued)

• Categorical
• Simply classifies data into categories (e.g. marital
status, hair colour)
• Numerical (Discrete)
• Counted items – finite number of items (e.g.
number of children, number of people who have
type-O blood)
• Numerical (Continuous)
• Measured characteristics – infinite number of
possible values (e.g. weight, height)
Learning Objectives

• After studying chapter 2 you should be able to (much
of this will be covered in workshop/tutorial plus
self-study):
1 describe the distribution of a single categorical
variable using tables and charts
2 describe the distribution of a single numerical
variable using tables and graphs
3 describe the relationship between two categorical
variables using contingency tables
4 describe the relationship between two numerical
variables using scatter diagrams and time-series plots
5 correctly present data in graphs
Tables and Charts for Categorical Data

Categorical Data
• Summary Table
• Graphing Data: Bar Charts, Pie Charts

4. Tables and Charts for Numerical Data

Numerical Data
• Ordered Array
  • Stem-and-leaf Display
• Frequency Distributions and Cumulative Distributions
  • Histogram
  • Polygon
  • Ogive

A few pointers:
• It is very easy to derive graphs and histograms in
Excel. In the 1st workshop/tutorial we can do
simple examples.
• For numerical data there are 2 points I’d like to
make.
(1) Ordered array: this just means you order the sample
of data from lowest to highest. A stem-and-leaf plot
uses the ordered array and divides each value into a stem
and a leaf. E.g. if the smallest value was 8 and the highest
value was 79, then the stem is how many 10’s there are in
the value and the leaf is what’s left over (so 79 has stem 7
and leaf 9, and 8 has stem 0 and leaf 8).
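If you want to see the stem-and-leaf idea outside Excel, here is a minimal Python sketch. The data values are hypothetical (chosen only so that the smallest is 8 and the largest is 79), and `stem_and_leaf` is a helper name I made up; it assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative whole numbers, splitting at the tens."""
    plot = defaultdict(list)
    for v in sorted(values):          # ordered array: lowest to highest
        stem, leaf = divmod(v, 10)    # stem = number of tens, leaf = what is left over
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data with smallest value 8 and largest value 79.
data = [8, 12, 15, 23, 23, 31, 47, 52, 58, 79]
plot = stem_and_leaf(data)

# Print every stem from smallest to largest, including stems with no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Stems with no data (here the 60s) are still printed, so gaps in the data stay visible.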

More pointers:
• When drawing a histogram, the widths of the
intervals should be equal.
• The intervals cannot overlap.
• Every data point must be in 1 and only 1
interval.
• Approximate width of interval = range/no. of
intervals
• Be very careful!
• Finally, also be careful about Excel’s treatment
of boundary points in a histogram.
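The rules above can be sketched in a few lines of Python. This is only one possible convention, not a description of what Excel does: each interval includes its lower boundary and excludes its upper boundary, and the largest value is forced into the last interval. The data and the helper name `equal_width_bins` are hypothetical, and the sketch assumes whole-number data that are not all identical.

```python
import math

def equal_width_bins(values, k):
    """Count values in k equal-width, non-overlapping intervals [lower, upper)."""
    lo, hi = min(values), max(values)
    width = math.ceil((hi - lo) / k)       # approx. width = range / no. of intervals, rounded up
    counts = [0] * k
    for v in values:
        i = min((v - lo) // width, k - 1)  # a value on the top boundary goes in the last interval
        counts[i] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

# Hypothetical data: range = 79 - 8 = 71, so with 4 intervals the width is 18.
data = [8, 12, 15, 23, 23, 31, 47, 52, 58, 79]
edges, counts = equal_width_bins(data, 4)

for lower, upper, count in zip(edges, edges[1:], counts):
    print(f"{lower} to under {upper}: {count}")
```

Because every value lands in exactly one interval, the counts must add up to the sample size, which is a handy check.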
Finally:
• For categorical data, be aware of contingency
tables for multivariate data.
• For numerical data we have scatter plots and
time-series plots.
• All pretty easy in Excel.
• The BIGGEST issue is to consider how best to
present the data!
You will need to practise in your own time and
during workshop/tutorial.
Learning Objectives
• After studying chapter 3 you should be able to:
1 calculate and interpret numerical descriptive
measures of central tendency, variation and
shape for numerical data
2 calculate and interpret descriptive statistics
3 summary measures for a population (extremely
briefly)
4 construct and interpret a box-and-whisker plot
**At this point we are going to focus solely on a
single variable

Describing numerical data: a single variable only
and our focus is on the sample

• Central tendency: mode, median, mean
• Variation: variance, standard deviation, range,
interquartile range.
• Shape: skewness

In terms of the course the mean and standard
deviation are the MOST important. They will
crop up many times.
More details...

• I am assuming you know how to calculate the
mean, mode and median.
• Briefly: mean = (sum of all the data values)/no. of
data points in your sample
• Mode = the value that occurs most frequently
(can have more than 1 mode)
• Median: if we order the data from low to high
and there are n observations, the median is the
data point in the middle, i.e. it’s the (n+1)/2
ranked value. (If n is odd the median is right in the
middle; if n is even, average the middle 2 data points.)
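As a quick check of the (n+1)/2 rule, here is a small Python sketch. The two samples are made up, and `median_by_position` is my own helper name. `mean` and `multimode` come from Python's standard `statistics` module (`multimode` needs Python 3.8 or later).

```python
from statistics import mean, multimode

def median_by_position(values):
    """Median via the (n + 1) / 2 ranked-value rule."""
    ordered = sorted(values)                # order the data from low to high
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2             # odd n: a whole-number rank, right in the middle
        return ordered[position - 1]        # ranks start at 1, Python indexes start at 0
    lower = ordered[n // 2 - 1]             # even n: (n + 1) / 2 falls between two ranks,
    upper = ordered[n // 2]                 # so average the middle 2 data points
    return (lower + upper) / 2

odd_sample = [3, 7, 7, 9, 14]               # n = 5, median is the 3rd ranked value
even_sample = [3, 7, 9, 14]                 # n = 4, median is between the 2nd and 3rd

print("mean:", mean(odd_sample))
print("mode(s):", multimode(odd_sample))
print("two modes:", multimode([1, 1, 2, 2, 3]))
print("median (odd n):", median_by_position(odd_sample))
print("median (even n):", median_by_position(even_sample))
```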
Continued:
• One final point on central tendency. One problem
with the mean as a measure of central tendency
is that it is sensitive to outliers (very large or very
small values). The median is NOT sensitive.
• Variation: the key measures are the variance and the
standard deviation, which is just the square root of
the variance.
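To see the outlier point in action, here is a tiny Python sketch with invented income figures (in $000s). Adding one very large value drags the mean a long way but barely moves the median.

```python
from statistics import mean, median

incomes = [30, 35, 40, 45, 50]        # hypothetical incomes, no outlier
with_outlier = incomes + [1000]       # same data plus one extreme value

print("without outlier: mean =", mean(incomes), " median =", median(incomes))
print("with outlier:    mean =", mean(with_outlier), " median =", median(with_outlier))
```

The mean jumps from 40 to 200, while the median only moves from 40 to 42.5.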

Measuring variation

[Figure: two distributions compared, one with a small variance/standard deviation and one with a large variance/standard deviation]

The Sample Variance – S²
• Measures average scatter around the mean
• Units are also squared

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

where $\bar{X}$ = mean, $n$ = sample size, $X_i$ = ith value of the variable X

• The key to calculating the variance is to take a
deep breath and then do it step by step.
• Step 1: Write down each value of x in your data in
a column (call it column A).
• Step 2: Calculate the sample mean.
• Step 3: In a new column (call it column B)
subtract the mean from each value of x. (As a check,
sum the column: what value should you find?)
• Step 4: Create a new column (call it column C) and let
it equal column B squared.
• Step 5: Sum all the numbers in column C. Divide this
sum by the sample size n minus 1. You have the
sample variance! Take the square root and it’s the
sample standard deviation.
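The five steps translate almost line for line into Python. This is just a sketch with five made-up data values; the column names mirror the steps above.

```python
import math

# Step 1: the data values (column A). Hypothetical numbers for illustration.
column_a = [2, 4, 4, 6, 9]
n = len(column_a)

# Step 2: the sample mean.
sample_mean = sum(column_a) / n

# Step 3: column B = each value minus the mean (its sum is the check in Step 3).
column_b = [x - sample_mean for x in column_a]

# Step 4: column C = column B squared.
column_c = [d ** 2 for d in column_b]

# Step 5: sum column C, divide by n - 1, then take the square root.
sample_variance = sum(column_c) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("mean:", sample_mean)
print("check, sum of column B:", sum(column_b))
print("sample variance:", sample_variance)
print("sample standard deviation:", round(sample_sd, 4))
```

With these numbers the mean is 5, column C sums to 28, and the sample variance is 28/4 = 7.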
For completeness, just a couple of other fairly
common ways of summarising the distribution:
• Range = max - min
• Interquartile range is Q3 - Q1, i.e. the difference between
the 3rd and 1st quartiles. (Note Q2 is the median, so you
should get the basic idea.)
• Position of the 3rd quartile is ¾(n+1)
• Position of the 1st quartile is ¼(n+1)
• Be careful not to mix up the value of a quartile
and its position
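Here is a short Python sketch that keeps position and value apart. The seven data values are hypothetical, and n = 7 was chosen on purpose so that every quartile position is a whole number. When a position is fractional you need a rounding or interpolation rule, which this sketch deliberately does not cover.

```python
def quartile_position(n, k):
    """Position (rank, starting at 1) of the k-th quartile: k/4 * (n + 1)."""
    return k * (n + 1) / 4

# Hypothetical sample, ordered from low to high.
data = sorted([12, 3, 9, 21, 15, 7, 18])
n = len(data)

positions = {k: quartile_position(n, k) for k in (1, 2, 3)}

values = {}
for k, pos in positions.items():
    if not pos.is_integer():
        raise ValueError("fractional position: a rounding rule is needed")
    values[k] = data[int(pos) - 1]     # ranks start at 1, Python indexes start at 0

iqr = values[3] - values[1]            # interquartile range = Q3 - Q1
data_range = data[-1] - data[0]        # range = max - min

for k in (1, 2, 3):
    print(f"Q{k}: position {positions[k]}, value {values[k]}")
print("IQR:", iqr, " range:", data_range)
```

Q1 sits at position 2 but its value is 7: that is exactly the position/value distinction the last bullet warns about.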

Distribution Shape and Box-and-whisker Plot

[Figure: box-and-whisker plots for left-skewed, symmetric and right-skewed distributions, each marking Q1, Q2 and Q3]

• Final point: if the left-hand box (Q1 to Q2) is wider than
the right-hand box (Q2 to Q3) THEN the distribution is left
skewed, etc.
• Everything so far is about the sample. Next is
the population, which is not particularly important.
Why not?
• But here are the formulas.

Numerical Measures for a Population

• For completeness I present population
parameters (we virtually never calculate these).
• Population summary measures are called
parameters
• The population mean is the sum of the values in
the population divided by the population size, N

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Population Variance vs. Standard Deviation

Population Variance:
• the average of the squared deviations of values from
the mean

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

μ = population mean; N = population size; Xi = ith value of the
variable X

Population Standard Deviation:
• shows variation about the mean
• is the square root of the population variance
• has the same units as the original data

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
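For comparison, here is a Python sketch that treats five made-up values as if they were an entire population. The only change from the sample formulas is dividing by N rather than n - 1. `pvariance`, `pstdev` and `variance` are functions in Python's standard `statistics` module; the data values are hypothetical.

```python
from statistics import pstdev, pvariance, variance

# Pretend these five values are the WHOLE population (hypothetical numbers).
values = [2, 4, 4, 6, 9]
N = len(values)

mu = sum(values) / N                                # population mean
sigma_sq = sum((x - mu) ** 2 for x in values) / N   # population variance: divide by N
sigma = sigma_sq ** 0.5                             # population standard deviation

print("mu:", mu)
print("population variance by hand:", sigma_sq, " library:", pvariance(values))
print("population std dev by hand:", round(sigma, 4), " library:", round(pstdev(values), 4))
print("sample variance (divide by n - 1) on the same data:", variance(values))
```

On the same five numbers the population formula gives 28/5 = 5.6, while the sample formula gives 28/4 = 7.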
Some extra bits and pieces
These 2 videos are a bit of light relief!
A bit silly……

Stem and leaf video:
https://www.youtube.com/watch?v=yjhhbEAprp8

Mean, Median, Mode video:
https://www.youtube.com/watch?v=5C9LBF3b65s
