
Using business statistics for decision making:


1.) Identify the target customer
2.) Collect data
3.) Analysis
4.) Conclusion

Census: the process of collecting measurements from every unit in the population


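Sampling method:
Sample with replacement: the selected unit is placed back before the next draw
Sample without replacement: the selected unit is not placed back
Random sample:
Every unit in the population has the same chance of being selected
Finite population: fixed number of units
Infinite population: unlimited number of units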
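
The two sampling methods can be sketched with Python's standard library. The population of ten customer IDs and the seed are assumptions made up for illustration.

```python
import random

# Hypothetical population of 10 customer IDs (illustrative only).
population = list(range(1, 11))

rng = random.Random(42)  # fixed seed so the draws are repeatable

# With replacement: each drawn unit is placed back, so repeats are possible.
with_replacement = rng.choices(population, k=5)

# Without replacement: a drawn unit cannot be selected again.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

In both calls every unit has the same chance of selection on each draw, which is what makes the sample random.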

1A
Types of Variable
1.) Categorical/ Qualitative
--can’t be expressed with numbers --have gaps in charts
--Bar Chart, Pie Chart
E.g. what kind of dessert you like? Ice cream, mango pudding
a.) Nominal
--not ordered and no rank --Identifier or name
--Example: gender, car type
b.) Ordinal
--the order matters but not the difference between values
--Rank-order categories --Ranks are relative to each other
--Example: Low (1), moderate (2) or high (3) risk
2.) Numeric/Quantitative
--Can be expressed with numbers (quantity) --have NO gaps (have a continuous relationship)
--Histogram, Polygon, Scatter Plot
E.g. how many instant noodles the supermarket sold yesterday? 100, 200, 300
a.) Interval
--difference between two values is meaningful
--On a numerical scale with an arbitrary zero point / origin --Can add or subtract
--Example: Temperature (80 degrees F is not twice as warm as 40 degrees F)
--The multiplication means nothing: 180 - 90 = 90 is a meaningful difference, but 180/90 = 2 does not mean "twice as warm"
b.) Ratio
--Measurements are on a numerical scale with a meaningful zero point
--Zero means “none” or “nothing”
--Values can be compared in terms of their interval and ratio
--$30 is $20 more than $10
--In business and finance, most quantitative variables are ratio variables
--Examples: Earnings, profit, loss, age, distance, height, weight
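
A short sketch of why ratios are meaningless on an interval scale but meaningful on a ratio scale. It reuses the 80 F / 40 F and $30 / $10 figures from the notes; the unit conversions (Celsius, cents) are added here only to make the point.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

warm_f, cool_f = 80.0, 40.0
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# Interval scale: the ratio changes with the (arbitrary) zero point.
ratio_f = warm_f / cool_f   # 2 in Fahrenheit
ratio_c = warm_c / cool_c   # about 6 in Celsius, so "twice as warm" is meaningless

# Differences survive the change of unit (only rescaled by 5/9).
diff_f = warm_f - cool_f
diff_c = warm_c - cool_c

# Ratio scale: zero dollars means "none", so the ratio is the same in any unit.
ratio_dollars = 30 / 10
ratio_cents = 3000 / 1000

print(ratio_f, round(ratio_c, 2), ratio_dollars, ratio_cents)
```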
Frequency Distribution:
--A grouping of data into different categories
--Steps:
1.) decide on the number of classes
2.) Determine the class interval or width
3.) Set the individual class boundaries
4.) Count the number in each class
Relative Frequency:
--the proportion of items in each class
--Divide the frequency of that class by the total number of observations

Percent Frequency:
--Multiply relative frequency by 100
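
A minimal sketch of frequency, relative frequency and percent frequency for a categorical variable. The survey answers are hypothetical, loosely based on the dessert example earlier in the notes.

```python
from collections import Counter

# Hypothetical survey answers (illustrative only).
answers = ["ice cream", "mango pudding", "ice cream", "cake",
           "ice cream", "mango pudding", "cake", "ice cream"]

frequency = Counter(answers)          # count per category
total = len(answers)                  # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = {category: count / total for category, count in frequency.items()}

# Percent frequency = relative frequency x 100.
percent = {category: 100 * r for category, r in relative.items()}

for category in frequency:
    print(f"{category:15s} {frequency[category]:3d} "
          f"{relative[category]:.3f} {percent[category]:.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.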
Bar Chart:
--a vertical or horizontal rectangle represents the frequency for each category
--can be frequency, relative frequency or percent frequency
Pie Chart:
--a circle where the size of each slice represents the relative or percent frequency
Frequency distribution table:
1.) Arrange raw data in ascending order
2.) Find the range (largest - smallest)
3.) Select the number of classes
4.) Find the class interval (range / number of classes)
5.) Determine class boundaries
6.) Compute class midpoint
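
The six steps above can be sketched as follows. The raw data and the choice of 4 classes are assumptions; rounding the class width up and closing the last class on the right are one common convention, not the only one.

```python
import math

# Hypothetical raw data (illustrative only).
data = [12, 25, 31, 18, 44, 27, 39, 15, 22, 35, 48, 29]

ordered = sorted(data)                        # 1.) ascending order
data_range = ordered[-1] - ordered[0]         # 2.) range
num_classes = 4                               # 3.) chosen by the analyst (assumption)
width = math.ceil(data_range / num_classes)   # 4.) class interval, rounded up

table = []
for i in range(num_classes):
    lower = ordered[0] + i * width            # 5.) class boundaries [lower, upper)
    upper = lower + width
    last = i == num_classes - 1               # last class also includes its upper bound
    count = sum(1 for x in ordered
                if lower <= x < upper or (last and x == upper))
    midpoint = (lower + upper) / 2            # 6.) class midpoint
    table.append((lower, upper, midpoint, count))

for lower, upper, midpoint, count in table:
    print(f"{lower:3d} to {upper:3d}  midpoint {midpoint:5.1f}  count {count}")
```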
Histogram:
--It can show:
--the highest and the lowest
--the distribution of the graph
--the concentration of the graph (the highest 2 bars)
--the sample size
--No gaps
--The larger the number of bins, the less obvious the shape of the graph (the bars become more evenly distributed)
Draw a Histogram:
--Choose the classes
--Choose origin
--Choose the bin width (this fixes the number of bins)
--Count the number of observations in each bin
--The height of each rectangle = frequency, relative frequency or percent frequency
Frequency Polygon:
--Plot a point above each class midpoint at a height equal to the frequency, then connect the points
--Useful to compare two or more distributions
Scatter Plots:
--Study relationship between two variables (one on x-axis, one on y-axis)
--plot graph(1a34)
The Normal Curve: (1a35)
--bell-shaped --Symmetric
--Skewness:
--not symmetrical about the centre
--left/right skewed
1B
Population parameter: a number calculated from population measurements that describes some aspect of the population
Sample statistic: a number calculated from sample measurements that describes some aspect of the sample

Measures of Central Tendency:


-mean: will be affected by extreme values
-median: the middle value of the ordered measurements (the 50th percentile), not affected by extreme values
-mode:
--the value that occurs most frequently
--not affected by extreme values
--Two modes = bimodal --More than 2 modes = multimodal --{0,0,1,1,2,2} and {0,1,2,3,4} = no mode (every value occurs equally often)
Sample mean is a point estimate of population mean
Two distributions with the same mean, median and mode are not necessarily the same;
their spread and shape may still differ
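
A small sketch of how an extreme value moves the mean but barely touches the median or mode, using Python's `statistics` module. The salary figures are hypothetical.

```python
import statistics

# Hypothetical salaries in $000 (illustrative only).
salaries = [30, 32, 32, 35, 38, 40, 41]
with_outlier = salaries + [400]   # add one extreme value

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)        # pulled up sharply

median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)    # moves only slightly

mode_before = statistics.mode(salaries)
mode_after = statistics.mode(with_outlier)        # unchanged

# multimode lists every most-frequent value. When all values tie, as in
# {0,0,1,1,2,2}, it returns all of them, which the notes treat as "no mode".
tied = statistics.multimode([0, 0, 1, 1, 2, 2])

print(mean_before, mean_after)
print(median_before, median_after)
print(mode_before, mode_after, tied)
```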

Measures of Variation:
-Range: largest-smallest, sensitive to extreme values
-Interquartile range: 3rd quartile - 1st quartile, not affected by outliers
-SD and Variance:
-Values far from the mean are given more weight (deviations are squared)
-Less sensitive to extreme values than range, more sensitive than interquartile range
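-Variance:
-in squared units of the data (the SD is its square root and has the same unit as the data)
-Formulas
-Average of the squared deviations of individual measurements from the mean
--Population Variance: sigma^2 = sum((x_i - mu)^2) / N
--Sample Variance: s^2 = sum((x_i - x_bar)^2) / (n - 1)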
-Standard Deviation:
-The larger the SD, the more spread-out the data set
-Measure the risk of holding a security
-Coefficient of Variation (CV)
-measure the size of the SD relative to the size of mean
- (SD/Mean) x100%
-compare the relative variability of values about the mean
-compare two or more sets of data measured in different units
-Measure Risk
-The smaller the CV, the less risk taken per unit of return, so the better the investment
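
A sketch of variance, SD and CV for two hypothetical investments with the same mean return. The return figures are made up, and the sample SD is used for the CV as an assumption.

```python
import statistics

# Hypothetical annual returns in % (illustrative only). Both average 10%.
fund_a = [8, 10, 12, 9, 11]
fund_b = [2, 25, -5, 18, 10]

def coefficient_of_variation(values):
    """Sample SD relative to the mean, in percent: (SD / mean) x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

pop_var_a = statistics.pvariance(fund_a)   # divides by N
samp_var_a = statistics.variance(fund_a)   # divides by n - 1
sd_a = statistics.stdev(fund_a)            # square root of the sample variance

cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)

print(f"fund A: variance {samp_var_a}, SD {sd_a:.2f}, CV {cv_a:.1f}%")
print(f"fund B: SD {statistics.stdev(fund_b):.2f}, CV {cv_b:.1f}%")
```

Both funds have the same mean, so the larger CV of fund B reflects purely its larger spread, that is, more risk for the same average return.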

Measure of relative location:


Z-scores (Standard Score)
--indicate the relative location of a value within a population
--z = (x - mean) / SD; the standardized values have mean = 0 and SD = 1
--Can be used to detect outlier (Z-score above 3/below -3)
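
A minimal sketch of z-scores and the |z| > 3 outlier rule. The data set is hypothetical, and the population SD is used because the notes describe location within a population.

```python
import statistics

def z_scores(values):
    """Standardize each value: z = (x - mean) / SD (population SD)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

# Hypothetical measurements: 20 typical values plus one extreme value.
values = [48, 49, 50, 51, 52] * 4 + [120]

z = z_scores(values)

# Flag values whose z-score is above 3 or below -3.
outliers = [x for x, score in zip(values, z) if abs(score) > 3]

print("outliers:", outliers)
print("mean of z:", round(statistics.mean(z), 6),
      "SD of z:", round(statistics.pstdev(z), 6))
```

Note that in a small data set a single point cannot reach |z| > 3, so the rule is only useful with a reasonable number of observations.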
Boxplot:
--show median, quartile, outlier, skew distribution
--when judging the skew of the distribution, ignore the outliers (dots beyond the whiskers)
When comparing groups:
--look for patterns, difference and trends
--can use histograms, but boxplots are better for side-by-side comparison
---Compare the median, interquartile range (the variability) and the symmetry (shape)

1C
Association between quantitative variables:
Scatter Plot:
--Describe association
--1.) Direction: the trend
--2.) Curvature: linear or curved
--3.) Variation: Points tightly clustered?
--4.) Outliers

Measure of Association:
Covariance:
-a measure that quantifies the linear association
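-formula: cov(x,y) = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1) (the total signed area of the rectangles between each point and the point of means)
-If >0, points lie mostly in quadrants I / III around the means (positive linear association)
-If <0, points lie mostly in quadrants II / IV around the means (negative linear association)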

Correlation (r):
-standardized measure of the strength of linear relationship
-formula: r = cov(x,y) / (s_x * s_y)
-no unit
-between 1 and -1
- 1 =perfect positive correlation
- 0 =no correlation
- -1 = perfect negative correlation
- If r>0, positive trend
If r<0, negative trend
- If |r| > 0.75, strong relationship
If |r| < 0.25, weak relationship
If 0.25 < |r| < 0.75, moderate relationship
-corr(x,y)=corr(y,x)
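
A sketch of sample covariance and correlation written out by hand, with the strength cut-offs taken from the notes. The advertising and sales figures and the function names are hypothetical.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """r = cov(x, y) / (s_x * s_y); unit-free and between -1 and 1."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def strength(r):
    """Label the strength using the 0.25 / 0.75 cut-offs from the notes."""
    if abs(r) > 0.75:
        return "strong"
    if abs(r) < 0.25:
        return "weak"
    return "moderate"

# Hypothetical data: advertising spend (x) and sales (y).
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

cov_xy = covariance(advertising, sales)
r = correlation(advertising, sales)

print(f"cov = {cov_xy}, r = {r:.4f}, {strength(r)} relationship")
```

Swapping the arguments gives the same r, matching corr(x,y)=corr(y,x).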

How to find association:


-Plot a scatter plot
-Find the relationship
-Find the correlation to find the strength of relationship

Lurking variables:
-a variable that is not included in the explanation but affects the apparent relationship between two other variables

Correlation matrix:
-a table showing all correlations among a set of numeric variables
Association between categorical variables:

Contingency Table:
-marginal distribution: the row and column totals (subtotals) of the two variables
-conditional distribution: the counts (or percentages) within a single row or column
--If Associated,
--the column percentage will vary from column to column
--the row percentage will vary from row to row

To check, build the artificial table of expected counts under no association: expected = (row total x column total) / grand total

Chi-square Statistic:
--formula: chi-square = sum((observed - expected)^2 / expected) over all cells

Strength of association:
Cramer’s V:
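-formula: V = sqrt(chi-square / (n x min(r - 1, c - 1))), where n = total count, r = number of rows, c = number of columns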
-Ranges in value from 0 to 1
-no unit
-If V>0.75, strong association
-If V<0.25, weak association
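
A sketch of the chi-square statistic and Cramer's V for a contingency table, using the standard textbook formulas. The 2x2 counts are hypothetical.

```python
import math

# Hypothetical 2x2 contingency table of counts (illustrative only).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Artificial table with no association: row total x column total / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Chi-square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

# Cramer's V rescales chi-square to the range 0 to 1.
num_rows, num_cols = len(observed), len(observed[0])
cramers_v = math.sqrt(chi_square / (n * min(num_rows - 1, num_cols - 1)))

print(f"chi-square = {chi_square:.3f}, Cramer's V = {cramers_v:.3f}")
```

With these counts V falls between 0.25 and 0.75, so by the cut-offs above the association would be called moderate.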

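Simpson’s Paradox:
-an association seen in the combined data reverses or disappears when the data are split into groups by a lurking variable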
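
A small illustration of Simpson's paradox with made-up numbers: one group does better within every segment yet worse overall, because the segment mix acts as a lurking variable. The rep names, segments and counts are all hypothetical.

```python
# Hypothetical (successes, trials) for two sales reps in two customer segments.
results = {
    "Rep X": {"small accounts": (9, 10), "large accounts": (30, 100)},
    "Rep Y": {"small accounts": (80, 100), "large accounts": (2, 10)},
}

def segment_rate(rep, segment):
    """Success rate of one rep within one segment."""
    successes, trials = results[rep][segment]
    return successes / trials

def overall_rate(rep):
    """Success rate of one rep with both segments combined."""
    successes = sum(s for s, _ in results[rep].values())
    trials = sum(t for _, t in results[rep].values())
    return successes / trials

for segment in ("small accounts", "large accounts"):
    x = segment_rate("Rep X", segment)
    y = segment_rate("Rep Y", segment)
    print(f"{segment}: X = {x:.2f}, Y = {y:.2f}")   # X is higher in each segment

# Combined, the ranking flips because X mostly handles the harder segment.
print(f"overall: X = {overall_rate('Rep X'):.2f}, Y = {overall_rate('Rep Y'):.2f}")
```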

2A
Concept of Probability:
Experiment: the observation of some activity
Outcome: Result of experiment
Sample space: the set of all possible experimental outcomes
Event: the collection of one or more outcomes of an experiment

Venn diagrams: graph showing the relationship among events


Union: A or B
Intersection: A and B
Mutually Exclusive Events:
-Do not overlap
-P(A and B)=0
-P(A or B)=P(A)+P(B)

Independent Events:
-P(A and B)=P(A)P(B)
Use this rule to check whether two events are independent
Probability Tree (Tree Diagram)

Collectively exhaustive events:
--A set of events of which at least one must occur
P(A or B or C or D)=1

Joint probability: two events P(Yes and MSN)


Marginal probability: margin (subtotal) P(MSN)

Conditional Probability:
-the probability of A, given that B has occurred
-P(A|B)=P(A and B)/P(B)
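
A sketch of joint, marginal and conditional probability read off a contingency table, plus the independence check P(A and B)=P(A)P(B). The counts are hypothetical; the Yes / MSN labels follow the example in the notes.

```python
from fractions import Fraction

# Hypothetical counts: answer (Yes/No) by service (MSN/Other).
counts = {
    ("Yes", "MSN"): 30, ("Yes", "Other"): 20,
    ("No", "MSN"): 10,  ("No", "Other"): 40,
}
total = sum(counts.values())

# Joint probability: both events together.
p_yes_and_msn = Fraction(counts[("Yes", "MSN")], total)

# Marginal probabilities: subtotals divided by the grand total.
p_msn = Fraction(sum(c for (_, s), c in counts.items() if s == "MSN"), total)
p_yes = Fraction(sum(c for (a, _), c in counts.items() if a == "Yes"), total)

# Conditional probability: P(Yes | MSN) = P(Yes and MSN) / P(MSN).
p_yes_given_msn = p_yes_and_msn / p_msn

# Independent only if the joint probability equals the product of the marginals.
independent = p_yes_and_msn == p_yes * p_msn

print(p_yes_and_msn, p_msn, p_yes, p_yes_given_msn, independent)
```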

Complement Rule:
-Let Ac be the event "not A"
-A and Ac are mutually exclusive and collectively exhaustive
-P(A)=1-P(Ac)

Addition rule:
P(AUB)=P(A)+P(B)-P(A and B) if A and B can occur together (joint probability > 0)
P(AUB)=P(A)+P(B) if A and B are mutually exclusive
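
The complement and addition rules can be checked on a simple example. The fair six-sided die and the events A, B and C are assumptions chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumption)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # even number
B = {4, 5, 6}   # greater than 3
C = {1}         # rolling a one (cannot happen together with A)

# General addition rule: subtract the overlap so it is not counted twice.
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Mutually exclusive events: P(A and C) = 0, so the probabilities just add.
p_a_or_c = prob(A) + prob(C)

# Complement rule: P(A) = 1 - P(not A).
p_not_a = prob(sample_space - A)

print(p_a_or_b, p_a_or_c, 1 - p_not_a)
```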

If several pieces of data are given, e.g.
P(A|B)
P(A|B')
P(B)
then P(B|A) can be found by:
Method 1: contingency table
Method 2: Bayes' Rule: P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B')P(B')]
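
A sketch of both methods for turning P(A|B), P(A|B') and P(B) into P(B|A). The input probabilities, the function name and the table size of 1000 are hypothetical.

```python
from fractions import Fraction

def bayes_posterior(p_a_given_b, p_a_given_not_b, p_b):
    """Bayes' rule: P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B')P(B')]."""
    p_not_b = 1 - p_b
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b   # total probability of A
    return p_a_given_b * p_b / p_a

# Hypothetical inputs (illustrative only).
p_a_given_b = Fraction(9, 10)
p_a_given_not_b = Fraction(1, 5)
p_b = Fraction(1, 10)

# Method 2: Bayes' rule.
posterior = bayes_posterior(p_a_given_b, p_a_given_not_b, p_b)

# Method 1: contingency table built from an imagined 1000 cases.
n = 1000
b_total = p_b * n                                # 100 cases in B
a_and_b = p_a_given_b * b_total                  # 90 of them also in A
a_and_not_b = p_a_given_not_b * (n - b_total)    # 180 cases in A but not B
table_posterior = a_and_b / (a_and_b + a_and_not_b)

print(posterior, table_posterior)
```

Both routes give the same answer; the table simply makes the denominator P(A) visible as a column total.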
