
• Variables, Units of Measurement and Frequency

• Measures of Central Tendency

• Measures of Dispersion

• Probability Theory

• Binomial Distribution

• Poisson Distribution

• Normal Distribution
Units of Measurement for Variables

Variables

• Categorical / Qualitative: Nominal, Ordinal

• Continuous / Quantitative: Interval, Ratio


Frequency

• Frequency: The number of times a value of the variable occurs in the dataset

• Frequency Distribution: A statistical table that lists each value (or class of values) of the variable against its frequency

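Simple Frequency Distribution

Variable (x)    Frequency (f)
2               8
4               10
7               15

Grouped Frequency Distribution

Variable (x)    Frequency (f)
2-5             8
5-7             10
7-9             15

Terms used with a grouped frequency distribution:
➢ Class: an interval of values of the variable, e.g. 2-5
➢ Class Frequency: the number of observations that fall in a class
➢ Class Width: the difference between the upper and lower class boundaries, e.g. 5 - 2 = 3
➢ Class Limit: the lower and upper end values stated for a class
➢ Class Boundary: the true dividing point between adjacent classes; when classes are written with gaps (e.g. 10-19, 20-29), it lies halfway between them (19.5)
➢ Relative Frequency: the class frequency divided by the total frequency, f / ∑f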
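The terms above can be checked with a short R sketch on the two example tables. The object names (x, f, rel_f, cum_f, lower, upper, fg, width) are chosen here for illustration; cumulative frequency is not defined above but is included because the median formulas rely on it.

```r
# Simple frequency distribution from the table above
x <- c(2, 4, 7)          # values of the variable
f <- c(8, 10, 15)        # frequencies
rel_f <- f / sum(f)      # relative frequency f / sum(f); the values sum to 1
cum_f <- cumsum(f)       # cumulative frequency: 8, 18, 33

# Grouped frequency distribution from the table above
lower <- c(2, 5, 7)      # lower class limits (equal to the boundaries here, no gaps)
upper <- c(5, 7, 9)      # upper class limits
fg    <- c(8, 10, 15)    # class frequencies
width <- upper - lower   # class width: 3, 2, 2

data.frame(class = paste(lower, upper, sep = "-"), f = fg,
           width = width, relative = round(fg / sum(fg), 3))
```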
Measures of Central Tendency
• Arithmetic Mean: Sum of a collection of numbers divided by the count of numbers in the collection

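➢ Simple A.M: $\bar{x} = \frac{\sum x}{n}$
➢ Weighted A.M: $\bar{x}_w = \frac{\sum w x}{\sum w}$; for a frequency distribution the weights are the frequencies, $\bar{x} = \frac{\sum f x}{\sum f}$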

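• Mode: The value of the variable that has the highest frequency

➢ Simple frequency distribution: the value of x with the largest frequency f

➢ Grouped frequency distribution: $\text{Mode} = l_1 + c\left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right)$, where $l_1$ is the lower limit of the modal class, $c$ the class width, $f_1$ the frequency of the modal class, and $f_0$ and $f_2$ the frequencies of the classes just before and just after it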

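• Median: The middle value of the ordered data; it divides the dataset into two equal halves

➢ Simple series: arrange the n values in ascending order; Median = value of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ item

➢ Simple frequency distribution: Median = value of x at the $\left(\frac{N+1}{2}\right)^{\text{th}}$ item, located using the cumulative frequencies, where $N = \sum f$

➢ Grouped frequency distribution: $\text{Median} = l_1 + \frac{c}{f}\left(\frac{N}{2} - \sum f_1\right)$, where $l_1$ is the lower limit of the median class (the first class whose cumulative frequency reaches N/2), $c$ the class width, $f$ the frequency of the median class, and $\sum f_1$ the cumulative frequency of the classes before it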
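A minimal R sketch of these formulas. The grouped table (classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 6) is hypothetical, chosen with equal class widths so the formulas apply directly; grouped_mode() and grouped_median() are illustrative helpers, not built-in R functions. Treating a missing neighbouring class as frequency 0 is an assumption of this sketch.

```r
# Simple and weighted arithmetic mean (values and frequencies from the simple table)
x <- c(2, 4, 7)
f <- c(8, 10, 15)
simple_am   <- sum(x) / length(x)    # same as mean(x)
weighted_am <- sum(f * x) / sum(f)   # same as weighted.mean(x, f)

# Hypothetical grouped distribution with equal class width c = 10
lower <- c(0, 10, 20, 30)   # lower class limits
c_w   <- 10                 # class width c (named c_w to avoid masking c())
fg    <- c(5, 8, 12, 6)     # class frequencies

grouped_mode <- function(lower, c_w, f) {
  k  <- which.max(f)                          # index of the modal class
  f1 <- f[k]
  f0 <- if (k > 1) f[k - 1] else 0            # class before (0 if none: assumption)
  f2 <- if (k < length(f)) f[k + 1] else 0    # class after (0 if none: assumption)
  lower[k] + c_w * (f1 - f0) / (2 * f1 - f0 - f2)
}

grouped_median <- function(lower, c_w, f) {
  N  <- sum(f)
  cf <- cumsum(f)                             # cumulative frequencies
  k  <- which(cf >= N / 2)[1]                 # median class: first with cf >= N/2
  cf_before <- if (k > 1) cf[k - 1] else 0    # sum f1: cumulative frequency before it
  lower[k] + (c_w / f[k]) * (N / 2 - cf_before)
}

grouped_mode(lower, c_w, fg)     # 24
grouped_median(lower, c_w, fg)   # about 22.08
```

Here the modal class is 20-30 (f1 = 12, f0 = 8, f2 = 6), giving 20 + 10 × 4/10 = 24; the median class is also 20-30, since the cumulative frequencies are 5, 13, 25, 31 and N/2 = 15.5.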


Measures of Dispersion
• Range: The difference between the maximum and minimum values of the variable in the dataset

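• Standard Deviation: The root-mean-square deviation from the mean

➢ Simple (ungrouped data): $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

➢ Weighted (frequency distribution): $\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$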

• Quartile Deviation: The quartiles Q1, Q2 and Q3 divide the ordered dataset into four equal parts. The quartile deviation, $QD = \frac{Q_3 - Q_1}{2}$, estimates the spread of the distribution around the central value (the median, Q2).

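• Inter-Quartile Range: $IQR = Q_3 - Q_1$, the range of the middle 50% of the data. It is used to identify outliers in a box plot, where points more than 1.5 × IQR below Q1 or above Q3 are commonly flagged.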

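• Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100$, the standard deviation expressed as a percentage of the mean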
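A sketch in R applying the dispersion formulas to the vector x <- c(2, 4, 7, 8, 10) used in the R codes. Two assumptions matter: sd() in R divides by n - 1, while the formula above divides by n, and quantile() uses R's default interpolation (type 7), so textbook quartile positions such as (n + 1)/4 can give different Q1 and Q3 on small samples. The object names are illustrative.

```r
x <- c(2, 4, 7, 8, 10)
n <- length(x)

range_x <- max(x) - min(x)                  # range: 8
sigma   <- sqrt(sum((x - mean(x))^2) / n)   # SD with divisor n, as in the formula
s       <- sd(x)                            # R's sd() uses divisor n - 1

q   <- quantile(x, c(0.25, 0.75))           # Q1 and Q3 (R's default type 7)
iqr <- unname(q[2] - q[1])                  # IQR: 4, same as IQR(x)
qd  <- iqr / 2                              # quartile deviation: 2
cv  <- sigma / mean(x) * 100                # coefficient of variation, in %

# The two SD conventions differ by the factor sqrt(n / (n - 1))
all.equal(s, sigma * sqrt(n / (n - 1)))     # TRUE
```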
R codes
• Creating a Vector:
> x <- c(2, 4, 7, 8, 10)                    # Quantitative values #
> y <- c("Yes", "No", "Yes", "Yes", "No")   # Qualitative values (same length as x) #

• Creating a dataframe:
> df <- data.frame(x, y)                    # all columns must have the same length #

• Creating Frequency Table:

> t <- table(df$y)           # with one variable #
> t1 <- table(df$x, df$y)    # with two variables (cross-tabulation) #

• Creating Groups or Cut points:

> marks <- c(15, 20, 25, 35)   # assumed values for illustration #
> cutvariable <- cut(marks, breaks = c(10, 20, 30, 40), labels = c("A", "B", "C"))                 # classes (10,20], (20,30], (30,40]: 20 falls in 10-20 #
> cutvariable <- cut(marks, breaks = c(10, 20, 30, 40), labels = c("A", "B", "C"), right = FALSE)  # classes [10,20), [20,30), [30,40): 20 falls in 20-30 #

• Creating Charts:
> barplot(t, main = "title", xlab = "x", ylab = "y", legend.text = row.names(t), col = rainbow(length(t)))
> pie(t)
> hist(df$x)       # a histogram needs the raw numeric variable, not a frequency table #
> boxplot(df$x)
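cut() and table() together turn raw data into a grouped frequency distribution of the kind shown earlier. The marks below are hypothetical; the comments give the class counts for this data under each closing convention.

```r
# Hypothetical marks of 10 students (assumed data)
marks <- c(12, 15, 20, 22, 25, 27, 28, 30, 33, 38)

# Right-closed classes (10,20], (20,30], (30,40]: R's default right = TRUE
classes <- cut(marks, breaks = c(10, 20, 30, 40),
               labels = c("10-20", "20-30", "30-40"))
freq <- table(classes)            # class frequencies: 3, 5, 2
rel  <- prop.table(freq)          # relative frequencies f / sum(f)

out <- as.data.frame(freq)        # columns: classes, Freq
out$relative   <- round(as.vector(rel), 2)
out$cumulative <- cumsum(out$Freq)
out

# Left-closed classes [10,20), [20,30), [30,40): counts become 2, 5, 3
freq_left <- table(cut(marks, breaks = c(10, 20, 30, 40), right = FALSE))
```

Values equal to the lowest break (here 10) are dropped as NA under the default right = TRUE; passing include.lowest = TRUE to cut() keeps them in the first class.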

• Measures of Central Tendency:


> mean(df$x)
> median(df$x)
> t <- table(df$x)
> names(t)[t == max(t)]   # the modal value(s): the value(s) with the highest frequency in the table #

• Measures of Dispersion:
> sd(df$x)         # sample SD: divides by (n - 1), not n #
> range(df$x)      # returns the minimum and maximum; diff(range(df$x)) gives the range #
> quantile(df$x)   # returns the 0%, 25%, 50%, 75% and 100% points: min, Q1, Q2, Q3, max #
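The table-based mode above returns the modal values as character labels (the names of the table). The helper stat_mode() below is hypothetical, not part of base R: it converts the labels back to numbers for numeric input and returns every value tied for the highest frequency.

```r
# Hypothetical helper (not a base R function): all most-frequent values
stat_mode <- function(v) {
  tab   <- table(v)                        # frequency table of the values
  modal <- names(tab)[tab == max(tab)]     # labels with the highest frequency
  if (is.numeric(v)) as.numeric(modal) else modal
}

stat_mode(c(2, 4, 4, 7, 7, 7, 10))   # 7
stat_mode(c(1, 1, 2, 2, 3))          # returns c(1, 2): two modes
stat_mode(c("Yes", "No", "Yes"))     # "Yes"

# Range as a single number, as defined in the theory section
x <- c(2, 4, 7, 8, 10)
diff(range(x))                       # 8
```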
Probability

• Important concepts:

➢ Trial: An experiment that can be repeated under the same conditions
➢ Event: An outcome (or a set of outcomes) of a trial
➢ Mutually Exclusive: Events that cannot occur simultaneously
➢ Exhaustive: Events that together cover all possible outcomes, so at least one of them must occur in every trial
➢ Equally Likely: Every event has the same chance of occurrence
➢ Union (∪): A ∪ B means "A or B", also written A + B
➢ Intersection (∩): A ∩ B means "A and B", also written A * B
➢ Complement: A′ means "not A", i.e. event A does not occur

• Classical definition:
If there are N mutually exclusive, exhaustive and equally likely events, and N(A) of them are favorable to event A, then:

$P(A) = \frac{N(A)}{N}$
• Properties:

➢ A probability lies between 0 and 1: $0 \le P(A) \le 1$

➢ The probabilities of all the outcomes in the sample space sum to 1

➢ $P(A') = 1 - P(A)$

➢ Addition Rule: A or B = A + B = A ∪ B

a. Mutually exclusive events: $P(A \cup B) = P(A) + P(B)$

b. Not mutually exclusive events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

➢ Multiplication Rule: A and B = A * B = A ∩ B

a. Independent events: $P(A \cap B) = P(A) \cdot P(B)$

b. Dependent (conditional) events: $P(A \cap B) = P(B) \cdot P(A \mid B)$

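➢ Bayes' Theorem (Thomas Bayes): If E1, E2, ..., EN are mutually exclusive and exhaustive events and event A can occur together with any of them, then the probability that A occurred together with Ei is

$P(E_i \mid A) = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{j=1}^{N} P(E_j)\,P(A \mid E_j)}$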
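A small R sketch of the classical definition, the addition rule and Bayes' theorem. The die example uses the classical setting directly; the priors P(Ei) and likelihoods P(A | Ei) in the Bayes part are hypothetical numbers (for instance, three machines and their defect rates).

```r
# Classical definition: P(A) = N(A) / N, for equally likely outcomes
S <- 1:6                          # sample space: one roll of a fair die
A <- S[S %% 2 == 0]               # event A: an even number {2, 4, 6}
B <- S[S > 4]                     # event B: a number above 4 {5, 6}
p_A     <- length(A) / length(S)                  # 3/6
p_B     <- length(B) / length(S)                  # 2/6
p_AB    <- length(intersect(A, B)) / length(S)    # P(A and B) = 1/6
p_union <- length(union(A, B)) / length(S)        # P(A or B) = 4/6

# Addition rule for events that are not mutually exclusive
all.equal(p_union, p_A + p_B - p_AB)              # TRUE

# Bayes' theorem with hypothetical priors P(Ei) and likelihoods P(A | Ei)
prior      <- c(E1 = 0.5, E2 = 0.3, E3 = 0.2)
likelihood <- c(E1 = 0.01, E2 = 0.02, E3 = 0.03)
joint      <- prior * likelihood   # P(Ei) * P(A | Ei) = P(Ei and A)
p_total    <- sum(joint)           # denominator: P(A) = 0.017
posterior  <- joint / p_total      # P(Ei | A)
round(posterior, 4)
```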
Binomial Distribution

• Properties: It is a Discrete Probability Distribution
(Used when a trial is repeated a fixed number of times, n)

➢ Every trial results in either a success or a failure
➢ The trials are independent of each other
➢ The probability of success, θ, is the same for every trial

➢ pmf: $f(x) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n$
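To see that the pmf above is what dbinom() computes, the sketch below writes the formula out with choose() and compares it with dbinom() for assumed values n = 10 and θ = 0.3 (called theta in the code). binom_pmf() is an illustrative name, not a base R function.

```r
# Binomial pmf written out from the formula
binom_pmf <- function(x, n, theta) {
  choose(n, x) * theta^x * (1 - theta)^(n - x)
}

n <- 10; theta <- 0.3                 # assumed values for illustration
x <- 0:n
p <- binom_pmf(x, n, theta)

all.equal(p, dbinom(x, size = n, prob = theta))   # TRUE: matches dbinom()
sum(p)                                            # 1, up to rounding
sum(x * p)                                        # mean = n * theta = 3
```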

Poisson Distribution

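• Properties: It is a Discrete Probability Distribution
(Used when the number of trials becomes very large, tending towards infinity, while the probability of success is very small)

➢ It is the limiting form of the Binomial distribution: n → ∞ and θ → 0 with λ = nθ held fixed
➢ The average number of occurrences of the event, λ, is known
➢ The number of trials is generally very large and often unknown

➢ pmf: $f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$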
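A sketch of the Poisson pmf in R and of its limiting relationship with the binomial. λ = 2 and n = 10000 are assumed values; pois_pmf() is an illustrative name, while dpois() and dbinom() are the base R functions.

```r
# Poisson pmf written out from the formula
pois_pmf <- function(x, lambda) exp(-lambda) * lambda^x / factorial(x)

lambda <- 2                          # assumed average number of occurrences
x <- 0:10
all.equal(pois_pmf(x, lambda), dpois(x, lambda))   # TRUE

# Limiting form: binomial with large n and theta = lambda / n
n <- 10000
max(abs(dbinom(x, size = n, prob = lambda / n) - dpois(x, lambda)))   # small
```

As n grows with θ = λ/n, the binomial probabilities approach the Poisson ones, which is the sense in which the Poisson distribution is the limiting form of the binomial.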
Normal Distribution

• Properties of Normal Distribution: Continuous Probability Distribution

➢ Symmetrical curve with Skewness = 0
➢ Infinite limits, extending from -∞ to +∞
➢ Mean = Median = Mode

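• Standard Normal (Z) Distribution: Continuous Probability Distribution of $z = \frac{x - \mu}{\sigma}$, with mean 0 and standard deviation 1

➢ Symmetrical curve with Skewness = 0
➢ Infinite limits from -∞ to +∞; about 99.73% of the area lies between z = -3 and z = +3
➢ Mean = Median = Mode at z = 0
➢ Standard Normal Density Function: $f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}$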
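A sketch checking the standard normal density and the standardization z = (x - μ)/σ in R. μ = 50, σ = 10 and x = 65 are assumed values; z_pdf() is an illustrative helper compared against base R's dnorm() and pnorm().

```r
z_pdf <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)   # standard normal density

z <- seq(-4, 4, by = 0.5)
all.equal(z_pdf(z), dnorm(z))                        # TRUE

# Standardizing: x from N(mu, sigma) maps to z = (x - mu) / sigma
mu <- 50; sigma <- 10                                # assumed values
x  <- 65
zx <- (x - mu) / sigma                               # 1.5
all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(zx))   # TRUE

pnorm(3) - pnorm(-3)                                 # about 0.9973
```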
R codes

• Binomial Distribution
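> n <- 50; theta <- 0.4                         # assumed values for illustration #
> dbinom(12:24, size = n, prob = theta)         # P(X = x) for each x from 12 to 24 #
> sum(dbinom(12:24, size = n, prob = theta))    # P(12 <= X <= 24) #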

• Poisson Distribution
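> lambda <- 100                                 # assumed value for illustration #
> dpois(x = 112:115, lambda = lambda)           # P(X = x) for x = 112, ..., 115 #
> sum(dpois(x = 112:115, lambda = lambda))      # P(112 <= X <= 115) #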

• Normal Distribution:
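> mu <- 8000; sigma <- 2500                     # assumed values for illustration #
> X <- pnorm(5000, mean = mu, sd = sigma)       # P(value <= 5000) #
> Y <- pnorm(10000, mean = mu, sd = sigma)      # P(value <= 10000) #
> Y - X                                         # Probability between 5000 and 10000 #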

* The values of x have been assumed for illustration

* By default, pnorm() (like pbinom() and ppois()) returns the lower-tail probability P(X <= x); for the upper tail P(X > x), add lower.tail = FALSE
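The sums of point probabilities can also be written with the cumulative functions pbinom() and ppois(), and the lower.tail argument applies to all three families. A sketch with assumed parameter values (n = 50, θ = 0.4, λ = 100, μ = 8000, σ = 2500):

```r
# Assumed parameter values for illustration
n <- 50; theta <- 0.4
lambda <- 100
mu <- 8000; sigma <- 2500

# Summing point probabilities equals a difference of cumulative probabilities
all.equal(sum(dbinom(12:24, size = n, prob = theta)),
          pbinom(24, size = n, prob = theta) - pbinom(11, size = n, prob = theta))
all.equal(sum(dpois(112:115, lambda = lambda)),
          ppois(115, lambda = lambda) - ppois(111, lambda = lambda))

# Upper tail P(value > 10000): lower.tail = FALSE, or 1 minus the lower tail
upper <- pnorm(10000, mean = mu, sd = sigma, lower.tail = FALSE)
all.equal(upper, 1 - pnorm(10000, mean = mu, sd = sigma))
```

For probabilities far out in the upper tail, lower.tail = FALSE is the safer choice, because 1 - pnorm(...) can round to 0.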
