Statistics Guide: Variables, Measures, Distributions & R Codes

• Variables, Units of Measurement and Frequency
• Measures of Central Tendency
• Measures of Dispersion
• Probability Theory
• Binomial Distribution
• Poisson Distribution
• Normal Distribution
Units of Measurement for Variables
Variables
• Categorical / Qualitative: Nominal, Ordinal
• Continuous / Quantitative: Interval, Ratio

Frequency
• Frequency: The number of times a given value of the variable occurs in the dataset
• Frequency Distribution: A statistical table that lists each value (or class of values) of a variable together with its frequency
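Simple Frequency Distribution
  Variable (x)   Frequency (f)
  2              8
  4              10
  7              15

Grouped Frequency Distribution
  Variable (x)   Frequency (f)
  2-5            8
  5-7            10
  7-9            15

Terms used with grouped data:
 ◦ Class: an interval of values of the variable, e.g. 2-5
 ◦ Class Frequency: the number of observations that fall in a class
 ◦ Class Width: the difference between the upper and lower boundaries of a class
 ◦ Class Limits: the smallest and largest values stated for a class (lower and upper limits)
 ◦ Class Boundaries: the points separating adjacent classes, taken midway between the upper limit of one class and the lower limit of the next
 ◦ Relative Frequency: the class frequency divided by the total frequency, f / ∑f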
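As a minimal sketch, the class widths and relative frequencies of the grouped table above can be computed in R. The vector names lower, upper and f are chosen here for illustration only.

```r
# Grouped frequency distribution from the table above
lower <- c(2, 5, 7)    # lower class limit of each class
upper <- c(5, 7, 9)    # upper class limit of each class
f     <- c(8, 10, 15)  # class frequencies

width <- upper - lower  # class widths
rel_f <- f / sum(f)     # relative frequencies, f / sum(f)

freq_table <- data.frame(
  class = paste(lower, upper, sep = "-"),
  f     = f,
  width = width,
  rel_f = round(rel_f, 3)
)
print(freq_table)
```

Relative frequencies always add up to 1. Note that the first class (2-5) is wider than the other two, which matters for formulas that assume equal class widths.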
Measures of Central Tendency
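• Arithmetic Mean: The sum of a collection of numbers divided by the count of numbers in the collection
 ◦ Simple A.M.: $\bar{x} = \frac{\sum x}{n}$
 ◦ Weighted A.M. (frequency distribution): $\bar{x} = \frac{\sum f x}{\sum f}$, where the frequencies f act as weights
• Mode: The value of the variable that has the highest frequency
 ◦ Simple frequency distribution: the value of x with the largest f
 ◦ Grouped frequency distribution: $\text{Mode} = l_1 + c\left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right)$, where $l_1$ is the lower boundary of the modal class, $f_1$ its frequency, $f_0$ and $f_2$ the frequencies of the preceding and succeeding classes, and $c$ the class width
• Median: The central value of the ordered data; it divides the dataset into two equal halves
 ◦ Simple series: arrange the n values in ascending order; the median is the $\left(\frac{n+1}{2}\right)$th value
 ◦ Simple frequency distribution: using the cumulative frequency column, the median is the value of x corresponding to the $\left(\frac{N+1}{2}\right)$th observation, where N = ∑f
 ◦ Grouped frequency distribution: $\text{Median} = l_1 + \frac{c}{f}\left(\frac{N}{2} - \sum f_1\right)$, where the median class is the first class whose cumulative frequency reaches N/2, $l_1$ is its lower boundary, $\sum f_1$ the cumulative frequency of the classes before it, f its frequency, c the class width and N = ∑f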
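A minimal sketch of the formulas above in R. The grouped data (classes 10-20 to 40-50 with frequencies 5, 12, 8, 5) are hypothetical, and grouped_mode() and grouped_median() are helper functions written here for illustration, not part of base R; they assume equal class widths and that lower holds the lower class boundaries.

```r
# Simple and weighted arithmetic mean, using the simple frequency table above
x <- c(2, 4, 7)
f <- c(8, 10, 15)
mean(x)                     # simple A.M.: sum(x) / n
weighted.mean(x, w = f)     # weighted A.M.: sum(f * x) / sum(f)

# Hypothetical grouped data with equal class width (classes 10-20, ..., 40-50)
lower <- c(10, 20, 30, 40)  # lower class boundaries
g_f   <- c(5, 12, 8, 5)     # class frequencies
width <- 10                 # class width c

grouped_mode <- function(lower, f, width) {
  k  <- which.max(f)                          # modal class = highest frequency
  f1 <- f[k]
  f0 <- if (k > 1) f[k - 1] else 0            # frequency of preceding class
  f2 <- if (k < length(f)) f[k + 1] else 0    # frequency of succeeding class
  lower[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)
}

grouped_median <- function(lower, f, width) {
  N  <- sum(f)
  cf <- cumsum(f)                             # cumulative frequencies
  k  <- which(cf >= N / 2)[1]                 # median class: first cf >= N/2
  cf_before <- if (k > 1) cf[k - 1] else 0    # sum f1 of the preceding classes
  lower[k] + (width / f[k]) * (N / 2 - cf_before)
}

grouped_mode(lower, g_f, width)    # 20 + 10 * 7/11, about 26.36
grouped_median(lower, g_f, width)  # 20 + (10/12) * (15 - 5), about 28.33
```

Setting f0 or f2 to 0 when the modal class is the first or last class is one common convention; textbooks differ on this edge case.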

Measures of Dispersion
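• Range: The difference between the maximum and the minimum value of the variable in the dataset: Range = Max − Min
• Standard Deviation: "Root-Mean-Square Deviation from the Mean"
 ◦ Simple series: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
 ◦ Frequency distribution (weighted): $\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$
• Quartile Deviation: The quartiles Q1, Q2 and Q3 divide the ordered dataset into four equal parts (Q2 is the median). The quartile deviation, $QD = \frac{Q_3 - Q_1}{2}$, estimates the spread of the distribution around the central value.
• Inter-Quartile Range: IQR = Q3 − Q1, the range between the quartiles Q1 and Q3. It is used to identify outliers, e.g. in a box plot, where points more than 1.5 × IQR beyond the quartiles are commonly flagged
• Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100\%$, the standard deviation expressed as a percentage of the mean, used to compare variability between datasets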
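A short sketch of these measures in R for the vector c(2, 4, 7, 8, 10) used in the R codes; the variable names are illustrative. Note the divisor: the formula above divides by n, whereas R's built-in sd() divides by n − 1 (sample standard deviation), so the two results differ for small datasets.

```r
x <- c(2, 4, 7, 8, 10)

rng    <- diff(range(x))                # Range = Max - Min
pop_sd <- sqrt(mean((x - mean(x))^2))   # sigma with divisor n, as in the formula above
smp_sd <- sd(x)                         # R's sd() uses divisor n - 1
qs     <- quantile(x, c(0.25, 0.75))    # Q1 and Q3 (R's default quantile type 7)
iqr    <- unname(qs[2] - qs[1])         # Inter-Quartile Range, same as IQR(x)
qd     <- iqr / 2                       # Quartile Deviation
cv     <- pop_sd / mean(x) * 100        # Coefficient of Variation in %

c(range = rng, pop_sd = pop_sd, sample_sd = smp_sd, IQR = iqr, QD = qd, CV = cv)
```

R's quantile() offers several definitions through its type argument, so hand-calculated quartiles may differ slightly from the default.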
R codes
• Creating a Vector:
> x <- c(2,4,7,8,10) # Quantitative values #
> y <- c("Yes", "No", "Yes", "Yes", "No") # Qualitative values; same length as x so both fit in one data frame #
• Creating a dataframe:
> df <- data.frame (x, y)
• Creating Frequency Table:

> t <- table(df$y) # with one variable #
> t1 <- table(df$x, df$y) # with two variables (cross-tabulation) #
• Creating Groups or Cut points:

> marks <- c(12, 20, 25, 38) # hypothetical example values #
> cutvariable <- cut(marks, breaks = c(10, 20, 30, 40), labels = c("A", "B", "C")) # intervals (10,20], (20,30], (30,40]: 20 falls in 10-20 #
> cutvariable <- cut(marks, breaks = c(10, 20, 30, 40), labels = c("A", "B", "C"), right = FALSE) # intervals [10,20), [20,30), [30,40): 20 falls in 20-30 #
• Creating Charts:
> barplot(t, main = "title", xlab = "x", ylab = "y", legend.text = names(t), col = rainbow(length(t)))
> pie(t)
> hist(df$x) # hist() needs the raw numeric values, not a frequency table #
> boxplot(df$x)
• Measures of Central Tendency:

> mean(df$x)
> median(df$x)
> t <- table(df$x)
> names(t)[t == max(t)] # Modal value(s): R has no built-in function for the statistical mode, so it is read from the frequency table #
• Measures of Dispersion:
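> sd(df$x) # sample standard deviation (divides by n - 1) #
> range(df$x) # returns the minimum and maximum; diff(range(df$x)) gives Range = Max - Min #
> quantile(df$x) # returns the minimum, Q1, Q2 (median), Q3 and maximum #
> IQR(df$x) # Inter-Quartile Range, Q3 - Q1 #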
Probability
• Important concepts:
 ◦ Trial: An experiment that can be repeated under the same conditions
 ◦ Event: An outcome (or set of outcomes) of an experiment
 ◦ Mutually Exclusive: Events that cannot occur simultaneously
 ◦ Exhaustive: A set of events of which at least one must occur in every trial
 ◦ Equally Likely: Every event has the same chance of occurrence
 ◦ Union (∪): A ∪ B means "A or B" (at least one of them occurs), also written A + B
 ◦ Intersection (∩): A ∩ B means "A and B" (both occur), also written A · B
 ◦ Complement (A′): The event that A does not occur
• Classical definition:
If there are N mutually exclusive, exhaustive and equally likely outcomes, and N(A) of them are favorable to event A, then:
$P(A) = \frac{N(A)}{N}$
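The classical definition can be sketched in R by listing a sample space of equally likely outcomes and counting the favorable ones. The two-dice example is an assumption chosen for illustration.

```r
# Sample space for rolling two fair dice: 36 equally likely outcomes
S <- expand.grid(die1 = 1:6, die2 = 1:6)
N <- nrow(S)                      # N = 36

# Event A: the two faces add up to 7
A   <- S$die1 + S$die2 == 7       # logical vector, TRUE where A occurs
N_A <- sum(A)                     # N(A) = 6 favorable outcomes

N_A / N                           # P(A) = N(A) / N = 6/36
```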
• Properties:
 ◦ Probability always lies between 0 and 1: 0 ≤ P(A) ≤ 1
 ◦ The probabilities of all the outcomes in the sample space sum to 1
 ◦ Complement rule: P(A′) = 1 − P(A)
 ◦ Addition Rule (A or B, A ∪ B):
a. Mutually exclusive events: P(A ∪ B) = P(A) + P(B)
b. Not mutually exclusive events: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
 ◦ Multiplication Rule (A and B, A ∩ B):
a. Independent events: P(A ∩ B) = P(A) × P(B)
b. Dependent (conditional) events: P(A ∩ B) = P(B) × P(A | B)
 ◦ Bayes' Theorem (Thomas Bayes): If E1, E2, ..., EN are mutually exclusive and exhaustive events and event A occurs together with one of them, then the probability that A occurred with Ei is
$P(E_i \mid A) = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{j=1}^{N} P(E_j)\,P(A \mid E_j)}$
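A minimal sketch, assuming two fair dice for the addition rule and a hypothetical three-source setup for Bayes' theorem (the priors P(Ei) and likelihoods P(A | Ei) below are made-up values). It checks the addition rule by counting and computes the posterior probabilities P(Ei | A).

```r
# Addition rule checked by counting on the two-dice sample space
S <- expand.grid(die1 = 1:6, die2 = 1:6)
A <- S$die1 == 6                  # event A: first die shows 6
B <- S$die1 + S$die2 >= 10        # event B: sum is at least 10
p <- function(event) mean(event)  # classical probability N(event) / N

p_union_direct <- p(A | B)                  # P(A U B) counted directly
p_union_rule   <- p(A) + p(B) - p(A & B)    # addition rule, not mutually exclusive

# Bayes' theorem with hypothetical priors P(Ei) and likelihoods P(A | Ei)
prior     <- c(E1 = 0.5, E2 = 0.3, E3 = 0.2)
lik       <- c(E1 = 0.03, E2 = 0.04, E3 = 0.05)
posterior <- prior * lik / sum(prior * lik) # P(Ei | A)
posterior
```

The denominator sum(prior * lik) is P(A) by the total probability rule, which is why the posteriors always sum to 1.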
Binomial Distribution
• Properties: It is a discrete probability distribution
(Used when a fixed number n of trials is repeated)
 ◦ Every trial results in either a success or a failure
 ◦ The trials are independent of each other
 ◦ The probability of success θ is the same for every trial
 ◦ pmf: $f(x) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n$
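A small check, under assumed values n = 10 and θ = 0.3, that the pmf written out from the formula matches R's dbinom().

```r
n     <- 10    # number of trials (assumed)
theta <- 0.3   # probability of success (assumed)
x     <- 0:n

# pmf written out from the formula: nCx * theta^x * (1 - theta)^(n - x)
pmf_formula <- choose(n, x) * theta^x * (1 - theta)^(n - x)

# R's built-in binomial pmf
pmf_r <- dbinom(x, size = n, prob = theta)

all.equal(pmf_formula, pmf_r)   # TRUE: both give the same probabilities
sum(pmf_r)                      # the probabilities over x = 0..n add up to 1
```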
Poisson Distribution
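• Properties: It is a discrete probability distribution
(Used when the number of trials becomes very large, tending towards infinity, while the probability of success is very small)
 ◦ Limiting form of the Binomial distribution, with λ = nθ held fixed as n → ∞
 ◦ pmf: $f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
 ◦ The average number of occurrences of the event, λ, is known
 ◦ The number of trials is generally very large and so is unknown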
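A sketch, assuming λ = 3, that checks the pmf formula against R's dpois() and shows the limiting behaviour: a Binomial with very large n and θ = λ/n gives almost the same probabilities.

```r
lambda <- 3    # average number of occurrences (assumed)
x      <- 0:10

# pmf written out from the formula: e^(-lambda) * lambda^x / x!
pmf_formula <- exp(-lambda) * lambda^x / factorial(x)
pmf_r       <- dpois(x, lambda = lambda)

# Limiting form: Binomial with very large n and small theta, keeping n * theta = lambda
n     <- 10000
binom <- dbinom(x, size = n, prob = lambda / n)

max(abs(pmf_formula - pmf_r))   # essentially 0
max(abs(binom - pmf_r))         # small, and shrinks as n grows
```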
Normal Distribution
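• Properties of Normal Distribution: Continuous Probability Distribution
 ◦ Symmetrical, bell-shaped curve with Skewness = 0
 ◦ Infinite limits, extending from −∞ to +∞
 ◦ Mean = Median = Mode
• Standard Normal (Z) Distribution: the normal distribution with mean 0 and standard deviation 1, obtained by standardizing $Z = \frac{X - \mu}{\sigma}$
 ◦ Symmetrical curve with Skewness = 0
 ◦ Standard Normal Density Function: $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}, \quad -\infty < z < \infty$
 ◦ Limits also extend from −∞ to +∞; about 99.7% of the area lies between z = −3 and z = +3
 ◦ Mean = Median = Mode at z = 0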
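A short sketch of these properties with R's pnorm() and dnorm(). The values mu = 50, sigma = 10 and x = 65 are assumed for illustration.

```r
# Area under the standard normal curve within 1, 2 and 3 standard deviations
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)   # about 0.683, 0.954, 0.997
within_k_sd

# Density at the centre: f(0) = 1 / sqrt(2 * pi)
dnorm(0)

# Standardizing: P(X <= x) for X ~ N(mu, sigma) equals P(Z <= (x - mu) / sigma)
mu <- 50; sigma <- 10; x <- 65        # assumed example values
z  <- (x - mu) / sigma                # z = 1.5
c(direct = pnorm(x, mean = mu, sd = sigma), via_z = pnorm(z))
```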
R codes
• Binomial Distribution
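> n <- 30; theta <- 0.5 # assumed example values #
> dbinom(12:24, size = n, prob = theta) # P(X = 12), ..., P(X = 24) #
> sum(dbinom(12:24, size = n, prob = theta)) # P(12 <= X <= 24) #
• Poisson Distribution
> lambda <- 100 # assumed example value #
> dpois(x = 112:115, lambda = lambda) # P(X = 112), ..., P(X = 115) #
> sum(dpois(x = 112:115, lambda = lambda)) # P(112 <= X <= 115) #
• Normal Distribution:
> mu <- 7000; sigma <- 2000 # assumed example values #
> X <- pnorm(5000, mean = mu, sd = sigma) # P(X <= 5000) #
> Y <- pnorm(10000, mean = mu, sd = sigma) # P(X <= 10000) #
> Y - X # Probability between 5000 and 10000 #
* The values of x, n, θ, λ, mean and sd above are assumed, to make the examples easy to follow
* By default, pbinom(), ppois() and pnorm() return lower-tail probabilities P(X ≤ x); add lower.tail = FALSE to get the upper tail P(X > x)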
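A short sketch of the lower.tail = FALSE option, reusing the assumed example values from above. For discrete distributions such as the binomial, the upper tail is P(X > q), so P(X ≥ 12) needs q = 11.

```r
# Upper-tail probabilities with lower.tail = FALSE (values are assumed examples)
mu <- 7000; sigma <- 2000
upper_norm <- pnorm(10000, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 10000)
same_norm  <- 1 - pnorm(10000, mean = mu, sd = sigma)

# For discrete distributions, lower.tail = FALSE gives P(X > q), not P(X >= q)
n <- 30; theta <- 0.5
p_at_least_12 <- pbinom(11, size = n, prob = theta, lower.tail = FALSE)  # P(X >= 12) = P(X > 11)
same_binom    <- sum(dbinom(12:n, size = n, prob = theta))

c(upper_norm, same_norm, p_at_least_12, same_binom)
```

Using lower.tail = FALSE rather than 1 - p(...) also avoids losing precision when the upper-tail probability is extremely small.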
