
Statistical Methods

Recommended Books:
1. Fundamentals of Mathematical Statistics by S.C. Gupta and V.K. Kapoor, S. Chand Publisher (2007).
2. Advanced Engineering Mathematics by R.K. Jain and S.R.K. Iyengar, 3rd edition, Narosa (2002).

Outline:
1. Collection of Data and Frequency Distribution
2. Measures of Central Tendency
3. Measures of Dispersion
4. Probability Theory
5. Mathematical Expectation
6. Probability Distributions
7. Theory of Least Squares and Curve Fitting
8. Correlation and Regression

Collection of Data and Frequency Distribution

The source of data may be primary or secondary. The term primary data refers to statistical material collected by original observation, measurement or counting in the course of an investigation. The term secondary data refers to statistical material taken from someone else's records and not collected by original observation.

Methods of Collection of Primary Data

1. Direct Personal Investigation
2. Indirect Oral Investigation
3. Schedule and Questionnaire
4. Information collected through local correspondents

Direct Personal Investigation: In this method the investigator personally meets the persons concerned and collects the necessary information.

Indirect Oral Investigation: In this method the investigator collects the necessary data from third persons who are directly in touch with the information sought.

Schedule and Questionnaire: A schedule is the name usually applied to a set of questions which are asked and filled in by the interviewer in a face-to-face situation with another person. A questionnaire is a device for securing answers to questions which the respondent fills in personally. It may be mailed.

Nature of questions in a Schedule and Questionnaire:

1. The number of questions should be small.
2. The questions should be arranged logically.
3. The questions should be short and easy to understand.
4. As far as practicable, personal questions should be avoided.
5. The questions should be capable of objective answers (say, yes or no).
6. Questions requiring calculations should be avoided.

Information Collected through Local correspondents: In this method some local persons send information on certain subjects from time to time.

Methods of Collection of Secondary Data:

Government Publications: Information is usually collected from the publications of central, state or international institutions.

Semi-Government Publications: In this method the information is collected from the statistical material published by semi-government bodies such as municipal corporations and district boards.

Necessary information can also be collected from the publications of trade associations, chambers of commerce, newspapers, research bureaus, etc.

Frequency Distribution: When the data contain a large number of observations of a particular characteristic, it is difficult to grasp their significance directly. In such cases the data should be summarized in a systematic order to study their salient features. This summarization is done with a frequency distribution table. We first calculate the difference between the highest and the lowest values of the data, which is called the range of the observations. The range is then divided into a number of small intervals called classes. The number of items in each class is called the class frequency. The arrangement of the frequencies over the different classes is called the frequency distribution.
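The steps described above can be sketched in Python. The observations, the class width of 10 and the function name `frequency_distribution` are hypothetical choices made for illustration, and each class is assumed to include its lower limit but not its upper limit.

```python
def frequency_distribution(data, width):
    """Return (lower limit, upper limit, class frequency) for equal-width classes."""
    lowest, highest = min(data), max(data)
    # Assumption: the first class starts at the multiple of `width`
    # at or just below the lowest observation.
    start = (lowest // width) * width
    n_classes = int((highest - start) // width) + 1
    counts = [0] * n_classes
    for value in data:
        # Each value falls in the class whose lower limit is included.
        counts[int((value - start) // width)] += 1
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(n_classes)]


# Hypothetical observations, not taken from the lecture.
observations = [3, 12, 15, 27, 8, 22, 35, 31, 18, 25, 29, 14]

value_range = max(observations) - min(observations)  # highest minus lowest
table = frequency_distribution(observations, 10)

for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

The class frequencies always add up to the total number of observations, which is a quick check on any hand-built table.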

Measure of Central Tendency


It is found by experience that the values of the variable in any frequency distribution have a tendency to concentrate around a central value. This characteristic is measured by the measures of central tendency. The central value of the variable is also useful in locating the position of the frequency distribution, so measures of central tendency are also called measures of location. The measures of central tendency in common use are:

1. Arithmetic mean, or simply mean
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean

1. Arithmetic mean or simply mean


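It is the sum of all the observations divided by the number of observations. If the variable X takes the values $x_1, x_2, \ldots, x_n$, then the mean of X is denoted by $\bar{x}$ and is given by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If the frequency of $x_i$ is $f_i$ (i = 1, 2, ..., n), then

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad \text{where } N = \sum_{i=1}^{n} f_i$$

Note: In the case of a grouped or continuous frequency distribution, $x_i$ is taken as the mid-value of the corresponding class interval.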

Shortcut method for calculating the arithmetic mean:
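One common form of the shortcut, often called the step-deviation method, is sketched below. The symbols A (an arbitrarily assumed mean), h (a common class width) and $d_i$ are assumptions introduced here for illustration.

```latex
d_i = \frac{x_i - A}{h}, \qquad
\bar{x} = A + \frac{h}{N}\sum_{i=1}^{n} f_i d_i, \qquad
N = \sum_{i=1}^{n} f_i
```

The result follows by substituting $x_i = A + h\,d_i$ into $\bar{x} = \frac{1}{N}\sum f_i x_i$. Choosing A near the centre of the distribution keeps the $d_i$ small integers, which shortens hand calculation.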


Example 1. The frequency distribution of the pore pressure prediction in the Bombay High basin is given in the following table. Calculate the arithmetic mean.

Class interval (depth in meter)    Frequency (pore pressure)
0-10                               5
10-20                              4
20-30                              8
30-40                              12
40-50                              16
50-60                              15
60-70                              10
70-80                              8
80-90                              5
90-100                             2
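A minimal Python sketch of the calculation for Example 1, assuming the frequency-weighted mean $\bar{x} = \frac{1}{N}\sum f_i x_i$ with each $x_i$ taken as the class mid-value. The variable names are illustrative.

```python
# Class limits and frequencies as given in Example 1.
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
class_width = 10
frequencies = [5, 4, 8, 12, 16, 15, 10, 8, 5, 2]

# For grouped data, x is the mid-value of each class interval.
midpoints = [lower + class_width / 2 for lower in lower_limits]

N = sum(frequencies)  # total frequency
weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
mean = weighted_sum / N

print(N, weighted_sum, round(mean, 4))  # 85 4115.0 48.4118
```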

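Properties of the Arithmetic Mean

1. The sum of the deviations of all the values of x from their arithmetic mean is zero, i.e.

$$\sum_{i=1}^{n} f_i (x_i - \bar{x}) = 0$$

2. If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the arithmetic means of the first, second, ..., k-th series with $n_1, n_2, \ldots, n_k$ observations respectively, then the arithmetic mean of the combined series can be calculated as

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$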
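Both properties can be checked numerically. The sketch below uses two small hypothetical series and exact fractions, and assumes the combined mean is the size-weighted average of the individual means, $(n_1\bar{x}_1 + n_2\bar{x}_2)/(n_1 + n_2)$.

```python
from fractions import Fraction

# Two hypothetical series, chosen only for illustration.
series_a = [2, 4, 6, 8]
series_b = [1, 3, 5, 7, 9, 11]


def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))


mean_a, mean_b = mean(series_a), mean(series_b)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean_a for x in series_a)

# Property 2: combined mean from the separate means and sizes.
n_a, n_b = len(series_a), len(series_b)
combined_mean = (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

# Direct mean of the pooled observations, for comparison.
pooled_mean = mean(series_a + series_b)

print(deviation_sum, combined_mean, pooled_mean)  # 0 28/5 28/5
```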

2. Median
Median is the middle-most value of the variable when the observations are arranged in ascending or descending order. Cumulative frequency is used in calculating the median from a discrete frequency distribution. We first calculate N/2, where N is the sum of the frequencies, and find the cumulative frequency just greater than N/2. The corresponding value of X is the median. To find the median from a continuous frequency distribution, we again first calculate N/2. The class corresponding to the cumulative frequency just greater than N/2 is the median class, and the median is calculated from the following formula:

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - f_c\right)$$

where l, f and h are the lower limit, frequency and width of the median class respectively, and $f_c$ is the cumulative frequency of the class just before the median class.
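A minimal sketch of the grouped-data median, applied to the frequency table of Example 1. It assumes the usual interpolation formula Median = l + (h/f)(N/2 - fc) with equal class widths. The function name `grouped_median` is illustrative.

```python
def grouped_median(lower_limits, class_width, frequencies):
    """Median of a continuous frequency distribution with equal class widths."""
    half = sum(frequencies) / 2  # N/2
    cumulative = 0               # cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        # The median class is the first whose cumulative frequency exceeds N/2.
        if cumulative + f > half:
            return lower + (class_width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


# Frequency table of Example 1.
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [5, 4, 8, 12, 16, 15, 10, 8, 5, 2]

median = grouped_median(lower_limits, 10, frequencies)
print(median)  # 48.4375
```

Here N/2 = 42.5, the median class is 40-50 (cumulative frequency 45), and the cumulative frequency before it is 29.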

3. Mode
The value which occurs most frequently in a set of observations is called the mode. In a discrete frequency distribution the value corresponding to the maximum frequency is the mode. In a grouped or continuous frequency distribution the class corresponding to the maximum frequency is called the modal class, and the formula for the mode is

$$\text{Mode} = l + h\,\frac{f_1 - f_0}{2f_1 - f_0 - f_2}$$

where l, h and $f_1$ are respectively the lower limit, width and frequency of the modal class, and $f_0$ and $f_2$ are the frequencies of the classes just before and just after the modal class respectively.
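A minimal sketch of the grouped-data mode for the frequency table of Example 1, assuming the usual formula Mode = l + h(f1 - f0)/(2 f1 - f0 - f2). Treating a missing neighbouring class as having frequency zero, and taking the first class in case of ties, are assumptions of this sketch.

```python
def grouped_mode(lower_limits, class_width, frequencies):
    """Mode of a continuous frequency distribution with equal class widths."""
    # Index of the modal class (first class with the maximum frequency).
    k = max(range(len(frequencies)), key=frequencies.__getitem__)
    f1 = frequencies[k]
    # Assumption: a missing neighbour (first or last class) counts as frequency 0.
    f0 = frequencies[k - 1] if k > 0 else 0
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    # Note: the denominator is zero when f0 == f1 == f2; that case is not handled.
    return lower_limits[k] + class_width * (f1 - f0) / (2 * f1 - f0 - f2)


# Frequency table of Example 1.
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [5, 4, 8, 12, 16, 15, 10, 8, 5, 2]

mode = grouped_mode(lower_limits, 10, frequencies)
print(mode)  # 48.0
```

The modal class is 40-50 with f1 = 16, f0 = 12 and f2 = 15, so the mode is 40 + 10(4/5) = 48.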

Relation between Mean, Median and Mode: For a moderately asymmetrical distribution, the mean, median and mode approximately obey the following empirical relation:

Mean - Mode = 3(Mean - Median)

Example 3: Find the mean, median and mode for Example 1 and Example 2.

Measures of Dispersion:
An average measures how the values of the variable concentrate around the centre. To find out the manner in which the various values are distributed about the centre, we measure the variation, or dispersion, of the values. The following are the different measures of dispersion: range, semi-interquartile range, mean deviation, standard deviation and variance.
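Several of these measures can be sketched for raw (ungrouped) data as follows. The data are hypothetical, the mean deviation is taken about the mean, and the variance uses the divisor n rather than n - 1. These are assumptions of this sketch. The semi-interquartile range is left out because quartile conventions vary.

```python
import math

# Hypothetical observations, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Range: highest value minus lowest value.
value_range = max(data) - min(data)

# Mean deviation about the mean: average absolute deviation.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Variance (divisor n) and standard deviation (its positive square root).
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

print(value_range, mean_deviation, variance, std_dev)  # 7 1.5 4.0 2.0
```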

Skewness and Kurtosis

Both skewness and kurtosis can be used to describe the shape of a frequency distribution. Skewness is a measure of the asymmetry of the tails of a distribution. The most popular way to compute the asymmetry of a distribution is Pearson's mode skewness:

$$\text{skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$

A negative skew indicates that the distribution is spread out more to the left of the mean value, assuming increasing values on the axis to the right. The sample mean is then smaller than the mode. Distributions with positive skewness have long tails that extend to the right. The skewness of the symmetric normal distribution is zero.

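Although Pearson's measure is useful, the following formula by Fisher for calculating the skewness is often used, including by the corresponding MATLAB function:

$$\text{skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)^{3}}$$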

The second important measure for the shape of a distribution is the kurtosis. Again, numerous formulas to compute the kurtosis are available. MATLAB uses the following formula:

$$\text{kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{2}}$$

The kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. A high kurtosis indicates that the distribution has a distinct peak near the mean, whereas a distribution characterized by a low kurtosis shows a flat top near the mean and light tails.

Higher peakedness of a distribution is due to rare extreme deviations, whereas a low kurtosis is caused by frequent moderate deviations. A normal distribution has a kurtosis of three. Therefore some definitions of kurtosis subtract three from the above term in order to set the kurtosis of the normal distribution to zero.
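A minimal Python sketch of moment-based skewness and kurtosis. It assumes the biased (divisor n) forms, skewness = m3 / m2^(3/2) and kurtosis = m4 / m2^2, where mk is the k-th central moment. The data sets and the helper name `central_moment` are hypothetical.

```python
def central_moment(data, order):
    """k-th central moment with divisor n (biased form)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)


def skewness(data):
    # Third central moment scaled by the cube of the standard deviation.
    # Undefined (division by zero) when all values are equal.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    # Fourth central moment scaled by the squared variance; a normal
    # distribution gives 3, so subtract 3 for the "excess" form.
    return central_moment(data, 4) / central_moment(data, 2) ** 2


symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric sample
right_tailed = [1, 1, 1, 2, 10]  # hypothetical sample with a long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed))
```

The symmetric sample has zero skewness, while the sample with one large value on the right has positive skewness, in line with the description above.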

Find the shape of the frequency distribution of Table 1.

Find the shape of the frequency distribution of Table 2.

Solution of Table 1

Solution of Table 2

Correlation and Regression:


If a change in the values of one variable appears to be related to, or linked with, a change in the values of another, the two variables are said to be correlated. If an increase (decrease) in the values of one variable results in a corresponding increase (decrease) in the values of the other, the variables are said to be positively correlated. For example, paired quantities such as height and weight, or income and expenditure, are positively correlated.

If an increase (decrease) in the values of one variable results in a corresponding decrease (increase) in the values of the other, the variables are said to be negatively correlated. For example, the demand and price of commodities, or the volume and pressure of a perfect gas, are negatively correlated.

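Karl Pearson's Correlation Coefficient:

The correlation coefficient between two variables X and Y, denoted by $r_{xy}$, is given by

$$r_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \,\sum_{i}(y_i - \bar{y})^2}}$$

Properties of Karl Pearson's Correlation Coefficient:

1. It is a pure number.
2. If X and Y are two random variables, then $r_{xy} = r_{yx}$.
3. The correlation coefficient lies between -1 and 1.
4. The correlation coefficient is independent of change of origin and scale.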
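The coefficient and its properties can be checked with a short Python sketch. It assumes the product-moment form r = S_xy / sqrt(S_xx S_yy), where S denotes sums of products of deviations from the means. The paired data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)

# Symmetry: r_xy equals r_yx.
r_swapped = pearson_r(ys, xs)

# Change of origin and (positive) scale: u = 2x + 3, v = 5y - 1.
r_shifted = pearson_r([2 * x + 3 for x in xs], [5 * y - 1 for y in ys])

print(round(r, 4), round(r_swapped, 4), round(r_shifted, 4))
```

Here S_xy = 6, S_xx = 10 and S_yy = 6, so r = 6 / sqrt(60), which is about 0.7746.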

REGRESSION ANALYSIS
Regression analysis is the method used for estimating the unknown values of one variable corresponding to the known values of another variable. If there are only two variables under study, one is taken as the independent variable and the other as the dependent variable, and regression analysis explains how, on average, the values of the dependent variable change with a change in the values of the independent variable. If the scatter diagram indicates some relationship between two variables x and y, then the dots of the scatter diagram will be concentrated around a curve. This curve is called the curve of regression.

LINE OF REGRESSION: When the curve is a straight line, it is called a line of regression. A line of regression is the straight line which gives the best fit, in the least-squares sense, to the given frequency distribution.

EQUATIONS OF THE LINES OF REGRESSION


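The regression line of y on x, which passes through $(\bar{x}, \bar{y})$, is

$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

The regression line of x on y, which also passes through $(\bar{x}, \bar{y})$, is

$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

Derive the above equations (homework).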

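Regression Coefficient:
The slope of the line of regression of y on x is called the regression coefficient of y on x and is calculated by

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

The slope of the line of regression of x on y, measured with respect to the y-axis, is called the regression coefficient of x on y and is calculated by

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

PROPERTIES OF THE REGRESSION COEFFICIENTS:

1. The correlation coefficient is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{yx}\,b_{xy}}$, where the sign of r is the common sign of the two coefficients.
2. Regression coefficients are independent of change of origin but not of scale.
3. The arithmetic mean of the two regression coefficients is greater than or equal to the correlation coefficient, provided r > 0.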
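A minimal Python sketch of the two regression coefficients and the line of y on x. It assumes b_yx = S_xy / S_xx and b_xy = S_xy / S_yy, which are equivalent to r σ_y/σ_x and r σ_x/σ_y, and it uses hypothetical paired data.

```python
import math


def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, s_xy / s_xx, s_xy / s_yy


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y, b_yx, b_xy = regression_coefficients(xs, ys)

# r is the geometric mean of the coefficients, with their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)


def predict_y(x):
    """Line of regression of y on x: y - mean_y = b_yx (x - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


print(b_yx, b_xy, round(r, 4), predict_y(6))
```

For these data b_yx = 0.6 and b_xy = 1.0, so r is about 0.7746 and the arithmetic mean of the coefficients, 0.8, exceeds r.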

ANGLE OF TWO LINES OF REGRESSION
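A sketch of the standard result, assuming θ denotes the acute angle between the two lines of regression and r ≠ 0. Both lines are written with y as a function of x, so the line of x on y has slope $\sigma_y/(r\,\sigma_x)$. The symbols $m_1$ and $m_2$ are introduced here for illustration.

```latex
\begin{aligned}
m_1 &= r\,\frac{\sigma_y}{\sigma_x}, \qquad
m_2 = \frac{1}{r}\,\frac{\sigma_y}{\sigma_x} \\[4pt]
\tan\theta &= \left|\frac{m_2 - m_1}{1 + m_1 m_2}\right|
 = \frac{\dfrac{\sigma_y}{\sigma_x}\cdot\dfrac{1 - r^2}{|r|}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}}
 = \frac{1 - r^2}{|r|}\cdot\frac{\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2}
\end{aligned}
```

When r = ±1 the angle is zero and the two lines coincide. As r approaches 0 the angle approaches a right angle, and the lines become $y = \bar{y}$ and $x = \bar{x}$.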

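METHOD OF LEAST SQUARES & CURVE FITTING: Let $(x_i, y_i)$, i = 1, 2, ..., n, be n pairs of values of the variables X and Y, where X is taken as the independent variable and Y as the dependent variable. Suppose we are to fit a polynomial of degree p given by

$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_p x^p$$

In the method of least squares, the values of $a_0, a_1, \ldots, a_p$ are determined so that the sum of the squares of the errors is a minimum.

The error at the i-th point, viz. at $(x_i, y_i)$, is

$$e_i = y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_p x_i^p\right)$$

In general $e_i$ is not zero, because it is the difference between the observed and the expected values. The sum of the squares of the errors is

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left[y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_p x_i^p\right)\right]^2$$

We now find the values of $a_0, a_1, a_2, \ldots, a_p$ that make S a minimum. From the differential calculus we know that the solution of the normal equations

$$\frac{\partial S}{\partial a_0} = 0, \quad \frac{\partial S}{\partial a_1} = 0, \quad \ldots, \quad \frac{\partial S}{\partial a_p} = 0$$

gives the most appropriate values of $a_0, a_1, a_2, \ldots, a_p$.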
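The normal equations can be solved numerically. The sketch below is one possible pure-Python approach, not a definitive implementation: it builds the system $\sum_k a_k \sum_i x_i^{j+k} = \sum_i y_i x_i^j$ for j = 0, ..., p and solves it by Gaussian elimination. The function name `fit_polynomial` and the data are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a0, ..., ap] from the normal equations."""
    size = degree + 1
    # Row j of the normal equations comes from dS/da_j = 0.
    matrix = [[sum(x ** (j + k) for x in xs) for k in range(size)]
              for j in range(size)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(size)]

    # Forward elimination with partial pivoting.
    # A singular system (too few distinct x values) raises ZeroDivisionError.
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(matrix[row][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = matrix[row][col] / matrix[col][col]
            for k in range(col, size):
                matrix[row][k] -= factor * matrix[col][k]
            rhs[row] -= factor * rhs[col]

    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(matrix[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (rhs[row] - tail) / matrix[row][row]
    return coeffs


# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
parabola = fit_polynomial([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], 2)

# Hypothetical scattered data, fitted with a straight line (p = 1).
line = fit_polynomial([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 1)

print([round(a, 6) for a in parabola], [round(a, 6) for a in line])
```

With p = 1 the fitted line is the line of regression of y on x, here y = 2.2 + 0.6x.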

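SHORTCUT METHOD: When the values of x are equally spaced with interval h, the calculations can be simplified by a change of origin and scale. If n (the number of observations) is odd, then we take

$$u = \frac{x - (\text{middle value})}{h}$$

If n is even, then we take

$$u = \frac{x - (\text{mean of the two middle values})}{h/2}$$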

Mathematical Expectation:
Random Variable: Let X denote the uppermost face when a die is thrown. Then X can take the values 1, 2, 3, 4, 5, 6, each with probability 1/6, and the sum of all the probabilities is 1. The variable X takes different values with certain probabilities, and the sum of all the probabilities is unity. Such an X is called a random variable.

Definition: If a variable takes different values with certain probabilities, and the sum of all the probabilities is unity, then the variable is called a random variable. A random variable may be discrete or continuous.

Probability Mass Function (p.m.f.): If X is a discrete random variable, then p(x) = P(X = x) is called the pmf of X if (i) $p(x) \ge 0$ for all x and (ii) $\sum_x p(x) = 1$.

Probability Density Function (p.d.f.): If X is a continuous random variable in (a, b), then f(x) is called the pdf of X if (i) $f(x) \ge 0$ for a < x < b and (ii) $\int_a^b f(x)\,dx = 1$.

Mathematical Expectation of a Random Variable: The mathematical expectation of a discrete random variable X is the sum of the products of the values taken by the random variable and their respective probabilities, and is denoted by E(X). If X is a discrete random variable, then

$$E(X) = \sum_x x\,p(x)$$

where p(x) is the pmf of X. If X is a continuous random variable in (a, b), then

$$E(X) = \int_a^b x\,f(x)\,dx$$

where f(x) is the pdf of X.

Mathematical Expectation of a Function of a Random Variable: If g(X) is a function of the discrete random variable X, then

$$E[g(X)] = \sum_x g(x)\,p(x)$$

where p(x) is the pmf of X. If g(X) is a function of the continuous random variable X in (a, b), then

$$E[g(X)] = \int_a^b g(x)\,f(x)\,dx$$

where f(x) is the pdf of X.
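For the die described earlier, the expectation can be computed exactly. The sketch below assumes the discrete definitions E(X) = Σ x p(x) and E[g(X)] = Σ g(x) p(x). The function name `expectation` is illustrative.

```python
from fractions import Fraction

# pmf of the uppermost face of a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete random variable; g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())


total_probability = sum(pmf.values())                  # must equal 1 for a valid pmf
expected_value = expectation(pmf)                      # E(X)
expected_square = expectation(pmf, lambda x: x ** 2)   # E(X^2), with g(x) = x^2

print(total_probability, expected_value, expected_square)  # 1 7/2 91/6
```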

Some Fundamental Theorems

1. If a is any constant, then E(a) = a.
2. If X is a random variable and a is any constant, then E(aX) = aE(X).
3. Addition Theorem of Mathematical Expectation: If X and Y are two random variables, then E(X + Y) = E(X) + E(Y). Remark: This theorem can be extended to any number of random variables.
4. Multiplication Theorem of Mathematical Expectation: If X and Y are two independent random variables, then E(XY) = E(X)E(Y). Remark: This theorem can be extended to any number of independent random variables.
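These theorems can be verified by enumeration for two independent fair dice. This is a sketch in which independence is modelled by giving each of the 36 ordered pairs probability 1/36. The helper name `expect` and the constants 4 and 3 are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two independent fair dice X and Y.
faces = range(1, 7)
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}


def expect(g):
    """E[g(X, Y)] over the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

e_constant = expect(lambda x, y: 4)     # E(a) with a = 4
e_scaled = expect(lambda x, y: 3 * x)   # E(aX) with a = 3
e_sum = expect(lambda x, y: x + y)      # addition theorem
e_product = expect(lambda x, y: x * y)  # multiplication theorem

print(e_constant, e_scaled, e_sum, e_product)  # 4 21/2 7 49/4
```

The addition theorem holds whether or not X and Y are independent, while the product rule relies on independence.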

Some Standard Distribution:

