
Q1. Why is it necessary to summarise data? Explain the approaches available to summarise data distributions.

Mass data that has been collected, classified, tabulated and presented systematically is analysed further to reduce it to a single representative figure. That is why it is necessary to summarise data. Graphical representation is a good way to present summarised data. However, graphs give only an overview and cannot be used for further analysis. Hence, we summarise data with summary statistics, such as averages, in order to analyse it.

Objectives of statistical average

The statistical average, or simply an average, is a measure of the middle value of the data set. The objectives of a statistical average are to:

i. Present mass data in a concise form. The mass data is condensed to make it readable and usable for further analysis.
ii. Facilitate comparison. It is difficult to compare two different sets of mass data directly, but they can be compared once the average of each set has been computed. The same measure of average must be used on both sides: comparing the mean salary of one group of employees with the median salary of another leads to incorrect conclusions.
iii. Establish relationship between data sets. The average can be used to draw inferences about unknown relationships between data sets. Computing the averages of sample data sets also helps in estimating the average of the population.
iv. Provide basis for decision-making. In business, finance, insurance and other sectors, managers compute averages and draw inferences from them to take effective decisions.

The single representative figure is a measure found in the central part of the range of all values. It represents the entire data set, and hence is called the measure of central tendency. In other words, the tendency of data to cluster around a centrally located figure is known as central tendency. A measure of central tendency, or average of first order, describes the concentration of a large number of values around a particular value. It is a single value which represents all units.

Statistical averages

The commonly used statistical averages are arithmetic mean, geometric mean and harmonic mean. Arithmetic mean is defined as the sum of all values divided by the number of values and is represented by $\bar{X}$.

Before we study how to compute the arithmetic mean, we have to be familiar with the terms discrete data, frequency and frequency distribution, which are used in this unit. If a variable can take only a finite (or countable) number of distinct values, the data is said to be discrete. The number of occurrences of each value in the data set is called the frequency of that value. A systematic presentation of the values taken by a variable together with the corresponding frequencies is called a frequency distribution of the variable.

Median: The median of a set of values is the middle-most value when they are arranged in ascending order of magnitude. It is denoted by M.

Mode: The mode is the value which has the highest frequency and is denoted by Z. The modal value is most useful for business people. For example, shoe and readymade garment manufacturers want to know the modal size of people in order to plan their operations. For discrete data, with or without frequency, it is the value corresponding to the highest frequency.

Appropriate situations for the use of various averages

1. Arithmetic mean is used when:
a. In-depth study of the variable is needed
b. The variable is continuous and additive in nature
c. The data are on the interval or ratio scale
d. The distribution is symmetrical

2. Median is used when:
a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative
f. The data are on the ordinal scale

3. Mode is used when:
a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative

4. Geometric mean is used when:
a. Rates of growth, ratios and percentages are to be studied
b. The variable is of multiplicative nature

5. Harmonic mean is used when:
a. The study is related to speed or time
b. The average of rates which produce equal effects has to be found

Positional averages

The median is the mid-value of a series of data. It divides the distribution into two equal portions. Similarly, we can divide a given distribution into four (quartiles), ten (deciles), a hundred (percentiles) or any other number of equal portions.
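The averages described above can be tried out with a minimal Python sketch. It uses only the standard `statistics` module; the shoe sizes, speeds and growth factors are made-up illustrative values, not data from these notes.

```python
import statistics

# Hypothetical shoe sizes sold in one day (illustrative data only).
sizes = [7, 8, 8, 9, 8, 10, 7, 8]

mean_size = statistics.mean(sizes)      # sum of values / number of values
median_size = statistics.median(sizes)  # middle value of the ordered data
modal_size = statistics.mode(sizes)     # value with the highest frequency

# Harmonic mean: average speed over two equal distances
# covered at 40 km/h and 60 km/h.
avg_speed = statistics.harmonic_mean([40, 60])

# Geometric mean: average growth factor over two periods
# with growth of 10% and 21%.
avg_growth = statistics.geometric_mean([1.10, 1.21])

# Positional averages: the three quartiles cut the ordered data
# into four equal portions; the second quartile is the median.
q1, q2, q3 = statistics.quantiles(sizes, n=4)

print(mean_size, median_size, modal_size)
print(avg_speed, round(avg_growth, 4))
print(q1, q2, q3)
```

Note that the harmonic mean of the two speeds (48 km/h) is lower than their arithmetic mean (50 km/h), which is why it is the appropriate average for rates over equal distances.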

Q2. Explain the purpose of tabular presentation of statistical data. Draft a form of tabulation to show the distribution of population according to i) community by age, ii) literacy, iii) sex, and iv) marital status.

Tabulation follows classification. It is a logical or systematic listing of related data in rows and columns. The rows of a table represent the horizontal arrangement of data and the columns represent the vertical arrangement. The presentation of data in tables should be simple, systematic and unambiguous. The objectives of tabulation are to:

i. Simplify complex data
ii. Highlight important characteristics
iii. Present data in minimum space
iv. Facilitate comparison
v. Bring out trends and tendencies
vi. Facilitate further analysis

Marital status  Sex     |  Educated (age)                        |  Non-educated (age)
                        |  Below 20 yrs   20-40   Above 40       |  Below 20 yrs   20-40   Above 40
Married         Male    |
                Female  |
Unmarried       Male    |
                Female  |

Q3. Give a brief note on the measures of central tendency together with their merits and demerits. Which is the best measure of central tendency and why?

The measures of central tendency and the measures of dispersion summarise mass data in terms of its two important features:

i. The tendency of the data to cluster around a central value.
ii. The spread of the data around that central value.

Arithmetic mean

Arithmetic mean is defined as the sum of all values divided by the number of values and is represented by $\bar{X}$. Before you study how to compute the arithmetic mean, you have to be familiar with the terms discrete data, frequency and frequency distribution, which are used in this unit. If a variable can take only a finite (or countable) number of distinct values, the data is said to be discrete. The number of occurrences of each value in the data set is called the frequency of that value. A systematic presentation of the values taken by a variable together with the corresponding frequencies is called a frequency distribution of the variable.

Properties of arithmetic mean

You have studied how to calculate the arithmetic mean for grouped and ungrouped data. Let us now study the properties of the arithmetic mean, which help in understanding the concept. The properties are:

i. The algebraic sum of the deviations of a set of values from their mean is always zero, that is,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$

ii. The sum of squares of the deviations of a set of values is minimum when the deviations are taken from their mean, that is, for any value $A$,

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2$$

iii. The arithmetic mean is capable of further algebraic treatment. If $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ are the means of $k$ sets containing $n_1, n_2, \ldots, n_k$ values respectively, then their combined arithmetic mean is given by:

$$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \cdots + n_k \bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$
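The three properties of the arithmetic mean can be checked numerically. The sketch below uses arbitrary illustrative numbers, and the helper names (`mean`, `sum_sq_dev`) are chosen here for the example only.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    return sum(values) / len(values)

def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from an arbitrary point a."""
    return sum((x - a) ** 2 for x in values)

values = [4, 8, 15, 16, 23, 42]  # illustrative data
x_bar = mean(values)

# Property i: deviations from the mean sum to zero.
dev_sum = sum(x - x_bar for x in values)

# Property ii: squared deviations are smallest when taken from the mean.
at_mean = sum_sq_dev(values, x_bar)
at_other = sum_sq_dev(values, x_bar + 1)

# Property iii: combined mean of two groups from their sizes and means.
group_sizes = [10, 30]
group_means = [50.0, 70.0]
combined_mean = (
    sum(n * m for n, m in zip(group_sizes, group_means)) / sum(group_sizes)
)

print(dev_sum, at_mean < at_other, combined_mean)
```

The combined mean is pulled towards the mean of the larger group, which is why a simple average of the two group means (60) would be wrong here.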

Merits and demerits of arithmetic mean

Merits:
i. It is simple to calculate and easy to understand.
ii. It is based on all values.
iii. It is rigidly defined.
iv. It is more stable.
v. It is capable of further algebraic treatment.

Demerits:
i. It is affected by extreme values.
ii. It cannot be determined for distributions with open-end class intervals.
iii. It cannot be graphically located.
iv. Sometimes it is a value which is not in the series.

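Median

The median of a set of values is the middle-most value when they are arranged in ascending order of magnitude. It is denoted by M. For a discrete series, with or without frequency, it is given by:

$$M = \text{value of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item}$$

where $N$ is the total number of observations (the sum of the frequencies).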
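As a rough sketch, the median of a discrete frequency distribution can be located with cumulative frequencies. The function name `median_discrete` and the data are illustrative; the code assumes the usual convention that the median is the ((N+1)/2)-th item of the ordered data, with the two middle items averaged when N is even.

```python
def median_discrete(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequency."""
    pairs = sorted(zip(values, freqs))
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")

    def item(k):
        # Return the k-th item (1-indexed) of the ordered data
        # without expanding the frequency table.
        cumulative = 0
        for value, freq in pairs:
            cumulative += freq
            if cumulative >= k:
                return value

    if n % 2 == 1:
        return item((n + 1) // 2)
    # Even N: (N+1)/2 falls between two items, so average them.
    return (item(n // 2) + item(n // 2 + 1)) / 2

# Illustrative data: value -> frequency
values = [1, 2, 3, 4]
freqs = [3, 5, 4, 1]  # N = 13, so the median is the 7th item

print(median_discrete(values, freqs))
```

Here the cumulative frequencies are 3, 8, 12, 13, so the 7th item falls in the second class and the median is 2.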

Merits and demerits of median

Merits:
i. It can be easily understood and computed.
ii. It is not affected by extreme values.
iii. It can be determined graphically (ogives).
iv. It can be used for qualitative data.
v. It can be calculated for distributions with open-end classes.

Demerits:
i. It is not based on all values.
ii. It is not capable of further algebraic treatment.

Mode

The mode is the value which has the highest frequency and is denoted by Z. The modal value is most useful for business people. For example, shoe and readymade garment manufacturers want to know the modal size of people in order to plan their operations. For discrete data, with or without frequency, it is the value corresponding to the highest frequency.

Merits and demerits of mode

Merits:
i. In many cases it can be found by inspection.
ii. It is not affected by extreme values.
iii. It can be calculated for distributions with open-end classes.
iv. It can be located graphically.
v. It can be used for qualitative data.

Demerits:
i. It is not based on all values.
ii. It is not capable of further mathematical treatment.
iii. It is much affected by sampling fluctuations.

The measures of variation are:

i. Range (R)
ii. Quartile deviation (Q.D.)
iii. Mean deviation (M.D.)
iv. Standard deviation (S.D.)

Range

Range is the difference between the highest and the lowest value of the data.

$$R = H - L$$

where H is the highest value and L is the lowest value.

$$\text{Coefficient of range} = \frac{H - L}{H + L}$$

Table 4.20 shows the merits and demerits of range.

Merits and demerits of range

Merits:
i. It is easily understood and simple to calculate.
ii. It is rigidly defined.

Demerits:
i. It is affected by extreme values.
ii. It is not based on all values. It uses the extreme values only.

Range is used:
i. In statistical quality control
ii. When the study does not require deep analysis
iii. When the data has no abnormal values
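A short sketch of the range and its relative measure follows. It assumes the usual definition of the coefficient of range, (H - L) / (H + L); the function names and the data are illustrative.

```python
def value_range(data):
    """Range R = H - L, the highest value minus the lowest value."""
    return max(data) - min(data)

def coefficient_of_range(data):
    """Relative measure (H - L) / (H + L); assumes H + L is not zero."""
    highest, lowest = max(data), min(data)
    return (highest - lowest) / (highest + lowest)

data = [12, 7, 3, 15, 8]            # illustrative values: H = 15, L = 3
with_outlier = data + [150]         # one abnormal value added

print(value_range(data), coefficient_of_range(data))
print(value_range(with_outlier))    # the range jumps because of one value
```

The second print illustrates the main demerit: a single abnormal value changes the range completely.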

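Quartile deviation

Unlike range, quartile deviation does not involve the extreme values. With $Q_1$ and $Q_3$ denoting the first and third quartiles, it is defined as:

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$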

Merits and demerits of quartile deviation

Merits:
i. It is easy to understand and to compute.
ii. It is rigidly defined.
iii. It is not affected by extreme values.

Demerits:
i. It is not based on all values.
ii. It is affected by sampling fluctuations.
iii. It is not capable of further algebraic treatment.
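The quartile deviation can be sketched with the standard `statistics.quantiles` function. Quartile conventions differ between textbooks; the default `"exclusive"` method used here places the quartiles at the (N+1)/4-th and 3(N+1)/4-th positions, which is an assumption about the convention intended in these notes. The data are illustrative.

```python
import statistics

def quartile_deviation(data):
    """Q.D. = (Q3 - Q1) / 2, half the inter-quartile range."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / 2

def coefficient_of_qd(data):
    """Relative measure (Q3 - Q1) / (Q3 + Q1)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / (q3 + q1)

data = [2, 4, 6, 8, 10, 12, 14]          # Q1 = 4, Q3 = 12
with_outlier = [2, 4, 6, 8, 10, 12, 1000]  # largest value made abnormal

print(quartile_deviation(data), coefficient_of_qd(data))
print(quartile_deviation(with_outlier))  # unchanged by the extreme value
```

Replacing the largest value with an abnormal one leaves Q.D. unchanged, which illustrates the merit that it is not affected by extreme values.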

Mean deviation

Mean deviation is defined as the mean of the absolute deviations of the values from a central value. For discrete data with frequency, the mean deviation about the mean is calculated as:

$$\text{M.D.} = \frac{\sum f \, |X - \bar{X}|}{N}, \qquad N = \sum f$$

In the case of a continuous series, $X$ represents the mid-value of the class interval. Similarly, we can have the mean deviation from the median or the mode, in which case $\bar{X}$ is replaced by the median or the mode in the above formula. The mean deviation from the median is the least; this is known as the minimal property of mean deviation. The corresponding relative measure is the coefficient of mean deviation, obtained by dividing the mean deviation by the central value used.
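A small sketch can illustrate the mean deviation for a discrete frequency distribution and its minimal property. The function name `mean_deviation` and the data are illustrative only.

```python
import statistics

def mean_deviation(values, freqs, center):
    """Mean of absolute deviations from `center`, weighted by frequency."""
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n

# Illustrative frequency distribution (N = 10).
values = [1, 2, 3, 4, 10]
freqs = [2, 3, 2, 2, 1]

# Expand the table only to obtain the mean and median conveniently.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
mean = statistics.mean(expanded)      # 3.2
median = statistics.median(expanded)  # 2.5

md_about_mean = mean_deviation(values, freqs, mean)
md_about_median = mean_deviation(values, freqs, median)

print(round(md_about_mean, 2), round(md_about_median, 2))
```

For this data the mean deviation about the median (1.6) is smaller than the mean deviation about the mean (1.68), in line with the minimal property.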

Merits and demerits of mean deviation

Merits:
i. It is based on all values.
ii. It is less affected by extreme values.
iii. It is not affected much by sampling fluctuations.

Demerits:
i. It is not capable of further algebraic treatment.
ii. It does not take into account negative signs.

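Standard deviation

Of the measures of dispersion, range and Q.D. are not based on all values. Mean deviation is based on all values but does not take the positive or negative sign of the deviations into consideration. A measure that removes both drawbacks is the standard deviation (S.D.). The standard deviation of a set of values is the positive square root of the mean of the squared deviations of the values from their arithmetic mean. It is denoted by $\sigma$ (sigma).

For a discrete series without frequency, it is given by (A):

$$\text{Variance} = \sigma^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2, \qquad \sigma = \sqrt{\text{Variance}}$$

For a discrete series with frequency, it is given by (B):

$$\text{Variance} = \sigma^2 = \frac{\sum f (X - \bar{X})^2}{N} = \frac{\sum f X^2}{N} - \left(\frac{\sum f X}{N}\right)^2, \qquad N = \sum f$$

where $X$ is the mid-value of the class interval for a continuous series. In the case of grouped data, alternative forms of (A) and (B) are the following.

For (A):

$$\text{Variance} = \sigma^2 = \frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2$$

For (B):

$$\text{Variance} = \sigma^2 = \frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2$$

where $d = X - A$, $A$ is the assumed mean and C.F. is the class width. If the deviations are additionally divided by the class width, that is $d = (X - A)/\text{C.F.}$, the right-hand side is multiplied by $(\text{C.F.})^2$.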
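The definition of the standard deviation and the assumed-mean shortcut can be compared in a brief sketch. It assumes the population form (division by N) described in the text and deviations d = X - A from an assumed mean A; the function names and the data are illustrative.

```python
import math

def variance_direct(values, freqs):
    """Mean of squared deviations from the arithmetic mean (divide by N)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

def variance_assumed_mean(values, freqs, assumed_mean):
    """Shortcut: sum(f d^2)/N - (sum(f d)/N)^2 with d = X - A."""
    n = sum(freqs)
    d = [x - assumed_mean for x in values]
    mean_d = sum(f * di for di, f in zip(d, freqs)) / n
    mean_d2 = sum(f * di ** 2 for di, f in zip(d, freqs)) / n
    return mean_d2 - mean_d ** 2

# Illustrative frequency distribution (N = 10, mean = 30).
values = [10, 20, 30, 40, 50]
freqs = [1, 2, 4, 2, 1]

var_a = variance_direct(values, freqs)
var_b = variance_assumed_mean(values, freqs, assumed_mean=20)
sigma = math.sqrt(var_a)

print(var_a, var_b, round(sigma, 3))
```

Both routes give the same variance whatever assumed mean is chosen, which is the practical content of the statement that the standard deviation is independent of origin.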

Merits and demerits of standard deviation

Merits:
i. It is rigidly defined.
ii. It is based on all values.
iii. It is capable of further algebraic treatment.
iv. It is not very much affected by sampling fluctuations.

Demerits:
i. It is difficult to understand.
ii. It gives undue weightage to extreme values.
iii. It cannot be calculated for classes with open-end intervals.

Coefficient of variation

When we want to compare two different sets of values, pertaining either to different characteristics or to the same characteristic, we use the coefficient of variation (CV). It is a relative measure expressed as a percentage and is defined as:

$$\text{CV in \%} = \frac{\sigma}{\bar{X}} \times 100$$

It is used to compare the homogeneity, stability, uniformity or consistency of two or more data sets. A low value of the coefficient of variation indicates a low degree of variation.
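As an illustration of comparing consistency with the coefficient of variation, the sketch below uses two made-up sets of scores with the same mean. It takes CV as the standard deviation divided by the mean, times 100, with the population standard deviation (`statistics.pstdev`).

```python
import statistics

def coefficient_of_variation(data):
    """CV in % = (standard deviation / mean) * 100, using division by N."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

# Hypothetical scores of two players over five matches; both average 50.
player_a = [48, 52, 50, 49, 51]
player_b = [10, 90, 50, 20, 80]

cv_a = coefficient_of_variation(player_a)
cv_b = coefficient_of_variation(player_b)

more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)
```

The averages alone cannot separate the two players; the lower CV identifies player A as the more consistent one.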

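Standard deviation is regarded as the best measure of dispersion because of the following properties. It is independent of change of origin but not of change of scale. It is always greater than or equal to zero. It is the least of all root mean square deviations.

Suppose the mean of $n_1$ values is $\bar{X}_1$ and that of $n_2$ values is $\bar{X}_2$, and the standard deviations of the $n_1$ and $n_2$ values are $\sigma_1$ and $\sigma_2$ respectively. Then the combined standard deviation of both sets of values is given by:

$$\text{Variance} = \sigma^2 = \frac{n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)}{n_1 + n_2}$$

where $d_1 = \bar{X}_1 - \bar{X}$ and $d_2 = \bar{X}_2 - \bar{X}$, with $\bar{X} = \dfrac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$ being the combined mean of the $n_1$ and $n_2$ values.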
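The combined standard deviation can be checked against a direct calculation on pooled data. The sketch assumes the standard two-group formula, in which each group's variance is increased by the squared distance of its mean from the combined mean; the function name and the two groups are illustrative.

```python
import math
import statistics

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined S.D. of two groups from their sizes, means and S.D.s."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return math.sqrt(variance)

# Illustrative groups; pstdev divides by N, matching the text's definition.
group1 = [1, 2, 3]
group2 = [10, 12, 14, 16]

from_summary = combined_sd(
    len(group1), statistics.mean(group1), statistics.pstdev(group1),
    len(group2), statistics.mean(group2), statistics.pstdev(group2),
)
from_pooled = statistics.pstdev(group1 + group2)

print(round(from_summary, 4), round(from_pooled, 4))
```

The two results agree, so the summary figures of each group are enough to recover the standard deviation of the pooled data.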

Q4. Machines are used to pack sugar into packets supposedly containing 1.20 kg each. On testing a large number of packets over a long period of time, it was found that the mean weight of the packets was 1.24 kg and the standard deviation was 0.04 kg. A particular machine is selected to check the total weight of each of the 25 packets filled consecutively by the machine. Calculate the limits within which the weight of the packets should lie, assuming that the machine is not to be classified as faulty.

Q5. A packaging device is set to fill detergent powder packets with a mean weight of 5 kg. The standard deviation is known to be 0.01 kg. These are known to drift upwards over a period of time due to machine fault, which is not tolerable. A random sample of 100 packets is taken and weighed. This sample has a mean weight of 5.03 kg and a standard deviation of 0.21 kg. Can we conclude that the mean weight produced by the machine has increased? Use the 5% level of significance.
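One possible way to set up Q5 is a one-tailed large-sample z-test, sketched below. It rests on assumptions not stated in the question: the null hypothesis is a mean of 5 kg against the alternative that the mean has increased, and the sample standard deviation (0.21 kg) is used for the standard error on the reading that the spread may also have drifted. Using the originally known 0.01 kg instead would give a much larger z and the opposite conclusion, so the choice should be stated in any written answer.

```python
import math
from statistics import NormalDist

mu_0 = 5.0          # hypothesised mean weight (kg)
n = 100             # sample size
sample_mean = 5.03  # observed sample mean (kg)
sample_sd = 0.21    # observed sample S.D. (kg); assumed to replace 0.01

standard_error = sample_sd / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# One-tailed critical value at the 5% level of significance.
z_critical = NormalDist().inv_cdf(0.95)

reject_null = z > z_critical
print(round(z, 3), round(z_critical, 3), reject_null)
```

Under these assumptions z is about 1.43, below the critical value of about 1.645, so the sample does not give sufficient evidence at the 5% level that the mean weight has increased.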

Q6. Find the probability that at most 5 defective bolts will be found in a box of 200 bolts if it is known that 2 per cent of such bolts are expected to be defective. (You may take the distribution to be Poisson; $e^{-4} = 0.0183$.)
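The calculation in Q6 can be sketched as follows, taking the Poisson mean as 200 x 0.02 = 4 as the hint suggests. The function name `poisson_cdf` is illustrative. The code uses the exact value of e^-4; with the rounded value 0.0183 given in the question, the hand-computed answer comes out slightly lower (about 0.784).

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam."""
    return sum(
        math.exp(-lam) * lam ** x / math.factorial(x) for x in range(k + 1)
    )

n_bolts = 200
p_defective = 0.02
lam = n_bolts * p_defective  # expected number of defectives = 4

p_at_most_5 = poisson_cdf(5, lam)

# Same sum using the rounded constant supplied in the question.
terms = sum(lam ** x / math.factorial(x) for x in range(6))
p_with_rounded_constant = 0.0183 * terms

print(round(p_at_most_5, 4), round(p_with_rounded_constant, 4))
```

The bracketed sum 1 + 4 + 8 + 32/3 + 32/3 + 128/15 is about 42.87, so the required probability is roughly 0.785.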
