
STATISTICS Essays

1. Classification & Tabulation

The human race exists and progresses on the basis of knowledge: knowledge of the past, the present and the foreseeable future. Knowledge of anything is obtained from the study of it, and to study something we need facts. In every sphere of life such facts are needed. Eg: if a government is to function well, it needs facts about the manpower that can be obtained, the financial resources, cultivation, trade and so on.

Facts are collected from various sources, and the facts so obtained are called DATA. These facts may be true or false, direct or indirect. Eg: if a foreigner wants to know about India, he could visit India and get first-hand information; otherwise he could read books on India, or meet people who have visited India and get information from them. Data collected directly are called Primary Data, and data collected from indirect sources are called Secondary Data.

When facts about a large body of persons are to be collected, it is not possible to go to each and every person concerned and get the facts, so we divide the larger body into smaller groups. The facts obtained from the different smaller groups, when simply put together, convey no idea, as they form one large disordered mass. They have to be expressed so that they are brief and clear. So we divide them into smaller homogeneous groups with common characteristics. The process of arranging the collected data into different groups is called Classification. Classification helps to reduce the bulk of the data by suppressing irrelevant matter, and presents the material in a homogeneous form. In short, it gives clarity and brevity to the facts collected.

There is no general rule for the choice of classes; one can have any number of classes in a particular situation. In general there are two types of classification: a) Quantitative Classification b) Qualitative Classification. When data is grouped according to character, attributes or qualities it is qualitative classification, and when data is classified according to magnitude or quantity it is quantitative classification.

Eg: The population of a country can be divided according to sex, religion or linguistic basis; such a study will be a qualitative classification.

Even after classification the statistical data may not be fit for study and interpretation. Hence after classification we put the data in a tabular form. The tabulated form of the data, giving clearly the frequency of each class, is called the Frequency Distribution of the data. Classification and tabulation render the data in a neat form. They enable quick interpretation and help in comparing two or more distributions. They help even a layman to understand the facts properly. Further, they save the onlooker time and money.

2. Write an essay on Diagrammatic and Graphical Representation (Pictorial Representation)

For the study of anything we need facts about the things under study. The facts which are collected from different sources are called Data. The data collected from different sources will be a bundle of ideas in a heterogeneous form. For brevity's and clarity's sake these facts have to be classified and tabulated. The simplification thus effected is still in a dull numerical form which is not very easy to digest; to a layman, numbers may not be appealing. Diagrams and graphs catch the eye and give a bird's-eye view of the whole data, and the impressions created by them last longer in the mind. Hence, as an aid to understanding data, diagrams are used in statistics. The different types of diagrams used are a) Bar Diagram b) Histogram c) Pie Diagram d) Pictorial Graphs.

a) Bar Diagram

A bar diagram consists of a series of bars of equal width (all horizontal or all vertical) standing on a common base line, the lengths of the bars being proportional to the magnitudes of the items represented. This is known as a simple bar diagram: a one-dimensional diagram in which each bar represents the whole magnitude.


The second type of bar is the subdivided bar, used to exhibit the division of a whole into its component parts. If a given magnitude can be split into subdivisions, the bar may be subdivided into smaller parts proportional to the magnitudes of the subdivisions.


There is also the percentage bar diagram, where the components are expressed as percentages. In such cases the bars are of equal length and the components are marked according to the percentage of each to the whole.

b) Histogram

A histogram consists of rectangles erected over the true class intervals, their areas being proportional to the frequencies of the respective classes. Here the rectangles are adjacent; in the case of bar diagrams they need not be so.
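To make the two diagram types just described concrete, here is a minimal plotting sketch in Python using the matplotlib library (an assumed dependency; all figures are invented for illustration):

```python
import matplotlib.pyplot as plt

# Simple bar diagram: one bar per category, length proportional to magnitude
crops = ["Rice", "Wheat", "Maize", "Pulses"]
output = [120, 95, 60, 40]  # hypothetical production figures

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(crops, output)
ax1.set_title("Simple bar diagram")
ax1.set_ylabel("Production (hypothetical units)")

# Histogram: adjacent rectangles over class intervals,
# areas proportional to the class frequencies
marks = [12, 18, 22, 25, 31, 34, 35, 41, 44, 47, 52, 55, 58, 63, 67, 71]
ax2.hist(marks, bins=[10, 20, 30, 40, 50, 60, 70, 80], edgecolor="black")
ax2.set_title("Histogram")
ax2.set_xlabel("Class intervals")
ax2.set_ylabel("Frequency")

plt.tight_layout()
plt.show()
```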

c) Pie Diagram

Another way of representing data diagrammatically is the pie diagram. Where bar diagrams represent the data by bars, pie diagrams represent the data by circles: circles are drawn whose areas are proportional to the data in each class, so that by looking at the sizes of the circles we can judge the magnitudes of the data. The components of the different variables can be marked by sectors of the circles; hence it is also known as a Sector Graph. The sector graph may be used to compare the various components of a single distribution or of different distributions. When the components are represented as percentages of the whole, it is known as a percentage pie diagram; here the radii of all the circles will be the same.

d) Pictograph

This method is often used to show the comparative sizes of the armies, navies and air forces of different countries, by showing proportionate figures of soldiers, ships and aircraft respectively. But certain types of data cannot be represented in this way. For example, the number of suicides or of deaths by cancer cannot be represented this way, as it would offend the finer taste of man.

(If only diagrammatic representation is asked, stop here and write the last two paragraphs.)

All the above methods are diagrams. The data can also be expressed as graphs. If the data is in class form, mark the midpoints of the classes on the X-axis as the X-coordinates and take the frequencies as the Y-coordinates. Joining these points with straight line segments gives a graph called the Frequency Polygon; joining them by a smooth curve gives the Frequency Curve. If points are marked whose X-coordinates are the upper limits of the classes and whose Y-coordinates are the corresponding less-than cumulative frequencies, and these points are joined by a smooth curve, the curve is known as the Less Than Cumulative Frequency Curve or Less Than Ogive. Similarly, if points are marked whose X-coordinates are the lower limits of the classes and whose Y-coordinates are the corresponding greater-than cumulative frequencies, the resulting curve is known as the Greater Than Cumulative Frequency Curve or Greater Than Ogive.
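The cumulative frequencies plotted by the two ogives are easy to build from a frequency table. A minimal sketch in Python, with an invented table of five classes:

```python
# Less-than and greater-than cumulative frequencies
# for classes 0-10, 10-20, ..., 40-50 (hypothetical data)
upper_limits = [10, 20, 30, 40, 50]
frequencies = [5, 12, 20, 9, 4]

less_than = []   # plotted against the upper class limits
running = 0
for f in frequencies:
    running += f
    less_than.append(running)

greater_than = []  # plotted against the lower class limits
running = 0
for f in reversed(frequencies):
    running += f
    greater_than.append(running)
greater_than.reverse()

print(less_than)     # [5, 17, 37, 46, 50]
print(greater_than)  # [50, 45, 33, 13, 4]
```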

As mentioned earlier, diagrams and graphs are only visual aids to understanding the data better; they appeal to the aesthetic sense of man, but they do not substitute for the actual data. Further, they can give a wrong picture of the data if the diagrams and graphs are not drawn correctly: a small error in an entry can result in a big error. Hence care should be taken to depict the actual data, and some rules must be observed in preparing diagrams or graphs. (i) The title should be given either above or below the diagram, clearly underlined. (ii) A footnote should give the source and the degree of accuracy attained. (iii) When axes are used they should be clearly drawn, stating clearly what each axis represents. (iv) When diagrams contain portions coloured differently or marked by different methods to distinguish them from each other, it is necessary to give under the diagrams small squares containing the same colours or markings, with a note opposite each as to what it represents. A clumsy pictorial representation will not impress anyone, and so it loses its value completely.

3. Average

The collection of facts is very essential for any study. The facts collected from various sources are known as data. The data so collected will be a bundle of ideas in a heterogeneous form. For clarity's and brevity's sake they have to be arranged into groups or classes having homogeneous qualities. Dividing the data into smaller groups is called classification, and putting the data into tables is called tabulation. However well the classification and tabulation are done, the data will be in a dull numerical form and will not appeal to the human mind. Further, the comparison of more than one distribution cannot be done if the data is only in classified form. Hence we need certain constants to understand a single distribution better or to compare two or more distributions. An average is one such constant.

An average is a certain value which is a satisfactory representative of a given group of items. It is a measure of central tendency, giving an idea of how the items tend towards or deviate from a central value. The common averages are a) Arithmetic Mean b) Median c) Mode d) Geometric Mean e) Harmonic Mean.

a) Arithmetic Mean

The AM is defined as that value of the variate which is equal to the sum of the scores divided by the total number of items. If x1, x2, ..., xn are the items with frequencies f1, f2, ..., fn, then

AM = (f1x1 + f2x2 + ... + fnxn) / (f1 + f2 + ... + fn)

If the data is in class form and x1, x2, ..., xn are the mid-values of the classes with corresponding frequencies f1, f2, ..., fn, the same formula applies: AM = (f1x1 + f2x2 + ... + fnxn) / (f1 + f2 + ... + fn).

b) Median

The median is that value of the variate such that half the items lie above it and half the items lie below it, when the items are arranged in ascending or descending order. When there is an even number of items there will be two middle terms, and their mean is taken as the median. If the data is in class form the median is given by the formula

Median = lm + [ (N/2 - Fm) / f ] * C

lm - lower limit of the median class
Fm - cumulative frequency of the class preceding the median class
f - frequency of the median class
C - class interval
N - total frequency

c) Mode

The mode is defined as that value of the variate which has the maximum frequency. When the data is in class form the mode is calculated as

Mode = lm + [ f2 / (f1 + f2) ] * C

lm - lower limit of the modal class
f1, f2 - frequencies of the classes preceding and succeeding the modal class
C - class interval

The modal class is the class that has the highest frequency.
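A minimal Python sketch of the three grouped-data formulas above, using an invented frequency table:

```python
# Hypothetical frequency table: classes 0-10, 10-20, ..., 40-50
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 12, 20, 9, 4]
C = 10                # class interval
N = sum(frequencies)  # total frequency

# Arithmetic mean from the class mid-values
mid_values = [l + C / 2 for l in lower_limits]
am = sum(f * x for f, x in zip(frequencies, mid_values)) / N

# Median: locate the class containing the (N/2)-th item
cum = 0
for i, f in enumerate(frequencies):
    if cum + f >= N / 2:
        median = lower_limits[i] + (N / 2 - cum) / f * C
        break
    cum += f

# Mode: take the class with the highest frequency, then apply the formula
m = frequencies.index(max(frequencies))
f1 = frequencies[m - 1] if m > 0 else 0                      # preceding class
f2 = frequencies[m + 1] if m + 1 < len(frequencies) else 0   # succeeding class
mode = lower_limits[m] + f2 / (f1 + f2) * C

print(am, median, mode)  # 24.0 24.0 24.285714...
```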

d) Geometric Mean

If x1, x2, ..., xn are n items with frequencies f1, f2, ..., fn, then

GM = (x1^f1 * x2^f2 * ... * xn^fn)^(1/N)

where N = f1 + f2 + ... + fn. When the data is in class form and x1, x2, ..., xn are the mid-values of the classes with frequencies f1, f2, ..., fn, the same formula applies.

e) Harmonic Mean

If x1, x2, ..., xn are n items with frequencies f1, f2, ..., fn, then the HM is defined as

HM = N / [ (f1/x1) + (f2/x2) + ... + (fn/xn) ], where N = f1 + f2 + ... + fn

When the data is in class form and x1, x2, ..., xn are the mid-values of the classes with frequencies f1, f2, ..., fn, the same formula applies.
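A minimal Python sketch of the GM and HM formulas above, on invented data (the GM is computed through logarithms to avoid overflow with large frequencies):

```python
import math

# Hypothetical items and frequencies
x = [2, 4, 8, 16]
f = [3, 5, 6, 2]
N = sum(f)

# Geometric mean: (x1^f1 * x2^f2 * ... * xn^fn)^(1/N),
# computed via logarithms for numerical safety
gm = math.exp(sum(fi * math.log(xi) for fi, xi in zip(f, x)) / N)

# Harmonic mean: N / [ (f1/x1) + (f2/x2) + ... + (fn/xn) ]
hm = N / sum(fi / xi for fi, xi in zip(f, x))

print(gm, hm)
```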

The above-mentioned averages have their own advantages and disadvantages. Before discussing them, it is useful to know what qualities a good average should possess. A good average has a clear, well-defined formula and is a significant measure of the whole distribution; it depends on all the scores under consideration; it is easy to evaluate and susceptible to algebraic operations; and it is stable, not giving absurd values.

Arithmetic Mean: The AM possesses almost all the above qualities of a good average, except that at times it may give absurd values. It is affected by values that are too big or too small; it is also affected by open and unequal classes; and it cannot be calculated from the frequency curve.

Median: Though the median has a formula, the formula is not in keeping with the definition when the data is in class form. The median is not susceptible to algebraic operations, and it does not take into account all the scores under consideration. On the other hand, it is not affected by values that are too big or too small, nor by open or unequal classes.

The importance of the median is that even when the actual values of all the scores are not known, the median can be calculated provided their order and a few middle terms are given. The median can also be read off from the ogives.

4. Measures of Dispersion

Facts are the most essential factors for the study of anything. Facts collected from various sources are termed data. The data thus collected will be a bundle of heterogeneous ideas. For clarity's and brevity's sake they have to be arranged into groups or classes having homogeneous qualities. Dividing the data into smaller groups is called classification, and putting the data into tables is called tabulation. However well the classification and tabulation are done, the data will be in a dull numerical form and will not appeal to the human mind. Further, the comparison of more than one distribution cannot be done if the data is only in classified form. Hence we need certain constants to understand a single distribution better or to compare two or more distributions. There are three types of such constants in common use: averages, measures of dispersion and measures of skewness.

An average is a representative value of the group. But sometimes two distributions may give the same average and still differ in their individual values; the dispersion among the items may vary. Hence we need a constant to measure the extent to which the individual items differ from an average. Such a constant is termed a measure of dispersion: a value which measures the extent to which individual items differ from an average. The important measures of dispersion are Range, Quartile Deviation, Mean Deviation and Standard Deviation.

a) Range

The range is defined as the difference between the biggest and the smallest items of the distribution. It is very easy to calculate, but it is highly unstable as it depends only on the two extreme values: if the extreme values are very big or very small, the range will be very wide. The larger the range, the wider the dispersion among the items; hence for a good distribution the range must be small.

b) Quartile Deviation

Since the range depends only on the two extreme values it is very unstable. To avoid this we take more stable constants which depend on intermediate items, and take the difference between them as a measure of dispersion. The quartile deviation is defined as the semi-interquartile range

QD = (Q3 - Q1) / 2

where Q1 is the first quartile and Q3 is the third quartile. Q1 is defined as that value of the variate such that one-fourth of the items lie below it and three-fourths lie above it. It is given by

Q1 = lm + [ (N/4 - Fm) / f ] * C

Similarly, Q3 is defined as that value of the variate such that three-fourths of the items lie below it and one-fourth lie above it. It is given by

Q3 = lm + [ (3N/4 - Fm) / f ] * C

lm - lower limit of the quartile class
Fm - cumulative frequency of the class preceding the quartile class
f - frequency of the quartile class
C - class interval
N - total frequency

The quartile deviation is easy to calculate and is not affected by open or unequal classes. But it is not based on all the observations, it is not susceptible to algebraic operations, and a change in the method of classification will affect it. Hence this too is not a satisfactory measure of dispersion.

To avoid the defects of this measure we introduce another measure which depends on all the scores: the AM of the deviations taken from a constant. But the sum of the deviations taken from the mean is zero; hence we take only the absolute deviations.

c) Mean Deviation

The mean deviation is defined as the AM of the absolute deviations taken from an average; hence we can take the mean deviation about the mean, about the median, about the mode and so on. The mean deviation is easy to calculate, is based on all the observations, and is not affected by extreme values. But it is not susceptible to algebraic operations, and it does not give equal weightage to positive and negative deviations. Hence this too may fail to be a good measure of dispersion, and we need a measure of dispersion devoid of all the above defects.

In order to give equal weight to both positive and negative deviations, we take a new measure of dispersion known as the Root Mean Square Deviation: we take the AM of the squares of the deviations from any origin, and then take its square root. It is denoted by S:

S = √[ Σ fi (xi - A)² / N ]

where A is any origin and N = Σ fi. When the root mean square deviation is taken about the mean, it is known as the Standard Deviation, denoted by σ:

σ = √[ Σ fi (xi - x̄)² / N ]

The square of the standard deviation is known as the Variance. The standard deviation has almost all the properties of an ideal measure of dispersion and hence is widely used. It is well defined, with a clear, definite formula. It depends on all the observations and is susceptible to algebraic operations. There is no difficulty with algebraic signs, as we take the squares of the deviations. It is usually very stable, even though at times extreme values do affect it, and it is also affected by open and unequal classes; but these defects are negligible compared to the advantages. It aids in the measurement of skewness and kurtosis and also in the testing of hypotheses.

The above measures of dispersion are all expressed in specific units: a distribution of heights will be in centimetres, and a distribution of weights in kilograms. Moreover, the dispersion of two distributions may be wide or narrow depending on the circumstances. Hence comparison becomes difficult, and we need a measure which is independent of units. So we take a measure of dispersion and divide it by the corresponding average. A Relative Measure of Dispersion (Coefficient of Dispersion) is defined as the ratio of a measure of dispersion to the average about which the deviations are taken. The usual relative measures of dispersion are (Mean Deviation / Mean); (Mean Deviation / Median); (Mean Deviation / Mode); (Quartile Deviation / Median); (Standard Deviation / Mean).

The Coefficient of Variation is the ratio of the standard deviation to the mean, usually represented as a percentage, i.e. Coefficient of Variation = (σ / x̄) * 100. The coefficient of variation plays a very important role in statistical calculations.

(The three important measures of dispersion are QD, MD and SD.)
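A minimal Python sketch of the mean deviation, standard deviation, variance and coefficient of variation for a discrete frequency distribution (data invented for illustration):

```python
import math

# Hypothetical items and frequencies
x = [5, 15, 25, 35, 45]
f = [5, 12, 20, 9, 4]
N = sum(f)

mean = sum(fi * xi for fi, xi in zip(f, x)) / N

# Mean deviation about the mean: AM of the absolute deviations
md = sum(fi * abs(xi - mean) for fi, xi in zip(f, x)) / N

# Standard deviation: root mean square deviation about the mean
sd = math.sqrt(sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x)) / N)

variance = sd ** 2
cv = sd / mean * 100  # coefficient of variation, as a percentage

print(mean, md, sd, variance, cv)
```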

5. Write an essay on Skewness

Facts are the most essential factors for the study of anything. Facts collected from various sources are termed data. The data thus collected will be a bundle of heterogeneous ideas. For clarity's and brevity's sake they have to be arranged into groups or classes having homogeneous qualities. Dividing the data into smaller groups is called classification, and putting the data into tables is called tabulation. However well the classification and tabulation are done, the data will be in a dull numerical form and will not appeal to the human mind. Further, the comparison of more than one distribution cannot be done if the data is only in classified form. Hence we need certain constants to understand a single distribution better or to compare two or more distributions. There are three types of such constants in common use, the first two being averages and measures of dispersion. Averages give a representative value of the distribution, while measures of dispersion give the extent to which the individual items differ from a central value. But neither of these tells us about the form or type of the distribution.

In a symmetrical distribution the mean, median and mode coincide. In certain distributions the variates tend to be dispersed more on one side of the central value than the other; in such cases we say the distribution is Skewed. Skewness has the effect of pulling the arithmetic mean and the median away from the mode, sometimes to the right and sometimes to the left. If more observations are on the left-hand side of the mode, the distribution is called Negatively Skewed; if more variates are on the right-hand side of the mode, the distribution is said to be Positively Skewed.

A natural measure of skewness therefore seems to be the difference between the AM and the mode, or that between the median and the mode. The usual measures of skewness are therefore

Mean - Mode
Median - Mode

The above measures of skewness are absolute measures. For purposes of comparison it is necessary to have a relative measure of skewness which is independent of units. Relative measures of skewness are obtained by dividing an absolute measure of skewness by a measure of dispersion. The purpose of studying skewness is to find out how much more or less the items on one side deviate from the central value than the items on the other side; therefore the absolute measures of skewness should be divided not by a central value but by a measure of dispersion to obtain a relative measure. Relative measures of skewness are also known as coefficients of skewness. The most practical measures of skewness are

(i) (Mean - Mode) / SD, known as Pearson's Coefficient of Skewness
(ii) 3 * (Mean - Median) / SD

The value of these coefficients is zero for a symmetrical distribution and a pure number for a skewed distribution. In most asymmetrical distributions the coefficient of skewness lies between -1 and +1.

Quartile Measure of Skewness

It can be shown that in a symmetrical distribution (M - Q1) = (Q3 - M), where M is the median, but in a skewed distribution this is not true. Hence the difference (Q3 - M) - (M - Q1) is used as a measure of skewness. The quartile coefficient of skewness is defined as

[ (Q3 - M) - (M - Q1) ] / 2δ, where δ = (Q3 - Q1) / 2

Substituting, this becomes (Q3 + Q1 - 2M) / (Q3 - Q1).

This measure is not very satisfactory, as it is not based on all the observations. To test whether a distribution is skewed, the following tests are applied: in a skewed distribution the mean, median and mode will not coincide; the two quartiles will not be equidistant from the median; and the distribution, when plotted on graph paper, will not give a symmetrical bell-shaped curve.

Since skewness is just a measure of the excess of positive deviations over negative deviations or vice versa, any odd central moment can be made use of to measure skewness. Since the first central moment μ1 is zero, μ3 can be taken as a measure of skewness, and μ3 / σ³ can be taken as the coefficient of skewness.
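A minimal Python sketch of Pearson's coefficients and the moment coefficient of skewness on invented raw data (the mode is taken with the standard library's statistics.mode):

```python
import math
import statistics

# Hypothetical raw data with a longer right tail
data = [2, 3, 3, 4, 4, 4, 5, 6, 8, 11]

mean = statistics.fmean(data)
median = statistics.median(data)
mode = statistics.mode(data)
n = len(data)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)  # population SD

# Pearson's coefficients of skewness
sk1 = (mean - mode) / sd
sk2 = 3 * (mean - median) / sd

# Moment coefficient: mu3 / sigma^3
mu3 = sum((x - mean) ** 3 for x in data) / n
gamma = mu3 / sd ** 3

print(sk1, sk2, gamma)  # all positive: the long tail is on the right
```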

PROBABILITY

1. Sample Space

Experiments like tossing a coin, throwing a die or drawing a ticket in a lottery, which can be repeated under the same conditions, are random experiments. The result of a single trial of a random experiment is called an Outcome. For example, if a coin is tossed, getting a head is an outcome; if a die is thrown, getting the face 2 is an outcome. The set of all possible outcomes of a random experiment is called the Sample Space for the experiment, denoted by S. Subsets of the sample space, including the null set and the set itself, are called random events.

We can give a geometrical representation of the sample space for any random experiment: outcomes are then represented by points in S, and hence outcomes are also called Sample Points. Any subset of S is called an event. A sample space is called discrete if it contains only a finite number of points or a countably infinite number of points. If the sample space consists of infinitely many points which are not countable, it is called an uncountable or continuous sample space.

If A is an event in a sample space S, then A, its complement A′, S and the null set ∅ satisfy the following identities:

(i) A ∪ S = S
(ii) A ∩ S = A
(iii) A ∪ ∅ = A
(iv) A ∩ ∅ = ∅
(v) S ∪ ∅ = S
(vi) S ∩ ∅ = ∅
(vii) A ∪ A′ = S
(viii) A ∩ A′ = ∅

where A′ is the event of the non-occurrence of A.

Consider the experiment of tossing a coin: there are two outcomes, getting a head or getting a tail. Denote head by 1 and tail by 0. The sample points are then 1 and 0, which can be represented by points on a line; hence the sample space is one-dimensional and discrete. Consider the experiment of tossing two coins. The sample points are (1, 1), (1, 0), (0, 1), (0, 0), which can be represented by points on a plane; hence this is a two-dimensional sample space, and it is also discrete.

Consider the experiment of tossing 3 coins. The sample points can be marked as (1,1,1), (1,1,0), (1,0,1), (0,1,1), (1,0,0), (0,1,0), (0,0,1), (0,0,0); these points represent a 3-dimensional sample space which is also discrete. Consider instead the experiment of picking a number between 0 and 10. There are infinitely many points, which can be represented by points on a straight line; these points are uncountable but lie within the limits 0 to 10. The sample space thus obtained is continuous and one-dimensional.

In a discrete sample space, two events A and B are said to be mutually exclusive if the occurrence of one excludes the occurrence of the other. Events A1, A2, ..., An are said to be equally likely if all the events have equal chances of occurring, and exhaustive if at least one of them materializes in every trial. The events A and B are said to be independent if the occurrence of one has nothing to do with the occurrence of the other, and dependent if the occurrence of one depends on the occurrence of the other.

For a given sample space S we define a set function P which associates a real number P(A) with every event A in S. This function is called the probability measure on S if it satisfies the following axioms:

(i) P(A) ≥ 0
(ii) P(S) = 1
(iii) P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An), where A1, A2, ..., An are mutually exclusive events

In a discrete sample space the probability of an event A can then be taken as the ratio of the number of sample points corresponding to the occurrence of A to the total number of sample points: if there are in all n sample points and m of them lie in A, then P(A) = m/n.

A random variable X is a real-valued function defined over the outcome set: the domain of the function is the outcome set, and its range is a subset of the real numbers. Modern probability theory is based on this set-theoretic approach and deals with sample spaces and their sample points; hence probability theory is incomplete without the knowledge of the sample space.
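A minimal Python sketch enumerating the three-coin sample space and computing a probability as the ratio of sample points, as defined above:

```python
from itertools import product

# Sample space for tossing 3 coins: 1 = head, 0 = tail
S = list(product([0, 1], repeat=3))
print(len(S))  # 8 sample points

# Event A: getting exactly two heads
A = [pt for pt in S if sum(pt) == 2]

# P(A) = (sample points in A) / (total sample points)
print(len(A) / len(S))  # 3/8 = 0.375
```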

2. Give a Systematic Approach to Axiomatic Probability

Probability is the branch of science dealing with the laws governing chance phenomena. The theory has its origin in attempts at solving problems in games of chance. The probability of the occurrence of an event means the amount of likelihood of its occurrence.

There are essentially three different ways of defining probability: the classical approach, the relative frequency limit approach, and the axiomatic approach. According to the classical definition, if an event A can happen in m cases out of a total of n exhaustive, mutually exclusive and equally likely cases, the probability of the event A is m/n. According to the frequency approach, if an event A has occurred m times in a series of n independent trials, then

P(A) = lim n→∞ (m/n)
Both these approaches have their own limitations. The classical definition fails if the cases are not exhaustive and equally likely. The frequency approach fails because it is never possible to make an infinite number of trials under the same conditions. These limitations have necessitated a new definition of probability, known as the Axiomatic Approach. This modern approach is purely axiomatic and is based on set theory. To begin with we define certain basic terms and concepts.

Random Experiments: Experiments with the following features are called random experiments: they are repeatable under uniformly similar conditions; they have a number of possible outcomes; and it is not possible to predict the outcome of any particular trial. When an experiment is conducted, every result is considered an elementary outcome, a sample point, or a simple event. The set of points representing all the possible outcomes of an experiment is called the sample space, denoted by S. A sample space is called discrete if it contains only finitely many points, or infinitely many points which are countable. A sample space is continuous if S has infinitely many points which are uncountable.

Subsets of the sample space S, including the null set and the set itself, are called random events; that is, any subset of S is defined as a random event in the sample space. Eg: in tossing two coins, getting both heads is a random event. If A and B are two events in S, then the set of sample points in A or in B or in both is defined as the union of the events A and B, denoted by A ∪ B. The sample points which are in both A and B form the event known as the intersection of A and B, denoted by A ∩ B. Two events are said to be mutually exclusive if the occurrence of one excludes the occurrence of the other; if A and B are mutually exclusive then A ∩ B = ∅. If the occurrence of one event does not depend on the occurrence of another, the two events are said to be independent.

A function whose domain is a set of sets and whose range is a set of real numbers is called a set function. Let S be a sample space and let A be any event in S. For the sample space S we define a set function P which associates a real number P(A) with every event A in S. This function is called a probability measure on the sample space if the following axioms are satisfied:

(i) P(A) is real and P(A) ≥ 0
(ii) P(S) = 1
(iii) P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An), where A1, A2, ..., An are mutually exclusive events

According to this, in a discrete sample space the probability of an event A can be defined as the number of sample points in A divided by the total number of sample points. It follows that if there are no sample points in an event, the probability of that event is zero; that is, the probability of an impossible event is zero, while the probability of a sure event is 1. If A is an event with P(A) = p, then

P(A′) = 1 - P(A) = 1 - p = q

where A′ is the non-occurrence of the event A, and always p + q = 1. The axiomatic approach to probability is a more sophisticated definition than the other two.

3. Trace the Classical Definition of Probability, Discussing the Limitations with Suitable Examples Wherever Necessary

Probability is the branch of science dealing with the laws governing chance phenomena. The theory has its origin in attempts at solving problems in games of chance. The probability of the occurrence of an event means the amount of likelihood of its occurrence. When experiments are conducted, in certain cases we are certain of the result, while in other cases we cannot say for sure what the outcome will be. For example, when a ball is thrown into the air we are quite sure it will come down; but if a coin is tossed we are not sure whether a head will occur. Yet the actual outcomes of such experiments are sometimes of vital importance in practice. Eg: winning the toss in a cricket match gives the captain the freedom of choosing between batting and fielding, which makes a lot of difference.

In such cases there is an element of chance which we try to express numerically. To summarize: by probability we mean a numerical expression of the chance of certain outcomes of experiments in the face of uncertainty.

Before we proceed, it is useful to define certain terms. Any operation that gives two or more results is called an experiment. Each result of an experiment is called an outcome. A combination of one or more outcomes is called an event. Events are said to be equally likely if all outcomes have equal chances of occurring in a trial. Events are said to be exhaustive if all the possible outcomes of the experiment are included, so that at least one of them materializes in every trial. If in an experiment the occurrence of an event excludes the occurrence of another event, the two events are said to be mutually exclusive. If the occurrence of an event does not affect the occurrence of another event, they are called independent events; if it does, they are called dependent events. (Give an example.)

If an experiment can result in n exhaustive, equally likely and mutually exclusive outcomes, and m of them are in favour of an event A, then the probability of the occurrence of the event A is given by m/n, denoted P(A). Since m and n are both positive integers and m ≤ n, P(A) always lies between 0 and 1. The complement of the event A, which means the non-occurrence of A, is denoted by A′ or Ac; if P(A) = p, then

P(A′) = 1 - P(A) = 1 - p = q

and always p + q = 1. Probability defined in this manner is called mathematical or classical probability. It is also known as a priori probability, as the probability is calculated prior to the experiment. From the classical definition it follows that the probability of an impossible event is zero and that of a sure event is 1.

Eg: when an unbiased coin is tossed, to get the probability of getting a head, let the event be A. Total number of outcomes = 2; number of favourable outcomes = 1; hence P(A) = 1/2.
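A minimal Python sketch of the classical definition, counting favourable cases among exhaustive, equally likely cases (the die experiment is an invented example):

```python
# Classical probability: P(A) = favourable cases / total cases
# Experiment: throwing a fair six-sided die
outcomes = [1, 2, 3, 4, 5, 6]  # exhaustive, equally likely, mutually exclusive

# Event A: getting an even face
favourable = [x for x in outcomes if x % 2 == 0]

p = len(favourable) / len(outcomes)
q = 1 - p  # probability of non-occurrence, so p + q = 1

print(p, q)  # 0.5 0.5
```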

Limitations of this Definition

This is not a very satisfactory definition, as it fails when the number of possible outcomes of the experiment is infinite. Eg: if we want the probability that a positive integer drawn at random is even, the total number of outcomes is infinite, and the classical definition cannot be applied. Further, the definition is based on cases which are equally likely; if, for example, the coin is biased, the cases cannot be considered equally likely, and the probability of getting a head will not be 1/2. Also, in many cases it may not be possible to enumerate all the possible outcomes of an experiment, and in such cases the method fails. Apart from these defects, the classical definition cannot answer questions such as: what is the probability that a particular child born on a particular day in a certain locality is a boy? What is the probability that a person aged 50 will die in the next year? And so on.

Whatever its limitations, the classical definition is still used in calculating probability in many cases. Further, it is the starting point of the other definitions of probability.

4. Critically Examine the Frequency Approach to Probability

Probability theory is the branch of science dealing with the laws governing chance phenomena. The theory has its origin in attempts at solving problems in games of chance. The probability of the occurrence of an event means the amount of likelihood of its occurrence; it is a number lying between 0 and 1. There are essentially three different ways of defining probability:

Classical or Mathematical or A Priori Probability
Relative Frequency Limit or Statistical or A Posteriori Probability
Axiomatic Approach to Probability

In a sequence of random experiments it is not possible to predict individual results; the result may vary from one observation to another. In spite of this irregular behaviour of individual results, the average results of a long sequence of random experiments show a striking regularity, termed statistical regularity.

Let us conduct an experiment a large number of times and observe the results. If among the first n experiments the event A has occurred exactly r times, the ratio r/n is called the Frequency Ratio or Relative Frequency of the event A in the sequence of n experiments. If we observe the frequency ratio r/n of a fixed event A for increasing values of n, we find that it shows a marked tendency to become more or less constant: suppose the experiment has been performed n times and the event A has occurred on r of the occasions; if the experiment is repeated a larger and larger number of times, it will be observed that the frequency ratio r/n stabilizes around a particular value.

Eg: Let us toss a coin 100 times, out of which a head occurs 48 times; then r/n = 48/100 = 0.48. Toss a coin 1000 times, out of which a head occurs 510 times; then r/n = 510/1000 = 0.510. Toss a coin 10000 times, out of which a head occurs 5050 times; then r/n = 5050/10000 = 0.505. We find that r/n stabilizes around 0.5 as the number of trials increases. The limit of r/n as n → ∞ is taken as the probability of the event A:

P(A) = lim n→∞ (r/n)

In the above example P(A) = 0.5. This is also called a posteriori probability, as the probability is calculated after the experiment is done. The frequency definition of probability is thus a method of estimating probability from experimental trials. There are limitations to this definition: the trials may not always be conducted under the same conditions, which is an essential requirement of the method; and we can never make infinitely many trials. All the same, this definition holds good in many situations, and probability is calculated in many cases using it. Moreover, it paves the way to a more sophisticated approach, namely the axiomatic approach to probability.
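This stabilizing behaviour is easy to see by simulation. A minimal Python sketch using the standard random module (trial counts chosen arbitrarily):

```python
import random

# Estimate P(head) for a fair coin by the relative frequency r/n
for n in [100, 1000, 10000, 100000]:
    r = sum(random.randint(0, 1) for _ in range(n))  # 1 = head
    print(n, r / n)  # r/n drifts towards 0.5 as n grows
```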
