RM Assignment: RM 5

Group: 10
Roll No.: 8, 17, 24, 41, 55
Name: Sarvesh Desai, Pooja Gupta, Nilesh Jadhav, Rupesh Phalke, Venugopalan Swaminathan

Q1 Differentiate between the following:
1. Parameter and statistic.
2. Level of significance and level of confidence.
3. Null and alternate hypothesis.
4. Type-I and type-II error.
5. One-tailed and two-tailed test of hypothesis.
6. Testing of hypothesis and estimation.
7. Point estimate and interval estimate.
8. Parametric and non-parametric test of hypothesis.
9. Z-test and t-test of hypothesis.
10. Test of goodness of fit and test of independence, under chi-square test.
11. 1-way ANOVA and 2-way ANOVA.
12. Test of confirmation and test of comparison.
Solution:

Q 1.1
Parameter
1. A parameter describes a full population.
2. A parameter is a property of the underlying population distribution.
3. As the sample becomes large, the sample mean approaches the population mean, which is a parameter.
Statistic
1. A statistic describes a sample.
2. A statistic is "a function of a sample/observation."
3. The sample mean is a statistic.

Q 1.2
Level of significance
1. It indicates the likelihood that the answer will fall outside the stated range.
2. A 1% significance level means a 99% confidence level.
Level of confidence
1. It is the expected percentage of times that the actual value will fall within the stated precision limits.
2. A 95% confidence level means 95 chances in 100 that the sample represents the true condition.

Q 1.3
Null hypothesis
Alternate hypothesis
Q 1.4
Type I error
1. Means rejection of a hypothesis which should have been accepted.
2. Denoted by alpha.
3. Can be controlled by fixing it at a lower level.

Q 1.5
One-tailed test
1. The rejection/acceptance area lies on only one side.
Q 1.6
Testing of hypothesis
Hypothesis testing is carried out to test an assumed criterion.
Estimation
Population parameters are unknown, so they have to be estimated from a sample.

Q 1.7
Point estimate
The estimate of a population parameter may be one single value or it could be a range. A point estimate, as the name suggests, is the estimation of the population parameter with one number.
Interval estimate
Estimation of the parameter is not sufficient. It is necessary to analyse and see how confident we can be about this particular estimation. One way of doing it is defining confidence intervals. If we have estimated θ, we want to know whether the true parameter is close to our estimate. In other words, we want to find an interval (θ_L, θ_U) that satisfies the following relation:
P(θ_L < θ < θ_U) = 1 − α
where 1 − α is the level of confidence.
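The difference between a point estimate and an interval estimate can be illustrated with a short sketch. It is only a sketch under stated assumptions: the data values are hypothetical, the function name `mean_confidence_interval` is invented for illustration, and a normal (z) critical value is assumed, which is reasonable only for large samples (a t critical value would be used for small ones).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, lower, upper) for the population mean.

    Assumption: a normal (z) critical value is used, which suits large samples.
    """
    n = len(sample)
    point = mean(sample)                      # point estimate: one number
    std_err = stdev(sample) / sqrt(n)         # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return point, point - z * std_err, point + z * std_err


# Hypothetical measurements, used only to exercise the function.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
point, low, high = mean_confidence_interval(data)
print(f"point estimate = {point:.3f}")
print(f"95% interval estimate = ({low:.3f}, {high:.3f})")
```

The point estimate is the single number `point`; the interval estimate is the pair `(low, high)`, which widens as the requested confidence level increases.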
Q 1.9
T-test
1. A t-test follows Student's t-distribution.
2. A t-test is appropriate when you are handling small samples (n < 30).
3. A t-test is more adaptable than a Z-test.
4. T-tests are more commonly used than Z-tests.

Q 1.10
Test of independence under chi-square
1. A test of independence is a two-variable chi-square test. The goal of a two-variable chi-square test is to determine whether the first variable is related to or independent of the second variable.
2. A two-variable chi-square test, or test of independence, is similar to the test for an interaction effect in ANOVA: is the outcome in one variable related to the outcome in some other variable?

Q 1.11
2-way ANOVA
1. The purpose of two-way ANOVA is to verify whether the data collected from different sources converge on a common mean, based on two categories of defining characteristics.
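The chi-square test of independence described under Q 1.10 can be sketched as follows. This is a minimal illustration, not part of the original solution: the contingency table is hypothetical and the function name `chi_square_independence` is invented. It computes only the test statistic and the degrees of freedom; the decision still requires comparing the statistic with a tabulated critical value.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows of observed frequencies.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical 2 x 2 table: rows = variable 1, columns = variable 2.
observed = [[30, 20],
            [20, 30]]
chi2, dof = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, degrees of freedom = {dof}")
```

For this table every expected frequency is 25, so the statistic is 4.0 with 1 degree of freedom, which exceeds the 5% critical value of 3.841 and would lead to rejecting independence at that level.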
Q 1.12
Test of confirmation
Q 2 .8
TRUE
Q 2 .9
TRUE
1-beta error is type-II error in which a false H0 is accepted.
1% LOS is the 99% confidence level, which means the 99% confidence level is > the 95% confidence level.
Yes. 'F' calculated is less than 1 explains variation in data with strong reason.
Q 2 .10
TRUE
Q 2 .11
FALSE
Q 2 .12
TRUE
Q 2 .13
TRUE
Q 2 .14
TRUE
Q 2 .15
TRUE
Q 2 .16
TRUE
Here a false H0 is accepted, indicating failures are accepted, hence good hypothesis.
Alternate hypothesis tells the
Q 2 .17
FALSE
Q 2 .18
FALSE
Suppose the population has a seasonality factor: if the sampling is not done properly, the sample statistic and the population parameter can differ.
Q 2 .19
TRUE
Q 3.1
Q 3.6
Q 3.7
If byx = 0.8, bxy = -0.2, hence r = 0.4.
If byx = 0.8, bxy = 1.6, hence r = 1.13.
byx and bxy must be less than 1, always.
y = a + bx: this equation can always be used to estimate the value of x for a given value of y.
If two regression lines are perpendicular to each other, the correlation coefficient is 1.
If r = 0.7, the amount of variation in y because of x is 70%.
TRUE
TRUE
Q 3.9
FALSE
Q 3.10
FALSE
Q 3.11
FALSE
Q 3.12
FALSE
Q 3.13
FALSE
Q 3.14
FALSE
When r = ±1, there is an exact linear relationship between X and Y, and the two regression lines coincide with each other.
TRUE
The two regression lines always intersect each other at the point (mean of X, mean of Y).
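The statements about byx, bxy and r can be checked numerically using the standard relation r² = byx × bxy, where r takes the common sign of the two regression coefficients. The sketch below is illustrative only; the function name `correlation_from_slopes` is an assumption, and it returns `None` for coefficient pairs that cannot occur together.

```python
from math import copysign, sqrt


def correlation_from_slopes(byx, bxy):
    """Return r implied by the two regression coefficients, or None if impossible.

    Uses r**2 = byx * bxy; r carries the common sign of byx and bxy.
    """
    product = byx * bxy
    if product < 0:
        return None  # coefficients with opposite signs cannot occur together
    if product > 1:
        return None  # would imply |r| > 1, which is impossible
    return copysign(sqrt(product), byx)


print(correlation_from_slopes(0.8, 0.2))    # consistent pair, r is about 0.4
print(correlation_from_slopes(0.8, -0.2))   # opposite signs: None
print(correlation_from_slopes(0.8, 1.6))    # product 1.28 > 1: None

# Share of variation in y explained by x is r squared, not r.
r = 0.7
print(f"explained variation = {r ** 2:.0%}")
```

This also shows that a single coefficient may exceed 1 as long as the product stays at or below 1, and that r = 0.7 corresponds to 49% explained variation rather than 70%.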
This reads as 'the coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100 (to produce a percentage)'. The steps required for calculating the coefficient of variation are:
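A minimal sketch of that calculation follows. The data values are hypothetical, the function name `coefficient_of_variation` is invented for illustration, and the sample standard deviation (divisor n − 1) is assumed; a text that uses the population standard deviation would give a slightly smaller value.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the coefficient of variation as a percentage.

    Assumption: the sample standard deviation (n - 1 divisor) is used.
    """
    average = mean(values)        # step 1: the mean
    spread = stdev(values)        # step 2: the standard deviation
    return spread / average * 100  # step 3: ratio, expressed as a percentage


# Hypothetical observations.
observations = [10, 12, 8, 10]
cv = coefficient_of_variation(observations)
print(f"coefficient of variation = {cv:.2f}%")
```

Because the result is a ratio, it has no units, which is what makes it usable for comparing the variability of data sets measured on different scales.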
where Ȳ is the mean, s is the standard deviation, and N is the number of data points.

The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.

Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case.

For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:

kurtosis = [ Σ (Yi − Ȳ)^4 / N ] / s^4

where Ȳ is the mean, s is the standard deviation, and N is the number of data points.
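The two shape measures can be computed directly from their moment definitions. The sketch below is an illustration under assumptions: the function name `skewness_and_kurtosis` is invented, the data are hypothetical, and both the moments and the standard deviation use the divisor N (some texts use N − 1 instead, which changes the values slightly).

```python
from math import sqrt


def skewness_and_kurtosis(values):
    """Return (skewness, kurtosis) using moment formulas with divisor N."""
    n = len(values)
    mean = sum(values) / n
    # Standard deviation computed with N in the denominator (assumption).
    s = sqrt(sum((y - mean) ** 2 for y in values) / n)
    skewness = sum((y - mean) ** 3 for y in values) / n / s ** 3
    kurtosis = sum((y - mean) ** 4 for y in values) / n / s ** 4
    return skewness, kurtosis


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]    # hypothetical data with a long right tail

print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(right_skewed))
```

The symmetric data give a skewness of zero, while the second data set gives a positive skewness, matching the description of data skewed right.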
5) Syx: standard error of estimate of y on x. Let yest denote the estimated value of y for a given value of x. This estimated value can be obtained from the regression curve of y on x. From this, the measure of the scatter about the regression curve is supplied by the quantity:

Syx = sqrt( Σ (y − yest)² / N )
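A small sketch of the standard error of estimate, assuming a straight-line regression of y on x fitted by least squares and the divisor N (some texts use N − 2 instead). The function names and the data are hypothetical and serve only as an illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def standard_error_of_estimate(xs, ys):
    """Scatter of the observed y values about the fitted line (divisor N)."""
    a, b = fit_line(xs, ys)
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(residual_ss / len(xs))


# Hypothetical data: the first set lies exactly on a line, the second does not.
print(standard_error_of_estimate([1, 2, 3, 4], [2, 4, 6, 8]))
print(standard_error_of_estimate([1, 2, 3], [1, 3, 2]))
```

A value of zero means every point lies on the regression line; larger values indicate more scatter about it.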
8) Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie. For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.

9) Classification, tabulation, presentation of data. Tabulation refers to the systematic arrangement of the information in rows and columns. Rows are the horizontal arrangement. In simple words, tabulation is a layout of figures in rectangular form with appropriate headings to explain different rows and columns. The main purpose of the table is to simplify the presentation and to facilitate comparisons. "A statistical table is a systematic organisation of data in columns and rows." "Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration."

10) Frequency curve and histogram. A frequency curve is obtained by joining the points of a frequency polygon with a freehand smoothed curve. Unlike the frequency polygon, where the points are joined by straight lines, here the points are joined freehand in order to get a smoothed frequency curve. It is used to remove the ruggedness of the polygon and to present it in a good form or shape.
12) Yule's coefficient of association. In order to find the degree of intensity of association between two or more sets of attributes, we should work out the coefficient of association. Professor Yule's coefficient of association is:

QAB = {(AB)(ab) - (Ab)(aB)} / {(AB)(ab) + (Ab)(aB)}

where
QAB = Yule's coefficient of association between attributes A & B
(AB) = Frequency of class AB in which A & B are present
(Ab) = Frequency of class Ab in which A is present & B is absent
(aB) = Frequency of class aB in which A is absent & B is present
(ab) = Frequency of class ab in which both A & B are absent
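The formula for QAB translates directly into code. The sketch below uses hypothetical class frequencies and an invented function name `yule_q`; it does not guard against a zero denominator, which occurs only when both cross products are zero.

```python
def yule_q(ab_both, a_only, b_only, neither):
    """Yule's coefficient of association Q_AB.

    ab_both -> (AB): A and B present
    a_only  -> (Ab): A present, B absent
    b_only  -> (aB): A absent, B present
    neither -> (ab): A and B absent
    """
    same = ab_both * neither    # (AB)(ab)
    cross = a_only * b_only     # (Ab)(aB)
    return (same - cross) / (same + cross)


# Hypothetical frequencies: (AB)=40, (Ab)=10, (aB)=20, (ab)=30.
q = yule_q(40, 10, 20, 30)
print(f"Q_AB = {q:.3f}")
```

The coefficient ranges from -1 (complete negative association) through 0 (no association) to +1 (complete positive association); the hypothetical frequencies above give a value of about 0.714.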
Is analysed by 1-way ANOVA

Q 1.2 Stratified sampling
1. If the population from which a sample is to be drawn does not constitute a homogeneous group, the stratified sampling technique is used.
2. Generally used to obtain a representative sample.
3. The population is divided into several sub-populations (strata) that are individually more homogeneous than the total population; items are then selected from each stratum for the sample.
4. Sample size for stratum i: ni = (n x Ni x σi) / (N1 x σ1 + N2 x σ2 + ... + Nk x σk), where n is the total sample size, Ni is the size of stratum i, and σi is the standard deviation of stratum i.
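The allocation formula in point 4 can be sketched as follows. The stratum sizes and standard deviations are hypothetical, and the function name `allocate_sample` is invented; the results are left unrounded, so in practice they would still have to be rounded to whole units.

```python
def allocate_sample(n, stratum_sizes, stratum_std_devs):
    """Split a total sample of size n across strata in proportion to N_i * sigma_i."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_std_devs)]
    total_weight = sum(weights)
    return [n * w / total_weight for w in weights]


# Hypothetical population: three strata with different sizes and variability.
sizes = [500, 300, 200]
std_devs = [10, 20, 30]
allocation = allocate_sample(100, sizes, std_devs)

for size, sd, n_i in zip(sizes, std_devs, allocation):
    print(f"N_i = {size}, sigma_i = {sd}: n_i = {n_i:.2f}")
```

A stratum receives a larger share of the sample when it is larger or more variable, which is why the smallest stratum here receives as many units as the middle one.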
3. Are conducted in case of descriptive research studies.
4. Larger samples.
5. Normally used for social and behavioural sciences.
6. Example: field research.

Q 1.5 Simple random sampling
1. Just a random sample.
2. Every entity in the universe may become part of the sample.
3. Low cost.

Nominal data
1. Simply a system of assigning number symbols to events in order to label them.
2. Convenient for keeping track.
3. The mode is the only measure of central tendency.
4. Widely used in surveys.

Exploratory research
1. This is carried out for exploring new ideas, with support
Q 1.6
Q 1.7
Q 1.9
Bias in research
1. This may impact the results of the research.
2. This is attitude related.
Error in research
1. This strongly impacts the results of the research.
2. This is system related.

Q 1.10
Structured interview
1. Involves a set of predetermined questions.
2. Highly standardised techniques of recording.
3. Rigid interview procedure.
4. Question order is fixed sometimes.
Unstructured interview
1. Questions are not fixed.
2. Normal standards for recording.
3. Freedom to conduct the interview.
4. Question sequence may be changed.

Q 1.11
Latin square
1. Very frequently used in agricultural research.
2. Assumes that there is no interaction between the row factor and the column factor.
3. The numbers of rows and columns are required to be equal.
4. Accuracy is low compared to factorial design.
Factorial experimental design
1. Used in experiments where the effects of varying more than one factor are to be determined.
2. There is interaction between row and column entities.
3. More complex problems are examined with multiple rows and columns.
4. Provides equivalent accuracy with less labour and as such is a source of economy.

Q 1.12
Principle of randomizing
Principle of replication

Q 1.13
Multi-stage sampling
1. It is a further development of cluster sampling.
2. Easier to administer.
3. A large number of units can be sampled for a given cost under multi-stage sampling.
Multi-phase sampling
Q2 Justify the following statements:
1. Quota sampling is a non-probability sampling.
2. We don't need hypothesis firmed up in diagnostic research.
3. Wording of questionnaire can cause ineffective instrument.
4. In Latin square experimental design it is assumed that factors are independent of each other.
5. Stratified sampling method assumes strata to be homogeneous within and heterogeneous between.
6. Convenience sampling is a method of probability sampling.
7. Semantic differential scale requires identifying bi-polar adjectives describing the object.
8. Likert scale is a summative model for attitude measurement.
9. Principle of replication in experimental design is aimed at increasing statistical accuracy.
10. Principle of local control in experimental design is identifying effect of known source of variation in data.
11. Non-sampling errors cannot be totally avoided in research.
12. Word association test is a projective method of data collection.
13. Defining the problem involves identifying unit of analysis and characteristic of interest, time and space references and environmental conditions.
14. Projective methods of data collection are used for inferred characteristics.
15. On ordinal data, we can do all mathematical operations.
16. Optimal sample size is based on degree of accuracy and level of confidence expected.
17. Cluster sampling needs each cluster to be homogeneous between and heterogeneous within.
18. Systematic sampling is not truly probability sampling.
19. Parameters of quality data are same whether it is primary data or secondary data.
20. We firm up hypothesis based on exploratory, descriptive and diagnostic research.

Solution:
1) Quota sampling is a non-probability sampling.
The first step in non-probability quota sampling is to divide the population into exclusive subgroups. Then, the researcher must identify the proportions of these subgroups in the population; this same proportion will be applied in the sampling process.
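The proportional step of quota sampling can be sketched in a few lines. The subgroup names and counts are hypothetical and the function name `quota_sizes` is invented. The code only computes how many units each subgroup should contribute; it does not choose the units, and since that choice is left to the researcher's judgement rather than to chance, the method is non-probability.

```python
def quota_sizes(population_counts, sample_size):
    """Return the quota for each subgroup, proportional to its population share.

    Quotas are rounded, so they may not add up exactly to sample_size.
    """
    total = sum(population_counts.values())
    return {
        group: round(sample_size * count / total)
        for group, count in population_counts.items()
    }


# Hypothetical population split into two exclusive subgroups.
population = {"male": 600, "female": 400}
quotas = quota_sizes(population, sample_size=50)
print(quotas)
```

With a 60/40 split in the population, a sample of 50 is filled with 30 and 20 units respectively.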