
ANALYSIS OF DATA

Manoj Kumar Y Department of Pharmaceutics 1701128860111

April 5, 2013

Contents
1. Brief introduction on sampling
   I. Methods of sampling
2. Probability distribution
3. Terms used
4. Parametric and nonparametric methods
5. Parametric methods
   I. t test
   II. Chi-square (χ²) test
   III. F test
6. Nonparametric methods
   I. Types of data
   II. Methods
   III. Sign test
   IV. Wilcoxon signed rank test
   V. Wilcoxon rank sum test
   VI. Kruskal-Wallis test (one way ANOVA)
   VII. Friedman test (two way ANOVA)
   VIII. Analysis of covariance
   IX. Run test for randomness
   X. Contingency tables
7. References

Brief introduction on sampling.


There is no data unless samples are selected, so sampling is the first essential step. A sample is a portion that represents the whole lot. Samples include raw materials, intermediate products, finished products, packed products, etc.

Sampling plan
After the sample is selected, a plan should be made for approval of the lot. The plan may be single, double, or multiple sampling.

Single: the lot is approved by examining a single sample.
Double: a second sample is examined when the first does not meet the requirements.
Multiple: successive samples are examined until the preset requirements are satisfied. E.g. the Q limits of the dissolution test.

Methods of sampling
Random sampling (each unit has an equal chance of being selected)
   Simple random sampling: random numbers, lottery method
   Stratified sampling
   Systematic sampling
   Cluster sampling

Non-random sampling
   Judgement, purposive or deliberate sampling
   Quota sampling
   Convenience sampling

Probability distribution
Mathematical determination of frequency distribution.

Types of probability distribution


1. Binomial distribution
For two mutually exclusive outcomes with probabilities p and q (q = 1 - p), the probability of r successes in n trials is given by the formula

$$P(r) = \binom{n}{r} p^{r} q^{n-r}$$


2. Poisson distribution
It gives the probability of occurrence of rare events. With mean m, the probability of r occurrences is

$$P(r) = \frac{e^{-m} m^{r}}{r!}$$

Values of e^{-m} are tabulated.

3. Normal distribution
It applies to continuous variables and uses the Z transformation, given by

$$Z = \frac{X - \mu}{\sigma}$$

where X is the random variable, μ is the mean, and σ is the standard deviation.
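The three distributions can be evaluated directly from their standard formulas. The following is a minimal sketch using only the Python standard library; the function names and the example inputs are illustrative assumptions, not part of the original slides.

```python
import math

def binomial_pmf(r, n, p):
    """Probability of r successes in n trials, with success probability p and q = 1 - p."""
    q = 1.0 - p
    return math.comb(n, r) * p**r * q**(n - r)

def poisson_pmf(r, m):
    """Probability of r occurrences of a rare event whose mean count is m."""
    return math.exp(-m) * m**r / math.factorial(r)

def z_score(x, mu, sigma):
    """Z transformation of a value x from a normal distribution."""
    return (x - mu) / sigma

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs, chosen only to exercise the functions.
print(binomial_pmf(2, 5, 0.5))               # 2 successes in 5 trials
print(poisson_pmf(0, 2.0))                   # no events when the mean is 2
print(normal_cdf(z_score(110, 100, 10)))     # area below one SD above the mean
```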


Terms used

Significance: a real difference between the two compared samples or methods.

Level of significance: the probability of making a type 1 error, in which the null hypothesis is rejected when it is true. Common levels are 1%, 5%, 10%, etc.; the 5% level (p = 0.05) is used most often.

Ranking, average ranking, geometric mean: ranks are assigned to the data in ascending or descending order. Tied values each receive the average of the ranks they would otherwise occupy. The geometric mean of n values is given as

$$GM = (x_1 x_2 \cdots x_n)^{1/n}$$

Hypothesis: null hypothesis H0 and alternative hypothesis H1.

Parametric methods and non parametric methods


Parametric methods
Parametric methods consider the parameters of the probability distribution.
   t test
   Chi-square test
   F test

Nonparametric methods
No specific distribution parameters are considered; the data need only come from a continuous distribution.
   Sign test
   Wilcoxon signed rank test
   Wilcoxon rank sum test
   Kruskal-Wallis test
   Friedman test
   Analysis of covariance
   Run test
   Contingency tables

Parametric methods
t - Test
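A test of significance assesses whether an observed difference is significant.

The form of the t test differs for small and large sample sizes.

Large samples
The standard error of the difference is obtained as

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where s_1^2 and s_2^2 are the variances and n_1 and n_2 are the numbers of samples. The difference in means is tested by

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$

Small samples

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$

and \bar{x}_1 and \bar{x}_2 are the means.

Paired samples

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where \bar{d} is the mean of the paired differences, s_d is their standard deviation, and n is the number of pairs.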
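As a rough illustration of the small-sample and paired forms of the t test, the sketch below computes both statistics with the standard library. The function names and the two data lists are hypothetical, and the pooled-variance form is assumed for the independent-samples case.

```python
import math
from statistics import mean, stdev, variance

def two_sample_t(a, b):
    """Pooled-variance t statistic for two independent small samples."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2   # statistic, degrees of freedom

def paired_t(a, b):
    """t statistic for paired samples, based on the differences b - a."""
    d = [y - x for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

# Hypothetical measurements, for illustration only.
before = [10, 12, 14, 16]
after = [11, 14, 15, 20]

t_ind, df_ind = two_sample_t(before, after)
t_pair, df_pair = paired_t(before, after)
print(t_ind, df_ind)
print(t_pair, df_pair)
```

The calculated t is then compared with the tabulated t value for the stated degrees of freedom.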

χ² test
This test describes the magnitude of the difference between the observed frequency and the expected frequency.

$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$$

where f_0 is the observed frequency and f_e is the expected frequency. The calculated value is compared with the tabulated value to show significance.

Degrees of freedom
Given by (c-1), or by (c-1) x (r-1) for a table with r rows and c columns. Increasing the degrees of freedom increases the symmetry of the curve.

Applications
1. Checking goodness of fit.
2. Checking independence of variables by contingency tables.
3. Testing for homogeneity.
4. Detection of linkage.

F - test
This test checks the equality of two variances from two populations. It is given simply as

$$F = \frac{S_1^2}{S_2^2}$$

where S_1^2 and S_2^2 are the variances. If the variances are not given directly in the data, they can be obtained by squaring the standard deviations.

With reference to ANOVA it is given as

$$F = \frac{\text{mean square between groups}}{\text{mean square within groups}}$$

It is the ratio of the variability between the groups to the variability within the groups.

Non Parametric Methods


These are distribution-free statistics, used when the type of distribution is unknown. The data need to be continuous and independent.

Types of Data
1. Nominal/categorical: data categorized into a particular group. E.g. colours such as white, black, yellow, red.

2. Ordinal: the data is ranked. E.g. assigning ranks 1, 2, 3, ... in order.

3. Interval/ratio: data scaled in intervals. E.g. 1-5, 6-10, ...

Methods
1. Sign test
2. Wilcoxon signed rank test
3. Wilcoxon rank sum test
4. Kruskal-Wallis test (one way ANOVA)
5. Friedman test (two way ANOVA)
6. Analysis of covariance
7. Contingency tables
8. Run test

Sign test
It is the simplest of the nonparametric tests. It is comparable to the one-sample t test of the parametric tests.

Steps
1. Calculate the difference between the measurements of each matched pair.
2. Remove ties, if any.
3. Count the number of times one treatment has the higher value, i.e. the number of positive or negative differences.
4. Compare with the tabulated values.

Ex - Data from peak plasma concentration

Subject   Time of peak plasma conc. (hr)   Difference (B - A)
          A        B
1         2.5      3.5                     +1
2         3.0      4.0                     +1
3         1.25     2.5                     +1.25
4         1.75     2.0                     +0.25
5         3.5      3.5                     0
6         2.5      4.0                     +1.5
7         1.75     1.5                     -0.25
8         2.25     2.5                     +0.25
9         3.5      3.0                     -0.5
10        2.5      3.0                     +0.5
11        2.0      3.5                     +1.5
12        3.5      4.0                     +0.5

Sign test
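For more than 20 samples the value of Z can be obtained by the normal approximation

$$Z = \frac{|p - 0.5| - \frac{1}{2N}}{\sqrt{\frac{0.5 \times 0.5}{N}}}$$

where p is the observed proportion and N is the number of samples.

It can be simplified as

$$Z = \frac{|N_{+} - N_{-}| - 1}{\sqrt{N}}$$

where N_{+} and N_{-} are the numbers of positive and negative differences.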
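The sign test steps can be sketched on the peak plasma concentration data (columns A and B of the example). The exact two-sided probability from the binomial distribution with p = 0.5 is used here as a stand-in for the lookup in tabulated values, which is an assumption of this sketch; variable names are illustrative.

```python
import math

# Time of peak plasma concentration (hr) for the 12 subjects.
a = [2.5, 3.0, 1.25, 1.75, 3.5, 2.5, 1.75, 2.25, 3.5, 2.5, 2.0, 3.5]
b = [3.5, 4.0, 2.5, 2.0, 3.5, 4.0, 1.5, 2.5, 3.0, 3.0, 3.5, 4.0]

# Step 1: differences of the paired measurements (B - A).
diffs = [y - x for x, y in zip(a, b)]

# Step 2: remove ties (zero differences).
nonzero = [d for d in diffs if d != 0]

# Step 3: count positive and negative signs.
n_pos = sum(1 for d in nonzero if d > 0)
n_neg = sum(1 for d in nonzero if d < 0)
n = len(nonzero)

# Step 4: exact two-sided binomial probability, in place of a table lookup.
k = min(n_pos, n_neg)
p_two_sided = min(1.0, 2 * sum(math.comb(n, i) for i in range(k + 1)) / 2**n)

# Simplified normal approximation; intended for more than 20 samples,
# so with n = 11 it is shown only to illustrate the formula.
z_approx = (abs(n_pos - n_neg) - 1) / math.sqrt(n)

print(n_pos, n_neg, n)
print(p_two_sided)
print(z_approx)
```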

Wilcoxon signed rank test


This test is more sensitive than the sign test.

Steps
1. Calculate the differences.
2. Assign ranks disregarding the sign.
3. Give average ranks to equal values.
4. Reassign the signs to the assigned ranks.
5. Sum the ranks with like sign.

Rank   Value   Assigned rank   Rank with sign
1      -0.25   2               -2
2      0.25    2               2
3      0.25    2               2
4      -0.5    5               -5
5      0.5     5               5
6      0.5     5               5
7      1.0     7.5             7.5
8      1.0     7.5             7.5
9      1.25    9               9
10     1.5     10.5            10.5
11     1.5     10.5            10.5

Wilcoxon signed rank test


Compare the smaller rank sum with the tabulated values.

For sample sizes larger than 20 it is calculated by the normal approximation as

$$Z = \frac{\left| R - \frac{N(N+1)}{4} \right| - \frac{1}{2}}{\sqrt{\frac{N(N+1)(2N+1)}{24}}}$$

where R is the sum of ranks (larger or smaller) and N is the number of samples.

For bioavailability studies confidence intervals are used.
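A minimal sketch of the signed rank steps, applied to the same twelve differences (B - A) used in the sign test example. The helper `average_ranks` is a hypothetical name introduced here; it assigns average ranks to tied values.

```python
import math

def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Step 1: differences from the peak plasma concentration example.
diffs = [1, 1, 1.25, 0.25, 0, 1.5, -0.25, 0.25, -0.5, 0.5, 1.5, 0.5]

# Zero differences are dropped, then ranks are assigned disregarding sign (steps 2-3).
nonzero = [d for d in diffs if d != 0]
ranks = average_ranks([abs(d) for d in nonzero])

# Steps 4-5: reattach the signs and sum the ranks with like sign.
r_pos = sum(r for d, r in zip(nonzero, ranks) if d > 0)
r_neg = sum(r for d, r in zip(nonzero, ranks) if d < 0)

# Normal approximation; intended for more than 20 pairs, shown for illustration.
n = len(nonzero)
r_small = min(r_pos, r_neg)
z = (abs(r_small - n * (n + 1) / 4) - 0.5) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

print(r_pos, r_neg)
print(z)
```

The smaller of the two rank sums is the value compared with the table.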

Wilcoxon Rank sum test


This is also known as the Mann-Whitney U test. It is used for independent samples.

Steps
1. The data is pooled and ranked irrespective of group.
2. Average ranks are given to equal values.
3. The ranks of one group are summed.
4. The sum is compared with the significance limits.

Original apparatus             Modified apparatus
Amount dissolved    Rank       Amount dissolved    Rank
53                  3          58                  11
61                  14         55                  5.5
57                  9          67                  21
50                  1          62                  15.5
63                  17         55                  5.5
62                  15.5       64                  18.5
54                  4          66                  20
52                  2          59                  12.5
59                  12.5       68                  22
57                  9          57                  9
64                  18.5       69                  23
56                  7

If ties are included, a correction factor is used.

Wilcoxon Rank sum test

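For samples larger than 10 the normal approximation is used, with the formula

$$Z = \frac{\left| T - \frac{N_1 (N_1 + N_2 + 1)}{2} \right|}{\sqrt{\frac{N_1 N_2 (N_1 + N_2 + 1)}{12}}}$$

where T is the sum of ranks for the smaller sample, N_1 is the smaller sample size, and N_2 is the larger sample size.

For the above data (T = 163.5 for the modified apparatus, N_1 = 11, N_2 = 12) Z is about 1.94, which is below the tabulated value of 1.96, so the difference is not significant at the 5% level.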
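The rank sum steps can be sketched on the dissolution data as read from the table (12 values for the original apparatus, 11 for the modified one). This is an illustration under stated assumptions: `average_ranks` is a hypothetical helper, and no continuity or tie correction is applied, so the Z it produces may differ slightly from a value worked out with those corrections.

```python
import math

def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Amount dissolved, as listed in the table.
original = [53, 61, 57, 50, 63, 62, 54, 52, 59, 57, 64, 56]
modified = [58, 55, 67, 62, 55, 64, 66, 59, 68, 57, 69]

# Steps 1-2: pool the data and rank it irrespective of group.
ranks = average_ranks(original + modified)

# Step 3: sum the ranks of each group.
rank_sum_original = sum(ranks[:len(original)])
rank_sum_modified = sum(ranks[len(original):])

# Normal approximation, with T taken from the smaller sample (modified apparatus).
n1, n2 = len(modified), len(original)
t = rank_sum_modified
z = abs(t - n1 * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

print(rank_sum_original, rank_sum_modified)
print(z)
```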

Kruskal Wallis Test ( one way ANOVA)


This is an extension of the rank sum test.

Steps
1. Ranks are given to the pooled data, as in the Wilcoxon rank sum test.
2. Average ranks are assigned to ties and the data is regrouped.
3. The ranks in each group are summed.

Example: effect of sedative on rats (time in min)

Control            Low dose           High dose
Time     Rank      Time     Rank      Time     Rank
8        22        10       26        3        10
1        3.5       5        13        4        12
9        24.5      8        22        8        22
9        24.5      6        15        1        3.5
6        15        7        18.5      1        3.5
3        10        7        18.5      1        3.5
15       28        15       28        1        3.5
1        3.5       1        3.5       6        15
7        18.5      15       28        2        7.5
                   7        18.5      2        7.5

Kruskal Wallis Test ( one way ANOVA)


The test statistic can be approximated by the chi-square distribution:

$$\chi^2 = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

where
k-1 is the degrees of freedom
k is the number of groups
N is the total number of observations
R_i is the sum of ranks of the ith group
n_i is the number of observations in the ith group

The chi-square value obtained for the above data shows significance (p = 0.05).
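The Kruskal-Wallis statistic follows directly from the group rank sums. The sketch below uses a small hypothetical data set rather than the sedative example; `average_ranks` and `kruskal_wallis_h` are names introduced for illustration, and no correction for ties is applied.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Chi-square approximation: 12/(N(N+1)) * sum(Ri^2/ni) - 3(N+1), no tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)          # ranks assigned irrespective of group
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])   # rank sum of this group
        total += r_i**2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(h, "with", len(groups) - 1, "degrees of freedom")
```

The result is compared with the tabulated chi-square value for k-1 degrees of freedom.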

ANOVA One way.


ANOVA: Analysis of Variance. In the one way model the influence of one factor is studied. The variation is split into variation between the samples and variation within the samples. Rows show the replicates; columns show the different treatments.

Example -

Replicate   Sample 1   Sample 2   Sample 3
1           20         19         13
2           10         13         12
3           17         17         10
4           17         12         15
5           16         9          5

ANOVA One way.


Steps involved in ANOVA one way
1. Find the sum of the squares of all the observations and denote it as A.
2. Square the sum of each column, divide by the number of observations in the column, add the results, and denote the total as B.
3. Find the correction factor, which is the square of the sum of all the observations divided by the total number of observations. Denote it as D.
4. Calculate the F value (mean square between treatments divided by residual mean square) and compare it with the tabulated value.

Source of variation            Degrees of freedom   Sum of squares   Mean square
Between treatments (column)    c-1 (n1)             B-D              (B-D)/(c-1)
Residual                       c(r-1) (n2)          A-B              (A-B)/c(r-1)
Total                          cr-1                 A-D

ANOVA One way.


The calculated F value is 2.16. The tabulated F value at the 5% level for 2 and 12 degrees of freedom is 3.89. Because the calculated F value is below the tabulated value, the difference between treatments is not significant at the 5% level (p = 0.05).
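The A, B, D bookkeeping for one way ANOVA can be checked on the three-sample example. This is a minimal sketch; the variable names are illustrative and the data are the 5 replicates of the 3 samples given in the example.

```python
# Columns are treatments (samples), entries within a column are replicates.
samples = [
    [20, 10, 17, 17, 16],   # sample 1
    [19, 13, 17, 12, 9],    # sample 2
    [13, 12, 10, 15, 5],    # sample 3
]
c = len(samples)        # number of columns (treatments)
r = len(samples[0])     # number of rows (replicates)

# A: sum of squares of all observations.
A = sum(x * x for col in samples for x in col)

# B: squared column totals, each divided by the observations in the column, summed.
B = sum(sum(col) ** 2 / len(col) for col in samples)

# D: correction factor, squared grand total divided by the total number of observations.
D = sum(sum(col) for col in samples) ** 2 / (c * r)

ms_between = (B - D) / (c - 1)          # between treatments, c-1 degrees of freedom
ms_residual = (A - B) / (c * (r - 1))   # residual, c(r-1) degrees of freedom
F = ms_between / ms_residual

print(A, B, round(D, 3))
print(round(F, 2))
```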

Friedman Test (two way ANOVA)


The data is arranged as in the two way ANOVA model.

Steps
1. Unlike the rank sum test, ranks are given within each individual group, i.e. within each row.
2. If there are ties, average ranks are given.
3. The test for significance is obtained from the chi-square distribution.

Values are shown with their ranks in parentheses.

Tablet formulation   Tablet press
                     A          B          C          D
1                    7.5 (4)    637 (1)    7.3 (3)    7.0 (2)
2                    8.2 (3)    8.0 (2)    8.5 (4)    7.9 (1)
3                    7.3 (1)    7.9 (3)    8.0 (4)    7.6 (2)
4                    6.6 (3)    6.5 (2)    7.1 (4)    6.7 (1)
5                    7.5 (3)    6.8 (2)    7.6 (4)    6.7 (1)
Ri                   14         10         19         7

Friedman Test (two way ANOVA)


The chi-square statistic is given as

$$\chi^2 = \frac{12}{r c (c+1)} \sum_{i=1}^{c} R_i^2 - 3 r (c+1)$$

where
c is the number of columns
c-1 is the degrees of freedom
r is the number of rows
R_i is the sum of ranks in the ith group (column)

The value of chi-square for the above data is 9.72. At the 5% level of significance this exceeds the tabulated value (7.81 for 3 degrees of freedom), so the tablet presses show significant differences.
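The Friedman statistic can be reproduced from the within-row ranks shown in parentheses in the tablet press table. The sketch below works from those tabulated ranks rather than re-ranking the raw values; the dictionary layout and variable names are assumptions made for illustration.

```python
# Ranks of each tablet press within formulations 1 to 5, as tabulated.
ranks = {
    "A": [4, 3, 1, 3, 3],
    "B": [1, 2, 3, 2, 2],
    "C": [3, 4, 4, 4, 4],
    "D": [2, 1, 2, 1, 1],
}
c = len(ranks)                # number of columns (tablet presses)
r = len(ranks["A"])           # number of rows (tablet formulations)

# Rank sum Ri for each press.
rank_sums = {press: sum(values) for press, values in ranks.items()}

# Chi-square statistic with c-1 degrees of freedom.
chi_square = (12 / (r * c * (c + 1))) * sum(R**2 for R in rank_sums.values()) - 3 * r * (c + 1)

print(rank_sums)
print(round(chi_square, 2), "with", c - 1, "degrees of freedom")
```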

ANOVA Two way.


This method can handle two variables or influencing factors.

Example
Yields of different varieties of rice at four nitrogen rates were recorded. Columns show the different treatments (varieties v1 to v3); rows show the replicates (nitrogen rates).

Nitrogen rate (kg/ha)   v1      v2      v3      Total
0                       4.50    5.01    6.11    15.62
30                      4.30    6.17    6.92    17.39
60                      5.60    6.37    7.27    19.24
90                      5.21    6.84    7.86    19.91
Total                   19.61   24.39   28.16   72.16

ANOVA Two way.


Steps involved in ANOVA two way
1. Find the sum of the squares of all the observations and denote it as A.
2. Square the sum of each column, divide by the number of observations in the column, add the results, and denote the total as B.
3. Square the sum of each row, divide by the number of observations in the row, add the results, and denote the total as C.
4. Find the correction factor, which is the square of the sum of all the observations divided by the total number of observations. Denote it as D.
5. Calculate the F values and compare them with the tabulated values.

Source of variation            Degrees of freedom   Sum of squares          Mean square
Between treatments (column)    c-1 (n1)             B-D                     (B-D)/(c-1)
Between replicates (rows)      r-1 (n1)             C-D                     (C-D)/(r-1)
Residual                       (c-1)(r-1) (n2)      (A-D)-[(B-D)+(C-D)]     {(A-D)-[(B-D)+(C-D)]}/(c-1)(r-1)
Total                          cr-1                 A-D

The calculated F value for the given data is 25.34 between varieties and 14.38 between treatments. The F value for the different varieties is significant; for the nitrogen fertilizer it is below the tabulated value and is not significant.

Analysis of covariance
This method was proposed by Quade.

Steps involved
1. Rank X and Y irrespective of their treatments.
2. Correct the ranks so that the mean of the ranks is 0.
3. Perform regression on all the data to obtain the predicted values.
4. Calculate the residual (observed - predicted), i.e. RY - predicted.

           Raw material X   Final assay Y   RY      RX      Predicted   Residual
Method 1   98.40            98.0            2.50    -3.0    1.445       1.0548
           98.6             97.80           1.50    -1.0    0.481       1.0182
           98.6             98.5            3.50    -1.0    0.481       3.0182
           99.2             97.4            -0.5    2.50    -1.204      0.7042
           sum                                                          5.7957
Method 2   98.7             97.6            0.5     0.5     -0.240      0.7408
           99.0             95.4            -3.5    1.5     -0.722      -2.7774
           99.3             96.1            -2.0    -3.0    1.445       -3.4451
           sum                                                          -5.7957

Run test for randomness


A run is an uninterrupted series of like observations.

E.g. 6 tablets weighing > 200 mg, then 5 tablets weighing < 200 mg, then 4 tablets weighing > 200 mg, then 5 tablets weighing < 200 mg, etc.
In this method the number of runs is compared with the tabulated limits to assess significance.

If the sample size is greater than 40, significance is calculated by the normal approximation using the formula

$$Z = \frac{r - \mu_r}{\sigma_r}, \qquad \mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_r^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$

where r is the number of runs and n_1 and n_2 are the numbers of observations of each kind.
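Counting runs and applying the normal approximation can be sketched as follows. The sequence mirrors the tablet-weight pattern in the example (6 heavy, 5 light, 4 heavy, 5 light) but stops there, so it is a hypothetical, shortened sequence; the labels "H" and "L" and the function names are assumptions, and the general two-category form of the approximation is used.

```python
import math

def count_runs(labels):
    """Number of uninterrupted series of like observations in a non-empty sequence."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_z(labels):
    """Normal approximation for the runs test with two kinds of observation."""
    first = labels[0]
    n1 = sum(1 for x in labels if x == first)
    n2 = len(labels) - n1
    n = n1 + n2
    r = count_runs(labels)
    mean_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return (r - mean_r) / math.sqrt(var_r)

# "H" = tablet weight above 200 mg, "L" = below 200 mg (hypothetical sequence).
weights = list("H" * 6 + "L" * 5 + "H" * 4 + "L" * 5)

print(count_runs(weights))
print(runs_z(weights))
```

With only 20 observations the table of limits would normally be used; the Z value is shown here only to illustrate the formula.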

Contingency tables
This is used for categorical data that cannot be analyzed by ranking methods. R x C denotes rows x columns. The relationship between rows and columns in a contingency table is tested by the chi-square method with (R-1)(C-1) degrees of freedom.

Expected values are obtained by multiplying the row sum by the column sum and dividing by the grand total.

Data

              Very severe   Moderately severe   Mildly severe   Total
Treatment A   13            24                  18              55
Treatment B   19            20                  12              51
Total         32            44                  30              106

Expected values

              Very severe   Moderately severe   Mildly severe   Total
Treatment A   16.60         22.83               15.57           55
Treatment B   15.40         21.17               14.43           51
Total         32            44                  30              106
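The expected values and the chi-square statistic for the severity table can be computed as in the sketch below. The variable names are illustrative; the observed counts are those of the Data table.

```python
# Observed counts: rows are treatments, columns are very / moderately / mildly severe.
observed = [
    [13, 24, 18],   # Treatment A
    [19, 20, 12],   # Treatment B
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value = row sum x column sum / grand total.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

# Chi-square = sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (R-1)(C-1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([[round(e, 2) for e in row] for row in expected])
print(round(chi_square, 2), "with", df, "degrees of freedom")
```

The calculated chi-square is then compared with the tabulated value for the same degrees of freedom.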

Thank You


References:
1. Khan and Khanum, Fundamentals of Biostatistics, third edition, 2009.
2. Sanford Bolton and Charles Bon, Pharmaceutical Statistics: Practical and Clinical Applications, fourth edition.

First, samples need to be collected and analysed by an appropriate method. The data is then fitted to a frequency curve or to whichever distribution is appropriate. The samples are then characterized by different methods, such as parametric methods, nonparametric methods, and control charts. From these methods, or from the control chart limits, you can find whether the samples or data show significance or not.
