
Application of statistics

Siswanto

The term statistics has several meanings. It is frequently used to refer to recorded data, such as the number of traffic accidents or the number of patients visiting a clinic. It is also used to refer to quantities calculated from a set of data, such as the mean or the standard deviation.

In short, statistics is a body of techniques and procedures dealing with the collection, organization, analysis, interpretation and presentation of information that can be stated numerically.

Measures of central tendency


The mean. The arithmetic mean, or mean, is computed by summing all the observations in the sample and dividing the sum by the number of observations. The mean is affected by the value of every observation in the distribution.

The median. When the observations are arranged in an array ranked according to size, the median is the observation that divides the distribution into two equal parts. It is the value above which there are the same number of observations as below.

The mode. The mode is the observation that occurs most frequently.
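The three measures defined above can be illustrated with a minimal sketch using Python's standard `statistics` module. The data set `visits` is made up for illustration and does not come from the slides.

```python
import statistics

# Hypothetical sample: number of clinic visits per patient (made-up values).
visits = [2, 3, 3, 4, 5, 3, 8]

# Mean: sum of all observations divided by the number of observations.
mean_visits = statistics.mean(visits)

# Median: the middle value once the observations are ranked by size.
median_visits = statistics.median(visits)

# Mode: the value that occurs most frequently.
mode_visits = statistics.mode(visits)

print("mean:", mean_visits)
print("median:", median_visits)
print("mode:", mode_visits)
```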

Which average should you use?


The arithmetic mean is the most commonly used. In certain situations, particularly if the distribution of the observations is skewed, the median proves to be a better measure than the mean.
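A small sketch of why the median can be preferable for skewed data. The values in `stay_days` are hypothetical; a single extreme observation pulls the mean far from the bulk of the data while the median hardly moves.

```python
import statistics

# Hypothetical lengths of hospital stay in days (made-up values).
# One very long stay makes the distribution skewed to the right.
stay_days = [2, 3, 3, 4, 4, 5, 60]

mean_stay = statistics.mean(stay_days)      # pulled upward by the extreme value
median_stay = statistics.median(stay_days)  # stays close to a typical patient

print("mean stay:", round(mean_stay, 2))
print("median stay:", median_stay)
```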

Independent and dependent t-test


The two-sample t-test may be either independent (unpaired) or dependent (paired). If the two samples are independent, there is no connection between any subject in group 1 and any subject in group 2.

Paired t-test. In the paired t-test, there is a connection between the scores in one group and the scores in the other. In a t-test the scale of the independent variable is nominal, and the scale of the dependent variable is numeric.
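The difference between the two designs can be sketched by computing the test statistic by hand. This is an illustrative sketch, not a full test: it assumes the pooled-variance (equal variances) form for the independent case, it returns only the t statistic and the degrees of freedom, and the function names and data are hypothetical.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance t statistic for two independent samples (assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

def paired_t(first, second):
    """t statistic for paired samples, based on the within-pair differences."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Independent design: different subjects in each group (made-up scores).
t_ind, df_ind = independent_t([5, 7, 9], [1, 3, 5])

# Paired design: the same five subjects measured twice (made-up blood pressures).
before = [120, 130, 125, 140, 135]
after = [115, 128, 120, 135, 131]
t_pair, df_pair = paired_t(before, after)

print("independent:", round(t_ind, 3), "df =", df_ind)
print("paired:", round(t_pair, 3), "df =", df_pair)
```

The statistic would then be compared with the t distribution for the stated degrees of freedom to obtain a p-value.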

Anova
Anova is a logical extension of the t-test when we have data from three or more independent groups.

Assumptions of Anova
1. The observations are independent: that is, the value of one observation is not correlated with the value of another.
2. The observations in each group are normally distributed.
3. The variance of each group is equal to that of any other group: that is, the variances of the various groups are homogeneous, or we have homogeneity of variances.
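As a hedged illustration of how a one-way Anova compares three or more groups, the sketch below computes the F statistic as the ratio of the between-group mean square to the within-group mean square. The function name `one_way_anova_f` and the data are hypothetical, and the code does not check the assumptions listed above.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three hypothetical independent groups (made-up values).
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print("F =", round(f_value, 3), "df =", (df_b, df_w))
```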

The chi-square test


Rationale for the chi-square test. Although the t-test is popular and widely used, it may not be appropriate for certain health science problems that call for tests of significance, because the t-test requires data that are quantitative.

For qualitative (categorical) data we need another test; one of the most important is the chi-square test. Categorical data are not used to quantify blood pressure level, for example, but rather to classify persons as hypertensive or normotensive.

Types of chi-square test


Specifically, we may employ chi-square to determine the following:
1. Whether the two variables are independent
2. Whether various subgroups are homogeneous
3. Whether there is a significant difference in the proportions in the subclasses among the subgroups
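A minimal sketch of the chi-square statistic for a contingency table, assuming the usual Pearson form (sum of squared differences between observed and expected counts, divided by the expected counts) without a continuity correction. The function name and the counts are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table: rows are two groups of 100 persons,
# columns are hypertensive / normotensive (made-up counts).
observed_counts = [[30, 70],
                   [10, 90]]
chi2_value, chi2_df = chi_square_statistic(observed_counts)
print("chi-square =", chi2_value, "df =", chi2_df)
```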

Correlation and linear regression


Differences between correlation and regression. The two most common methods used to describe the relationship between two quantitative variables (x and y) are linear correlation and linear regression. The former is a statistic that measures the strength of a bivariate association; the latter is a prediction equation that estimates the value of y for any given x.

When should you use correlation and when regression?


Example: We are interested in studying the relationship between the pre-pregnancy weights of a group of mothers and their infants' birth weights. How strong is the association between a mother's weight and her infant's birth weight? The method of choice is to calculate a correlation coefficient as a measure of the strength of association between these two variables.

If instead you ask what an infant's predicted birth weight would be for a mother with a known pre-pregnancy weight, the method of choice is linear regression analysis.
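The two questions can be sketched side by side. The code below computes Pearson's correlation coefficient (strength of association) and a least-squares line (prediction equation) from the same data. The weights are made up for illustration, and `pearson_r` and `fit_line` are hypothetical helper names.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept of the line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: mothers' pre-pregnancy weights and infants' birth weights, in kg.
mother_kg = [50, 55, 60, 65, 70]
infant_kg = [2.8, 3.0, 3.1, 3.4, 3.5]

r = pearson_r(mother_kg, infant_kg)            # how strong is the association?
slope, intercept = fit_line(mother_kg, infant_kg)
predicted = intercept + slope * 62             # predicted birth weight for a 62 kg mother

print("r =", round(r, 3))
print("predicted birth weight:", round(predicted, 3), "kg")
```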

The preferred method, and the one most likely to yield a representative sample, is random sampling. The basic principle of random sampling is that every subject has an equal chance of being selected.

The disadvantage of random sampling: in certain situations and conditions this method is not feasible or is too costly.

TYPES OF PROBABILITY SAMPLING


Simple random sampling: when the population has been enumerated, each individual is assigned a number and a sample of the required size is selected by use of a table of random numbers.
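A minimal sketch of simple random sampling. It swaps the table of random numbers for Python's pseudo-random generator, which is an assumption of this example and not part of the slides. The population of 100 numbered individuals and the seed are made up.

```python
import random

# Enumerated population: each individual is assigned a number from 1 to 100.
population = list(range(1, 101))

# A seeded generator stands in for the table of random numbers
# (the seed only makes the example reproducible).
rng = random.Random(42)

# Draw 10 individuals without replacement; every individual has the same chance.
simple_sample = rng.sample(population, 10)
print(sorted(simple_sample))
```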

Systematic sampling: the first unit is chosen at random and then the other units for the sample are chosen in a systematic way, for example by taking every nth person on a list.
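Systematic sampling can be sketched as a random start followed by a fixed step. The helper `systematic_sample` is a hypothetical name, and the choice of sampling interval as list length divided by sample size (rounded down) is an assumption of this example.

```python
import random

def systematic_sample(units, n, rng):
    """Pick a random starting unit, then take every k-th unit from the list."""
    k = len(units) // n          # sampling interval ("every k-th person")
    start = rng.randrange(k)     # random start within the first interval
    return units[start::k][:n]

# Hypothetical list of 100 numbered persons.
persons = list(range(1, 101))
systematic = systematic_sample(persons, 10, random.Random(7))
print(systematic)
```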

Multistage sampling: sub-sampling within groups chosen as a cluster sample. The first stage is to select the groups or clusters. Sub-samples are then taken in as many subsequent stages as necessary to obtain the desired sample size.

Example:
1st stage: choice of provinces within the country
2nd stage: choice of districts within each province
3rd stage: choice of sub-districts within each district; all persons in the chosen sub-districts then become the sample
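The three stages in the example can be sketched with a nested dictionary standing in for the sampling frame. The structure of `country`, the names in it, the numbers selected at each stage, and the helper `multistage_sample` are all hypothetical.

```python
import random

# Hypothetical frame: 3 provinces x 3 districts x 3 sub-districts x 4 persons each.
country = {
    f"province{p}": {
        f"district{p}{d}": {
            f"sub{p}{d}{s}": [f"person{p}{d}{s}{i}" for i in range(4)]
            for s in range(3)
        }
        for d in range(3)
    }
    for p in range(3)
}

def multistage_sample(frame, n_provinces, n_districts, n_subdistricts, rng):
    """Select provinces, then districts, then sub-districts; keep all persons in the last stage."""
    people = []
    for province in rng.sample(sorted(frame), n_provinces):            # 1st stage
        districts = frame[province]
        for district in rng.sample(sorted(districts), n_districts):    # 2nd stage
            subdistricts = districts[district]
            for sub in rng.sample(sorted(subdistricts), n_subdistricts):  # 3rd stage
                people.extend(subdistricts[sub])
    return people

multistage = multistage_sample(country, 2, 2, 1, random.Random(1))
print(len(multistage), "persons selected")
```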

Stratified sampling: used to achieve an even distribution of people across different groups or strata (e.g. defined on the basis of age, sex, social class or race). Divide the list into the strata of interest and then draw an equal-sized random sample from each stratum.
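A short sketch of the procedure just described: split the list into strata, then draw a random sample of the same size from each stratum. The list `people`, the stratifying variable (sex), and the helper `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, key, per_stratum, rng):
    """Group units into strata by key(unit), then sample per_stratum units from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical list of 40 persons with a recorded sex (made-up data).
people = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(40)]

by_sex = stratified_sample(people, key=lambda person: person["sex"],
                           per_stratum=5, rng=random.Random(3))
for stratum, members in by_sex.items():
    print(stratum, [m["id"] for m in members])
```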

Cluster random sampling: involves choosing groups of units (clusters) at random. All the units in the selected clusters are then included in the study as the sample. It is used when the population is large, and it requires a less detailed sampling frame.
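Cluster sampling can be sketched as choosing whole clusters at random and keeping every unit inside them. Only a list of clusters is needed as the sampling frame. The village and household names and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Choose n_clusters clusters at random and include all of their units."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 6 villages (clusters) with 5 households each.
villages = {
    f"village{v}": [f"village{v}_household{h}" for h in range(5)]
    for v in range(6)
}

chosen_villages, households = cluster_sample(villages, 2, random.Random(5))
print(chosen_villages, len(households))
```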

Non-probability sampling
Non-probability sampling carries the risk that it will not be possible to ascertain the probability that an individual unit will be included in the sample.

The method should therefore be used only when it is not important to generalize the results of a study to a reference population. Example: convenience sampling.

Missing and incomplete data


Avoid missing and incomplete data as much as possible. Missing and incomplete data can introduce bias.

The aim and consideration of sampling


Always try to:
1. Achieve maximum precision within a given sample size
2. Avoid selection bias

Bias may occur if:
1. A non-random method is used
2. The sampling frame is incomplete or inaccurate
3. A section of the population refuses to take part or is impossible to reach

Basic requisites for reliable sample


Efficiency. This means the ability of the sample to yield the desired information. Certain types of sampling are more efficient than others.

Representativeness. A sample should be representative of the reference population so that inferences drawn from the sample can be generalized to that population.

Feasibility. The design should be simple enough to be carried out in practice.

Goal orientation. Sample selection and estimation procedures should be oriented towards the study objectives and research design.

Size. A sample should be large enough to minimize sampling variability and to allow estimates of population characteristics.

Economy and cost efficiency. The design of the sample should be such that appreciable savings in time and cost can be achieved without undermining the study objective.