
Statistics is the study of the collection, organization, analysis, and interpretation of data.[1][2]

It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments. Once a sample that is representative of the population has been determined, data are collected for the sample members in an observational or experimental setting. These data can then be subjected to statistical analysis, which serves two related purposes: description and inference.

Descriptive statistics summarize the population data by describing what was observed in the sample numerically or graphically. Numerical descriptors include the mean and standard deviation for continuous data types (like heights or weights), while frequency and percentage are more useful for describing categorical data (like race). Inferential statistics uses patterns in the sample data to draw inferences about the population represented, accounting for randomness. These inferences may take the form of answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), and modeling relationships within the data (for example, using regression analysis). Inference can extend to forecasting, prediction, and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.[13]
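As a minimal sketch of these descriptors, the following Python snippet computes a mean and standard deviation for a continuous variable and frequency percentages for a categorical one. The sample values are made up for illustration:

```python
import statistics
from collections import Counter

# Hypothetical sample data: heights (continuous) and eye colour (categorical).
heights_cm = [162.0, 175.5, 168.3, 181.2, 170.1, 158.9, 177.4]
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "brown"]

# Numerical descriptors for continuous data.
print("mean:", statistics.mean(heights_cm))
print("standard deviation:", statistics.stdev(heights_cm))  # sample std dev (n - 1)

# Frequencies and percentages for categorical data.
for category, n in Counter(eye_colour).items():
    print(f"{category}: {n} ({100 * n / len(eye_colour):.1f}%)")
```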

The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may or may not be the cause of one another. The correlation could be produced by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables.
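As an entirely synthetic sketch of this pitfall, the snippet below generates two variables that are both driven by a hidden third variable; they come out strongly correlated even though neither causes the other. All names and coefficients here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration: a lurking variable drives both observed variables.
lurking = rng.normal(size=500)                    # an unmeasured common factor
income = 2.0 * lurking + rng.normal(size=500)
lifespan = 1.5 * lurking + rng.normal(size=500)

# The two observed variables are strongly correlated...
r = np.corrcoef(income, lifespan)[0, 1]
print(f"correlation(income, lifespan) = {r:.2f}")
# ...even though neither causes the other; both depend on `lurking`.
```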

Both descriptive and inferential statistics look at a sample from some population; the difference between them lies in what they do with that sample.

Descriptive statistics aims to summarize the sample using statistical measures such as the average, median, and standard deviation. For example, if we look at a basketball team's game scores over a year, we can calculate the average score, the variance, and so on, and obtain a description (a statistical profile) of that team. Inferential statistics aims to draw conclusions about the population from the sample at hand. For example, it may try to infer the success rate of a drug in treating high temperature by taking a sample of patients, giving them the drug, and estimating the rate of effectiveness in the population using the rate of effectiveness in the sample.

Types of tests

One-sample tests are appropriate when a sample is being compared to the population from a hypothesis; the population characteristics are known from theory or are calculated from the population.

Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment.

Paired tests are appropriate for comparing two samples where it is impossible to control important variables. Rather than comparing two sets, members are paired between samples so that the difference between the members becomes the sample. Typically the mean of the differences is then compared to zero.

Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation.

T-tests are appropriate for comparing means under relaxed conditions (less is assumed); see the sketch after this list.

Tests of proportions are analogous to tests of means (the 50% proportion).
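As a brief sketch of the mean-comparison tests above, the snippet below runs SciPy's standard one-sample, two-sample, and paired t-tests on made-up data; the sample sizes, means, and seed are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# One-sample test: does the sample mean differ from a hypothesised population mean?
sample = rng.normal(loc=5.2, scale=1.0, size=30)
t, p = stats.ttest_1samp(sample, popmean=5.0)
print(f"one-sample t = {t:.2f}, p = {p:.3f}")

# Two-sample test: experimental vs. control group means.
control = rng.normal(loc=5.0, scale=1.0, size=30)
treated = rng.normal(loc=5.8, scale=1.0, size=30)
t, p = stats.ttest_ind(treated, control)
print(f"two-sample t = {t:.2f}, p = {p:.3f}")

# Paired test: the differences between paired members form the sample,
# and their mean is compared to zero.
before = rng.normal(loc=5.0, scale=1.0, size=30)
after = before + rng.normal(loc=0.5, scale=0.5, size=30)
t, p = stats.ttest_rel(after, before)
print(f"paired t = {t:.2f}, p = {p:.3f}")
```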

Chi-squared tests use the same calculations and the same probability distribution for different applications:

Chi-squared tests for variance are used to determine whether a normal population has a specified variance. The null hypothesis is that it does.

Chi-squared tests of independence are used for deciding whether two variables are associated or independent. The variables are categorical rather than numeric. Such a test can be used to decide whether left-handedness is correlated with libertarian politics (or not). The null hypothesis is that the variables are independent. The numbers used in the calculation are the observed and expected frequencies of occurrence (from contingency tables), as in the sketch below.

Chi-squared goodness-of-fit tests are used to determine the adequacy of curves fit to data. The null hypothesis is that the curve fit is adequate. It is common to determine curve shapes so as to minimize the mean squared error, so it is appropriate that the goodness-of-fit calculation sums the squared errors.
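A minimal sketch of a chi-squared test of independence, using SciPy's chi2_contingency on a hypothetical handedness-by-politics contingency table (the observed counts are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table of observed frequencies:
# rows = handedness (left, right), columns = politics (libertarian, other).
observed = np.array([[12, 38],
                     [45, 205]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
# Under the null hypothesis the variables are independent; a small p-value
# is evidence against independence.
```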

F-tests (analysis of variance, ANOVA) are commonly used when deciding whether groupings of data by category are meaningful. If the variance of test scores of the left-handed students in a class is much smaller than the variance of the whole class, then it may be useful to study left-handed students as a group. The null hypothesis is that the two variances are the same, so the proposed grouping is not meaningful.
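As a sketch, a one-way ANOVA across three hypothetical groups of test scores can be run with SciPy's f_oneway; the scores below are made up for illustration:

```python
from scipy import stats

# Hypothetical test scores for three groupings of a class.
group_a = [72, 85, 78, 90, 66, 81]
group_b = [75, 70, 88, 79, 83, 71]
group_c = [91, 94, 87, 96, 89, 92]

# One-way ANOVA: under the null hypothesis the groups share the same mean,
# i.e. the proposed grouping explains nothing; a small p-value suggests
# the grouping is meaningful.
f, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f:.2f}, p = {p:.3f}")
```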
