1.Statistics as a science
The purpose of Statistics as a science is to summarize the essential features and relationships in masses of numerical data in
order to identify patterns of behaviour, particular outcomes or future tendencies. Statistics is a way of getting information
from data: it studies large collectivities and is not concerned with individual units. Statistics aims to establish stable characteristics
expressed by a statistical law (conclusion); such a law applies only to the collectivity as a whole, not to individuals.
2.Main objectives of statistics
Statistics, as a science, consists of a very broad collection of methods and is divided into two main parts: Descriptive
Statistics, and Inferential Statistics and decision-making. Each part represents a main step of statistical research.
3.Descriptive Statistics
Descriptive Statistics is a set of techniques used to describe collectivities. It consists of summarizing, presenting,
tabulating and displaying data, followed by the computation of central tendency measures and the analysis and
interpretation of data uniformity, consistency and symmetry. Once the data are gathered, the next main purpose of Statistics is to
process them in a way that allows the analysis procedures to be applied.
4.Inferential Statistics
Inferential Statistics is a set of procedures used to make predictions about a whole population by studying the properties of a
sample drawn from it. Statistical Inference gathers the methods that allow us to: draw conclusions about a population based
on the information characterizing the sample; forecast the evolution of a phenomenon; characterize statistical relations
between variables.
5.Types of Data
Statistical characteristics can take different values or forms from one unit to another, or from one group of units to another.
This is because the factors determining the level or the form act in different ways. The characteristic is therefore
variable, and it is also called a statistical variable. There are many ways to classify statistical
characteristics. A first criterion splits them according to their content into three categories:
1.1. Time variables (e.g. birthday)
1.2. Territorial variables, which show the spatial location of statistical data and units
1.3. Attributive variables: any variable other than time and space
6.Attributive characteristics of variables. Qualitative and Quantitative
According to a second criterion, attributive characteristics can be classified by their form of presentation into:
2.1. Non-numerical or qualitative, represented by words, also called categories. Qualitative characteristics are either
scalable (they can be transformed into numbers according to a scale) or purely non-numerical. Scalable qualitative
characteristics can in turn be binary or non-binary, according to whether they can take two or more than two categories.
2.2. Numerical or quantitative, represented by numbers. Quantitative characteristics can be:
Discrete, taking a finite number of values within an interval of variation, as the result of counting, like the number of
students in a group. Discrete data have distinct values with no intermediate points.
Continuous, taking an infinite number of values within an interval of variation, whether finite or not, like oil production
measured in barrels.
7.Data summarizing: Frequency: absolute, relative, cumulated. Frequency Distribution. By classes
A statistical distribution is a table with two columns: the first column holds the variants or the classes and the
second column holds the frequencies. Frequencies can be absolute, relative or cumulated. The absolute frequency,
denoted by fi, is the number of units that take a certain variant or fall into a certain class. The relative frequency,
denoted by fir, is the share of the absolute frequency of a variant or class in the total of the
frequencies. Cumulated frequencies can be obtained from both absolute and relative frequencies. They can be computed as
"more than the lower limit", meaning the number of units whose variable value is above the lower limit of the current
class, or as "less than the upper limit", meaning the number of units whose variable value is below the upper limit of the
current class.
Data summarizing by classes can be applied only to quantitative characteristics. A class of variation, or interval, is defined
by two boundaries: its lower and upper limits. The class size (interval size) is the difference
between the upper limit and the lower limit. The classification can be continuous (the lower limit of the current class is the
same as the upper limit of the previous class) or discrete (there are gaps between the upper limit of the current class and the
lower limit of the next class). In the discrete case the class size is the difference between two consecutive lower limits or
between two consecutive upper limits.
Data grouping assumes solving a few issues:
a. establishing the purpose of the classification by classes: data summarizing by classes is used in order to obtain synthetic data
b. choosing the classification variable or variables: the classification should produce homogeneous groups
c. establishing the number of classes, according to the classification purpose, the completion criterion (the classification
should comprise all the units) and empirical rules; the result should be a frequency distribution as close as possible to the
normal distribution (Gauss bell)
d. constructing the classes continuously or discretely
e. tallying the frequencies as the units occur
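Steps c to e can be sketched in Python as follows. This is only a minimal example under stated assumptions: the text does not say which empirical rule to use, so Sturges' rule (rounded up) is swapped in as one common choice; the classes are equal-width and continuous; and the data set and the function name group_into_classes are hypothetical.

```python
import math

def group_into_classes(values, k=None):
    """Group values into k equal-width continuous classes and count frequencies."""
    n = len(values)
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal; no classes can be built")
    if k is None:
        # Assumption: Sturges' rule, k = 1 + log2(n), rounded up.
        k = math.ceil(1 + math.log2(n))
    width = (high - low) / k  # class size: upper limit minus lower limit
    freqs = [0] * k
    for v in values:
        # Each class is [lower, upper); the maximum goes into the last class
        # so that the classification comprises all the units.
        idx = min(int((v - low) / width), k - 1)
        freqs[idx] += 1
    classes = [(low + i * width, low + (i + 1) * width) for i in range(k)]
    return classes, freqs

ages = [21, 25, 30, 32, 35, 38, 41, 44, 47, 52, 55, 61]  # hypothetical data
classes, freqs = group_into_classes(ages)
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower:.0f}-{upper:.0f}: {f}")
```

Passing k explicitly overrides the rule, which is useful when the purpose of the classification already dictates the number of classes.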
9.Measures of central tendency for ungrouped data-definition and formula
10.
The median is the value that splits a ranked data set into two equal parts: half of the population has a value of the
characteristic smaller than the median and the other half has a value larger than the median.
The mode is the value of the variable that occurs most frequently in a distribution, that is, the value corresponding to the
highest frequency. The mode is also called the modal value or the dominant value of a data set. Because it is defined through
frequencies, the modal value cannot be determined directly from an ungrouped data set; it can be identified or computed
only for frequency distributions.
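A minimal Python sketch of both definitions, assuming a small hypothetical data set. For an even number of observations the median is taken as the midpoint of the two central ranked values, a common convention that the text does not state. The mode is obtained by first building the frequency distribution, in line with the remark that it is defined only through frequencies.

```python
from collections import Counter

def median(values):
    """Middle value of the ranked data (midpoint of the two central values if n is even)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def modes(values):
    """All variants that share the highest absolute frequency."""
    counts = Counter(values)  # frequency distribution: variant -> frequency
    top = max(counts.values())
    return sorted(v for v, f in counts.items() if f == top)

data = [3, 5, 5, 6, 8, 9, 5, 4]  # hypothetical observations
print(median(data))
print(modes(data))
```

The modes function returns a list because a distribution can have more than one variant with the highest frequency.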
4.7 Comparing the Central Tendency Measures
Mean, median and mode are identical in a symmetrical distribution. When all three measures can be computed, we often must
decide whether to use the mean, the median or the mode as the measure of central tendency.
Analysing a symmetric distribution, we find that all the fundamental central tendency measures are equal and located at the
point that leaves 50% of the observations below it: $\bar{x} = Me = Mo$ and $\bar{x} - Mo = 3(\bar{x} - Me)$, which can also be written as $Mo = 3Me - 2\bar{x}$ or $Me - Mo = 2(\bar{x} - Me)$.
In a skewed distribution the central tendency measures are located in different places. The relations between them depend on
the type of skewness:
When the frequencies are concentrated around the small values of the variable, $Mo < Me < \bar{x}$: the symmetric distribution
has been modified by extending its tail toward $+\infty$ and becomes skewed to the right, called positive skewness.
When the frequencies are concentrated around the large values of the variable, $\bar{x} < Me < Mo$: the symmetric
distribution has been modified by extending its tail toward $-\infty$ and becomes skewed to the left, called
negative skewness.
Choosing between the main central tendency measures when describing the essential features of a data set
should take into account the specific attributes of each measure.
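The ordering rules above can be turned into a small check. The sketch below is illustrative only: the function name skew_direction, the tolerance parameter tol and the sample figures are hypothetical, and orderings that match none of the three cases are reported as undetermined rather than guessed.

```python
def skew_direction(mean, median, mode, tol=1e-9):
    """Classify a distribution from the relative position of mean, median and mode."""
    if abs(mean - median) <= tol and abs(median - mode) <= tol:
        return "symmetric"
    if mode < median < mean:
        # Frequencies concentrated on small values, tail toward +infinity.
        return "positive (right) skewness"
    if mean < median < mode:
        # Frequencies concentrated on large values, tail toward -infinity.
        return "negative (left) skewness"
    return "undetermined"

print(skew_direction(mean=52.0, median=48.0, mode=45.0))
print(skew_direction(mean=40.0, median=44.0, mode=47.0))
print(skew_direction(mean=10.0, median=10.0, mode=10.0))
```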
When computing the mean, we should check for extreme values and, if there are any, eliminate them so that the mean keeps its
representative character. The mean can be computed only if we can compute the overall value of the variable, as $\sum x_i$ or $\sum x_i f_i$,
and we know the number of observations, n.
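The two ways of obtaining the overall value of the variable lead to the same mean, as the following Python sketch shows. The data are hypothetical, and the function names are illustrative only; the second function takes n as the sum of the frequencies.

```python
def mean_ungrouped(values):
    """Mean from raw observations: sum of x_i divided by n."""
    return sum(values) / len(values)

def mean_from_frequencies(variants, freqs):
    """Mean from a frequency distribution: sum of x_i * f_i divided by the sum of f_i."""
    total = sum(x * f for x, f in zip(variants, freqs))
    return total / sum(freqs)

grades = [7, 8, 8, 9, 7, 10, 8, 9, 8, 7]  # hypothetical raw data
variants = [7, 8, 9, 10]                   # the same data as a distribution
freqs = [3, 4, 2, 1]

print(mean_ungrouped(grades))
print(mean_from_frequencies(variants, freqs))
```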
When computing the median, we take into account that this measure is influenced more by the number of observations than by
their values. The median can be computed even if we do not know the extreme recorded values. The mode represents the
most probable value to be encountered, and it can also be computed for a purely qualitative variable measured on a nominal scale.
