
History and origin of statistics:

As regards the historical origin of the statistical theory underlying the methods of this book, some
misapprehensions have occasionally gained publicity, ascribing to the originality of the author
methods well known to some previous writers, or ascribing to his predecessors modern
developments of which they were quite unaware. It is therefore hoped that the following notes on the
principal contributors to statistical theory will be of value to students who wish to see the modern
work in its historical setting.

Thomas Bayes and statistics


Thomas Bayes' celebrated essay published in 1763 is well known as containing the first attempt
to use the theory of probability as an instrument of inductive reasoning; that is, for arguing from
the particular to the general, or from the sample to the population. It was published
posthumously, and we do not know what views Bayes would have expressed had he lived to
publish on the subject. We do know that the reason for his hesitation to publish was his
dissatisfaction with the postulate required for the celebrated "Bayes' Theorem." While we must
reject this postulate, we should also recognise Bayes' greatness in perceiving the problem to be
solved, in making an ingenious attempt at its solution, and finally in realising more clearly than
many subsequent writers the underlying weakness of his attempt.
Whereas Bayes excelled in logical penetration, Laplace (1820) was unrivalled for his mastery of
analytic technique. He admitted the principle of inverse probability, quite uncritically, into the
foundations of his exposition. On the other hand, it is to him we owe the principle that the
distribution of a quantity compounded of independent parts shows a whole series of features - the
mean, variance, and other cumulants - which are simply the sums of like features of the
distributions of the parts. These seem to have been later discovered independently by Thiele
(1889), but mathematically Laplace's methods were more powerful than Thiele's and far more
influential on the development of the subject in France and England. A direct result of Laplace's
study of the distribution of the resultant of numerous independent causes was the recognition of
the normal law of error, a law more usually ascribed, with some reason, to his great
contemporary, Gauss.
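
In modern notation, this additivity can be stated directly (a standard result given here for orientation, not a quotation from Laplace or Thiele): for independent random variables X and Y,

    \mathrm{E}[X+Y] = \mathrm{E}[X] + \mathrm{E}[Y], \qquad
    \operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y), \qquad
    \kappa_n(X+Y) = \kappa_n(X) + \kappa_n(Y) \quad \text{for every order } n \ge 1,

where \kappa_n denotes the n-th cumulant, the mean and variance being the first two.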
Gauss, moreover, approached the problem of statistical estimation in an empirical spirit, raising
the question of the estimation not only of probabilities but of other quantitative parameters. He
perceived the aptness for this purpose of the Method of Maximum Likelihood, although he
attempted to derive and justify this method from the principle of inverse probability. The method
has been attacked on this ground, but it has no real connection with inverse probability. Gauss,
further, perfected the systematic fitting of regression formulae, simple and multiple, by the
method of least squares, which, in the cases to which it is appropriate, is a particular example of
the method of maximum likelihood.
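
The connection alluded to here can be sketched briefly (a standard textbook argument, not the author's own wording; the notation f(x_i; \beta) for the fitted regression value and \sigma^2 for the common error variance is introduced only for illustration). If the observations y_i have independent normal errors about the regression value, the log-likelihood is

    \ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2)
        - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\beta)\bigr)^2 ,

so for any fixed \sigma^2 the likelihood is maximised in \beta precisely by minimising the sum of squared residuals: under normal errors, least squares is the maximum-likelihood fit.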
The first of the distributions characteristic of modern tests of significance, though originating
with Helmert, was rediscovered by K Pearson in 1900, for the measure of discrepancy between
observation and hypothesis, known as χ². This, I believe, is the great contribution to statistical
methods by which the unsurpassed energy of Prof Pearson's work will be remembered. It
supplies an exact and objective measure of the joint discrepancy from their expectations of a
number of normally distributed, and mutually correlated, variates. In its primary application to
frequencies, which are discontinuous variates, the distribution is necessarily only an approximate
one, but when small frequencies are excluded the approximation is satisfactory. The distribution
is exact for other problems solved later. With respect to frequencies, the apparent goodness of fit
is often exaggerated by the inclusion of vacant or nearly vacant classes which contribute little or
nothing to the observed χ², but increase its expectation, and by the neglect of the effect on this
expectation of adjusting the parameters of the population to fit those of the sample. The need for
correction on this score was for long ignored, and later disputed, but is now, I believe, admitted.
The chief cause of error tending to lower the apparent goodness of fit is the use of inefficient
methods of fitting. This limitation could scarcely have been foreseen in 1900, when the very
rudiments of the theory of estimation were unknown.
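
As a concrete illustration of the degrees-of-freedom point, here is a minimal sketch of a χ² goodness-of-fit test in Python, using hypothetical count data, a Poisson model fitted by its sample mean, and the scipy.stats.chisquare routine (the data and the choice of model are assumptions for illustration, not taken from the text):

    # Goodness-of-fit of a Poisson model to hypothetical count data,
    # with the degrees of freedom reduced for the one fitted parameter.
    import numpy as np
    from scipy import stats

    counts = np.array([0, 1, 2, 3, 4])            # events per interval; last class means "4 or more"
    observed = np.array([109, 65, 22, 3, 1])      # hypothetical observed frequencies
    n = observed.sum()

    # Fit the Poisson mean by the sample mean (the maximum-likelihood estimate).
    lam = (counts * observed).sum() / n

    # Expected frequencies under the fitted Poisson; the tail probability goes to the last class.
    probs = stats.poisson.pmf(counts[:-1], lam)
    probs = np.append(probs, 1 - probs.sum())
    expected = n * probs

    # ddof=1 removes one degree of freedom for the estimated mean -
    # the correction for fitted parameters described above.  In practice the
    # nearly empty top classes would also be pooled, as the text notes.
    stat, p_value = stats.chisquare(observed, expected, ddof=1)
    print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")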

The Probable Error of a Mean:


The study of the exact sampling distributions of statistics commences in 1908 with "Student's"
paper The Probable Error of a Mean. Once the true nature of the problem was indicated, a large
number of sampling problems were within reach of mathematical solution. "Student" himself
gave in this and a subsequent paper the correct solutions for three such problems - the
distribution of the estimate of the variance, that of the mean divided by its estimated standard
deviation, and that of the estimated correlation coefficient between independent variates. These
sufficed to establish the position of the distributions of χ² and of t in the theory of samples,
though further work was needed to show how many other problems of testing significance could
be reduced to these same two forms, and to the more inclusive distribution of z. "Student's" work
was not quickly appreciated, and from the first edition it has been one of the chief purposes of
this book to make better known the effect of his researches, and of mathematical work
consequent upon them, on the one hand, in refining the traditional doctrine of the theory of errors
and mathematical statistics, and on the other, in simplifying the arithmetical processes required
in the interpretation of data.
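
A minimal sketch of the quantity "Student" studied, the mean divided by its estimated standard error, using made-up sample values (the data and the hypothesised mean are assumptions for illustration):

    # One-sample t statistic: distance of the sample mean from a hypothesised
    # population mean, in units of its estimated standard error.
    import numpy as np
    from scipy import stats

    sample = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3])   # hypothetical measurements
    mu0 = 5.0                                            # hypothesised population mean

    n = sample.size
    t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)                 # two-sided p-value
    print(f"t = {t:.3f}, p = {p:.3f}")

    # The same test packaged by scipy:
    print(stats.ttest_1samp(sample, mu0))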

Origin of modern statistics:


Although the origins of statistical theory lie in the 18th century advances in
probability, the modern field of statistics only emerged in the late 19th and
early 20th century in three stages. The first wave, at the turn of the century,
was led by the work of Francis Galton and Karl Pearson, who transformed
statistics into a rigorous mathematical discipline used for analysis, not just in
science, but in industry and politics as well. The second wave of the 1910s
and 20s was initiated by William Gosset, and reached its culmination in the
insights of Ronald Fisher. This wave saw the development of better methods for the
design of experiments, hypothesis testing, and techniques for use with small
samples. The final wave, which mainly saw the refinement and
expansion of earlier developments, emerged from the collaborative work
between Egon Pearson and Jerzy Neyman in the 1930s. Today, statistical
methods are applied in all fields that involve decision making: for drawing
accurate inferences from a collated body of data and for making decisions in
the face of uncertainty.

Royal statistical body and statistics:


The first statistical bodies were established in the early 19th century. The Royal Statistical
Society was founded in 1834 and Florence Nightingale, its first female member, pioneered the
application of statistical analysis to health problems for the furtherance of epidemiological
understanding and public health practice. However, the methods then used would not be
considered as modern statistics today.
The Oxford scholar Francis Ysidro Edgeworth's book, Metretike: or The Method of Measuring
Probability and Utility (1887) dealt with probability as the basis of inductive reasoning, and his
later works focused on the 'philosophy of chance'.[16] His first paper on statistics (1883) explored
the law of error (normal distribution), and his Methods of Statistics (1885) introduced an early
version of the t distribution, the Edgeworth expansion, the Edgeworth series, the method of
variate transformation and the asymptotic theory of maximum likelihood estimates.
The Norwegian Anders Nicolai Kiær introduced the concept of stratified sampling in 1895.[17]
Arthur Lyon Bowley introduced new methods of data sampling in 1906 when working on social
statistics. Although statistical surveys of social conditions had started with Charles Booth's "Life
and Labour of the People in London" (1889-1903) and Seebohm Rowntree's "Poverty, A Study
of Town Life" (1901), Bowley's key innovation was the use of random sampling
techniques. His efforts culminated in his New Survey of London Life and Labour.

Francis Galton, a founder of statistics:


Francis Galton is credited as one of the principal founders of statistical theory. His contributions
to the field included introducing the concepts of standard deviation, correlation, regression and
the application of these methods to the study of the variety of human characteristics - height,
weight, eyelash length among others. He found that many of these could be fitted to a normal
distribution curve.[19]
Galton submitted a paper to Nature in 1907 on the usefulness of the median.[20] He examined the
accuracy of 787 guesses of the weight of an ox at a country fair. The actual weight was 1208
pounds: the median guess was 1198. The guesses were markedly non-normally distributed.

Karl Pearson, the founder of mathematical statistics:


Galton's publication of Natural Inheritance in 1889 sparked the interest of a brilliant
mathematician, Karl Pearson, then working at University College London, and he went on to
found the discipline of mathematical statistics. He emphasised the statistical foundation of
scientific laws and promoted its study, and his laboratory drew students from around the
world, including Udny Yule, attracted by his new methods of analysis. His work grew to
encompass the fields of biology, epidemiology, anthropometry, medicine and social history. In
1901, with Walter Weldon, founder of biometry, and Galton, he founded the journal Biometrika
as the first journal of mathematical statistics and biometry.
His work, and that of Galton, underpins many of the 'classical' statistical methods which are in
common use today, including: the correlation coefficient, defined as a product-moment; the
method of moments for the fitting of distributions to samples; Pearson's system of continuous
curves, which forms the basis of the now conventional continuous probability distributions; chi
distance, a precursor and special case of the Mahalanobis distance; and the p-value, defined as the
probability measure of the complement of the ball with the hypothesized value as centre point and
chi distance as radius. He also introduced the term 'standard deviation'.
He also founded statistical hypothesis testing theory, Pearson's chi-squared test and principal
component analysis. In 1911 he founded the world's first university statistics department at
University College London.
Ronald Fisher, "A genius who almost single-handedly created the foundations for modern
statistical science
The second wave of mathematical statistics was pioneered by Ronald Fisher who wrote two
textbooks, Statistical Methods for Research Workers, published in 1925 and The Design of
Experiments in 1935, that were to define the academic discipline in universities around the
world. He also systematized previous results, putting them on a firm mathematical footing. His
seminal 1918 paper, The Correlation between Relatives on the Supposition of Mendelian
Inheritance, was the first to use the statistical term variance. In 1919, at Rothamsted
Experimental Station, he started a major study of the extensive collections of data recorded over
many years. This resulted in a series of reports under the general title Studies in Crop Variation.
In 1930 he published The Genetical Theory of Natural Selection where he applied statistics to
evolution.

Statistics in Real Life:


Statistics has a profound impact on society today. It is a very
practical discipline, concerned with real problems in the real world.
Statistical tables, survey results and the language of probability are used
with increasing frequency by the media. Statistics also has a strong influence
on physical sciences, social sciences, engineering, business and industry. The
improvements in computer technology make it easier than ever to use
statistical methods and to manipulate massive amounts of data.

Decision making:
Sometimes decisions are made under certainty where the outcomes of the strategic options are
fully predictable. They essentially deal with the management of resources, such as inventory
control, where the issue in question is to optimize the available resources to achieve the best
possible outcome. These are classic mathematical programming problems.
Sometimes decisions are made under partial uncertainty where knowledge of the outcomes under
different strategic options is incomplete. A classical case is acceptance testing in quality control
where the decision is to accept or reject the batch in question, and this is a problem for preposterior analysis.
Sometimes decisions are made under risk where only the likelihood of the outcomes of various
strategic options is known. They are typical problems of risk and return in investment, where the
issue in question is to optimize the payoff. The usefulness of statistical methods depends very
much on the validity of quantifying risk with the variance.
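
A minimal sketch of the risk-and-return comparison described above, with hypothetical payoff distributions for two investment options (all figures are assumptions for illustration):

    # Compare two strategic options by expected payoff (return) and variance (risk).
    import numpy as np

    # Hypothetical payoffs and their probabilities for each option.
    options = {
        "A": {"payoff": np.array([80, 100, 120]), "prob": np.array([0.2, 0.6, 0.2])},
        "B": {"payoff": np.array([-50, 100, 300]), "prob": np.array([0.3, 0.4, 0.3])},
    }

    for name, o in options.items():
        mean = np.sum(o["payoff"] * o["prob"])                 # expected return
        var = np.sum(o["prob"] * (o["payoff"] - mean) ** 2)    # variance as the risk measure
        print(f"Option {name}: expected payoff = {mean:.1f}, variance = {var:.1f}")

Option B has the higher expected payoff but a much larger variance, which is exactly the trade-off the validity of the variance as a risk measure rests on.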

STATISTICS FOR FORECASTING


Theories in multiple regression and time series analysis have provided a well-developed
mathematical framework for business forecasting. In practice, a mathematical model
consisting of a set of multiple regression equations is developed from historical data. Beyond the
assumption of linear relationships, there are a number of practical presumptions for statistical
forecasting to be effectively applied. Firstly, both the target variables and explanatory variables are
discretely measurable. Secondly, causal relationships in variable sets are logical in a real
life situation. Thirdly, such relationships will persist into the future.
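
A minimal sketch of the idea, reduced to a single regression equation fitted to hypothetical historical data by least squares and then used to project the target variable forward (the variable names and figures are assumptions for illustration only):

    # Fit y = b0 + b1*x1 + b2*x2 to historical data and forecast a future period.
    import numpy as np

    # Hypothetical history: advertising spend (x1), average price (x2), sales (y).
    x1 = np.array([10, 12, 14, 15, 17, 20], dtype=float)
    x2 = np.array([5.0, 5.2, 5.1, 5.5, 5.4, 5.6])
    y  = np.array([100, 112, 125, 128, 140, 155], dtype=float)

    X = np.column_stack([np.ones_like(x1), x1, x2])      # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)         # least-squares coefficients

    # Forecast for an assumed future period, trusting that the fitted relationship persists.
    x_future = np.array([1.0, 22.0, 5.7])
    print("forecast:", x_future @ coef)

The forecast is only as good as the three presumptions above; if the causal relationships shift, the fitted coefficients no longer describe the future.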
In real life, however, the economic environment is changing. So are consumer tastes and
behaviors. These could upset the demand-supply relationships built up over the years. As
such, causal relations between variables shift continuously over time. The result is that
the future usually bears a closer relationship with the immediate past than with the distant past.
As forecasting models are often built on rather long time series, their
prediction ability is often impaired.

STATISTICS AS A SCIENCE AND ART:


In the face of the inherent inadequacies of statistical applications in the complexity
of the real world, the experience of the users helps bridge the gap between theory and
practice in the use of statistics in business.
The backdating of swap deposit figures in Hong Kong for the period 1981 to 1984
is an illustration. A swap deposit is an arrangement under which a customer purchases
US dollars from a bank to place a time deposit with that bank and at the same time
enters into a forward contract to sell the US dollar proceeds back to the bank for Hong
Kong dollars upon maturity. As the deposit is denominated in US dollars, it is not subject
to Hong Kong dollar deposit rates regulations. With this arrangement, retail Hong Kong
dollar time deposits would effectively enjoy market rates rather than the regulated rates.
Swap deposits were first offered by banks in 1981 when Hong Kong dollar deposits
were highly regulated and subject to withholding tax. Official statistics were only
available.

Predicting Disease
News reports often quote statistics about a disease. If a reporter simply
gives the number of people who have the disease or who have died from it, it is an
interesting fact, but it might not mean much to your life. When statistics become involved, however,
you have a better idea of how that disease may affect you.
For example, studies have shown that 85 to 95 percent of lung cancers are smoking related. The
statistic should tell you that almost all lung cancers are related to smoking and that if you want to
have a good chance of avoiding lung cancer, you shouldn't smoke.

Sampling
When full census data cannot be collected, statisticians collect sample data by developing
specific experiment designs and survey samples. Statistics itself also provides tools for
prediction and forecasting through the use of data and statistical models. To use a sample as a guide
to an entire population, it is important that it truly represents the overall population.
Representative sampling assures that inferences and conclusions can safely extend from the
sample to the population as a whole. A major problem lies in determining the extent to which the
sample chosen is actually representative. Statistics offers methods to estimate and correct for any
bias within the sample and data collection procedures.
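
A minimal sketch of the idea: estimate a population mean from a simple random sample and gauge, through the standard error, how far the sample figure may stray (the population is synthetic and purely for illustration):

    # Estimate a population mean from a simple random sample.
    import numpy as np

    rng = np.random.default_rng(42)
    population = rng.normal(loc=170, scale=10, size=100_000)   # synthetic "census" values

    sample = rng.choice(population, size=400, replace=False)   # simple random sample
    estimate = sample.mean()
    std_error = sample.std(ddof=1) / np.sqrt(sample.size)      # estimated standard error of the mean

    # The 1.96 multiplier assumes the sample mean is approximately normal.
    print(f"population mean ~ {population.mean():.2f}")
    print(f"sample estimate  = {estimate:.2f} +/- {1.96 * std_error:.2f} (95% margin)")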

Medical statistics:
Medical statistics deals with applications of statistics to medicine and the health sciences,
including epidemiology, public health, forensic medicine, and clinical research. Medical
statistics has been a recognized branch of statistics in the United Kingdom for more than 40
years but the term has not come into general use in North America, where the wider term
'biostatistics' is more commonly used.[1] However, "biostatistics" more commonly connotes all
applications of statistics to biology.[1] Medical statistics is a subdiscipline of statistics. "It is
the science of summarizing, collecting, presenting and interpreting data in medical practice, and
using them to estimate the magnitude of associations and test hypotheses. It has a central role in
medical investigations. It not only provides a way of organizing information on a wider and more
formal basis than relying on the exchange of anecdotes and personal experience, but also takes
into account the intrinsic variation inherent in most biological processes."

Application of Statistics in Engineering:

Statistics is a critical tool for robustness analysis, measurement system error
analysis, test data analysis, probabilistic risk assessment, and many other
fields in the engineering world. Traditionally, however, statistics is not
extensively used in undergraduate engineering technology (ET) programs,
resulting in a major disconnect from industry expectations. The research
question of how to effectively integrate statistics into the curricula of ET
programs is the foundation of this paper. Based on the best practices
identified in the literature, a unique learning-by-using approach was
deployed for the Electronics Engineering Technology Program at Texas A&M
University. Simple statistical concepts such as standard deviation of
measurements, signal to noise ratio, and Six Sigma were introduced to
students in different courses. Design of experiments (DOE), regression, and
the Monte Carlo method were illustrated with practical examples before the
students applied the newly understood tools to specific problems faced in
their engineering projects. Industry standard software was used to conduct
statistical analysis on real results from lab exercises.
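
A minimal sketch of two of the concepts mentioned above, the standard deviation of repeated measurements and a Monte Carlo tolerance study, using a hypothetical voltage-divider circuit (the component values, tolerances, and the mean-to-standard-deviation convention for the signal-to-noise ratio are assumptions for illustration):

    # Monte Carlo tolerance analysis of a voltage divider: Vout = Vin * R2 / (R1 + R2).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    vin = 5.0
    r1 = rng.normal(1000, 1000 * 0.05 / 3, n)   # 1 kOhm nominal, ~5% tolerance taken as 3 sigma
    r2 = rng.normal(2000, 2000 * 0.05 / 3, n)   # 2 kOhm nominal

    vout = vin * r2 / (r1 + r2)
    mean, sd = vout.mean(), vout.std(ddof=1)
    print(f"Vout: mean = {mean:.3f} V, standard deviation = {sd:.4f} V")
    print(f"signal-to-noise ratio ~ {20 * np.log10(mean / sd):.1f} dB")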

Engineering statistics combines engineering and statistics:


Design of Experiments (DOE) is a methodology for formulating scientific and engineering
problems using statistical models. The protocol specifies a randomization procedure for the
experiment and specifies the primary data-analysis, particularly in hypothesis testing. In a
secondary analysis, the statistical analyst further examines the data to suggest other questions
and to help plan future experiments. In engineering applications, the goal is often to optimize a
process or product, rather than to subject a scientific hypothesis to a test of its predictive
adequacy. The use of optimal (or near-optimal) designs reduces the cost of experimentation.
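
A minimal sketch of the randomization step for a small two-level factorial experiment (the factor names and levels are hypothetical; a real protocol would also specify replication and blocking):

    # Build a 2^3 full-factorial design and randomize the run order.
    import itertools
    import random

    factors = {"temperature": (150, 180), "pressure": (1.0, 1.5), "catalyst": ("A", "B")}

    # All combinations of low/high levels.
    runs = [dict(zip(factors, levels)) for levels in itertools.product(*factors.values())]

    random.seed(7)
    random.shuffle(runs)          # randomized run order, as the protocol requires

    for i, run in enumerate(runs, start=1):
        print(f"run {i}: {run}")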
1. Quality control and process control use statistics as a tool to manage conformance to
specifications of manufacturing processes and their products.[1][2][3]
2. Time and methods engineering use statistics to study repetitive operations in
manufacturing in order to set standards and find optimum (in some sense) manufacturing
procedures.
3. Reliability engineering, which measures the ability of a system to perform its intended
function (over its intended time) and provides tools for improving performance.[2][7][8][9]
4. Probabilistic design, involving the use of probability in product and system design.
5. System identification uses statistical methods to build mathematical models of dynamical
systems from measured data. System identification also includes the optimal design of
experiments for efficiently generating informative data for fitting such models.[10][11]

Experimental and observational studies


A common goal for a statistical research project is to investigate causality, and in particular to
draw a conclusion on the effect of changes in the values of predictors or independent variables
on dependent variables. There are two major types of causal statistical studies: experimental
studies and observational studies. In both types of studies, the effect of differences of an
independent variable (or variables) on the behavior of the dependent variable is observed. The
difference between the two types lies in how the study is actually conducted. Each can be very
effective. An experimental study involves taking measurements of the system under study,
manipulating the system, and then taking additional measurements using the same procedure to
determine if the manipulation has modified the values of the measurements. In contrast, an
observational study does not involve experimental manipulation. Instead, data are gathered and
correlations between predictors and response are investigated. While the tools of data analysis
work best on data from randomized studies, they are also applied to other kinds of data like
natural experiments and observational studies,[9] for which a statistician would use a modified,
more structured estimation method (e.g., difference-in-differences estimation and instrumental
variables, among many others) that produces consistent estimators.
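
A minimal worked example of the difference-in-differences idea mentioned above, with hypothetical group means before and after an intervention (all numbers are assumptions for illustration):

    # Difference-in-differences: change in the treated group minus change in the controls.
    treated_before, treated_after = 10.0, 14.0   # hypothetical mean outcomes
    control_before, control_after = 9.0, 10.5

    did = (treated_after - treated_before) - (control_after - control_before)
    print(f"estimated treatment effect = {did:.1f}")   # 4.0 - 1.5 = 2.5

The control group's change stands in for what would have happened to the treated group without the intervention, which is what makes the estimator consistent under the usual parallel-trends assumption.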

Geostatistics is a branch of statistics that deals with the analysis of spatial data from
disciplines such as petroleum geology, hydrogeology, hydrology, meteorology,
oceanography, geochemistry and geography.

Operations research (or operational research) is an interdisciplinary branch of applied
mathematics and formal science that uses methods such as mathematical modeling, statistics,
and algorithms to arrive at optimal or near-optimal solutions to complex problems.

Population ecology is a sub-field of ecology that deals with the dynamics of species
populations and how these populations interact with the environment.

Psychometrics is the theory and technique of educational and psychological measurement
of knowledge, abilities, attitudes, and personality traits.

Quality control reviews the factors involved in manufacturing and production; it can
make use of statistical sampling of product items to aid decisions in process control or in
accepting deliveries.

Quantitative psychology is the science of statistically explaining and changing mental
processes and behaviors in humans.

Actuarial science is the discipline that applies mathematical and statistical methods to
assess risk in the insurance and finance industries.

Astrostatistics is the discipline that applies statistical analysis to the understanding of
astronomical data.

Biostatistics is a branch of biology that studies biological phenomena and observations
by means of statistical analysis, and includes medical statistics.

Business analytics is a rapidly developing business process that applies statistical
methods to data sets (often very large) to develop new insights and understanding of
business performance and opportunities.

Chemometrics is the science of relating measurements made on a chemical system or
process to the state of the system via application of mathematical or statistical methods.


Demography is the statistical study of all populations. It can be a very general science
that can be applied to any kind of dynamic population, that is, one that changes over time
or space.

Econometrics is a branch of economics that applies statistical methods to the empirical
study of economic theories and relationships.

Environmental statistics is the application of statistical methods to environmental
science. Weather, climate, air and water quality are included, as are studies of plant and
animal populations.

Epidemiology is the study of factors affecting the health and illness of populations, and
serves as the foundation and logic of interventions made in the interest of public health
and preventive medicine.

Reliability engineering is the study of the ability of a system or component to perform
its required functions under stated conditions for a specified period of time.

Statistical finance, an area of econophysics, is an empirical attempt to shift finance from
its normative roots to a positivist framework using exemplars from statistical physics,
with an emphasis on emergent or collective properties of financial markets.

Statistical mechanics is the application of probability theory, which includes
mathematical tools for dealing with large populations, to the field of mechanics, which is
concerned with the motion of particles or objects when subjected to a force.

Statistical physics is one of the fundamental theories of physics, and uses methods of
probability theory in solving physical problems.

Statistical thermodynamics is the study of the microscopic behaviors of thermodynamic
systems using probability theory, and provides a molecular-level interpretation of
thermodynamic quantities such as work, heat, free energy, and entropy.

