
THE NATURE OF DATA

APPROACHES TO LANGUAGE
Prescriptive -> judgements about how
speakers should use the language; it defines
the standard variety of the language.

Descriptive -> a scientific description of
phenomena, without judgements about
correctness or appropriateness.
THE DESCRIPTIVE APPROACH
Based on the observation of data and
the formulation of a hypothesis.
The data are then used to confirm or reject
the hypothesis.
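As an illustration of how corpus data can confirm or reject a hypothesis, here is a minimal Python sketch. The three "texts", the hypothesis ("who" is more frequent than "whom") and the helper `count_forms` are all hypothetical and chosen only for the example; a real study would use a much larger corpus and a proper tokenizer.

```python
import re
from collections import Counter

# Hypothetical toy corpus: each string stands in for one text.
corpus = [
    "Who called you? The man who left was tall.",
    "To whom it may concern.",
    "She asked who was coming and who had already left.",
]

def count_forms(texts, forms):
    """Count how often each word form in `forms` occurs across all texts (case-insensitive)."""
    counts = Counter()
    for text in texts:
        # Crude tokenization: runs of letters/apostrophes in the lowercased text.
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in forms:
                counts[token] += 1
    return counts

# Illustrative hypothesis: "who" is more frequent than "whom".
counts = count_forms(corpus, {"who", "whom"})
supported = counts["who"] > counts["whom"]
print(counts["who"], counts["whom"], "supported" if supported else "rejected")
```

On this toy corpus the script prints `4 1 supported`; with different data the same comparison could just as well reject the hypothesis.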
SOURCES OF DATA
Data can be obtained from different
sources:
Our own intuition. We can create data using our
knowledge of a language.
Interviews and questionnaires. We can collect data
by asking speakers of the language questions.
A corpus. This is a collection of texts, which
represent naturally occurring data.
PROPERTIES OF A CORPUS
A corpus can be seen as a sample of the
language (because it is impossible to study the
language as a whole).
Like any sample, a corpus must be
representative. The more representative it is,
the more reliable your conclusions will be.
REPRESENTATIVITY
What makes a corpus more representative?
The number of words. The more words it has, the
more representative it is.
The number of text sources. The more sources
your corpus consists of, the more representative it
is. It is better to have 100 words from 100 sources
than 10,000 words from a single source.
The nature of the sources. A corpus will be more
representative if it contains different text types,
channels, topics, etc.
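The three criteria above can be checked mechanically for a given corpus. The Python sketch below is only an illustration under stated assumptions: the `Text` record, its `source` and `text_type` fields, and the two toy corpora are hypothetical, and words are counted crudely by splitting on whitespace.

```python
from dataclasses import dataclass

@dataclass
class Text:
    source: str     # hypothetical source identifier (e.g. a newspaper or a speaker)
    text_type: str  # hypothetical label such as "news" or "conversation"
    content: str

def profile(corpus):
    """Report the three representativity criteria for a list of Text objects."""
    return {
        "words": sum(len(t.content.split()) for t in corpus),  # whitespace word count
        "sources": len({t.source for t in corpus}),             # distinct sources
        "text_types": len({t.text_type for t in corpus}),       # distinct text types
    }

# Toy corpus drawn from one single source (hypothetical data).
single_source = [
    Text("blog_A", "blog", "I went out today and it rained"),
    Text("blog_A", "blog", "Tomorrow I will stay at home"),
]

# Toy corpus of similar size drawn from varied sources and text types.
varied = [
    Text("news_1", "news", "Prices rose sharply in March"),
    Text("chat_7", "conversation", "yeah I think so too"),
    Text("novel_3", "fiction", "She opened the door slowly"),
]

print(profile(single_source))  # {'words': 13, 'sources': 1, 'text_types': 1}
print(profile(varied))         # {'words': 15, 'sources': 3, 'text_types': 3}
```

The two corpora are almost the same size, but the second scores higher on the number of sources and on the variety of text types, which is why it would be considered more representative.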
