INTRODUCTION
Q.1: Explain the concept of data processing and analyse in detail the
different stages in data processing.
Data Processing
Data reduction or processing mainly involves the various manipulations necessary for preparing the
data for analysis. The process (of manipulation) could be manual or electronic. It involves
editing, categorising open-ended questions, coding, computerisation and the preparation of
tables and diagrams.

Some answers are internally inconsistent: for example, a family which educates its children in a
costly private school cannot survive on a monthly expense of Rs. 2500. Such answers need editing.
[Flowchart: stages of data processing — editing, coding, computer feeding, data distribution,
tabulation, frequency distribution (univariate, bivariate, multivariate), categorisation,
measurement, data analysis, diagrammatic representation and data interpretation.]
Editing is required for proper coding and entering the data in the computer (when the decision is
taken not to analyse the data manually). Editing thus ensures that the data are complete,
error-free, readable and worthy of being assigned a code. The editing process begins in the field
itself. Interviewers, soon after completing the interviews, should check the completed forms for
errors and omissions. They can complete the incomplete responses and reduce the number of
non-responses with rapid follow-up, stimulated by field editing. In many cases, field editing may
not be possible. In such cases, in-house editing may help.
Editing also occurs simultaneously with forming categories, e.g. the age given by respondents
may be put in the categories of below 18 years (very young), 18-30 years (young), 30-40 years
(early middle-aged), 40-50 years (late middle-aged) and above 50 years (old). Field
supervisors can do editing in the field itself by re-contacting the respondents. Editing can be
done along with coding too.
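The age categories above can be sketched as a simple coding function. The half-open boundary treatment is an assumption, since the ranges in the text overlap at 30, 40 and 50:

```python
def categorise_age(age):
    """Assign an age (in years) to one of the five categories in the text.

    The source ranges overlap at the boundaries (e.g. 30 appears in both
    "18-30" and "30-40"), so half-open intervals [lower, upper) are assumed.
    """
    if age < 18:
        return "very young"
    elif age < 30:
        return "young"
    elif age < 40:
        return "early middle-aged"
    elif age < 50:
        return "late middle-aged"
    return "old"

print(categorise_age(25))  # young
print(categorise_age(45))  # late middle-aged
```

Editing at this stage simply means running every recorded age through one consistent rule, so that no respondent can end up in two categories.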
Editing also requires re-arranging answers to open-ended questions. Sometimes a "don't know"
answer is edited to "no response". This is wrong. "Don't know" means the respondent is not sure
and is in two minds about his reaction, or is not able to formulate a clear-cut opinion, or
considers the question personal and does not want to answer it. "No response" means that the
respondent is not familiar with the situation/object/individual about which he is asked.
Coding of Data
Coding is translating answers into numerical values, or assigning numbers to
the various categories of a variable to be used in data analysis. Coding is
generally done while preparing the questions and before finalising the
questionnaires and interview schedules. Fieldwork is thus done with
precoded questions. However, sometimes, when questions are not precoded,
coding is done after the fieldwork. Coding is done on the basis of the
instructions given in the codebook. The codebook gives a numerical code for
each variable.
Coding is done by using a codebook, a code sheet and a computer card. The
codebook explains how to assign numerical codes to the response categories
received in the questionnaire/schedule. It also indicates the location of a
variable on the computer cards. A code sheet is a sheet used to transfer data
from the original source (i.e. questionnaire/schedule, etc.) to the cards.
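A codebook can be sketched as a mapping from response categories to numeric codes. The variable names, categories and codes below are illustrative assumptions, not taken from any actual codebook:

```python
# Illustrative codebook: each variable maps verbal response categories
# to the numeric codes used in analysis.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "opinion": {"agree": 1, "disagree": 2, "don't know": 8, "no response": 9},
}

def code_response(variable, answer):
    """Translate a verbal answer into its numeric code via the codebook."""
    return CODEBOOK[variable][answer.strip().lower()]

coded = [code_response("opinion", a) for a in ["Agree", "don't know", "Disagree"]]
print(coded)  # [1, 8, 2]
```

Note that "don't know" (8) and "no response" (9) get distinct codes, preserving the distinction the editing discussion above insists on.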
Tabulation of Data
After editing, which ensures that the information on the schedule is accurate
and categorised in a suitable form, the data are put together in some kind of
tables and may also undergo some other forms of statistical analysis.

Line Graphs
In a line graph, values of the independent variable are plotted on the x-axis
and those of the dependent variable on the y-axis. Graph 1 shows the
number of cognizable crimes in India in the last 40 years.
Sometimes a multiple line graph is also used for indicating comparisons
between two or more elements, as shown in Graph 2.
Histograms
In a histogram, the values of variables are presented in vertical bars drawn
adjacent to each other, as shown in Diagram 3. The difference between a
graph and a histogram is that in a graph, individual points are plotted,
whereas in a histogram adjacent bars represent the values.
Stages in Analysis
The analysis of a research study is done in four stages. These are (i) categorization,
(ii) frequency distribution, (iii) measurement, and (iv) interpretation.
Categorization
Measurement
In an interval scale, the distances between any two adjacent categories are
equal. The ratio scale is used for determining ratios of the numbers assigned
to categories.
Interpretation
Interpretation of data can be descriptive or analytical, or it can be from a
theoretical standpoint. Negative results are much harder to interpret than
positive results (i.e., when the data support the hypotheses).
Exhibit
Annual contract salary for a nationwide sample of public school teachers for school year 1975-76

Annual          Frequency  Midpoint  Limit     Relative   Cumulative relative
salary                                         frequency  frequency
(1)             (2)        (3)       (4)       (5)        (6)
                                     20,999.5             1.001
19,000-20,999   43         19,999.5            0.033
                                     18,999.5             0.968
17,000-18,999   98         17,999.5            0.075
                                     16,999.5             0.893
15,000-16,999   125        15,999.5            0.095
                                     14,999.5             0.798
13,000-14,999   179        13,999.5            0.137
                                     12,999.5             0.661
11,000-12,999   275        11,999.5            0.210
                                     10,999.5             0.451
9,000-10,999    363        9,999.5             0.277
                                     8,999.5              0.174
7,000-8,999     224        7,999.5             0.171
                                     6,999.5              0.003
5,000-6,999     4          5,999.5             0.003
                                     4,999.5
TOTAL           1,311                          1.001
Column 2 indicates the frequency, or number, of teachers with contract salaries in each of the
eight intervals. For example, 98 teachers earned salaries between $17,000 and $18,999. Columns 1
and 2 taken together indicate the frequency with which individual teacher salaries fall within
each of the intervals and are referred to as a frequency distribution.
Because these data have been grouped, it is impossible to identify where within an interval each
of the 98 teachers is located. Two assumptions are usually made in dealing with grouped data.

The units within an interval are assumed to have values of the variable that average out to the
midpoint of the interval. Midpoints for the intervals are obtained by averaging the lower and
upper values of the interval. (For example, the midpoint of the interval containing 275 teachers'
salaries is the average of $11,000 and $12,999 (Row 5), or $11,999.5.) Successive midpoints
appear in column 3.
Limits dividing the successive intervals are halfway between the upper value in one interval and
the lower value in the next higher interval. For example, the limit dividing the intervals with
179 and 125 teachers is halfway between $14,999 (Row 4) and $15,000 (Row 3), clearly at
$14,999.50. Successive interval limits are given in column 4.
Though not usually presented in a report, the entries in column 6 are necessary to develop one of
the graphic methods presented in the next section. Column 6, most frequently referred to as the
cumulative relative frequency distribution, is obtained by cumulating successively the entries
in column 5, starting with the lowest value and moving toward the highest value of the variable
of interest. These successive cumulations are then placed beside the successive limits dividing
adjacent intervals. For example, the entry of 0.451 in the last column, opposite the entry of
$10,999.50, was obtained by adding together 0.003 + 0.171 + 0.277. This quantity, 0.451, is the
proportion of teachers in the sample with salaries less than $10,999.50; that is, 45.1% of the
teachers in the sample fall in the first three class intervals.
Columns 1, 2 and 5 are those most frequently encountered in a tabular display of data of this
type, namely, of a variable. Characteristics that are attributes lend themselves to slightly
different tables.
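The construction of columns 3, 5 and 6 can be checked with a short sketch. The lowest interval is read here as 5,000-6,999 with a frequency of 4, an inference from the midpoint ($5,999.5), the limits and the total of 1,311:

```python
# Recomputing columns 3, 5 and 6 of the Exhibit from columns 1 and 2.
intervals = [(5000, 6999), (7000, 8999), (9000, 10999), (11000, 12999),
             (13000, 14999), (15000, 16999), (17000, 18999), (19000, 20999)]
freqs = [4, 224, 363, 275, 179, 125, 98, 43]

n = sum(freqs)                                    # 1,311 teachers in total
midpoints = [(lo + hi) / 2 for lo, hi in intervals]   # column 3
rel = [f / n for f in freqs]                          # column 5

cum = []                                          # column 6: cumulate from the
total = 0.0                                       # lowest interval upward
for r in rel:
    total += r
    cum.append(total)

print(round(midpoints[3], 1))   # midpoint of 11,000-12,999
print(round(cum[2], 3))         # proportion earning less than $10,999.50
```

The third cumulation reproduces the 0.451 discussed above: 45.1% of the sample falls in the first three class intervals.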
Presenting results
Frequencies and numbers can often be presented more clearly in charts than in words. Consider,
for example, the figure below, which presents the age distribution of respondents.

[Figure: bar chart showing the percentage of respondents in each age group: under 29 years,
30-39 years, 40-49 years, 50-59 years and over 60 years.]
Part of the analysis is the calculation of simple measures like averages, measures of dispersion,
percentages, correlation, etc. Hence statistical analysis forms a part of survey analysis.

The problems raised by the analysis of data are directly related to the complexity of the
hypothesis. Problems of data analysis involve the entire range of questions raised in research
design, from secondary analysis to the designing and redesigning of substitutes for the
controlled experiment.
After collecting the data from a representative sample of the population, the next step is to
analyse them to test the research hypotheses. However, before analysing the data to test
hypotheses, some preliminary steps need to be completed. These help to ensure that the data
are reasonably good and of assured quality for further analysis. There are four such steps,
namely:
1)
2)
3)
4)
the way to this knowledge. Thus the task of analysis can hardly be said to be complete without
interpretation coming in to illuminate the results.

Analysis of data is the most skilled task of all the stages of research. It is a task calling for
the researcher's own judgment and skill, and should be done by the researcher himself. Proper
analysis requires familiarity with the background of the survey and with all its stages. The
analysis need not necessarily be statistical, as both quantitative and non-quantitative
(qualitative) methods can be used.
The steps in the analysis of data depend upon the type of study. In case there is a set of clearly
formulated hypotheses, then each hypothesis can be seen as a norm prescribing a certain action to
be taken vis-à-vis the data. The more specific the hypothesis, the more specific the action. In
such a study, analysis of data is almost completely a mechanical procedure. Part of the analysis
is working out statistical distributions, constructing diagrams and calculating simple measures
like averages, measures of dispersion, percentages, correlation, etc. Hence statistical analysis
forms a part of survey analysis. The analysis means verification of hypotheses.
Analysis of data is one of the most important aspects of research. Since it is a highly
skilled and technical job, it should be carried out by the researcher himself or under
his supervision. It demands a deep and intense knowledge, on the part of the researcher,
of the data to be analysed. The researcher should also possess judgment skill and the
ability to generalise, and should be familiar with the background, objectives and
hypotheses of the study.
Data, facts and figures are silent; they never speak for themselves, but they have
complexities. It is through systematic analysis that the important characteristics which
are hidden in the data are brought out and valid generalizations are drawn. Analysis
demands a thorough knowledge of one's data. Without deep knowledge, the analysis
is likely to be aimless. It is only by organizing, analyzing and interpreting the
research data that we come to know their important features, inter-relationships and
cause-effect relationships. The trends and sequences inherent in the phenomena are
elaborated by means of generalization.
According to P. V. Young, the function of systematic analysis is to build an
intellectual edifice in which properly sorted and sifted facts and figures are placed in
their appropriate settings, so that broader generalizations beyond the immediate contents
of the facts under study, and consistent relationships, can be drawn.

We should remember that the steps envisaged in the analysis of data will vary
depending on the type of study. A set of clearly formulated hypotheses to start the
study with presents a norm prescribing a certain action to be taken. The more specific
the hypothesis, the more specific the action; in such types of studies, the analysis of
data is almost completely a mechanical procedure.
The most difficult task in the analysis and interpretation of data is the establishment of
cause-and-effect relationships, especially in the case of social and personal problems. Research
problems do not necessarily have one factor or a set of factors; they arise due to a
complex variety of factors and sequences. Karl Pearson has observed: "No phenomenon or
stage in sequence has only one cause; all antecedent stages are successive causes. When we
scientifically state causes we are really describing the successive stages of a routine of
experience."
In fact, human behaviour cannot be reduced to or explained by cause-effect sequences alone, as we
face difficulties in detecting the factors and in establishing cause-and-effect relationships:
the nature of these factors differs from one individual to another, and cause and effect are
inter-dependent, i.e., one stimulates the other.
Types of Analysis
Analysis of survey or experimental data involves estimating the values of unknown parameters
of the population and testing of hypotheses for drawing inferences. Analysis may be categorised
as:
1. Descriptive Analysis:
It is largely a study of the distributions of one or more variables. Such a study provides
profiles of a business group, work group, persons or other subjects on any of a multitude of
characteristics such as size, composition, efficiency, preferences, etc. Various measures that
show the size and shape of a distribution, along with the study of the relationship between two
or more variables, are available from this analysis.
2. Inferential Analysis:
It is concerned with the various tests of significance for testing hypotheses, in order to
determine with what validity the data can indicate some conclusion or conclusions. It is also
concerned with the estimation of population values. It is mainly on the basis of inferential
analysis that the task of interpretation is performed.
3. Correlation Analysis:
It studies the joint variation of two or more variables, for determining the amount of
correlation between them.
4. Causal Analysis:
It is concerned with the study of how one or more variables affect changes in another variable.
It is a study of the functional relationship existing between two or more variables.
5. Multivariate Analysis:
With the availability of computer facilities, multivariate analysis has developed; it means the
use of statistical methods which analyse more than two variables on a sample of observations.
These include:
a) Multiple Discriminant Analysis:
It is suitable when the researcher has a single dependent variable that cannot be measured, but
can be classified into two or more groups on the basis of some attribute. The objective of this
analysis is to predict an organization's probability of belonging to a particular group based on
several predictor variables.
b) Multiple Regression Analysis:
It is suitable when the researcher has one dependent variable which is presumed to be a function
of two or more independent variables. The objective of this analysis is to make a prediction about
the dependent variable based on its covariance with all the concerned independent variables.
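A minimal multiple-regression sketch, using fabricated data: y is constructed as an exact function of two independent variables, y = 1 + 2·x1 + 3·x2, so the least-squares fit should recover those coefficients:

```python
import numpy as np

# Fabricated data for two independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2          # dependent variable, an exact function of x1, x2

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(coef, 6))         # intercept and the two slope coefficients
```

With real survey data the fit would not be exact; the estimated coefficients would then be used to predict the dependent variable for new combinations of the independent variables.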
c) Multivariate Analysis of Variance (MANOVA):
This analysis is an extension of two-way ANOVA, wherein the ratio of among-group variance to
within-group variance is worked out on a set of variables.
d) Canonical Analysis:
This analysis can be used in case of both measurable and non-measurable variables for the
purpose of simultaneously predicting a set of dependent variables from their joint covariance
with a set of independent variables.
Q.5: What are the various methods of tabulation? Explain the
significance of processing the data. Discuss the role of the computer
in data processing and analysis, and explain the need for statistical
techniques in research.
Tabulation
The process of placing classified data into tabular form is known as tabulation. A table is a
systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements whereas columns are vertical arrangements. A table may be simple, double or complex,
depending upon the type of classification.
Basic description
A table consists of an ordered arrangement of rows and columns. This is a simplified description
of the most basic kind of table. Certain considerations follow from this simplified description:
the term row has several common synonyms (e.g., record, k-tuple, n-tuple, vector);
the term column has several common synonyms (e.g., field, parameter, property,
attribute);
The elements of a table may be grouped, segmented, or arranged in many different ways, and
even nested recursively. Additionally, a table may include metadata, annotations, a header,[6] a
footer or other ancillary features.
Simple table
The following illustrates a simple table with three columns and six rows. The first row is not
counted, because it is only used to display the column names. This is traditionally called a
"header row".
An example of a table containing rows with summary information. The summary information
consists of subtotals that are combined from previous rows within the same column.
The concept of dimension is also a part of basic terminology.[7] Any "simple" table can be
represented as a "multi-dimensional" table by normalizing the data values into ordered
hierarchies. A common example of such a table is a multiplication table.
Multiplication table

×   1   2   3
1   1   2   3
2   2   4   6
3   3   6   9
NOTE: Multidimensional tables (2-dimensional in the example) are created under the condition
that the coordinates, i.e. each combination of the basic headers (margins), give a unique
attached value. Each combination of a value from the header row (row 0, for lack of a better
term) and a value from the header column (column 0, for lack of a better term) is related to a
unique value represented in the table:

column 1 and row 1 will only correspond to the value 1 (and no other)
column 1 and row 2 will only correspond to the value 2 (and no other), etc.

If this condition does not hold, it is necessary to insert extra columns or rows, which
increases the size of the table with plenty of empty cells.
To illustrate how a simple table can be transformed into a multi-dimensional table, consider the
following transformation of the Age table.

Modified Age Table (names only)

+        1                2                  3
Nancy    Nancy Davolio    Nancy Klondike     Nancy Obesanjo
Justin   Justin Saunders  Justin Timberland  Justin Daviolio
This is structurally identical to the multiplication table, except that it uses concatenation
instead of multiplication as the operator, and first and last names instead of integers as the
operands.
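Both tables can be sketched as mappings from header pairs to a unique cell value; the specific pairing of first and last names below is illustrative:

```python
# The multiplication table as a mapping: each (row header, column header)
# pair corresponds to exactly one cell value.
mult = {(r, c): r * c for r in (1, 2, 3) for c in (1, 2, 3)}
print(mult[(2, 3)])  # 6

# The same structure with concatenation as the operator and names as the
# operands (an illustrative pairing, as in the Modified Age Table).
first_names = ["Nancy", "Justin"]
last_names = ["Davolio", "Saunders", "Klondike"]
cells = {(f, l): f + " " + l for f in first_names for l in last_names}
print(cells[("Justin", "Saunders")])  # Justin Saunders
```

The dictionary keys make the uniqueness condition explicit: a given combination of headers can never map to two different values.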
Wide and Narrow Tables
Tables can be described as wide or narrow in format. A wide format has a separate column for each
data variable, whereas a narrow format has one column for all the variable values and another
column for the context of that value. See Wide and Narrow Data.
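The wide-to-narrow transformation can be sketched in a few lines; the field names and data below are illustrative assumptions:

```python
# A wide table: one column per variable.
wide = [
    {"name": "Nancy", "age": 33, "city": "Seattle"},
    {"name": "Justin", "age": 29, "city": "London"},
]

# The narrow equivalent: one row per (variable, value) pair, with the
# variable name providing the context for each value.
narrow = [
    {"name": row["name"], "variable": var, "value": row[var]}
    for row in wide
    for var in ("age", "city")
]

for record in narrow:
    print(record)
```

Each wide row becomes as many narrow rows as there are variables, which is why narrow tables are longer but structurally simpler.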
Importance Of Tabulation
There are no hard and fast rules for preparing a statistical table. Prof. Bowley has rightly
pointed out: "In collection and tabulation, common sense is the chief requisite and experience is
the chief teacher." However, the following points should be borne in mind while preparing a table.
(i) A good table must contain all the essential parts, such as a table number, title, head note,
caption, stub, body, foot note and source note.
(ii) A good table should be simple to understand. It should also be compact, complete and
self-explanatory.
(iii) A good table should be of proper size. There should be proper space for rows and columns.
One table should not be overloaded with details. Sometimes it is difficult to present the entire
data in a single table; in that case, the data should be divided among two or more tables.
(iv) A good table must have an attractive get-up. It should be prepared in such a manner that a
scholar can understand the problem without any strain.
(v) Rows and columns of a table must be numbered.
(vi) In all tables the captions and stubs should be arranged in some systematic manner. The
manner of presentation may be alphabetically, or chronologically depending upon the
requirement.
(vii) The unit of measurement should be mentioned in the head note.
(viii) The figures should be rounded off to the nearest hundred, thousand or lakh. This helps in
avoiding unnecessary detail.
(ix) Percentages and ratios should be computed. The percentage of each item's value to the total
should be given in parentheses just below the value.
(x) In case of non-availability of information, one should write N.A. or indicate it by dash (-).
(xi) Ditto marks should be avoided in a table. Similarly, the expression "etc." should not be
used in a table.
Image processing may be a minor job but it can greatly affect the marketing of your company.
Making high quality images and putting them in catalogs and brochures will surely get the
attention of your target clients and customers.
There are many benefits that you can get from data processing. First, the important data in your
company will be converted into a standard format that is understandable to you and your
employees. Since all the sets of information are in a standard electronic format, you can make a
back-up copy to use in case of data loss. These sets of information are ensured to be accurate,
so that you can make your decisions correctly. Lastly, you will save time, effort and money
because of data processing. You can also say goodbye to lost opportunities.
Statistical hypothesis testing plays an important role in the whole of statistics and in statistical
inference. For example, Lehmann (1992) in a review of the fundamental paper by Neyman and
Pearson (1933) says: "Nevertheless, despite their shortcomings, the new paradigm formulated in
the 1933 paper, and the many developments carried out within its framework continue to play a
central role in both the theory and practice of statistics and can be expected to do so in the
foreseeable future".
Significance testing has been the favored statistical tool in some experimental social sciences
(over 90% of articles in the Journal of Applied Psychology during the early 1990s).[11] Other
fields have favored the estimation of parameters (e.g., effect size). Significance testing is used as
a substitute for the traditional comparison of predicted value and experimental result at the core
of the scientific method. When theory is only capable of predicting the sign of a relationship, a
directional (one-sided) hypothesis test can be configured so that only a statistically significant
result supports theory. This form of theory appraisal is the most heavily criticized application of
hypothesis testing.
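A directional test of the kind described can be sketched as a one-sided z-test. A known population standard deviation is assumed, and all the numbers are illustrative:

```python
from math import erf, sqrt

# Illustrative one-sided z-test: does the sample mean exceed the
# hypothesised population mean mu0?
mu0 = 100.0          # hypothesised population mean (H0)
sigma = 15.0         # population standard deviation, assumed known
n = 36               # sample size
sample_mean = 106.0  # observed sample mean

z = (sample_mean - mu0) / (sigma / sqrt(n))      # test statistic
p_one_sided = 1 - 0.5 * (1 + erf(z / sqrt(2)))   # P(Z >= z) under H0

print(round(z, 2))        # 2.4
print(p_one_sided < 0.05)  # True: significant at the 5% level
```

Because only one tail is examined, a significant result supports the theory only when the deviation is in the predicted direction, which is precisely the feature critics of this form of theory appraisal object to.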