
UNIT III

LESSON 16:
DATA CODING AND ANALYSIS

RESEARCH METHODOLOGY

Students, today we shall cover the most crucial step in the research process: data coding and data analysis. This stage of data entry and coding comes once the collection of the desired information is complete, and it prepares the way for the coding and analysis of data.

Once the information is tabulated, it is easy to perform various statistical tests of its validity, accuracy and significance. This step seems very simple, although it is not so. Gathered information should be presented in such a manner that even a layman understands the what, why, when and how of the information.

Data Entry
It is the process of taking completed questionnaires/surveys and putting them into a form that can readily be analyzed.
A series of options needs to be considered when you enter the information you have gathered. You will first have to decide on a file format and then devise a code for analysis.

Decision on File Format
It comprises decisions regarding:
• The way the data will be organized in a file
• Order of information collected
• How the subject is referenced
• Constructing individual records
• History of the 80-column format
• Application to statistics programs

Devise Code for Analysis
The main points you want to remember while devising the code for analysis are:
• Set of rules that translates answers into discrete values
• Alphabetical or numerical, depending on the measurement scale
• Preserve the level of measurement for each item
• General considerations (closed questions):
a. Now, we will discuss these in detail for better understanding.
• First of all, you should try to keep the coding translation simple
• Coding should be done so as to minimize effort and the risk of coding errors
• Remember, at the item level: leave #s as #s (#s can be nominal).
• Perform reverse coding/unfolding of complex response formats.
• At the test level: code questions in order of appearance.
• You have to be consistent in assigning values to similar responses
• You should identify the question groups within the test.
b. It should help in facilitating data interpretation
• How missing data are treated
You should have knowledge about:
i. Non-ascertained information has to be recognized: information not obtained because of interviewer or respondent performance.
• Reason for failure to ask the question
• Failure to obtain an appropriate response
• Refusal to answer the question (coded separately)
ii. Inapplicable information: information that does not apply to a particular respondent
iii. Unknown information: information as to the respondent's claim of awareness (how to treat the "Don't know" option)
c. Entry of Data
• You should fix the number of translation steps between the subject's response and the readable data file
• Computer assisted techniques: 1
• Digital answer format (Scantron):
• Entry by hand: 4
• This impacts the ability to check the quality of data entry (accuracy, reliability)
d. Clean Data File
• You should examine each data file to ensure each record is complete and in order
• You should remove non-legal codes (values not allowed by the coding scheme)
• Then you should replace them with the information from the original response format
• Proper importance should be given to verification

The problem most decision makers must resolve is how to deal with the uncertainty that is inherent in almost all aspects of their jobs. Raw data provide little, if any, information to the decision makers. Thus, they need a means of converting the raw data into useful information. In this lecture note, we will concentrate on some of the frequently used methods of presenting and organizing data.

Frequency Distribution
The easiest method of organizing data is a frequency distribution, which converts raw data into a meaningful pattern for statistical analysis.
The following are the steps of constructing a frequency distribution:
1. Specify the number of class intervals. A class is a group (category) of interest. No totally accepted rule tells us how many intervals are to be used. Between 5 and 15 class intervals are generally recommended. Note that the classes must be both mutually exclusive and all-inclusive. Mutually exclusive means that classes must be selected such that an item can't fall into two classes,
and all-inclusive classes are classes that together contain all the data.
2. When all intervals are to be the same width, the following rule may be used to find the required class interval width:
W = (L - S) / K
where W = class width, L = the largest data value, S = the smallest data value, K = number of classes

Example
Suppose the ages of a sample of 10 students are:
20.9, 18.1, 18.5, 21.3, 19.4, 25.3, 22.0, 23.1, 23.9, and 22.5
We select K = 4 and W = (25.3 - 18.1)/4 = 1.8, which is rounded up to 2. The frequency table is as follows (each class includes its lower limit and excludes its upper limit):

Class Interval     Class Frequency     Relative Frequency
18 up to 20        3                   30%
20 up to 22        2                   20%
22 up to 24        4                   40%
24 up to 26        1                   10%

Note that the sum of all the relative frequencies must always be equal to 1.00 or 100%. In the above example, we see that 40% of all students are at least 22 but younger than 24 years old. Relative frequency may be determined for both quantitative and qualitative data and is a convenient basis for the comparison of similar groups of different size.

What Frequency Distribution Tells Us
1. It shows how the observations cluster around a central value; and
2. It shows the degree of difference between observations.
For example, in the above problem we know that no student is younger than 18 and that ages below 24 are most typical. The most common age is between 22 and 24, which from general information we know to be higher than usual for students who enter college right after high school and graduate at about age 22. The students in the sample are generally older. It is possible that the population is made up of night students who work on their degrees on a part-time basis while holding full-time jobs. This descriptive analysis provides us with an image of the student sample, which is not available from raw data. As we will see in lecture number 3, frequency distribution is the basis for probability theory.

Stated & True Class Limits
True classes are those classes such that the upper true (or real) limit of a class is the same as the lower true limit of the next class. For comparison, the stated class limits and true (real) class limits are given in the following table:

Stated Limit       True Limits
$600 - $799        $599.50 up to but not including $799.50
$800 - $999        $799.50 up to but not including $999.50

In the first column of the above table the data were rounded to the nearest dollar. For example, $799.50 was rounded up to $800 and tallied in the second class. Any amount over $799 but under $799.50 was rounded down to $799 and included in the first class. Thus, the $600 - $799 class actually includes all data from $599.50 inclusive up to but not including $799.50.

Cumulative Frequency Distribution
When the observations are numerical, cumulative frequency is used. It shows the total number of observations which lie above or below certain key values.
Cumulative frequency for a population = frequency of each class interval + frequencies of preceding intervals. For example, the cumulative frequencies for the above problem are: 3, 5, 9, and 10.

Presenting Data
Graphs, curves, and charts are used to present data. Bar charts are used to graph qualitative data. The bars do not touch, indicating that the attributes are qualitative categories; that is, the variables are discrete and not continuous.
Histograms are used to graph absolute, relative, and cumulative frequencies.
An ogive is also used to graph cumulative frequency. An ogive is constructed by placing a point corresponding to the upper end of each class at a height equal to the cumulative frequency of the class. These points are then connected. An ogive also shows the relative cumulative frequency distribution on the right side axis.
A less-than ogive shows how many items in the distribution have a value less than the upper limit of each class.
A more-than ogive shows how many items in the distribution have a value greater than or equal to the lower limit of each class.
A less-than cumulative frequency polygon is constructed by using the upper true limits and the cumulative frequencies.
A more-than cumulative frequency polygon is constructed by using the lower true limits and the cumulative frequencies.
A pie chart is often used in newspapers and magazines to depict budgets and other economic information. A complete circle (the pie) represents the total number of measurements. The size of a slice is proportional to the relative frequency of a particular category. For example, since a complete circle is equal to 360 degrees, if the relative frequency for a category is 0.40, the slice assigned to that category is 40% of 360, or (0.40)(360) = 144 degrees.
A Pareto chart is a special case of a bar chart and is often used in quality control. The purpose of this chart is to show the key causes of unacceptable quality. Each bar in the chart shows the degree of quality problem for each variable measured.
A time series graph is a graph in which the X axis shows time periods and the Y axis shows the values related to these time periods.
Stem-and-leaf plots offer another method for organizing raw data into groups. These types of plots are similar to the histogram except that the actual data are displayed instead of bars. The stem-and-leaf plot is developed by first determining the stem and then adding the leaves. The stem contains the higher-valued digits and the leaf contains the lower-valued digits. For example, the number 78 can be represented by a stem of 7 and a leaf of 8. Thus, the numbers 34, 32, 36, 20, 20, 22, 54, 55, 52, 68, and 63 can be grouped as follows:

Stem     Leaf
2        0 0 2
3        2 4 6
4
5        2 4 5
6        3 8

Steps to Construct a Stem and Leaf Plot
1. Define the stem and leaf that you will use. Choose the units for the stem so that the number of stems in the display is between 5 and 20.
2. Write the stems in a column arranged with the smallest stem
at the top and the largest stem at the bottom. Include all
stems in the range of the data, even if there are some stems
with no corresponding leaves.
3. If the leaves consist of more than one digit, drop the digits
after the first. You may round the numbers to be more
precise, but this is not necessary for the graphical description
to be useful.
4. Record the leaf for each measurement in the row
corresponding to its stem. Omit the decimals, and include a
key that defines the units of the leaf.
See the following figures:
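
As a rough illustration of the steps above, the following Python sketch builds the stem-and-leaf display for the eleven two-digit numbers used in this lesson, and, for comparison, rebuilds the frequency table of the ten student ages with its relative and cumulative frequencies. The function names stem_and_leaf and frequency_table are made up for this illustration, and the starting value of 18 for the first class is taken from the table in the lesson rather than computed. It is a minimal sketch under those assumptions, not a general-purpose routine.

```python
import math
from collections import defaultdict
from itertools import accumulate

# Data copied from the worked examples in this lesson.
AGES = [20.9, 18.1, 18.5, 21.3, 19.4, 25.3, 22.0, 23.1, 23.9, 22.5]
NUMBERS = [34, 32, 36, 20, 20, 22, 54, 55, 52, 68, 63]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits).

    Hypothetical helper; assumes non-negative two-digit integers.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    # Step 2: include every stem in the range, even one with no leaves.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


def frequency_table(data, start, width, k):
    """Return (lower, upper, count, relative, cumulative) for each class.

    Each class includes its lower limit and excludes its upper limit.
    Assumes every observation lies in [start, start + k * width).
    """
    counts = [0] * k
    for x in data:
        counts[int((x - start) // width)] += 1
    cumulative = list(accumulate(counts))
    return [
        (start + i * width, start + (i + 1) * width,
         counts[i], counts[i] / len(data), cumulative[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    # Step 4: one row per stem, leaves listed next to it.
    for stem, leaf_list in stem_and_leaf(NUMBERS).items():
        print(stem, "|", " ".join(str(leaf) for leaf in leaf_list))

    k = 4
    width = math.ceil((max(AGES) - min(AGES)) / k)  # W = (L - S) / K, rounded up
    for lower, upper, count, rel, cum in frequency_table(AGES, 18, width, k):
        print(f"{lower} up to {upper}: {count}  {rel:.0%}  cumulative {cum}")
```

Stem 4 appears in the display with no leaves, as step 2 requires. Flooring (x - start) / width is what makes each class include its lower limit and exclude its upper limit, so an age of exactly 22.0 is counted in the 22 up to 24 class.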

Notes

© Copy Right: Rai University

