Data and Information Session

Session 1
DATA AND INFORMATION
1.1 DATA
Data is one of the most critical assets of any business. Data is a collection of raw,
unorganized facts that need to be processed. The word comes from the Latin word datum, which
means “something given”. Strictly, data is the plural of datum, a single piece of information,
but the word data is used in both singular and plural forms. Data represents a fact or a
statement of an event without relation to other things.
Example
• A student’s test score is one piece of data.
• The history of temperature readings all over the world for the past 100 years is data.
1.2 Data Around Us

Our world is made of all kinds of objects or things that we want to be informed about. Each
object is described by a number of attributes (characteristics). Each attribute can have a value,
and that value is data. For example, the laptop in Fig. 1.1 is an object with the following
attributes.
Fig. 1.1 Characteristics of an object

Color – Black
Size – 16 inches
Processing Capacity –
RAM – 4 GB
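A minimal sketch of the same idea in Python, assuming we model the laptop as a record whose fields are its attributes. The class name `Laptop` and the field names are illustrative, not part of any standard; processing capacity is left out because Fig. 1.1 gives no value for it.

```python
from dataclasses import dataclass, asdict

@dataclass
class Laptop:
    """An object described by its attributes; each attribute value is a piece of data."""
    color: str
    size_inches: float
    ram_gb: int
    # Processing capacity is omitted: Fig. 1.1 gives no value for it.

laptop = Laptop(color="Black", size_inches=16, ram_gb=4)

# Print each attribute (characteristic) together with its value (the data).
for attribute, value in asdict(laptop).items():
    print(f"{attribute}: {value}")
```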
1.3 Types of Data

Fig. 1.2 Types of Data (Data is divided into Qualitative (Categorical) and Quantitative (Numerical) data; Quantitative data is further divided into Discrete and Continuous data)
Qualitative Data
Qualitative data refers to the quality of something and deals with description. It is data
that can be observed but not measured.
Eg. color, texture, smell and taste
Qualitative data is often termed categorical data: values or observations that can be sorted
into groups or categories.
Example:
• Tennis balls can be categorized as new, used or damaged.
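Sorting categorical observations into groups can be sketched with Python's `collections.Counter`. The list of ball conditions below is made-up sample data used only for illustration.

```python
from collections import Counter

# Hypothetical observations: the condition of each tennis ball in a box.
conditions = ["New", "Used", "Used", "Damaged", "New", "Used"]

# Categorical data is sorted into groups; counting the members of each
# group is a typical first step in summarizing it.
groups = Counter(conditions)
print(groups)
```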
Quantitative Data
Quantitative data deals with numbers and can be counted or measured.
Example:
Number of golf balls
Quantitative data is further divided into discrete data and continuous data.
Discrete data
Discrete data is numerical data that has a finite or countable number of possible values, with
gaps between them: it is counted in whole numbers.
Fig. 1.3 Discrete Data (a number line marked at the whole numbers 0 to 7)
Examples
• The number of products damaged in shipment
• The count of golf balls
• The number of students in the class
Continuous data
Continuous data is numerical data with a continuous range: there is no gap between possible
values. Continuous data is measured rather than counted.
Example: People’s heights could be any value within the range of human heights.
Fig. 1.4 Continuous Data (a continuous scale from 0 to 200)
1.4 Structured and Unstructured Data
Data can be classified as structured or unstructured based on how it is stored and
managed.
Structured or Machine Readable

Structured data is organized in rows and columns in a rigidly defined format so that
applications can retrieve and process it efficiently. Structured data is typically stored using a
database management system (DBMS).
Example
Spreadsheets are considered structured data, since they can be quickly scanned for
information.
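As an illustration of structured data in a DBMS, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, its columns and the rows are hypothetical; the point is that every row has the same columns, so an application can query it directly.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS (table and columns are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tennis_balls (id INTEGER PRIMARY KEY, color TEXT, status TEXT, "
    "size_cm REAL, price_rs INTEGER)"
)
conn.executemany(
    "INSERT INTO tennis_balls (color, status, size_cm, price_rs) VALUES (?, ?, ?, ?)",
    [("yellow", "used", 6.54, 50), ("yellow", "new", 6.54, 80), ("green", "damaged", 6.7, 20)],
)

# Because every row has the same columns, the application can retrieve data with a query.
rows = conn.execute(
    "SELECT color, price_rs FROM tennis_balls WHERE status = ?", ("used",)
).fetchall()
print(rows)
conn.close()
```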
Unstructured or Human Readable

Data is unstructured if its elements cannot be stored in rows and columns; it is not organized
in a predefined manner. For example, customer contacts may be stored in various forms such as
sticky notes, e-mail messages, or even digital files such as .doc, .txt, and .pdf. Because of
this unstructured nature, such data is difficult for business applications to query and
retrieve.
Example
5 yellow used tennis balls with the size of 6.54 cm, Rs. 50 each
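To see why unstructured data is harder to use, the sketch below pulls the facts out of the example sentence with a regular expression. The pattern is hand-written for this one sentence (an assumption); differently worded text would need a different pattern, which is exactly the difficulty described above.

```python
import re

text = "5 yellow used tennis balls with the size of 6.54 cm, Rs. 50 each"

# A hypothetical pattern that fits only this sentence shape.
pattern = re.compile(
    r"(?P<quantity>\d+) (?P<color>\w+) (?P<condition>\w+) tennis balls"
    r" with the size of (?P<size_cm>[\d.]+) cm, Rs\. (?P<price_rs>\d+) each"
)

match = pattern.search(text)

# Turn the matched pieces into a structured record (one row, fixed columns).
record = {
    "quantity": int(match["quantity"]),
    "color": match["color"],
    "condition": match["condition"],
    "size_cm": float(match["size_cm"]),
    "price_rs": int(match["price_rs"]),
}
print(record)
```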
1.5 Other Forms of Data

Before the advent of computers, data was created and shared in only a few forms, such as paper
and film. Today, the same data can be converted into more convenient forms such as an e-mail
message, an e-book, a bitmapped image, or a digital movie. This data can be generated using a
computer and stored as strings of 0s and 1s, as shown in Fig. 1.6. Data in this form is called
digital data, and the user can access it only after it is processed by the computer.
Fig. 1.6 Other Types of Data (digital data: audio and video, JPEG images, text and numbers)
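The statement that digital data is stored as strings of 0s and 1s can be checked in Python. This sketch assumes UTF-8 as the character encoding and prints each byte of a short text as eight binary digits.

```python
text = "Hi"

# Encode the characters as bytes (UTF-8 here), then show each byte as 8 binary digits.
bits = " ".join(format(byte, "08b") for byte in text.encode("utf-8"))
print(bits)  # 01001000 01101001
```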
1.6 Information
When data is processed, organized, structured or presented in a given context so as to make it
useful, it is called information. Data by itself is fairly useless, but when it is interpreted and
processed to determine its true meaning, it becomes useful and can be called information.
Information is data that has been processed in such a way as to be meaningful to the person who
receives it. Information is the intelligence and knowledge derived from data.
Fig. 1.7 Conversion of Data to Information

Example
• The history of temperature readings all over the world for the past 100 years is data. If
this data is organized and analyzed to find that global temperature is rising, then that is
information.
When data is stored electronically in files, it can be used as input for an information
system. An information system has programs to process (or transform) data to produce
information as an output, as shown in Fig. 1.8. Information reveals the meaning of data. For
example, students' data values such as ID, Name, Address, Major, and Phone number represent
raw facts. A class roll is a list that shows the IDs and names of the students who are
enrolled in a particular class (course section).
Fig. 1.8 Data processed to information (student data and enrolment details are input to the Student Registration System, which produces the class roll)
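The processing step in Fig. 1.8 can be sketched in a few lines of Python. The student records, section codes and the function name `class_roll` are hypothetical; a real registration system would read them from its files or database.

```python
# Hypothetical student data (raw facts) and enrolment details.
students = {
    101: {"name": "Anita", "major": "Physics"},
    102: {"name": "Ravi", "major": "Biology"},
    103: {"name": "Meena", "major": "Chemistry"},
}
enrolments = [(101, "CS101-A"), (103, "CS101-A"), (102, "MA201-B")]

def class_roll(section):
    """Process the data into information: ID and name of each student in one section."""
    return [(sid, students[sid]["name"]) for sid, sec in enrolments if sec == section]

print(class_roll("CS101-A"))
```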
1.7 Difference between Data and Information

Data | Information
Used as input | Output of processing data
Unprocessed facts and figures | Processed data
Does not depend on information | Depends on data
Not specific | Specific
Single unit | Group of data that carries news and meaning
Does not carry meaning | Carries logical meaning
Raw material | Product
Eg. Each student's test score is one piece of data. | Eg. The average score of a class or of the entire department is information that can be derived from the given data.
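The last row of the table (individual scores as data, the class average as information) can be reproduced with a short Python sketch. The names and scores are invented sample values.

```python
# Hypothetical test scores: each value on its own is a piece of data.
scores = {"Anita": 78, "Ravi": 64, "Meena": 91, "Arjun": 55}

# Processing the data yields information: the class average.
average = sum(scores.values()) / len(scores)
print(f"Class average: {average:.2f}")
```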
Examples of Data and Information

• The number of visitors to a website by country is an example of data. Finding out that
traffic from the U.S. is increasing while that from Australia is decreasing is meaningful
information.
1.8 How Computers Represent Data
Physical devices used to store and process data in computers are two-state devices. A
switch, for example, is a two-state device: it can be either ON or OFF. To a computer, everything
is a number. Numbers are numbers, letters are numbers, sound and pictures are numbers. Even
the computer's own instructions are numbers. A string of alphabetic characters, such as a
sentence, looks just like a string of ones and zeros to a computer. Each of these devices has only
two possible states available to represent data: on or off. When a switch is off it is represented
by a 0; when it is on it is represented by a 1. Thus all data to be stored and processed in
computers is transformed or coded as strings of two symbols, one symbol to represent each state.
The two symbols normally used are 0 and 1. These are known as bits, an abbreviation for binary
digits. A group of 8 bits is called a byte. With one byte the computer can represent 256 different
symbols or characters, because the eight 1s and 0s in a byte can be combined in $2^8 = 256$
different ways.
• 4 bits = Nibble, e.g. 0110
• 8 bits = Byte, e.g. 0110 1010
• 16 bits = Word, e.g. 0110 1010 1001 1111 (word size varies between computers; 16 bits is used here)
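A short Python sketch of the arithmetic above: an n-bit group has 2 to the power n possible patterns, and the example byte can be read as a number. Reading 0110 1010 as an unsigned binary number is an assumption made for illustration; the same bits could just as well encode a character or an instruction.

```python
# An n-bit group can hold 2 ** n distinct patterns.
for name, n_bits in [("Nibble", 4), ("Byte", 8), ("Word (16-bit)", 16)]:
    print(f"{name}: {n_bits} bits -> {2 ** n_bits} patterns")

# The example byte from the list, read as an unsigned binary number.
example_byte = "01101010"
value = int(example_byte, 2)
print(value)  # 106
```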
Session 2
Number Systems
2.1 Number System
In the Stone Age, knots and marks on stones were used to count items. In the Roman numeral
system, symbols such as I, II and III are used for counting. Many positional-value systems are in
use, such as decimal, binary, octal and hexadecimal.
2.2 Decimal Number System

The decimal number system uses 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) and is the system people
use in day-to-day activities. The word "digit" comes from the Latin word for finger (digitus).
People, especially those with little formal education, often count items on their fingers
(Fig. 2.1).
Fig 2.1 Fingers
2.3 Binary number system

The binary number system uses only two digits, 0 and 1. Its main application area is in
computers (Fig. 2.2, Fig. 2.3). There are many types of number systems; their base values and
the digits each system uses are shown in Table 2.1.
Fig 2.2 Binary Number system
Fig 2.3 Computer

Table 2.1 Different types of number system
2.4 Positional Number System

The traditional number system is a positional number system, in which a number is represented
as a string of digits. Every digit position has an associated weight, and the value of the number
is the sum of each digit multiplied by the weight of its position. The following equation is used
in a positional number system, where $b$ is the base, $d_i$ is the digit in position $i$ (counting
from 0 at the rightmost digit), and $p$ is the number of digits:

$$D = \sum_{i=0}^{p-1} d_i \, b^i$$

An example of the positional value system is given below.

$$6354 = 6 \times 1000 + 3 \times 100 + 5 \times 10 + 4$$

Each number system is associated with a base or radix. The decimal number system has the
base or radix 10. A number in base r uses the r digits 0, 1, 2, ..., r-1. For example, decimal
(base 10) uses 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Numbers are usually expressed in positional notation.
MSD stands for the most significant digit (the leftmost digit) and LSD for the least significant
digit (the rightmost digit).
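The positional equation translates directly into code. This Python sketch assumes the digits are given as a list of integers with the most significant digit first; `positional_value` is an illustrative name.

```python
def positional_value(digits, base):
    """Evaluate D = sum of d_i * b**i, where i = 0 is the rightmost (least significant) digit."""
    total = 0
    for i, d in enumerate(reversed(digits)):
        total += d * base ** i
    return total

print(positional_value([6, 3, 5, 4], 10))  # 6354
print(positional_value([1, 0, 1, 1], 2))   # 11
```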
2.5 Positional Notation – examples
The value of a number is determined by multiplying each digit by a weight and then summing.
The weight of each digit is a power of the base, and which power is determined by the digit's
position.
Decimal number system
$$953.78 = 9 \times 10^2 + 5 \times 10^1 + 3 \times 10^0 + 7 \times 10^{-1} + 8 \times 10^{-2}$$
$$= 900 + 50 + 3 + 0.7 + 0.08 = 953.78$$
Binary number system
$$(1011.11)_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2}$$
$$= 8 + 0 + 2 + 1 + 0.5 + 0.25 = 11.75$$
Hexadecimal number system
$$(A2F)_{16} = 10 \times 16^2 + 2 \times 16^1 + 15 \times 16^0$$
$$= 10 \times 256 + 2 \times 16 + 15 \times 1 = 2560 + 32 + 15 = 2607$$
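The three examples above can be checked with a small Python function that handles both whole and fractional digits. It is a sketch under a few assumptions: digit characters 0-9 and A-F, a base between 2 and 16, and valid digits for that base (there is no input checking). `fractions.Fraction` keeps the arithmetic exact; `value_of` is an illustrative name.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def value_of(numeral, base):
    """Positional value of a numeral with an optional point, e.g. '1011.11' in base 2.

    Whole-part digits get weights base**0, base**1, ... from right to left;
    fractional digits get weights base**-1, base**-2, ... from left to right.
    """
    whole, _, frac = numeral.upper().partition(".")
    total = Fraction(0)
    for i, ch in enumerate(reversed(whole)):
        total += DIGITS.index(ch) * Fraction(base) ** i
    for i, ch in enumerate(frac, start=1):
        total += DIGITS.index(ch) * Fraction(1, base ** i)
    return total

print(float(value_of("953.78", 10)))  # 953.78
print(float(value_of("1011.11", 2)))  # 11.75
print(value_of("A2F", 16))            # 2607
```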
Need for Number Conversions
The decimal number system is used in our day-to-day life, but the binary number system is used
in computers. The following two steps convert a base 10 number to base n:
a) Divide the decimal number by the base n; the remainder is the least significant digit (LSD)
of the base n number.
b) If the quotient is zero, the conversion is complete; otherwise, repeat step (a) using the
quotient as the decimal number. The new remainder is the next more significant digit of the
base n number.
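The two conversion steps can be sketched in Python as a loop of repeated divisions. `to_base` is an illustrative name; it assumes a non-negative integer and a base from 2 to 16.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, base):
    """Convert a non-negative decimal integer to base 2..16 by repeated division."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)    # divide by the base; keep quotient and remainder
        digits.append(DIGITS[remainder])  # remainders come out least significant first
    return "".join(reversed(digits))      # loop stops at quotient 0; read MSD first

print(to_base(746, 16))  # 2EA
print(to_base(18, 2))    # 10010
```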
Conversion of Hexadecimal to Decimal

$$(2EA)_{16} = 2 \times 16^2 + 14 \times 16^1 + 10 \times 16^0 = 512 + 224 + 10 = 746_{10}$$
Octal to Binary Conversion
In octal to binary conversion, each octal digit is replaced by its three-digit binary equivalent.
Fig. 2.4 Binary to Octal Conversion
Fig. 2.5 Hexadecimal to Binary Conversion

In hexadecimal to binary conversion, each hexadecimal digit is replaced by its four-digit binary equivalent.
Fig. 2.6 Binary to Hexadecimal Conversion
Fig. 2.7 Octal to Hexadecimal Conversion
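The grouping rules behind Figs. 2.4 to 2.7 can be sketched as two small Python functions: three bits per octal digit and four bits per hexadecimal digit. The function names are hypothetical, the input is assumed to contain only valid digits, and octal to hexadecimal goes through binary.

```python
def binary_to_grouped(bits, group_size):
    """Binary -> octal (group_size=3) or hexadecimal (group_size=4) by grouping bits."""
    width = -(-len(bits) // group_size) * group_size  # round length up to a multiple
    bits = bits.zfill(width)                          # pad on the left with zeros
    groups = [bits[i:i + group_size] for i in range(0, width, group_size)]
    return "".join("0123456789ABCDEF"[int(g, 2)] for g in groups)

def grouped_to_binary(numeral, group_size):
    """Octal (3) or hexadecimal (4) -> binary: each digit becomes a fixed-width bit group."""
    return "".join(format(int(ch, 16), f"0{group_size}b") for ch in numeral)

print(binary_to_grouped("10110", 3))                       # 26
print(grouped_to_binary("2EA", 4))                         # 001011101010
print(binary_to_grouped(grouped_to_binary("725", 3), 4))   # 1D5 (octal -> hexadecimal)
```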
Fig. 2.8 Conversion of Binary to Decimal

Example:
Convert the binary number $10010_2$ into its decimal equivalent.

$$\begin{array}{ccccc} 1 & 0 & 0 & 1 & 0 \\ 2^4 & 2^3 & 2^2 & 2^1 & 2^0 \\ 16 & 8 & 4 & 2 & 1 \end{array}$$

$$16 + 0 + 0 + 2 + 0 = 18_{10}$$

Therefore $10010_2 = 18_{10}$.
Solve the following examples (decimal to binary):
a) $13_{10}$ = ?
b) $22_{10}$ = ?
c) $43_{10}$ = ?
d) $158_{10}$ = ?
Example:
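Convert the binary number $0110101_2$ into its decimal equivalent.

$$\begin{array}{ccccccc} 0 & 1 & 1 & 0 & 1 & 0 & 1 \\ 2^6 & 2^5 & 2^4 & 2^3 & 2^2 & 2^1 & 2^0 \\ 64 & 32 & 16 & 8 & 4 & 2 & 1 \end{array}$$

$$0 + 32 + 16 + 0 + 4 + 0 + 1 = 53_{10}$$

Therefore $0110101_2 = 53_{10}$.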
Convert the following binary values to decimal values:
$0110_2$ = ?
$11010_2$ = ?
$0110101_2$ = ?
$11010011_2$ = ?
Convert binary to octal
2.6 Binary Operations

Binary Addition
The following rules are used for binary addition:
• 0 + 0 = 0
• 0 + 1 = 1
• 1 + 0 = 1
• 1 + 1 = 0, and carry 1 to the next more significant bit
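The addition rules can be applied column by column, right to left, as in this Python sketch. `add_binary` is an illustrative name; it also covers the case the rule list leaves implicit, 1 + 1 plus an incoming carry of 1, which gives a sum bit of 1 and a carry of 1.

```python
def add_binary(a, b):
    """Add two binary strings column by column, right to left, with carry."""
    a, b = a.zfill(len(b)), b.zfill(len(a))   # pad the shorter one so columns line up
    result, carry = [], 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry  # 0, 1, 2 or 3
        result.append(str(total % 2))            # sum bit for this column
        carry = total // 2                       # e.g. 1 + 1 gives sum 0, carry 1
    if carry:
        result.append("1")                       # final carry becomes a new leading bit
    return "".join(reversed(result))

print(add_binary("1011", "110"))  # 10001
```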
Binary Subtraction
The following rules are used for binary subtraction:
• 0 - 0 = 0
• 0 - 1 = 1, and borrow 1 from the next more significant bit
• 1 - 0 = 1
• 1 - 1 = 0
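A matching Python sketch for subtraction, working right to left and borrowing when a column would go negative. It assumes the first number is greater than or equal to the second (no negative results); the function name is hypothetical.

```python
def subtract_binary(a, b):
    """Compute a - b for binary strings (assumes a >= b), borrowing column by column."""
    b = b.zfill(len(a))
    result, borrow = [], 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        diff = int(bit_a) - int(bit_b) - borrow
        if diff < 0:      # e.g. 0 - 1: borrow 1 (worth 2 in this column) from the next column
            diff += 2
            borrow = 1
        else:
            borrow = 0
        result.append(str(diff))
    return "".join(reversed(result)).lstrip("0") or "0"

print(subtract_binary("1011", "110"))  # 101
```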
Binary Multiplication
In binary multiplication, only the following rules need to be remembered:
0 × 0 = 0
0 × 1 = 0
1 × 0 = 0
1 × 1 = 1
Example (in decimal, 5 × 3 = 15)
   101
 ×  11
 -----
   101
+ 1010
 -----
  1111
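The worked example follows the shift-and-add idea: each 1 in the multiplier contributes a copy of the multiplicand shifted left, and a 0 contributes nothing. A Python sketch with an illustrative function name, using Python integers to add the partial products (an implementation shortcut, not part of the method):

```python
def multiply_binary(a, b):
    """Shift-and-add: for every 1 bit in b, add a copy of a shifted left by that bit's position."""
    product = 0
    for position, bit in enumerate(reversed(b)):
        if bit == "1":                        # 1 x a = a
            product += int(a, 2) << position  # shifting left appends zeros, like the 1010 row
        # a 0 bit gives a row of zeros (0 x a = 0), so nothing is added
    return format(product, "b")

print(multiply_binary("101", "11"))  # 1111
```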
Binary Division
Binary division uses the following rules (division by 0 is not defined):
• 0 ÷ 1 = 0
• 1 ÷ 1 = 1
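Longer binary division repeats these single-bit rules, as in decimal long division: bring down one bit at a time and write 1 in the quotient whenever the divisor fits. This Python sketch is illustrative (the function name is hypothetical); it keeps the running remainder as a Python integer for brevity.

```python
def divide_binary(dividend, divisor):
    """Long division on binary strings; returns (quotient, remainder) as binary strings."""
    d = int(divisor, 2)
    if d == 0:
        raise ZeroDivisionError("division by 0 is not defined")
    quotient_bits, remainder = [], 0
    for bit in dividend:
        remainder = remainder * 2 + int(bit)  # bring down the next bit
        if remainder >= d:                    # divisor fits: quotient bit is 1
            remainder -= d
            quotient_bits.append("1")
        else:                                 # divisor does not fit: quotient bit is 0
            quotient_bits.append("0")
    quotient = "".join(quotient_bits).lstrip("0") or "0"
    return quotient, format(remainder, "b")

print(divide_binary("1111", "11"))  # ('101', '0')
```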
Session 4
Data Compression
4.1 Introduction
While technology keeps growing, the world keeps shrinking: everything seems nearer and
smaller. Our world has changed from an era when a computer occupied a whole room to the present,
when supercomputers can be conveniently carried in your hand. It would be an understatement to
call this transformation merely technological growth; it should rather be called a technological
explosion. It came about through the contributions of many eminent people around the world, and
the advent of data compression played a remarkable role in it. It is fascinating to see how data
compression and its many techniques have facilitated this transformation. The Internet uses data
compression extensively and in innumerable ways; without it, the boom in web technology would
never have been possible.
Data compression is the process of encoding data so that fewer bits are needed to represent the
original data, which reduces its size. Compressing data can save storage capacity, speed up file
transfer, and decrease costs for storage hardware and network bandwidth.
Compression is performed by a program that uses a formula or algorithm to determine how to
shrink the size of the data. For instance, an algorithm may represent a string of bits (0s and
1s) with a smaller string of 0s and 1s by using a dictionary for the conversion between them,
or the formula may insert a reference or pointer to a string of 0s and 1s that the program has
already seen.
Text compression can be as simple as removing all unneeded characters, inserting a single
repeat character to indicate a string of repeated characters, or substituting a smaller bit
string for a frequently occurring bit string. Compression can reduce the size of a text file by
50% or more.
For data transmission, compression can be performed on the data content or on the entire
transmission unit, including header data. When information is sent or received via the Internet,
larger files, either singly or with others as part of an archive file, may be transmitted in a .ZIP,
gzip or other compressed format.
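Lossless compression in practice can be seen with Python's standard `gzip` module: compress some repetitive text, decompress it, and check that every byte comes back. The sample text is arbitrary, and the exact compressed size depends on the input and the library version, so the sketch prints it rather than stating it.

```python
import gzip

# Arbitrary, highly repetitive sample text.
original = ("the quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

print(len(original), "bytes ->", len(compressed), "bytes")
print(restored == original)  # True: gzip compression is lossless
```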
4.2 Data compression techniques

Data compression techniques can be broadly classified into two types: lossless and lossy
compression.
Fig. 4.1 Types of Data Compression and Methods
4.3 Lossless and lossy compression

Compressing data can be a lossless or lossy process. Lossless compression enables the
restoration of a file to its original state, without the loss of a single bit of data, when the file is
uncompressed. Lossless compression is the typical approach with executables, as well as text
and spreadsheet files, where the loss of words or numbers would change the information.
Lossy compression permanently eliminates bits of data that are redundant, unimportant or
imperceptible. Lossy compression is useful with graphics, audio, video and images, where the
removal of some data bits has little or no discernible effect on the representation of the content.
Graphics image compression can be lossy or lossless. Graphic image file formats are typically
designed to compress information since the files tend to be large. JPEG is an image file format
that supports lossy image compression. Formats such as GIF and PNG use lossless
compression.
Lossy compression:
• Is used for compressing images and video files (our eyes cannot distinguish subtle changes,
so losing some data is acceptable).
• Is cheaper and needs less time and storage space.
• Has several methods:
  – JPEG: compresses pictures and graphics
  – MPEG: compresses video
  – MP3: compresses audio
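The basic lossy idea, throwing away detail that cannot be recovered, can be sketched with simple quantization. This is only an illustration of why lossy data cannot be restored exactly; it is not how JPEG, MPEG or MP3 work in full. The sample values and the step size are hypothetical.

```python
# Hypothetical audio-like samples.
samples = [0.12, 0.47, 0.51, 0.88, 0.93]
STEP = 0.25  # quantization step: a larger step keeps fewer distinct values and loses more detail

# "Compress": keep only a small integer level for each sample.
levels = [round(s / STEP) for s in samples]

# "Decompress": map each level back to a value; the fine detail is gone for good.
restored = [level * STEP for level in levels]
print(levels)
print(restored)
```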
4.4 Compression vs. data deduplication

Compression is often compared to data deduplication, but the two techniques operate
differently. Deduplication is a type of compression that looks for redundant chunks of data
across a storage system or a file system and replaces each duplicate chunk with a pointer to the
original. Compression algorithms, by contrast, reduce the size of bit strings within a data
stream that is far smaller in scope, and they generally remember no more than the last megabyte or less of data.
File-level deduplication eliminates redundant files and replaces them with stubs pointing to the
original file. Block-level deduplication identifies duplicate data at the sub-file level. The
system saves unique instances of each block, uses a hash algorithm to process them and
generates a unique identifier to store them in an index. Deduplication typically looks for larger
chunks of duplicate data than compression, and systems can deduplicate using fixed-size or
variable-size chunks.
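Block-level deduplication with fixed-size chunks can be sketched in Python using `hashlib.sha256` as the hash algorithm. The 4-byte block size and the function names are illustrative assumptions; real systems use much larger chunks and often variable-sized ones.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow

def deduplicate(data):
    """Fixed-size block-level deduplication: store each unique block once, keyed by its hash."""
    store = {}   # hash -> block contents (the unique instances, i.e. the index)
    recipe = []  # sequence of hashes (pointers) that rebuilds the original data
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        key = hashlib.sha256(block).hexdigest()  # unique identifier for this block
        store.setdefault(key, block)             # keep only the first copy
        recipe.append(key)
    return store, recipe

def rebuild(store, recipe):
    """Follow the pointers to reconstruct the original data."""
    return b"".join(store[key] for key in recipe)

data = b"ABCDABCDWXYZABCD"
store, recipe = deduplicate(data)
print(len(recipe), "blocks,", len(store), "unique")  # 4 blocks, 2 unique
```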
Deduplication is most effective in environments that have a high degree of redundant data,
such as virtual desktop infrastructure or storage backup systems. Compression tends to be more
effective than deduplication in reducing the size of unique information such as image, audio,
video, database and executable files. Many storage systems support both compression and
deduplication.
4.5 Pros and cons of compression

The main advantages of compression are a reduction in storage hardware, data transmission
time and communication bandwidth, and the resulting cost savings. A compressed file requires
less storage capacity than an uncompressed file, and the use of compression can lead to a
significant decrease in expenses for disk and/or solid-state drives. A compressed file also
requires less time for transfer, and it consumes less network bandwidth than an uncompressed
file.
The main disadvantage of compression is the performance impact resulting from the use of
CPU and memory resources to compress and decompress the data. Many vendors have
designed their systems to try to minimize the impact of the processor-intensive calculations
associated with compression. If the compression runs inline, before the data is written to disk,
the system may offload compression to preserve system resources. For instance, IBM uses a
separate hardware acceleration card to handle compression with some of its enterprise storage
systems.
If data is compressed after it is written to disk (post-process compression), the compression may run in the
background to reduce the performance impact. Although post-process compression can reduce
the response time for each input/output (I/O), it still consumes memory and processor cycles,
and can affect the overall number of I/Os a storage system can handle. Also, because data
initially must be written to disk or flash drives in an uncompressed form, the physical storage
savings are not as great as they are with inline compression.
4.6 Tools/technologies that use compression

Compression is built into a wide range of technologies, including storage systems, databases,
operating systems and software applications used by businesses and enterprise organizations.
Compressing data is also common in consumer devices such as laptops, PCs and mobile
phones.
Many systems and devices perform compression transparently, but some give users the option
to turn compression on or off. Compression can be performed more than once on the same file
or piece of data, but subsequent compressions result in little to no additional compression and
may even increase the size of the file to a slight degree, depending on the algorithms.
WinZip is a popular Windows program that compresses files when it packages them in an
archive. Archive file formats that support compression include ZIP and RAR. The bzip2 and
gzip formats see widespread use for compressing individual files.
4.7 Run-length encoding
Run-length encoding (RLE) is a very simple form of lossless data compression in which runs
of data (that is, sequences in which the same data value occurs in many consecutive data
elements) are stored as a single data value and count, rather than as the original run. This is
most useful on data that contains many such runs. Consider, for example, simple graphic
images such as icons, line drawings, and animations. It is not useful with files that don't have
many runs as it could greatly increase the file size.
RLE may also refer to an early graphics file format supported by CompuServe for compressing
black and white images, which was widely supplanted by CompuServe's later Graphics Interchange
Format (GIF). RLE also refers to a little-used image format in Windows 3.x with the extension
.rle, a Run Length Encoded Bitmap used to compress the Windows 3.x startup screen.
Typical applications of this encoding are when the source information comprises long
substrings of the same character or binary digit.
For example, consider a screen containing plain black text on a solid white background. There
will be many long runs of white pixels in the blank space, and many short runs of black pixels
within the text. A hypothetical scan line, with B representing a black pixel and W representing
white, might read as follows:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
With a run-length encoding (RLE) data compression algorithm applied to the above
hypothetical scan line, it can be rendered as follows:
12W1B12W3B24W1B14W
This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, etc.
The run-length code represents the original 67 characters in only 18. While the actual format
used for the storage of images is generally binary rather than ASCII characters like this, the
principle remains the same. Even binary data files can be compressed with this method; file
format specifications often dictate repeated bytes in files as padding space. However, newer
compression methods such as DEFLATE often use LZ77-based algorithms, a generalization of
run-length encoding that can take advantage of runs of strings of characters (such as
BWWBWWBWWBWW).
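The encoding above can be reproduced with a short Python sketch that writes each run as its length followed by the character. It assumes the data contains no digit characters, since otherwise the counts could not be told apart from the data; the function names are hypothetical.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Write each run as its length followed by the repeated character, e.g. 12W."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Inverse of rle_encode (assumes the original text contains no digits)."""
    return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))

scan_line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(scan_line)
print(encoded)                             # 12W1B12W3B24W1B14W
print(len(scan_line), "->", len(encoded))  # 67 -> 18
```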
Run-length encoding can be expressed in multiple ways to accommodate data properties as
well as additional compression algorithms. For instance, one popular method encodes run
lengths for runs of two or more characters only, using an "escape" symbol to identify runs, or
using the character itself as the escape, so that any time a character appears twice it denotes a
run. On the previous example, this would give the following:
WW12BWW12BB3WW24BWW14
This would be interpreted as a run of twelve Ws, a B, a run of twelve Ws, a run of three Bs,
etc. In data where runs are less frequent, this can significantly improve the compression rate.
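A Python sketch of this escape-style variant, in which a doubled character signals that a count follows. It assumes the data itself contains no digit characters; the function names are illustrative.

```python
from itertools import groupby

def escape_rle_encode(text):
    """Runs of 2+ become the character twice plus the count; single characters stay as-is."""
    out = []
    for char, run in groupby(text):
        length = len(list(run))
        out.append(char * 2 + str(length) if length >= 2 else char)
    return "".join(out)

def escape_rle_decode(encoded):
    """A doubled character means a run length follows (assumes no digits in the data)."""
    out, i = [], 0
    while i < len(encoded):
        char = encoded[i]
        if i + 1 < len(encoded) and encoded[i + 1] == char:
            j = i + 2
            while j < len(encoded) and encoded[j].isdigit():
                j += 1                          # read all digits of the run length
            out.append(char * int(encoded[i + 2:j]))
            i = j
        else:
            out.append(char)                    # a single character stands for itself
            i += 1
    return "".join(out)

scan_line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
print(escape_rle_encode(scan_line))  # WW12BWW12BB3WW24BWW14
```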
One other matter is the application of additional compression algorithms. Even with the runs
extracted, the frequencies of different characters may be large, allowing for further
compression; however, if the run lengths are written in the file in the locations where the runs
occurred, the presence of these numbers interrupts the normal flow and makes it harder to
compress. To overcome this, some run-length encoders separate the data and escape symbols
from the run lengths, so that the two can be handled independently. For the example data, this
would result in two outputs, the string "WWBWWBBWWBWW" and the numbers (12, 12, 3,
24, and 14).
4.7 a. Examples:
1. Replace consecutive repeated occurrences of a symbol with one occurrence of the symbol,
followed by the number of occurrences.
2. The method is more efficient if the data uses only two symbols (0s and 1s) in bit patterns
and one symbol is more frequent than the other.
Fig. 4.2 Run-length Encoding
4.8 Activity
Compress the following data using Run-length encoding method
1. Original Data - AAABBCDDDD
Compressed Data is A3B2C1D4
2. Original Data - aabbbccccdddddeeeeeefffffffgggggggg
Compressed Data is a2b3c4d5e6f7g8
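The two activity answers can be checked with a Python sketch of the symbol-first format from 4.7 a, in which a single symbol still gets a count of 1 (as in C1). The function name is illustrative.

```python
from itertools import groupby

def rle_symbol_first(text):
    """Each run becomes the symbol followed by its count (single symbols get a count of 1)."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(text))

print(rle_symbol_first("AAABBCDDDD"))  # A3B2C1D4
```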
Session 5
Data Collection and Analysis
5.1 Introduction
Data collection is the systematic approach to gathering and measuring information from a
variety of sources to get a complete and accurate picture of an area of interest. Data collection
enables a person or organization to answer relevant questions, evaluate outcomes and make
predictions about future probabilities and trends.
Accurate data collection is essential to maintaining the integrity of research, making informed
business decisions and ensuring quality assurance. For example, in retail sales, data might be
collected from mobile applications, website visits, loyalty programs and online surveys to learn
more about customers. In a server consolidation project, data collection would include not just
a physical inventory of all servers, but also an exact description of what is installed on each
server: the operating system, middleware, and the application or database that the server
supports.
5.2 Big data and Data collection

Big data describes voluminous amounts of structured, semi-structured and unstructured data
collected by organizations. But because it takes a lot of time and money to load big data into a
traditional relational database for analysis, new approaches for collecting and analysing data
have emerged. To gather and then mine big data for information, raw data with
extended metadata is aggregated in a data lake. From there, machine learning and artificial
intelligence programs use complex algorithms to look for repeatable patterns.
5.3 Types of data

Generally, there are two types of data: quantitative data and qualitative data. Quantitative data
is any data that is in numerical form, e.g., statistics and percentages. Qualitative data is
descriptive data, e.g., color, smell, appearance and quality.
Quantitative Data
Quick facts:
• Requires the use of statistical analysis
• Variables can be identified and relationships measured
• Counted or expressed numerically
• Often perceived as a more objective method of data analysis
• Typically collected with surveys or questionnaires
• Often represented visually using graphs or charts
Example: An evaluator may wish to measure the knowledge of social skills among program
participants. He/she may administer surveys to participants to test their knowledge of these
social skills.

Qualitative Data
Quick facts:
• Examines non-numerical data for patterns and meanings
• Often described as being "richer" than quantitative data
• Because it is gathered and analysed by an individual, it can be more subjective
• Can be collected through methods such as observation techniques, focus groups, interviews,
and case studies
Example: Evaluators may wish to look at the level of engagement of afterschool staff in program
trainings. They might interview these staff members to capture the level of engagement that each
staff member feels they have during the trainings.

Mixed Methods Data
Quick facts:
• May increase the validity of your evaluation
• May explain unexpected results obtained using only one approach (quantitative or qualitative)
• Helps you capture both process and outcome results
• May strengthen your analysis
Example: You may administer a survey to participants that solicits answers eligible for
statistical analysis, as well as conduct a focus group with a sampling of participants to capture
any nuances the survey may have missed.

Table 5.1 Comparison of Types of Data

5.4 Data Sources
When evaluating a program, there are alternative ways to get the information you need in
addition to collecting the data yourself. Data that you retrieve first-hand is known as primary
data. Alternatively, data that is retrieved from pre-existing sources is known as secondary data.
Primary data sources include information collected and processed directly by the researcher,
such as observations, surveys, interviews, and focus groups.
Secondary data sources include information that you retrieve through pre-existing sources such
as research articles, Internet or library searches. Pre-existing data may also include examining
existing records and data within the program such as publications and training materials,
financial records, student/client data, and staff performance reviews.
Primary Data Sources
• Data that are not pre-existing and are collected by the evaluator using methods such as
observations, surveys or interviews
• Provides information if existing data on your topic/project is not current or not directly
applicable to your evaluation questions
• Can be more expensive and time-consuming, but it enables you to collect data that is specific
to the purpose of your evaluation

Secondary Data Sources
• Information that has already been collected, processed and reported out by another
researcher/entity
• Offers an opportunity to review any and all secondary data available for your project before
collecting the primary data you need
• Will tell you what questions still need to be addressed and what data you should collect
yourself

Table 5.2 Primary Data Sources vs Secondary Data Sources
5.5 Data Collection Techniques

Information you gather can come from a range of sources. Surveys, interviews and focus
groups are primary instruments for collecting information. Today, with help from Web and
analytics tools, organizations are also able to collect data from mobile devices, website traffic,
server activity and other relevant sources, depending on the project.
Likewise, there are a variety of techniques to use when gathering primary data. Listed below
are some of the most common data collection techniques used for collecting data.
• Questionnaires and Surveys
• Observations
• Focus Groups
• Ethnographies, Oral History, and Case Studies
• Documents and Records
5.6 Overview of Different Data Collection Techniques
Interviews
Key facts:
• Interviews can be conducted in person or over the telephone
• Interviews can be done formally (structured), semi-structured, or informally
• Questions should be focused, clear, and encourage open-ended responses
• Interviews are mainly qualitative in nature
Example: A one-on-one conversation with the parent of at-risk youth, who can help you
understand the issue

Questionnaires and Surveys
Key facts:
• Responses can be analyzed with quantitative methods by assigning numerical values to
Likert-type scales
• Results are generally easier to analyze than those of qualitative techniques
• Pre-test/post-test results can be compared and analyzed
Example: Results of a satisfaction survey or opinion survey

Observations
Key facts:
• Allows for the study of the dynamics of a situation, frequency counts of target behaviors, or
other behaviors as indicated by the needs of the evaluation
• Good source for providing additional information about a particular group; video can be used
to provide documentation
• Can produce qualitative data (e.g., narrative data) and quantitative data (e.g., frequency
counts, mean length of interactions, and instructional time)
Example: Site visits to an after-school program to document the interaction between youth and
staff within the program

Focus Groups
Key facts:
• A facilitated group interview with individuals who have something in common
• Gathers information about combined perspectives and opinions
• Responses are often coded into categories and analyzed thematically
Example: A group of parents of teenagers in an after-school program are invited to informally
discuss programs that might benefit and help their children succeed

Ethnographies, Oral History, and Case Studies
Key facts:
• Involves studying a single phenomenon
• Examines people in their natural settings
• Uses a combination of techniques such as observation, interviews, and surveys
• Ethnography is a more holistic approach to evaluation
• The researcher can become a confounding variable
Example: Shadowing a family while recording extensive field notes to study the experience and
issues associated with youth who have a parent or guardian who has been deployed

Documents and Records
Key facts:
• Consists of examining existing data in the form of databases, meeting minutes, reports,
attendance logs, financial records, newsletters, etc.
• This can be an inexpensive way to gather information, but may be an incomplete data source
Example: To understand the primary reasons students miss school, records on student absences
are collected and analysed

Table 5.3 Overview of Data Collection Techniques
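The first key fact for questionnaires, analysing responses by assigning numbers to a Likert-type scale, can be sketched in Python. The five-point coding and the responses are hypothetical sample data, not a prescribed scheme.

```python
from statistics import mean

# Hypothetical 5-point Likert coding (labels and numbers are an assumption).
LIKERT = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Hypothetical responses to one satisfaction-survey question.
responses = ["Agree", "Strongly agree", "Neutral", "Agree", "Disagree"]

# Assign a numerical value to each response, then summarize quantitatively.
scores = [LIKERT[r] for r in responses]
average = mean(scores)
print(scores)
print(average)
```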
5.7 Integrating Technology into Data Collection

While using paper and pencil surveys is the tried and true method of collecting data, technology
is rapidly becoming a popular and oftentimes more efficient way to collect data, especially
quantitative data like the kind you might collect with a traditional survey. This section provides
an overview of the benefits and challenges of using technology to collect data.
Types of technology that can be used to collect data traditionally captured with surveys include:
• Online or web-based surveys
• Hand-held devices such as clickers and PDAs
• Text messages
• Social networking sites such as Twitter, MySpace, and Facebook
The focus here is on using technology to collect quantitative data from participants. However, you could also use a social networking site to engage participants in a virtual focus group, or to conduct observations of interactions on the site.
5.7.1 Online/Web-based Survey

Online and web-based surveys enable users to design a survey that can then be administered
via an internet link. Some online tools include SurveyMonkey, Zoomerang, and QuestionPro.
Advantages
• Simpler and quicker way of collecting both quantitative and qualitative data
• Easy to access a large group of respondents in geographically diverse locations
• More cost effective than manually administering surveys
• Data can typically be exported, eliminating manual data entry
• Improves accuracy of data entry (e.g., reduces omissions, duplicate entries)
Disadvantages
• Limited to respondents who have access to the internet
• Some may find the online interface off-putting
• Does not guarantee the quality (reliability and validity) of the actual survey design
• Potential lack of security
Table 5.4 Advantages and Disadvantages of Online/Web based Survey
5.7.2 Clickers
Clickers are hand-held devices, much like household remote controls, that have been
implemented in classrooms to gauge student participation and learning. Clickers can also be
used to collect data from a group of participants gathered in one location at the same time.
Advantages
• Reduce errors/missing data
• Greatly reduce/eliminate data entry
• Increase internal program evaluation capacity
• Can collect data from large groups of respondents at once
• Cost effective over time
Disadvantages
• Typically limited to collecting quantitative data
• Technology may be off-putting to certain demographics of respondents
• Cost of obtaining clicker technology system may be prohibitive initially
Table 5.5 Advantages and Disadvantages of Clickers
5.7.3 Personal Digital Assistants
A Personal Digital Assistant (PDA) is a hand-held mobile computer that can also be used for
data collection in the field. Data is entered directly on the PDA and then transferred to another computer for analysis.
Advantages
• Streamlines the data collection process
• Reduce errors/missing data
• Greatly reduce/eliminate data entry
• Can be cost effective over time
• Enable collection of more data in a shorter time frame
Disadvantages
• Cost of obtaining PDAs may be prohibitive initially
• Data loss due to malfunctioning device
• Learning curve associated with its use
Table 5.6 Advantages and Disadvantages of PDA
5.7.4 Text Messaging
Cell phones can also be used as portable, real-time data collection tools. Text messaging is a way to capture information from a large group at once. Each participant needs a cell phone and familiarity with texting. To store and collate the data, a message relay system or interface technology (for example, a software program) is also needed. Responses must follow a very specific format to be received by the interface or relay service. The cell phone receiving the messages may need to be linked to a computer or a Web-based interface designed to capture and store all messages sent to that specific cell phone.
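Because the interface or relay service accepts only responses in a fixed format, some validation logic usually sits between the phone and the data store. The sketch below is a hypothetical illustration only: it assumes a made-up reply format `Q<question number> <rating 1-5>` (for example `Q3 4`) and a hypothetical `parse_response` helper; it is not the API of any real relay service.

```python
import re

# Hypothetical reply format (an assumption): "Q<question number> <rating 1-5>", e.g. "Q3 4".
RESPONSE_PATTERN = re.compile(r"^\s*Q(\d+)\s+([1-5])\s*$", re.IGNORECASE)


def parse_response(message):
    """Return (question, rating) for a well-formed text, or None if it must be rejected."""
    match = RESPONSE_PATTERN.match(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


inbox = ["Q1 4", "q2 5", "Q3 seven", "Q4  2 ", "hello"]
accepted = [r for r in map(parse_response, inbox) if r is not None]
rejected = [m for m in inbox if parse_response(m) is None]
print(accepted)  # [(1, 4), (2, 5), (4, 2)]
print(rejected)  # ['Q3 seven', 'hello']
```

Rejected messages would, in practice, be logged or answered with a reminder of the expected format rather than silently dropped.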
Advantages
• Capturing data in real time from many users
• Popularity of texting may mean increased level of comfort with this method
• No need to purchase costly technology for initial data collection
Disadvantages
• Requires advanced technological proficiency of the administrator
• All users must possess cell phone and texting capabilities
• Currently, message relay systems are not equipped to receive a high number of text responses at once
• Loss of data or technological difficulties with interface may occur
Table 5.7 Advantages and Disadvantages of Text Messaging
5.7.5 Social Networking Sites
Social networking sites often include profiles of individuals with information available such as
the user’s age/birth date, gender, ethnicity, location (address/city), sexual orientation, political
orientation, education, and contact information (email, phone number, website). These sites
can also include forums in which users can dialogue with one another. Examples of networking
sites include MySpace, Facebook, and Twitter, all of which have gained popularity as social
forums and modes of communication. Data could be collected by sampling random sites for trends, by soliciting information from specific users, or by creating a profile for data collection that attracts certain users for discussions (such as online focus groups).
Advantages
• Able to reach a young demographic using a popular medium
• Option to create a profile to target a specific community
• Ability to engage participants at remote locations in real time
• Can be a rich source of quantitative and qualitative data, some of which is publicly available
Disadvantages
• No verification of information available on public profiles
• Privacy settings on profiles may impede data collection
• Social networking caters to a very specific demographic of users, with an average age range of 14-35 years
• Consent issues involved in working with underage youth (if soliciting information not publicly available on profile)
Table 5.8 Advantages and Disadvantages of Social Networking Sites
5.8 Data Analysis

Data Analysis is the process of systematically applying statistical and/or logical techniques to
describe and illustrate, condense and recap, and evaluate data.
While data analysis in qualitative research can include statistical procedures, many times
analysis becomes an ongoing iterative process where data is continuously collected and
analyzed almost simultaneously. Indeed, researchers generally analyze for patterns in
observations through the entire data collection phase. An essential component of ensuring data
integrity is the accurate and appropriate analysis of research findings. Improper statistical
analyses distort scientific findings, mislead casual readers and may negatively influence the
public perception of research. Integrity issues are just as relevant to the analysis of non-statistical data.
5.8.1 Statistical Analysis
Statistical analysis is a component of data analytics. In the context of business intelligence (BI), statistical analysis involves collecting and scrutinizing every data sample in a set of items from which samples can be drawn. Statistical measures commonly used in data analysis include:
• Mean
• Median
• Standard Deviation
5.8.2 Mean and median
The mean and the median are both measures of central tendency; they give an indication of the
average value of a distribution of figures.
The mean is the arithmetic average of a group of scores; that is, the scores are added up and
divided by the number of scores. The mean is sensitive to extreme scores, especially when samples are small. For example, in a class of 20 students, if two students scored well above the others, the mean would be skewed higher than the rest of the scores might indicate. Means are better used with larger sample sizes.
The median is the middle score in a list of scores; it is the point at which half the scores are
above and half the scores are below. Medians are less sensitive to extreme scores and are
probably a better indicator generally of where the middle of the class is achieving, especially
for smaller sample sizes.
When scores are spread roughly symmetrically around the middle, the mean and median lie close together; in a perfect bell curve, the mean and median are identical.
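To make the class example concrete, the sketch below uses Python's standard `statistics` module on a hypothetical set of 20 test scores in which two students score far above the rest. The scores are invented for illustration.

```python
from statistics import mean, median

# Hypothetical scores for a class of 20; the last two are unusually high.
scores = [55, 58, 60, 60, 61, 62, 62, 63, 64, 65,
          65, 66, 67, 68, 68, 70, 71, 72, 98, 99]

print("mean:", mean(scores))      # 67.7 (pulled up by the two high scores)
print("median:", median(scores))  # 65.0 (average of the 10th and 11th scores)

# Dropping the two high scorers moves the mean much more than the median.
rest = scores[:-2]
print("mean without top two:", round(mean(rest), 2))  # 64.28
print("median without top two:", median(rest))        # 64.5
```

The mean shifts by more than three points while the median moves by half a point, which is why the median is the safer summary for small classes.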
5.8.3 Standard deviation
Standard deviation (SD) is a widely used measure of variability in statistics. It shows
how much variation there is from the average (mean). A low SD indicates that the data points
tend to be close to the mean, whereas a high SD indicates that the data are spread out over a
large range of values.
One SD away from the mean in either direction on the horizontal axis (the orange area on the
graph) accounts for around 68 percent of the people in this group. Two SDs away from the
mean (the orange and beige areas) account for roughly 95 percent of the people. And three SDs
(the orange, beige and blue areas) account for about 99.7 percent of the people.
Fig. 5.1 Standard Deviation Chart
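The 68, 95 and 99.7 percent figures come from the normal (bell-shaped) distribution. Assuming the data really are normally distributed, Python's `statistics.NormalDist` (Python 3.8+) reproduces them from its cumulative distribution function:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal population lying within k standard deviations of the mean.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

For data that are skewed or have heavy tails, these percentages can be quite different.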
If this curve were flatter and more spread out, the SD would have to be larger in order to account
for those 68 percent or so of the people. So the SD can tell you how spread out the values in a data set are from the mean.
For example, if you were to calculate the SD of scores from a class of students of similar ability,
you would expect it to be low, because the scores would all be close to the mean. On the other
hand, you would expect the SD of scores from a mixed-ability class to be higher. If these
calculations did not conform to expectations, you would want to look more closely at the data
to check for inaccuracies.
Fig. 5.2 Standard Deviation Chart
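The class comparison can be checked numerically. The sketch below uses two hypothetical classes of eight students and `statistics.pstdev`, the population standard deviation (used here because each list is treated as the whole class; `statistics.stdev` would give the sample version).

```python
from statistics import mean, pstdev

# Hypothetical scores: a similar-ability class and a mixed-ability class.
similar_ability = [70, 72, 71, 69, 73, 70, 71, 72]
mixed_ability = [45, 90, 60, 85, 50, 95, 55, 80]

for name, class_scores in [("similar", similar_ability), ("mixed", mixed_ability)]:
    print(f"{name}: mean={mean(class_scores):.1f}, SD={pstdev(class_scores):.1f}")
# similar: mean=71.0, SD=1.2
# mixed: mean=70.0, SD=18.4
```

The two means are almost the same; only the SD reveals how differently the scores are spread.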
5.9 Types of Graphs

The Pie Chart
A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to
illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its
central angle and area), is proportional to the quantity it represents. While it is named for its
resemblance to a pie which has been sliced, there are variations on the way it can be presented.
Fig. 5.3 Pie Charts
There are sub-types of the Pie Chart available. The second chart above is the Pie in 3-D and
the third chart is an Exploded Pie Chart; an Exploded Pie in 3-D is also available.
It is possible to customize the design of the pie chart so either the numeric values or the
percentages display inside the chart on top of the slices of the pie.
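Because each slice's central angle is proportional to the quantity it represents, the angle is simply the slice's share of the total multiplied by 360 degrees. A minimal sketch with hypothetical values:

```python
def pie_slices(values):
    """Map each label to (percentage of total, central angle in degrees)."""
    total = sum(values.values())
    return {label: (100 * v / total, 360 * v / total) for label, v in values.items()}


# Hypothetical data for three categories.
slices = pie_slices({"Flowers": 30, "Trees": 50, "Shrubs": 20})
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.0f}% -> {angle:.0f} degrees")
# Flowers: 30% -> 108 degrees
# Trees: 50% -> 180 degrees
# Shrubs: 20% -> 72 degrees
```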
The Column Chart
A bar chart or bar graph is a chart that presents grouped data with rectangular bars with
lengths proportional to the values that they represent. The bars can be plotted vertically or
horizontally. A vertical bar chart is sometimes called a column bar chart.
A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value. Some bar graphs present bars clustered in groups of more than one.
Fig. 5.4 Column Chart
The Column Chart very effectively shows the comparison of one or more series of data points.
But the Clustered Column Chart is especially useful in comparing multiple data series.
Fig. 5.5 Clustered Column Chart
One variation of this chart type is the Stacked Column Chart. We show a 3-D Stacked Column
Chart at left. In a Stacked Column Chart, the data points for each time period are "stacked"
instead of "clustered." This chart type lets us see the percentage of the total for each data point
in the series.
All the Column Charts have a version in which the columns display in three-dimension - as
illustrated by the 3-D Stacked Column Chart above. But one chart, the "3-D Column Chart," is
special because the chart itself is three-dimensional - displaying multiple series on the X-axis,
Y-axis, and Z-axis. The first chart below is a 3-D Column Chart of our data series.
Fig. 5.6 3-D Column Chart
In newer versions, cylinders, pyramids, and cones can be used instead of bars for most of the Column Charts. The second chart above shows a 3-D Pyramid Chart.
The Bar Chart
The Bar Chart is like a Column Chart lying on its side. The horizontal axis of a Bar Chart
contains the numeric values. The first chart below is the Bar Chart for our single series,
Flowers.
When to use a Bar Chart versus a Column Chart depends on the type of data and user
preference. Sometimes it is worth the time to create both charts and compare the results.
However, Bar Charts do tend to display and compare a large number of series better than the
other chart types.
Fig. 5.7 Bar Charts and Clustered Bar Charts
All of the Bar Charts are available in 2-D and 3-D formats, but only the bars are 3-D. There is
no 3-D Bar chart containing three axes.
The Line Chart
A line chart or line graph is a type of chart which displays information as a series of data
points called 'markers' connected by straight line segments. It is a basic type of chart common
in many fields. It is similar to a scatter plot except that the measurement points are ordered
(typically by their x-axis value) and joined with straight line segments. A line chart is often
used to visualize a trend in data over intervals of time – a time series – thus the line is often
drawn chronologically.
Fig. 5.8 Line Chart
The Line Chart is especially effective in displaying trends. In a Line Chart, the vertical axis
(Y-axis) always displays numeric values and the horizontal axis (X-axis) displays time or another category.
Fig. 5.9 Line Chart (Multiple Series)
The Line Chart is equally effective in displaying trends for multiple series as shown in our
chart at right. As you will notice, each line is a different color. Though not as colorful as the
other charts, it is easy to see how effective the Line Chart is in showing a trend for a single series, and in comparing trends for multiple series of data values.
The Area Chart
An area chart or area graph displays quantitative data graphically. It is based on the line chart: the area between the axis and the line is commonly emphasized with colors, textures, and hatchings. Area charts are commonly used to compare two or more quantities.
Fig. 5.10 Area Chart
Area Charts are like Line Charts except that the area below the plot line is solid. And like Line
Charts, Area Charts are used primarily to show trends over time or another category. The chart at
left is an Area Chart for our single series.
Fig. 5.11 3-D Area Chart
In many cases, the 2-D version of the Area Chart can be ineffective in displaying multiple
series of data meaningfully. Series with lesser values may be completely hidden behind series
with greater values - as demonstrated in the first chart below. Flowers is totally hidden, and
just a wee bit of Trees peeks through.
Fig. 5.12 Area Charts
The Scatter Chart
The purpose of a Scatter Chart is to observe how the values of two series compare over time or another category. A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it, or when both continuous variables are independent.
"Scatter plots are similar to line graphs in that they use horizontal and vertical axes to plot data
points. However, they have a very specific purpose. Scatter plots show how much one variable
is affected by another. The relationship between two variables is called their correlation."
Take a look at our two sample Scatter Charts below. The first chart is a Scatter Chart with Only
Markers, and the second chart is a Scatter Chart with Smooth Lines.
Fig. 5.13 Scatter Charts
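The correlation mentioned above is commonly measured with Pearson's correlation coefficient r, which ranges from -1 (a perfect negative linear relationship) through 0 (no linear relationship) to +1 (a perfect positive one). A from-scratch sketch with hypothetical study-hours and test-score data:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical data: hours studied vs test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 3))  # 0.993
```

A value this close to +1 means the points on a scatter chart would lie almost on a rising straight line.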
5.10 Data Mining

Data mining is an interdisciplinary subfield of computer science. It is the computational
process of discovering patterns in large data sets involving methods at the intersection of
artificial intelligence, machine learning, statistics, and database systems. The overall goal of
the data mining process is to extract information from a data set and transform it into an
understandable structure for further use.
There are several forms of data analysis that can be used to extract models describing important classes or to predict future data trends. They are as follows:
• Clustering
• Classification
• Prediction
5.11 Classification
Classification is a classic data mining technique based on machine learning. Basically,
classification is used to classify each item in a set of data into one of a predefined set of classes
or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks, and statistics. In classification, we develop the software
that can learn how to classify the data items into groups. For example, we can apply
classification in the application that “given all records of employees who left the company,
predict who will probably leave the company in a future period.” In this case, we divide the records of employees into two groups named “leave” and “stay”. Then we can ask our data mining software to classify the employees into these groups.
Fig. 5.14 Classification
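As a hypothetical illustration of the employee example, the sketch below learns a decision stump (a one-level decision tree, a minimal case of the decision-tree technique listed above) from a handful of invented employee records with two made-up features: years at the company and a satisfaction score from 1 to 10. A real classifier would use far more data and features.

```python
# Hypothetical training records: (years_at_company, satisfaction_1_to_10, label)
TRAINING = [
    (1, 3, "leave"), (2, 4, "leave"), (1, 2, "leave"), (3, 3, "leave"),
    (5, 8, "stay"), (7, 7, "stay"), (4, 9, "stay"), (6, 6, "stay"),
]
FEATURES = ("years_at_company", "satisfaction")


def train_stump(records):
    """Pick the feature, threshold and labels that misclassify the fewest records."""
    best = None
    for f in range(len(FEATURES)):
        for threshold in sorted({r[f] for r in records}):
            for below, above in (("leave", "stay"), ("stay", "leave")):
                errors = sum((below if r[f] <= threshold else above) != r[-1]
                             for r in records)
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, below, above)
    return best[1:]  # (feature index, threshold, label if <= threshold, label otherwise)


def classify(stump, features):
    """Assign a new employee (a tuple of feature values) to one of the two groups."""
    f, threshold, below, above = stump
    return below if features[f] <= threshold else above


stump = train_stump(TRAINING)
print(FEATURES[stump[0]], "<=", stump[1], "->", stump[2])  # years_at_company <= 3 -> leave
print(classify(stump, (2, 5)))  # leave
print(classify(stump, (8, 4)))  # stay
```

Full decision-tree learners repeat this split search recursively on each side of the chosen threshold.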
5.12 Clustering
Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. The clustering technique defines the classes itself and puts objects in each class, whereas in classification, objects are assigned to predefined classes.
Fig. 5.15 Clustering

To make the concept clearer, we can take book management in the library as an example. In a
library, there is a wide range of books on various topics available. The challenge is how to keep
those books in a way that readers can take several books on a particular topic without hassle.
By using the clustering technique, we can keep books that have some kinds of similarities in
one cluster or one shelf and label it with a meaningful name. If readers want to grab books in
that topic, they would only have to go to that shelf instead of searching the entire library.
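Clustering algorithms differ, but k-means is one widely used example. The sketch below is a minimal k-means on hypothetical 2-D points (imagine each book described by two numeric features). It uses a deliberately naive initialization (the first k points), which real implementations replace with smarter seeding. Nothing is labeled in advance: the groups emerge from the data.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable, so stop
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D feature values for six items.
points = [(1, 1), (8, 8), (1.5, 2), (2, 1), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (2, 1)], [(8, 8), (9, 8.5), (8.5, 9)]]
```

The number of clusters k must be chosen up front; picking it well is part of the analyst's job.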
5.13 Prediction
Prediction, as its name implies, is a data mining technique that discovers the relationship between independent variables and the relationship between dependent and independent variables. For instance, prediction can be used in sales to predict future profit: if we consider sales as the independent variable, profit can be the dependent variable. Then, based on historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.
Fig. 5.16 Prediction
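The fitted regression described above can be sketched with ordinary least squares for a straight line, profit = intercept + slope × sales. The sales and profit figures are hypothetical, and a straight line is an assumption; real data may call for a curve.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: sales (independent) and profit (dependent), in thousands.
sales = [10, 20, 30, 40, 50]
profit = [3, 6, 7, 8, 11]
intercept, slope = fit_line(sales, profit)
print(f"profit = {intercept:.2f} + {slope:.2f} * sales")  # profit = 1.60 + 0.18 * sales
print(f"predicted profit at sales of 60: {intercept + slope * 60:.1f}")  # 12.4
```

Predictions far outside the range of the historical sales figures should be treated with extra caution.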
5.14 Summary
The process involved in finding the story within your data:
1. Finding Data – find the data that is suitable to answer your question
2. Wrangle the Data – bring it to a format that is usable
3. Merge Datasets – Bring different datasets together
4. Filter and sort the Data – Pick the data that is interesting
5. Analyze Data – Is there something to it?
6. Visualize Data – If there is something interesting in the data how can we best
showcase it to others?
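The six steps can be walked through in miniature. The sketch below uses two tiny hypothetical datasets (attendance strings and test scores keyed by a made-up student ID); each comment marks the step it corresponds to.

```python
from statistics import mean

# Step 1 (find data): two hypothetical sources keyed by student ID.
raw_attendance = ["S1,18", "S2,  9", "S3,20", "S4,bad", "S5,12"]
scores = {"S1": 78, "S2": 55, "S3": 84, "S5": 61}


def wrangle(rows):
    """Step 2: parse 'id,days' strings into {id: days}, skipping malformed rows."""
    cleaned = {}
    for row in rows:
        student_id, days = (part.strip() for part in row.split(","))
        if days.isdigit():
            cleaned[student_id] = int(days)
    return cleaned


attendance = wrangle(raw_attendance)
# Step 3 (merge): join the two datasets on student ID.
merged = [(sid, attendance[sid], scores[sid]) for sid in attendance if sid in scores]
# Step 4 (filter and sort): students with at least 10 days, best score first.
selected = sorted((r for r in merged if r[1] >= 10), key=lambda r: r[2], reverse=True)
# Step 5 (analyze): a simple summary statistic.
average_score = mean(score for _, _, score in selected)
print(f"average score of regular attenders: {average_score:.1f}")  # 74.3
# Step 6 (visualize): a crude text bar chart, one '#' per 10 points.
for sid, _, score in selected:
    print(f"{sid} {'#' * (score // 10)} {score}")
```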
"Reason sits firm and holds the reins, and she will not let the feelings burst away and
hurry her to wild chasms. The passions may rage furiously, like true heathens, as they
are; and the desires may imagine all sorts of vain things: but judgement shall still have
the last word in every argument, and the casting vote in every decision."
-- Charlotte Brontë
Session 6
LOGICAL THINKING AND REASONING

6.1 Introduction
The term “logic” refers to the science that studies the principles of correct reasoning. However, it is not a science like physics, biology, or psychology. Rather, logic is a non-empirical science like mathematics. Logic requires the act of reasoning by humans in order to form thoughts and opinions, as well as classifications and judgments. It is not magic, and it is not genetically built in. It is a learned mental process, a sequential thinking process. It is thinking in terms of connections and consequences. Reasoning is coherent when ideas stick together because they are arranged in an order that makes sense to the reader. The reasons or evidence must have a connection; they can’t just jump around. A text is illogical when it does not provide reasons backed by evidence such as facts and examples.
Logical thinking is commonly referred to as left-brain thinking. Logical thinking uses the
straight facts in order to solve problems, as opposed to right-brained thinking, which is more
emotional in nature. Logic, as a branch of knowledge, reflects on the nature of thinking itself when a person is reasoning; its concern is correct thinking in reasoning.
To think as a human is to think for a purpose. In pursuing a purpose (using thought),
questions are generated.
– Why is that so?
– Where is the evidence?
– Is the evidence reliable?
– What are the alternative explanations?
To answer a question, you need information that bears on it. To make sense of information, you must come to some conclusions. And, finally, to think purposively, using information to come to conclusions, is to think within a point of view.
6.2 Arguments
Logical thinking is a special mental activity used to draw conclusions. Whenever we express a thought, we do so by means of statements. An argument is a sequence of statements that helps us draw a conclusion. Formally, an argument is a group of statements, one of which (the conclusion) is claimed to follow from the other or others (the premises).
The premises of an argument are intended to support (justify) the conclusion of the argument. Premises are statements that set forth the evidence. The conclusion is the statement that is claimed to follow from the evidence.
Example:
Premise 1: All animals can jump
Premise 2: A cat is an animal
Conclusion: Therefore, a cat can jump
General Nature of arguments
When we draw conclusions, we do it in some circumstances based on some information
(premises) using some concepts. Good arguments are those in which the conclusion really does follow from the premises. Bad arguments are those in which the conclusion does not follow, even though it is claimed to.
6.3 Logical Reasoning

Reasoning is the capacity for rational thought, that is, the ability to think logically. It is the process of using a rational, systematic series of steps based on sound mathematical procedures and given statements to arrive at a conclusion. The reasoning process may be thought of as beginning with input (premises, data, etc.) and producing output (conclusions). In each specific case of drawing (inferring) a conclusion C from premises P1, P2, P3, ..., the concern of logic is whether the inference of C on the basis of P1, P2, P3, ... is correct.
The strength of logic lies in connecting the right facts. The three parts of an argument are premises, reasoning, and conclusion. Correct reasoning rests on both the structure of an argument and the truth of its premises. True premises alone do not guarantee a true conclusion; the reasoning that connects them must also be valid.
Example: “All cats have four legs. I have four legs. Therefore, I am a cat.” The conclusion is not correct, because having four legs does not make something a cat.
Logical reasoning skills can be learned and improved. Formation and analysis of
arguments leads to correct reasoning. Logic is the study of information encoded as logical
sentences. Now let us formalize our conversations/arguments into logical sentences.
6.4 Formation of Logical sentences

Logical sentences (statements) are declarative sentences that are either true or false.
The following are examples of statements.
It is raining
I am hungry
2+2 = 4
God exists
Hydrogen is combustible
The attribute by which a statement is either true or false is called the truth value of the statement. The truth value of a statement is either true or false.
Interrogative, imperative, and exclamatory sentences are not capable of being true or false, so they are not statements. The following are examples of sentences that are not statements.
Are you hungry?
Shut the door, please.
We suggest that you travel by bus.
Turn to the left at the next corner.
You, here!!!!
6.5 Analysing/Evaluating Arguments

There are two different reasoning approaches to reach a logical conclusion. They are
deductive and inductive reasoning. Both approaches are depicted in Fig. 6.1.
Fig. 6.1 Reasoning
6.6 Inductive Reasoning

In inductive reasoning, you make a generalized decision after observing or witnessing repeated specific instances. In the process of induction, you begin with some data, and then determine what general conclusion(s) can logically be derived from those data. For example, to complete the series 2, 4, 8, 16, _, you look for a pattern (each term is double the one before it), draw a conclusion (the next term is 32), and then test it. In other words, you determine what theory or theories could explain the data.
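The pattern-finding step can be mimicked in code. The hypothetical helper below tries two simple patterns (a constant difference, then a constant ratio) and returns a conjecture for the next term. It illustrates induction, not proof: other rules may fit the same terms.

```python
def conjecture_next(series):
    """Guess the next term by checking for a constant difference, then a constant ratio."""
    differences = {b - a for a, b in zip(series, series[1:])}
    if len(differences) == 1:
        return series[-1] + differences.pop()
    if 0 not in series:
        ratios = {b / a for a, b in zip(series, series[1:])}
        if len(ratios) == 1:
            return series[-1] * ratios.pop()
    return None  # no simple pattern found


print(conjecture_next([2, 4, 8, 16]))  # 32.0 (each term doubles)
print(conjecture_next([3, 5, 7]))      # 9 (each term adds 2)
```

The conjecture can be wrong: for 1, 2, 4 the helper answers 8.0 (doubling), yet the rule "add 1, then 2, then 3" fits the same three terms and predicts 7.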
Inductive reasoning is
• Making generalization from observed patterns in data
• Use patterns to arrive at a conclusion (conjecture).
• Remember, just because something is true for several specific cases does not prove
that it is true in general.
Inductive reasoning moves from the specific to the general. However, induction does not prove that the theory is correct. There is always some degree of uncertainty regarding the veracity of the conclusion. Inductive arguments, therefore, are fallible: no matter how strong the argument is, there is always the possibility that the conclusion may be false. Inductive arguments cannot be completely certain because the supporting premises rely on empirical, or observational, evidence, and this kind of evidence is never fully reliable. Successful inductive reasoning depends on the quality of your observations, or evidence. If the quality of the observations is not good enough, or if not enough observations have been made, inductive reasoning may not be dependable. The conclusions of inductive reasoning can only be probable, plausible, likely, or reasonable.
Example: The children in that house yell loudly when they play in their bedroom. I can hear children yelling in that house; therefore, the children must be playing in their bedroom.
Given the data, this is certainly a reasonable hypothesis. But the children may be playing somewhere else and yelling. Here the conclusion is only probable.
Examples of inductive logic:
This cat is black. That cat is black. A third cat is black. Therefore, all cats are black.
This marble from the bag is black. That marble from the bag is black. A third marble from the bag is black. Therefore, all the marbles in the bag are black.
Two-thirds of my Latino neighbors are illegal immigrants. Therefore, two-thirds of Latino immigrants come illegally.
6.7 Deductive Reasoning
Deductive reasoning is reasoning in which true premises, connected by valid steps, lead to a true conclusion. It is a process of reasoning from known facts to conclusions. It can be thought of as starting from a general statement that is accepted as true and moving to a specific statement that is “therefore” true. In deductive reasoning, one arrives at a specific conclusion based on generalizations.
Examples: "All men are mortal. Harold is a man. Therefore, Harold is mortal."
“Bachelors are unmarried men. Bill is unmarried. Therefore, Bill is a bachelor.”
For deductive reasoning to be sound, the premises must be correct. It is assumed that the premises "All men are mortal" and "Harold is a man" are true. Therefore, the conclusion is logical and true. When the premises are true and the reasoning is valid, the conclusion of deductive reasoning is certain, absolute, and definite.
Deductive reasoning often uses a three-step argument called a syllogism, introduced by the Greek philosopher Aristotle. This three-step form was the beginning of formal logic, or logical thinking.
Theory of Syllogism
All “x” have the characteristic “y”.
This thing is an “x”.
Therefore, this thing has the characteristic “y”.
Example using Syllogism
The last day to register for the Disney trip is July 15. Joe missed the registration date.
Therefore, Joe will not be able to register for the trip.
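Validity depends on form, so it can be checked mechanically on small examples: an argument form is valid if no possible interpretation makes every premise true and the conclusion false. The sketch below is a brute-force search over a hypothetical three-individual world (not a general theorem prover). It checks the syllogism form above and the invalid "All cats have four legs. I have four legs. Therefore, I am a cat." pattern from Section 6.3.

```python
from itertools import product

UNIVERSE = range(3)  # a tiny hypothetical world with three individuals: 0, 1, 2


def all_subsets(universe):
    """Every possible set of individuals, i.e. every way to interpret a predicate."""
    items = list(universe)
    for keep in product([False, True], repeat=len(items)):
        yield {x for x, k in zip(items, keep) if k}


def is_valid(premises, conclusion):
    """Valid means no interpretation makes all premises true and the conclusion false."""
    for x_set in all_subsets(UNIVERSE):
        for y_set in all_subsets(UNIVERSE):
            for thing in UNIVERSE:
                premises_true = all(p(x_set, y_set, thing) for p in premises)
                if premises_true and not conclusion(x_set, y_set, thing):
                    return False  # counterexample found
    return True


def all_x_are_y(x_set, y_set, thing):
    return x_set <= y_set  # every x is also a y


def thing_is_x(x_set, y_set, thing):
    return thing in x_set


def thing_is_y(x_set, y_set, thing):
    return thing in y_set


# Syllogism: All x are y. This thing is an x. Therefore, this thing is a y.
print(is_valid([all_x_are_y, thing_is_x], thing_is_y))  # True
# Cats example (x = cats, y = four-legged): All x are y. I am a y. Therefore, I am an x.
print(is_valid([all_x_are_y, thing_is_y], thing_is_x))  # False
```

A tiny world is enough to expose the counterexample in the second case: an empty set of cats with one four-legged individual makes both premises true and the conclusion false.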
We Need both Inductive and Deductive Reasoning
In scientific discovery and in life, we use both types of reasoning. For example, we can use
inductive reasoning to attempt to make enough observations to come up with a theory or
conclusion. Next, we can switch to deductive reasoning, using our conclusion or theory (which is general) as a premise. Note that sometimes it is not possible to prove something is always true, so
the best we can have is a theory.
Session 7
Boolean Logic
7.1 Introduction
Boolean Logic is a part of symbolic logic. Symbolic Logic is a modern extension of Aristotelian logic in which symbols represent statements of truth. Boolean Logic was created by the English mathematician George Boole, who set it out in The Mathematical Analysis of Logic (1847) and The Laws of Thought (1854), as a way of dealing mathematically with propositions that have only two possible values, such as TRUE or FALSE. To express logical thought, logical operators called logical connectives are used. The rules of the logic tell us how to manipulate inputs and produce outputs. Logic is the tool for reasoning about the truth or falsity of statements.
Fig 1. Circuit to switch ON/OFF the Lamp

Fig 1. shows the circuit to switch the Lamp ON/OFF. If the switch SW1 is open, the Lamp is in the OFF state, and vice versa.
7.2 Boolean Algebra

Boolean algebra deals with the rules that govern various operations between binary variables. Binary digits, called bits, are represented by 0 and 1: 0 represents the OFF state and 1 represents the ON state.
Fig 2a. ON/OFF state

Fig 2b. Truth Table
Fig 2a. represents the ON/OFF state of the Lamp and the same is represented in 0’s and 1’s in
Fig 2b.
7.3 Difference between Arithmetic and Logic

Arithmetic | Logic
0 | False
1 | True
Boolean Variable | Statement Variable
Forms of Function | Statement Form
Value of Function | Truth value of the statement
Equality of Function | Equivalence of statement
7.4 Boolean Logic Operations

The operators used most often are AND and OR.
1. AND operation or Conjunction
It describes events that can occur if and only if all events are true. If and only
if all inputs are on, the output will be on. The output will be off if any of the inputs are
off.
It is represented as C=A.B
Fig 3a. Truth Table for AND Fig 3b. Circuit for AND
Fig 3a. represents the truth table for AND operations, the output is 1 if and only if both the
inputs are 1. Fig 3b. is the circuit diagram for the AND operation, in which the switches A and B are connected in series. The Lamp L is ON only when both A and B are ON.
2. OR operation or Disjunction
It describes events that can occur if at least one of the events is true.
The OR operation says if any input is on, the output will be on.
It is represented as C= A+B
Fig 4a. Truth Table for OR Fig 4b. Circuit Diagram for OR
Fig 4a. represents the truth table for the OR operation, in which the output is 1 if any of the inputs is 1. Fig 4b. represents the circuit diagram for OR, in which the switches A and B are connected in parallel; if either switch is ON, the Lamp is ON.
3. NOT Operation or Inversion
NOT operation changes a statement from true to false and vice versa. It is represented as C = Ā.
Fig 5a. Truth Table for NOT Fig 5b. Circuit Diagram for NOT
Fig 5a. represents the Truth Table for NOT, and the circuit diagram for the same is given in Fig 5b. If the switch A is ON, the Lamp is OFF, and vice versa.
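The three truth tables can be regenerated with a few lines of Python, writing 0 for OFF/False and 1 for ON/True as in Fig 2b. This is a sketch for checking the tables in Figs 3a, 4a and 5a, not a circuit simulation; the function names are illustrative.

```python
from itertools import product


def and_gate(a, b):
    return a & b  # 1 only when both inputs are 1 (switches in series)


def or_gate(a, b):
    return a | b  # 1 when at least one input is 1 (switches in parallel)


def not_gate(a):
    return 1 - a  # inverts 0 <-> 1


print("A B | A.B A+B")
for a, b in product([0, 1], repeat=2):
    print(f"{a} {b} |  {and_gate(a, b)}   {or_gate(a, b)}")

print("A | NOT A")
for a in (0, 1):
    print(f"{a} |   {not_gate(a)}")
```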
7.5 Logic gates

A logic gate is an elementary building block of a digital circuit. Most logic gates have two
inputs and one output. At any given moment, every terminal is in one of the two binary
conditions low (0) or high (1), represented by different voltage levels. The relationship between
the input and the output is based on a certain logic. Based on this, logic gates are named as
AND gate, OR gate, NOT gate etc.
Boolean Circuits (gate symbols): NOT, AND, OR, XOR, NAND, NOR
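The list above also names XOR, NAND and NOR, which are not defined in the text. Using their standard definitions (XOR: exactly one input is 1; NAND: NOT of AND; NOR: NOT of OR), they can be built from AND, OR and NOT, as in this self-contained sketch:

```python
def and_gate(a, b):
    return a & b


def or_gate(a, b):
    return a | b


def not_gate(a):
    return 1 - a


def nand_gate(a, b):
    return not_gate(and_gate(a, b))  # 0 only when both inputs are 1


def nor_gate(a, b):
    return not_gate(or_gate(a, b))  # 1 only when both inputs are 0


def xor_gate(a, b):
    # 1 when exactly one input is 1: (A AND NOT B) OR (NOT A AND B)
    return or_gate(and_gate(a, not_gate(b)), and_gate(not_gate(a), b))


print("A B | NAND NOR XOR")
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} {b} |  {nand_gate(a, b)}    {nor_gate(a, b)}   {xor_gate(a, b)}")
```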
7.6 Basic Laws of Boolean Algebra
• Commutative Law
  A + B = B + A
  A.B = B.A
• Associative Law
  A + (B + C) = (A + B) + C
  A (BC) = (AB) C
• Distributive Law
  A (B + C) = AB + AC
7.7 Basic Rules of Boolean Algebra

• A + 0 = A
• A + 1 = 1
• A.0 = 0
• A.1 = A
• A + A = A
• A + Ā = 1
• A.A = A
• A.Ā = 0
• A’’ = A
• A + AB = A
• A + ĀB = A + B
• (A + B)(A + C) = A + BC
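Because each variable can only be 0 or 1, every law and rule above can be verified by trying all combinations of inputs (proof by perfect induction). A sketch, writing the complement as `1 - a` (A’ or Ā in the text), `+` as OR (`|`) and `.` as AND (`&`):

```python
from itertools import product


def comp(a):
    return 1 - a  # complement, written Ā or A' in the text


def holds(identity, n_vars):
    """True if the identity is satisfied by every 0/1 assignment of its variables."""
    return all(identity(*values) for values in product([0, 1], repeat=n_vars))


# Each entry: (check for one assignment, number of variables).
LAWS = {
    "A + B = B + A": (lambda a, b: (a | b) == (b | a), 2),
    "A.B = B.A": (lambda a, b: (a & b) == (b & a), 2),
    "A + (B + C) = (A + B) + C": (lambda a, b, c: (a | (b | c)) == ((a | b) | c), 3),
    "A(BC) = (AB)C": (lambda a, b, c: (a & (b & c)) == ((a & b) & c), 3),
    "A(B + C) = AB + AC": (lambda a, b, c: (a & (b | c)) == ((a & b) | (a & c)), 3),
    "A + 0 = A": (lambda a: (a | 0) == a, 1),
    "A + 1 = 1": (lambda a: (a | 1) == 1, 1),
    "A.0 = 0": (lambda a: (a & 0) == 0, 1),
    "A.1 = A": (lambda a: (a & 1) == a, 1),
    "A + A = A": (lambda a: (a | a) == a, 1),
    "A + A' = 1": (lambda a: (a | comp(a)) == 1, 1),
    "A.A = A": (lambda a: (a & a) == a, 1),
    "A.A' = 0": (lambda a: (a & comp(a)) == 0, 1),
    "A'' = A": (lambda a: comp(comp(a)) == a, 1),
    "A + AB = A": (lambda a, b: (a | (a & b)) == a, 2),
    "A + A'B = A + B": (lambda a, b: (a | (comp(a) & b)) == (a | b), 2),
    "(A + B)(A + C) = A + BC": (lambda a, b, c: ((a | b) & (a | c)) == (a | (b & c)), 3),
}

for name, (check, n_vars) in LAWS.items():
    print(f"{name:<26} {'holds' if holds(check, n_vars) else 'FAILS'}")
```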
De Morgan’s Theorems
• The inverse of a product is equal to the sum of the complements: (A.B)’ = A’ + B’
• The inverse of a sum is equal to the product of the complements: (A + B)’ = A’.B’
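De Morgan’s theorems can be checked the same way, by comparing both sides for every combination of inputs. A self-contained sketch using 0/1 values:

```python
def comp(a):
    return 1 - a  # complement (inverse)


rows = []
for a in (0, 1):
    for b in (0, 1):
        first = comp(a & b) == (comp(a) | comp(b))   # (A.B)' = A' + B'
        second = comp(a | b) == (comp(a) & comp(b))  # (A + B)' = A'.B'
        rows.append((a, b, first, second))
        print(a, b, first, second)

print("Both theorems hold:", all(f and s for _, _, f, s in rows))  # True
```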
Example 2:
There is a car with three main control systems. A warning lamp should be designed to light if
any of the following conditions occur:
• All systems are down
• Systems A, B down but C is OK
• Systems A, C down but B is OK
• System A down, but B, C are OK
Step 1: Define the problem
• There are two possible states for each system
• Assign:
  System: Down = 0, OK = 1
  Light: Off = 0, On = 1
Step 2: Draw a logic block diagram
Step 3: Prepare the truth table
Step 4: Write logic equations
Step 5: Simplify the equations
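Steps 3 to 5 can be sketched in Python under the encoding from Step 1 (Down = 0, OK = 1; lamp On = 1). The four conditions are exactly the rows in which A = 0, so the sum-of-products equation L = A'B'C' + A'B'C + A'BC' + A'BC simplifies to L = A' (the lamp lights whenever system A is down). The sketch below builds the truth table and confirms this simplification; the function names are illustrative.

```python
from itertools import product

# Step 1 encoding: a system is 0 when down and 1 when OK; the lamp is 1 when on.
# The four warning conditions from the problem, written as (A, B, C):
WARNING_CONDITIONS = {
    (0, 0, 0),  # all systems are down
    (0, 0, 1),  # A, B down but C is OK
    (0, 1, 0),  # A, C down but B is OK
    (0, 1, 1),  # A down, but B, C are OK
}

def lamp_from_table(a, b, c):
    """Step 3: the lamp is on exactly for the rows listed in the problem."""
    return 1 if (a, b, c) in WARNING_CONDITIONS else 0

def lamp_simplified(a, b, c):
    """Step 5: simplified equation L = A' (the lamp depends only on system A)."""
    return 1 - a

print("A B C | L")
for a, b, c in product([0, 1], repeat=3):
    print(a, b, c, "|", lamp_from_table(a, b, c))

same = all(lamp_from_table(*row) == lamp_simplified(*row)
           for row in product([0, 1], repeat=3))
print("L = A' matches the truth table:", same)
```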

Session 8
Propositional Logic
8.1 Propositions
What is proposition?
• A proposition is a declarative statement that is either TRUE or FALSE
• It cannot be both TRUE and FALSE
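• T denotes TRUE and F denotes FALSE
Examples: Propositions
• 5 + 2 = 7
• 5 + 2 = 12 (a false proposition)
• The sun rises in the east
• Milk is white
• The earth is flat (a false proposition)
Examples: Non-propositions
Questions and commands are not propositions, because they are neither true nor false.
• What time is it?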
• Go in a straight line
• Don’t look back
8.2 Propositional Logic

What is Propositional Logic?
• A formal language for representing knowledge and for making logical inferences.
• Sometimes called sentential logic or statement logic.
• In propositional logic, a capital letter is used to represent a simple sentence.
• Simple sentences are relatively short and do not contain any other sentence as a component.
Ex.
• Grass is green.
• The sky is blue.
• The first sentence, ‘Grass is green’, can be symbolized as G. The second sentence, ‘The sky is
blue’, can be symbolized as S.
Session 9
Connectives
9.1 Connectives
What are connectives?
In propositional logic, connectives are used to show the relationship between the propositions
or sentences.
Need for Connectives
Complex and larger propositions can be constructed by combining simple propositions using
connectives.
9.2 Elements of connectives

• AND (∧)
• OR (∨)
• NOT (¬)
• IF_THEN (IMPLY) (→)
• IF_AND_ONLY_IF (↔)
• Propositions and connectives are the basic elements of propositional logic.
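The meaning of each connective is fixed by its truth table. A minimal Python sketch that prints the table for all five connectives, using Python's True and False for T and F; the function names are illustrative. For example, with G = ‘Grass is green’ and S = ‘The sky is blue’, G ∧ S is true only when both sentences are true.

```python
from itertools import product

def AND(p, q):
    return p and q          # p ∧ q

def OR(p, q):
    return p or q           # p ∨ q

def NOT(p):
    return not p            # ¬p

def IMPLIES(p, q):
    return (not p) or q     # p → q is false only when p is T and q is F

def IFF(p, q):
    return p == q           # p ↔ q is true when p and q have the same value

def tf(x):
    return "T" if x else "F"

print("P Q | P∧Q P∨Q ¬P P→Q P↔Q")
for p, q in product([True, False], repeat=2):
    row = [AND(p, q), OR(p, q), NOT(p), IMPLIES(p, q), IFF(p, q)]
    print(tf(p), tf(q), "|", "   ".join(tf(v) for v in row))
```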

Session 10
Applications of Propositional Logic
10.1 Applications of Propositional Logic
1) Querying Search Engine
2) Analysis of Digital Circuit
3) Querying Database
Application I: Querying a Search Engine

What is a search engine?
A (web) search engine is a web-based tool that enables users to locate (find) information on the
World Wide Web.
Ex: Google, Yahoo, MSN Search and so on.
The search engine takes our phrase or keyword and returns search engine results pages with a
list of sites it deems relevant or connected to the searched keyword.
What is a search query?
• A search query is a phrase (keyword) that describes the information that the user seeks to
obtain.
• A search query returns web pages that contain all the terms or phrases in the query.
• By default, a search query that uses more than one word is treated as a conjunction.
• Search key: Karunya University (two words)
• The search engine understands this as:
Karunya AND University

Example 1:
As an example, consider searching for information about a high school classmate named Lisha Joy.
• We first decide to use the search query “Lisha”.
• The query returns a flood of web pages about people named Lisha, none of whom is
the Lisha Joy who was our high school friend.
Conjunction:
o We decide to use the search query “Lisha Joy”.
o The search engine understands this query to be the conjunction of the two words.
o It understands the query to mean “Lisha AND Joy”.
• We still cannot find any web page related to our high school classmate Lisha Joy.
• As we think about how to improve the search results, we remember that she married someone
with the surname “Stephen”.
• We therefore attempt to construct the search query “Lisha Joy Stephen”.
• This search query is overly narrow, since only pages containing all three words are
returned.
• The search engine interprets this query to mean “find all web pages that contain the
words Lisha AND Joy AND Stephen”.
• The search engine shows only the pages that contain all three words, and it still fails to show the desired result.
Disjunction:
o We therefore construct a search query that expresses the possibility that her
last name is either Joy OR Stephen.
o The search query is “Lisha AND (Joy OR Stephen)”.
• Suppose the search query “Lisha AND (Joy OR Stephen)” returns many web pages related to a
well-known soccer player.
• We are certain that our high school classmate is not a soccer player, so we construct a
search query that prevents soccer-related pages from appearing.
Negation
o Search engines use the NOT operator to exclude pages that contain certain words.
o We can therefore construct the search query “Lisha AND (Joy OR
Stephen) AND NOT soccer”, which gives the desired result.
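The way a search engine combines words can be imitated with Python Boolean expressions. A minimal sketch, assuming each web page is reduced to a set of lowercase words; the five pages and their contents are invented for illustration and do not come from a real search engine.

```python
# Hypothetical pages, each reduced to the set of lowercase words it contains.
PAGES = {
    "page1": {"lisha", "recipes"},
    "page2": {"lisha", "joy", "soccer", "goal"},
    "page3": {"lisha", "stephen", "school", "reunion"},
    "page4": {"lisha", "joy", "school", "alumni"},
    "page5": {"joy", "stephen", "soccer"},
}

def search(predicate):
    """Return the names of pages whose word set satisfies the Boolean query."""
    return sorted(name for name, words in PAGES.items() if predicate(words))

# "Lisha Joy" is treated as the conjunction Lisha AND Joy.
q1 = search(lambda w: "lisha" in w and "joy" in w)
# "Lisha Joy Stephen" is Lisha AND Joy AND Stephen: overly narrow.
q_all = search(lambda w: "lisha" in w and "joy" in w and "stephen" in w)
# Lisha AND (Joy OR Stephen)
q2 = search(lambda w: "lisha" in w and ("joy" in w or "stephen" in w))
# Lisha AND (Joy OR Stephen) AND NOT soccer
q3 = search(lambda w: "lisha" in w and ("joy" in w or "stephen" in w)
            and "soccer" not in w)

for query in (q1, q_all, q2, q3):
    print(query)
```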
Example 2:
• Consider a situation where we live in a city that has a grocery store named “Green”. We
would like to use a search engine to find the phone number and the hours of operation for this
store.
• You use the term “green”.
• At the time this text was written, the returned results included:
o Green is a color
o Green Party’s home page
o The website of the “Green Climate Fund”
o A Wikipedia article titled “Green”
o A link to a web page devoted to “Green and Clean”
• The query did not produce useful results!

• To narrow down the search, use more words in the query. This is called
conjunction.
Conjunction:
• For example, the query “Green Basket Coimbatore” is interpreted as
Green AND Basket AND Coimbatore.
• Suppose you want to search for the details of either Green Basket or Green BigBasket.
Then use disjunction.
Disjunction:
• The search query should be “Green AND (Basket OR
BigBasket)”.
• This query finds web pages that contain Green and at least one of Basket or BigBasket.
OR is inclusive, so pages that contain both words are also returned; excluding them would
require an exclusive OR.
• If you want to exclude a word from a query, use negation.
Negation
• Search engines use the NOT operator to exclude pages that contain
certain words. (Many search engines use the - symbol for NOT.)
• For example, Green AND (Basket OR BigBasket) AND -Fresh
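In Boolean logic, OR is inclusive: a page containing both Basket and BigBasket satisfies Basket OR BigBasket. The sketch below, again on invented pages reduced to word sets, compares inclusive OR, exclusive OR (one word but not both) and the query with negation. It is an illustration of the logic only, not how any particular search engine is implemented.

```python
# Hypothetical pages, each reduced to the set of lowercase words it contains.
PAGES = {
    "a": {"green", "basket", "coimbatore"},
    "b": {"green", "bigbasket", "offers"},
    "c": {"green", "basket", "bigbasket", "compare"},
    "d": {"green", "basket", "fresh"},
    "e": {"green", "party"},
}

def search(predicate):
    """Return the names of pages whose word set satisfies the Boolean query."""
    return sorted(name for name, words in PAGES.items() if predicate(words))

# Green AND (Basket OR BigBasket): inclusive OR, so page "c" (both words) is returned.
inclusive = search(lambda w: "green" in w and ("basket" in w or "bigbasket" in w))
# Exclusive OR written as != on the two membership tests: one word but not both.
exclusive = search(lambda w: "green" in w and (("basket" in w) != ("bigbasket" in w)))
# Green AND (Basket OR BigBasket) AND -Fresh
without_fresh = search(lambda w: "green" in w and ("basket" in w or "bigbasket" in w)
                       and "fresh" not in w)

print(inclusive, exclusive, without_fresh)
```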
Application II: Digital Circuits

• Digital circuits are the physical components of a computing system.
• A digital circuit is an electronic system that enables a computer to perform arithmetic
operations such as addition and multiplication, among many other operations.
• These digital circuits are typically constructed by combining logic gates in various ways.
• A logic gate is an electronic device that implements a Boolean operator.
• It has one or more inputs and produces a single output corresponding to the operator that it implements.
• The logic signal takes on values of 0 (FALSE, OFF) or 1 (TRUE, ON).
• The signal might really be a voltage, a switch closure, etc.
• For the AND circuit (Figure: AND circuit), both switches must be closed in
order to light up the bulb.
• If either one of the switches in the OR circuit (Figure: OR circuit) is closed, the light bulb will
be illuminated.
• Simple propositions can be represented by one of these logic gates.
• Compound propositions can be represented as a combination of these simple logic gates.
• Digital circuits are therefore equivalent to compound Boolean propositions.
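As an illustration of a digital circuit that performs addition, consider a half adder (a standard circuit, swapped in here; it is not one of the figures in this text). It adds two one-bit numbers: the sum bit is A XOR B and the carry bit is A AND B, so the whole circuit is a pair of compound Boolean propositions. A minimal Python sketch with 0/1 signals; the function names are illustrative.

```python
from itertools import product

def AND(a, b):
    return a & b

def XOR(a, b):
    return a ^ b

def half_adder(a, b):
    """Add two 1-bit numbers: sum = A XOR B, carry = A AND B."""
    return XOR(a, b), AND(a, b)

print("A B | carry sum")
for a, b in product([0, 1], repeat=2):
    s, c = half_adder(a, b)
    print(a, b, "|", c, s)
```

Reading the carry and sum bits together as a two-bit binary number gives A + B, for example 1 + 1 = 10 in binary.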
Example 1: Find the output of the following circuit
Example 2: Find the output of the following circuit
Example 3: Find the output of the following circuit

Application III: Database Queries
• Databases are software systems designed to efficiently store enormous amounts of data such
that pieces of data in the database can be very quickly located and retrieved.
• Most databases store information in tables such that:
o Each row of the table contains a set of data that belongs to a single record.
o Each column of the table defines a field, and
o Every cell of the table contains one field for one record of data.
Example database table:

Table Name: Donor Table
Query:
Queries are used to retrieve data from a database table.
Example 1:
To find the donors who donated more than 700, the following query can be used.
Query: SELECT First, Last FROM Donor WHERE Amount > 700;
In this query, First and Last are field names, Donor is the table name and Amount > 700 is the
condition. So the above query finds the names of the donors who donated more than 700. Propositional
logic is used in specifying the condition (the WHERE part).
Output:
William Shell
Helen Lobby
Reggle Green
Example 2:
If we want to find the donors who are under 40 years of age and have contributed at least $500 in
the past, the following query can be used.
Query: SELECT First, Last FROM Donor WHERE Amount >= 500 AND Age < 40;
Output:
Helen Lobby
Reggle Green
Jennifer Dichali
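The two queries can be run unchanged with Python's built-in `sqlite3` module. A minimal sketch: the contents of the Donor table are not reproduced in this text, so the rows below are hypothetical and the output will not match the names listed above. The column names First, Last, Amount and Age are taken from the queries.

```python
import sqlite3

# Hypothetical rows for illustration only; NOT the values of the Donor table above.
ROWS = [
    ("Ann", "Lee", 900, 35),
    ("Ben", "Roy", 450, 28),
    ("Cara", "Das", 650, 52),
    ("Dev", "Paul", 500, 39),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donor (First TEXT, Last TEXT, Amount INTEGER, Age INTEGER)")
conn.executemany("INSERT INTO Donor VALUES (?, ?, ?, ?)", ROWS)

# Example 1: a simple proposition in the WHERE clause.
over_700 = conn.execute(
    "SELECT First, Last FROM Donor WHERE Amount > 700").fetchall()

# Example 2: a compound proposition joined with AND.
young_donors = conn.execute(
    "SELECT First, Last FROM Donor WHERE Amount >= 500 AND Age < 40").fetchall()

print(over_700)
print(young_donors)
conn.close()
```

A row is returned only when the whole WHERE proposition evaluates to true; for example, the hypothetical donor aged 52 fails Age < 40, so the conjunction is false for that row.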
Session 11
Introduction to Computational Thinking and Problem Solving
11.1 Computational Thinking
Computers are used in everyday life for solving problems of various kinds in various fields.
Computer programming is used for implementing the solutions to these problems. Although
programming is an essential activity in computer science, it is not the only activity involved in
solving a problem. Computer science is mainly about computational thinking, or computational
problem solving: learning how computers solve problems and orienting our own thought process
toward the way a computer solves a problem. Computational problem solving involves the
following processes:
• Analyzing the problem
• Designing the solution to the problem
• Implementing the solution
• Testing the solution
These processes are the steps involved in solving a problem computationally.
Analyze
• Clearly understand the problem
• Know what constitutes a solution
Describe Data & Algorithms
• Determine what type of data is needed
• Determine how data is to be structured
• Find and/or design appropriate algorithms
Implement Programs
• Represent data within the programming language
• Implement algorithms in the programming language
Test and Debug
• Test the program on a selected set of problem instances
• Correct and understand the causes of any errors found
Fig. 11.1 Steps involved in Problem Solving

11.2 Phase I : Problem Analysis
The first phase in solving a problem computationally is problem analysis. Problem analysis
requires two things.
1. A representation that captures all the relevant aspects of the problem
2. An algorithm that solves the problem by using the representation.
Let us consider the Man, Cabbage, Goat, Wolf [MCGW] problem. A man lives on the east side of a
river. He has a goat, a wolf and a cabbage with him. While he is present, the goat will not eat the
cabbage and the wolf will not eat the goat. He wishes to bring the cabbage, goat and wolf to the
west side of the river to sell them. But he has a boat that is only large enough to carry himself
and one of the cabbage, the goat or the wolf. The man cannot leave the cabbage alone with the
goat because the goat will eat the cabbage. He cannot leave the wolf alone with the goat because
the wolf will eat the goat. How can he bring all of them safely to the west side of the river?
An algorithmic approach to solving this problem is simply to try all possible combinations of
items that can be taken back and forth across the river until a correct solution is found.
Trying all possible solutions to a problem to find one that works is known as the Brute Force
Approach. First, we have to find the relevant aspects of the problem. When a problem
is analyzed, there may be relevant details as well as irrelevant details. For example, in the
MCGW problem, the relevant details are:
• What is the current location of the items? – On the east side of the river
• What is their destination? – On the west side of the river
• What are the items? – Man, Cabbage, Goat, Wolf
• How many items can travel in the boat? – Only two (the man and one other item)
But there may be irrelevant details, like the color of the boat, the width of the river, the name
of the man, etc. These details are not required for our solution, so they need not
be represented in our data representation. The process of hiding the irrelevant data and exposing
only the relevant data is known as data abstraction.
11.3 Capturing Relevant Aspects

Example 1:
Let us consider the MCGW problem. The relevant aspects of this problem are the four items,
i.e. Man, Cabbage, Goat and Wolf, and their locations at each step. In the initial state, all the
items are on the East side of the river. In the goal state, all the items are on
the West side. The other details are not required and can be ignored. The sequence of steps
that converts the initial state to the goal state is the solution to the problem. A sequence of steps
that solves a problem is called an algorithm. To solve the problem using a computer, the algorithm
has to be transformed into a programming language and executed.
Example 2:
Let us see another example of finding relevant data. The problem is to display a calendar
month for any given month and year. The relevant details for this problem are:
• Month and year for which the calendar is to be displayed – This should be provided as
input by the user.
• Number of days in each month of the year
• Names of the days of the week
• Day of the week for the first day of the month
To solve this problem, we have to determine the day of the week on which a given date falls. This
computation requires an algorithm. In essence, an algorithm is required to solve any problem.
You may use an existing algorithm or design your own algorithm for solving
problems. Standard problems can be solved using existing algorithms.
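One existing day-of-week algorithm is Zeller's congruence. The text does not name a specific algorithm, so this is a swapped-in choice; Python's standard `datetime` module can also answer the same question. A minimal sketch for Gregorian dates; the function name `day_of_week` is illustrative.

```python
# Day names in the same order as the list used in the calendar example.
DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def day_of_week(day, month, year):
    """Return the weekday name of a Gregorian date using Zeller's congruence."""
    # Zeller treats January and February as months 13 and 14 of the previous year.
    if month < 3:
        month += 12
        year -= 1
    k = year % 100    # year within the century
    j = year // 100   # zero-based century
    h = (day + (13 * (month + 1)) // 5 + k + k // 4 + j // 4 + 5 * j) % 7
    # Zeller's h counts from Saturday (h = 0); shift so that Sunday maps to index 0.
    return DAY_NAMES[(h + 6) % 7]

print(day_of_week(1, 1, 2000))  # Sat
```

This is a direct calculation: a fixed formula gives the answer without any searching.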
Solutions can be calculated in different ways:
• By direct calculation (e.g. area of a circle, calendar problem)
• By the Brute Force Method (e.g. MCGW problem)
Complex problems may require more efficient algorithms. If a problem has more than one
solution, choose the best one. Depending on the problem, a program might find a solution,
an approximate solution, the best solution or all the solutions. For example, the MCGW problem
has an infinite number of solutions, because crossings can be repeated, so the best solution is the
one with the fewest steps. In the travelling salesman problem, only the shortest route is wanted, so
there is only one solution (provided there is only one shortest route). For chess, there may be
multiple solutions.
Example for multiple solutions:
Problem: How many times does the number 8 go into the number 100?
• Solution 1 (repeated subtraction)
count = 0
number = 100
while number >= 8:          # keep going while another 8 fits
    number = number - 8
    count = count + 1
# count is now 12
• Solution 2 (integer division)
count = 100 // 8            # // discards the remainder, giving 12
11.4 Phase II: Problem Design
Problem design involves two major tasks:
• Data Representation
• Algorithm Description
Data representation: Choosing an appropriate representation for data is an important part of
computer science. A single data item can be represented as a number, a character or a Boolean
value. Multiple data items can be represented using lists, tables or a combination of data types
(using structures and classes).
Algorithm Description: An algorithm is a set of steps to solve a problem.
Data Representation - Examples
Example 1:
Now, let us represent the relevant details in the MCGW problem. At each step, the location of
the items can be represented as the state of the problem.
Initial State: [M, C, G, W] [E, E, E, E]
Here, we use two lists to represent the initial state of the problem. The first list is the list
of items. The second list gives the respective location of each item. That is, the Man, the Cabbage,
the Goat and the Wolf are all on the East side (E) of the river. Each step of the solution can be
represented in the same way.
Step 1: The man carries the goat to the West side of the river. So, this state would be
[M, C, G, W] [W, E, W, E]
Here, W indicates the West side of the river.
Step 2: The man leaves the goat on the West side and returns to the East side of the river.
[M, C, G, W] [E, E, W, E]
Step 3: The man takes the cabbage with him and goes to the West side of the river.
[M, C, G, W] [W, W, W, E]
Step 4: The man leaves the cabbage on the West side of the river and takes the goat to the East side
of the river.
[M, C, G, W] [E, W, E, E]
Step 5: The man leaves the goat on the East side and takes the wolf to the West side.
[M, C, G, W] [W, W, E, W]
Step 6: The man leaves the wolf on the West side and returns to the East side.
[M, C, G, W] [E, W, E, W]
Step 7: The man takes the goat to the West side.
[M, C, G, W] [W, W, W, W]
The state after Step 7 is the goal state.
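The brute-force idea and the two-list state representation can be combined into a short program. A minimal Python sketch, assuming a state is a tuple of 'E'/'W' locations in the order (Man, Cabbage, Goat, Wolf). It tries every legal crossing in breadth-first order (a systematic form of brute force), so the first solution it reaches uses the fewest crossings. The function names are illustrative.

```python
from collections import deque

ITEMS = ["M", "C", "G", "W"]  # Man, Cabbage, Goat, Wolf

def is_safe(state):
    """Unsafe if the goat is with the cabbage or the wolf while the man is elsewhere."""
    man, cabbage, goat, wolf = state
    if goat == cabbage and man != goat:
        return False
    if goat == wolf and man != goat:
        return False
    return True

def next_states(state):
    """Yield every safe state reachable by one crossing of the boat."""
    man = state[0]
    other_bank = "W" if man == "E" else "E"
    for i in range(len(state)):
        # i == 0: the man crosses alone; otherwise he takes item i,
        # which must be on the same bank as the man.
        if i != 0 and state[i] != man:
            continue
        new = list(state)
        new[0] = other_bank
        new[i] = other_bank
        new = tuple(new)
        if is_safe(new):
            yield new

def solve(start=("E", "E", "E", "E"), goal=("W", "W", "W", "W")):
    """Breadth-first search: explores move sequences shortest first."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for state in next_states(path[-1]):
            if state not in visited:
                visited.add(state)
                queue.append(path + [state])
    return None

for step, state in enumerate(solve()):
    print("Step", step, ITEMS, list(state))
```

The search finds a seven-crossing solution. Its first move is forced, because taking the goat is the only safe first crossing; there are two shortest solutions, differing in whether the cabbage or the wolf is taken across first, and the program prints one of them.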
Example 2:
Calendar Problem
The relevant details for the calendar problem are:
• Month and year for which the calendar is to be displayed
• The number of days in each month of the year
• The names of the days of the week
• Day of the week of the first day of the month
The data can be represented in lists as follows:
[month, year]
[31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
[‘Sun’, ‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’]
The day of the week of the first day of the month can be represented using a string, such as ‘Wed’.
In a leap year, February has 29 days instead of the 28 stored in the list.
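The three lists above are enough to print a month. A minimal Python sketch, assuming weeks start on Sunday; it uses the standard `datetime.date.weekday()` method (which counts Monday as 0) to find the first day of the month, though a hand-written day-of-week algorithm could be used instead. The function names `is_leap` and `month_calendar` are illustrative.

```python
import datetime

# The lists from the data representation above.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def is_leap(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def month_calendar(month, year):
    """Return one month as a list of text lines, with weeks starting on Sunday."""
    days = DAYS_IN_MONTH[month - 1]
    if month == 2 and is_leap(year):
        days = 29
    # weekday() counts Monday as 0 and Sunday as 6; shift so that Sunday is 0.
    first = (datetime.date(year, month, 1).weekday() + 1) % 7
    cells = ["   "] * first + [f"{d:3d}" for d in range(1, days + 1)]
    lines = [" ".join(DAY_NAMES)]
    for i in range(0, len(cells), 7):
        lines.append(" ".join(cells[i:i + 7]))
    return lines

request = [2, 2024]  # [month, year], as in the first list above
for line in month_calendar(*request):
    print(line)
```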
Example 3:
The Travelling Salesman Problem
The travelling salesman problem is a standard problem in computer science. A salesman must
visit all the cities allotted to him in an optimized way: he has to visit every city exactly once,
by the shortest possible route, and return to the starting point. The relevant
details for this problem are the cities to be visited and the distance between each pair of cities.
The data in the travelling salesman problem can be represented as a table or as a list of lists.
Table Representation
List of Lists
[ [‘Atlanta’, [‘Boston’, 1110], [‘Chicago’, 718], [‘Los Angeles’, 2175], [‘New York’, 888],
[‘San Francisco’, 2473] ],
[‘Boston’, [‘Chicago’, 992], [‘Los Angeles’, 2991], [‘New York’, 215], [‘San Francisco’,
3106] ],
[‘Chicago’, [‘Los Angeles’, 2015], [‘New York’, 791], [‘San Francisco’, 2131] ],
[‘Los Angeles’, [‘New York’, 2790], [‘San Francisco’, 381] ],
[‘New York’, [‘San Francisco’, 2901] ] ]
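For a small number of cities, the travelling salesman problem can be solved by brute force: try every ordering of the cities and keep the shortest closed route. A minimal Python sketch using the list-of-lists data above, assuming the salesman starts and ends in Atlanta; the function names are illustrative. The number of orderings grows as (n - 1)!, so this approach quickly becomes impractical as the number of cities grows, which is one reason heuristic algorithms are used.

```python
from itertools import permutations

# Distance data copied from the list-of-lists representation above.
DISTANCES = [
    ["Atlanta", ["Boston", 1110], ["Chicago", 718], ["Los Angeles", 2175],
     ["New York", 888], ["San Francisco", 2473]],
    ["Boston", ["Chicago", 992], ["Los Angeles", 2991], ["New York", 215],
     ["San Francisco", 3106]],
    ["Chicago", ["Los Angeles", 2015], ["New York", 791], ["San Francisco", 2131]],
    ["Los Angeles", ["New York", 2790], ["San Francisco", 381]],
    ["New York", ["San Francisco", 2901]],
]

def build_lookup(table):
    """Turn the list of lists into a symmetric dictionary keyed by (city, city)."""
    dist = {}
    for row in table:
        city = row[0]
        for other, distance in row[1:]:
            dist[(city, other)] = distance
            dist[(other, city)] = distance
    return dist

def tour_length(tour, dist):
    """Total distance of a closed tour that returns to its starting city."""
    return sum(dist[(tour[i], tour[(i + 1) % len(tour)])] for i in range(len(tour)))

def brute_force_tsp(table, start="Atlanta"):
    """Try every ordering of the other cities and keep the shortest tour."""
    dist = build_lookup(table)
    cities = sorted({c for pair in dist for c in pair})
    others = [c for c in cities if c != start]
    best = None
    for order in permutations(others):
        tour = (start,) + order
        length = tour_length(tour, dist)
        if best is None or length < best[1]:
            best = (tour, length)
    return best

tour, length = brute_force_tsp(DISTANCES)
print(" -> ".join(tour + (tour[0],)), length)
```

For this data the shortest closed route has length 6782: Atlanta, Los Angeles, San Francisco, Chicago, Boston, New York and back to Atlanta (or the same route in reverse).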
Algorithm Description: The second major task in problem design is algorithm description. You
can either choose an existing algorithm or design a new algorithm for solving a
problem. For example, for the calendar problem, day-of-week algorithms already exist. For the
travelling salesman problem, many algorithms are available. Algorithms that work well in
general but are not guaranteed to give the correct (or best) result for every specific problem are
called heuristic algorithms.
11.5 Phase III : Program Implementation

The third phase in problem solving is program implementation. Once a problem is analyzed,
the data is represented and a suitable algorithm is identified, the next task is to implement the
algorithm. Choose a programming language and transform the algorithm into source code. This
is the task that is done by a developer or a programmer.
11.6 Phase IV: Program Testing

Once a program is developed, it may contain errors. Programming errors are pervasive,
persistent and inevitable. Therefore, the program should be tested to find and
remove errors (program bugs). The process of testing a program or software is
called software testing. Software testing is an essential part of software development. The
program can be tested with sample data sets.
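A minimal sketch of testing with a sample data set, reusing the two solutions to "How many times does 8 go into 100?": a good sample set includes edge cases such as 0, values below the divisor and exact multiples. The function names and the sample inputs are illustrative choices.

```python
def count_by_subtraction(number, divisor=8):
    """Solution 1: count how many times divisor can be subtracted from number."""
    count = 0
    while number >= divisor:
        number -= divisor
        count += 1
    return count

def count_by_division(number, divisor=8):
    """Solution 2: integer division gives the same count directly."""
    return number // divisor

# Sample data set: edge cases (0, below the divisor, exact multiples) plus ordinary values.
SAMPLE_INPUTS = [0, 1, 7, 8, 9, 16, 64, 100, 1000]

def run_tests():
    """Return the inputs on which the two solutions disagree."""
    return [n for n in SAMPLE_INPUTS
            if count_by_subtraction(n) != count_by_division(n)]

print("failing inputs:", run_tests())
```

If the loop condition were written as number > 8 instead of number >= 8, the exact multiples 8, 16, 64 and 1000 would be reported as failures, which is the kind of bug sample data sets are meant to catch.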
11.7 Computational Thinking

Computational thinking is the thought process involved in formulating a problem and
expressing its solution in a form that a computer can effectively carry out.
Computational thinking has four elements.
• Decomposition: Breaking down data, processes, or problems into smaller,
manageable parts
• Pattern Recognition: Observing patterns, trends, and regularities in data
• Abstraction: Identifying the general principles that generate these patterns
• Algorithm Design: Developing the step by step instructions for solving this and
similar problems.
11.8 Creative Problem Solving

Creative problem solving is looking at the same thing as everyone else and thinking
something different. The creative person uses information to form new ideas. The real key to
creative problem solving is what you do with the knowledge. Creative problem solving requires
an attitude that allows you to search for new ideas and use your knowledge and experience.
Change perspective and use knowledge to make the ordinary extraordinary and the usual
unusual. Creative problem solving may face barriers such as:
• Time
• Why change?
• Usually don’t need to be creative
• Habit
• Routine
• Haven’t been taught to be creative
Mental blocks are attitudes that stop us from thinking differently.
Mental blocks
1. The _______ answer.
2. That’s not _________.
3. __________ the rules.
4. Be ______________.
5. ________ is frivolous.
6. That’s not my _____.
7. ________ ambiguity.
8. Don’t be _________.
9. __________is wrong.
10. I’m not __________.
The ten mental blocks are:
1. The right answer
2. That’s not logical
3. Follow the rules
4. Be practical
5. Play is frivolous
6. That’s not my area
7. Avoid ambiguity
8. Don’t be foolish
9. To err is wrong
10. I’m not creative
Why should rules be challenged?
1. We make rules based on reasons that make a lot of sense.
2. We follow these rules.
3. Time passes and things change.
4. The original reasons for the generation of these rules may no longer exist, but because
the rules are still in place, we continue to follow them.
11.9 Creative Problem Solving Process
Step 1: State what appears to be a problem.

The real problem may not surface until facts have been gathered and analyzed. Therefore, start
with what you assume to be the problem that can later be confirmed or corrected.
Step 2: Gather facts, feelings and opinions.
Step 3: Restate the problem
Step 4: Identify alternative solutions.
Step 5: Evaluate alternatives
Step 6: Implement the decision
Step 7: Evaluate the results
11.10 Tools and Techniques

1. Brainstorming:
Brainstorming is the process of generating creative ideas and solutions through intensive and
freewheeling group discussion. It is a technique to generate a large number of
ideas in a short period of time.
Rules for Brainstorming
Brainstorming is most effective when many ideas are generated. No idea is a bad idea. You can
build on one another’s ideas. Display all the ideas.
Brainstorming Guidelines
Find ways to motivate the members who participate in brainstorming. Clarify each idea so that
everyone understands it. Once all the ideas have been generated, review the ideas that have been offered.
Combine items that are similar and eliminate duplicates.
2. Multivoting
Multivoting is a way to vote to select the most important or popular items (alternatives)
from a list. It is used to help a group of people to make a decision with which they are
comfortable.
Steps for Multivoting

1. Generate a list of items and number each item.
2. If two or more items seem similar, they may be combined.
3. If necessary, renumber the items.
4. Write down the numbers of the items you feel are the major cause of the
problem.
5. Share your votes by a show of hands.
6. Eliminate those items with the fewest votes.
7. Repeat steps 3 (renumber) through 6 on the list of remaining items. Continue
this process until only a few items remain. If a clear favorite does not emerge,
the group may discuss the items listed and make a choice.
3. Mind Mapping
Mind mapping is a visual picture of a group of ideas, concepts or issues.
The purpose of mind mapping is to unblock our thinking, see an entire idea or several
ideas on a single sheet of paper, see how ideas relate to one another, look at things in a
different way and look at an idea in depth.
Mind mapping exercise
• Take an oversized blank sheet of paper.
• Select a word, phrase or problem statement to serve as a focus for discussion.
• Print it in the middle of the paper. Enclose it in a box or oval.
• Let a word pop out of your mind. Print it anywhere on the paper.
• Underline it and connect the line to the problem statement (or key phrase
or word) you are working on.
• Record the next idea and connect it to the original focus point or the prior thought.
• Continue printing and connecting words.
Example
Hints
• Keep your printing large and easy to read.
• Feel free to use symbols and/or pictures.
• Have some fun using different colors.
Completed Map
• Draw over clusters of similar thoughts that are associated with the main focus point.
Have fun using a different color highlighter for each cluster of words.
• How do the various ideas relate to one another?
• Do you notice any common causes of the problem? What are the most important
causes?
• You are now ready to brainstorm solutions!
11.11 Activities
1. Write an algorithm to find the area of a circle.
2. Write an algorithm to find the greatest number among three numbers
3. Write an algorithm to find the factorial of a number.
4. Represent the following data in suitable formats
a. Seasons of a year
b. Colors of a rainbow
c. A unit matrix
d. Details of a book
5. Identify the modules in Library Management System
6. Group your class students based on some similarity
7. Identify the rules for joining the B.Tech. course at Karunya University.
8. What is the four-digit number in which the first digit is one-third the second, the third
is the sum of the first and second, and the last is three times the second?
9. The following verse spells out a word, letter by letter. "My first" refers to the word's
first letter, and so on. What's the word that this verse describes?
My first is in fish but not in snail
My second in rabbit but not in tail
My third in up but not down
My fourth in ice cream not in coffee
My fifth in tree you plainly see
My whole a food for you and me
10. How is it possible to cut a traditional circular cake into 8 equal size pieces, with only
3 cuts?
Session 12
Problem Definition
12.1 Problem Solving using Computational Thinking
Problem solving using computational thinking has two sequential steps.
1. Problem Definition
2. Generating appropriate response patterns.
Analyzing information related to a given situation is called problem definition. Then,
appropriate response patterns may be generated using decomposition, logical reasoning, pattern
recognition, abstraction or algorithms. In general, problem definition involves four steps.
It starts with collecting and analyzing information and data. During data collection, list
every relevant thing you can think of. Then fill in the missing values in the data. Talk with
people familiar with the problem and get clarification on anything you do not understand in the
problem. If at all possible, view the problem firsthand and confirm all findings.
From a software perspective, problem solving follows a similar approach. There are three major
steps in software development: (1) Analysis, (2) Design and (3) Implementation. In
the analysis phase, the problem is defined. In the design phase, an algorithm is designed
for solving the problem. Then, the algorithm is implemented in a programming language. This
phase may be followed by testing the software with sample data sets. The stakeholders of a
software product are:
Customer/client: The customer or client is the one who requests the software to be
developed.
Software Developer: This is usually a team that builds the software.
User: The user or end user is the one who finally uses the software.
Software development is related to computational thinking. The software analysis in software
development is equivalent to problem definition in computational thinking. Design and
implementation in software development are equivalent to problem solution in computational
thinking using logical reasoning, decomposition, pattern recognition, abstraction and
algorithm. Problem definition specifies what tasks are to be performed by the associated
software. It also serves as the software developer’s goal. Problem definition as software
analysis starts with the task of communicating with customers and users to determine their
requirements. This phase is known as requirements gathering.
A requirement is a feature or function that a system must provide to fulfil the purpose for which
it is developed. There are two steps in requirements gathering.
1. Analyzing Requirements: In this phase, the team determines whether the stated
requirements are unclear, incomplete, ambiguous or contradictory and then resolves these
issues.
2. Recording Requirements: Once the requirements are finalized, they
can be documented in various forms, usually in standard document formats.
In the requirements phase, the developer and customer should agree on the requirements and prepare
a legal agreement. In the requirements document, the client and the developer team record what is to
be built. Successful design and implementation depend on adequate analysis.
There are two major types of requirements to define a problem:
• Functional requirements: specify the particular task(s) the software must
perform.
• Non-functional requirements: define other characteristics and constraints
related to the software.
• In general, functional requirements include:
– Input/output
– Processing
– Error handling
• In general, non-functional requirements include:
– Reliability
– Safety
– Security
– Performance
– Delivery
– Help facilities
Example 1: Functional Requirements of a Counter Application
The purpose is to develop a counter application that is used to count items up or
down.
The sample design of the counter is shown in Fig. 12.1.
[Figure: counter app layout with a display, a Reset button and number buttons 0 to 9]
Fig. 12.1 Design of a Counter App

The functional requirements are defined in Table 12.1.
Button | Name | Functionality Description
B1 | Display | The display should show the current value of the timer/counter.
B2 | Reset | Pressing the Reset button should display zero in the display.
B3 | Up | Pressing the Up button should increment the value in the display by 1. If the value on display is the maximum limit, pressing this button has no effect.
B4 | Down | Pressing the Down button should decrement the value in the display by 1. If the value on display is zero, pressing this button has no effect.
B5 | Close | Pressing the Close button should close the counter window.
B6 … B15 | Number buttons | Pressing a number button should display its value on the display. If the display already has some digits, pressing the number button should append its value to the digits on the display.
Table 12.1 Functional Requirements of a Counter App
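Writing the requirements as code is one way to see whether they are precise enough to implement. A minimal sketch of Table 12.1 as a Python class. The maximum limit (9999 by default) and ignoring number presses that would exceed it are assumptions, because the table does not specify them; treating a displayed 0 as empty when appending digits is also an assumption.

```python
class Counter:
    """Hypothetical model of the counter app in Table 12.1 (buttons B1 to B15)."""

    def __init__(self, max_value=9999):
        self.max_value = max_value  # assumed maximum limit; the table gives no value
        self.value = 0              # the value shown by B1 Display
        self.closed = False

    def display(self):              # B1
        return str(self.value)

    def reset(self):                # B2
        self.value = 0

    def up(self):                   # B3: no effect at the maximum limit
        if self.value < self.max_value:
            self.value += 1

    def down(self):                 # B4: no effect at zero
        if self.value > 0:
            self.value -= 1

    def close(self):                # B5
        self.closed = True

    def press_number(self, digit):  # B6 ... B15
        # Appends the digit; a displayed 0 acts as empty (assumption), and a
        # press that would exceed max_value is ignored (assumption).
        candidate = self.value * 10 + digit
        if candidate <= self.max_value:
            self.value = candidate

counter = Counter(max_value=99)
counter.press_number(4)
counter.press_number(2)   # display shows 42
counter.up()              # 43
counter.press_number(1)   # 431 would exceed the limit, so it is ignored
after_presses = counter.display()
counter.reset()
counter.down()            # no effect at zero
after_reset = counter.display()
print(after_presses, after_reset)
```

Each method corresponds to one row of the table, which makes gaps in the requirements (such as the undefined maximum limit) visible as soon as the code is written.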
Example 2: Functional Requirements of a Media Player Application
A media player is an application that plays a video file, with buttons for the user to
play, pause and stop the video and to raise and lower the volume. The functional
requirements of a media player are defined in Table 12.2.
Button | Name | Functionality Description
B1 | Play | Clicking the play button should start playing the video. If the video is paused, it should resume at the point of pausing. It should play in a new window. The play/pause button should display the pause image while the video is playing.
B2 | Pause | Clicking the pause button causes the video to pause at the current play location. This button should be disabled when the video is not being played. Clicking the pause button causes the pause/play button to function as a play button, displaying a play button image.
B3 | Raise volume | Clicking the raise volume button while the volume is less than the maximum level causes the volume to increase by 1 point. Clicking the raise volume button while the volume is at the maximum level does nothing.
B4 | Lower volume | Clicking the lower volume button while the volume is greater than the silent level causes the volume to decrease by 1 point. Clicking the lower volume button while the volume is at the silent level does nothing.
Table 12.2 Functional Requirements of a Media Player Application
12.2 Validating Requirements

Once the requirements are defined, it is necessary to validate them. The requirements
should be validated to check whether they are correct, consistent and complete.
12.2.1 Correctness
Correctness means that customers, users, and developers understand the specified requirements in the same way. A good functional requirement can be translated into a logical proposition. Therefore, functional requirements can be validated using logical propositions.
Example
– “If the video has already finished playing, then this button has no effect.”
– F = the video has finished playing.
– N = the button causes no change (no action is taken).
– The requirement is the expression F implies N (F → N).
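To see why "F implies N" can be checked mechanically, the sketch below prints the truth table of the implication. The function name `implies` is an illustrative choice. The only row that violates the requirement is F true and N false, that is, the video has finished but pressing the button still changes something, so that is the situation a test of this requirement should try to produce.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'p implies q' is false only if p is true and q is false."""
    return (not p) or q


# F = "the video has finished playing", N = "the button causes no change"
for f, n in product([True, False], repeat=2):
    status = "satisfied" if implies(f, n) else "VIOLATED"
    print(f"F={f!s:<5}  N={n!s:<5}  F->N {status}")
```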
12.2.2 Consistency
Consistency means that no two functions contradict each other.
Example:
– Consider adding a new function to restart with the description “When the video
is playing and reaches the end, it immediately restarts playing with the
play/pause button functioning as a play button.”
– This would contradict the play specification, which states that the play/pause button should function as a pause button while a video is playing.
12.2.3 Completeness
Completeness requires careful consideration to be certain that every possible scenario has been considered and explained in one or more requirements.
– Two ways to achieve completeness are:
• Manual checking
• Adding more functions
Another way to formally ensure completeness is to check all possible combinations of situations by constructing a state-activity table.
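A state-activity table can be built by pairing every state with every activity and checking that each pair is covered by a requirement. The states, activities and requirement wording below are hypothetical examples based on the media player; a real specification would supply its own. Every cell reported as missing marks a gap in the specification.

```python
from itertools import product

# Hypothetical states and activities for the media player example.
STATES = ["playing", "paused", "finished"]
ACTIVITIES = ["play/pause", "raise volume", "lower volume"]

# Requirements written so far, keyed by (state, activity).
requirements = {
    ("playing", "play/pause"): "pause at the current play location",
    ("paused", "play/pause"): "resume at the point of pausing",
    ("finished", "play/pause"): "no effect",
    ("playing", "raise volume"): "volume +1 unless at maximum",
    ("playing", "lower volume"): "volume -1 unless silent",
}


def missing_cells(states, activities, reqs):
    """Return every (state, activity) cell of the table with no requirement."""
    return [cell for cell in product(states, activities) if cell not in reqs]


for state, activity in missing_cells(STATES, ACTIVITIES, requirements):
    print(f"No requirement for '{activity}' while {state}")
```

Here the four missing cells show, for example, that nothing yet says what the volume buttons do while the video is paused or finished.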
Computational thinking is about looking at a problem in a way that a computer can help us to
solve it. When we do computational thinking, we use the following processes to tackle a
problem.
– Logical reasoning (Predicting and analyzing)
– Algorithms (making steps and rules)
– Decomposition (Breaking down into parts)
– Abstraction (Removing unnecessary details)
– Patterns and Generalization (Spotting and using similarities)
– Evaluation (Making judgements)
12.3 Activities
Think about an app you could develop for your mobile phone (perhaps one that solves a problem you face in everyday life) and prepare its functional requirements specification.
Session 13
Problem Decomposition
13.1 Introduction
A solution to a large problem is often complex. It can be made simpler if the problem is broken down into smaller parts: each part can be solved individually, and the solutions can then be combined to produce the solution to the original problem. The process of breaking down a complex problem or system into smaller, more manageable sub problems is known as Problem Decomposition.
• Decomposition helps to solve complex problems and manage large projects.
• Large problems can be tackled with the “divide and conquer” method.
• The problem should be decomposed such that every sub problem is at the same level of detail.
• Each sub problem can be solved independently.
• The solutions to the sub problems can be combined to solve the original problem.
Example 1: Consider the problem of making a pizza. Making a pizza is a fairly large task, but it can be subdivided into the following subtasks.
1. Make crust
2. Make spread and sauce
3. Spread cheese
4. Spread toppings
5. Bake
6. Slice
Fig. 13.1 Dividing a problem into smaller subproblems
Hence, if the task is divided into subtasks, the problem becomes smaller and more manageable.
Example 2: Organizing a school trip
The task of organizing a school trip can be subdivided into subtasks such as booking a coach, getting consent letters, staffing, checking the weather and checking resources. Each subtask can be delegated to a different person, and the school trip can then be organized successfully.
Fig. 13.2 Decomposition of ‘Organizing a School Trip’

Example 3: Designing a course curriculum
Typically, this would be decomposed into years and subjects, which are further decomposed into terms, units of work and individual lessons or activities. Each task can then be given to a team member, and the team works together to integrate the parts properly.
Example 4: Software Development
Software development is also a complex process. Therefore, it is essential to break a large project down into its component parts. A program such as presentation software (for example, PowerPoint) has many individual components.
Example 5: Searching a List using binary search
Binary search is an apt example of how the divide and conquer technique is applied in
programming. Binary search is a technique used to search a given element in a list of elements.
The usual method of searching an element is the linear search. In linear search, the given
element is first compared with the first element in the list. If there is a mismatch, it is compared
with the second element. If there is again a mismatch, it is compared with the third element and
so on. The comparison continues until a match is found or the entire list has been compared.
Therefore, if the search element is near the end of the list, or is not in the list at all, the algorithm needs many comparisons: up to n comparisons for a list of n elements.
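A minimal Python sketch of linear search, which also counts the comparisons it makes, is shown below. The function name `linear_search` and the convention of returning -1 for a missing element are assumptions for illustration. The sample list is the one used for the binary search example (Fig. 13.3).

```python
def linear_search(items, target):
    """Return (position, comparisons); position is -1 if target is absent."""
    comparisons = 0
    for position, value in enumerate(items):
        comparisons += 1              # one comparison per element examined
        if value == target:
            return position, comparisons
    return -1, comparisons            # every element compared, no match


numbers = [20, 23, 15, 7, 4, 8, 10, 20, 40, 25, 12]   # list of Fig. 13.3
print(linear_search(numbers, 25))   # (9, 10)
print(linear_search(numbers, 99))   # (-1, 11)
```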
An alternative is the binary search method. In this method, the divide and conquer technique is applied, which greatly reduces the number of comparisons needed. Let us consider a list as shown in Fig. 13.3.
0 1 2 3 4 5 6 7 8 9 10
20 23 15 7 4 8 10 20 40 25 12
Fig. 13.3 List of numbers

In the list shown in Fig. 13.3, 0,1,2,3, etc. on the upper row indicate the position of the
elements in the list. 20, 23, 15, 7 etc. indicate the elements stored in the list. In this list, we
have to search whether the element 25 is present or not. To apply the binary search technique, the list must first be sorted. The sorted list is shown in Fig. 13.4.
0 1 2 3 4 5 6 7 8 9 10
4 7 8 10 12 15 20 20 23 25 40
Fig. 13.4 List sorted in ascending order

Now,
low = 0 (the position of the first element)
high = 10 (the position of the last element)
The position of the middle element is calculated using the formula,
mid = low + (high - low)/2
Therefore, mid = 0 + (10-0)/2 = 5
Hence, the middle element is the element in position 5, which is 15. The search element 25 is greater than 15, so it can only be in the upper half of the list. Hence, the lower half of the list can be discarded, as shown in Fig. 13.5.
0 1 2 3 4 5 6 7 8 9 10
4 7 8 10 12 15 20 20 23 25 40
Fig. 13.5 Middle Element 15; 25 > 15; So, left half is discarded
Now, the new low is calculated as low = mid + 1, since the lower half of the list is discarded. If the upper half were discarded instead, high would be recalculated as high = mid - 1.
So
low = mid+1 = 5+1 = 6
mid = 6 + (10-6)/2 = 6 + 2 = 8
6 7 8 9 10
20 20 23 25 40
Fig. 13.6 New list
Now the search continues only in the upper half of the list, shown in Fig. 13.6. The low and middle values are calculated as shown above, and the middle element is 23. Since 25 > 23, once again the lower half of the list is discarded and the upper half is considered. The low and middle values are calculated using the usual formula.
low = mid + 1 = 8 + 1 = 9
mid = low + (high-low)/2 = 9 + (10-9)/2 = 9.5, which is rounded up to 10
Now the new list is shown in Fig. 13.7.
9 10
25 40
Fig. 13.7 New list

Once again, 25 is compared with the new middle element 40. Since 25 is less than 40, it can only be in the lower half of this list. Therefore, the upper half (the element 40) is discarded and high is recalculated:
high = mid - 1 = 10 - 1 = 9
mid = low + (high-low)/2 = 9 + (9-9)/2 = 9
Now only one element remains: 25, at position 9. It is compared with the search element, and since they match, the element is found in the list. Had they not matched, no elements would be left to search, and the element would be reported as not found.
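The walkthrough above can be expressed as a short Python function. This is a minimal sketch, assuming a list already sorted in ascending order; the function name `binary_search` and the choice to return the positions probed are illustrative. It uses integer division (`//`), which rounds the middle position down, so on the third step it probes position 9 and finds 25 immediately, one comparison earlier than the hand trace above, which rounded 9.5 up to 10. Either rounding works, as long as `mid` stays between `low` and `high`.

```python
def binary_search(items, target):
    """Search a sorted list; return (position or -1, positions probed)."""
    low, high = 0, len(items) - 1
    probed = []
    while low <= high:
        mid = low + (high - low) // 2   # integer division rounds down
        probed.append(mid)
        if items[mid] == target:
            return mid, probed
        if target > items[mid]:
            low = mid + 1               # discard the lower half
        else:
            high = mid - 1              # discard the upper half
    return -1, probed                   # low > high: target is not in the list


sorted_numbers = sorted([20, 23, 15, 7, 4, 8, 10, 20, 40, 25, 12])
print(sorted_numbers)                       # [4, 7, 8, 10, 12, 15, 20, 20, 23, 25, 40]
print(binary_search(sorted_numbers, 25))    # (9, [5, 8, 9])
```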
A comparison with linear search
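The steps for linear search on the list of Fig. 13.3 are given below:
Is 25 = 20? No => Move to the next element
Is 25 = 23? No => Move to the next element
(and so on for 15, 7, 4, 8, 10, 20 and 40)
Is 25 = 25? Yes => Stop
Linear search needs 10 comparisons here, because 25 is at position 9, whereas the binary search above needed only 4 (against 15, 23, 40 and 25). In general, linear search takes more steps, especially when the list is large and the search element lies near the end of the list.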
Example 6: Computer Hardware
Computer hardware: a smartphone or a laptop computer is itself composed of many
components, often produced independently by specialist manufacturers and assembled to make
the finished product, each under the control of the operating system and applications.
Example 7: Solving a Crime
• Imagine that a crime has been committed. Solving a crime can be a very complex
problem as there are many things to consider.
• For example, a police officer would need to know the answer to a series of smaller
problems:
– what crime was committed
– when the crime was committed
– where the crime was committed
– what evidence there is
– if there were any witnesses
– if there have recently been any similar crimes
• The complex problem of the committed crime has now been broken down into simpler
problems that can be examined individually, in detail.
Example 8: Creating an App
To decompose this task, you would need to know the answer to a series of smaller problems:
• What kind of app you want to create
• What your app will look like
• Who the target audience for your app is
• What your graphics will look like
• What audio you will include
• What software you will use to build your app
• How the user will navigate your app
• How you will test your app
• Where you will sell your app
This list has broken down the complex problem of creating an app into much simpler
problems that can now be worked out. You may also be able to get other people to help you
with different individual parts of the app.
For example, you may have a friend who can create the graphics, while another will be your
tester.
Example 9: Decomposition in Writing
A frequently taught writing technique is outlining, which means organizing a piece of writing by first decomposing the entire work into its main ideas. Each main idea might be divided into sub points, just as problems are divided into sub problems. Outlining can continue by decomposing any appropriate sub point into its own sub points. The standard notation for outlining uses:
• Roman numerals to index the main ideas
• Capital letters for the first-level sub points
• Numbers for the second-level sub points
• Lowercase letters for the third-level sub points, and so on.
Fig. 13.8 Sample Outlining

Fig. 13.8 shows a sample.
13.2 Why is Decomposition Important?

If a problem is not decomposed, it is much harder to solve. Dealing with many different stages all at once is much more difficult than breaking a problem down into a number of smaller problems and solving them one at a time. Breaking the problem down into smaller parts means that each smaller problem can be examined in more detail. Similarly, trying to understand how a complex system works is easier using decomposition. For example, understanding how a bicycle works is more straightforward if it is separated into smaller parts; knowing how each part works in detail is feasible only once the parts have been separated and examined individually.
13.3 Advantages and Disadvantages of Decomposition

Advantages
• Different people can work on different sub problems/modules.
• Parallelization may be possible.
• Maintenance is easier.
Disadvantages
• The solutions to the sub problems might not combine to solve the original problem.
• Poorly understood problems are hard to decompose.
13.4 Decomposition in Software Design

Decomposition in computer science is breaking a complex problem or system into parts that are easier to conceive, understand, program, and maintain. This involves top-down design, in which design begins by specifying complex pieces and then dividing them into successively smaller pieces.
Fig. 13.9 Example for Modular Designing
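As a small illustration of top-down design, the sketch below decomposes the check for an Armstrong number (a number equal to the sum of its digits, each raised to the power of the number of digits, such as 153 = 1³ + 5³ + 3³) into two sub problems and then combines them at the top level. The function names are illustrative, not taken from the original text, and the sketch assumes non-negative integers.

```python
def count_digits(n: int) -> int:
    """Sub problem 1: count the digits of a non-negative integer."""
    count = 1
    while n >= 10:
        n //= 10
        count += 1
    return count


def sum_of_digit_powers(n: int, power: int) -> int:
    """Sub problem 2: add up each digit of n raised to the given power."""
    total = 0
    while n > 0:
        total += (n % 10) ** power
        n //= 10
    return total


def is_armstrong(n: int) -> bool:
    """Top level: combine the two sub solutions."""
    return sum_of_digit_powers(n, count_digits(n)) == n


print([n for n in range(1, 1000) if is_armstrong(n)])
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 153, 370, 371, 407]
```

Each function can be written and tested on its own, which is exactly the benefit decomposition promises.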
13.5 Activities
Divide the following problems into modules.
1. Finding the sum of first ‘n’ natural numbers
2. Finding the factorial of a number
3. Finding the sum of the digits of a number
4. Counting the digits of a number and checking whether the given number is an
Armstrong number or not.
5. Counting the divisors of a number and checking whether the given number is prime or
composite.
6. Finding the sum of the divisors of a number and checking whether the given number
is perfect or not.
7. Evaluation of nCr.
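As one example of dividing these problems into modules, nCr (activity 7) can be built from a factorial module using the formula nCr = n! / (r! * (n - r)!). The function names below are illustrative, and the sketch assumes 0 <= r <= n. Computing three full factorials is simple but wasteful for large n; the point here is the decomposition, not efficiency.

```python
def factorial(n: int) -> int:
    """Module 1: n! = 1 * 2 * ... * n, with 0! = 1."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def ncr(n: int, r: int) -> int:
    """Module 2: nCr = n! / (r! * (n - r)!), assuming 0 <= r <= n."""
    return factorial(n) // (factorial(r) * factorial(n - r))


print(ncr(5, 2))   # 10
```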
Session 17
Algorithm Design
17.1 Need for Algorithm
A computer is an electronic machine, but it cannot think on its own to solve a problem, whether that problem is finding the sum of two numbers or something far more complex. If we give it instructions, however, the computer follows them and solves the problem.
17.2 Steps for writing good algorithms

1. The inputs and outputs of an algorithm need to be defined precisely.
2. Each step of an algorithm should be written clearly and unambiguously.
3. Among the many possible ways to solve a specific problem, the algorithm should be the most effective one.
4. The algorithm should not contain any computer code.
17.3 Algorithms
At its most basic, an algorithm is a method for solving a computational problem. An
algorithm is any well-defined computational procedure that takes some value, or set of values,
as input and produces some value, or set of values, as output.
An algorithm is thus a sequence of computational steps that transform the input into the
output. We can also view an algorithm as a tool for solving a well-specified computational
problem.
The statement of the problem specifies in general terms the desired input/output
relationship. The algorithm describes a specific computational procedure for achieving that
input/output relationship.
17.4 Representation of Algorithm

• Algorithm: in plain natural language
• Flowchart: a pictorial representation
• Pseudocode: using some specific keywords
• Program: in a programming language
17.5 Algorithm to make tea

Step 1: Start
Step 2: Read the amount of water as input
Step 3: Read the amount of milk as input
Step 4: Read the quantity of tea dust as input
Step 5: Read the quantity of sugar as input
Step 6: Turn on the stove
Step 7: Keep the vessel on the stove
Step 8: Pour water into the vessel
Step 9: Add milk to the water
Step 10: Make it boil
Step 11: If it has boiled, add tea dust and go to Step 13
Step 12: If it has not boiled, let it boil further and go to Step 11
Step 13: Allow it to boil for 2 minutes
Step 14: Turn off the stove
Step 15: Strain the tea
Step 16: sugar_cube = 1
Step 17: Add one sugar cube
Step 18: If sugar_cube = quantity of sugar, pour the tea into tea cups and go to Step 21
Step 19: Else sugar_cube += 1
Step 20: Go to Step 17
Step 21: Serve
Step 22: Stop
17.6 Examples for Algorithm

1. Algorithm to find area of circle:
Step 1: Start
Step 2: Get the value of r
Step 3: Calculate Area=Pi*r*r
Step 4: Display Area
Step 5: End
In the above Algorithm
Step 2 is used for input, Step 3 is used for processing and Step 4 is used for output.
Here r and Area are called variables; Pi is a constant (approximately 3.14159).
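The area-of-circle algorithm translates almost line for line into Python. In this sketch the radius is fixed at 2.0 rather than read from the user (calling `input()` would be the interactive equivalent of Step 2), and `math.pi` supplies the value of Pi. The function name `area_of_circle` is illustrative.

```python
import math


def area_of_circle(r: float) -> float:
    """Step 3: Area = Pi * r * r, with math.pi as the value of Pi."""
    return math.pi * r * r


r = 2.0                    # Step 2: input (fixed here instead of input())
area = area_of_circle(r)   # Step 3: processing
print("Area =", area)      # Step 4: output
```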
2. Algorithm to find sum of two numbers

Step 1: Start
Step 2: Get two numbers a, b
Step 3: Calculate sum = a + b
Step 4: Display sum
Step 5: Stop
3. Algorithm to find the roots of the given quadratic equation
Step 1: Start
Step 2: Declare the variables
Step 3: Input the values of a, b and c
Step 4: Calculate d = b*b - 4*a*c
Step 5: If d = 0 then
Print "the roots are real and equal"
Calculate root1 = root2 = -b/(2*a)
Print the values of root1 and root2
Step 6: If d > 0 then
Print "the roots are real and unequal"
Calculate root1 = (-b + sqrt(d))/(2*a)
root2 = (-b - sqrt(d))/(2*a)
Print the values of root1 and root2
Step 7: If d < 0 then
Print "the roots are complex (imaginary)"
Calculate r1 = -b/(2*a)
r2 = sqrt(abs(d))/(2*a)
root1 = r1 + i*r2
root2 = r1 - i*r2
Print the values of root1 and root2
Step 8: Stop
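A sketch of the quadratic-roots algorithm in Python follows, assuming a is not zero. The function name `quadratic_roots` is illustrative. Note that the divisor 2*a must be written in parentheses, since -b/2*a would divide by 2 and then multiply by a. For d < 0 the sketch returns the two roots as Python complex numbers, r1 + r2·i and r1 - r2·i.

```python
import math


def quadratic_roots(a: float, b: float, c: float):
    """Classify and compute the roots of a*x*x + b*x + c = 0 (assumes a != 0)."""
    d = b * b - 4 * a * c                 # Step 4: discriminant
    if d == 0:                            # Step 5
        root = -b / (2 * a)
        return "real and equal", root, root
    if d > 0:                             # Step 6
        root1 = (-b + math.sqrt(d)) / (2 * a)
        root2 = (-b - math.sqrt(d)) / (2 * a)
        return "real and unequal", root1, root2
    r1 = -b / (2 * a)                     # Step 7: real part
    r2 = math.sqrt(abs(d)) / (2 * a)      # imaginary part
    return "complex", complex(r1, r2), complex(r1, -r2)


print(quadratic_roots(1, -3, 2))   # ('real and unequal', 2.0, 1.0)
print(quadratic_roots(1, 2, 1))    # ('real and equal', -1.0, -1.0)
print(quadratic_roots(1, 2, 5))    # ('complex', (-1+2j), (-1-2j))
```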
4. Algorithm to find the sum of digits
Step 1: Start
Step 2: Declare the Variables
Step 3: Input the value of n
Step 4: Initialize sum=0
Step 5: Calculate x = n%10
sum = sum + x
n = n/10 (integer division, so the last digit is dropped)
Step 6: Repeat Step 5 while n > 0
Step 7: Print sum
Step 8: Stop
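The sum-of-digits algorithm can be sketched in Python as below, assuming n is a non-negative integer. `//` is Python's integer division, which drops the last digit in the same way as n/10 in the algorithm; the variable is named `total` rather than `sum` to avoid hiding Python's built-in `sum` function. The function name is illustrative.

```python
def sum_of_digits(n: int) -> int:
    """Add up the decimal digits of a non-negative integer n."""
    total = 0                 # Step 4: initialize the sum
    while n > 0:              # Step 6: repeat while n > 0
        x = n % 10            # Step 5: take the last digit
        total = total + x
        n = n // 10           # integer division drops the last digit
    return total


print(sum_of_digits(1234))    # 10
```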
5. Algorithm to display the Fibonacci series
Step1: Start
Step2: Declare the variables
Step3: Input the value of n (the number of terms)
Step4: Initialize f1=-1, f2=1 and f3=0
Step5: Repeat the following n times:
f3=f1+f2
Display f3
f1=f2
f2=f3
Step6: Stop
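Here is a Python sketch of the Fibonacci algorithm, assuming n is the number of terms to produce. Starting with f1 = -1 and f2 = 1 makes the first computed term 0 and the second 1, so no special case is needed for the first two terms. The sketch collects f3 on every pass of the loop so that the whole series is shown, not just the last term; the function name is illustrative.

```python
def fibonacci_series(n: int) -> list:
    """Return the first n Fibonacci terms, starting from 0."""
    f1, f2 = -1, 1            # Step 4: the -1/1 start makes the first term 0
    terms = []
    for _ in range(n):        # Step 5: repeat n times
        f3 = f1 + f2
        terms.append(f3)      # "display" f3 by collecting it
        f1 = f2
        f2 = f3
    return terms


print(fibonacci_series(8))    # [0, 1, 1, 2, 3, 5, 8, 13]
```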
17.7 Summary
1. A sequence of steps to solve a given problem
2. Algorithm should be self-contained
3. The steps should not be ambiguous
4. It should produce an outcome when executed
5. May be designed to do one or multiple tasks
6. Written in plain natural language (with some specific terminology)
Session 18
Flow Chart
18.1 Flow Chart
A flow chart is a pictorial representation of an algorithm. It uses standard notations or symbols to represent the programming components.
18.2 Symbols
Fig. 18.1 Commonly used flow chart symbols
18.3 Flow chart Structures

• Sequence
– Series of actions performed in a sequence
– Example: area of a circle
• Selection
– Selecting one of two possible actions based on a particular condition
– Example: to find the greater of two numbers
• Iteration
– Repeating actions
– A loop tests a condition and, if it is satisfied, performs an action. Then it tests the condition again. If the condition is still satisfied, the action is repeated. This continues until the condition is no longer satisfied.
Fig. 18.2 Flow chart symbols for program control structures
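The three structures map directly onto programming constructs. The Python sketch below shows one small function for each: a sequence of steps (area of a circle), a selection (the greater of two numbers) and an iteration (counting up to a limit). The function names are illustrative.

```python
import math


def circle_area(radius: float) -> float:
    # Sequence: each step runs once, in order.
    squared = radius * radius
    area = math.pi * squared
    return area


def greater_of(a: float, b: float) -> float:
    # Selection: one of two actions is chosen by the condition a > b.
    if a > b:
        return a
    else:
        return b


def count_up(limit: int) -> list:
    # Iteration: test the condition, act, then test again until it fails.
    counts = []
    count = 1
    while count <= limit:
        counts.append(count)
        count += 1
    return counts


print(circle_area(1.0))     # 3.141592653589793
print(greater_of(12, 7))    # 12
print(count_up(5))          # [1, 2, 3, 4, 5]
```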
18.4 Connectors
• Connectors are used to connect two parts of a flow chart
• Connectors should be named uniquely
• There are two types of connectors
– In page connector
– Off page connector
18.5 Advantages of using a Flow Chart
• Communication: Flowcharts are a better way of communicating the logic of a system.
• Effective Analysis: With the help of a flowchart, problems can be analyzed more effectively.
• Proper Documentation: Program flowcharts serve as good documentation, which is needed for various purposes.
• Efficient Coding: Flowcharts act as a guide or blueprint during the system analysis and program development phases.
• Efficient Program Maintenance: Maintaining an operating program becomes easier with the help of a flowchart.
• Proper Debugging: A flowchart helps in the debugging process.
18.6 Examples
1. Flowchart for Area of a Circle
2. To add 6 subject marks of a person and calculate the total and average
3. To check whether a person is eligible to vote
4. Find the greater of two numbers

6. Calculation of grade – using multiple selection statements
7. Calculate the factorial of a given number

8. Calculate the sum of n numbers
9. Calculate sum of digits

10. Print the Fibonacci series : 0, 1, 1, 2, 3, 5, … n
11. To reverse a given number
