
2.

Data Formats

Chapt. 3 ITEC 1011

Introduction to Information Technologies

Introduction
Examples

Real-world data    Input device      Computer data
"Dear Mom:"        Keyboard          10110010
(picture)          Digital camera    10110010

pp. 59-61


Format must be appropriate


The internal representation must be appropriate for the type of processing to take place (e.g., text, images, sound)


Rules/Conventions
Proprietary formats
  Unique to a product or company
  E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes

Standards
  Evolve in two ways:
    Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
    A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)

pp. 61-62

Standards Organizations
ISO     International Organization for Standardization
CSA     Canadian Standards Association
ANSI    American National Standards Institute
IEEE    Institute of Electrical and Electronics Engineers
Etc.

Examples of Standards
Type of data              Standards
Alphanumeric              ASCII, EBCDIC, Unicode
Image                     JPEG, GIF, PCX, TIFF
Motion picture            MPEG-2, QuickTime
Sound                     Sound Blaster, WAV, AU
Outline graphics/fonts    PostScript, TrueType, PDF

Why Standards?
Standards are arbitrary
They exist because they are
  Convenient
  Efficient
  Flexible
  Appropriate
  Etc.

Alphanumeric Data
Problem: Distinguishing between the number 123 (one hundred and twenty-three) and the characters "123" (one, two, three)
Four standards for representing letters (alpha) and numbers:
  BCD       Binary-coded decimal
  ASCII     American Standard Code for Information Interchange
  EBCDIC    Extended Binary-Coded Decimal Interchange Code
  Unicode

pp. 63-69
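The distinction between the number 123 and the characters "123" can be made concrete with a short sketch. It is illustrative only; the variable names are invented for this example, and the number is shown as a plain 8-bit binary value rather than in any of the four formats listed above.

```python
number = 123   # the integer one hundred and twenty-three
text = "123"   # the three characters '1', '2', '3'

# As a plain binary number, 123 fits in a single byte.
number_bits = format(number, "08b")

# As text, each character gets its own ASCII code (31, 32, 33 in hexadecimal).
text_bits = " ".join(format(b, "08b") for b in text.encode("ascii"))

print(number_bits)  # 01111011
print(text_bits)    # 00110001 00110010 00110011
```

The same three keystrokes therefore end up as different bit patterns depending on whether the program treats them as a quantity or as text.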

Standard Alphanumeric Formats


BCD
ASCII
EBCDIC
Unicode

Binary-Coded Decimal (BCD)


Four bits per digit

Digit    Bit pattern
0        0000
1        0001
2        0010
3        0011
4        0100
5        0101
6        0110
7        0111
8        1000
9        1001

Note: the following bit patterns are not used:
1010 1011 1100 1101 1110 1111

Example
7093₁₀ = ? (in BCD)

7       0       9       3
0111    0000    1001    0011

7093₁₀ = 0111 0000 1001 0011 (BCD)
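The digit-by-digit conversion in the example can be sketched in a few lines of Python. The function names `to_bcd` and `from_bcd` are made up for this illustration, and the sketch assumes non-negative integers with one 4-bit group per decimal digit, as in the table above.

```python
def to_bcd(n: int) -> str:
    """Return the BCD bit pattern of a non-negative integer, one 4-bit group per digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(n))


def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit groups, rejecting the six unused patterns 1010-1111."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if len(group) != 4 or value > 9:
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(7093))                     # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))  # 7093
```

Passing a group such as 1010 to `from_bcd` raises an error, which mirrors the note that those bit patterns are not used.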


The Problem
Representing text strings, such as "Hello, world", in a computer

Codes and Characters


Each character is coded as a byte
Most common coding system is ASCII (pronounced "ass-key")
ASCII = American National Standard Code for Information Interchange
Defined in ANSI document X3.4-1977

ASCII Features
7-bit code
8th bit is unused (or used for a parity bit)
2^7 = 128 codes
Two general types of codes:
  95 are graphic codes (displayable on a console)
  33 are control codes (control features of the console or communications channel)
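The counts above can be checked with a short sketch. The parity helper is an assumption added for illustration: the slide only says the 8th bit may carry a parity bit, so even parity is chosen here arbitrarily, and the name `with_even_parity` is invented.

```python
# Graphic codes run from 20 (space) through 7E (~) in hexadecimal;
# every other 7-bit code is a control code.
graphic = [code for code in range(2 ** 7) if 0x20 <= code <= 0x7E]
control = [code for code in range(2 ** 7) if code < 0x20 or code == 0x7F]


def with_even_parity(code: int) -> int:
    """Set the 8th bit (bit 7) when needed so the byte has an even number of 1 bits."""
    ones = bin(code).count("1")
    return code | 0x80 if ones % 2 else code


print(len(graphic), len(control))                  # 95 33
print(format(with_even_parity(ord("a")), "08b"))   # 11100001
print(format(with_even_parity(ord("A")), "08b"))   # 01000001
```

The letter a (1100001) has three 1 bits, so the parity bit is set; A (1000001) already has an even count, so it is left clear.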

ASCII Chart
Columns: three most significant bits. Rows: four least significant bits.

        000    001    010    011    100    101    110    111
0000    NULL   DLE    SP     0      @      P      `      p
0001    SOH    DC1    !      1      A      Q      a      q
0010    STX    DC2    "      2      B      R      b      r
0011    ETX    DC3    #      3      C      S      c      s
0100    EOT    DC4    $      4      D      T      d      t
0101    ENQ    NAK    %      5      E      U      e      u
0110    ACK    SYN    &      6      F      V      f      v
0111    BEL    ETB    '      7      G      W      g      w
1000    BS     CAN    (      8      H      X      h      x
1001    HT     EM     )      9      I      Y      i      y
1010    LF     SUB    *      :      J      Z      j      z
1011    VT     ESC    +      ;      K      [      k      {
1100    FF     FS     ,      <      L      \      l      |
1101    CR     GS     -      =      M      ]      m      }
1110    SO     RS     .      >      N      ^      n      ~
1111    SI     US     /      ?      O      _      o      DEL

Each 7-bit code is formed from the three most significant bits (000 to 111) followed by the four least significant bits (0000 to 1111).

E.g., a = 1100001 (most significant bits 110, least significant bits 0001)
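Reading a code off the chart amounts to splitting it into its three most significant and four least significant bits. A minimal sketch, with the helper names `split_code` and `join_code` invented for this example:

```python
def split_code(ch):
    """Return (three most significant bits, four least significant bits) of an ASCII character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    return format(code >> 4, "03b"), format(code & 0x0F, "04b")


def join_code(high, low):
    """Rebuild the character from its 3-bit and 4-bit halves."""
    return chr((int(high, 2) << 4) | int(low, 2))


print(split_code("a"))            # ('110', '0001')
print(join_code("100", "0001"))   # A
```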

95 Graphic codes
0100000 (space) through 1111110 (~)

33 Control codes
0000000 (NULL) through 0011111 (US), plus 1111111 (DEL)

Alphabetic codes
1000001 (A) through 1011010 (Z)
1100001 (a) through 1111010 (z)

Numeric codes
0110000 (0) through 0111001 (9)

Punctuation, etc.
The remaining graphic codes: space ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

Hello, world Example


Character    Binary      Hexadecimal    Decimal
H            01001000    48             72
e            01100101    65             101
l            01101100    6C             108
l            01101100    6C             108
o            01101111    6F             111
,            00101100    2C             44
(space)      00100000    20             32
w            01110111    77             119
o            01101111    6F             111
r            01110010    72             114
l            01101100    6C             108
d            01100100    64             100
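The three columns of this example can be regenerated from the string itself. The sketch below uses Python's built-in `ord` and `format`; the variable names are chosen for illustration.

```python
text = "Hello, world"

# One row per character: (character, binary, hexadecimal, decimal).
rows = [(ch, format(ord(ch), "08b"), format(ord(ch), "02X"), ord(ch)) for ch in text]

for ch, binary, hexadecimal, decimal in rows:
    print(f"{ch!r:<5} {binary}  {hexadecimal}  {decimal}")
```

Both occurrences of the letter o produce the same code, 01101111 (6F hexadecimal, 111 decimal), since the code depends only on the character and not on its position.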

Common Control Codes


Code    Hexadecimal code    Meaning
CR      0D                  carriage return
LF      0A                  line feed
HT      09                  horizontal tab
DEL     7F                  delete
NULL    00                  null
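Most programming languages give these control codes their own escape notation in string literals. As an illustration, the sketch below maps the names in the table to Python string escapes and prints their hexadecimal codes; the dictionary names are invented for this example.

```python
# Python string escapes for the control codes in the table above.
controls = {"CR": "\r", "LF": "\n", "HT": "\t", "DEL": "\x7f", "NULL": "\x00"}

hex_codes = {name: format(ord(ch), "02X") for name, ch in controls.items()}
print(hex_codes)
```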


Terminology
Learn the names of the special symbols
[]    brackets
{}    braces
()    parentheses
@     commercial at sign
&     ampersand
~     tilde


Escape Sequences
Extend the capability of the ASCII code set
For controlling terminals and formatting output
Defined by ANSI in documents X3.41-1974 and X3.64-1977
The escape code is ESC = 1B₁₆
An escape sequence begins with two codes:
  ESC    1B₁₆
  [      5B₁₆

Examples
Erase display:    ESC [ 2 J
Erase line:       ESC [ K
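The two example sequences can be built byte by byte. This is a sketch only: the constant names are invented for the illustration, and whether a given terminal honors the sequences depends on the terminal.

```python
ESC = "\x1b"       # escape code, 1B hexadecimal
CSI = ESC + "["    # the two codes that begin a sequence: 1B 5B

ERASE_DISPLAY = CSI + "2J"   # ESC [ 2 J
ERASE_LINE = CSI + "K"       # ESC [ K


def hex_dump(seq):
    """Show a sequence as space-separated hexadecimal byte values."""
    return " ".join(format(b, "02X") for b in seq.encode("ascii"))


print(hex_dump(ERASE_DISPLAY))  # 1B 5B 32 4A
print(hex_dump(ERASE_LINE))     # 1B 5B 4B
```

Writing `ERASE_DISPLAY` to a terminal that supports these sequences would clear the screen; the hex dump shows that everything after the two introducing codes is ordinary graphic ASCII.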


EBCDIC
Extended BCD Interchange Code (pronounced "ebb-se-dick")
8-bit code
Developed by IBM
Rarely used today (IBM mainframes only)


Unicode
16-bit standard
Developed by a consortium
Intended to supersede older 7- and 8-bit codes

Unicode Version 2.1


1998
Improves on version 2.0
Includes the Euro sign (20AC₁₆ = €)
From the standard:
  "contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica."
http://www.unicode.org
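The Euro sign makes a convenient test case for a code that does not fit in 7 or 8 bits. A brief sketch, assuming a Python 3 interpreter (whose strings are Unicode) and using the big-endian 16-bit encoding to show the two bytes:

```python
euro = chr(0x20AC)                        # code point 20AC (hexadecimal)

print(format(ord(euro), "04X"))           # 20AC
print(euro.encode("utf-16-be").hex())     # 20ac, i.e. the two bytes 20 and AC

# The character has no 7-bit ASCII code, so encoding it as ASCII fails.
try:
    euro.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(fits_in_ascii)                      # False
```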

Keyboard Input
Key (scan) codes are converted to ASCII
ASCII code sent to host computer
Received by the host as a stream of data
Stored in buffer
Processed
Etc.

p. 69

Shift Key
Inhibits bit 5 in the ASCII code

Key(s)       ASCII code (bits 6 5 4 3 2 1 0)    Character
a            1 1 0 0 0 0 1                      a
Shift + a    1 0 0 0 0 0 1                      A

Control Key
Inhibits bits 5 & 6 in the ASCII code

Key(s)      ASCII code (bits 6 5 4 3 2 1 0)    Character
c           1 1 0 0 0 1 1                      c
Ctrl + c    0 0 0 0 0 1 1                      ETX (control code)
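The effect of the Shift and Control keys on a letter's code can be expressed as bit masking. The sketch below is an illustration with invented function names; it applies only to letter keys, since other keys (digits, for example) do not follow the bit 5 rule.

```python
BIT_5 = 0b0100000
BIT_6 = 0b1000000


def shift(ch):
    """Clear bit 5 of a lowercase letter's code, giving the uppercase letter."""
    return chr(ord(ch) & ~BIT_5)


def ctrl(ch):
    """Clear bits 5 and 6, leaving a control code (returned as an integer)."""
    return ord(ch) & ~(BIT_5 | BIT_6)


print(shift("a"))                # A
print(format(ctrl("c"), "07b"))  # 0000011, the code for ETX
```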

Other Input
OCR (optical character recognition)
Bar code readers
Voice/audio input
Punched cards
Images / objects
Pointing devices

pp. 69-86

OCR
Page of text ("Hello, world") -> optical scan -> computer file (10110110)


Bar Codes
An automatic identification (Auto ID) technology that streamlines identification and data collection
See http://www.digital.net/barcoder/barcode.html


Voice/audio Input
Input device: microphone
Audio input is digitized and stored
Processed in two ways:
  As is (no recognition)
  Recognized and converted to alphanumeric data (ASCII)

Audio -> digitize -> 10110010
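Digitizing means sampling the signal at regular intervals and storing each sample as a number. The sketch below is a rough illustration under stated assumptions: a computed sine tone stands in for the microphone signal, and the 8-bit sample size, sample rate, and function name `digitize` are all chosen for the example rather than taken from the slides.

```python
import math


def digitize(frequency_hz, sample_rate_hz, n_samples):
    """Sample a sine tone and quantize each sample to an unsigned 8-bit value (0-255)."""
    samples = []
    for n in range(n_samples):
        # Amplitude of the tone at sample n, in the range -1.0 to 1.0.
        amplitude = math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        # Map -1.0..1.0 onto 0..255 and round to the nearest integer level.
        samples.append(round((amplitude + 1) / 2 * 255))
    return samples


samples = digitize(frequency_hz=440, sample_rate_hz=8000, n_samples=8)
print([format(s, "08b") for s in samples])
```

Each printed value is one stored byte, comparable to the 10110010 pattern shown on the slide.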


Punched Cards
Invented by Herman Hollerith (founder of the Tabulating Machine Company, a forerunner of IBM)
Each card holds 80 characters


Images
Typically, images are pictures that are optically scanned and saved as a bit map or in some other format
Many formats
  GIF, JPEG, etc.

Typical Save As Dialog
[Figure: Save As dialog]

Objects
Images made of geometrically definable shapes
Offer efficiency, flexibility, small size, etc.


Pointing Devices
Originally used for specifying coordinates (x, y) for graphical input
Today used as general-purpose devices for graphical user interfaces (GUIs)

Thank you
