
Data Codes

Er. Prateek Solanki

Data Forms
Data conversion and representation
Data Formats
Alphanumeric Data
Image Data
Audio Data
Data Input
Data Compression
Internal Computer Data Format

Human communication
Includes language, images and sounds

Computers
Process and store all forms of data in binary format

Conversion to computer-usable representation using data formats


Data formats define the different ways human data may be represented, stored, and processed by a computer

Proprietary formats
Unique to a product or company, e.g., Microsoft Word, WordPerfect

Standards (evolve in two ways):
Proprietary formats become de facto standards (e.g., Adobe PostScript)
Invented by an international standards organization (e.g., Moving Picture Experts Group, MPEG)

Type of Data                 Standard(s)
Alphanumeric                 Unicode, ASCII, EBCDIC
Image (bitmapped)            GIF (Graphics Interchange Format), TIF (tagged image file format), PNG (portable network graphics), JPEG
Image (object)               PostScript, SWF (Macromedia Flash), SVG
Outline graphics and fonts   PostScript, TrueType
Sound                        WAV, AVI, MP3, MIDI, WMA
Page description             PDF (Adobe Portable Document Format), HTML, XML
Video                        Quicktime, MPEG-2, RealVideo, WMV

Characters (r, T), number digits (0..9), punctuation (!, ;), special-purpose characters ($, &)
Four codes/standards to represent letters and numbers:
BCD (Binary-Coded Decimal)
Unicode
ASCII (American Standard Code for Information Interchange)
EBCDIC (Extended Binary Coded Decimal Interchange Code)

BCD

Four bits per digit

Digit   Bit pattern
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001

Note: the following 6 bit patterns are not used: 1010 1011 1100 1101 1110 1111

7093₁₀ = ? (in BCD)

Digit   7     0     9     3
BCD     0111  0000  1001  0011
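The digit-by-digit conversion above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names `to_bcd` and `from_bcd` are hypothetical, and the sketch assumes non-negative integers only.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as BCD, one 4-bit group per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit groups, rejecting the six unused patterns."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if len(group) != 4 or value > 9:
            raise ValueError(f"{group!r} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(7093))                     # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))  # 7093
```

Passing one of the unused patterns, such as `1010`, to `from_bcd` raises a `ValueError`, which mirrors the note that those six patterns carry no meaning in BCD.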

ASCII

Developed by ANSI (American National Standards Institute)
Defined in ANSI document X3.4-1977
7-bit code
8th bit is unused (or used for a parity bit or to indicate an extended character set)
2^7 = 128 different codes
Two general types of codes:
95 are printing codes (displayable on a console)
33 are control codes (control features of the console or communications channel)

Represents
Latin alphabet, Arabic numerals, standard punctuation characters
Plus a small set of accents and other European special characters (Latin-1, an 8-bit extension of ASCII)
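The counts above, and one possible use of the 8th bit, can be checked with a short Python sketch. The names `PRINTING`, `CONTROL` and `with_even_parity` are illustrative, and even parity is an assumption here; the slides only say that the 8th bit may carry a parity bit.

```python
# Printing codes run from space (0x20) through tilde (0x7E).
PRINTING = [code for code in range(128) if 0x20 <= code <= 0x7E]
# Control codes are 0x00 through 0x1F, plus DEL (0x7F).
CONTROL = [code for code in range(128) if code < 0x20 or code == 0x7F]


def with_even_parity(code: int) -> int:
    """Return the 8-bit value whose top bit makes the number of 1 bits even."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes are 7 bits wide")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


print(len(PRINTING), len(CONTROL))                # 95 33
print(format(with_even_parity(ord("a")), "08b"))  # 11100001
```

The letter a (1100001) has three 1 bits, so the parity bit is set to bring the total to four.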

        000   001   010   011   100   101   110   111
0000    NUL   DLE   SP    0     @     P     `     p
0001    SOH   DC1   !     1     A     Q     a     q
0010    STX   DC2   "     2     B     R     b     r
0011    ETX   DC3   #     3     C     S     c     s
0100    EOT   DC4   $     4     D     T     d     t
0101    ENQ   NAK   %     5     E     U     e     u
0110    ACK   SYN   &     6     F     V     f     v
0111    BEL   ETB   '     7     G     W     g     w
1000    BS    CAN   (     8     H     X     h     x
1001    HT    EM    )     9     I     Y     i     y
1010    LF    SUB   *     :     J     Z     j     z
1011    VT    ESC   +     ;     K     [     k     {
1100    FF    FS    ,     <     L     \     l     |
1101    CR    GS    -     =     M     ]     m     }
1110    SO    RS    .     >     N     ^     n     ~
1111    SI    US    /     ?     O     _     o     DEL

In the ASCII table, the 3-bit groups (000 to 111) are the most significant bits and the 4-bit groups (0000 to 1111) are the least significant bits.

e.g., a = 1100001 (most significant bits 110, least significant bits 0001)

95 printing codes: groups 010 through 111, except DEL
33 control codes: groups 000 and 001, plus DEL
Alphabetic codes: A to Z (1000001 to 1011010) and a to z (1100001 to 1111010)
Numeric codes: 0 to 9 (0110000 to 0111001)
Punctuation, etc.: the remaining printing codes
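The regular layout of the alphabetic and numeric codes means that simple bit operations can move between them. The following Python sketch illustrates two such consequences of the table; `toggle_case` and `digit_value` are hypothetical helper names, not part of any standard.

```python
def toggle_case(letter: str) -> str:
    """Flip the bit worth 0100000 in an ASCII letter to switch its case."""
    if not (len(letter) == 1 and letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return chr(ord(letter) ^ 0b0100000)


def digit_value(digit: str) -> int:
    """The four least significant bits of an ASCII digit code hold its value."""
    if not (len(digit) == 1 and "0" <= digit <= "9"):
        raise ValueError("expected a single ASCII digit")
    return ord(digit) & 0b0001111


print(format(ord("a"), "07b"))  # 1100001
print(toggle_case("a"))         # A
print(digit_value("7"))         # 7
```

Upper-case and lower-case letters share the same four least significant bits and differ in a single bit of the upper group, which is why A is 1000001 and a is 1100001.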

ASCII table indexed by hexadecimal digits (MSD = most significant digit, across; LSD = least significant digit, down)

LSD   MSD: 0     1     2     3     4     5     6     7
0          NUL   DLE   SP    0     @     P     `     p
1          SOH   DC1   !     1     A     Q     a     q
2          STX   DC2   "     2     B     R     b     r
3          ETX   DC3   #     3     C     S     c     s
4          EOT   DC4   $     4     D     T     d     t
5          ENQ   NAK   %     5     E     U     e     u
6          ACK   SYN   &     6     F     V     f     v
7          BEL   ETB   '     7     G     W     g     w
8          BS    CAN   (     8     H     X     h     x
9          HT    EM    )     9     I     Y     i     y
A          LF    SUB   *     :     J     Z     j     z
B          VT    ESC   +     ;     K     [     k     {
C          FF    FS    ,     <     L     \     l     |
D          CR    GS    -     =     M     ]     m     }
E          SO    RS    .     >     N     ^     n     ~
F          SI    US    /     ?     O     _     o     DEL

Example: 74₁₆ = 111 0100 = t

Character   ASCII
H           1001000
e           1100101
l           1101100
l           1101100
o           1101111
,           0101100
(space)     0100000
w           1110111
o           1101111
r           1110010
l           1101100
d           1100100

The 84 bits regrouped four at a time, with the hexadecimal digit for each group:

1001 0001 1001 0111 0110 0110 1100 1101 1110 1011 0001 0000 0111 0111 1101 1111 1100 1011 0110 0110 0100
9    1    9    7    6    6    C    D    E    B    1    0    7    7    D    F    C    B    6    6    4

Hex 919766CDEB1077DFCB664
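The "Hello, world" example can be reproduced with a short Python sketch. The function names `ascii_bits` and `pack_hex` are illustrative. Padding the final group with zeros on the right is an assumption for inputs whose bit count is not a multiple of four; the slide's example needs no padding because 12 × 7 = 84 bits.

```python
def ascii_bits(text: str) -> list:
    """Return the 7-bit ASCII pattern of each character as a string of 0s and 1s."""
    codes = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return [format(code, "07b") for code in codes]


def pack_hex(text: str) -> str:
    """Concatenate the 7-bit codes and read the result four bits at a time."""
    bits = "".join(ascii_bits(text))
    bits += "0" * (-len(bits) % 4)  # assumption: pad the last group with zeros
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


print(ascii_bits("Hello, world")[0])  # 1001000
print(pack_hex("Hello, world"))       # 919766CDEB1077DFCB664
```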

Code   Hexadecimal code   Meaning
CR     0D                 carriage return
LF     0A                 line feed
HT     09                 horizontal tab
DEL    7F                 delete
NUL    00                 null


EBCDIC

8-bit code
Developed by IBM
IBM and compatible mainframes only
Rarely used today (common in archival data)
Character codes differ from ASCII

Character   ASCII   EBCDIC
Space       20₁₆    40₁₆
A           41₁₆    C1₁₆
b           62₁₆    82₁₆

Conversion software to/from ASCII available
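As one illustration of such conversion, Python's standard codecs include several EBCDIC code pages. The sketch below assumes code page 037 (`cp037`, the US/Canada variant); other EBCDIC code pages assign some characters differently, so the choice of codec is an assumption, and the helper name `compare` is hypothetical.

```python
# Assumption: IBM code page 037; other EBCDIC code pages differ in places.
EBCDIC_CODEC = "cp037"


def compare(char: str) -> tuple:
    """Return the (ASCII, EBCDIC) code of a single character."""
    return char.encode("ascii")[0], char.encode(EBCDIC_CODEC)[0]


for char in (" ", "A", "b"):
    ascii_code, ebcdic_code = compare(char)
    print(repr(char), format(ascii_code, "02X"), format(ebcdic_code, "02X"))
# ' ' 20 40
# 'A' 41 C1
# 'b' 62 82

# Round trip: text to EBCDIC bytes and back again.
ebcdic_bytes = "Hello".encode(EBCDIC_CODEC)
print(ebcdic_bytes.hex().upper())         # C885939396
print(ebcdic_bytes.decode(EBCDIC_CODEC))  # Hello
```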


Unicode

Most common 16-bit form represents 65,536 characters
ASCII Latin-1 is a subset of Unicode: values 0 to 255 in the Unicode table

Multilingual: defines codes for
Nearly every character-based alphabet
Large set of ideographs for Chinese, Japanese and Korean
Composite characters for vowels and syllabic clusters required by some languages

Allows software modifications for local languages
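The subset relationship and the 16-bit form can be illustrated with a brief Python sketch. It uses big-endian UTF-16 as a stand-in for "the 16-bit form", which is an assumption; the sample characters are chosen only for illustration.

```python
# 16 bits give 2**16 distinct values.
print(2 ** 16)  # 65536

# Every Latin-1 byte value 0..255 maps to the Unicode code point of the same number.
latin1_matches = all(bytes([v]).decode("latin-1") == chr(v) for v in range(256))
print(latin1_matches)  # True

# A, e with acute accent, and a Chinese ideograph, each in one 16-bit unit.
for char in ("A", "\u00e9", "\u4e2d"):
    code_point = ord(char)
    utf16 = char.encode("utf-16-be")
    print(f"U+{code_point:04X}", utf16.hex().upper())
# U+0041 0041
# U+00E9 00E9
# U+4E2D 4E2D
```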
