Character and String Representation

Caricato da

Rooth

Il 0% ha trovato utile questo documento (0 voti)

6 visualizzazioni14 pagine

Character and String Representation CS520 Department of Computer Science University of New Hampshire

Titolo originale

Character-and-String-Representation

Copyright

Formati disponibili

PDF, TXT o leggi online da Scribd

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Segnala questo documento

Character and String Representation CS520 Department of Computer Science University of New Hampshire

Copyright:

Formati disponibili

Scarica in formato PDF, TXT o leggi online su Scribd

Segnala contenuti inappropriati

Il 0% ha trovato utile questo documento (0 voti)

6 visualizzazioni14 pagine

Character and String Representation

Caricato da

Rooth

Character and String Representation CS520 Department of Computer Science University of New Hampshire

Copyright:

Formati disponibili

Scarica in formato PDF, TXT o leggi online su Scribd

Segnala contenuti inappropriati

Salta alla pagina

Sei sulla pagina 1di 14

Cerca all'interno del documento

Character and String

Representation
CS520
Department of Computer Science
University of New Hampshire
CDC 6600

• 6-bit character encodings

• i.e. only 64 characters
• Designers were not too concerned
about text processing!

The table is from Assembly Language Programming for

the Control Data 6000 series and the Cyber 70 series by
Grishman.
C Strings
• Usually implemented as a series of ASCII
characters terminated by a null byte (0x00).
• ″abc″ in memory is:

n 0x61

n+1 0x62

n+2 0x63

n+3 0x00
Unicode
• The space of values is divided into 17 planes.
• Plane 0 is the Basic Multilingual Plane (BMP).
– Supports nearly all modern languages.
– Encodings are 0x0000-0xFFFF.
• Planes 1-16 are supplementary planes.
– Supports historic scripts and special symbols.
– Encodings are 0x10000-0x10FFFF.
• Planes are divided into blocks.
Unicode and ASCII
• ASCII is the bottom block in the BMP, known
as the Basic Latin block.
• So ASCII values are embedded “as is” into
Unicode.
• i.e. 'a' is 0x61 in ASCII and 0x0061 in Unicode.
Special Encodings
• The Byte-Order Mark (BOM) is used to signal
endian-ness.
• Has no other meaning (i.e. usually ignored).
• Encoded as 0xFEFF.
• 0xFFFE is a noncharacter.
– Cannot appear in any exchange of Unicode.
• So file can be started with a BOM; the reader can
then know the endian-ness of the file.
• In absence of a BOM, Big Endian is assumed.
Other Noncharacters
• There are a total of 66 noncharacters:
– 0xFFFE and 0xFFFF of the BMP
– 0x1FFFE and 0x1FFFF of plane 1
– 0x2FFFE and 0x2FFFF of plane 2
– etc., up to
– 0x10FFFE and 0x10FFFF of plane 16
– Also 0xFDD0-0xFDEF of the BMP.
UTF: UCS* Transformation Format
• UTF-8
– Encodes Unicode characters in 1-4 bytes.
– ASCII gets encoded as 1 byte.
– Dominant character encoding for the WWW.
• UTF-16
– Encodes BMP characters in 2 bytes
– Encodes non-BMP characters in 4 bytes.
• UTF-32
– Fixed-sized representation of Unicode.

*Universal Character Set.

UTF-8
• Take the Unicode character and throw away the
leading zero bits.*
• Count the remaining number of bits.
• 7 bits: 0xxxxxxx
• 11 bits: 110xxxxx 10xxxxxx
• 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
• 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
*Overlong encodings are forbidden. Therefore there is a unique UTF-8 encoding for each
Unicode character.
Errors in UTF-8
• Overlong encodings.
• An unexpected continuation byte.
• A start byte not followed by enough continuation
bytes.
• A 4-byte sequence starting with 0xF4 that
decodes to a value greater than 0x10FFFF.
• A sequence that decodes to a noncharacter.
• A sequence that decodes to a value in range
0xD800-0xDFFF.
UTF-16
• 1 UTF-16 code unit (2 8-bit bytes) for each
BMP character.
• 2 UTF-16 code units for each non-BMP
character (4 bytes in total).
– 0x10000 is subtracted from the value, leaving a
20-bit number in the range 0x00000-0xFFFFF.
– The top 10 bits are added to 0xD800 to give the
first code unit, called the lead surrogate.
– The low 10 bits are added to 0xDC00 to give the
second code unit, called the trail surrogate.
Self-synchronizing
• 10 bits express values in the range 0x000-0x3FF.
• Lead surrogates will be in range 0xD800+0x000 to
0xD800+0x3FF (0xD800-0xDBFF).
• Trail surrogates will be in range 0xDC00+0x000 to
0xDC00+0x3FF (0xDC00-0xDFFF).
• Remember: values 0xD800-0xDFFF are not valid
Unicode characters.
• UTF-16 BMP characters can be distinguished from
UTF-16 non-BMP characters.
• So you can tell where the Unicode character
boundaries are in a UTF-16 stream.
UTF-32
• Simply take the 21-bit Unicode value and add
leading zero bits to extend it to 32 bits.
• Byte-order is an issue, like with UTF-16.

Potrebbero piacerti anche

Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Da Everand
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Margot Lee Shetterly
Valutazione: 4 su 5 stelle
4/5 (895)
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Da Everand
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Mark Manson
Valutazione: 4 su 5 stelle
4/5 (5794)
The Yellow House: A Memoir (2019 National Book Award Winner)
Da Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
Valutazione: 4 su 5 stelle
4/5 (98)
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Da Everand
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Ashlee Vance
Valutazione: 4.5 su 5 stelle
4.5/5 (474)
Shoe Dog: A Memoir by the Creator of Nike
Da Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
Valutazione: 4.5 su 5 stelle
4.5/5 (537)
The Little Book of Hygge: Danish Secrets to Happy Living
Da Everand
The Little Book of Hygge: Danish Secrets to Happy Living
Meik Wiking
Valutazione: 3.5 su 5 stelle
3.5/5 (399)
Yes Please
Da Everand
Yes Please
Amy Poehler
Valutazione: 4 su 5 stelle
4/5 (1891)
On Fire: The (Burning) Case for a Green New Deal
Da Everand
On Fire: The (Burning) Case for a Green New Deal
Naomi Klein
Valutazione: 4 su 5 stelle
4/5 (73)
Never Split the Difference: Negotiating As If Your Life Depended On It
Da Everand
Never Split the Difference: Negotiating As If Your Life Depended On It
Chris Voss
Valutazione: 4.5 su 5 stelle
4.5/5 (838)
Grit: The Power of Passion and Perseverance
Da Everand
Grit: The Power of Passion and Perseverance
Angela Duckworth
Valutazione: 4 su 5 stelle
4/5 (588)
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Da Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
Valutazione: 3.5 su 5 stelle
3.5/5 (231)
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Da Everand
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Gilbert King
Valutazione: 4.5 su 5 stelle
4.5/5 (266)
Principles: Life and Work
Da Everand
Principles: Life and Work
Ray Dalio
Valutazione: 4 su 5 stelle
4/5 (599)
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Da Everand
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz
Valutazione: 4.5 su 5 stelle
4.5/5 (344)
The Emperor of All Maladies: A Biography of Cancer
Da Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
Valutazione: 4.5 su 5 stelle
4.5/5 (271)
Fear: Trump in the White House
Da Everand
Fear: Trump in the White House
Bob Woodward
Valutazione: 3.5 su 5 stelle
3.5/5 (738)
Team of Rivals: The Political Genius of Abraham Lincoln
Da Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
Valutazione: 4.5 su 5 stelle
4.5/5 (234)
Rise of ISIS: A Threat We Can't Ignore
Da Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
Valutazione: 3.5 su 5 stelle
3.5/5 (137)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Da Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brene Brown
Valutazione: 4 su 5 stelle
4/5 (1090)
The Unwinding: An Inner History of the New America
Da Everand
The Unwinding: An Inner History of the New America
George Packer
Valutazione: 4 su 5 stelle
4/5 (45)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Da Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
Valutazione: 3.5 su 5 stelle
3.5/5 (2259)
Steve Jobs
Da Everand
Steve Jobs
Walter Isaacson
Valutazione: 4.5 su 5 stelle
4.5/5 (806)
Angela's Ashes: A Memoir
Da Everand
Angela's Ashes: A Memoir
Frank McCourt
Valutazione: 4.5 su 5 stelle
4.5/5 (440)
Bad Feminist: Essays
Da Everand
Bad Feminist: Essays
Roxane Gay
Valutazione: 4 su 5 stelle
4/5 (1015)
John Adams
Da Everand
John Adams
David McCullough
Valutazione: 4.5 su 5 stelle
4.5/5 (2409)
The Glass Castle: A Memoir
Da Everand
The Glass Castle: A Memoir
Jeannette Walls
Valutazione: 4.5 su 5 stelle
4.5/5 (1712)
The Outsider: A Novel
Da Everand
The Outsider: A Novel
Stephen King
Valutazione: 4 su 5 stelle
4/5 (1839)
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Da Everand
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Viet Thanh Nguyen
Valutazione: 4.5 su 5 stelle
4.5/5 (120)
The Light Between Oceans: A Novel
Da Everand
The Light Between Oceans: A Novel
M.L. Stedman
Valutazione: 4.5 su 5 stelle
4.5/5 (789)
The Woman in Cabin 10
Da Everand
The Woman in Cabin 10
Ruth Ware
Valutazione: 3.5 su 5 stelle
3.5/5 (2322)
A Man Called Ove: A Novel
Da Everand
A Man Called Ove: A Novel
Fredrik Backman
Valutazione: 4.5 su 5 stelle
4.5/5 (4609)
Brooklyn: A Novel
Da Everand
Brooklyn: A Novel
Colm Tóibín
Valutazione: 3.5 su 5 stelle
3.5/5 (1937)
Wolf Hall: A Novel
Da Everand
Wolf Hall: A Novel
Hilary Mantel
Valutazione: 4 su 5 stelle
4/5 (3811)
The Perks of Being a Wallflower
Da Everand
The Perks of Being a Wallflower
Stephen Chbosky
Valutazione: 4.5 su 5 stelle
4.5/5 (2101)
The Art of Racing in the Rain: A Novel
Da Everand
The Art of Racing in the Rain: A Novel
Garth Stein
Valutazione: 4 su 5 stelle
4/5 (4200)
Manhattan Beach: A Novel
Da Everand
Manhattan Beach: A Novel
Jennifer Egan
Valutazione: 3.5 su 5 stelle
3.5/5 (792)
Little Women
Da Everand
Little Women
Louisa May Alcott
Valutazione: 4 su 5 stelle
4/5 (104)
Sing, Unburied, Sing: A Novel
Da Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
Valutazione: 4 su 5 stelle
4/5 (1103)
A Tree Grows in Brooklyn
Da Everand
A Tree Grows in Brooklyn
Betty Smith
Valutazione: 4.5 su 5 stelle
4.5/5 (1929)
The Constant Gardener: A Novel
Da Everand
The Constant Gardener: A Novel
John le Carré
Valutazione: 3.5 su 5 stelle
3.5/5 (104)
Her Body and Other Parties: Stories
Da Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
Valutazione: 4 su 5 stelle
4/5 (821)
Grade 12 - EL Book PDF
Documento169 pagine
Grade 12 - EL Book PDF
IceLand
Nessuna valutazione finora
Lab 6 Ms
Documento9 pagine
Lab 6 Ms
Mba Bnf
Nessuna valutazione finora
Programming With Java - 2022.23 - CSE
Documento66 pagine
Programming With Java - 2022.23 - CSE
rhf9rvk5sf
Nessuna valutazione finora
Laboratory Reports: Data Structures and Algorithms
Documento109 pagine
Laboratory Reports: Data Structures and Algorithms
User Account
Nessuna valutazione finora
PDFlib 10.0.1 API Reference
Documento306 pagine
PDFlib 10.0.1 API Reference
kuhsdf
Nessuna valutazione finora
UNIT-5: Managing Input / Output in JAVA
Documento58 pagine
UNIT-5: Managing Input / Output in JAVA
shwetha k
100% (1)
The Syntax of The C Programming Language Is A Set of Rules That Specifies Whether The
Documento43 pagine
The Syntax of The C Programming Language Is A Set of Rules That Specifies Whether The
benavert
Nessuna valutazione finora
Ecs C - Prog Material
Documento126 pagine
Ecs C - Prog Material
Roshan Thomas
Nessuna valutazione finora
Tricky C Questions For GATE
Documento114 pagine
Tricky C Questions For GATE
Kirti Kumar
Nessuna valutazione finora
Final Krishna - Oracle Book
Documento351 pagine
Final Krishna - Oracle Book
p.navyasree
100% (1)
Chapter 17 Binary IO
Documento59 pagine
Chapter 17 Binary IO
Bhavik Dave
Nessuna valutazione finora
UNIT 1 Notes
Documento32 pagine
UNIT 1 Notes
Tanishka
Nessuna valutazione finora
C - Programme by Real
Documento39 pagine
C - Programme by Real
Sahidulla Ayon
Nessuna valutazione finora
Topic2-Basic Data Types and Operations
Documento60 pagine
Topic2-Basic Data Types and Operations
khai nisa
Nessuna valutazione finora
mcq-4,5,6 Java Dry-Run Programs Answers
Documento2 pagine
mcq-4,5,6 Java Dry-Run Programs Answers
anshi
Nessuna valutazione finora
Computer Science Revision Notes Paper 1
Documento30 pagine
Computer Science Revision Notes Paper 1
Dinesh Kumar
Nessuna valutazione finora
Dom PDF
Documento212 pagine
Dom PDF
sathish14k
Nessuna valutazione finora
Csci 260 Study Guide-9
Documento10 pagine
Csci 260 Study Guide-9
zubayerthewizard
Nessuna valutazione finora
Scala Quick Reference PDF
Documento6 pagine
Scala Quick Reference PDF
Yassine Msaddak
Nessuna valutazione finora
SNES9x Savestate File Format
Documento19 pagine
SNES9x Savestate File Format
creaothceann
Nessuna valutazione finora
Number Systems and Coding
Documento50 pagine
Number Systems and Coding
nyenooke
Nessuna valutazione finora
CSC238 - 2) Java Programming Basics (Part 2)
Documento34 pagine
CSC238 - 2) Java Programming Basics (Part 2)
Smile Smiley
Nessuna valutazione finora
Controllers Screens Editing Guide 1 1
Documento52 pagine
Controllers Screens Editing Guide 1 1
LUAT
Nessuna valutazione finora
ZPL Zbi2 PM en 28 394
Documento367 pagine
ZPL Zbi2 PM en 28 394
Vicente
Nessuna valutazione finora
Cs8392 Object Oriented Programming: Unit Iii Exception Handling and I/O
Documento43 pagine
Cs8392 Object Oriented Programming: Unit Iii Exception Handling and I/O
mahalakshmi
Nessuna valutazione finora
Chapter 1 Data Representation 3
Documento12 pagine
Chapter 1 Data Representation 3
Allan Ishimwe
Nessuna valutazione finora
OCJP Notes Durga Classes PDF
Documento58 pagine
OCJP Notes Durga Classes PDF
Saurabh Agrawal
Nessuna valutazione finora
C Programming Exercises
Documento6 pagine
C Programming Exercises
parth149
Nessuna valutazione finora
Text Processing Ke Multimedia Processing: Zainal A. Hasibuan
Documento32 pagine
Text Processing Ke Multimedia Processing: Zainal A. Hasibuan
ir12u
Nessuna valutazione finora