Unicode Encodings
To simplify matters, Unicode defines almost all commonly used characters in the
first 65536 code points. This means that most Unicode strings can be encoded using
2 bytes (a 16-bit value) for every character. This encoding is called UCS-2, which in
Delphi is represented using WideChar and WideString.
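The two-byte claim can be checked with a short sketch. It is written in Python rather than Delphi, purely to illustrate the byte counts. Python's `utf-16-le` codec stands in for UCS-2 here, an assumption that holds only for characters below 65536, and the sample string is hypothetical.

```python
# Illustrative sketch: for characters below U+10000, UTF-16 and UCS-2
# produce the same bytes, so the "utf-16-le" codec stands in for UCS-2.
sample = "Gr\u00fc\u00dfe, \u03a9mega"  # hypothetical sample, all below U+10000

encoded = sample.encode("utf-16-le")

# Every character occupies exactly 2 bytes (one 16-bit value).
bytes_per_char = len(encoded) // len(sample)
print(len(sample), len(encoded), bytes_per_char)

# A character at or above U+10000 does not fit in one 16-bit value,
# so it falls outside what a single UCS-2 unit can represent.
outside = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
outside_size = len(outside.encode("utf-16-le"))
print(ord(outside) >= 65536, outside_size)
```

The last two lines show the limit of the "first 65536" simplification: rarely used characters beyond that range need more than one 16-bit value.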
To simplify matters further, Unicode defines the first 128 characters to be identical
to the characters from ASCII. An encoding that makes use of this fact is UTF-8.
UTF-8 is a variable-length encoding with several properties that make it ideal for
storing Unicode when the majority of characters are ASCII characters. Among the
properties of UTF-8 are:
* It stores ASCII characters as their ASCII value in one byte. In other words, an
ASCII string will not be changed by UTF-8.
* Non-ASCII characters are stored as sequences of more than one byte, and no byte
in such a sequence has the value of an ASCII character. In other words, functions
that operate on ASCII strings byte by byte can transparently work on UTF-8 strings.
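Both properties can be observed directly on encoded bytes. The following is a minimal sketch in Python, not Delphi, and the sample strings are hypothetical; it only inspects the bytes that a standard UTF-8 encoder produces.

```python
# Illustrative sketch of the two UTF-8 properties listed above.
ascii_text = "Hello, world"
mixed_text = "na\u00efve caf\u00e9"  # hypothetical sample, two non-ASCII characters

# Property 1: an ASCII string is byte-for-byte identical in UTF-8.
ascii_unchanged = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Property 2: every byte of a multi-byte sequence has its high bit set,
# so it can never be mistaken for an ASCII character (values 0..127).
multi_byte_bytes = [
    b
    for ch in mixed_text
    if ord(ch) > 127
    for b in ch.encode("utf-8")
]
no_ascii_inside = all(b >= 0x80 for b in multi_byte_bytes)

# Consequence: a byte-level split on an ASCII delimiter never cuts
# through the middle of a multi-byte character.
fields = "caf\u00e9,na\u00efve".encode("utf-8").split(b",")
decoded_fields = [f.decode("utf-8") for f in fields]
print(ascii_unchanged, no_ascii_inside, decoded_fields)
```

Note that this transparency applies to byte-oriented operations such as searching for an ASCII delimiter. Operations that count or index characters still need to be aware of multi-byte sequences.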
When you expect to work with lots of international text, use WideChar and
WideString. You should note that WideStrings are not reference counted on
Windows. This makes them less efficient to use than LongStrings.
When you expect to work with text which is mostly ASCII, but which may contain the
occasional international text, use UTF8Strings. They use less memory and are
reference counted in Delphi.
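The memory argument can be made concrete with a rough size comparison. This sketch uses Python and a hypothetical sample string, with `utf-16-le` again standing in for the 16-bit WideString representation. It measures encoded payload only and ignores any per-string overhead in Delphi.

```python
# Illustrative sketch: storage cost of mostly-ASCII text in each encoding.
text = "Total: 42 items, price 10 \u20ac"  # hypothetical sample, one non-ASCII character

utf8_size = len(text.encode("utf-8"))       # 1 byte per ASCII character, 3 for the euro sign
utf16_size = len(text.encode("utf-16-le"))  # 2 bytes per character for this text

print(len(text), utf8_size, utf16_size)
```

For text like this, the UTF-8 form is close to half the size of the 16-bit form. The advantage shrinks as the share of non-ASCII characters grows.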
Once you have Unicode Strings, you can use the string functions in cUnicode to
work with the Unicode Strings and the character functions in cUnicodeChar to work
with Unicode characters.
The Unicode units provide common functions needed to use Unicode strings in your
Delphi application.
The following units make up the collection:
* Unicode codecs
Unicode codecs are encoders and decoders for converting various character sets
and encodings to and from Unicode WideStrings.
* Unicode characters
* Unicode strings
More than 30 Unicode string functions for using WideStrings and null-terminated
WideStrings.
* These functions are extremely fast. On a 450 MHz machine, the codecs reach
speeds of up to 40 Mb/s and the reader up to 10 Mb/s.