
Tech Stuff - Character Sets

This page summarises what, at face value, seems a remarkably simple concept - character representation. Turns out it's more like a nightmare. The column marked Relationship tries to define the relationships between the various standards.

Name Standard Aliases Description Relationship
ASCII
  Standard: ANSI X3.4-1986, ISO 646, ITU-T T.50
  Aliases: US-ASCII, IA5, IRA5, ISO 646
  Description: ASCII is a 7 bit code, normally stored in an 8 bit field, and uses only the values 00 to 7F (0 to 127 decimal). What is generically called ASCII is normally US-ASCII, but various national definitions exist which typically have only two printable differences. ASCII is the same as International Reference Alphabet No. 5 (IRA5), previously International Alphabet No. 5 (IA5) and defined in ITU-T T.50, and as ISO 646. It has the same character values as the first 128 entries in ISO 8859-1 (Latin-1), ISO 8859-15 (Latin-9) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values, although the encoded form differs except in UTF-8.
IA5
  Standard: ITU-T T.50
  Aliases: IRA5, ASCII, ISO 646
  Description: International Alphabet No. 5 (ISO 646), now renamed International Reference Alphabet No. 5 (IRA5).
IRA5
  Standard: ITU-T T.50
  Aliases: IA5, ISO 646, ASCII
  Description: International Reference Alphabet No. 5 (IRA5), formerly International Alphabet No. 5 (IA5), is the ITU equivalent of ASCII and ISO 646. IRA5 is a 7 bit code, normally stored in an 8 bit field, and uses only the values 00 to 7F (0 to 127 decimal). IRA5 is almost the same as ISO 646 and ASCII; the national and international variants typically differ in two characters. The character values are the same as the first 128 entries in ISO 8859-1 (Latin-1), ISO 8859-15 (Latin-9) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values, although the encoded form differs except in UTF-8.
ISO 646
  Standard: ISO 646
  Aliases: IA5, IRA5, ASCII
  Description: ISO 646 is a 7 bit code, normally stored in an 8 bit field, and uses only the values 00 to 7F (0 to 127 decimal). ISO 646 is the same as IRA5 (IA5) and ASCII. The character values are the same as the first 128 entries in ISO 8859-1 (Latin-1), ISO 8859-15 (Latin-9) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values, although the encoded form differs except in UTF-8.
ISO 8859-1
  Standard: ISO 8859-1
  Aliases: Latin-1
  Description: ISO 8859-1 is part of a large family (ISO 8859-1 to 8859-16) and is encoded as an 8 bit field which uses all 8 bits, 00 to FF (0 to 255 decimal). The first 128 character values are the same as IRA5, ISO 646, ASCII, ISO 8859-15 (Latin-9) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values, although the encoded form differs except in UTF-8.
ISO 8859-15
  Standard: ISO 8859-15
  Aliases: Latin-9
  Description: ISO 8859-15 is part of a large family (ISO 8859-1 to 8859-16) and is encoded as an 8 bit field which uses all 8 bits, 00 to FF (0 to 255 decimal). It differs from 8859-1 in 8 positions, one of which (A4) carries the euro symbol. The first 128 character values are the same as IRA5, ISO 646, ASCII, ISO 8859-1 (Latin-1) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values, although the encoded form differs except in UTF-8.
ISO 10646
  Standard: ISO 10646
  Aliases: UCS
  Description: ISO 10646 (Universal Character Set) is designed to be the replacement for all previous character sets by providing a single family of standards for the encoding of all possible characters and symbols in all written languages. It has two implementations: UCS-2 (a 16 bit encoding) and UCS-4 (a 32 bit encoding). The first 128 characters (but not the encoding) in ISO 10646 are the same as ASCII, IA5, IRA5, ISO 646, 8859-1 and 8859-15. Unicode from version 1.1 is the same as ISO 10646.
Unicode
  Standard: Unicode Consortium
  Description: The Unicode Standard (version 3.0 when this entry was written; later versions have since been published). From version 1.1 it is fully compatible with ISO 10646.
CP1252
  Standard/Aliases: Microsoft code page 1252
  Description: Microsoft's version of ISO 8859-1. It is an 8 bit encoding with 27 differences from 8859-1 (including the euro), all in the range x80 - x9F. The first 128 character values are the same as IRA5, ISO 646, ASCII, ISO 8859-1 (Latin-1) and ISO 8859-15 (Latin-9). The first 128 characters in Unicode and ISO 10646 (UCS) have the same values, although the encoded form differs except in UTF-8.
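The relationships described above can be checked mechanically. The following is a minimal sketch, assuming Python 3 and its built-in codecs (`ascii`, `iso8859-1`, `iso8859-15`, `cp1252`) as stand-ins for the standards themselves; the helper names `decode_byte` and `differing_positions` are made up for this illustration.

```python
# Compare single-byte character sets using Python's built-in codecs.
# Assumption: the codecs faithfully reflect the published standards.

def decode_byte(value, codec):
    """Return the character a codec assigns to one byte, or None if undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None

def differing_positions(codec_a, codec_b, start=0x00, stop=0x100):
    """List the byte values in [start, stop) where two codecs disagree."""
    return [v for v in range(start, stop)
            if decode_byte(v, codec_a) != decode_byte(v, codec_b)]

# The first 128 values (00 to 7F) match ASCII in all three 8 bit sets.
low_half_same = all(
    differing_positions("ascii", codec, 0x00, 0x80) == []
    for codec in ("iso8859-1", "iso8859-15", "cp1252")
)

# Positions where Latin-9 departs from Latin-1.
latin9_changes = differing_positions("iso8859-1", "iso8859-15")

# Byte values in x80 - x9F to which CP1252 assigns a character.
cp1252_printable = [v for v in range(0x80, 0xA0)
                    if decode_byte(v, "cp1252") is not None]

print("low half identical:", low_half_same)
print("Latin-9 changes:", [f"{v:02X}" for v in latin9_changes])
print("CP1252 characters in 80-9F:", len(cp1252_printable))
```

The euro sits at A4 in ISO 8859-15 but at 80 in CP1252, which is one reason text labelled with the wrong one of these character sets displays incorrectly.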
Transformations
These formats define how the underlying code set of Unicode/ISO 10646 is sent over the wire. They are not character sets.
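UTF-7
  Standard: RFC 2152
  Description: UCS Transformation Format-7. Defines how ISO 10646 (UCS) is transformed into 7 bit ASCII octets for mail-safe data communications. Uses a variable number of octets for a single ISO 10646/Unicode character: most ASCII characters are sent as themselves, others as a '+' prefixed run of modified base64.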
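UTF-8
  Standard: RFC 3629
  Aliases: UTF-2, FSS-UTF
  Description: UCS Transformation Format-8. Defines how ISO 10646 (UCS) is transformed for MIME enabled data communications. Uses from 1 to 4 octets for a single ISO 10646/Unicode character (RFC 3629 limits the range to 10FFFF; the earlier definition allowed up to 6 octets). The first 128 characters are encoded as single octets with the same values as ASCII.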
UTF-16
  Standard: RFC 2781
  Description: UCS Transformation Format-16. Defines how ISO 10646 (UCS) is transformed for data communications. Uses one or two 16 bit units (2 or 4 octets) for a single ISO 10646/Unicode character. Characters outside the UCS-2 range, up to 10FFFF, are represented as a pair of 16 bit surrogate values.
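To make the octet counts concrete, here is a small sketch, assuming Python 3 and its built-in `utf-7`, `utf-8`, `utf-16-be` and `utf-32-be` codecs (the big-endian variants are used only to avoid counting a byte order mark). The `octets` helper and the sample characters are chosen for illustration.

```python
# Count the octets each transformation format needs for one character.

def octets(ch, codec):
    """Number of octets a codec produces for a single character."""
    return len(ch.encode(codec))

# Sample characters, written as escapes so the source stays pure ASCII.
SAMPLES = [
    "\u0041",      # LATIN CAPITAL LETTER A (ASCII range)
    "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE (Latin-1 range)
    "\u20ac",      # EURO SIGN (rest of the 16 bit range)
    "\U0001F600",  # GRINNING FACE (beyond the 16 bit range)
]

for ch in SAMPLES:
    print(
        f"U+{ord(ch):04X}",
        "utf-8:", octets(ch, "utf-8"),
        "utf-16:", octets(ch, "utf-16-be"),
        "utf-32:", octets(ch, "utf-32-be"),
        "utf-7:", ch.encode("utf-7"),
    )
```

The UTF-16 column shows 2 octets until the character leaves the 16 bit range, at which point a surrogate pair doubles it to 4.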

ISO 8859 Family

ISO 8859-1   Latin alphabet No. 1     West European
ISO 8859-2   Latin alphabet No. 2     Central and East European
ISO 8859-3   Latin alphabet No. 3     South European, Maltese & Esperanto
ISO 8859-4   Latin alphabet No. 4     North European
ISO 8859-5   Latin/Cyrillic alphabet  Slavic languages
ISO 8859-6   Latin/Arabic alphabet    Arabic
ISO 8859-7   Latin/Greek alphabet     modern Greek
ISO 8859-8   Latin/Hebrew alphabet    Hebrew and Yiddish
ISO 8859-9   Latin alphabet No. 5     Turkish
ISO 8859-10  Latin alphabet No. 6     Nordic (Sámi, Inuit, Icelandic)
ISO 8859-11  Latin/Thai alphabet      Thai
ISO 8859-12  (has not been defined)
ISO 8859-13  Latin alphabet No. 7     Baltic Rim
ISO 8859-14  Latin alphabet No. 8     Celtic
ISO 8859-15  Latin alphabet No. 9     adds euro to -1 (8 changes)
ISO 8859-16  Latin alphabet No. 10    South-Eastern Europe
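Because every part of the family shares the 00 to 7F range and differs only above it, the same octet can mean different characters depending on which part is assumed. The sketch below illustrates this, assuming Python 3 and its `iso8859-N` codec names; `available_parts` and `interpret` are hypothetical helpers, not part of any standard.

```python
import codecs

def available_parts():
    """Return the ISO 8859 part numbers for which Python ships a codec."""
    found = []
    for part in range(1, 17):
        try:
            codecs.lookup(f"iso8859-{part}")
        except LookupError:
            continue  # no codec registered for this part number
        found.append(part)
    return found

def interpret(value, parts):
    """Decode one byte value under each of the given ISO 8859 parts."""
    return {
        part: bytes([value]).decode(f"iso8859-{part}", errors="replace")
        for part in parts
    }

# The same octet, E4, read as West European, Cyrillic and Greek.
readings = interpret(0xE4, [1, 5, 7])
for part, ch in readings.items():
    print(f"ISO 8859-{part}: E4 -> U+{ord(ch):04X}")

print("parts with a codec:", available_parts())
```

This is why a character set label has to travel with 8 bit text: the octets alone do not say which part of the family produced them.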


Copyright © 1994 - 2024 ZyTrax, Inc.
Page modified: January 20 2022.