EUC
EUC stands for Extended Unix Code. It is a multibyte encoding scheme, originally developed by AT&T and supported on all System V implementations, that is used to represent large Asian character sets. There are several variants; two of them are for Chinese.
It defines both a fixed-length and a variable-length encoding. It is an 8-bit coding method.
The structure is based on the ISO 2022 standard. Up to four codesets can be defined. The layout is based on a 94 x 94 grid, so each plane can contain up to 8,836 (94 x 94) characters.
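The grid arithmetic can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `grid_to_bytes` is hypothetical, and it assumes that rows and cells are numbered from 1 and that a two-byte character stores each coordinate as 0xA0 plus the coordinate (bytes 0xA1 to 0xFE), which is the usual convention for the 8-bit form.

```python
# Size of one 94 x 94 plane.
ROWS = 94
CELLS = 94
PLANE_CAPACITY = ROWS * CELLS  # 8836 positions


def grid_to_bytes(row: int, cell: int) -> bytes:
    """Map a 1-based (row, cell) grid position to a two-byte sequence.

    Assumption: each coordinate is offset by 0xA0, so both bytes
    fall in the range 0xA1..0xFE and have the eighth bit set.
    """
    if not (1 <= row <= ROWS and 1 <= cell <= CELLS):
        raise ValueError("row and cell must both be in 1..94")
    return bytes([0xA0 + row, 0xA0 + cell])


first = grid_to_bytes(1, 1)    # first position of the grid
last = grid_to_bytes(94, 94)   # last position of the grid
```

Under these assumptions the first grid position maps to the bytes A1 A1 and the last to FE FE.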
If codeset 0 is ASCII, then the EUC codeset is ASCII-transparent. Codeset 0 is often the local version of ASCII.
A legal EUC codeset must follow these rules:
1) Each character of an EUC multibyte string is chosen from among four distinct multibyte codesets (0, 1, 2, and 3).
2) Codeset 0 must be a 7-bit codeset.
3) No multibyte character of Codeset 1 will use either SS2 or SS3 as its first byte.
4) Characters from codeset 2 will be preceded by the byte SS2.
5) Characters from codeset 3 will be preceded by the byte SS3.
6) For codesets 1, 2, and 3, every byte of every character must have the eighth bit set.
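The rules above are enough to split a byte string into characters by looking only at the first byte of each one. The sketch below is a hedged illustration: `split_euc` and its `widths` parameter are hypothetical names, the byte values 0x8E for SS2 and 0x8F for SS3 are the usual single-shift codes, and the default character widths are an assumption, since the number of bytes per character depends on the particular EUC variant.

```python
SS2 = 0x8E  # single shift 2, introduces a codeset 2 character
SS3 = 0x8F  # single shift 3, introduces a codeset 3 character


def split_euc(data: bytes, widths=(1, 2, 3, 3)):
    """Split an EUC byte string into (codeset, bytes) pairs.

    widths[n] is the total number of bytes per character in codeset n,
    including the SS2/SS3 prefix. The defaults are an assumption and
    must be adjusted for the variant being decoded.
    """
    chars = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            codeset = 0          # rule 2: codeset 0 is 7-bit
        elif first == SS2:
            codeset = 2          # rule 4
        elif first == SS3:
            codeset = 3          # rule 5
        else:
            codeset = 1          # rule 3: never starts with SS2 or SS3
        width = widths[codeset]
        chunk = data[i:i + width]
        if len(chunk) < width:
            raise ValueError(f"truncated character at offset {i}")
        # rule 6: outside codeset 0, every byte has the eighth bit set
        if codeset != 0 and any(b < 0x80 for b in chunk):
            raise ValueError(f"byte without eighth bit set at offset {i}")
        chars.append((codeset, chunk))
        i += width
    return chars


parts = split_euc(b"A\xd6\xd0\x8e\xa1\xa1")
```

Because the codeset is decided from the first byte alone, the splitter needs no lookahead; the widths table is the only variant-specific piece.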
EUC-TW
- codeset 0 : ASCII
- codeset 1 : CNS 11643-1992 plane 1
- codeset 2 : CNS 11643-1992 planes 2 to 16
- codeset 3 : [not used]
EUC-CN
- codeset 0 : ASCII
- codeset 1 : GB 2312-80
- codeset 2 : [not used]
- codeset 3 : [not used]
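The EUC-CN layout can be observed with Python's built-in `gb2312` codec, which the Python documentation lists with the alias `euc-cn`. This is only a small illustrative check; the sample string and the variable names are chosen for the example.

```python
text = "A中"

# Encode with Python's gb2312 codec (the EUC-CN form of GB 2312-80).
encoded = text.encode("gb2312")

# Codeset 0 (ASCII) bytes have the eighth bit clear;
# codeset 1 (GB 2312-80) bytes have it set.
high_bit = [b >= 0x80 for b in encoded]

# Recover the grid position of the two-byte character by removing
# the 0xA0 offset from each byte.
row, cell = encoded[1] - 0xA0, encoded[2] - 0xA0

print(encoded.hex(" "), high_bit, (row, cell))
```

The ASCII letter stays a single 7-bit byte, while the Chinese character occupies two bytes that both have the eighth bit set.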