Understanding Character Variants
VeriSign's Character Variant Solution - Language Tag
In December 2003, VeriSign implemented language tags in the Shared Registration System (SRS). The language tag that registrars supply with each new IDN registration should reflect the primary language for which the registration is intended.
The language tag identifies the language associated with the IDN registration and determines which character mapping and/or inclusion table, if any, is applied to it.
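A minimal Python sketch of how a tag might select a table, assuming a table can carry a character mapping, an inclusion set, or both. The `LanguageTable` class, the `TABLES` registry and `select_table` are hypothetical names, not part of the SRS. The English inclusion set is transcribed from the list of deployed tables in this section, and the empty Japanese mapping follows the 日本.com example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LanguageTable:
    """Hypothetical per-language table: a character mapping and/or an inclusion set."""
    # Character mapping: character -> list of variant characters (empty = no variants).
    variants: dict[str, list[str]] = field(default_factory=dict)
    # Inclusion table: permitted code points (None = no inclusion set modelled).
    allowed: Optional[frozenset[int]] = None


# Hypothetical registry keyed by ISO 639-2 code; only two illustrative entries.
TABLES: dict[str, LanguageTable] = {
    "JPN": LanguageTable(),  # mapping table with no variants
    "ENG": LanguageTable(
        # U+002D, U+0030 through U+0039, U+0061 through U+007A
        allowed=frozenset([0x2D, *range(0x30, 0x3A), *range(0x61, 0x7B)])
    ),
}


def select_table(tag: str) -> Optional[LanguageTable]:
    """Return the table for a language tag, or None when the language has no table."""
    return TABLES.get(tag.upper())
```

Returning `None` rather than raising keeps "no table for this language" a normal outcome, which matches the "if any" in the text.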
To be considered valid, a language tag must come from the VeriSign Valid Language Tag List. This list is a subset of the ISO 639-2 table, Codes for the representation of names of languages: alpha-3 codes. Not every language on the VeriSign Valid Language Tag List or in the ISO 639-2 table will have a table.
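The distinction between a *valid* tag and a tag that *has a table* can be sketched as two separate checks. Both sets below are hypothetical excerpts (the real VeriSign Valid Language Tag List is not reproduced here), and the ISO 639-2 bibliographic codes such as `FRE` and `GER` are an assumption about which code variant is used.

```python
# Hypothetical excerpt of the VeriSign Valid Language Tag List; the real list is a
# subset of ISO 639-2 alpha-3 codes and is not reproduced here.
VALID_LANGUAGE_TAGS = frozenset({"CHI", "ENG", "FRE", "GER", "JPN", "POL", "RUS", "UKR"})

# Hypothetical subset of valid tags that also have a table; FRE is valid but has none.
TAGS_WITH_TABLES = frozenset({"CHI", "ENG", "GER", "JPN", "POL", "RUS", "UKR"})


def validate_tag(tag: str) -> str:
    """Normalize a tag to upper case and reject anything not on the valid list."""
    code = tag.strip().upper()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not an ISO 639-2 alpha-3 code: {tag!r}")
    if code not in VALID_LANGUAGE_TAGS:
        raise ValueError(f"not on the valid language tag list: {tag!r}")
    return code


def has_table(code: str) -> bool:
    """A valid tag does not imply that a table exists for that language."""
    return code in TAGS_WITH_TABLES
```

For example, `validate_tag("fre")` succeeds and returns `"FRE"`, yet `has_table("FRE")` is `False`, so no mapping or inclusion table would be applied.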
Registrars may choose to add a request for a language tag to their IDN purchase flow. A registrar that does so will need to modify its systems to capture the language tag from the registrant. In some markets, where character variants are not an issue or where most IDN registrations are in a single language, registrars may choose not to modify their purchase flow to add language tags. This decision is entirely up to the registrar; however, a language tag is still required.
Example
The following example shows how language tags will be handled. Written Japanese uses three scripts: Hiragana, Katakana and Kanji. Kanji are Chinese characters. A registrant registers 日本.com, a Japanese IDN composed only of Kanji characters. How would VeriSign generate the appropriate character variants?
The language tag dictates which mapping table is used to generate the character variants. If the Japanese IDN registration is submitted with the Japanese language tag (JPN), VeriSign will look up the Japanese mapping table. The Japanese mapping table does not contain any variants; therefore, no character variants would be generated or blocked.
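The example can be sketched as a variant generator: each character may be replaced by any of its mapped variants, and every combination other than the original label is a variant. This is a minimal illustration, not VeriSign's algorithm. `MAPPING_TABLES` and `generate_variants` are hypothetical; the Japanese table is empty as described above, and the Chinese entry is a single simplified/traditional pair (国/國) chosen for illustration, not an excerpt of the real Chinese table.

```python
from itertools import product

# Hypothetical mapping tables: character -> variant characters.
MAPPING_TABLES: dict[str, dict[str, list[str]]] = {
    "JPN": {},                                # no variants, as in the 日本.com example
    "CHI": {"国": ["國"], "國": ["国"]},       # one illustrative simplified/traditional pair
}


def generate_variants(label: str, tag: str) -> set[str]:
    """Return every variant of `label` under the tag's mapping table (excluding the label)."""
    table = MAPPING_TABLES.get(tag)
    if not table:  # no table for this language, or a table with no variants
        return set()
    # For each position, the candidates are the character itself plus its variants.
    choices = [[ch] + table.get(ch, []) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}
```

With these assumed tables, `generate_variants("日本", "JPN")` returns an empty set, so nothing is generated or blocked, while `generate_variants("中国", "CHI")` returns `{"中國"}`. The number of combinations grows multiplicatively with the number of characters that have variants.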
The language tables deployed in the VeriSign Character Variant Solution include the following (as of April 24, 2004):
- Chinese
- Japanese
- Polish (Latin characters only)
- Greek: Unicode Code Points U+002D, U+0030 through U+0039,
U+0370 through U+03FF
- Russian: Unicode Code Points U+002D, U+0030 through U+0039,
U+0400 through U+04FF, U+0500 through U+052F
- Belarusian: Unicode Code Points U+002D, U+0030 through U+0039,
U+0400 through U+04FF, U+0500 through U+052F
- Ukrainian: Unicode Code Points U+002D, U+0030 through U+0039,
U+0400 through U+04FF, U+0500 through U+052F
- Serbian: Unicode Code Points U+002D, U+0030 through U+0039,
U+0400 through U+04FF, U+0500 through U+052F
- Macedonian: Unicode Code Points U+002D, U+0030 through U+0039,
U+0400 through U+04FF, U+0500 through U+052F
- Bulgarian: Unicode Code Points U+002D, U+0030 through U+0039,
U+0400 through U+04FF, U+0500 through U+052F
- English: Unicode Code Points U+002D, U+0030 through U+0039,
U+0061 through U+007A
- German: Unicode Code Points U+002D, U+0030 through U+0039,
U+0061 through U+007A, U+00BA, U+00F6, U+00FC
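An inclusion table like the ones listed above can be checked by testing each character's code point against the table's ranges. This is a minimal sketch: the ranges are transcribed from the list for a few languages only, the tag keys are assumed ISO 639-2 codes, and `INCLUSION_TABLES` and `conforms` are hypothetical names.

```python
# Inclusive code point ranges transcribed from the list of deployed tables.
COMMON = [(0x002D, 0x002D), (0x0030, 0x0039)]  # hyphen-minus and digits 0-9
CYRILLIC = COMMON + [(0x0400, 0x04FF), (0x0500, 0x052F)]

# Hypothetical registry; tag keys are assumed ISO 639-2 codes.
INCLUSION_TABLES: dict[str, list[tuple[int, int]]] = {
    "GRE": COMMON + [(0x0370, 0x03FF)],
    "RUS": CYRILLIC,
    "UKR": CYRILLIC,
    "ENG": COMMON + [(0x0061, 0x007A)],
}


def conforms(label: str, tag: str) -> bool:
    """True if every character of an (already lowercased) label is in the tag's table.

    Raises KeyError if the tag has no inclusion table in this sketch.
    """
    ranges = INCLUSION_TABLES[tag]
    return all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in label)
```

The English range covers only lowercase letters, so this sketch assumes the label has already been lowercased: `conforms("example-2004", "ENG")` is `True`, but `conforms("Example", "ENG")` is `False`.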
Languages Without Tables
Characters from the Latin and Cyrillic code pages may not be commingled, except for U+002D (hyphen-minus) and U+0030 through U+0039 (digits 0-9), which may appear alongside either.
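The commingling rule can be sketched as a mixed-script check. As an assumption, this sketch classifies a character as Latin or Cyrillic by the prefix of its Unicode character name (via the standard `unicodedata.name`), which approximates but is not necessarily VeriSign's definition of a "code page". `script_of` and `mixes_latin_and_cyrillic` are hypothetical names.

```python
import unicodedata
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Heuristic: classify a character by its Unicode name prefix."""
    name = unicodedata.name(ch, "")  # "" for characters without a name
    if name.startswith("LATIN "):
        return "Latin"
    if name.startswith("CYRILLIC "):
        return "Cyrillic"
    return None


def mixes_latin_and_cyrillic(label: str) -> bool:
    """True if the label contains both Latin and Cyrillic characters.

    U+002D and U+0030 through U+0039 are exempt, so they are skipped explicitly.
    """
    scripts = {script_of(ch) for ch in label if ch != "-" and not "0" <= ch <= "9"}
    return {"Latin", "Cyrillic"} <= scripts
```

A label such as `"p\u0430ypal"`, which contains CYRILLIC SMALL LETTER A among Latin letters, is rejected, while `"москва-2004"` is accepted because the hyphen and digits are exempt.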