Internationalized Domain Name Registration Rules
The Verisign Shared Registration System (SRS) supports Internationalized Domain Name (IDN) registrations containing various Unicode scripts.
Verisign has developed a policy for IDN registrations specifying permissible and prohibited code points. The policy is implemented in the following five validation rules. An IDN registration is considered valid only if it adheres to all five rules.
1. IETF Standards
The IDNA2008 specification defines rules and algorithms that permit/prohibit Unicode code points in IDN registrations. Verisign is compliant with all of the RFC documents that comprise the IDNA2008 standard. Learn more about the IETF Standards.
2. Restrictions on Specific Languages
All IDN registrations require a three-letter language tag; CHI, for instance, denotes the Chinese language. If the language tag associated with the registration appears in the following table, then Verisign maintains a list of included characters for that language, and every code point in the requested IDN must be on that list. If even one code point from the IDN is not a valid character for this language, then the registration is rejected.
List of Included Characters
LANGUAGE TAG | LANGUAGE |
---|---|
AZE | Azerbaijani |
BEL | Belarusian |
BUL | Bulgarian |
CHI | Chinese |
GRE | Greek |
JPN | Japanese |
KOR | Korean |
KUR | Kurdish |
MAC | Macedonian |
MOL | Moldavian |
POL | Polish |
RUS | Russian |
UKR | Ukrainian |
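The inclusion-list check described above can be sketched as follows. This is a minimal illustration, not Verisign's implementation: the `INCLUDED_CHARACTERS` table and the function name are hypothetical, and the two character sets are small stand-ins (basic lowercase letters only) for the real, much larger lists.

```python
# Hypothetical, heavily truncated inclusion lists keyed by language tag.
# These stand-in sets hold only basic lowercase letters; the actual lists
# are defined by the registry and are not reproduced here.
INCLUDED_CHARACTERS = {
    # Greek alpha..omega, leaving out U+03C2 (final sigma)
    "GRE": {chr(cp) for cp in range(0x03B1, 0x03CA)} - {"\u03C2"},
    # Cyrillic a..ya plus U+0451 (io)
    "RUS": {chr(cp) for cp in range(0x0430, 0x0450)} | {"\u0451"},
}


def offending_code_points(label, language_tag):
    """Return the code points in `label` that are not on the tag's list.

    Returns None when the tag has no inclusion list, meaning this rule
    does not apply to the registration.
    """
    allowed = INCLUDED_CHARACTERS.get(language_tag.upper())
    if allowed is None:
        return None
    return [f"U+{ord(ch):04X}" for ch in label if ch not in allowed]
```

An empty list means every code point is on the list; a non-empty list means the registration would be rejected under this rule, and each entry names a code point that caused the rejection.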
3. Restrictions on Commingling of Scripts
If the language tag specified in the IDN registration is not in the above table, and so
does not have a list of included characters, then Verisign applies an alternate
restriction to prevent commingling of different scripts in a single domain.
The Unicode Standard defines a set of Unicode Scripts by assigning each code point exactly one Unicode Script value. As a rule, Verisign’s registries reject the commingling of code points from
different Unicode scripts. That is, if an IDN contains code points from two or more
Unicode scripts, then that IDN registration is rejected. For example, a character from
the Latin script cannot be used in the same IDN label with any Cyrillic character. All
code points within an IDN label must come from the same Unicode script. This is done to
prevent confusable code points of different scripts from appearing in the same IDN.
Again, this rule only applies to languages for which there is not a strictly defined
list of included characters. For example, the FRE language tag, indicating the French
language, does not have a strict list of included characters, and so the commingling
rule applies. All code points in a French domain name must come from a single script. But
that script may be any of the valid Unicode defined scripts.
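A minimal sketch of the commingling check follows. Python's standard library does not expose the Unicode Script property, so the sketch uses a small hand-written range table that covers only a few lowercase letter ranges; `SCRIPT_RANGES` and the function names are assumptions made for illustration, and a real implementation would load the complete script data from the Unicode Character Database. Code points outside the table raise an error rather than being guessed at.

```python
# Illustrative and incomplete: (first code point, last code point, script).
# Only a few lowercase letter ranges are listed.
SCRIPT_RANGES = [
    (0x0061, 0x007A, "Latin"),     # a..z
    (0x00DF, 0x00F6, "Latin"),     # Latin-1 lowercase letters before U+00F7
    (0x00F8, 0x00FF, "Latin"),     # Latin-1 lowercase letters after U+00F7
    (0x03B1, 0x03C9, "Greek"),     # alpha..omega
    (0x0430, 0x045F, "Cyrillic"),  # basic and extended lowercase Cyrillic
]


def script_of(ch):
    """Look up the script of one character in the illustrative table."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    raise ValueError(f"U+{cp:04X} is not covered by this sketch's table")


def scripts_in_label(label):
    """Return the set of scripts used by the code points in `label`."""
    return {script_of(ch) for ch in label}


def passes_commingling_rule(label):
    """True when all code points in the label come from a single script."""
    return len(scripts_in_label(label)) <= 1
```

Under these assumptions, a label made only of Latin letters passes, while a label that swaps in a single Cyrillic letter for a similar-looking Latin one reports two scripts and fails.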
Unicode Scripts and Associated Code Points
- Arabic
- Armenian
- Avestan
- Balinese
- Bamum
- Batak
- Bengali
- Bopomofo
- Brahmi
- Buginese
- Buhid
- Canadian Aboriginal
- Carian
- Cham
- Cherokee
- Coptic
- Cuneiform
- Cyrillic
- Devanagari
- Egyptian Hieroglyphs
- Ethiopic
- Georgian
- Glagolitic
- Greek
- Gujarati
- Gurmukhi
- Han
- Hangul
- Hanunoo
- Hebrew
- Hiragana
- Imperial Aramaic
- Inscriptional Pahlavi
- Inscriptional Parthian
- Javanese
- Kaithi
- Kannada
- Katakana
- Kayah Li
- Kharoshthi
- Khmer
- Lao
View a comprehensive list of all Unicode code points allowed for IDN registration.
4. ICANN’s IDN Implementation Guidelines
The Verisign SRS also adheres to ICANN’s Guidelines for the Implementation of Internationalized Domain Names.
5. Special Characters
There are exactly two Unicode characters whose latest definitions are not backward
compatible with previous versions of the IDNA Standard. The Latin Sharp S and Greek Final
Sigma were previously mapped to alternate characters. Clients and registries compliant with
the older standard would, for instance, map a Latin Sharp S into two lowercase Latin letter
S characters. This mapping is irreversible. The latest version of the IDNA standard does not
apply this mapping. So, whereas the Latin Sharp S was previously prohibited (mapped into
other characters), the latest standard allows registries to accept this character at their
own discretion.
Because these changes are not backward compatible, Verisign has elected to continue to
disallow these two characters, until a clear and fair approach to their registration has
been reached and communicated.
CHARACTER | UNICODE CODE POINT | GLYPH |
---|---|---|
Latin Small Letter Sharp S | U+00DF | ß |
Greek Small Letter Final Sigma | U+03C2 | ς |
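The sketch below illustrates both halves of this rule. Python's built-in `idna` codec implements the older IDNA2003 rules (RFC 3490), so it can be used to show the legacy mapping; the `DISALLOWED_SPECIAL` table and the function names are hypothetical and are not part of any Verisign interface.

```python
# The two code points from the table above, keyed to their Unicode names.
DISALLOWED_SPECIAL = {
    "\u00DF": "LATIN SMALL LETTER SHARP S",
    "\u03C2": "GREEK SMALL LETTER FINAL SIGMA",
}


def special_characters_in(label):
    """Return the disallowed special code points found in `label`."""
    return [f"U+{ord(ch):04X}" for ch in label if ch in DISALLOWED_SPECIAL]


def legacy_a_label(label):
    """Encode with Python's built-in IDNA2003 codec.

    The older standard maps sharp s to "ss" and final sigma to sigma
    before encoding, so the original spelling cannot be recovered.
    """
    return label.encode("idna")
```

With the legacy codec, a label containing U+00DF encodes to the same bytes as the spelling with "ss", which is the irreversible mapping described above. A registry that disallows the two characters would reject any label for which `special_characters_in` returns a non-empty list.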