IDNs - Character Variants and Language Tools

To resolve an Internationalized Domain Name (IDN), the local-language characters in the name must be represented as Unicode code points and converted into a form that the Domain Name System (DNS) can look up. Because different languages can share a script, their character sets overlap. Substituting characters one by one, without language context, creates variants that cause confusion. When an IDN is registered, the registry therefore needs to know which language its characters belong to, both to help Internet users reach their intended destination and to prevent two IDNs from having essentially the same name.

Chinese Character Variants

Many languages have character variants that can confuse end users. For example, Chinese has two written forms: Simplified Chinese, used primarily in Mainland China, and Traditional Chinese, used primarily in Taiwan, Hong Kong, and parts of Southeast Asia. The two forms share many characters, but a simplified character in Simplified Chinese may have the same meaning as a more complex character in Traditional Chinese. Such characters, called character variants, have the same meaning and pronunciation but do not look the same.

A user in Mainland China entering a domain name in Simplified Chinese could be directed to one site, while a user in Taiwan entering what they perceive to be the same domain name in Traditional Chinese could be directed to a different one. To some Chinese speakers, 会 and 會 are equivalent: they have the same meaning. If two IDNs that differ only in these characters were both permitted to exist, end users could be confused.
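The difference between the two characters can be made concrete with a minimal sketch. It assumes Python's built-in "idna" codec as a stand-in for the conversion to the ASCII form used in DNS lookups; the helper name ace_label is hypothetical and is not part of any VeriSign tool.

```python
def ace_label(label: str) -> str:
    """Return the ASCII-compatible (Punycode-based) form of a single IDN label."""
    # Python's built-in "idna" codec is used here only for illustration.
    return label.encode("idna").decode("ascii")


# 会 is U+4F1A and 會 is U+6703: two distinct code points,
# so the two labels convert to two distinct ASCII names.
for ch in ("会", "會"):
    print(f"{ch}  U+{ord(ch):04X}  {ace_label(ch)}")
```

Because the two labels convert to different ASCII strings, nothing in the lookup itself ties them together. Any link between them has to come from registry policy.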

A Character Variant Solution

Members of the technical community have proposed different approaches to the character variant issue, each with advantages and drawbacks. There is general agreement, however, that the issue may never be fully resolved, because languages keep changing and new character variants will continue to appear. VeriSign's approach is to use language tags that reference language tables.

VeriSign has worked on the character variant issue with interested stakeholders, including the China Internet Network Information Center (CNNIC) (.cn), the Taiwan Network Information Center (TWNIC) (.tw), the National Internet Development Agency of Korea (.kr), Japan Registry Services (JPRS) (.jp), the Chinese Domain Name Consortium (CDNC), and the IDN Implementation Committee established by ICANN.

Language Tags

The VeriSign IDN infrastructure complies with ICANN Registry Implementation Committee (RIC) guidelines and requires that each IDN be associated with a specific language using a "language tag". The registrant selects the IDN language tag during the registration process. If an IDN combines more than one language, the registrant must select the most appropriate one. Not all language tags reference a language table today, but capturing the tag at registration allows language tables to be adopted in the future. The full list is available as VeriSign Valid Language Tags (PDF).

Language Tables

When an IDN registration is requested, the language tag is checked against a list of languages that have character inclusion tables or character variant mapping tables. These tables are applied to the Unicode code points that make up a registration to determine whether the registration is valid for a specific language. If a registration fails for one language, the character set may still be available with a different language tag.
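The source does not describe how a character variant mapping table is applied, so the following is only a sketch of one possible approach: reduce each label to a preferred form and treat labels with the same preferred form as conflicting. The one-entry table and the names VARIANT_TABLE, preferred_form, and conflicts_with_existing are hypothetical.

```python
# Hypothetical one-entry variant table: variant character -> preferred character.
# A real table would cover many characters and be specific to a language tag.
VARIANT_TABLE = {"會": "会"}


def preferred_form(label: str) -> str:
    """Replace every variant character with its preferred form."""
    return "".join(VARIANT_TABLE.get(ch, ch) for ch in label)


def conflicts_with_existing(requested: str, registered: set[str]) -> bool:
    """True if the requested label matches a registered label once variants are mapped."""
    key = preferred_form(requested)
    return any(preferred_form(name) == key for name in registered)


registered = {"会议"}
print(conflicts_with_existing("會议", registered))  # variant of a registered label
print(conflicts_with_existing("银行", registered))  # unrelated label
```

Under this assumption, a label containing 會 would be refused while its 会 counterpart is registered, which is one way to keep two essentially identical names from coexisting.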

Language Tables Deployed in the VeriSign Character Variant Solution

Language      Unicode Code Points
------------  ------------------------------------------------------------------
Chinese
Japanese
Polish        Only the Latin characters
Greek         U+002D, U+0030 through U+0039, U+0370 through U+03FF
Russian       U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Belarusian    U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Ukrainian     U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Serbian       U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Macedonian    U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Bulgarian     U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
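
A character inclusion check against the Greek and Russian rows above could look like the following sketch. The code point ranges are taken from the table. The dictionary keys "Greek" and "Russian" and the function name is_valid_for_language are illustrative assumptions and are not the actual language tag values.

```python
# Inclusive code point ranges copied from the Greek and Russian rows of the table.
# The keys are illustrative labels, not the registry's real language tags.
INCLUSION_TABLES = {
    "Greek": [(0x002D, 0x002D), (0x0030, 0x0039), (0x0370, 0x03FF)],
    "Russian": [(0x002D, 0x002D), (0x0030, 0x0039),
                (0x0400, 0x04FF), (0x0500, 0x052F)],
}


def is_valid_for_language(label: str, language: str) -> bool:
    """True if the label is non-empty and every code point is in the language's table."""
    ranges = INCLUSION_TABLES[language]
    return bool(label) and all(
        any(low <= ord(ch) <= high for low, high in ranges) for ch in label
    )


print(is_valid_for_language("банк", "Russian"))  # Cyrillic label, Russian table
print(is_valid_for_language("банк", "Greek"))    # same label, Greek table
```

The same label can pass under one table and fail under another, which matches the note that a registration rejected for one language may still be available with a different language tag.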

IDN Code Points

The VeriSign Shared Registration System (SRS) allows a registrant to register IDNs through a registrar in any script defined in Unicode 3.2, provided the name has been processed by the IETF's Nameprep (IDN name preparation) step. To account for rare scripts, musical notation, and other special characters, VeriSign has specified permissible, restricted, and prohibited code points in its Policy for IDN Code Points.
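
As a rough illustration of the two conditions above, the sketch below uses Python's standard library: the stringprep module, whose table A.1 lists code points unassigned in Unicode 3.2, and the built-in "idna" codec, which applies Nameprep before encoding. The helper names are hypothetical, and the SRS's own checks, including the permissible, restricted, and prohibited lists, are not reproduced here.

```python
import stringprep


def assigned_in_unicode_3_2(label: str) -> bool:
    """True if no code point in the label is unassigned in Unicode 3.2 (table A.1)."""
    return not any(stringprep.in_table_a1(ch) for ch in label)


def prepared_ace(label: str) -> str:
    """Apply Nameprep and ASCII conversion through Python's built-in 'idna' codec."""
    return label.encode("idna").decode("ascii")


print(assigned_in_unicode_3_2("банк"))    # Cyrillic letters are part of Unicode 3.2
print(assigned_in_unicode_3_2("\u0221"))  # U+0221 was assigned after Unicode 3.2
# Nameprep case-folds, so upper- and lower-case spellings yield the same name.
print(prepared_ace("БАНК") == prepared_ace("банк"))
```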
