Internationalized Domain Names (IDNs)
Open a world of opportunity with new customers, new registrations, and expanding web services.
Verisign Internationalized Domain Names (IDNs) enable businesses to connect with customers online in their local language. IDNs make a website address easier to remember.
A registrant requests an IDN from a registrar that supports IDNs. The registrar converts the local-language characters into a sequence of supported letters using an ASCII-compatible encoding (ACE). The registrar submits the ACE string to the Verisign® Shared Registration System (SRS) where it is validated. The IDN is added to the TLD zone files and propagated across the Internet.
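The conversion step can be sketched in a few lines of Python. This is only an illustration and not the code any registrar or Verisign actually runs: it relies on Python's built-in `idna` codec, which follows the older IDNA2003 rules, and the helper names `to_ace` and `to_unicode` are hypothetical.

```python
# Sketch: convert a Unicode domain name to its ACE form and back.
# Python's built-in "idna" codec implements IDNA2003 (RFC 3490), so this
# illustrates the idea rather than a full IDNA2008 implementation.

def to_ace(domain: str) -> str:
    """Return the ASCII-compatible encoding of a Unicode domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Return the Unicode form of an ACE-encoded domain name."""
    return ace_domain.encode("ascii").decode("idna")


print(to_ace("b\u00fccher.example"))            # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))      # bücher.example
```

The ACE string is what would be submitted for validation; the Unicode form is what the registrant sees.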
IDN Resolution Process
When a user enters an IDN into a web browser or follows a link, IDN-enabled applications encode the characters into an ACE string that the Domain Name System (DNS) understands. The DNS processes the request and returns the information to the application. Although the process sounds simple, supporting different languages and scripts in IDN-enabled applications and in the DNS has required significant research and development.
The Internet Engineering Task Force (IETF) led the effort to create standards for using non-ASCII characters in the DNS.
The DNS only recognizes ASCII characters A-Z, 0-9 and '-'. This limits the number of characters that can be utilized to build domain names to 37 of the more than 96,000 characters identified within Unicode. To create domain names from the range of Unicode characters, a character-encoding scheme that uniquely maps Unicode code points to an ASCII representation must be used and standardized.
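The 37-character limit is easy to check. The sketch below counts the permitted characters and tests whether a label stays within them. The helper name `is_ldh_label` is hypothetical, and the length and leading/trailing hyphen checks are taken from the general DNS hostname rules rather than from the text above.

```python
import string

# Letters are case-insensitive in the DNS, so A-Z is counted once.
LDH_CHARS = set(string.ascii_lowercase + string.digits + "-")
_ALLOWED = LDH_CHARS | set(string.ascii_uppercase)


def is_ldh_label(label: str) -> bool:
    """True if the label uses only letters, digits and hyphens.

    Also applies the usual hostname rules: 1 to 63 characters, and no
    hyphen at the start or end of the label.
    """
    if not 1 <= len(label) <= 63:
        return False
    if label[0] == "-" or label[-1] == "-":
        return False
    return all(ch in _ALLOWED for ch in label)


print(len(LDH_CHARS))                   # 37
print(is_ldh_label("xn--bcher-kva"))    # True
print(is_ldh_label("b\u00fccher"))      # False: contains a non-ASCII letter
```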
The IETF published these standards related to IDNs: Encoding Scheme, Framework, Protocol, Unicode, Right-to-Left Scripts and Rationale.
Encoding Scheme [RFC 3492]
The encoding scheme for IDNs is Punycode, an ACE that represents any Unicode character using only the ASCII character set that the DNS accepts (A-Z, 0-9 and the hyphen), so that the DNS can accurately answer a request for an address record. In selecting Punycode as the ACE standard, the IETF weighed compression against ease of implementation. The Punycode algorithm allows the greatest number of characters (code points) to be represented and is not difficult to deploy.
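To see what Punycode output looks like, the following sketch uses the `punycode` codec that ships with Python. It encodes a single label without any prefix, normalization or validation; the wrapper names are hypothetical.

```python
# Sketch: raw Punycode (RFC 3492) encoding of one label, using the codec
# from Python's standard library. No ACE prefix or validation is applied.

def punycode_encode(label: str) -> str:
    """Encode a Unicode label as a Punycode ASCII string."""
    return label.encode("punycode").decode("ascii")


def punycode_decode(encoded: str) -> str:
    """Decode a Punycode ASCII string back to the Unicode label."""
    return encoded.encode("ascii").decode("punycode")


print(punycode_encode("m\u00fcnchen"))      # mnchen-3ya
print(punycode_decode("mnchen-3ya"))        # münchen
```

The ASCII characters of the label are copied through unchanged, and the non-ASCII characters are folded into the suffix after the last hyphen, which is why the encoded form remains short.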
Framework [RFC 5890]
This RFC is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications (IDNA) that was largely completed in 2008, known within the series and elsewhere as "IDNA2008." The series replaces an earlier version of IDNA [RFC 3490] [RFC 3491]. For convenience, that version of IDNA is referred to as "IDNA2003." The newer version continues to use the Punycode algorithm [RFC 3492] and the ACE (ASCII-Compatible Encoding) prefix from the earlier version.
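The ACE prefix carried over from IDNA2003 is the four-character string "xn--". As a rough sketch of how the prefix and the Punycode algorithm combine into an encoded label, assuming no normalization or validation and using hypothetical helper names:

```python
ACE_PREFIX = "xn--"  # ACE prefix shared by IDNA2003 and IDNA2008


def to_a_label(label: str) -> str:
    """Prefix the Punycode form of a non-ASCII label with the ACE prefix.

    ASCII-only labels are returned unchanged. Real implementations
    validate and normalize the label first; this sketch does not.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def is_a_label(label: str) -> bool:
    """True if the label carries the ACE prefix (compared case-insensitively)."""
    return label.lower().startswith(ACE_PREFIX)


print(to_a_label("m\u00fcnchen"))       # xn--mnchen-3ya
print(is_a_label("xn--mnchen-3ya"))     # True
print(is_a_label("example"))            # False
```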
Protocol [RFC 5891]
This RFC describes the core IDNA2008 protocol and its operations. In combination with the "bi-directional" (Bidi) document described below, it explicitly updates and replaces [RFC 3490].
Unicode [RFC 5892]
This RFC specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an IDN. It is part of the specification of IDNA2008.
Right-To-Left Scripts [RFC 5893]
The use of right-to-left scripts in IDNs has presented several challenges. This RFC provides new Bidi rules for IDNA labels, based on the problems encountered with some scripts and some shortcomings in the 2003 IDNA Bidi criterion.
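Two of the simpler ideas in RFC 5893 can be illustrated with Python's `unicodedata` module: a label counts as right-to-left if it contains a character of Bidi type R, AL or AN, and the first character of a label must be of type L, R or AL. The sketch below covers only these two checks, not the full Bidi rule, and the function names are hypothetical.

```python
import unicodedata

RTL_TYPES = {"R", "AL", "AN"}           # Bidi types that make a label right-to-left
VALID_FIRST_TYPES = {"L", "R", "AL"}    # Bidi types allowed for the first character


def is_rtl_label(label: str) -> bool:
    """True if any character in the label has a right-to-left Bidi type."""
    return any(unicodedata.bidirectional(ch) in RTL_TYPES for ch in label)


def first_char_ok(label: str) -> bool:
    """True if the first character has Bidi type L, R or AL."""
    return unicodedata.bidirectional(label[0]) in VALID_FIRST_TYPES


hebrew = "\u05e9\u05dc\u05d5\u05dd"     # Hebrew letters shin, lamed, vav, final mem
print(is_rtl_label(hebrew))             # True
print(is_rtl_label("example"))          # False
print(first_char_ok("1" + hebrew))      # False: a digit has Bidi type EN
```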
Rationale [RFC 5894]
This RFC provides the background, explanation and rationale for the new RFCs, which address issues that have arisen from the previous version of IDNA. It also discusses the need to update the version of Unicode supported in IDNs.
These standards have been published and are now available:
- RFC 3492 — Encoding Scheme (Punycode)
- RFC 5890 — IDNA Framework
- RFC 5891 — IDNA Protocol
- RFC 5892 — IDNA Unicode
- RFC 5893 — IDNA Right-to-Left Scripts
- RFC 5894 — IDNA Rationale
Verisign is committed to following the IETF standards and supporting rapid deployment of this new technology.
IDNs are domain names or web addresses registered in any character set or script defined in Unicode.
Understanding how Verisign IDNs support domain name registration in hundreds of languages with a single Shared Registration System (SRS) requires an understanding of how characters and scripts are used in written language and translated for computing.
Relationship Between Script, Character and Language
A script is a collection of symbols used to represent textual information in a language. Examples of scripts: Latin, Arabic, Han, Greek.
A character is the basic building block of any script, and thus of any written language. It carries meaning at a fundamental level; you cannot break a character down any further and still have meaning.
A written language utilizes characters from one or more scripts to communicate meaning. Examples of languages: English, Farsi, Chinese, Greek.
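The script of a character can be approximated in code. Python's standard library does not expose the Unicode script property directly, so the sketch below uses the first word of the official character name as a rough stand-in. That shortcut is an assumption made for illustration and will not be reliable for every character; `rough_script` is a hypothetical helper.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Approximate a character's script from the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


print(rough_script("A"))          # LATIN
print(rough_script("\u03b1"))     # GREEK   (small letter alpha)
print(rough_script("\u0628"))     # ARABIC  (letter beh)
print(rough_script("\u4e2d"))     # CJK     (a Han ideograph)
```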
Adapting Language to Computers
Different scripts use different keyboards or soft keyboards for input into computing devices. Computer operating systems have Input Method Editors (IME) that facilitate the input of different scripts. IDNs are a similar type of adaptation, allowing people to use scripts such as Latin, Hebrew and Cyrillic to navigate the web, send and receive email, transfer files and use other applications that require domain names.
A computer uses encoding of characters to understand them. Each character within a character set is assigned a unique number. For example, in the ASCII-coded character set, the uppercase "A" is assigned the number 65. Most domain names are registered in ASCII characters (A to Z, 0 to 9 and the hyphen "-"). However, many Latin-script words that require diacritics, such as those in Spanish and French, and languages that use non-Latin scripts such as Kanji and Arabic, cannot be rendered in ASCII. Unicode is a universal coded character set, which covers as many as 350 different languages. For this reason, IDNs use Unicode.
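The numbers assigned to characters can be inspected directly. This short sketch prints the ASCII value of "A" and the Unicode code points of a Spanish word containing a diacritic; `code_points` is a hypothetical helper.

```python
def code_points(text: str) -> list[str]:
    """List the Unicode code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


word = "ni\u00f1o"                # "niño", with a precomposed n-tilde

print(ord("A"))                   # 65
print(code_points(word))          # ['U+006E', 'U+0069', 'U+00F1', 'U+006F']
print(word.isascii())             # False: U+00F1 lies outside ASCII
```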
When an IDN registration is requested, the language tag is checked against a list of languages that have character inclusion tables or character-variant mapping tables. These tables are applied to the Unicode code points that make up a registration to determine whether the registration is valid for a specific language. If a registration fails for one language, the character set may still be available with a different language tag. Verisign publishes the list of valid language tags as a PDF.
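The table check can be pictured as a set-membership test. The sketch below is purely illustrative: the tags `demo-latin` and `demo-greek`, the table contents and the function names are all invented for this example and do not reflect Verisign's actual language tags or tables.

```python
import string

_DIGITS_HYPHEN = {ord(c) for c in string.digits + "-"}

# Hypothetical inclusion tables: language tag -> set of permitted code points.
INCLUSION_TABLES = {
    "demo-latin": {ord(c) for c in string.ascii_lowercase} | _DIGITS_HYPHEN,
    # U+03B1..U+03C9: Greek small letters alpha through omega.
    "demo-greek": set(range(0x03B1, 0x03CA)) | _DIGITS_HYPHEN,
}


def rejected_code_points(label: str, language_tag: str) -> list[str]:
    """Return the code points in the label that the tag's table does not include."""
    table = INCLUSION_TABLES[language_tag]
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in table]


def is_valid_for(label: str, language_tag: str) -> bool:
    """True if every code point in the label appears in the tag's table."""
    return not rejected_code_points(label, language_tag)


print(is_valid_for("logos", "demo-greek"))   # False
print(is_valid_for("logos", "demo-latin"))   # True
```

The same label fails under one tag and passes under another, which mirrors the point that a registration rejected for one language may still be valid with a different language tag.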
Registrants typically register domain names that have meaning in their own language, such as a name, word or phrase. However, a single script may be used by more than one language.
As a result, a domain name may have different meanings in the context of other languages or cultures. The variant phenomenon has been classified into four different categories: character, orthographic, lexemic and contextual variants. Verisign has determined that addressing character variants is essential to enable users to navigate the Internet in their own languages. The other variants require difficult linguistic judgments that are not essential to delivering a robust IDN solution.
Chinese Character Variants
Many languages have character variants that could cause end-user confusion. For example, the Chinese language has two written forms: Simplified Chinese, used primarily in Mainland China, and Traditional Chinese, used primarily in Taiwan, Hong Kong and some Southeast Asian countries. The two written forms share many characters; however, simplified characters in Simplified Chinese may have the same meaning as complex characters in Traditional Chinese. These characters, called character variants, have the same meaning and pronunciation, but they do not look the same.
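A character-variant relationship can be modeled as a mapping between code points. The two-entry table below is a hypothetical sample chosen for illustration (real variant tables are far larger and are maintained by the relevant language communities), and the function names are invented.

```python
# Hypothetical variant table: simplified code point -> traditional code point.
VARIANTS = {
    "\u56fd": "\u570b",   # "country": simplified U+56FD, traditional U+570B
    "\u9f99": "\u9f8d",   # "dragon":  simplified U+9F99, traditional U+9F8D
}


def to_traditional(label: str) -> str:
    """Replace each simplified character with its traditional variant, if listed."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


def are_variants(a: str, b: str) -> bool:
    """True if two labels become identical once variants are mapped."""
    return to_traditional(a) == to_traditional(b)


simplified = "\u4e2d\u56fd"     # "China" in Simplified Chinese
traditional = "\u4e2d\u570b"    # "China" in Traditional Chinese

print(simplified == traditional)                # False: the code points differ
print(are_variants(simplified, traditional))    # True: same name after mapping
```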
A Character Variant Solution
Different thought leaders in the technical community have suggested different approaches to address the character variant issue. Each approach has both positive and negative aspects. However, the IDN community agrees that the character variant issue may never be fully addressed because languages are always changing, and new character variants will continue to be introduced. Verisign has adopted language tags that reference language tables to address the character variant issue.
Verisign has worked to address the issue of character variants with interested stakeholders, including China Network Information Center (CNNIC) (.cn), Taiwan Network Information Center (TWNIC) (.tw), National Internet Development Agency of Korea (.kr), Japan Registry Service (JPRS) (.jp), the Chinese Domain Name Consortium (CDNC) and the IDN Implementation Committee established by ICANN.
Verisign has developed a policy for IDN registrations specifying permissible and prohibited code points.
The Verisign SRS allows the creation of IDNs that utilize Unicode scripts in compliance with IDNA2008.
The policy is implemented through five validation rules.
After validating an IDN, Verisign executes some further logic based on the Language Tag of the registration.