About IDN
Registrations
Open a world of opportunity with new customers, new registrations, and expanding web services.
Verisign Internationalized Domain Names (IDNs) enable your customers to connect with users online in their local language. IDNs make a website address easier to remember for your intended audience.
The IDN Registration Process
The domain name system (DNS) only recognizes American Standard Code for Information Interchange (ASCII) characters A-Z, 0-9 and '-'. This limits the number of characters that can be utilized to build domain names to 37 of the more than 96,000 characters defined within Unicode. To create domain names from the range of Unicode characters, a character-encoding scheme that uniquely maps Unicode code points to an ASCII representation must be used and standardized.
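The 37-character limit can be illustrated with a short check. This is a minimal sketch, not part of any Verisign system: the names `LDH_CHARS` and `is_ldh_label` are hypothetical, and the function only tests the character set, not other label rules such as length or hyphen placement.

```python
import string

# The 37 characters the DNS accepts in a label: 26 letters (case-insensitive),
# 10 digits, and the hyphen.
LDH_CHARS = set(string.ascii_lowercase + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only ASCII letters, digits, and hyphens."""
    # Reject non-ASCII input before lowercasing, so that look-alike characters
    # which lowercase to ASCII letters are not accepted by mistake.
    if not label or not label.isascii():
        return False
    return all(ch in LDH_CHARS for ch in label.lower())


print(len(LDH_CHARS))             # 37
print(is_ldh_label("Example-1"))  # True
print(is_ldh_label("bücher"))     # False
```

Any label that fails this check needs an ASCII-compatible representation before the DNS can carry it.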
A registrant requests an IDN from a registrar that supports IDNs. The registrar converts the local-language characters into a sequence of supported letters using an ASCII-Compatible Encoding (ACE). The registrar submits the ACE string to the Verisign Shared Registration System where it is validated. The IDN is added to the top-level domain zone files and propagated across the internet.
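The conversion step can be sketched with Python's built-in `idna` codec. This is an illustration only: the codec implements the older IDNA2003 rules (RFC 3490), so it is not a stand-in for a registrar's actual conversion, and the helper name `to_ace` is hypothetical.

```python
def to_ace(domain: str) -> str:
    """Convert a domain name to its ASCII-Compatible Encoding (ACE) form."""
    # The standard-library "idna" codec processes the name label by label
    # and leaves labels that are already ASCII unchanged.
    return domain.encode("idna").decode("ascii")


print(to_ace("bücher.example"))  # xn--bcher-kva.example
print(to_ace("example.com"))     # example.com
```

The resulting ACE string is what would be submitted for validation in place of the local-language characters.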
IDN Resolution Process
When a user enters an IDN into a web browser or follows a link, IDN-aware applications (those that implement IDNA) encode the characters into an ACE string that the DNS understands. The DNS processes the request and returns the information to the application. Although the process sounds simple, supporting different languages and scripts in IDNA applications and in the DNS has required significant research and development.
Verisign is committed to following the Internet Engineering Task Force (IETF) standards as follows:
RFC 3492: Encoding Scheme (Punycode)
- The encoding scheme for IDNs uses Punycode, an ACE that encodes any Unicode character into the ASCII character set (A-Z, 0-9 and hyphen) so that the DNS can accurately answer a request for an address record. To select Punycode as the ACE standard, the IETF considered the balance between compression and implementation. The Punycode algorithm allows the greatest number of characters (code points) to be represented and is not difficult to deploy.
RFC 5890: IDNA Definitions and Document Framework
- This Request for Comments (RFC) is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications (IDNA) that was largely completed in 2008, known within the series and elsewhere as "IDNA2008." The series replaces an earlier version of IDNA [RFC 3490] [RFC 3491]. For convenience, that version of IDNA is referred to as "IDNA2003." The newer version continues to use the Punycode algorithm [RFC 3492] and the ACE prefix from the earlier version.
RFC 5891: IDNA Protocol
- This RFC describes the core IDNA2008 protocol and its operations. In combination with the right-to-left scripts document (RFC 5893), it explicitly updates and replaces [RFC 3490].
RFC 5892: Unicode Code Points and IDNA
- This RFC specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an IDN. It is part of the specification of IDNA2008.
RFC 5893: IDNA Right-to-Left Scripts
- The use of right-to-left scripts in IDNs has presented several challenges. This RFC provides new Bidi rules for IDNA labels, based on the problems encountered with some scripts and some shortcomings in the 2003 IDNA Bidi criterion.
RFC 5894: IDNA Background, Explanation, and Rationale
- This RFC provides the background, explanation, and rationale for the new RFCs, which address issues that have arisen from the previous version(s) of IDNA. It also discusses the need to update the version of Unicode supported in IDNs.
IDNs are domain names or web addresses registered using IETF standard valid characters from the extensive character set defined by the Unicode Consortium. Understanding how Verisign IDNs support domain name registration in hundreds of languages with a single Shared Registration System requires an understanding of how characters and script are used in written language and translated for computing.
Relationship Between Script, Character, and Language
SCRIPT | Latin | Arabic | Han | Greek |
---|---|---|---|---|
CHARACTER | L | س | 漢字 | Ω |
LANGUAGE | English | Farsi | Chinese | Greek |
Script
A script is a collection of symbols used to represent textual information in a language. Examples of scripts: Latin, Arabic, Han, Greek. A script may be used by more than one language; for example, the Latin script is used by Spanish, English, French, German, and Italian, among others.
Character
A character is the basic building block of any script, and thus of any written language. It conveys meaning at a fundamental level; you cannot break a character down any further and still have meaning.
Written Language
A written language uses characters from one or more scripts to communicate meaning. Examples of languages: English, Farsi, Chinese, Greek. Not every character used in a language is allowed in a domain name registration, and this applies to IDNs as well.
Adapting Language to Computers
Different scripts use different keyboards or soft keyboards for input into computing devices. Computer operating systems have Input Method Editors that facilitate the input of different scripts. IDNs are a similar type of adaptation, allowing people to use their own script, such as Latin, Hebrew, or Cyrillic, to navigate the web, send and receive email, transfer files, and use other applications that require domain names.
Unicode
A computer uses encoding of characters to understand them. Each character within a character set is assigned a unique number. For example, in the ASCII-coded character set, the uppercase "A" is assigned the number 65. Most domain names are registered in ASCII characters (A to Z, 0 to 9 and the hyphen "-"). However, many Latin-script words that require diacritics, such as those in Spanish and French, and languages that use non-Latin scripts, such as Japanese and Arabic, cannot be rendered in ASCII. Unicode is a universal coded character set, which covers as many as 350 different languages. For this reason, IDNs use Unicode.
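The character-to-number assignment is easy to inspect in code. The sketch below uses Python's built-in `ord`; the helper name `code_points` is hypothetical and exists only for illustration.

```python
from typing import List


def code_points(text: str) -> List[str]:
    """Return the Unicode code point of each character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(ord("A"))             # 65
print(code_points("A"))     # ['U+0041']
print(code_points("漢字"))   # ['U+6F22', 'U+5B57']
```

ASCII characters keep the same numbers in Unicode, which is why "A" is 65 (hexadecimal 41) in both.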
Language Tables
When an IDN registration is requested, the language tag is checked against a list of languages that have character inclusion tables or character-variant mapping tables. These tables are applied to the Unicode code points that make up a registration to determine whether the registration is valid for a specific language. If a registration fails for one language, the same characters may still be valid under a different language tag. Download the PDF list of Verisign valid language tags.
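A character inclusion table check can be sketched as a set-membership test. Everything in this example is an assumption made for illustration: the tags `GER` and `GRE`, the truncated tables, and the function name are hypothetical and do not reproduce Verisign's actual tables or behavior.

```python
from typing import Dict, Set

# Hypothetical, heavily truncated inclusion tables keyed by language tag.
# A real table lists every code point permitted for the language.
INCLUSION_TABLES: Dict[str, Set[int]] = {
    "GER": set(range(ord("a"), ord("z") + 1)) | {0x00E4, 0x00F6, 0x00FC},
    "GRE": set(range(0x03B1, 0x03C9 + 1)),  # Greek small letters
}


def is_valid_for_language(label: str, language_tag: str) -> bool:
    """Check every code point of the label against the tag's inclusion table."""
    table = INCLUSION_TABLES.get(language_tag)
    if table is None:
        # Assumption of this sketch: a tag without a table is an error.
        raise ValueError(f"no inclusion table for language tag {language_tag!r}")
    return bool(label) and all(ord(ch) in table for ch in label)


print(is_valid_for_language("bücher", "GER"))  # True
print(is_valid_for_language("bücher", "GRE"))  # False
```

The second call shows the behavior described above: a label that fails under one language tag may still be valid under another.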
Verisign has worked to address the issue of character variants with interested stakeholders. Registrants typically register domain names that have meaning in their own language, such as a name, word, or phrase. However, a single script may be used by more than one language.
As a result, a domain name may have different meanings in the context of other languages or cultures. The variant phenomenon has been classified into four different categories: character, orthographic, lexemic, and contextual variants. Verisign has determined that addressing character variants is essential to enable users to navigate the internet in their own languages. The other variants require difficult linguistic judgments that are not essential to delivering a robust IDN solution.
Chinese Character Variants
Many languages may have character variants that could potentially cause end-user confusion. For example, the Chinese language has two written forms: Simplified Chinese, used primarily in Mainland China, and Traditional Chinese, used primarily in Taiwan, Hong Kong, and other Southeast Asian countries. The two written forms share many characters; however, simplified characters in Simplified Chinese may have the same meaning as complex characters in Traditional Chinese. These characters, called character variants, have the same meaning and pronunciation, but they do not look the same.
A Character Variant Solution
Different thought leaders in the technical community have suggested different approaches to address the character variant issue. Each approach has both positive and negative aspects.
However, the IDN community agrees that the character variant issue may never be fully resolved, because languages are always changing and new character variants will continue to emerge. Verisign has adopted language tags that reference language tables to address the character variant issue.
Verisign has worked to address the issue of character variants with interested stakeholders, including China Network Information Center (.cn), Taiwan Network Information Center (.tw), National Internet Development Agency of Korea (.kr), Japan Registry Service (.jp), the Chinese Domain Name Consortium, and the IDN Implementation Committee established by ICANN.
Verisign has developed a policy for IDN registrations specifying permissible and prohibited code points. The Verisign SRS allows the creation of IDNs that utilize Unicode scripts in compliance with IDNA2008.
Registration Rules
Understand the five validation rules through which the policy is implemented.
Additional Logic
After validating an IDN, Verisign executes some further logic based on the Language Tag of the registration.
Certain characters in Chinese have the same meaning and/or pronunciation, but are visually different (such as simplified versus traditional Chinese). These characters are called character variants.
When a domain name is registered with a Chinese character that has variants, all other variant domain names for that registered Chinese label are prohibited from registration.
For a list of Chinese characters with variants, click here. (~ 5.5 MB)
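The blocking rule can be sketched by reducing each label to a canonical form before checking availability. This is an illustrative assumption, not Verisign's implementation: the two-entry variant table, the `canonical_label` helper, and the `VariantRegistry` class are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical, tiny variant table: each traditional character maps to the
# simplified form that this sketch uses as the canonical representative.
VARIANT_CANONICAL: Dict[str, str] = {
    "國": "国",
    "漢": "汉",
}


def canonical_label(label: str) -> str:
    """Replace every character that has variants with its canonical form."""
    return "".join(VARIANT_CANONICAL.get(ch, ch) for ch in label)


class VariantRegistry:
    """Tracks registered labels and blocks their character variants."""

    def __init__(self) -> None:
        self._taken: Set[str] = set()

    def register(self, label: str) -> bool:
        """Register the label unless it, or a variant of it, is already taken."""
        key = canonical_label(label)
        if key in self._taken:
            return False
        self._taken.add(key)
        return True


registry = VariantRegistry()
print(registry.register("中国"))  # True
print(registry.register("中國"))  # False, variant of a registered label
```

Because both spellings reduce to the same key, registering either one makes the other unavailable.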
Frequently Asked Questions
IDNs are domain names that use a wide range of Unicode characters from different languages. An example of an IDN is 스타벅스코리아.com, or in its ACE form, xn--oy2b35ckwhba574atvuzkc.com.
Please note that the ACE form, or A-label, is not meant for the end user to see. It serves as an indicator for DNS servers and other IDN-ready applications to convert the string of characters into its true IDN representation.
IDNs enable more web users to navigate the internet in their preferred script and more companies to maintain localization of their brand name in multiple scripts. Most domain names are registered in ASCII characters (A to Z, 0 to 9, and the hyphen "-"). However, languages that require diacritics such as Spanish and French, and those that use non-Latin scripts such as Japanese and Arabic, cannot be rendered in ASCII. As a result, millions of internet users may struggle to find their way online using non-native scripts and languages. IDNs improve the accessibility and functionality of the internet by enabling domain names in a wide variety of scripts from around the world.
Unicode is a collection of characters that uniquely represent scripts and languages with an assigned hexadecimal number. Usually, the operating system or browser will display this data using an appropriate or familiar font.
Punycode is an algorithm used to transform a Unicode string or U-Label into an ASCII-Compatible Encoding string or A-Label. Punycode transforms a Unicode sequence, or hexadecimal representation, into a string of ASCII characters which can be used in a hostname label. Only letters, digits, and hyphens are allowed. An A-label will always start with the prefix "XN--" to signal the encoding scheme.
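The transformation between U-label and A-label can be sketched with Python's built-in `punycode` codec. This example applies the raw Punycode algorithm and adds the prefix by hand; it skips the validation and character-mapping steps that the IDNA standards require, and the helper names are hypothetical.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a single Unicode label as an A-label."""
    if u_label.isascii():
        return u_label  # already ASCII, nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode a single A-label back to its Unicode form."""
    # The prefix is matched case-insensitively, so "XN--" is also accepted.
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
```

Note how the ASCII letters of the original label stay readable at the front of the A-label, while the position and value of the non-ASCII character are packed into the suffix after the last hyphen.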
To offer IDNs, registrars must be certified. For information on what is needed to get started, visit Become a Registrar.
If you own a website or provide other internet-based services and would like to use IDNs to help your customers, you may register your preferred IDN, to the extent available, through participating ICANN-accredited and Verisign-certified registrars.