IETF Standards Summary

The Internet Engineering Task Force (IETF) led the effort to create standards for using non-ASCII characters in the Domain Name System (DNS). The DNS recognizes only the ASCII letters A-Z, the digits 0-9 and the hyphen ('-'). This limits the characters that can be used to build domain names to 37 of the more than 40,000 characters identified within Unicode. To create domain names from the full range of Unicode characters, a character-encoding scheme that uniquely maps Unicode code points to an ASCII representation must be standardized and used.

The IETF published three standards related to Internationalized Domain Names (IDN):

Encoding Scheme

The encoding scheme for IDNs uses Punycode, an ASCII Compatible Encoding (ACE) that encodes local language characters into ASCII characters so that the DNS can accurately answer a request for an address record. In selecting Punycode as the ACE standard, the IETF weighed compression against ease of implementation. Punycode allows the greatest number of characters (code points) to be represented and is not difficult to deploy.
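As a rough illustration of the encoding step, the sketch below round-trips one label through the `punycode` codec that ships with Python's standard library. The sample label `bücher` is an assumption chosen for illustration and does not come from the standards summary itself.

```python
# Minimal sketch: round-trip a single label through Python's built-in
# "punycode" codec. The label is a hypothetical example.
label = "bücher"

ace = label.encode("punycode")      # ASCII-only bytes
restored = ace.decode("punycode")   # back to the original Unicode string

print(ace)
print(restored)
```

The codec handles only the raw encoding of a single label; a full domain name also needs the preparation and prefixing steps covered by the other two standards.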

Name Preparation

The name preparation standard is the set of rules that ensures uniqueness in registering Unicode code points. The rules outline the criteria by which a set of non-ASCII characters is refined so that there is no ambiguity among the registrations in a specific name space. These rules are Mapping, Normalization and Prohibition.

Mapping: Characters are mapped to nothing, to a single character or to multiple characters, depending on whether they are useful only in running text or differ only in case. An example of the first kind: the soft hyphen (U+00AD) is discretionary, is useful only within text and is otherwise invisible or ignored, so it is mapped to nothing. The more common example is the mapping of a capital letter to a small letter, such as 'B' (U+0042) to 'b' (U+0062). This ensures that a registration such as ibm.com does not conflict with other registrations such as IBM.com or iBm.com.

There are cases where a single character maps to multiple characters. The small letter sharp s or 'ß' (U+00DF) has an upper case representation of 'SS' (U+0053, U+0053). This is also the same upper case representation for 'ss' (U+0073, U+0073). Therefore, 'ß' maps to 'ss'.
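The mapping step can be sketched with Python's standard `stringprep` module, which exposes the mapping tables used for name preparation. The helper name `map_label` is hypothetical, and the sketch covers mapping only, not the later normalization or prohibition steps.

```python
import stringprep

def map_label(label: str) -> str:
    """Apply only the mapping step: drop 'map to nothing' characters
    and case-fold the rest (hypothetical helper for illustration)."""
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            # Characters such as the soft hyphen map to nothing.
            continue
        # Case folding; one character may become several (e.g. sharp s).
        mapped.append(stringprep.map_table_b2(ch))
    return "".join(mapped)

print(map_label("IBM"))
print(map_label("i\u00adBm"))
print(map_label("Stra\u00dfe"))
```

All three inputs differ only in case, soft hyphens or the sharp s, so each should reduce to a plain lowercase ASCII string.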

Normalization: Once a set of characters has been mapped, the set is normalized. Some input method editors (IMEs) enter characters that look exactly like another character but have different code points. For example, '１' is a fullwidth digit one (U+FF11) and will normalize into a digit one '1' (U+0031). Normalization also ensures predictable results by putting combining diacritics into a fixed order when a character carries several of them.
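A small sketch of the normalization step, using `unicodedata.normalize` from Python's standard library with the NFKC form. The combining-mark sequence is an assumption chosen for illustration, and the interpreter's current Unicode tables are used here, which may differ from the Unicode version a given IDN implementation is tied to.

```python
import unicodedata

# A fullwidth digit one folds to the ordinary ASCII digit one.
fullwidth_one = "\uff11"
plain_one = unicodedata.normalize("NFKC", fullwidth_one)

# The same two diacritics typed in different orders (dot above, dot below)
# end up as one identical sequence after normalization.
order_a = unicodedata.normalize("NFKC", "q\u0307\u0323")
order_b = unicodedata.normalize("NFKC", "q\u0323\u0307")

print(plain_one == "1")
print(order_a == order_b)
```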

Prohibition: After normalization, the mapped and normalized set of characters is checked against a table of prohibited characters. These characters are prohibited for a variety of reasons but the most common are spaces that could lead to confusion and control characters that cannot be displayed.
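The prohibition check can be sketched with the prohibited-character tables in Python's standard `stringprep` module. The helper name `find_prohibited` and the sample strings are hypothetical, and the choice of tables here is an assumption about which ones apply to domain labels.

```python
import stringprep

# Table lookups for non-ASCII spaces, non-ASCII control characters and the
# other prohibited groups (assumed set, for illustration only).
PROHIBITED_CHECKS = (
    stringprep.in_table_c12, stringprep.in_table_c22,
    stringprep.in_table_c3, stringprep.in_table_c4,
    stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)

def find_prohibited(label: str) -> list:
    """Return the characters in label that hit any prohibited table."""
    return [ch for ch in label
            if any(check(ch) for check in PROHIBITED_CHECKS)]

print(find_prohibited("abc"))
print(find_prohibited("a\u1680b"))   # contains a non-ASCII space
print(find_prohibited("a\u0085b"))   # contains a control character
```

A label would be rejected whenever the returned list is non-empty.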

IDNs in Applications

The IDN in applications standard focuses on where the Unicode-to-ASCII mapping takes place. The IETF's approach makes the applications that send traffic to and receive traffic from the DNS (browsers, e-mail clients, etc.) responsible for encoding and decoding the Unicode characters.
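To show what application-side conversion might look like, the sketch below uses the `idna` codec in Python's standard library, which converts a whole domain name before it is handed to a resolver and back again for display. The domain `bücher.example` is a hypothetical name used only for illustration.

```python
# Minimal sketch: the application, not the DNS, converts between the
# Unicode form shown to the user and the ASCII form sent on the wire.
name = "bücher.example"        # hypothetical internationalized name

wire = name.encode("idna")     # ASCII form the application would look up
shown = wire.decode("idna")    # Unicode form the application would display

print(wire)
print(shown)
```

Labels that are already plain ASCII pass through unchanged, so existing names are unaffected by the conversion.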

Published RFCs

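These standards have been published and are now available as RFC 3490 (IDNs in Applications, IDNA), RFC 3491 (Name Preparation, Nameprep) and RFC 3492 (Encoding Scheme, Punycode).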

VeriSign is committed to following the IETF standards and supporting rapid deployment of this new technology.
