Internationalized Domain Names (IDNs) are second- or third-level domain names (Web addresses) registered in any character set or script defined in Unicode. VeriSign IDNs support domain name registration in hundreds of native languages through a single Shared Registration System (SRS). Understanding how this works requires an understanding of how characters and scripts are used in written language and how they are encoded for computing.
| Script | Example character(s) | Example language |
|---|---|---|
| Latin | L | English |
| Arabic | س | Farsi |
| Han | 漢字 | Chinese |
| Greek | Ω | Greek |
Script
A script is a collection of symbols used to represent text in one or more languages. Examples of scripts: Latin, Arabic, Han, Greek.
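To make the distinction between a script and its characters concrete, the following Python sketch takes each example character from the table above and derives a rough script label from its Unicode character name. This is a heuristic assumed for illustration only: Python's standard `unicodedata` module does not expose the Unicode Script property, so the hypothetical helper `guess_script` reads the first word of the character name, with a special case for Han ideographs.

```python
import unicodedata


def guess_script(ch: str) -> str:
    """Return a rough script label for a single character (heuristic).

    Unicode character names usually begin with the script name, e.g.
    "LATIN CAPITAL LETTER L" or "ARABIC LETTER SEEN". Han ideographs are
    named "CJK UNIFIED IDEOGRAPH-XXXX", so they get a special case.
    Non-letters (e.g. "DIGIT ZERO") will not yield a real script name.
    """
    name = unicodedata.name(ch, "")
    if not name:
        return "Unknown"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    return name.split()[0].capitalize()


# Example characters from the table, written as escapes to avoid look-alikes
# (for instance, U+03A9 GREEK CAPITAL LETTER OMEGA versus U+2126 OHM SIGN).
for ch in ["L", "\u0633", "\u6f22", "\u03a9"]:
    print(f"U+{ord(ch):04X} {guess_script(ch)}")
# U+004C Latin
# U+0633 Arabic
# U+6F22 Han
# U+03A9 Greek
```

A production system would use the Unicode Script property (published in the Unicode Character Database file Scripts.txt) rather than parsing names.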
Character
A character is the basic building block of any script, and thus of any written language. It is the smallest unit of writing: you cannot break a character down any further and still have meaning.
Written Language
A written language utilizes characters from one or more scripts to communicate meaning. Examples of languages: English, Farsi, Chinese, Greek.
Adapting Language to Computers
Different scripts require different physical or on-screen (soft) keyboards for input into computing devices. Computer operating systems provide Input Method Editors (IMEs) that facilitate the input of different scripts. IDNs are a similar kind of adaptation: they allow people to use their local-language script to navigate the Web, send and receive email, transfer files, and use other applications that require domain names.
Unicode
Computers represent characters by encoding them as numbers: each character within a coded character set is assigned a unique number. For example, in the ASCII coded character set, the uppercase "A" is assigned the number 65. Most domain names are registered in ASCII characters (A to Z, 0 to 9, and the hyphen “-”). However, ASCII cannot represent words that require diacritics, such as many Spanish and French words, or languages written in non-Latin scripts, such as Japanese (Kanji) and Arabic. Unicode is a universal coded character set that covers as many as 350 native languages, essentially any language that can be written in one of the scripts listed below. For this reason, IDNs use Unicode.
Arabic
Armenian
Bengali
Bopomofo
Canadian-Aboriginal Syllabics
Cherokee
Cyrillic
Devanagari
Ethiopic
Georgian
Greek
Gurmukhi
Han (Chinese, Japanese, Korean ideographs)
Hangul
Hebrew
Hiragana
Kannada
Katakana
Khmer
Lao
Latin
Malayalam
Mongolian
Myanmar
Ogham
Oriya
Runic
Sinhala
Syriac
Tamil
Telugu
Thaana
Thai
Tibetan
Yi
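The encoding ideas above can be illustrated with a short Python sketch. It shows that "A" keeps the value 65 in Unicode, that a word such as "español" has no ASCII encoding, and, as supplementary background not stated in the text above, how a Unicode domain name is carried over the ASCII-only DNS. Under the IDNA standard, each non-ASCII label is converted to an ASCII-compatible encoding (ACE) that begins with "xn--". The helper names are hypothetical. Python's built-in "idna" codec implements the older IDNA 2003 rules (RFC 3490) and is used here only for illustration; current registrations follow IDNA 2008, which in Python requires a third-party package.

```python
# Sketch: how text becomes numbers, and how a Unicode domain name is
# represented in ASCII for the DNS. Helper names are hypothetical.


def code_points(text: str) -> list[str]:
    """List each character's Unicode code point in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def fits_in_ascii(text: str) -> bool:
    """Return True if every character of `text` has an ASCII encoding."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible encoding.

    Uses Python's built-in "idna" codec (IDNA 2003, RFC 3490).
    """
    return domain.encode("idna").decode("ascii")


print(ord("A"))                  # 65, the same value ASCII assigns
print(code_points("A\u00f1"))    # ['U+0041', 'U+00F1']
print(fits_in_ascii("example"))  # True
print(fits_in_ascii("español"))  # False: "ñ" is outside ASCII
print(to_ace("bücher.example"))  # xn--bcher-kva.example
```

Only the non-ASCII label ("bücher") is converted; the ASCII label ("example") passes through unchanged.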
Variants
Registrants typically register domain names that have meaning in their own language, such as a name, word, or phrase. However, a single script may be used by more than one language, so a domain name may have different meanings in the context of other languages or cultures. The variant phenomenon has been classified into four categories: character, orthographic, lexemic, and contextual variants. VeriSign has determined that addressing character variants is essential to enable users to navigate the Internet in their own languages. The other variant types require difficult linguistic judgments that are not essential to delivering a robust IDN solution. Learn more about Character Variants and Language Tags.
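The following Python sketch shows one common way of handling character variants in IDN registration: given a table that lists each character's variants, a registry can enumerate every label reachable by variant substitution and treat those labels as a bundle. This is an illustration only, not VeriSign's documented method. The variant table below is hypothetical and uses two Traditional/Simplified Chinese pairs as examples.

```python
from itertools import product

# HYPOTHETICAL variant table, for illustration only. Each key maps a
# character to the set of characters treated as its variants (itself
# included). These are not taken from any registry's actual table.
VARIANTS: dict[str, set[str]] = {
    "漢": {"漢", "汉"},  # U+6F22 (Traditional) / U+6C49 (Simplified)
    "國": {"國", "国"},  # U+570B (Traditional) / U+56FD (Simplified)
}


def variant_labels(label: str) -> set[str]:
    """Return every label produced by replacing each character with one of its variants.

    Characters without an entry in VARIANTS are kept unchanged, so a label
    with no variant characters yields only itself.
    """
    choices = [sorted(VARIANTS.get(ch, {ch})) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


for v in sorted(variant_labels("漢國")):
    print(v)
```

With two characters that each have two forms, the sketch prints four labels (2 × 2). In general, the number of variant labels is the product of the number of variants for each character, which is why registries often limit which variants are actually activated.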