Understanding Character Variants - Scripts and Characters from VeriSign, Inc.

Scripts and Characters

Relationship Between Script, Character and Language

Script      Latin     Arabic    Han       Greek
Character   L         (image)   (image)   (image)
Language    English   Farsi     Chinese   Greek

Script

A script is a collection of symbols used to represent textual information in a language.

Examples of scripts:

 Arabic, Cyrillic, Greek, Han, Hiragana, Latin
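As a rough illustration of how characters group into scripts, the sketch below looks up the Unicode name of one sample character per script listed above. The sample characters and the helper name rough_script are chosen here for illustration only. Taking the first word of a character's name is only an approximation of its script (Han characters, for instance, are named "CJK UNIFIED IDEOGRAPH"); Python's standard library does not expose the actual Unicode Script property.

```python
import unicodedata

# One sample character per script (samples picked for illustration only).
SAMPLES = {
    "Arabic": "\u0628",    # ARABIC LETTER BEH
    "Cyrillic": "\u044f",  # CYRILLIC SMALL LETTER YA
    "Greek": "\u03b1",     # GREEK SMALL LETTER ALPHA
    "Han": "\u5149",       # CJK UNIFIED IDEOGRAPH-5149
    "Hiragana": "\u3042",  # HIRAGANA LETTER A
    "Latin": "L",          # LATIN CAPITAL LETTER L
}


def rough_script(ch: str) -> str:
    """Return the first word of the character's Unicode name.

    This is a rough proxy for the script, not the real Script property.
    """
    return unicodedata.name(ch).split()[0]


for script, ch in SAMPLES.items():
    print(f"{script:9} {unicodedata.name(ch):30} -> {rough_script(ch)}")
```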

Character

A character, in an abstract sense, is an element of writing that is the smallest quantity having semantic value. A character is the basic building block of any script, and thus any written language. It invokes a meaning at a fundamental level; you cannot break a character down any further and still have meaning.

Examples:

The character "A" means something at a fundamental level in English. But a smaller portion of the character, such as the left leg "/" or the crossbar "-", has no meaning in English.

In languages written in the Latin script, such as German, French, and English, several characters are usually needed to form a word that represents a complete idea (e.g., "light").

In East Asian writing systems that use Han characters, such as those of China, Japan and Korea, the same definition of character applies, but a single character can represent an idea. For example, a single Chinese character expresses the same idea as the word "light" does in English.
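The "light" example can be checked with a minimal sketch. It assumes the Han character U+5149, which means "light"; the variable names are illustrative.

```python
english_word = "light"
han_character = "\u5149"  # the Han character meaning "light"

# The English word needs several Latin characters to express the idea.
print(list(english_word), len(english_word))

# The same idea is expressed by a single Han character.
print(han_character, len(han_character))
```

Note that len() counts Unicode code points, which matches the number of characters for these two strings but not necessarily for every string.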

Written Language

A written language is a writing system made of characters from one or more scripts.

Examples of languages:

 English, French, Japanese, Russian, Urdu
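To illustrate "one or more scripts", the sketch below collects a rough script label for every letter in a text sample. The function name scripts_in and the sample strings are assumptions made for this example, and the label is only the first word of each character's Unicode name, not the real Unicode Script property.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Collect a rough script label for each letter in the text."""
    labels = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, and punctuation
        name = unicodedata.name(ch, "")
        if name:
            labels.add(name.split()[0])
    return labels


# English text uses a single script.
print(scripts_in("light"))

# Japanese text commonly mixes Han (labeled CJK here), Hiragana, and Katakana.
print(scripts_in("日本語のテキスト"))
```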

See Supported Character Scripts for more information on character scripts.

Coded Character Set

When grouped together, all the characters of a particular script form a character repertoire. Order these characters and assign a number to each of them, and you have a coded character set. ASCII, for example, is a coded character set in which uppercase "A" is assigned the number 65.
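The steps described above (collect a repertoire, order it, assign numbers) can be sketched as follows. The three-character repertoire and the numbering that starts at 0 are made up for illustration. The ASCII check at the end uses Python's built-in ord() and chr().

```python
# A toy character repertoire: an unordered collection of characters.
repertoire = {"C", "A", "B"}

# Order the characters, then assign a number to each one.
ordered = sorted(repertoire)
coded_character_set = {ch: number for number, ch in enumerate(ordered)}
print(coded_character_set)

# ASCII is a real coded character set: uppercase "A" is number 65.
print(ord("A"))                # number assigned to "A"
print(chr(65))                 # character assigned to number 65
print("A".encode("ascii")[0])  # the same number, seen as an ASCII byte
```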

Unicode

Unicode is another example of a coded character set. The idea behind Unicode was to create a universal character set that covers all the major scripts of the world. Because of this, Unicode is the coded character set of choice for internationalized domain names (IDNs). As of this writing, Unicode is still being updated with new scripts and new characters.
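As a small sketch of what "universal" means in practice, the code below prints the Unicode code point of characters from several scripts, all drawn from the same coded character set. It then converts a non-ASCII label to its ASCII-compatible form with Python's built-in "idna" codec. The helper name code_point and the sample label are assumptions for illustration, and the built-in codec follows the older IDNA 2003 rules, so it should not be read as how any particular registry processes IDNs.

```python
def code_point(ch: str) -> str:
    """Format a character's Unicode number in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# Characters from different scripts all have numbers in the same set.
for ch in ["A", "\u03b1", "\u044f", "\u0628", "\u5149"]:
    print(repr(ch), code_point(ch))

# A label containing a non-ASCII Unicode character (hypothetical example).
unicode_label = "b\u00fccher"  # "bücher"

# Python's built-in codec (IDNA 2003) produces the ASCII-compatible form.
ascii_label = unicode_label.encode("idna")
print(ascii_label)

# Decoding restores the original Unicode label.
print(ascii_label.decode("idna"))
```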



