Unicode Free Operator
This page describes Unicode (5.0) support for free operators, to be implemented in the ISE compiler. We highlight the relationship between the special characters mentioned in the Unicode standard and the printable characters specified in the ECMA standard.
Legend
<accepted> marks characters that are allowed as part of a free operator. <denied> marks characters that are not allowed as part of a free operator. <undecided> marks characters whose status has not yet been decided.
Space Characters
<denied>
Code position   Name
0020            SPACE
00A0            NO-BREAK SPACE
2000            EN QUAD
2001            EM QUAD
2002            EN SPACE
2003            EM SPACE
2004            THREE-PER-EM SPACE
2005            FOUR-PER-EM SPACE
2006            SIX-PER-EM SPACE
2007            FIGURE SPACE
2008            PUNCTUATION SPACE
2009            THIN SPACE
200A            HAIR SPACE
3000            IDEOGRAPHIC SPACE
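A scanner could reject these code points with a simple set lookup. The Python sketch below is illustrative only: the names `DENIED_SPACES`, `is_denied_space` and `first_denied_space` are hypothetical, and the set is copied from the list above rather than taken from the ISE compiler sources.

```python
# Code points from the "Space Characters" list; all are <denied> in free
# operators. The names in this sketch are hypothetical.
DENIED_SPACES = frozenset({
    0x0020, 0x00A0,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
    0x2006, 0x2007, 0x2008, 0x2009, 0x200A,
    0x3000,
})


def is_denied_space(ch: str) -> bool:
    """Return True if the single character `ch` is a denied space character."""
    return ord(ch) in DENIED_SPACES


def first_denied_space(operator: str) -> int:
    """Return the index of the first denied space in `operator`, or -1 if none."""
    for index, ch in enumerate(operator):
        if is_denied_space(ch):
            return index
    return -1


print(first_denied_space("+\u2009+"))  # prints 1 (THIN SPACE)
```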
Currency symbols
<accepted>
The Unicode standard emphasizes that currency symbols in ISO/IEC 10646 do not necessarily identify the currency of a country.
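One way to recognize currency symbols is by their Unicode general category, Sc (Symbol, currency). The sketch below uses Python's standard `unicodedata` module; that module follows the Unicode version bundled with the interpreter, which is newer than the Unicode 5.0 repertoire this page targets, so the accepted set it yields may be slightly larger. The function name `is_currency_symbol` is hypothetical.

```python
import unicodedata


def is_currency_symbol(ch: str) -> bool:
    """Return True if `ch` has general category Sc (Symbol, currency).

    Hypothetical helper; uses the interpreter's Unicode data, not Unicode 5.0.
    """
    return unicodedata.category(ch) == "Sc"


# DOLLAR SIGN, POUND SIGN and EURO SIGN are currency symbols;
# PLUS SIGN has category Sm (Symbol, math) and is not.
for ch in "$\u00a3\u20ac+":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {is_currency_symbol(ch)}")
```

Classifying by category rather than by block also picks up currency symbols outside the Currency Symbols block (20A0-20CF), such as DOLLAR SIGN (0024).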
Alternate format characters
General format characters
Zero-width boundary indicators
COMBINING GRAPHEME JOINER (034F) -- Used to indicate that adjacent characters belong to the same grapheme cluster.
SOFT HYPHEN (00AD) -- A format character that indicates a preferred intra-word line-break opportunity.
ZERO WIDTH SPACE (200B) -- This character behaves like a SPACE in that it indicates a word boundary, but unlike SPACE it has no presentational width.
WORD JOINER (2060) and ZERO WIDTH NO-BREAK SPACE (FEFF) -- These characters behave like a NO-BREAK SPACE in that they indicate the absence of word boundaries, but unlike NO-BREAK SPACE they have no presentational width.
ZERO WIDTH NON-JOINER (200C) -- This character indicates that the adjacent characters are not joined together in cursive connection even when they would normally join together as cursive letter forms.
ZERO WIDTH JOINER (200D) -- This character indicates that the adjacent characters are represented with joining forms in cursive connection even when they would not normally join together as cursive letter forms.
Format separators
LINE SEPARATOR (2028)
PARAGRAPH SEPARATOR (2029)
Bidirectional text formatting
LEFT-TO-RIGHT MARK (200E) -- In bidirectional formatting, this character acts like a left-to-right character (such as LATIN SMALL LETTER A).
RIGHT-TO-LEFT MARK (200F) -- In bidirectional formatting, this character acts like a right-to-left character (such as ARABIC LETTER NOON).
LEFT-TO-RIGHT EMBEDDING (202A) -- This character is used to indicate the start of a left-to-right implicit embedding.
RIGHT-TO-LEFT EMBEDDING (202B) -- This character is used to indicate the start of a right-to-left implicit embedding.
LEFT-TO-RIGHT OVERRIDE (202D) -- This character is used to indicate the start of a left-to-right explicit embedding.
RIGHT-TO-LEFT OVERRIDE (202E) -- This character is used to indicate the start of a right-to-left explicit embedding.
POP DIRECTIONAL FORMATTING (202C) -- This character is used to indicate the termination of an implicit or explicit directional embedding initiated by the above characters.
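Because each of 202A, 202B, 202D and 202E opens an embedding that POP DIRECTIONAL FORMATTING closes, checking that these characters are properly paired reduces to tracking a nesting depth. The Python sketch below is illustrative only: the function name `bidi_controls_balanced` is hypothetical, and it ignores the maximum embedding depth defined by the Unicode Bidirectional Algorithm.

```python
# Characters that open an implicit or explicit directional embedding.
EMBEDDING_OPENERS = frozenset({
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
})
POP_DIRECTIONAL_FORMATTING = "\u202c"


def bidi_controls_balanced(text: str) -> bool:
    """Hypothetical check: True if every PDF terminates an open embedding
    and no embedding is still open at the end of `text`."""
    depth = 0  # number of currently open embeddings/overrides
    for ch in text:
        if ch in EMBEDDING_OPENERS:
            depth += 1
        elif ch == POP_DIRECTIONAL_FORMATTING:
            if depth == 0:
                return False  # PDF with nothing to terminate
            depth -= 1
    return depth == 0


print(bidi_controls_balanced("a\u202bb\u202cc"))  # prints True
print(bidi_controls_balanced("\u202ex"))          # prints False
```

This check is stricter than rendering requires: the Bidirectional Algorithm itself simply ignores an unmatched PDF, and open embeddings end at the paragraph boundary.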
Other boundary indicators
NARROW NO-BREAK SPACE (202F) -- This character is a non-breaking space. It is similar to 00A0 NO-BREAK SPACE, except that it is rendered with a narrower width.
Script-specific format characters
Hangul fill characters
HANGUL FILLER (3164) -- This character represents the fill value used with the standard spacing Jamos.
HALFWIDTH HANGUL FILLER (FFA0) -- As with the other halfwidth characters, this character is included for compatibility with certain systems that provide halfwidth forms of characters.
Symmetric swapping format characters
INHIBIT SYMMETRIC SWAPPING (206A) -- Between this character and the following ACTIVATE SYMMETRIC SWAPPING format character (if any), the stored characters listed in clause 19 of ISO/IEC 10646 are interpreted and rendered as LEFT and RIGHT, and the processing specified in that clause is not performed.
ACTIVATE SYMMETRIC SWAPPING (206B) -- Between this character and the following INHIBIT SYMMETRIC SWAPPING format character (if any), the stored characters listed in clause 19 of ISO/IEC 10646 are interpreted and rendered as OPENING and CLOSING characters as specified in that clause.
Character shaping selectors
INHIBIT ARABIC FORM SHAPING (206C) -- Between this character and the following ACTIVATE ARABIC FORM SHAPING format character (if any), the character shaping determination process is inhibited. The stored Arabic presentation forms are presented without shape modification. This is the default state.
ACTIVATE ARABIC FORM SHAPING (206D) -- Between this character and the following INHIBIT ARABIC FORM SHAPING format character (if any), the stored Arabic presentation forms are presented with shape modification by means of the character shaping determination process.
Numeric shape selectors
NATIONAL DIGIT SHAPES (206E) -- Between this character and the following NOMINAL DIGIT SHAPES format character (if any), the digits from 0030 to 0039 are rendered with the appropriate national digit shapes as specified by means of appropriate agreements.
NOMINAL DIGIT SHAPES (206F) -- Between this character and the following NATIONAL DIGIT SHAPES format character (if any), the digits from 0030 to 0039 are rendered with the shapes shown in the code tables for those digits. This is the default state.
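The two selectors behave as a toggle whose default state is NOMINAL DIGIT SHAPES. The hypothetical Python function below walks a string and reports, for each digit 0030-0039, whether it falls in a national or a nominal region; it is a sketch of the state tracking only and performs no glyph substitution.

```python
NATIONAL_DIGIT_SHAPES = "\u206e"
NOMINAL_DIGIT_SHAPES = "\u206f"


def digit_shape_modes(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (digit, mode) pairs, where mode is
    "national" or "nominal", for every ASCII digit in `text`."""
    national = False  # NOMINAL DIGIT SHAPES is the default state
    result = []
    for ch in text:
        if ch == NATIONAL_DIGIT_SHAPES:
            national = True
        elif ch == NOMINAL_DIGIT_SHAPES:
            national = False
        elif "0" <= ch <= "9":
            result.append((ch, "national" if national else "nominal"))
    return result


print(digit_shape_modes("1\u206e23\u206f4"))
# prints [('1', 'nominal'), ('2', 'national'), ('3', 'national'), ('4', 'nominal')]
```

The same toggle pattern applies to the symmetric swapping and Arabic form shaping selectors, each with its own default state.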
Mongolian vowel separator
MONGOLIAN VOWEL SEPARATOR (180E) -- This character indicates a special form of the graphic symbols for the letter A or E and for the preceding consonant.
Ideographic description characters
An Ideographic Description Character (IDC) is a graphic character that is used with a sequence of other graphic characters to form an Ideographic Description Sequence (IDS). An IDS is not a character and is therefore not a member of the repertoire of ISO/IEC 10646.
IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT (2FF0)
IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW (2FF1)
IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT (2FF2)
IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW (2FF3)
IDEOGRAPHIC DESCRIPTION CHARACTER FULL SURROUND (2FF4)
IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE (2FF5)
IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW (2FF6)
IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT (2FF7)
IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT (2FF8)
IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT (2FF9)
IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT (2FFA)
IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID (2FFB)
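An IDS is written in prefix notation: each IDC is followed by its operands, which are either single components or nested IDSs. In Unicode, 2FF2 and 2FF3 take three operands and the other ten IDCs take two; for example, 2FF0 followed by 6728 (木) twice describes a character made of two 木 side by side. The recursive Python sketch below checks that a string is exactly one complete sequence under those arity rules. The function names are hypothetical, and for simplicity any non-IDC character is accepted as a leaf component without checking that it is actually an ideograph.

```python
# Operand count of each Ideographic Description Character (2FF0-2FFB).
IDC_ARITY = {cp: 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY[0x2FF2] = 3  # LEFT TO MIDDLE AND RIGHT
IDC_ARITY[0x2FF3] = 3  # ABOVE TO MIDDLE AND BELOW


def _parse_component(seq: str, pos: int) -> int:
    """Consume one component starting at `pos` and return the index after it.

    A component is either a single non-IDC character (assumed, not checked,
    to be an ideograph) or an IDC followed by as many components as its arity.
    """
    if pos >= len(seq):
        raise ValueError("sequence ends before all operands are supplied")
    arity = IDC_ARITY.get(ord(seq[pos]), 0)
    pos += 1
    for _ in range(arity):
        pos = _parse_component(seq, pos)
    return pos


def is_well_formed_ids(seq: str) -> bool:
    """Hypothetical check: True if `seq` is exactly one complete sequence."""
    try:
        return _parse_component(seq, 0) == len(seq)
    except ValueError:
        return False


print(is_well_formed_ids("\u2ff0\u6728\u6728"))  # prints True
print(is_well_formed_ids("\u2ff0\u6728"))        # prints False
```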
Interlinear annotation characters
If the interlinear annotation characters are filtered out during processing, then all characters between the Interlinear Annotation Separator and the Interlinear Annotation Terminator should also be filtered out.
INTERLINEAR ANNOTATION ANCHOR (FFF9)
INTERLINEAR ANNOTATION SEPARATOR (FFFA)
INTERLINEAR ANNOTATION TERMINATOR (FFFB)
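The filtering rule above amounts to: drop the anchor, keep the annotated base text, and drop everything from the separator through the terminator. A minimal Python sketch, assuming well-formed, non-nested annotations; the function name is hypothetical.

```python
ANCHOR = "\ufff9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\ufffa"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\ufffb"  # INTERLINEAR ANNOTATION TERMINATOR


def strip_interlinear_annotations(text: str) -> str:
    """Hypothetical helper: remove the annotation characters together with
    the annotation text between SEPARATOR and TERMINATOR."""
    out = []
    in_annotation = False
    for ch in text:
        if ch == ANCHOR:
            continue  # the anchor itself carries no text
        if ch == SEPARATOR:
            in_annotation = True  # annotation text starts here
        elif ch == TERMINATOR:
            in_annotation = False  # annotation ends; resume copying
        elif not in_annotation:
            out.append(ch)
    return "".join(out)


# Base text 漢字 annotated with the reading かんじ.
print(strip_interlinear_annotations("\ufff9\u6f22\u5b57\ufffa\u304b\u3093\u3058\ufffb"))  # prints 漢字
```

Because the state only switches back at the terminator, an annotation containing several separators is removed in full.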
Subtending format characters
0600  ARABIC NUMBER SIGN
0601  ARABIC SIGN SANAH
0602  ARABIC FOOTNOTE MARKER
06DD  ARABIC END OF AYAH
070F  SYRIAC ABBREVIATION MARK
Variation selectors
Variation selectors are combining characters that immediately follow a specific base character to indicate a specific variant form of that character's graphic symbol.
FE00  VARIATION SELECTOR-1
FE01  VARIATION SELECTOR-2
FE02  VARIATION SELECTOR-3
FE03  VARIATION SELECTOR-4
FE04  VARIATION SELECTOR-5
FE05  VARIATION SELECTOR-6
FE06  VARIATION SELECTOR-7
FE07  VARIATION SELECTOR-8
FE08  VARIATION SELECTOR-9
FE09  VARIATION SELECTOR-10
FE0A  VARIATION SELECTOR-11
FE0B  VARIATION SELECTOR-12
FE0C  VARIATION SELECTOR-13
FE0D  VARIATION SELECTOR-14
FE0E  VARIATION SELECTOR-15
FE0F  VARIATION SELECTOR-16
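Since a variation selector only has meaning immediately after a base character, a lexer might flag selectors that appear at the start of a token or directly after another selector. The hypothetical Python check below covers only the sixteen selectors listed above and does not consult the registered variation sequences, so it accepts base/selector pairs that Unicode does not actually define.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VARIATION SELECTOR-1 .. VARIATION SELECTOR-16 (FE00-FE0F)."""
    return 0xFE00 <= ord(ch) <= 0xFE0F


def orphan_variation_selectors(token: str) -> list[int]:
    """Hypothetical check: indices of variation selectors that do not
    immediately follow a non-selector character."""
    orphans = []
    for i, ch in enumerate(token):
        if is_variation_selector(ch) and (i == 0 or is_variation_selector(token[i - 1])):
            orphans.append(i)
    return orphans


print(orphan_variation_selectors("+\ufe00\ufe01"))  # prints [2]
```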
Format characters for musical symbols
1D159  MUSICAL SYMBOL NULL NOTEHEAD
1D173  MUSICAL SYMBOL BEGIN BEAM
1D174  MUSICAL SYMBOL END BEAM
1D175  MUSICAL SYMBOL BEGIN TIE
1D176  MUSICAL SYMBOL END TIE
1D177  MUSICAL SYMBOL BEGIN SLUR
1D178  MUSICAL SYMBOL END SLUR
1D179  MUSICAL SYMBOL BEGIN PHRASE
1D17A  MUSICAL SYMBOL END PHRASE
Tag characters
E0000–E007F
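Within this range, E0001 is LANGUAGE TAG, E0020 to E007E are tag counterparts of the printable ASCII characters 0020 to 007E, and E007F is CANCEL TAG. The Python sketch below tests membership in the range and recovers the ASCII character that a tag character mirrors; both function names are hypothetical.

```python
from typing import Optional

TAG_BLOCK_START, TAG_BLOCK_END = 0xE0000, 0xE007F


def is_tag_character(ch: str) -> bool:
    """Hypothetical helper: True if `ch` lies in the range E0000-E007F."""
    return TAG_BLOCK_START <= ord(ch) <= TAG_BLOCK_END


def tag_to_ascii(ch: str) -> Optional[str]:
    """Hypothetical helper: return the printable ASCII character mirrored by a
    tag character (E0020-E007E), or None for anything else."""
    cp = ord(ch)
    if 0xE0020 <= cp <= 0xE007E:
        return chr(cp - 0xE0000)
    return None


print(tag_to_ascii("\U000E0041"))  # prints A
```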