euroconv.htm
Macros
CP
UCP
DosGrep
DosGrepNum
WinGrep
WinGrepNum
Data
Tables
UniCodePoints
CodePages
DosHeader
DosData
WinHeader
WinConData
WinGuiData
Procedures
DosAutodetect
DosConvert
DosEncList
DosHtmlDecode
DosInfoText
DosMain
DosOutputProc
DosParseEnc
WinAutodetect
WinConvert
WinCreate
WinCtrlProc
WinEncList
WinFillEncBox
WinGui
WinHtmlDecode
WinInfoText
WinMain
WinOutputProc
WinParseEnc
WinProc
WinSetEncBox
WinUpdate

This file euroconv.htm is the source text in EuroAssembler of program EuroConvertor, which provides conversion of text file encodings.

A single executable file euroconv.exe works in 16bit DOS as well as in 32bit or 64bit Windows. The DOS version is assembled first and then linked as an MZ stub into the PE Windows version. The static database of supported characters and conversion tables is shared by both versions.

EuroConvertor expects four arguments on its command line in fixed order:

  1. input encoding specification
  2. output encoding specification
  3. input file name
  4. output file name
Example:
euroconv.exe IBM852 UTF-8 input.txt output.txt

When the program is run without arguments (for instance by clicking it in Explorer), it launches the interactive GUI version, where the arguments can be selected from menus.

If you are interested in EuroConvertor and don't want to download the whole EuroAssembler needed to compile its source, you can download the assembled binary euroconv.exe together with a more detailed manual from the website vit$oft freeware as the file euroconv.zip [42 KB].

Format: Dual DOS/Windows CON/GUI application
Platform: DOS and MS Windows
Import-library build: If you don't have the import library ..\objlib\winapi.lib yet, compile in prowin32 subdirectory with command euroasm dll2lib.htm.
Build: Compile in prowin32 subdirectory with command euroasm euroconv.htm.
Run: euroconv InputEncoding OutputEncoding InputFile OutputFile
DosHeader

Header defines format of DOS program (MZ stub created as eurocond.exe in the current directory). It also specifies included macrolibraries, named constants, static data.

The order of segments is important in this program: the [DATA] segment must be linked first, because the Windows version searches for the Unicode table directory at the beginning of its stub. That is why the segment map is explicitly specified here.

Segment [DATA] hosts two databases implemented as sections; each section contains one array of bytes, words or dwords:
1. UCP database with codepoints of Unicode characters from Basic Multilingual Plane (BMP) and their properties: CodePoint value, character category and its relevance in text, Latin transliteration, corresponding HTML entity.
2. CP database with code page encodings supported by EuroConv, and their properties: CP identifier, standard and alternative name, authoritative URL where it was defined, translation table.

Header also defines some constants and structures which are common for both DOS and Windows variant.

         EUROASM Unicode=Off
eurocond PROGRAM Format=MZ, Model=Compact, Width=16, Entry=DosMain
          ; Includes macro libraries from maclib directory:
         INCLUDEHEAD1 doss.htm, dosapi.htm, doscall.htm, status16.htm, cpuext.htm, cpuext16.htm, string16.htm
; Desired segment map :
[DATA]   SEGMENT PURPOSE=DATA  ; Static data segment.
  [CodePoint] ; Array of WORDs  with codepoint value of character.
  [Relevance] ; Array of BYTEs  with signed values of probability that this char.category appears in text.
  [Translit]  ; Array of DWORDs with transliterated ASCII characters, NUL padded.
  [EntVal4]   ; Array of WORDs  with codepoint values of HTML entities which have 1..4 characters.
  [EntName4]  ; Array of DWORDs with HTML entities, which have 1..4 characters. NUL padded.
  [EntVal8]   ; Array of WORDs  with codepoint values of HTML entities which have 5..8 characters.
  [EntName8A] ; Array of DWORDs with first four characters of entities which have 5..8 characters.
  [EntName8B] ; Array of DWORDs with fifth to eighth characters of entities which have 5..8 characters. NUL padded.
  [CPid]      ; Array of WORDs  with codepage identifier in MS assignment.
  [CPname]    ; Array of WORDs  with offsets of CP display name in [CPinfo] section.
  [CPrem]     ; Array of WORDs  with offsets of CP remark in [CPinfo] section.
  [CPurl]     ; Array of WORDs  with offsets of CP URL in [CPinfo] section.
  [CPtable]   ; Array of WORDs  with offsets of 8bit translation tables in [CPtt] section.
  [CPinfo]    ; Zero-terminated byte strings with codepage names, remarks, URLs.
  [CPtt]      ; Translation tables of OEM/ANSI encodings. Each table has 128 WORDs.
[CODE]   SEGMENT PURPOSE=CODE  ; Program code segment.
[INPUT]  SEGMENT PURPOSE=BSS   ; Input file read area.
 D DataBlockSize * BYTE        ; Reserve 48 KB of uninitialized space.
[OUTPUT] SEGMENT PURPOSE=BSS   ; Output file write area.
 D DataBlockSize * BYTE        ; Reserve 48 KB of uninitialized space.
[STACK]  SEGMENT PURPOSE=STACK ; Machine stack.
 D DosStackSize  * WORD        ; Reserve 16 KB of uninitialized space.
[DATA]                         ; Switch back to data segment.
         HEAD ; The following part of DosHeader will be included to Windows version of EuroConvertor, too.
; Constants:
%Version      %SET 20190125 ; It will be displayed by euroconv.exe /? or euroconv.exe --help.
%Signature    %SET TableDir ; 8 characters for TABLEDIR identification.
Replacement   EQU 0xFFFD ; Codepoint of unsupported character.
DataBlockSize EQU 48K    ; I/O file operation block size. Max.64K-4, DWORD aligned.
DosStackSize  EQU  8K    ; Stack size reserved for DOS version. WORD aligned.

; Boolean encoding property names used as flags in InpEncSt and OutEncSt:
encStTransl  = 0x0000 ; Characters unsupported by output encoding are transliterated to visually similar ASCII (default).
encStIgn     = 0x0001 ; Characters unsupported by output encoding are ignored (omitted from output).
encStQm      = 0x0002 ; Characters unsupported by output encoding are replaced with question mark.
encStHtm     = 0x0004 ; HTML entities for UCP>127 are detected on input and encoded on output if not defined in CP.
encStHtml    = 0x0008 ; All HTML entities are detected on input and encoded on output if not defined in CP.
encStOem     = 0x0010 ; System default  OEM encoding.
encStAnsi    = 0x0020 ; System default ANSI encoding.
encStAscii   = 0x0040 ; CodePage is ASCII (7bit).
encStUtf     = 0x0080 ; CodePage is Unicode (UTF).
encStUtf8    = 0x0100 ; CodePage is UTF-8.
encStUtf16   = 0x0200 ; CodePage is UTF-16.
encStUtf32   = 0x0400 ; CodePage is UTF-32.
encStLe      = 0x0800 ; CodePage is Little Endian encoded.
encStBe      = 0x1000 ; CodePage is Big Endian encoded.
encStBom     = 0x2000 ; CodePage has BOM.
encStAuto    = 0x4000 ; Autodetection of encoding was requested.
encStEnc     = 0x8000 ; Display the list of all supported encodings.

; Character categories and their assigned relevance used for autodetection of input encoding:
Bm = +32    ; Byte order mark (FEFF).
Cc = -8     ; Other, control
Cf = -4     ; Other, format
Co = -6     ; Other, Private Use
Fm = +4     ; Format control (LF,CR,TAB,space)
Ll = +8     ; Letter, lowercase
Lm = -2     ; Letter, modifier
Lo = +1     ; Letter, other
Lu = +7     ; Letter, uppercase
Mn = -6     ; Mark, nonspacing
Nd = +4     ; Number, decimal digit
No = +2     ; Number, other
Pd = +1     ; Punctuation, dash
Pe = +1     ; Punctuation, close
Pf = +1     ; Punctuation, final quote
Pi = +1     ; Punctuation, initial quote
Po = +1     ; Punctuation, other
Ps = +1     ; Punctuation, open
Sc = +1     ; Symbol, currency
Sk = -5     ; Symbol, modifier
Sm = -4     ; Symbol, math
So = -5     ; Symbol, other
Zs = +2     ; Separator, space
?? = -32    ; Not a valid character.

TABLEDIR  STRUC ; Section directory keeps addresses of database sections.
.Signature  D 8*B ; Random text used for TableDir identification.
.CodePoint  D W ; Offset of WORD  array in section [CodePoint].
.Relevance  D W ; Offset of BYTE  array in section [Relevance].
.Translit   D W ; Offset of DWORD array in section [Translit].
.CodePoints D W ; The number of supported codepoints, i.e. the length of previous arrays.
.EntVal4    D W ; Offset of WORD  array in section [EntVal4].
.EntName4   D W ; Offset of DWORD array in section [EntName4].
.Entities4  D W ; The number of supported HTML entities with 1..4 characters.
.EntVal8    D W ; Offset of WORD  array in section [EntVal8].
.EntName8A  D W ; Offset of DWORD array in section [EntName8A].
.EntName8B  D W ; Offset of DWORD array in section [EntName8B].
.Entities8  D W ; The number of supported HTML entities with 5..8 characters.
.CPid       D W ; Offset of WORD  array in section [CPid].
.CPname     D W ; Offset of WORD  array in section [CPname].
.CPrem      D W ; Offset of WORD  array in section [CPrem].
.CPurl      D W ; Offset of WORD  array in section [CPurl].
.CPtable    D W ; Offset of WORD  array in section [CPtable].
.CPinfo     D W ; Offset of ASCIIZ strings in section [CPinfo].
.CPtt       D W ; Offset of 128*WORD blocks in section [CPtt].
.CodePages  D W ; The number of supported encodings, i.e. the length of CP* arrays.
           ENDSTRUC TABLEDIR
          ENDHEAD
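
The encoding flags above are single bits (apart from encStTransl, which is the zero default), so one encoding specification can be described by OR-ing several of them into a 16-bit word. The following Python sketch only illustrates that bit arithmetic. The constant values are copied from the header; the helper `describe` and the particular combination chosen for UTF-16 little endian with BOM are assumptions made for illustration, not code taken from EuroConvertor.

```python
# Flag values copied from the DosHeader constants.
ENC_FLAGS = {
    "encStIgn":   0x0001,
    "encStQm":    0x0002,
    "encStHtm":   0x0004,
    "encStHtml":  0x0008,
    "encStOem":   0x0010,
    "encStAnsi":  0x0020,
    "encStAscii": 0x0040,
    "encStUtf":   0x0080,
    "encStUtf8":  0x0100,
    "encStUtf16": 0x0200,
    "encStUtf32": 0x0400,
    "encStLe":    0x0800,
    "encStBe":    0x1000,
    "encStBom":   0x2000,
    "encStAuto":  0x4000,
    "encStEnc":   0x8000,
}

def describe(status):
    """Return the names of all flags set in a 16-bit status word."""
    return [name for name, bit in ENC_FLAGS.items() if status & bit]

# Hypothetical status word: UTF-16, little endian, with BOM.
utf16le_bom = (ENC_FLAGS["encStUtf"] | ENC_FLAGS["encStUtf16"]
               | ENC_FLAGS["encStLe"] | ENC_FLAGS["encStBom"])
print(hex(utf16le_bom))
print(describe(utf16le_bom))
```

A status word of zero has no bit set, which corresponds to the default transliteration behaviour (encStTransl).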
Tables

Database of supported Unicode characters, their properties and conversion tables is located in sections of segment [DATA].

Data are stored as synchronized arrays of bytes, words or dwords, with each array in its own section. This arrangement allows an item to be found with a single REPNE SCAS instruction; its other properties are then retrieved from the same index of the synchronized arrays.

Following arrays are mutually synchronized:
CodePoint with Relevance and Translit (their length is [TableDir.CodePoints]).
EntVal4 with EntName4 (their length is [TableDir.Entities4]).
EntVal8 with EntName8A and EntName8B (their length is [TableDir.Entities8]).
CPid with CPname, CPrem, CPurl, CPtable (their length is [TableDir.CodePages]).
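
The lookup described above can be sketched in Python: find the index of a value in one array, then read the other properties at the same index. The three short lists hold sample rows taken from the UCP table (0041, 00C1 and 00E9 with relevance Lu, Lu+2 and Ll+2). `list.index` merely stands in for REPNE SCAS, and the function name `lookup` is an assumption for illustration.

```python
# Three parallel ("synchronized") arrays; row i of each describes the same character.
code_point = [0x0041, 0x00C1, 0x00E9]
relevance  = [7, 9, 10]            # Lu, Lu+2, Ll+2
translit   = ["A", "A", "e"]

def lookup(cp):
    """Return (relevance, translit) for a codepoint, or None if it is not in the table."""
    try:
        i = code_point.index(cp)   # plays the role of REPNE SCAS
    except ValueError:
        return None
    return relevance[i], translit[i]

print(lookup(0x00E9))
print(lookup(0x4E2D))              # not in the sample table
```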

At the very beginning of [DATA] segment resides a structured variable TableDir which specifies addresses of database sections. The sections are filled at assembly time by macros CP and UCP.

[DATA]
TableDir DS TABLEDIR,              \ Its members are defined statically at asm-time:
 .Signature= "%Signature",         \ Random text used for TableDir identification.
 .CodePoint= SECTION# [CodePoint], \ Offset of WORD  array in section [CodePoint].
 .Relevance= SECTION# [Relevance], \ Offset of BYTE  array in section [Relevance].
 .Translit=  SECTION# [Translit] , \ Offset of DWORD array in section [Translit].
 .CodePoints=SIZE# [CodePoint] /2, \ The number of supported codepoints, i.e. the length of previous arrays.
 .EntVal4=   SECTION# [EntVal4]  , \ Offset of WORD  array in section [EntVal4].
 .EntName4=  SECTION# [EntName4] , \ Offset of DWORD array in section [EntName4].
 .Entities4= SIZE# [EntVal4] / 2 , \ The number of supported HTML entities with 1..4 characters.
 .EntVal8=   SECTION# [EntVal8]  , \ Offset of WORD  array in section [EntVal8].
 .EntName8A= SECTION# [EntName8A], \ Offset of DWORD array in section [EntName8A].
 .EntName8B= SECTION# [EntName8B], \ Offset of DWORD array in section [EntName8B].
 .Entities8= SIZE# [EntVal8] / 2 , \ The number of supported HTML entities with 5..8 characters.
 .CPid=      SECTION# [CPid]     , \ Offset of WORD  array in section [CPid].
 .CPname=    SECTION# [CPname]   , \ Offset of WORD  array in section [CPname](offsets in CPinfo).
 .CPrem=     SECTION# [CPrem]    , \ Offset of WORD  array in section [CPrem] (offsets in CPinfo).
 .CPurl=     SECTION# [CPurl]    , \ Offset of WORD  array in section [CPurl] (offsets in CPinfo).
 .CPtable=   SECTION# [CPtable]  , \ Offset of WORD  array in section [CPtable] (offsets in CPtt).
 .CPinfo=    SECTION# [CPinfo]   , \ Offset of ASCIIZ strings with name, rem and url in section [CPinfo].
 .CPtt=      SECTION# [CPtt]     , \ Offset of 128*WORD blocks in section [CPtt].
 .CodePages= SIZE# [CPid] / 2,     \ The number of supported encodings, i.e. the length of CP* arrays.
;
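
As a rough illustration of the directory layout, the following Python sketch packs and unpacks a TABLEDIR record with the `struct` module: an 8-byte signature followed by 19 little-endian 16-bit words, 46 bytes in total. The field order follows the STRUC definition. The sample offsets and counts are made up, and `find_tabledir` is a hypothetical helper showing how a reader might locate the record by its signature; it is not the routine the Windows version actually uses.

```python
import struct

FIELDS = ("CodePoint Relevance Translit CodePoints EntVal4 EntName4 Entities4 "
          "EntVal8 EntName8A EntName8B Entities8 CPid CPname CPrem CPurl "
          "CPtable CPinfo CPtt CodePages").split()
LAYOUT = "<8s19H"                  # 8-byte signature + 19 WORDs, little endian
SIGNATURE = b"TableDir"

def find_tabledir(image):
    """Locate the signature in a byte image and return the directory as a dict."""
    pos = image.find(SIGNATURE)
    if pos < 0:
        raise ValueError("TableDir signature not found")
    values = struct.unpack_from(LAYOUT, image, pos)
    return dict(zip(FIELDS, values[1:]))   # values[0] is the signature itself

# Made-up record: the 19 fields get dummy values 100, 101, ... 118.
sample = struct.pack(LAYOUT, SIGNATURE, *range(100, 119))
directory = find_tabledir(b"\x00" * 16 + sample)
print(struct.calcsize(LAYOUT), directory["CodePoints"], directory["CodePages"])
```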
UCP CodePoint, Relevance, Translit, Entity
Macro UCP populates data sections which define Unicode characters, their codepoint and other properties at assembly time.
Input
CodePoint is the ordinal number of the character in the Unicode table, written as 4 hexadecimal digits.
Relevance is a two-letter character category whose assigned value expresses the probability that the character appears in ordinary texts.
Translit is a string of 0..4 ASCII characters used for transliteration of characters which are not defined in the output encoding. Latin characters are transliterated by removing the diacritic sign; characters from other alphabets are converted to phonetically or visually similar Latin ASCII characters.
Entity is the HTML entity assigned to the character (without the leading ampersand & and without the trailing semicolon ;).
UCP %MACRO CodePoint, Relevance, Translit, Entity
 [CodePoint]
   DW 0x%CodePoint
 [Relevance]
   DB %Relevance
 [Translit]
   DD %Translit + 0
%EntSize %SETS %Entity
%IF %EntSize <= 4 && %EntSize >= 1
 [EntVal4]
   DW 0x%CodePoint
 [EntName4]
   DD "%Entity"
%ENDIF
%IF %EntSize <=8 && %EntSize >=5
 [EntVal8]
   DW 0x%CodePoint
 [EntName8A]
   DD "%Entity[1..4]"
 [EntName8B]
   DD "%Entity[5..8]"
%ENDIF
%IF %EntSize > 8
  %ERROR HTML entity %Entity is too long.
%ENDIF
[DATA]
   %ENDMACRO UCP
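
To make the branching in the macro easier to follow, here is a loose Python model of what one UCP invocation appends to each section. It is an illustration only: the dictionary `sections` and the functions `pad4` and `ucp` are hypothetical stand-ins for the assembler sections and the macro, and relevance is passed as an already evaluated number.

```python
sections = {name: [] for name in (
    "CodePoint", "Relevance", "Translit",
    "EntVal4", "EntName4", "EntVal8", "EntName8A", "EntName8B")}

def pad4(text):
    """Encode up to four ASCII characters as a NUL-padded 4-byte field (one DWORD)."""
    return text.encode("ascii").ljust(4, b"\x00")

def ucp(code_point, relevance, translit, entity=""):
    """Append one character to the synchronized arrays, routing the entity by length."""
    sections["CodePoint"].append(code_point)
    sections["Relevance"].append(relevance)
    sections["Translit"].append(pad4(translit))
    size = len(entity)
    if 1 <= size <= 4:
        sections["EntVal4"].append(code_point)
        sections["EntName4"].append(pad4(entity))
    elif 5 <= size <= 8:
        sections["EntVal8"].append(code_point)
        sections["EntName8A"].append(pad4(entity[:4]))
        sections["EntName8B"].append(pad4(entity[4:]))
    elif size > 8:
        raise ValueError(f"HTML entity {entity} is too long.")

ucp(0x0026, 1, "&", "amp")         # Po = +1, short entity
ucp(0x00C1, 9, "A", "Aacute")      # Lu+2 = 9, long entity split into two DWORDs
ucp(0x0041, 7, "A")                # Lu = +7, no entity
print(sections["EntName4"], sections["EntName8A"], sections["EntName8B"])
```

Every call extends CodePoint, Relevance and Translit by one item, so those three arrays stay synchronized, while the entity arrays grow only when an entity is given.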
UniCodePoints
This division defines data of Unicode codepoints supported by EuroConv.
All valid Unicode characters, including those of Asian languages and emojis, are convertible between Unicode encodings (UTF), but the following table defines only those characters which are defined in at least one of the supported 8bit encodings, and also characters which have a named HTML entity.
The data are distributed by macro UCP into sections of segment [DATA] at assembly time.
Documentation
www.fileformat.info/info/unicode/char
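
The Relevance column of the table accepts a category name optionally adjusted by a small constant, such as Lu+2 or Lo-8, which is evaluated at assembly time and stored as a signed byte. The Python sketch below mimics that arithmetic with a few of the category values from the header. The parser `relevance` is hypothetical and handles only the name, name+n and name-n forms seen in the table.

```python
import re

# A subset of the category values defined in the header.
CATEGORY = {"Cc": -8, "Fm": 4, "Ll": 8, "Lo": 1, "Lu": 7, "No": 2, "Po": 1, "Sk": -5}

def relevance(expr):
    """Evaluate 'Lu', 'Lu+2' or 'Lo-8' to a signed value that fits in one byte."""
    m = re.fullmatch(r"([A-Z][a-z])([+-]\d+)?", expr)
    if m is None or m.group(1) not in CATEGORY:
        raise ValueError(f"unknown relevance expression: {expr}")
    value = CATEGORY[m.group(1)] + int(m.group(2) or 0)
    if not -128 <= value <= 127:
        raise ValueError("relevance does not fit in a signed byte")
    return value

for expr in ("Lu", "Lu+2", "Lo-8", "Po-2"):
    print(expr, relevance(expr))
```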
;     ╔codepoint value
;     ║    ╔character category - relevance
;     ║    ║     ╔transliteration to ASCII
;     ║    ║     ║       ╔HTML entity
UCP 0000, Cc,   ' ',               ;  control NUL
UCP 0001, Cc,   ' ',               ;  control SOH
UCP 0002, Cc,   ' ',               ;  control STX
UCP 0003, Cc,   ' ',               ;  control ETX
UCP 0004, Cc,   ' ',               ;  control EOT
UCP 0005, Cc,   ' ',               ;  control ENQ
UCP 0006, Cc,   ' ',               ;  control ACK
UCP 0007, Cc,   ' ',               ;  control BEL
UCP 0008, Cc,   ' ',               ;  control BS
UCP 0009, Fm,   ' ',               ;   control HT
UCP 000A, Fm,   ' ',               ;   control LF
UCP 000B, Cc,   ' ',               ;   control VT
UCP 000C, Cc,   ' ',               ;   control FF
UCP 000D, Fm,   ' ',               ;   control CR
UCP 000E, Cc,   ' ',               ;  control SO
UCP 000F, Cc,   ' ',               ;  control SI
UCP 0010, Cc,   ' ',               ;  control DLE
UCP 0011, Cc,   ' ',               ;  control DC1
UCP 0012, Cc,   ' ',               ;  control DC2
UCP 0013, Cc,   ' ',               ;  control DC3
UCP 0014, Cc,   ' ',               ;  control DC4
UCP 0015, Cc,   ' ',               ;  control NAK
UCP 0016, Cc,   ' ',               ;  control SYN
UCP 0017, Cc,   ' ',               ;  control ETB
UCP 0018, Cc,   ' ',               ;  control CAN
UCP 0019, Cc,   ' ',               ;  control EM
UCP 001A, Cc,   ' ',               ;  control SUB
UCP 001B, Cc,   ' ',               ;  control ESC
UCP 001C, Cc,   ' ',               ;  control FS
UCP 001D, Cc,   ' ',               ;  control GS
UCP 001E, Cc,   ' ',               ;  control RS
UCP 001F, Cc,   ' ',               ;  control US
UCP 0020, Fm,   ' ',               ;   SPACE
UCP 0021, Po,   '!',               ; ! EXCLAMATION MARK
UCP 0022, Po,   '"',   quot        ; " QUOTATION MARK
UCP 0023, Po,   '#',               ; # NUMBER SIGN
UCP 0024, Sc,   '$',               ; $ DOLLAR SIGN
UCP 0025, Po,   '%',               ; % PERCENT SIGN
UCP 0026, Po,   '&',   amp         ; & AMPERSAND
UCP 0027, Po,   "'",               ; ' APOSTROPHE
UCP 0028, Ps,   '(',               ; ( LEFT PARENTHESIS
UCP 0029, Pe,   ')',               ; ) RIGHT PARENTHESIS
UCP 002A, Po,   '*',               ; * ASTERISK
UCP 002B, Sm,   '+',               ; + PLUS SIGN
UCP 002C, Po,   ',',               ; , COMMA
UCP 002D, Pd,   '-',               ; - HYPHEN-MINUS
UCP 002E, Po,   '.',               ; . FULL STOP
UCP 002F, Po,   '/',               ; / SOLIDUS
UCP 0030, Nd,   '0',               ; 0 DIGIT ZERO
UCP 0031, Nd,   '1',               ; 1 DIGIT ONE
UCP 0032, Nd,   '2',               ; 2 DIGIT TWO
UCP 0033, Nd,   '3',               ; 3 DIGIT THREE
UCP 0034, Nd,   '4',               ; 4 DIGIT FOUR
UCP 0035, Nd,   '5',               ; 5 DIGIT FIVE
UCP 0036, Nd,   '6',               ; 6 DIGIT SIX
UCP 0037, Nd,   '7',               ; 7 DIGIT SEVEN
UCP 0038, Nd,   '8',               ; 8 DIGIT EIGHT
UCP 0039, Nd,   '9',               ; 9 DIGIT NINE
UCP 003A, Po,   ':',               ; : COLON
UCP 003B, Po,   ';',               ; ; SEMICOLON
UCP 003C, Sm,   '<',   lt          ; < LESS-THAN SIGN
UCP 003D, Sm,   '=',               ; = EQUALS SIGN
UCP 003E, Sm,   '>',   gt          ; > GREATER-THAN SIGN
UCP 003F, Po,   '?',               ; ? QUESTION MARK
UCP 0040, Po,   '@',               ; @ COMMERCIAL AT
UCP 0041, Lu,   'A',               ; A LATIN CAPITAL LETTER A
UCP 0042, Lu,   'B',               ; B LATIN CAPITAL LETTER B
UCP 0043, Lu,   'C',               ; C LATIN CAPITAL LETTER C
UCP 0044, Lu,   'D',               ; D LATIN CAPITAL LETTER D
UCP 0045, Lu,   'E',               ; E LATIN CAPITAL LETTER E
UCP 0046, Lu,   'F',               ; F LATIN CAPITAL LETTER F
UCP 0047, Lu,   'G',               ; G LATIN CAPITAL LETTER G
UCP 0048, Lu,   'H',               ; H LATIN CAPITAL LETTER H
UCP 0049, Lu,   'I',               ; I LATIN CAPITAL LETTER I
UCP 004A, Lu,   'J',               ; J LATIN CAPITAL LETTER J
UCP 004B, Lu,   'K',               ; K LATIN CAPITAL LETTER K
UCP 004C, Lu,   'L',               ; L LATIN CAPITAL LETTER L
UCP 004D, Lu,   'M',               ; M LATIN CAPITAL LETTER M
UCP 004E, Lu,   'N',               ; N LATIN CAPITAL LETTER N
UCP 004F, Lu,   'O',               ; O LATIN CAPITAL LETTER O
UCP 0050, Lu,   'P',               ; P LATIN CAPITAL LETTER P
UCP 0051, Lu,   'Q',               ; Q LATIN CAPITAL LETTER Q
UCP 0052, Lu,   'R',               ; R LATIN CAPITAL LETTER R
UCP 0053, Lu,   'S',               ; S LATIN CAPITAL LETTER S
UCP 0054, Lu,   'T',               ; T LATIN CAPITAL LETTER T
UCP 0055, Lu,   'U',               ; U LATIN CAPITAL LETTER U
UCP 0056, Lu,   'V',               ; V LATIN CAPITAL LETTER V
UCP 0057, Lu,   'W',               ; W LATIN CAPITAL LETTER W
UCP 0058, Lu,   'X',               ; X LATIN CAPITAL LETTER X
UCP 0059, Lu,   'Y',               ; Y LATIN CAPITAL LETTER Y
UCP 005A, Lu,   'Z',               ; Z LATIN CAPITAL LETTER Z
UCP 005B, Ps,   '[',               ; [ LEFT SQUARE BRACKET
UCP 005C, Po,   '\',               ; \ REVERSE SOLIDUS
UCP 005D, Pe,   ']',               ; ] RIGHT SQUARE BRACKET
UCP 005E, Sk,   '^',               ; ^ CIRCUMFLEX ACCENT
UCP 005F, Pe,   '_',               ; _ LOW LINE
UCP 0060, Sk,   '`',               ; ` GRAVE ACCENT
UCP 0061, Ll,   'a',               ; a LATIN SMALL LETTER A
UCP 0062, Ll,   'b',               ; b LATIN SMALL LETTER B
UCP 0063, Ll,   'c',               ; c LATIN SMALL LETTER C
UCP 0064, Ll,   'd',               ; d LATIN SMALL LETTER D
UCP 0065, Ll,   'e',               ; e LATIN SMALL LETTER E
UCP 0066, Ll,   'f',               ; f LATIN SMALL LETTER F
UCP 0067, Ll,   'g',               ; g LATIN SMALL LETTER G
UCP 0068, Ll,   'h',               ; h LATIN SMALL LETTER H
UCP 0069, Ll,   'i',               ; i LATIN SMALL LETTER I
UCP 006A, Ll,   'j',               ; j LATIN SMALL LETTER J
UCP 006B, Ll,   'k',               ; k LATIN SMALL LETTER K
UCP 006C, Ll,   'l',               ; l LATIN SMALL LETTER L
UCP 006D, Ll,   'm',               ; m LATIN SMALL LETTER M
UCP 006E, Ll,   'n',               ; n LATIN SMALL LETTER N
UCP 006F, Ll,   'o',               ; o LATIN SMALL LETTER O
UCP 0070, Ll,   'p',               ; p LATIN SMALL LETTER P
UCP 0071, Ll,   'q',               ; q LATIN SMALL LETTER Q
UCP 0072, Ll,   'r',               ; r LATIN SMALL LETTER R
UCP 0073, Ll,   's',               ; s LATIN SMALL LETTER S
UCP 0074, Ll,   't',               ; t LATIN SMALL LETTER T
UCP 0075, Ll,   'u',               ; u LATIN SMALL LETTER U
UCP 0076, Ll,   'v',               ; v LATIN SMALL LETTER V
UCP 0077, Ll,   'w',               ; w LATIN SMALL LETTER W
UCP 0078, Ll,   'x',               ; x LATIN SMALL LETTER X
UCP 0079, Ll,   'y',               ; y LATIN SMALL LETTER Y
UCP 007A, Ll,   'z',               ; z LATIN SMALL LETTER Z
UCP 007B, Ps,   '{',               ; { LEFT CURLY BRACKET
UCP 007C, Sm,   '|',               ; | VERTICAL LINE
UCP 007D, Pe,   '}',               ; } RIGHT CURLY BRACKET
UCP 007E, Sm,   '~',               ; ~ TILDE
UCP 007F, Cc,   ' ',               ;  control DEL
UCP 00A0, Zs,   ' ',   nbsp        ;   NO-BREAK SPACE
UCP 00A1, Po-2, '!',   iexcl       ; ¡ INVERTED EXCLAMATION MARK
UCP 00A2, Sc,   'c',   cent        ; ¢ CENT SIGN
UCP 00A3, Sc,   'L',   pound       ; £ POUND SIGN
UCP 00A4, Sc,   '$',   curren      ; ¤ CURRENCY SIGN
UCP 00A5, Sc,   'Y',   yen         ; ¥ YEN SIGN
UCP 00A6, So,   '|',   brvbar      ; ¦ BROKEN BAR
UCP 00A7, Po,   '#',   sect        ; § SECTION SIGN
UCP 00A8, Sk,   '',    uml         ; ¨ DIAERESIS
UCP 00A9, So,   '(c)', copy        ; © COPYRIGHT SIGN
UCP 00AA, Lo-8, 'f',   ordf        ; ª FEMININE ORDINAL INDICATOR
UCP 00AB, Pi,   '<<',  laquo       ; « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
UCP 00AC, Sm,   '_',   not         ; ¬ NOT SIGN
UCP 00AD, Cf,   '-',   shy         ; ­ SOFT HYPHEN
UCP 00AE, So,   '(R)', reg         ; ® REGISTERED SIGN
UCP 00AF, Sk,   '',    macr        ; ¯ MACRON
UCP 00B0, So,   '`',   deg         ; ° DEGREE SIGN
UCP 00B1, Sm,   '+',   plusmn      ; ± PLUS-MINUS SIGN
UCP 00B2, No-5, '2',   sup2        ; ² SUPERSCRIPT TWO
UCP 00B3, No-5, '3',   sup3        ; ³ SUPERSCRIPT THREE
UCP 00B4, Sk,   '',    acute       ; ´ ACUTE ACCENT
UCP 00B5, Ll-6, 'u',   micro       ; µ MICRO SIGN
UCP 00B6, Po,   'P',   para        ; ¶ PILCROW SIGN
UCP 00B7, Po,   '.',   middot      ; · MIDDLE DOT
UCP 00B8, Sk,   '',    cedil       ; ¸ CEDILLA
UCP 00B9, No-5, '1',   sup1        ; ¹ SUPERSCRIPT ONE
UCP 00BA, Lo-8, 'm',   ordm        ; º MASCULINE ORDINAL INDICATOR
UCP 00BB, Pf,   '>>',  raquo       ; » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
UCP 00BC, No-5, '1/4', frac14      ; ¼ VULGAR FRACTION ONE QUARTER
UCP 00BD, No-5, '1/2', frac12      ; ½ VULGAR FRACTION ONE HALF
UCP 00BE, No-5, '3/4', frac34      ; ¾ VULGAR FRACTION THREE QUARTERS
UCP 00BF, Po,   '?',   iquest      ; ¿ INVERTED QUESTION MARK
UCP 00C0, Lu,   'A',   Agrave      ; À LATIN CAPITAL LETTER A WITH GRAVE
UCP 00C1, Lu+2, 'A',   Aacute      ; Á LATIN CAPITAL LETTER A WITH ACUTE
UCP 00C2, Lu,   'A',   Acirc       ; Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
UCP 00C3, Lu-2, 'A',   Atilde      ; Ã LATIN CAPITAL LETTER A WITH TILDE
UCP 00C4, Lu,   'A',   Auml        ; Ä LATIN CAPITAL LETTER A WITH DIAERESIS
UCP 00C5, Lu-2, 'A',   Aring       ; Å LATIN CAPITAL LETTER A WITH RING ABOVE
UCP 00C6, Lu,   'AE',  AElig       ; Æ LATIN CAPITAL LETTER AE
UCP 00C7, Lu,   'C',   Ccedil      ; Ç LATIN CAPITAL LETTER C WITH CEDILLA
UCP 00C8, Lu,   'E',   Egrave      ; È LATIN CAPITAL LETTER E WITH GRAVE
UCP 00C9, Lu+2, 'E',   Eacute      ; É LATIN CAPITAL LETTER E WITH ACUTE
UCP 00CA, Lu-2, 'E',   Ecirc       ; Ê LATIN CAPITAL LETTER E WITH CIRCUMFLEX
UCP 00CB, Lu,   'E',   Euml        ; Ë LATIN CAPITAL LETTER E WITH DIAERESIS
UCP 00CC, Lu,   'I',   Igrave      ; Ì LATIN CAPITAL LETTER I WITH GRAVE
UCP 00CD, Lu+2, 'I',   Iacute      ; Í LATIN CAPITAL LETTER I WITH ACUTE
UCP 00CE, Lu,   'I',   Icirc       ; Î LATIN CAPITAL LETTER I WITH CIRCUMFLEX
UCP 00CF, Lu-2, 'I',   Iuml        ; Ï LATIN CAPITAL LETTER I WITH DIAERESIS
UCP 00D0, Lu-2, 'D',   ETH         ; Ð LATIN CAPITAL LETTER ETH
UCP 00D1, Lu,   'N',   Ntilde      ; Ñ LATIN CAPITAL LETTER N WITH TILDE
UCP 00D2, Lu,   'O',   Ograve      ; Ò LATIN CAPITAL LETTER O WITH GRAVE
UCP 00D3, Lu+2, 'O',   Oacute      ; Ó LATIN CAPITAL LETTER O WITH ACUTE
UCP 00D4, Lu,   'O',   Ocirc       ; Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
UCP 00D5, Lu,   'O',   Otilde      ; Õ LATIN CAPITAL LETTER O WITH TILDE
UCP 00D6, Lu,   'O',   Ouml        ; Ö LATIN CAPITAL LETTER O WITH DIAERESIS
UCP 00D7, Sm,   'x',   times       ; × MULTIPLICATION SIGN
UCP 00D8, Lu,   'O',   Oslash      ; Ø LATIN CAPITAL LETTER O WITH STROKE
UCP 00D9, Lu,   'U',   Ugrave      ; Ù LATIN CAPITAL LETTER U WITH GRAVE
UCP 00DA, Lu+2, 'U',   Uacute      ; Ú LATIN CAPITAL LETTER U WITH ACUTE
UCP 00DB, Lu,   'U',   Ucirc       ; Û LATIN CAPITAL LETTER U WITH CIRCUMFLEX
UCP 00DC, Lu,   'U',   Uuml        ; Ü LATIN CAPITAL LETTER U WITH DIAERESIS
UCP 00DD, Lu+2, 'Y',   Yacute      ; Ý LATIN CAPITAL LETTER Y WITH ACUTE
UCP 00DE, Lu-2, 'TH',  THORN       ; Þ LATIN CAPITAL LETTER THORN
UCP 00DF, Ll,   'ss',  szlig       ; ß LATIN SMALL LETTER SHARP S
UCP 00E0, Ll,   'a',   agrave      ; à LATIN SMALL LETTER A WITH GRAVE
UCP 00E1, Ll+2, 'a',   aacute      ; á LATIN SMALL LETTER A WITH ACUTE
UCP 00E2, Ll-2, 'a',   acirc       ; â LATIN SMALL LETTER A WITH CIRCUMFLEX
UCP 00E3, Ll-2, 'a',   atilde      ; ã LATIN SMALL LETTER A WITH TILDE
UCP 00E4, Ll,   'a',   auml        ; ä LATIN SMALL LETTER A WITH DIAERESIS
UCP 00E5, Ll,   'a',   aring       ; å LATIN SMALL LETTER A WITH RING ABOVE
UCP 00E6, Ll,   'ae',  aelig       ; æ LATIN SMALL LETTER AE
UCP 00E7, Ll,   'c',   ccedil      ; ç LATIN SMALL LETTER C WITH CEDILLA
UCP 00E8, Ll,   'e',   egrave      ; è LATIN SMALL LETTER E WITH GRAVE
UCP 00E9, Ll+2, 'e',   eacute      ; é LATIN SMALL LETTER E WITH ACUTE
UCP 00EA, Ll,   'e',   ecirc       ; ê LATIN SMALL LETTER E WITH CIRCUMFLEX
UCP 00EB, Ll,   'e',   euml        ; ë LATIN SMALL LETTER E WITH DIAERESIS
UCP 00EC, Ll,   'i',   igrave      ; ì LATIN SMALL LETTER I WITH GRAVE
UCP 00ED, Ll+2, 'i',   iacute      ; í LATIN SMALL LETTER I WITH ACUTE
UCP 00EE, Ll,   'i',   icirc       ; î LATIN SMALL LETTER I WITH CIRCUMFLEX
UCP 00EF, Ll,   'i',   iuml        ; ï LATIN SMALL LETTER I WITH DIAERESIS
UCP 00F0, Ll-2, 'd',   eth         ; ð LATIN SMALL LETTER ETH
UCP 00F1, Ll,   'n',   ntilde      ; ñ LATIN SMALL LETTER N WITH TILDE
UCP 00F2, Ll,   'o',   ograve      ; ò LATIN SMALL LETTER O WITH GRAVE
UCP 00F3, Ll+2, 'o',   oacute      ; ó LATIN SMALL LETTER O WITH ACUTE
UCP 00F4, Ll,   'o',   ocirc       ; ô LATIN SMALL LETTER O WITH CIRCUMFLEX
UCP 00F5, Ll,   'o',   otilde      ; õ LATIN SMALL LETTER O WITH TILDE
UCP 00F6, Ll,   'o',   ouml        ; ö LATIN SMALL LETTER O WITH DIAERESIS
UCP 00F7, Sm,   '/',   divide      ; ÷ DIVISION SIGN
UCP 00F8, Ll,   'o',   oslash      ; ø LATIN SMALL LETTER O WITH STROKE
UCP 00F9, Ll,   'u',   ugrave      ; ù LATIN SMALL LETTER U WITH GRAVE
UCP 00FA, Ll+2, 'u',   uacute      ; ú LATIN SMALL LETTER U WITH ACUTE
UCP 00FB, Ll+2, 'u',   ucirc       ; û LATIN SMALL LETTER U WITH CIRCUMFLEX
UCP 00FC, Ll,   'u',   uuml        ; ü LATIN SMALL LETTER U WITH DIAERESIS
UCP 00FD, Ll+2, 'y',   yacute      ; ý LATIN SMALL LETTER Y WITH ACUTE
UCP 00FE, Ll-2, 'th',  thorn       ; þ LATIN SMALL LETTER THORN
UCP 00FF, Ll,   'y',   yuml        ; ÿ LATIN SMALL LETTER Y WITH DIAERESIS
UCP 0100, Lu,   'A',               ; Ā LATIN CAPITAL LETTER A WITH MACRON
UCP 0101, Ll,   'a',               ; ā LATIN SMALL LETTER A WITH MACRON
UCP 0102, Lu,   'A',               ; Ă LATIN CAPITAL LETTER A WITH BREVE
UCP 0103, Ll,   'a',               ; ă LATIN SMALL LETTER A WITH BREVE
UCP 0104, Lu+2, 'A',               ; Ą LATIN CAPITAL LETTER A WITH OGONEK
UCP 0105, Ll+2, 'a',               ; ą LATIN SMALL LETTER A WITH OGONEK
UCP 0106, Lu+2, 'C',               ; Ć LATIN CAPITAL LETTER C WITH ACUTE
UCP 0107, Ll+2, 'c',               ; ć LATIN SMALL LETTER C WITH ACUTE
UCP 0108, Lu,   'C',               ; Ĉ LATIN CAPITAL LETTER C WITH CIRCUMFLEX
UCP 0109, Ll,   'c',               ; ĉ LATIN SMALL LETTER C WITH CIRCUMFLEX
UCP 010A, Lu,   'C',               ; Ċ LATIN CAPITAL LETTER C WITH DOT ABOVE
UCP 010B, Ll,   'c',               ; ċ LATIN SMALL LETTER C WITH DOT ABOVE
UCP 010C, Lu+2, 'C',               ; Č LATIN CAPITAL LETTER C WITH CARON
UCP 010D, Ll+2, 'c',               ; č LATIN SMALL LETTER C WITH CARON
UCP 010E, Lu,   'D',               ; Ď LATIN CAPITAL LETTER D WITH CARON
UCP 010F, Ll,   'd',               ; ď LATIN SMALL LETTER D WITH CARON
UCP 0110, Lu,   'D',               ; Đ LATIN CAPITAL LETTER D WITH STROKE
UCP 0111, Ll,   'd',               ; đ LATIN SMALL LETTER D WITH STROKE
UCP 0112, Lu,   'E',               ; Ē LATIN CAPITAL LETTER E WITH MACRON
UCP 0113, Ll,   'e',               ; ē LATIN SMALL LETTER E WITH MACRON
UCP 0116, Lu,   'E',               ; Ė LATIN CAPITAL LETTER E WITH DOT ABOVE
UCP 0117, Ll,   'e',               ; ė LATIN SMALL LETTER E WITH DOT ABOVE
UCP 0118, Lu+2, 'E',               ; Ę LATIN CAPITAL LETTER E WITH OGONEK
UCP 0119, Ll+2, 'e',               ; ę LATIN SMALL LETTER E WITH OGONEK
UCP 011A, Lu+2, 'E',               ; Ě LATIN CAPITAL LETTER E WITH CARON
UCP 011B, Ll+2, 'e',               ; ě LATIN SMALL LETTER E WITH CARON
UCP 011C, Lu,   'G',               ; Ĝ LATIN CAPITAL LETTER G WITH CIRCUMFLEX
UCP 011D, Ll,   'g',               ; ĝ LATIN SMALL LETTER G WITH CIRCUMFLEX
UCP 011E, Lu,   'G',               ; Ğ LATIN CAPITAL LETTER G WITH BREVE
UCP 011F, Ll,   'g',               ; ğ LATIN SMALL LETTER G WITH BREVE
UCP 0120, Lu,   'G',               ; Ġ LATIN CAPITAL LETTER G WITH DOT ABOVE
UCP 0121, Ll,   'g',               ; ġ LATIN SMALL LETTER G WITH DOT ABOVE
UCP 0122, Lu,   'G',               ; Ģ LATIN CAPITAL LETTER G WITH CEDILLA
UCP 0123, Ll,   'g',               ; ģ LATIN SMALL LETTER G WITH CEDILLA
UCP 0124, Lu,   'H',               ; Ĥ LATIN CAPITAL LETTER H WITH CIRCUMFLEX
UCP 0125, Ll,   'h',               ; ĥ LATIN SMALL LETTER H WITH CIRCUMFLEX
UCP 0126, Lu,   'H',               ; Ħ LATIN CAPITAL LETTER H WITH STROKE
UCP 0127, Ll,   'h',               ; ħ LATIN SMALL LETTER H WITH STROKE
UCP 0128, Lu,   'I',               ; Ĩ LATIN CAPITAL LETTER I WITH TILDE
UCP 0129, Ll,   'i',               ; ĩ LATIN SMALL LETTER I WITH TILDE
UCP 012A, Lu,   'I',               ; Ī LATIN CAPITAL LETTER I WITH MACRON
UCP 012B, Ll,   'i',               ; ī LATIN SMALL LETTER I WITH MACRON
UCP 012E, Lu,   'I',               ; Į LATIN CAPITAL LETTER I WITH OGONEK
UCP 012F, Ll,   'i',               ; į LATIN SMALL LETTER I WITH OGONEK
UCP 0130, Lu,   'I',               ; İ LATIN CAPITAL LETTER I WITH DOT ABOVE
UCP 0131, Ll,   'i',               ; ı LATIN SMALL LETTER DOTLESS I
UCP 0134, Lu,   'J',               ; Ĵ LATIN CAPITAL LETTER J WITH CIRCUMFLEX
UCP 0135, Ll,   'j',               ; ĵ LATIN SMALL LETTER J WITH CIRCUMFLEX
UCP 0136, Lu,   'K',               ; Ķ LATIN CAPITAL LETTER K WITH CEDILLA
UCP 0137, Ll,   'k',               ; ķ LATIN SMALL LETTER K WITH CEDILLA
UCP 0138, Ll,   'k',               ; ĸ LATIN SMALL LETTER KRA
UCP 0139, Lu,   'L',               ; Ĺ LATIN CAPITAL LETTER L WITH ACUTE
UCP 013A, Ll,   'l',               ; ĺ LATIN SMALL LETTER L WITH ACUTE
UCP 013B, Lu,   'L',               ; Ļ LATIN CAPITAL LETTER L WITH CEDILLA
UCP 013C, Ll,   'l',               ; ļ LATIN SMALL LETTER L WITH CEDILLA
UCP 013D, Lu,   'L',               ; Ľ LATIN CAPITAL LETTER L WITH CARON
UCP 013E, Ll,   'l',               ; ľ LATIN SMALL LETTER L WITH CARON
UCP 0141, Lu+2, 'L',               ; Ł LATIN CAPITAL LETTER L WITH STROKE
UCP 0142, Ll+2, 'l',               ; ł LATIN SMALL LETTER L WITH STROKE
UCP 0143, Lu,   'N',               ; Ń LATIN CAPITAL LETTER N WITH ACUTE
UCP 0144, Ll,   'n',               ; ń LATIN SMALL LETTER N WITH ACUTE
UCP 0145, Lu,   'N',               ; Ņ LATIN CAPITAL LETTER N WITH CEDILLA
UCP 0146, Ll,   'n',               ; ņ LATIN SMALL LETTER N WITH CEDILLA
UCP 0147, Lu,   'N',               ; Ň LATIN CAPITAL LETTER N WITH CARON
UCP 0148, Ll,   'n',               ; ň LATIN SMALL LETTER N WITH CARON
UCP 0149, Ll,   'n',               ; ʼn LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
UCP 014A, Lu-2, 'N',               ; Ŋ LATIN CAPITAL LETTER ENG
UCP 014B, Ll-2, 'n',               ; ŋ LATIN SMALL LETTER ENG
UCP 014C, Lu,   'O',               ; Ō LATIN CAPITAL LETTER O WITH MACRON
UCP 014D, Ll,   'o',               ; ō LATIN SMALL LETTER O WITH MACRON
UCP 0150, Lu+2, 'O',               ; Ő LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
UCP 0151, Ll+2, 'o',               ; ő LATIN SMALL LETTER O WITH DOUBLE ACUTE
UCP 0152, Lu,   'OE',  OElig       ; Œ LATIN CAPITAL LIGATURE OE
UCP 0153, Ll,   'oe',  oelig       ; œ LATIN SMALL LIGATURE OE
UCP 0154, Lu,   'R',               ; Ŕ LATIN CAPITAL LETTER R WITH ACUTE
UCP 0155, Ll,   'r',               ; ŕ LATIN SMALL LETTER R WITH ACUTE
UCP 0156, Lu,   'R',               ; Ŗ LATIN CAPITAL LETTER R WITH CEDILLA
UCP 0157, Ll,   'r',               ; ŗ LATIN SMALL LETTER R WITH CEDILLA
UCP 0158, Lu+2, 'R',               ; Ř LATIN CAPITAL LETTER R WITH CARON
UCP 0159, Ll+2, 'r',               ; ř LATIN SMALL LETTER R WITH CARON
UCP 015A, Lu,   'S',               ; Ś LATIN CAPITAL LETTER S WITH ACUTE
UCP 015B, Ll,   's',               ; ś LATIN SMALL LETTER S WITH ACUTE
UCP 015C, Lu,   'S',               ; Ŝ LATIN CAPITAL LETTER S WITH CIRCUMFLEX
UCP 015D, Ll,   's',               ; ŝ LATIN SMALL LETTER S WITH CIRCUMFLEX
UCP 015E, Lu,   'S',               ; Ş LATIN CAPITAL LETTER S WITH CEDILLA
UCP 015F, Ll,   's',               ; ş LATIN SMALL LETTER S WITH CEDILLA
UCP 0160, Lu+2, 'S',   Scaron      ; Š LATIN CAPITAL LETTER S WITH CARON
UCP 0161, Ll+2, 's',   scaron      ; š LATIN SMALL LETTER S WITH CARON
UCP 0162, Lu,   'T',               ; Ţ LATIN CAPITAL LETTER T WITH CEDILLA
UCP 0163, Ll,   't',               ; ţ LATIN SMALL LETTER T WITH CEDILLA
UCP 0164, Lu,   'T',               ; Ť LATIN CAPITAL LETTER T WITH CARON
UCP 0165, Ll,   't',               ; ť LATIN SMALL LETTER T WITH CARON
UCP 0166, Lu,   'T',               ; Ŧ LATIN CAPITAL LETTER T WITH STROKE
UCP 0167, Ll,   't',               ; ŧ LATIN SMALL LETTER T WITH STROKE
UCP 0168, Lu,   'U',               ; Ũ LATIN CAPITAL LETTER U WITH TILDE
UCP 0169, Ll,   'u',               ; ũ LATIN SMALL LETTER U WITH TILDE
UCP 016A, Lu,   'U',               ; Ū LATIN CAPITAL LETTER U WITH MACRON
UCP 016B, Ll,   'u',               ; ū LATIN SMALL LETTER U WITH MACRON
UCP 016C, Lu,   'U',               ; Ŭ LATIN CAPITAL LETTER U WITH BREVE
UCP 016D, Ll,   'u',               ; ŭ LATIN SMALL LETTER U WITH BREVE
UCP 016E, Lu+2, 'U',               ; Ů LATIN CAPITAL LETTER U WITH RING ABOVE
UCP 016F, Ll+2, 'u',               ; ů LATIN SMALL LETTER U WITH RING ABOVE
UCP 0170, Lu+2, 'U',               ; Ű LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
UCP 0171, Ll+2, 'u',               ; ű LATIN SMALL LETTER U WITH DOUBLE ACUTE
UCP 0172, Lu,   'U',               ; Ų LATIN CAPITAL LETTER U WITH OGONEK
UCP 0173, Ll,   'u',               ; ų LATIN SMALL LETTER U WITH OGONEK
UCP 0174, Lu,   'W',               ; Ŵ LATIN CAPITAL LETTER W WITH CIRCUMFLEX
UCP 0175, Ll,   'w',               ; ŵ LATIN SMALL LETTER W WITH CIRCUMFLEX
UCP 0176, Lu,   'Y',               ; Ŷ LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
UCP 0177, Ll,   'y',               ; ŷ LATIN SMALL LETTER Y WITH CIRCUMFLEX
UCP 0178, Lu,   'Y',   Yuml        ; Ÿ LATIN CAPITAL LETTER Y WITH DIAERESIS
UCP 0179, Lu,   'Z',               ; Ź LATIN CAPITAL LETTER Z WITH ACUTE
UCP 017A, Ll,   'z',               ; ź LATIN SMALL LETTER Z WITH ACUTE
UCP 017B, Lu+2, 'Z',               ; Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE
UCP 017C, Ll+2, 'z',               ; ż LATIN SMALL LETTER Z WITH DOT ABOVE
UCP 017D, Lu+2, 'Z',               ; Ž LATIN CAPITAL LETTER Z WITH CARON
UCP 017E, Ll+2, 'z',               ; ž LATIN SMALL LETTER Z WITH CARON
UCP 017F, Ll,   's',               ; ſ LATIN SMALL LETTER LONG S
UCP 0192, Ll-2, 'f',   fnof        ; ƒ LATIN SMALL LETTER F WITH HOOK
UCP 01A0, Lu,   'O',               ; Ơ LATIN CAPITAL LETTER O WITH HORN
UCP 01A1, Ll,   'o',               ; ơ LATIN SMALL LETTER O WITH HORN
UCP 01AF, Lu,   'U',               ; Ư LATIN CAPITAL LETTER U WITH HORN
UCP 01B0, Ll,   'u',               ; ư LATIN SMALL LETTER U WITH HORN
UCP 0218, Lu,   'S',               ; Ș LATIN CAPITAL LETTER S WITH COMMA BELOW
UCP 0219, Ll,   's',               ; ș LATIN SMALL LETTER S WITH COMMA BELOW
UCP 021A, Lu,   'T',               ; Ț LATIN CAPITAL LETTER T WITH COMMA BELOW
UCP 021B, Ll,   't',               ; ț LATIN SMALL LETTER T WITH COMMA BELOW
UCP 027C, Ll,   'r',               ; ɼ LATIN SMALL LETTER R WITH LONG LEG
UCP 02C6, Lm,   '',    circ        ; ˆ MODIFIER LETTER CIRCUMFLEX ACCENT
UCP 02C7, Lm,   '',                ; ˇ CARON
UCP 02CB, Lm,   '`',               ; ˋ MODIFIER LETTER GRAVE ACCENT
UCP 02D8, Sk,   '',                ; ˘ BREVE
UCP 02D9, Sk,   '',                ; ˙ DOT ABOVE
UCP 02DA, Sk,   '',                ; ˚ RING ABOVE
UCP 02DB, Sk,   '',                ; ˛ OGONEK
UCP 02DC, Sk,   '',    tilde       ; ˜ SMALL TILDE
UCP 02DD, Sk,   '',                ; ˝ DOUBLE ACUTE ACCENT
UCP 0300, Mn,   '',                ; ̀ COMBINING GRAVE ACCENT
UCP 0301, Mn,   '',                ; ́ COMBINING ACUTE ACCENT
UCP 0303, Mn,   '',                ; ̃ COMBINING TILDE
UCP 0309, Mn,   '',                ; ̉ COMBINING HOOK ABOVE
UCP 0323, Mn,   '',                ; ̣ COMBINING DOT BELOW
UCP 037A, Lm,   '',                ; ͺ GREEK YPOGEGRAMMENI
UCP 0384, Sk,   '',                ; ΄ GREEK TONOS
UCP 0385, Sk,   ' ',               ; ΅ GREEK DIALYTIKA TONOS
UCP 0386, Lu,   'A',               ; Ά GREEK CAPITAL LETTER ALPHA WITH TONOS
UCP 0387, Po,   '',                ; · GREEK ANO TELEIA
UCP 0388, Lu,   'E',               ; Έ GREEK CAPITAL LETTER EPSILON WITH TONOS
UCP 0389, Lu,   'H',               ; Ή GREEK CAPITAL LETTER ETA WITH TONOS
UCP 038A, Lu,   'I',               ; Ί GREEK CAPITAL LETTER IOTA WITH TONOS
UCP 038C, Lu,   'O',               ; Ό GREEK CAPITAL LETTER OMICRON WITH TONOS
UCP 038E, Lu,   'Y',               ; Ύ GREEK CAPITAL LETTER UPSILON WITH TONOS
UCP 038F, Lu,   'O',               ; Ώ GREEK CAPITAL LETTER OMEGA WITH TONOS
UCP 0390, Ll,   'i',               ; ΐ GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
UCP 0391, Lu,   'A',   Alpha       ; Α GREEK CAPITAL LETTER ALPHA
UCP 0392, Lu,   'B',   Beta        ; Β GREEK CAPITAL LETTER BETA
UCP 0393, Lu,   'G',   Gamma       ; Γ GREEK CAPITAL LETTER GAMMA
UCP 0394, Lu,   'D',   Delta       ; Δ GREEK CAPITAL LETTER DELTA
UCP 0395, Lu,   'E',   Epsilon     ; Ε GREEK CAPITAL LETTER EPSILON
UCP 0396, Lu,   'Z',   Zeta        ; Ζ GREEK CAPITAL LETTER ZETA
UCP 0397, Lu,   'H',   Eta         ; Η GREEK CAPITAL LETTER ETA
UCP 0398, Lu,   'Th',  Theta       ; Θ GREEK CAPITAL LETTER THETA
UCP 0399, Lu,   'I',   Iota        ; Ι GREEK CAPITAL LETTER IOTA
UCP 039A, Lu,   'K',   Kappa       ; Κ GREEK CAPITAL LETTER KAPPA
UCP 039B, Lu,   'L',   Lambda      ; Λ GREEK CAPITAL LETTER LAMDA
UCP 039C, Lu,   'M',   Mu          ; Μ GREEK CAPITAL LETTER MU
UCP 039D, Lu,   'N',   Nu          ; Ν GREEK CAPITAL LETTER NU
UCP 039E, Lu,   'X',   Xi          ; Ξ GREEK CAPITAL LETTER XI
UCP 039F, Lu,   'O',   Omicron     ; Ο GREEK CAPITAL LETTER OMICRON
UCP 03A0, Lu,   'P',   Pi          ; Π GREEK CAPITAL LETTER PI
UCP 03A1, Lu,   'R',   Rho         ; Ρ GREEK CAPITAL LETTER RHO
UCP 03A3, Lu,   'S',   Sigma       ; Σ GREEK CAPITAL LETTER SIGMA
UCP 03A4, Lu,   'T',   Tau         ; Τ GREEK CAPITAL LETTER TAU
UCP 03A5, Lu,   'Y',   Upsilon     ; Υ GREEK CAPITAL LETTER UPSILON
UCP 03A6, Lu,   'F',   Phi         ; Φ GREEK CAPITAL LETTER PHI
UCP 03A7, Lu,   'Ch',  Chi         ; Χ GREEK CAPITAL LETTER CHI
UCP 03A8, Lu,   'Ps',  Psi         ; Ψ GREEK CAPITAL LETTER PSI
UCP 03A9, Lu,   'O',   Omega       ; Ω GREEK CAPITAL LETTER OMEGA
UCP 03AA, Lu,   'I',               ; Ϊ GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
UCP 03AB, Lu,   'Y',               ; Ϋ GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
UCP 03AC, Ll,   'a',               ; ά GREEK SMALL LETTER ALPHA WITH TONOS
UCP 03AD, Ll,   'e',               ; έ GREEK SMALL LETTER EPSILON WITH TONOS
UCP 03AE, Ll,   'h',               ; ή GREEK SMALL LETTER ETA WITH TONOS
UCP 03AF, Ll,   'i',               ; ί GREEK SMALL LETTER IOTA WITH TONOS
UCP 03B0, Ll,   'u',               ; ΰ GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
UCP 03B1, Ll,   'a',   alpha       ; α GREEK SMALL LETTER ALPHA
UCP 03B2, Ll,   'b',   beta        ; β GREEK SMALL LETTER BETA
UCP 03B3, Ll,   'g',   gamma       ; γ GREEK SMALL LETTER GAMMA
UCP 03B4, Ll,   'd',   delta       ; δ GREEK SMALL LETTER DELTA
UCP 03B5, Ll,   'e',   epsilon     ; ε GREEK SMALL LETTER EPSILON
UCP 03B6, Ll,   'z',   zeta        ; ζ GREEK SMALL LETTER ZETA
UCP 03B7, Ll,   'h',   eta         ; η GREEK SMALL LETTER ETA
UCP 03B8, Ll,   'th',  theta       ; θ GREEK SMALL LETTER THETA
UCP 03B9, Ll,   'i',   iota        ; ι GREEK SMALL LETTER IOTA
UCP 03BA, Ll,   'k',   kappa       ; κ GREEK SMALL LETTER KAPPA
UCP 03BB, Ll,   'l',   lambda      ; λ GREEK SMALL LETTER LAMDA
UCP 03BC, Ll,   'm',   mu          ; μ GREEK SMALL LETTER MU
UCP 03BD, Ll,   'n',   nu          ; ν GREEK SMALL LETTER NU
UCP 03BE, Ll,   'x',   xi          ; ξ GREEK SMALL LETTER XI
UCP 03BF, Ll,   'o',   omicron     ; ο GREEK SMALL LETTER OMICRON
UCP 03C0, Ll,   'p',   pi          ; π GREEK SMALL LETTER PI
UCP 03C1, Ll,   'r',   rho         ; ρ GREEK SMALL LETTER RHO
UCP 03C2, Ll,   's',   sigmaf      ; ς GREEK SMALL LETTER FINAL SIGMA
UCP 03C3, Ll,   's',   sigma       ; σ GREEK SMALL LETTER SIGMA
UCP 03C4, Ll,   't',   tau         ; τ GREEK SMALL LETTER TAU
UCP 03C5, Ll,   'u',   upsilon     ; υ GREEK SMALL LETTER UPSILON
UCP 03C6, Ll,   'f',   phi         ; φ GREEK SMALL LETTER PHI
UCP 03C7, Ll,   'ch',  chi         ; χ GREEK SMALL LETTER CHI
UCP 03C8, Ll,   'ps',  psi         ; ψ GREEK SMALL LETTER PSI
UCP 03C9, Ll,   'o',   omega       ; ω GREEK SMALL LETTER OMEGA
UCP 03CA, Ll,   'i',               ; ϊ GREEK SMALL LETTER IOTA WITH DIALYTIKA
UCP 03CB, Ll,   'u',               ; ϋ GREEK SMALL LETTER UPSILON WITH DIALYTIKA
UCP 03CC, Ll,   'o',               ; ό GREEK SMALL LETTER OMICRON WITH TONOS
UCP 03CD, Ll,   'u',               ; ύ GREEK SMALL LETTER UPSILON WITH TONOS
UCP 03CE, Ll,   'o',               ; ώ GREEK SMALL LETTER OMEGA WITH TONOS
UCP 03D1, Ll,   'th',  thetasym    ; ϑ GREEK THETA SYMBOL
UCP 03D2, Lu,   'Y',   upsih       ; ϒ GREEK UPSILON WITH HOOK SYMBOL
UCP 03D6, Ll,   'p',   piv         ; ϖ GREEK PI SYMBOL
UCP 0401, Lu-1, 'Io',              ; Ё CYRILLIC CAPITAL LETTER IO
UCP 0402, Lu-1, 'Dj',              ; Ђ CYRILLIC CAPITAL LETTER DJE
UCP 0403, Lu-1, 'G',               ; Ѓ CYRILLIC CAPITAL LETTER GJE
UCP 0404, Lu,   'E',               ; Є CYRILLIC CAPITAL LETTER UKRAINIAN IE
UCP 0405, Lu-1, 'S',               ; Ѕ CYRILLIC CAPITAL LETTER DZE
UCP 0406, Lu-1, 'I',               ; І CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
UCP 0407, Lu-1, 'I',               ; Ї CYRILLIC CAPITAL LETTER YI
UCP 0408, Lu-1, 'J',               ; Ј CYRILLIC CAPITAL LETTER JE
UCP 0409, Lu-1, 'Lj',              ; Љ CYRILLIC CAPITAL LETTER LJE
UCP 040A, Lu-1, 'Nj',              ; Њ CYRILLIC CAPITAL LETTER NJE
UCP 040B, Lu-1, 'Cj',              ; Ћ CYRILLIC CAPITAL LETTER TSHE
UCP 040C, Lu-1, 'K',               ; Ќ CYRILLIC CAPITAL LETTER KJE
UCP 040D, Lu-1, 'I',               ; Ѝ CYRILLIC CAPITAL LETTER I WITH GRAVE
UCP 040E, Lu,   'Y',               ; Ў CYRILLIC CAPITAL LETTER SHORT U
UCP 040F, Lu-1, 'Dz',              ; Џ CYRILLIC CAPITAL LETTER DZHE
UCP 0410, Lu+2, 'A',               ; А CYRILLIC CAPITAL LETTER A
UCP 0411, Lu,   'B',               ; Б CYRILLIC CAPITAL LETTER BE
UCP 0412, Lu,   'V',               ; В CYRILLIC CAPITAL LETTER VE
UCP 0413, Lu,   'G',               ; Г CYRILLIC CAPITAL LETTER GHE
UCP 0414, Lu,   'D',               ; Д CYRILLIC CAPITAL LETTER DE
UCP 0415, Lu+2, 'E',               ; Е CYRILLIC CAPITAL LETTER IE
UCP 0416, Lu,   'Zh',              ; Ж CYRILLIC CAPITAL LETTER ZHE
UCP 0417, Lu,   'Z',               ; З CYRILLIC CAPITAL LETTER ZE
UCP 0418, Lu,   'I',               ; И CYRILLIC CAPITAL LETTER I
UCP 0419, Lu,   'J',               ; Й CYRILLIC CAPITAL LETTER SHORT I
UCP 041A, Lu,   'K',               ; К CYRILLIC CAPITAL LETTER KA
UCP 041B, Lu,   'L',               ; Л CYRILLIC CAPITAL LETTER EL
UCP 041C, Lu,   'M',               ; М CYRILLIC CAPITAL LETTER EM
UCP 041D, Lu,   'N',               ; Н CYRILLIC CAPITAL LETTER EN
UCP 041E, Lu+2, 'O',               ; О CYRILLIC CAPITAL LETTER O
UCP 041F, Lu,   'P',               ; П CYRILLIC CAPITAL LETTER PE
UCP 0420, Lu,   'R',               ; Р CYRILLIC CAPITAL LETTER ER
UCP 0421, Lu,   'S',               ; С CYRILLIC CAPITAL LETTER ES
UCP 0422, Lu,   'T',               ; Т CYRILLIC CAPITAL LETTER TE
UCP 0423, Lu,   'U',               ; У CYRILLIC CAPITAL LETTER U
UCP 0424, Lu,   'F',               ; Ф CYRILLIC CAPITAL LETTER EF
UCP 0425, Lu,   'Kh',              ; Х CYRILLIC CAPITAL LETTER HA
UCP 0426, Lu,   'C',               ; Ц CYRILLIC CAPITAL LETTER TSE
UCP 0427, Lu,   'Ch',              ; Ч CYRILLIC CAPITAL LETTER CHE
UCP 0428, Lu,   'Sh',              ; Ш CYRILLIC CAPITAL LETTER SHA
UCP 0429, Lu,   'Shch',            ; Щ CYRILLIC CAPITAL LETTER SHCHA
UCP 042A, Lu-1, "'",               ; Ъ CYRILLIC CAPITAL LETTER HARD SIGN
UCP 042B, Lu,   'Y',               ; Ы CYRILLIC CAPITAL LETTER YERU
UCP 042C, Lu,   '',                ; Ь CYRILLIC CAPITAL LETTER SOFT SIGN
UCP 042D, Lu,   'E',               ; Э CYRILLIC CAPITAL LETTER E
UCP 042E, Lu+1, 'Yu',              ; Ю CYRILLIC CAPITAL LETTER YU
UCP 042F, Lu+1, 'Ya',              ; Я CYRILLIC CAPITAL LETTER YA
UCP 0430, Ll+2, 'a',               ; а CYRILLIC SMALL LETTER A
UCP 0431, Ll,   'b',               ; б CYRILLIC SMALL LETTER BE
UCP 0432, Ll,   'v',               ; в CYRILLIC SMALL LETTER VE
UCP 0433, Ll,   'g',               ; г CYRILLIC SMALL LETTER GHE
UCP 0434, Ll,   'd',               ; д CYRILLIC SMALL LETTER DE
UCP 0435, Ll+2, 'e',               ; е CYRILLIC SMALL LETTER IE
UCP 0436, Ll,   'zh',              ; ж CYRILLIC SMALL LETTER ZHE
UCP 0437, Ll,   'z',               ; з CYRILLIC SMALL LETTER ZE
UCP 0438, Ll,   'i',               ; и CYRILLIC SMALL LETTER I
UCP 0439, Ll,   'j',               ; й CYRILLIC SMALL LETTER SHORT I
UCP 043A, Ll,   'k',               ; к CYRILLIC SMALL LETTER KA
UCP 043B, Ll,   'l',               ; л CYRILLIC SMALL LETTER EL
UCP 043C, Ll,   'm',               ; м CYRILLIC SMALL LETTER EM
UCP 043D, Ll,   'n',               ; н CYRILLIC SMALL LETTER EN
UCP 043E, Ll+2, 'o',               ; о CYRILLIC SMALL LETTER O
UCP 043F, Ll,   'p',               ; п CYRILLIC SMALL LETTER PE
UCP 0440, Ll,   'r',               ; р CYRILLIC SMALL LETTER ER
UCP 0441, Ll,   's',               ; с CYRILLIC SMALL LETTER ES
UCP 0442, Ll,   't',               ; т CYRILLIC SMALL LETTER TE
UCP 0443, Ll,   'u',               ; у CYRILLIC SMALL LETTER U
UCP 0444, Ll,   'f',               ; ф CYRILLIC SMALL LETTER EF
UCP 0445, Ll,   'kh',              ; х CYRILLIC SMALL LETTER HA
UCP 0446, Ll,   'c',               ; ц CYRILLIC SMALL LETTER TSE
UCP 0447, Ll,   'ch',              ; ч CYRILLIC SMALL LETTER CHE
UCP 0448, Ll,   'sh',              ; ш CYRILLIC SMALL LETTER SHA
UCP 0449, Ll,   'shch',            ; щ CYRILLIC SMALL LETTER SHCHA
UCP 044A, Ll-1, "'",               ; ъ CYRILLIC SMALL LETTER HARD SIGN
UCP 044B, Ll,   'y',               ; ы CYRILLIC SMALL LETTER YERU
UCP 044C, Ll,   '',                ; ь CYRILLIC SMALL LETTER SOFT SIGN
UCP 044D, Ll,   'e',               ; э CYRILLIC SMALL LETTER E
UCP 044E, Ll+1, 'yu',              ; ю CYRILLIC SMALL LETTER YU
UCP 044F, Ll+1, 'ya',              ; я CYRILLIC SMALL LETTER YA
UCP 0451, Ll,   'io',              ; ё CYRILLIC SMALL LETTER IO
UCP 0452, Ll,   'dj',              ; ђ CYRILLIC SMALL LETTER DJE
UCP 0453, Ll,   'g',               ; ѓ CYRILLIC SMALL LETTER GJE
UCP 0454, Ll,   'e',               ; є CYRILLIC SMALL LETTER UKRAINIAN IE
UCP 0455, Ll,   's',               ; ѕ CYRILLIC SMALL LETTER DZE
UCP 0456, Ll,   'i',               ; і CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
UCP 0457, Ll,   'i',               ; ї CYRILLIC SMALL LETTER YI
UCP 0458, Ll,   'j',               ; ј CYRILLIC SMALL LETTER JE
UCP 0459, Ll,   'lj',              ; љ CYRILLIC SMALL LETTER LJE
UCP 045A, Ll,   'nj',              ; њ CYRILLIC SMALL LETTER NJE
UCP 045B, Ll,   'cj',              ; ћ CYRILLIC SMALL LETTER TSHE
UCP 045C, Ll,   'k',               ; ќ CYRILLIC SMALL LETTER KJE
UCP 045D, Ll,   'i',               ; ѝ CYRILLIC SMALL LETTER I WITH GRAVE
UCP 045E, Ll,   'y',               ; ў CYRILLIC SMALL LETTER SHORT U
UCP 045F, Ll,   'dz',              ; џ CYRILLIC SMALL LETTER DZHE
UCP 0490, Lu-2, 'G',               ; Ґ CYRILLIC CAPITAL LETTER GHE WITH UPTURN
UCP 0491, Ll-2, 'g',               ; ґ CYRILLIC SMALL LETTER GHE WITH UPTURN
UCP 0492, Lu-2, 'G',               ; Ғ CYRILLIC CAPITAL LETTER GHE WITH STROKE
UCP 0493, Ll-2, 'g',               ; ғ CYRILLIC SMALL LETTER GHE WITH STROKE
UCP 049A, Lu-2, 'K',               ; Қ CYRILLIC CAPITAL LETTER KA WITH DESCENDER
UCP 049B, Ll-2, 'k',               ; қ CYRILLIC SMALL LETTER KA WITH DESCENDER
UCP 04B2, Lu-2, 'H',               ; Ҳ CYRILLIC CAPITAL LETTER HA WITH DESCENDER
UCP 04B3, Ll-2, 'h',               ; ҳ CYRILLIC SMALL LETTER HA WITH DESCENDER
UCP 04B6, Lu-2, 'Ch',              ; Ҷ CYRILLIC CAPITAL LETTER CHE WITH DESCENDER
UCP 04B7, Ll-2, 'ch',              ; ҷ CYRILLIC SMALL LETTER CHE WITH DESCENDER
UCP 04E1, Ll-2, 'z',               ; ӡ CYRILLIC SMALL LETTER ABKHASIAN DZE
UCP 04E2, Lu-2, 'I',               ; Ӣ CYRILLIC CAPITAL LETTER I WITH MACRON
UCP 04EE, Lu-2, 'U',               ; Ӯ CYRILLIC CAPITAL LETTER U WITH MACRON
UCP 04EF, Ll-2, 'u',               ; ӯ CYRILLIC SMALL LETTER U WITH MACRON
UCP 05B0, Mn,   '',                ; ְ HEBREW POINT SHEVA
UCP 05B1, Mn,   '',                ; ֱ HEBREW POINT HATAF SEGOL
UCP 05B2, Mn,   '',                ; ֲ HEBREW POINT HATAF PATAH
UCP 05B3, Mn,   '',                ; ֳ HEBREW POINT HATAF QAMATS
UCP 05B4, Mn,   '',                ; ִ HEBREW POINT HIRIQ
UCP 05B5, Mn,   '',                ; ֵ HEBREW POINT TSERE
UCP 05B6, Mn,   '',                ; ֶ HEBREW POINT SEGOL
UCP 05B7, Mn,   '',                ; ַ HEBREW POINT PATAH
UCP 05B8, Mn,   '',                ; ָ HEBREW POINT QAMATS
UCP 05B9, Mn,   '',                ; ֹ HEBREW POINT HOLAM
UCP 05BA, Mn,   '',                ; ֺ HEBREW POINT HOLAM HASER FOR VAV
UCP 05BB, Mn,   '',                ; ֻ HEBREW POINT QUBUTS
UCP 05BC, Mn,   '',                ; ּ HEBREW POINT DAGESH OR MAPIQ
UCP 05BD, Mn,   '',                ; ֽ HEBREW POINT METEG
UCP 05BE, Pd,   '-',               ; ־ HEBREW PUNCTUATION MAQAF
UCP 05BF, Mn,   '',                ; ֿ HEBREW POINT RAFE
UCP 05C0, Po,   '|',               ; ׀ HEBREW PUNCTUATION PASEQ
UCP 05C1, Mn,   '',                ; ׁ HEBREW POINT SHIN DOT
UCP 05C2, Mn,   '',                ; ׂ HEBREW POINT SIN DOT
UCP 05C3, Po,   ':',               ; ׃ HEBREW PUNCTUATION SOF PASUQ
UCP 05D0, Lo,   'A',               ; א HEBREW LETTER ALEF
UCP 05D1, Lo,   'B',               ; ב HEBREW LETTER BET
UCP 05D2, Lo,   'G',               ; ג HEBREW LETTER GIMEL
UCP 05D3, Lo,   'D',               ; ד HEBREW LETTER DALET
UCP 05D4, Lo,   'H',               ; ה HEBREW LETTER HE
UCP 05D5, Lo,   'V',               ; ו HEBREW LETTER VAV
UCP 05D6, Lo,   'Z',               ; ז HEBREW LETTER ZAYIN
UCP 05D7, Lo,   'H',               ; ח HEBREW LETTER HET
UCP 05D8, Lo,   'T',               ; ט HEBREW LETTER TET
UCP 05D9, Lo,   'Yi',              ; י HEBREW LETTER YOD
UCP 05DA, Lo,   'Kh',              ; ך HEBREW LETTER FINAL KAF
UCP 05DB, Lo,   'Kh',              ; כ HEBREW LETTER KAF
UCP 05DC, Lo,   'L',               ; ל HEBREW LETTER LAMED
UCP 05DD, Lo,   'M',               ; ם HEBREW LETTER FINAL MEM
UCP 05DE, Lo,   'M',               ; מ HEBREW LETTER MEM
UCP 05DF, Lo,   'N',               ; ן HEBREW LETTER FINAL NUN
UCP 05E0, Lo,   'N',               ; נ HEBREW LETTER NUN
UCP 05E1, Lo,   'S',               ; ס HEBREW LETTER SAMEKH
UCP 05E2, Lo,   'A',               ; ע HEBREW LETTER AYIN
UCP 05E3, Lo,   'P',               ; ף HEBREW LETTER FINAL PE
UCP 05E4, Lo,   'P',               ; פ HEBREW LETTER PE
UCP 05E5, Lo,   'Tz',              ; ץ HEBREW LETTER FINAL TSADI
UCP 05E6, Lo,   'Tz',              ; צ HEBREW LETTER TSADI
UCP 05E7, Lo,   'K',               ; ק HEBREW LETTER QOF
UCP 05E8, Lo,   'R',               ; ר HEBREW LETTER RESH
UCP 05E9, Lo,   'Sh',              ; ש HEBREW LETTER SHIN
UCP 05EA, Lo,   'T',               ; ת HEBREW LETTER TAV
UCP 05F0, Lo,   'W',               ; װ HEBREW LIGATURE YIDDISH DOUBLE VAV
UCP 05F1, Lo,   'V',               ; ױ HEBREW LIGATURE YIDDISH VAV YOD
UCP 05F2, Lo,   'Y',               ; ײ HEBREW LIGATURE YIDDISH DOUBLE YOD
UCP 05F3, Po,   "'",               ; ׳ HEBREW PUNCTUATION GERESH
UCP 05F4, Po,   '"',               ; ״ HEBREW PUNCTUATION GERSHAYIM
UCP 060C, Po,   ',',               ; ، ARABIC COMMA
UCP 061B, Po,   ';',               ; ؛ ARABIC SEMICOLON
UCP 061F, Po,   '?',               ; ؟ ARABIC QUESTION MARK
UCP 0621, Lo,   "'",               ; ء ARABIC LETTER HAMZA
UCP 0622, Lo,   'A',               ; آ ARABIC LETTER ALEF WITH MADDA ABOVE
UCP 0623, Lo,   'A',               ; أ ARABIC LETTER ALEF WITH HAMZA ABOVE
UCP 0624, Lo,   'W',               ; ؤ ARABIC LETTER WAW WITH HAMZA ABOVE
UCP 0625, Lo,   'A',               ; إ ARABIC LETTER ALEF WITH HAMZA BELOW
UCP 0626, Lo,   'Y',               ; ئ ARABIC LETTER YEH WITH HAMZA ABOVE
UCP 0627, Lo,   'A',               ; ا ARABIC LETTER ALEF
UCP 0628, Lo,   'B',               ; ب ARABIC LETTER BEH
UCP 0629, Lo,   'T',               ; ة ARABIC LETTER TEH MARBUTA
UCP 062A, Lo,   'T',               ; ت ARABIC LETTER TEH
UCP 062B, Lo,   'Th',              ; ث ARABIC LETTER THEH
UCP 062C, Lo,   'J',               ; ج ARABIC LETTER JEEM
UCP 062D, Lo,   'H',               ; ح ARABIC LETTER HAH
UCP 062E, Lo,   'Kh',              ; خ ARABIC LETTER KHAH
UCP 062F, Lo,   'D',               ; د ARABIC LETTER DAL
UCP 0630, Lo,   'Dh',              ; ذ ARABIC LETTER THAL
UCP 0631, Lo,   'R',               ; ر ARABIC LETTER REH
UCP 0632, Lo,   'Z',               ; ز ARABIC LETTER ZAIN
UCP 0633, Lo,   'S',               ; س ARABIC LETTER SEEN
UCP 0634, Lo,   'Sh',              ; ش ARABIC LETTER SHEEN
UCP 0635, Lo,   'S',               ; ص ARABIC LETTER SAD
UCP 0636, Lo,   'D',               ; ض ARABIC LETTER DAD
UCP 0637, Lo,   'T',               ; ط ARABIC LETTER TAH
UCP 0638, Lo,   'Z',               ; ظ ARABIC LETTER ZAH
UCP 0639, Lo,   "'",               ; ع ARABIC LETTER AIN
UCP 063A, Lo,   'Gh',              ; غ ARABIC LETTER GHAIN
UCP 0640, Lm,   '_',               ; ـ ARABIC TATWEEL
UCP 0641, Lo,   'F',               ; ف ARABIC LETTER FEH
UCP 0642, Lo,   'Q',               ; ق ARABIC LETTER QAF
UCP 0643, Lo,   'K',               ; ك ARABIC LETTER KAF
UCP 0644, Lo,   'L',               ; ل ARABIC LETTER LAM
UCP 0645, Lo,   'M',               ; م ARABIC LETTER MEEM
UCP 0646, Lo,   'N',               ; ن ARABIC LETTER NOON
UCP 0647, Lo,   'H',               ; ه ARABIC LETTER HEH
UCP 0648, Lo,   'W',               ; و ARABIC LETTER WAW
UCP 0649, Lo,   'A',               ; ى ARABIC LETTER ALEF MAKSURA
UCP 064A, Lo,   'Y',               ; ي ARABIC LETTER YEH
UCP 064B, Mn,   'A',               ; ً ARABIC FATHATAN
UCP 064C, Mn,   'U',               ; ٌ ARABIC DAMMATAN
UCP 064D, Mn,   'I',               ; ٍ ARABIC KASRATAN
UCP 064E, Mn,   'A',               ; َ ARABIC FATHA
UCP 064F, Mn,   'U',               ; ُ ARABIC DAMMA
UCP 0650, Mn,   'I',               ; ِ ARABIC KASRA
UCP 0651, Mn,   '',                ; ّ ARABIC SHADDA
UCP 0652, Mn,   '',                ; ْ ARABIC SUKUN
UCP 0660, Nd,   '0',               ; ٠ ARABIC-INDIC DIGIT ZERO
UCP 0661, Nd,   '1',               ; ١ ARABIC-INDIC DIGIT ONE
UCP 0662, Nd,   '2',               ; ٢ ARABIC-INDIC DIGIT TWO
UCP 0663, Nd,   '3',               ; ٣ ARABIC-INDIC DIGIT THREE
UCP 0664, Nd,   '4',               ; ٤ ARABIC-INDIC DIGIT FOUR
UCP 0665, Nd,   '5',               ; ٥ ARABIC-INDIC DIGIT FIVE
UCP 0666, Nd,   '6',               ; ٦ ARABIC-INDIC DIGIT SIX
UCP 0667, Nd,   '7',               ; ٧ ARABIC-INDIC DIGIT SEVEN
UCP 0668, Nd,   '8',               ; ٨ ARABIC-INDIC DIGIT EIGHT
UCP 0669, Nd,   '9',               ; ٩ ARABIC-INDIC DIGIT NINE
UCP 066A, Po,   '%',               ; ٪ ARABIC PERCENT SIGN
UCP 0679, Lo,   'T',               ; ٹ ARABIC LETTER TTEH
UCP 067E, Lo,   'P',               ; پ ARABIC LETTER PEH
UCP 0686, Lo,   'Ch',              ; چ ARABIC LETTER TCHEH
UCP 0688, Lo,   'D',               ; ڈ ARABIC LETTER DDAL
UCP 0691, Lo,   'R',               ; ڑ ARABIC LETTER RREH
UCP 0698, Lo,   'J',               ; ژ ARABIC LETTER JEH
UCP 06A4, Lo,   'V',               ; ڤ ARABIC LETTER VEH
UCP 06A9, Lo,   'Kh',              ; ک ARABIC LETTER KEHEH
UCP 06AF, Lo,   'G',               ; گ ARABIC LETTER GAF
UCP 06BA, Lo,   'N',               ; ں ARABIC LETTER NOON GHUNNA
UCP 06BE, Lo,   'H',               ; ھ ARABIC LETTER HEH DOACHASHMEE
UCP 06C1, Lo,   'H',               ; ہ ARABIC LETTER HEH GOAL
UCP 06D2, Lo,   'Y',               ; ے ARABIC LETTER YEH BARREE
UCP 06D5, Lo,   'Ae',              ; ە ARABIC LETTER AE
UCP 06F0, Nd,   '0',               ; ۰ EXTENDED ARABIC-INDIC DIGIT ZERO
UCP 06F1, Nd,   '1',               ; ۱ EXTENDED ARABIC-INDIC DIGIT ONE
UCP 06F2, Nd,   '2',               ; ۲ EXTENDED ARABIC-INDIC DIGIT TWO
UCP 06F3, Nd,   '3',               ; ۳ EXTENDED ARABIC-INDIC DIGIT THREE
UCP 06F4, Nd,   '4',               ; ۴ EXTENDED ARABIC-INDIC DIGIT FOUR
UCP 06F5, Nd,   '5',               ; ۵ EXTENDED ARABIC-INDIC DIGIT FIVE
UCP 06F6, Nd,   '6',               ; ۶ EXTENDED ARABIC-INDIC DIGIT SIX
UCP 06F7, Nd,   '7',               ; ۷ EXTENDED ARABIC-INDIC DIGIT SEVEN
UCP 06F8, Nd,   '8',               ; ۸ EXTENDED ARABIC-INDIC DIGIT EIGHT
UCP 06F9, Nd,   '9',               ; ۹ EXTENDED ARABIC-INDIC DIGIT NINE
UCP 0E01, Lo,   'K',               ; ก THAI CHARACTER KO KAI
UCP 0E02, Lo,   'Kh',              ; ข THAI CHARACTER KHO KHAI
UCP 0E03, Lo,   'Kh',              ; ฃ THAI CHARACTER KHO KHUAT
UCP 0E04, Lo,   'Kh',              ; ค THAI CHARACTER KHO KHWAI
UCP 0E05, Lo,   'Kh',              ; ฅ THAI CHARACTER KHO KHON
UCP 0E06, Lo,   'Kh',              ; ฆ THAI CHARACTER KHO RAKHANG
UCP 0E07, Lo,   'Ng',              ; ง THAI CHARACTER NGO NGU
UCP 0E08, Lo,   'Ch',              ; จ THAI CHARACTER CHO CHAN
UCP 0E09, Lo,   'Ch',              ; ฉ THAI CHARACTER CHO CHING
UCP 0E0A, Lo,   'Ch',              ; ช THAI CHARACTER CHO CHANG
UCP 0E0B, Lo,   'S',               ; ซ THAI CHARACTER SO SO
UCP 0E0C, Lo,   'Ch',              ; ฌ THAI CHARACTER CHO CHOE
UCP 0E0D, Lo,   'Y',               ; ญ THAI CHARACTER YO YING
UCP 0E0E, Lo,   'D',               ; ฎ THAI CHARACTER DO CHADA
UCP 0E0F, Lo,   'T',               ; ฏ THAI CHARACTER TO PATAK
UCP 0E10, Lo,   'Th',              ; ฐ THAI CHARACTER THO THAN
UCP 0E11, Lo,   'Th',              ; ฑ THAI CHARACTER THO NANGMONTHO
UCP 0E12, Lo,   'Th',              ; ฒ THAI CHARACTER THO PHUTHAO
UCP 0E13, Lo,   'N',               ; ณ THAI CHARACTER NO NEN
UCP 0E14, Lo,   'D',               ; ด THAI CHARACTER DO DEK
UCP 0E15, Lo,   'T',               ; ต THAI CHARACTER TO TAO
UCP 0E16, Lo,   'Th',              ; ถ THAI CHARACTER THO THUNG
UCP 0E17, Lo,   'Th',              ; ท THAI CHARACTER THO THAHAN
UCP 0E18, Lo,   'Th',              ; ธ THAI CHARACTER THO THONG
UCP 0E19, Lo,   'N',               ; น THAI CHARACTER NO NU
UCP 0E1A, Lo,   'B',               ; บ THAI CHARACTER BO BAIMAI
UCP 0E1B, Lo,   'P',               ; ป THAI CHARACTER PO PLA
UCP 0E1C, Lo,   'Ph',              ; ผ THAI CHARACTER PHO PHUNG
UCP 0E1D, Lo,   'F',               ; ฝ THAI CHARACTER FO FA
UCP 0E1E, Lo,   'Ph',              ; พ THAI CHARACTER PHO PHAN
UCP 0E1F, Lo,   'F',               ; ฟ THAI CHARACTER FO FAN
UCP 0E20, Lo,   'Ph',              ; ภ THAI CHARACTER PHO SAMPHAO
UCP 0E21, Lo,   'M',               ; ม THAI CHARACTER MO MA
UCP 0E22, Lo,   'Y',               ; ย THAI CHARACTER YO YAK
UCP 0E23, Lo,   'R',               ; ร THAI CHARACTER RO RUA
UCP 0E24, Lo,   'R',               ; ฤ THAI CHARACTER RU
UCP 0E25, Lo,   'L',               ; ล THAI CHARACTER LO LING
UCP 0E26, Lo,   'L',               ; ฦ THAI CHARACTER LU
UCP 0E27, Lo,   'W',               ; ว THAI CHARACTER WO WAEN
UCP 0E28, Lo,   'S',               ; ศ THAI CHARACTER SO SALA
UCP 0E29, Lo,   'S',               ; ษ THAI CHARACTER SO RUSI
UCP 0E2A, Lo,   'S',               ; ส THAI CHARACTER SO SUA
UCP 0E2B, Lo,   'H',               ; ห THAI CHARACTER HO HIP
UCP 0E2C, Lo,   'L',               ; ฬ THAI CHARACTER LO CHULA
UCP 0E2D, Lo,   'O',               ; อ THAI CHARACTER O ANG
UCP 0E2E, Lo,   'H',               ; ฮ THAI CHARACTER HO NOKHUK
UCP 0E2F, Lo,   'A',               ; ฯ THAI CHARACTER PAIYANNOI
UCP 0E30, Lo,   'A',               ; ะ THAI CHARACTER SARA A
UCP 0E31, Mn,   '',                ; ั THAI CHARACTER MAI HAN-AKAT
UCP 0E32, Lo,   'A',               ; า THAI CHARACTER SARA AA
UCP 0E33, Lo,   'A',               ; ำ THAI CHARACTER SARA AM
UCP 0E34, Mn,   '',                ; ิ THAI CHARACTER SARA I
UCP 0E35, Mn,   '',                ; ี THAI CHARACTER SARA II
UCP 0E36, Mn,   '',                ; ึ THAI CHARACTER SARA UE
UCP 0E37, Mn,   '',                ; ื THAI CHARACTER SARA UEE
UCP 0E38, Mn,   '',                ; ุ THAI CHARACTER SARA U
UCP 0E39, Mn,   '',                ; ู THAI CHARACTER SARA UU
UCP 0E3A, Mn,   '',                ; ฺ THAI CHARACTER PHINTHU
UCP 0E3F, Sc,   '$',               ; ฿ THAI CURRENCY SYMBOL BAHT
UCP 0E40, Lo,   'E',               ; เ THAI CHARACTER SARA E
UCP 0E41, Lo,   'AE',              ; แ THAI CHARACTER SARA AE
UCP 0E42, Lo,   'O',               ; โ THAI CHARACTER SARA O
UCP 0E43, Lo,   'I',               ; ใ THAI CHARACTER SARA AI MAIMUAN
UCP 0E44, Lo,   'I',               ; ไ THAI CHARACTER SARA AI MAIMALAI
UCP 0E45, Lo,   'A',               ; ๅ THAI CHARACTER LAKKHANGYAO
UCP 0E46, Lm,   '`',               ; ๆ THAI CHARACTER MAIYAMOK
UCP 0E47, Mn,   '',                ; ็ THAI CHARACTER MAITAIKHU
UCP 0E48, Mn,   '',                ; ่ THAI CHARACTER MAI EK
UCP 0E49, Mn,   '',                ; ้ THAI CHARACTER MAI THO
UCP 0E4A, Mn,   '',                ; ๊ THAI CHARACTER MAI TRI
UCP 0E4B, Mn,   '',                ; ๋ THAI CHARACTER MAI CHATTAWA
UCP 0E4C, Mn,   '',                ; ์ THAI CHARACTER THANTHAKHAT
UCP 0E4D, Mn,   '',                ; ํ THAI CHARACTER NIKHAHIT
UCP 0E4E, Mn,   '',                ; ๎ THAI CHARACTER YAMAKKAN
UCP 0E4F, Po,   '#',               ; ๏ THAI CHARACTER FONGMAN
UCP 0E50, Nd,   '0',               ; ๐ THAI DIGIT ZERO
UCP 0E51, Nd,   '1',               ; ๑ THAI DIGIT ONE
UCP 0E52, Nd,   '2',               ; ๒ THAI DIGIT TWO
UCP 0E53, Nd,   '3',               ; ๓ THAI DIGIT THREE
UCP 0E54, Nd,   '4',               ; ๔ THAI DIGIT FOUR
UCP 0E55, Nd,   '5',               ; ๕ THAI DIGIT FIVE
UCP 0E56, Nd,   '6',               ; ๖ THAI DIGIT SIX
UCP 0E57, Nd,   '7',               ; ๗ THAI DIGIT SEVEN
UCP 0E58, Nd,   '8',               ; ๘ THAI DIGIT EIGHT
UCP 0E59, Nd,   '9',               ; ๙ THAI DIGIT NINE
UCP 0E5A, Po,   '|',               ; ๚ THAI CHARACTER ANGKHANKHU
UCP 0E5B, Po,   '>>',              ; ๛ THAI CHARACTER KHOMUT
UCP 1403, Lo,   'I',               ; ᐃ CANADIAN SYLLABICS I
UCP 1404, Lo,   'Ii',              ; ᐄ CANADIAN SYLLABICS II
UCP 1405, Lo,   'O',               ; ᐅ CANADIAN SYLLABICS O
UCP 1406, Lo,   'Oo',              ; ᐆ CANADIAN SYLLABICS OO
UCP 140A, Lo,   'A',               ; ᐊ CANADIAN SYLLABICS A
UCP 140B, Lo,   'Aa',              ; ᐋ CANADIAN SYLLABICS AA
UCP 1431, Lo,   'Pi',              ; ᐱ CANADIAN SYLLABICS PI
UCP 1432, Lo,   'Pii',             ; ᐲ CANADIAN SYLLABICS PII
UCP 1433, Lo,   'Po',              ; ᐳ CANADIAN SYLLABICS PO
UCP 1434, Lo,   'Poo',             ; ᐴ CANADIAN SYLLABICS POO
UCP 1438, Lo,   'Pa',              ; ᐸ CANADIAN SYLLABICS PA
UCP 1439, Lo,   'Paa',             ; ᐹ CANADIAN SYLLABICS PAA
UCP 1449, Lo,   'P',               ; ᑉ CANADIAN SYLLABICS P
UCP 144E, Lo,   'Ti',              ; ᑎ CANADIAN SYLLABICS TI
UCP 144F, Lo,   'Tii',             ; ᑏ CANADIAN SYLLABICS TII
UCP 1450, Lo,   'To',              ; ᑐ CANADIAN SYLLABICS TO
UCP 1451, Lo,   'Too',             ; ᑑ CANADIAN SYLLABICS TOO
UCP 1455, Lo,   'Ta',              ; ᑕ CANADIAN SYLLABICS TA
UCP 1456, Lo,   'Taa',             ; ᑖ CANADIAN SYLLABICS TAA
UCP 1466, Lo,   'T',               ; ᑦ CANADIAN SYLLABICS T
UCP 146D, Lo,   'Ki',              ; ᑭ CANADIAN SYLLABICS KI
UCP 146E, Lo,   'Kii',             ; ᑮ CANADIAN SYLLABICS KII
UCP 146F, Lo,   'Ko',              ; ᑯ CANADIAN SYLLABICS KO
UCP 1470, Lo,   'Koo',             ; ᑰ CANADIAN SYLLABICS KOO
UCP 1472, Lo,   'Ka',              ; ᑲ CANADIAN SYLLABICS KA
UCP 1473, Lo,   'Kaa',             ; ᑳ CANADIAN SYLLABICS KAA
UCP 1483, Lo,   'K',               ; ᒃ CANADIAN SYLLABICS K
UCP 148B, Lo,   'Ci',              ; ᒋ CANADIAN SYLLABICS CI
UCP 148C, Lo,   'Cii',             ; ᒌ CANADIAN SYLLABICS CII
UCP 148D, Lo,   'Co',              ; ᒍ CANADIAN SYLLABICS CO
UCP 148E, Lo,   'Coo',             ; ᒎ CANADIAN SYLLABICS COO
UCP 1490, Lo,   'Ca',              ; ᒐ CANADIAN SYLLABICS CA
UCP 1491, Lo,   'Caa',             ; ᒑ CANADIAN SYLLABICS CAA
UCP 14A1, Lo,   'C',               ; ᒡ CANADIAN SYLLABICS C
UCP 14A5, Lo,   'Mi',              ; ᒥ CANADIAN SYLLABICS MI
UCP 14A6, Lo,   'Mii',             ; ᒦ CANADIAN SYLLABICS MII
UCP 14A7, Lo,   'Mo',              ; ᒧ CANADIAN SYLLABICS MO
UCP 14A8, Lo,   'Moo',             ; ᒨ CANADIAN SYLLABICS MOO
UCP 14AA, Lo,   'Ma',              ;  CANADIAN SYLLABICS MA
UCP 14AB, Lo,   'Maa',             ;  CANADIAN SYLLABICS MAA
UCP 14BB, Lo,   'M',               ;  CANADIAN SYLLABICS M
UCP 14C2, Lo,   'Ni',              ;  CANADIAN SYLLABICS NI
UCP 14C3, Lo,   'Nii',             ;  CANADIAN SYLLABICS NII
UCP 14C4, Lo,   'No',              ;  CANADIAN SYLLABICS NO
UCP 14C5, Lo,   'Noo',             ;  CANADIAN SYLLABICS NOO
UCP 14C7, Lo,   'Na',              ;  CANADIAN SYLLABICS NA
UCP 14C8, Lo,   'Naa',             ;  CANADIAN SYLLABICS NAA
UCP 14D0, Lo,   'N',               ;  CANADIAN SYLLABICS N
UCP 14D5, Lo,   'Li',              ;  CANADIAN SYLLABICS LI
UCP 14D6, Lo,   'Lii',             ;  CANADIAN SYLLABICS LII
UCP 14D7, Lo,   'Lo',              ;  CANADIAN SYLLABICS LO
UCP 14D8, Lo,   'Loo',             ;  CANADIAN SYLLABICS LOO
UCP 14DA, Lo,   'La',              ;  CANADIAN SYLLABICS LA
UCP 14DB, Lo,   'Laa',             ;  CANADIAN SYLLABICS LAA
UCP 14EA, Lo,   'L',               ;  CANADIAN SYLLABICS L
UCP 14EF, Lo,   'Si',              ;  CANADIAN SYLLABICS SI
UCP 14F0, Lo,   'Sii',             ;  CANADIAN SYLLABICS SII
UCP 14F1, Lo,   'So',              ;  CANADIAN SYLLABICS SO
UCP 14F2, Lo,   'Soo',             ;  CANADIAN SYLLABICS SOO
UCP 14F4, Lo,   'Sa',              ;  CANADIAN SYLLABICS SA
UCP 14F5, Lo,   'Saa',             ;  CANADIAN SYLLABICS SAA
UCP 1505, Lo,   'S',               ;  CANADIAN SYLLABICS S
UCP 1528, Lo,   'Yi',              ;  CANADIAN SYLLABICS YI
UCP 1529, Lo,   'Yii',             ;  CANADIAN SYLLABICS YII
UCP 152A, Lo,   'Yo',              ;  CANADIAN SYLLABICS YO
UCP 152B, Lo,   'Yoo',             ;  CANADIAN SYLLABICS YOO
UCP 152D, Lo,   'Ya',              ;  CANADIAN SYLLABICS YA
UCP 152E, Lo,   'Yaa',             ;  CANADIAN SYLLABICS YAA
UCP 153E, Lo,   'Y',               ;  CANADIAN SYLLABICS Y
UCP 1546, Lo,   'Ri',              ;  CANADIAN SYLLABICS RI
UCP 1547, Lo,   'Rii',             ;  CANADIAN SYLLABICS RII
UCP 1548, Lo,   'Ro',              ;  CANADIAN SYLLABICS RO
UCP 1549, Lo,   'Roo',             ;  CANADIAN SYLLABICS ROO
UCP 154B, Lo,   'Ra',              ;  CANADIAN SYLLABICS RA
UCP 154C, Lo,   'Raa',             ;  CANADIAN SYLLABICS RAA
UCP 1550, Lo,   'R',               ;  CANADIAN SYLLABICS R
UCP 1555, Lo,   'Fi',              ;  CANADIAN SYLLABICS FI
UCP 1556, Lo,   'Fii',             ;  CANADIAN SYLLABICS FII
UCP 1557, Lo,   'Fo',              ;  CANADIAN SYLLABICS FO
UCP 1558, Lo,   'Foo',             ;  CANADIAN SYLLABICS FOO
UCP 1559, Lo,   'Fa',              ;  CANADIAN SYLLABICS FA
UCP 155A, Lo,   'Faa',             ;  CANADIAN SYLLABICS FAA
UCP 155D, Lo,   'F',               ;  CANADIAN SYLLABICS F
UCP 157C, Lo,   'H',               ;  CANADIAN SYLLABICS NUNAVUT H
UCP 157F, Lo,   'Qi',              ;  CANADIAN SYLLABICS QI
UCP 1580, Lo,   'Qii',             ;  CANADIAN SYLLABICS QII
UCP 1581, Lo,   'Qo',              ;  CANADIAN SYLLABICS QO
UCP 1582, Lo,   'Qoo',             ;  CANADIAN SYLLABICS QOO
UCP 1583, Lo,   'Qa',              ;  CANADIAN SYLLABICS QA
UCP 1584, Lo,   'Qaa',             ;  CANADIAN SYLLABICS QAA
UCP 1585, Lo,   'Q',               ;  CANADIAN SYLLABICS Q
UCP 158F, Lo,   'Ngi',             ;  CANADIAN SYLLABICS NGI
UCP 1590, Lo,   'Ngii',            ;  CANADIAN SYLLABICS NGII
UCP 1591, Lo,   'Ngo',             ;  CANADIAN SYLLABICS NGO
UCP 1592, Lo,   'Ngoo',            ;  CANADIAN SYLLABICS NGOO
UCP 1593, Lo,   'Nga',             ;  CANADIAN SYLLABICS NGA
UCP 1594, Lo,   'Ngaa',            ;  CANADIAN SYLLABICS NGAA
UCP 1595, Lo,   'Ng',              ;  CANADIAN SYLLABICS NG
UCP 1596, Lo,   'Nng',             ;  CANADIAN SYLLABICS NNG
UCP 15A0, Lo,   'Lhi',             ;  CANADIAN SYLLABICS LHI
UCP 15A1, Lo,   'Lhii',            ;  CANADIAN SYLLABICS LHII
UCP 15A2, Lo,   'Lho',             ;  CANADIAN SYLLABICS LHO
UCP 15A3, Lo,   'Lhoo',            ;  CANADIAN SYLLABICS LHOO
UCP 15A4, Lo,   'Lha',             ;  CANADIAN SYLLABICS LHA
UCP 15A5, Lo,   'Lhaa',            ;  CANADIAN SYLLABICS LHAA
UCP 15A6, Lo,   'Lh',              ;  CANADIAN SYLLABICS LH
UCP 1671, Lo,   'Nngi',            ;  CANADIAN SYLLABICS NNGI
UCP 1672, Lo,   'Ngii',            ;  CANADIAN SYLLABICS NNGII
UCP 1673, Lo,   'Nngo',            ;  CANADIAN SYLLABICS NNGO
UCP 1674, Lo,   'Ngoo',            ;  CANADIAN SYLLABICS NNGOO
UCP 1675, Lo,   'Nnga',            ;  CANADIAN SYLLABICS NNGA
UCP 1676, Lo,   'Ngaa',            ;  CANADIAN SYLLABICS NNGAA
UCP 1E02, Lu,   'B',               ;  LATIN CAPITAL LETTER B WITH DOT ABOVE
UCP 1E03, Ll,   'b',               ;  LATIN SMALL LETTER B WITH DOT ABOVE
UCP 1E0A, Lu,   'D',               ;  LATIN CAPITAL LETTER D WITH DOT ABOVE
UCP 1E0B, Ll,   'd',               ;  LATIN SMALL LETTER D WITH DOT ABOVE
UCP 1E1E, Lu,   'F',               ;  LATIN CAPITAL LETTER F WITH DOT ABOVE
UCP 1E1F, Ll,   'f',               ;  LATIN SMALL LETTER F WITH DOT ABOVE
UCP 1E40, Lu,   'M',               ;  LATIN CAPITAL LETTER M WITH DOT ABOVE
UCP 1E41, Ll,   'm',               ;  LATIN SMALL LETTER M WITH DOT ABOVE
UCP 1E56, Lu,   'P',               ;  LATIN CAPITAL LETTER P WITH DOT ABOVE
UCP 1E57, Ll,   'p',               ;  LATIN SMALL LETTER P WITH DOT ABOVE
UCP 1E60, Lu,   'S',               ;  LATIN CAPITAL LETTER S WITH DOT ABOVE
UCP 1E61, Ll,   's',               ;  LATIN SMALL LETTER S WITH DOT ABOVE
UCP 1E6A, Lu,   'T',               ;  LATIN CAPITAL LETTER T WITH DOT ABOVE
UCP 1E6B, Ll,   't',               ;  LATIN SMALL LETTER T WITH DOT ABOVE
UCP 1E80, Lu,   'W',               ;  LATIN CAPITAL LETTER W WITH GRAVE
UCP 1E81, Ll,   'w',               ;  LATIN SMALL LETTER W WITH GRAVE
UCP 1E82, Lu,   'W',               ;  LATIN CAPITAL LETTER W WITH ACUTE
UCP 1E83, Ll,   'w',               ;  LATIN SMALL LETTER W WITH ACUTE
UCP 1E84, Lu,   'W',               ;  LATIN CAPITAL LETTER W WITH DIAERESIS
UCP 1E85, Ll,   'w',               ;  LATIN SMALL LETTER W WITH DIAERESIS
UCP 1E9B, Ll,   's',               ;  LATIN SMALL LETTER LONG S WITH DOT ABOVE
UCP 1EF2, Lu,   'Y',               ;  LATIN CAPITAL LETTER Y WITH GRAVE
UCP 1EF3, Ll,   'y',               ;  LATIN SMALL LETTER Y WITH GRAVE
UCP 2002, Zs,   ' ',   ensp        ;  EN SPACE
UCP 2003, Zs,   ' ',   emsp        ;  EM SPACE
UCP 2009, Zs,   ' ',   thinsp      ;  THIN SPACE
UCP 200B, Cf,   '',                ;  ZERO WIDTH SPACE
UCP 200C, Cf,   '',    zwnj        ;  ZERO WIDTH NON-JOINER
UCP 200D, Cf,   '',    zwj         ;  ZERO WIDTH JOINER
UCP 200E, Cf,   '',    lrm         ;  LEFT-TO-RIGHT MARK
UCP 200F, Cf,   '',    rlm         ;  RIGHT-TO-LEFT MARK
UCP 2013, Pd,   '-',   ndash       ;  EN DASH
UCP 2014, Pd,   '-',   mdash       ;  EM DASH
UCP 2015, Pd,   '-',               ;  HORIZONTAL BAR
UCP 2017, Po,   '_',               ;  DOUBLE LOW LINE
UCP 2018, Pi,   "'",   lsquo       ;  LEFT SINGLE QUOTATION MARK
UCP 2019, Pf,   "'",   rsquo       ;  RIGHT SINGLE QUOTATION MARK
UCP 201A, Ps,   "'",   sbquo       ;  SINGLE LOW-9 QUOTATION MARK
UCP 201C, Pi,   '"',   ldquo       ;  LEFT DOUBLE QUOTATION MARK
UCP 201D, Pf,   '"',   rdquo       ;  RIGHT DOUBLE QUOTATION MARK
UCP 201E, Ps,   '"',   bdquo       ;  DOUBLE LOW-9 QUOTATION MARK
UCP 2020, Po,   '+',   dagger      ;  DAGGER
UCP 2021, Po,   '+',   Dagger      ;  DOUBLE DAGGER
UCP 2022, Po,   '.',   bull        ;  BULLET
UCP 2026, Po,   '...', hellip      ;  HORIZONTAL ELLIPSIS
UCP 202A, Cf,   '',                ;   LEFT-TO-RIGHT EMBEDDING
UCP 202B, Cf,   '',                ;   RIGHT-TO-LEFT EMBEDDING
UCP 202C, Cf,   '',                ;   POP DIRECTIONAL FORMATTING
UCP 202D, Cf,   '',                ;   LEFT-TO-RIGHT OVERRIDE
UCP 202E, Cf,   '',                ;   RIGHT-TO-LEFT OVERRIDE
UCP 2030, Po,   '%',   permil      ;  PER MILLE SIGN
UCP 2032, Po,   "'",   prime       ;  PRIME
UCP 2033, Po,   '"',   Prime       ;  DOUBLE PRIME
UCP 2039, Pi,   '<',   lsaquo      ;  SINGLE LEFT-POINTING ANGLE QUOTATION MARK
UCP 203A, Pf,   '>',   rsaquo      ;  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
UCP 203E, Po,   '_',   oline       ;  OVERLINE
UCP 2044, Sm,   '/',   frasl       ;  FRACTION SLASH
UCP 204A, Po,   '@',               ;  TIRONIAN SIGN ET
UCP 2060, Cf,   '',                ;  WORD JOINER
UCP 207F, Lm,   '`',               ;  SUPERSCRIPT LATIN SMALL LETTER N
UCP 20A7, Sc,   '$',               ;  PESETA SIGN
UCP 20AA, Sc,   '$',               ;  NEW SHEQEL SIGN
UCP 20AB, Sc,   '$',               ;  DONG SIGN
UCP 20AC, Sc,   '$',   euro        ;  EURO SIGN
UCP 20AF, Sc,   '$',               ;  DRACHMA SIGN
UCP 2113, Ll,   'l',               ;  SCRIPT SMALL L
UCP 2116, So,   'N',               ;  NUMERO SIGN
UCP 2122, So,   '(TM)',trade       ;  TRADE MARK SIGN
UCP 2126, Lu,   'O',               ;  OHM SIGN
UCP 2190, Sm,   '<',   larr        ;  LEFTWARDS ARROW
UCP 2191, Sm,   '^',   uarr        ;  UPWARDS ARROW
UCP 2192, Sm,   '>',   rarr        ;  RIGHTWARDS ARROW
UCP 2193, Sm,   'v',   darr        ;  DOWNWARDS ARROW
UCP 2194, Sm,   '-',   harr        ;  LEFT RIGHT ARROW
UCP 2195, So,   '|',               ;  UP DOWN ARROW
UCP 21B5, So,   '<',   crarr       ;  DOWNWARDS ARROW WITH CORNER LEFTWARDS
UCP 2200, Sm,   'v',   forall      ;  FOR ALL
UCP 2202, Sm,   'd',   part        ;  PARTIAL DIFFERENTIAL
UCP 2203, Sm,   'E',   exist       ;  THERE EXISTS
UCP 2205, Sm,   '/',   empty       ;  EMPTY SET
UCP 2206, Sm,   '#',               ;  INCREMENT
UCP 2207, Sm,   '.',   nabla       ;  NABLA
UCP 2208, Sm,   'E',   isin        ;  ELEMENT OF
UCP 2209, Sm,   '/',   notin       ;  NOT AN ELEMENT OF
UCP 220B, Sm,   'E',   ni          ;  CONTAINS AS MEMBER
UCP 220F, Sm,   '#',   prod        ;  N-ARY PRODUCT
UCP 2211, Sm,   '#',   sum         ;  N-ARY SUMMATION
UCP 2212, Sm,   '-',   minus       ;  MINUS SIGN
UCP 2217, Sm,   '*',   lowast      ;  ASTERISK OPERATOR
UCP 2219, Sm,   '.',               ;  BULLET OPERATOR
UCP 221A, Sm,   '#',   radic       ;  SQUARE ROOT
UCP 221D, Sm,   'o',   prop        ;  PROPORTIONAL TO
UCP 221E, Sm,   '#',   infin       ;  INFINITY
UCP 2220, Sm,   '<',   ang         ;  ANGLE
UCP 2227, Sm,   '&',   and         ;  LOGICAL AND
UCP 2228, Sm,   '|',   or          ;  LOGICAL OR
UCP 2229, Sm,   '#',   cap         ;  INTERSECTION
UCP 222A, Sm,   'U',   cup         ;  UNION
UCP 222B, Sm,   '/',   int         ;  INTEGRAL
UCP 2234, Sm,   '.',   there4      ;  THEREFORE
UCP 223C, Sm,   '~',   sim         ;  TILDE OPERATOR
UCP 2245, Sm,   '~',   cong        ;  APPROXIMATELY EQUAL TO
UCP 2248, Sm,   '=',   asymp       ;  ALMOST EQUAL TO
UCP 2260, Sm,   '=',   ne          ;  NOT EQUAL TO
UCP 2261, Sm,   '=',   equiv       ;  IDENTICAL TO
UCP 2264, Sm,   '<',   le          ;  LESS-THAN OR EQUAL TO
UCP 2265, Sm,   '>',   ge          ;  GREATER-THAN OR EQUAL TO
UCP 2282, Sm,   '<',   sub         ;  SUBSET OF
UCP 2283, Sm,   '>',   sup         ;  SUPERSET OF
UCP 2284, Sm,   '/',   nsub        ;  NOT A SUBSET OF
UCP 2286, Sm,   '<=',  sube        ;  SUBSET OF OR EQUAL TO
UCP 2287, Sm,   '=>',  supe        ;  SUPERSET OF OR EQUAL TO
UCP 2295, Sm,   '+',   oplus       ;  CIRCLED PLUS
UCP 2296, Sm,   '-',   ominus      ;  CIRCLED MINUS
UCP 2297, Sm,   '.',   otimes      ;  CIRCLED TIMES
UCP 22A5, Sm,   '_',   perp        ;  UP TACK
UCP 22C5, Sm,   '.',   sdot        ;  DOT OPERATOR
UCP 2308, Sm,   '|',   lceil       ;  LEFT CEILING
UCP 2309, Sm,   '|',   rceil       ;  RIGHT CEILING
UCP 230A, Sm,   '|',   lfloor      ;  LEFT FLOOR
UCP 230B, Sm,   '|',   rfloor      ;  RIGHT FLOOR
UCP 2310, So,   '^',               ;  REVERSED NOT SIGN
UCP 2320, Sm,   '/',               ;  TOP HALF INTEGRAL
UCP 2321, Sm,   '/',               ;  BOTTOM HALF INTEGRAL
UCP 2500, So,   '-',               ;  BOX DRAWINGS LIGHT HORIZONTAL
UCP 2502, So,   '|',               ;  BOX DRAWINGS LIGHT VERTICAL
UCP 250C, So,   '+',               ;  BOX DRAWINGS LIGHT DOWN AND RIGHT
UCP 2510, So,   '+',               ;  BOX DRAWINGS LIGHT DOWN AND LEFT
UCP 2514, So,   '+',               ;  BOX DRAWINGS LIGHT UP AND RIGHT
UCP 2518, So,   '+',               ;  BOX DRAWINGS LIGHT UP AND LEFT
UCP 251C, So,   '+',               ;  BOX DRAWINGS LIGHT VERTICAL AND RIGHT
UCP 2524, So,   '+',               ;  BOX DRAWINGS LIGHT VERTICAL AND LEFT
UCP 252C, So,   '+',               ;  BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
UCP 2534, So,   '+',               ;  BOX DRAWINGS LIGHT UP AND HORIZONTAL
UCP 253C, So,   '+',               ;  BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
UCP 2550, So,   '-',               ;  BOX DRAWINGS DOUBLE HORIZONTAL
UCP 2551, So,   '|',               ;  BOX DRAWINGS DOUBLE VERTICAL
UCP 2552, So,   '+',               ;  BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
UCP 2553, So,   '+',               ;  BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
UCP 2554, So,   '+',               ;  BOX DRAWINGS DOUBLE DOWN AND RIGHT
UCP 2555, So,   '+',               ;  BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
UCP 2556, So,   '+',               ;  BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
UCP 2557, So,   '+',               ;  BOX DRAWINGS DOUBLE DOWN AND LEFT
UCP 2558, So,   '+',               ;  BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
UCP 2559, So,   '+',               ;  BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
UCP 255A, So,   '+',               ;  BOX DRAWINGS DOUBLE UP AND RIGHT
UCP 255B, So,   '+',               ;  BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
UCP 255C, So,   '+',               ;  BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
UCP 255D, So,   '+',               ;  BOX DRAWINGS DOUBLE UP AND LEFT
UCP 255E, So,   '+',               ;  BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
UCP 255F, So,   '+',               ;  BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
UCP 2560, So,   '+',               ;  BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
UCP 2561, So,   '+',               ;  BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
UCP 2562, So,   '+',               ;  BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
UCP 2563, So,   '+',               ;  BOX DRAWINGS DOUBLE VERTICAL AND LEFT
UCP 2564, So,   '+',               ;  BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
UCP 2565, So,   '+',               ;  BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
UCP 2566, So,   '+',               ;  BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
UCP 2567, So,   '+',               ;  BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
UCP 2568, So,   '+',               ;  BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
UCP 2569, So,   '+',               ;  BOX DRAWINGS DOUBLE UP AND HORIZONTAL
UCP 256A, So,   '+',               ;  BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
UCP 256B, So,   '+',               ;  BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
UCP 256C, So,   '+',               ;  BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
UCP 2580, So,   '*',               ;  UPPER HALF BLOCK
UCP 2584, So,   '*',               ;  LOWER HALF BLOCK
UCP 2588, So,   '*',               ;  FULL BLOCK
UCP 258C, So,   '*',               ;  LEFT HALF BLOCK
UCP 2590, So,   '*',               ;  RIGHT HALF BLOCK
UCP 2591, So,   '*',               ;  LIGHT SHADE
UCP 2592, So,   '*',               ;  MEDIUM SHADE
UCP 2593, So,   '*',               ;  DARK SHADE
UCP 25A0, So,   '*',               ;  BLACK SQUARE
UCP 25CA, So,   '#',   loz         ;  LOZENGE
UCP 2618, So,   '*',               ;  SHAMROCK
UCP 2660, So,   '*',   spades      ;  BLACK SPADE SUIT
UCP 2663, So,   '*',   clubs       ;  BLACK CLUB SUIT
UCP 2665, So,   '*',   hearts      ;  BLACK HEART SUIT
UCP 2666, So,   '*',   diams       ;  BLACK DIAMOND SUIT
UCP 274A, So,   '*',               ;  EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
UCP F8FF, Co,   '*',               ;  Private Use, Last
UCP FB01, Ll,   'fi',              ;  LATIN SMALL LIGATURE FI
UCP FB02, Ll,   'fl',              ;  LATIN SMALL LIGATURE FL
UCP FB2A, Lo,   'Sh',              ;  HEBREW LETTER SHIN WITH SHIN DOT
UCP FB2B, Lo,   'Sh',              ;  HEBREW LETTER SHIN WITH SIN DOT
UCP FB35, Lo,   'V',               ;  HEBREW LETTER VAV WITH DAGESH
UCP FB4B, Lo,   'V',               ;  HEBREW LETTER VAV WITH HOLAM
UCP FB56, Lo,   'P',               ;  ARABIC LETTER PEH ISOLATED FORM
UCP FB58, Lo,   'P',               ;  ARABIC LETTER PEH INITIAL FORM
UCP FB66, Lo,   'T',               ;  ARABIC LETTER TTEH ISOLATED FORM
UCP FB68, Lo,   'T',               ;  ARABIC LETTER TTEH INITIAL FORM
UCP FB7A, Lo,   'Ch',              ;  ARABIC LETTER TCHEH ISOLATED FORM
UCP FB7C, Lo,   'Ch',              ;  ARABIC LETTER TCHEH INITIAL FORM
UCP FB84, Lo,   'D',               ;  ARABIC LETTER DAHAL ISOLATED FORM
UCP FB88, Lo,   'D',               ;  ARABIC LETTER DDAL ISOLATED FORM
UCP FB8A, Lo,   'J',               ;  ARABIC LETTER JEH ISOLATED FORM
UCP FB8C, Lo,   'R',               ;  ARABIC LETTER RREH ISOLATED FORM
UCP FB8E, Lo,   'K',               ;  ARABIC LETTER KEHEH ISOLATED FORM
UCP FB92, Lo,   'G',               ;  ARABIC LETTER GAF ISOLATED FORM
UCP FB94, Lo,   'G',               ;  ARABIC LETTER GAF INITIAL FORM
UCP FB9E, Lo,   'N',               ;  ARABIC LETTER NOON GHUNNA ISOLATED FORM
UCP FBA6, Lo,   'H',               ;  ARABIC LETTER HEH GOAL ISOLATED FORM
UCP FBA8, Lo,   'H',               ;  ARABIC LETTER HEH GOAL INITIAL FORM
UCP FBA9, Lo,   'H',               ;  ARABIC LETTER HEH GOAL MEDIAL FORM
UCP FBAA, Lo,   'H',               ;  ARABIC LETTER HEH DOACHASHMEE ISOLATED FORM
UCP FBAE, Lo,   'Ye',              ;  ARABIC LETTER YEH BARREE ISOLATED FORM
UCP FBB0, Lo,   'Ye',              ;  ARABIC LETTER YEH BARREE WITH HAMZA ABOVE ISOLATED FORM
UCP FBFC, Lo,   'Ye',              ;  ARABIC LETTER FARSI YEH ISOLATED FORM
UCP FBFD, Lo,   'Ye',              ;  ARABIC LETTER FARSI YEH FINAL FORM
UCP FBFE, Lo,   'Ye',              ;  ARABIC LETTER FARSI YEH INITIAL FORM
UCP FE7C, Lo,   'Sh',              ;  ARABIC SHADDA ISOLATED FORM
UCP FE7D, Lo,   '`',               ;  ARABIC SHADDA MEDIAL FORM
UCP FE80, Lo,   "'",               ;  ARABIC LETTER HAMZA ISOLATED FORM
UCP FE81, Lo,   'A',               ;  ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
UCP FE82, Lo,   'A',               ;  ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM
UCP FE83, Lo,   'A',               ;  ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
UCP FE84, Lo,   'A',               ;  ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM
UCP FE85, Lo,   'W',               ;  ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
UCP FE89, Lo,   'Ye',              ;  ARABIC LETTER YEH WITH HAMZA ABOVE ISOLATED FORM
UCP FE8A, Lo,   'Ye',              ;  ARABIC LETTER YEH WITH HAMZA ABOVE FINAL FORM
UCP FE8B, Lo,   'Y',               ;  ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
UCP FE8D, Lo,   'A',               ;  ARABIC LETTER ALEF ISOLATED FORM
UCP FE8E, Lo,   'A',               ;  ARABIC LETTER ALEF FINAL FORM
UCP FE8F, Lo,   'B',               ;  ARABIC LETTER BEH ISOLATED FORM
UCP FE91, Lo,   'B',               ;  ARABIC LETTER BEH INITIAL FORM
UCP FE93, Lo,   'T',               ;  ARABIC LETTER TEH MARBUTA ISOLATED FORM
UCP FE95, Lo,   'T',               ;  ARABIC LETTER TEH ISOLATED FORM
UCP FE97, Lo,   'T',               ;  ARABIC LETTER TEH INITIAL FORM
UCP FE99, Lo,   'Th',              ;  ARABIC LETTER THEH ISOLATED FORM
UCP FE9B, Lo,   'Th',              ;  ARABIC LETTER THEH INITIAL FORM
UCP FE9D, Lo,   'J',               ;  ARABIC LETTER JEEM ISOLATED FORM
UCP FE9F, Lo,   'J',               ;  ARABIC LETTER JEEM INITIAL FORM
UCP FEA1, Lo,   'H',               ;  ARABIC LETTER HAH ISOLATED FORM
UCP FEA3, Lo,   'H',               ;  ARABIC LETTER HAH INITIAL FORM
UCP FEA5, Lo,   'Kh',              ;  ARABIC LETTER KHAH ISOLATED FORM
UCP FEA7, Lo,   'Kh',              ;  ARABIC LETTER KHAH INITIAL FORM
UCP FEA9, Lo,   'D',               ;  ARABIC LETTER DAL ISOLATED FORM
UCP FEAB, Lo,   'Dh',              ;  ARABIC LETTER THAL ISOLATED FORM
UCP FEAD, Lo,   'R',               ;  ARABIC LETTER REH ISOLATED FORM
UCP FEAF, Lo,   'Z',               ;  ARABIC LETTER ZAIN ISOLATED FORM
UCP FEB1, Lo,   'S',               ;  ARABIC LETTER SEEN ISOLATED FORM
UCP FEB3, Lo,   'S',               ;  ARABIC LETTER SEEN INITIAL FORM
UCP FEB5, Lo,   'Sh',              ;  ARABIC LETTER SHEEN ISOLATED FORM
UCP FEB7, Lo,   'Sh',              ;  ARABIC LETTER SHEEN INITIAL FORM
UCP FEB9, Lo,   'S',               ;  ARABIC LETTER SAD ISOLATED FORM
UCP FEBB, Lo,   'S',               ;  ARABIC LETTER SAD INITIAL FORM
UCP FEBD, Lo,   'D',               ;  ARABIC LETTER DAD ISOLATED FORM
UCP FEBF, Lo,   'D',               ; ﺿ ARABIC LETTER DAD INITIAL FORM
UCP FEC1, Lo,   'T',               ;  ARABIC LETTER TAH ISOLATED FORM
UCP FEC3, Lo,   'T',               ;  ARABIC LETTER TAH INITIAL FORM
UCP FEC5, Lo,   'Z',               ;  ARABIC LETTER ZAH ISOLATED FORM
UCP FEC7, Lo,   'Z',               ;  ARABIC LETTER ZAH INITIAL FORM
UCP FEC9, Lo,   "'",               ;  ARABIC LETTER AIN ISOLATED FORM
UCP FECA, Lo,   "'",               ;  ARABIC LETTER AIN FINAL FORM
UCP FECB, Lo,   "'",               ;  ARABIC LETTER AIN INITIAL FORM
UCP FECC, Lo,   "'",               ;  ARABIC LETTER AIN MEDIAL FORM
UCP FECD, Lo,   'Gh',              ;  ARABIC LETTER GHAIN ISOLATED FORM
UCP FECE, Lo,   'Gh',              ;  ARABIC LETTER GHAIN FINAL FORM
UCP FECF, Lo,   'Gh',              ;  ARABIC LETTER GHAIN INITIAL FORM
UCP FED0, Lo,   'Gh',              ;  ARABIC LETTER GHAIN MEDIAL FORM
UCP FED1, Lo,   'F',               ;  ARABIC LETTER FEH ISOLATED FORM
UCP FED3, Lo,   'F',               ;  ARABIC LETTER FEH INITIAL FORM
UCP FED5, Lo,   'Q',               ;  ARABIC LETTER QAF ISOLATED FORM
UCP FED7, Lo,   'Q',               ;  ARABIC LETTER QAF INITIAL FORM
UCP FED9, Lo,   'K',               ;  ARABIC LETTER KAF ISOLATED FORM
UCP FEDB, Lo,   'K',               ;  ARABIC LETTER KAF INITIAL FORM
UCP FEDD, Lo,   'L',               ;  ARABIC LETTER LAM ISOLATED FORM
UCP FEDF, Lo,   'L',               ;  ARABIC LETTER LAM INITIAL FORM
UCP FEE0, Lo,   'L',               ;  ARABIC LETTER LAM MEDIAL FORM
UCP FEE1, Lo,   'M',               ;  ARABIC LETTER MEEM ISOLATED FORM
UCP FEE3, Lo,   'M',               ;  ARABIC LETTER MEEM INITIAL FORM
UCP FEE5, Lo,   'N',               ;  ARABIC LETTER NOON ISOLATED FORM
UCP FEE7, Lo,   'N',               ;  ARABIC LETTER NOON INITIAL FORM
UCP FEE9, Lo,   'H',               ;  ARABIC LETTER HEH ISOLATED FORM
UCP FEEB, Lo,   'H',               ;  ARABIC LETTER HEH INITIAL FORM
UCP FEEC, Lo,   'H',               ;  ARABIC LETTER HEH MEDIAL FORM
UCP FEED, Lo,   'W',               ;  ARABIC LETTER WAW ISOLATED FORM
UCP FEEF, Lo,   'A',               ;  ARABIC LETTER ALEF MAKSURA ISOLATED FORM
UCP FEF0, Lo,   'A',               ;  ARABIC LETTER ALEF MAKSURA FINAL FORM
UCP FEF1, Lo,   'Y',               ;  ARABIC LETTER YEH ISOLATED FORM
UCP FEF2, Lo,   'Y',               ;  ARABIC LETTER YEH FINAL FORM
UCP FEF3, Lo,   'Y',               ;  ARABIC LETTER YEH INITIAL FORM
UCP FEF5, Lo,   'LA',              ;  ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM
UCP FEF6, Lo,   'LA',              ;  ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE FINAL FORM
UCP FEF7, Lo,   'LA',              ;  ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
UCP FEF8, Lo,   'LA',              ;  ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM
UCP FEFB, Lo,   'LA',              ;  ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
UCP FEFC, Lo,   'LA',              ;  ARABIC LIGATURE LAM WITH ALEF FINAL FORM
UCP FEFF, Bm,   '',                ;   ZERO WIDTH NO-BREAK SPACE
UCP FFFD, ??,   '?',               ; ? Replacement of malformed input character.
UCP FFFE, ??,   '',                ;   Character is undefined in output encoding.
UCP FFFF, ??,   '' ,               ;   Not a valid Unicode character
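The UCP rows above can be read as records of codepoint, Unicode general category, ASCII replacement and an optional name that looks like an HTML entity name. The following Python sketch is only an illustration of how such records might drive an ASCII fallback and an entity lookup; it is not the EuroConvertor implementation. The names UCP_ROWS, to_ascii and ENTITY_TO_CP are hypothetical, the handful of rows is copied from the table above, and the use of '?' for characters missing from the table is an assumption.

```python
# Illustrative model of a few UCP rows: (codepoint, category, ASCII fallback, entity name).
# The rows are copied from the table above; the helper names are hypothetical.
UCP_ROWS = [
    (0x2013, "Pd", "-",    "ndash"),
    (0x2026, "Po", "...",  "hellip"),
    (0x20AC, "Sc", "$",    "euro"),
    (0x2122, "So", "(TM)", "trade"),
    (0x200B, "Cf", "",     None),   # ZERO WIDTH SPACE is replaced by nothing
    (0xFB01, "Ll", "fi",   None),   # LATIN SMALL LIGATURE FI
]

# Lookup from codepoint to its ASCII replacement string.
ASCII_FALLBACK = {cp: repl for cp, _cat, repl, _ent in UCP_ROWS}

# Lookup from entity name to codepoint (assumed use of the fourth operand).
ENTITY_TO_CP = {ent: cp for cp, _cat, _repl, ent in UCP_ROWS if ent}


def to_ascii(text, unknown="?"):
    """Replace every non-ASCII character with its fallback from the table.

    Characters that are not in the table become `unknown` (an assumption).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)              # plain ASCII passes through unchanged
        else:
            out.append(ASCII_FALLBACK.get(cp, unknown))
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("Price 5\u20ac\u2026"))
    print(hex(ENTITY_TO_CP["trade"]))
```

A dictionary keyed by codepoint is used here for brevity; the assembled program more likely searches a static table, which this sketch does not try to reproduce.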
CP CPid, CPname, CPrem, CPurl, CPtt
Macro CP populates, at assembly time, the data sections that define code page encodings.
Input
CPid is a numeric identifier 0..65535 following the Microsoft assignment.
CPname is the display name of the encoding, e.g. "ISO-8859-2".
CPrem defines alternative names (also known as) and other remarks, e.g. "Latin 2 (Central European)".
CPurl is the URL of an authoritative source, e.g. https://en.wikipedia.org/wiki/Windows-1250.
CPtt is either empty or a list of 128 words (4 hexadecimal digits each) giving the BMP codepoints of the upper 128 characters (80..FF). ASCII and Unicode encodings omit CPtt; the offset of their translation table is then replaced with -1.
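The role of CPtt can be illustrated with a short Python sketch. It assumes that bytes below 80h are plain ASCII and that byte b in the range 80h..FFh maps to entry b-80h of the table, which is how the description above reads. The function names are hypothetical, and Python's own cp852 codec is used only as a stand-in source for the 128 codepoints; the macro itself takes them as literal operands.

```python
def build_cptt(codec_name):
    """Return the 128 code points for bytes 0x80..0xFF, in CPtt order.

    Uses a Python codec as a stand-in for the hexadecimal operands of macro CP.
    Raises UnicodeDecodeError if the codec leaves some byte undefined.
    """
    upper_half = bytes(range(0x80, 0x100))
    return [ord(ch) for ch in upper_half.decode(codec_name)]


def format_row(cptt, first_byte):
    """Format 16 table entries the way the CP invocations list them."""
    start = first_byte - 0x80
    return ",".join(f"{cp:04X}" for cp in cptt[start:start + 16])


def decode_byte(byte, cptt):
    """Map one input byte to a Unicode code point.

    cptt is None for encodings without a translation table (ASCII, Unicode).
    """
    if byte < 0x80:
        return byte                     # lower half is assumed to be ASCII
    if cptt is None:
        raise ValueError("encoding has no translation table")
    return cptt[byte - 0x80]


if __name__ == "__main__":
    cptt852 = build_cptt("cp852")
    print(format_row(cptt852, 0x80))    # compare with row 80..8F of CP 852
    print(hex(decode_byte(0x85, cptt852)))
```

Printing a row in this format makes it easy to compare a generated table against the 80..8F, 90..9F, ... lines of a CP invocation.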
CP %MACRO CPid, CPname, CPrem, CPurl, CPtt
[CPid]
  DW %CPid
[CPinfo]
CPname%CPid: DB %CPname,0
[CPname]
 DW CPname%CPid: - SECTION# [CPinfo]
[CPinfo]
CPrem%CPid:  DB %CPrem,0
[CPrem]
 DW CPrem%CPid: - SECTION# [CPinfo]
[CPinfo]
CPurl%CPid:  DB "%CPurl",0
[CPurl]
 DW CPurl%CPid - SECTION# [CPinfo]
%TTlength %SETA %# - 4
  %IF %TTlength = 0 || %TTlength = 128
    %IF %TTlength = 128
      [CPtt]
      CPtt%CPid:
      tt %FOR %*{1+4..128+4}
           DW 0x%tt
         %ENDFOR tt
      [CPtable]
      DW CPtt%CPid: - SECTION# [CPtt]
    %ELSE ; %TTlength=0 in ASCII or Unicode encodings.
      [CPtable]
         DW -1  ; This value signals that there is no translation table.
    %ENDIF
  %ELSE
    %ERROR Invalid CP %CPid %CPname table (%TTlength instead of 128).
  %ENDIF
 %ENDMACRO CP
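The effect of macro CP on the sections can be modelled in Python. The sketch below is an approximation for illustration, not a translation of the assembler: it assumes that each invocation appends one 16-bit word to each of [CPid], [CPname], [CPrem], [CPurl] and [CPtable], so that entries with the same index belong to the same code page, and that the stored words are offsets from the start of [CPinfo] or [CPtt]. The class and method names are hypothetical.

```python
import struct


class CodePageSections:
    """Rough model of the sections filled by macro CP (names are hypothetical)."""

    def __init__(self):
        self.cpid = []               # [CPid]    one word per code page
        self.cpname = []             # [CPname]  offsets into [CPinfo]
        self.cprem = []              # [CPrem]   offsets into [CPinfo]
        self.cpurl = []              # [CPurl]   offsets into [CPinfo]
        self.cptable = []            # [CPtable] offsets into [CPtt], or 0xFFFF
        self.cpinfo = bytearray()    # [CPinfo]  zero-terminated strings
        self.cptt = bytearray()      # [CPtt]    128 little-endian words per table

    def _add_info(self, text):
        """Append a zero-terminated string to [CPinfo] and return its offset."""
        offset = len(self.cpinfo)
        self.cpinfo += text.encode("ascii") + b"\0"
        return offset

    def cp(self, cpid, name, rem, url, tt=()):
        """Mimic one invocation: CP CPid, CPname, CPrem, CPurl, CPtt."""
        if len(tt) not in (0, 128):
            raise ValueError(
                f"Invalid CP {cpid} {name} table ({len(tt)} instead of 128).")
        self.cpid.append(cpid)
        self.cpname.append(self._add_info(name))
        self.cprem.append(self._add_info(rem))
        self.cpurl.append(self._add_info(url))
        if tt:
            self.cptable.append(len(self.cptt))
            self.cptt += struct.pack("<128H", *tt)
        else:
            self.cptable.append(0xFFFF)   # DW -1: no translation table

    def name_of(self, cpid):
        """Find a code page by identifier and read its name from [CPinfo]."""
        index = self.cpid.index(cpid)
        start = self.cpname[index]
        end = self.cpinfo.index(0, start)
        return self.cpinfo[start:end].decode("ascii")


if __name__ == "__main__":
    sections = CodePageSections()
    sections.cp(20127, "ASCII", "7-bit encoding",
                "https://en.wikipedia.org/wiki/ASCII")
    # Placeholder table of 128 words, not real IBM437 data.
    sections.cp(437, "IBM437", "OEM-US",
                "https://en.wikipedia.org/wiki/Code_page_437",
                [0xFFFE] * 128)
    print(sections.name_of(437), sections.cptable)
```

Keeping fixed-size words in parallel sections and variable-length strings in one shared section lets a lookup find the index of an identifier first and then read every other attribute at the same index.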
CodePages
This division defines the data of all encodings supported by EuroConv: encoding identifier, name, remark, URL and translation table.
The data are distributed by macro CP into their sections at assembly time.
Documentation
CodePage identifiers (the first argument of macro CP) use the Microsoft assignment. Other possible nomenclatures:
IBM
Macintosh
Microsoft
MIME
Unofficial CP numeric identifiers (assigned by EuroConv): 667,895,10080,10083,10084,10089,10101.
; Plain ASCII 7bit encoding (American Standard Code for Information Interchange).
CP 20127,"ASCII","7-bit encoding",https://en.wikipedia.org/wiki/ASCII
;
; Unicode encodings (Unicode Transformation Format)
CP 65001,"UTF-8","AKA IBM1208",https://en.wikipedia.org/wiki/UTF-8
CP  1200,"UTF-16LE","UCS-2LE (used in MS Windows)",https://en.wikipedia.org/wiki/UTF-16
CP  1201,"UTF-16BE","UCS-2BE",https://en.wikipedia.org/wiki/UTF-16
CP 12000,"UTF-32LE","UCS-4LE",https://en.wikipedia.org/wiki/UTF-32
CP 12001,"UTF-32BE","UCS-4BE",https://en.wikipedia.org/wiki/UTF-32
;
; OEM (Original Equipment Manufacturer) and ANSI (American National Standards Institute) 8bit encodings.
CP   437,"IBM437","OEM-US",\
https://en.wikipedia.org/wiki/Code_page_437,\
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F
00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00A2,00A3,00A5,20A7,0192,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   667,"Mazovia","AKA CP790, AKA CP991 (Polish)",\
https://en.wikipedia.org/wiki/Mazovia_encoding,\
00C7,00FC,00E9,00E2,00E4,00E0,0105,00E7,00EA,00EB,00E8,00EF,00EE,0107,00C4,0104,\ 80..8F
0118,0119,0142,00F4,00F6,0106,00FB,00F9,015A,00D6,00DC,00A2,0141,00A5,015B,0192,\ 90..9F
0179,017B,00F3,00D3,0144,0143,017A,017C,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   737,"IBM737","(Greek)",\
https://en.wikipedia.org/wiki/Code_page_737,\
0391,0392,0393,0394,0395,0396,0397,0398,0399,039A,039B,039C,039D,039E,039F,03A0,\ 80..8F
03A1,03A3,03A4,03A5,03A6,03A7,03A8,03A9,03B1,03B2,03B3,03B4,03B5,03B6,03B7,03B8,\ 90..9F
03B9,03BA,03BB,03BC,03BD,03BE,03BF,03C0,03C1,03C3,03C2,03C4,03C5,03C6,03C7,03C8,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03C9,03AC,03AD,03AE,03CA,03AF,03CC,03CD,03CB,03CE,0386,0388,0389,038A,038C,038E,\ E0..EF
038F,00B1,2265,2264,03AA,03AB,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   775,"IBM775","(Baltic Rim)",\
https://en.wikipedia.org/wiki/Code_page_775,\
0106,00FC,00E9,0101,00E4,0123,00E5,0107,0142,0113,0156,0157,012B,0179,00C4,00C5,\ 80..8F
00C9,00E6,00C6,014D,00F6,0122,00A2,015A,015B,00D6,00DC,00F8,00A3,00D8,00D7,00A4,\ 90..9F
0100,012A,00F3,017B,017C,017A,201D,00A6,00A9,00AE,00AC,00BD,00BC,0141,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,0104,010C,0118,0116,2563,2551,2557,255D,012E,0160,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,0172,016A,255A,2554,2569,2566,2560,2550,256C,017D,\ C0..CF
0105,010D,0119,0117,012F,0161,0173,016B,017E,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
00D3,00DF,014C,0143,00F5,00D5,00B5,0144,0136,0137,013B,013C,0146,0112,0145,2019,\ E0..EF
00AD,00B1,201C,00BE,00B6,00A7,00F7,201E,00B0,2219,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF
;
CP   850,"IBM850","DOS-Latin1 (Western European)",\
https://en.wikipedia.org/wiki/Code_page_850,\
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F
00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,00D7,0192,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00AE,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
00F0,00D0,00CA,00CB,00C8,0131,00CD,00CE,00CF,2518,250C,2588,2584,00A6,00CC,2580,\ D0..DF
00D3,00DF,00D4,00D2,00F5,00D5,00B5,00FE,00DE,00DA,00DB,00D9,00FD,00DD,00AF,00B4,\ E0..EF
00AD,00B1,2017,00BE,00B6,00A7,00F7,00B8,00B0,00A8,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF
;
CP   851,"IBM851","DOS-Greek-1",\
https://en.wikipedia.org/wiki/Code_page_851,\
00C7,00FC,00E9,00E2,00E4,00E0,0386,00E7,00EA,00EB,00E8,00EF,00EE,0388,00C4,0389,\ 80..8F
038A,FFFE,038C,00F4,00F6,038E,00FB,00F9,038F,00D6,00DC,03AC,00A3,03AD,03AE,03AF,\ 90..9F
03CA,0390,03CC,03CD,0391,0392,0393,0394,0395,0396,0397,00BD,0398,0399,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,039A,039B,039C,039D,2563,2551,2557,255D,039E,039F,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,03A0,03A1,255A,2554,2569,2566,2560,2550,256C,03A3,\ C0..CF
03A4,03A5,03A6,03A7,03A8,03A9,03B1,03B2,03B3,2518,250C,2588,2584,03B4,03B5,2580,\ D0..DF
03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,03C0,03C1,03C3,03C2,03C4,0384,\ E0..EF
00AD,00B1,03C5,03C6,03C7,00A7,03C8,0385,00B0,00A8,03C9,03CB,03B0,03CE,25A0,00A0,\ F0..FF
;
CP   852,"IBM852","DOS-Latin2 (Central European)",\
https://en.wikipedia.org/wiki/Code_page_852,\
00C7,00FC,00E9,00E2,00E4,016F,0107,00E7,0142,00EB,0150,0151,00EE,0179,00C4,0106,\ 80..8F
00C9,0139,013A,00F4,00F6,013D,013E,015A,015B,00D6,00DC,0164,0165,0141,00D7,010D,\ 90..9F
00E1,00ED,00F3,00FA,0104,0105,017D,017E,0118,0119,00AC,017A,010C,015F,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,00C1,00C2,011A,015E,2563,2551,2557,255D,017B,017C,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,0102,0103,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
0111,0110,010E,00CB,010F,0147,00CD,00CE,011B,2518,250C,2588,2584,0162,016E,2580,\ D0..DF
00D3,00DF,00D4,0143,0144,0148,0160,0161,0154,00DA,0155,0170,00FD,00DD,0163,00B4,\ E0..EF
00AD,02DD,02DB,02C7,02D8,00A7,00F7,00B8,00B0,00A8,02D9,0171,0158,0159,25A0,00A0,\ F0..FF
;
CP   853,"IBM853","(Turkish, Maltese, Esperanto)",\
https://en.wikipedia.org/wiki/Code_page_853,\
00C7,00FC,00E9,00E2,00E4,00E0,0109,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,0108,\ 80..8F
00C9,010B,010A,00F4,00F6,00F2,00FB,00F9,0130,00D6,00DC,011D,00A3,011C,00D7,0135,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,011E,011F,0124,0125,FFFE,00BD,0134,015F,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,00C1,00C2,00C0,015E,2563,2551,2557,255D,017B,017C,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,015C,015D,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
FFFE,FFFE,00CA,00CB,00C8,0131,00CD,00CE,00CF,2518,250C,2588,2584,FFFE,00CC,2580,\ D0..DF
00D3,00DF,00D4,00D2,0120,0121,00B5,0126,0127,00DA,00DB,00D9,016C,016D,00B7,00B4,\ E0..EF
00AD,FFFE,2113,0149,02D8,00A7,00F7,00B8,00B0,00A8,02D9,FFFE,00B3,00B2,25A0,00A0,\ F0..FF
;
CP   855,"IBM855","(Cyrillic, Serbian, Macedonian, Bulgarian)",\
https://en.wikipedia.org/wiki/Code_page_855,\
0452,0402,0453,0403,0451,0401,0454,0404,0455,0405,0456,0406,0457,0407,0458,0408,\ 80..8F
0459,0409,045A,040A,045B,040B,045C,040C,045E,040E,045F,040F,044E,042E,044A,042A,\ 90..9F
0430,0410,0431,0411,0446,0426,0434,0414,0435,0415,0444,0424,0433,0413,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,0445,0425,0438,0418,2563,2551,2557,255D,0439,0419,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,043A,041A,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
043B,041B,043C,041C,043D,041D,043E,041E,043F,2518,250C,2588,2584,041F,044F,2580,\ D0..DF
042F,0440,0420,0441,0421,0442,0422,0443,0423,0436,0416,0432,0412,044C,042C,2116,\ E0..EF
00AD,044B,042B,0437,0417,0448,0428,044D,042D,0449,0429,0447,0427,00A7,25A0,00A0,\ F0..FF
;
CP   856,"IBM856","(Hebrew)",\
https://en.wikipedia.org/wiki/Code_page_856,\
05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ 80..8F
05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,FFFE,00A3,FFFE,00D7,20AA,\ 90..9F
200E,200F,202A,202B,202D,202E,202C,FFFE,FFFE,00AE,00AC,00BD,00BC,20AC,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,FFFE,FFFE,FFFE,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,FFFE,FFFE,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,2518,250C,2588,FFFE,00A6,2590,2580,\ D0..DF
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,00AF,00B4,\ E0..EF
00AD,00B1,2017,00BE,00B6,00A7,00F7,00B8,00B0,00A8,2022,00B9,00B3,00B2,25A0,00A0,\ F0..FF
;
CP   857,"IBM857","(Turkish)",\
https://en.wikipedia.org/wiki/Code_page_857,\
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,0131,00C4,00C5,\ 80..8F
00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,0130,00D6,00DC,00F8,00A3,00D8,015E,015F,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,011E,011F,00BF,00AE,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
00BA,00AA,00CA,00CB,00C8,20AC,00CD,00CE,00CF,2518,250C,2588,2584,00A6,00CC,2580,\ D0..DF
00D3,00DF,00D4,00D2,00F5,00D5,00B5,FFFE,00D7,00DA,00DB,00D9,00EC,00FF,00AF,00B4,\ E0..EF
00AD,00B1,FFFE,00BE,00B6,00A7,00F7,00B8,00B0,00A8,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF
;
CP   858,"IBM858","(Western European)",\
https://en.wikipedia.org/wiki/Code_page_858,\
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F
00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,00D7,0192,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00AE,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
00F0,00D0,00CA,00CB,00C8,20AC,00CD,00CE,00CF,2518,250C,2588,2584,00A6,00CC,2580,\ D0..DF
00D3,00DF,00D4,00D2,00F5,00D5,00B5,00FE,00DE,00DA,00DB,00D9,00FD,00DD,00AF,00B4,\ E0..EF
00AD,00B1,2017,00BE,00B6,00A7,00F7,00B8,00B0,00A8,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF
;
CP   859,"IBM859","Latin 9 (Western European)",\
https://en.wikipedia.org/wiki/Code_page_859,\
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F
00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,00D7,0192,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00AE,00AC,0153,0152,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF
00F0,00D0,00CA,00CB,00C8,20AC,00CD,00CE,00CF,2518,250C,2588,2584,0160,00CC,2580,\ D0..DF
00D3,00DF,00D4,00D2,00F5,00D5,00B5,00FE,00DE,00DA,00DB,00D9,00FD,00DD,00AF,017D,\ E0..EF
00AD,00B1,FFFE,0178,00B6,00A7,00F7,017E,00B0,0161,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF
;
CP   860,"IBM860","(Portuguese)",\
https://en.wikipedia.org/wiki/Code_page_860,\
00C7,00FC,00E9,00E2,00E3,00E0,00C1,00E7,00EA,00CA,00E8,00CD,00D4,00EC,00C3,00C2,\ 80..8F
00C9,00C0,00C8,00F4,00F5,00F2,00DA,00F9,00CC,00D5,00DC,00A2,00A3,00D9,20A7,00D3,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00D2,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   861,"IBM861","(Icelandic)",\
https://en.wikipedia.org/wiki/Code_page_861,\
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00D0,00F0,00DE,00C4,00C5,\ 80..8F
00C9,00E6,00C6,00F4,00F6,00FE,00FB,00DD,00FD,00D6,00DC,00F8,00A3,00D8,20A7,0192,\ 90..9F
00E1,00ED,00F3,00FA,00C1,00CD,00D3,00DA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   862,"IBM862","(Hebrew)",\
https://en.wikipedia.org/wiki/Code_page_862,\
05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ 80..8F
05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,00A2,00A3,00A5,20A7,0192,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   863,"IBM863","(French Canadian)",\
https://en.wikipedia.org/wiki/Code_page_863,\
00C7,00FC,00E9,00E2,00C2,00E0,00B6,00E7,00EA,00EB,00E8,00EF,00EE,2017,00C0,00A7,\ 80..8F
00C9,00C8,00CA,00F4,00CB,00CF,00FB,00F9,00A4,00D4,00DC,00A2,00A3,00D9,00DB,0192,\ 90..9F
00A6,00B4,00F3,00FA,00A8,00B8,00B3,00AF,00CE,2310,00AC,00BD,00BC,00BE,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   864,"IBM864","(Arabic)",\
https://en.wikipedia.org/wiki/Code_page_864,\
00B0,00B7,2219,221A,2592,2500,2502,253C,2524,252C,251C,2534,2510,250C,2514,2518,\ 80..8F
03B2,221E,03C6,00B1,00BD,00BC,2248,00AB,00BB,FEF7,FEF8,FFFE,FFFE,FEFB,FEFC,FFFE,\ 90..9F
00A0,00AD,FE82,00A3,00A4,FE84,FFFE,20AC,FE8E,FE8F,FE95,FE99,060C,FE9D,FEA1,FEA5,\ A0..AF
0660,0661,0662,0663,0664,0665,0666,0667,0668,0669,FED1,061B,FEB1,FEB5,FEB9,061F,\ B0..BF
00A2,FE80,FE81,FE83,FE85,FECA,FE8B,FE8D,FE91,FE93,FE97,FE9B,FE9F,FEA3,FEA7,FEA9,\ C0..CF
FEAB,FEAD,FEAF,FEB3,FEB7,FEBB,FEBF,FEC1,FEC5,FECB,FECF,00A6,00AC,00F7,00D7,FEC9,\ D0..DF
0640,FED3,FED7,FEDB,FEDF,FEE3,FEE7,FEEB,FEED,FEEF,FEF3,FEBD,FECC,FECE,FECD,FEE1,\ E0..EF
FE7D,0651,FEE5,FEE9,FEEC,FEF0,FEF2,FED0,FED5,FEF5,FEF6,FEDD,FED9,FEF1,25A0,FFFE,\ F0..FF
;
CP   865,"IBM865","(Nordic)",\
https://en.wikipedia.org/wiki/Code_page_865,\
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F
00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,20A7,0192,\ 90..9F
00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00A4,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   866,"IBM866","AKA CP1125 (Cyrillic Russian)",\
https://en.wikipedia.org/wiki/Code_page_866,\
0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ 80..8F
0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ 90..9F
0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ E0..EF
0401,0451,0404,0454,0407,0457,040E,045E,00B0,2219,00B7,221A,2116,00A4,25A0,00A0,\ F0..FF
;
CP   867,"IBM867","(Hebrew)",\
https://en.wikipedia.org/wiki/Code_page_867,\
05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ 80..8F
05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,00A2,00A3,00A5,FFFE,20AA,\ 90..9F
200E,200F,202A,202B,202D,202E,202C,FFFE,FFFE,2310,00AC,00BD,00BC,20AC,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   868,"IBM868","(Urdu)",\
https://en.wikipedia.org/wiki/Code_page_868,\
06F0,06F1,06F2,06F3,06F4,06F5,06F6,06F7,06F8,06F9,060C,061B,061F,FE81,FE8D,FE8E,\ 80..8F
FFFF,FE8F,FE91,FB56,FB58,FE93,FE95,FE97,FB66,FB68,FE99,FE9B,FE9D,FE9F,FB7A,FB7C,\ 90..9F
FEA1,FEA3,FEA5,FEA7,FEA9,FB88,FEAB,FEAD,FB8C,FEAF,FB8A,FEB1,FEB3,FEB5,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,FEB7,FEB9,FEBB,FEBD,2563,2551,2557,255D,FEBF,FEC3,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,FEC7,FEC9,255A,2554,2569,2566,2560,2550,256C,FECA,\ C0..CF
FECB,FECC,FECD,FECE,FECF,FED0,FED1,FED3,FED5,2518,250C,2588,2584,FED7,FB8E,2580,\ D0..DF
FEDB,FB92,FB94,FEDD,FEDF,FEE0,FEE1,FEE3,FB9E,FEE5,FEE7,FE85,FEED,FBA6,FBA8,FBA9,\ E0..EF
00AD,FBAA,FE80,FE89,FE8A,FE8B,FBFC,FBFD,FBFE,FBB0,FBAE,FE7C,FE7D,FFFE,25A0,00A0,\ F0..FF
;
CP   869,"IBM869","(Greek 2, Modern)",\
https://en.wikipedia.org/wiki/Code_page_869,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,0386,20AC,00B7,00AC,00A6,2018,2019,0388,2015,0389,\ 80..8F
038A,03AA,038C,FFFE,FFFE,038E,03AB,00A9,038F,00B2,00B3,03AC,00A3,03AD,03AE,03AF,\ 90..9F
03CA,0390,03CC,03CD,0391,0392,0393,0394,0395,0396,0397,00BD,0398,0399,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,039A,039B,039C,039D,2563,2551,2557,255D,039E,039F,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,03A0,03A1,255A,2554,2569,2566,2560,2550,256C,03A3,\ C0..CF
03A4,03A5,03A6,03A7,03A8,03A9,03B1,03B2,03B3,2518,250C,2588,2584,03B4,03B5,2580,\ D0..DF
03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,03C0,03C1,03C3,03C2,03C4,0384,\ E0..EF
00AD,00B1,03C5,03C6,03C7,00A7,03C8,0385,00B0,00A8,03C9,03CB,03B0,03CE,25A0,00A0,\ F0..FF
;
CP   874,"IBM874","AKA ISO-8859-11, TIS-620 (Thai)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-11#Code_page_874,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0E01,0E02,0E03,0E04,0E05,0E06,0E07,0E08,0E09,0E0A,0E0B,0E0C,0E0D,0E0E,0E0F,\ A0..AF
0E10,0E11,0E12,0E13,0E14,0E15,0E16,0E17,0E18,0E19,0E1A,0E1B,0E1C,0E1D,0E1E,0E1F,\ B0..BF
0E20,0E21,0E22,0E23,0E24,0E25,0E26,0E27,0E28,0E29,0E2A,0E2B,0E2C,0E2D,0E2E,0E2F,\ C0..CF
0E30,0E31,0E32,0E33,0E34,0E35,0E36,0E37,0E38,0E39,0E3A,0E49,0E4A,0E4B,0E4C,0E3F,\ D0..DF
0E40,0E41,0E42,0E43,0E44,0E45,0E46,0E47,0E48,0E49,0E4A,0E4B,0E4C,0E4D,0E4E,0E4F,\ E0..EF
0E50,0E51,0E52,0E53,0E54,0E55,0E56,0E57,0E58,0E59,0E5A,0E5B,00A2,00AC,00A6,00A0,\ F0..FF
;
CP   878,"KOI8-R","AKA IBM878 AKA Windows-20866 (Cyrillic, Russian, Bulgarian)",\
https://en.wikipedia.org/wiki/KOI8-R,\
2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F
2591,2592,2593,2320,25A0,2219,221A,2248,2264,2265,00A0,2321,00B0,00B2,00B7,00F7,\ 90..9F
2550,2551,2552,0451,2553,2554,2555,2556,2557,2558,2559,255A,255B,255C,255D,255E,\ A0..AF
255F,2560,2561,0401,2562,2563,2564,2565,2566,2567,2568,2569,256A,256B,256C,00A9,\ B0..BF
044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF
043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF
042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF
041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF
;
CP   880,"KOI8-E","ISO-IR-111 (Belarusian, Macedonian, Serbian, Ukrainian)",\
https://en.wikipedia.org/wiki/ISO-IR-111,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0452,0453,0451,0454,0455,0456,0457,0458,0459,045A,045B,045C,00AD,045E,045F,\ A0..AF
2116,0402,0403,0401,0404,0405,0406,0407,0408,0409,040A,040B,040C,00A4,040E,040F,\ B0..BF
044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF
043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF
042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF
041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF
;
CP   882,"KOI8-T","(Tajik)",\
https://en.wikipedia.org/wiki/KOI8-T,\
049B,0493,201A,0492,201E,2026,2020,2021,FFFE,2030,04B3,2039,04B2,04B7,04B6,FFFE,\ 80..8F
049A,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,FFFE,203A,FFFE,FFFE,FFFE,FFFE,\ 90..9F
FFFE,04EF,04EE,0451,00A4,04E3,00A6,00A7,FFFE,FFFE,FFFE,00AB,00AC,00AD,00AE,FFFE,\ A0..AF
00B0,00B1,00B2,0401,FFFE,04E2,00B6,00B7,FFFE,2116,FFFE,00BB,FFFE,FFFE,FFFE,00A9,\ B0..BF
044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF
043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF
042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF
041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF
;
CP   884,"KOI8-F","Fingertip SW (Cyrillic)",\
https://en.wikipedia.org/wiki/KOI8-F,\
2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F
2591,2018,2019,201C,201D,2022,2013,2014,00A9,2122,00A0,00BB,00AE,00AB,00B7,00A4,\ 90..9F
00A0,0452,0453,0451,0454,0455,0456,0457,0458,0459,045A,045B,045C,045D,045E,045F,\ A0..AF
2116,0402,0403,0401,0404,0405,0406,0407,0408,0409,040A,040B,040C,040D,040E,040F,\ B0..BF
044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF
043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF
042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF
041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF
;
CP   885,"KOI8-CS","CSN 369103 (Czech, Slovak)",\
https://cs.wikipedia.org/wiki/KOI#KOI8-CS,\
0411,0412,0413,00A7,00DF,0414,0401,0416,0417,0418,0419,041A,FFFE,FFFE,041B,041C,\ 80..8F
041D,041F,0420,0422,0423,0424,2588,2584,2580,0426,0427,0428,0429,042A,042B,042C,\ 90..9F
042D,042E,042F,0431,0432,0433,0434,0451,0436,0437,0438,0439,043A,00A7,043B,043C,\ A0..AF
043D,043F,0440,0442,0443,0444,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ B0..BF
250C,00E1,2514,010D,010F,011B,0155,2500,00FC,00ED,016F,013A,013E,00F6,0148,00F3,\ C0..CF
00F4,00E4,0159,0161,0165,00FA,251C,00E9,00E0,00FD,017E,252C,2567,258C,2590,253C,\ D0..DF
2510,00C1,2518,010C,010E,011A,0154,2502,00DC,00CD,016E,0139,013D,00D6,0147,00D3,\ E0..EF
00D4,00C4,0158,0160,0164,00DA,2524,00C9,00C0,00DD,017D,2534,207F,00B7,25A0,00A0,\ F0..FF
;
CP   895,"Kamenicky","AKA DOS-895 AKA KEYBCS2 (Czech, Slovak)",\
https://en.wikipedia.org/wiki/Kamenick%C3%BD_encoding,\
010C,00FC,00E9,010F,00E4,010E,0164,010D,011B,011A,0139,00CD,013E,013A,00C4,00C1,\ 80..8F
00C9,017E,017D,00F4,00F6,00D3,016F,00DA,00FD,00D6,00DC,0160,013D,00DD,0158,0165,\ 90..9F
00E1,00ED,00F3,00FA,0148,0147,016E,00D4,0161,0159,0155,0154,00BC,00A7,00AB,00BB,\ A0..AF
2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF
2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF
;
CP   912,"IBM912","(Central European)",\
https://en.wikipedia.org/wiki/Code_page_912,\
2591,2592,2593,2502,2524,2518,250C,2588,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ 80..8F
2514,2534,252C,251C,2500,253C,2584,2580,255A,2554,2569,2566,2560,2550,256C,00AE,\ 90..9F
00A0,0104,02D8,0141,00A4,013D,015A,00A7,00A8,0160,015E,0164,0179,00AD,017D,017B,\ A0..AF
00B0,0105,02DB,0142,00B4,013E,015B,02C7,00B8,0161,015F,0165,017A,02DD,017E,017C,\ B0..BF
0154,00C1,00C2,0102,00C4,0139,0106,00C7,010C,00C9,0118,00CB,011A,00CD,00CE,010E,\ C0..CF
0110,0143,0147,00D3,00D4,0150,00D6,00D7,0158,016E,00DA,0170,00DC,00DD,0162,00DF,\ D0..DF
0155,00E1,00E2,0103,00E4,013A,0107,00E7,010D,00E9,0119,00EB,011B,00ED,00EE,010F,\ E0..EF
0111,0144,0148,00F3,00F4,0151,00F6,00F7,0159,016F,00FA,0171,00FC,00FD,0163,02D9,\ F0..FF
;
CP  1006,"IBM1006","(Arabic)",\
https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/CP1006.TXT,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,06F0,06F1,06F2,06F3,06F4,06F5,06F6,06F7,06F8,06F9,060C,061B,00AD,061F,FE81,\ A0..AF
FE8D,FE8E,FE8E,FE8F,FE91,FB56,FB58,FE93,FE95,FE97,FB66,FB68,FE99,FE9B,FE9D,FE9F,\ B0..BF
FB7A,FB7C,FEA1,FEA3,FEA5,FEA7,FEA9,FB84,FEAB,FEAD,FB8C,FEAF,FB8A,FEB1,FEB3,FEB5,\ C0..CF
FEB7,FEB9,FEBB,FEBD,FEBF,FEC1,FEC5,FEC9,FECA,FECB,FECC,FECD,FECE,FECF,FED0,FED1,\ D0..DF
FED3,FED5,FED7,FED9,FEDB,FB92,FB94,FEDD,FEDF,FEE0,FEE1,FEE3,FB9E,FEE5,FEE7,FE85,\ E0..EF
FEED,FBA6,FBA8,FBA9,FBAA,FE80,FE89,FE8A,FE8B,FEF1,FEF2,FEF3,FBB0,FBAE,FE7C,FE7D,\ F0..FF
;
CP  1167,"KOI8-RU","IBM1167 (Cyrillic, Russian, Ukrainian, Belarusian)",\
https://en.wikipedia.org/wiki/KOI8-RU,\
2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F
2591,2592,2593,201C,25A0,2219,201D,2014,2116,2122,00A0,00BB,00AE,00AB,00B7,00A4,\ 90..9F
2550,2551,2552,0451,0454,2554,0456,0457,2557,2558,2559,255A,255B,0491,045D,255E,\ A0..AF
255F,2560,2561,0401,0404,2563,0406,0407,2566,2567,2568,2569,256A,0490,040D,00A9,\ B0..BF
044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF
043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF
042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF
041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF
;
CP  1168,"KOI8-U","IBM1168 (Cyrillic, Ukrainian)",\
https://en.wikipedia.org/wiki/KOI8-U,\
2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F
2591,2592,2593,2320,25A0,2219,221A,2248,2264,2265,00A0,2321,00B0,00B2,00B7,00F7,\ 90..9F
2550,2551,2552,0451,0454,2554,0456,0457,2557,2558,2559,255A,255B,0491,255D,255E,\ A0..AF
255F,2560,2561,0401,0404,2563,0406,0407,2566,2567,2568,2569,256A,0490,256C,00A9,\ B0..BF
044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF
043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF
042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF
041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF
;
CP  1250,"Windows-1250","(Central European)",\
https://en.wikipedia.org/wiki/Windows-1250,\
20AC,FFFE,201A,FFFE,201E,2026,2020,2021,FFFE,2030,0160,2039,015A,0164,017D,0179,\ 80..8F
FFFE,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,0161,203A,015B,0165,017E,017A,\ 90..9F
00A0,02C7,02D8,0141,00A4,0104,00A6,00A7,00A8,00A9,015E,00AB,00AC,00AD,00AE,017B,\ A0..AF
00B0,00B1,02DB,0142,00B4,00B5,00B6,00B7,00B8,0105,015F,00BB,013D,02DD,013E,017C,\ B0..BF
0154,00C1,00C2,0102,00C4,0139,0106,00C7,010C,00C9,0118,00CB,011A,00CD,00CE,010E,\ C0..CF
0110,0143,0147,00D3,00D4,0150,00D6,00D7,0158,016E,00DA,0170,00DC,00DD,0162,00DF,\ D0..DF
0155,00E1,00E2,0103,00E4,013A,0107,00E7,010D,00E9,0119,00EB,011B,00ED,00EE,010F,\ E0..EF
0111,0144,0148,00F3,00F4,0151,00F6,00F7,0159,016F,00FA,0171,00FC,00FD,0163,02D9,\ F0..FF
;
CP  1251,"Windows-1251","(Cyrillic)",\
https://en.wikipedia.org/wiki/Windows-1251,\
0402,0403,201A,0453,201E,2026,2020,2021,20AC,2030,0409,2039,040A,040C,040B,040F,\ 80..8F
0452,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,0459,203A,045A,045C,045B,045F,\ 90..9F
00A0,040E,045E,0408,00A4,0490,00A6,00A7,0401,00A9,0404,00AB,00AC,00AD,00AE,0407,\ A0..AF
00B0,00B1,0406,0456,0491,00B5,00B6,00B7,0451,2116,0454,00BB,0458,0405,0455,0457,\ B0..BF
0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ C0..CF
0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ D0..DF
0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ E0..EF
0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ F0..FF
;
CP  1252,"Windows-1252","ISO-8859-1, Latin 1 (Western European)",\
https://en.wikipedia.org/wiki/Windows-1252,\
20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,0160,2039,0152,FFFE,017D,FFFE,\ 80..8F
FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,0161,203A,0153,FFFE,017E,0178,\ 90..9F
00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF
00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF
00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
00F0,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,00FD,00FE,00FF,\ F0..FF
;
CP  1253,"Windows-1253","(Greek Modern)",\
https://en.wikipedia.org/wiki/Windows-1253,\
20AC,FFFE,201A,0192,201E,2026,2020,2021,FFFE,2030,FFFE,2039,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,FFFE,203A,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0385,0386,00A3,00A4,00A5,00A6,00A7,00A8,00A9,FFFE,00AB,00AC,00AD,00AE,2015,\ A0..AF
00B0,00B1,00B2,00B3,0384,00B5,00B6,00B7,0388,0389,038A,00BB,038C,00BD,038E,038F,\ B0..BF
0390,0391,0392,0393,0394,0395,0396,0397,0398,0399,039A,039B,039C,039D,039E,039F,\ C0..CF
03A0,03A1,FFFE,03A3,03A4,03A5,03A6,03A7,03A8,03A9,03AA,03AB,03AC,03AD,03AE,03AF,\ D0..DF
03B0,03B1,03B2,03B3,03B4,03B5,03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,\ E0..EF
03C0,03C1,03C2,03C3,03C4,03C5,03C6,03C7,03C8,03C9,03CA,03CB,03CC,03CD,03CE,FFFE,\ F0..FF
;
CP  1254,"Windows-1254","(Turkish)",\
https://en.wikipedia.org/wiki/Windows-1254,\
20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,0160,2039,0152,FFFE,FFFE,FFFE,\ 80..8F
FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,0161,203A,0153,FFFE,FFFE,0178,\ 90..9F
00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF
00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
011E,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,0130,015E,00DF,\ D0..DF
00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
011F,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,0131,015F,00FF,\ F0..FF
;
CP  1255,"Windows-1255","(Hebrew)",\
https://en.wikipedia.org/wiki/Windows-1255,\
20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,FFFE,2039,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,FFFE,203A,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,00A1,00A2,00A3,20AA,00A5,00A6,00A7,00A8,00A9,00D7,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00F7,00BB,00BC,00BD,00BE,00BF,\ B0..BF
05B0,05B1,05B2,05B3,05B4,05B5,05B6,05B7,05B8,05B9,05BA,05BB,05BC,05BD,05BE,05BF,\ C0..CF
05C0,05C1,05C2,05C3,05F0,05F1,05F2,05F3,05F4,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ D0..DF
05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ E0..EF
05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,FFFE,FFFE,200E,200F,FFFE,\ F0..FF
;
CP  1256,"Windows-1256","(Arabic)",\
https://en.wikipedia.org/wiki/Windows-1256,\
20AC,067E,201A,0192,201E,2026,2020,2021,02C6,2030,0679,2039,0152,0686,0698,0688,\ 80..8F
06AF,2018,2019,201C,201D,2022,2013,2014,06A9,2122,0691,203A,0153,200C,200D,06BA,\ 90..9F
00A0,060C,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,06BE,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,061B,00BB,00BC,00BD,00BE,061F,\ B0..BF
06C1,0621,0622,0623,0624,0625,0626,0627,0628,0629,062A,062B,062C,062D,062E,062F,\ C0..CF
0630,0631,0632,0633,0634,0635,0636,00D7,0637,0638,0639,063A,0640,0641,0642,0643,\ D0..DF
00E0,0644,00E2,0645,0646,0647,0648,00E7,00E8,00E9,00EA,00EB,0649,064A,00EE,00EF,\ E0..EF
064B,064C,064D,064E,00F4,064F,0650,00F7,0651,00F9,0652,00FB,00FC,200E,200F,06D2,\ F0..FF
;
CP  1257,"Windows-1257","(Baltic)",\
https://en.wikipedia.org/wiki/Windows-1257,\
20AC,FFFE,201A,FFFE,201E,2026,2020,2021,FFFE,2030,FFFE,2039,FFFE,00A8,02C7,00B8,\ 80..8F
FFFE,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,FFFE,203A,FFFE,00AF,02DB,FFFE,\ 90..9F
00A0,FFFE,00A2,00A3,00A4,FFFE,00A6,00A7,00D8,00A9,0156,00AB,00AC,00AD,00AE,00C6,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00F8,00B9,0157,00BB,00BC,00BD,00BE,00E6,\ B0..BF
0104,012E,0100,0106,00C4,00C5,0118,0112,010C,00C9,0179,0116,0122,0136,012A,013B,\ C0..CF
0160,0143,0145,00D3,014C,00D5,00D6,00D7,0172,0141,015A,016A,00DC,017B,017D,00DF,\ D0..DF
0105,012F,0101,0107,00E4,00E5,0119,0113,010D,00E9,017A,0117,0123,0137,012B,013C,\ E0..EF
0161,0144,0146,00F3,014D,00F5,00F6,00F7,0173,0142,015B,016B,00FC,017C,017E,02D9,\ F0..FF
;
CP  1258,"Windows-1258","(Vietnamese)",\
https://en.wikipedia.org/wiki/Windows-1258,\
20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,FFFE,2039,0152,FFFE,FFFE,FFFE,\ 80..8F
FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,FFFE,203A,0153,FFFE,FFFE,0178,\ 90..9F
00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF
00C0,00C1,00C2,0102,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,0300,00CD,00CE,00CF,\ C0..CF
0110,00D1,0309,00D3,00D4,01A0,00D6,00D7,00D8,00D9,00DA,00DB,00DC,01AF,0303,00DF,\ D0..DF
00E0,00E1,00E2,0103,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,0301,00ED,00EE,00EF,\ E0..EF
0111,00F1,0323,00F3,00F4,01A1,00F6,00F7,00F8,00F9,00FA,00FB,00FC,01B0,20AB,00FF,\ F0..FF
;
CP 10000,"Mac-Roman","Macintosh Roman (Western European)",\
https://en.wikipedia.org/wiki/Mac_OS_Roman,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF
221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,03A9,00E6,00F8,\ B0..BF
00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,20AC,2039,203A,FB01,FB02,\ D0..DF
2021,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
F8FF,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF
;
CP 10004,"Mac-Arabic","Macintosh Arabic",\
https://en.wikipedia.org/wiki/MacArabic_encoding,\
00C4,00A0,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,06BA,00AB,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,2026,00EE,00EF,00F1,00F3,00BB,00F4,00F6,00F7,00FA,00F9,00FB,00FC,\ 90..9F
0020,0021,0022,0023,0024,066A,0026,0027,0028,0029,002A,002B,060C,002D,002E,002F,\ A0..AF
0660,0661,0662,0663,0664,0665,0666,0667,0668,0669,003A,061B,003C,003D,003E,061F,\ B0..BF
274A,0621,0622,0623,0624,0625,0626,0627,0628,0629,062A,062B,062C,062D,062E,062F,\ C0..CF
0630,0631,0632,0633,0634,0635,0636,0637,0638,0639,063A,005B,005C,005D,005E,005F,\ D0..DF
0640,0641,0642,0643,0644,0645,0646,0647,0648,0649,064A,064B,064C,064D,064E,064F,\ E0..EF
0650,0651,0652,067E,0679,0686,06D5,06A4,06AF,0688,0691,007B,007C,007D,0698,06D2,\ F0..FF
;
CP 10005,"Mac-Hebrew","Macintosh Hebrew",\
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/HEBREW.TXT,\
00C4,05F2,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
0020,0021,0022,0023,0024,0025,20AA,0027,0029,0028,002A,002B,002C,002D,002E,002F,\ A0..AF
0030,0031,0032,0033,0034,0035,0036,0037,0038,0039,003A,003B,003C,003D,003E,003F,\ B0..BF
FFFF,201E,FFFF,FFFF,FFFF,FFFF,05BC,FB4B,FB35,2026,00A0,05B8,05B7,05B5,05B6,05B4,\ C0..CF
2013,2014,201C,201D,2018,2019,FB2A,FB2B,05BF,05B0,05B2,05B1,05BB,05B9,05B8,05B3,\ D0..DF
05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ E0..EF
05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,007D,005D,007B,005B,007C,\ F0..FF
;
CP 10006,"Mac-Greek","Macintosh Greek",\
https://en.wikipedia.org/wiki/MacGreek_encoding,\
00C4,00B9,00B2,00C9,00B3,00D6,00DC,0385,00E0,00E2,00E4,0384,00A8,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00A3,2122,00EE,00EF,2022,00BD,2030,00F4,00F6,00A6,20AC,00F9,00FB,00FC,\ 90..9F
2020,0393,0394,0398,039B,039E,03A0,00DF,00AE,00A9,03A3,03AA,00A7,2260,00B0,00B7,\ A0..AF
0391,00B1,2264,2265,00A5,0392,0395,0396,0397,0399,039A,039C,03A6,03AB,03A8,03A9,\ B0..BF
03AC,039D,00AC,039F,03A1,2248,03A4,00AB,00BB,2026,00A0,03A5,03A7,0386,0388,0153,\ C0..CF
2013,2015,201C,201D,2018,2019,00F7,0389,038A,038C,038E,03AD,03AE,03AF,03CC,038F,\ D0..DF
03CD,03B1,03B2,03C8,03B4,03B5,03C6,03B3,03B7,03B9,03BE,03BA,03BB,03BC,03BD,03BF,\ E0..EF
03C0,03CE,03C1,03C3,03C4,03B8,03C9,03C2,03C7,03C5,03B6,03CA,03CB,0390,03B0,00AD,\ F0..FF
;
CP 10007,"Mac-Cyrillic","Macintosh Cyrillic",\
https://en.wikipedia.org/wiki/Mac_OS_Cyrillic_encoding,\
0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ 80..8F
0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ 90..9F
2020,00B0,0490,00A3,00A7,2022,00B6,0406,00AE,00A9,2122,0402,0452,2260,0403,0453,\ A0..AF
221E,00B1,2264,2265,0456,00B5,0491,0408,0404,0454,0407,0457,0409,0459,040A,045A,\ B0..BF
0458,0405,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,040B,045B,040C,045C,0455,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,201E,040E,045E,040F,045F,2116,0401,0451,044F,\ D0..DF
0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ E0..EF
0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,20AC,\ F0..FF
;
CP 10010,"Mac-Romanian","Macintosh Romanian",\
https://en.wikipedia.org/wiki/Mac_OS_Romanian_encoding,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,0102,015E,\ A0..AF
221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,2126,0103,015F,\ B0..BF
00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,00A4,2039,203A,0162,0163,\ D0..DF
2021,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
FFFE,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF
;
CP 10017,"Mac-Ukrainian","Macintosh Ukrainian",\
https://en.wikipedia.org/wiki/Mac_OS_Ukrainian_encoding,\
0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ 80..8F
0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ 90..9F
2020,00B0,0490,00A3,00A7,2022,00B6,0406,00AE,00A9,2122,0402,0452,2260,0403,0453,\ A0..AF
221E,00B1,2264,2265,0456,00B5,0491,0408,0404,0454,0407,0457,0409,0459,040A,045A,\ B0..BF
0458,0405,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,040B,045B,040C,045C,0455,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,201E,040E,045E,040F,045F,2116,0401,0451,044F,\ D0..DF
0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ E0..EF
0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,00A4,\ F0..FF
;
CP 10021,"Mac-Thai","Macintosh Thai",\
https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/THAI.TXT,\
00AB,00BB,2026,0E48,0E49,0E4A,0E4B,0E4C,0E48,0E49,0E4A,0E4B,0E4C,201C,201D,0E4D,\ 80..8F
FFFE,2022,0E31,0E47,0E34,0E35,0E36,0E37,0E48,0E49,0E4A,0E4B,0E4C,2018,2019,FFFE,\ 90..9F
00A0,0E01,0E02,0E03,0E04,0E05,0E06,0E07,0E08,0E09,0E0A,0E0B,0E0C,0E0D,0E0E,0E0F,\ A0..AF
0E10,0E11,0E12,0E13,0E14,0E15,0E16,0E17,0E18,0E19,0E1A,0E1B,0E1C,0E1D,0E1E,0E1F,\ B0..BF
0E20,0E21,0E22,0E23,0E24,0E25,0E26,0E27,0E28,0E29,0E2A,0E2B,0E2C,0E2D,0E2E,0E2F,\ C0..CF
0E30,0E31,0E32,0E33,0E34,0E35,0E36,0E37,0E38,0E39,0E3A,2060,200B,2013,2014,0E3F,\ D0..DF
0E40,0E41,0E42,0E43,0E44,0E45,0E46,0E47,0E48,0E49,0E4A,0E4B,0E4C,0E4D,2122,0E4F,\ E0..EF
0E50,0E51,0E52,0E53,0E54,0E55,0E56,0E57,0E58,0E59,00AE,00A9,FFFE,FFFE,FFFE,FFFE,\ F0..FF
;
CP 10029,"Mac-CE","Macintosh Central European, MAC-Latin2",\
https://en.wikipedia.org/wiki/Mac_OS_Central_European_encoding,\
00C4,0100,0101,00C9,0104,00D6,00DC,00E1,0105,010C,00E4,010D,0106,0107,00E9,0179,\ 80..8F
017A,010E,00ED,010F,0112,0113,0116,00F3,0117,00F4,00F6,00F5,00FA,011A,011B,00FC,\ 90..9F
2020,00B0,0118,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,0119,00A8,2260,0123,012E,\ A0..AF
012F,012A,2264,2265,012B,0136,2202,2211,0142,013B,013C,013D,013E,0139,013A,0145,\ B0..BF
0146,0143,00AC,221A,0144,0147,2206,00AB,00BB,2026,00A0,0148,0150,00D5,0151,014C,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,25CA,014D,0154,0155,0158,2039,203A,0159,0156,\ D0..DF
0157,0160,201A,201E,0161,015A,015B,00C1,0164,0165,00CD,017D,017E,016A,00D3,00D4,\ E0..EF
016B,016E,00DA,016F,0170,0171,0172,0173,00DD,00FD,0137,017B,0141,017C,0122,02C7,\ F0..FF
;
CP 10079,"Mac-Icelandic","Macintosh Icelandic",\
https://en.wikipedia.org/wiki/Mac_OS_Icelandic_encoding,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
00DD,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF
221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,2126,00E6,00F8,\ B0..BF
00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,20AC,00D0,00F0,00DE,00FE,\ D0..DF
00FD,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
FFFE,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF
;
CP 10080,"Mac-Inuit","Macintosh Inuit",\
https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/INUIT.TXT,\
1403,1404,1405,1406,140A,140B,1431,1432,1433,1434,1438,1439,1449,144E,144F,1450,\ 80..8F
1451,1455,1456,1466,146D,146E,146F,1470,1472,1473,1483,148B,148C,148D,148E,1490,\ 90..9F
1491,00B0,14A1,14A5,14A6,2022,00B6,14A7,00AE,00A9,2122,14A8,14AA,14AB,14BB,14C2,\ A0..AF
14C3,14C4,14C5,14C7,14C8,14D0,14EF,14F0,14F1,14F2,14F4,14F5,1505,14D5,14D6,14D7,\ B0..BF
14D8,14DA,14DB,14EA,1528,1529,152A,152B,152D,2026,00A0,152E,153E,1555,1556,1557,\ C0..CF
2013,2014,201C,201D,2018,2019,1558,1559,155A,155D,1546,1547,1548,1549,154B,154C,\ D0..DF
1550,157F,1580,1581,1582,1583,1584,1585,158F,1590,1591,1592,1593,1594,1595,1671,\ E0..EF
1672,1673,1674,1675,1676,1596,15A0,15A1,15A2,15A3,15A4,15A5,15A6,157C,0141,0142,\ F0..FF
;
CP 10081,"Mac-Turkish","Macintosh Turkish",\
https://en.wikipedia.org/wiki/Mac_OS_Turkish_encoding,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF
221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,2126,00E6,00F8,\ B0..BF
00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,011E,011F,0130,0131,015E,015F,\ D0..DF
2021,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
FFFE,00D2,00DA,00DB,00D9,FFFE,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF
;
CP 10082,"Mac-Croatian","Macintosh Serbian Latin",\
https://en.wikipedia.org/wiki/Mac_OS_Croatian_encoding,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,0160,2122,00B4,00A8,2260,017D,00D8,\ A0..AF
221E,00B1,2264,2265,2206,00B5,2202,2211,220F,0161,222B,00AA,00BA,03A9,017E,00F8,\ B0..BF
00BF,00A1,00AC,221A,0192,2248,0106,00AB,010C,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
0110,2014,201C,201D,2018,2019,00F7,25CA,F8FF,00A9,2044,20AC,2039,203A,00C6,00BB,\ D0..DF
2013,00B7,201A,201E,2030,00C2,0107,00C1,010D,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
0111,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,03C0,00CB,02DA,00B8,00CA,00E6,02C7,\ F0..FF
;
CP 10083,"Mac-Gaelic","Macintosh Gaelic",\
https://en.wikipedia.org/wiki/Mac_OS_Gaelic,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF
1E02,00B1,2264,2265,1E03,010A,010B,1E0A,1E0B,1E1E,1E1F,0120,0121,1E40,00E6,00F8,\ B0..BF
1E41,1E56,1E57,027C,0192,017F,1E60,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
2013,2014,201C,201D,2018,2019,1E61,1E9B,00FF,0178,1E6A,20AC,2039,203A,0176,0177,\ D0..DF
1E6B,00B7,1EF2,1EF3,204A,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
2618,00D2,00DA,00DB,00D9,0131,00DD,00FD,0174,0175,1E84,1E85,1E80,1E81,1E82,1E83,\ F0..FF
;
CP 10084,"Mac-Celtic","Macintosh Celtic",\
https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CELTIC.TXT,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF
221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,03A9,00E6,00F8,\ B0..BF
00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,20AC,2039,203A,0176,0177,\ D0..DF
2021,00B7,1EF2,1EF3,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
2663,00D2,00DA,00DB,00D9,0131,00DD,00FD,0174,0175,1E84,1E85,1E80,1E81,1E82,1E83,\ F0..FF
;
CP 10089,"Mac-Latin","AKA Kermit, Macintosh Latin",\
https://en.wikipedia.org/wiki/Macintosh_Latin_encoding,\
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F
00DD,00B0,00A2,00A3,00A7,00D7,00B6,00DF,00AE,00A9,00B2,00B4,00A8,00B3,00C6,00D8,\ A0..AF
00B9,00B1,00BC,00BD,00A5,00B5,FFFE,FFFE,FFFE,FFFE,00BE,00AA,00BA,FFFE,00E6,00F8,\ B0..BF
00BF,00A1,00AC,0141,0192,02CB,FFFE,00AB,00BB,00A6,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF
00AD,FFFE,FFFE,FFFE,0142,FFFE,00F7,FFFE,00FF,0178,FFFE,00A4,00D0,00F0,00DE,00FE,\ D0..DF
00FD,00B7,FFFE,FFFE,FFFE,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF
FFFE,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF
;
CP 10101, "NextStep","AKA OpenStep",\
https://www.unicode.org/Public/MAPPINGS/VENDORS/NEXT/NEXTSTEP.TXT,\
00A0,00C0,00C1,00C2,00C3,00C4,00C5,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ 80..8F
00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D9,00DA,00DB,00DC,00DD,00DE,00B5,00D7,00F7,\ 90..9F
00A9,00A1,00A2,00A3,2044,00A5,0192,00A7,00A4,2019,201C,00AB,2039,203A,FB01,FB02,\ A0..AF
00AE,2013,2020,2021,00B7,00A6,00B6,2022,201A,201E,201D,00BB,2026,2030,00AC,00BF,\ B0..BF
00B9,02CB,00B4,02C6,02DC,00AF,02D8,02D9,00A8,00B2,02DA,00B8,00B3,02DD,02DB,02C7,\ C0..CF
2014,00B1,00BC,00BD,00BE,00E0,00E1,00E2,00E3,00E4,00E5,00E7,00E8,00E9,00EA,00EB,\ D0..DF
00EC,00C6,00ED,00AA,00EE,00EF,00F0,00F1,0141,00D8,0152,00BA,00F2,00F3,00F4,00F5,\ E0..EF
00F6,00E6,00F9,00FA,00FB,0131,00FC,00FD,0142,00F8,0153,00DF,00FE,00FF,FFFD,FFFD,\ F0..FF
;
CP 28591,"ISO-8859-1","Latin 1 (Western European)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-1,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF
00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF
00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
00F0,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,00FD,00FE,00FF,\ F0..FF
;
CP 28592,"ISO-8859-2","Latin 2 (Central European)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-2,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0104,02D8,0141,00A4,013D,015A,00A7,00A8,0160,015E,0164,0179,00AD,017D,017B,\ A0..AF
00B0,0105,02DB,0142,00B4,013E,015B,02C7,00B8,0161,015F,0165,017A,02DD,017E,017C,\ B0..BF
0154,00C1,00C2,0102,00C4,0139,0106,00C7,010C,00C9,0118,00CB,011A,00CD,00CE,010E,\ C0..CF
0110,0143,0147,00D3,00D4,0150,00D6,00D7,0158,016E,00DA,0170,00DC,00DD,0162,00DF,\ D0..DF
0155,00E1,00E2,0103,00E4,013A,0107,00E7,010D,00E9,0119,00EB,011B,00ED,00EE,010F,\ E0..EF
0111,0144,0148,00F3,00F4,0151,00F6,00F7,0159,016F,00FA,0171,00FC,00FD,0163,02D9,\ F0..FF
;
CP 28593,"ISO-8859-3","Latin 3 (Turkish, Maltese, Esperanto)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-3,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0126,02D8,00A3,00A4,FFFE,0124,00A7,00A8,0130,015E,011E,0134,00AD,FFFE,017B,\ A0..AF
00B0,0127,00B2,00B3,00B4,00B5,0125,00B7,00B8,0131,015F,011F,0135,00BD,FFFE,017C,\ B0..BF
00C0,00C1,00C2,FFFE,00C4,010A,0108,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
FFFE,00D1,00D2,00D3,00D4,0120,00D6,00D7,011C,00D9,00DA,00DB,00DC,016C,015C,00DF,\ D0..DF
00E0,00E1,00E2,FFFE,00E4,010B,0109,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
FFFE,00F1,00F2,00F3,00F4,0121,00F6,00F7,011D,00F9,00FA,00FB,00FC,016D,015D,02D9,\ F0..FF
;
CP 28594,"ISO-8859-4","Latin 4 (Baltic)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-4,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0104,0138,0156,00A4,0128,013B,00A7,00A8,0160,0112,0122,0166,00AD,017D,00AF,\ A0..AF
00B0,0105,02DB,0157,00B4,0129,013C,02C7,00B8,0161,0113,0123,0167,014A,017E,014B,\ B0..BF
0100,00C1,00C2,00C3,00C4,00C5,00C6,012E,010C,00C9,0118,00CB,0116,00CD,00CE,012A,\ C0..CF
0110,0145,014C,0136,00D4,00D5,00D6,00D7,00D8,0172,00DA,00DB,00DC,0168,016A,00DF,\ D0..DF
0101,00E1,00E2,00E3,00E4,00E5,00E6,012F,010D,00E9,0119,00EB,0117,00ED,00EE,012B,\ E0..EF
0111,0146,014D,0137,00F4,00F5,00F6,00F7,00F8,0173,00FA,00FB,00FC,0169,016B,02D9,\ F0..FF
;
CP 28595,"ISO-8859-5","Latin/Cyrillic",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-5,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0401,0402,0403,0404,0405,0406,0407,0408,0409,040A,040B,040C,00AD,040E,040F,\ A0..AF
0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ B0..BF
0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ C0..CF
0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ D0..DF
0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ E0..EF
2116,0451,0452,0453,0454,0455,0456,0457,0458,0459,045A,045B,045C,00A7,045E,045F,\ F0..FF
;
CP 28596,"ISO-8859-6","Latin/Arabic",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-6,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,FFFE,FFFE,FFFE,00A4,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,060C,00AD,FFFE,FFFE,\ A0..AF
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,061B,FFFE,FFFE,FFFE,061F,\ B0..BF
FFFE,0621,0622,0623,0624,0625,0626,0627,0628,0629,062A,062B,062C,062D,062E,062F,\ C0..CF
0630,0631,0632,0633,0634,0635,0636,0637,0638,0639,063A,FFFE,FFFE,FFFE,FFFE,FFFE,\ D0..DF
0640,0641,0642,0643,0644,0645,0646,0647,0648,0649,064A,064B,064C,064D,064E,064F,\ E0..EF
0650,0651,0652,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ F0..FF
;
CP 28597,"ISO-8859-7","Latin/Greek",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-7,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,2018,2019,00A3,20AC,20AF,00A6,00A7,00A8,00A9,037A,00AB,00AC,00AD,FFFE,2015,\ A0..AF
00B0,00B1,00B2,00B3,0384,0385,0386,00B7,0388,0389,038A,00BB,038C,00BD,038E,038F,\ B0..BF
0390,0391,0392,0393,0394,0395,0396,0397,0398,0399,039A,039B,039C,039D,039E,039F,\ C0..CF
03A0,03A1,FFFE,03A3,03A4,03A5,03A6,03A7,03A8,03A9,03AA,03AB,03AC,03AD,03AE,03AF,\ D0..DF
03B0,03B1,03B2,03B3,03B4,03B5,03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,\ E0..EF
03C0,03C1,03C2,03C3,03C4,03C5,03C6,03C7,03C8,03C9,03CA,03CB,03CC,03CD,03CE,FFFE,\ F0..FF
;
CP 28598,"ISO-8859-8","Latin/Hebrew, AKA IBM916",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-8,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,FFFE,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00D7,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00F7,00BB,00BC,00BD,00BE,FFFE,\ B0..BF
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ C0..CF
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,2017,\ D0..DF
05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ E0..EF
05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,FFFE,FFFE,200E,200F,FFFE,\ F0..FF
;
CP 28599,"ISO-8859-9","Latin 5, AKA IBM920 (Turkish)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-9,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF
00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
011E,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,0130,015E,00DF,\ D0..DF
00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
011F,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,0131,015F,00FF,\ F0..FF
;
CP 28600,"ISO-8859-10","Latin 6, AKA IBM919 (Nordic)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-10,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0104,0112,0122,012A,0128,0136,00A7,013B,0110,0160,0166,017D,00AD,016A,014A,\ A0..AF
00B0,0105,0113,0123,012B,0129,0137,00B7,013C,0111,0161,0167,017E,2015,016B,014B,\ B0..BF
0100,00C1,00C2,00C3,00C4,00C5,00C6,012E,010C,00C9,0118,00CB,0116,00CD,00CE,00CF,\ C0..CF
00D0,0145,014C,00D3,00D4,00D5,00D6,0168,00D8,0172,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF
0101,00E1,00E2,00E3,00E4,00E5,00E6,012F,010D,00E9,0119,00EB,0117,00ED,00EE,00EF,\ E0..EF
00F0,0146,014D,00F3,00F4,00F5,00F6,0169,00F8,0173,00FA,00FB,00FC,00FD,00FE,0138,\ F0..FF
;
CP 28601,"ISO-8859-11","Latin/Thai",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-11,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0E01,0E02,0E03,0E04,0E05,0E06,0E07,0E08,0E09,0E0A,0E0B,0E0C,0E0D,0E0E,0E0F,\ A0..AF
0E10,0E11,0E12,0E13,0E14,0E15,0E16,0E17,0E18,0E19,0E1A,0E1B,0E1C,0E1D,0E1E,0E1F,\ B0..BF
0E20,0E21,0E22,0E23,0E24,0E25,0E26,0E27,0E28,0E29,0E2A,0E2B,0E2C,0E2D,0E2E,0E2F,\ C0..CF
0E30,0E31,0E32,0E33,0E34,0E35,0E36,0E37,0E38,0E39,0E3A,FFFE,FFFE,FFFE,FFFE,0E3F,\ D0..DF
0E40,0E41,0E42,0E43,0E44,0E45,0E46,0E47,0E48,0E49,0E4A,0E4B,0E4C,0E4D,0E4E,0E4F,\ E0..EF
0E50,0E51,0E52,0E53,0E54,0E55,0E56,0E57,0E58,0E59,0E5A,0E5B,FFFE,FFFE,FFFE,FFFE,\ F0..FF
;
CP 28603,"ISO-8859-13","Latin 7, AKA IBM921 (Baltic)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-13,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,201D,00A2,00A3,00A4,201E,00A6,00A7,00D8,00A9,0156,00AB,00AC,00AD,00AE,00C6,\ A0..AF
00B0,00B1,00B2,00B3,201C,00B5,00B6,00B7,00F8,00B9,0157,00BB,00BC,00BD,00BE,00E6,\ B0..BF
0104,012E,0100,0106,00C4,00C5,0118,0112,010C,00C9,0179,0116,0122,0136,012A,013B,\ C0..CF
0160,0143,0145,00D3,014C,00D5,00D6,00D7,0172,0141,015A,016A,00DC,017B,017D,00DF,\ D0..DF
0105,012F,0101,0107,00E4,00E5,0119,0113,010D,00E9,017A,0117,0123,0137,012B,013C,\ E0..EF
0161,0144,0146,00F3,014D,00F5,00F6,00F7,0173,0142,015B,016B,00FC,017C,017E,2019,\ F0..FF
;
CP 28604,"ISO-8859-14","Latin 8 (Celtic)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-14,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,1E02,1E03,00A3,010A,010B,1E0A,00A7,1E80,00A9,1E82,1E0B,1EF2,00AD,00AE,0178,\ A0..AF
1E1E,1E1F,0120,0121,1E40,1E41,00B6,1E56,1E81,1E57,1E83,1E60,1EF3,1E84,1E85,1E61,\ B0..BF
00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
0174,00D1,00D2,00D3,00D4,00D5,00D6,1E6A,00D8,00D9,00DA,00DB,00DC,00DD,0176,00DF,\ D0..DF
00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
0175,00F1,00F2,00F3,00F4,00F5,00F6,1E6B,00F8,00F9,00FA,00FB,00FC,00FD,0177,00FF,\ F0..FF
;
CP 28605,"ISO-8859-15","Latin 9, AKA IBM923 (Western Europe)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-15,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,00A1,00A2,00A3,20AC,00A5,0160,00A7,0161,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF
00B0,00B1,00B2,00B3,017D,00B5,00B6,00B7,017E,00B9,00BA,00BB,0152,0153,0178,00BF,\ B0..BF
00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF
00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
00F0,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,00FD,00FE,00FF,\ F0..FF
;
CP 28606,"ISO-8859-16","Latin 10 (South-Eastern Europe)",\
https://en.wikipedia.org/wiki/ISO/IEC_8859-16,\
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F
00A0,0104,0105,0141,20AC,201E,0160,00A7,0161,00A9,0218,00AB,0179,00AD,017A,017B,\ A0..AF
00B0,00B1,010C,0142,017D,201D,00B6,00B7,017E,010D,0219,00BB,0152,0153,0178,017C,\ B0..BF
00C0,00C1,00C2,0102,00C4,0106,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF
0110,0143,00D2,00D3,00D4,0150,00D6,015A,0170,00D9,00DA,00DB,00DC,0118,021A,00DF,\ D0..DF
00E0,00E1,00E2,0103,00E4,0107,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF
0111,0144,00F2,00F3,00F4,0151,00F6,015B,0171,00F9,00FA,00FB,00FC,0119,021B,00FF,\ F0..FF
;
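Each CP definition above lists 128 code points in eight rows of sixteen, one row per range of bytes (80..8F up to F0..FF). The following Python sketch illustrates how such a row could be used to decode a single byte. It is not the program's own routine: the names `parse_row` and `decode_byte` are hypothetical, treating FFFE as "no character assigned" is an assumption based on how the marker is used in the tables, and bytes below 0x80 are assumed to be plain ASCII.

```python
UNDEFINED = 0xFFFE  # assumed meaning of FFFE in the tables: no character assigned

# Row A0..AF copied from the ISO-8859-2 (CP 28592) definition above.
ROW_A0_AF = "00A0,0104,02D8,0141,00A4,013D,015A,00A7,00A8,0160,015E,0164,0179,00AD,017D,017B"


def parse_row(row):
    """Turn one comma-separated table row into a list of integer code points."""
    return [int(field, 16) for field in row.split(",")]


def decode_byte(byte, table, first=0x80):
    """Map one byte to a character; table[0] belongs to byte `first`.

    Returns None when the byte has no table entry or is marked undefined.
    """
    if byte < 0x80:
        return chr(byte)  # lower half assumed to be plain ASCII
    index = byte - first
    if not 0 <= index < len(table):
        return None
    code_point = table[index]
    return None if code_point == UNDEFINED else chr(code_point)


if __name__ == "__main__":
    row = parse_row(ROW_A0_AF)
    # Byte 0xA1 is the second entry of the A0..AF row.
    print(decode_byte(0xA1, row, first=0xA0))
```

A complete table would simply concatenate all eight parsed rows and keep the default `first=0x80`.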
DosData
contains other static data used in the DOS variant of EuroConvertor.
[DATA]
SegINPUT      DW PARA# [INPUT]   ; Paragraph address of segment [INPUT].
SegOUTPUT     DW PARA# [OUTPUT]  ; Paragraph address of segment [OUTPUT].
InpEncId      DW 0    ;  Input encoding identifier (437..65001).
OutEncId      DW 0    ; Output encoding identifier (437..65001).
InpEncSt      DW 0    ; Input  encoding flags, see above.
OutEncSt      DW 0    ; Output encoding flags, see above.
Relevance     DD 0    ; Total sum of relevances of all characters in the input text.
BestRelevance DD 0x8000_0000 ; Best relevance autodetected so far. Begin with lowest negative dword.
InpErrorsLo   DW 0    ; Number of input characters which are not defined in input encoding, lower word.
InpErrorsHi   DW 0    ; Number of input characters which are not defined in input encoding, higher word.
OutErrorsLo   DW 0    ; Number of characters which are not defined in output encoding, lower word.
OutErrorsHi   DW 0    ; Number of characters which are not defined in output encoding, higher word.
InpFileName   DB 128*B 0
OutFileName   DB 128*B 0
InpHandle     DW -1
OutHandle     DW -1
InpFileSizeLo DW 0    ; Input file size, lower word.
InpFileSizeHi DW 0    ; Input file size, higher word.
OutFileSizeLo DW 0    ; Output file size, lower word.
OutFileSizeHi DW 0    ; Output file size, higher word.
InpEnd        DW 0    ; Offset behind the last byte of input text.
OutputPtr     DW 0    ; Offset of the next free position in [OUTPUT].
TrTable       DW 0    ; Offset of selected array of 128 WORDs with code points of OEM/ANSI encodings.
EntitySkip    DW 0    ; Number of bytes skipped when HTML entity is decoded on input (0..8).
CharSize      DB 0    ; Character width in bytes (1..4) used to increase relevance during autodetection.
Errorlevel    DB 0    ; 0=normal end, 2=invalid characters, 4=I/O error, 8=wrong syntax.
TempString    DB 16*B ; Working room for string manipulation.

; Byte Order Mark definitions.
BOM_UTF32LE DB 0xFF,0xFE,0x00,0x00
BOM_UTF16LE EQU BOM_UTF32LE
BOM_UTF32BE DB 0x00,0x00,0xFE,0xFF
BOM_UTF16BE EQU BOM_UTF32BE + 2
BOM_UTF8    DB 0xEF,0xBB,0xBF,0

HelpText:
D "Program:   EuroConvertor version DOS %Version",13,10
D "Function:  Conversion of text file encoding.",13,10
D "Format:    Dual DOS/Windows application.",13,10
D "Licence:   Freeware by vitsoft",13,10
D "Arguments: InpEncoding OutEncoding InpFileName OutFileName",13,10
D "Example:   euroconv ISO8859-2 utf16le/BOM input.txt output.txt",13,10
D "Encodings: euroconv enc | more",13,10
D "Manual:    https://vitsoft.info/econv_en.htm",13,10,0
[CODE]
↑ DosMain
DOS program entry point.
Input
Four command-line arguments, as declared in HelpText.
Invoked by
DOS loader.
Invokes
DosAutodetect, DosConvert, DosEncList, DosInfoText, DosParseEnc
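When the input is declared as UTF-16 without an explicit byte order, DosMain first looks for a byte order mark and otherwise converts the text both ways and compares the error counts. The Python sketch below mirrors that decision order under stated assumptions: `count_errors` is a hypothetical stand-in for the trial conversion (it counts replacement characters produced by Python's own decoder, not the errors reported by DosConvert), and the function names are illustrative only. The returned numbers 1200 and 1201 are the code page identifiers of UTF-16LE and UTF-16BE used in the source.

```python
BOM_UTF16LE = b"\xff\xfe"
BOM_UTF16BE = b"\xfe\xff"


def count_errors(data, codec):
    """Hypothetical stand-in for a trial conversion: count undecodable units."""
    return data.decode(codec, errors="replace").count("\ufffd")


def detect_utf16(data):
    """Return (code_page, has_bom) for text of unknown UTF-16 byte order."""
    if len(data) < 2:
        return 1201, False            # too short to inspect: big endian
    if data.startswith(BOM_UTF16LE):
        return 1200, True
    if data.startswith(BOM_UTF16BE):
        return 1201, True
    # No BOM: try both byte orders and keep the one with fewer errors.
    le_errors = count_errors(data, "utf-16-le")
    be_errors = count_errors(data, "utf-16-be")
    # A tie goes to big endian, matching the JBE in the assembly source.
    return (1201, False) if be_errors <= le_errors else (1200, False)


if __name__ == "__main__":
    print(detect_utf16(b"\xff\xfeA\x00"))
```

The UTF-32 branch follows the same pattern with four-byte marks and the identifiers 12000 and 12001.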
DosMain: PROC
    PUSH PARA# [DATA]
    POP  DS
    GetArg 1                       ; Input encoding retrieved by GetArg.
    JCXZ .Help:
    CMPB [ES:SI],'-'
    JE .Help:
    CMPB [ES:SI],'/'
    JE .Help:
    Invoke DosParseEnc, SI,CX
    JC .UnknownEncoding:
    MOV [InpEncId],AX
    MOV [InpEncSt],BX
    JNSt BX,encStEnc,.10:
    CALL DosEncList:               ; Display list of supported encodings.
    JMP .Abort:
.10:GetArg 2                       ; Output encoding.
    Invoke DosParseEnc, SI,CX
    JNC .13:
.UnknownEncoding: ; Encoding is at ES:SI,CX.
    OR [Errorlevel],8
    JCXZ .Help:
    PUSH DS,ES
      POP DS
       StdOutput SI,Size=CX
      POP DS
    StdOutput =B' is not a supported encoding.',Eol=Yes
    JMP .Abort:
.13:MOV [OutEncId],AX
    MOV [OutEncSt],BX
    GetArg 3                       ; Input file name.
    JNC .16:
.Help:StdOutput HelpText
    OR [Errorlevel],8
    JMP .Abort:
.16:MOV DI,InpFileName
    MOV DX,DI
.19:LODSB [ES:SI]
    CMP AL,'"'
    JE .23:
    MOV [DS:DI],AL
    INC DI
.23:LOOP .19:
    DosAPI AH=3Dh,AL=fileRead+filePermitAll ; OPEN EXISTING FILE.
    MOV [InpHandle],AX
    JNC .26:
.InputError:
    OR [Errorlevel],4
    StdOutput =B"Error reading input file ", InpFileName, Eol=Yes
    JMP .Abort:
.26:PUSH DS
     DosAPI BX=AX,AH=3Fh,CX=SIZE#[INPUT],DX=0,DS=[SegINPUT] ; READ FROM FILE OR DEVICE.
    POP DS
    JC .InputError:
    MOV [InpEnd],AX ; Number of read bytes = end of read data in [INPUT].
    ADD [InpFileSizeLo],AX
    GetArg 4                        ; Output file name.
    JC .Help:
    MOV DI,OutFileName
    MOV DX,DI
.29:LODSB [ES:SI]
    CMP AL,'"'
    JE .33:
    MOV [DS:DI],AL
    INC DI
.33:LOOP .29:
    DosAPI AH=3Ch   ; CREATE OR TRUNCATE FILE.
    MOV [OutHandle],AX
    JNC .36:
.OutputError:
    OR [Errorlevel],4
    StdOutput =B"Error writing output file ", OutFileName, Eol=Yes
    JMP .Abort:
.36:; Cmdline parameters are valid and accepted. Fix input encoding.
    MOV ES,[SegINPUT]
    SUB SI,SI
    MOV DX,[InpEnd]
    ; Skip the fix if input endianness is irrelevant or explicitly specified.
    JSt  [InpEncSt],encStLe|encStBe|encStAscii|encStUtf8|encStAuto, .69:
    JNSt [InpEncSt],encStUtf16, .53:
    ; Autodetect input UTF-16 endianness of text ES:SI..ES:DX.
    MOV AX,DX
    SUB AX,SI
    CMP AX,2
    JB .46:    ; Skip if text is too short.
    MOV AX,[ES:SI]
    CMP AX,[BOM_UTF16LE]
    JNE .43:
    SetSt [InpEncSt],encStBom
.39:SetSt [InpEncSt],encStLe
    MOV AX,1200
    JMP .79:
.43:CMP AX,[BOM_UTF16BE]
    JNE .49:
    SetSt [InpEncSt],encStBom
.46:SetSt [InpEncSt],encStBe
    MOV AX,1201
    JMP .79:
.49:; No 16bit BOM is present in input, perform empiric autodetection.
    Invoke DosConvert, 1200,SI,DX,Void    ; Try UTF-16LE.
    MOV BX,AX ; Number of input errors if UTF-16LE.
    Invoke DosConvert, 1201,SI,DX,Void    ; Try UTF-16BE.
    CMP AX,BX
    JBE .46:    ; UTF16BE detected.
    JMP .39:    ; UTF16LE detected.
.53:JNSt [InpEncSt],encStUtf32, .69:
    ; Autodetect input UTF-32 endianness of text ES:SI..ES:DX.
    MOV AX,DX
    SUB AX,SI
    CMP AX,4
    JB .63:
    MOV EAX,[ES:SI]
    CMP EAX,[BOM_UTF32LE]
    JNE .59:
    SetSt [InpEncSt],encStBom
.56:SetSt [InpEncSt],encStLe
    MOV AX,12000
    JMP .79:
.59:CMP EAX,[BOM_UTF32BE]
    JNE .66:
    SetSt [InpEncSt],encStBom
.63:SetSt [InpEncSt],encStBe
    MOV AX,12001
    JMP .79:
.66:; No 32bit BOM is present in input, perform empiric autodetection.
    Invoke DosConvert, 12000,SI,DX,Void     ; Try UTF-32LE.
    MOV BX,AX ; Number of input errors if UTF-32LE.
    Invoke DosConvert, 12001,SI,DX,Void    ; Try UTF-32BE.
    CMP AX,BX
    JBE .63:  ; UTF32BE detected.
    JMP .56:  ; UTF32LE detected.
.69:JNSt [InpEncSt],encStAuto, .79:
    CALL DosAutodetect
.79:MOV DX,[InpEnd]
    MOV ES,[SegINPUT]
    MOV CX,DX
    SUB CX,SI
    ; [InpEncId] is now finally specified. [INPUT]:SI..DX is input text, CX its size. It may start with BOM.
    ; Select the output encoding procedure.
    MOV AX,[OutEncId]
    JNSt [OutEncSt],encStAuto|encStOem,.82
    PUSH BX,DX
      DosAPI AX=6601h ; GET GLOBAL CODE PAGE TABLE.
      MOV AX,437 ; If DosAPI failed, use default IBM437.
      JC .81:
      MOV AX,BX  ; AX is now the number of active code page.
.81:  MOV [OutEncId],AX
      SetSt [OutEncSt],encStOem
    POP DX,BX
.82:Dispatch AX,65001,1200,1201,12000,12001,20127
    ; Undispatched output encoding is 8bit OEM/ANSI.
    MOV CX,[TableDir.CodePages]
    MOV DI,[TableDir.CPid]
    PUSH DS
    POP ES
    REPNE SCASW
    JNE .20127:  ; Unsupported output encoding - use ASCII.
    SUB DI,[TableDir.CPid]
    DEC DI,DI
    ADD DI,[TableDir.CPtable]
    MOV AX,[DI]  ; AX is now offset of translation table in section [CPtt].
    ADD AX,[TableDir.CPtt]
    MOV [TrTable],AX ; Offset of translation table in segment [DATA].
    MOV DI, To8bit:
    JMP .89:
.65001:MOV DI, ToUTF8:
    CMP CX,4
    JB .89:    ; LODSD below needs at least 4 input bytes.
    LODSD [ES:SI]
    DEC SI
    AND EAX,0x00FF_FFFF
    SetSt [InpEncSt],encStBom
    CMP EAX,[BOM_UTF8]
    JE .89:
    RstSt [InpEncSt],encStBom
    SUB SI,3   ; No input UTF8-BOM is present.
    JMP .89:
.1200:
.1201:MOV DI, ToUTF16:
    CMP CX,2
    JB .89:
    SetSt [InpEncSt],encStBom
    TEST AL,1 ; Difference between BE and LE.
    LODSW [ES:SI] ; The input text is in segment ES.
    JZ .83:
    CMP AX,[BOM_UTF16BE]
    JE .89:
.83:CMP AX,[BOM_UTF16LE]
    JE .89:
    SUB SI,2 ; No input UTF16-BOM is present.
    RstSt [InpEncSt],encStBom
    JMP .89:
.12000:
.12001:MOV DI, ToUTF32:
    CMP CX,4
    JB .89:
    SetSt [InpEncSt],encStBom
    TEST AL,1 ; Difference between BE and LE.
    LODSD [ES:SI] ; The input text is in segment ES.
    JZ .86:
    CMP EAX,[BOM_UTF32BE]
    JE .89:
.86:CMP EAX,[BOM_UTF32LE]
    JE .89:
    SUB ESI,4 ; No input BOM is present.
    RstSt [InpEncSt],encStBom
    JMP .89:
.20127: MOV DI, ToASCII:
.89: ; [INPUT]:SI..DX is now input text with BOM removed.
    JNSt [OutEncSt],encStUtf,.94: ; No BOM in non-Unicode encodings.
    JNSt [OutEncSt],encStBom,.94: ; No output BOM if not explicitly requested.
    ; Output BOM was requested. Write BOM before invocation of DosConvert.
    MOV AX,[OutEncId]
    Dispatch AX,1200d,1201d,12000d,12001d,65001d
    JMP .94:      ; Non-Unicode encoding cannot have BOM.
.65001d:
    MOV EAX,[BOM_UTF8]
    CALL OutputAL
    SHR EAX,8
    CALL OutputAL
    SHR EAX,8
    CALL OutputAL
    JMP .94:
.12001d:
    MOV EAX,[BOM_UTF32BE]
    CALL OutputEAX
    JMP .94:
.12000d:
    MOV EAX,[BOM_UTF32LE]
    CALL OutputEAX
    JMP .94:
.1201d:
    MOV AX,[BOM_UTF16BE]
    JMP .92:
.1200d:
    MOV AX,[BOM_UTF16LE]
.92:CALL OutputAX
.94:MOV DX,[InpEnd]
    ; [INPUT]:SI..DX is input text without BOM. Output encoding callback procedure is now in CS:DI.
    Invoke DosConvert,[InpEncId],SI,DX,DI
    ADD [InpErrorsLo],AX
    ADC [InpErrorsHi],0
    PUSH DS
      DosAPI AH=3Fh,BX=[InpHandle],CX=SIZE#[INPUT],DX=0,DS=[SegINPUT] ; READ FROM FILE OR DEVICE.
    POP DS
    JC .InputError:
    SUB SI,SI
    MOV [InpEnd],AX ; This many bytes have been read from input file.
    ADD [InpFileSizeLo],AX
    ADC [InpFileSizeHi],SI
    TEST AX
    JNZ .94: ; If not end of file yet.
    CALL OutputFlush:
    Invoke DosInfoText, =' In', InpFileName,[InpFileSizeLo],[InpFileSizeHi],[InpEncId],[InpEncSt],[InpErrorsLo],[InpErrorsHi]
    Invoke DosInfoText, ='Out', OutFileName,[OutFileSizeLo],[OutFileSizeHi],[OutEncId],[OutEncSt],[OutErrorsLo],[OutErrorsHi]
    MOV EAX,[InpErrorsLo]
    OR  EAX,[OutErrorsLo]
    JZ .Abort:
    OR [Errorlevel],2
.Abort:
    DosAPI AH=3Eh,BX=[OutHandle] ; CLOSE FILE.
    DosAPI AH=3Eh,BX=[InpHandle] ; CLOSE FILE.
    TerminateProgram [Errorlevel]
  ENDPROC DosMain:
↑ DosConvert InpEncId, TextPtr, TextEnd, DosOutputProc
Convert each character of the input text from the input encoding to a Unicode codepoint, then call DosOutputProc to handle the codepoint (compute its relevance, or convert it to the output encoding and save it to the output file).
Input
InpEncId is numeric encoding identifier 0..65535.
TextPtr is offset in [INPUT] of the beginning of text loaded in memory.
TextEnd is offset behind the end of text loaded in memory.
DosOutputProc is offset in [CODE] of callback procedure (subprogram defined in division DosOutputProc which handles each codepoint).
DosOutputProc Input
EAX is UniCodePoint character. It may also be the replacement character 0x0000_FFFD when an input error occurred.
DosOutputProc Output
CF=0 if the character was stored successfully.
BX,CX,DX,SI must be preserved.
DosOutputProc Error
CF=1 on output error. Further conversion will be cancelled.
Output
AX= number of input errors (malformed or undefined characters).
Invoked by
DosMain
Calls
DosHtmlDecode, one of DosOutputProc subprograms.
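The decode-then-callback contract described above can be illustrated with a minimal Python sketch of the UTF-16 branch. It is not the author's code: the names `convert_utf16` and `output_proc` are hypothetical, and the callback returns `True` on success where the assembly callback returns CF=0.

```python
REPLACEMENT = 0xFFFD  # U+FFFD is passed to the callback on malformed input


def convert_utf16(data: bytes, big_endian: bool, output_proc) -> int:
    """Decode UTF-16 bytes, pass each codepoint to output_proc, return input error count."""
    errors = 0
    end = len(data)
    if end & 1:                 # size is not WORD aligned: truncate, count one error
        end -= 1
        errors += 1
    order = "big" if big_endian else "little"
    pos = 0
    while pos < end:
        unit = int.from_bytes(data[pos:pos + 2], order)
        pos += 2
        if 0xD800 <= unit <= 0xDFFF:          # surrogate range
            high = unit - 0xD800
            low = None
            if high < 0x400 and pos < end:    # valid high surrogate, fetch the low one
                low = int.from_bytes(data[pos:pos + 2], order) - 0xDC00
                pos += 2
            if low is not None and 0 <= low < 0x400:
                unit = 0x10000 + (high << 10) + low
            else:
                unit = REPLACEMENT
                errors += 1
        if not output_proc(unit):             # callback failed: cancel conversion
            break
    return errors
```

A callback that only counts or collects codepoints plays the role of Void or GetRelevance; a callback that encodes and writes plays the role of ToUTF8 and its siblings.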
DosConvert Procedure InpEncId,TextPtr,TextEnd,DosOutputProc
   PUSH ES
    MOV AX,[%InpEncId]
    MOV SI,[%TextPtr]
    MOV DX,[%TextEnd]
    SUB CX,CX
    MOV [%ReturnAX],CX  ; Initialize input error counter.
    INC CX
    MOV [CharSize],CX    ; Initialize CharSize=1. It will be updated if input encoding is Unicode.
    AND CX,AX ; Let CX=1 for odd InpEncId, CX=0 for even InpEncId (endianness in UTF-16 and UTF-32).
    MOV ES,[SegINPUT]
    Dispatch AX,65001,20127,1200,1201,12000,12001 ; Special encodings UTF or ASCII.
    ; Undispatched encoding is 8bit, let's select the translation table.
    ; Convert from OEM or ANSI 8bit encoding.
    PUSH DS
    POP ES
    MOV CX,[TableDir.CodePages]
    MOV DI,[TableDir.CPid]
    MOV BX,-1
    REPNE SCASW
    JNE .05:  ; Unsupported input encoding.
    SUB DI,[TableDir.CPid]
    LEA BX,[DI-2]
    ADD BX,[TableDir.CPtable] ; BX now points to an offset of translation table in section [CPtt]. Or -1 if no table.
    MOV BX,[BX]
    CMP BX,-1 ; In case that this codepage has no translation table.
    JNE .10:
.05:MOV [%ReturnAX],BX
    JMP .90:
.10:ADD BX,[TableDir.CPtt]; DS:BX now points to translation table with 128 WORDs.
    XOR AX,AX
.15:CMP SI,DX
    JNB .90:
    XOR EAX,EAX
    MOV ES,[SegINPUT]
    LODSB [ES:SI]
    CMP AL,128
    JB .20:
    MOV DI,AX
    ADD DI,AX
    SUB DI,256
    MOV AX,[BX+DI]         ; Translate character 0x80..0xFF to Unicode.
    CMP AX,Replacement
    CMC                    ; CF=1 if AX=0xFFFD,0xFFFE or 0xFFFF (replacement or undefined).
    ADCW [%ReturnAX],0     ; Input error.
.20:CMP AX,'&' ; Possible beginning of HTML entity.
    JNE .22:
    JNSt [InpEncSt],encStHtm|encStHtml,.22: ; Skip if HTML entity should be ignored.
    CALL DosHtmlDecode:
.22:CALL [%DosOutputProc]
    JC .90:
    JMP .15: ; The next input character.

.65001: ; Convert from UTF-8 encoding.
    SUB DX,SI
    DecodeUTF8 SI,.Store,Size=DX,Width=32 ; Uses macro from string16.htm.
    JC .23:
    JECXZ .23:
    ; CX bytes from [INPUT] were left undecoded.
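    MOV DX,CX
    NEG DX      ; DX is the low word of the negative offset -CX.
    MOV CX,-1   ; CX is the high word of the negative offset (sign extension).
    DosAPI AX=4201h,BX=[InpHandle] ; LSEEK by signed offset CX:DX from the current file position.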
.23:JMP .90:

.Store:PROC ; Internal subprocedure .Store is callback from the macro DecodeUTF8.
         ; It is expected to pass decoded codepoint EAX to %DosOutputProc.
         MOV [CharSize],1  ; CharSize will be applied in codepage autodetection in GetRelevance.
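         CMP EAX,80h
         JB .2:            ; Codepoints below 0x80 occupy one byte in UTF-8.
         MOV [CharSize],2
         CMP EAX,800h
         JB .2:            ; Codepoints below 0x800 occupy two bytes in UTF-8.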
         MOV [CharSize],3
    .2:  CMP EAX,Replacement
         JNE .3:
         INCW [%ReturnAX] ; Replacement and unsupported codepoints increment input error counter.
    .3:  MOV CX,[EntitySkip]
         JCXZ .5:
         DEC CX
         MOV [EntitySkip],CX
         RET ; Ignore the remaining letters of already decoded HTML entity.
    .5:  CMP EAX,'&'          ; Possible beginning of HTML entity.
         JNE .9:
         JNSt [InpEncSt],encStHtm|encStHtml,.9:
         MOV DI,SI
         CALL DosHtmlDecode:
         SUB SI,DI ; How many bytes should decoder advance to skip the decoded entity (0..9).
         MOV [EntitySkip],SI
    .9:  JMP [BP+52] ; %DosOutputProc in DosConvert's frame.
       ENDP .Store:

.20127: ; Convert from ASCII encoding.
    CMP SI,DX
    JNB .90:
    XOR EAX,EAX
    LODSB [ES:SI]
    CMP AL,128
    JB .25:
    MOV AX,Replacement
    INCW [%ReturnAX] ; Input error.
.25:JNSt [InpEncSt],encStHtm|encStHtml,.27:
    CMP AX,'&'
    JNE .27:
    CALL DosHtmlDecode:
.27:CALL [%DosOutputProc]
    JC .90:
    JMP .20127: ; The next character.

.1200: ; Convert from UTF-16LE encoding. CX=0.
.1201: ; Convert from UTF-16BE encoding. CX=1.
    MOVB [CharSize],2
    MOV BX,DX
    SUB BX,SI
    AND BX,1
    JZ .30:
    SUB DX,BX ; Text size is not WORD aligned, truncate and count this as input error.
    INCW [%ReturnAX]
.30:CMP SI,DX
    JNB .90:
    XOR EAX,EAX
    LODSW [ES:SI]
    JCXZ .35:
    XCHG AL,AH  ; Convert UTF-16BE to UTF-16LE.
.35:CMP AX,0xD7FF
    JBE .55:
    CMP AX,0xE000
    JAE .55:
    ; High surrogate expected (0xD800..0xDBFF).
    SUB AX,0xD800
    CMP AX,0x0400
    JAE .45:
    MOV EDI,EAX   ; Temporary save high 10 bits.
    SHL EDI,10
    CMP SI,DX
    JNB .45:
    LODSW [ES:SI] ; Fetch the low surrogate.
    JCXZ .40:
    XCHG AL,AH    ; Convert UTF-16BE to UTF-16LE.
.40:SUB AX,0xDC00 ; Low surrogate expected (0xDC00..0xDFFF).
    JB .45:
    CMP AX,0x0400
    JB .50:
.45:MOV AX,Replacement
    INCW [%ReturnAX]
    JMP .55:
.50:LEA EAX,[EAX+EDI+0x10000] ; Compose codepoint from both surrogates.
.55:JNSt [InpEncSt],encStHtm|encStHtml,.57:
    CMP EAX,'&'
    JNE .57:
    CALL DosHtmlDecode:
.57:CALL [%DosOutputProc]
    JC .90:
    JMP .30:

.12000: ; Convert from UTF-32LE encoding. ECX=0.
.12001: ; Convert from UTF-32BE encoding. ECX=1.
    MOVB [CharSize],4
    MOV BX,DX
    SUB BX,SI
    AND BX,3
    JZ .60:
    SUB DX,BX ; Text size is not DWORD aligned, truncate and count this as input error.
    INCW [%ReturnAX]
.60:CMP SI,DX
    JNB .90:
    LODSD [ES:SI]
    JCXZ .65:        ; CX=1 if UTF-32BE.
    BSWAP EAX        ; Convert UTF-32BE to UTF-32LE.
.65:CMP EAX,10FFFFh
    JA .70:          ; Invalid above 10FFFFh.
    CMP EAX,0xD800
    JB .80:          ; Valid below 0xD800.
    CMP EAX,0xDFFF
    JA .80:          ; Valid above 0xDFFF. Surrogates 0xD800..0xDFFF are invalid.
.70:MOV EAX,Replacement
    INCW [%ReturnAX]
.80:JNSt [InpEncSt],encStHtm|encStHtml,.85:
    CMP EAX,'&'
    JNE .85:
    CALL DosHtmlDecode:
.85:CALL [%DosOutputProc]
    JNC .60:
.90:POP ES
    EndProcedure DosConvert
DosOutputProc
is a collection of output procedures used as callbacks from DosConvert:
Void: does nothing. It is used when DosConvert only needs to count input errors (input endianness autodetection).
GetRelevance: accumulates this codepoint's property (positive or negative number) in global variable Relevance. This is used for input encoding autodetection.
Replace: is used to replace codepoint unsupported by output encoding with its entity/question mark/transliteration, which will be written to OutFile. Global memory variable [OutErrors] is incremented.
ToASCII:, To8bit:, ToUTF8:, ToUTF16:, ToUTF32: are used to convert the codepoint into output encoding and write it to the output.
Input
EAX= codepoint, i.e. integer number in the range 0..0xD7FF or 0xE000..0x10FFFF.
OutHandle file must be opened for writing.
Output
CF=0 when the codepoint was processed successfully. Otherwise DosConvert cancels further conversion.
EAX, EDI are undefined, other registers must be preserved.
Called by
DosConvert as a callback procedure with register-calling convention.
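As an illustration of what the ToUTF8: and ToUTF16: callbacks compute, here is a minimal Python sketch of the same shift-and-mask arithmetic. The function names are hypothetical, and the sketch returns the bytes instead of writing them to the output buffer.

```python
def to_utf8(cp: int) -> bytes:
    """Encode one codepoint as 1..4 UTF-8 bytes."""
    if cp <= 0x7F:
        return bytes([cp])
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def to_utf16(cp: int, big_endian: bool = False) -> bytes:
    """Encode one codepoint as one UTF-16 unit or as a surrogate pair."""
    order = "big" if big_endian else "little"
    if cp <= 0xFFFF:
        return cp.to_bytes(2, order)
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)      # upper 10 bits
    low = 0xDC00 + (cp & 0x3FF)     # lower 10 bits
    return high.to_bytes(2, order) + low.to_bytes(2, order)
```

The sketch assumes a valid codepoint as specified in the Input paragraph above (no surrogates, nothing above 0x10FFFF).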
Void PROC         ; Empty conversion.
      CLC
      RET
     ENDP Void

GetRelevance: PROC ; Relevance estimates how likely codepoint EAX is to appear in input text.
    PUSH CX,DX,DI,ES
     TEST EAX,0xFFFF_0000
     JNZ .4:
     MOV CX,[TableDir.CodePoints]
     MOV DI,[TableDir.CodePoint]
     PUSH DS
     POP ES
     REPNE SCASW
 .4: MOV EAX,??           ; Relevance of invalid character is negative.
     JNE .9:
     SUB DI,2
     SUB DI,[TableDir.CodePoint]
     SHR DI,1
     ADD DI,[TableDir.Relevance]
     MOVSXB AX,[DI]
 .9: MOVZXB CX,[CharSize]
     IMUL CX
     ADD [Relevance+0],AX  ; Accumulate the value in global memory location.
     ADC [Relevance+2],DX
     CLC
    POP ES,DI,DX,CX
    RET
   ENDP GetRelevance:

Replace: PROC ; Replace codepoint EAX which does not exist in output encoding.
   PUSH BX,CX,SI,ES
     PUSH DS
     POP ES
     MOV BX,[OutEncSt]
     MOV SI,TempString
     JSt BX,encStIgn,.9:  ; Ignore unsupported character.
     MOV DI,SI
     JNSt BX,encStQm,.2:
  .1:MOV DI,SI
     MOV EAX,'?'
     MOV [DI],AX          ; Replace codepoint EAX with question mark.
     JMP .6:
  .2:JNSt BX,encStHtm|encStHtml,.3:
     MOVD [DI],'&#x'      ; Replace codepoint EAX with its HTML entity.
     ADD DI,3
     StoH DI,Align=Left
     MOV AX,';'
     STOSW
     JMP .6:
  .3:TEST EAX,0xFFFF_0000 ; Replace codepoint EAX with its transliteration.
     JNZ .1:
     MOV CX,[TableDir.CodePoints]
     MOV DI,[TableDir.CodePoint]
     REPNE SCASW
     JNE .1:
     SUB DI,2
     SUB DI,[TableDir.CodePoint]
     SHL DI,1
     ADD DI,[TableDir.Translit]
     MOV EAX,[DI]          ; EAX now contains 0..4 ASCII characters, NUL padded.
     MOV DI,SI             ; TempString.
  .4:AND AL,7Fh
     JZ .5:                ; End of TempString.
     STOSB
     SHR EAX,8
     JMP .4:
  .5:STOSB                 ; NUL-terminate replacement string.
  .6:LODSB
     CMP AL,0
     JZ .9:
     JSt BX,encStUtf16,.7:
     JSt BX,encStUtf32,.8:
     CALL ToASCII:
     JMP .6:
  .7:CALL ToUTF16:
     JMP .6:
  .8:CALL ToUTF32:
     JMP .6:
.9:POP ES,SI,CX,BX
   ADD [OutErrorsLo],1
   ADC [OutErrorsHi],0
   CLC
   RET
 ENDP Replace:

ToASCII: PROC ; Convert codepoint EAX to ASCII encoding.
    CMP EAX,127
    JA Replace:
    CALL OutputAL
    RET
   ENDP ToASCII:

To8bit: PROC  ; Convert codepoint EAX to OEM/ANSI encoding using [TrTable].
   CMP EAX,127
   JBE .8:           ; ASCII 7bit characters are copied verbatim.
   TEST EAX,0xFFFF_0000
   JNZ Replace:      ; Character outside BMP is replaced with question mark.
   MOV DI,[TrTable]
   PUSH CX,ES,DS
     POP ES
     MOV CX,128
     REPNE SCASW     ; Search for codepoint in TrTable.
   POP ES,CX
   JNE Replace:      ; If codepoint EAX is not supported by output encoding.
   SUB DI,[TrTable]  ; DI is now 2,4,6,8,,,256.
   SHR DI,1          ; DI is now 1,2,3,4,,,128.
   LEA AX,[DI+128-1] ; AL is now 128,129,,,255.
.8:CALL OutputAL
   RET
  ENDP To8bit:

ToUTF32: PROC ; Convert codepoint EAX to UTF-32 encoding.
     JNSt [OutEncSt],encStBe, .8:
     BSWAP EAX
  .8:CALL OutputEAX
     RET
  ENDP ToUTF32:

ToUTF16: PROC ; Convert codepoint EAX to UTF-16 encoding.
    TEST EAX,0xFFFF_0000
    JZ .5:
    ; Character outside BMP will be written as two surrogates.
    SUB EAX,0x0001_0000
    MOV EDI,EAX
    SHR EDI,10
    ADD EDI,0xD800 ; EDI is now the high surrogate.
    XCHG EDI,EAX
    CALL .5:
    XCHG EAX,EDI   ; Restore original codepoint in EAX.
    AND EAX,0x3FF
    ADD AX,0xDC00  ; AX is now the low surrogate.
 .5:JNSt [OutEncSt],encStBe,.8:
    XCHG AL,AH
 .8:CALL OutputAX
    RET
  ENDP ToUTF16:

ToUTF8: PROC  ; Convert codepoint EAX to UTF-8 encoding.
    MOV EDI,EAX
    CMP EAX,0x7F
    JBE .8:
    CMP EAX,0x7FF
    JBE .4:
    CMP EAX,0xFFFF
    JBE .2:
    ; 4byte encoding.
    SHR EAX,18
    OR AL,0xF0
    CALL .8:
    MOV EAX,EDI
    SHR EAX,12
    CALL .6:
    MOV EAX,EDI
    SHR EAX,6
    CALL .6:
    JMP .5:
 .2: ; 3byte encoding.
    SHR EAX,12
    OR AL,0xE0
    CALL .8:
    MOV EAX,EDI
    SHR EAX,6
    CALL .6:
    MOV EAX,EDI
    JMP .6:
 .4:; 2byte encoding.
    SHR EAX,6
    OR AL,0xC0
    CALL .8:
 .5:MOV EAX,EDI
 .6:AND EAX,0x3F
    OR AL,0x80
 .8:CALL OutputAL
    RET
  ENDP ToUTF8:

OutputAL PROC ; Write byte from AL to output.
    PUSH DI,ES
      MOV DI,[OutputPtr]
      CMP DI,SIZE# [OUTPUT]
      JB .8:
      CALL OutputFlush:
      SUB DI,DI
  .8: PUSH PARA# [OUTPUT]
      POP ES
      STOSB
      MOV [OutputPtr],DI
      CLC
    POP ES,DI
    RET
  ENDP OutputAL

OutputAX PROC ; Write word from AX to output.
    PUSH DI,ES
      MOV DI,[OutputPtr]
      CMP DI,SIZE# [OUTPUT] - 1
      JB .8:
      CALL OutputFlush:
      SUB DI,DI
   .8:PUSH PARA# [OUTPUT]
      POP ES
      STOSW
      MOV [OutputPtr],DI
      CLC
    POP ES,DI
    RET
  ENDP OutputAX

OutputEAX PROC ; Write dword from EAX to output.
    PUSH DI,ES
      MOV DI,[OutputPtr]
      CMP DI,SIZE# [OUTPUT] - 3
      JB .8:
      CALL OutputFlush:
      SUB DI,DI
   .8:PUSH PARA# [OUTPUT]
      POP ES
      STOSD
      MOV [OutputPtr],DI
      CLC
    POP ES,DI
    RET
  ENDP OutputEAX

OutputFlush:PROC ; Write contents of [OUTPUT] segment to OutFile.
                 ; Input: [OutputPtr] is the size to be written.
                 ; Output:[OutputPtr] is set to 0.
      PUSHAW
       MOV CX,[OutputPtr]
       JCXZ .9:
       PUSH DS
        SUB DX,DX
        DosAPI AH=40h,BX=[OutHandle],DS=[SegOUTPUT] ; WRITE TO FILE OR DEVICE.
       POP DS
       JC DosMain.OutputError: ; Abort on error.
       SUB DI,DI
       ADD [OutFileSizeLo],AX
       ADC [OutFileSizeHi],DI
       MOV [OutputPtr],DI
  .9: POPAW
      RET
    ENDP OutputFlush:
↑ DosHtmlDecode
Proc DosHtmlDecode is called from DosConvert when the convertor encounters an ampersand in input text.
It will recognize HTML entity in input stream, e.g. &nbsp; and convert the entity to its codepoint, e.g. to 0x0000_00A0.
HTML entity may be
  1. named, e.g. &nbsp; (case sensitive),
  2. decimal, e.g. &#160;, or
  3. hexadecimal, e.g. &#xA0; (case insensitive).
Input
EAX=0x0000_0026 (ampersand).
ES=PARA# [INPUT]
DS=PARA# [DATA]
ES:SI points behind the ampersand in [INPUT] text
ES:DX points behind the end of [INPUT] text
DS:[InpEncSt] flags encStUtf16,encStUtf32 specify input character width (8,16,32)
DS:[InpEncSt] flags encStLe,encStBe specify input character endianness
DS:[InpEncSt] flags encStHtm,encStHtml specify if ASCII entities &amp; &lt; &gt; &quot; should be decoded, too.
Output
All registers are unchanged if no valid entity was decoded. Otherwise
EAX=entity codepoint
ES:SI points behind the terminating semicolon in [INPUT] text.
Called from
DosConvert
Invokes
-
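The decoding rules listed above (named, decimal and hexadecimal entities, BMP only, ASCII entities decoded only on request) can be sketched in Python as follows. This is an illustration under assumptions: `html_decode`, `decode_ascii` and the tiny `NAMED` table are hypothetical stand-ins for the procedure, the encStHtml flag and the entity tables of the program.

```python
import string

# Tiny illustrative subset of named entities, not the program's real table.
NAMED = {"nbsp": 0xA0, "amp": 0x26, "lt": 0x3C, "gt": 0x3E, "quot": 0x22, "eacute": 0xE9}


def html_decode(text: str, pos: int, decode_ascii: bool = False):
    """text[pos] is the character behind '&'.

    Return (codepoint, position behind ';') or None when no valid entity is found.
    """
    end = text.find(";", pos)
    if end < 0:
        return None
    body = text[pos:end]
    if not body or not body.isascii():
        return None
    if body[0] == "#":                                   # numeric entity
        if body[1:2] in ("x", "X"):
            digits, base, allowed = body[2:], 16, string.hexdigits
        else:
            digits, base, allowed = body[1:], 10, string.digits
        if not digits or any(c not in allowed for c in digits):
            return None
        cp = int(digits, base)
        if cp > 0xFFFF:                                  # only BMP entities are decoded
            return None
    else:                                                # named entity, case sensitive
        if len(body) > 8:
            return None
        cp = NAMED.get(body)
        if cp is None:
            return None
    if cp < 128 and not decode_ascii:                    # &amp; &lt; &gt; &quot; kept as is
        return None
    return cp, end + 1
```

Returning `None` corresponds to leaving all registers unchanged, so the caller then treats the ampersand as an ordinary character.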
DosHtmlDecode PROC
     PUSHAW
     MOV BP,SP
%EntEnd %SET BP-2 ; Local WORD variable for offset behind the semicolon, which terminates the entity.
     SUB SP,2
     PUSH ES
       MOV CX,DX
       MOV DI,SI
       SUB CX,SI
       JNC .00:
       MOV CX,0xFFFC
   .00:MOV EAX,';'
       JSt [InpEncSt],encStUtf16, .10:
       JSt [InpEncSt],encStUtf32, .30:
       REPNE SCASB ; Search for entity terminator in 8bit input stream.
       JNE .90:
       MOV [%EntEnd],DI ; Remember the input stream position behind semicolon.
       SUB DI,SI
       MOV CX,DI ; Size of potential entity.
       CMP CX,SIZE# TempString
       JA .90:
       MOV DI,TempString ; Copy entity to TempString in segment DS.
   .05:LODSB [ES:SI]
       CMP AL,128
       JAE .90:
       MOV [DI],AL
       INC DI
       LOOP .05:  ; Copy entity contents, e.g. nbsp; or #x78AB; to TempString.
       JMP .50:
   .10:SHR CX,1    ; Input characters are 16bit.
       JNSt [InpEncSt],encStBe,.15:
       XCHG AL,AH
   .15:REPNE SCASW ; Search for entity terminator in 16bit input stream.
       JNE .90:
       MOV [%EntEnd],DI ; Remember the input stream position behind semicolon.
       SUB DI,SI
       MOV CX,DI ; Size of potential entity.
       SHR CX,1  ; Size in WORDs.
       JZ .90:
       CMP CX,SIZE# TempString
       JA .90:
       MOV DI,TempString ; Convert entity to 7bit ASCII and copy to TempString.
   .20:LODSW [ES:SI]
       JNSt [InpEncSt],encStBe,.25:
       XCHG AL,AH
   .25:CMP AX,128
       JAE .90:
       MOV [DI],AL
       INC DI
       LOOP .20:
       JMP .50:
   .30:SHR CX,2          ; Input characters are 32bit. Only CX holds the size.
       JNSt [InpEncSt],encStBe,.35:
       BSWAP EAX
   .35:REPNE SCASD       ; Search for entity terminator in 32bit input stream.
       JNE .90:
       MOV [%EntEnd],DI  ; Remember the input stream position behind semicolon.
       SUB DI,SI
       MOV CX,DI         ; Size of potential entity.
       SHR CX,2          ; Size in DWORDs.
       JZ .90:
       CMP CX,SIZE# TempString
       JA .90:
       MOV DI,TempString ; Convert entity to 7bit ASCII and copy to TempString.
   .40:LODSD [ES:SI]
       JNSt [InpEncSt],encStBe,.45:
       BSWAP EAX
   .45:CMP EAX,128
       JAE .90:
       MOV [DI],AL
       INC DI
       LOOP .40:
   .50:MOV SI,TempString ; DS:SI now points to the entity (without ampersand) terminated with semicolon.
       PUSH DS
       POP ES
       SUB DI,SI         ; DI is now the size of HTML entity in ASCII bytes, including semicolon.
       LODSB
       CMP AL,'#'        ; Test if the entity is numeric.
       JE .65:
       CMP DI,8+1        ; Longer named entities are not supported.
       JA .90:
       DEC SI            ; 1st character is not #, so it's a named entity.
       MOV CL,5
       LODSD
       XOR EBX,EBX
       DEC EBX           ; Prepare mask to EBX.
       SUB CX,DI
       JS .55:
       ; TempString has 1..4 letters.
       SAL CX,3
       SHR EBX,CL
       AND EAX,EBX       ; Pad the shorter entity name with NUL bytes.
       MOV CX,[TableDir.Entities4]
       MOV DI,[TableDir.EntName4]
       REPNE SCASD       ; Search for the entity by name.
       JNE .90:
       SUB DI,4
       SUB DI,[TableDir.EntName4]
       SHR DI,1
       ADD DI,[TableDir.EntVal4]
       MOVZXW EAX,[DI]
       JMP .80:          ; Decoded entity codepoint is now in EAX.
   .55: ; Entity has 5..8 letters.
       XCHG EAX,EDX      ; Temporarily save first four characters to EDX.
       LODSD             ; Fifth to eighth characters.
       ADD CX,4
       SAL CX,3
       SHR EBX,CL
       AND EAX,EBX       ; Pad the entity with NUL bytes.
       XCHG EDX,EAX
       MOV CX,[TableDir.Entities8]
       MOV DI,[TableDir.EntName8A]
   .60:REPNE SCASD       ; Search for the entity by its first four letters.
       JNE .90:
       PUSH DI
         SUB DI,4
         SUB DI,[TableDir.EntName8A]
         ADD DI,[TableDir.EntName8B]
         CMP EDX,[DI]    ; Compare masked fifth..eighth letters.
       POP DI
       JNE .60:          ; Continue search if the entity differed in 5..8th characters.
       SUB DI,4
       SUB DI,[TableDir.EntName8A]
       SHR DI,1
       ADD DI,[TableDir.EntVal8]
       MOVZXW EAX,[DI]
       JMP .80:          ; Decoded entity codepoint is now in EAX.
   .65: ; Numeric entity expected.
       LODSB
       CMP AL,'0'
       JB .90:
       OR AL,'x'^'X'
       CMP AL,'x'
       JE .70:
       DEC SI   ; DS:SI should now point to decimal number terminated with semicolon.
       LodDD SI ; Use macro LodDD from library cpuext16.
       JMP .75:
   .70:; DS:SI should now point to hexadecimal number terminated with semicolon.
       LodHD SI ; Use macro LodHD from library cpuext16.
   .75:JC .90:  ; Abort if wrong number syntax.
       TEST DX
       JNZ .90: ; Abort if decoded entity is not in BMP.
       CMPB [SI],';'
       JNE .90:
   .80:; AX is decoded codepoint, [%EntEnd] is offset behind the entity in text.
       JSt [InpEncSt],encStHtml,.85:
       CMP AX,128
       JB .90:         ; Skip if ASCII entities should not be converted.
   .85:MOV DX,[%EntEnd]
       MOV [BP+2],DX   ; %ReturnSI.
       MOV [BP+14],AX  ; %ReturnAX.
 .90:  XOR EAX,EAX
     POP ES
     MOV SP,BP
     POPAW
     RET
    ENDP DosHtmlDecode
↑ DosAutodetect
This PROC will calculate the total relevance of characters in a given input encoding.
Relevance increases when the translated character is a common letter, and decreases when the character is nonalphabetical.
Relevance detection is repeated for each supported encoding, and the procedure returns the encoding with the highest achieved relevance.
Only the first 48 KB of a big file is used for autodetection.
Input
ES:SI points to the 1st byte of input text.
ES:DX points behind the block of investigated input text.
Output
[InpEncId] is set. All registers are undefined.
Called by
DosMain
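The scoring loop described above can be illustrated with a minimal Python sketch. The candidate list, the weights (+1 for a letter, -2 for a non-letter, -8 for an undefined byte) and the function names are assumptions made for the example. The program itself uses its own Relevance table and tries every code page in [CPid].

```python
CANDIDATES = ("cp852", "iso8859_2", "cp1250")  # illustrative subset of 8bit code pages


def relevance(data: bytes, encoding: str) -> int:
    """Sum an assumed per-character relevance of `data` decoded with `encoding`."""
    score = 0
    for byte in data:
        if byte < 0x80:
            continue                    # ASCII is the same in all candidates (simplification)
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            score -= 8                  # undefined byte, assumed strongly negative weight
            continue
        score += 1 if char.isalpha() else -2
    return score


def autodetect(data: bytes) -> str:
    """Return the candidate with the highest relevance; the first one wins a tie."""
    sample = data[:48 * 1024]           # only the first 48 KB are inspected
    best_enc, best_score = None, None
    for enc in CANDIDATES:
        score = relevance(sample, enc)
        if best_score is None or score > best_score:
            best_enc, best_score = enc, score
    return best_enc
```

For example, Czech text encoded in Windows-1250 scores highest as `cp1250`, because the same bytes decode to control characters or symbols in the other two candidates.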
DosAutodetect PROC
    ; Autodetect input encoding of text ES:SI..ES:DX.
    SUB CX,CX         ; CX will be CP index (0,1,2,,,[CodePages]).
    MOV [BestRelevance+0],CX
    MOV [BestRelevance+2],0x8000 ; Initialize BestRelevance with lowest signed integer.
    MOV [InpEncId],CX
    MOV [InpEncSt],encStAuto
.20:SUB EBP,EBP
    MOV [Relevance],EBP
    MOV BP,[TableDir.CPid]
    ADD BP,CX
    ADD BP,CX
    MOV AX,[DS:BP]
    CMP AX,912
    JNE .40:
    DEC [Relevance] ; Slightly handicap IBM912 in favor of the almost identical ISO8859-2.
.40:Invoke DosConvert,AX,SI,DX, GetRelevance:
    MOV EAX,[Relevance]
    CMP EAX,[BestRelevance]
    JLE .80:
    ; Encoding indexed by CX is better candidate than all previous ones.
    MOV [BestRelevance],EAX
    MOV BP,[TableDir.CPid]
    ADD BP,CX
    ADD BP,CX
    MOV AX,[DS:BP]
    MOV [InpEncId],AX  ; Remember the so far best input encoding.
.80:INC CX   ; Try the next codepage.
    CMP CX,[TableDir.CodePages]
    JB .20:
    RET
   ENDP DosAutodetect
DosGrep Needle, TextBegin, TextEnd
Macro DosGrep performs a case-sensitive search for the string Needle in Text.
Input
SS= segment of the text.
DS= data+literal segment (PARA# %Needle).
Needle is the searched text, e.g. utf.
TextBegin is offset of the 1st character of Text.
TextEnd is offset behind the last character of Text.
Output
ZF=1 if Needle was found.
SS:DI points right behind the Needle inside the Text.
Error
ZF=0 if Needle not found.
DI=TextBegin.
Expanded by
DosParseEnc
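The contract of DosGrep (report success and the position right behind the match, or keep the starting position on failure) can be sketched in Python. The function name `grep` and the tuple return value are hypothetical substitutes for ZF and DI.

```python
def grep(needle: str, text: str, begin: int, end: int):
    """Return (found, position): position is right behind the match, or `begin` if not found."""
    at = text.find(needle, begin, end)   # case-sensitive search within text[begin:end]
    if at < 0:
        return False, begin
    return True, at + len(needle)
```

Because the returned position points behind the match, searches can be chained: after finding `koi8`, the variant letter is searched only in the rest of the string.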
DosGrep %MACRO Needle, TextBegin, TextEnd
       PUSHW =B'%Needle', %TextEnd
       MOV DI,%TextBegin
       CALL DosGrep@RT
DosGrep@RT:PROC1
         PUSHAW
         MOV BP,SP
         PUSH ES,DS
          POP ES
          MOV SI,[%Par2]  ; ES:SI is now the Needle.
          GetLength$ SI   ; Return Needle size in CX.
          MOV DX,CX
          DEC DX
          MOV CX,[%Par1]  ; TextEnd.
          SUB CX,DI       ; Text size.
          JB .9:
          LODSB [DS:SI]   ; First character of Needle.
          PUSH SS
          POP ES
    .1:   REPNE SCASB [ES:DI] ; Search for the 1st char of Needle.
          JNE .9:
          PUSH CX,SI,DI
            MOV CX,DX
            REPE CMPSB    ; Compare the rest of Needle.
          POP DI,SI,CX
          JNE .1:
          ADD DI,DX
          MOV [%ReturnDI],DI
          CMP DX,DX       ; Set ZF=1.
   .9:   POP ES
         POPAW
         RET 2*2          ; ZF=0 if Needle was not found.
       ENDP1 DosGrep@RT
     %ENDMACRO DosGrep
DosGrepNum TextBegin, TextEnd
Macro DosGrepNum searches for a continuous sequence of one or more decimal digits between TextBegin..TextEnd.
Input
SS= segment of Text.
TextBegin is offset of the 1st character of Text.
TextEnd is offset behind the last character of Text.
Output
CF=0 if a number exists in Text.
AX= number value.
SS:SI= points right behind the number.
Error
CF=1 if no decimal digit exists in Text or the number is too big.
AX, SI unchanged.
Expanded by
DosParseEnc
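A minimal Python sketch of the same contract follows. The name `grep_num` is hypothetical, and the 16bit limit for "too big" is an assumption derived from the result being returned in AX.

```python
def grep_num(text: str, begin: int, end: int):
    """Find the first run of decimal digits in text[begin:end].

    Return (value, position behind the number), or None when there is no digit
    or the value does not fit in 16 bits (assumed limit).
    """
    pos = begin
    while pos < end and not ("0" <= text[pos] <= "9"):
        pos += 1                         # skip everything in front of the number
    if pos >= end:
        return None
    value = 0
    while pos < end and "0" <= text[pos] <= "9":
        value = value * 10 + int(text[pos])
        pos += 1
    if value > 0xFFFF:
        return None
    return value, pos
```

Calling it twice on `iso-8859-2` yields 8859 first and then 2, which is how the ISO-8859 part number is obtained.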
DosGrepNum %MACRO TextBegin, TextEnd
       PUSHW %TextEnd, %TextBegin
       CALL DosGrepNum@RT
DosGrepNum@RT:PROC1
       PUSHAW
         MOV BP,SP
         MOV CX,[%Par2] ; TextEnd.
         MOV SI,[%Par1] ; TextBegin.
         SUB CX,SI
         JB .9:
         SUB AX,AX
   .3:   LODSB [SS:SI]
         SUB AL,'0'
         JB .5:
         CMP AL,9
         JNA .7:
   .5:   DEC CX
         JNZ .3:
         STC
         JMP .9:
   .7:   DEC SI
         PUSH DS,SS
           POP DS
           LodD SI
         POP DS
         JC .9:
         MOV [%ReturnSI],SI
         MOV [%ReturnAX],AX
   .9: POPAW
       RET 2*2 ; CF=0 if a valid number AX was found.
     ENDP1 DosGrepNum@RT
  %ENDMACRO DosGrepNum
↑ DosParseEnc EncPtr, EncSize
Recognize the encoding name in a given string which was submitted to EuroConvertor as the first or second command-line argument.
Input
ES= is segment with EncPtr.
EncPtr is offset of a quoted or unquoted string with encoding specification, e.g. Utf-16-LE-BOM.
EncSize is the string size in bytes.
Output
CF=0, AX= detected encoding identifier 0..65535.
BX=detected encoding flags.
Error
CF=1 if no supported encoding detected.
AX=0
Expands
DosGrep, DosGrepNum
Invoked by
DosMain.
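The parsing strategy (lowercase the argument, collect property keywords as flags, then resolve the numeric identifier) can be sketched in Python. This is a simplified illustration: `parse_enc` and `KNOWN_CPIDS` are hypothetical, the OEM/ANSI branch and the letter-only names (KOI8, Mazovia, Macintosh) are omitted, and flags are returned as a set of keyword strings instead of bits.

```python
import re

PROPS = ("ascii", "utf", "bom", "le", "be", "htm", "html", "qm",
         "ign", "transl", "oem", "ansi", "auto", "enc")
KNOWN_CPIDS = {437, 852, 1250, 1252, 28592}   # hypothetical subset of the [CPid] table


def parse_enc(spec: str):
    """Return (encoding_id, flags) or None if the specification is not recognized."""
    s = spec.strip('"').lower()
    flags = {p for p in PROPS if p in s}      # substring search, like DosGrep
    if "html" in flags:
        flags.discard("htm")                  # html implies htm, keep only the stronger flag
    if "ascii" in flags:
        return 20127, flags
    if flags & {"auto", "enc"}:
        return 0, flags                       # identifier 0 requests autodetection
    numbers = [int(n) for n in re.findall(r"\d+", s)]
    if "utf" in flags:
        base = {8: 65001, 16: 1200, 32: 12000}.get(numbers[0] if numbers else None)
        if base is None:
            return None
        if base != 65001 and "be" in flags and "le" not in flags:
            base += 1                         # 1201 and 12001 are the big-endian variants
        return base, flags
    if numbers:
        if numbers[0] in KNOWN_CPIDS:
            return numbers[0], flags          # direct code page number
        if numbers[0] == 8859 and len(numbers) > 1:
            part = numbers[1]
            if 1 <= part <= 16 and part != 12:
                return 28590 + part, flags    # ISO-8859-N maps to 28590+N
    return None
```

For instance, `parse_enc("Utf-16-LE-BOM")` gives identifier 1200 with the flags `utf`, `le` and `bom`.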
DosParseEnc Procedure EncPtr, EncSize
Enc$    LocalVar Size=32 ; Input string converted to lower case.
Enc$End LocalVar         ; Pointer to the end of string in Enc$.
        ClearLocalVar
     PUSH ES
      MOV SI,[%EncPtr]
      MOV CX,[%EncSize]
      XOR BX,BX
      MOV [%ReturnAX],BX
      MOV [%ReturnBX],BX
      CMP CX,24
      JNB .Err:   ; Argument is too long.
      CMP CL,2
      JB .Err:    ; Argument is too short.
      CMPB [ES:SI],'"'
      JNE .NoQ:
      INC SI      ; Argument is in quotes.
      DEC CX,CX   ; Omit the quotes.
.NoQ: TEST CX
      JZ .Err:    ; Argument is empty.
      MOV DX,CX
      LEA DI,[%Enc$]
.LoCa:LODSB [ES:SI]
      OR AL,0x20  ; Simplified conversion to lower case.
      MOV [SS:DI],AL
      INC DI
      DEC CX
      JNZ .LoCa:
      LEA SI,[%Enc$]
      ADD DX,SI
      MOV [%Enc$End],DX
      ; Parse all encoding properties from text SS:SI..SS:DX into flags in BX.
 prop %FOR ascii,utf,bom,le,be,htm,html,qm,ign,transl,oem,ansi,auto,enc
        DosGrep %prop,SI,DX
        %Prop1 %SETC '%prop[1]' & ~('A'^'a') ; Uppercase the 1st letter of %prop.
        JNE .N%prop:
        SetSt BX,encSt%Prop1%prop[2..]
        .N%prop:
      %ENDFOR prop
      JNSt BX,encStHtml, .Shtm:
      RstSt BX,encStHtm
.Shtm:JNSt BX,encStAscii, .Nas:
      MOV AX,20127
      JMP .End:
.Nas: JNSt BX,encStAuto|encStEnc, .Nau:
      XOR AX,AX
      JMP .End:
.Nau: JNSt BX,encStOem|encStAnsi, .Noe:
      PUSH BX,DX
        DosAPI AX=6601h ; GET GLOBAL CODE PAGE TABLE.
        MOV AX,437 ; If DosAPI failed, use default IBM437.
        JC .Noa:
        MOV AX,BX  ; AX is now the number of active code page.
.Noa: POP DX,BX
      SetSt BX,encStAuto
      JMP .End:
.Noe: JNSt BX,encStUtf,.Nut:
      DosGrepNum SI,DX  ; Distinguish UTF-8/16/32.
      JC .Err:
      Dispatch AL,8,16,32
.Err: STC
      JMP .Ret:
.8:   SetSt BX,encStUtf8
      MOV AX,65001
      JMP .End:
.16:  SetSt BX,encStUtf16
      MOV AX,1200
      JMP .En:
.32:  SetSt BX,encStUtf32
      MOV AX,12000
.En:  JSt BX,encStLe, .End:
      JNSt BX,encStBe, .End: ; Endianness will be detected later.
      INC AX
      JMP .End:
.Nut: ; Try direct CPid numeric specification.
      DosGrepNum SI,DX
      JC .Nnum: ; No number in encoding specifications.
      MOV DI,[TableDir.CPid]
      MOV CX,[TableDir.CodePages] ; Number of supported code pages.
      PUSH DS ; Try to find the number AX in array [CPid].
      POP ES
      REPNE SCASW
      JE .End:
      ; String Enc$ is not direct CPid. It could contain some alternative CP number:
      Dispatch AX,8859,790,916,919,920,921,923,991,1208
      JMP .Nnum:
.1208:MOV AX,65001  ; IBM1208 is alias of UTF-8 = CP65001.
      SetSt BX,encStUtf+encStUtf8
      JMP .End:
.790:
.991: MOV AX,667    ; IBM790,IBM991 is Mazovia=CP667.
      JMP .End:
.916: MOV AX,28598  ; ISO-8859-8 alias IBM916, Latin/Hebrew.
      JMP .End:
.919: MOV AX,28600  ; ISO-8859-10 alias IBM919, Latin 6, Nordic.
      JMP .End:
.920: MOV AX,28599  ; ISO-8859-9 alias IBM920, Latin 5, Turkish.
      JMP .End:
.921: MOV AX,28603  ; ISO-8859-13 alias IBM921, Latin 7, Baltic.
      JMP .End:
.923: MOV AX,28605  ; ISO-8859-15 alias IBM923, Latin 9, Western Europe.
      JMP .End:
.8859:DosGrepNum SI,DX ; Get the number following ISO-8859-.
      JC .Err:
      TEST AX          ; ISO-8859-0  is not supported.
      JZ .Err:
      CMP AX,12        ; ISO-8859-12 is not supported.
      JE .Err:
      CMP AX,16        ; 8859-1 .. 8859-16 is supported.
      JA .Err:
      ADD AX,28590
      JMP .End:
.Nnum: ; No numeric CPid in Enc$ was found (or 8 in KOI8). Try letter-only strings.
      LEA SI,[%Enc$]
      MOV AX,10101
      DosGrep nextstep,SI,DX
      JE .End:
      MOV AX,667
      DosGrep mazo,SI,DX  ; Try Mazovia.
      JE .End:
      MOV AX,895
      DosGrep kame,SI,DX  ; Try Kamenických alias KEYBCS2.
      JE .End:
      DosGrep bcs2,SI,DX
      JE .End:
      DosGrep koi8,SI,DX  ; Try KOI8.
      JNE .MAC:
      MOV SI,DI           ; Points behind KOI8.
      MOV AX,885          ; Try KOI8-CS.
      DosGrep cs,SI,DX
      JE .End:
      MOV AX,878          ; Try KOI8-R.
      DosGrep r,SI,DX
      JE .End:
      MOV AX,1168         ; Try KOI8-U.
      DosGrep u,SI,DX
      JE .End:
      MOV AX,880          ; Try KOI8-E.
      DosGrep e,SI,DX
      JE .End:
      MOV AX,882          ; Try KOI8-T.
      DosGrep t,SI,DX
      JE .End:
      MOV AX,884          ; Try KOI8-F.
      DosGrep f,SI,DX
      JE .End:
.MAC: LEA SI,[%Enc$]
      DosGrep mac,SI,DX   ; Try Macintosh.
      JNE .Err:           ; If no other choices are left, give up.
      MOV AX,10010
      DosGrep romanian,SI,DX
      JE .End:
      DosGrep rumun,SI,DX
      JE .End:
      MOV AX,10000
      DosGrep roman,SI,DX
      JE .End:
      MOV AX,10004
      DosGrep arab,SI,DX
      JE .End:
      MOV AX,10005
      DosGrep hebre,SI,DX
      JE .End:
      MOV AX,10006
      DosGrep greek,SI,DX
      JE .End:
      MOV AX,10007
      DosGrep cyril,SI,DX
      JE .End:
      MOV AX,10017
      DosGrep ukr,SI,DX
      JE .End:
      MOV AX,10021
      DosGrep thai,SI,DX
      JE .End:
      MOV AX,10079
      DosGrep iceland,SI,DX
      JE .End:
      MOV AX,10029
      DosGrep ce,SI,DX
      JE .End:
      MOV AX,10080
      DosGrep inuit,SI,DX
      JE .End:
      MOV AX,10081
      DosGrep turk,SI,DX
      JE .End:
      MOV AX,10082
      DosGrep croat,SI,DX
      JE .End:
      MOV AX,10083
      DosGrep gael,SI,DX
      JE .End:
      MOV AX,10084
      DosGrep celtic,SI,DX
      JE .End:
      MOV AX,10089
      DosGrep latin,SI,DX
      JE .End:
      DosGrep kermit,SI,DX
      JE .End:
      STC
      JMP .Ret:
.End: MOV [%ReturnAX],AX
      MOV [%ReturnBX],BX
.Ret:POP ES
    EndProcedure DosParseEnc   ; CF=error
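The numeric aliases resolved by DosParseEnc above can be summarized in a short sketch. It is written in Python rather than EuroAssembler only to keep it compact and checkable; the names IBM_ALIASES and iso8859_to_cpid are hypothetical and do not exist in the source, while the numbers are copied from the listing.

```python
# Illustrative sketch only; names are hypothetical, values come from DosParseEnc.
IBM_ALIASES = {
    1208: 65001,  # IBM1208 is an alias of UTF-8
    790: 667,     # Mazovia
    991: 667,     # Mazovia
    916: 28598,   # ISO-8859-8
    919: 28600,   # ISO-8859-10
    920: 28599,   # ISO-8859-9
    921: 28603,   # ISO-8859-13
    923: 28605,   # ISO-8859-15
}

def iso8859_to_cpid(part):
    """Return the code page id for ISO-8859-<part>, or None if unsupported."""
    if part < 1 or part == 12 or part > 16:
        return None          # the listing jumps to .Err in these cases
    return 28590 + part      # same addition as ADD AX,28590
```

For example, iso8859_to_cpid(2) yields 28592, and the IBM aliases 916..923 land on the same identifiers that the ISO-8859 branch computes.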
↑ DosEncList
Display the list of all supported encodings on console.
Input
-
Output
AX,BX,CX,DX,SI,DI,ES changed.
Called by
DosMain
DosEncList PROC
    PUSH DS
    POP ES
    StdOutput ='  EuroConv supported encodings',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  OEM/ANSI 8bit code pages:',Eol=Yes
    SUB DX,DX ; Encoding index 0..[CodePages].
.10:MOV SI,[TableDir.CPid]
    ADD SI,DX
    ADD SI,DX
    LODSW
    Dispatch AX,20127,65001,1200,1201,12000,12001
    StoD TempString
    XOR AX,AX
    STOSW
    MOV BX,[TableDir.CPname]
    ADD BX,DX
    ADD BX,DX
    MOV BX,[BX]
    ADD BX,[TableDir.CPinfo]
    MOV SI,[TableDir.CPrem]
    ADD SI,DX
    ADD SI,DX
    MOV SI,[SI]
    ADD SI,[TableDir.CPinfo]
    StdOutput ='CP',TempString,=', "',BX,='" ',SI,Eol=Yes
.20127:
.65001:
.1200:
.1201:
.12000:
.12001:
    INC DX
    CMP DX,[TableDir.CodePages]
    JB .10:
    StdOutput Eol=Yes
    StdOutput ='  Plain ASCII 7bit encoding:',Eol=Yes
    StdOutput ='CP20127, "ASCII"',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  Unicode encoding:',Eol=Yes
    StdOutput ='CP1200,  "UTF-16LE"',Eol=Yes
    StdOutput ='CP1201,  "UTF-16BE"',Eol=Yes
    StdOutput ='         "UTF-16" (endianness will be autodetected)',Eol=Yes
    StdOutput ='CP12000, "UTF-32LE"',Eol=Yes
    StdOutput ='CP12001, "UTF-32BE"',Eol=Yes
    StdOutput ='         "UTF-32" (endianness will be autodetected)',Eol=Yes
    StdOutput ='CP65001, "UTF-8"',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  Special assignment:',Eol=Yes
    DosAPI AX=6601h ; GET GLOBAL CODE PAGE TABLE.
    MOV AX,437 ; If DosAPI failed, use default IBM437.
    JC .Noa:
    MOV AX,BX
.Noa:StoD TempString
    XOR AX,AX
    STOSB
    StdOutput ='OEM    = console encoding selected by regional settings: CP',TempString,Eol=Yes
    StdOutput ='AUTO   = autodetect encoding',Eol=Yes
    StdOutput ='ENC    = display this list of supported encodings.',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  Encoding modifiers:',Eol=Yes
    StdOutput ='/BOM   = write Byte Order Mark (valid only with UTF encodings)',Eol=Yes
    StdOutput ='/IGN   = omit characters not supported in output encoding',Eol=Yes
    StdOutput ='/QM    = replace characters not supported in output encoding with "?"',Eol=Yes
    StdOutput ='/HTML  = replace characters not supported in output encoding with HTML-entity',Eol=Yes
    StdOutput ='/TRANS = transliterate characters not supported in output encoding (default)',Eol=Yes
    RET
   ENDP DosEncList
↑ DosInfoText Direction, FileName, FileSizeLo, FileSizeHi, EncId, EncSt, ErrorsLo, ErrorsHi
Display the final information on console.
Input
Direction is ASCIIZ string "In" or "Out" in segment DS.
FileName is offset of ASCIIZ file name in segment DS.
FileSizeLo,FileSizeHi is lower and higher WORD with the file size.
EncId is the text encoding identifier CPid (0..65535).
EncSt are the encoding flags (encStBom, encStIgn, encStQm etc.) which are displayed as modifiers.
ErrorsLo,ErrorsHi is lower and higher WORD with the number of input or output errors.
Output
ES=DS, other GPR are preserved.
Invoked by
DosMain
DosInfoText Procedure Direction, FileName, FileSizeLo, FileSizeHi, EncId, EncSt, ErrorsLo, ErrorsHi
    PUSH DS
    POP ES
    MOV DX,[%FileName]
    StdOutput [%Direction],='put file:     "',DX,='"',Eol=Yes
    MOV AX,[%FileSizeLo]
    MOV DX,[%FileSizeHi]
    StoDD TempString
    XOR AX,AX
    STOSB
    StdOutput [%Direction],='put size:      ',TempString, Eol=Yes
    MOV AX,[%EncId]
    StoD TempString, Signed=No
    MOVB [DI],0
    StdOutput [%Direction],='put encoding:  CP',TempString,=', "'
    MOV CX,[TableDir.CodePages]
    MOV DI,[TableDir.CPid]
    REPNE SCASW
    SUB DI,2
    SUB DI,[TableDir.CPid]
    MOV DX,DI ; Temporary store CP word index to DX.
    ADD DI,[TableDir.CPname]
    MOV DI,[DI]
    ADD DI,[TableDir.CPinfo]
    StdOutput DI
    MOV BX,[%EncSt]
    JNSt BX,encStBom,.20:
    StdOutput ="/BOM"
.20:JNSt BX,encStIgn,.30:
    StdOutput ='/IGN'
.30:JNSt BX,encStQm, .40:
    StdOutput ='/QM'
.40:JNSt BX,encStHtm,.50:
    StdOutput ="/HTM"
.50:JNSt BX,encStHtml,.60:
    StdOutput ="/HTML"
.60:MOV DI,DX ; Restore CP word index.
    ADD DI,[TableDir.CPrem]
    MOV DI,[DI]
    ADD DI,[TableDir.CPinfo]
    StdOutput ='", ',DI
    JNSt BX,encStOem,.80:
    StdOutput ='", ',="(OEM),"
.80:JNSt BX,encStAuto,.90:
    StdOutput =' (autodetected)'
.90:StdOutput Eol=Yes
    MOV AX,[%ErrorsLo]
    MOV DX,[%ErrorsHi]
    StoDD TempString
    XOR AX,AX
    STOSB
    StdOutput [%Direction],='put errors:    ',TempString, Eol=Yes
   EndProcedure DosInfoText
   ENDPROGRAM eurocond  ; End of 16bit DOS version of EuroConvertor (used as a stub of Windows version).
WinHeader

WinHeader specifies the format of the Windows program, includes the macrolibraries and declares the segment order.

At runtime the Windows version reuses the Tables defined in the DOS variant. They are located within the stub, in front of the first segment [.text], at the beginning of the image at the virtual address specified as ImageBase (usually 0x0040_0000).

The Windows variant uses macros with the same names as the DOS variant. As both DOS and Windows programs are defined in one source file, the 16bit macros are dropped here.

If you don't have the import library objlib\winapi.lib, create it first with the sample project DLL2LIB: change to directory ..\prowin32\ and execute euroasm dll2lib.htm.
euroconv PROGRAM Format=PE, Width=32, Entry=WinMain, \
                 IconFile=euroconv.ico, StubFile="eurocond.exe"
         %DROPMACRO * ; Discard homonymous macros declared in DOS macrolibraries included to stub program.
         INCLUDEHEAD1 wins.htm, winscon.htm, winsgui.htm, winapi.htm, winsdlg.htm, \ Include 32bit macrolibraries.
                      winfile.htm, stdcall.htm, cpuext32.htm, status32.htm, string32.htm
         INCLUDEHEAD euroconv.htm ; Include common constants defined above in DosHeader.
         INCLUDEHEAD ..\easource\pfmz.htm ; Include declaration of MZ EXE Header structure.
         LINK winapi.lib  ; Import library which declares Windows functions invoked with macro WinAPI.
[.text]  SEGMENT ALIGN=64 ; Specify order of segments in Windows version.
[.data]  SEGMENT
[.bss]   SEGMENT
WinConData
contains static data used in Windows console variant of EuroConvertor.
See also
constants and flag names defined in DosHeader's HEAD division.
WinBlockSize  EQU  1M    ; Used to limit autodetection text size in Windows version.
; Static initialized global data.
[.data]
; Byte Order Mark definitions.
BOM_UTF32LE DB 0xFF,0xFE,0x00,0x00
BOM_UTF16LE EQU BOM_UTF32LE
BOM_UTF32BE DB 0x00,0x00,0xFE,0xFF
BOM_UTF16BE EQU BOM_UTF32BE + 2
BOM_UTF8    DB 0xEF,0xBB,0xBF,0
WndClassName D "EuroConv",0
InfoText     D "EuroConv - convertor of text file encoding.",0
HelpText:
D "Program:     EuroConvertor WIN version %Version",13,10
D "Function:    Conversion of text file encoding.",13,10
D "Format:      Dual DOS/Windows console/GUI application.",13,10
D "Licence:     freeware by vitsoft",13,10
D "Arguments:   InpEncoding OutEncoding InpFileName OutFileName",13,10
D "Example:     euroconv ISO8859-2 utf16/LE/BOM input.txt output.txt",13,10
D "Encodings:   euroconv enc | more",13,10
D "Interactive: euroconv",13,10
D "Manual:      https://vitsoft.info/econv_en.htm",13,10,0
; Static uninitialized data.
[.bss]
; 32bit copy of pointers from TableDir relocated from stub by WinMain.
CodePoint    D DWORD ; Offset of WORD  array in section [CodePoint].
Relevance    D DWORD ; Offset of BYTE  array in section [Relevance].
Translit     D DWORD ; Offset of DWORD array in section [Translit].
CodePoints   D DWORD ; The number of supported codepoints, i.e. the length of previous arrays.
EntVal4      D DWORD ; Offset of WORD  array in section [EntVal4].
EntName4     D DWORD ; Offset of DWORD array in section [EntName4].
Entities4    D DWORD ; The number of supported HTML entities with 1..4 characters.
EntVal8      D DWORD ; Offset of WORD  array in section [EntVal8].
EntName8A    D DWORD ; Offset of DWORD array in section [EntName8A].
EntName8B    D DWORD ; Offset of DWORD array in section [EntName8B].
Entities8    D DWORD ; The number of supported HTML entities with 5..8 characters.
CPid         D DWORD ; Offset of WORD  array in section [CPid].
CPname       D DWORD ; Offset of WORD  array in section [CPname].
CPrem        D DWORD ; Offset of WORD  array in section [CPrem].
CPurl        D DWORD ; Offset of WORD  array in section [CPurl].
CPtable      D DWORD ; Offset of WORD  array in section [CPtable].
CPinfo       D DWORD ; Offset of ASCIIZ strings in section [CPinfo].
CPtt         D DWORD ; Offset of 256*BYTE blocks in section [CPtt].
CodePages    D DWORD ; The number of supported encodings, i.e. the length of CP* arrays.
; Other static data of console variant.
Errorlevel   D DWORD ; 0=normal end, 2=invalid characters, 4=I/O error, 8=wrong syntax.
RunFromGUI   D DWORD ; Nonzero when the CON version was forked from GUI version.
InpEncId     D DWORD ; Input encoding identifiers (437..65001).
OutEncId     D DWORD ; Output encoding identifiers (437..65001).
OemEncId     D DWORD ; Output OEM encoding of current user.
AnsiEncId    D DWORD ; Output ANSI encoding of current user.
InpEncSt     D DWORD ; Input  encoding flags, see above.
OutEncSt     D DWORD ; Output encoding flags, see above.
InpErrors    D DWORD ; Number of input characters which are not defined in input encoding.
OutErrors    D DWORD ; Number of characters which are not defined in output encoding.
InpBegin     D DWORD ; Pointer to the first byte of input text mapped in memory.
InpEnd       D DWORD ; Pointer behind the last byte of input text mapped in memory.
DetectSize   D DWORD ; Input size used for autodetection. Max 1M.
SumRelevance D DWORD ; Total sum of relevances of all characters in the input text.
TrTable      D DWORD ; Offset of selected array of 128 WORDs with code points of OEM/ANSI encodings.
EntitySkip   D DWORD ; Number of bytes skipped when HTML entity is decoded on input (0..8).
CharSize     D DWORD ; Character width in bytes (1..4) used to increase relevance during autodetection.
InpFile      DS FILE ; FILE structure for access encapsulated by macros
OutFile      DS FILE ;    from library winfile.
TempString   D 128*BYTE ; Working room for string manipulation.
Cmd$         D 24 + 2 * MAX_PATH_SIZE * BYTE ; Room for the cmdline constructed dynamically by WinGui.
[.text]
↑ WinMain

Windows program entry point.

The DOS stub is already mapped in the virtual address space of the Windows version. Offsets of 16bit data sections are recalculated from TableDir to 32bit pointers and used in this Windows program.

Input
-
Invoked by
MS Windows loader.
Invokes
WinConvert, WinGui, WinInfoText, WinParseEnc.
Calls
WinAutodetect, WinEncList.
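The way WinMain locates TableDir in the mapped stub can be sketched as follows. This is a minimal Python illustration, not the author's code: the function names are hypothetical, and the 8-byte signature is passed in as a parameter because its actual value (%Signature) is not shown in this section. The only fixed fact used is that e_cparhdr is the WORD at offset 8 of the MZ header and counts 16-byte paragraphs.

```python
import struct

E_CPARHDR_OFFSET = 8  # WORD e_cparhdr: size of MZ header + relocations in paragraphs

def find_table_dir(image, signature):
    """Return the offset of TableDir in the mapped image, or raise LookupError."""
    (paragraphs,) = struct.unpack_from("<H", image, E_CPARHDR_OFFSET)
    offset = paragraphs * 16                      # same as SAL EDX,4
    if image[offset:offset + len(signature)] != signature:
        raise LookupError("TableDir not found")   # the listing jumps to .BadLink
    return offset

def relocate(image, table_dir, member_offset):
    """Turn a 16bit offset stored in TableDir into an offset within the image."""
    (value,) = struct.unpack_from("<H", image, table_dir + member_offset)
    return table_dir + value                      # same as ADD EAX,EBX
```

Scalar members such as CodePages are copied without the final addition; only the pointer members are relocated.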
[.text]
WinMain Procedure hWnd, uMsg, wParam, lParam
    Clear SEGMENT# [.bss], Size=SIZE# [.bss]
    MOV ESI,%^ImageBase  ; DOS stub is mapped at this VA.
    MOVZXW EDX,[ESI+PFMZ_DOS_HEADER.e_cparhdr] ; Paragraph size of MZ DOS header + relocations.
    SAL EDX,4 ; Convert paragraph size to bytes.
    LEA EBX,[ESI+EDX]
    CMPD [EBX+0],"%Signature[1..4]"  ; Check if TableDir is present in stub.
    JNE .BadLink:
    CMPD [EBX+4],"%Signature[5..8]"
    JNE .BadLink:
    ; Copy/Relocate table directory from stub to 32bit [.bss] segment.
member %FOR CodePoints,Entities4,Entities8,CodePages  ; Copy scalar values.
         MOVZXW EAX,[EBX+TABLEDIR.%member]
         MOV [%member],EAX
       %ENDFOR member
member %FOR CodePoint,Relevance,Translit,EntVal4,EntName4,\ Relocate pointers.
            EntVal8,EntName8A,EntName8B,CPid,CPname,CPrem,\
            CPurl,CPtable,CPinfo,CPtt
         MOVZXW EAX,[EBX+TABLEDIR.%member]
         ADD EAX,EBX
         MOV [%member],EAX
       %ENDFOR member
    WinAPI GetOEMCP
    MOV [OemEncId],EAX
    WinAPI GetACP
    MOV [AnsiEncId],EAX
    GetArg 5 ; Undocumented 5th argument was used when this instance was forked from GUI.
    JC .10:  ; Its value is arbitrary. Its existence will set flag RunFromGUI.
    ORB [RunFromGUI],1
.10:GetArg 1                       ; Input encoding.
    StripQuotes ESI,ECX
    TEST ECX
    JNZ WinCON:
    Invoke WinGui,[%hWnd],[%uMsg],[%wParam],[%lParam] ; When run without parameters, switch to windowed subsystem.
    JMP .Abort:
WinCON: ; If some command-line parameter was specified, continue with console subsystem.
    CMPB [ESI],'-'
    JE .Help:
    CMPB [ESI],'/'
    JE .Help:
    Invoke WinParseEnc, ESI,ECX
    JC .UnknownEncoding:
    MOV [InpEncId],EAX
    MOV [InpEncSt],EBX
    JNSt EBX,encStEnc,.11:
    CALL WinEncList:                  ; Display list of supported encodings.
    JMP .Abort:
.11:GetArg 2                       ; Output encoding.
    StripQuotes ESI,ECX
    Invoke WinParseEnc, ESI,ECX
    JNC .14:
.UnknownEncoding:
    ORB [Errorlevel],8
    JECXZ .Help:
    StdOutput ESI,Size=ECX
    StdOutput =' is not a supported encoding.',Eol=Yes
    JMP .Abort:
.BadLink:
    StdOutput ='Internal error, TableDir not found.',Eol=Yes
    JMP .Abort:
.14:MOV [OutEncId],EAX
    MOV [OutEncSt],EBX
    GetArg 3                       ; Input file name.
    JNC .19:
.Help:StdOutput HelpText
    ORB [Errorlevel],8
    JMP .Abort:
.19:StripQuotes ESI,ECX
    JECXZ .Help:
    FileAssign  InpFile,ESI,Size=ECX
    FileMapOpen InpFile
    JNC .23:
    StdOutput ='Error reading input file "', InpFile.Name, ='"', Eol=Yes
    ORB [Errorlevel],4
    JMP .Abort:
.23:MOV [DetectSize],EAX
    CMP EAX,WinBlockSize
    JBE .27:
    MOV [DetectSize],WinBlockSize
.27:ADD EAX,ESI
    MOV [InpBegin],ESI
    MOV [InpEnd],EAX
    GetArg 4                        ; Output file name.
    JC .Help:
    StripQuotes ESI,ECX
    JECXZ .Help:
    FileAssign OutFile,ESI,Size=ECX
    FileStreamCreate OutFile, BufSize=64K
    JNC .31:
.OutputError:
    StdOutput ='Error writing output file "', OutFile.Name, ='"', Eol=Yes
    ORB [Errorlevel],4
    JMP .Abort:
.31:; Cmdline parameters are valid and accepted. Fix input encoding.
    MOV ESI,[InpBegin]
    MOV EDX,[DetectSize]
    ADD EDX,ESI
    ; Skip the fix if input endianness is irrelevant or explicitly specified.
    JSt [InpEncSt],encStLe|encStBe|encStAscii|encStUtf8|encStAuto, .69:
    JNSt [InpEncSt],encStUtf16, .49:
    ; Autodetect input UTF-16 endianness of text ESI..EDX.
    MOV EAX,EDX
    SUB EAX,ESI
    CMP EAX,2
    JB .42:    ; Skip if text is too short.
    MOVZXW EAX,[ESI]
    CMP AX,[BOM_UTF16LE]
    JNE .38:
    SetSt [InpEncSt],encStBom
.34:SetSt [InpEncSt],encStLe
    MOV EAX,1200
    JMP .73:
.38:CMP AX,[BOM_UTF16BE]
    JNE .46:
    SetSt [InpEncSt],encStBom
.42:SetSt [InpEncSt],encStBe
    MOV EAX,1201
    JMP .73:
.46:; No BOM is present in input, perform empiric autodetection.
    Invoke WinConvert, 1200,ESI,EDX,Void    ; Try UTF-16LE.
    MOV EBX,EAX ; Number of input errors if UTF-16LE.
    Invoke WinConvert, 1201,ESI,EDX,Void    ; Try UTF-16BE.
    CMP EAX,EBX
    JBE .42:    ; UTF16BE detected.
    JMP .34:    ; UTF16LE detected.
.49:JNSt [InpEncSt],encStUtf32, .69:
    ; Autodetect input UTF-32 endianness of text ESI..EDX.
    MOV EAX,EDX
    SUB EAX,ESI
    CMP EAX,4
    JB .61:
    MOV EAX,[ESI]
    CMP EAX,[BOM_UTF32LE]
    JNE .57:
    SetSt [InpEncSt],encStBom
.53:SetSt [InpEncSt],encStLe
    MOV EAX,12000
    JMP .73:
.57:CMP EAX,[BOM_UTF32BE]
    JNE .65:
    SetSt [InpEncSt],encStBom
.61:SetSt [InpEncSt],encStBe
    MOV EAX,12001
    JMP .73:
.65:; No BOM is present in input, perform empiric autodetection.
    Invoke WinConvert, 12000,ESI,EDX,Void     ; Try UTF-32LE.
    MOV EBX,EAX ; Number of input errors if UTF-32LE.
    Invoke WinConvert, 12001,ESI,EDX,Void    ; Try UTF-32BE.
    CMP EAX,EBX
    JBE .61:  ; UTF32BE detected.
    JMP .53:  ; UTF32LE detected.
.69:JNSt [InpEncSt],encStAuto, .73:
    ; Autodetect input encoding of text ESI..EDX.
    PUSH ESI
      CALL WinAutodetect:
    POP ESI
.73:MOV EDX,[InpEnd]
    ; [InpEncId] is now finally specified. ESI..EDX is input text. It may start with BOM.
    ; Select the output encoding procedure.
    MOV EAX,[OutEncId]
    JNSt [OutEncSt],encStAuto|encStOem,.77:
    MOV EAX,[OemEncId]
    MOV [OutEncId],EAX
    SetSt [OutEncSt],encStAuto|encStOem
.77:Dispatch AX,65001,1200,1201,12000,12001,20127
    ; Undispatched output encoding is 8bit OEM/ANSI.
    MOV ECX,[CodePages]
    MOV EDI,[CPid]
    REPNE SCASW
    JNE .20127:  ; Unsupported output encoding - use ASCII.
    SUB EDI,2
    SUB EDI,[CPid]
    ADD EDI,[CPtable]
    MOVZXW EAX,[EDI]
    ADD EAX,[CPtt]
    MOV [TrTable],EAX
    MOV EDI, To8bit:
    JMP .88:
.65001:MOV EDI, ToUTF8:
    MOV ECX,EDX
    SUB ECX,ESI
    CMP ECX,4
    JBE .88:
    LODSD
    DEC ESI
    AND EAX,0x00FF_FFFF
    CMP EAX,[BOM_UTF8]
    JE .88:
    SUB ESI,3   ; No input UTF8-BOM is present.
    JMP .88:
.1200:
.1201:MOV EDI, ToUTF16:
    MOV ECX,EDX
    SUB ECX,ESI
    CMP ECX,2
    JB .88:
    TEST AL,1 ; Difference between BE and LE.
    LODSW
    JZ .81:
    CMP AX,[BOM_UTF16BE]
    JE .88:
.81:CMP AX,[BOM_UTF16LE]
    JE .88:
    SUB ESI,2 ; No input BOM is present.
    JMP .88:
.12000:
.12001: MOV EDI, ToUTF32:
    MOV ECX,EDX
    SUB ECX,ESI
    CMP ECX,4
    JB .88:
    TEST AL,1 ; Difference between BE and LE.
    LODSD
    JZ .84:
    CMP EAX,[BOM_UTF32BE]
    JE .88:
.84:CMP EAX,[BOM_UTF32LE]
    JE .88:
    SUB ESI,4 ; No input BOM is present.
    JMP .88:
.20127: MOV EDI, ToASCII:
.88: ; ESI..EDX is now input text with input BOM removed.
    JNSt [OutEncSt],encStUtf,.95: ; No BOM in non-Unicode encodings.
    JNSt [OutEncSt],encStBom,.95: ; No output BOM if not explicitly requested.
    ; Output BOM was requested. Write BOM before invocation of WinConvert.
    MOV EAX,[OutEncId]
    Dispatch AX,1200d,1201d,12000d,12001d
    FileStreamWrite OutFile,BOM_UTF8,3
    JMP .90:
.12001d:FileStreamWrite OutFile,BOM_UTF32BE,4
    JMP .90:
.12000d:FileStreamWrite OutFile,BOM_UTF32LE,4
    JMP .90:
.1201d: FileStreamWrite OutFile,BOM_UTF16BE,2
    JMP .90:
.1200d:FileStreamWrite OutFile,BOM_UTF16LE,2
.90:JC .OutputError:
.95:; ESI..EDX is input text without BOM. Output encoding callback procedure is now in EDI.
    Invoke WinConvert,[InpEncId],ESI,EDX,EDI ; The final conversion.
    MOV [InpErrors],EAX
    Invoke WinInfoText, =' In', InpFile,[InpEncId],[InpEncSt],[InpErrors]
    Invoke WinInfoText, ='Out', OutFile,[OutEncId],[OutEncSt],[OutErrors]
    MOV EAX,[InpErrors]
    OR  EAX,[OutErrors]
    JZ .Abort:
    ORB [Errorlevel],2
.Abort:
    FileClose OutFile, InpFile
    TESTB [RunFromGUI],1 ; Test if CON was forked from GUI version.
    JZ .NoGUI:
    StdOutput Eol=Yes ; Otherwise give the user some time to read InfoText.
    StdOutput ="Press any key to quit.",Eol=Yes
     WinAPI GetStdHandle,STD_INPUT_HANDLE
     MOV EBX,EAX
     PUSH ESI
     MOV ESI,ESP
       WinAPI GetConsoleMode,EBX,ESI
     POP ESI
     WinAPI SetConsoleMode,EBX,0 ; Switch the console to raw mode.
     WinAPI ReadConsole,EBX,TempString+4,1,TempString,0
     WinAPI SetConsoleMode,EBX,ESI ; Restore original console mode from ESI.
.NoGUI:
    TerminateProgram [Errorlevel]
  EndProcedure WinMain
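The UTF-16 endianness decision made in WinMain (labels .34 to .46) can be sketched as below. This is a simplified Python illustration under stated assumptions: detect_utf16 and count_utf16_errors are hypothetical names, and the error counter is only a stand-in for Invoke WinConvert with the Void callback; it counts unpaired surrogates and an odd trailing byte, which is less detailed than the real routine.

```python
import codecs

def count_utf16_errors(data, big_endian):
    """Simplified stand-in for WinConvert with the Void callback."""
    order = "big" if big_endian else "little"
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data) - 1, 2)]
    errors = len(data) & 1                      # odd trailing byte
    i = 0
    while i < len(units):
        pair = (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if pair:
            i += 2                              # valid surrogate pair
            continue
        if 0xD800 <= units[i] <= 0xDFFF:
            errors += 1                         # lone or misordered surrogate
        i += 1
    return errors

def detect_utf16(data):
    """Return (cpid, bom_found); 1200 is UTF-16LE, 1201 is UTF-16BE."""
    if len(data) < 2:
        return 1201, False                      # too short: the listing falls to BE
    if data[:2] == codecs.BOM_UTF16_LE:
        return 1200, True
    if data[:2] == codecs.BOM_UTF16_BE:
        return 1201, True
    le_errors = count_utf16_errors(data, False)
    be_errors = count_utf16_errors(data, True)
    # JBE .42: a tie is resolved in favour of big endian.
    return (1201, False) if be_errors <= le_errors else (1200, False)
```

Note that plain ASCII text stored as UTF-16 produces no malformed units in either byte order, so without a BOM the tie rule selects UTF-16BE.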
↑ WinConvert InpEncId, TextPtr, TextEnd, WinOutputProc
Convert each character of the input text from the input encoding to its Unicode codepoint, and then call WinOutputProc to handle the codepoint (compute its relevance or convert it to the output encoding).
Input
InpEncId is numeric encoding identifier 0..65535.
TextPtr is pointer to the beginning of text loaded in memory.
TextEnd is pointer behind the end of text loaded in memory.
WinOutputProc is address of callback procedure (subprogram defined in division WinOutputProc which handles each codepoint).
WinOutputProc Input
EAX is UniCodePoint character. It may also be a replacement character 0x0000_FFFD when input error occurred.
WinOutputProc Output
CF=0 if the character was stored successfully.
EBX,ECX,EDX,ESI must be preserved.
WinOutputProc Error
CF=1 on output error. Further conversion will be cancelled.
Output
EAX= number of input errors (malformed or undefined characters).
Invoked by
WinMain
Calls
WinHtmlDecode, WinOutputProc.
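The contract described above (decode, count input errors, hand every codepoint to a callback which may cancel the conversion) can be illustrated on the UTF-16 branch. The following Python sketch is an assumption-laden illustration, not the author's implementation: convert_utf16 and its parameters are hypothetical names, the callback returns True to continue where the assembler version returns CF=0, and HTML entity decoding is left out.

```python
REPLACEMENT = 0xFFFD

def convert_utf16(data, big_endian, output):
    """Decode UTF-16 bytes, call output(codepoint) for each; return error count."""
    order = "big" if big_endian else "little"
    errors = 0
    if len(data) & 1:               # size is not WORD aligned: truncate, one error
        data = data[:-1]
        errors += 1
    pos = 0
    while pos < len(data):
        unit = int.from_bytes(data[pos:pos + 2], order)
        pos += 2
        cp = unit
        if 0xD800 <= unit <= 0xDFFF:
            cp = REPLACEMENT
            high = unit - 0xD800
            if high < 0x400 and pos < len(data):
                low = int.from_bytes(data[pos:pos + 2], order) - 0xDC00
                pos += 2            # the second unit is consumed even when invalid
                if 0 <= low < 0x400:
                    cp = 0x10000 + (high << 10) + low
            if cp == REPLACEMENT:
                errors += 1
        if not output(cp):          # callback asked to cancel the conversion
            break
    return errors
```

Passing a callback that only returns True corresponds to the Void procedure: nothing is written and only the returned error count is of interest.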
WinConvert Procedure InpEncId,TextPtr,TextEnd,WinOutputProc
    MOV EAX,[%InpEncId]
    MOV ESI,[%TextPtr]
    MOV EDX,[%TextEnd]
    SUB ECX,ECX
    MOV [%ReturnEAX],ECX  ; Initialize input error counter.
    INC ECX
    MOV [CharSize],ECX    ; Initialize CharSize=1. It will be increased if input encoding is Unicode.
    AND ECX,EAX ; Let ECX=1 for odd InpEncId, ECX=0 for even InpEncId (endianness in UTF-16 and UTF-32).
    Dispatch AX,65001,20127,1200,1201,12000,12001 ; Special encodings UTF or ASCII.
    ; Undispatched encoding is 8bit, let's select the translation table.
    ; Convert from OEM or ANSI 8bit encoding.
    MOV ECX,[CodePages]
    MOV EDI,[CPid]
    REPNE SCASW
    JNE .05:
    SUB EDI,[CPid] ; EDI is now offset of used CP in array of WORDs.
    ADD EDI,[CPtable]
    MOVZXW EBX,[EDI-2] ; EBX is now the corresponding offset of translation table in array CPtable.
    CMP BX,-1 ; If the CP doesn't have translation table (UTF or ASCII).
    JNE .10:
.05:; Unsupported CPid.
    DECD [%ReturnEAX]  ; Change error counter from 0 to 0xFFFF_FFFF if wrong CPid.
    JMP .90:
.10:ADD EBX,[CPtt]  ; EBX now points to translation table with 128 WORDs.
.15:CMP ESI,EDX
    JNB .90:
    XOR EAX,EAX
    LODSB
    CMP AL,128
    JB .20:
    MOV AX,[2*EAX+EBX-256]  ; Translate character 0x80..0xFF to Unicode.
    CMP AX,Replacement
    CMC                   ; CF=1 if AX=0xFFFD,0xFFFE or 0xFFFF (replacement or undefined).
    ADCD [%ReturnEAX],0   ; Input error.
.20:CMP EAX,'&' ; Possible beginning of HTML entity.
    JNE .22:
    JNSt [InpEncSt],encStHtm|encStHtml,.22: ; Skip if HTML entity should be ignored.
    CALL WinHtmlDecode:
.22:CALL [%WinOutputProc]
    JC .90:
    JMP .15: ; The next input character.
.65001: ; Convert from UTF-8 encoding.
    SUB EBX,EBX
    SUB EDX,ESI
    MOV [%ReturnEAX],EBX
    DecodeUTF8 ESI,.Store,Size=EDX,Width=32 ; Uses macro from string32.htm.
    JMP .90:

.Store:PROC ; Internal subprocedure .Store is callback from the macro DecodeUTF8.
         ; It is expected to pass decoded codepoint EAX to %WinOutputProc.
         MOVB [CharSize],1  ; CharSize will be applied in codepage autodetection in GetRelevance.
         CMP EAX,80h
         JNA .2:
         MOVB [CharSize],2
         CMP EAX,800h
         JNA .2:
         MOVB [CharSize],3
    .2:  CMP EAX,Replacement
         CMC
         ADCD [%ReturnEAX],0 ; Replacement and unsupported codepoints increment input error counter.
         MOV ECX,[EntitySkip]
         JECXZ .5:
         DEC ECX
         MOV [EntitySkip],ECX
         RET ; Ignore the remaining letters of already decoded HTML entity.
    .5:  CMP EAX,'&'          ; Possible beginning of HTML entity.
         JNE .9:
         JNSt [InpEncSt],encStHtm|encStHtml,.9:
         MOV EDI,ESI
         CALL WinHtmlDecode:
         SUB ESI,EDI ; How many bytes should decoder advance to skip the decoded entity (0..9).
         MOV [EntitySkip],ESI
    .9:  JMP [%WinOutputProc]
       ENDP .Store:

.20127: ; Convert from ASCII encoding.
    CMP ESI,EDX
    JNB .90:
    XOR EAX,EAX
    LODSB
    CMP AL,128
    JB .25:
    MOV AX,Replacement
    INCD [%ReturnEAX] ; Input error.
.25:JNSt [InpEncSt],encStHtm|encStHtml,.27:
    CALL WinHtmlDecode:
.27:CALL [%WinOutputProc]
    JC .90:
    JMP .20127: ; The next character.

.1200: ; Convert from UTF-16LE encoding. ECX=0.
.1201: ; Convert from UTF-16BE encoding. ECX=1.
    MOVB [CharSize],2
    MOV EBX,EDX
    SUB EBX,ESI
    AND EBX,1
    JZ .30:
    SUB EDX,EBX ; Text size is not WORD aligned, truncate.
    INCD [%ReturnEAX]
.30:CMP ESI,EDX
    JNB .90:
    XOR EAX,EAX
    LODSW
    JECXZ .35:
    XCHG AL,AH  ; Convert UTF-16BE to UTF-16LE.
.35:CMP AX,0xD7FF
    JBE .55:
    CMP AX,0xE000
    JAE .55:
    ; High surrogate expected (0xD800..0xDBFF).
    SUB AX,0xD800
    CMP AX,0x0400
    JAE .45:
    MOV EDI,EAX ; Temporary save high 10 bits.
    SHL EDI,10
    CMP ESI,EDX
    JNB .45:
    LODSW       ; Fetch the low surrogate.
    JECXZ .40:
    XCHG AL,AH  ; Convert UTF-16BE to UTF-16LE.
.40:; Low surrogate expected (0xDC00..0xDFFF).
    SUB AX,0xDC00
    JB .45:
    CMP AX,0x0400
    JB .50:
.45:MOV EAX,Replacement
    INCD [%ReturnEAX]
    JMP .55:
.50:LEA EAX,[EAX+EDI+0x10000] ; Compose codepoint from both surrogates.
.55:JNSt [InpEncSt],encStHtm|encStHtml,.57:
    CALL WinHtmlDecode:
.57:CALL [%WinOutputProc]
    JC .90:
    JMP .30:

.12000: ; Convert from UTF-32LE encoding. ECX=0.
.12001: ; Convert from UTF-32BE encoding. ECX=1.
    MOVB [CharSize],4
    MOV EBX,EDX
    SUB EBX,ESI
    AND EBX,3
    JZ .60:
    SUB EDX,EBX ; Text size is not DWORD aligned, truncate.
    INCD [%ReturnEAX]
.60:CMP ESI,EDX
    JNB .90:
    LODSD
    JECXZ .65:  ; ECX=1 if UTF-32BE.
    BSWAP EAX  ; Convert UTF-32BE to UTF-32LE.
.65:CMP EAX,10FFFFh
    JA .70:          ; Invalid above 10FFFFh.
    CMP EAX,0xD800
    JB .80:          ; Valid below 0xD800.
    CMP EAX,0xDFFF   ; Valid above 0xDFFF.
    JA .80:
.70:MOV EAX,Replacement
    INCD [%ReturnEAX]
.80:JNSt [InpEncSt],encStHtm|encStHtml,.85:
    CALL WinHtmlDecode:
.85:CALL [%WinOutputProc]
    JNC .60:
.90:EndProcedure WinConvert
WinOutputProc
is a collection of output procedures used as callbacks from WinConvert:
Void does nothing; it is used when WinConvert only needs to count input errors (autodetection of input endianness).
GetRelevance accumulates this codepoint's property (positive or negative number) in global variable [SumRelevance]. This is used for input encoding autodetection.
Replace is used to replace a codepoint unsupported by the output encoding with its entity/question mark/transliteration, which will be written to OutFile. Global memory variable [OutErrors] is incremented.
ToASCII, To8bit, ToUTF8, ToUTF16, ToUTF32 are used to convert the codepoint into output encoding and write it to OutFile.
Input
EAX= codepoint, i.e. integer number in the range 0..0xD7FF or 0xE000..0x10FFFF.
OutFile must be opened for writing.
Output
CF=0 when the codepoint was processed successfully. Otherwise WinConvert cancels further conversion.
EAX, EDI are undefined, other registers must be preserved.
Called by
WinConvert as a callback procedure with register-calling convention.
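What the ToUTF8 and ToUTF16 callbacks have to produce for a given codepoint can be shown in a few lines. This is a Python sketch for orientation only; to_utf8 and to_utf16 are hypothetical names, and the functions return the bytes instead of writing them to OutFile.

```python
def to_utf8(cp):
    """Encode one codepoint as 1..4 UTF-8 bytes."""
    if cp <= 0x7F:
        return bytes([cp])
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def to_utf16(cp, big_endian=False):
    """Encode one codepoint as one UTF-16 unit or as a surrogate pair."""
    order = "big" if big_endian else "little"
    if cp <= 0xFFFF:
        return cp.to_bytes(2, order)
    rest = cp - 0x10000
    high = 0xD800 + (rest >> 10)      # upper 10 bits
    low = 0xDC00 + (rest & 0x3FF)     # lower 10 bits
    return high.to_bytes(2, order) + low.to_bytes(2, order)
```

The thresholds 0x7F, 0x7FF and 0xFFFF are the same ones that the ToUTF8 procedure compares against before choosing the 1, 2, 3 or 4 byte form.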
Void PROC         ; Empty conversion.
      CLC
      RET
     ENDP Void

GetRelevance PROC ; Relevance is probability, that codepoint EAX appears in input text.
    PUSH ECX,EDI
     TEST EAX,0xFFFF_0000
     JNZ .4:
     MOV EDI,[CodePoint]
     MOV ECX,[CodePoints]
     REPNE SCASW
 .4: MOV EAX,??    ; Relevance of invalid character is negative.
     JNE .9:
     DEC EDI,EDI
     SUB EDI,[CodePoint]
     SHR EDI,1
     ADD EDI,[Relevance]
     MOVSXB EAX,[EDI]
 .9: IMUL EAX,[CharSize] ; Longer characters have bigger relevance.
     ADD [SumRelevance],EAX ; Accumulate the value in global memory location.
     CLC
    POP EDI,ECX
    RET
   ENDP GetRelevance:

Replace PROC      ; Replace codepoint EAX which does not exist in output encoding.
   PUSH EBX,ECX,ESI
     MOV EBX,[OutEncSt]
     MOV ESI,TempString
     JSt EBX,encStIgn,.9: ; Ignore unsupported character.
     MOV EDI,ESI
     JNSt EBX,encStQm,.2:
  .1:MOV EDI,ESI
     MOVW [EDI],'?'       ; Replace codepoint EAX with question mark.
     JMP .6:
  .2:JNSt EBX,encStHtm|encStHtml,.3:
     MOVD [EDI],'&#x'     ; Replace codepoint EAX with its HTML entity.
     ADD EDI,3
     StoH EDI,Align=Left
     MOV AX,';'
     STOSW
     JMP .6:
  .3:TEST EAX,0xFFFF_0000 ; Replace codepoint EAX with its transliteration.
     JNZ .1:
     MOV ECX,[CodePoints]
     MOV EDI,[CodePoint]
     REPNE SCASW
     JNE .1:
     DEC EDI,EDI
     SUB EDI,[CodePoint]
     SHL EDI,1
     ADD EDI,[Translit]
     MOV EAX,[EDI] ; EAX now contains 0..4 ASCII characters, NUL padded.
     MOV EDI,ESI ; TempString.
  .4:CMP AL,0
     JZ .5:
     STOSB
     SHR EAX,8
     JMP .4:
  .5:STOSB       ; NUL-terminate replacement string.
  .6:SUB EAX,EAX ; TempString at ESI is now ASCIIZ string with replacement.
     LODSB
     CMP AL,0
     JZ .9:
     JSt EBX,encStUtf16,.7:
     JSt EBX,encStUtf32,.8:
     FileStreamWriteByte OutFile
     JMP .6:
  .7:CALL ToUTF16:
     JMP .6:
  .8:CALL ToUTF32:
     JMP .6:
.9:POP ESI,ECX,EBX
   INCD [OutErrors]
   CLC
   RET
 ENDP Replace

ToASCII PROC     ; Convert codepoint EAX to ASCII encoding.
    CMP EAX,127
    JA Replace:
    FileStreamWriteByte OutFile
    RET
   ENDP ToASCII

To8bit PROC      ; Convert codepoint EAX to OEM/ANSI encoding using [TrTable].
   CMP EAX,127
   JBE .8:       ; ASCII 7bit characters are copied verbatim.
   TEST EAX,0xFFFF_0000
   JNZ Replace:  ; Character outside BMP is replaced with question mark.
   MOV EDI,[TrTable]
   PUSH ECX
     MOV ECX,128
     REPNE SCASW ; Search for codepoint in TrTable.
   POP ECX
   JNE Replace:  ; If codepoint EAX is not supported by output encoding.
   SUB EDI,[TrTable]     ; EDI is now 2,4,6,8,,,256.
   SHR EDI,1             ; EDI is now 1,2,3,4,,,128.
   LEA EAX,[EDI+128-1]   ;  AL is now 128,129,,,255.
.8:FileStreamWriteByte OutFile
   RET
  ENDP To8bit

ToUTF32 PROC     ; Convert codepoint EAX to UTF-32 encoding.
     JNSt [OutEncSt],encStBe, .8:
     BSWAP EAX
  .8:FileStreamWriteDword OutFile
     RET
  ENDP ToUTF32

ToUTF16 PROC     ; Convert codepoint EAX to UTF-16 encoding.
    TEST EAX,0xFFFF_0000
    JZ .5:
    ; Character outside BMP will be written as two surrogates.
    SUB EAX,0x10000
    MOV EDI,EAX
    SHR EDI,10
    ADD EDI,0xD800 ; EDI is now the high surrogate.
    XCHG EDI,EAX
    JNSt [OutEncSt],encStBe,.3:
    XCHG AL,AH
 .3:FileStreamWriteWord OutFile
    XCHG EAX,EDI   ; Restore original codepoint in EAX.
    AND EAX,0x3FF
    ADD EAX,0xDC00 ; EAX is now the low surrogate.
 .5:JNSt [OutEncSt],encStBe,.8:
    XCHG AL,AH
 .8:FileStreamWriteWord OutFile
    RET
  ENDP ToUTF16

ToUTF8 PROC     ; Convert codepoint EAX to UTF-8 encoding.
    MOV EDI,EAX
    CMP EAX,0x7F
    JBE .8:
    CMP EAX,0x7FF
    JBE .4:
    CMP EAX,0xFFFF
    JBE .2:
    ; 4byte encoding.
    SHR EAX,18
    OR AL,0xF0
    CALL .8:
    MOV EAX,EDI
    SHR EAX,12
    CALL .6:
    MOV EAX,EDI
    SHR EAX,6
    CALL .6:
    JMP .5:
 .2: ; 3byte encoding.
    SHR EAX,12
    OR AL,0xE0
    CALL .8:
    MOV EAX,EDI
    SHR EAX,6
    CALL .6:
    MOV EAX,EDI
    JMP .6:
 .4:; 2byte encoding.
    SHR EAX,6
    OR AL,0xC0
    CALL .8:
 .5:MOV EAX,EDI
 .6:AND EAX,0x3F
    OR AL,0x80
 .8:FileStreamWriteByte OutFile
    RET
  ENDP ToUTF8
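The reverse lookup performed by To8bit above (search the codepoint in a table of 128 entries and derive the output byte from its position) can be sketched like this. The Python code is an illustration under assumptions: build_table and to_8bit are hypothetical names, and the table is borrowed from Python's own cp852 codec instead of the [CPtt] data of EuroConvertor.

```python
def build_table(codec):
    """128 codepoints for bytes 0x80..0xFF; undefined bytes become U+FFFD."""
    return [ord(bytes([b]).decode(codec, errors="replace")) for b in range(128, 256)]

def to_8bit(cp, table):
    """Return the output byte for codepoint cp, or None if the table lacks it."""
    if cp <= 127:
        return cp                     # ASCII is copied verbatim
    if cp > 0xFFFF or cp not in table:
        return None                   # the listing jumps to Replace here
    return 128 + table.index(cp)      # position 0..127 maps to byte 128..255
```

With table = build_table("cp852"), the call to_8bit(ord("č"), table) gives the same byte as "č".encode("cp852"), while the euro sign U+20AC is not found and would be handed over to Replace.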
WinHtmlDecode
Proc WinHtmlDecode is called from WinConvert when the convertor encounters an ampersand in the input text.
It recognizes an HTML entity in the input stream, e.g. &nbsp;, and converts the entity to its codepoint, e.g. to 0x0000_00A0.
An HTML entity may be
  1. named, e.g. &nbsp; (case sensitive),
  2. decimal, e.g. &#160;, or
  3. hexadecimal, e.g. &#xA0; (case insensitive).
Input
EAX=0x0000_0026 (ampersand).
ESI points behind the ampersand in input text.
EDX points behind the end of input text.
[InpEncSt] flags encStUtf16,encStUtf32 specify input character width (8,16,32).
[InpEncSt] flags encStLe,encStBe specify input character endianness.
[InpEncSt] flags encStHtm,encStHtml specify whether entities of the ASCII characters & < > " should be decoded, too.
Output
All registers unchanged if no valid entity was decoded. Otherwise
EAX= entity codepoint.
ESI points behind the terminating semicolon in input text.
Called by
WinConvert
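The three entity forms and the ASCII rule described above can be sketched in Python. This is an illustration under assumptions, not the procedure itself: the name decode_entity and its parameters are hypothetical, the standard table html.entities.name2codepoint stands in for the program's own entity tables, and the sketch works on a decoded string rather than on 8, 16 or 32bit input.

```python
from html.entities import name2codepoint  # stand-in for the program's entity tables

DEC_DIGITS = "0123456789"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_entity(text, pos, decode_ascii=True):
    """Decode the entity whose ampersand sits at text[pos-1].

    Returns (codepoint, position behind the semicolon), or None when no
    valid entity was decoded. decode_ascii plays the role of encStHtml.
    """
    end = text.find(";", pos)
    if end < 0:
        return None
    body = text[pos:end]
    if body.startswith("#"):
        if body[1:2] in ("x", "X"):          # hexadecimal, case insensitive
            digits, base, allowed = body[2:], 16, HEX_DIGITS
        else:                                # decimal
            digits, base, allowed = body[1:], 10, DEC_DIGITS
        if not digits or any(c not in allowed for c in digits):
            return None
        codepoint = int(digits, base)
    else:                                    # named, case sensitive
        codepoint = name2codepoint.get(body)
        if codepoint is None:
            return None
    if codepoint < 128 and not decode_ascii:
        return None                          # ASCII entities stay untouched
    return codepoint, end + 1
```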
WinHtmlDecode:PROC
     PUSHAD
       MOV ECX,EDX
       MOV EDI,ESI
       SUB ECX,ESI
       MOV EAX,';'
       JSt [InpEncSt],encStUtf16, .10:
       JSt [InpEncSt],encStUtf32, .30:
       REPNE SCASB ; Search for entity terminator in 8bit input stream.
       JNE .90:
       MOV EBP,EDI ; Remember the input stream position behind semicolon.
       SUB EDI,ESI
       MOV ECX,EDI
       CMP ECX,SIZE# TempString
       JA .90:
       MOV EDI,TempString
       REP MOVSB
       JMP .50:
   .10:SHR ECX,1
       JNSt [InpEncSt],encStBe,.15:
       XCHG AL,AH
   .15:REPNE SCASW ; Search for entity terminator in 16bit input stream.
       JNE .90:
       MOV EBP,EDI ; Remember the input stream position behind semicolon.
       SUB EDI,ESI
       MOV ECX,EDI
       SHR ECX,1
       JZ .90:
       CMP ECX,SIZE# TempString
       JA .90:
       MOV EDI,TempString
   .20:LODSW
       JNSt [InpEncSt],encStBe,.25:
       XCHG AL,AH
   .25:CMP AX,128
       JAE .90:
       STOSB
       LOOP .20:
       JMP .50:
   .30:SHR ECX,2
       JNSt [InpEncSt],encStBe,.35:
       BSWAP EAX
   .35:REPNE SCASD ; Search for entity terminator in 32bit input stream.
       JNE .90:
       MOV EBP,EDI ; Remember the input stream position behind semicolon.
       SUB EDI,ESI
       MOV ECX,EDI
       SHR ECX,2
       JZ .90:
       CMP ECX,SIZE# TempString
       JA .90:
       MOV EDI,TempString
   .40:LODSD
       JNSt [InpEncSt],encStBe,.45:
       BSWAP EAX
   .45:CMP EAX,128
       JAE .90:
       STOSB
       LOOP .40:
   .50:MOV ESI,TempString
       SUB EDI,ESI ; EDI is now the size of HTML entity in ASCII bytes, including semicolon.
       LODSB
       CMP AL,'#'  ; Test if the entity is numeric.
       JE .65:
       CMP EDI,8+1 ; Longer named entities are not supported.
       JA .90:
       DEC ESI
       MOV CL,5
       LODSD       ; First to fourth characters of entity.
       SUB ECX,EDI
       JS .55:
       ; TempString has 1..4 letters.
       XOR EBX,EBX
       DEC EBX     ; Prepare mask to EBX.
       SAL ECX,3
       SHR EBX,CL
       AND EAX,EBX ; NUL-pad shorter entity name.
       MOV ECX,[Entities4]
       MOV EDI,[EntName4]
       REPNE SCASD ; Search for the entity by name.
       JNE .90:
       SUB EDI,4
       SUB EDI,[EntName4]
       SHR EDI,1
       ADD EDI,[EntVal4]
       MOVZXW EAX,[EDI]
       JMP .80: ; Decoded entity codepoint is now in EAX.
   .55:XCHG EAX,EDX ; Temporarily save first four characters to EDX.
       LODSD        ; Fifth to eighth characters.
       ADD ECX,4
       JS .90:
       XOR EBX,EBX
       DEC EBX
       SAL ECX,3
       SHR EBX,CL
       AND EAX,EBX  ; NUL-pad shorter entity name.
       XCHG EDX,EAX ; Fifth..eighth characters are now in EDX.
       MOV ECX,[Entities8]
       MOV EDI,[EntName8A]
   .60:REPNE SCASD  ; Search for the entity by its first four letters.
       JNE .90:
       PUSH EDI
         SUB EDI,4
         SUB EDI,[EntName8A]
         ADD EDI,[EntName8B]
         CMP EDX,[EDI]
       POP EDI
       JNE .60:     ; If the fifth and following letters do not match.
       SUB EDI,4
       SUB EDI,[EntName8A]
       SHR EDI,1
       ADD EDI,[EntVal8]
       MOVZXW EAX,[EDI]
       JMP .80:  ; Decoded entity codepoint is now in EAX.
   .65:LODSB     ; Numeric entity expected.
       CMP AL,'0'
       JB .90:
       OR AL,'x'^'X'
       CMP AL,'x'
       JE .75:
       DEC ESI   ; ESI should now point to decimal number terminated with semicolon.
       LodD ESI  ; Use macro LodD or LodH from library cpuext32.
       JMP .77:
   .75:LodH ESI  ; ESI should now point to hexadecimal number terminated with semicolon.
   .77:JC .90:
       XCHG EAX,EDX
        LODSB
        CMP AL,';'
       XCHG EDX,EAX
       JNE .90:
   .80:; EAX is decoded codepoint, EBP is pointer behind the entity in text.
       JSt [InpEncSt],encStHtml,.85:
       CMP EAX,128
       JB .90:           ; Skip if ASCII entities should not be converted.
   .85:MOV [ESP+4],EBP   ; %ReturnESI.
       MOV [ESP+28],EAX  ; %ReturnEAX.
 .90:POPAD
     RET
    ENDP WinHtmlDecode:
↑ WinAutodetect

This PROC calculates the total relevance of characters for a given input encoding. Relevance increases when a translated character is a common letter and decreases when the character is nonalphabetical. This is repeated for each supported encoding, and the procedure returns the encoding with the highest achieved relevance.

Only the first 1 MB of a big file is used for autodetection.

Input
ESI points to the beginning of input file in memory.
EDX points behind the input file in memory.
Output
[InpEncId] contains the autodetected encoding identifier with the best relevance.
EAX,EBX,ECX,EDX,ESI,EDI are undefined.
Called by
WinMain
Invokes
WinConvert.
Calls
WinSetEncBox
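The idea of scoring every candidate encoding and keeping the best one can be sketched in Python. The scoring rule below (+1 for a non-ASCII letter, -1 for any other non-ASCII character) is only an assumption made for illustration; the actual weights live in GetRelevance, which is not shown here. The names relevance and autodetect are hypothetical, and Python's codecs stand in for the program's conversion tables.

```python
def relevance(data: bytes, encoding: str) -> int:
    """Sum an assumed per-character relevance of data decoded with encoding."""
    score = 0
    for ch in data.decode(encoding, errors="replace"):
        if ord(ch) < 128:
            continue                      # ASCII looks the same in every candidate
        score += 1 if ch.isalpha() else -1
    return score


def autodetect(data: bytes, candidates: list) -> str:
    """Return the candidate encoding with the highest relevance."""
    sample = data[: 1 << 20]              # only the first 1 MB is examined
    # max() keeps the first candidate on a tie, like the JLE test in the loop.
    return max(candidates, key=lambda enc: relevance(sample, enc))
```

For example, a Czech sentence stored in CP1250 decodes to letters only with cp1250, while the other candidates turn some bytes into control characters or symbols and score lower.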
WinAutodetect: PROC
    SUB ECX,ECX  ; ECX will be CP index (0,1,2,,,[CodePages]).
    MOV [InpEncId],ECX
    MOV [InpEncSt],encStAuto
    MOV EBX,0x8000_0000 ; EBX will remember the so far best relevance.
.A1:MOV [SumRelevance],0
    PUSH ECX,EDX
      WinAPI SendMessage,[hInpEncBox],LB_SETCURSEL,ECX,0
      TEST EAX
      JZ .A2:  ; Skip animation when not in GUI version.
      WinAPI Sleep,50
.A2:POP EDX,ECX
    MOV EAX,[CPid]
    MOVZXW EAX,[EAX+2*ECX]
    CMP AX,912
    JNE .A3:
    DEC [SumRelevance] ; Slightly discriminate IBM912 against almost identical ISO8859-2.
.A3:Invoke WinConvert,EAX,ESI,EDX, GetRelevance:
    MOV EAX,[SumRelevance]
    CMP EAX,EBX
    JLE .A4:
    ; Encoding indexed by ECX is better candidate than all previous ones.
    MOV EBX,EAX
    MOV EAX,[CPid]
    MOVZXW EAX,[EAX+2*ECX]
    MOV [InpEncId],EAX  ; Remember the so far best input encoding.
.A4:INC ECX   ; Try the next codepage.
    CMP CX,[CodePages]
    JB .A1:
    MOV EBX,[hInpEncBox]
    MOV EDX,[InpEncId]
    CALL WinSetEncBox:
    WinAPI SetFocus,[hInpEncBox]
    RET
   ENDP WinAutodetect:
↑ WinSetEncBox
This PROC animates autoselection in the listbox for input or output encoding. It does nothing when the GUI window was not created.
Input
EBX is the handle of the listbox for encoding selection in GUI.
EDX is the encoding identifier where the selection cursor will be positioned.
Output
EAX,ECX,ESI,EDI are undefined.
Called by
WinAutodetect, WinCreate
WinSetEncBox:PROC ; Input: EBX=EncBoxHandle, EDX=EncId.
    MOV EDI,[CodePages]
.A6:PUSH EDX
      WinAPI SendMessage,EBX,LB_SETCURSEL,EDI,0
    POP EDX
    TEST EAX
    JZ .A8:
    MOV ESI,TempString
    PUSH EDX
      WinAPI SendMessage,EBX,LB_GETTEXT,EDI,ESI
    POP EDX
    LODSW
    CMP AX,'CP'
    JNE .A7:
    LodD ESI
    CMP EAX,EDX
    JE .A9:
.A7:PUSH EDX
      WinAPI Sleep,20
    POP EDX
.A8:DEC EDI
    JNS .A6:
.A9:RET
   ENDP WinSetEncBox:
WinGrep Needle, TextBegin, TextEnd
Macro WinGrep searches case-sensitive for the string Needle in Text.
Input
Needle is searched text, e.g. utf.
TextBegin is pointer to the 1st character of Text.
TextEnd is pointer behind the last character of Text.
Output
CF=0, ZF=1 if Needle was found.
EDI points right behind the Needle.
Error
ZF=0 if Needle not found.
EDI points to TextBegin.
Expanded by
WinParseEnc
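The contract of WinGrep (found or not found, and the position right behind the match) can be restated as a short Python sketch. The function name grep and the use of byte offsets instead of pointers are assumptions made for illustration.

```python
def grep(needle: bytes, text: bytes, begin: int, end: int):
    """Case-sensitive search for needle in text[begin:end].

    Returns the offset right behind the first match (what EDI would
    point to), or None when the needle is not found.
    """
    pos = text.find(needle, begin, end)
    return None if pos < 0 else pos + len(needle)
```

With the lowercased specification utf-16-le, searching for utf yields offset 3, which is where a following search for the number would start.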
WinGrep %MACRO Needle, TextBegin, TextEnd
       PUSHD %TextEnd, =B'%Needle'
       MOV EDI,%TextBegin
       CALL WinGrep@RT
WinGrep@RT:PROC1
         PUSHAD
          MOV ESI,[ESP+36] ; Pointer to the Needle.
          GetLength$ ESI   ; Return Needle size in ECX.
          LEA EDX,[ECX-1]  ; Size of Needle without its first character.
          MOV ECX,[ESP+40] ; TextEnd.
          SUB ECX,EDI      ; Text size.
          JB .9:
          LODSB            ; First character of Needle.
    .1:   REPNE SCASB
          JNE .9:
          PUSH ECX,ESI,EDI
           MOV ECX,EDX
           REPE CMPSB
          POP EDI,ESI,ECX
          JNE .1:
          LEA EDI,[EDI+EDX]
          MOV [ESP+0],EDI   ; %ReturnEDI.
   .9:   POPAD
         RET  ; ZF=1 if Needle was found.
       ENDP1 WinGrep@RT
     %ENDMACRO WinGrep
WinGrepNum TextBegin, TextEnd
Macro WinGrepNum searches for a continuous sequence of decimal digits between TextBegin..TextEnd.
Input
TextBegin is pointer to the 1st character of Text.
TextEnd is pointer behind the last character of Text.
Output
CF=0 if a number exists in Text.
EAX= number value.
ESI points right behind the number.
Error
CF=1 if no decimal digit exists in Text.
EAX, ESI unchanged.
Expanded by
WinParseEnc
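A minimal Python sketch of the same contract follows: skip everything up to the first decimal digit, then read the continuous run of digits. The name grep_num is hypothetical, and the sketch stops strictly at the end offset, which may differ in detail from the LodD macro used by the runtime.

```python
def grep_num(text: bytes, begin: int, end: int):
    """Find the first run of decimal digits in text[begin:end].

    Returns (value, offset right behind the number), or None when
    no decimal digit exists in the range.
    """
    i = begin
    while i < end and not (0x30 <= text[i] <= 0x39):
        i += 1                            # skip non-digits
    if i >= end:
        return None
    value = 0
    while i < end and 0x30 <= text[i] <= 0x39:
        value = value * 10 + (text[i] - 0x30)
        i += 1
    return value, i
```

Calling it twice on iso-8859-2, the second time from the offset returned by the first call, yields 8859 and then 2.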
WinGrepNum %MACRO TextBegin, TextEnd
       PUSHD %TextEnd, %TextBegin
       CALL WinGrepNum@RT
WinGrepNum@RT:PROC1
       PUSHAD
         MOV ECX,[ESP+40] ; TextEnd.
         MOV ESI,[ESP+36] ; TextBegin.
         SUB ECX,ESI
         JB .9:
         SUB EAX,EAX
   .3:   LODSB
         SUB AL,'0'
         JB .5:
         CMP AL,9
         JNA .7:
   .5:   DEC ECX
         JNZ .3:
         STC
         JMP .9:
   .7:   DEC ESI
         LodD ESI
         JC .9:
         MOV [ESP+4],ESI
         MOV [ESP+28],EAX
   .9: POPAD
       RET  ; CF=0 if a valid number EAX was found.
     ENDP1 WinGrepNum@RT
  %ENDMACRO WinGrepNum
↑ WinParseEnc Enc$Ptr, Enc$Size
Recognize the encoding in a given string which was submitted to EuroConv as the first or second command-line argument.
Input
Enc$Ptr is pointer to a quoted or unquoted string with encoding specification, e.g. Utf-16-LE-BOM.
Enc$Size is string size in bytes.
Output
CF=0, EAX= detected encoding identifier 0..65535.
EBX=detected encoding flags.
Error
CF=1 if no supported encoding detected.
EAX=0
Expands
WinGrep, WinGrepNum
Invoked by
WinMain.
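Part of the mapping from a specification string to an encoding identifier can be sketched in Python. The sketch covers only the ASCII, UTF and ISO-8859 cases, uses the identifiers that appear in the procedure (20127, 65001, 1200, 12000, 28590+n), and ignores flags such as BOM; the name parse_enc is hypothetical.

```python
import re


def parse_enc(spec: str):
    """Return the encoding identifier for a few specification forms, else None."""
    s = spec.strip('"').lower()           # unquote and lowercase
    if "ascii" in s:
        return 20127
    if "utf" in s:
        match = re.search(r"\d+", s)      # character width behind UTF
        if match is None:
            return None
        width = int(match.group())
        base = {8: 65001, 16: 1200, 32: 12000}.get(width)
        if base is None:
            return None
        if width != 8 and "be" in s and "le" not in s:
            base += 1                     # big-endian variant: 1201 or 12001
        return base
    match = re.search(r"8859\D*(\d+)", s) # part number behind ISO-8859-
    if match:
        part = int(match.group(1))
        if 1 <= part <= 16 and part != 12:
            return 28590 + part
    return None
```

For instance, the specification Utf-16-LE-BOM from the description above maps to 1200 and ISO-8859-2 maps to 28592.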
WinParseEnc Procedure Enc$Ptr, Enc$Size
Enc$    LocalVar Size=32 ; Input string converted to lower case.
Enc$End LocalVar         ; Pointer to the end of string in Enc$.
      ClearLocalVar
      MOV ESI,[%Enc$Ptr]
      MOV ECX,[%Enc$Size]
      XOR EBX,EBX
      MOV [%ReturnEAX],EBX
      MOV [%ReturnEBX],EBX
      CMP ECX,24
      JNB .Err:
      StripQuotes ESI,ECX
      TEST ECX
      JZ   .Err:,DIST=NEAR   ; Argument is empty.
      LEA EDI,[%Enc$]
.LoCa:LODSB
      OR AL,0x20  ; Simplified conversion to lower case.
      STOSB
      DEC ECX
      JNZ .LoCa:
      MOV [%Enc$End],EDI
      MOV EDX,EDI
      LEA ESI,[%Enc$]
      ; Parse all encoding properties from text ESI..EDX into flags in EBX.
 prop %FOR ascii,utf,bom,le,be,htm,html,qm,ign,transl,oem,ansi,auto,enc
        WinGrep %prop,ESI,EDX
        %Prop1 %SETC '%prop[1]' & ~('A'^'a') ; Uppercase the 1st letter of %prop.
        JNE .N%prop:
        SetSt EBX,encSt%Prop1%prop[2..]
        .N%prop:
      %ENDFOR prop
      JNSt EBX,encStHtml, .Shtm:
      RstSt EBX,encStHtm
.Shtm:JNSt EBX,encStAscii, .Nas:
      MOV EAX,20127
      JMP .End1:
.Nas: JNSt EBX,encStAuto|encStEnc, .Nau:
      XOR EAX,EAX
      JMP .End1:
.Nau: JNSt EBX,encStOem, .Noe:
      WinAPI GetOEMCP
      SetSt EBX,encStAuto
      JMP .End1:
.Noe: JNSt EBX,encStAnsi, .Nan:
      MOV EAX,[AnsiEncId]
.End1:JMP .End:,DIST=NEAR
.Nan: JNSt EBX,encStUtf,.Nut:
      WinGrepNum ESI,EDX
      JC .Err:
      Dispatch AL,8,16,32
.Err: STC
      JMP .Ret:
.8:   SetSt EBX,encStUtf8
      MOV EAX,65001
      JMP .End:
.16:  SetSt EBX,encStUtf16
      MOV EAX,1200
      JMP .En:
.32:  SetSt EBX,encStUtf32
      MOV EAX,12000
.En:  JSt EBX,encStLe, .End:
      JNSt EBX,encStBe, .End: ; Endianness will be detected later.
      INC EAX
      JMP .End:,DIST=NEAR
.Nut: ; Try direct CPid numeric specification.
      WinGrepNum ESI,EDX
      JC .Nnum:
      MOV EDI,[CPid] ; Try to find the number EAX in array [CPid].
      MOV ECX,[CodePages] ; Number of supported code pages.
      REPNE SCASW
      JE .End2:,DIST=NEAR
      ; String Enc$ is not direct CPid. It could contain some alternative CP number:
      Dispatch AX,8859,790,916,919,920,921,923,991,1208
      JMP .Nnum:
.1208:MOV AX,65001  ; IBM1208 is UTF-8 = CP65001.
      SetSt EBX,encStUtf+encStUtf8
      JMP .End2:
.790:
.991: MOV AX,667    ; IBM790,IBM991 is Mazovia=CP667.
      JMP .End2:
.916: MOV AX,28598  ; "ISO-8859-8","IBM916, Latin/Hebrew"
      JMP .End2:
.919: MOV AX,28600  ;"ISO-8859-10","IBM919, Latin 6, Nordic"
      JMP .End2:
.920: MOV AX,28599  ;"ISO-8859-9","IBM920, Latin 5, Turkish",
      JMP .End2:
.921: MOV AX,28603  ; "ISO-8859-13","IBM921, Latin 7, Baltic"
      JMP .End2:
.923: MOV AX,28605  ; "ISO-8859-15","IBM923, Latin 9, Western Europe"
      JMP .End2:
.8859:WinGrepNum ESI,EDX ; Get number behind ISO-8859-.
      JC .Err:
      TEST EAX  ; ISO-8859-0  is not supported.
      JZ .Err:
      CMP AX,12 ; ISO-8859-12 is not supported.
      JE .Err:
      CMP AX,16 ; 8859-1 .. 8859-16 is supported.
      JA .Err:
      ADD EAX,28590
.End2:JMP .End:,DIST=NEAR
.Nnum: ; No numeric CPid in Enc$ was found (or 8 in KOI8). Try letter-only strings.
      LEA ESI,[%Enc$]
      MOV AX,10101
      WinGrep nextstep,ESI,EDX
      JE .End2:
      MOV AX,667
      WinGrep mazo,ESI,EDX  ; Try Mazovia.
      JE .End2:
      MOV AX,895
      WinGrep kame,ESI,EDX  ; Try Kamenických alias KEYBCS2.
      JE .End3:
      WinGrep bcs2,ESI,EDX
      JE .End3:
      WinGrep koi8,ESI,EDX  ; Try KOI8.
      JNE .MAC:
      MOV ESI,EDI        ; Points behind KOI8.
      MOV AX,885         ; Try KOI8-CS.
      WinGrep cs,ESI,EDX
      JE .End3:
      MOV AX,878         ; Try KOI8-R.
      WinGrep r,ESI,EDX
      JE .End3:
      MOV AX,1168        ; Try KOI8-U.
      WinGrep u,ESI,EDX
      JE .End3:
      MOV AX,880         ; Try KOI8-E.
      WinGrep e,ESI,EDX
      JE .End3:
      MOV AX,882         ; Try KOI8-T.
      WinGrep t,ESI,EDX
      JE .End3:
      MOV AX,884         ; Try KOI8-F.
      WinGrep f,ESI,EDX
.End3:JE .End4:,DIST=NEAR
.MAC: LEA ESI,[%Enc$]
      WinGrep mac,ESI,EDX   ; Try Macintosh.
      JNE .Err:          ; If no other choices left, give up.
      MOV AX,10010
      WinGrep romanian,ESI,EDX
      JE .End4:
      WinGrep rumun,ESI,EDX
      JE .End4:
      MOV AX,10000
      WinGrep roman,ESI,EDX
      JE .End4:
      MOV AX,10004
      WinGrep arab,ESI,EDX
      JE .End4:
      MOV AX,10005
      WinGrep hebre,ESI,EDX
.End4:JE .End5:
      MOV AX,10006
      WinGrep greek,ESI,EDX
      JE .End5:
      MOV AX,10007
      WinGrep cyril,ESI,EDX
      JE .End5:
      MOV AX,10017
      WinGrep ukr,ESI,EDX
      JE .End5:
      MOV AX,10021
      WinGrep thai,ESI,EDX
      JE .End5:
      MOV AX,10079
      WinGrep iceland,ESI,EDX
      JE .End5:
      MOV AX,10029
      WinGrep ce,ESI,EDX
      JE .End5:
      MOV AX,10080
      WinGrep inuit,ESI,EDX
.End5:JE .End:
      MOV AX,10081
      WinGrep turk,ESI,EDX
      JE .End:
      MOV AX,10082
      WinGrep croat,ESI,EDX
      JE .End:
      MOV AX,10083
      WinGrep gael,ESI,EDX
      JE .End:
      MOV AX,10084
      WinGrep celtic,ESI,EDX
      JE .End:
      MOV AX,10089
      WinGrep latin,ESI,EDX
      JE .End:
      WinGrep kermit,ESI,EDX
      JE .End:
      STC
      JMP .Ret:
.End: MOV [%ReturnEAX],EAX
      MOV [%ReturnEBX],EBX
.Ret:EndProcedure WinParseEnc   ; CF=error
↑ WinEncList
Display the list of all supported encodings on console.
Input
-
Output
EAX,EBX,ECX,EDX,ESI,EDI changed.
Invoked by
WinMain
WinEncList PROC
    StdOutput ='  EuroConv supported encodings',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  OEM/ANSI 8bit code pages:',Eol=Yes
    SUB EDX,EDX ; Encoding index 0..[CodePages].
.10:MOV EAX,[CPid]
    MOVZXW EAX,[EAX+2*EDX]
    Dispatch AX,20127,65001,1200,1201,12000,12001
    StoD TempString
    XOR EAX,EAX
    STOSD
    MOV EBX,[CPname]
    MOVZXW EBX,[EBX+2*EDX]
    MOV ESI,[CPinfo]
    LEA ESI,[ESI+EBX]
    MOV EBX,[CPrem]
    MOVZXW EBX,[EBX+2*EDX]
    MOV EDI,[CPinfo]
    LEA EDI,[EDI+EBX]
    StdOutput ='CP',TempString,=', "',ESI,='" ',EDI,Eol=Yes
.20127:
.65001:
.1200:
.1201:
.12000:
.12001:
    INC EDX
    CMP EDX,[CodePages]
    JB .10:
    StdOutput Eol=Yes
    StdOutput ='  Plain ASCII 7bit encoding:',Eol=Yes
    StdOutput ='CP20127, "ASCII"',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  Unicode encoding:',Eol=Yes
    StdOutput ='CP1200,  "UTF-16LE"',Eol=Yes
    StdOutput ='CP1201,  "UTF-16BE"',Eol=Yes
    StdOutput ='                  "UTF-16" (endianness will be autodetected)',Eol=Yes
    StdOutput ='CP12000, "UTF-32LE"',Eol=Yes
    StdOutput ='CP12001, "UTF-32BE"',Eol=Yes
    StdOutput ='                  "UTF-32" (endianness will be autodetected)',Eol=Yes
    StdOutput ='CP65001, "UTF-8"',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  Special assignment:',Eol=Yes
    WinAPI GetOEMCP
    StoD TempString
    XOR EAX,EAX
    STOSB
    StdOutput ='OEM    = console encoding selected by regional settings: CP',TempString,Eol=Yes
    WinAPI GetACP
    StoD TempString
    XOR EAX,EAX
    STOSB
    StdOutput ='ANSI   = graphic encoding selected by regional settings: CP',TempString,Eol=Yes
    StdOutput ='AUTO   = autodetect encoding',Eol=Yes
    StdOutput ='ENC    = display this list of supported encodings.',Eol=Yes
    StdOutput Eol=Yes
    StdOutput ='  Encoding modifiers:',Eol=Yes
    StdOutput ='/BOM   = write Byte Order Mark (valid only with UTF encodings)',Eol=Yes
    StdOutput ='/IGN   = omit characters not supported in output encoding',Eol=Yes
    StdOutput ='/QM    = replace characters not supported in output encoding with "?"',Eol=Yes
    StdOutput ='/HTML  = replace characters not supported in output encoding with HTML-entity',Eol=Yes
    StdOutput ='/TRANS = transliterate characters not supported in output encoding (default)',Eol=Yes
    RET
   ENDP WinEncList
↑ WinInfoText Direction, File, EncId, EncSt, Errors
Write the final information on console.
Input
Direction is ASCIIZ string "In" or "Out".
File is pointer to FILE object (InpFile or OutFile).
EncId is the text encoding identifier (0..65535).
EncSt is the set of encoding flags (encStBom, encStIgn, encStQm etc.).
Errors is the number of input or output errors.
Output
-
Invoked by
WinMain
WinInfoText Procedure Direction, File, EncId, EncSt, Errors
    MOV EBX,[%File]
    LEA EDX,[EBX+FILE.Name]
    StdOutput [%Direction],='put file:     "',EDX,='"', Eol=Yes
    MOV EAX,[EBX+FILE.Size]
    StoD TempString
    XOR EAX,EAX
    STOSD
    StdOutput [%Direction],='put size:      ',TempString, Eol=Yes
    MOV EAX,[%EncId]
    MOV EDI,EDX
    MOV EBX,EAX
    StoD
    XOR EAX,EAX
    STOSD
    StdOutput [%Direction],='put encoding:  CP',EDX, =', "'
    MOV EAX,EBX
    MOV ECX,[CodePages]
    MOV EDI,[CPid]
    REPNE SCASW
    DEC EDI,EDI
    SUB EDI,[CPid]
    MOV EDX,[CPname]
    ADD EDX,EDI
    MOVZXW EDX,[EDX]
    ADD EDX,[CPinfo]
    StdOutput EDX
    MOV EBX,[%EncSt]
    JNSt EBX,encStBom,.20:
    StdOutput ="/BOM"
.20:JNSt EBX,encStIgn,.30:
    StdOutput ="/IGN"
.30:JNSt EBX,encStQm, .40:
    StdOutput ="/QM"
.40:JNSt EBX,encStHtm,.50:
    StdOutput ="/HTM"
.50:JNSt EBX,encStHtml,.60:
    StdOutput ="/HTML"
.60:MOV EDX,[CPrem]
    ADD EDX,EDI
    MOVZXW EDX,[EDX]
    ADD EDX,[CPinfo]
    StdOutput ='", ',EDX,
    JNSt EBX,encStOem,.70:
    StdOutput ='", ',="(OEM),"
.70:JNSt EBX,encStAnsi,.80:
    StdOutput ='", ',="(ANSI),"
.80:JNSt EBX,encStAuto,.90:
    StdOutput =' (autodetected)'
.90:StdOutput Eol=Yes
    MOV EAX,[%Errors]
    StoD TempString
    XOR EAX,EAX
    STOSD
    StdOutput [%Direction],='put errors:    ',TempString, Eol=Yes
   EndProcedure WinInfoText
WinGuiData
contains static data used in Windows graphic (interactive) variant of EuroConvertor.
See also
constants and flag names defined in DosHeader's HEAD division.

%ControlList %SET \ Enumeration of common controls names.
InpExploreBtn,    \ Button [Explore].
InpEdit,          \ Field for input file name.
InpAutodetectBtn, \ Button [Autodetect].
InpEncBox,        \ Selection box for input encoding.
InpStIgnRadio,    \ Radiobutton to ignore entities.
InpStHtmRadio,    \ Radiobutton to convert non-ASCII entities.
InpStHtmlRadio,   \ Radiobutton to convert all entities.
OutExploreBtn,    \ Button [Explore].
OutEdit,          \ Field for output file name.
OutEncBox,        \ Selection box for output encoding.
OutStOemBtn,      \ Button [OEM].
OutStAnsiBtn,     \ Button [ANSI].
OutStTranslRadio, \ Radiobutton to transliterate.
OutStHtmlRadio,   \ Radiobutton to convert to entity.
OutStQmRadio,     \ Radiobutton to replace with ?.
OutStIgnRadio,    \ Radiobutton to ignore.
OutStBomCheck,    \ Checkbox to prefix output BOM.
CmdEdit,          \ Field for command line.
CmdConvertBtn,    \ Button [Convert].
CmdQuitBtn,       \ Button [Quit].
;
; Numeric identifiers of common controls.
 %id %SETA WM_APP  ; Bias of WM_COMMAND identifiers to avoid collision with system ids.
ctrl %FOR %ControlList
       %id %SETA %id+1
       id%ctrl EQU %id
     %ENDFOR ctrl
idStatusBar    EQU %id+1
[.bss] ; Static data of GUI variant.
hMainWindow       D DWORD ; Handle of the main window.
hStatusBar        D DWORD ; Handle of the status strip at the bottom.
; Array with handles of common controls window.
hBegin: ; Pointer to the first common control window handle.
ctrl %FOR %ControlList
       h%ctrl D DWORD
     %ENDFOR ctrl
hEnd:   ; Pointer behind the last common control window handle.
; Array with original versions of common control WndProc.
; Arrays PrevProcBegin..PrevProcEnd and hBegin..hEnd are synchronized.
PrevProcBegin: ; Pointer to the first common control original WndProc.
ctrl %FOR %ControlList
       PrevProc%ctrl D DWORD
     %ENDFOR ctrl
     D DWORD ; Behind the last common control original WndProc.
;  Windows GUI structures.
WndClassEx   DS WNDCLASSEX   ; Definition of the window class structure.
Msg          DS MSG          ; Window message.
StartupInfo  DS STARTUP_INFO ; Process properties.
ProcessInfo  DS PROCESS_INFORMATION
InpFileDlg   DS OPENFILENAME
OutFileDlg   DS OPENFILENAME
[.text]
WinGui, hWnd, uMsg, wParam, lParam
Entry point of Windows Graphic User Interface.
Invoked by
WinMain.
Calls
WinCreate, WinUpdate.
WinGui Procedure
    CALL WinCreate
    CALL WinUpdate
    WinAPI ShowWindow, [hMainWindow], SW_SHOWNORMAL
    WinAPI UpdateWindow, [hMainWindow]
    MOV EBX,[hInpEncBox]
    MOV EDX,[AnsiEncId]
    MOV [InpEncId],EDX
    CALL WinSetEncBox ; Set ANSI as a default input encoding.
    MOV EBX,[hOutEncBox]
    MOV EDX,65001
    MOV [OutEncId],EDX
    CALL WinSetEncBox ; Set UTF-8 as a default output encoding.
    WinAPI UpdateWindow, [hMainWindow]
.MsgLoop:
     WinAPI GetMessage, Msg,0,0,0
     TEST EAX
     JZ .MsgQuit:                 ; ZF signalises message WM_QUIT - request for program termination.
     WinAPI TranslateMessage, Msg ; Remap character keys from national keyboards.
     WinAPI DispatchMessage,  Msg ; Let Windows call our WinProc.
     JMP .MsgLoop:                ; Wait for another message.
.MsgQuit:
     TerminateProgram [Errorlevel]
   EndProcedure WinGui
WinProc, hWnd, uMsg, wParam, lParam
This is a callback procedure which receives and handles messages for the program main GUI window.
Handled messages (WM_COMMAND, WM_DESTROY) are dispatched to their handlers.
Ignored messages are passed to DefWindowProc.
As the main program window is completely painted by common controls (windows of class "STATIC","BUTTON","LISTBOX","EDIT"), WinProc doesn't have to handle WM_PAINT, WM_SIZE etc.
Handler Input
EAX=uMsg, EBX=hWnd, ESI=wParam, EDI=lParam.
Handler Output
EAX=0 if the message was completely processed by the handler. Otherwise the message is processed by WinAPI DefWindowProc and EAX outputs its return value.
EBX,ECX,EDX,ESI,EDI may be destroyed in the handlers.
Invoked by
WinAPI DispatchMessage.
WinProc Procedure hWnd, uMsg, wParam, lParam
     MOV EBX,[%hWnd]
     MOV EAX,[%uMsg]
     MOV ESI,[%wParam]
     MOV EDI,[%lParam] ; Load msg attributes to registers for handler's convenience.
    ; Fork message uMsg=EAX to its handler using macro Dispatch:
     Dispatch EAX, WM_COMMAND, WM_DESTROY
.Def:WinAPI DefWindowProc,[%hWnd],[%uMsg],[%wParam],[%lParam]  ; Ignored events pass to DefWindowProc.
     JMP .Ret:,DIST=NEAR  ; Go to EndProcedure WinProc with value EAX returned from DefWindowProc.
     ; Message handlers terminate with a jump to label .Def: or .Ret0:.
.WM_COMMAND:  ; User clicked on some common control. (LOWORD) wParam specifies which one.
     MOV EAX,0xFFFF
     AND EAX,ESI
 ctrl %FOR %ControlList
       CMP EAX,id%ctrl
       JE .id%ctrl
      %ENDFOR ctrl
     JMP .Def:

.idInpStIgnRadio:
.idInpStHtmRadio:
.idInpStHtmlRadio:
.idOutStBomCheck:
.idOutStTranslRadio:
.idOutStHtmlRadio:
.idOutStQmRadio:
.idOutStIgnRadio:
.EN_SETFOCUS:
.EN_KILLFOCUS:
.Update:CALL WinUpdate
     JMP .Ret0:

.idInpExploreBtn: ; Ctrl-O or [Explore] was pressed.
     WinAPI GetOpenFileName,InpFileDlg
     WinAPI SetWindowText,[hInpEdit],InpFile.Name
     JMP .Update:

.idInpAutodetectBtn:
    WinAPI SendMessage,[hInpEdit],WM_GETTEXTLENGTH,0,0
    TEST EAX
    JNZ .Auto:
 .0:WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Select the input file first.'
    JMP .Ret0:
 .Auto:
    WinAPI SendMessage,[hInpEdit],WM_GETTEXT,MAX_PATH_SIZE,InpFile.Name
    MOVB [EAX+InpFile.Name],0
    WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Detecting input encoding, please wait...'
    MOV ESI,InpFile.Name
    GetLength$ ESI
    StripQuotes ESI,ECX
    JECXZ .0:
    FileAssign  InpFile,ESI,Size=ECX
    FileMapOpen InpFile
    JC .0:
    MOV [DetectSize],EAX
    CMP EAX,WinBlockSize
    JBE .7:
    MOV [DetectSize],WinBlockSize
 .7:ADD EAX,ESI
    MOV [InpBegin],ESI
    MOV [InpEnd],EAX
    MOV EDX,EAX
    CALL WinAutodetect:
    FileClose InpFile
    JMP .Update:

.idOutStAnsiBtn:MOV EDX,[AnsiEncId]
    JMP .SetEnc:
.idOutStOemBtn:MOV EDX,[OemEncId]
 .SetEnc:MOV [OutEncId],EDX
    MOV EBX,[hOutEncBox]
    CALL WinSetEncBox:
    WinAPI SetFocus,[hOutEncBox]
    JMP .Update:

.idOutExploreBtn:  ; Ctrl-S or [Explore] was pressed.
     WinAPI GetSaveFileName,OutFileDlg
     WinAPI SetWindowText,[hOutEdit],OutFile.Name
     JMP .Update:

.idInpEdit:
.idOutEdit:
.idCmdEdit:
    SHR ESI,16
    Dispatch ESI, EN_SETFOCUS, EN_KILLFOCUS
    JMP .Def:
.idInpEncBox:
.idOutEncBox:
     SHR ESI,16
     CMP ESI,LBN_SELCHANGE
     JNE .Def:
     JMP .Update:

.idCmdConvertBtn:
    WinAPI SendMessage,[hCmdEdit],WM_GETTEXT,SIZE# Cmd$, Cmd$
    StdOutput Eol=Yes
    StdOutput Cmd$,Eol=Yes
    WinAPI GetStartupInfo,StartupInfo
    WinAPI CreateProcess,0,Cmd$,0,0,0,0,0,0,StartupInfo,ProcessInfo
    WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Converting, please wait...'
    WinAPI Sleep,999
.idCmdQuitBtn:
.WM_DESTROY: ; GUI program terminates.
     WinAPI PostQuitMessage,0     ; Tell Windows to quit this program with errorlevel 0.
    ; JMP .Ret0:
.Ret0:XOR EAX,EAX
.Ret: MOV [%ReturnEAX],EAX
    EndProcedure WinProc
WinCtrlProc, hWnd, uMsg, wParam, lParam
This is a callback procedure which receives and handles messages for common control windows.
It only handles the keys VK_ESCAPE (quit) and VK_TAB (focus change) and the message WM_SYSKEYDOWN (accelerators); otherwise it invokes the original WndProc of the common control, which was saved in WinProc.WM_CREATE.
Invoked by
WinAPI DispatchMessage.
WinCtrlProc Procedure hWnd, uMsg, wParam, lParam
     ; Find the corresponding original WndProc offset and put it to EDI.
     MOV ESI,[%wParam]
     MOV EDI,hBegin
     MOV ECX,(hEnd-hBegin)/4
     MOV EAX,[%hWnd]
     REPNE SCASD
     JE .Found:
     MOV EDI,hMainWindow
     JMP .Def:
.Found:MOV EBX,EDI ; EBX now points to the next common control window handle in array hBegin..hEnd.
     ADD EDI,PrevProcBegin-hBegin-4 ; EDI now points to the common control original own WndProc.
     MOV EAX,[%uMsg]
     Dispatch EAX,WM_KEYDOWN,WM_SYSKEYDOWN
     JMP .Def:
.WM_KEYDOWN:
     Dispatch ESI,VK_TAB,VK_ESCAPE
.Def:WinAPI CallWindowProc,[EDI],[%hWnd],[%uMsg],[%wParam],[%lParam] ; Common control window's own WndProc is in EDI.
     JMP .Ret:
.WM_SYSKEYDOWN: ; Accelerator key character together with Alt was pressed.
     OR ESI,'X'^'x' ; Convert the character to lower case.
     Dispatch ESI,'e','f','i','a','1','2','3','s','x','o','n','m','5','6','7','8','b','d','c','q'
     JMP .Def:
.e:  MOV EBX,hInpExploreBtn
     JMP .Focus:
.f:  MOV EBX,hInpEdit
     JMP .Focus:
.i:  MOV EBX,hInpEncBox
     JMP .Focus:
.a:  MOV EBX,hInpAutodetectBtn
     JMP .Focus:
.1:  MOV EBX,hInpStIgnRadio
     JMP .Focus:
.2:  MOV EBX,hInpStHtmRadio
     JMP .Focus:
.3:  MOV EBX,hInpStHtmlRadio
     JMP .Focus:
.s:  MOV EBX,hOutEdit
     JMP .Focus:
.x:  MOV EBX,hOutExploreBtn
     JMP .Focus:
.o:  MOV EBX,hOutEncBox
     JMP .Focus:
.n:  MOV EBX,hOutStAnsiBtn
     JMP .Focus:
.m:  MOV EBX,hOutStOemBtn
     JMP .Focus:
.5:  MOV EBX,hOutStTranslRadio
     JMP .Focus:
.6:  MOV EBX,hOutStHtmlRadio
     JMP .Focus:
.7:  MOV EBX,hOutStQmRadio
     JMP .Focus:
.8:  MOV EBX,hOutStIgnRadio
     JMP .Focus:
.b:  MOV EBX,hOutStBomCheck
     JMP .Focus:
.d:  MOV EBX,hCmdEdit
     JMP .Focus:
.c:  MOV EBX,hCmdConvertBtn
     JMP .Focus:
.q:  MOV EBX,hCmdQuitBtn
     JMP .Focus:
.VK_ESCAPE: ; Quit the program.
     WinAPI SendMessage,[hMainWindow],WM_DESTROY,0,0
     JMP .Ret0:
.VK_TAB: ; Move focus to other common control window.
     WinAPI GetAsyncKeyState,VK_SHIFT
     SAL AX,1
     JNC .NoShift:
     ; Shift-TAB was pressed, cycle backward to previous window (EBX-8).
     SUB EBX,8 ; EBX now points to the previous window handle.
     CMP EBX,hBegin:
     JAE .NoShift:
     MOV EBX,hEnd-4 ; Cycle from the top window to the bottom.
.NoShift: ; Shift was not pressed, move focus to the next window (EBX+0).
     CMP EBX,hEnd
     JB .Focus:
     MOV EBX,hBegin:  ; Cycle from the bottom window to the topmost.
.Focus: ; Set focus to window whose handle is addressed by EBX.
     WinAPI SetFocus,[EBX]
.Ret0:SUB EAX,EAX
.Ret:MOV [%ReturnEAX],EAX
    EndProcedure WinCtrlProc
↑ WinCreate
Register class of the main GUI program window and create all common control windows as children of [hMainWindow].
Input
-
Output
-
Called by
WinGui
WinCreate PROC ; Main window.
    MOV [WndClassEx.cbSize],SIZE# WNDCLASSEX
    MOV [WndClassEx.lpszClassName],WndClassName
    MOV [WndClassEx.style],CS_HREDRAW|CS_VREDRAW
    MOV [WndClassEx.lpfnWndProc],WinProc
    WinAPI GetModuleHandle,0
    MOV [WndClassEx.hInstance],EAX
    WinAPI LoadIcon,[WndClassEx.hInstance],="#1" ; PROGRAM IconFile= property is registered as Nr.1.
    MOV [WndClassEx.hIcon],EAX
    MOV [WndClassEx.hbrBackground],COLOR_BTNFACE +1
    WinAPI RegisterClassEx,WndClassEx
    ; Main window.
    WinAPI CreateWindowEx, WS_EX_CLIENTEDGE,                        \
           WndClassName, WndClassName, WS_OVERLAPPEDWINDOW,         \
           CW_USEDEFAULT,CW_USEDEFAULT,660,786, \
           0, 0, [WndClassEx.hInstance], 0
    MOV [hMainWindow],EAX
   ; EuroConv icon.
    WinAPI CreateWindowEx,0,="STATIC",="#1",WS_CHILD+WS_VISIBLE+SS_ICON,\
           12,0,32,32,[hMainWindow],0,[WndClassEx.hInstance],0
    WinAPI SendMessage,EAX,STM_SETICON,[WndClassEx.hIcon],0
    ; Title
    WinAPI CreateWindowEx,0,="STATIC",InfoText,\
           WS_CHILD+WS_VISIBLE,\
           50,10,540,20,[hMainWindow],0,[WndClassEx.hInstance],0
   ; Input form.
    WinAPI CreateWindowEx,0,='STATIC',0, \
           WS_CHILD+WS_VISIBLE+SS_BLACKFRAME,\
           10,30,620,274,[hMainWindow],0,[WndClassEx.hInstance],0
    ; Initialize input Explore dialogue, see OPENFILENAME.
    MOV [InpFileDlg.lStructSize],SIZE# OPENFILENAME
    MOV EAX,[hMainWindow]
    MOV [InpFileDlg.hwndOwner],EAX
    MOV [InpFileDlg.lpstrFile],InpFile.Name
    MOV [InpFileDlg.nMaxFile],SIZE# InpFile.Name
    MOV [InpFileDlg.Flags],OFN_FILEMUSTEXIST
    WinAPI CreateWindowEx,0,="STATIC",="Input &file name to open",WS_CHILD+WS_VISIBLE ,\
          50,42,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
    ; Input button [Explore].
    WinAPI CreateWindowEx,0,='BUTTON',="&Explore",WS_CHILD+WS_VISIBLE+BS_DEFPUSHBUTTON,\
           500,36,120,20,[hMainWindow],idInpExploreBtn,[WndClassEx.hInstance],0
    MOV [hInpExploreBtn],EAX
    ; Edit input file name.
    WinAPI CreateWindowEx,0,='EDIT',InpFile.Name,WS_CHILD+WS_VISIBLE+WS_BORDER+ES_AUTOHSCROLL, \
           20,60,600,20,[hMainWindow],idInpEdit,[WndClassEx.hInstance],0
    MOV [hInpEdit],EAX
    WinAPI SendMessage,EAX,EM_SETLIMITTEXT,SIZE# InpFile.Name,0
    WinAPI CreateWindowEx,0,="STATIC",="&Input file encoding",WS_CHILD+WS_VISIBLE,\
           50,91,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
    ; Input button [Autodetect].
    WinAPI CreateWindowEx,0,='BUTTON',="&Autodetect",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
           500,84,120,20,[hMainWindow],idInpAutodetectBtn,[WndClassEx.hInstance],0
    MOV [hInpAutodetectBtn],EAX
    ; Listbox for selection of input encoding.
    WinAPI CreateWindowEx,0,='LISTBOX',0,WS_CHILD+WS_VISIBLE+WS_BORDER+WS_VSCROLL+WS_HSCROLL+LBS_NOTIFY,\
           20,109,600,140,[hMainWindow],idInpEncBox,[WndClassEx.hInstance],0
    MOV [hInpEncBox],EAX
    MOV EDI,EAX
    CALL WinFillEncBox
    ; Radiobuttons for input HTML entities.
    WinAPI CreateWindowEx,0,='BUTTON',="&1 Ignore all HTML entities.",\
            WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON+WS_GROUP,\
            50,240,440,20,[hMainWindow],idInpStIgnRadio,[WndClassEx.hInstance],0
    MOV [hInpStIgnRadio],EAX
    WinAPI SendMessage,EAX,BM_SETCHECK,1,0  ; Use "Ignore all" as default selection.
    WinAPI CreateWindowEx,0,='BUTTON',="&2 Ignore ASCII HTML entities &&amp; &&lt; &&gt; &&quot;.",\
           WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\ SS_NOPREFIX is not used here: with it the button doesn't show any text.
            50,260,440,20,[hMainWindow],idInpStHtmRadio,[WndClassEx.hInstance],0
    MOV [hInpStHtmRadio],EAX
    WinAPI CreateWindowEx,0,='BUTTON',="&3 Convert all HTML entities.",\
           WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
            50,280,440,20,[hMainWindow],idInpStHtmlRadio,[WndClassEx.hInstance],0
    MOV [hInpStHtmlRadio],EAX
   ; Output form.
    WinAPI CreateWindowEx,0,='STATIC',0, \
           WS_CHILD+WS_VISIBLE+WS_GROUP+SS_BLACKFRAME,\
           10,320,620,294,[hMainWindow],0,[WndClassEx.hInstance],0
    WinAPI CreateWindowEx,0,="STATIC",="Output file name to &save",WS_CHILD+WS_VISIBLE ,\
          50,332,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
    ; Initialize output Explore dialogue, see OPENFILENAME.
    MOV [OutFileDlg.lStructSize],SIZE# OPENFILENAME
    MOV EAX,[hMainWindow]
    MOV [OutFileDlg.hwndOwner],EAX
    MOV [OutFileDlg.lpstrFile],OutFile.Name
    MOV [OutFileDlg.nMaxFile],SIZE# OutFile.Name
    MOV [OutFileDlg.Flags],OFN_OVERWRITEPROMPT
    ; Output button [Explore].
    WinAPI CreateWindowEx,0,='BUTTON',="E&xplore",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
           500,326,120,20,[hMainWindow],idOutExploreBtn,[WndClassEx.hInstance],0
    MOV [hOutExploreBtn],EAX
    ; Edit output file name.
    WinAPI CreateWindowEx,0,='EDIT',OutFile.Name,WS_CHILD+WS_VISIBLE+WS_BORDER+ES_AUTOHSCROLL, \
           20,350,600,20,[hMainWindow],idOutEdit,[WndClassEx.hInstance],0
    MOV [hOutEdit],EAX
    WinAPI SendMessage,EAX,EM_SETLIMITTEXT,SIZE# OutFile.Name,0
    WinAPI CreateWindowEx,0,="STATIC",="&Output file encoding",WS_CHILD+WS_VISIBLE,\
           50,381,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
    ; Output button [ANSI].
    WinAPI CreateWindowEx,0,='BUTTON',="A&NSI",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
           360,378,120,20,[hMainWindow],idOutStAnsiBtn,[WndClassEx.hInstance],0
    MOV [hOutStAnsiBtn],EAX
    ; Output button [OEM].
    WinAPI CreateWindowEx,0,='BUTTON',="OE&M",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
           500,378,120,20,[hMainWindow],idOutStOemBtn,[WndClassEx.hInstance],0
    MOV [hOutStOemBtn],EAX
    ; Listbox for selection of output encoding.
    WinAPI CreateWindowEx,0,='LISTBOX',0,WS_CHILD+WS_VISIBLE+WS_BORDER+WS_VSCROLL+WS_HSCROLL+LBS_NOTIFY,\
           20,399,600,140,[hMainWindow],idOutEncBox,[WndClassEx.hInstance],0
    MOV [hOutEncBox],EAX
    MOV EDI,EAX
    CALL WinFillEncBox
    WinAPI SendMessage,EDI,LB_SETCURSEL,0,0  ; Use 0-th option as default selection.
    ; Radiobuttons for invalid output characters.
    WinAPI CreateWindowEx,0,='BUTTON',="&5 Transliterate invalid characters to ASCII Latin.",\
            WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON+WS_GROUP,\
            50,530,420,20,[hMainWindow],idOutStTranslRadio,[WndClassEx.hInstance],0
    MOV [hOutStTranslRadio],EAX
    WinAPI SendMessage,EAX,BM_SETCHECK,1,0  ; Use "Transliterate" as default selection.
    WinAPI CreateWindowEx,0,='BUTTON',="&6 Convert invalid characters to HTML entities.",\
           WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
            50,550,420,20,[hMainWindow],idOutStHtmlRadio,[WndClassEx.hInstance],0
    MOV [hOutStHtmlRadio],EAX
    WinAPI CreateWindowEx,0,='BUTTON',="&7 Replace invalid characters with '?'.",\
            WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
            50,570,420,20,[hMainWindow],idOutStQmRadio,[WndClassEx.hInstance],0
    MOV [hOutStQmRadio],EAX
    WinAPI CreateWindowEx,0,='BUTTON',="&8 Ignore invalid characters.",\
           WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
            50,590,420,20,[hMainWindow],idOutStIgnRadio,[WndClassEx.hInstance],0
    MOV [hOutStIgnRadio],EAX
    ; Output checkbox [Use BOM in UTF].
    WinAPI CreateWindowEx,0,='BUTTON',="Use &BOM in UTF",WS_CHILD+WS_VISIBLE+BS_AUTOCHECKBOX+WS_GROUP,\
           460,590,140,20,[hMainWindow],idOutStBomCheck,[WndClassEx.hInstance],0
    MOV [hOutStBomCheck],EAX
   ; Command form.
    WinAPI CreateWindowEx,0,='STATIC',0, \
           WS_CHILD+WS_VISIBLE+WS_GROUP+SS_BLACKFRAME,\
           10,630,620,84,[hMainWindow],0,[WndClassEx.hInstance],0
    WinAPI CreateWindowEx,0,="STATIC",="Comman&d",WS_CHILD+WS_VISIBLE ,\
          50,642,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
    ; Edit the command.
    WinAPI CreateWindowEx,0,='EDIT',Cmd$,WS_CHILD+WS_VISIBLE+WS_BORDER+ES_AUTOHSCROLL, \
           20,660,600,20,[hMainWindow],idCmdEdit,[WndClassEx.hInstance],0
    MOV [hCmdEdit],EAX
    WinAPI SendMessage,EAX,EM_SETLIMITTEXT,SIZE# Cmd$,0
    ; Command button [Quit].
    WinAPI CreateWindowEx,0,='BUTTON',="&Quit",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
           20,684,120,20,[hMainWindow],idCmdQuitBtn,[WndClassEx.hInstance],0
    MOV [hCmdQuitBtn],EAX
    ; Command button [Convert].
    WinAPI CreateWindowEx,0,='BUTTON',="&Convert",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
           500,684,120,20,[hMainWindow],idCmdConvertBtn,[WndClassEx.hInstance],0
    MOV [hCmdConvertBtn],EAX
    ; Status bar.
    WinAPI CreateStatusWindow,WS_CHILD+WS_BORDER+WS_VISIBLE,='EuroConv',[hMainWindow],idStatusBar
    MOV [hStatusBar],EAX
    WinAPI SendMessage,EAX,SB_SIMPLE,1,0  ; Tell the status bar to be simple.
    WinAPI SetFocus,[hInpExploreBtn]
    ; Common control child windows are now created.
    ; Let's hook their WndProc and replace it with WinCtrlProc, which will process VK_TAB.
 ctrl %FOR %ControlList
        WinAPI SetWindowLong,[h%ctrl], GWL_WNDPROC, WinCtrlProc
        MOV [PrevProc%ctrl],EAX
      %ENDFOR ctrl
    RET
   ENDP WinCreate
↑ WinFillEncBox
Fill the listbox whose handle is in EDI with the names and remarks of all supported encodings.
Input
EDI=[hInpEncBox] or [hOutEncBox]
Output
EAX,EBX,ECX,EDX,ESI are undefined.
Called by
WinCreate
WinFillEncBox PROC ; Fill the listbox EDI with all supported encoding names+remarks.
    WinAPI SendMessage,EDI, LB_SETHORIZONTALEXTENT,400,0
 opt %FOR 'CP20127, "ASCII"',   \
          'CP1200,  "UTF-16LE"',\
          'CP1201,  "UTF-16BE"',\
          '                "UTF-16" (endianness will be autodetected)',\
          'CP12000, "UTF-32LE"',\
          'CP12001, "UTF-32BE"',\
          '                "UTF-32" (endianness will be autodetected)',\
          'CP65001, "UTF-8"'
       WinAPI SendMessage,EDI, LB_ADDSTRING,0,=%opt
     %ENDFOR opt
     SUB EDX,EDX ; Encoding index 0..[CodePages]-1.
 .10:MOV EAX,[CPid]
     MOVZXW EAX,[EAX+2*EDX]
     Dispatch AX,20127,65001,1200,1201,12000,12001 ; Skip the code pages which were already listed above.
     PUSH EDI
      MOV EDI,TempString
      MOVW [EDI],'CP'
      INC EDI,EDI
      StoD EDI
      MOV AX,', '
      STOSW
      MOV AL,'"'
      STOSB
      MOV EBX,[CPname]
      MOVZXW EBX,[EBX+2*EDX]
      MOV ESI,[CPinfo]
      ADD ESI,EBX
      GetLength$ ESI
      REP MOVSB
      MOV AX,'" '
      STOSW
      MOV EBX,[CPrem]
      MOVZXW EBX,[EBX+2*EDX]
      MOV ESI,[CPinfo]
      ADD ESI,EBX
      GetLength$ ESI
      REP MOVSB
      SUB EAX,EAX
      STOSB
     POP EDI
     PUSH EDX
      WinAPI SendMessage,EDI, LB_ADDSTRING,0,TempString
     POP EDX
.20127:
.65001:
.1200:
.1201:
.12000:
.12001:
    INC EDX
    CMP EDX,[CodePages]
    JB .10:
    RET
   ENDP WinFillEncBox
↑ WinUpdate
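Update the main window after the user has clicked one of its common controls: rebuild the command line in Cmd$ from the current state of the controls and show it in the command edit box.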
Input
-
Output
All registers are preserved.
Invoked by
WinGui, WinProc.
WinUpdate PROC
   PUSHAD
    MOV EDI,Cmd$ ; Assemble the command line in Cmd$.
    MOV ESI,='euroconv '
    MOV ECX,9
    REP MOVSB
    WinAPI SendMessage,[hInpEncBox],LB_GETCURSEL,0,0
    INC EAX
    JZ .10:  ; On LB_ERR=-1 use 0-th option.
    DEC EAX
.10:WinAPI SendMessage,[hInpEncBox],LB_GETTEXT,EAX,EDI
    MOV AL,'"'
    MOV ECX,32
    MOV EBX,EDI
    REPNE SCASB ; Find 1st quote.
    MOV ESI,EDI
    REPNE SCASB ; Find 2nd quote.
    DEC EDI
    MOV ECX,EDI
    SUB ECX,ESI
    MOV EDI,EBX
    REP MOVSB ; Keep only the input encoding name which was between the quotes.
    WinAPI SendMessage,[hInpStHtmRadio],BM_GETCHECK,0,0
    TEST EAX
    JZ .25:
    MOV ESI,="/HTM"
    MOV ECX,4
    REP MOVSB
.25:WinAPI SendMessage,[hInpStHtmlRadio],BM_GETCHECK,0,0
    TEST EAX
    JZ .30:
    MOV ESI,="/HTML"
    MOV ECX,5
    REP MOVSB
.30:MOV AL,' '
    STOSB
    WinAPI SendMessage,[hOutEncBox],LB_GETCURSEL,0,0
    INC EAX
    JZ .35:  ; On LB_ERR=-1 use 0-th option.
    DEC EAX
.35:WinAPI SendMessage,[hOutEncBox],LB_GETTEXT,EAX,EDI
    MOV AL,'"'
    MOV ECX,32
    MOV EBX,EDI
    REPNE SCASB ; Find 1st quote.
    MOV ESI,EDI
    REPNE SCASB ; Find 2nd quote.
    DEC EDI
    MOV ECX,EDI
    SUB ECX,ESI
    MOV EDI,EBX
    REP MOVSB ; Keep only the output encoding name which was between the quotes.
    WinAPI SendMessage,[hOutStBomCheck],BM_GETCHECK,0,0
    TEST EAX
    JZ .48:
    MOV ESI,="/BOM"
    MOV ECX,4
    REP MOVSB
.48:WinAPI SendMessage,[hOutStHtmlRadio],BM_GETCHECK,0,0
    TEST EAX
    JZ .50:
    MOV ESI,="/HTML"
    MOV ECX,5
    REP MOVSB
.50:WinAPI SendMessage,[hOutStQmRadio],BM_GETCHECK,0,0
    TEST EAX
    JZ .55:
    MOV ESI,="/QM"
    MOV ECX,3
    REP MOVSB
.55:WinAPI SendMessage,[hOutStIgnRadio],BM_GETCHECK,0,0
    TEST EAX
    JZ .60:
    MOV ESI,="/IGN"
    MOV ECX,4
    REP MOVSB
.60:MOV AL,' '
    STOSB
    WinAPI SendMessage,[hInpEdit],WM_GETTEXTLENGTH,0,0
    TEST EAX
    JNZ .65:
    MOV EAX,"NUL "
    STOSD
    JMP .70:
.65:MOV ESI,EAX
    MOV AL,'"'
    STOSB
    WinAPI SendMessage,[hInpEdit],WM_GETTEXT,MAX_PATH_SIZE,EDI
    ADD EDI,ESI
    MOV AX,'" '
    STOSW
.70:WinAPI SendMessage,[hOutEdit],WM_GETTEXTLENGTH,0,0
    TEST EAX
    JNZ .75:
    MOV EAX,"NUL "
    STOSD
    JMP .80:
.75:MOV ESI,EAX
    MOV AL,'"'
    STOSB
    WinAPI SendMessage,[hOutEdit],WM_GETTEXT,MAX_PATH_SIZE,EDI
    ADD EDI,ESI
    MOV AX,'" '
    STOSW
.80:MOV EAX,"/W" ; 5th argument tells the console version to wait for [Enter] before it terminates.
    STOSD ; The two upper zero bytes of EAX terminate the string.
    WinAPI SetWindowText,[hCmdEdit],Cmd$
    WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Select text file names and their encodings, then press [Convert].'
   POPAD
   RET
   ENDP WinUpdate
 ENDPROGRAM euroconv
