This file, euroconv.htm,
is the EuroAssembler source of the program
EuroConvertor, which converts text files between encodings.
A single executable file euroconv.exe
works both in 16bit DOS and in 32bit or 64bit Windows.
The DOS version is assembled first and then linked as an MZ stub file into the PE Windows version.
The static database of supported characters and conversion tables is shared by both versions.
EuroConvertor expects four arguments on its command line in fixed order:
euroconv.exe IBM852 UTF-8 input.txt output.txt
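To make the four-argument contract concrete, here is a minimal Python sketch of an equivalent conversion. It is not how EuroConvertor works internally: it relies on Python's built-in codecs instead of the tables in this source. The function name `convert_file` is hypothetical, and it is an assumption of this sketch that the name IBM852 should map to Python's `cp852` codec.

```python
import codecs

def convert_file(inp_enc, out_enc, inp_path, out_path):
    """Re-encode the text in inp_path (inp_enc) into out_path (out_enc)."""
    # codecs.lookup() resolves aliases such as "IBM852" and raises
    # LookupError for encoding names Python does not know.
    src = codecs.lookup(inp_enc).name
    dst = codecs.lookup(out_enc).name
    # newline="" keeps line endings exactly as they are in the input.
    with open(inp_path, "r", encoding=src, newline="") as f:
        text = f.read()
    # errors="replace" writes '?' for characters missing in the target
    # encoding (loosely comparable to the question-mark option, encStQm).
    with open(out_path, "w", encoding=dst, errors="replace", newline="") as f:
        f.write(text)
```

Calling `convert_file("IBM852", "UTF-8", "input.txt", "output.txt")` mirrors the command line shown above, with the arguments in the same fixed order.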
When the program is run without arguments (by a click in Explorer, for instance), it launches the interactive GUI version, where the arguments can be selected from a menu.
If you are interested in EuroConvertor and don't want to download the whole EuroAssembler necessary to compile its source, you can download the assembled binary euroconv.exe together with a more detailed manual from the vit$oft freeware website as the file euroconv.zip [42 KB].
If the library ..\objlib\winapi.lib does not exist yet,
create it in the prowin32 subdirectory with command
euroasm dll2lib.htm
Then build EuroConvertor in the prowin32 subdirectory with command
euroasm euroconv.htm
Run it as
euroconv InputEncoding OutputEncoding InputFile OutputFile
The header defines the format of the DOS program (an MZ stub created as eurocond.exe
in the current directory).
It also specifies the included macro libraries, named constants and static data.
The order of segments is important in this program: the [DATA] segment must be linked first, because the Windows version searches for the Unicode table directory at the beginning of its stub. That is why the segment map is explicitly specified here.
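The Python sketch below illustrates why a fixed position of [DATA] helps a reader of the stub. It is an illustration under stated assumptions, not the Windows version's actual code: it scans a byte image for the 8-character signature and unpacks the fields according to the TABLEDIR structure declared in this header (8 signature bytes followed by 19 little-endian WORDs). The function name `find_tabledir` and the idea of using a plain substring search are assumptions of this sketch.

```python
import struct

SIGNATURE = b"TableDir"               # value of %Signature in the header
TABLEDIR = struct.Struct("<8s19H")    # 8 signature bytes + 19 WORD members

# Member names in the order in which TABLEDIR STRUC declares them.
FIELDS = ("CodePoint", "Relevance", "Translit", "CodePoints",
          "EntVal4", "EntName4", "Entities4",
          "EntVal8", "EntName8A", "EntName8B", "Entities8",
          "CPid", "CPname", "CPrem", "CPurl", "CPtable",
          "CPinfo", "CPtt", "CodePages")

def find_tabledir(image: bytes):
    """Return (offset, members) of the first TABLEDIR in image, or None."""
    pos = image.find(SIGNATURE)
    if pos < 0 or pos + TABLEDIR.size > len(image):
        return None
    _sig, *words = TABLEDIR.unpack_from(image, pos)
    return pos, dict(zip(FIELDS, words))
```

Because the directory sits at the very start of the first linked segment, such a search can stop early instead of scanning the whole stub.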
Segment [DATA] hosts two databases implemented as sections;
each section contains one array of bytes, words or dwords:
1. UCP database with codepoints of Unicode characters from
Basic Multilingual Plane (BMP) and their properties: CodePoint value,
character category and its relevance in text, Latin transliteration,
corresponding HTML entity.
2. CP database with code page encodings supported by EuroConv,
and their properties: CP identifier, standard and alternative name,
authoritative URL where it was defined, translation table.
The header also defines some constants and structures which are common to both the DOS and Windows variants.
EUROASM Unicode=Off
eurocond PROGRAM Format=MZ, Model=Compact, Width=16, Entry=DosMain
; Includes macro libraries from maclib directory:
  INCLUDEHEAD1 doss.htm, dosapi.htm, stdcal16.htm, status16.htm, cpuext.htm, cpuext16.htm, string16.htm
; Desired segment map :
[DATA] SEGMENT PURPOSE=DATA ; Static data segment.
  [CodePoint] ; Array of WORDs with codepoint value of character.
  [Relevance] ; Array of BYTEs with signed values of probability that this char.category appears in text.
  [Translit]  ; Array of DWORDs with transliterated ASCII characters, NUL padded.
  [EntVal4]   ; Array of WORDs with codepoint values of HTML entities which have 1..4 characters.
  [EntName4]  ; Array of DWORDs with HTML entities, which have 1..4 characters. NUL padded.
  [EntVal8]   ; Array of WORDs with codepoint values of HTML entities which have 5..8 characters.
  [EntName8A] ; Array of DWORDs with first four characters of entities which have 5..8 characters.
  [EntName8B] ; Array of DWORDs with fifth to eighth characters of entities which have 5..8 characters. NUL padded.
  [CPid]      ; Array of WORDs with codepage identifier in MS assignment.
  [CPname]    ; Array of WORDs with offsets of CP display name in [CPinfo] section.
  [CPrem]     ; Array of WORDs with offsets of CP remark in [CPinfo] section.
  [CPurl]     ; Array of WORDs with offsets of CP URL in [CPinfo] section.
  [CPtable]   ; Array of WORDs with offsets of 8bit translation tables in [CPtt] section.
  [CPinfo]    ; Zero-terminated byte strings with codepage names, remarks, URLs.
  [CPtt]      ; Translation tables of OEM/ANSI encodings. Each table has 128 WORDs.
[CODE] SEGMENT PURPOSE=CODE ; Program code segment.
[INPUT] SEGMENT PURPOSE=BSS ; Input file read area.
  D DataBlockSize * BYTE ; Reserve 48 KB of uninitialized space.
[OUTPUT] SEGMENT PURPOSE=BSS ; Output file write area.
  D DataBlockSize * BYTE ; Reserve 48 KB of uninitialized space.
[STACK] SEGMENT PURPOSE=STACK ; Machine stack.
  D DosStackSize * WORD ; Reserve 16 KB of uninitialized space.
[DATA] ; Switch back to data segment.
HEAD ; The following part of DosHeader will be included to Windows version of EuroConvertor, too.
; Constants:
%Version      %SET 20190330 ; It will be displayed by euroconv.exe /? or euroconv.exe --help.
%Signature    %SET TableDir ; 8 characters for TABLEDIR identification.
Replacement   EQU 0xFFFD    ; Codepoint of unsupported character.
DataBlockSize EQU 48K       ; I/O file operation block size. Max.64K-4, DWORD aligned.
DosStackSize  EQU 8K        ; Stack size reserved for DOS version. WORD aligned.
; Boolean encoding property names used as flags in InpEncSt and OutEncSt:
encStTransl = 0x0000 ; Characters unsupported by output encoding are transliterated to visually similar ASCII (default).
encStIgn    = 0x0001 ; Characters unsupported by output encoding are ignored (omitted from output).
encStQm     = 0x0002 ; Characters unsupported by output encoding are replaced with question mark.
encStHtm    = 0x0004 ; HTML entities for UCP>127 are detected on input and encoded on output if not defined in CP.
encStHtml   = 0x0008 ; All HTML entities are detected on input and encoded on output if not defined in CP.
encStOem    = 0x0010 ; System default OEM encoding.
encStAnsi   = 0x0020 ; System default ANSI encoding.
encStAscii  = 0x0040 ; CodePage is ASCII (7bit).
encStUtf    = 0x0080 ; CodePage is Unicode (UTF).
encStUtf8   = 0x0100 ; CodePage is UTF-8.
encStUtf16  = 0x0200 ; CodePage is UTF-16.
encStUtf32  = 0x0400 ; CodePage is UTF-32.
encStLe     = 0x0800 ; CodePage is Little Endian encoded.
encStBe     = 0x1000 ; CodePage is Big Endian encoded.
encStBom    = 0x2000 ; CodePage has BOM.
encStAuto   = 0x4000 ; Autodetection of encoding was requested.
encStEnc    = 0x8000 ; Display the list of all supported encodings.
; Character categories and their assigned relevance used for autodetection of input encoding:
Bm = +32 ; Byte order mark (FEFF).
Cc = -8  ; Other, control
Cf = -4  ; Other, format
Co = -6  ; Other, Private Use
Fm = +4  ; Format control (LF,CR,TAB,space)
Ll = +8  ; Letter, lowercase
Lm = -2  ; Letter, modifier
Lo = +1  ; Letter, other
Lu = +7  ; Letter, uppercase
Mn = -6  ; Mark, nonspacing
Nd = +4  ; Number, decimal digit
No = +2  ; Number, other
Pd = +1  ; Punctuation, dash
Pe = +1  ; Punctuation, close
Pf = +1  ; Punctuation, final quote
Pi = +1  ; Punctuation, initial quote
Po = +1  ; Punctuation, other
Ps = +1  ; Punctuation, open
Sc = +1  ; Symbol, currency
Sk = -5  ; Symbol, modifier
Sm = -4  ; Symbol, math
So = -5  ; Symbol, other
Zs = +2  ; Separator, space
?? = -32 ; Not a valid character.
TABLEDIR STRUC ; Section directory keeps addresses of database sections.
.Signature  D 8*B ; Random text used for TableDir identification.
.CodePoint  D W   ; Offset of WORD array in section [CodePoint].
.Relevance  D W   ; Offset of BYTE array in section [Relevance].
.Translit   D W   ; Offset of DWORD array in section [Translit].
.CodePoints D W   ; The number of supported codepoints, i.e. the length of previous arrays.
.EntVal4    D W   ; Offset of WORD array in section [EntVal4].
.EntName4   D W   ; Offset of DWORD array in section [EntName4].
.Entities4  D W   ; The number of supported HTML entities with 1..4 characters.
.EntVal8    D W   ; Offset of WORD array in section [EntVal8].
.EntName8A  D W   ; Offset of DWORD array in section [EntName8A].
.EntName8B  D W   ; Offset of DWORD array in section [EntName8B].
.Entities8  D W   ; The number of supported HTML entities with 5..8 characters.
.CPid       D W   ; Offset of WORD array in section [CPid].
.CPname     D W   ; Offset of WORD array in section [CPname].
.CPrem      D W   ; Offset of WORD array in section [CPrem].
.CPurl      D W   ; Offset of WORD array in section [CPurl].
.CPtable    D W   ; Offset of WORD array in section [CPtable].
.CPinfo     D W   ; Offset of ASCIIZ strings in section [CPinfo].
.CPtt       D W   ; Offset of 128*WORD blocks in section [CPtt].
.CodePages  D W   ; The number of supported encodings, i.e. the length of CP* arrays.
ENDSTRUC TABLEDIR
ENDHEAD
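The relevance values above suggest how autodetection of the input encoding could work: decode the data under each candidate encoding, add up the relevance of every resulting character, and prefer the candidate with the highest total. The following Python sketch shows that idea only. The actual detection routine is not part of this section, so the functions `score` and `guess` are hypothetical, the general categories come from Python's `unicodedata` rather than from the UCP table (which also adjusts individual characters, e.g. `Lu+2`), and treating categories missing from the table as invalid is an assumption.

```python
import unicodedata

# Relevance per general category, copied from the constants above.
RELEVANCE = {"Cc": -8, "Cf": -4, "Co": -6, "Ll": 8, "Lm": -2, "Lo": 1,
             "Lu": 7, "Mn": -6, "Nd": 4, "No": 2, "Pd": 1, "Pe": 1,
             "Pf": 1, "Pi": 1, "Po": 1, "Ps": 1, "Sc": 1, "Sk": -5,
             "Sm": -4, "So": -5, "Zs": 2}
BOM = 32              # Bm: byte order mark
FORMAT = 4            # Fm: LF, CR, TAB, space
INVALID = -32         # ??: not a valid character

def score(data: bytes, encoding: str) -> int:
    """Sum the relevance of all characters of data decoded as encoding."""
    total = 0
    for ch in data.decode(encoding, errors="replace"):
        if ch == "\ufffd":            # byte sequence invalid in this encoding
            total += INVALID
        elif ch == "\ufeff":
            total += BOM
        elif ch in "\n\r\t ":
            total += FORMAT
        else:
            # Assumption: categories absent from the table count as invalid.
            total += RELEVANCE.get(unicodedata.category(ch), INVALID)
    return total

def guess(data: bytes, candidates):
    """Return the candidate encoding with the highest relevance score."""
    return max(candidates, key=lambda enc: score(data, enc))
```

Text decoded with the wrong code page tends to produce symbols, modifiers and control characters where letters were meant, which is why letters carry the highest positive weights.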
The database of supported Unicode characters, their properties and conversion tables
is located in sections of segment [DATA].
Data are stored as synchronized arrays of bytes, words or dwords,
each array in its own section. This arrangement makes it possible to search for an item with the single instruction
REPNE SCAS
and then retrieve its other properties from the same index in the synchronized arrays.
The following arrays are mutually synchronized:
CodePoint with Relevance and Translit (their length is [TableDir.CodePoints]).
EntVal4 with EntName4 (their length is [TableDir.Entities4]).
EntVal8 with EntName8A and EntName8B (their length is [TableDir.Entities8]).
CPid with CPname, CPrem, CPurl, CPtable (their length is [TableDir.CodePages]).
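The lookup pattern that synchronized arrays allow can be sketched in Python as follows. This is an illustration only: the linear `list.index` search stands in for REPNE SCASW, the four sample rows are taken from the UCP table in this source, and the function name `lookup` is hypothetical.

```python
# Three synchronized arrays; item i of each describes the same character.
CODEPOINT = [0x0041, 0x00C1, 0x010C, 0x00DF]
RELEVANCE = [7, 9, 9, 8]                      # Lu, Lu+2, Lu+2, Ll
TRANSLIT = [b"A\0\0\0", b"A\0\0\0", b"C\0\0\0", b"ss\0\0"]  # NUL-padded DWORDs

def lookup(codepoint):
    """Return (relevance, transliteration) for codepoint, or None."""
    try:
        i = CODEPOINT.index(codepoint)        # linear scan, like REPNE SCASW
    except ValueError:
        return None                           # codepoint is not in the table
    return RELEVANCE[i], TRANSLIT[i].rstrip(b"\0").decode("ascii")
```

Keeping each property in its own homogeneous array means the scan touches only the key array, and the index found there addresses all the other properties directly.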
At the very beginning of the [DATA] segment resides a structured variable TableDir, which specifies the addresses of the database sections. The sections are filled at assembly time by the macros CP and UCP.
[DATA]
TableDir DS TABLEDIR, \ Its members are defined statically at asm-time:
  .Signature= "%Signature", \ Random text used for TableDir identification.
  .CodePoint= SECTION# [CodePoint], \ Offset of WORD array in section [CodePoint].
  .Relevance= SECTION# [Relevance], \ Offset of BYTE array in section [Relevance].
  .Translit= SECTION# [Translit] , \ Offset of DWORD array in section [Translit].
  .CodePoints=SIZE# [CodePoint] /2, \ The number of supported codepoints, i.e. the length of previous arrays.
  .EntVal4= SECTION# [EntVal4] , \ Offset of WORD array in section [EntVal4].
  .EntName4= SECTION# [EntName4] , \ Offset of DWORD array in section [EntName4].
  .Entities4= SIZE# [EntVal4] / 2 , \ The number of supported HTML entities with 1..4 characters.
  .EntVal8= SECTION# [EntVal8] , \ Offset of WORD array in section [EntVal8].
  .EntName8A= SECTION# [EntName8A], \ Offset of DWORD array in section [EntName8A].
  .EntName8B= SECTION# [EntName8B], \ Offset of DWORD array in section [EntName8B].
  .Entities8= SIZE# [EntVal8] / 2 , \ The number of supported HTML entities with 5..8 characters.
  .CPid= SECTION# [CPid] , \ Offset of WORD array in section [CPid].
  .CPname= SECTION# [CPname] , \ Offset of WORD array in section [CPname] (offsets in CPinfo).
  .CPrem= SECTION# [CPrem] , \ Offset of WORD array in section [CPrem] (offsets in CPinfo).
  .CPurl= SECTION# [CPurl] , \ Offset of WORD array in section [CPurl] (offsets in CPinfo).
  .CPtable= SECTION# [CPtable] , \ Offset of WORD array in section [CPtable] (offsets in CPtt).
  .CPinfo= SECTION# [CPinfo] , \ Offset of ASCIIZ strings with name, rem and url in section [CPinfo].
  .CPtt= SECTION# [CPtt] , \ Offset of 128*WORD blocks in section [CPtt].
  .CodePages= SIZE# [CPid] / 2, \ The number of supported encodings, i.e. the length of CP* arrays.
;
UCP %MACRO CodePoint, Relevance, Translit, Entity
[CodePoint]
   DW 0x%CodePoint
[Relevance]
   DB %Relevance
[Translit]
   DD %Translit + 0
%EntSize %SETS %Entity
   %IF %EntSize <= 4 && %EntSize >= 1
[EntVal4]
     DW 0x%CodePoint
[EntName4]
     DD "%Entity"
   %ENDIF
   %IF %EntSize <=8 && %EntSize >=5
[EntVal8]
     DW 0x%CodePoint
[EntName8A]
     DD "%Entity[1..4]"
[EntName8B]
     DD "%Entity[5..8]"
   %ENDIF
   %IF %EntSize > 8
     %ERROR HTML entity %Entity is too long.
   %ENDIF
[DATA]
 %ENDMACRO UCP
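How the UCP macro distributes an HTML entity name into sections can be mirrored in a few lines of Python. This is a sketch for illustration, not part of the program: the function names `pad4` and `ucp_entity_sections` are hypothetical, and the returned dictionary merely stands for "what gets appended to which section".

```python
def pad4(text: str) -> bytes:
    """Encode up to four ASCII characters as one NUL-padded DWORD."""
    return text.encode("ascii").ljust(4, b"\0")

def ucp_entity_sections(codepoint: int, entity: str) -> dict:
    """Return the items one UCP invocation appends to the entity sections."""
    size = len(entity)
    if size > 8:
        raise ValueError(f"HTML entity {entity} is too long.")
    if 1 <= size <= 4:
        # Short names fit into a single DWORD.
        return {"EntVal4": codepoint, "EntName4": pad4(entity)}
    if 5 <= size <= 8:
        # Longer names are split into characters 1..4 and 5..8.
        return {"EntVal8": codepoint,
                "EntName8A": pad4(entity[:4]),
                "EntName8B": pad4(entity[4:])}
    return {}    # no entity given: nothing goes to the entity sections
```

For example, `ucp_entity_sections(0x00C0, "Agrave")` yields the codepoint for [EntVal8] and the two halves `b"Agra"` and `b"ve\0\0"` for [EntName8A] and [EntName8B]. Splitting names into fixed-size DWORDs keeps every array element the same width, so the entity arrays stay synchronized like the other sections.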
;   ╔codepoint value
;   ║     ╔character category - relevance
;   ║     ║   ╔transliteration to ASCII
;   ║     ║   ║    ╔HTML entity
UCP 0000, Cc, ' ', ; control NUL
UCP 0001, Cc, ' ', ; control SOH
UCP 0002, Cc, ' ', ; control STX
UCP 0003, Cc, ' ', ; control ETX
UCP 0004, Cc, ' ', ; control EOT
UCP 0005, Cc, ' ', ; control ENQ
UCP 0006, Cc, ' ', ; control ACK
UCP 0007, Cc, ' ', ; control BEL
UCP 0008, Cc, ' ', ; control BS
UCP 0009, Fm, ' ', tab ; control HT
UCP 000A, Fm, ' ', newline ; control LF
UCP 000B, Cc, ' ', ; control VT
UCP 000C, Cc, ' ', ; control FF
UCP 000D, Fm, ' ', ; control CR
UCP 000E, Cc, ' ', ; control SO
UCP 000F, Cc, ' ', ; control SI
UCP 0010, Cc, ' ', ; control DLE
UCP 0011, Cc, ' ', ; control DC1
UCP 0012, Cc, ' ', ; control DC2
UCP 0013, Cc, ' ', ; control DC3
UCP 0014, Cc, ' ', ; control DC4
UCP 0015, Cc, ' ', ; control NAK
UCP 0016, Cc, ' ', ; control SYN
UCP 0017, Cc, ' ', ; control ETB
UCP 0018, Cc, ' ', ; control CAN
UCP 0019, Cc, ' ', ; control EM
UCP 001A, Cc, ' ', ; control SUB
UCP 001B, Cc, ' ', ; control ESC
UCP 001C, Cc, ' ', ; control FS
UCP 001D, Cc, ' ', ; control GS
UCP 001E, Cc, ' ', ; control RS
UCP 001F, Cc, ' ', ; control US
UCP 0020, Fm, ' ', ; SPACE
UCP 0021, Po, '!', excl ; ! EXCLAMATION MARK
UCP 0022, Po, '"', quot ; " QUOTATION MARK
UCP 0023, Po, '#', num ; # NUMBER SIGN
UCP 0024, Sc, '$', dollar ; $ DOLLAR SIGN
UCP 0025, Po, '%', percnt ; % PERCENT SIGN
UCP 0026, Po, '&', amp ; & AMPERSAND
UCP 0027, Po, "'", apos ; ' APOSTROPHE
UCP 0028, Ps, '(', lpar ; ( LEFT PARENTHESIS
UCP 0029, Pe, ')', rpar ; ) RIGHT PARENTHESIS
UCP 002A, Po, '*', ast ; * ASTERISK
UCP 002B, Sm, '+', plus ; + PLUS SIGN
UCP 002C, Po, ',', comma ; , COMMA
UCP 002D, Pd, '-', ; - HYPHEN-MINUS
UCP 002E, Po, '.', period ; . FULL STOP
UCP 002F, Po, '/', sol ; / SOLIDUS
UCP 0030, Nd, '0', ; 0 DIGIT ZERO
UCP 0031, Nd, '1', ; 1 DIGIT ONE
UCP 0032, Nd, '2', ; 2 DIGIT TWO
UCP 0033, Nd, '3', ; 3 DIGIT THREE
UCP 0034, Nd, '4', ; 4 DIGIT FOUR
UCP 0035, Nd, '5', ; 5 DIGIT FIVE
UCP 0036, Nd, '6', ; 6 DIGIT SIX
UCP 0037, Nd, '7', ; 7 DIGIT SEVEN
UCP 0038, Nd, '8', ; 8 DIGIT EIGHT
UCP 0039, Nd, '9', ; 9 DIGIT NINE
UCP 003A, Po, ':', colon ; : COLON
UCP 003B, Po, ';', semi ; ; SEMICOLON
UCP 003C, Sm, '<', lt ; < LESS-THAN SIGN
UCP 003D, Sm, '=', equals ; = EQUALS SIGN
UCP 003E, Sm, '>', gt ; > GREATER-THAN SIGN
UCP 003F, Po, '?', quest ; ? QUESTION MARK
UCP 0040, Po, '@', commat ; @ COMMERCIAL AT
UCP 0041, Lu, 'A', ; A LATIN CAPITAL LETTER A
UCP 0042, Lu, 'B', ; B LATIN CAPITAL LETTER B
UCP 0043, Lu, 'C', ; C LATIN CAPITAL LETTER C
UCP 0044, Lu, 'D', ; D LATIN CAPITAL LETTER D
UCP 0045, Lu, 'E', ; E LATIN CAPITAL LETTER E
UCP 0046, Lu, 'F', ; F LATIN CAPITAL LETTER F
UCP 0047, Lu, 'G', ; G LATIN CAPITAL LETTER G
UCP 0048, Lu, 'H', ; H LATIN CAPITAL LETTER H
UCP 0049, Lu, 'I', ; I LATIN CAPITAL LETTER I
UCP 004A, Lu, 'J', ; J LATIN CAPITAL LETTER J
UCP 004B, Lu, 'K', ; K LATIN CAPITAL LETTER K
UCP 004C, Lu, 'L', ; L LATIN CAPITAL LETTER L
UCP 004D, Lu, 'M', ; M LATIN CAPITAL LETTER M
UCP 004E, Lu, 'N', ; N LATIN CAPITAL LETTER N
UCP 004F, Lu, 'O', ; O LATIN CAPITAL LETTER O
UCP 0050, Lu, 'P', ; P LATIN CAPITAL LETTER P
UCP 0051, Lu, 'Q', ; Q LATIN CAPITAL LETTER Q
UCP 0052, Lu, 'R', ; R LATIN CAPITAL LETTER R
UCP 0053, Lu, 'S', ; S LATIN CAPITAL LETTER S
UCP 0054, Lu, 'T', ; T LATIN CAPITAL LETTER T
UCP 0055, Lu, 'U', ; U LATIN CAPITAL LETTER U
UCP 0056, Lu, 'V', ; V LATIN CAPITAL LETTER V
UCP 0057, Lu, 'W', ; W LATIN CAPITAL LETTER W
UCP 0058, Lu, 'X', ; X LATIN CAPITAL LETTER X
UCP 0059, Lu, 'Y', ; Y LATIN CAPITAL LETTER Y
UCP 005A, Lu, 'Z', ; Z LATIN CAPITAL LETTER Z
UCP 005B, Ps, '[', lbrack ; [ LEFT SQUARE BRACKET
UCP 005C, Po, '\', bsol ; \ REVERSE SOLIDUS
UCP 005D, Pe, ']', rbrack ; ] RIGHT SQUARE BRACKET
UCP 005E, Sk, '^', hat ; ^ CIRCUMFLEX ACCENT
UCP 005F, Pe, '_', lowbar ; _ LOW LINE
UCP 0060, Sk, '`', grave ; ` GRAVE ACCENT
UCP 0061, Ll, 'a', ; a LATIN SMALL LETTER A
UCP 0062, Ll, 'b', ; b LATIN SMALL LETTER B
UCP 0063, Ll, 'c', ; c LATIN SMALL LETTER C
UCP 0064, Ll, 'd', ; d LATIN SMALL LETTER D
UCP 0065, Ll, 'e', ; e LATIN SMALL LETTER E
UCP 0066, Ll, 'f', ; f LATIN SMALL LETTER F
UCP 0067, Ll, 'g', ; g LATIN SMALL LETTER G
UCP 0068, Ll, 'h', ; h LATIN SMALL LETTER H
UCP 0069, Ll, 'i', ; i LATIN SMALL LETTER I
UCP 006A, Ll, 'j', ; j LATIN SMALL LETTER J
UCP 006B, Ll, 'k', ; k LATIN SMALL LETTER K
UCP 006C, Ll, 'l', ; l LATIN SMALL LETTER L
UCP 006D, Ll, 'm', ; m LATIN SMALL LETTER M
UCP 006E, Ll, 'n', ; n LATIN SMALL LETTER N
UCP 006F, Ll, 'o', ; o LATIN SMALL LETTER O
UCP 0070, Ll, 'p', ; p LATIN SMALL LETTER P
UCP 0071, Ll, 'q', ; q LATIN SMALL LETTER Q
UCP 0072, Ll, 'r', ; r LATIN SMALL LETTER R
UCP 0073, Ll, 's', ; s LATIN SMALL LETTER S
UCP 0074, Ll, 't', ; t LATIN SMALL LETTER T
UCP 0075, Ll, 'u', ; u LATIN SMALL LETTER U
UCP 0076, Ll, 'v', ; v LATIN SMALL LETTER V
UCP 0077, Ll, 'w', ; w LATIN SMALL LETTER W
UCP 0078, Ll, 'x', ; x LATIN SMALL LETTER X
UCP 0079, Ll, 'y', ; y LATIN SMALL LETTER Y
UCP 007A, Ll, 'z', ; z LATIN SMALL LETTER Z
UCP 007B, Ps, '{', lbrace ; { LEFT CURLY BRACKET
UCP 007C, Sm, '|', verbar ; | VERTICAL LINE
UCP 007D, Pe, '}', rbrace ; } RIGHT CURLY BRACKET
UCP 007E, Sm, '~', tilde ; ~ TILDE
UCP 007F, Cc, ' ', ; control DEL
UCP 00A0, Zs, ' ', nbsp ; NO-BREAK SPACE
UCP 00A1, Po-2, '!', iexcl ; ¡ INVERTED EXCLAMATION MARK
UCP 00A2, Sc, 'c', cent ; ¢ CENT SIGN
UCP 00A3, Sc, 'L', pound ; £ POUND SIGN
UCP 00A4, Sc, '$', curren ; ¤ CURRENCY SIGN
UCP 00A5, Sc, 'Y', yen ; ¥ YEN SIGN
UCP 00A6, So, '|', brvbar ; ¦ BROKEN BAR
UCP 00A7, Po, '#', sect ; § SECTION SIGN
UCP 00A8, Sk, '', uml ; ¨ DIAERESIS
UCP 00A9, So, '(c)', copy ; © COPYRIGHT SIGN
UCP 00AA, Lo-8, 'f', ordf ; ª FEMININE ORDINAL INDICATOR
UCP 00AB, Pi, '<<', laquo ; « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
UCP 00AC, Sm, '_', not ; ¬ NOT SIGN
UCP 00AD, Cf, '-', shy ; SOFT HYPHEN
UCP 00AE, So, '(R)', reg ; ® REGISTERED SIGN
UCP 00AF, Sk, '', macr ; ¯ MACRON
UCP 00B0, So, '`', deg ; ° DEGREE SIGN
UCP 00B1, Sm, '+', plusmn ; ± PLUS-MINUS SIGN
UCP 00B2, No-5, '2', sup2 ; ² SUPERSCRIPT TWO
UCP 00B3, No-5, '3', sup3 ; ³ SUPERSCRIPT THREE
UCP 00B4, Sk, '', acute ; ´ ACUTE ACCENT
UCP 00B5, Ll-6, 'u', micro ; µ MICRO SIGN
UCP 00B6, Po, 'P', para ; ¶ PILCROW SIGN
UCP 00B7, Po, '.', middot ; · MIDDLE DOT
UCP 00B8, Sk, '', cedil ; ¸ CEDILLA
UCP 00B9, No-5, '1', sup1 ; ¹ SUPERSCRIPT ONE
UCP 00BA, Lo-8, 'm', ordm ; º MASCULINE ORDINAL INDICATOR
UCP 00BB, Pf, '>>', raquo ; » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
UCP 00BC, No-5, '1/4', frac14 ; ¼ VULGAR FRACTION ONE QUARTER
UCP 00BD, No-5, '1/2', frac12 ; ½ VULGAR FRACTION ONE HALF
UCP 00BE, No-5, '3/4', frac34 ; ¾ VULGAR FRACTION THREE QUARTERS
UCP 00BF, Po, '?', iquest ; ¿ INVERTED QUESTION MARK
UCP 00C0, Lu, 'A', Agrave ; À LATIN CAPITAL LETTER A WITH GRAVE
UCP 00C1, Lu+2, 'A', Aacute ; Á LATIN CAPITAL LETTER A WITH ACUTE
UCP 00C2, Lu, 'A', Acirc ; Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
UCP 00C3, Lu-2, 'A', Atilde ; Ã LATIN CAPITAL LETTER A WITH TILDE
UCP 00C4, Lu, 'A', Auml ; Ä LATIN CAPITAL LETTER A WITH DIAERESIS
UCP 00C5, Lu-2, 'A', Aring ; Å LATIN CAPITAL LETTER A WITH RING ABOVE
UCP 00C6, Lu, 'AE', AElig ; Æ LATIN CAPITAL LETTER AE
UCP 00C7, Lu, 'C', Ccedil ; Ç LATIN CAPITAL LETTER C WITH CEDILLA
UCP 00C8, Lu, 'E', Egrave ; È LATIN CAPITAL LETTER E WITH GRAVE
UCP 00C9, Lu+2, 'E', Eacute ; É LATIN CAPITAL LETTER E WITH ACUTE
UCP 00CA, Lu-2, 'E', Ecirc ; Ê LATIN CAPITAL LETTER E WITH CIRCUMFLEX
UCP 00CB, Lu, 'E', Euml ; Ë LATIN CAPITAL LETTER E WITH DIAERESIS
UCP 00CC, Lu, 'I', Igrave ; Ì LATIN CAPITAL LETTER I WITH GRAVE
UCP 00CD, Lu+2, 'I', Iacute ; Í LATIN CAPITAL LETTER I WITH ACUTE
UCP 00CE, Lu, 'I', Icirc ; Î LATIN CAPITAL LETTER I WITH CIRCUMFLEX
UCP 00CF, Lu-2, 'I', Iuml ; Ï LATIN CAPITAL LETTER I WITH DIAERESIS
UCP 00D0, Lu-2, 'D', ETH ; Ð LATIN CAPITAL LETTER ETH
UCP 00D1, Lu, 'N', Ntilde ; Ñ LATIN CAPITAL LETTER N WITH TILDE
UCP 00D2, Lu, 'O', Ograve ; Ò LATIN CAPITAL LETTER O WITH GRAVE
UCP 00D3, Lu+2, 'O', Oacute ; Ó LATIN CAPITAL LETTER O WITH ACUTE
UCP 00D4, Lu, 'O', Ocirc ; Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
UCP 00D5, Lu, 'O', Otilde ; Õ LATIN CAPITAL LETTER O WITH TILDE
UCP 00D6, Lu, 'O', Ouml ; Ö LATIN CAPITAL LETTER O WITH DIAERESIS
UCP 00D7, Sm, 'x', times ; × MULTIPLICATION SIGN
UCP 00D8, Lu, 'O', Oslash ; Ø LATIN CAPITAL LETTER O WITH STROKE
UCP 00D9, Lu, 'U', Ugrave ; Ù LATIN CAPITAL LETTER U WITH GRAVE
UCP 00DA, Lu+2, 'U', Uacute ; Ú LATIN CAPITAL LETTER U WITH ACUTE
UCP 00DB, Lu, 'U', Ucirc ; Û LATIN CAPITAL LETTER U WITH CIRCUMFLEX
UCP 00DC, Lu, 'U', Uuml ; Ü LATIN CAPITAL LETTER U WITH DIAERESIS
UCP 00DD, Lu+2, 'Y', Yacute ; Ý LATIN CAPITAL LETTER Y WITH ACUTE
UCP 00DE, Lu-2, 'TH', THORN ; Þ LATIN CAPITAL LETTER THORN
UCP 00DF, Ll, 'ss', szlig ; ß LATIN SMALL LETTER SHARP S
UCP 00E0, Ll, 'a', agrave ; à LATIN SMALL LETTER A WITH GRAVE
UCP 00E1, Ll+2, 'a', aacute ; á LATIN SMALL LETTER A WITH ACUTE
UCP 00E2, Ll-2, 'a', acirc ; â LATIN SMALL LETTER A WITH CIRCUMFLEX
UCP 00E3, Ll-2, 'a', atilde ; ã LATIN SMALL LETTER A WITH TILDE
UCP 00E4, Ll, 'a', auml ; ä LATIN SMALL LETTER A WITH DIAERESIS
UCP 00E5, Ll, 'a', aring ; å LATIN SMALL LETTER A WITH RING ABOVE
UCP 00E6, Ll, 'ae', aelig ; æ LATIN SMALL LETTER AE
UCP 00E7, Ll, 'c', ccedil ; ç LATIN SMALL LETTER C WITH CEDILLA
UCP 00E8, Ll, 'e', egrave ; è LATIN SMALL LETTER E WITH GRAVE
UCP 00E9, Ll+2, 'e', eacute ; é LATIN SMALL LETTER E WITH ACUTE
UCP 00EA, Ll, 'e', ecirc ; ê LATIN SMALL LETTER E WITH CIRCUMFLEX
UCP 00EB, Ll, 'e', euml ; ë LATIN SMALL LETTER E WITH DIAERESIS
UCP 00EC, Ll, 'i', igrave ; ì LATIN SMALL LETTER I WITH GRAVE
UCP 00ED, Ll+2, 'i', iacute ; í LATIN SMALL LETTER I WITH ACUTE
UCP 00EE, Ll, 'i', icirc ; î LATIN SMALL LETTER I WITH CIRCUMFLEX
UCP 00EF, Ll, 'i', iuml ; ï LATIN SMALL LETTER I WITH DIAERESIS
UCP 00F0, Ll-2, 'd', eth ; ð LATIN SMALL LETTER ETH
UCP 00F1, Ll, 'n', ntilde ; ñ LATIN SMALL LETTER N WITH TILDE
UCP 00F2, Ll, 'o', ograve ; ò LATIN SMALL LETTER O WITH GRAVE
UCP 00F3, Ll+2, 'o', oacute ; ó LATIN SMALL LETTER O WITH ACUTE
UCP 00F4, Ll, 'o', ocirc ; ô LATIN SMALL LETTER O WITH CIRCUMFLEX
UCP 00F5, Ll, 'o', otilde ; õ LATIN SMALL LETTER O WITH TILDE
UCP 00F6, Ll, 'o', ouml ; ö LATIN SMALL LETTER O WITH DIAERESIS
UCP 00F7, Sm, '/', divide ; ÷ DIVISION SIGN
UCP 00F8, Ll, 'o', oslash ; ø LATIN SMALL LETTER O WITH STROKE
UCP 00F9, Ll, 'u', ugrave ; ù LATIN SMALL LETTER U WITH GRAVE
UCP 00FA, Ll+2, 'u', uacute ; ú LATIN SMALL LETTER U WITH ACUTE
UCP 00FB, Ll+2, 'u', ucirc ; û LATIN SMALL LETTER U WITH CIRCUMFLEX
UCP 00FC, Ll, 'u', uuml ; ü LATIN SMALL LETTER U WITH DIAERESIS
UCP 00FD, Ll+2, 'y', yacute ; ý LATIN SMALL LETTER Y WITH ACUTE
UCP 00FE, Ll-2, 'th', thorn ; þ LATIN SMALL LETTER THORN
UCP 00FF, Ll, 'y', yuml ; ÿ LATIN SMALL LETTER Y WITH DIAERESIS
UCP 0100, Lu, 'A', Amacr ; Ā LATIN CAPITAL LETTER A WITH MACRON
UCP 0101, Ll, 'a', amacr ; ā LATIN SMALL LETTER A WITH MACRON
UCP 0102, Lu, 'A', Abreve ; Ă LATIN CAPITAL LETTER A WITH BREVE
UCP 0103, Ll, 'a', abreve ; ă LATIN SMALL LETTER A WITH BREVE
UCP 0104, Lu+2, 'A', Aogon ; Ą LATIN CAPITAL LETTER A WITH OGONEK
UCP 0105, Ll+2, 'a', aogon ; ą LATIN SMALL LETTER A WITH OGONEK
UCP 0106, Lu+2, 'C', Cacute ; Ć LATIN CAPITAL LETTER C WITH ACUTE
UCP 0107, Ll+2, 'c', cacute ; ć LATIN SMALL LETTER C WITH ACUTE
UCP 0108, Lu, 'C', Ccirc ; Ĉ LATIN CAPITAL LETTER C WITH CIRCUMFLEX
UCP 0109, Ll, 'c', ccirc ; ĉ LATIN SMALL LETTER C WITH CIRCUMFLEX
UCP 010A, Lu, 'C', Cdot ; Ċ LATIN CAPITAL LETTER C WITH DOT ABOVE
UCP 010B, Ll, 'c', cdot ; ċ LATIN SMALL LETTER C WITH DOT ABOVE
UCP 010C, Lu+2, 'C', Ccaron ; Č LATIN CAPITAL LETTER C WITH CARON
UCP 010D, Ll+2, 'c', ccaron ; č LATIN SMALL LETTER C WITH CARON
UCP 010E, Lu, 'D', Dcaron ; Ď LATIN CAPITAL LETTER D WITH CARON
UCP 010F, Ll, 'd', dcaron ; ď LATIN SMALL LETTER D WITH CARON
UCP 0110, Lu, 'D', Dstrok ; Đ LATIN CAPITAL LETTER D WITH STROKE
UCP 0111, Ll, 'd', dstrok ; đ LATIN SMALL LETTER D WITH STROKE
UCP 0112, Lu, 'E', Emacr ; Ē LATIN CAPITAL LETTER E WITH MACRON
UCP 0113, Ll, 'e', emacr ; ē LATIN SMALL LETTER E WITH MACRON
UCP 0116, Lu, 'E', Edot ; Ė LATIN CAPITAL LETTER E WITH DOT ABOVE
UCP 0117, Ll, 'e', edot ; ė LATIN SMALL LETTER E WITH DOT ABOVE
UCP 0118, Lu+2, 'E', Eogon ; Ę LATIN CAPITAL LETTER E WITH OGONEK
UCP 0119, Ll+2, 'e', eogon ; ę LATIN SMALL LETTER E WITH OGONEK
UCP 011A, Lu+2, 'E', Ecaron ; Ě LATIN CAPITAL LETTER E WITH CARON
UCP 011B, Ll+2, 'e', ecaron ; ě LATIN SMALL LETTER E WITH CARON
UCP 011C, Lu, 'G', Gcirc ; Ĝ LATIN CAPITAL LETTER G WITH CIRCUMFLEX
UCP 011D, Ll, 'g', gcirc ; ĝ LATIN SMALL LETTER G WITH CIRCUMFLEX
UCP 011E, Lu, 'G', Gbreve ; Ğ LATIN CAPITAL LETTER G WITH BREVE
UCP 011F, Ll, 'g', gbreve ; ğ LATIN SMALL LETTER G WITH BREVE
UCP 0120, Lu, 'G', Gdot ; Ġ LATIN CAPITAL LETTER G WITH DOT ABOVE
UCP 0121, Ll, 'g', gdot ; ġ LATIN SMALL LETTER G WITH DOT ABOVE
UCP 0122, Lu, 'G', Gcedil ; Ģ LATIN CAPITAL LETTER G WITH CEDILLA
UCP 0123, Ll, 'g', gcedil ; ģ LATIN SMALL LETTER G WITH CEDILLA
UCP 0124, Lu, 'H', Hcirc ; Ĥ LATIN CAPITAL LETTER H WITH CIRCUMFLEX
UCP 0125, Ll, 'h', hcirc ; ĥ LATIN SMALL LETTER H WITH CIRCUMFLEX
UCP 0126, Lu, 'H', Hstrok ; Ħ LATIN CAPITAL LETTER H WITH STROKE
UCP 0127, Ll, 'h', hstrok ; ħ LATIN SMALL LETTER H WITH STROKE
UCP 0128, Lu, 'I', Itilde ; Ĩ LATIN CAPITAL LETTER I WITH TILDE
UCP 0129, Ll, 'i', itilde ; ĩ LATIN SMALL LETTER I WITH TILDE
UCP 012A, Lu, 'I', Imacr ; Ī LATIN CAPITAL LETTER I WITH MACRON
UCP 012B, Ll, 'i', imacr ; ī LATIN SMALL LETTER I WITH MACRON
UCP 012E, Lu, 'I', Iogon ; Į LATIN CAPITAL LETTER I WITH OGONEK
UCP 012F, Ll, 'i', iogon ; į LATIN SMALL LETTER I WITH OGONEK
UCP 0130, Lu, 'I', Idot ; İ LATIN CAPITAL LETTER I WITH DOT ABOVE
UCP 0131, Ll, 'i', inodot ; ı LATIN SMALL LETTER DOTLESS I
UCP 0134, Lu, 'J', Jcirc ; Ĵ LATIN CAPITAL LETTER J WITH CIRCUMFLEX
UCP 0135, Ll, 'j', jcirc ; ĵ LATIN SMALL LETTER J WITH CIRCUMFLEX
UCP 0136, Lu, 'K', Kcedil ; Ķ LATIN CAPITAL LETTER K WITH CEDILLA
UCP 0137, Ll, 'k', kcedil ; ķ LATIN SMALL LETTER K WITH CEDILLA
UCP 0138, Ll, 'k', kgreen ; ĸ LATIN SMALL LETTER KRA
UCP 0139, Lu, 'L', Lacute ; Ĺ LATIN CAPITAL LETTER L WITH ACUTE
UCP 013A, Ll, 'l', lacute ; ĺ LATIN SMALL LETTER L WITH ACUTE
UCP 013B, Lu, 'L', Lcedil ; Ļ LATIN CAPITAL LETTER L WITH CEDILLA
UCP 013C, Ll, 'l', lcedil ; ļ LATIN SMALL LETTER L WITH CEDILLA
UCP 013D, Lu, 'L', Lcaron ; Ľ LATIN CAPITAL LETTER L WITH CARON
UCP 013E, Ll, 'l', lcaron ; ľ LATIN SMALL LETTER L WITH CARON
UCP 013F, Lu, 'L', Lmidot ; Ŀ LATIN CAPITAL LETTER L WITH MIDDLE DOT
UCP 0140, Ll, 'l', lmidot ; ŀ LATIN SMALL LETTER L WITH MIDDLE DOT
UCP 0141, Lu+2, 'L', Lstrok ; Ł LATIN CAPITAL LETTER L WITH STROKE
UCP 0142, Ll+2, 'l', lstrok ; ł LATIN SMALL LETTER L WITH STROKE
UCP 0143, Lu, 'N', Nacute ; Ń LATIN CAPITAL LETTER N WITH ACUTE
UCP 0144, Ll, 'n', nacute ; ń LATIN SMALL LETTER N WITH ACUTE
UCP 0145, Lu, 'N', Ncedil ; Ņ LATIN CAPITAL LETTER N WITH CEDILLA
UCP 0146, Ll, 'n', ncedil ; ņ LATIN SMALL LETTER N WITH CEDILLA
UCP 0147, Lu, 'N', Ncaron ; Ň LATIN CAPITAL LETTER N WITH CARON
UCP 0148, Ll, 'n', ncaron ; ň LATIN SMALL LETTER N WITH CARON
UCP 0149, Ll, 'n', napos ; ŉ LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
UCP 014A, Lu-2, 'N', ENG ; Ŋ LATIN CAPITAL LETTER ENG
UCP 014B, Ll-2, 'n', eng ; ŋ LATIN SMALL LETTER ENG
UCP 014C, Lu, 'O', Omacr ; Ō LATIN CAPITAL LETTER O WITH MACRON
UCP 014D, Ll, 'o', omacr ; ō LATIN SMALL LETTER O WITH MACRON
UCP 0150, Lu+2, 'O', Odblac ; Ő LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
UCP 0151, Ll+2, 'o', odblac ; ő LATIN SMALL LETTER O WITH DOUBLE ACUTE
UCP 0152, Lu, 'OE', OElig ; Œ LATIN CAPITAL LIGATURE OE
UCP 0153, Ll, 'oe', oelig ; œ LATIN SMALL LIGATURE OE
UCP 0154, Lu, 'R', Racute ; Ŕ LATIN CAPITAL LETTER R WITH ACUTE
UCP 0155, Ll, 'r', racute ; ŕ LATIN SMALL LETTER R WITH ACUTE
UCP 0156, Lu, 'R', Rcedil ; Ŗ LATIN CAPITAL LETTER R WITH CEDILLA
UCP 0157, Ll, 'r', rcedil ; ŗ LATIN SMALL LETTER R WITH CEDILLA
UCP 0158, Lu+2, 'R', Rcaron ; Ř LATIN CAPITAL LETTER R WITH CARON
UCP 0159, Ll+2, 'r', rcaron ; ř LATIN SMALL LETTER R WITH CARON
UCP 015A, Lu, 'S', Sacute ; Ś LATIN CAPITAL LETTER S WITH ACUTE
UCP 015B, Ll, 's', sacute ; ś LATIN SMALL LETTER S WITH ACUTE
UCP 015C, Lu, 'S', Scirc ; Ŝ LATIN CAPITAL LETTER S WITH CIRCUMFLEX
UCP 015D, Ll, 's', scirc ; ŝ LATIN SMALL LETTER S WITH CIRCUMFLEX
UCP 015E, Lu, 'S', Scedil ; Ş LATIN CAPITAL LETTER S WITH CEDILLA
UCP 015F, Ll, 's', scedil ; ş LATIN SMALL LETTER S WITH CEDILLA
UCP 0160, Lu+2, 'S', Scaron ; Š LATIN CAPITAL LETTER S WITH CARON
UCP 0161, Ll+2, 's', scaron ; š LATIN SMALL LETTER S WITH CARON
UCP 0162, Lu, 'T', Tcedil ; Ţ LATIN CAPITAL LETTER T WITH CEDILLA
UCP 0163, Ll, 't', tcedil ; ţ LATIN SMALL LETTER T WITH CEDILLA
UCP 0164, Lu, 'T', Tcaron ; Ť LATIN CAPITAL LETTER T WITH CARON
UCP 0165, Ll, 't', tcaron ; ť LATIN SMALL LETTER T WITH CARON
UCP 0166, Lu, 'T', Tstrok ; Ŧ LATIN CAPITAL LETTER T WITH STROKE
UCP 0167, Ll, 't', tstrok ; ŧ LATIN SMALL LETTER T WITH STROKE
UCP 0168, Lu, 'U', Utilde ; Ũ LATIN CAPITAL LETTER U WITH TILDE
UCP 0169, Ll, 'u', utilde ; ũ LATIN SMALL LETTER U WITH TILDE
UCP 016A, Lu, 'U', Umacr ; Ū LATIN CAPITAL LETTER U WITH MACRON
UCP 016B, Ll, 'u', umacr ; ū LATIN SMALL LETTER U WITH MACRON
UCP 016C, Lu, 'U', Ubreve ; Ŭ LATIN CAPITAL LETTER U WITH BREVE
UCP 016D, Ll, 'u', ubreve ; ŭ LATIN SMALL LETTER U WITH BREVE
UCP 016E, Lu+2, 'U', Uring ; Ů LATIN CAPITAL LETTER U WITH RING ABOVE
UCP 016F, Ll+2, 'u', uring ; ů LATIN SMALL LETTER U WITH RING ABOVE
UCP 0170, Lu+2, 'U', Udblac ; Ű LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
UCP 0171, Ll+2, 'u', udblac ; ű LATIN SMALL LETTER U WITH DOUBLE ACUTE
UCP 0172, Lu, 'U', Uogon ; Ų LATIN CAPITAL LETTER U WITH OGONEK
UCP 0173, Ll, 'u', uogon ; ų LATIN SMALL LETTER U WITH OGONEK
UCP 0174, Lu, 'W', Wcirc ; Ŵ LATIN CAPITAL LETTER W WITH CIRCUMFLEX
UCP 0175, Ll, 'w', wcirc ; ŵ LATIN SMALL LETTER W WITH CIRCUMFLEX
UCP 0176, Lu, 'Y', Ycirc ; Ŷ LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
UCP 0177, Ll, 'y', ycirc ; ŷ LATIN SMALL LETTER Y WITH CIRCUMFLEX
UCP 0178, Lu, 'Y', Yuml ; Ÿ LATIN CAPITAL LETTER Y WITH DIAERESIS
UCP 0179, Lu, 'Z', Zacute ; Ź LATIN CAPITAL LETTER Z WITH ACUTE
UCP 017A, Ll, 'z', zacute ; ź LATIN SMALL LETTER Z WITH ACUTE
UCP 017B, Lu+2, 'Z', Zdot ; Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE
UCP 017C, Ll+2, 'z', zdot ; ż LATIN SMALL LETTER Z WITH DOT ABOVE
UCP 017D, Lu+2, 'Z', Zcaron ; Ž LATIN CAPITAL LETTER Z WITH CARON
UCP 017E, Ll+2, 'z', zcaron ; ž LATIN SMALL LETTER Z WITH CARON
UCP 017F, Ll, 's', ; ſ LATIN SMALL LETTER LONG S
UCP 0192, Ll-2, 'f', fnof ; ƒ LATIN SMALL LETTER F WITH HOOK
UCP 01A0, Lu, 'O', ; Ơ LATIN CAPITAL LETTER O WITH HORN
UCP 01A1, Ll, 'o', ; ơ LATIN SMALL LETTER O WITH HORN
UCP 01AF, Lu, 'U', ; Ư LATIN CAPITAL LETTER U WITH HORN
UCP 01B0, Ll, 'u', ; ư LATIN SMALL LETTER U WITH HORN
UCP 01F5, Ll, 'g', ; ǵ LATIN SMALL LETTER G WITH ACUTE
UCP 0218, Lu, 'S', ; Ș LATIN CAPITAL LETTER S WITH COMMA BELOW
UCP 0219, Ll, 's', ; ș LATIN SMALL LETTER S WITH COMMA BELOW
UCP 021A, Lu, 'T', ; Ț LATIN CAPITAL LETTER T WITH COMMA BELOW
UCP 021B, Ll, 't', ; ț LATIN SMALL LETTER T WITH COMMA BELOW
UCP 0237, Ll, 'j', jmath ; ȷ LATIN SMALL LETTER DOTLESS J
UCP 027C, Ll, 'r', ; ɼ LATIN SMALL LETTER R WITH LONG LEG
UCP 02C6, Lm, '', circ ; ˆ MODIFIER LETTER CIRCUMFLEX ACCENT
UCP 02C7, Lm, '', caron ; ˇ CARON
UCP 02CB, Lm, '`', ; ˋ MODIFIER LETTER GRAVE ACCENT
UCP 02D8, Sk, '', breve ; ˘ BREVE
UCP 02D9, Sk, '', dot ; ˙ DOT ABOVE
UCP 02DA, Sk, '', ring ; ˚ RING ABOVE
UCP 02DB, Sk, '', ogon ; ˛ OGONEK
UCP 02DC, Sk, '', tilde ; ˜ SMALL TILDE
UCP 02DD, Sk, '', dblac ; ˝ DOUBLE ACUTE ACCENT
UCP 0300, Mn, '', ; ̀ COMBINING GRAVE ACCENT
UCP 0301, Mn, '', ; ́ COMBINING ACUTE ACCENT
UCP 0303, Mn, '', ; ̃ COMBINING TILDE
UCP 0309, Mn, '', ; ̉ COMBINING HOOK ABOVE
UCP 0323, Mn, '', ; ̣ COMBINING DOT BELOW
UCP 0332, Mn, '', underbar ; ̲ COMBINING LOW LINE
UCP 037A, Lm, '', ; ͺ GREEK YPOGEGRAMMENI
UCP 0384, Sk, '', ; ΄ GREEK TONOS
UCP 0385, Sk, ' ', ; ΅ GREEK DIALYTIKA TONOS
UCP 0386, Lu, 'A', ; Ά GREEK CAPITAL LETTER ALPHA WITH TONOS
UCP 0387, Po, '', ; · GREEK ANO TELEIA
UCP 0388, Lu, 'E', ; Έ GREEK CAPITAL LETTER EPSILON WITH TONOS
UCP 0389, Lu, 'H', ; Ή GREEK CAPITAL LETTER ETA WITH TONOS
UCP 038A, Lu, 'I', ; Ί GREEK CAPITAL LETTER IOTA WITH TONOS
UCP 038C, Lu, 'O', ; Ό GREEK CAPITAL LETTER OMICRON WITH TONOS
UCP 038E, Lu, 'Y', ; Ύ GREEK CAPITAL LETTER UPSILON WITH TONOS
UCP 038F, Lu, 'O', ; Ώ GREEK CAPITAL LETTER OMEGA WITH TONOS
UCP 0390, Ll, 'i', ; ΐ GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
UCP 0391, Lu, 'A', Alpha ; Α GREEK CAPITAL LETTER ALPHA
UCP 0392, Lu, 'B', Beta ; Β GREEK CAPITAL LETTER BETA
UCP 0393, Lu, 'G', Gamma ; Γ GREEK CAPITAL LETTER GAMMA
UCP 0394, Lu, 'D', Delta ; Δ GREEK CAPITAL LETTER DELTA
UCP 0395, Lu, 'E', Epsilon ; Ε GREEK CAPITAL LETTER EPSILON
UCP 0396, Lu, 'Z', Zeta ; Ζ GREEK CAPITAL LETTER ZETA
UCP 0397, Lu, 'H', Eta ; Η GREEK CAPITAL LETTER ETA
UCP 0398, Lu, 'Th', Theta ; Θ GREEK CAPITAL LETTER THETA
UCP 0399, Lu, 'I', Iota ; Ι GREEK CAPITAL LETTER IOTA
UCP 039A, Lu, 'K', Kappa ; Κ GREEK CAPITAL LETTER KAPPA
UCP 039B, Lu, 'L', Lambda ; Λ GREEK CAPITAL LETTER LAMDA
UCP 039C, Lu, 'M', Mu ; Μ GREEK CAPITAL LETTER MU
UCP 039D, Lu, 'N', Nu ; Ν GREEK CAPITAL LETTER NU
UCP 039E, Lu, 'X', Xi ; Ξ GREEK CAPITAL LETTER XI
UCP 039F, Lu, 'O', Omicron ; Ο GREEK CAPITAL LETTER OMICRON
UCP 03A0, Lu, 'P', Pi ; Π GREEK CAPITAL LETTER PI
UCP 03A1, Lu, 'R', Rho ; Ρ GREEK CAPITAL LETTER RHO
UCP 03A3, Lu, 'S', Sigma ; Σ GREEK CAPITAL LETTER SIGMA
UCP 03A4, Lu, 'T', Tau ; Τ GREEK CAPITAL LETTER TAU
UCP 03A5, Lu, 'Y', Upsilon ; Υ GREEK CAPITAL LETTER UPSILON
UCP 03A6, Lu, 'F', Phi ; Φ GREEK CAPITAL LETTER PHI
UCP 03A7, Lu, 'Ch', Chi ; Χ GREEK CAPITAL LETTER CHI
UCP 03A8, Lu, 'Ps', Psi ; Ψ GREEK CAPITAL LETTER PSI
UCP 03A9, Lu, 'O', Omega ; Ω GREEK CAPITAL LETTER OMEGA
UCP 03AA, Lu, 'I', ; Ϊ GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
UCP 03AB, Lu, 'Y', ; Ϋ GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
UCP 03AC, Ll, 'a', ; ά GREEK SMALL LETTER ALPHA WITH TONOS
UCP 03AD, Ll, 'e', ; έ GREEK SMALL LETTER EPSILON WITH TONOS
UCP 03AE, Ll, 'h', ; ή GREEK SMALL LETTER ETA WITH TONOS
UCP 03AF, Ll, 'i', ; ί GREEK SMALL LETTER IOTA WITH TONOS
UCP 03B0, Ll, 'u', ; ΰ GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
UCP 03B1, Ll, 'a', alpha ; α GREEK SMALL LETTER ALPHA
UCP 03B2, Ll, 'b', beta ; β GREEK SMALL LETTER BETA
UCP 03B3, Ll, 'g', gamma ; γ GREEK SMALL LETTER GAMMA
UCP 03B4, Ll, 'd', delta ; δ GREEK SMALL LETTER DELTA
UCP 03B5, Ll, 'e', epsilon ; ε GREEK SMALL LETTER EPSILON
UCP 03B6, Ll, 'z', zeta ; ζ GREEK SMALL LETTER ZETA
UCP 03B7, Ll, 'h', eta ; η GREEK SMALL LETTER ETA
UCP 03B8, Ll, 'th', theta ; θ GREEK SMALL LETTER THETA
UCP 03B9, Ll, 'i', iota ; ι GREEK SMALL LETTER IOTA
UCP 03BA, Ll, 'k', kappa ; κ GREEK SMALL LETTER KAPPA
UCP 03BB, Ll, 'l', lambda ; λ GREEK SMALL LETTER LAMDA
UCP 03BC, Ll, 'm', mu ; μ GREEK SMALL LETTER MU
UCP 03BD, Ll, 'n', nu ; ν GREEK SMALL LETTER NU
UCP 03BE, Ll, 'x', xi ; ξ GREEK SMALL LETTER XI
UCP 03BF, Ll, 'o', omicron ; ο GREEK SMALL LETTER OMICRON
UCP 03C0, Ll, 'p', pi ; π GREEK SMALL LETTER PI
UCP 03C1, Ll, 'r', rho ; ρ GREEK SMALL LETTER RHO
UCP 03C2, Ll, 's', sigmaf ; ς GREEK SMALL LETTER FINAL SIGMA
UCP 03C3, Ll, 's', sigma ; σ GREEK SMALL LETTER SIGMA
UCP 03C4, Ll, 't', tau ; τ GREEK SMALL LETTER TAU
UCP 03C5, Ll, 'u', upsilon ; υ GREEK SMALL LETTER UPSILON
UCP 03C6, Ll, 'f', phi ; φ GREEK SMALL LETTER PHI
UCP 03C7, Ll, 'ch', chi ; χ GREEK SMALL LETTER CHI
UCP 03C8, Ll, 'ps', psi ; ψ GREEK SMALL LETTER PSI
UCP 03C9, Ll, 'o', omega ; ω GREEK SMALL LETTER OMEGA
UCP 03CA, Ll, 'i', ; ϊ GREEK SMALL LETTER IOTA WITH DIALYTIKA
UCP 03CB, Ll, 'u', ; ϋ GREEK SMALL LETTER UPSILON WITH DIALYTIKA
UCP 03CC, Ll, 'o', ; ό GREEK SMALL LETTER OMICRON WITH TONOS
UCP 03CD, Ll, 'u', ; ύ GREEK SMALL LETTER UPSILON WITH TONOS
UCP 03CE, Ll, 'o', ; ώ GREEK SMALL LETTER OMEGA WITH TONOS
UCP 03D1, Ll, 'th', thetasym ; ϑ GREEK THETA SYMBOL
UCP 03D2, Ll, 'u', upsih ; ϒ GREEK UPSILON WITH HOOK SYMBOL
UCP 03D6, Ll, 'p', piv ; ϖ GREEK PI SYMBOL
UCP 03DC, Lu, 'p', Gammad ; Ϝ GREEK CAPITAL LETTER DIGAMMA
UCP 03DD, Ll, 'p', gammad ; ϝ GREEK SMALL LETTER DIGAMMA
UCP 0401, Lu-1, 'Io', IOcy ; Ё CYRILLIC CAPITAL LETTER IO
UCP 0402, Lu-1, 'Dj', DJcy ; Ђ CYRILLIC CAPITAL LETTER DJE
UCP 0403, Lu-1, 'G', GJcy ; Ѓ CYRILLIC CAPITAL LETTER GJE
UCP 0404, Lu, 'E', Jukcy ; Є CYRILLIC CAPITAL LETTER UKRAINIAN IE
UCP 0405, Lu-1, 'S', DScy ; Ѕ CYRILLIC CAPITAL LETTER DZE
UCP 0406, Lu-1, 'I', Iukcy ; І CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
UCP 0407, Lu-1, 'I', YIcy ; Ї CYRILLIC CAPITAL LETTER YI
UCP 0408, Lu-1, 'J', Jsercy ; Ј CYRILLIC CAPITAL LETTER JE
UCP 0409, Lu-1, 'Lj', LJcy ; Љ CYRILLIC CAPITAL LETTER LJE
UCP 040A, Lu-1, 'Nj', NJcy ; Њ CYRILLIC CAPITAL LETTER NJE
UCP 040B, Lu-1, 'Cj', TSHcy ; Ћ CYRILLIC CAPITAL LETTER TSHE
UCP 040C, Lu-1, 'K', KJcy ; Ќ CYRILLIC CAPITAL LETTER KJE
UCP 040D, Lu-1, 'I', ; Ѝ CYRILLIC CAPITAL LETTER I WITH GRAVE
UCP 040E, Lu, 'Y', Ubrcy ; Ў CYRILLIC CAPITAL LETTER SHORT U
UCP 040F, Lu-1, 'Dz', DZcy ; Џ CYRILLIC CAPITAL LETTER DZHE
UCP 0410, Lu+2, 'A', Acy ; А CYRILLIC CAPITAL LETTER A
UCP 0411, Lu, 'B', Bcy ; Б CYRILLIC CAPITAL LETTER BE
UCP 0412, Lu, 'V', Vcy ; В CYRILLIC CAPITAL LETTER VE
UCP 0413, Lu, 'G', Gcy ; Г CYRILLIC CAPITAL LETTER GHE
UCP 0414, Lu, 'D', Dcy ; Д CYRILLIC CAPITAL LETTER DE
UCP 0415, Lu+2, 'E', IEcy ; Е CYRILLIC CAPITAL LETTER IE
UCP 0416, Lu, 'Zh', ZHcy ; Ж CYRILLIC CAPITAL LETTER ZHE
UCP 0417, Lu, 'Z', Zcy ; З CYRILLIC CAPITAL LETTER ZE
UCP 0418, Lu, 'I', Icy ; И CYRILLIC CAPITAL LETTER I
UCP 0419, Lu, 'J', Jcy ; Й CYRILLIC CAPITAL LETTER SHORT I
UCP 041A, Lu, 'K', Kcy ;
К CYRILLIC CAPITAL LETTER KA UCP 041B, Lu, 'L', Lcy ; Л CYRILLIC CAPITAL LETTER EL UCP 041C, Lu, 'M', Mcy ; М CYRILLIC CAPITAL LETTER EM UCP 041D, Lu, 'N', Ncy ; Н CYRILLIC CAPITAL LETTER EN UCP 041E, Lu+2, 'O', Ocy ; О CYRILLIC CAPITAL LETTER O UCP 041F, Lu, 'P', Pcy ; П CYRILLIC CAPITAL LETTER PE UCP 0420, Lu, 'R', Rcy ; Р CYRILLIC CAPITAL LETTER ER UCP 0421, Lu, 'S', Scy ; С CYRILLIC CAPITAL LETTER ES UCP 0422, Lu, 'T', Tcy ; Т CYRILLIC CAPITAL LETTER TE UCP 0423, Lu, 'U', Ucy ; У CYRILLIC CAPITAL LETTER U UCP 0424, Lu, 'F', Fcy ; Ф CYRILLIC CAPITAL LETTER EF UCP 0425, Lu, 'Kh', KHcy ; Х CYRILLIC CAPITAL LETTER HA UCP 0426, Lu, 'C', TScy ; Ц CYRILLIC CAPITAL LETTER TSE UCP 0427, Lu, 'Ch', CHcy ; Ч CYRILLIC CAPITAL LETTER CHE UCP 0428, Lu, 'Sh', SHcy ; Ш CYRILLIC CAPITAL LETTER SHA UCP 0429, Lu, 'Shch',SHCHcy ; Щ CYRILLIC CAPITAL LETTER SHCHA UCP 042A, Lu-1, "'", Hardcy ; Ъ CYRILLIC CAPITAL LETTER HARD SIGN UCP 042B, Lu, 'Y', Ycy ; Ы CYRILLIC CAPITAL LETTER YERU UCP 042C, Lu, '', Softcy ; Ь CYRILLIC CAPITAL LETTER SOFT SIGN UCP 042D, Lu, 'E', Ecy ; Э CYRILLIC CAPITAL LETTER E UCP 042E, Lu+1, 'Yu', YUcy ; Ю CYRILLIC CAPITAL LETTER YU UCP 042F, Lu+1, 'Ya', YAcy ; Я CYRILLIC CAPITAL LETTER YA UCP 0430, Ll+2, 'a', acy ; а CYRILLIC SMALL LETTER A UCP 0431, Ll, 'b', bcy ; б CYRILLIC SMALL LETTER BE UCP 0432, Ll, 'v', vcy ; в CYRILLIC SMALL LETTER VE UCP 0433, Ll, 'g', gcy ; г CYRILLIC SMALL LETTER GHE UCP 0434, Ll, 'd', dcy ; д CYRILLIC SMALL LETTER DE UCP 0435, Ll+2, 'e', iecy ; е CYRILLIC SMALL LETTER IE UCP 0436, Ll, 'zh', zhcy ; ж CYRILLIC SMALL LETTER ZHE UCP 0437, Ll, 'z', zcy ; з CYRILLIC SMALL LETTER ZE UCP 0438, Ll, 'i', icy ; и CYRILLIC SMALL LETTER I UCP 0439, Ll, 'j', jcy ; й CYRILLIC SMALL LETTER SHORT I UCP 043A, Ll, 'k', kcy ; к CYRILLIC SMALL LETTER KA UCP 043B, Ll, 'l', lcy ; л CYRILLIC SMALL LETTER EL UCP 043C, Ll, 'm', mcy ; м CYRILLIC SMALL LETTER EM UCP 043D, Ll, 'n', ncy ; н CYRILLIC SMALL LETTER EN UCP 043E, Ll+2, 'o', ocy ; о CYRILLIC SMALL 
LETTER O UCP 043F, Ll, 'p', pcy ; п CYRILLIC SMALL LETTER PE UCP 0440, Ll, 'r', rcy ; р CYRILLIC SMALL LETTER ER UCP 0441, Ll, 's', scy ; с CYRILLIC SMALL LETTER ES UCP 0442, Ll, 't', tcy ; т CYRILLIC SMALL LETTER TE UCP 0443, Ll, 'u', ucy ; у CYRILLIC SMALL LETTER U UCP 0444, Ll, 'f', fcy ; ф CYRILLIC SMALL LETTER EF UCP 0445, Ll, 'kh', khcy ; х CYRILLIC SMALL LETTER HA UCP 0446, Ll, 'c', tscy ; ц CYRILLIC SMALL LETTER TSE UCP 0447, Ll, 'ch', chcy ; ч CYRILLIC SMALL LETTER CHE UCP 0448, Ll, 'sh', shcy ; ш CYRILLIC SMALL LETTER SHA UCP 0449, Ll, 'shch',shchcy ; щ CYRILLIC SMALL LETTER SHCHA UCP 044A, Ll-1, "'", hardcy ; ъ CYRILLIC SMALL LETTER HARD SIGN UCP 044B, Ll, 'y', ycy ; ы CYRILLIC SMALL LETTER YERU UCP 044C, Ll, '', softcy ; ь CYRILLIC SMALL LETTER SOFT SIGN UCP 044D, Ll, 'e', ecy ; э CYRILLIC SMALL LETTER E UCP 044E, Ll+1, 'yu', yucy ; ю CYRILLIC SMALL LETTER YU UCP 044F, Ll+1, 'ya', yacy ; я CYRILLIC SMALL LETTER YA UCP 0451, Ll, 'io', iocy ; ё CYRILLIC SMALL LETTER IO UCP 0452, Ll, 'dj', djcy ; ђ CYRILLIC SMALL LETTER DJE UCP 0453, Ll, 'g', gjcy ; ѓ CYRILLIC SMALL LETTER GJE UCP 0454, Ll, 'e', jukcy ; є CYRILLIC SMALL LETTER UKRAINIAN IE UCP 0455, Ll, 's', dscy ; ѕ CYRILLIC SMALL LETTER DZE UCP 0456, Ll, 'i', iukcy ; і CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I UCP 0457, Ll, 'i', yicy ; ї CYRILLIC SMALL LETTER YI UCP 0458, Ll, 'j', jsercy ; ј CYRILLIC SMALL LETTER JE UCP 0459, Ll, 'lj', ljcy ; љ CYRILLIC SMALL LETTER LJE UCP 045A, Ll, 'nj', njcy ; њ CYRILLIC SMALL LETTER NJE UCP 045B, Ll, 'cj', tshcy ; ћ CYRILLIC SMALL LETTER TSHE UCP 045C, Ll, 'k', kjcy ; ќ CYRILLIC SMALL LETTER KJE UCP 045D, Ll, 'i', ; ѝ CYRILLIC SMALL LETTER I WITH GRAVE UCP 045E, Ll, 'y', ubrcy ; ў CYRILLIC SMALL LETTER SHORT U UCP 045F, Ll, 'dz', dzcy ; џ CYRILLIC SMALL LETTER DZHE UCP 0490, Lu-2, 'G', ; Ґ CYRILLIC CAPITAL LETTER GHE WITH UPTURN UCP 0491, Ll-2, 'g', ; ґ CYRILLIC SMALL LETTER GHE WITH UPTURN UCP 0492, Lu-2, 'G', ; Ғ CYRILLIC CAPITAL LETTER GHE WITH STROKE UCP 
0493, Ll-2, 'g', ; ғ CYRILLIC SMALL LETTER GHE WITH STROKE UCP 049A, Lu-2, 'K', ; Қ CYRILLIC CAPITAL LETTER KA WITH DESCENDER UCP 049B, Ll-2, 'k', ; қ CYRILLIC SMALL LETTER KA WITH DESCENDER UCP 04B2, Lu-2, 'H', ; Ҳ CYRILLIC CAPITAL LETTER HA WITH DESCENDER UCP 04B3, Ll-2, 'h', ; ҳ CYRILLIC SMALL LETTER HA WITH DESCENDER UCP 04B6, Lu-2, 'Ch', ; Ҷ CYRILLIC CAPITAL LETTER CHE WITH DESCENDER UCP 04B7, Ll-2, 'ch', ; ҷ CYRILLIC SMALL LETTER CHE WITH DESCENDER UCP 04E1, Ll-2, 'z', ; ӡ CYRILLIC SMALL LETTER ABKHASIAN DZE UCP 04E2, Lu-2, 'I', ; Ӣ CYRILLIC CAPITAL LETTER I WITH MACRON UCP 04EE, Lu-2, 'U', ; Ӯ CYRILLIC CAPITAL LETTER U WITH MACRON UCP 04EF, Ll-2, 'u', ; ӯ CYRILLIC SMALL LETTER U WITH MACRON UCP 05B0, Mn, '', ; ְ HEBREW POINT SHEVA UCP 05B1, Mn, '', ; ֱ HEBREW POINT HATAF SEGOL UCP 05B2, Mn, '', ; ֲ HEBREW POINT HATAF PATAH UCP 05B3, Mn, '', ; ֳ HEBREW POINT HATAF QAMATS UCP 05B4, Mn, '', ; ִ HEBREW POINT HIRIQ UCP 05B5, Mn, '', ; ֵ HEBREW POINT TSERE UCP 05B6, Mn, '', ; ֶ HEBREW POINT SEGOL UCP 05B7, Mn, '', ; ַ HEBREW POINT PATAH UCP 05B8, Mn, '', ; ָ HEBREW POINT QAMATS UCP 05B9, Mn, '', ; ֹ HEBREW POINT HOLAM UCP 05BA, Mn, '', ; ֺ HEBREW POINT HOLAM HASER FOR VAV UCP 05BB, Mn, '', ; ֻ HEBREW POINT QUBUTS UCP 05BC, Mn, '', ; ּ HEBREW POINT DAGESH OR MAPIQ UCP 05BD, Mn, '', ; ֽ HEBREW POINT METEG UCP 05BE, Pd, '-', ; ־ HEBREW PUNCTUATION MAQAF UCP 05BF, Mn, '', ; ֿ HEBREW POINT RAFE UCP 05C0, Po, '|', ; ׀ HEBREW PUNCTUATION PASEQ UCP 05C1, Mn, '', ; ׁ HEBREW POINT SHIN DOT UCP 05C2, Mn, '', ; ׂ HEBREW POINT SIN DOT UCP 05C3, Po, ':', ; ׃ HEBREW PUNCTUATION SOF PASUQ UCP 05D0, Lo, 'A', ; א HEBREW LETTER ALEF UCP 05D1, Lo, 'B', ; ב HEBREW LETTER BET UCP 05D2, Lo, 'G', ; ג HEBREW LETTER GIMEL UCP 05D3, Lo, 'D', ; ד HEBREW LETTER DALET UCP 05D4, Lo, 'H', ; ה HEBREW LETTER HE UCP 05D5, Lo, 'V', ; ו HEBREW LETTER VAV UCP 05D6, Lo, 'Z', ; ז HEBREW LETTER ZAYIN UCP 05D7, Lo, 'H', ; ח HEBREW LETTER HET UCP 05D8, Lo, 'T', ; ט HEBREW LETTER TET UCP 05D9, Lo, 'Yi', ; י 
HEBREW LETTER YOD UCP 05DA, Lo, 'Kh', ; ך HEBREW LETTER FINAL KAF UCP 05DB, Lo, 'Kh', ; כ HEBREW LETTER KAF UCP 05DC, Lo, 'L', ; ל HEBREW LETTER LAMED UCP 05DD, Lo, 'M', ; ם HEBREW LETTER FINAL MEM UCP 05DE, Lo, 'M', ; מ HEBREW LETTER MEM UCP 05DF, Lo, 'N', ; ן HEBREW LETTER FINAL NUN UCP 05E0, Lo, 'N', ; נ HEBREW LETTER NUN UCP 05E1, Lo, 'S', ; ס HEBREW LETTER SAMEKH UCP 05E2, Lo, 'A', ; ע HEBREW LETTER AYIN UCP 05E3, Lo, 'P', ; ף HEBREW LETTER FINAL PE UCP 05E4, Lo, 'P', ; פ HEBREW LETTER PE UCP 05E5, Lo, 'Tz', ; ץ HEBREW LETTER FINAL TSADI UCP 05E6, Lo, 'Tz', ; צ HEBREW LETTER TSADI UCP 05E7, Lo, 'K', ; ק HEBREW LETTER QOF UCP 05E8, Lo, 'R', ; ר HEBREW LETTER RESH UCP 05E9, Lo, 'Sh', ; ש HEBREW LETTER SHIN UCP 05EA, Lo, 'T', ; ת HEBREW LETTER TAV UCP 05F0, Lo, 'W', ; װ HEBREW LIGATURE YIDDISH DOUBLE VAV UCP 05F1, Lo, 'V', ; ױ HEBREW LIGATURE YIDDISH VAV YOD UCP 05F2, Lo, 'W', ; ײ HEBREW LIGATURE YIDDISH DOUBLE YOD UCP 05F3, Po, "'", ; ׳ HEBREW PUNCTUATION GERESH UCP 05F4, Po, '"', ; ״ HEBREW PUNCTUATION GERSHAYIM UCP 060C, Po, ',', ; ، ARABIC COMMA UCP 061B, Po, ';', ; ؛ ARABIC SEMICOLON UCP 061F, Po, '?', ; ؟ ARABIC QUESTION MARK UCP 0621, Lo, "'", ; ء ARABIC LETTER HAMZA UCP 0622, Lo, 'A', ; آ ARABIC LETTER ALEF WITH MADDA ABOVE UCP 0623, Lo, 'A', ; أ ARABIC LETTER ALEF WITH HAMZA ABOVE UCP 0624, Lo, 'W', ; ؤ ARABIC LETTER WAW WITH HAMZA ABOVE UCP 0625, Lo, 'A', ; إ ARABIC LETTER ALEF WITH HAMZA BELOW UCP 0626, Lo, 'Y', ; ئ ARABIC LETTER YEH WITH HAMZA ABOVE UCP 0627, Lo, 'A', ; ا ARABIC LETTER ALEF UCP 0628, Lo, 'B', ; ب ARABIC LETTER BEH UCP 0629, Lo, 'T', ; ة ARABIC LETTER TEH MARBUTA UCP 062A, Lo, 'T', ; ت ARABIC LETTER TEH UCP 062B, Lo, 'Th', ; ث ARABIC LETTER THEH UCP 062C, Lo, 'J', ; ج ARABIC LETTER JEEM UCP 062D, Lo, 'H', ; ح ARABIC LETTER HAH UCP 062E, Lo, 'Kh', ; خ ARABIC LETTER KHAH UCP 062F, Lo, 'D', ; د ARABIC LETTER DAL UCP 0630, Lo, 'Dh', ; ذ ARABIC LETTER THAL UCP 0631, Lo, 'R', ; ر ARABIC LETTER REH UCP 0632, Lo, 'Z', ; ز ARABIC LETTER ZAIN 
UCP 0633, Lo, 'S', ; س ARABIC LETTER SEEN UCP 0634, Lo, 'Sh', ; ش ARABIC LETTER SHEEN UCP 0635, Lo, 'S', ; ص ARABIC LETTER SAD UCP 0636, Lo, 'D', ; ض ARABIC LETTER DAD UCP 0637, Lo, 'T', ; ط ARABIC LETTER TAH UCP 0638, Lo, 'Z', ; ظ ARABIC LETTER ZAH UCP 0639, Lo, "'", ; ع ARABIC LETTER AIN UCP 063A, Lo, 'Gh', ; غ ARABIC LETTER GHAIN UCP 0640, Lm, '_', ; ـ ARABIC TATWEEL UCP 0641, Lo, 'F', ; ف ARABIC LETTER FEH UCP 0642, Lo, 'Q', ; ق ARABIC LETTER QAF UCP 0643, Lo, 'K', ; ك ARABIC LETTER KAF UCP 0644, Lo, 'L', ; ل ARABIC LETTER LAM UCP 0645, Lo, 'M', ; م ARABIC LETTER MEEM UCP 0646, Lo, 'N', ; ن ARABIC LETTER NOON UCP 0647, Lo, 'H', ; ه ARABIC LETTER HEH UCP 0648, Lo, 'W', ; و ARABIC LETTER WAW UCP 0649, Lo, 'A', ; ى ARABIC LETTER ALEF MAKSURA UCP 064A, Lo, 'Y', ; ي ARABIC LETTER YEH UCP 064B, Mn, 'A', ; ً ARABIC FATHATAN UCP 064C, Mn, 'U', ; ٌ ARABIC DAMMATAN UCP 064D, Mn, 'I', ; ٍ ARABIC KASRATAN UCP 064E, Mn, 'A', ; َ ARABIC FATHA UCP 064F, Mn, 'U', ; ُ ARABIC DAMMA UCP 0650, Mn, 'I', ; ِ ARABIC KASRA UCP 0651, Mn, '', ; ّ ARABIC SHADDA UCP 0652, Mn, '', ; ْ ARABIC SUKUN UCP 0660, Nd, '0', ; ٠ ARABIC-INDIC DIGIT ZERO UCP 0661, Nd, '1', ; ١ ARABIC-INDIC DIGIT ONE UCP 0662, Nd, '2', ; ٢ ARABIC-INDIC DIGIT TWO UCP 0663, Nd, '3', ; ٣ ARABIC-INDIC DIGIT THREE UCP 0664, Nd, '4', ; ٤ ARABIC-INDIC DIGIT FOUR UCP 0665, Nd, '5', ; ٥ ARABIC-INDIC DIGIT FIVE UCP 0666, Nd, '6', ; ٦ ARABIC-INDIC DIGIT SIX UCP 0667, Nd, '7', ; ٧ ARABIC-INDIC DIGIT SEVEN UCP 0668, Nd, '8', ; ٨ ARABIC-INDIC DIGIT EIGHT UCP 0669, Nd, '9', ; ٩ ARABIC-INDIC DIGIT NINE UCP 066A, Po, '%', ; ٪ ARABIC PERCENT SIGN UCP 0679, Lo, 'T', ; ٹ ARABIC LETTER TTEH UCP 067E, Lo, 'P', ; پ ARABIC LETTER PEH UCP 0686, Lo, 'Ch', ; چ ARABIC LETTER TCHEH UCP 0688, Lo, 'D', ; ڈ ARABIC LETTER DDAL UCP 0691, Lo, 'R', ; ڑ ARABIC LETTER RREH UCP 0698, Lo, 'J', ; ژ ARABIC LETTER JEH UCP 06A4, Lo, 'V', ; ڤ ARABIC LETTER VEH UCP 06A9, Lo, 'Kh', ; ک ARABIC LETTER KEHEH UCP 06AF, Lo, 'G', ; گ ARABIC LETTER GAF UCP 06BA, Lo, 'N', 
; ں ARABIC LETTER NOON GHUNNA UCP 06BE, Lo, 'H', ; ھ ARABIC LETTER HEH DOACHASHMEE UCP 06C1, Lo, 'H', ; ہ ARABIC LETTER HEH GOAL UCP 06D2, Lo, 'Y', ; ے ARABIC LETTER YEH BARREE UCP 06D5, Lo, 'Ae', ; ە ARABIC LETTER AE UCP 06F0, Nd, '0', ; ۰ EXTENDED ARABIC-INDIC DIGIT ZERO UCP 06F1, Nd, '1', ; ۱ EXTENDED ARABIC-INDIC DIGIT ONE UCP 06F2, Nd, '2', ; ۲ EXTENDED ARABIC-INDIC DIGIT TWO UCP 06F3, Nd, '3', ; ۳ EXTENDED ARABIC-INDIC DIGIT THREE UCP 06F4, Nd, '4', ; ۴ EXTENDED ARABIC-INDIC DIGIT FOUR UCP 06F5, Nd, '5', ; ۵ EXTENDED ARABIC-INDIC DIGIT FIVE UCP 06F6, Nd, '6', ; ۶ EXTENDED ARABIC-INDIC DIGIT SIX UCP 06F7, Nd, '7', ; ۷ EXTENDED ARABIC-INDIC DIGIT SEVEN UCP 06F8, Nd, '8', ; ۸ EXTENDED ARABIC-INDIC DIGIT EIGHT UCP 06F9, Nd, '9', ; ۹ EXTENDED ARABIC-INDIC DIGIT NINE UCP 0E01, Lo, 'K', ; ก THAI CHARACTER KO KAI UCP 0E02, Lo, 'Kh', ; ข THAI CHARACTER KHO KHAI UCP 0E03, Lo, 'Kh', ; ฃ THAI CHARACTER KHO KHUAT UCP 0E04, Lo, 'Kh', ; ค THAI CHARACTER KHO KHWAI UCP 0E05, Lo, 'Kh', ; ฅ THAI CHARACTER KHO KHON UCP 0E06, Lo, 'Kh', ; ฆ THAI CHARACTER KHO RAKHANG UCP 0E07, Lo, 'Ng', ; ง THAI CHARACTER NGO NGU UCP 0E08, Lo, 'Ch', ; จ THAI CHARACTER CHO CHAN UCP 0E09, Lo, 'Ch', ; ฉ THAI CHARACTER CHO CHING UCP 0E0A, Lo, 'Ch', ; ช THAI CHARACTER CHO CHANG UCP 0E0B, Lo, 'S', ; ซ THAI CHARACTER SO SO UCP 0E0C, Lo, 'Ch', ; ฌ THAI CHARACTER CHO CHOE UCP 0E0D, Lo, 'Y', ; ญ THAI CHARACTER YO YING UCP 0E0E, Lo, 'D', ; ฎ THAI CHARACTER DO CHADA UCP 0E0F, Lo, 'T', ; ฏ THAI CHARACTER TO PATAK UCP 0E10, Lo, 'Th', ; ฐ THAI CHARACTER THO THAN UCP 0E11, Lo, 'Th', ; ฑ THAI CHARACTER THO NANGMONTHO UCP 0E12, Lo, 'Th', ; ฒ THAI CHARACTER THO PHUTHAO UCP 0E13, Lo, 'N', ; ณ THAI CHARACTER NO NEN UCP 0E14, Lo, 'D', ; ด THAI CHARACTER DO DEK UCP 0E15, Lo, 'T', ; ต THAI CHARACTER TO TAO UCP 0E16, Lo, 'Th', ; ถ THAI CHARACTER THO THUNG UCP 0E17, Lo, 'Th', ; ท THAI CHARACTER THO THAHAN UCP 0E18, Lo, 'Th', ; ธ THAI CHARACTER THO THONG UCP 0E19, Lo, 'N', ; น THAI CHARACTER NO NU UCP 0E1A, Lo, 'B', ; บ 
THAI CHARACTER BO BAIMAI UCP 0E1B, Lo, 'P', ; ป THAI CHARACTER PO PLA UCP 0E1C, Lo, 'Ph', ; ผ THAI CHARACTER PHO PHUNG UCP 0E1D, Lo, 'F', ; ฝ THAI CHARACTER FO FA UCP 0E1E, Lo, 'Ph', ; พ THAI CHARACTER PHO PHAN UCP 0E1F, Lo, 'F', ; ฟ THAI CHARACTER FO FAN UCP 0E20, Lo, 'Ph', ; ภ THAI CHARACTER PHO SAMPHAO UCP 0E21, Lo, 'M', ; ม THAI CHARACTER MO MA UCP 0E22, Lo, 'Y', ; ย THAI CHARACTER YO YAK UCP 0E23, Lo, 'R', ; ร THAI CHARACTER RO RUA UCP 0E24, Lo, 'R', ; ฤ THAI CHARACTER RU UCP 0E25, Lo, 'L', ; ล THAI CHARACTER LO LING UCP 0E26, Lo, 'L', ; ฦ THAI CHARACTER LU UCP 0E27, Lo, 'W', ; ว THAI CHARACTER WO WAEN UCP 0E28, Lo, 'S', ; ศ THAI CHARACTER SO SALA UCP 0E29, Lo, 'S', ; ษ THAI CHARACTER SO RUSI UCP 0E2A, Lo, 'S', ; ส THAI CHARACTER SO SUA UCP 0E2B, Lo, 'H', ; ห THAI CHARACTER HO HIP UCP 0E2C, Lo, 'L', ; ฬ THAI CHARACTER LO CHULA UCP 0E2D, Lo, 'O', ; อ THAI CHARACTER O ANG UCP 0E2E, Lo, 'H', ; ฮ THAI CHARACTER HO NOKHUK UCP 0E2F, Lo, 'A', ; ฯ THAI CHARACTER PAIYANNOI UCP 0E30, Lo, 'A', ; ะ THAI CHARACTER SARA A UCP 0E31, Mn, '', ; ั THAI CHARACTER MAI HAN-AKAT UCP 0E32, Lo, 'A', ; า THAI CHARACTER SARA AA UCP 0E33, Lo, 'A', ; ำ THAI CHARACTER SARA AM UCP 0E34, Mn, '', ; ิ THAI CHARACTER SARA I UCP 0E35, Mn, '', ; ี THAI CHARACTER SARA II UCP 0E36, Mn, '', ; ึ THAI CHARACTER SARA UE UCP 0E37, Mn, '', ; ื THAI CHARACTER SARA UEE UCP 0E38, Mn, '', ; ุ THAI CHARACTER SARA U UCP 0E39, Mn, '', ; ู THAI CHARACTER SARA UU UCP 0E3A, Mn, '', ; ฺ THAI CHARACTER PHINTHU UCP 0E3F, Sc, '$', ; ฿ THAI CURRENCY SYMBOL BAHT UCP 0E40, Lo, 'E', ; เ THAI CHARACTER SARA E UCP 0E41, Lo, 'AE', ; แ THAI CHARACTER SARA AE UCP 0E42, Lo, 'O', ; โ THAI CHARACTER SARA O UCP 0E43, Lo, 'I', ; ใ THAI CHARACTER SARA AI MAIMUAN UCP 0E44, Lo, 'I', ; ไ THAI CHARACTER SARA AI MAIMALAI UCP 0E45, Lo, 'A', ; ๅ THAI CHARACTER LAKKHANGYAO UCP 0E46, Lm, '`', ; ๆ THAI CHARACTER MAIYAMOK UCP 0E47, Mn, '', ; ็ THAI CHARACTER MAITAIKHU UCP 0E48, Mn, '', ; ่ THAI CHARACTER MAI EK UCP 0E49, Mn, '', ; ้ THAI 
CHARACTER MAI THO UCP 0E4A, Mn, '', ; ๊ THAI CHARACTER MAI TRI UCP 0E4B, Mn, '', ; ๋ THAI CHARACTER MAI CHATTAWA UCP 0E4C, Mn, '', ; ์ THAI CHARACTER THANTHAKHAT UCP 0E4D, Mn, '', ; ํ THAI CHARACTER NIKHAHIT UCP 0E4E, Mn, '', ; ๎ THAI CHARACTER YAMAKKAN UCP 0E4F, Po, '#', ; ๏ THAI CHARACTER FONGMAN UCP 0E50, Nd, '0', ; ๐ THAI DIGIT ZERO UCP 0E51, Nd, '1', ; ๑ THAI DIGIT ONE UCP 0E52, Nd, '2', ; ๒ THAI DIGIT TWO UCP 0E53, Nd, '3', ; ๓ THAI DIGIT THREE UCP 0E54, Nd, '4', ; ๔ THAI DIGIT FOUR UCP 0E55, Nd, '5', ; ๕ THAI DIGIT FIVE UCP 0E56, Nd, '6', ; ๖ THAI DIGIT SIX UCP 0E57, Nd, '7', ; ๗ THAI DIGIT SEVEN UCP 0E58, Nd, '8', ; ๘ THAI DIGIT EIGHT UCP 0E59, Nd, '9', ; ๙ THAI DIGIT NINE UCP 0E5A, Po, '|', ; ๚ THAI CHARACTER ANGKHANKHU UCP 0E5B, Po, '>>', ; ๛ THAI CHARACTER KHOMUT UCP 1403, Lo, 'I', ; ᐃ CANADIAN SYLLABICS I UCP 1404, Lo, 'Ii', ; ᐄ CANADIAN SYLLABICS II UCP 1405, Lo, 'O', ; ᐅ CANADIAN SYLLABICS O UCP 1406, Lo, 'Oo', ; ᐆ CANADIAN SYLLABICS OO UCP 140A, Lo, 'A', ; ᐊ CANADIAN SYLLABICS A UCP 140B, Lo, 'Aa', ; ᐋ CANADIAN SYLLABICS AA UCP 1431, Lo, 'Pi', ; ᐱ CANADIAN SYLLABICS PI UCP 1432, Lo, 'Pii', ; ᐲ CANADIAN SYLLABICS PII UCP 1433, Lo, 'Po', ; ᐳ CANADIAN SYLLABICS PO UCP 1434, Lo, 'Poo', ; ᐴ CANADIAN SYLLABICS POO UCP 1438, Lo, 'Pa', ; ᐸ CANADIAN SYLLABICS PA UCP 1439, Lo, 'Paa', ; ᐹ CANADIAN SYLLABICS PAA UCP 1449, Lo, 'P', ; ᑉ CANADIAN SYLLABICS P UCP 144E, Lo, 'Ti', ; ᑎ CANADIAN SYLLABICS TI UCP 144F, Lo, 'Tii', ; ᑏ CANADIAN SYLLABICS TII UCP 1450, Lo, 'To', ; ᑐ CANADIAN SYLLABICS TO UCP 1451, Lo, 'Too', ; ᑑ CANADIAN SYLLABICS TOO UCP 1455, Lo, 'Ta', ; ᑕ CANADIAN SYLLABICS TA UCP 1456, Lo, 'Taa', ; ᑖ CANADIAN SYLLABICS TAA UCP 1466, Lo, 'T', ; ᑦ CANADIAN SYLLABICS T UCP 146D, Lo, 'Ki', ; ᑭ CANADIAN SYLLABICS KI UCP 146E, Lo, 'Kii', ; ᑮ CANADIAN SYLLABICS KII UCP 146F, Lo, 'Ko', ; ᑯ CANADIAN SYLLABICS KO UCP 1470, Lo, 'Koo', ; ᑰ CANADIAN SYLLABICS KOO UCP 1472, Lo, 'Ka', ; ᑲ CANADIAN SYLLABICS KA UCP 1473, Lo, 'Kaa', ; ᑳ CANADIAN SYLLABICS KAA UCP 1483, 
Lo, 'K', ; ᒃ CANADIAN SYLLABICS K UCP 148B, Lo, 'Ci', ; ᒋ CANADIAN SYLLABICS CI UCP 148C, Lo, 'Cii', ; ᒌ CANADIAN SYLLABICS CII UCP 148D, Lo, 'Co', ; ᒍ CANADIAN SYLLABICS CO UCP 148E, Lo, 'Coo', ; ᒎ CANADIAN SYLLABICS COO UCP 1490, Lo, 'Ca', ; ᒐ CANADIAN SYLLABICS CA UCP 1491, Lo, 'Caa', ; ᒑ CANADIAN SYLLABICS CAA UCP 14A1, Lo, 'C', ; ᒡ CANADIAN SYLLABICS C UCP 14A5, Lo, 'Mi', ; ᒥ CANADIAN SYLLABICS MI UCP 14A6, Lo, 'Mii', ; ᒦ CANADIAN SYLLABICS MII UCP 14A7, Lo, 'Mo', ; ᒧ CANADIAN SYLLABICS MO UCP 14A8, Lo, 'Moo', ; ᒨ CANADIAN SYLLABICS MOO UCP 14AA, Lo, 'Ma', ; ᒪ CANADIAN SYLLABICS MA UCP 14AB, Lo, 'Maa', ; ᒫ CANADIAN SYLLABICS MAA UCP 14BB, Lo, 'M', ; ᒻ CANADIAN SYLLABICS M UCP 14C2, Lo, 'Ni', ; ᓂ CANADIAN SYLLABICS NI UCP 14C3, Lo, 'Nii', ; ᓃ CANADIAN SYLLABICS NII UCP 14C4, Lo, 'No', ; ᓄ CANADIAN SYLLABICS NO UCP 14C5, Lo, 'Noo', ; ᓅ CANADIAN SYLLABICS NOO UCP 14C7, Lo, 'Na', ; ᓇ CANADIAN SYLLABICS NA UCP 14C8, Lo, 'Naa', ; ᓈ CANADIAN SYLLABICS NAA UCP 14D0, Lo, 'N', ; ᓐ CANADIAN SYLLABICS N UCP 14D5, Lo, 'Li', ; ᓕ CANADIAN SYLLABICS LI UCP 14D6, Lo, 'Lii', ; ᓖ CANADIAN SYLLABICS LII UCP 14D7, Lo, 'Lo', ; ᓗ CANADIAN SYLLABICS LO UCP 14D8, Lo, 'Loo', ; ᓘ CANADIAN SYLLABICS LOO UCP 14DA, Lo, 'La', ; ᓚ CANADIAN SYLLABICS LA UCP 14DB, Lo, 'Laa', ; ᓛ CANADIAN SYLLABICS LAA UCP 14EA, Lo, 'L', ; ᓪ CANADIAN SYLLABICS L UCP 14EF, Lo, 'Si', ; ᓯ CANADIAN SYLLABICS SI UCP 14F0, Lo, 'Sii', ; ᓰ CANADIAN SYLLABICS SII UCP 14F1, Lo, 'So', ; ᓱ CANADIAN SYLLABICS SO UCP 14F2, Lo, 'Soo', ; ᓲ CANADIAN SYLLABICS SOO UCP 14F4, Lo, 'Sa', ; ᓴ CANADIAN SYLLABICS SA UCP 14F5, Lo, 'Saa', ; ᓵ CANADIAN SYLLABICS SAA UCP 1505, Lo, 'S', ; ᔅ CANADIAN SYLLABICS S UCP 1528, Lo, 'Yi', ; ᔨ CANADIAN SYLLABICS YI UCP 1529, Lo, 'Yii', ; ᔩ CANADIAN SYLLABICS YII UCP 152A, Lo, 'Yo', ; ᔪ CANADIAN SYLLABICS YO UCP 152B, Lo, 'Yoo', ; ᔫ CANADIAN SYLLABICS YOO UCP 152D, Lo, 'Ya', ; ᔭ CANADIAN SYLLABICS YA UCP 152E, Lo, 'Yaa', ; ᔮ CANADIAN SYLLABICS YAA UCP 153E, Lo, 'Y', ; ᔾ CANADIAN SYLLABICS Y UCP 1546, 
Lo, 'Ri', ; ᕆ CANADIAN SYLLABICS RI UCP 1547, Lo, 'Rii', ; ᕇ CANADIAN SYLLABICS RII UCP 1548, Lo, 'Ro', ; ᕈ CANADIAN SYLLABICS RO UCP 1549, Lo, 'Roo', ; ᕉ CANADIAN SYLLABICS ROO UCP 154B, Lo, 'Ra', ; ᕋ CANADIAN SYLLABICS RA UCP 154C, Lo, 'Raa', ; ᕌ CANADIAN SYLLABICS RAA UCP 1550, Lo, 'R', ; ᕐ CANADIAN SYLLABICS R UCP 1555, Lo, 'Fi', ; ᕕ CANADIAN SYLLABICS FI UCP 1556, Lo, 'Fii', ; ᕖ CANADIAN SYLLABICS FII UCP 1557, Lo, 'Fo', ; ᕗ CANADIAN SYLLABICS FO UCP 1558, Lo, 'Foo', ; ᕘ CANADIAN SYLLABICS FOO UCP 1559, Lo, 'Fa', ; ᕙ CANADIAN SYLLABICS FA UCP 155A, Lo, 'Faa', ; ᕚ CANADIAN SYLLABICS FAA UCP 155D, Lo, 'F', ; ᕝ CANADIAN SYLLABICS F UCP 157C, Lo, 'H', ; ᕼ CANADIAN SYLLABICS NUNAVUT H UCP 157F, Lo, 'Qi', ; ᕿ CANADIAN SYLLABICS QI UCP 1580, Lo, 'Qii', ; ᖀ CANADIAN SYLLABICS QII UCP 1581, Lo, 'Qo', ; ᖁ CANADIAN SYLLABICS QO UCP 1582, Lo, 'Qoo', ; ᖂ CANADIAN SYLLABICS QOO UCP 1583, Lo, 'Qa', ; ᖃ CANADIAN SYLLABICS QA UCP 1584, Lo, 'Qaa', ; ᖄ CANADIAN SYLLABICS QAA UCP 1585, Lo, 'Q', ; ᖅ CANADIAN SYLLABICS Q UCP 158F, Lo, 'Ngi', ; ᖏ CANADIAN SYLLABICS NGI UCP 1590, Lo, 'Ngii', ; ᖐ CANADIAN SYLLABICS NGII UCP 1591, Lo, 'Ngo', ; ᖑ CANADIAN SYLLABICS NGO UCP 1592, Lo, 'Ngoo', ; ᖒ CANADIAN SYLLABICS NGOO UCP 1593, Lo, 'Nga', ; ᖓ CANADIAN SYLLABICS NGA UCP 1594, Lo, 'Ngaa', ; ᖔ CANADIAN SYLLABICS NGAA UCP 1595, Lo, 'Ng', ; ᖕ CANADIAN SYLLABICS NG UCP 1596, Lo, 'Nng', ; ᖖ CANADIAN SYLLABICS NNG UCP 15A0, Lo, 'Lhi', ; ᖠ CANADIAN SYLLABICS LHI UCP 15A1, Lo, 'Lhii', ; ᖡ CANADIAN SYLLABICS LHII UCP 15A2, Lo, 'Lho', ; ᖢ CANADIAN SYLLABICS LHO UCP 15A3, Lo, 'Lhoo', ; ᖣ CANADIAN SYLLABICS LHOO UCP 15A4, Lo, 'Lha', ; ᖤ CANADIAN SYLLABICS LHA UCP 15A5, Lo, 'Lhaa', ; ᖥ CANADIAN SYLLABICS LHAA UCP 15A6, Lo, 'Lh', ; ᖦ CANADIAN SYLLABICS LH UCP 1671, Lo, 'Nngi', ; ᙱ CANADIAN SYLLABICS NNGI UCP 1672, Lo, 'Ngii', ; ᙲ CANADIAN SYLLABICS NNGII UCP 1673, Lo, 'Nngo', ; ᙳ CANADIAN SYLLABICS NNGO UCP 1674, Lo, 'Ngoo', ; ᙴ CANADIAN SYLLABICS NNGOO UCP 1675, Lo, 'Nnga', ; ᙵ CANADIAN SYLLABICS NNGA 
UCP 1676, Lo, 'Ngaa', ; ᙶ CANADIAN SYLLABICS NNGAA UCP 1E02, Lu, 'B', ; Ḃ LATIN CAPITAL LETTER B WITH DOT ABOVE UCP 1E03, Ll, 'b', ; ḃ LATIN SMALL LETTER B WITH DOT ABOVE UCP 1E0A, Lu, 'D', ; Ḋ LATIN CAPITAL LETTER D WITH DOT ABOVE UCP 1E0B, Ll, 'd', ; ḋ LATIN SMALL LETTER D WITH DOT ABOVE UCP 1E1E, Lu, 'F', ; Ḟ LATIN CAPITAL LETTER F WITH DOT ABOVE UCP 1E1F, Ll, 'f', ; ḟ LATIN SMALL LETTER F WITH DOT ABOVE UCP 1E40, Lu, 'M', ; Ṁ LATIN CAPITAL LETTER M WITH DOT ABOVE UCP 1E41, Ll, 'm', ; ṁ LATIN SMALL LETTER M WITH DOT ABOVE UCP 1E56, Lu, 'P', ; Ṗ LATIN CAPITAL LETTER P WITH DOT ABOVE UCP 1E57, Ll, 'p', ; ṗ LATIN SMALL LETTER P WITH DOT ABOVE UCP 1E60, Lu, 'S', ; Ṡ LATIN CAPITAL LETTER S WITH DOT ABOVE UCP 1E61, Ll, 's', ; ṡ LATIN SMALL LETTER S WITH DOT ABOVE UCP 1E6A, Lu, 'T', ; Ṫ LATIN CAPITAL LETTER T WITH DOT ABOVE UCP 1E6B, Ll, 't', ; ṫ LATIN SMALL LETTER T WITH DOT ABOVE UCP 1E80, Lu, 'W', ; Ẁ LATIN CAPITAL LETTER W WITH GRAVE UCP 1E81, Ll, 'w', ; ẁ LATIN SMALL LETTER W WITH GRAVE UCP 1E82, Lu, 'W', ; Ẃ LATIN CAPITAL LETTER W WITH ACUTE UCP 1E83, Ll, 'w', ; ẃ LATIN SMALL LETTER W WITH ACUTE UCP 1E84, Lu, 'W', ; Ẅ LATIN CAPITAL LETTER W WITH DIAERESIS UCP 1E85, Ll, 'w', ; ẅ LATIN SMALL LETTER W WITH DIAERESIS UCP 1E9B, Ll, 's', ; ẛ LATIN SMALL LETTER LONG S WITH DOT ABOVE UCP 1EF2, Lu, 'Y', ; Ỳ LATIN CAPITAL LETTER Y WITH GRAVE UCP 1EF3, Ll, 'y', ; ỳ LATIN SMALL LETTER Y WITH GRAVE UCP 2002, Zs, ' ', ensp ; EN SPACE UCP 2003, Zs, ' ', emsp ; EM SPACE UCP 2004, Zs, ' ', emsp13 ; THREE-PER-EM SPACE UCP 2005, Zs, ' ', emsp14 ; FOUR-PER-EM SPACE UCP 2007, Zs, ' ', numsp ; FIGURE SPACE UCP 2008, Zs, ' ', puncsp ; PUNCTUATION SPACE UCP 2009, Zs, ' ', thinsp ; THIN SPACE UCP 200A, Zs, ' ', hairsp ; HAIR SPACE UCP 200B, Cf, '', ; ZERO WIDTH SPACE UCP 200C, Cf, '', zwnj ; ZERO WIDTH NON-JOINER UCP 200D, Cf, '', zwj ; ZERO WIDTH JOINER UCP 200E, Cf, '', lrm ; LEFT-TO-RIGHT MARK UCP 200F, Cf, '', rlm ; RIGHT-TO-LEFT MARK UCP 2010, Cf, '', dash ; ‐ HYPHEN UCP 2013, Pd, 
'-', ndash ; – EN DASH UCP 2014, Pd, '-', mdash ; — EM DASH UCP 2015, Pd, '-', horbar ; ― HORIZONTAL BAR UCP 2016, Pd, '|', verbar ; ― DOUBLE VERTICAL LINE UCP 2017, Po, '_', ; ‗ DOUBLE LOW LINE UCP 2018, Pi, "'", lsquo ; ‘ LEFT SINGLE QUOTATION MARK UCP 2019, Pf, "'", rsquo ; ’ RIGHT SINGLE QUOTATION MARK UCP 201A, Ps, "'", sbquo ; ‚ SINGLE LOW-9 QUOTATION MARK UCP 201C, Pi, '"', ldquo ; “ LEFT DOUBLE QUOTATION MARK UCP 201D, Pf, '"', rdquo ; ” RIGHT DOUBLE QUOTATION MARK UCP 201E, Ps, '"', bdquo ; „ DOUBLE LOW-9 QUOTATION MARK UCP 2020, Po, '+', dagger ; † DAGGER UCP 2021, Po, '+', Dagger ; ‡ DOUBLE DAGGER UCP 2022, Po, '.', bull ; • BULLET UCP 2025, Po, '..', nldr ; ‥ TWO DOT LEADER UCP 2026, Po, '...', hellip ; … HORIZONTAL ELLIPSIS UCP 202A, Cf, '', ; LEFT-TO-RIGHT EMBEDDING UCP 202B, Cf, '', ; RIGHT-TO-LEFT EMBEDDING UCP 202C, Cf, '', ; POP DIRECTIONAL FORMATTING UCP 202D, Cf, '', ; LEFT-TO-RIGHT OVERRIDE UCP 202E, Cf, '', ; RIGHT-TO-LEFT OVERRIDE UCP 2030, Po, '%', permil ; ‰ PER MILLE SIGN UCP 2032, Po, "'", prime ; ′ PRIME UCP 2033, Po, '"', Prime ; ″ DOUBLE PRIME UCP 2034, Po, '"', tprime ; ‴ TRIPLE PRIME UCP 2035, Po, '`', bprime ; ‵ REVERSED PRIME UCP 2039, Pi, '<', lsaquo ; ‹ SINGLE LEFT-POINTING ANGLE QUOTATION MARK UCP 203A, Pf, '>', rsaquo ; › SINGLE RIGHT-POINTING ANGLE QUOTATION MARK UCP 203E, Po, '_', oline ; ‾ OVERLINE UCP 2041, So, '^', caret ; ⁁ CARET INSERTION POINT UCP 2043, So, '.', hybull ; ⁃ HYPHEN BULLET UCP 2044, Sm, '/', frasl ; ⁄ FRACTION SLASH UCP 204A, Po, '@', ; ⁊ TIRONIAN SIGN ET UCP 204F, Po, ';', bsemi ; ⁏ REVERSED SEMICOLON UCP 2060, Cf, '', nobreak ; WORD JOINER UCP 2063, Cf, '', ic ; INVISIBLE SEPARATOR UCP 207F, Lm, '`', ; ⁿ SUPERSCRIPT LATIN SMALL LETTER N UCP 20A7, Sc, '$', ; ₧ PESETA SIGN UCP 20AA, Sc, '$', ; ₪ NEW SHEQEL SIGN UCP 20AB, Sc, '$', ; ₫ DONG SIGN UCP 20AC, Sc, '$', euro ; € EURO SIGN UCP 20AF, Sc, '$', ; ₯ DRACHMA SIGN UCP 2105, So, '%', incare ; ℅ CARE OF UCP 2113, Ll, 'l', ell ; ℓ SCRIPT SMALL L UCP 2116, 
So, 'N', numero ; № NUMERO SIGN UCP 2122, So, '(TM)',trade ; ™ TRADE MARK SIGN UCP 2126, Lu, 'O', ohm ; Ω OHM SIGN UCP 2190, Sm, '<', larr ; ← LEFTWARDS ARROW UCP 2191, Sm, '^', uarr ; ↑ UPWARDS ARROW UCP 2192, Sm, '>', rarr ; → RIGHTWARDS ARROW UCP 2193, Sm, 'v', darr ; ↓ DOWNWARDS ARROW UCP 2194, Sm, '-', harr ; ↔ LEFT RIGHT ARROW UCP 2195, So, '|', varr ; ↕ UP DOWN ARROW UCP 21B5, So, '<', crarr ; ↵ DOWNWARDS ARROW WITH CORNER LEFTWARDS UCP 2200, Sm, 'v', forall ; ∀ FOR ALL UCP 2202, Sm, 'd', part ; ∂ PARTIAL DIFFERENTIAL UCP 2203, Sm, 'E', exist ; ∃ THERE EXIST UCP 2205, Sm, '/', empty ; ∅ EMPTY SET UCP 2206, Sm, '#', ; ∆ INCREMENT UCP 2207, Sm, '.', nabla ; ∇ NABLA UCP 2208, Sm, 'E', isin ; ∈ ELEMENT OF UCP 2209, Sm, '/', notin ; ∉ NOT AN ELEMENT OF UCP 220B, Sm, 'E', ni ; ∋ CONTAINS AS MEMBER UCP 220F, Sm, '#', prod ; ∏ N-ARY PRODUCT UCP 2211, Sm, '#', sum ; ∑ N-ARY SUMMATION UCP 2212, Sm, '-', minus ; − MINUS SIGN UCP 2217, Sm, '*', lowast ; ∗ ASTERISK OPERATOR UCP 2219, Sm, '.', ; ∙ BULLET OPERATOR UCP 221A, Sm, '#', radic ; √ SQUARE ROOT UCP 221D, Sm, 'o', prop ; ∝ PROPORTIONAL TO UCP 221E, Sm, '#', infin ; ∞ INFINITY UCP 2220, Sm, '<', ang ; ∠ ANGLE UCP 2227, Sm, '&', and ; ∧ LOGICAL AND UCP 2228, Sm, '|', or ; ∨ LOGICAL OR UCP 2229, Sm, '#', cap ; ∩ INTERSECTION UCP 222A, Sm, 'U', cup ; ∪ UNION UCP 222B, Sm, '/', int ; ∫ INTEGRAL UCP 2234, Sm, '.', there4 ; ∴ THEREFORE UCP 223C, Sm, '~', sim ; ∼ TILDE OPERATOR UCP 2245, Sm, '~', cong ; ≅ APPROXIMATELY EQUAL TO UCP 2248, Sm, '=', asymp ; ≈ ALMOST EQUAL TO UCP 2260, Sm, '=', ne ; ≠ NOT EQUAL TO UCP 2261, Sm, '=', equiv ; ≡ IDENTICAL TO UCP 2264, Sm, '<', le ; ≤ LESS-THAN OR EQUAL TO UCP 2265, Sm, '>', ge ; ≥ GREATER-THAN OR EQUAL TO UCP 2282, Sm, '<', sub ; ⊂ SUBSET OF UCP 2283, Sm, '>', sup ; ⊃ SUPERSET OF UCP 2284, Sm, '/', nsub ; ⊄ NOT SUBSET OF UCP 2286, Sm, '<=', sube ; ⊆ SUBSET OR EQUAL TO UCP 2287, Sm, '=>', supe ; ⊇ SUPERSET OR EQUAL TO UCP 2295, Sm, '+', oplus ; ⊕ CIRCLED PLUS UCP 2296, Sm, '-', 
ominus ; ⊖ CIRCLED MINUS UCP 2297, Sm, '.', otimes ; ⊗ CIRCLED TIMES UCP 22A5, Sm, '_', perp ; ⊥ UP TACK UCP 22C5, Sm, '.', sdot ; ⋅ DOT OPERATOR UCP 2308, Sm, '|', lceil ; ⌈ LEFT CEILING UCP 2309, Sm, '|', rceil ; ⌉ RIGHT CEILING UCP 230A, Sm, '|', lfloor ; ⌊ LEFT FLOOR UCP 230B, Sm, '|', rfloor ; ⌋ RIGHT FLOOR UCP 2310, So, '^', bnot ; ⌐ REVERSED NOT SIGN UCP 2320, Sm, '/', ; ⌠ TOP HALF INTEGRAL UCP 2321, Sm, '/', ; ⌡ BOTTOM HALF INTEGRAL UCP 2500, So, '-', boxh ; ─ BOX DRAWINGS LIGHT HORIZONTAL UCP 2502, So, '|', boxv ; │ BOX DRAWINGS LIGHT VERTICAL UCP 250C, So, '+', boxdr ; ┌ BOX DRAWINGS LIGHT DOWN AND RIGHT UCP 2510, So, '+', boxdl ; ┐ BOX DRAWINGS LIGHT DOWN AND LEFT UCP 2514, So, '+', boxur ; └ BOX DRAWINGS LIGHT UP AND RIGHT UCP 2518, So, '+', boxul ; ┘ BOX DRAWINGS LIGHT UP AND LEFT UCP 251C, So, '+', boxvr ; ├ BOX DRAWINGS LIGHT VERTICAL AND RIGHT UCP 2524, So, '+', boxvl ; ┤ BOX DRAWINGS LIGHT VERTICAL AND LEFT UCP 252C, So, '+', boxhd ; ┬ BOX DRAWINGS LIGHT DOWN AND HORIZONTAL UCP 2534, So, '+', boxhu ; ┴ BOX DRAWINGS LIGHT UP AND HORIZONTAL UCP 253C, So, '+', boxvh ; ┼ BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL UCP 2550, So, '-', Boxh ; ═ BOX DRAWINGS DOUBLE HORIZONTAL UCP 2551, So, '|', Boxv ; ║ BOX DRAWINGS DOUBLE VERTICAL UCP 2552, So, '+', boxdr ; ╒ BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE UCP 2553, So, '+', boxdr ; ╓ BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE UCP 2554, So, '+', Boxdr ; ╔ BOX DRAWINGS DOUBLE DOWN AND RIGHT UCP 2555, So, '+', boxdl ; ╕ BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE UCP 2556, So, '+', Boxdl ; ╖ BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE UCP 2557, So, '+', Boxdl ; ╗ BOX DRAWINGS DOUBLE DOWN AND LEFT UCP 2558, So, '+', boxur ; ╘ BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE UCP 2559, So, '+', boxur ; ╙ BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE UCP 255A, So, '+', Boxur ; ╚ BOX DRAWINGS DOUBLE UP AND RIGHT UCP 255B, So, '+', boxul ; ╛ BOX DRAWINGS UP SINGLE AND LEFT DOUBLE UCP 255C, So, '+', boxul ; ╜ BOX DRAWINGS UP DOUBLE AND LEFT 
SINGLE UCP 255D, So, '+', Boxul ; ╝ BOX DRAWINGS DOUBLE UP AND LEFT UCP 255E, So, '+', boxvr ; ╞ BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE UCP 255F, So, '+', boxvr ; ╟ BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE UCP 2560, So, '+', Boxvr ; ╠ BOX DRAWINGS DOUBLE VERTICAL AND RIGHT UCP 2561, So, '+', boxvl ; ╡ BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE UCP 2562, So, '+', boxvl ; ╢ BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE UCP 2563, So, '+', Boxvl ; ╣ BOX DRAWINGS DOUBLE VERTICAL AND LEFT UCP 2564, So, '+', boxhd ; ╤ BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE UCP 2565, So, '+', boxhd ; ╥ BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE UCP 2566, So, '+', Boxhd ; ╦ BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL UCP 2567, So, '+', boxhu ; ╧ BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE UCP 2568, So, '+', boxhu ; ╨ BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE UCP 2569, So, '+', Boxhu ; ╩ BOX DRAWINGS DOUBLE UP AND HORIZONTAL UCP 256A, So, '+', boxvh ; ╪ BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE UCP 256B, So, '+', boxvh ; ╫ BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE UCP 256C, So, '+', Boxvh ; ╬ BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL UCP 2580, So, '*', uhblk ; ▀ UPPER HALF BLOCK UCP 2584, So, '*', lhblk ; ▄ LOWER HALF BLOCK UCP 2588, So, '*', block ; █ FULL BLOCK UCP 258C, So, '*', ; ▌ LEFT HALF BLOCK UCP 2590, So, '*', ; ▐ RIGHT HALF BLOCK UCP 2591, So, '*', blk14 ; ░ LIGHT SHADE UCP 2592, So, '*', blk12 ; ▒ MEDIUM SHADE UCP 2593, So, '*', blk34 ; ▓ DARK SHADE UCP 25A0, So, '*', ; ■ BLACK SQUARE UCP 25A1, So, '*', square ; □ WHITE SQUARE UCP 25AA, So, '*', squarf ; ▪ BLACK SMALL SQUARE UCP 25CA, So, '#', loz ; ◊ LOZENGE UCP 2618, So, '*', ; ☘ SHAMROCK UCP 2640, So, '*', female ; ♀ FEMALE SIGN UCP 2642, So, '*', male ; ♂ MALE SIGN UCP 2660, So, '*', spades ; ♠ BLACK SPADE SUIT UCP 2663, So, '*', clubs ; ♣ BLACK CLUB SUIT UCP 2665, So, '*', hearts ; ♥ BLACK HEART SUIT UCP 2666, So, '*', diams ; ♦ BLACK DIAMOND SUIT UCP 274A, So, '*', ; ❊ EIGHT 
TEARDROP-SPOKED PROPELLER ASTERISK UCP F8FF, Co, '*', ; Private Use, Last UCP FB01, Ll, 'fi', ; fi LATIN SMALL LIGATURE FI UCP FB02, Ll, 'fl', ; fl LATIN SMALL LIGATURE FL UCP FB2A, Lo, 'Sh', ; שׁ HEBREW LETTER SHIN WITH SHIN DOT UCP FB2B, Lo, 'Sh', ; שׂ HEBREW LETTER SHIN WITH SIN DOT UCP FB35, Lo, 'V', ; וּ HEBREW LETTER VAV WITH DAGESH UCP FB4B, Lo, 'V', ; וֹ HEBREW LETTER VAV WITH HOLAM UCP FB56, Lo, 'P', ; ﭖ ARABIC LETTER PEH ISOLATED FORM UCP FB58, Lo, 'P', ; ﭘ ARABIC LETTER PEH INITIAL FORM UCP FB66, Lo, 'T', ; ﭦ ARABIC LETTER TTEH ISOLATED FORM UCP FB68, Lo, 'T', ; ﭨ ARABIC LETTER TTEH INITIAL FORM UCP FB7A, Lo, 'Ch', ; ﭺ ARABIC LETTER TCHEH ISOLATED FORM UCP FB7C, Lo, 'Ch', ; ﭼ ARABIC LETTER TCHEH INITIAL FORM UCP FB84, Lo, 'D', ; ﮄ ARABIC LETTER DAHAL ISOLATED FORM UCP FB88, Lo, 'D', ; ﮈ ARABIC LETTER DDAL ISOLATED FORM UCP FB8A, Lo, 'J', ; ﮊ ARABIC LETTER JEH ISOLATED FORM UCP FB8C, Lo, 'R', ; ﮌ ARABIC LETTER RREH ISOLATED FORM UCP FB8E, Lo, 'K', ; ﮎ ARABIC LETTER KEHEH ISOLATED FORM UCP FB92, Lo, 'G', ; ﮒ ARABIC LETTER GAF ISOLATED FORM UCP FB94, Lo, 'G', ; ﮔ ARABIC LETTER GAF INITIAL FORM UCP FB9E, Lo, 'N', ; ﮞ ARABIC LETTER NOON GHUNNA ISOLATED FORM UCP FBA6, Lo, 'H', ; ﮦ ARABIC LETTER HEH GOAL ISOLATED FORM UCP FBA8, Lo, 'H', ; ﮨ ARABIC LETTER HEH GOAL INITIAL FORM UCP FBA9, Lo, 'H', ; ﮩ ARABIC LETTER HEH GOAL MEDIAL FORM UCP FBAA, Lo, 'H', ; ﮪ ARABIC LETTER HEH DOACHASHMEE ISOLATED FORM UCP FBAE, Lo, 'Ye', ; ﮮ ARABIC LETTER YEH BARREE ISOLATED FORM UCP FBB0, Lo, 'Ye', ; ﮰ ARABIC LETTER YEH BARREE WITH HAMZA ABOVE ISOLATED FORM UCP FBFC, Lo, 'Ye', ; ﯼ ARABIC LETTER FARSI YEH ISOLATED FORM UCP FBFD, Lo, 'Ye', ; ﯽ ARABIC LETTER FARSI YEH FINAL FORM UCP FBFE, Lo, 'Ye', ; ﯾ ARABIC LETTER FARSI YEH INITIAL FORM UCP FE7C, Lo, 'Sh', ; ﹼ ARABIC SHADDA ISOLATED FORM UCP FE7D, Lo, '`', ; ﹽ ARABIC SHADDA MEDIAL FORM UCP FE80, Lo, "'", ; ﺀ ARABIC LETTER HAMZA ISOLATED FORM UCP FE81, Lo, 'A', ; ﺁ ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM UCP FE82, Lo, 
'A', ; ﺂ ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM UCP FE83, Lo, 'A', ; ﺃ ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM UCP FE84, Lo, 'A', ; ﺄ ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM UCP FE85, Lo, 'W', ; ﺅ ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM UCP FE89, Lo, 'Ye', ; ﺉ ARABIC LETTER YEH WITH HAMZA ABOVE ISOLATED FORM UCP FE8A, Lo, 'Ye', ; ﺊ ARABIC LETTER YEH WITH HAMZA ABOVE FINAL FORM UCP FE8B, Lo, 'Y', ; ﺋ ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM UCP FE8D, Lo, 'A', ; ﺍ ARABIC LETTER ALEF ISOLATED FORM UCP FE8E, Lo, 'A', ; ﺎ ARABIC LETTER ALEF FINAL FORM UCP FE8F, Lo, 'B', ; ﺏ ARABIC LETTER BEH ISOLATED FORM UCP FE91, Lo, 'B', ; ﺑ ARABIC LETTER BEH INITIAL FORM UCP FE93, Lo, 'T', ; ﺓ ARABIC LETTER TEH MARBUTA ISOLATED FORM UCP FE95, Lo, 'T', ; ﺕ ARABIC LETTER TEH ISOLATED FORM UCP FE97, Lo, 'T', ; ﺗ ARABIC LETTER TEH INITIAL FORM UCP FE99, Lo, 'Th', ; ﺙ ARABIC LETTER THEH ISOLATED FORM UCP FE9B, Lo, 'Th', ; ﺛ ARABIC LETTER THEH INITIAL FORM UCP FE9D, Lo, 'J', ; ﺝ ARABIC LETTER JEEM ISOLATED FORM UCP FE9F, Lo, 'J', ; ﺟ ARABIC LETTER JEEM INITIAL FORM UCP FEA1, Lo, 'H', ; ﺡ ARABIC LETTER HAH ISOLATED FORM UCP FEA3, Lo, 'H', ; ﺣ ARABIC LETTER HAH INITIAL FORM UCP FEA5, Lo, 'Kh', ; ﺥ ARABIC LETTER KHAH ISOLATED FORM UCP FEA7, Lo, 'Kh', ; ﺧ ARABIC LETTER KHAH INITIAL FORM UCP FEA9, Lo, 'D', ; ﺩ ARABIC LETTER DAL ISOLATED FORM UCP FEAB, Lo, 'Dh', ; ﺫ ARABIC LETTER THAL ISOLATED FORM UCP FEAD, Lo, 'R', ; ﺭ ARABIC LETTER REH ISOLATED FORM UCP FEAF, Lo, 'Z', ; ﺯ ARABIC LETTER ZAIN ISOLATED FORM UCP FEB1, Lo, 'S', ; ﺱ ARABIC LETTER SEEN ISOLATED FORM UCP FEB3, Lo, 'S', ; ﺳ ARABIC LETTER SEEN INITIAL FORM UCP FEB5, Lo, 'Sh', ; ﺵ ARABIC LETTER SHEEN ISOLATED FORM UCP FEB7, Lo, 'Sh', ; ﺷ ARABIC LETTER SHEEN INITIAL FORM UCP FEB9, Lo, 'S', ; ﺹ ARABIC LETTER SAD ISOLATED FORM UCP FEBB, Lo, 'S', ; ﺻ ARABIC LETTER SAD INITIAL FORM UCP FEBD, Lo, 'D', ; ﺽ ARABIC LETTER DAD ISOLATED FORM UCP FEBF, Lo, 'D', ; ﺿ ARABIC LETTER DAD INITIAL FORM UCP FEC1, 
Lo, 'T', ; ﻁ ARABIC LETTER TAH ISOLATED FORM UCP FEC3, Lo, 'T', ; ﻃ ARABIC LETTER TAH INITIAL FORM UCP FEC5, Lo, 'Z', ; ﻅ ARABIC LETTER ZAH ISOLATED FORM UCP FEC7, Lo, 'Z', ; ﻇ ARABIC LETTER ZAH INITIAL FORM UCP FEC9, Lo, "'", ; ﻉ ARABIC LETTER AIN ISOLATED FORM UCP FECA, Lo, "'", ; ﻊ ARABIC LETTER AIN FINAL FORM UCP FECB, Lo, "'", ; ﻋ ARABIC LETTER AIN INITIAL FORM UCP FECC, Lo, "'", ; ﻌ ARABIC LETTER AIN MEDIAL FORM UCP FECD, Lo, 'Gh', ; ﻍ ARABIC LETTER GHAIN ISOLATED FORM UCP FECE, Lo, 'Gh', ; ﻎ ARABIC LETTER GHAIN FINAL FORM UCP FECF, Lo, 'Gh', ; ﻏ ARABIC LETTER GHAIN INITIAL FORM UCP FED0, Lo, 'Gh', ; ﻐ ARABIC LETTER GHAIN MEDIAL FORM UCP FED1, Lo, 'F', ; ﻑ ARABIC LETTER FEH ISOLATED FORM UCP FED3, Lo, 'F', ; ﻓ ARABIC LETTER FEH INITIAL FORM UCP FED5, Lo, 'Q', ; ﻕ ARABIC LETTER QAF ISOLATED FORM UCP FED7, Lo, 'Q', ; ﻗ ARABIC LETTER QAF INITIAL FORM UCP FED9, Lo, 'K', ; ﻙ ARABIC LETTER KAF ISOLATED FORM UCP FEDB, Lo, 'K', ; ﻛ ARABIC LETTER KAF INITIAL FORM UCP FEDD, Lo, 'L', ; ﻝ ARABIC LETTER LAM ISOLATED FORM UCP FEDF, Lo, 'L', ; ﻟ ARABIC LETTER LAM INITIAL FORM UCP FEE0, Lo, 'L', ; ﻠ ARABIC LETTER LAM MEDIAL FORM UCP FEE1, Lo, 'M', ; ﻡ ARABIC LETTER MEEM ISOLATED FORM UCP FEE3, Lo, 'M', ; ﻣ ARABIC LETTER MEEM INITIAL FORM UCP FEE5, Lo, 'N', ; ﻥ ARABIC LETTER NOON ISOLATED FORM UCP FEE7, Lo, 'N', ; ﻧ ARABIC LETTER NOON INITIAL FORM UCP FEE9, Lo, 'H', ; ﻩ ARABIC LETTER HEH ISOLATED FORM UCP FEEB, Lo, 'H', ; ﻫ ARABIC LETTER HEH INITIAL FORM UCP FEEC, Lo, 'H', ; ﻬ ARABIC LETTER HEH MEDIAL FORM UCP FEED, Lo, 'W', ; ﻭ ARABIC LETTER WAW ISOLATED FORM UCP FEEF, Lo, 'A', ; ﻯ ARABIC LETTER ALEF MAKSURA ISOLATED FORM UCP FEF0, Lo, 'A', ; ﻰ ARABIC LETTER ALEF MAKSURA FINAL FORM UCP FEF1, Lo, 'Y', ; ﻱ ARABIC LETTER YEH ISOLATED FORM UCP FEF2, Lo, 'Y', ; ﻲ ARABIC LETTER YEH FINAL FORM UCP FEF3, Lo, 'Y', ; ﻳ ARABIC LETTER YEH INITIAL FORM UCP FEF5, Lo, 'LA', ; ﻵ ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM UCP FEF6, Lo, 'LA', ; ﻶ ARABIC LIGATURE LAM WITH 
ALEF WITH MADDA ABOVE FINAL FORM UCP FEF7, Lo, 'LA', ; ﻷ ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM UCP FEF8, Lo, 'LA', ; ﻸ ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM UCP FEFB, Lo, 'LA', ; ﻻ ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM UCP FEFC, Lo, 'LA', ; ﻼ ARABIC LIGATURE LAM WITH ALEF FINAL FORM UCP FEFF, Bm, '', ; ZERO WIDTH NO-BREAK SPACE UCP FFFD, ??, '?', ; ? Replacement of malformed input character. UCP FFFE, ??, '', ; Character is undefined in output encoding. UCP FFFF, ??, '' , ; Not a valid Unicode character
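Each UCP record above carries four fields: the code point, a category, a Latin transliteration and an optional HTML entity name. The following Python sketch models a handful of those records to show how such a record might be consulted. It is only an illustration, not the program's actual lookup routine: the names `Ucp`, `UCP_DB`, `transliterate` and `html_entity` are hypothetical, and both the `?` returned for an unlisted code point and the numeric-entity fallback are assumptions.

```python
from typing import Dict, NamedTuple


class Ucp(NamedTuple):
    codepoint: int   # Unicode code point from the Basic Multilingual Plane
    category: str    # category column as written in the table, e.g. "Sm", "So"
    translit: str    # Latin transliteration, may be empty
    entity: str      # HTML entity name without "&" and ";", may be empty


# A few records copied from the UCP table above.
RECORDS = (
    Ucp(0x22A5, "Sm", "_", "perp"),
    Ucp(0x2500, "So", "-", "boxh"),
    Ucp(0x2588, "So", "*", "block"),
    Ucp(0xFB01, "Ll", "fi", ""),
    Ucp(0xFEFF, "Bm", "", ""),
)
UCP_DB: Dict[int, Ucp] = {r.codepoint: r for r in RECORDS}


def transliterate(codepoint: int) -> str:
    """Return the Latin stand-in of a listed character.

    Assumption: an unlisted code point yields "?", mirroring the
    transliteration given for UCP FFFD in the table.
    """
    rec = UCP_DB.get(codepoint)
    return rec.translit if rec is not None else "?"


def html_entity(codepoint: int) -> str:
    """Return the named entity if the record has one, else a numeric one."""
    rec = UCP_DB.get(codepoint)
    if rec is not None and rec.entity:
        return "&" + rec.entity + ";"
    return "&#x%04X;" % codepoint  # assumption: numeric fallback
```

For example, `transliterate(0xFB01)` gives `fi` and `html_entity(0x2500)` gives `&boxh;`, both taken directly from the records listed in the table.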
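Macro CP declares one code page. Its arguments are:
CPid, the code page number;
CPname, the encoding name in quotes, for instance "ISO-8859-2";
CPrem, a remark in quotes, for instance "Latin 2 (Central European)";
CPurl, an unquoted link to the reference, for instance https://en.wikipedia.org/wiki/Windows-1250;
CPtt, the translation table of 128 hexadecimal words for bytes 80..FF, omitted in ASCII and Unicode encodings.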
CP %MACRO CPid, CPname, CPrem, CPurl, CPtt
[CPid]
    DW %CPid
[CPinfo]
CPname%CPid: DB %CPname,0
[CPname]
    DW CPname%CPid: - SECTION# [CPinfo]
[CPinfo]
CPrem%CPid: DB %CPrem,0
[CPrem]
    DW CPrem%CPid: - SECTION# [CPinfo]
[CPinfo]
CPurl%CPid: DB "%CPurl",0
[CPurl]
    DW CPurl%CPid - SECTION# [CPinfo]
%TTlength %SETA %# - 4
    %IF %TTlength = 0 || %TTlength = 128
      %IF %TTlength = 128
[CPtt]
CPtt%CPid:
tt      %FOR %*{1+4..128+4}
          DW 0x%tt
        %ENDFOR tt
[CPtable]
    DW CPtt%CPid: - SECTION# [CPtt]
      %ELSE ; %TTlength=0 in ASCII or Unicode encodings.
[CPtable]
    DW -1 ; This value signalises no translation table.
      %ENDIF
    %ELSE
      %ERROR Invalid CP %CPid %CPname table (%TTlength instead of 128).
    %ENDIF
  %ENDMACRO CP
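The macro above accepts either no translation table (ASCII and Unicode encodings, stored as -1) or exactly 128 words covering bytes 80..FF, and reports an error for any other count. The Python sketch below models that rule and a table lookup. It is a rough illustration rather than the program's conversion code: `make_table`, `decode_byte` and `UNDEFINED` are hypothetical names, the pass-through of bytes below 0x80 is an assumption, and the example table contains only the 80..8F row of IBM852 with the remainder padded by FFFE purely for brevity.

```python
from typing import List, Optional, Sequence

UNDEFINED = 0xFFFE  # value the tables use for a byte with no Unicode character


def make_table(hex_words: Sequence[str]) -> Optional[List[int]]:
    """Mirror the macro's check: 0 words means no table, 128 words a full one."""
    count = len(hex_words)
    if count == 0:
        return None  # corresponds to DW -1 in section [CPtable]
    if count != 128:
        raise ValueError("Invalid table (%d instead of 128)." % count)
    return [int(word, 16) for word in hex_words]


def decode_byte(byte: int, table: List[int]) -> int:
    """Map one byte of an 8-bit encoding to its Unicode code point.

    Assumption: bytes 00..7F are plain ASCII and map to themselves,
    bytes 80..FF are looked up in the 128-word table.
    """
    if byte < 0x80:
        return byte
    return table[byte - 0x80]


# Row 80..8F of IBM852 as listed in this file; the other 112 entries are
# padded with FFFE here only to keep the example short.
IBM852_ROW_80 = ("00C7,00FC,00E9,00E2,00E4,016F,0107,00E7,"
                 "0142,00EB,0150,0151,00EE,0179,00C4,0106")
example_table = make_table(IBM852_ROW_80.split(",") + ["FFFE"] * 112)
```

With this table, `decode_byte(0x85, example_table)` yields 0x016F, the third-from-last value in the first half of the row, while an ASCII byte such as 0x41 is returned unchanged.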
; Plain ASCII 7bit encoding (American Standard Code for Information Interchange). CP 20127,"ASCII","7-bit encoding",https://en.wikipedia.org/wiki/ASCII ; ; Unicode encodings (Unicode Transformation Format) CP 65001,"UTF-8","AKA IBM1208",https://en.wikipedia.org/wiki/UTF-8 CP 1200,"UTF-16LE","UCS-2LE (used in MS Windows)",https://en.wikipedia.org/wiki/UTF-16 CP 1201,"UTF-16BE","UCS-2BE",https://en.wikipedia.org/wiki/UTF-16 CP 12000,"UTF-32LE","UCS-4LE",https://en.wikipedia.org/wiki/UTF-32 CP 12001,"UTF-32BE","UCS-4BE",https://en.wikipedia.org/wiki/UTF-32 ; ; OEM (Original Equipment Manufacturer) and ANSI (American National Standards Institute) 8bit encodings. CP 437,"IBM437","OEM-US",\ https://en.wikipedia.org/wiki/Code_page_437,\ 00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F 00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00A2,00A3,00A5,20A7,0192,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 667,"Mazovia","AKA CP790, AKA CP991 (Polish)",\ https://en.wikipedia.org/wiki/Mazovia_encoding,\ 00C7,00FC,00E9,00E2,00E4,00E0,0105,00E7,00EA,00EB,00E8,00EF,00EE,0107,00C4,0104,\ 80..8F 0118,0119,0142,00F4,00F6,0106,00FB,00F9,015A,00D6,00DC,00A2,0141,00A5,015B,0192,\ 90..9F 0179,017B,00F3,00D3,0144,0143,017A,017C,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 
2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 737,"IBM737","(Greek)",\ https://en.wikipedia.org/wiki/Code_page_737,\ 0391,0392,0393,0394,0395,0396,0397,0398,0399,039A,039B,039C,039D,039E,039F,03A0,\ 80..8F 03A1,03A3,03A4,03A5,03A6,03A7,03A8,03A9,03B1,03B2,03B3,03B4,03B5,03B6,03B7,03B8,\ 90..9F 03B9,03BA,03BB,03BC,03BD,03BE,03BF,03C0,03C1,03C3,03C2,03C4,03C5,03C6,03C7,03C8,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03C9,03AC,03AD,03AE,03CA,03AF,03CC,03CD,03CB,03CE,0386,0388,0389,038A,038C,038E,\ E0..EF 038F,00B1,2265,2264,03AA,03AB,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 775,"IBM775","(Baltic Rim)",\ https://en.wikipedia.org/wiki/Code_page_775,\ 0106,00FC,00E9,0101,00E4,0123,00E5,0107,0142,0113,0156,0157,012B,0179,00C4,00C5,\ 80..8F 00C9,00E6,00C6,014D,00F6,0122,00A2,015A,015B,00D6,00DC,00F8,00A3,00D8,00D7,00A4,\ 90..9F 0100,012A,00F3,017B,017C,017A,201D,00A6,00A9,00AE,00AC,00BD,00BC,0141,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,0104,010C,0118,0116,2563,2551,2557,255D,012E,0160,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,0172,016A,255A,2554,2569,2566,2560,2550,256C,017D,\ C0..CF 0105,010D,0119,0117,012F,0161,0173,016B,017E,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 00D3,00DF,014C,0143,00F5,00D5,00B5,0144,0136,0137,013B,013C,0146,0112,0145,2019,\ E0..EF 00AD,00B1,201C,00BE,00B6,00A7,00F7,201E,00B0,2219,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF ; CP 850,"IBM850","DOS-Latin1 (Western European)",\ https://en.wikipedia.org/wiki/Code_page_850,\ 
00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F 00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,00D7,0192,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00AE,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF 00F0,00D0,00CA,00CB,00C8,0131,00CD,00CE,00CF,2518,250C,2588,2584,00A6,00CC,2580,\ D0..DF 00D3,00DF,00D4,00D2,00F5,00D5,00B5,00FE,00DE,00DA,00DB,00D9,00FD,00DD,00AF,00B4,\ E0..EF 00AD,00B1,2017,00BE,00B6,00A7,00F7,00B8,00B0,00A8,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF ; CP 851,"IBM851","DOS-Greek-1",\ https://en.wikipedia.org/wiki/Code_page_851,\ 00C7,00FC,00E9,00E2,00E4,00E0,0386,00E7,00EA,00EB,00E8,00EF,00EE,0388,00C4,0389,\ 80..8F 038A,FFFE,038C,00F4,00F6,038E,00FB,00F9,038F,00D6,00DC,03AC,00A3,03AD,03AE,03AF,\ 90..9F 03CA,0390,03CC,03CD,0391,0392,0393,0394,0395,0396,0397,00BD,0398,0399,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,039A,039B,039C,039D,2563,2551,2557,255D,039E,039F,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,03A0,03A1,255A,2554,2569,2566,2560,2550,256C,03A3,\ C0..CF 03A4,03A5,03A6,03A7,03A8,03A9,03B1,03B2,03B3,2518,250C,2588,2584,03B4,03B5,2580,\ D0..DF 03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,03C0,03C1,03C3,03C2,03C4,0384,\ E0..EF 00AD,00B1,03C5,03C6,03C7,00A7,03C8,0385,00B0,00A8,03C9,03CB,03B0,03CE,25A0,00A0,\ F0..FF ; CP 852,"IBM852","DOS-Latin2 (Central European)",\ https://en.wikipedia.org/wiki/Code_page_852,\ 00C7,00FC,00E9,00E2,00E4,016F,0107,00E7,0142,00EB,0150,0151,00EE,0179,00C4,0106,\ 80..8F 00C9,0139,013A,00F4,00F6,013D,013E,015A,015B,00D6,00DC,0164,0165,0141,00D7,010D,\ 90..9F 00E1,00ED,00F3,00FA,0104,0105,017D,017E,0118,0119,00AC,017A,010C,015F,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,00C1,00C2,011A,015E,2563,2551,2557,255D,017B,017C,2510,\ B0..BF 
2514,2534,252C,251C,2500,253C,0102,0103,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF 0111,0110,010E,00CB,010F,0147,00CD,00CE,011B,2518,250C,2588,2584,0162,016E,2580,\ D0..DF 00D3,00DF,00D4,0143,0144,0148,0160,0161,0154,00DA,0155,0170,00FD,00DD,0163,00B4,\ E0..EF 00AD,02DD,02DB,02C7,02D8,00A7,00F7,00B8,00B0,00A8,02D9,0171,0158,0159,25A0,00A0,\ F0..FF ; CP 853,"IBM853","(Turkish, Maltese, Esperanto)",\ https://en.wikipedia.org/wiki/Code_page_853,\ 00C7,00FC,00E9,00E2,00E4,00E0,0109,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,0108,\ 80..8F 00C9,010B,010A,00F4,00F6,00F2,00FB,00F9,0130,00D6,00DC,011D,00A3,011C,00D7,0135,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,011E,011F,0124,0125,FFFE,00BD,0134,015F,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,00C1,00C2,00C0,015E,2563,2551,2557,255D,017B,017C,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,015C,015D,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF FFFE,FFFE,00CA,00CB,00C8,0131,00CD,00CE,00CF,2518,250C,2588,2584,FFFE,00CC,2580,\ D0..DF 00D3,00DF,00D4,00D2,0120,0121,00B5,0126,0127,00DA,00DB,00D9,016C,016D,00B7,00B4,\ E0..EF 00AD,FFFE,2113,0149,02D8,00A7,00F7,00B8,00B0,00A8,02D9,FFFE,00B3,00B2,25A0,00A0,\ F0..FF ; CP 855,"IBM855","(Cyrillic, Serbian, Macedonian, Bulgarian)",\ https://en.wikipedia.org/wiki/Code_page_855,\ 0452,0402,0453,0403,0451,0401,0454,0404,0455,0405,0456,0406,0457,0407,0458,0408,\ 80..8F 0459,0409,045A,040A,045B,040B,045C,040C,045E,040E,045F,040F,044E,042E,044A,042A,\ 90..9F 0430,0410,0431,0411,0446,0426,0434,0414,0435,0415,0444,0424,0433,0413,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,0445,0425,0438,0418,2563,2551,2557,255D,0439,0419,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,043A,041A,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF 043B,041B,043C,041C,043D,041D,043E,041E,043F,2518,250C,2588,2584,041F,044F,2580,\ D0..DF 042F,0440,0420,0441,0421,0442,0422,0443,0423,0436,0416,0432,0412,044C,042C,2116,\ E0..EF 00AD,044B,042B,0437,0417,0448,0428,044D,042D,0449,0429,0447,0427,00A7,25A0,00A0,\ F0..FF ; CP 
856,"IBM856","(Hebrew)",\ https://en.wikipedia.org/wiki/Code_page_856,\ 05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ 80..8F 05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,FFFE,00A3,FFFE,00D7,20AA,\ 90..9F 200E,200F,202A,202B,202D,202E,202C,FFFE,FFFE,00AE,00AC,00BD,00BC,20AC,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,FFFE,FFFE,FFFE,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,FFFE,FFFE,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,2518,250C,2588,FFFE,00A6,2590,2580,\ D0..DF FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,00AF,00B4,\ E0..EF 00AD,00B1,2017,00BE,00B6,00A7,00F7,00B8,00B0,00A8,2022,00B9,00B3,00B2,25A0,00A0,\ F0..FF ; CP 857,"IBM857","(Turkish)",\ https://en.wikipedia.org/wiki/Code_page_857,\ 00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,0131,00C4,00C5,\ 80..8F 00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,0130,00D6,00DC,00F8,00A3,00D8,015E,015F,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,011E,011F,00BF,00AE,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF 00BA,00AA,00CA,00CB,00C8,20AC,00CD,00CE,00CF,2518,250C,2588,2584,00A6,00CC,2580,\ D0..DF 00D3,00DF,00D4,00D2,00F5,00D5,00B5,FFFE,00D7,00DA,00DB,00D9,00EC,00FF,00AF,00B4,\ E0..EF 00AD,00B1,FFFE,00BE,00B6,00A7,00F7,00B8,00B0,00A8,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF ; CP 858,"IBM858","(Western European)",\ https://en.wikipedia.org/wiki/Code_page_858,\ 00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F 00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,00D7,0192,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00AE,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 
2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF 00F0,00D0,00CA,00CB,00C8,20AC,00CD,00CE,00CF,2518,250C,2588,2584,00A6,00CC,2580,\ D0..DF 00D3,00DF,00D4,00D2,00F5,00D5,00B5,00FE,00DE,00DA,00DB,00D9,00FD,00DD,00AF,00B4,\ E0..EF 00AD,00B1,2017,00BE,00B6,00A7,00F7,00B8,00B0,00A8,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF ; CP 859,"IBM859","Latin 9 (Western European)",\ https://en.wikipedia.org/wiki/Code_page_859,\ 00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F 00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,00D7,0192,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00AE,00AC,0153,0152,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,00C1,00C2,00C0,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,00E3,00C3,255A,2554,2569,2566,2560,2550,256C,00A4,\ C0..CF 00F0,00D0,00CA,00CB,00C8,20AC,00CD,00CE,00CF,2518,250C,2588,2584,0160,00CC,2580,\ D0..DF 00D3,00DF,00D4,00D2,00F5,00D5,00B5,00FE,00DE,00DA,00DB,00D9,00FD,00DD,00AF,017D,\ E0..EF 00AD,00B1,FFFE,0178,00B6,00A7,00F7,017E,00B0,0161,00B7,00B9,00B3,00B2,25A0,00A0,\ F0..FF ; CP 860,"IBM860","(Portuguese)",\ https://en.wikipedia.org/wiki/Code_page_860,\ 00C7,00FC,00E9,00E2,00E3,00E0,00C1,00E7,00EA,00CA,00E8,00CD,00D4,00EC,00C3,00C2,\ 80..8F 00C9,00C0,00C8,00F4,00F5,00F2,00DA,00F9,00CC,00D5,00DC,00A2,00A3,00D9,20A7,00D3,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,00D2,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 
2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 861,"IBM861","(Icelandic)",\ https://en.wikipedia.org/wiki/Code_page_861,\ 00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00D0,00F0,00DE,00C4,00C5,\ 80..8F 00C9,00E6,00C6,00F4,00F6,00FE,00FB,00DD,00FD,00D6,00DC,00F8,00A3,00D8,20A7,0192,\ 90..9F 00E1,00ED,00F3,00FA,00C1,00CD,00D3,00DA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 862, "IBM862","(Hebrew)",\ https://en.wikipedia.org/wiki/Code_page_862,\ 05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ 80..8F 05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,00A2,00A3,00A5,20A7,0192,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 863,"IBM863","(French Canadian)",\ https://en.wikipedia.org/wiki/Code_page_863,\ 00C7,00FC,00E9,00E2,00C2,00E0,00B6,00E7,00EA,00EB,00E8,00EF,00EE,2017,00C0,00A7,\ 80..8F 00C9,00C8,00CA,00F4,00CB,00CF,00FB,00F9,00A4,00D4,00DC,00A2,00A3,00D9,00DB,0192,\ 90..9F 
00A6,00B4,00F3,00FA,00A8,00B8,00B3,00AF,00CE,2310,00AC,00BD,00BC,00BE,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 864,"IBM864","(Arabic)",\ https://en.wikipedia.org/wiki/Code_page_864,\ 00B0,00B7,2219,221A,2592,2500,2502,253C,2524,252C,251C,2534,2510,250C,2514,2518,\ 80..8F 03B2,221E,03C6,00B1,00BD,00BC,2248,00AB,00BB,FEF7,FEF8,FFFE,FFFE,FEFB,FEFC,FFFE,\ 90..9F 00A0,00AD,FE82,00A3,00A4,FE84,FFFE,20AC,FE8E,FE8F,FE95,FE99,060C,FE9D,FEA1,FEA5,\ A0..AF 0660,0661,0662,0663,0664,0665,0666,0667,0668,0669,FED1,061B,FEB1,FEB5,FEB9,061F,\ B0..BF 00A2,FE80,FE81,FE83,FE85,FECA,FE8B,FE8D,FE91,FE93,FE97,FE9B,FE9F,FEA3,FEA7,FEA9,\ C0..CF FEAB,FEAD,FEAF,FEB3,FEB7,FEBB,FEBF,FEC1,FEC5,FECB,FECF,00A6,00AC,00F7,00D7,FEC9,\ D0..DF 0640,FED3,FED7,FEDB,FEDF,FEE3,FEE7,FEEB,FEED,FEEF,FEF3,FEBD,FECC,FECE,FECD,FEE1,\ E0..EF FE7D,0651,FEE5,FEE9,FEEC,FEF0,FEF2,FED0,FED5,FEF5,FEF6,FEDD,FED9,FEF1,25A0,FFFE,\ F0..FF ; CP 865,"IBM865","(Nordic)",\ https://en.wikipedia.org/wiki/Code_page_865,\ 00C7,00FC,00E9,00E2,00E4,00E0,00E5,00E7,00EA,00EB,00E8,00EF,00EE,00EC,00C4,00C5,\ 80..8F 00C9,00E6,00C6,00F4,00F6,00F2,00FB,00F9,00FF,00D6,00DC,00F8,00A3,00D8,20A7,0192,\ 90..9F 00E1,00ED,00F3,00FA,00F1,00D1,00AA,00BA,00BF,2310,00AC,00BD,00BC,00A1,00AB,00A4,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 
03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 866,"IBM866","AKA CP1125 (Cyrillic Russian)",\ https://en.wikipedia.org/wiki/Code_page_866,\ 0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ 80..8F 0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ 90..9F 0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ E0..EF 0401,0451,0404,0454,0407,0457,040E,045E,00B0,2219,00B7,221A,2116,00A4,25A0,00A0,\ F0..FF ; CP 867,"IBM867","(Hebrew)",\ https://en.wikipedia.org/wiki/Code_page_867,\ 05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ 80..8F 05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,00A2,00A3,00A5,FFFE,20AA,\ 90..9F 200E,200F,202A,202B,202D,202E,202C,FFFE,FFFE,2310,00AC,00BD,00BC,20AC,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 868,"IBM868","(Urdu)",\ https://en.wikipedia.org/wiki/Code_page_868,\ 06F0,06F1,06F2,06F3,06F4,06F5,06F6,06F7,06F8,06F9,060C,061B,061F,FE81,FE8D,FE8E,\ 80..8F 
FFFF,FE8F,FE91,FB56,FB58,FE93,FE95,FE97,FB66,FB68,FE99,FE9B,FE9D,FE9F,FB7A,FB7C,\ 90..9F FEA1,FEA3,FEA5,FEA7,FEA9,FB88,FEAB,FEAD,FB8C,FEAF,FB8A,FEB1,FEB3,FEB5,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,FEB7,FEB9,FEBB,FEBD,2563,2551,2557,255D,FEBF,FEC3,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,FEC7,FEC9,255A,2554,2569,2566,2560,2550,256C,FECA,\ C0..CF FECB,FECC,FECD,FECE,FECF,FED0,FED1,FED3,FED5,2518,250C,2588,2584,FED7,FB8E,2580,\ D0..DF FEDB,FB92,FB94,FEDD,FEDF,FEE0,FEE1,FEE3,FB9E,FEE5,FEE7,FE85,FEED,FBA6,FBA8,FBA9,\ E0..EF 00AD,FBAA,FE80,FE89,FE8A,FE8B,FBFC,FBFD,FBFE,FBB0,FBAE,FE7C,FE7D,FFFE,25A0,00A0,\ F0..FF ; CP 869,"IBM869","(Greek 2, Modern)",\ https://en.wikipedia.org/wiki/Code_page_869,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,0386,20AC,00B7,00AC,00A6,2018,2019,0388,2015,0389,\ 80..8F 038A,03AA,038C,FFFE,FFFE,038E,03AB,00A9,038F,00B2,00B3,03AC,00A3,03AD,03AE,03AF,\ 90..9F 03CA,0390,03CC,03CD,0391,0392,0393,0394,0395,0396,0397,00BD,0398,0399,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,039A,039B,039C,039D,2563,2551,2557,255D,039E,039F,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,03A0,03A1,255A,2554,2569,2566,2560,2550,256C,03A3,\ C0..CF 03A4,03A5,03A6,03A7,03A8,03A9,03B1,03B2,03B3,2518,250C,2588,2584,03B4,03B5,2580,\ D0..DF 03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,03C0,03C1,03C3,03C2,03C4,0384,\ E0..EF 00AD,00B1,03C5,03C6,03C7,00A7,03C8,0385,00B0,00A8,03C9,03CB,03B0,03CE,25A0,00A0,\ F0..FF ; CP 874,"IBM874","AKA ISO-8859-11, TIS-620 (Thai)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-11#Code_page_874,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0E01,0E02,0E03,0E04,0E05,0E06,0E07,0E08,0E09,0E0A,0E0B,0E0C,0E0D,0E0E,0E0F,\ A0..AF 0E10,0E11,0E12,0E13,0E14,0E15,0E16,0E17,0E18,0E19,0E1A,0E1B,0E1C,0E1D,0E1E,0E1F,\ B0..BF 0E20,0E21,0E22,0E23,0E24,0E25,0E26,0E27,0E28,0E29,0E2A,0E2B,0E2C,0E2D,0E2E,0E2F,\ C0..CF 
0E30,0E31,0E32,0E33,0E34,0E35,0E36,0E37,0E38,0E39,0E3A,0E49,0E4A,0E4B,0E4C,0E3F,\ D0..DF 0E40,0E41,0E42,0E43,0E44,0E45,0E46,0E47,0E48,0E49,0E4A,0E4B,0E4C,0E4D,0E4E,0E4F,\ E0..EF 0E50,0E51,0E52,0E53,0E54,0E55,0E56,0E57,0E58,0E59,0E5A,0E5B,00A2,00AC,00A6,00A0,\ F0..FF ; CP 878,"KOI8-R","AKA IBM878 AKA Windows-20866 (Cyrillic, Russian, Bulgarian)",\ https://en.wikipedia.org/wiki/KOI8-R,\ 2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F 2591,2592,2593,2320,25A0,2219,221A,2248,2264,2265,00A0,2321,00B0,00B2,00B7,00F7,\ 90..9F 2550,2551,2552,0451,2553,2554,2555,2556,2557,2558,2559,255A,255B,255C,255D,255E,\ A0..AF 255F,2560,2561,0401,2562,2563,2564,2565,2566,2567,2568,2569,256A,256B,256C,00A9,\ B0..BF 044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF 043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF 042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF 041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF ; CP 880,"KOI8-E","ISO-IR-111 (Belarusian, Macedonian, Serbian, Ukrainian)",\ https://en.wikipedia.org/wiki/ISO-IR-111,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0452,0453,0451,0454,0455,0456,0457,0458,0459,045A,045B,045C,00AD,045E,045F,\ A0..AF 2116,0402,0403,0401,0404,0405,0406,0407,0408,0409,040A,040B,040C,00A4,040E,040F,\ B0..BF 044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF 043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF 042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF 041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF ; CP 882,"KOI8-T","(Tajik)",\ 
https://en.wikipedia.org/wiki/KOI8-T,\ 049B,0493,201A,0492,201E,2026,2020,2021,FFFE,2030,04B3,2039,04B2,04B7,04B6,FFFE,\ 80..8F 049A,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,FFFE,203A,FFFE,FFFE,FFFE,FFFE,\ 90..9F FFFE,04EF,04EE,0451,00A4,04E2,00A6,00A7,FFFE,FFFE,FFFE,00AB,00AC,00AD,00AE,FFFE,\ A0..AF 00B0,00B1,00B2,0401,FFFE,04E1,00B6,00B7,FFFE,2122,FFFE,00BB,FFFE,FFFE,FFFE,00A9,\ B0..BF 044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF 043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF 042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF 041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF ; CP 884,"KOI8-F","Fingertip SW (Cyrillic)",\ https://en.wikipedia.org/wiki/KOI8-F,\ 2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F 2591,2018,2019,201C,201D,2022,2013,2014,00A9,2122,00A0,00BB,00AE,00AB,00B7,00A4,\ 90..9F 00A0,0452,0453,0451,0454,0455,0456,0457,0458,0459,045A,045B,045C,045D,045E,045F,\ A0..AF 2116,0402,0403,0401,0404,0405,0406,0407,0408,0409,040A,040B,040C,040D,040E,040F,\ B0..BF 044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF 043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF 042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF 041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF ; CP 885,"KOI8-CS","CSN 369103 (Czech, Slovak)",\ https://cs.wikipedia.org/wiki/KOI#KOI8-CS,\ 0411,0412,0413,00A7,00DF,0414,0401,0416,0417,0418,0419,041A,FFFE,FFFE,041B,041C,\ 80..8F 041D,041F,0420,0422,0423,0424,2588,2584,2580,0426,0427,0428,0429,042A,042B,042C,\ 90..9F 042D,042E,042F,0431,0432,0433,0434,0451,0436,0437,0438,0439,043A,00A7,043B,043C,\ A0..AF 043D,043F,0440,0442,0443,0444,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ B0..BF 
250C,00E1,2514,010D,010F,011B,0155,2500,00FC,00ED,016F,013A,013E,00F6,0148,00F3,\ C0..CF 00F4,00E4,0159,0161,0165,00FA,251C,00E9,00E0,00FD,017E,252C,2567,258C,2590,253C,\ D0..DF 2510,00C1,2518,010C,010E,011A,0154,2502,00DC,00CD,016E,0139,013D,00D6,0147,00D3,\ E0..EF 00D4,00C4,0158,0160,0164,00DA,2524,00C9,00C0,00DD,017D,2534,207F,00B7,25A0,00A0,\ F0..FF ; CP 895,"Kamenicky","AKA DOS-895 AKA KEYBCS2 (Czech, Slovak)",\ https://en.wikipedia.org/wiki/Kamenick%C3%BD_encoding,\ 010C,00FC,00E9,010F,00E4,010E,0164,010D,011B,011A,0139,00CD,013E,013A,00C4,00C1,\ 80..8F 00C9,017E,017D,00F4,00F6,00D3,016F,00DA,00FD,00D6,00DC,0160,013D,00DD,0158,0165,\ 90..9F 00E1,00ED,00F3,00FA,0148,0147,016E,00D4,0161,0159,0155,0154,00BC,00A7,00AB,00BB,\ A0..AF 2591,2592,2593,2502,2524,2561,2562,2556,2555,2563,2551,2557,255D,255C,255B,2510,\ B0..BF 2514,2534,252C,251C,2500,253C,255E,255F,255A,2554,2569,2566,2560,2550,256C,2567,\ C0..CF 2568,2564,2565,2559,2558,2552,2553,256B,256A,2518,250C,2588,2584,258C,2590,2580,\ D0..DF 03B1,00DF,0393,03C0,03A3,03C3,00B5,03C4,03A6,0398,03A9,03B4,221E,03C6,03B5,2229,\ E0..EF 2261,00B1,2265,2264,2320,2321,00F7,2248,00B0,2219,00B7,221A,207F,00B2,25A0,00A0,\ F0..FF ; CP 912,"IBM912","(Central European)",\ https://en.wikipedia.org/wiki/Code_page_912,\ 2591,2592,2593,2502,2524,2518,250C,2588,00A9,2563,2551,2557,255D,00A2,00A5,2510,\ 80..8F 2514,2534,252C,251C,2500,253C,2584,2580,255A,2554,2569,2566,2560,2550,256C,00AE,\ 90..9F 00A0,0104,02D8,0141,00A4,013D,015A,00A7,00A8,0160,015E,0164,0179,00AD,017D,017B,\ A0..AF 00B0,0105,02DB,0142,00B4,013E,015B,02C7,00B8,0161,015F,0165,017A,02DD,017E,017C,\ B0..BF 0154,00C1,00C2,0102,00C4,0139,0106,00C7,010C,00C9,0118,00CB,011A,00CD,00CE,010E,\ C0..CF 0110,0143,0147,00D3,00D4,0150,00D6,00D7,0158,016E,00DA,0170,00DC,00DD,0162,00DF,\ D0..DF 0155,00E1,00E2,0103,00E4,013A,0107,00E7,010D,00E9,0119,00EB,011B,00ED,00EE,010F,\ E0..EF 0111,0144,0148,00F3,00F4,0151,00F6,00F7,0159,016F,00FA,0171,00FC,00FD,0163,02D9,\ F0..FF ; CP 
1006,"IBM1006","(Arabic)",\ https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/CP1006.TXT,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,06F0,06F1,06F2,06F3,06F4,06F5,06F6,06F7,06F8,06F9,060C,061B,00AD,061F,FE81,\ A0..AF FE8D,FE8E,FE8E,FE8F,FE91,FB56,FB58,FE93,FE95,FE97,FB66,FB68,FE99,FE9B,FE9D,FE9F,\ B0..BF FB7A,FB7C,FEA1,FEA3,FEA5,FEA7,FEA9,FB84,FEAB,FEAD,FB8C,FEAF,FB8A,FEB1,FEB3,FEB5,\ C0..CF FEB7,FEB9,FEBB,FEBD,FEBF,FEC1,FEC5,FEC9,FECA,FECB,FECC,FECD,FECE,FECF,FED0,FED1,\ D0..DF FED3,FED5,FED7,FED9,FEDB,FB92,FB94,FEDD,FEDF,FEE0,FEE1,FEE3,FB9E,FEE5,FEE7,FE85,\ E0..EF FEED,FBA6,FBA8,FBA9,FBAA,FE80,FE89,FE8A,FE8B,FEF1,FEF2,FEF3,FBB0,FBAE,FE7C,FE7D,\ F0..FF ; CP 1167,"KOI8-RU","IBM1167 (Cyrillic, Russian, Ukrainian, Belarusian)",\ https://en.wikipedia.org/wiki/KOI8-RU,\ 2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F 2591,2592,2593,201C,25A0,2219,201D,2014,2116,2122,00A0,00BB,00AE,00AB,00B7,00A4,\ 90..9F 2550,2551,2552,0451,0454,2554,0456,0457,2557,2558,2559,255A,255B,0491,045D,255E,\ A0..AF 255F,2560,2561,0401,0404,2563,0406,0407,2566,2567,2568,2569,256A,0490,040D,00A9,\ B0..BF 044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF 043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF 042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF 041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF ; CP 1168,"KOI8-U","IBM1168 (Cyrillic, Ukrainian)",\ https://en.wikipedia.org/wiki/KOI8-U,\ 2500,2502,250C,2510,2514,2518,251C,2524,252C,2534,253C,2580,2584,2588,258C,2590,\ 80..8F 2591,2592,2593,2320,25A0,2219,221A,2248,2264,2265,00A0,2321,00B0,00B2,00B7,00F7,\ 90..9F 2550,2551,2552,0451,0454,2554,0456,0457,2557,2558,2559,255A,255B,0491,255D,255E,\ A0..AF 
255F,2560,2561,0401,0404,2563,0406,0407,2566,2567,2568,2569,256A,0490,256C,00A9,\ B0..BF 044E,0430,0431,0446,0434,0435,0444,0433,0445,0438,0439,043A,043B,043C,043D,043E,\ C0..CF 043F,044F,0440,0441,0442,0443,0436,0432,044C,044B,0437,0448,044D,0449,0447,044A,\ D0..DF 042E,0410,0411,0426,0414,0415,0424,0413,0425,0418,0419,041A,041B,041C,041D,041E,\ E0..EF 041F,042F,0420,0421,0422,0423,0416,0412,042C,042B,0417,0428,042D,0429,0427,042A,\ F0..FF ; CP 1250,"Windows-1250","(Central European)",\ https://en.wikipedia.org/wiki/Windows-1250,\ 20AC,FFFE,201A,FFFE,201E,2026,2020,2021,FFFE,2030,0160,2039,015A,0164,017D,0179,\ 80..8F FFFE,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,0161,203A,015B,0165,017E,017A,\ 90..9F 00A0,02C7,02D8,0141,00A4,0104,00A6,00A7,00A8,00A9,015E,00AB,00AC,00AD,00AE,017B,\ A0..AF 00B0,00B1,02DB,0142,00B4,00B5,00B6,00B7,00B8,0105,015F,00BB,013D,02DD,013E,017C,\ B0..BF 0154,00C1,00C2,0102,00C4,0139,0106,00C7,010C,00C9,0118,00CB,011A,00CD,00CE,010E,\ C0..CF 0110,0143,0147,00D3,00D4,0150,00D6,00D7,0158,016E,00DA,0170,00DC,00DD,0162,00DF,\ D0..DF 0155,00E1,00E2,0103,00E4,013A,0107,00E7,010D,00E9,0119,00EB,011B,00ED,00EE,010F,\ E0..EF 0111,0144,0148,00F3,00F4,0151,00F6,00F7,0159,016F,00FA,0171,00FC,00FD,0163,02D9,\ F0..FF ; CP 1251,"Windows-1251","(Cyrillic)",\ https://en.wikipedia.org/wiki/Windows-1251,\ 0402,0403,201A,0453,201E,2026,2020,2021,20AC,2030,0409,2039,040A,040C,040B,040F,\ 80..8F 0452,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,0459,203A,045A,045C,045B,045F,\ 90..9F 00A0,040E,045E,0408,00A4,0490,00A6,00A7,0401,00A9,0404,00AB,00AC,00AD,00AE,0407,\ A0..AF 00B0,00B1,0406,0456,0491,00B5,00B6,00B7,0451,2116,0454,00BB,0458,0405,0455,0457,\ B0..BF 0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ C0..CF 0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ D0..DF 0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ E0..EF 
0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ F0..FF ; CP 1252,"Windows-1252","ISO-8859-1, Latin 1 (Western European)",\ https://en.wikipedia.org/wiki/Windows-1252,\ 20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,0160,2039,0152,FFFE,017D,FFFE,\ 80..8F FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,0161,203A,0153,FFFE,017E,0178,\ 90..9F 00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF 00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF 00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF 00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF 00F0,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,00FD,00FE,00FF,\ F0..FF ; CP 1253,"Windows-1253","(Greek Modern)",\ https://en.wikipedia.org/wiki/Windows-1253,\ 20AC,FFFE,201A,0192,201E,2026,2020,2021,FFFE,2030,FFFE,2039,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,FFFE,203A,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0385,0386,00A3,00A4,00A5,00A6,00A7,00A8,00A9,FFFE,00AB,00AC,00AD,00AE,2015,\ A0..AF 00B0,00B1,00B2,00B3,0384,00B5,00B6,00B7,0388,0389,038A,00BB,038C,00BD,038E,038F,\ B0..BF 0390,0391,0392,0393,0394,0395,0396,0397,0398,0399,039A,039B,039C,039D,039E,039F,\ C0..CF 03A0,03A1,FFFE,03A3,03A4,03A5,03A6,03A7,03A8,03A9,03AA,03AB,03AC,03AD,03AE,03AF,\ D0..DF 03B0,03B1,03B2,03B3,03B4,03B5,03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,\ E0..EF 03C0,03C1,03C2,03C3,03C4,03C5,03C6,03C7,03C8,03C9,03CA,03CB,03CC,03CD,03CE,FFFE,\ F0..FF ; CP 1254,"Windows-1254","(Turkish)",\ https://en.wikipedia.org/wiki/Windows-1254,\ 20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,0160,2039,0152,FFFE,FFFE,FFFE,\ 80..8F FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,0161,203A,0153,FFFE,FFFE,0178,\ 90..9F 
00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF 00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF 011E,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,0130,015E,00DF,\ D0..DF 00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF 011F,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,0131,015F,00FF,\ F0..FF ; CP 1255,"Windows-1255","(Hebrew)",\ https://en.wikipedia.org/wiki/Windows-1255,\ 20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,FFFE,2039,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,FFFE,203A,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,00A1,00A2,00A3,20AA,00A5,00A6,00A7,00A8,00A9,00D7,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00F7,00BB,00BC,00BD,00BE,00BF,\ B0..BF 05B0,05B1,05B2,05B3,05B4,05B5,05B6,05B7,05B8,05B9,05BA,05BB,05BC,05BD,05BE,05BF,\ C0..CF 05C0,05C1,05C2,05C3,05F0,05F1,05F2,05F3,05F4,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ D0..DF 05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ E0..EF 05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,FFFE,FFFE,200E,200F,FFFE,\ F0..FF ; CP 1256,"Windows-1256","(Arabic)",\ https://en.wikipedia.org/wiki/Windows-1256,\ 20AC,067E,201A,0192,201E,2026,2020,2021,02C6,2030,0679,2039,0152,0686,0698,0688,\ 80..8F 06AF,2018,2019,201C,201D,2022,2013,2014,06A9,2122,0691,203A,0153,200C,200D,06BA,\ 90..9F 00A0,060C,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,06BE,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,061B,00BB,00BC,00BD,00BE,061F,\ B0..BF 06C1,0621,0622,0623,0624,0625,0626,0627,0628,0629,062A,062B,062C,062D,062E,062F,\ C0..CF 0630,0631,0632,0633,0634,0635,0636,00D7,0637,0638,0639,063A,0640,0641,0642,0643,\ D0..DF 
00E0,0644,00E2,0645,0646,0647,0648,00E7,00E8,00E9,00EA,00EB,0649,064A,00EE,00EF,\ E0..EF 064B,064C,064D,064E,00F4,064F,0650,00F7,0651,00F9,0652,00FB,00FC,200E,200F,06D2,\ F0..FF ; CP 1257,"Windows-1257","(Baltic)",\ https://en.wikipedia.org/wiki/Windows-1257,\ 20AC,FFFE,201A,FFFE,201E,2026,2020,2021,FFFE,2030,FFFE,2039,FFFE,00A8,02C7,00B8,\ 80..8F FFFE,2018,2019,201C,201D,2022,2013,2014,FFFE,2122,FFFE,203A,FFFE,00AF,02DB,FFFE,\ 90..9F 00A0,FFFE,00A2,00A3,00A4,FFFE,00A6,00A7,00D8,00A9,0156,00AB,00AC,00AD,00AE,00C6,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00F8,00B9,0157,00BB,00BC,00BD,00BE,00E6,\ B0..BF 0104,012E,0100,0106,00C4,00C5,0118,0112,010C,00C9,0179,0116,0122,0136,012A,013B,\ C0..CF 0160,0143,0145,00D3,014C,00D5,00D6,00D7,0172,0141,015A,016A,00DC,017B,017D,00DF,\ D0..DF 0105,012F,0101,0107,00E4,00E5,0119,0113,010D,00E9,017A,0117,0123,0137,012B,013C,\ E0..EF 0161,0144,0146,00F3,014D,00F5,00F6,00F7,0173,0142,015B,016B,00FC,017C,017E,02D9,\ F0..FF ; CP 1258,"Windows-1258","(Vietnamese)",\ https://en.wikipedia.org/wiki/Windows-1258,\ 20AC,FFFE,201A,0192,201E,2026,2020,2021,02C6,2030,FFFE,2039,0152,FFFE,FFFE,FFFE,\ 80..8F FFFE,2018,2019,201C,201D,2022,2013,2014,02DC,2122,FFFE,203A,0153,FFFE,FFFE,0178,\ 90..9F 00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF 00C0,00C1,00C2,0102,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,0300,00CD,00CE,00CF,\ C0..CF 0110,00D1,0309,00D3,00D4,01A0,00D6,00D7,00D8,00D9,00DA,00DB,00DC,01AF,0303,00DF,\ D0..DF 00E0,00E1,00E2,0103,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,0301,00ED,00EE,00EF,\ E0..EF 0111,00F1,0323,00F3,00F4,01A1,00F6,00F7,00F8,00F9,00FA,00FB,00FC,01B0,20AB,00FF,\ F0..FF ; CP 10000,"Mac-Roman","Macintosh Roman (Western European)",\ https://en.wikipedia.org/wiki/Mac_OS_Roman,\ 00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF 221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,03A9,00E6,00F8,\ B0..BF 00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,20AC,2039,203A,FB01,FB02,\ D0..DF 2021,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF F8FF,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF ; CP 10004,"Mac-Arabic","Macintosh Arabic",\ https://en.wikipedia.org/wiki/MacArabic_encoding,\ 00C4,00A0,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,06BA,00AB,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,2026,00EE,00EF,00F1,00F3,00BB,00F4,00F6,00F7,00FA,00F9,00FB,00FC,\ 90..9F 0020,0021,0022,0023,0024,066A,0026,0027,0028,0029,002A,002B,060C,002D,002E,002F,\ A0..AF 0660,0661,0662,0663,0664,0665,0666,0667,0668,0669,003A,061B,003C,003D,003E,061F,\ B0..BF 274A,0621,0622,0623,0624,0625,0626,0627,0628,0629,062A,062B,062C,062D,062E,062F,\ C0..CF 0630,0631,0632,0633,0634,0635,0636,0637,0638,0639,063A,005B,005C,005D,005E,005F,\ D0..DF 0640,0641,0642,0643,0644,0645,0646,0647,0648,0649,064A,064B,064C,064D,064E,064F,\ E0..EF 0650,0651,0652,067E,0679,0686,06D5,06A4,06AF,0688,0691,007B,007C,007D,0698,06D2,\ F0..FF ; CP 10005,"Mac-Hebrew","Macintosh Hebrew",\ http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/HEBREW.TXT,\ 00C4,05F2,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 0020,0021,0022,0023,0024,0025,20AA,0027,0029,0028,002A,002B,002C,002D,002E,002F,\ A0..AF 0030,0031,0032,0033,0034,0035,0036,0037,0038,0039,003A,003B,003C,003D,003E,003F,\ B0..BF FFFF,201E,FFFF,FFFF,FFFF,FFFF,05BC,FB4B,FB35,2026,00A0,05B8,05B7,05B5,05B6,05B4,\ C0..CF 
2013,2014,201C,201D,2018,2019,FB2A,FB2B,05BF,05B0,05B2,05B1,05BB,05B9,05B8,05B3,\ D0..DF 05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ E0..EF 05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,007D,005D,007B,005B,007C,\ F0..FF ; CP 10006,"Mac-Greek","Macintosh Greek",\ https://en.wikipedia.org/wiki/MacGreek_encoding,\ 00C4,00B9,00B2,00C9,00B3,00D6,00DC,0385,00E0,00E2,00E4,0384,00A8,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00A3,2122,00EE,00EF,2022,00BD,2030,00F4,00F6,00A6,20AC,00F9,00FB,00FC,\ 90..9F 2020,0393,0394,0398,039B,039E,03A0,00DF,00AE,00A9,03A3,03AA,00A7,2260,00B0,00B7,\ A0..AF 0391,00B1,2264,2265,00A5,0392,0395,0396,0397,0399,039A,039C,03A6,03AB,03A8,03A9,\ B0..BF 03AC,039D,00AC,039F,03A1,2248,03A4,00AB,00BB,2026,00A0,03A5,03A7,0386,0388,0153,\ C0..CF 2013,2015,201C,201D,2018,2019,00F7,0389,038A,038C,038E,03AD,03AE,03AF,03CC,038F,\ D0..DF 03CD,03B1,03B2,03C8,03B4,03B5,03C6,03B3,03B7,03B9,03BE,03BA,03BB,03BC,03BD,03BF,\ E0..EF 03C0,03CE,03C1,03C3,03C4,03B8,03C9,03C2,03C7,03C5,03B6,03CA,03CB,0390,03B0,00AD,\ F0..FF ; CP 10007,"Mac-Cyrillic","Macintosh Cyrillic",\ https://en.wikipedia.org/wiki/Mac_OS_Cyrillic_encoding,\ 0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ 80..8F 0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ 90..9F 2020,00B0,0490,00A3,00A7,2022,00B6,0406,00AE,00A9,2122,0402,0452,2260,0403,0453,\ A0..AF 221E,00B1,2264,2265,0456,00B5,0491,0408,0404,0454,0407,0457,0409,0459,040A,045A,\ B0..BF 0458,0405,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,040B,045B,040C,045C,0455,\ C0..CF 2013,2014,201C,201D,2018,2019,00F7,201E,040E,045E,040F,045F,2116,0401,0451,044F,\ D0..DF 0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ E0..EF 0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,20AC,\ F0..FF ; CP 10010,"Mac-Romanian","Macintosh Romanian",\ https://en.wikipedia.org/wiki/Mac_OS_Romanian_encoding,\ 
00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,0102,015E,\ A0..AF 221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,2126,0103,015F,\ B0..BF 00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,00A4,2039,203A,0162,0163,\ D0..DF 2021,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF FFFE,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF ; CP 10017,"Mac-Ukrainian","Macintosh Ukrainian",\ https://en.wikipedia.org/wiki/Mac_OS_Ukrainian_encoding,\ 0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ 80..8F 0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ 90..9F 2020,00B0,0490,00A3,00A7,2022,00B6,0406,00AE,00A9,2122,0402,0452,2260,0403,0453,\ A0..AF 221E,00B1,2264,2265,0456,00B5,0491,0408,0404,0454,0407,0457,0409,0459,040A,045A,\ B0..BF 0458,0405,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,040B,045B,040C,045C,0455,\ C0..CF 2013,2014,201C,201D,2018,2019,00F7,201E,040E,045E,040F,045F,2116,0401,0451,044F,\ D0..DF 0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ E0..EF 0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,00A4,\ F0..FF ; CP 10021,"Mac-Thai","Macintosh Thai",\ https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/THAI.TXT,\ 00AB,00BB,2026,0E48,0E49,0E4A,0E4B,0E4C,0E48,0E49,0E4A,0E4B,0E4C,201C,201D,0E4D,\ 80..8F FFFE,2022,0E31,0E47,0E34,0E35,0E36,0E37,0E48,0E49,0E4A,0E4B,0E4C,2018,2019,FFFE,\ 90..9F 00A0,0E01,0E02,0E03,0E04,0E05,0E06,0E07,0E08,0E09,0E0A,0E0B,0E0C,0E0D,0E0E,0E0F,\ A0..AF 0E10,0E11,0E12,0E13,0E14,0E15,0E16,0E17,0E18,0E19,0E1A,0E1B,0E1C,0E1D,0E1E,0E1F,\ B0..BF 
0E20,0E21,0E22,0E23,0E24,0E25,0E26,0E27,0E28,0E29,0E2A,0E2B,0E2C,0E2D,0E2E,0E2F,\ C0..CF 0E30,0E31,0E32,0E33,0E34,0E35,0E36,0E37,0E38,0E39,0E3A,2060,200B,2013,2014,0E3F,\ D0..DF 0E40,0E41,0E42,0E43,0E44,0E45,0E46,0E47,0E48,0E49,0E4A,0E4B,0E4C,0E4D,2122,0E4F,\ E0..EF 0E50,0E51,0E52,0E53,0E54,0E55,0E56,0E57,0E58,0E59,00AE,00A9,FFFE,FFFE,FFFE,FFFE,\ F0..FF ; CP 10029,"Mac-CE","Macintosh Central European, MAC-Latin2",\ https://en.wikipedia.org/wiki/Mac_OS_Central_European_encoding,\ 00C4,0100,0101,00C9,0104,00D6,00DC,00E1,0105,010C,00E4,010D,0106,0107,00E9,0179,\ 80..8F 017A,010E,00ED,010F,0112,0113,0116,00F3,0117,00F4,00F6,00F5,00FA,011A,011B,00FC,\ 90..9F 2020,00B0,0118,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,0119,00A8,2260,0123,012E,\ A0..AF 012F,012A,2264,2265,012B,0136,2202,2211,0142,013B,013C,013D,013E,0139,013A,0145,\ B0..BF 0146,0143,00AC,221A,0144,0147,2206,00AB,00BB,2026,00A0,0148,0150,00D5,0151,014C,\ C0..CF 2013,2014,201C,201D,2018,2019,00F7,25CA,014D,0154,0155,0158,2039,203A,0159,0156,\ D0..DF 0157,0160,201A,201E,0161,015A,015B,00C1,0164,0165,00CD,017D,017E,016A,00D3,00D4,\ E0..EF 016B,016E,00DA,016F,0170,0171,0172,0173,00DD,00FD,0137,017B,0141,017C,0122,02C7,\ F0..FF ; CP 10079,"Mac-Icelandic","Macintosh Icelandic",\ https://en.wikipedia.org/wiki/Mac_OS_Icelandic_encoding,\ 00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 00DD,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF 221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,2126,00E6,00F8,\ B0..BF 00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,20AC,00D0,00F0,00DE,00FE,\ D0..DF 00FD,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF 
FFFE,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF ; CP 10080,"Mac-Inuit","Macintosh Inuit",\ https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/INUIT.TXT,\ 1403,1404,1405,1406,140A,140B,1431,1432,1433,1434,1438,1439,1449,144E,144F,1450,\ 80..8F 1451,1455,1456,1466,146D,146E,146F,1470,1472,1473,1483,148B,148C,148D,148E,1490,\ 90..9F 1491,00B0,14A1,14A5,14A6,2022,00B6,14A7,00AE,00A9,2122,14A8,14AA,14AB,14BB,14C2,\ A0..AF 14C3,14C4,14C5,14C7,14C8,14D0,14EF,14F0,14F1,14F2,14F4,14F5,1505,14D5,14D6,14D7,\ B0..BF 14D8,14DA,14DB,14EA,1528,1529,152A,152B,152D,2026,00A0,152E,153E,1555,1556,1557,\ C0..CF 2013,2014,201C,201D,2018,2019,1558,1559,155A,155D,1546,1547,1548,1549,154B,154C,\ D0..DF 1550,157F,1580,1581,1582,1583,1584,1585,158F,1590,1591,1592,1593,1594,1595,1671,\ E0..EF 1672,1673,1674,1675,1676,1596,15A0,15A1,15A2,15A3,15A4,15A5,15A6,157C,0141,0142,\ F0..FF ; CP 10081,"Mac-Turkish","Macintosh Turkish",\ https://en.wikipedia.org/wiki/Mac_OS_Turkish_encoding,\ 00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF 221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,2126,00E6,00F8,\ B0..BF 00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,011E,011F,0130,0131,015E,015F,\ D0..DF 2021,00B7,201A,201E,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF FFFE,00D2,00DA,00DB,00D9,FFFE,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF ; CP 10082,"Mac-Croatian","Macintosh Serbian Latin",\ https://en.wikipedia.org/wiki/Mac_OS_Croatian_encoding,\ 00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 
00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,0160,2122,00B4,00A8,2260,017D,00D8,\ A0..AF 221E,00B1,2264,2265,2206,00B5,2202,2211,220F,0161,222B,00AA,00BA,03A9,017E,00F8,\ B0..BF 00BF,00A1,00AC,221A,0192,2248,0106,00AB,010C,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 0110,2014,201C,201D,2018,2019,00F7,25CA,F8FF,00A9,2044,20AC,2039,203A,00C6,00BB,\ D0..DF 2013,00B7,201A,201E,2030,00C2,0107,00C1,010D,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF 0111,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,03C0,00CB,02DA,00B8,00CA,00E6,02C7,\ F0..FF ; CP 10083,"Mac-Gaelic","Macintosh Gaelic",\ https://en.wikipedia.org/wiki/Mac_OS_Gaelic,\ 00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF 1E02,00B1,2264,2265,1E03,010A,010B,1E0A,1E0B,1E1E,1E1F,0120,0121,1E40,00E6,00F8,\ B0..BF 1E41,1E56,1E57,027C,0192,017F,1E60,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 2013,2014,201C,201D,2018,2019,1E61,1E9B,00FF,0178,1E6A,20AC,2039,203A,0176,0177,\ D0..DF 1E6B,00B7,1EF2,1EF3,204A,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF 2618,00D2,00DA,00DB,00D9,0131,00DD,00FD,0174,0175,1E84,1E85,1E80,1E81,1E82,1E83,\ F0..FF ; CP 10084,"Mac-Celtic","Macintosh Celtic",\ https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CELTIC.TXT,\ 00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 2020,00B0,00A2,00A3,00A7,2022,00B6,00DF,00AE,00A9,2122,00B4,00A8,2260,00C6,00D8,\ A0..AF 221E,00B1,2264,2265,00A5,00B5,2202,2211,220F,03C0,222B,00AA,00BA,03A9,00E6,00F8,\ B0..BF 00BF,00A1,00AC,221A,0192,2248,2206,00AB,00BB,2026,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 
2013,2014,201C,201D,2018,2019,00F7,25CA,00FF,0178,2044,20AC,2039,203A,0176,0177,\ D0..DF 2021,00B7,1EF2,1EF3,2030,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF 2663,00D2,00DA,00DB,00D9,0131,00DD,00FD,0174,0175,1E84,1E85,1E80,1E81,1E82,1E83,\ F0..FF ; CP 10089,"Mac-Latin","AKA Kermit, Macintosh Latin",\ https://en.wikipedia.org/wiki/Macintosh_Latin_encoding,\ 00C4,00C5,00C7,00C9,00D1,00D6,00DC,00E1,00E0,00E2,00E4,00E3,00E5,00E7,00E9,00E8,\ 80..8F 00EA,00EB,00ED,00EC,00EE,00EF,00F1,00F3,00F2,00F4,00F6,00F5,00FA,00F9,00FB,00FC,\ 90..9F 00DD,00B0,00A2,00A3,00A7,00D7,00B6,00DF,00AE,00A9,00B2,00B4,00A8,00B3,00C6,00D8,\ A0..AF 00B9,00B1,00BC,00BD,00A5,00B5,FFFE,FFFE,FFFE,FFFE,00BE,00AA,00BA,FFFE,00E6,00F8,\ B0..BF 00BF,00A1,00AC,0141,0192,02CB,FFFE,00AB,00BB,00A6,00A0,00C0,00C3,00D5,0152,0153,\ C0..CF 00AD,FFFE,FFFE,FFFE,0142,FFFE,00F7,FFFE,00FF,0178,FFFE,00A4,00D0,00F0,00DE,00FE,\ D0..DF 00FD,00B7,FFFE,FFFE,FFFE,00C2,00CA,00C1,00CB,00C8,00CD,00CE,00CF,00CC,00D3,00D4,\ E0..EF FFFE,00D2,00DA,00DB,00D9,0131,02C6,02DC,00AF,02D8,02D9,02DA,00B8,02DD,02DB,02C7,\ F0..FF ; CP 10101,"NextStep","AKA OpenStep",\ https://www.unicode.org/Public/MAPPINGS/VENDORS/NEXT/NEXTSTEP.TXT,\ 00A0,00C0,00C1,00C2,00C3,00C4,00C5,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ 80..8F 00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D9,00DA,00DB,00DC,00DD,00DE,00B5,00D7,00F7,\ 90..9F 00A9,00A1,00A2,00A3,2044,00A5,0192,00A7,00A4,2019,201C,00AB,2039,203A,FB01,FB02,\ A0..AF 00AE,2013,2020,2021,00B7,00A6,00B6,2022,201A,201E,201D,00BB,2026,2030,00AC,00BF,\ B0..BF 00B9,02CB,00B4,02C6,02DC,00AF,02D8,02D9,00A8,00B2,02DA,00B8,00B3,02DD,02DB,02C7,\ C0..CF 2014,00B1,00BC,00BD,00BE,00E0,00E1,00E2,00E3,00E4,00E5,00E7,00E8,00E9,00EA,00EB,\ D0..DF 00EC,00C6,00ED,00AA,00EE,00EF,00F0,00F1,0141,00D8,0152,00BA,00F2,00F3,00F4,00F5,\ E0..EF 00F6,00E6,00F9,00FA,00FB,0131,00FC,00FD,0142,00F8,0153,00DF,00FE,00FF,FFFD,FFFD,\ F0..FF ; CP 28591,"ISO-8859-1","Latin 1 (Western European)",\ 
https://en.wikipedia.org/wiki/ISO/IEC_8859-1,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF 00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF 00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF 00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF 00F0,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,00FD,00FE,00FF,\ F0..FF ; CP 28592,"ISO-8859-2","Latin 2 (Central European)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-2,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0104,02D8,0141,00A4,013D,015A,00A7,00A8,0160,015E,0164,0179,00AD,017D,017B,\ A0..AF 00B0,0105,02DB,0142,00B4,013E,015B,02C7,00B8,0161,015F,0165,017A,02DD,017E,017C,\ B0..BF 0154,00C1,00C2,0102,00C4,0139,0106,00C7,010C,00C9,0118,00CB,011A,00CD,00CE,010E,\ C0..CF 0110,0143,0147,00D3,00D4,0150,00D6,00D7,0158,016E,00DA,0170,00DC,00DD,0162,00DF,\ D0..DF 0155,00E1,00E2,0103,00E4,013A,0107,00E7,010D,00E9,0119,00EB,011B,00ED,00EE,010F,\ E0..EF 0111,0144,0148,00F3,00F4,0151,00F6,00F7,0159,016F,00FA,0171,00FC,00FD,0163,02D9,\ F0..FF ; CP 28593,"ISO-8859-3","Latin 3 (Turkish, Maltese, Esperanto)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-3,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0126,02D8,00A3,00A4,FFFE,0124,00A7,00A8,0130,015E,011E,0134,00AD,FFFE,017B,\ A0..AF 
00B0,0127,00B2,00B3,00B4,00B5,0125,00B7,00B8,0131,015F,011F,0135,00BD,FFFE,017C,\ B0..BF 00C0,00C1,00C2,FFFE,00C4,010A,0108,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF FFFE,00D1,00D2,00D3,00D4,0120,00D6,00D7,011C,00D9,00DA,00DB,00DC,016C,015C,00DF,\ D0..DF 00E0,00E1,00E2,FFFE,00E4,010B,0109,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF FFFE,00F1,00F2,00F3,00F4,0121,00F6,00F7,011D,00F9,00FA,00FB,00FC,016D,015D,02D9,\ F0..FF ; CP 28594,"ISO-8859-4","Latin 4 (Baltic)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-4,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0104,0138,0156,00A4,0128,013B,00A7,00A8,0160,0112,0122,0166,00AD,017D,00AF,\ A0..AF 00B0,0105,02DB,0157,00B4,0129,013C,02C7,00B8,0161,0113,0123,0167,014A,017E,014B,\ B0..BF 0100,00C1,00C2,00C3,00C4,00C5,00C6,012E,010C,00C9,0118,00CB,0116,00CD,00CE,012A,\ C0..CF 0110,0145,014C,0136,00D4,00D5,00D6,00D7,00D8,0172,00DA,00DB,00DC,0168,016A,00DF,\ D0..DF 0101,00E1,00E2,00E3,00E4,00E5,00E6,012F,010D,00E9,0119,00EB,0117,00ED,00EE,012B,\ E0..EF 0111,0146,014D,0137,00F4,00F5,00F6,00F7,00F8,0173,00FA,00FB,00FC,0169,016B,02D9,\ F0..FF ; CP 28595,"ISO-8859-5","Latin/Cyrillic",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-5,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0401,0402,0403,0404,0405,0406,0407,0408,0409,040A,040B,040C,00AD,040E,040F,\ A0..AF 0410,0411,0412,0413,0414,0415,0416,0417,0418,0419,041A,041B,041C,041D,041E,041F,\ B0..BF 0420,0421,0422,0423,0424,0425,0426,0427,0428,0429,042A,042B,042C,042D,042E,042F,\ C0..CF 0430,0431,0432,0433,0434,0435,0436,0437,0438,0439,043A,043B,043C,043D,043E,043F,\ D0..DF 0440,0441,0442,0443,0444,0445,0446,0447,0448,0449,044A,044B,044C,044D,044E,044F,\ E0..EF 
2116,0451,0452,0453,0454,0455,0456,0457,0458,0459,045A,045B,045C,00A7,045E,045F,\ F0..FF ; CP 28596,"ISO-8859-6","Latin/Arabic",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-6,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,FFFE,FFFE,FFFE,00A4,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,060C,00AD,FFFE,FFFE,\ A0..AF FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,061B,FFFE,FFFE,FFFE,061F,\ B0..BF FFFE,0621,0622,0623,0624,0625,0626,0627,0628,0629,062A,062B,062C,062D,062E,062F,\ C0..CF 0630,0631,0632,0633,0634,0635,0636,0637,0638,0639,063A,FFFE,FFFE,FFFE,FFFE,FFFE,\ D0..DF 0640,0641,0642,0643,0644,0645,0646,0647,0648,0649,064A,064B,064C,064D,064E,064F,\ E0..EF 0650,0651,0652,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ F0..FF ; CP 28597,"ISO-8859-7","Latin/Greek",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-7,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,2018,2019,00A3,20AC,20AF,00A6,00A7,00A8,00A9,037A,00AB,00AC,00AD,FFFE,2015,\ A0..AF 00B0,00B1,00B2,00B3,0384,0385,0386,00B7,0388,0389,038A,00BB,038C,00BD,038E,038F,\ B0..BF 0390,0391,0392,0393,0394,0395,0396,0397,0398,0399,039A,039B,039C,039D,039E,039F,\ C0..CF 03A0,03A1,FFFE,03A3,03A4,03A5,03A6,03A7,03A8,03A9,03AA,03AB,03AC,03AD,03AE,03AF,\ D0..DF 03B0,03B1,03B2,03B3,03B4,03B5,03B6,03B7,03B8,03B9,03BA,03BB,03BC,03BD,03BE,03BF,\ E0..EF 03C0,03C1,03C2,03C3,03C4,03C5,03C6,03C7,03C8,03C9,03CA,03CB,03CC,03CD,03CE,FFFE,\ F0..FF ; CP 28598,"ISO-8859-8","Latin/Hebrew, AKA IBM916 ",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-8,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 
00A0,FFFE,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00D7,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00F7,00BB,00BC,00BD,00BE,FFFE,\ B0..BF FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ C0..CF FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,2017,\ D0..DF 05D0,05D1,05D2,05D3,05D4,05D5,05D6,05D7,05D8,05D9,05DA,05DB,05DC,05DD,05DE,05DF,\ E0..EF 05E0,05E1,05E2,05E3,05E4,05E5,05E6,05E7,05E8,05E9,05EA,FFFE,FFFE,200E,200F,FFFE,\ F0..FF ; CP 28599,"ISO-8859-9","Latin 5, AKA IBM920 (Turkish)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-9,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,00A1,00A2,00A3,00A4,00A5,00A6,00A7,00A8,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,00B4,00B5,00B6,00B7,00B8,00B9,00BA,00BB,00BC,00BD,00BE,00BF,\ B0..BF 00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF 011E,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,0130,015E,00DF,\ D0..DF 00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF 011F,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,0131,015F,00FF,\ F0..FF ; CP 28600,"ISO-8859-10","Latin 6, AKA IBM919 (Nordic)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-10,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0104,0112,0122,012A,0128,0136,00A7,013B,0110,0160,0166,017D,00AD,016A,014A,\ A0..AF 00B0,0105,0113,0123,012B,0129,0137,00B7,013C,0111,0161,0167,017E,2015,016B,014B,\ B0..BF 0100,00C1,00C2,00C3,00C4,00C5,00C6,012E,010C,00C9,0118,00CB,0116,00CD,00CE,00CF,\ C0..CF 00D0,0145,014C,00D3,00D4,00D5,00D6,0168,00D8,0172,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF 
0101,00E1,00E2,00E3,00E4,00E5,00E6,012F,010D,00E9,0119,00EB,0117,00ED,00EE,00EF,\ E0..EF 00F0,0146,014D,00F3,00F4,00F5,00F6,0169,00F8,0173,00FA,00FB,00FC,00FD,00FE,0138,\ F0..FF ; CP 28601,"ISO-8859-11","Latin/Thai",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-11,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0E01,0E02,0E03,0E04,0E05,0E06,0E07,0E08,0E09,0E0A,0E0B,0E0C,0E0D,0E0E,0E0F,\ A0..AF 0E10,0E11,0E12,0E13,0E14,0E15,0E16,0E17,0E18,0E19,0E1A,0E1B,0E1C,0E1D,0E1E,0E1F,\ B0..BF 0E20,0E21,0E22,0E23,0E24,0E25,0E26,0E27,0E28,0E29,0E2A,0E2B,0E2C,0E2D,0E2E,0E2F,\ C0..CF 0E30,0E31,0E32,0E33,0E34,0E35,0E36,0E37,0E38,0E39,0E3A,FFFE,FFFE,FFFE,FFFE,0E3F,\ D0..DF 0E40,0E41,0E42,0E43,0E44,0E45,0E46,0E47,0E48,0E49,0E4A,0E4B,0E4C,0E4D,0E4E,0E4F,\ E0..EF 0E50,0E51,0E52,0E53,0E54,0E55,0E56,0E57,0E58,0E59,0E5A,0E5B,FFFE,FFFE,FFFE,FFFE,\ F0..FF ; CP 28603,"ISO-8859-13","Latin 7, AKA IBM921 (Baltic)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-13,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,201D,00A2,00A3,00A4,201E,00A6,00A7,00D8,00A9,0156,00AB,00AC,00AD,00AE,00C6,\ A0..AF 00B0,00B1,00B2,00B3,201C,00B5,00B6,00B7,00F8,00B9,0157,00BB,00BC,00BD,00BE,00E6,\ B0..BF 0104,012E,0100,0106,00C4,00C5,0118,0112,010C,00C9,0179,0116,0122,0136,012A,013B,\ C0..CF 0160,0143,0145,00D3,014C,00D5,00D6,00D7,0172,0141,015A,016A,00DC,017B,017D,00DF,\ D0..DF 0105,012F,0101,0107,00E4,00E5,0119,0113,010D,00E9,017A,0117,0123,0137,012B,013C,\ E0..EF 0161,0144,0146,00F3,014D,00F5,00F6,00F7,0173,0142,015B,016B,00FC,017C,017E,2019,\ F0..FF ; CP 28604,"ISO-8859-14","Latin 8 (Celtic)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-14,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F 
FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,1E02,1E03,00A3,010A,010B,1E0A,00A7,1E80,00A9,1E82,1E0B,1EF2,00AD,00AE,0178,\ A0..AF 1E1E,1E1F,0120,0121,1E40,1E41,00B6,1E56,1E81,1E57,1E83,1E60,1EF3,1E84,1E85,1E61,\ B0..BF 00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF 0174,00D1,00D2,00D3,00D4,00D5,00D6,1E6A,00D8,00D9,00DA,00DB,00DC,00DD,0176,00DF,\ D0..DF 00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF 0175,00F1,00F2,00F3,00F4,00F5,00F6,1E6B,00F8,00F9,00FA,00FB,00FC,00FD,0177,00FF,\ F0..FF ; CP 28605,"ISO-8859-15","Latin 9, AKA IBM923 (Western Europe)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-15,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,00A1,00A2,00A3,20AC,00A5,0160,00A7,0161,00A9,00AA,00AB,00AC,00AD,00AE,00AF,\ A0..AF 00B0,00B1,00B2,00B3,017D,00B5,00B6,00B7,017E,00B9,00BA,00BB,0152,0153,0178,00BF,\ B0..BF 00C0,00C1,00C2,00C3,00C4,00C5,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ C0..CF 00D0,00D1,00D2,00D3,00D4,00D5,00D6,00D7,00D8,00D9,00DA,00DB,00DC,00DD,00DE,00DF,\ D0..DF 00E0,00E1,00E2,00E3,00E4,00E5,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF 00F0,00F1,00F2,00F3,00F4,00F5,00F6,00F7,00F8,00F9,00FA,00FB,00FC,00FD,00FE,00FF,\ F0..FF ; CP 28606,"ISO-8859-16","Latin 10 (South-Eastern Europe)",\ https://en.wikipedia.org/wiki/ISO/IEC_8859-16 ,\ FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 80..8F FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,FFFE,\ 90..9F 00A0,0104,0105,0141,20AC,201E,0160,00A7,0161,00A9,0218,00AB,0179,00AD,017A,017B,\ A0..AF 00B0,00B1,010C,0142,017D,201D,00B6,00B7,017E,010D,0219,00BB,0152,0153,0178,017C,\ B0..BF 00C0,00C1,00C2,0102,00C4,0106,00C6,00C7,00C8,00C9,00CA,00CB,00CC,00CD,00CE,00CF,\ 
C0..CF 0110,0143,00D2,00D3,00D4,0150,00D6,015A,0170,00D9,00DA,00DB,00DC,0118,021A,00DF,\ D0..DF 00E0,00E1,00E2,0103,00E4,0107,00E6,00E7,00E8,00E9,00EA,00EB,00EC,00ED,00EE,00EF,\ E0..EF 0111,0144,00F2,00F3,00F4,0151,00F6,015B,0171,00F9,00FA,00FB,00FC,0119,021B,00FF,\ F0..FF ;
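Each table above lists the Unicode code points for bytes 0x80..0xFF of one code page, sixteen per row, with FFFE standing for a byte that the code page leaves undefined. As an illustration only, the following Python sketch models that lookup with a single row (D0..DF of ISO-8859-9) copied from the listing. The names `decode_byte` and `ROW_D0_ISO_8859_9` are hypothetical and not part of EuroConvertor.

```python
UNDEFINED = 0xFFFE  # table marker for a byte with no assigned character

# Row D0..DF of the ISO-8859-9 table, copied from the listing above.
ROW_D0_ISO_8859_9 = [
    0x011E, 0x00D1, 0x00D2, 0x00D3, 0x00D4, 0x00D5, 0x00D6, 0x00D7,
    0x00D8, 0x00D9, 0x00DA, 0x00DB, 0x00DC, 0x0130, 0x015E, 0x00DF,
]

def decode_byte(byte, row, row_base):
    """Map one byte to a code point using a 16-entry table row.

    Returns None when the table marks the byte as undefined.
    """
    if byte < 0x80:
        return byte                      # ASCII is copied verbatim
    codepoint = row[byte - row_base]
    return None if codepoint == UNDEFINED else codepoint

# Byte 0xDD in ISO-8859-9 is the Turkish capital dotted I.
print(hex(decode_byte(0xDD, ROW_D0_ISO_8859_9, 0xD0)))
```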
[DATA] SegINPUT DW PARA# [INPUT] ; Paragraph address of segment [INPUT]. SegOUTPUT DW PARA# [OUTPUT] ; Paragraph address of segment [OUTPUT]. InpEncId DW 0 ; Input encoding identifier (437..65001). OutEncId DW 0 ; Output encoding identifier (437..65001). InpEncSt DW 0 ; Input encoding flags, see above. OutEncSt DW 0 ; Output encoding flags, see above. Relevance DD 0 ; Total sum of relevances of all characters in the input text. BestRelevance DD 8000_0000 ; Best relevance autodetected so far. Begin with lowest negative dword. InpErrorsLo DW 0 ; Number of input characters which are not defined in input encoding. InpErrorsHi DW 0 ; Number of input characters which are not defined in input encoding. OutErrorsLo DW 0 ; Number of characters which are not defined in output encoding. OutErrorsHi DW 0 ; Number of characters which are not defined in output encoding. InpFileName DB 128*B 0 OutFileName DB 128*B 0 InpHandle DW -1 OutHandle DW -1 InpFileSizeLo DW 0 ; Input file size, lower word. InpFileSizeHi DW 0 ; Input file size, higher word. OutFileSizeLo DW 0 ; Output file size, lower word. OutFileSizeHi DW 0 ; Output file size, higher word. InpEnd DW 0 ; Offset behind the last byte of input text. OutputPtr DW 0 ; Offset of the next free position in [OUTPUT]. TrTable DW 0 ; Offset of selected array of 128 WORDs with code points of OEM/ANSI encodings. EntitySkip DW 0 ; Number of bytes skipped when HTML entity is decoded on input (0..8). CharSize DB 0 ; Character width in bytes (1..4) used to increase relevance during autodetection. Errorlevel DB 0 ; 0=normal end, 2=invalid characters, 4=I/O error, 8=wrong syntax. TempString DB 16*B ; Working room for string manipulation. ; Byte Order Mark definitions. 
BOM_UTF32LE DB 0xFF,0xFE,0x00,0x00 BOM_UTF16LE EQU BOM_UTF32LE BOM_UTF32BE DB 0x00,0x00,0xFE,0xFF BOM_UTF16BE EQU BOM_UTF32BE + 2 BOM_UTF8 DB 0xEF,0xBB,0xBF,0 HelpText: D "Program: EuroConvertor version DOS %Version",13,10 D "Function: Conversion of text file encoding.",13,10 D "Format: Dual DOS/Windows application.",13,10 D "Licence: Freeware by vitsoft",13,10 D "Arguments: InpEncoding OutEncoding InpFileName OutFileName",13,10 D "Example: euroconv ISO8859-2 utf16le/BOM input.txt output.txt",13,10 D "Encodings: euroconv enc | more",13,10 D "Manual: https://vitsoft.info/econv_en.htm",13,10,0 [CODE]
DosMain: PROC PUSH PARA# [DATA] POP DS GetArg 1 ; Input encoding retrieve by GetArg. JCXZ .Help: CMPB [ES:SI],'-' JE .Help: CMPB [ES:SI],'/' JE .Help: Invoke DosParseEnc, SI,CX JC .UnknownEncoding: MOV [InpEncId],AX MOV [InpEncSt],BX JNSt BX,encStEnc,.10: CALL DosEncList: ; Display list of supported encodings. JMP .Abort: .10:GetArg 2 ; Output encoding. Invoke DosParseEnc, SI,CX JNC .13: .UnknownEncoding: ; Encoding is at ES:SI,CX. OR [Errorlevel],8 JCXZ .Help: PUSH DS,ES POP DS StdOutput SI,Size=CX POP DS StdOutput =B' is not supported encoding.',Eol=Yes JMP .Abort: .13:MOV [OutEncId],AX MOV [OutEncSt],BX GetArg 3 ; Input file name. JNC .16: .Help:StdOutput HelpText OR [Errorlevel],8 JMP .Abort: .16:MOV DI,InpFileName MOV DX,DI .19:LODSB [ES:SI] CMP AL,'"' JE .23: MOV [DS:DI],AL INC DI .23:LOOP .19: DosAPI AH=3Dh,AL=fileRead+filePermitAll ; OPEN EXISTING FILE. MOV [InpHandle],AX JNC .26: .InputError: OR [Errorlevel],4 StdOutput =B"Error reading input file ", InpFileName, Eol=Yes JMP .Abort: .26:PUSH DS DosAPI BX=AX,AH=3Fh,CX=SIZE#[INPUT],DX=0,DS=[SegINPUT] ; READ FROM FILE OR DEVICE. POP DS JC .InputError: MOV [InpEnd],AX ; Number of read bytes = end of read data in [INPUT]. ADD [InpFileSizeLo],AX GetArg 4 ; Output file name. JC .Help: MOV DI,OutFileName MOV DX,DI .29:LODSB [ES:SI] CMP AL,'"' JE .33: MOV [DS:DI],AL INC DI .33:LOOP .29: DosAPI AH=3Ch ; CREATE OR TRUNCATE FILE. MOV [OutHandle],AX JNC .36: .OutputError: OR [Errorlevel],4 StdOutput =B"Error writing output file ", OutFileName, Eol=Yes JMP .Abort: .36:; Cmdline parameters are valid and accepted. Fix input encoding. MOV ES,[SegINPUT] SUB SI,SI MOV DX,[InpEnd] ; Skip the fix if input endianess is irrelevant or explicitly specified. JSt [InpEncSt],encStLe|encStBe|encStAscii|encStUtf8|encStAuto, .69: JNSt [InpEncSt],encStUtf16, .53: ; Autodetect input UTF-16 endianess of text ES:SI..ES:DX. MOV AX,DX SUB AX,SI CMP AX,2 JB .46: ; Skip if text is too short. 
MOV AX,[ES:SI] CMP AX,[BOM_UTF16LE] JNE .43: SetSt [InpEncSt],encStBom .39:SetSt [InpEncSt],encStLe MOV AX,1200 JMP .79: .43:CMP AX,[BOM_UTF16BE] JNE .49: SetSt [InpEncSt],encStBom .46:SetSt [InpEncSt],encStBe MOV EAX,1201 JMP .79: .49:; No 16bit BOM is present in input, perform empiric autodetection. Invoke DosConvert, 1200,SI,DX,Void ; Try UTF-16LE. MOV BX,AX ; Number of input errors if UTF-16LE. Invoke DosConvert, 1201,SI,DX,Void ; Try UTF-16BE. CMP AX,BX JBE .46: ; UTF16BE detected. JMP .39: ; UTF16LE detected. .53:JNSt [InpEncSt],encStUtf32, .69: ; Autodetect input UTF-32 endianess of text ES:SI..ES:DX. MOV AX,DX SUB AX,SI CMP AX,4 JB .63: MOV EAX,[ES:SI] CMP EAX,[BOM_UTF32LE] JNE .59: SetSt [InpEncSt],encStBom .56:SetSt [InpEncSt],encStLe MOV AX,12000 JMP .79: .59:CMP EAX,[BOM_UTF32BE] JNE .66: SetSt [InpEncSt],encStBom .63:SetSt [InpEncSt],encStBe MOV AX,12001 JMP .79: .66:; No 32bit BOM is present in input, perform empiric autodetection. Invoke DosConvert, 12000,SI,DX,Void ; Try UTF-32LE. MOV BX,AX ; Number of input errors if UTF-32LE. Invoke DosConvert, 12001,SI,DX,Void ; Try UTF-32BE. CMP AX,BX JBE .63: ; UTF32BE detected. JMP .56: ; UTF32LE detected. .69:JNSt [InpEncSt],encStAuto, .79: CALL DosAutodetect .79:MOV DX,[InpEnd] MOV ES,[SegINPUT] MOV CX,DX SUB CX,SI ; [InpEncId] is now finally specified. [INPUT]:SI..DX is input text, CX its size. It may start with BOM. ; Select the output encoding procedure. MOV AX,[OutEncId] JNSt [OutEncSt],encStAuto|encStOem,.82 PUSH BX,DX DosAPI AX=6601h ; GET GLOBAL CODE PAGE TABLE. MOV AX,437 ; If DosAPI failed, use default IBM437. JC .81: MOV AX,BX ; AX is now the number of active code page. .81: MOV [OutEncId],AX SetSt [OutEncSt],encStOem POP DX,BX .82:Dispatch AX,65001,1200,1201,12000,12001,20127 ; Undispatched output encoding is 8bit OEM/ANSI. MOV CX,[TableDir.CodePages] MOV DI,[TableDir.CPid] PUSH DS POP ES REPNE SCASW JNE .20127: ; Unsupported output encoding - use ASCII. 
    SUB DI,[TableDir.CPid]
    DEC DI,DI
    ADD DI,[TableDir.CPtable]
    MOV AX,[DI] ; AX is now offset of translation table in section [CPtt].
    ADD AX,[TableDir.CPtt]
    MOV [TrTable],AX ; Offset of translation table in segment [DATA].
    MOV DI, To8bit:
    JMP .89:
.65001:MOV DI, ToUTF8:
    CMP CX,4
    JBE .89:
    LODSD [ES:SI]
    DEC SI
    AND EAX,0x00FF_FFFF
    SetSt [InpEncSt],encStBom
    CMP EAX,[BOM_UTF8]
    JE .89:
    RstSt [InpEncSt],encStBom
    SUB SI,3 ; No input UTF8-BOM is present.
    JMP .89:
.1200:
.1201:MOV DI, ToUTF16:
    CMP CX,2
    JB .89:
    SetSt [InpEncSt],encStBom
    TEST AL,1 ; Difference between BE and LE.
    LODSW
    JZ .83:
    CMP AX,[BOM_UTF16BE]
    JE .89:
.83:CMP AX,[BOM_UTF16LE]
    JE .89:
    SUB SI,2 ; No input UTF16-BOM is present.
    RstSt [InpEncSt],encStBom
    JMP .89:
.12000:
.12001:MOV DI, ToUTF32:
    CMP CX,4
    JB .89:
    SetSt [InpEncSt],encStBom
    TEST AL,1 ; Difference between BE and LE.
    LODSD
    JZ .86:
    CMP EAX,[BOM_UTF32BE]
    JE .89:
.86:CMP EAX,[BOM_UTF32LE]
    JE .89:
    SUB ESI,4 ; No input BOM is present.
    RstSt [InpEncSt],encStBom
    JMP .89:
.20127: MOV DI, ToASCII:
.89: ; [INPUT]:SI..DX is now input text with BOM removed.
    JNSt [OutEncSt],encStUtf,.94: ; No BOM in non-Unicode encodings.
    JNSt [OutEncSt],encStBom,.94: ; No output BOM if not explicitly requested.
    ; Output BOM was requested. Write BOM before invocation of DosConvert.
    MOV AX,[OutEncId]
    Dispatch AX,1200d,1201d,12000d,12001d,65001d
    JMP .94: ; Non-Unicode encoding cannot have BOM.
.65001d: MOV EAX,[BOM_UTF8]
    CALL OutputAL
    SHR EAX,8
    CALL OutputAL
    SHR EAX,8
    CALL OutputAL
    JMP .94:
.12001d: MOV EAX,[BOM_UTF32BE]
    CALL OutputEAX
    JMP .94:
.12000d: MOV EAX,[BOM_UTF32LE]
    CALL OutputEAX
    JMP .94:
.1201d: MOV AX,[BOM_UTF16BE]
    JMP .92:
.1200d: MOV AX,[BOM_UTF16LE]
.92:CALL OutputAX
.94:MOV DX,[InpEnd]
    ; [INPUT]:SI..DX is input text without BOM. Output encoding callback procedure is now in CS:DI.
    Invoke DosConvert,[InpEncId],SI,DX,DI
    ADD [InpErrorsLo],AX
    ADC [InpErrorsHi],0
    PUSH DS
    DosAPI AH=3Fh,BX=[InpHandle],CX=SIZE#[INPUT],DX=0,DS=[SegINPUT] ; READ FROM FILE OR DEVICE.
POP DS JC .InputError: SUB SI,SI MOV [InpEnd],AX ; This many bytes have been read from input file. ADD [InpFileSizeLo],AX ADC [InpFileSizeHi],SI TEST AX JNZ .94: ; If not end of file yet. CALL OutputFlush: Invoke DosInfoText, =' In', InpFileName,[InpFileSizeLo],[InpFileSizeHi],[InpEncId],[InpEncSt],[InpErrorsLo],[InpErrorsHi] Invoke DosInfoText, ='Out', OutFileName,[OutFileSizeLo],[OutFileSizeHi],[OutEncId],[OutEncSt],[OutErrorsLo],[OutErrorsHi] MOV EAX,[InpErrorsLo] OR EAX,[OutErrorsLo] JZ .Abort: OR [Errorlevel],2 .Abort: DosAPI AH=3Eh,BX=[OutHandle] ; CLOSE FILE. DosAPI AH=3Eh,BX=[InpHandle] ; CLOSE FILE. TerminateProgram [Errorlevel] ENDPROC DosMain:
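When the input is declared as plain UTF-16 without /LE or /BE, DosMain first looks for a byte order mark and otherwise converts the text both ways, keeping the byte order with fewer input errors (ties and too-short input fall to big endian). A rough Python model of that decision is sketched below. It is an assumption-laden illustration: the function name is hypothetical, and Python's own decoder with U+FFFD counting stands in for DosConvert's error counter, so the counts need not match the program exactly.

```python
def detect_utf16_endianness(data: bytes):
    """Return (endianness, bom_length) for UTF-16 input of unknown byte order."""
    if len(data) < 2:
        return "be", 0                    # too short to test: big endian is assumed
    if data[:2] == b"\xff\xfe":
        return "le", 2                    # UTF-16LE byte order mark
    if data[:2] == b"\xfe\xff":
        return "be", 2                    # UTF-16BE byte order mark
    # No BOM: decode both ways and count replacement characters as errors.
    le_errors = data.decode("utf-16-le", errors="replace").count("\ufffd")
    be_errors = data.decode("utf-16-be", errors="replace").count("\ufffd")
    # Big endian wins when it has no more errors than little endian.
    return ("be" if be_errors <= le_errors else "le"), 0

print(detect_utf16_endianness(b"\xff\xfeA\x00"))
```

The UTF-32 branch in the listing follows the same pattern with four-byte marks.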
DosConvert Procedure InpEncId,TextPtr,TextEnd,DosOutputProc PUSH ES MOV AX,[%InpEncId] MOV SI,[%TextPtr] MOV DX,[%TextEnd] SUB CX,CX MOV [%ReturnAX],CX ; Initialize input error counter. INC CX MOV [CharSize],CX ; Initialize CharSize=1. It will be updated if input encoding is Unicode. AND CX,AX ; Let ECX=1 for odd InpEncId, ECX=0 for even InpEncId (endianess in UTF-16 and UTF-32). MOV ES,[SegINPUT] Dispatch AX,65001,20127,1200,1201,12000,12001 ; Special encodings UTF or ASCII. ; Undispatched encodings is 8bit, let's select the translation table. ; Convert from OEM or ANSI 8bit encoding. PUSH DS POP ES MOV CX,[TableDir.CodePages] MOV DI,[TableDir.CPid] MOV BX,-1 REPNE SCASW JNE .05: ; Unsupported output encoding. SUB DI,[TableDir.CPid] LEA BX,[DI-2] ADD BX,[TableDir.CPtable] ; BX now points to an offset of translation table in section [CPtt]. Or -1 if no table. MOV BX,[BX] CMP BX,-1 ; In case that this codepage has no translation table. JNE .10: .05:MOV [%ReturnAX],BX JMP .90: .10:ADD BX,[TableDir.CPtt]; DS:BX now points to translation table with 128 WORDs. XOR AX,AX .15:CMP SI,DX JNB .90: XOR EAX,EAX MOV ES,[SegINPUT] LODSB [ES:SI] CMP AL,128 JB .20: MOV DI,AX ADD DI,AX SUB DI,256 MOV AX,[BX+DI] ; Translate character 0x80..0xFF to Unicode. CMP AX,Replacement CMC ; CF=1 if AX=0xFFFD,0xFFFE or 0xFFFF (replacement or undefined). ADCW [%ReturnAX],0 ; Input error. .20:CMP AX,'&' ; Possible beginning of HTML entity. JNE .22: JNSt [InpEncSt],encStHtm|encStHtml,.22: ; Skip if HTML entity should be ignored. CALL DosHtmlDecode: .22:CALL [%DosOutputProc] JC .90: JMP .15: ; The next input character. .65001: ; Convert from UTF-8 encoding. SUB DX,SI DecodeUTF8 SI,.Store,Size=DX,Width=32 ; Uses macro from string16.htm. JC .23: JECXZ .23: ; CX bytes from [INPUT] was left undecoded. XOR DX,DX NEG CX NOT DX DosAPI AX=4201h,BX=[InpHandle] ; LSEEK DX:CX bytes back from current file position. .23:JMP .90: .Store:PROC ; Internal subprocedure .Store is callback from the macro DecodeUTF8. 
; It is expected to pass decoded codepoint EAX to %DosOutputProc. MOV [CharSize],1 ; CharSize will be applied in codepage autodetection in GetRelevance. CMP EAX,80h JNA .2: MOV [CharSize],2 CMP EAX,800h JNA .2: MOV [CharSize],3 .2: CMP EAX,Replacement JNE .3: INCW [%ReturnAX] ; Replacement and unsupported codepoints increment input error counter. .3: MOV CX,[EntitySkip] JCXZ .5: DEC CX MOV [EntitySkip],CX RET ; Ignore the remaining letters of already decoded HTML entity. .5: CMP EAX,'&' ; Possible beginning of HTML entity. JNE .9: JNSt [InpEncSt],encStHtm|encStHtml,.9: MOV DI,SI CALL DosHtmlDecode: SUB SI,DI ; How many bytes should decoder advance to skip the decoded entity (0..9). MOV [EntitySkip],SI .9: JMP [BP+52] ; %DosOutputProc in DosConvert's frame. ENDP .Store: .20127: ; Convert from ASCII encoding. CMP SI,DX JNB .90: XOR EAX,EAX LODSB [ES:SI] CMP AL,128 JB .25: MOV AX,Replacement INCW [%ReturnAX] ; Input error. .25:JNSt [InpEncSt],encStHtm|encStHtml,.27: CMP AX,'&' JNE .27: CALL DosHtmlDecode: .27:CALL [%DosOutputProc] JC .90: JMP .20127: ; The next character. .1200: ; Convert from UTF-16LE encoding. ECX=0. .1201: ; Convert from UTF-16BE encoding. ECX=1. MOVB [CharSize],2 MOV BX,DX SUB BX,SI AND BX,1 JZ .30: SUB DX,BX ; Text size is not WORD aligned, truncate and count this as input error. INCW [%ReturnAX] .30:CMP SI,DX JNB .90: XOR EAX,EAX LODSW [ES:SI] JCXZ .35: XCHG AL,AH ; Convert UTF-16BE to UTF-16LE. .35:CMP AX,0xD7FF JBE .55: CMP AX,0xE000 JAE .55: ; High surrogate expected (0xD800..0xDBFF). SUB AX,0xD800 CMP AX,0x0400 JAE .45: MOV EDI,EAX ; Temporary save high 10 bits. SHL EDI,10 CMP SI,DX JNB .45: LODSW [ES:SI] ; Fetch the low surrogate. JCXZ .40: XCHG AL,AH ; Convert UTF-16BE to UTF-16LE. .40:SUB AX,0xDC00 ; Low surrogate expected (0xDC00..0xDFFF). JB .45: CMP AX,0x0400 JB .50: .45:MOV AX,Replacement INCW [%ReturnAX] JMP .55: .50:LEA EAX,[EAX+EDI+0x10000] ; Compose codepoint from both surrogates. 
.55:JNSt [InpEncSt],encStHtm|encStHtml,.57: CMP EAX,'&' JNE .57: CALL DosHtmlDecode: .57:CALL [%DosOutputProc] JC .90: JMP .30: .12000: ; Convert from UTF-32LE encoding. ECX=0. .12001: ; Convert from UTF-32BE encoding. ECX=1. MOVB [CharSize],4 MOV BX,DX SUB BX,SI AND BX,3 JZ .60: SUB DX,BX ; Text size is not DWORD aligned, truncate and count this as input error. INCW [%ReturnAX] .60:CMP SI,DX JNB .90: LODSD [ES:SI] JCXZ .65: ; CX=1 if UTF-32BE. BSWAP EAX ; Convert UTF-32BE to UTF-32LE. .65:CMP EAX,10FFFFh JA .70: ; Invalid above 10FFFFh. CMP EAX,0xD800 JB .80: ; Valid below 0xD800. CMP EAX,0xDFFF ; Invalid below 0xDFFF. JA .80: .70:MOV EAX,Replacement INCW [%ReturnAX] .80:JNSt [InpEncSt],encStHtm|encStHtml,.85: CMP EAX,'&' JNE .85: CALL DosHtmlDecode: .85:CALL [%DosOutputProc] JNC .60: .90:POP ES EndProcedure DosConvert
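The UTF-16 branch of DosConvert (labels .30 to .50) combines a high and a low surrogate into one code point and substitutes the replacement character for unpaired surrogates. The Python sketch below mirrors that control flow on a list of 16-bit code units that are already in native byte order. The function name is hypothetical, and `REPLACEMENT` is assumed to be 0xFFFD, as the comment near the 8-bit branch suggests.

```python
REPLACEMENT = 0xFFFD  # assumed value of the Replacement constant

def decode_utf16_units(units):
    """Return (codepoints, error_count) for a list of 16-bit code units."""
    codepoints, errors = [], 0
    i = 0
    while i < len(units):
        unit = units[i]
        i += 1
        if unit <= 0xD7FF or unit >= 0xE000:
            codepoints.append(unit)           # ordinary BMP character
            continue
        high = unit - 0xD800                  # 0..0x3FF for a high surrogate
        if high < 0x400 and i < len(units):
            low = units[i] - 0xDC00           # 0..0x3FF for a low surrogate
            i += 1                            # the second unit is consumed either way
            if 0 <= low < 0x400:
                codepoints.append(0x10000 + (high << 10) + low)
                continue
        codepoints.append(REPLACEMENT)        # unpaired or misplaced surrogate
        errors += 1
    return codepoints, errors

print(decode_utf16_units([0xD83D, 0xDE00]))
```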
Void PROC ; Empty conversion.
    CLC
    RET
    ENDP Void
GetRelevance: PROC ; Relevance is the probability that codepoint EAX appears in input text.
    PUSH CX,DX,DI,ES
    TEST EAX,0xFFFF_0000
    JNZ .4:
    MOV CX,[TableDir.CodePoints]
    MOV DI,[TableDir.CodePoint]
    PUSH DS
    POP ES
    REPNE SCASW
.4: MOV EAX,?? ; Relevance of invalid character is negative.
    JNE .9:
    SUB DI,2
    SUB DI,[TableDir.CodePoint]
    SHR DI,1
    ADD DI,[TableDir.Relevance]
    MOVSXB AX,[DI]
.9: MOVZXB CX,[CharSize]
    IMUL CX
    ADD [Relevance+0],AX ; Accumulate the value in global memory location.
    ADC [Relevance+2],DX
    CLC
    POP ES,DI,DX,CX
    RET
    ENDP GetRelevance:
Replace: PROC ; Replace codepoint EAX which does not exist in output encoding.
    PUSH BX,CX,SI,ES
    PUSH DS
    POP ES
    MOV BX,[OutEncSt]
    MOV SI,TempString
    JSt BX,encStIgn,.9: ; Ignore unsupported character.
    MOV DI,SI
    JNSt BX,encStQm,.2:
.1:MOV DI,SI
    MOV EAX,'?'
    MOV [DI],AX ; Replace codepoint EAX with question mark.
    JMP .6:
.2:JNSt BX,encStHtm|encStHtml,.3:
    MOVD [DI],'&#x' ; Replace codepoint EAX with its HTML entity.
    ADD DI,3
    StoH DI,Align=Left
    MOV AX,';'
    STOSW
    JMP .6:
.3:TEST EAX,0xFFFF_0000 ; Replace codepoint EAX with its transliteration.
    JNZ .1:
    MOV CX,[TableDir.CodePoints]
    MOV DI,[TableDir.CodePoint]
    REPNE SCASW
    JNE .1:
    SUB DI,2
    SUB DI,[TableDir.CodePoint]
    SHL DI,1
    ADD DI,[TableDir.Translit]
    MOV EAX,[DI] ; EAX now contains 0..4 ASCII characters, NUL padded.
    MOV DI,SI ; TempString.
.4:AND AL,7Fh
    JZ .5: ; End of TempString.
    STOSB
    SHR EAX,8
    JMP .4:
.5:STOSB ; NUL-terminate replacement string.
.6:LODSB
    CMP AL,0
    JZ .9:
    JSt BX,encStUtf16,.7:
    JSt BX,encStUtf32,.8:
    CALL ToASCII:
    JMP .6:
.7:CALL ToUTF16:
    JMP .6:
.8:CALL ToUTF32:
    JMP .6:
.9:POP ES,SI,CX,BX
    ADD [OutErrorsLo],1
    ADC [OutErrorsHi],0
    CLC
    RET
    ENDP Replace:
ToASCII: PROC ; Convert codepoint EAX to ASCII encoding.
    CMP EAX,127
    JA Replace:
    CALL OutputAL
    RET
    ENDP ToASCII:
To8bit: PROC ; Convert codepoint EAX to OEM/ANSI encoding using [TrTable].
    CMP EAX,127
    JBE .8: ; ASCII 7bit characters are copied verbatim.
TEST EAX,0xFFFF_0000 JNZ Replace: ; Character outside BMP is replaced with question mark. MOV DI,[TrTable] PUSH CX,ES,DS POP ES MOV CX,128 REPNE SCASW ; Search for codepoint in TrTable. POP ES,CX JNE Replace: ; If codepoint EAX is not supported by output encoding. SUB DI,[TrTable] ; DI is now 2,4,6,8,,,256. SHR DI,1 ; DI is now 1,2,3,4,,,128. LEA AX,[DI+128-1] ; AL is now 128,129,,,255. .8:CALL OutputAL RET ENDP To8bit: ToUTF32: PROC ; Convert codepoint EAX to UTF-32 encoding. JNSt [OutEncSt],encStBe, .8: BSWAP EAX .8:CALL OutputEAX RET ENDP ToUTF32: ToUTF16: PROC ; Convert codepoint EAX to UTF-16 encoding. TEST EAX,0xFFFF_0000 JZ .5: ; Character outside BMP will be written as two surrogates. SUB EAX,0x0001_0000 MOV EDI,EAX SHR EDI,10 ADD EDI,0xD800 ; EDI is now the high surrogate. XCHG EDI,EAX CALL .5: XCHG EAX,EDI ; Restore original codepoint in EAX. AND EAX,0x3FF ADD AX,0xDC00 ; AX is now the low surrogate. .5:JNSt [OutEncSt],encStBe,.8: XCHG AL,AH .8:CALL OutputAX RET ENDP ToUTF16: ToUTF8: PROC ; Convert codepoint EAX to UTF-8 encoding. MOV EDI,EAX CMP EAX,0x7F JBE .8: CMP EAX,0x7FF JBE .4: CMP EAX,0xFFFF JBE .2: ; 4byte encoding. SHR EAX,18 OR AL,0xF0 CALL .8: MOV EAX,EDI SHR EAX,12 CALL .6: MOV EAX,EDI SHR EAX,6 CALL .6: JMP .5: .2: ; 3byte encoding. SHR EAX,12 OR AL,0xE0 CALL .8: MOV EAX,EDI SHR EAX,6 CALL .6: MOV EAX,EDI JMP .6: .4:; 2byte encoding. SHR EAX,6 OR AL,0xC0 CALL .8: .5:MOV EAX,EDI .6:AND EAX,0x3F OR AL,0x80 .8:CALL OutputAL RET ENDP ToUTF8: OutputAL PROC ; Write byte from AL to output. PUSH DI,ES MOV DI,[OutputPtr] CMP DI,SIZE# [OUTPUT] JB .8: CALL OutputFlush: SUB DI,DI .8: PUSH PARA# [OUTPUT] POP ES STOSB MOV [OutputPtr],DI CLC POP ES,DI RET ENDP OutputAL OutputAX PROC ; Write word from AX to output. PUSH DI,ES MOV DI,[OutputPtr] CMP DI,SIZE# [OUTPUT] - 1 JB .8: CALL OutputFlush: SUB DI,DI .8:PUSH PARA# [OUTPUT] POP ES STOSW MOV [OutputPtr],DI CLC POP ES,DI RET ENDP OutputAX OutputEAX PROC ; Write dword from EAX to output. 
PUSH DI,ES MOV DI,[OutputPtr] CMP DI,SIZE# [OUTPUT] - 3 JB .8: CALL OutputFlush: SUB DI,DI .8:PUSH PARA# [OUTPUT] POP ES STOSD MOV [OutputPtr],DI CLC POP ES,DI RET ENDP OutputEAX OutputFlush:PROC ; Write contents of [OUTPUT] segment to OutFile. ; Input: [OutputPtr] is the size to be written. ; Output:[OutputPtr] is set to 0. PUSHAW MOV CX,[OutputPtr] JCXZ .9: PUSH DS SUB DX,DX DosAPI AH=40h,BX=[OutHandle],DS=[SegOUTPUT] ; WRITE TO FILE OR DEVICE. POP DS JC DosMain.OutputError: ; Abort on error. SUB DI,DI ADD [OutFileSizeLo],AX ADC [OutFileSizeHi],DI MOV [OutputPtr],DI .9: POPAW RET ENDP OutputFlush:
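The output callbacks ToUTF8 and ToUTF16 above split a code point into UTF-8 bytes or UTF-16 surrogates with shifts and masks. The following Python sketch restates the same arithmetic so the thresholds are easier to check. The function names are hypothetical, and byte swapping for big-endian output is left out.

```python
def to_utf8(codepoint):
    """Encode one code point as UTF-8, using the same thresholds as ToUTF8."""
    if codepoint <= 0x7F:
        return bytes([codepoint])
    if codepoint <= 0x7FF:                         # 2-byte encoding
        return bytes([0xC0 | (codepoint >> 6),
                      0x80 | (codepoint & 0x3F)])
    if codepoint <= 0xFFFF:                        # 3-byte encoding
        return bytes([0xE0 | (codepoint >> 12),
                      0x80 | ((codepoint >> 6) & 0x3F),
                      0x80 | (codepoint & 0x3F)])
    return bytes([0xF0 | (codepoint >> 18),        # 4-byte encoding
                  0x80 | ((codepoint >> 12) & 0x3F),
                  0x80 | ((codepoint >> 6) & 0x3F),
                  0x80 | (codepoint & 0x3F)])

def to_utf16_units(codepoint):
    """Return one code unit for the BMP, or a surrogate pair above it."""
    if codepoint <= 0xFFFF:
        return [codepoint]
    offset = codepoint - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]

print(to_utf8(0x20AC).hex(), [hex(u) for u in to_utf16_units(0x1F600)])
```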
0x0000_00A0.
encStUtf16, encStUtf32 specify input character width (8, 16, 32).
encStLe, encStBe specify input character endianness.
encStHtm, encStHtml specify if ASCII entities & < > " should be decoded, too.
DosHtmlDecode PROC
    PUSHAW
    MOV BP,SP
%EntEnd %SET BP-2 ; Local WORD variable for offset behind the semicolon, which terminates the entity.
    SUB SP,2
    PUSH ES
    MOV CX,DX
    MOV DI,SI
    SUB CX,SI
    JNC .00:
    MOV CX,0xFFFC
.00:MOV EAX,';'
    JSt [InpEncSt],encStUtf16, .10:
    JSt [InpEncSt],encStUtf32, .30:
    REPNE SCASB ; Search for entity terminator in 8bit input stream.
    JNE .90:
    MOV [%EntEnd],DI ; Remember the input stream position behind semicolon.
    SUB DI,SI
    MOV CX,DI ; Size of potential entity.
    CMP CX,SIZE# TempString
    JA .90:
    MOV DI,TempString ; Copy entity to TempString in segment DS.
.05:LODSB [ES:SI]
    CMP AL,128
    JAE .90:
    MOV [DI],AL
    INC DI
    LOOP .05: ; Copy entity contents, e.g. nbsp; or #x78AB, to TempString.
    JMP .50:
.10:SHR CX,1 ; Input characters are 16bit.
    JNSt [InpEncSt],encStBe,.15:
    XCHG AL,AH
.15:REPNE SCASW ; Search for entity terminator in 16bit input stream.
    JNE .90:
    MOV [%EntEnd],DI ; Remember the input stream position behind semicolon.
    SUB DI,SI
    MOV CX,DI ; Size of potential entity.
    SHR CX,1 ; Size in WORDs.
    JZ .90:
    CMP CX,SIZE# TempString
    JA .90:
    MOV DI,TempString ; Convert entity to 7bit ASCII and copy to TempString.
.20:LODSW [ES:SI]
    JNSt [InpEncSt],encStBe,.25:
    XCHG AL,AH
.25:CMP AX,128
    JAE .90:
    MOV [DI],AL
    INC DI
    LOOP .20:
    JMP .50:
.30:SHR ECX,2 ; Input characters are 32bit.
    JNSt [InpEncSt],encStBe,.35:
    BSWAP EAX
.35:REPNE SCASD ; Search for entity terminator in 32bit input stream.
    JNE .90:
    MOV [%EntEnd],DI ; Remember the input stream position behind semicolon.
    SUB DI,SI
    MOV CX,DI ; Size of potential entity.
    SHR CX,2 ; Size in DWORDs.
    JZ .90:
    CMP CX,SIZE# TempString
    JA .90:
    MOV DI,TempString ; Convert entity to 7bit ASCII and copy to TempString.
.40:LODSD [ES:SI]
    JNSt [InpEncSt],encStBe,.45:
    BSWAP EAX
.45:CMP EAX,128
    JAE .90:
    MOV [DI],AL
    INC DI
    LOOP .40:
.50:MOV SI,TempString ; DS:SI now points to the entity (without ampersand) terminated with semicolon.
PUSH DS POP ES SUB DI,SI ; DI is now the size of HTML entity in ASCII bytes, including semicolon. LODSB CMP AL,'#' ; Test if the entity is numeric. JE .65: CMP DI,8+1 ; Longer named entities are not supported. JA .90: DEC SI ; 1st character is not #, so its a named entity. MOV CL,5 LODSD XOR EBX,EBX DEC EBX ; Prepare mask to EBX. SUB CX,DI JS .55: ; TempString has 1..4 letters. SAL CX,3 SHR EBX,CL AND EAX,EBX ; Pad the shorter entity name with NUL bytes. MOV CX,[TableDir.Entities4] MOV DI,[TableDir.EntName4] REPNE SCASD ; Search for the entity by name. JNE .90: SUB DI,4 SUB DI,[TableDir.EntName4] SHR DI,1 ADD DI,[TableDir.EntVal4] MOVZXW EAX,[DI] JMP .80: ; Decoded entity codepoint is now in EAX. .55: ; Entity has 5..8 letters. XCHG EAX,EDX ; Temporarily save first four characters to EDX. LODSD ; Fifth to eighth characters. ADD CX,4 SAL CX,3 SHR EBX,CL AND EAX,EBX ; Pad the entity with NUL bytes. XCHG EDX,EAX MOV CX,[TableDir.Entities8] MOV DI,[TableDir.EntName8A] .60:REPNE SCASD ; Search for the entity by its first four letters. JNE .90: PUSH DI SUB DI,4 SUB DI,[TableDir.EntName8A] ADD DI,[TableDir.EntName8B] CMP EDX,[DI] ; Compare masked fifth..eighth letters. POP DI JNE .60: ; Continue search if the entity differed in 5..8th characters. SUB DI,4 SUB DI,[TableDir.EntName8A] SHR DI,1 ADD DI,[TableDir.EntVal8] MOVZXW EAX,[DI] JMP .80: ; Decoded entity codepoint is now in EAX. .65: ; Numeric entity expected. LODSB CMP AL,'0' JB .90: OR AL,'x'^'X' CMP AL,'x' JE .70: DEC SI ; DS:SI should now point to decimal number terminated with semicolon. LodDD SI ; Use macro LodDD from library cpuext16. JMP .75: .70:; DS:SI should now point to hexadecimal number terminated with semicolon. LodHD SI ; Use macro LodHD from library cpuext16. .75:JC .90: ; Abort if wrong number syntax. TEST DX JNZ .90: ; Abort if decoded entity is not in BMP. CMPB [SI],';' JNE .90: .80:; AX is decoded codepoint, [%EntEnd] is offset behind the entity in text. 
JSt [InpEncSt],encStHtml,.85: CMP AX,128 JB .90: ; Skip if ASCII entities should not be converted. .85:MOV DX,[%EntEnd] MOV [BP+2],DX ; %ReturnSI. MOV [BP+14],AX ; %ReturnAX. .90: XOR EAX,EAX POP ES MOV SP,BP POPAW RET ENDP DosHtmlDecode
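DosHtmlDecode accepts named entities of up to eight letters and decimal or hexadecimal numeric entities within the Basic Multilingual Plane, and it leaves ASCII entities alone unless /HTML was requested. The Python sketch below models those rules on a string. It is illustrative only: `decode_entity` is a hypothetical name, and `NAMED_ENTITIES` is a three-entry stand-in for the program's entity tables.

```python
import string

# Tiny stand-in for the EntName4/EntName8 tables (assumption, not the real data).
NAMED_ENTITIES = {"nbsp": 0x00A0, "euro": 0x20AC, "amp": 0x0026}

def decode_entity(text, pos, decode_ascii=False):
    """Decode the entity starting at text[pos] == '&'.

    Returns (codepoint, next_pos), or None when the text is left as it is.
    """
    end = text.find(";", pos + 1)
    if end < 0 or end - pos > 16:          # entity must fit the 16-byte TempString
        return None
    body = text[pos + 1:end]
    if body.startswith("#"):               # numeric entity
        digits = body[1:]
        if digits[:1] in ("x", "X"):
            digits, base, allowed = digits[1:], 16, string.hexdigits
        else:
            base, allowed = 10, string.digits
        if not digits or any(c not in allowed for c in digits):
            return None
        codepoint = int(digits, base)
        if codepoint > 0xFFFF:             # outside the BMP: not decoded
            return None
    else:                                  # named entity, at most 8 letters
        if len(body) > 8:
            return None
        codepoint = NAMED_ENTITIES.get(body)
        if codepoint is None:
            return None
    if codepoint < 128 and not decode_ascii:
        return None                        # ASCII entities only with /HTML
    return codepoint, end + 1

print(decode_entity("&nbsp;x", 0))
```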
DosAutodetect PROC ; Autodetect input encoding of text ES:SI..ES:DX. SUB CX,CX ; CX will be CP index (0,1,2,,,[CodePages]). MOV [BestRelevance+0],CX MOV [BestRelevance+2],0x8000 ; Initialize BestRelevance with lowest signed integer. MOV [InpEncId],CX MOV [InpEncSt],encStAuto .20:SUB EBP,EBP MOV [Relevance],EBP MOV BP,[TableDir.CPid] ADD BP,CX ADD BP,CX MOV AX,[DS:BP] CMP AX,912 JNE .40: DEC [Relevance] ; Slightly discriminate IBM912 against almost identical ISO8859-2. .40:Invoke DosConvert,AX,SI,DX, GetRelevance: MOV EAX,[Relevance] CMP EAX,[BestRelevance] JLE .80: ; Encoding indexed by CX is better candidate than all previous ones. MOV [BestRelevance],EAX MOV BP,[TableDir.CPid] ADD BP,CX ADD BP,CX MOV AX,[DS:BP] MOV [InpEncId],AX ; Remember the so far best input encoding. .80:INC CX ; Try the next codepage. CMP CX,[TableDir.CodePages] JB .20: RET ENDP DosAutodetect
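DosAutodetect converts the input once per supported code page, sums the relevance of every decoded character, and keeps the first code page that reaches the highest total. A simplified Python model of that loop follows. The scoring function is an invented placeholder for the Relevance table, the candidate names are Python codec names rather than CP numbers, and the small penalty applied to IBM912 is omitted.

```python
def relevance(ch):
    """Placeholder score: the real program reads a per-codepoint table."""
    if ch == "\ufffd":
        return -8                          # undefined byte: negative relevance
    return 2 if (ch.isalpha() or ch.isspace()) else 0

def autodetect(data, candidates):
    """Return the first candidate encoding with the highest total relevance."""
    best_name, best_score = None, None
    for name in candidates:
        text = data.decode(name, errors="replace")
        score = sum(relevance(ch) for ch in text)
        # Only a strictly better score replaces the current best candidate.
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name

sample = "\u017elu\u0165ou\u010dk\u00fd k\u016f\u0148".encode("cp1250")
print(autodetect(sample, ["cp852", "cp1250", "latin_1"]))
```

For 8-bit code pages every character counts once. The listing additionally multiplies each relevance by CharSize, which matters only for Unicode input.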
utf.
DosGrep %MACRO Needle, TextBegin, TextEnd PUSHW =B'%Needle', %TextEnd MOV DI,%TextBegin CALL DosGrep@RT DosGrep@RT:PROC1 PUSHAW MOV BP,SP PUSH ES,DS POP ES MOV SI,[%Par2] ; ES:SI is now the Needle. GetLength$ SI ; Return Needle size in CX. MOV DX,CX DEC DX MOV CX,[%Par1] ; TextEnd. SUB CX,DI ; Text size. JB .9: LODSB [DS:SI] ; First character of Needle. PUSH SS POP ES .1: REPNE SCASB [ES:DI] ; Search for the 1st char of Needle. JNE .9: PUSH CX,SI,DI MOV CX,DX REPE CMPSB ; Compare the rest of Needle. POP DI,SI,CX JNE .1: ADD DI,DX MOV [%ReturnDI],DI CMP DX,DX ; Set ZF=1. .9: POP ES POPAW RET 2*2 ; ZF=0 if Needle was not found. ENDP1 DosGrep@RT %ENDMACRO DosGrep
DosGrepNum %MACRO TextBegin, TextEnd PUSHW %TextEnd, %TextBegin CALL DosGrepNum@RT DosGrepNum@RT:PROC1 PUSHAW MOV BP,SP MOV CX,[%Par2] ; TextEnd. MOV SI,[%Par1] ; TextBegin. SUB CX,SI JB .9: SUB AX,AX .3: LODSB [SS:SI] SUB AL,'0' JB .5: CMP AL,9 JNA .7: .5: DEC CX JNZ .3: STC JMP .9: .7: DEC SI PUSH DS,SS POP DS LodD SI POP DS JC .9: MOV [%ReturnSI],SI MOV [%ReturnAX],AX .9: POPAW RET 2*2 ; CF=0 if a valid number EAX was found. ENDP1 DosGrepNum@RT %ENDMACRO DosGrepNum
Utf-16-LE-BOM.
DosParseEnc Procedure EncPtr, EncSize Enc$ LocalVar Size=32 ; Input string converted to lower case. Enc$End LocalVar ; Pointer to the end of string in Enc$. ClearLocalVar PUSH ES MOV SI,[%EncPtr] MOV CX,[%EncSize] XOR BX,BX MOV [%ReturnAX],BX MOV [%ReturnBX],BX CMP CX,24 JNB .Err: ; Argument is too long. CMP CL,2 JB .Err: ; Argument is too short. CMPB [ES:SI],'"' JNE .NoQ: INC SI ; Argument is in quotes. DEC CX,CX ; Omit the quotes. .NoQ: TEST CX JZ .Err: ; Argument is empty. MOV DX,CX LEA DI,[%Enc$] .LoCa:LODSB [ES:SI] OR AL,0x20 ; Simplified conversion to lower case. MOV [SS:DI],AL INC DI DEC CX JNZ .LoCa: LEA SI,[%Enc$] ADD DX,SI MOV [%Enc$End],DX ; Parse all encoding properties from text SS:SI..SS:DX into flags in BX. prop %FOR ascii,utf,bom,le,be,htm,html,qm,ign,transl,oem,ansi,auto,enc DosGrep %prop,SI,DX %Prop1 %SETC '%prop[1]' & ~('A'^'a') ; Uppercase the 1st letter of %prop. JNE .N%prop: SetSt BX,encSt%Prop1%prop[2..] .N%prop: %ENDFOR prop JNSt BX,encStHtml, .Shtm: RstSt BX,encStHtm .Shtm:JNSt BX,encStAscii, .Nas: MOV AX,20127 JMP .End: .Nas: JNSt BX,encStAuto|encStEnc, .Nau: XOR AX,AX JMP .End: .Nau: JNSt BX,encStOem|encStAnsi, .Noe: PUSH BX,DX DosAPI AX=6601h ; GET GLOBAL CODE PAGE TABLE. MOV AX,437 ; If DosAPI failed, use default IBM437. JC .Noa: MOV AX,BX ; AX is now the number of active code page. .Noa: POP DX,BX SetSt BX,encStAuto JMP .End: .Noe: JNSt BX,encStUtf,.Nut: DosGrepNum SI,DX ; Distinguish UTF-8/16/32. JC .Err: Dispatch AL,8,16,32 .Err: STC JMP .Ret: .8: SetSt BX,encStUtf8 MOV AX,65001 JMP .End: .16: SetSt BX,encStUtf16 MOV AX,1200 JMP .En: .32: SetSt BX,encStUtf32 MOV AX,12000 .En: JSt BX,encStLe, .End: JNSt BX,encStBe, .End: ; Endianess will be detected later. INC AX JMP .End: .Nut: ; Try direct CPid numeric specification. DosGrepNum SI,DX JC .Nnum: ; No number in encoding specifications. MOV DI,[TableDir.CPid] MOV CX,[TableDir.CodePages] ; Number of supported code pages. PUSH DS ; Try to find the number AX in array [CPid]. 
POP ES REPNE SCASW JE .End: ; String Enc$ is not direct CPid. It could contain some alternative CP number: Dispatch AX,8859,790,916,919,920,921,923,991,1208 JMP .Nnum: .1208:MOV AX,65001 ; IBM1208 is alias of UTF-8 = CP65001. SetSt BX,encStUtf+encStUtf8 JMP .End: .790: .991: MOV AX,667 ; IBM790,IBM991 is Mazovia=CP667. JMP .End: .916: MOV AX,28598 ; "ISO-8859-8","IBM916, Latin/Hebrew" JMP .End: .919: MOV AX,28600 ;"ISO-8859-10","IBM919, Latin 6, Nordic" JMP .End: .920: MOV AX,28599 ;"ISO-8859-9","IBM920, Latin 5, Turkish", JMP .End: .921: MOV AX,28603 ; "ISO-8859-13","IBM921, Latin 7, Baltic" JMP .End: .923: MOV AX,28605 ; "ISO-8859-15","IBM923, Latin 9, Western Europe" JMP .End: .8859:DosGrepNum SI,DX ; Get the number following ISO-8859-. JC .Err: TEST AX ; ISO-8859-0 is not supported. JZ .Err: CMP AX,12 ; ISO-8859-12 is not supported. JE .Err: CMP AX,16 ; 8859-1 .. 8859-16 is supported. JA .Err: ADD AX,28590 JMP .End: .Nnum: ; No numeric CPid in Enc$ was found (or 8 in KOI8). Try letter-only strings. LEA SI,[%Enc$] MOV AX,10101 DosGrep nextstep,SI,DX JE .End: MOV AX,667 DosGrep mazo,SI,DX ; Try Mazovia. JE .End: MOV AX,895 DosGrep kame,SI,DX ; Try Kamenických alias KEYBCS2. JE .End: DosGrep bcs2,SI,DX JE .End: DosGrep koi8,SI,DX ; Try KOI8. JNE .MAC: MOV SI,DI ; Points behind KOI8. MOV AX,885 ; Try KOI8-CS. DosGrep cs,SI,DX JE .End: MOV AX,878 ; Try KOI8-R. DosGrep r,SI,DX JE .End: MOV AX,1168 ; Try KOI8-U. DosGrep u,SI,DX JE .End: MOV AX,880 ; Try KOI8-E. DosGrep e,SI,DX JE .End: MOV AX,882 ; Try KOI8-T. DosGrep t,SI,DX JE .End: MOV AX,884 ; Try KOI8-F. DosGrep f,SI,DX JE .End: .MAC: LEA SI,[%Enc$] DosGrep mac,SI,DX ; Try Macintosh. JNE .Err: ; If no other choices are left, give up. 
    MOV AX,10010
    DosGrep romanian,SI,DX
    JE .End:
    DosGrep rumun,SI,DX
    JE .End:
    MOV AX,10000
    DosGrep roman,SI,DX
    JE .End:
    MOV AX,10004
    DosGrep arab,SI,DX
    JE .End:
    MOV AX,10005
    DosGrep hebre,SI,DX
    JE .End:
    MOV AX,10006
    DosGrep greek,SI,DX
    JE .End:
    MOV AX,10007
    DosGrep cyril,SI,DX
    JE .End:
    MOV AX,10017
    DosGrep ukr,SI,DX
    JE .End:
    MOV AX,10021
    DosGrep thai,SI,DX
    JE .End:
    MOV AX,10079
    DosGrep iceland,SI,DX
    JE .End:
    MOV AX,10029
    DosGrep ce,SI,DX
    JE .End:
    MOV AX,10080
    DosGrep inuit,SI,DX
    JE .End:
    MOV AX,10081
    DosGrep turk,SI,DX
    JE .End:
    MOV AX,10082
    DosGrep croat,SI,DX
    JE .End:
    MOV AX,10083
    DosGrep gael,SI,DX
    JE .End:
    MOV AX,10084
    DosGrep celtic,SI,DX
    JE .End:
    MOV AX,10089
    DosGrep latin,SI,DX
    JE .End:
    DosGrep kermit,SI,DX
    JE .End:
    STC
    JMP .Ret:
.End: MOV [%ReturnAX],AX
    MOV [%ReturnBX],BX
.Ret:POP ES
    EndProcedure DosParseEnc ; CF=error
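DosParseEnc lowercases the argument, collects modifier flags by substring search, and then derives the code page number from the first number it finds. The Python sketch below covers only the numeric part of that logic: UTF-8/16/32 with optional LE/BE, ISO-8859-N mapped to 28590+N, and the IBM aliases listed in the source. `parse_encoding` is a hypothetical name, and the sketch does not check a direct number against the [CPid] array as the real procedure does.

```python
import re

# IBM alias numbers taken from the Dispatch list in DosParseEnc.
IBM_ALIASES = {1208: 65001, 790: 667, 991: 667, 916: 28598,
               919: 28600, 920: 28599, 921: 28603, 923: 28605}

def parse_encoding(spec):
    """Return a code page number for an encoding name, or None on error."""
    s = spec.lower()
    numbers = [int(n) for n in re.findall(r"\d+", s)]
    if "ascii" in s:
        return 20127
    if "utf" in s:
        base = {8: 65001, 16: 1200, 32: 12000}.get(numbers[0]) if numbers else None
        if base is None:
            return None
        # LE has priority; BE selects the next id; otherwise detect later.
        if base != 65001 and "le" not in s and "be" in s:
            return base + 1
        return base
    if not numbers:
        return None                        # name-only encodings are not modelled here
    first = numbers[0]
    if first == 8859:
        part = numbers[1] if len(numbers) > 1 else 0
        if part == 0 or part == 12 or part > 16:
            return None                    # 8859-0, 8859-12 and above 16 are rejected
        return 28590 + part
    # Assumed direct CP id; the real routine accepts it only if listed in [CPid].
    return IBM_ALIASES.get(first, first)

print(parse_encoding("ISO-8859-2"), parse_encoding("UTF-16BE/BOM"))
```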
DosEncList PROC PUSH DS POP ES StdOutput =' EuroConv supported encodings',Eol=Yes StdOutput Eol=Yes StdOutput =' OEM/ANSI 8bit code pages:',Eol=Yes SUB DX,DX ; Encoding index 0..[CodePages]. .10:MOV SI,[TableDir.CPid] ADD SI,DX ADD SI,DX LODSW Dispatch AX,20127,65001,1200,1201,12000,12001 StoD TempString XOR AX,AX STOSW MOV BX,[TableDir.CPname] ADD BX,DX ADD BX,DX MOV BX,[BX] ADD BX,[TableDir.CPinfo] MOV SI,[TableDir.CPrem] ADD SI,DX ADD SI,DX MOV SI,[SI] ADD SI,[TableDir.CPinfo] StdOutput ='CP',TempString,=', "',BX,='" ',SI,Eol=Yes .20127: .65001: .1200: .1201: .12000: .12001: INC DX CMP DX,[TableDir.CodePages] JB .10: StdOutput Eol=Yes StdOutput =' Plain ASCII 7bit encoding:',Eol=Yes StdOutput ='CP20127, "ASCII"',Eol=Yes StdOutput Eol=Yes StdOutput =' Unicode encoding:',Eol=Yes StdOutput ='CP1200, "UTF-16LE"',Eol=Yes StdOutput ='CP1201, "UTF-16BE"',Eol=Yes StdOutput =' "UTF-16" (endianess will be autodetected)',Eol=Yes StdOutput ='CP12000, "UTF-32LE"',Eol=Yes StdOutput ='CP12001, "UTF-32BE"',Eol=Yes StdOutput =' "UTF-32" (endianess will be autodetected)',Eol=Yes StdOutput ='CP65001, "UTF-8"',Eol=Yes StdOutput Eol=Yes StdOutput =' Special assignment:',Eol=Yes DosAPI AX=6601h ; GET GLOBAL CODE PAGE TABLE. MOV AX,437 ; If DosAPI failed, use default IBM437. 
JC .Noa: MOV AX,BX .Noa:StoD TempString XOR AX,AX STOSB StdOutput ='OEM = console encoding selected by regional settings: CP',TempString,Eol=Yes StdOutput ='AUTO = autodetect encoding',Eol=Yes StdOutput ='ENC = display this list of supported encodings.',Eol=Yes StdOutput Eol=Yes StdOutput =' Encoding modifiers:',Eol=Yes StdOutput ='/BOM = write Byte Order Mark (valid only with UTF encodings)',Eol=Yes StdOutput ='/IGN = omit characters not supported in output encoding',Eol=Yes StdOutput ='/QM = replace characters not supported in output encoding with "?"',Eol=Yes StdOutput ='/HTML = replace characters not supported in output encoding with HTML-entity',Eol=Yes StdOutput ='/TRANS = transliterate characters not supported in output encoding (default)',Eol=Yes RET ENDP DosEncList
DosInfoText Procedure Direction, FileName, FileSizeLo, FileSizeHi, EncId, EncSt, ErrorsLo, ErrorsHi PUSH DS POP ES MOV DX,[%FileName] StdOutput [%Direction],='put file: "',DX,='"',Eol=Yes MOV AX,[%FileSizeLo] MOV DX,[%FileSizeHi] StoDD TempString XOR AX,AX STOSB StdOutput [%Direction],='put size: ',TempString, Eol=Yes MOV AX,[%EncId] StoD TempString, Signed=No MOVB [DI],0 StdOutput [%Direction],='put encoding: CP',TempString,=', "' MOV CX,[TableDir.CodePages] MOV DI,[TableDir.CPid] REPNE SCASW SUB DI,2 SUB DI,[TableDir.CPid] MOV DX,DI ; Temporary store CP word index to DX. ADD DI,[TableDir.CPname] MOV DI,[DI] ADD DI,[TableDir.CPinfo] StdOutput DI MOV BX,[%EncSt] JNSt BX,encStBom,.20: StdOutput ="/BOM" .20:JNSt BX,encStIgn,.30: StdOutput ='/IGN' .30:JNSt BX,encStQm, .40: StdOutput ='/QM' .40:JNSt BX,encStHtm,.50: StdOutput ="/HTM" .50:JNSt BX,encStHtml,.60: StdOutput ="/HTML" .60:MOV DI,DX ; Restore CP word index. ADD DI,[TableDir.CPrem] MOV DI,[DI] ADD DI,[TableDir.CPinfo] StdOutput ='", ',DI JNSt BX,encStOem,.80: StdOutput ='", ',="(OEM)," .80:JNSt BX,encStAuto,.90: StdOutput =' (autodetected)' .90:StdOutput Eol=Yes MOV AX,[%ErrorsLo] MOV DX,[%ErrorsHi] StoDD TempString XOR AX,AX STOSB StdOutput [%Direction],='put errors: ',TempString, Eol=Yes EndProcedure DosInfoText
ENDPROGRAM eurocond ; End of 16bit DOS version of EuroConvertor (used as a stub of Windows version).
WinHeader specifies format of Windows program and included macrolibraries and it also declares the segment order.
The Windows version reuses at run time the tables defined in the DOS variant. They are located within the stub, in front of the first segment [.text], at the beginning of the image, which is mapped at the virtual address specified as ImageBase (usually 0x0040_0000).
The Windows variant uses macros with the same names as the DOS variant. Because both the DOS and the Windows program are defined in one source file, the 16bit macros are dropped here before the 32bit macrolibraries are included.
If you don't have the import library objlib\winapi.lib, create it first with the sample project DLL2LIB: change to the directory ..\prowin32\ and execute euroasm dll2lib.htm.
euroconv PROGRAM Format=PE, Width=32, Entry=WinMain, \
         IconFile=euroconv.ico, StubFile="eurocond.exe"
 %DROPMACRO * ; Discard homonymous macros declared in DOS macrolibraries included to stub program.
 INCLUDEHEAD1 wins.htm, winscon.htm, winsgui.htm, winapi.htm, winsdlg.htm, \ Include 32bit macrolibraries.
              winfile.htm, stdcal32.htm, cpuext32.htm, status32.htm, string32.htm
 INCLUDEHEAD "%^SourceFile" ; Include common constants defined above in DosHeader.
 INCLUDEHEAD ..\easource\pfmz.htm ; Include declaration of MZ EXE Header structure.
 LINK winapi.lib ; Import library which declares Windows functions invoked with macro WinAPI.
[.text] SEGMENT ALIGN=64 ; Specify order of segments in Windows version.
[.data] SEGMENT
[.bss] SEGMENT
WinBlockSize EQU 1M ; Used to limit autodetection text size in Windows version.
; Static initialized global data.
[.data]
; Byte Order Mark definitions.
BOM_UTF32LE DB 0xFF,0xFE,0x00,0x00
BOM_UTF16LE EQU BOM_UTF32LE
BOM_UTF32BE DB 0x00,0x00,0xFE,0xFF
BOM_UTF16BE EQU BOM_UTF32BE + 2
BOM_UTF8 DB 0xEF,0xBB,0xBF,0
WndClassName D "EuroConv",0
InfoText D "EuroConv - convertor of text file encoding.",0
HelpText:
 D "Program: EuroConvertor WIN version %Version",13,10
 D "Function: Conversion of text file encoding.",13,10
 D "Format: Dual DOS/Windows console/GUI application.",13,10
 D "Licence: freeware by vitsoft",13,10
 D "Arguments: InpEncoding OutEncoding InpFileName OutFileName",13,10
 D "Example: euroconv ISO8859-2 utf16/LE/BOM input.txt output.txt",13,10
 D "Encodings: euroconv enc | more",13,10
 D "Interactive: euroconv",13,10
 D "Manual: https://vitsoft.info/econv_en.htm",13,10,0
; Static uninitialized data.
[.bss]
; 32bit copy of pointers from TableDir relocated from stub by WinMain.
CodePoint D DWORD ; Offset of WORD array in section [CodePoint].
Relevance D DWORD ; Offset of BYTE array in section [Relevance].
Translit D DWORD ; Offset of DWORD array in section [Translit].
CodePoints D DWORD ; The number of supported codepoints, i.e. the length of previous arrays.
EntVal4 D DWORD ; Offset of WORD array in section [EntVal4].
EntName4 D DWORD ; Offset of DWORD array in section [EntName4].
Entities4 D DWORD ; The number of supported HTML entities with 1..4 characters.
EntVal8 D DWORD ; Offset of WORD array in section [EntVal8].
EntName8A D DWORD ; Offset of DWORD array in section [EntName8A].
EntName8B D DWORD ; Offset of DWORD array in section [EntName8B].
Entities8 D DWORD ; The number of supported HTML entities with 5..8 characters.
CPid D DWORD ; Offset of WORD array in section [CPid].
CPname D DWORD ; Offset of WORD array in section [CPname].
CPrem D DWORD ; Offset of WORD array in section [CPrem].
CPurl D DWORD ; Offset of WORD array in section [CPurl].
CPtable D DWORD ; Offset of WORD array in section [CPtable].
CPinfo D DWORD ; Offset of ASCIIZ strings in section [CPinfo].
CPtt D DWORD ; Offset of 256*BYTE blocks in section [CPtt].
CodePages D DWORD ; The number of supported encodings, i.e. the length of CP* arrays.
; Other static data of console variant.
Errorlevel D DWORD ; 0=normal end, 2=invalid characters, 4=I/O error, 8=wrong syntax.
RunFromGUI D DWORD ; Nonzero when the CON version was forked from GUI version.
InpEncId D DWORD ; Input encoding identifiers (437..65001).
OutEncId D DWORD ; Output encoding identifiers (437..65001).
OemEncId D DWORD ; Output OEM encoding of current user.
AnsiEncId D DWORD ; Output ANSI encoding of current user.
InpEncSt D DWORD ; Input encoding flags, see above.
OutEncSt D DWORD ; Output encoding flags, see above.
InpErrors D DWORD ; Number of input characters which are not defined in input encoding.
OutErrors D DWORD ; Number of characters which are not defined in output encoding.
InpBegin D DWORD ; Pointer to the first byte of input text mapped in memory.
InpEnd D DWORD ; Pointer behind the last byte of input text mapped in memory.
DetectSize D DWORD ; Input size used for autodetection. Max 1M.
SumRelevance D DWORD ; Total sum of relevances of all characters in the input text.
TrTable D DWORD ; Offset of selected array of 128 WORDs with code points of OEM/ANSI encodings.
EntitySkip D DWORD ; Number of bytes skipped when HTML entity is decoded on input (0..8).
CharSize D DWORD ; Character width in bytes (1..4) used to increase relevance during autodetection.
InpFile DS FILE ; FILE structure for access encapsulated by macros
OutFile DS FILE ; from library winfile.
TempString D 128*BYTE ; Working room for string manipulation.
Cmd$ D 24 + 2 * MAX_PATH_SIZE * BYTE ; Room for the cmdline constructed dynamically by WinGui.
[.text]
Windows program entry point.
The DOS stub is already mapped in the virtual address space of the Windows version. Offsets of the 16bit data sections are read from TableDir, recalculated to 32bit pointers and then used by this Windows program.
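The relocation described here can be modelled outside of assembly. The following Python sketch is only an illustration under stated assumptions: the image is treated as a byte string starting at ImageBase, the signature text and the field offset passed in are hypothetical placeholders (the real layout of TableDir is defined elsewhere in the source), and only the standard MZ header field e_cparhdr (a WORD at offset 8) is taken as given.

```python
import struct

def relocate_table_dir(image: bytes, signature: bytes, field_offsets: dict) -> dict:
    """Turn 16-bit offsets stored in the stub's table directory into image offsets.

    image         -- bytes of the mapped image, index 0 corresponds to ImageBase
    signature     -- marker expected at the start of the directory (hypothetical value)
    field_offsets -- {name: offset of a WORD inside the directory} (hypothetical layout)
    """
    # e_cparhdr = size of the MZ header in 16-byte paragraphs, WORD at offset 8.
    (e_cparhdr,) = struct.unpack_from("<H", image, 8)
    base = e_cparhdr * 16                      # same as SAL EDX,4 in the assembly
    if image[base:base + len(signature)] != signature:
        raise ValueError("table directory not found")
    pointers = {}
    for name, off in field_offsets.items():
        (rel,) = struct.unpack_from("<H", image, base + off)
        pointers[name] = base + rel            # 16-bit offset becomes a full pointer
    return pointers
```

In the real program the resulting value is additionally offset by ImageBase, because the pointers must be usable as flat 32bit addresses.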
[.text] WinMain Procedure hWnd, uMsg, wParam, lParam Clear SEGMENT# [.bss], Size=SIZE# [.bss] MOV ESI,%^ImageBase ; DOS stub is mapped at this VA. MOVZXW EDX,[ESI+PFMZ_DOS_HEADER.e_cparhdr] ; Paragraph size of MZ DOS header + relocations. SAL EDX,4 ; Convert paragraph size to bytes. LEA EBX,[ESI+EDX] CMPD [EBX+0],"%Signature[1..4]" ; Check if TableDir is present in stub. JNE .BadLink: CMPD [EBX+4],"%Signature[5..8]" JNE .BadLink: ; Copy/Relocate table directory from stub to 32bit [.bss] segment. member %FOR CodePoints,Entities4,Entities8,CodePages ; Copy scalar values. MOVZXW EAX,[EBX+TABLEDIR.%member] MOV [%member],EAX %ENDFOR member member %FOR CodePoint,Relevance,Translit,EntVal4,EntName4,\ Relocate pointers. EntVal8,EntName8A,EntName8B,CPid,CPname,CPrem,\ CPurl,CPtable,CPinfo,CPtt MOVZXW EAX,[EBX+TABLEDIR.%member] ADD EAX,EBX MOV [%member],EAX %ENDFOR member WinAPI GetOEMCP MOV [OemEncId],EAX WinAPI GetACP MOV [AnsiEncId],EAX GetArg 5 ; Undocumented 5th argument was used when this instance was forked from GUI. JC .10: ; Its value is arbitrary. Its existence will set flag RunFromGUI. ORB [RunFromGUI],1 .10:GetArg 1 ; Input encoding. StripQuotes ESI,ECX TEST ECX JNZ WinCON: Invoke WinGui,[%hWnd],[%uMsg],[%wParam],[%lParam] ; When run without parameters, switch to windowed subsystem. JMP .Abort: WinCON: ; If some command-line parameter was specified, continue with console subsystem. CMPB [ESI],'-' JE .Help: CMPB [ESI],'/' JE .Help: Invoke WinParseEnc, ESI,ECX JC .UnknownEncoding: MOV [InpEncId],EAX MOV [InpEncSt],EBX JNSt EBX,encStEnc,.11: CALL WinEncList: ; Display list of supported encodings. JMP .Abort: .11:GetArg 2 ; Output encoding. 
StripQuotes ESI,ECX Invoke WinParseEnc, ESI,ECX JNC .14: .UnknownEncoding: ORB [Errorlevel],8 JECXZ .Help: StdOutput ESI,Size=ECX StdOutput =' is not supported encoding.',Eol=Yes JMP .Abort: .BadLink: StdOutput ='Internal error, TableDir not found.',Eol=Yes JMP .Abort: .14:MOV [OutEncId],EAX MOV [OutEncSt],EBX GetArg 3 ; Input file name. JNC .19: .Help:StdOutput HelpText ORB [Errorlevel],8 JMP .Abort: .19:StripQuotes ESI,ECX JECXZ .Help: FileAssign InpFile,ESI,Size=ECX FileMapOpen InpFile JNC .23: StdOutput ='Error reading input file "', InpFile.Name, ='"', Eol=Yes ORB [Errorlevel],4 JMP .Abort: .23:MOV [DetectSize],EAX CMP EAX,WinBlockSize JBE .27: MOV [DetectSize],WinBlockSize .27:ADD EAX,ESI MOV [InpBegin],ESI MOV [InpEnd],EAX GetArg 4 ; Output file name. JC .Help: StripQuotes ESI,ECX JECXZ .Help: FileAssign OutFile,ESI,Size=ECX FileStreamCreate OutFile, BufSize=64K JNC .31: .OutputError: StdOutput ='Error writing output file "', OutFile.Name, ='"', Eol=Yes ORB [Errorlevel],4 JMP .Abort: .31:; Cmdline parameters are valid and accepted. Fix input encoding. MOV ESI,[InpBegin] MOV EDX,[DetectSize] ADD EDX,ESI ; Skip the fix if input endianess is irrelevant or explicitly specified. JSt [InpEncSt],encStLe|encStBe|encStAscii|encStUtf8|encStAuto, .69: JNSt [InpEncSt],encStUtf16, .49: ; Autodetect input UTF-16 endianess of text ESI..EDX. MOV EAX,EDX SUB EAX,ESI CMP EAX,2 JB .42: ; Skip if text is too short. MOVZXW EAX,[ESI] CMP AX,[BOM_UTF16LE] JNE .38: SetSt [InpEncSt],encStBom .34:SetSt [InpEncSt],encStLe MOV EAX,1200 JMP .73: .38:CMP AX,[BOM_UTF16BE] JNE .46: SetSt [InpEncSt],encStBom .42:SetSt [InpEncSt],encStBe MOV EAX,1201 JMP .73: .46:; No BOM is present in input, perform empiric autodetection. Invoke WinConvert, 1200,ESI,EDX,Void ; Try UTF-16LE. MOV EBX,EAX ; Number of input errors if UTF-16LE. Invoke WinConvert, 1201,ESI,EDX,Void ; Try UTF-16BE. CMP EAX,EBX JBE .42: ; UTF16BE detected. JMP .34: ; UTF16LE detected. 
.49:JNSt [InpEncSt],encStUtf32, .69: ; Autodetect input UTF-32 endianess of text ESI..EDX. MOV EAX,EDX SUB EAX,ESI CMP EAX,4 JB .61: MOV EAX,[ESI] CMP EAX,[BOM_UTF32LE] JNE .57: SetSt [InpEncSt],encStBom .53:SetSt [InpEncSt],encStLe MOV EAX,12000 JMP .73: .57:CMP EAX,[BOM_UTF32BE] JNE .65: SetSt [InpEncSt],encStBom .61:SetSt [InpEncSt],encStBe MOV EAX,12001 JMP .73: .65:; No BOM is present in input, perform empiric autodetection. Invoke WinConvert, 12000,ESI,EDX,Void ; Try UTF-32LE. MOV EBX,EAX ; Number of input errors if UTF-32LE. Invoke WinConvert, 12001,ESI,EDX,Void ; Try UTF-32BE. CMP EAX,EBX JBE .61: ; UTF32BE detected. JMP .53: ; UTF32LE detected. .69:JNSt [InpEncSt],encStAuto, .61: ; Autodetect input encoding of text ESI..EDX. PUSH ESI CALL WinAutodetect: POP ESI .73:MOV EDX,[InpEnd] ; [InpEncId] is now finally specified. ESI..EDX is input text. It may start with BOM. ; Select the output encoding procedure. MOV EAX,[OutEncId] JNSt [OutEncSt],encStAuto|encStOem,.77: MOV EAX,[OemEncId] MOV [OutEncId],EAX SetSt [OutEncSt],encStAuto|encStOem .77:Dispatch AX,65001,1200,1201,12000,12001,20127 ; Undispatched output encoding is 8bit OEM/ANSI. MOV ECX,[CodePages] MOV EDI,[CPid] REPNE SCASW JNE .20127: ; Unsupported output encoding - use ASCII. SUB EDI,2 SUB EDI,[CPid] ADD EDI,[CPtable] MOVZXW EAX,[EDI] ADD EAX,[CPtt] MOV [TrTable],EAX MOV EDI, To8bit: JMP .88: .65001:MOV EDI, ToUTF8: MOV ECX,EDX SUB ECX,ESI CMP ECX,4 JBE .88: LODSD DEC ESI AND EAX,0x00FF_FFFF CMP EAX,[BOM_UTF8] JE .88: SUB ESI,3 ; No input UTF8-BOM is present. JMP .88: .1200: .1201:MOV EDI, ToUTF16: MOV ECX,EDX SUB ECX,ESI CMP ECX,2 JB .88: TEST AL,1 ; Difference between BE and LE. LODSW JZ .81: CMP AX,[BOM_UTF16BE] JE .88: .81:CMP AX,[BOM_UTF16LE] JE .88: SUB ESI,2 ; No input BOM is present. JMP .88: .12000: .12001: MOV EDI, ToUTF32: MOV ECX,EDX SUB ECX,ESI CMP ECX,4 JB .88: TEST AL,1 ; Difference between BE and LE. 
LODSD JZ .84: CMP EAX,[BOM_UTF32BE] JE .88: .84:CMP EAX,[BOM_UTF32LE] JE .88: SUB ESI,4 ; No input BOM is present. JMP .88: .20127: MOV EDI, ToASCII: .88: ; ESI..EDX is now input text with input BOM removed. JNSt [OutEncSt],encStUtf,.95: ; No BOM in non-Unicode encodings. JNSt [OutEncSt],encStBom,.95: ; No output BOM if not explicitely requested. ; Output BOM was requested. Write BOM before invokation of WinConvert. MOV EAX,[OutEncId] Dispatch AX,1200d,1201d,12000d,12001d FileStreamWrite OutFile,BOM_UTF8,3 JMP .90: .12001d:FileStreamWrite OutFile,BOM_UTF32BE,4 JMP .90: .12000d:FileStreamWrite OutFile,BOM_UTF32LE,4 JMP .90: .1201d: FileStreamWrite OutFile,BOM_UTF16BE,2 JMP .90: .1200d:FileStreamWrite OutFile,BOM_UTF16LE,2 .90:JC .OutputError: .95:; ESI..EDX is input text without BOM. Output encoding callback procedure is now in EDI. Invoke WinConvert,[InpEncId],ESI,EDX,EDI ; The final conversion. MOV [InpErrors],EAX Invoke WinInfoText, =' In', InpFile,[InpEncId],[InpEncSt],[InpErrors] Invoke WinInfoText, ='Out', OutFile,[OutEncId],[OutEncSt],[OutErrors] MOV EAX,[InpErrors] OR EAX,[OutErrors] JZ .Abort: ORB [Errorlevel],2 .Abort: FileClose OutFile, InpFile TESTB [RunFromGUI],1 ; Test if CON was forked from GUI version. JZ .NoGUI: StdOutput Eol=Yes ; Otherwise give the user some time to read InfoText. StdOutput ="Press any key to quit.",Eol=Yes WinAPI GetStdHandle,STD_INPUT_HANDLE MOV EBX,EAX PUSH ESI MOV ESI,ESP WinAPI GetConsoleMode,EBX,ESI POP ESI WinAPI SetConsoleMode,EBX,0 ; Switch the console to raw mode. WinAPI ReadConsole,EBX,TempString+4,1,TempString,0 WinAPI SetConsoleMode,EBX,ESI ; Restore original console mode from ESI. .NoGUI: TerminateProgram [Errorlevel] EndProcedure WinMain
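WinMain decides the endianness of UTF-16 and UTF-32 input by looking for a Byte Order Mark first. The Python sketch below models only that first step and mirrors the way the [.data] segment lets the UTF-16 marks share bytes with the UTF-32 marks. The function name `bom_endianness` is hypothetical, and the empiric fallback (converting the text both ways and comparing error counts) is left out.

```python
BOM_UTF32LE = b"\xFF\xFE\x00\x00"
BOM_UTF32BE = b"\x00\x00\xFE\xFF"
BOM_UTF16LE = BOM_UTF32LE[:2]    # first two bytes of the UTF-32LE mark
BOM_UTF16BE = BOM_UTF32BE[2:]    # last two bytes of the UTF-32BE mark

def bom_endianness(data: bytes, width: int):
    """Return 'LE', 'BE' or None for text of the given code unit width (2 or 4 bytes)."""
    if width == 2:
        le, be = BOM_UTF16LE, BOM_UTF16BE
    elif width == 4:
        le, be = BOM_UTF32LE, BOM_UTF32BE
    else:
        raise ValueError("width must be 2 or 4")
    if data.startswith(le):
        return "LE"
    if data.startswith(be):
        return "BE"
    return None                  # no BOM, the caller has to guess
```

A caller would try this check before any statistical detection, since a BOM, when present, is the cheapest and most reliable hint.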
WinConvert Procedure InpEncId,TextPtr,TextEnd,WinOutputProc MOV EAX,[%InpEncId] MOV ESI,[%TextPtr] MOV EDX,[%TextEnd] SUB ECX,ECX MOV [%ReturnEAX],ECX ; Initialize input error counter. INC ECX MOV [CharSize],ECX ; Initialize CharSize=1. It will be increased if input encoding is Unicode. AND ECX,EAX ; Let ECX=1 for odd InpEncId, ECX=0 for even InpEncId (endianess in UTF-16 and UTF-32). Dispatch AX,65001,20127,1200,1201,12000,12001 ; Special encodings UTF or ASCII. ; Undispatched encodings is 8bit, let's select the translation table. ; Convert from OEM or ANSI 8bit encoding. MOV ECX,[CodePages] MOV EDI,[CPid] REPNE SCASW JNE .05: SUB EDI,[CPid] ; EDI is now offset of used CP in array of WORDs. ADD EDI,[CPtable] MOVZXW EBX,[EDI-2] ; EAX is now the corresponding offset of translation table in array CPtable. CMP BX,-1 ; If the CP doesn't have translation table (UTF or ASCII). JNE .10: .05:; Unsupported CPid. DECD [%ReturnEAX] ; Change error counter from 0 to 0xFFFF_FFFF if wrong CPid. JMP .90: .10:ADD EBX,[CPtt] ; EBX now points to translation table with 128 WORDs. .15:CMP ESI,EDX JNB .90: XOR EAX,EAX LODSB CMP AL,128 JB .20: MOV AX,[2*EAX+EBX-256] ; Translate character 0x80..0xFF to Unicode. CMP AX,Replacement CMC ; CF=1 if AX=0xFFFD,0xFFFE or 0xFFFF (replacement or undefined). ADCD [%ReturnEAX],0 ; Input error. .20:CMP EAX,'&' ; Possible beginning of HTML entity. JNE .22: JNSt [InpEncSt],encStHtm|encStHtml,.22: ; Skip if HTML entity should be ignored. CALL WinHtmlDecode: .22:CALL [%WinOutputProc] JC .90: JMP .15: ; The next input character. .65001: ; Convert from UTF-8 encoding. SUB EBX,EBX SUB EDX,ESI MOV [%ReturnEAX],EBX DecodeUTF8 ESI,.Store,Size=EDX,Width=32 ; Uses macro from string32.htm. JMP .90: .Store:PROC ; Internal subprocedure .Store is callback from the macro DecodeUTF8. ; It is expected to pass decoded codepoint EAX to %WinOutputProc. MOVB [CharSize],1 ; CharSize will be applied in codepage autodetection in GetRelevance. 
CMP EAX,80h JNA .2: MOVB [CharSize],2 CMP EAX,800h JNA .2: MOVB [CharSize],3 .2: CMP EAX,Replacement CMC ADCD [%ReturnEAX],0 ; Replacement and unsupported codepoints increment input error counter. MOV ECX,[EntitySkip] JECXZ .5: DEC ECX MOV [EntitySkip],ECX RET ; Ignore the remaining letters of already decoded HTML entity. .5: CMP EAX,'&' ; Possible beginning of HTML entity. JNE .9: JNSt [InpEncSt],encStHtm|encStHtml,.9: MOV EDI,ESI CALL WinHtmlDecode: SUB ESI,EDI ; How many bytes should decoder advance to skip the decoded entity (0..9). MOV [EntitySkip],ESI .9: JMP [%WinOutputProc] ENDP .Store: .20127: ; Convert from ASCII encoding. CMP ESI,EDX JNB .90: XOR EAX,EAX LODSB CMP AL,128 JB .25: MOV AX,Replacement INCD [%ReturnEAX] ; Input error. .25:JNSt [InpEncSt],encStHtm|encStHtml,.27: CALL WinHtmlDecode .27:CALL [%WinOutputProc] JC .90: JMP .20127: ; The next character. .1200: ; Convert from UTF-16LE encoding. ECX=0. .1201: ; Convert from UTF-16BE encoding. ECX=1. MOVB [CharSize],2 MOV EBX,EDX SUB EBX,ESI AND EBX,1 JZ .30: SUB EDX,EBX ; Text size is not WORD aligned, truncate. INCD [%ReturnEAX] .30:CMP ESI,EDX JNB .90: XOR EAX,EAX LODSW JECXZ .35: XCHG AL,AH ; Convert UTF-16BE to UTF-16LE. .35:CMP AX,0xD7FF JBE .55: CMP AX,0xE000 JAE .55: ; High surrogate expected (0xD800..0xDBFF). SUB AX,0xD800 CMP AX,0x0400 JAE .45: MOV EDI,EAX ; Temporary save high 10 bits. SHL EDI,10 CMP ESI,EDX JNB .45: LODSW ; Fetch the low surrogate. JECXZ .40: XCHG AL,AH ; Convert UTF-16BE to UTF-16LE. .40:; Low surrogate expected (0xDC00..0xDFFF). SUB AX,0xDC00 JB .45: CMP AX,0x0400 JB .50: .45:MOV EAX,Replacement INCD [%ReturnEAX] JMP .55: .50:LEA EAX,[EAX+EDI+0x10000] ; Compose codepoint from both surrogates. .55:JNSt [InpEncSt],encStHtm|encStHtml,.57: CALL WinHtmlDecode: .57:CALL [%WinOutputProc] JC .90: JMP .30: .12000: ; Convert from UTF-32LE encoding. ECX=0. .12001: ; Convert from UTF-32BE encoding. ECX=1. 
MOVB [CharSize],4 MOV EBX,EDX SUB EBX,ESI AND EBX,3 JZ .60: SUB EDX,EBX ; Text size is not DWORD aligned, truncate. INCD [%ReturnEAX] .60:CMP ESI,EDX JNB .90: LODSD JECXZ .65: ; ECX=1 if UTF-32BE. BSWAP EAX ; Convert UTF-32BE to UTF-32LE. .65:CMP EAX,10FFFFh JA .70: ; Invalid above 10FFFFh. CMP EAX,0xD800 JB .80: ; Valid below 0xD800. CMP EAX,0xDFFF ; Invalid below 0xDFFF. JA .80: .70:MOV EAX,Replacement INCD [%ReturnEAX] .80:JNSt [InpEncSt],encStHtm|encStHtml,.85: CALL WinHtmlDecode: .85:CALL [%WinOutputProc] JNC .60: .90:EndProcedure WinConvert
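The UTF-16 branch of WinConvert combines surrogate pairs and counts malformed input. As a rough model, and not a line-by-line port, the Python sketch below does the same with a list of code units. It assumes that the replacement codepoint is U+FFFD, and unlike the assembly it does not consume a unit that fails the low-surrogate test. The name `decode_utf16` is hypothetical.

```python
REPLACEMENT = 0xFFFD  # assumed value of the Replacement constant

def decode_utf16(data: bytes, big_endian: bool):
    """Return (list of codepoints, number of input errors)."""
    errors = 0
    if len(data) % 2:                 # size is not WORD aligned, truncate
        data = data[:-1]
        errors += 1
    order = "big" if big_endian else "little"
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data), 2)]
    out = []
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        if 0xD800 <= cp <= 0xDFFF:    # surrogate range
            high = cp - 0xD800
            if high < 0x400 and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
                low = units[i] - 0xDC00
                i += 1
                cp = 0x10000 + (high << 10) + low   # compose from both halves
            else:
                cp = REPLACEMENT      # lone or misplaced surrogate
                errors += 1
        out.append(cp)
    return out, errors
```

Returning the error count alongside the codepoints matches how the program uses one pass both for conversion and for comparing candidate encodings.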
Void PROC ; Empty conversion.
    CLC
    RET
   ENDP Void
GetRelevance PROC ; Relevance is probability, that codepoint EAX appears in input text.
    PUSH ECX,EDI
    TEST EAX,0xFFFF_0000
    JNZ .4:
    MOV EDI,[CodePoint]
    MOV ECX,[CodePoints]
    REPNE SCASW
.4: MOV EAX,?? ; Relevance of invalid character is negative.
    JNE .9:
    DEC EDI,EDI
    SUB EDI,[CodePoint]
    SHR EDI,1
    ADD EDI,[Relevance]
    MOVSXB EAX,[EDI]
.9: IMUL EAX,[CharSize] ; Longer characters have bigger relevance.
    ADD [SumRelevance],EAX ; Accumulate the value in global memory location.
    CLC
    POP EDI,ECX
    RET
   ENDP GetRelevance:
Replace PROC ; Replace codepoint EAX which does not exist in output encoding.
    PUSH EBX,ECX,ESI
    MOV EBX,[OutEncSt]
    MOV ESI,TempString
    JSt EBX,encStIgn,.9: ; Ignore unsupported character.
    MOV EDI,ESI
    JNSt EBX,encStQm,.2:
.1: MOV EDI,ESI
    MOVW [EDI],'?' ; Replace codepoint EAX with question mark.
    JMP .6:
.2: JNSt EBX,encStHtm|encStHtml,.3:
    MOVD [EDI],'&#x' ; Replace codepoint EAX with its HTML entity.
    ADD EDI,3
    StoH EDI,Align=Left
    MOV AX,';'
    STOSW
    JMP .6:
.3: TEST EAX,0xFFFF_0000 ; Replace codepoint EAX with its transliteration.
    JNZ .1:
    MOV ECX,[CodePoints]
    MOV EDI,[CodePoint]
    REPNE SCASW
    JNE .1:
    DEC EDI,EDI
    SUB EDI,[CodePoint]
    SHL EDI,1
    ADD EDI,[Translit]
    MOV EAX,[EDI] ; EAX now contains 0..4 ASCII characters, NUL padded.
    MOV EDI,ESI ; TempString.
.4: CMP AL,0
    JZ .5:
    STOSB
    SHR EAX,8
    JMP .4:
.5: STOSB ; NUL-terminate replacement string.
.6: SUB EAX,EAX ; TempString at ESI is now ASCIIZ string with replacement.
    LODSB
    CMP AL,0
    JZ .9:
    JSt EBX,encStUtf16,.7:
    JSt EBX,encStUtf32,.8:
    FileStreamWriteByte OutFile
    JMP .6:
.7: CALL ToUTF16:
    JMP .6:
.8: CALL ToUTF32:
    JMP .6:
.9: POP ESI,ECX,EBX
    INCD [OutErrors]
    CLC
    RET
   ENDP Replace
ToASCII PROC ; Convert codepoint EAX to ASCII encoding.
    CMP EAX,127
    JA Replace:
    FileStreamWriteByte OutFile
    RET
   ENDP ToASCII
To8bit PROC ; Convert codepoint EAX to OEM/ANSI encoding using [TrTable].
    CMP EAX,127
    JBE .8: ; ASCII 7bit characters are copied verbatim.
    TEST EAX,0xFFFF_0000
    JNZ Replace: ; Character outside BMP is replaced with question mark.
    MOV EDI,[TrTable]
    PUSH ECX
    MOV ECX,128
    REPNE SCASW ; Search for codepoint in TrTable.
    POP ECX
    JNE Replace: ; If codepoint EAX is not supported by output encoding.
    SUB EDI,[TrTable] ; EDI is now 2,4,6,8,,,256.
    SHR EDI,1 ; EDI is now 1,2,3,4,,,128.
    LEA EAX,[EDI+128-1] ; AL is now 128,129,,,255.
.8: FileStreamWriteByte OutFile
    RET
   ENDP To8bit
ToUTF32 PROC ; Convert codepoint EAX to UTF-32 encoding.
    JNSt [OutEncSt],encStBe, .8:
    BSWAP EAX
.8: FileStreamWriteDword OutFile
    RET
   ENDP ToUTF32
ToUTF16 PROC ; Convert codepoint EAX to UTF-16 encoding.
    TEST EAX,0xFFFF_0000
    JZ .5:
    ; Character outside BMP will be written as two surrogates.
    SUB EAX,0x10000
    MOV EDI,EAX
    SHR EDI,10
    ADD EDI,0xD800 ; EDI is now the high surrogate.
    XCHG EDI,EAX
    JNSt [OutEncSt],encStBe,.3:
    XCHG AL,AH
.3: FileStreamWriteWord OutFile
    XCHG EAX,EDI ; Restore the codepoint (already reduced by 0x10000) in EAX.
    AND EAX,0x3FF
    ADD EAX,0xDC00 ; EAX is now the low surrogate.
.5: JNSt [OutEncSt],encStBe,.8:
    XCHG AL,AH
.8: FileStreamWriteWord OutFile
    RET
   ENDP ToUTF16
ToUTF8 PROC ; Convert codepoint EAX to UTF-8 encoding.
    MOV EDI,EAX
    CMP EAX,0x7F
    JBE .8:
    CMP EAX,0x7FF
    JBE .4:
    CMP EAX,0xFFFF
    JBE .2:
    ; 4byte encoding.
    SHR EAX,18
    OR AL,0xF0
    CALL .8:
    MOV EAX,EDI
    SHR EAX,12
    CALL .6:
    MOV EAX,EDI
    SHR EAX,6
    CALL .6:
    JMP .5:
.2: ; 3byte encoding.
    SHR EAX,12
    OR AL,0xE0
    CALL .8:
    MOV EAX,EDI
    SHR EAX,6
    CALL .6:
    MOV EAX,EDI
    JMP .6:
.4: ; 2byte encoding.
    SHR EAX,6
    OR AL,0xC0
    CALL .8:
.5: MOV EAX,EDI
.6: AND EAX,0x3F
    OR AL,0x80
.8: FileStreamWriteByte OutFile
    RET
   ENDP ToUTF8
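The bit manipulation in ToUTF8 and ToUTF16 is easier to follow when written in a high-level language. The Python sketch below is a plain restatement of the standard UTF-8 and UTF-16 encoding rules that these procedures implement. The function names are hypothetical and the sketch performs no range checking, just like the procedures above rely on the caller for valid codepoints.

```python
def to_utf8(cp: int) -> bytes:
    """Encode one codepoint as 1 to 4 UTF-8 bytes."""
    if cp <= 0x7F:
        return bytes([cp])
    if cp <= 0x7FF:                                   # 2-byte form
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                                  # 3-byte form
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                  # 4-byte form
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def to_utf16_units(cp: int) -> tuple:
    """Return one code unit for BMP characters, otherwise a surrogate pair."""
    if cp <= 0xFFFF:
        return (cp,)
    cp -= 0x10000
    return (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF))
```

For example, `to_utf16_units(0x1F600)` yields the pair `(0xD83D, 0xDE00)`; byte swapping for big-endian output would be applied to each unit afterwards, as the assembly does with XCHG AL,AH.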
0x0000_00A0
.
WinHtmlDecode:PROC PUSHAD MOV ECX,EDX MOV EDI,ESI SUB ECX,ESI MOV EAX,';' JSt [InpEncSt],encStUtf16, .10: JSt [InpEncSt],encStUtf32, .30: REPNE SCASB ; Search for entity terminator in 8bit input stream. JNE .90: MOV EBP,EDI ; Remember the input stream position behind semicolon. SUB EDI,ESI MOV ECX,EDI CMP ECX,SIZE# TempString JA .90: MOV EDI,TempString REP MOVSB JMP .50: .10:SHR ECX,1 JNSt [InpEncSt],encStBe,.15: XCHG AL,AH .15:REPNE SCASW ; Search for entity terminator in 16bit input stream. JNE .90: MOV EBP,EDI ; Remember the input stream position behind semicolon. SUB EDI,ESI MOV ECX,EDI SHR ECX,1 JZ .90: CMP ECX,SIZE# TempString JA .90: MOV EDI,TempString .20:LODSW JNSt [InpEncSt],encStBe,.25: XCHG AL,AH .25:CMP AX,128 JAE .90: STOSB LOOP .20: JMP .50: .30:SHR ECX,2 JNSt [InpEncSt],encStBe,.35: BSWAP EAX .35:REPNE SCASD ; Search for entity terminator in 32bit input stream. JNE .90: MOV EBP,EDI ; Remember the input stream position behind semicolon. SUB EDI,ESI MOV ECX,EDI SHR ECX,2 JZ .90: CMP ECX,SIZE# TempString JA .90: MOV EDI,TempString .40:LODSD JNSt [InpEncSt],encStBe,.45: BSWAP EAX .45:CMP EAX,128 JAE .90: STOSB LOOP .40: .50:MOV ESI,TempString SUB EDI,ESI ; EDI is now the size of HTML entity in ASCII bytes, including semicolon. LODSB CMP AL,'#' ; Test if the entity is numeric. JE .65: CMP EDI,8+1 ; Longer named entities are not supported. JA .90: DEC ESI MOV CL,5 LODSD ; First to fourth characters of entity. SUB ECX,EDI JS .55: ; TempString has 1..4 letters. XOR EBX,EBX DEC EBX ; Prepare mask to EBX. SAL ECX,3 SHR EBX,CL AND EAX,EBX ; NUL-pad shorter entity name. MOV ECX,[Entities4] MOV EDI,[EntName4] REPNE SCASD ; Search for the entity by name. JNE .90: SUB EDI,4 SUB EDI,[EntName4] SHR EDI,1 ADD EDI,[EntVal4] MOVZXW EAX,[EDI] JMP .80: ; Decoded entity codepoint is now in EAX. .55:XCHG EAX,EDX ; Temporarily save first four characters to EDX. LODSD ; Fifth to eighth characters. 
ADD ECX,4 JS .90: XOR EBX,EBX DEC EBX SAL ECX,3 SHR EBX,CL AND EAX,EBX ; NUL-pad shorter entity name. XCHG EDX,EAX ; Fifth..eighth characters are now in EDX. MOV ECX,[Entities8] MOV EDI,[EntName8A] .60:REPNE SCASD ; Search for the entity by its first four letters. JNE .90: PUSH EDI SUB EDI,4 SUB EDI,[EntName8A] ADD EDI,[EntName8B] CMP EDX,[EDI] POP EDI JNE .60: ; if 5. and higher letters do not match. SUB EDI,4 SUB EDI,[EntName8A] SHR EDI,1 ADD EDI,[EntVal8] MOVZXW EAX,[EDI] JMP .80: ; Decoded entity codepoint is now in EAX. .65:LODSB ; Numeric entity expected. CMP AL,'0' JB .90: OR AL,'x'^'X' CMP AL,'x' JE .75: DEC ESI ; ESI should now point to decimal number terminated with semicolon. LodD ESI ; Use macro LodD or LodH from library cpuext32. JMP .77: .75:LodH ESI ; ESI should now point to hexadecimal number terminated with semicolon. .77:JC .90: XCHG EAX,EDX LODSB CMP AL,';' XCHG EDX,EAX JNE .90: .80:; EAX is decoded codepoint, EBP is pointer behind the entity in text. JSt [InpEncSt],encStHtml,.85: CMP EAX,128 JB .90: ; Skip if ASCII entities should not be converted. .85:MOV [ESP+4],EBP ; %ReturnESI. MOV [ESP+28],EAX ; %ReturnEAX. .90:POPAD RET ENDP WinHtmlDecode:
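WinHtmlDecode looks up short entity names by packing up to four ASCII letters into one NUL-padded DWORD and scanning an array of such DWORDs; the index of the match selects the codepoint in a parallel array. The Python sketch below models this for names of 1..4 letters only. The four-entry table is a tiny hypothetical excerpt, not the program's real database, and the names `pack_name` and `lookup4` are invented for the illustration.

```python
import struct

def pack_name(name: str) -> int:
    """Pack 1..4 ASCII letters into a little-endian DWORD, NUL padded."""
    raw = name.encode("ascii").ljust(4, b"\0")
    return struct.unpack("<I", raw)[0]

# Parallel arrays, modelled after EntName4 and EntVal4 (hypothetical excerpt).
ENT_NAME4 = [pack_name(n) for n in ("amp", "lt", "gt", "nbsp")]
ENT_VAL4 = [0x26, 0x3C, 0x3E, 0xA0]

def lookup4(name: str):
    """Return the codepoint of a short named entity, or None if unknown."""
    if not 1 <= len(name) <= 4:
        return None
    key = pack_name(name)
    try:
        index = ENT_NAME4.index(key)      # linear scan, like REPNE SCASD
    except ValueError:
        return None
    return ENT_VAL4[index]
```

Packing the name into a fixed-size integer lets the search compare whole names with a single instruction per table entry, which is why longer names (5..8 letters) need a second DWORD array.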
This PROC calculates the total relevance of characters in a given input encoding. The relevance increases when the translated character is a common letter and decreases when the character is nonalphabetical. This is repeated for each supported encoding, and the procedure returns the encoding which achieved the highest relevance.
Only the first 1 MB of a large file is used for autodetection.
WinAutodetect: PROC
     SUB ECX,ECX ; ECX will be CP index (0,1,2,,,[CodePages]).
     MOV [InpEncId],ECX
     MOV [InpEncSt],encStAuto
     MOV EBX,0x8000_0000 ; EBX will remember the so far best relevance.
.A1: MOV [SumRelevance],0
     PUSH ECX,EDX
     WinAPI SendMessage,[hInpEncBox],LB_SETCURSEL,ECX,0
     TEST EAX
     JZ .A2: ; Skip animation when not in GUI version.
     WinAPI Sleep,50
.A2: POP EDX,ECX
     MOV EAX,[CPid]
     MOVZXW EAX,[EAX+2*ECX]
     CMP AX,912
     JNE .A3:
     DEC [SumRelevance] ; Slightly discriminate IBM912 against almost identical ISO8859-2.
.A3: Invoke WinConvert,EAX,ESI,EDX, GetRelevance:
     MOV EAX,[SumRelevance]
     CMP EAX,EBX
     JLE .A4:
     ; Encoding indexed by ECX is better candidate than all previous ones.
     MOV EBX,EAX
     MOV EAX,[CPid]
     MOVZXW EAX,[EAX+2*ECX]
     MOV [InpEncId],EAX ; Remember the so far best input encoding.
.A4: INC ECX ; Try the next codepage.
     CMP CX,[CodePages]
     JB .A1:
     MOV EBX,[hInpEncBox]
     MOV EDX,[InpEncId]
     CALL WinSetEncBox:
     WinAPI SetFocus,[hInpEncBox]
     RET
    ENDP WinAutodetect:
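The scoring loop of WinAutodetect can be pictured with a short Python model. This is a sketch under loud assumptions: the weights in `relevance` are invented (the real program reads per-codepoint relevances from its database and multiplies them by the character size), and Python's built-in codecs stand in for the program's own translation tables.

```python
def relevance(ch: str) -> int:
    """Hypothetical weight of one decoded character."""
    if ch == "\ufffd":        # byte undefined in the candidate encoding
        return -8
    if ch.isalpha():          # letters make the candidate more plausible
        return 1
    if ch.isspace():
        return 0
    return -1                 # symbols and control characters

def autodetect(data: bytes, candidates, limit=1 << 20):
    """Return the candidate encoding with the highest summed relevance."""
    sample = data[:limit]     # only the first 1 MB is examined
    best_name, best_score = None, None
    for name in candidates:
        text = sample.decode(name, errors="replace")
        score = sum(relevance(ch) for ch in text)
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name
```

As in the assembly, a later candidate replaces the current best only when its score is strictly higher, so the order of the candidate list breaks ties.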
WinSetEncBox:PROC ; Input: EBX=EncBoxHandle, EDX=EncId.
     MOV EDI,[CodePages]
.A6: PUSH EDX
     WinAPI SendMessage,EBX,LB_SETCURSEL,EDI,0
     POP EDX
     TEST EAX
     JZ .A8:
     MOV ESI,TempString
     PUSH EDX
     WinAPI SendMessage,EBX,LB_GETTEXT,EDI,ESI
     POP EDX
     LODSW
     CMP AX,'CP'
     JNE .A7:
     LodD ESI
     CMP EAX,EDX
     JE .A9:
.A7: PUSH EDX
     WinAPI Sleep,20
     POP EDX
.A8: DEC EDI
     JNS .A6:
.A9: RET
    ENDP WinSetEncBox:
utf
.
WinGrep %MACRO Needle, TextBegin, TextEnd
     PUSHD %TextEnd, =B'%Needle'
     MOV EDI,%TextBegin
     CALL WinGrep@RT
WinGrep@RT:PROC1
     PUSHAD
     MOV ESI,[ESP+36] ; Pointer to the Needle.
     GetLength$ ESI ; Return Needle size in ECX.
     LEA EDX,[ECX-1] ; Size of Needle.
     MOV ECX,[ESP+40] ; TextEnd.
     SUB ECX,EDI ; Text size.
     JB .9:
     LODSB ; First character of Needle.
.1:  REPNE SCASB
     JNE .9:
     PUSH ECX,ESI,EDI
     MOV ECX,EDX
     REPE CMPSB
     POP EDI,ESI,ECX
     JNE .1:
     LEA EDI,[EDI+EDX]
     MOV [ESP+0],EDI ; %ReturnEDI.
.9:  POPAD
     RET ; ZF=1 if Needle was found.
    ENDP1 WinGrep@RT
   %ENDMACRO WinGrep
WinGrepNum %MACRO TextBegin, TextEnd
     PUSHD %TextEnd, %TextBegin
     CALL WinGrepNum@RT
WinGrepNum@RT:PROC1
     PUSHAD
     MOV ECX,[ESP+40] ; TextEnd.
     MOV ESI,[ESP+36] ; TextBegin.
     SUB ECX,ESI
     JB .9:
     SUB EAX,EAX
.3:  LODSB
     SUB AL,'0'
     JB .5:
     CMP AL,9
     JNA .7:
.5:  DEC ECX
     JNZ .3:
     STC
     JMP .9:
.7:  DEC ESI
     LodD ESI
     JC .9:
     MOV [ESP+4],ESI
     MOV [ESP+28],EAX
.9:  POPAD
     RET ; CF=0 if a valid number EAX was found.
    ENDP1 WinGrepNum@RT
   %ENDMACRO WinGrepNum
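WinGrepNum skips everything up to the first decimal digit, converts the run of digits to a number and reports where the scan stopped. A minimal Python model of that behaviour follows. The name `grep_num` is hypothetical, and details of the LodD macro (such as overflow handling) are not modelled.

```python
DIGITS = "0123456789"

def grep_num(text: str):
    """Return (value, index behind the number) for the first digit run, else None."""
    for start, ch in enumerate(text):
        if ch in DIGITS:
            end = start
            while end < len(text) and text[end] in DIGITS:
                end += 1
            return int(text[start:end]), end
    return None               # corresponds to CF=1 in the assembly
```

The returned index lets the caller continue parsing behind the number, which is how a string like `iso-8859-2` yields first 8859 and then, on a second call over the remainder, 2.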
Utf-16-LE-BOM
.
WinParseEnc Procedure Enc$Ptr, Enc$Size
Enc$    LocalVar Size=32 ; Input string converted to lower case.
Enc$End LocalVar         ; Pointer to the end of string in Enc$.
      ClearLocalVar
      MOV ESI,[%Enc$Ptr]
      MOV ECX,[%Enc$Size]
      XOR EBX,EBX
      MOV [%ReturnEAX],EBX
      MOV [%ReturnEBX],EBX
      CMP ECX,24
      JNB .Err:
      StripQuotes ESI,ECX
      TEST ECX
      JZ .Err:,DIST=NEAR ; Argument is empty.
      LEA EDI,[%Enc$]
.LoCa:LODSB
      OR AL,0x20 ; Simplified conversion to lower case.
      STOSB
      DEC ECX
      JNZ .LoCa:
      MOV [%Enc$End],EDI
      MOV EDX,EDI
      LEA ESI,[%Enc$]
      ; Parse all encoding properties from text ESI..EDX into flags in EBX.
prop  %FOR ascii,utf,bom,le,be,htm,html,qm,ign,transl,oem,ansi,auto,enc
        WinGrep %prop,ESI,EDX
%Prop1  %SETC '%prop[1]' & ~('A'^'a') ; Uppercase the 1st letter of %prop.
        JNE .N%prop:
        SetSt EBX,encSt%Prop1%prop[2..]
.N%prop:
      %ENDFOR prop
      JNSt EBX,encStHtml, .Shtm:
      RstSt EBX,encStHtm
.Shtm:JNSt EBX,encStAscii, .Nas:
      MOV EAX,20127
      JMP .End1:
.Nas: JNSt EBX,encStAuto|encStEnc, .Nau:
      XOR EAX,EAX
      JMP .End1:
.Nau: JNSt EBX,encStOem, .Noe:
      WinAPI GetOEMCP
      SetSt EBX,encStAuto
      JMP .End1:
.Noe: JNSt EBX,encStAnsi, .Nan:
      MOV EAX,[AnsiEncId]
.End1:JMP .End:,DIST=NEAR
.Nan: JNSt EBX,encStUtf,.Nut:
      WinGrepNum ESI,EDX
      JC .Err:
      Dispatch AL,8,16,32
.Err: STC
      JMP .Ret:
.8:   SetSt EBX,encStUtf8
      MOV EAX,65001
      JMP .End:
.16:  SetSt EBX,encStUtf16
      MOV EAX,1200
      JMP .En:
.32:  SetSt EBX,encStUtf32
      MOV EAX,12000
.En:  JSt EBX,encStLe, .End:
      JNSt EBX,encStBe, .End: ; Endianness will be detected later.
      INC EAX
      JMP .End:,DIST=NEAR
.Nut: ; Try direct CPid numeric specification.
      WinGrepNum ESI,EDX
      JC .Nnum:
      MOV EDI,[CPid]      ; Try to find the number EAX in array [CPid].
      MOV ECX,[CodePages] ; Number of supported code pages.
      REPNE SCASW
      JE .End2:,DIST=NEAR
      ; String Enc$ is not a direct CPid. It could contain some alternative CP number:
      Dispatch AX,8859,790,916,919,920,921,923,991,1208
      JMP .Nnum:
.1208:MOV AX,65001 ; IBM1208 is UTF-8 = CP65001.
      SetSt EBX,encStUtf+encStUtf8
      JMP .End2:
.790:
.991: MOV AX,667 ; IBM790,IBM991 is Mazovia=CP667.
      JMP .End2:
.916: MOV AX,28598 ; "ISO-8859-8","IBM916, Latin/Hebrew"
      JMP .End2:
.919: MOV AX,28600 ; "ISO-8859-10","IBM919, Latin 6, Nordic"
      JMP .End2:
.920: MOV AX,28599 ; "ISO-8859-9","IBM920, Latin 5, Turkish"
      JMP .End2:
.921: MOV AX,28603 ; "ISO-8859-13","IBM921, Latin 7, Baltic"
      JMP .End2:
.923: MOV AX,28605 ; "ISO-8859-15","IBM923, Latin 9, Western Europe"
      JMP .End2:
.8859:WinGrepNum ESI,EDX ; Get number behind ISO-8859-.
      JC .Err:
      TEST EAX  ; ISO-8859-0 is not supported.
      JZ .Err:
      CMP AX,12 ; ISO-8859-12 is not supported.
      JE .Err:
      CMP AX,16 ; 8859-1 .. 8859-16 is supported.
      JA .Err:
      ADD EAX,28590
.End2:JMP .End:,DIST=NEAR
.Nnum: ; No numeric CPid in Enc$ was found (or 8 in KOI8). Try letter-only strings.
      LEA ESI,[%Enc$]
      MOV AX,10101
      WinGrep nextstep,ESI,EDX
      JE .End2:
      MOV AX,667
      WinGrep mazo,ESI,EDX ; Try Mazovia.
      JE .End2:
      MOV AX,895
      WinGrep kame,ESI,EDX ; Try Kamenických alias KEYBCS2.
      JE .End3:
      WinGrep bcs2,ESI,EDX
      JE .End3:
      WinGrep koi8,ESI,EDX ; Try KOI8.
      JNE .MAC:
      MOV ESI,EDI ; Points behind KOI8.
      MOV AX,885  ; Try KOI8-CS.
      WinGrep cs,ESI,EDX
      JE .End3:
      MOV AX,878  ; Try KOI8-R.
      WinGrep r,ESI,EDX
      JE .End3:
      MOV AX,1168 ; Try KOI8-U.
      WinGrep u,ESI,EDX
      JE .End3:
      MOV AX,880  ; Try KOI8-E.
      WinGrep e,ESI,EDX
      JE .End3:
      MOV AX,882  ; Try KOI8-T.
      WinGrep t,ESI,EDX
      JE .End3:
      MOV AX,884  ; Try KOI8-F.
      WinGrep f,ESI,EDX
.End3:JE .End4:,DIST=NEAR
.MAC: LEA ESI,[%Enc$]
      WinGrep mac,ESI,EDX ; Try Macintosh.
      JNE .Err: ; If no other choices left, give up.
      MOV AX,10010
      WinGrep romanian,ESI,EDX
      JE .End4:
      WinGrep rumun,ESI,EDX
      JE .End4:
      MOV AX,10000
      WinGrep roman,ESI,EDX
      JE .End4:
      MOV AX,10004
      WinGrep arab,ESI,EDX
      JE .End4:
      MOV AX,10005
      WinGrep hebre,ESI,EDX
.End4:JE .End5:
      MOV AX,10006
      WinGrep greek,ESI,EDX
      JE .End5:
      MOV AX,10007
      WinGrep cyril,ESI,EDX
      JE .End5:
      MOV AX,10017
      WinGrep ukr,ESI,EDX
      JE .End5:
      MOV AX,10021
      WinGrep thai,ESI,EDX
      JE .End5:
      MOV AX,10079
      WinGrep iceland,ESI,EDX
      JE .End5:
      MOV AX,10029
      WinGrep ce,ESI,EDX
      JE .End5:
      MOV AX,10080
      WinGrep inuit,ESI,EDX
.End5:JE .End:
      MOV AX,10081
      WinGrep turk,ESI,EDX
      JE .End:
      MOV AX,10082
      WinGrep croat,ESI,EDX
      JE .End:
      MOV AX,10083
      WinGrep gael,ESI,EDX
      JE .End:
      MOV AX,10084
      WinGrep celtic,ESI,EDX
      JE .End:
      MOV AX,10089
      WinGrep latin,ESI,EDX
      JE .End:
      WinGrep kermit,ESI,EDX
      JE .End:
      STC
      JMP .Ret:
.End: MOV [%ReturnEAX],EAX
      MOV [%ReturnEBX],EBX
.Ret:EndProcedure WinParseEnc ; CF=error
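The numeric part of WinParseEnc can be summarized by a small model. The following is only an illustrative sketch in Python, not part of the program: the function and table names are hypothetical, while the numbers are taken from the Dispatch table and the `.8859:` handler above. Note that in the assembler source the alternative numbers are tried only when the number is not found directly in the [CPid] array.

```python
from typing import Optional

# Alternative IBM numbers and the code page ids WinParseEnc assigns to them.
# (ALTERNATIVE_IDS is a hypothetical name used only in this sketch.)
ALTERNATIVE_IDS = {
    790: 667,     # IBM790 is Mazovia = CP667
    991: 667,     # IBM991 is Mazovia = CP667
    916: 28598,   # ISO-8859-8
    919: 28600,   # ISO-8859-10
    920: 28599,   # ISO-8859-9
    921: 28603,   # ISO-8859-13
    923: 28605,   # ISO-8859-15
    1208: 65001,  # IBM1208 is UTF-8
}


def iso8859_to_cpid(part: int) -> Optional[int]:
    """Return the code page id of ISO-8859-<part>, or None when unsupported.

    Parts 1..16 are accepted except 12, mirroring the checks at label .8859:.
    """
    if part < 1 or part > 16 or part == 12:
        return None
    return 28590 + part


def alternative_to_cpid(number: int) -> Optional[int]:
    """Translate an alternative IBM number to a code page id, if known."""
    return ALTERNATIVE_IDS.get(number)
```

For example, `iso8859_to_cpid(2)` gives 28592, and `alternative_to_cpid(916)` gives the same id as `iso8859_to_cpid(8)`.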
WinEncList PROC
      StdOutput =' EuroConv supported encodings',Eol=Yes
      StdOutput Eol=Yes
      StdOutput =' OEM/ANSI 8bit code pages:',Eol=Yes
      SUB EDX,EDX ; Encoding index 0..[CodePages].
.10:  MOV EAX,[CPid]
      MOVZXW EAX,[EAX+2*EDX]
      Dispatch AX,20127,65001,1200,1201,12000,12001
      StoD TempString
      XOR EAX,EAX
      STOSD
      MOV EBX,[CPname]
      MOVZXW EBX,[EBX+2*EDX]
      MOV ESI,[CPinfo]
      LEA ESI,[ESI+EBX]
      MOV EBX,[CPrem]
      MOVZXW EBX,[EBX+2*EDX]
      MOV EDI,[CPinfo]
      LEA EDI,[EDI+EBX]
      StdOutput ='CP',TempString,=', "',ESI,='" ',EDI,Eol=Yes
.20127:
.65001:
.1200:
.1201:
.12000:
.12001:
      INC EDX
      CMP EDX,[CodePages]
      JB .10:
      StdOutput Eol=Yes
      StdOutput =' Plain ASCII 7bit encoding:',Eol=Yes
      StdOutput ='CP20127, "ASCII"',Eol=Yes
      StdOutput Eol=Yes
      StdOutput =' Unicode encoding:',Eol=Yes
      StdOutput ='CP1200, "UTF-16LE"',Eol=Yes
      StdOutput ='CP1201, "UTF-16BE"',Eol=Yes
      StdOutput =' "UTF-16" (endianess will be autodetected)',Eol=Yes
      StdOutput ='CP12000, "UTF-32LE"',Eol=Yes
      StdOutput ='CP12001, "UTF-32BE"',Eol=Yes
      StdOutput =' "UTF-32" (endianess will be autodetected)',Eol=Yes
      StdOutput ='CP65001, "UTF-8"',Eol=Yes
      StdOutput Eol=Yes
      StdOutput =' Special assignment:',Eol=Yes
      WinAPI GetOEMCP
      StoD TempString
      XOR EAX,EAX
      STOSB
      StdOutput ='OEM = console encoding selected by regional settings: CP',TempString,Eol=Yes
      WinAPI GetACP
      StoD TempString
      XOR EAX,EAX
      STOSB
      StdOutput ='ANSI = graphic encoding selected by regional settings: CP',TempString,Eol=Yes
      StdOutput ='AUTO = autodetect encoding',Eol=Yes
      StdOutput ='ENC = display this list of supported encodings.',Eol=Yes
      StdOutput Eol=Yes
      StdOutput =' Encoding modifiers:',Eol=Yes
      StdOutput ='/BOM = write Byte Order Mark (valid only with UTF encodings)',Eol=Yes
      StdOutput ='/IGN = omit characters not supported in output encoding',Eol=Yes
      StdOutput ='/QM = replace characters not supported in output encoding with "?"',Eol=Yes
      StdOutput ='/HTML = replace characters not supported in output encoding with HTML-entity',Eol=Yes
      StdOutput ='/TRANS = transliterate characters not supported in output encoding (default)',Eol=Yes
      RET
ENDP WinEncList
WinInfoText Procedure Direction, File, EncId, EncSt, Errors
      MOV EBX,[%File]
      LEA EDX,[EBX+FILE.Name]
      StdOutput [%Direction],='put file: "',EDX,='"', Eol=Yes
      MOV EAX,[EBX+FILE.Size]
      StoD TempString
      XOR EAX,EAX
      STOSD
      StdOutput [%Direction],='put size: ',TempString, Eol=Yes
      MOV EAX,[%EncId]
      MOV EDI,EDX
      MOV EBX,EAX
      StoD
      XOR EAX,EAX
      STOSD
      StdOutput [%Direction],='put encoding: CP',EDX, =', "'
      MOV EAX,EBX
      MOV ECX,[CodePages]
      MOV EDI,[CPid]
      REPNE SCASW
      DEC EDI,EDI
      SUB EDI,[CPid]
      MOV EDX,[CPname]
      ADD EDX,EDI
      MOVZXW EDX,[EDX]
      ADD EDX,[CPinfo]
      StdOutput EDX
      MOV EBX,[%EncSt]
      JNSt EBX,encStBom,.20:
      StdOutput ="/BOM"
.20:  JNSt EBX,encStIgn,.30:
      StdOutput ="/IGN"
.30:  JNSt EBX,encStQm, .40:
      StdOutput ="/QM"
.40:  JNSt EBX,encStHtm,.50:
      StdOutput ="/HTM"
.50:  JNSt EBX,encStHtml,.60:
      StdOutput ="/HTML"
.60:  MOV EDX,[CPrem]
      ADD EDX,EDI
      MOVZXW EDX,[EDX]
      ADD EDX,[CPinfo]
      StdOutput ='", ',EDX,
      JNSt EBX,encStOem,.70:
      StdOutput ='", ',="(OEM),"
.70:  JNSt EBX,encStAnsi,.80:
      StdOutput ='", ',="(ANSI),"
.80:  JNSt EBX,encStAuto,.90:
      StdOutput =' (autodetected)'
.90:  StdOutput Eol=Yes
      MOV EAX,[%Errors]
      StoD TempString
      XOR EAX,EAX
      STOSD
      StdOutput [%Direction],='put errors: ',TempString, Eol=Yes
EndProcedure WinInfoText
%ControlList %SET    \ Enumeration of common control names.
  InpExploreBtn,     \ Button [Explore].
  InpEdit,           \ Field for input file name.
  InpAutodetectBtn,  \ Button [Autodetect].
  InpEncBox,         \ Selection box for input encoding.
  InpStIgnRadio,     \ Radiobutton to ignore entities.
  InpStHtmRadio,     \ Radiobutton to convert non-ASCII entities.
  InpStHtmlRadio,    \ Radiobutton to convert all entities.
  OutExploreBtn,     \ Button [Explore].
  OutEdit,           \ Field for output file name.
  OutEncBox,         \ Selection box for output encoding.
  OutStOemBtn,       \ Button [OEM].
  OutStAnsiBtn,      \ Button [ANSI].
  OutStTranslRadio,  \ Radiobutton to transliterate.
  OutStHtmlRadio,    \ Radiobutton to convert to entity.
  OutStQmRadio,      \ Radiobutton to replace with ?.
  OutStIgnRadio,     \ Radiobutton to ignore.
  OutStBomCheck,     \ Checkbox to prefix output BOM.
  CmdEdit,           \ Field for command line.
  CmdConvertBtn,     \ Button [Convert].
  CmdQuitBtn,        \ Button [Quit].
  ;
; Numeric identifiers of common controls.
%id   %SETA WM_APP ; Bias of WM_COMMAND identifiers to avoid collision with system ids.
ctrl  %FOR %ControlList
%id     %SETA %id+1
id%ctrl EQU %id
      %ENDFOR ctrl
idStatusBar EQU %id+1
[.bss] ; Static data of GUI variant.
hMainWindow D DWORD ; Handle of the main window.
hStatusBar  D DWORD ; Handle of the status strip at the bottom.
; Array with handles of common control windows.
hBegin: ; Pointer to the first common control window handle.
ctrl  %FOR %ControlList
h%ctrl  D DWORD
      %ENDFOR ctrl
hEnd: ; Pointer behind the last common control window handle.
; Array with original versions of common control WndProc.
; Arrays PrevProcBegin..PrevProcEnd and hBegin..hEnd are synchronized.
PrevProcBegin: ; Pointer to the first common control original WndProc.
ctrl  %FOR %ControlList
PrevProc%ctrl D DWORD
      %ENDFOR ctrl
      D DWORD ; Behind the last common control original WndProc.
; Windows GUI structures.
WndClassEx  DS WNDCLASSEX ; Definition of the window class structure.
Msg         DS MSG ; Window message.
StartupInfo DS STARTUP_INFO ; Process properties.
ProcessInfo DS PROCESS_INFORMATION
InpFileDlg  DS OPENFILENAME
OutFileDlg  DS OPENFILENAME
[.text]
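The identifier enumeration performed by the `%FOR` loop over %ControlList can be modelled as follows. This is an illustrative Python sketch only; the names `CONTROLS`, `IDS` and `ID_STATUS_BAR` are hypothetical, and `WM_APP` is assumed to have its usual Windows value 0x8000.

```python
WM_APP = 0x8000  # Assumed standard Windows value of WM_APP.

# Control names in the same order as in %ControlList.
CONTROLS = [
    "InpExploreBtn", "InpEdit", "InpAutodetectBtn", "InpEncBox",
    "InpStIgnRadio", "InpStHtmRadio", "InpStHtmlRadio",
    "OutExploreBtn", "OutEdit", "OutEncBox", "OutStOemBtn", "OutStAnsiBtn",
    "OutStTranslRadio", "OutStHtmlRadio", "OutStQmRadio", "OutStIgnRadio",
    "OutStBomCheck", "CmdEdit", "CmdConvertBtn", "CmdQuitBtn",
]

# Each control gets WM_APP+1, WM_APP+2, ... like id%ctrl EQU %id in the loop.
IDS = {name: WM_APP + index for index, name in enumerate(CONTROLS, start=1)}

# idStatusBar EQU %id+1 is evaluated after the loop, one above the last id.
ID_STATUS_BAR = WM_APP + len(CONTROLS) + 1
```

Because the handle array hBegin..hEnd and the array PrevProcBegin.. are generated from the same list, the n-th identifier, the n-th handle and the n-th saved WndProc always belong to the same control.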
WinGui Procedure
      CALL WinCreate
      CALL WinUpdate
      WinAPI ShowWindow, [hMainWindow], SW_SHOWNORMAL
      WinAPI UpdateWindow, [hMainWindow]
      MOV EBX,[hInpEncBox]
      MOV EDX,[AnsiEncId]
      MOV [InpEncId],EDX
      CALL WinSetEncBox ; Set ANSI as a default input encoding.
      MOV EBX,[hOutEncBox]
      MOV EDX,65001
      MOV [OutEncId],EDX
      CALL WinSetEncBox ; Set UTF-8 as a default output encoding.
      WinAPI UpdateWindow, [hMainWindow]
.MsgLoop:
      WinAPI GetMessage, Msg,0,0,0
      TEST EAX
      JZ .MsgQuit: ; ZF signals the message WM_QUIT - request for program termination.
      WinAPI TranslateMessage, Msg ; Remap character keys from national keyboards.
      WinAPI DispatchMessage, Msg  ; Let Windows call our WinProc.
      JMP .MsgLoop: ; Wait for another message.
.MsgQuit:
      TerminateProgram [Errorlevel]
EndProcedure WinGui
As the main program window is completely painted by common controls (windows of class "STATIC","BUTTON","LISTBOX","EDIT"), WinProc doesn't have to handle WM_PAINT, WM_SIZE etc.
WinProc Procedure hWnd, uMsg, wParam, lParam
      MOV EBX,[%hWnd]
      MOV EAX,[%uMsg]
      MOV ESI,[%wParam]
      MOV EDI,[%lParam] ; Load msg attributes to registers for handler's convenience.
      ; Fork message uMsg=EAX to its handler using macro Dispatch:
      Dispatch EAX, WM_COMMAND, WM_DESTROY
.Def: WinAPI DefWindowProc,[%hWnd],[%uMsg],[%wParam],[%lParam] ; Ignored events pass to DefWindowProc.
      JMP .Ret:,DIST=NEAR ; Go to EndProcedure WinProc with value EAX returned from DefWindowProc.
      ; Message handlers terminate with a jump to label .Def: or .Ret0:.
.WM_COMMAND: ; User clicked on some common control. (LOWORD) wParam specifies which one.
      MOV EAX,0xFFFF
      AND EAX,ESI
ctrl  %FOR %ControlList
        CMP EAX,id%ctrl
        JE .id%ctrl
      %ENDFOR ctrl
      JMP .Def:
.idInpStIgnRadio:
.idInpStHtmRadio:
.idInpStHtmlRadio:
.idOutStBomCheck:
.idOutStTranslRadio:
.idOutStHtmlRadio:
.idOutStQmRadio:
.idOutStIgnRadio:
.EN_SETFOCUS:
.EN_KILLFOCUS:
.Update:
      CALL WinUpdate
      JMP .Ret0:
.idInpExploreBtn: ; Ctrl-O or [Explore] was pressed.
      WinAPI GetOpenFileName,InpFileDlg
      WinAPI SetWindowText,[hInpEdit],InpFile.Name
      JMP .Update:
.idInpAutodetectBtn:
      WinAPI SendMessage,[hInpEdit],WM_GETTEXTLENGTH,0,0
      TEST EAX
      JNZ .Auto:
.0:   WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Select the input file first.'
      JMP .Ret0:
.Auto:WinAPI SendMessage,[hInpEdit],WM_GETTEXT,MAX_PATH_SIZE,InpFile.Name
      MOVB [EAX+InpFile.Name],0
      WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Detecting input encoding, please wait...'
      MOV ESI,InpFile.Name
      GetLength$ ESI
      StripQuotes ESI,ECX
      JECXZ .0:
      FileAssign InpFile,ESI,Size=ECX
      FileMapOpen InpFile
      JC .0:
      MOV [DetectSize],EAX
      CMP EAX,WinBlockSize
      JBE .7:
      MOV [DetectSize],WinBlockSize
.7:   ADD EAX,ESI
      MOV [InpBegin],ESI
      MOV [InpEnd],EAX
      MOV EDX,EAX
      CALL WinAutodetect:
      FileClose InpFile
      JMP .Update:
.idOutStAnsiBtn:
      MOV EDX,[AnsiEncId]
      JMP .SetEnc:
.idOutStOemBtn:
      MOV EDX,[OemEncId]
.SetEnc:
      MOV [OutEncId],EDX
      MOV EBX,[hOutEncBox]
      CALL WinSetEncBox:
      WinAPI SetFocus,[hOutEncBox]
      JMP .Update:
.idOutExploreBtn: ; Ctrl-S or [Explore] was pressed.
      WinAPI GetSaveFileName,OutFileDlg
      WinAPI SetWindowText,[hOutEdit],OutFile.Name
      JMP .Update:
.idInpEdit:
.idOutEdit:
.idCmdEdit:
      SHR ESI,16
      Dispatch ESI, EN_SETFOCUS, EN_KILLFOCUS
      JMP .Def:
.idInpEncBox:
.idOutEncBox:
      SHR ESI,16
      CMP ESI,LBN_SELCHANGE
      JNE .Def:
      JMP .Update:
.idCmdConvertBtn:
      WinAPI SendMessage,[hCmdEdit],WM_GETTEXT,SIZE# Cmd$, Cmd$
      StdOutput Eol=Yes
      StdOutput Cmd$,Eol=Yes
      WinAPI GetStartupInfo,StartupInfo
      WinAPI CreateProcess,0,Cmd$,0,0,0,0,0,0,StartupInfo,ProcessInfo
      WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Converting, please wait...'
      WinAPI Sleep,999
.idCmdQuitBtn:
.WM_DESTROY: ; GUI program terminates.
      WinAPI PostQuitMessage,0 ; Tell Windows to quit this program with errorlevel 0.
      ; JMP .Ret0:
.Ret0:XOR EAX,EAX
.Ret: MOV [%ReturnEAX],EAX
EndProcedure WinProc
WinCtrlProc Procedure hWnd, uMsg, wParam, lParam
      ; Find the corresponding original WndProc offset and put it to EDI.
      MOV ESI,[%wParam]
      MOV EDI,hBegin
      MOV ECX,(hEnd-hBegin)/4
      MOV EAX,[%hWnd]
      REPNE SCASD
      JE .Found:
      MOV EDI,hMainWindow
      JMP .Def:
.Found:
      MOV EBX,EDI ; EBX now points to the next common control window handle in array hBegin..hEnd.
      ADD EDI,PrevProcBegin-hBegin-4 ; EDI now points to the common control original own WndProc.
      MOV EAX,[%uMsg]
      Dispatch EAX,WM_KEYDOWN,WM_SYSKEYDOWN
      JMP .Def:
.WM_KEYDOWN:
      Dispatch ESI,VK_TAB,VK_ESCAPE
.Def: WinAPI CallWindowProc,[EDI],[%hWnd],[%uMsg],[%wParam],[%lParam] ; Common control window's own WndProc is in EDI.
      JMP .Ret:
.WM_SYSKEYDOWN: ; Accelerator key character together with Alt was pressed.
      OR ESI,'X'^'x' ; Convert the character to lower case.
      Dispatch ESI,'e','f','i','a','1','2','3','s','x','o','n','m','5','6','7','8','b','d','c','q'
      JMP .Def:
.e:   MOV EBX,hInpExploreBtn
      JMP .Focus:
.f:   MOV EBX,hInpEdit
      JMP .Focus:
.i:   MOV EBX,hInpEncBox
      JMP .Focus:
.a:   MOV EBX,hInpAutodetectBtn
      JMP .Focus:
.1:   MOV EBX,hInpStIgnRadio
      JMP .Focus:
.2:   MOV EBX,hInpStHtmRadio
      JMP .Focus:
.3:   MOV EBX,hInpStHtmlRadio
      JMP .Focus:
.s:   MOV EBX,hOutEdit
      JMP .Focus:
.x:   MOV EBX,hOutExploreBtn
      JMP .Focus:
.o:   MOV EBX,hOutEncBox
      JMP .Focus:
.n:   MOV EBX,hOutStAnsiBtn
      JMP .Focus:
.m:   MOV EBX,hOutStOemBtn
      JMP .Focus:
.5:   MOV EBX,hOutStTranslRadio
      JMP .Focus:
.6:   MOV EBX,hOutStHtmlRadio
      JMP .Focus:
.7:   MOV EBX,hOutStQmRadio
      JMP .Focus:
.8:   MOV EBX,hOutStIgnRadio
      JMP .Focus:
.b:   MOV EBX,hOutStBomCheck
      JMP .Focus:
.d:   MOV EBX,hCmdEdit
      JMP .Focus:
.c:   MOV EBX,hCmdConvertBtn
      JMP .Focus:
.q:   MOV EBX,hCmdQuitBtn
      JMP .Focus:
.VK_ESCAPE: ; Quit the program.
      WinAPI SendMessage,[hMainWindow],WM_DESTROY,0,0
      JMP .Ret0:
.VK_TAB: ; Move focus to other common control window.
      WinAPI GetAsyncKeyState,VK_SHIFT
      SAL AX,1
      JNC .NoShift:
      ; Shift-TAB was pressed, cycle backward to previous window (EBX-8).
      SUB EBX,8 ; EBX now points to the previous window handle.
      CMP EBX,hBegin:
      JAE .NoShift:
      MOV EBX,hEnd-4 ; Cycle from the top window to the bottom.
.NoShift: ; Shift was not pressed, move focus to the next window (EBX+0).
      CMP EBX,hEnd
      JB .Focus:
      MOV EBX,hBegin: ; Cycle from the bottom window to the topmost.
.Focus: ; Set focus to window whose handle is addressed by EBX.
      WinAPI SetFocus,[EBX]
.Ret0:SUB EAX,EAX
.Ret: MOV [%ReturnEAX],EAX
EndProcedure WinCtrlProc
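The wrap-around focus cycling done at `.VK_TAB:` may be easier to follow with array indexes instead of pointers. The sketch below is a hypothetical Python model, not code from the program: `current` is the index of the focused control in the array hBegin..hEnd, `count` is the number of controls, and `shift` says whether Shift was held.

```python
def next_focus(current: int, count: int, shift: bool) -> int:
    """Return the index of the control that should receive the focus.

    Tab moves forward and wraps from the last control to the first one,
    Shift-Tab moves backward and wraps from the first control to the last.
    """
    if shift:
        # Corresponds to SUB EBX,8 followed by the check against hBegin.
        return count - 1 if current == 0 else current - 1
    # Corresponds to the check against hEnd at label .NoShift:.
    return 0 if current == count - 1 else current + 1
```

In the assembler version EBX already points one handle behind the matching one after `REPNE SCASD`, which is why the forward step needs no addition and the backward step subtracts 8 bytes (two DWORD handles).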
WinCreate PROC
      ; Main window.
      MOV [WndClassEx.cbSize],SIZE# WNDCLASSEX
      MOV [WndClassEx.lpszClassName],WndClassName
      MOV [WndClassEx.style],CS_HREDRAW|CS_VREDRAW
      MOV [WndClassEx.lpfnWndProc],WinProc
      WinAPI GetModuleHandle,0
      MOV [WndClassEx.hInstance],EAX
      WinAPI LoadIcon,[WndClassEx.hInstance],="#1" ; PROGRAM IconFile= property is registered as Nr.1.
      MOV [WndClassEx.hIcon],EAX
      MOV [WndClassEx.hbrBackground],COLOR_BTNFACE +1
      WinAPI RegisterClassEx,WndClassEx
      ; Main window.
      WinAPI CreateWindowEx, WS_EX_CLIENTEDGE, \
             WndClassName, WndClassName, WS_OVERLAPPEDWINDOW, \
             CW_USEDEFAULT,CW_USEDEFAULT,660,786, \
             0, 0, [WndClassEx.hInstance], 0
      MOV [hMainWindow],EAX
      ; EuroConv icon.
      WinAPI CreateWindowEx,0,="STATIC",="#1",WS_CHILD+WS_VISIBLE+SS_ICON,\
             12,0,32,32,[hMainWindow],0,[WndClassEx.hInstance],0
      WinAPI SendMessage,EAX,STM_SETICON,[WndClassEx.hIcon],0
      ; Title
      WinAPI CreateWindowEx,0,="STATIC",InfoText,\
             WS_CHILD+WS_VISIBLE,\
             50,10,540,20,[hMainWindow],0,[WndClassEx.hInstance],0
      ; Input form.
      WinAPI CreateWindowEx,0,='STATIC',0, \
             WS_CHILD+WS_VISIBLE+SS_BLACKFRAME,\
             10,30,620,274,[hMainWindow],0,[WndClassEx.hInstance],0
      ; Initialize input Explore dialogue, see OPENFILENAME.
      MOV [InpFileDlg.lStructSize],SIZE# OPENFILENAME
      MOV EAX,[hMainWindow]
      MOV [InpFileDlg.hwndOwner],EAX
      MOV [InpFileDlg.lpstrFile],InpFile.Name
      MOV [InpFileDlg.nMaxFile],SIZE# InpFile.Name
      MOV [InpFileDlg.Flags],OFN_FILEMUSTEXIST
      WinAPI CreateWindowEx,0,="STATIC",="Input &file name to open",WS_CHILD+WS_VISIBLE ,\
             50,42,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
      ; Input button [Explore].
      WinAPI CreateWindowEx,0,='BUTTON',="&Explore",WS_CHILD+WS_VISIBLE+BS_DEFPUSHBUTTON,\
             500,36,120,20,[hMainWindow],idInpExploreBtn,[WndClassEx.hInstance],0
      MOV [hInpExploreBtn],EAX
      ; Edit input file name.
      WinAPI CreateWindowEx,0,='EDIT',InpFile.Name,WS_CHILD+WS_VISIBLE+WS_BORDER+ES_AUTOHSCROLL, \
             20,60,600,20,[hMainWindow],idInpEdit,[WndClassEx.hInstance],0
      MOV [hInpEdit],EAX
      WinAPI SendMessage,EAX,EM_SETLIMITTEXT,SIZE# InpFile.Name,0
      WinAPI CreateWindowEx,0,="STATIC",="&Input file encoding",WS_CHILD+WS_VISIBLE,\
             50,91,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
      ; Input button [Autodetect].
      WinAPI CreateWindowEx,0,='BUTTON',="&Autodetect",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
             500,84,120,20,[hMainWindow],idInpAutodetectBtn,[WndClassEx.hInstance],0
      MOV [hInpAutodetectBtn],EAX
      ; Listbox for selection of input encoding.
      WinAPI CreateWindowEx,0,='LISTBOX',0,WS_CHILD+WS_VISIBLE+WS_BORDER+WS_VSCROLL+WS_HSCROLL+LBS_NOTIFY,\
             20,109,600,140,[hMainWindow],idInpEncBox,[WndClassEx.hInstance],0
      MOV [hInpEncBox],EAX
      MOV EDI,EAX
      CALL WinFillEncBox
      ; Radiobuttons for input HTML entities.
      WinAPI CreateWindowEx,0,='BUTTON',="&1 Ignore all HTML entities.",\
             WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON+WS_GROUP,\
             50,240,440,20,[hMainWindow],idInpStIgnRadio,[WndClassEx.hInstance],0
      MOV [hInpStIgnRadio],EAX
      WinAPI SendMessage,EAX,BM_SETCHECK,1,0 ; Use "Ignore all" as default selection.
      WinAPI CreateWindowEx,0,='BUTTON',="&2 Ignore ASCII HTML entities &&amp; &&lt; &&gt; &&quot;.",\
             WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\ +SS_NOPREFIX,\ With SS_NOPREFIX it doesn't show any text.
             50,260,440,20,[hMainWindow],idInpStHtmRadio,[WndClassEx.hInstance],0
      MOV [hInpStHtmRadio],EAX
      WinAPI CreateWindowEx,0,='BUTTON',="&3 Convert all HTML entities.",\
             WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
             50,280,440,20,[hMainWindow],idInpStHtmlRadio,[WndClassEx.hInstance],0
      MOV [hInpStHtmlRadio],EAX
      ; Output form.
      WinAPI CreateWindowEx,0,='STATIC',0, \
             WS_CHILD+WS_VISIBLE+WS_GROUP+SS_BLACKFRAME,\
             10,320,620,294,[hMainWindow],0,[WndClassEx.hInstance],0
      WinAPI CreateWindowEx,0,="STATIC",="Output file name to &save",WS_CHILD+WS_VISIBLE ,\
             50,332,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
      ; Initialize output Explore dialogue, see OPENFILENAME.
      MOV [OutFileDlg.lStructSize],SIZE# OPENFILENAME
      MOV EAX,[hMainWindow]
      MOV [OutFileDlg.hwndOwner],EAX
      MOV [OutFileDlg.lpstrFile],OutFile.Name
      MOV [OutFileDlg.nMaxFile],SIZE# OutFile.Name
      MOV [OutFileDlg.Flags],OFN_OVERWRITEPROMPT
      ; Output button [Explore].
      WinAPI CreateWindowEx,0,='BUTTON',="E&xplore",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
             500,326,120,20,[hMainWindow],idOutExploreBtn,[WndClassEx.hInstance],0
      MOV [hOutExploreBtn],EAX
      ; Edit output file name.
      WinAPI CreateWindowEx,0,='EDIT',OutFile.Name,WS_CHILD+WS_VISIBLE+WS_BORDER+ES_AUTOHSCROLL, \
             20,350,600,20,[hMainWindow],idOutEdit,[WndClassEx.hInstance],0
      MOV [hOutEdit],EAX
      WinAPI SendMessage,EAX,EM_SETLIMITTEXT,SIZE# OutFile.Name,0
      WinAPI CreateWindowEx,0,="STATIC",="&Output file encoding",WS_CHILD+WS_VISIBLE,\
             50,381,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
      ; Output button [ANSI].
      WinAPI CreateWindowEx,0,='BUTTON',="A&NSI",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
             360,378,120,20,[hMainWindow],idOutStAnsiBtn,[WndClassEx.hInstance],0
      MOV [hOutStAnsiBtn],EAX
      ; Output button [OEM].
      WinAPI CreateWindowEx,0,='BUTTON',="OE&M",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
             500,378,120,20,[hMainWindow],idOutStOemBtn,[WndClassEx.hInstance],0
      MOV [hOutStOemBtn],EAX
      ; Listbox for selection of output encoding.
      WinAPI CreateWindowEx,0,='LISTBOX',0,WS_CHILD+WS_VISIBLE+WS_BORDER+WS_VSCROLL+WS_HSCROLL+LBS_NOTIFY,\
             20,399,600,140,[hMainWindow],idOutEncBox,[WndClassEx.hInstance],0
      MOV [hOutEncBox],EAX
      MOV EDI,EAX
      CALL WinFillEncBox
      WinAPI SendMessage,EDI,LB_SETCURSEL,0,0 ; Use 0-th option as default selection.
      ; Radiobuttons for invalid output characters.
      WinAPI CreateWindowEx,0,='BUTTON',="&5 Transliterate invalid characters to ASCII Latin.",\
             WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON+WS_GROUP,\
             50,530,420,20,[hMainWindow],idOutStTranslRadio,[WndClassEx.hInstance],0
      MOV [hOutStTranslRadio],EAX
      WinAPI SendMessage,EAX,BM_SETCHECK,1,0 ; Use "Transliterate" as default selection.
      WinAPI CreateWindowEx,0,='BUTTON',="&6 Convert invalid characters to HTML entities.",\
             WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
             50,550,420,20,[hMainWindow],idOutStHtmlRadio,[WndClassEx.hInstance],0
      MOV [hOutStHtmlRadio],EAX
      WinAPI CreateWindowEx,0,='BUTTON',="&7 Replace invalid characters with '?'.",\
             WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
             50,570,420,20,[hMainWindow],idOutStQmRadio,[WndClassEx.hInstance],0
      MOV [hOutStQmRadio],EAX
      WinAPI CreateWindowEx,0,='BUTTON',="&8 Ignore invalid characters.",\
             WS_CHILD+WS_VISIBLE+BS_AUTORADIOBUTTON,\
             50,590,420,20,[hMainWindow],idOutStIgnRadio,[WndClassEx.hInstance],0
      MOV [hOutStIgnRadio],EAX
      ; Output checkbox [Use BOM in UTF].
      WinAPI CreateWindowEx,0,='BUTTON',="Use &BOM in UTF",WS_CHILD+WS_VISIBLE+BS_AUTOCHECKBOX+WS_GROUP,\
             460,590,140,20,[hMainWindow],idOutStBomCheck,[WndClassEx.hInstance],0
      MOV [hOutStBomCheck],EAX
      ; Command form.
      WinAPI CreateWindowEx,0,='STATIC',0, \
             WS_CHILD|WS_VISIBLE+WS_GROUP+SS_BLACKFRAME,\
             10,630,620,84,[hMainWindow],0,[WndClassEx.hInstance],0
      WinAPI CreateWindowEx,0,="STATIC",="Comman&d",WS_CHILD+WS_VISIBLE ,\
             50,642,210,20,[hMainWindow],0,[WndClassEx.hInstance],0
      ; Edit the command.
      WinAPI CreateWindowEx,0,='EDIT',Cmd$,WS_CHILD+WS_VISIBLE+WS_BORDER+ES_AUTOHSCROLL, \
             20,660,600,20,[hMainWindow],idCmdEdit,[WndClassEx.hInstance],0
      MOV [hCmdEdit],EAX
      WinAPI SendMessage,EAX,EM_SETLIMITTEXT,SIZE# Cmd$,0
      ; Command button [Quit].
      WinAPI CreateWindowEx,0,='BUTTON',="&Quit",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
             20,684,120,20,[hMainWindow],idCmdQuitBtn,[WndClassEx.hInstance],0
      MOV [hCmdQuitBtn],EAX
      ; Command button [Convert].
      WinAPI CreateWindowEx,0,='BUTTON',="&Convert",WS_CHILD+WS_VISIBLE+BS_PUSHBUTTON,\
             500,684,120,20,[hMainWindow],idCmdConvertBtn,[WndClassEx.hInstance],0
      MOV [hCmdConvertBtn],EAX
      ; Status bar.
      WinAPI CreateStatusWindow,WS_CHILD+WS_BORDER+WS_VISIBLE,='EuroConv',[hMainWindow],idStatusBar
      MOV [hStatusBar],EAX
      WinAPI SendMessage,EAX,SB_SIMPLE,1,0 ; Tell the status bar to be simple.
      WinAPI SetFocus,[hInpExploreBtn]
      ; Common control child windows are now created.
      ; Let's hook their WndProc and replace it with WinCtrlProc, which will process VK_TAB.
ctrl  %FOR %ControlList
        WinAPI SetWindowLong,[h%ctrl], GWL_WNDPROC, WinCtrlProc
        MOV [PrevProc%ctrl],EAX
      %ENDFOR ctrl
      RET
ENDP WinCreate
WinFillEncBox PROC ; Fill the listbox EDI with all supported encoding names+remarks.
      WinAPI SendMessage,EDI, LB_SETHORIZONTALEXTENT,400,0
opt   %FOR 'CP20127, "ASCII"', \
           'CP1200, "UTF-16LE"',\
           'CP1201, "UTF-16BE"',\
           ' "UTF-16" (endianess will be autodetected)',\
           'CP12000, "UTF-32LE"',\
           'CP12001, "UTF-32BE"',\
           ' "UTF-32" (endianess will be autodetected)',\
           'CP65001, "UTF-8"'
        WinAPI SendMessage,EDI, LB_ADDSTRING,0,=%opt
      %ENDFOR opt
      SUB EDX,EDX ; Encoding index 0..[CodePages].
.10:  MOV EAX,[CPid]
      MOVZXW EAX,[EAX+2*EDX]
      Dispatch AX,20127,65001,1200,1201,12000,12001
      PUSH EDI
      MOV EDI,TempString
      MOVW [EDI],'CP'
      INC EDI,EDI
      StoD EDI
      MOV AX,', '
      STOSW
      MOV AL,'"'
      STOSB
      MOV EBX,[CPname]
      MOVZXW EBX,[EBX+2*EDX]
      MOV ESI,[CPinfo]
      ADD ESI,EBX
      GetLength$ ESI
      REP MOVSB
      MOV AX,'" '
      STOSW
      MOV EBX,[CPrem]
      MOVZXW EBX,[EBX+2*EDX]
      MOV ESI,[CPinfo]
      ADD ESI,EBX
      GetLength$ ESI
      REP MOVSB
      SUB EAX,EAX
      STOSB
      POP EDI
      PUSH EDX
      WinAPI SendMessage,EDI, LB_ADDSTRING,0,TempString
      POP EDX
.20127:
.65001:
.1200:
.1201:
.12000:
.12001:
      INC EDX
      CMP EDX,[CodePages]
      JB .10:
      RET
ENDP WinFillEncBox
WinUpdate PROC
      PUSHAD
      MOV EDI,Cmd$
      MOV ESI,='euroconv '
      MOV ECX,9
      REP MOVSB
      WinAPI SendMessage,[hInpEncBox],LB_GETCURSEL,0,0
      INC EAX
      JZ .10: ; On LB_ERR=-1 use 0-th option.
      DEC EAX
.10:  WinAPI SendMessage,[hInpEncBox],LB_GETTEXT,EAX,EDI
      MOV AL,'"'
      MOV ECX,32
      MOV EBX,EDI
      REPNE SCASB ; Find 1st quote.
      MOV ESI,EDI
      REPNE SCASB ; Find 2nd quote.
      DEC EDI
      MOV ECX,EDI
      SUB ECX,ESI
      MOV EDI,EBX
      REP MOVSB
      WinAPI SendMessage,[hInpStHtmRadio],BM_GETCHECK,0,0
      TEST EAX
      JZ .25:
      MOV ESI,="/HTM"
      MOV ECX,4
      REP MOVSB
.25:  WinAPI SendMessage,[hInpStHtmlRadio],BM_GETCHECK,0,0
      TEST EAX
      JZ .30:
      MOV ESI,="/HTML"
      MOV ECX,5
      REP MOVSB
.30:  MOV AL,' '
      STOSB
      WinAPI SendMessage,[hOutEncBox],LB_GETCURSEL,0,0
      INC EAX
      JZ .35: ; On LB_ERR=-1 use 0-th option.
      DEC EAX
.35:  WinAPI SendMessage,[hOutEncBox],LB_GETTEXT,EAX,EDI
      MOV AL,'"'
      MOV ECX,32
      MOV EBX,EDI
      REPNE SCASB ; Find 1st quote.
      MOV ESI,EDI
      REPNE SCASB ; Find 2nd quote.
      DEC EDI
      MOV ECX,EDI
      SUB ECX,ESI
      MOV EDI,EBX
      REP MOVSB
      WinAPI SendMessage,[hOutStBomCheck],BM_GETCHECK,0,0
      TEST EAX
      JZ .48:
      MOV ESI,="/BOM"
      MOV ECX,4
      REP MOVSB
.48:  WinAPI SendMessage,[hOutStHtmlRadio],BM_GETCHECK,0,0
      TEST EAX
      JZ .50:
      MOV ESI,="/HTML"
      MOV ECX,5
      REP MOVSB
.50:  WinAPI SendMessage,[hOutStQmRadio],BM_GETCHECK,0,0
      TEST EAX
      JZ .55:
      MOV ESI,="/QM"
      MOV ECX,3
      REP MOVSB
.55:  WinAPI SendMessage,[hOutStIgnRadio],BM_GETCHECK,0,0
      TEST EAX
      JZ .60:
      MOV ESI,="/IGN"
      MOV ECX,4
      REP MOVSB
.60:  MOV AL,' '
      STOSB
      WinAPI SendMessage,[hInpEdit],WM_GETTEXTLENGTH,0,0
      TEST EAX
      JNZ .65:
      MOV EAX,"NUL "
      STOSD
      JMP .70:
.65:  MOV ESI,EAX
      MOV AL,'"'
      STOSB
      WinAPI SendMessage,[hInpEdit],WM_GETTEXT,MAX_PATH_SIZE,EDI
      ADD EDI,ESI
      MOV AX,'" '
      STOSW
.70:  WinAPI SendMessage,[hOutEdit],WM_GETTEXTLENGTH,0,0
      TEST EAX
      JNZ .75:
      MOV EAX,"NUL "
      STOSD
      JMP .80:
.75:  MOV ESI,EAX
      MOV AL,'"'
      STOSB
      WinAPI SendMessage,[hOutEdit],WM_GETTEXT,MAX_PATH_SIZE,EDI
      ADD EDI,ESI
      MOV AX,'" '
      STOSW
.80:  MOV EAX,"/W" ; 5th argument tells console version to wait on [Enter] when it terminates.
      STOSD
      WinAPI SetWindowText,[hCmdEdit],Cmd$
      WinAPI SendMessage,[hStatusBar],SB_SETTEXT,SB_SIMPLEID,='Select text file names and their encodings, then press [Convert].'
      POPAD
      RET
ENDP WinUpdate
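The command line that WinUpdate assembles into Cmd$ can be modelled in a few lines. This is a hedged Python sketch of the same string building, not the program's code: the function names and parameters are hypothetical, and the listbox item texts in the usage example (including the remark "Latin 2") are made up for illustration. Unlike the assembler version, the sketch does not limit the quote search to 32 characters.

```python
def extract_name(listbox_text: str) -> str:
    """Return the encoding name between the 1st and the 2nd double quote."""
    first = listbox_text.index('"')
    second = listbox_text.index('"', first + 1)
    return listbox_text[first + 1:second]


def quote_or_nul(file_name: str) -> str:
    """Quote a file name, or substitute NUL when the edit field is empty."""
    return f'"{file_name}" ' if file_name else "NUL "


def build_command(inp_item: str, out_item: str,
                  inp_file: str = "", out_file: str = "",
                  inp_modifier: str = "",   # "", "/HTM" or "/HTML"
                  bom: bool = False,
                  out_modifier: str = "") -> str:  # "", "/HTML", "/QM" or "/IGN"
    """Compose the console command in the same order as WinUpdate does."""
    command = "euroconv " + extract_name(inp_item) + inp_modifier + " "
    command += extract_name(out_item) + ("/BOM" if bom else "") + out_modifier + " "
    command += quote_or_nul(inp_file) + quote_or_nul(out_file)
    return command + "/W"  # 5th argument: wait for [Enter] at the end.
```

With the hypothetical items `'CP852, "IBM852" Latin 2'` and `'CP65001, "UTF-8"'`, the files `in.txt` and `out.txt`, and the BOM checkbox ticked, the sketch yields `euroconv IBM852 UTF-8/BOM "in.txt" "out.txt" /W`.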
ENDPROGRAM euroconv