
eurosort.htm
EuroSort
EuroSort for Linux
EuroSort for Windows

EuroSort is a program from the EuroTool collection for sorting text files or typed files (files consisting of fixed-length records).
Encoding of the input text file will be autodetected or it can be specified explicitly.
It also takes into account national sorting conventions (locales) and allows you to specify
whether to sort numbers first, uppercase or lowercase letters first, or whether to merge spaces.


When EuroSort is run in Linux without parameters (using ./eurosort.x), it launches a pseudographic screen for entering arguments.
Press Tab or click with the mouse on the field you want to edit, or on the required Encoding and Locale.

The Windows variant looks similar. When you press the [Sort] button, the graphic screen is dismissed, EuroSort summarizes all specified parameters on the console and starts sorting. Sorting a big file may take some time, so a percentage indicator is displayed.

If both the input and output files are specified on the command line, EuroSort runs in console mode and the graphical screen is not displayed at all. Possible arguments:

; EuroSort default configuration (UTF-8):
/InputFile=           ; Input file name to be sorted.
/OutputFile=          ; Output file name where the sorted input will be saved.
/InputEncoding=       ; Encoding of InputFile; autodetect when not specified. /IE=? for help.
/Locale=              ; Use national sorting preferences. /Locale=? for help.
/HeaderSize=0         ; Number of bytes to omit from sort at the beginning.
/HeaderLength=0       ; Number of lines to omit from sort at the beginning.
/FooterSize=0         ; Number of bytes to omit from sort at the end.
/FooterLength=0       ; Number of lines to omit from sort at the end.
/RecordSize=0         ; Fixed size in bytes if >0; otherwise variable size ended by EOL.
/KeyOffset=0          ; Offset of the sorting key in chars from the beginning of the record.
/KeyLength=-1         ; Key size in chars; up to the end of record when -1.
/KeyReverse=false     ; Sort direction descending.
/MergeSpaces=false    ; Treat multiple white spaces as one space.
/DigitFirst=true      ; Sort digits before letters.
/UpperFirst=false     ; Sort uppercase letters before lowercase ones.
/LeaveTemporary=no    ; Do not erase temporary index file when the program ends.

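The /Name=Value convention above can be illustrated with a minimal Python sketch. It is not EuroSort's own parser (which is written in EuroAssembler); the function names parse_arguments and needs_dialog are hypothetical, values are kept as plain strings, and argument names are matched exactly, which may be stricter than the real program.

```python
# Hypothetical illustration of the /Name=Value argument convention.
# The defaults mirror the default configuration listed in the documentation.
DEFAULTS = {
    "InputFile": "", "OutputFile": "", "InputEncoding": "", "Locale": "",
    "HeaderSize": "0", "HeaderLength": "0", "FooterSize": "0", "FooterLength": "0",
    "RecordSize": "0", "KeyOffset": "0", "KeyLength": "-1", "KeyReverse": "false",
    "MergeSpaces": "false", "DigitFirst": "true", "UpperFirst": "false",
    "LeaveTemporary": "no",
}


def parse_arguments(argv, base=None):
    """Overlay /Name=Value arguments on a copy of `base` (DEFAULTS if omitted)."""
    options = dict(DEFAULTS if base is None else base)
    for arg in argv:
        if not arg.startswith("/") or "=" not in arg:
            raise ValueError(f"expected /Name=Value, got {arg!r}")
        name, value = arg[1:].split("=", 1)
        if name not in DEFAULTS:  # assumption: exact, case-sensitive names
            raise ValueError(f"unknown argument {name!r}")
        options[name] = value
    return options


def needs_dialog(options):
    """Console mode requires both file names; otherwise the screen is shown."""
    return not options["InputFile"] or not options["OutputFile"]
```

Passing the result of one call as `base` to the next gives the same layering that the steps below describe: global configuration first, then the local one, then the command line.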
How does the program work:
  1. EuroSort searches for the global configuration /etc/eurotool/eurosort.ini (Linux) or %AppData%\eurotool\eurosort.ini (Windows). If it doesn't exist, EuroSort tries to create it with default arguments.
    For this reason you should run EuroSort with root privileges the first time: sudo ./eurosort.x.
  2. Then it searches for the local configuration file eurosort.ini in the current directory. EuroSort does not create this file, but you may create it yourself (or copy it from the global configuration) and write your favourite options in it, for instance Locale=CZ, so you don't have to specify this argument in each EuroSort invocation.
  3. Now EuroSort reads arguments from the command line. If either InputFile or OutputFile is missing, it will let you specify arguments in the graphic window.
  4. Input file is mapped to memory and if /InputEncoding= is not explicitly specified, it will be autodetected.
  5. If /HeaderSize= and /FooterSize= are not 0, they are applied. Then /HeaderLength= and /FooterLength= are applied, if not 0.
    Header and Footer are omitted from the sorting process.
  6. An index of the input file is created; its name is the output file name with the extension .index appended.
  7. If RecordSize= is not zero, the input file is treated as typed: it consists of records of this fixed size, such as a dBase III database file.
    Otherwise the input file is treated as an ordinary text file with records (lines) terminated by the line feed character ASCII 10.
  8. EuroSort summarizes all specified parameters on console and starts the action. Sorting ends with the message Sorted 100 %. and the program terminates.
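Steps 5 to 8 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program's implementation: the function names are hypothetical, the whole file is held in memory instead of using an index file, /HeaderLength= and /FooterLength= (line counts) are left out, and keys are compared by plain code point rather than by EuroSort's sorting weights.

```python
def split_records(data, header_size=0, footer_size=0, record_size=0):
    """Cut off header and footer bytes, then split the body into records."""
    end = len(data) - footer_size
    if header_size < 0 or footer_size < 0 or header_size > end:
        raise ValueError("header and footer do not fit into the input")
    header, body, footer = data[:header_size], data[header_size:end], data[end:]
    if record_size > 0:
        # Typed file: records of a fixed size in bytes.
        records = [body[i:i + record_size] for i in range(0, len(body), record_size)]
    else:
        # Text file: records end with line feed (ASCII 10).
        parts = body.split(b"\n")
        records = [part + b"\n" for part in parts[:-1]]
        if parts[-1]:
            records.append(parts[-1])  # last line without a terminator
    return header, records, footer


def record_key(record, key_offset=0, key_length=-1, encoding="utf-8"):
    """Extract the sorting key, counted in characters; -1 means to the end."""
    text = record.decode(encoding).rstrip("\r\n")
    if key_length < 0:
        return text[key_offset:]
    return text[key_offset:key_offset + key_length]


def sort_file(data, key_reverse=False, key_offset=0, key_length=-1, **layout):
    """Sort the records; header and footer stay where they were."""
    header, records, footer = split_records(data, **layout)
    records.sort(key=lambda r: record_key(r, key_offset, key_length),
                 reverse=key_reverse)
    return header + b"".join(records) + footer
```

For example, `sort_file(b"x9y1z5", record_size=2, key_offset=1, key_length=1)` treats the input as three two-byte records and orders them by their second character.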

By default, when /Locale is not specified, the sort order follows this sequence of basic characters of the Latin, Greek, and Cyrillic alphabets, supplemented by lowercase letters and diacritics.

A B C Č D E Ə F G H I J K L M N O P Q R Ř S Š T U V W X Y Z Ž Þ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω А Б В Г Д Ђ Е Є Ё Ж З И І Ј Й К Л Љ М Н Њ О П Р С Т Ћ Ќ У Ф Х Ц Ч Џ Ш Щ Ъ Ы Ь Э Ю Я

This sorting method is suitable for most European languages. The basic Latin letters A..Z and the four characters with caron Č Ř Š Ž carry the primary sorting weight. If two strings are indistinguishable after transformation to primary weights, a secondary sorting weight is applied that takes diacritics into account (but not character case). Finally, a third sorting weight is applied to distinguish uppercase and lowercase letters. The complete order of all characters can be viewed in the file alphabet.txt.
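The three weights can be illustrated with a small Python sketch. It is a simplified model, not EuroSort's actual tables (those are in alphabet.txt): the names ALPHABET, RANK, split_char and sort_key are hypothetical, only digits and Latin letters are ranked, all other characters are placed after them by code point, and lowercase is put before uppercase as with the default /UpperFirst=false.

```python
import unicodedata

# Digits first (as with /DigitFirst=true), then the Latin primary letters.
ALPHABET = "0123456789ABCČDEFGHIJKLMNOPQRŘSŠTUVWXYZŽ"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def split_char(ch):
    """Return (base letter, diacritical marks) of the uppercase form of ch."""
    upper = unicodedata.normalize("NFC", ch.upper())
    if upper in RANK:            # Č Ř Š Ž are primary letters of their own
        return upper, ""
    decomposed = unicodedata.normalize("NFD", upper)
    return decomposed[0], decomposed[1:]


def sort_key(text):
    """Build (primary, secondary, tertiary) weights for a string."""
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFC", text):
        base, marks = split_char(ch)
        primary.append((0, RANK[base]) if base in RANK else (1, ord(base)))
        secondary.append(marks)          # diacritics, case ignored
        tertiary.append(ch.isupper())    # False sorts first: lowercase first
    return primary, secondary, tertiary


words = ["Zebra", "čaj", "car", "Car", "cár", "cizí"]
print(sorted(words, key=sort_key))
```

Because the key is a tuple, Python compares the whole primary list first and falls back to diacritics and then to case only when the earlier levels are equal, which matches the order of precedence described above.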

Other, non-European alphabets (Hebrew, Arabic, Thai, Canadian, Far East) sort letters by their assigned Unicode value only.

The /Locale= argument accepts a two-letter ISO 3166-1 country code, which introduces exceptions to the default sort order, as defined in Wikipedia:

/Locale=AZ
The Azerbaijani alphabet sorts letter Q after K and letter X after H:
A B C D E Ə F G H X I J K Q L M N O P R S T U V W Y Z
/Locale=CZ
Czech sorts digraph ch Ch CH (but not cH) between letters H and I:
A B C D E F G H CH I J K L M N O P Q R S T U V W X Y Z.
/Locale=DK
Danish adds three separate letters Æ Ø Å at the end of alphabet.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Ø Å
/Locale=EE
Estonian treats Õ Ä Ö Ü as separate characters and sorts them together with X Y at the end of alphabet after W. Letters Š Z Ž sort after S:
A B C D E F G H I J K L M N O P Q R S Š Z Ž T U V W Õ Ä Ö Ü X Y.
/Locale=ES
Spanish introduces a separate letter Ñ:
A B C D E F G H I J K L M N Ñ O P Q R S T U V W X Y Z.
/Locale=FI
Finnish adds three separate letters Å Ä Ö at the end of alphabet (the same as Swedish).
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö
/Locale=HU
Hungarian treats the following digraphs as separate letters: CS DZ GY LY NY SZ TY ZS
A B C CS D DZ E F G GY H I J K L LY M N NY O P Q R S SZ T TY U V W X Y Z ZS
/Locale=IS
Icelandic treats long vowels Á É Í Ó Ú Ý as separate letters, and adds letters Þ Æ Ö at the end of alphabet:
A Á B C D Ð E É F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Þ Æ Ö
/Locale=LT
Lithuanian sorts Y between I and J:
A B C Č D E F G H I Y J K L M N O P R S Š T U V Z Ž
/Locale=LV
Latvian ignores the differences between base letters and letters modified by macron or comma below:
A B C Č D E F G H I J K L M N O P R S Š T U V Z Ž
/Locale=PL
Polish treats letters with diacritics as fully independent:
A Ą B C Ć D E Ę F G H I J K L Ł M N Ń O Ó P Q R S Ś T U V W X Y Z Ź Ż
/Locale=NO
Norwegian adds three separate letters Æ Ø Å at the end of alphabet:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Ø Å
/Locale=SE
Swedish adds three separate letters Å Ä Ö at the end of alphabet:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö
/Locale=SK
Slovak sorts digraph ch Ch CH (but not cH) between letters H and I (the same as Czech):
A B C D E F G H CH I J K L M N O P Q R S T U V W X Y Z.
/Locale=TR
Turkish treats characters with diacritics as separate letters:
A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z
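How a locale alphabet with digraphs might change the primary order can be sketched in Python, using the Czech alphabet listed above. This is an assumption-laden illustration, not EuroSort's algorithm: the names tokenize and make_locale_key are hypothetical, diacritics and case weights are ignored, and characters outside the given alphabet are simply placed after it.

```python
def tokenize(text, rank, longest):
    """Split text into uppercase letters, joining digraphs such as ch Ch CH."""
    tokens, i = [], 0
    while i < len(text):
        size = 1
        for n in range(longest, 1, -1):
            piece = text[i:i + n]
            # Lowercase followed by uppercase ("cH") is not a digraph.
            is_mixed = piece[0].islower() and not piece.islower()
            if len(piece) == n and piece.upper() in rank and not is_mixed:
                size = n
                break
        tokens.append(text[i:i + size].upper())
        i += size
    return tokens


def make_locale_key(alphabet):
    """Build a primary sort key function from a space-separated alphabet."""
    letters = alphabet.split()
    rank = {letter: i for i, letter in enumerate(letters)}
    longest = max(len(letter) for letter in letters)

    def key(text):
        return [(0, rank[t]) if t in rank else (1, t)
                for t in tokenize(text, rank, longest)]
    return key


czech = make_locale_key("A B C D E F G H CH I J K L M N O P Q R S T U V W X Y Z")
print(sorted(["ihned", "Cheb", "hrad", "chata", "cihla"], key=czech))
```

With this key the words starting with ch land between those starting with h and i, whereas a plain code point sort would place them under c.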
↑ EuroSort
Common configuration and argument processing for both Linux and Windows version:

Both programs in this source file have the name eurosort. The linked executable file for Linux has the name eurosort.x and the version for Windows has the name eurosort.exe.

Both executables will be built with the command euroasm eurosort.htm.

The Linux GUI version works on an ANSI terminal in character-based pseudographic mode.

         EUROASM CPU=X64, Unicode=No, MaxInclusions=100, NoWarn=0563
         INCLUDE argument.htm     ; Assemble the module argument.htm.
↑ EuroSort for Linux
         INCLUDE sortlinc.htm  ; Assemble the module sortlinc.htm (Linux console subsystem).
         INCLUDE sortling.htm  ; Assemble the module sortling.htm (Linux pseudographic subsystem).
eurosort PROGRAM Format=ELFX, Width=64, Entry=MainCon
          LINK argument.obj, sortlinc.obj, sortling.obj  ; Link three modules to the final eurosort.x.
         ENDPROGRAM eurosort
↑ EuroSort for Windows
         INCLUDE sortwinc.htm  ; Assemble the module sortwinc.htm (Windows console subsystem).
         INCLUDE sortwing.htm  ; Assemble the module sortwing.htm (Windows graphic subsystem).
eurosort PROGRAM Format=PE, Width=64, Entry=MainCon, IconFile=eurosort.ico,
          LINK argument.obj, sortwinc.obj, sortwing.obj    ; Link three modules to the final eurosort.exe.
         ENDPROGRAM eurosort
