Subroutines > Detailed Descriptions
CollapseAll image

BASORT

Updated March 2011

Random file syntax:

xcall BASORT, ch, reccnt, recsiz, k1siz, k1pos, k1ord, k2siz, k2pos, k2ord, k3pos, k3siz, k3ord, k1typ, k2typ, k3typ, k4siz, k4pos, k4ord, k5siz, k5pos, k5ord, k6pos, k6siz, k6ord, k4typ, k5typ, k6typ

Note: six sort keys are allowed as of A-Shell 1136 of 29 Jan 09; three keys are permitted in earlier versions.

Sequential file syntax:

xcall BASORT, chin, chout, recsiz, k1siz, k1pos, k1ord, k2siz, k2pos, k2ord, k3siz, k3pos, k3ord, k1typ, k2typ, k3typ

Note: Key types were added in A-Shell 1136 of 29 Jan 09; you can specify 0 for string, 4 for natural sort, or 5 for natural case-insensitive sort. In earlier versions, only plain string keys were supported, so there was no need for the k?typ parameters.

For sorting arrays in memory, see SORTIT

Parameters

All BASORT parameters are [in] and Num. The key type (k?typ) values are 0 for string (default), 1 for floating point, and 2 for binary.

Param

Description

ch

File channel (used only with random files).

chin

Input file channel (used only with sequential files)

chout

Output file channel (used only with sequential files)

reccnt

Number of records in the file.

recsiz

Record length. For sequential files, this should be equal to or greater than the longest line in the file.

k?siz

Size of the sort key, in bytes.

k?pos

First character position occupied by the key.

k?ord

Sort order of the key, where 0=ascending, 1=descending.

k?typ

Data type of the key: 0=String, 1=Floating Point (random files only), 2=Binary (random files only), 3=Integer (random only), 4=Natural Sort, 5=case-insensitive natural sort. See History note for A-Shell 5.1.1135 below for more details on types 3-5.

 

Comments

A-Shell's implementation of BASORT.SBR is fully compatible with that under AMOS, optimized to make the best use of memory available. The sorting process is based around the quicksort algorithm, but up to three different types of sort may be performed depending on the amount of free memory. In the default case, the free memory is taken from the A-Shell partition. However, under systems with large amounts of efficient virtual memory available (e.g. UNIX and Windows), you may specify the option SBR=MALLOCSORT to miame.ini to cause A-Shell to dynamically allocate, if possible, a chunk of memory large enough to sort the entire file in one quicksort operation. See the SBR system parameter for more details on the SBR options.

In the case of the sequential file sort, there is no option for key types because the keys must be string type. Also, since the record lengths can be variable, you should specify a Recsiz as large or larger than the largest line in the file (up to a maximum of 4096). The file to be sorted must be open for input on ChIn, and the program must supply the channel (chout) of an output file that you want the file sorted into. If you want to sort the file on itself, you may specify chout as the same value as chin (in which case it only needs to be open once for input). If you want to re-read the sorted output file, you will need to close it and reopen it for input.

Here is a description of the three sort algorithms:

Type

Description

Small

The entire file is loaded, quicksorted and then saved to disk. This will occur if there is sufficient memory to load the entire file. It is extremely fast.

Medium

The keys are extracted from the file, and then quicksorted. A new file is then created by reading the records from the old file in the order of the quicksorted keys. This will occur if there is sufficient memory to load a tag file consisting of the sort keys and record numbers. It is reasonably fast.

Large

The file is quicksorted in chunks into two separate files. A disk-based poly-phase merge sort is then used to repeatedly merge these until a single sequence of sorted records results. This type of sort occurs in all other situations. It is quite slow.

 

The type of sort being used may be determined by specifying:

TRACE=AMSORT

in the system configuration file or with the command:

SET TRACE AMSORT

from the dot prompt.

Among other uses, the tracing display may be useful for analyzing the performance of a LARGE model sort (since the individual phases are shown).

As with BASORT under AMOS, records with the same sort key are guaranteed not to be re-ordered in the resulting file. This also applies to direct memory sorting of files in MEM:.

The normal limit on memory allocation with the MALLOCSORT option is 8MB, but you can change this with the MALLOCLIMIT=<bytes>{K}{M} parameter in the miame.ini file. Bumping this up, to say, 32MB, on a typical modern desktop PC can make a huge difference when sorting files larger than 8MB (but smaller than the limit you specify).

If you’re looking for a more powerful sort routine (i.e. faster, able to handle more keys, able to perform data massaging beyond mere sorting, etc.) we have licensed the OptTech Sort routine and have customized it to understand most of our data structures. Aside from offering additional keys and functionality, it is considerably faster at sorting large files than BASORT. It can also perform fancy data manipulation tricks, like removing duplicate records, adding summary records, creating tag files, etc. But, the calling interface is different, it does not handle certain file types, and there is a licensing fee. If interested, please contact us for more details.

The absolute limit for BASORT file size is 2GB. Above that, you must use the OptTech Sort.

Troubleshooting tip: if you're having problems with large sorts failing, the first recommendation is to activate and configure the MALLOCSORT and MALLOCLIMIT options so that the sort can be performed using the Medium, or preferably, the Small algorithm, which is not only the fastest but the simplest and thus the least likely to run into complications.

 

History

2009 January A-Shell 5.1.1135

BASORT now includes three new sort types: {{Caution: this note is referred to above }}

3: Integer (I1, I2, or I4)

4: Natural Sort

5: Case insensitive Natural Sort

Warning about using signed integer sort keys (B5, I1, I2 and I4) in files with a control record: traditionally the technique for preventing the control record from getting sorted out of place was to temporarily set it to all nulls, prior to sorting. But this is not adequate for signed integer sort keys, since nulls = 0, which is not at the bottom but in the middle of the range of signed values. The lowest possible signed integer value is -2^N where N is the # of bits in the field (e.g. -128, -32768, etc.) Similarly, for descending sort, the maximum value is 2^N-1.