Following is a reasonably complete answer to the question "What is the limit on ISAM file sizes?" Note that this discussion applies to both standard ISAM and ISAM-A.
• ISAM data file size limit: Both ISAM and ISAM-A (aka ISAM-PLUS) use internal 32-bit record pointers. Although it might be possible, with care, to treat them as unsigned, by convention they are signed, which imposes a limit of 2^31 (approximately 2 billion) records. If your record size were 512 bytes, that would impose a theoretical overall data file size limit of 2^31 records × 512 bytes = 1 TB.
• ISAM index file size limit: Both ISAM and ISAM-A also use internal 32-bit index pointers, although the index block size differs between the two. ISAM-A uses 1024-byte index blocks, imposing a theoretical IDX file size limit of 2 TB. ISAM 1.0 used 512-byte index blocks, while ISAM 1.1 is configurable (see the ISMBLD.LIT /B switch), supporting IDX block sizes from 512 to 16384 bytes (1 TB to 32 TB theoretical limits).
Note that the index size depends on the number of data records, the key size(s), and the amount of extra space allowed for performance and tree-balancing reasons. So in practice this may lower the theoretical limit on the number of data records considerably.
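The pointer arithmetic behind those limits can be sketched as follows (a minimal illustration of the figures above, not part of the ISAM implementation):

```python
# A signed 32-bit pointer can address at most 2^31 records or blocks.
MAX_POINTERS = 2 ** 31  # approximately 2.1 billion

TB = 2 ** 40  # one terabyte in bytes

def max_file_size(unit_bytes: int) -> int:
    """Theoretical file size limit for a given record/block size in bytes."""
    return MAX_POINTERS * unit_bytes

print(max_file_size(512) / TB)    # 1.0  -> 512-byte records: 1 TB data file
print(max_file_size(1024) / TB)   # 2.0  -> ISAM-A 1024-byte blocks: 2 TB IDX
print(max_file_size(16384) / TB)  # 32.0 -> ISAM 1.1 max block size: 32 TB IDX
```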
The above numbers are all theoretical limits. In practice, you will likely run into some severe performance issues long before you get to those sizes. Some of the factors to consider here are:
• File system performance on large files. All file systems slow down when dealing with very large files, because they must employ some kind of hierarchical pointer structure to locate the physical disk block associated with a particular logical position in the file. The details vary between file system types, but there doesn't seem to be a definitive reference on which file system(s) handle such large files best.
• Cache efficiency falls dramatically with such large files, because the nature of ISAM leads to random jumps all over the file. Other than the first couple of levels of the index, accesses in both the index and data files are likely to be uniformly spread, meaning that cache efficiency will not be much better than the raw ratio of available cache memory to the size of the files.
• Because of the cache problem, you can run up against the raw random-access performance of the disk drive. However, modern disk drives, including SSDs, are very fast, so hardware constraints are not as common as they used to be. Even the fastest drives cannot, of course, compete with the speed of the RAM cache.
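To get a feel for that cache-to-file-size ratio, here is a rough back-of-the-envelope model (the uniform-access assumption comes from the point above; the latency figures are illustrative assumptions, not measurements):

```python
# Rough model: with uniformly random access, the probability that a
# block is already cached is approximately cache_size / file_size.
def expected_access_us(cache_gb: float, file_gb: float,
                       ram_us: float = 0.001, disk_us: float = 100.0) -> float:
    """Expected per-access time in microseconds.

    ram_us and disk_us are illustrative latencies (RAM hit vs. an
    SSD random read), not measured values.
    """
    hit = min(cache_gb / file_gb, 1.0)
    return hit * ram_us + (1.0 - hit) * disk_us

# 16 GB of cache against a 1 TB file: a hit rate of ~1.6%, so the
# average access time is dominated almost entirely by the drive.
print(round(expected_access_us(16, 1024), 1))  # 98.4
```

The point of the model is that unless the cache approaches the file size, average access time stays pinned near the drive's raw random-access latency.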
Obviously you would want to go with the fastest possible drives and as much RAM as you can possibly get. Also, splitting the index and data across two drives would help considerably.
But the biggest help would come from restructuring your data to reduce the size of any individual file. Within reason, you will get much better performance by having more, smaller files.
For example, even though it might not be that elegant, if you had a transaction history file with 120 million records, you would probably get better performance by splitting it into 12 files of 10 million records each (one file per month).