File Max in ppn/dir #36078 11 Apr 23 03:42 PM
Frank (OP)
Member
Joined: Sep 2002
Posts: 5,450
Good day -

This question comes up every so often. On a Linux back-end, is there a limit to the number of files that should/could reside in a PPN/dir? I know under AMOS there was a point of diminishing returns when the directory of a PPN reached >x files; it just took too long for AMOS to locate files. There was also a risk of overwhelming the directory link file. Does this exist in Linux? I am embarking on a project where I might be storing large quantities of sequential XML files for online insurance claim record keeping.

TIA

Re: File Max in ppn/dir [Re: Frank] #36079 11 Apr 23 04:41 PM
Jack McGregor
Member
Joined: Jun 2001
Posts: 11,645
I think the same problem occurs, to varying degrees, in all filesystems. It's not very easy, though, to identify the magic limit. The basic issue is that directories are not indexed like ISAM files. In AMOS, I think there was just one link to the start of the directory, and each directory block was linked to the next. So not only is the search linear, but you've got a disk seek operation for each block (of N files). Newer filesystems have added various refinements, but I don't think any of them have resorted to full indexing. And memory mapping and disk caching have eliminated a lot of the physical disk seeks, but other than directly accessing a file (e.g. an OPEN statement, or command execution), most directory operations require a linear scan of the directory anyway. So it's more or less unavoidable that the larger the directory gets, the more overhead there will be.

Another consideration is how often the directory is accessed. Directories containing your main data files and programs (e.g. SYS:, CMD:, BAS:, [p,0], etc.) are going to have a lot of accesses, which on the upside means they'll remain cached but on the downside there's a lot of linear searching. Directories used for archival, or perhaps for reports, might get much less access so maybe you don't need to worry about them as much.

Personally, I would try to keep the active directories below, say, 2000 files. But I often run into sites that have ten or even a hundred thousand files in a directory. Doing a wildcard search like DIR ABCD*.RUN on a directory with fifty thousand files can take many seconds on a busy system, especially if the directory hasn't been fully cached.
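The wildcard-vs-direct-access difference is easy to demonstrate. Here's a minimal Python sketch (the file count, the RPT*.RUN naming, and the scratch directory are all made up for illustration): the wildcard match has to walk every directory entry, while a direct stat of a known name does not.

```python
import fnmatch
import os
import tempfile
import time

# Populate a scratch directory with many small files (hypothetical names).
tmp = tempfile.mkdtemp()
for i in range(5000):                       # modest; real sites hit 50k+
    open(os.path.join(tmp, f"RPT{i:05d}.RUN"), "w").close()

# Wildcard search: must examine all 5000 entries to find the matches.
t0 = time.perf_counter()
matches = [e.name for e in os.scandir(tmp)
           if fnmatch.fnmatch(e.name, "RPT004*.RUN")]
scan_time = time.perf_counter() - t0

# Direct access by exact name: no scan of the other entries is needed.
t0 = time.perf_counter()
os.stat(os.path.join(tmp, "RPT00042.RUN"))
stat_time = time.perf_counter() - t0

print(len(matches))   # → 100  (RPT00400.RUN through RPT00499.RUN)
```

On a directory this small both operations are fast, but the scan cost grows linearly with the entry count while the direct stat does not, which is the effect described above.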

Typically, the main cause behind these giant directories is the creation of report or other output files with unique names (timestamped, sequentially numbered, user-associated, etc.). Depending on the situation, a couple of suggestions for managing this would be:
  • Create one or more special ersatz directories, e.g. REPORTS: or LOGS: and direct your output files there. You can then set up a scheduled task to erase them after, say, a week, allowing you the ability to review/reprint the files for a reasonable period without letting them accumulate indefinitely.
  • Use native hierarchical directories, rather than PPNs, for such files. You can create a standard function to generate an appropriate directory tree (e.g. /REPORTS/CCYY/MM/) and just incorporate that into your standard open-file-for-output routine.
  • If you're in the hardware business, embrace the problem and use it to convince your customers to keep buying faster and bigger servers! laugh
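The second suggestion can be sketched in a few lines of Python (the "claims" category, the file name, and the root are hypothetical; a scratch directory stands in for the real /REPORTS root):

```python
import os
import tempfile
from datetime import date

def output_path(category: str, filename: str, root: str) -> str:
    """Build (creating it as needed) a <root>/<category>/CCYY/MM/ tree
    and return the full path for filename. Layout is illustrative."""
    today = date.today()
    subdir = os.path.join(root, category, f"{today:%Y}", f"{today:%m}")
    os.makedirs(subdir, exist_ok=True)      # no-op if the month dir exists
    return os.path.join(subdir, filename)

# Example: route a claim file into this month's directory.
root = tempfile.mkdtemp()
path = output_path("claims", "claim-000001.xml", root)
open(path, "w").close()
```

Folding a helper like this into the standard open-for-output routine means no single directory ever holds more than one month's worth of files.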

Re: File Max in ppn/dir [Re: Frank] #36081 11 Apr 23 05:34 PM
Frank (OP)
Member
Joined: Sep 2002
Posts: 5,450
Thanks for the reply Cap -

I agree 100% and have witnessed (and caused) some of the above. In some cases these sequential files need to be accessed in perpetuity and thus cannot be archived or deleted after useful lifespan.

I just recall (in Herman's time) there was a ,B,2 (two-byte, i.e. 64K) limit to the number of files that could physically reside in a directory. If this is basically a non-issue and it comes down to the speed of locating or opening a file in a busy directory, I am OK with that since these aren't report-type data files.

Re: File Max in ppn/dir [Re: Frank] #36083 11 Apr 23 07:04 PM
Jack McGregor
Member
Joined: Jun 2001
Posts: 11,645
I'm 100% sure that in modern filesystems there's not a 64K limit on the number of files, so you don't need to worry about that.

I would suggest going with the hierarchical dated directory tree structure (/category/CCYY/MM/xxx) for files that need to be maintained forever.

One other idea would be to periodically tar or zip them up, one category/month per tarball. That keeps the directory clean and also probably saves a lot of space.
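That periodic roll-up might look like this in Python (the directory layout and one-tarball-per-month naming are just assumptions for the sketch):

```python
import os
import tarfile
import tempfile

def archive_directory(src_dir: str, tar_path: str) -> int:
    """Roll every file in src_dir into one gzipped tarball, then remove
    the originals so the live directory stays small. Returns the number
    of files archived. A sketch only; naming is up to your own scheme."""
    names = sorted(os.listdir(src_dir))
    with tarfile.open(tar_path, "w:gz") as tf:
        for name in names:
            tf.add(os.path.join(src_dir, name), arcname=name)
    for name in names:                      # delete only after the tar closes cleanly
        os.remove(os.path.join(src_dir, name))
    return len(names)

# Example: archive a scratch "month" directory into a single tarball.
month_dir = tempfile.mkdtemp()
for i in range(3):
    open(os.path.join(month_dir, f"claim{i}.xml"), "w").close()
tarball = os.path.join(tempfile.mkdtemp(), "claims-2023-04.tar.gz")
count = archive_directory(month_dir, tarball)   # → 3
```

Gzip is especially effective on XML, so this addresses both the directory-size issue and the disk-space one.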


Moderated by  Jack McGregor, Ty Griffin 

Powered by UBB.threads™ PHP Forum Software 7.7.3