Unable to create shared memory #9631 25 Jul 01 09:30 AM
Steven Shatz (OP), Member, Joined: Jun 2001, Posts: 713
Periodically (say, once a month), I or my customers get the following message when trying to invoke A-Shell on the customer's system:

[Unable to create shared memory]
Cannot open A-Shell queue system.

Our solution when this occurs has been to reboot Linux, but I am not sure a reboot is really necessary.

1. Why does this problem arise?
2. Is it normal to occur every 30-40 days?
3. What is the proper way to resolve this problem?
4. How can this problem be avoided?

I should note that we avoid rebooting Linux (Red Hat 6.2 at this site) whenever possible, since a reboot invariably results in printer lockups.

Re: Unable to create shared memory #9632 26 Jul 01 04:38 PM
Jack McGregor, Member, Joined: Jun 2001, Posts: 11,650
First, a little background:

If your miame.ini QUEUE= statement contains "MEM:" (instead of "DISK:"), then A-Shell (Unix/Linux only) uses shared memory instead of a shared disk file to store the QFLOCK.SYS table (which is used by XLOCK, FLOCK, and related locking subroutines, but not by LOKSER). The advantage of shared memory over a shared disk file is much faster access, which could be important on a large system if you use one of these locking subroutines frequently. The disadvantage is that shared memory is a little harder to work with, analyze, debug, and monitor than a disk file, which can simply be erased if all else fails.
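
Just as an illustration (the miame.ini path below is only an example; yours may differ), you can check which mode a system is using with something like:

    # Show the QUEUE= setting; adjust the miame.ini path for your installation
    grep -i '^QUEUE=' /vm/miame/miame.ini
    # QUEUE=MEM:...   -> QFLOCK.SYS table kept in shared memory
    # QUEUE=DISK:...  -> QFLOCK.SYS table kept in a shared disk file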

Now to answer your questions:

1. Why? I don't know. UNIX/LINUX maintains a table of processes which are "attached" to each shared memory segment, and the implication is that there is some kind of corruption, resource leakage or other limitation which is preventing another attachment from being registered.

2. There is no "normal" frequency for this problem, and it is actually quite rare. The few times it has been reported, it has been on large systems that have been up for a few weeks or more (which tends to reinforce the resource leakage theory.)

3. Obviously, rebooting will clear all memory. But a slightly less drastic approach would be to get all the A-Shell users to log out (which you would want to do anyway prior to a reboot.) You may find that QUTL.LIT, SYSTAT.LIT, KILL.LIT and the UNIX/LINUX ipcs and ipcrm utilities are helpful in the following ways:

SYSTAT.LIT will allow you to see who is logged in. (You're going to want to get them all to log out, so you may want to identify them.)

QUTL.LIT gives you various ways to see what locks are in use - both in overview with the STATUS command and by listing them individually, which may help you decide if there is a lot of garbage there.

KILL.LIT can be used to kill jobs that you can't get to log out in some other way.

ipcs displays information about the shared memory segments (as well as semaphores and message queues, neither of which A-Shell uses). If there are multiple shared memory segments, the A-Shell one can usually be identified by the combination of the owner (it will be an A-Shell user), the key (it will start with 0000), and the number of attachments, which should equal the number of A-Shell users currently running. (If it does not agree with the job count in SYSTAT, that would imply something has gone wrong with the attachment table in the operating system.)
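
For example, on Linux something like the following will list the segments (the column layout varies a bit from one system to another, so treat this only as a rough illustration):

    # List all shared memory segments on the system
    ipcs -m
    # Look for the segment whose owner is an A-Shell user and whose key
    # starts with 0000; its nattch count should match the SYSTAT job count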

If, after getting all the users out of A-Shell, the shared memory segment does not go away by itself (it should), then you can remove it manually using ipcrm. The syntax varies among UNIX flavors, but basically you just need to identify the segment (by its "id") that you want to delete.
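
On most Linux systems the removal looks roughly like this (the id is a placeholder; use the one reported by ipcs):

    # Remove the stale segment by its id (1234 is just an example)
    ipcrm shm 1234
    # On some systems the equivalent form is: ipcrm -m 1234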

Once deleted, you can start launching A-Shell sessions again, as the first one in will automatically create a new segment.

4. Since we don't know exactly what causes it, we don't know exactly how to avoid it. But it does seem that a periodic reboot (say, every Sunday night) would go a long way towards fixing any resource leaks. There may be a system configuration parameter which allows you to allocate more resources to shared memory attachments, but I'm not sure what it is. You can also switch to the DISK: version, but as mentioned above, that will create additional overhead that might be significant if you do a lot of XLOCK/FLOCK operations.
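
If you do go the scheduled-reboot route, one way to automate it (the day and time here are just examples) is a root cron entry along these lines:

    # Example: in root's crontab (crontab -e), reboot at 3:00 AM every Sunday
    0 3 * * 0 /sbin/shutdown -r now
    # On newer kernels you can also inspect the shared memory limits, e.g.:
    #   cat /proc/sys/kernel/shmmax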

P.S. Even though QFLOCK.SYS will not appear on disk (if you are using the shared memory option), there will be a locking file, qflock.lck, in the path specified by the QUEUE= statement. This file handles multi-user locking of the QFLOCK.SYS table. In rare cases, a job may hang while holding that lock, which would also prevent any new users from launching A-Shell (although the error message would not mention a problem with "creating shared memory", since the segment would have already been created). If you suspect this is the case, you can use the UNIX/LINUX lslk utility to see if there is a lock on that file, and if so, by whom. Then you can cross-reference that pid with the output of SYSTAT to see which job it is. If necessary, you can use KILL.LIT (or, as a last resort, kill -9) to remove the process that is locking the table.
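
For example (the qflock.lck name match and the pid shown are just placeholders):

    # See whether anyone is holding a lock on qflock.lck
    lslk | grep qflock.lck
    # Match the reported pid against SYSTAT, then as a last resort:
    kill -9 12345    # replace 12345 with the pid holding the lock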

Re: Unable to create shared memory #9633 27 Jul 01 09:25 AM
Steven Shatz (OP), Member, Joined: Jun 2001, Posts: 713
Thank you for your detailed and clear explanation of this problem. As a follow-up, we just discovered a defective memory chip on the system and replaced it, so that might have been the cause of the memory pool problems. Time will tell.

Coincidentally (?), one of the RAID disks started having problems at the same time. Though this was compensated for automatically by a spare on the RAID controller, perhaps this too was affecting memory? The problem disk will be replaced next week as well.

