Quote

I performed the following sequence:

Ran a program with the -hei command-line switch activated (actually, I activated it at runtime using MX_CLFLAGS).
Sent a KILL to the program from dsk0:1,2.
The program remains in SYSTAT as a zombie.
The GUI remains on the screen.
The process was killed, and behind the GUI it dropped back to the Linux shell.

Is this the expected behaviour? I believed that KILL would produce an exit code that -hei would trap.


The -hei switch causes the SIGHUP signal to immediately generate error 250 in the running program. This is in contrast to the default behavior, in which the receipt of the SIGHUP signal will initially just roll the process into background, allowing it to continue running until it needs something from the terminal (at which point the error 250 will be triggered).
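
To make the difference concrete, here is a small Python model of the two SIGHUP behaviors. It is only a sketch under stated assumptions, not A-Shell code: the `Job` class and its methods are hypothetical, and the only parts taken from the description above are the error number 250 and the rule that, without -hei, the error is deferred until the program needs the terminal.

```python
from dataclasses import dataclass
from typing import Optional

ERR_SIGHUP = 250  # BASIC error generated for SIGHUP


@dataclass
class Job:
    """Hypothetical model of a job's SIGHUP handling (not A-Shell code)."""
    hei: bool = False                  # is the -hei switch active?
    in_background: bool = False        # rolled into background by SIGHUP
    pending_error: Optional[int] = None

    def on_sighup(self) -> None:
        if self.hei:
            # With -hei: error 250 is raised immediately.
            self.pending_error = ERR_SIGHUP
        else:
            # Default: keep running in the background for now.
            self.in_background = True

    def needs_terminal(self) -> None:
        # Default mode: the deferred error fires once the terminal is needed.
        if self.in_background and self.pending_error is None:
            self.pending_error = ERR_SIGHUP


if __name__ == "__main__":
    for hei in (True, False):
        job = Job(hei=hei)
        job.on_sighup()
        print(hei, job.pending_error)   # 250 with -hei, None without
        job.needs_terminal()
        print(hei, job.pending_error)   # 250 in both cases
```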

Assuming you have TRACE=SIGHUP,BASERR,INOUT active in miame.ini (as I always recommend to everyone), you would normally be able to see this sequence of events in the ashlog file. There are, however, two complications in your scenario:

1) The KILL command doesn't send a SIGHUP signal. By default it sends SIGINT (^C); with the /K switch it sends SIGTERM. The -hei switch does not affect the handling of SIGTERM, which always generates error 251 immediately. In other words, SIGTERM always behaves the way SIGHUP does when -hei is active, except that the error code is 251 instead of 250.
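
The KILL switch mapping and the SIGTERM rule can be sketched with ordinary POSIX signals in Python. This is illustrative only: `kill_signal`, `on_sigterm` and `pending_error` are hypothetical stand-ins for what A-Shell does internally, and for simplicity the process signals itself rather than another job (POSIX only).

```python
import os
import signal
import time
from typing import Optional

pending_error: Optional[int] = None  # hypothetical "pending BASIC error" slot


def kill_signal(k_switch: bool) -> signal.Signals:
    # Plain KILL sends SIGINT (^C); KILL .../K sends SIGTERM.
    return signal.SIGTERM if k_switch else signal.SIGINT


def on_sigterm(signum, frame) -> None:
    # SIGTERM always maps to error 251; the -hei switch plays no part here.
    global pending_error
    pending_error = 251


def demo() -> Optional[int]:
    signal.signal(signal.SIGTERM, on_sigterm)
    # Stand-in for "KILL TSKADJ/K": send the /K signal to this same process.
    os.kill(os.getpid(), kill_signal(k_switch=True))
    # Python runs signal handlers between bytecodes, so wait briefly for it.
    deadline = time.monotonic() + 1.0
    while pending_error is None and time.monotonic() < deadline:
        time.sleep(0.001)
    return pending_error


if __name__ == "__main__":
    print(demo())
```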

2) In the GUI environment, if you are waiting on some user interface action (for example, in an XTREE, EVENTWAIT, or INFLD call), a BASIC error may abort the server-side program, but that doesn't necessarily have any effect on the client, which at that moment is waiting on the user and not paying any attention to the server connection. If the server side actually terminates the connection, then yes, the client will most likely detect that immediately, or at least within a short time. But error 251 by itself isn't going to terminate the session.

Here's an example of the scenario you describe. Process 5992 (TSKADJ) is running XTRA2, waiting on an XTREE operation. From another process, 4975, I executed KILL TSKADJ/K, which sent the SIGTERM signal (15). On receipt, the target process generates BASIC error #251, causing the server side to abort the operation it was in (waiting on a response from the client in XTREE). That leads to the spurious error message (.. response not formatted properly), because the client doesn't know about any of this and hasn't sent back any response yet. Upon exiting from the XCALL XTREE statement, the error handler is invoked; since no error trapping was active, it terminates the program and then exits from A-Shell. But because that didn't terminate the SSH connection (A-Shell exited to the bash shell script), the client is still sitting there, blissfully unaware of all this server-side activity, waiting on the user to do something.

Code
18-Jan-23 13:31:29 [p4975-69]<KILL:3b69> Sending SIGTERM to TSKADJ
18-Jan-23 13:31:29 [p4975-69]<KILL:3b99> MX_KILL signal 15 sent to pid 5992, rc=0
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c> SIGTERM trapped on: TSKADJ (jackmc)
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c>  (Setting basic error #251) (m1.tinstate:1)
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c>  SIGINT sent, rc=0
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c> XTREE warning: ATE response not formatted properly!
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c> WARNING: PCKLST/XTREE fatal error code -9
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c> ABasic Error #251 (Termination signal (SIGTERM) received) at location counter 42C (untrapped)
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c>  Job in kbd wait; cleaning up qflock & exiting.
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c> Out: Nodes Remaining = 48P/89L, 2 reads, 0 writes, 219 kbd bytes
18-Jan-23 13:31:29 [p5992-88]<XTRA2:42c>  After qpurge & qclose
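
If you want to pull a sequence like this out of a busy ashlog, filtering by process ID helps. The Python sketch below assumes the line layout seen in this excerpt (date, time, `[p<pid>-<n>]`, `<PROGRAM:location counter>`, message); the meaning of the number after the PID isn't stated here, so it is parsed but ignored, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Layout inferred from the excerpt above (an assumption, not a spec):
# "DD-Mon-YY HH:MM:SS [p<pid>-<n>]<PROGRAM:locctr> message"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-[A-Za-z]{3}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[p(?P<pid>\d+)-(?P<n>\d+)\]<(?P<prog>[^:>]+):(?P<lc>[0-9A-Fa-f]+)>\s*(?P<msg>.*)$"
)
# Case-sensitive on purpose: matches "ABasic Error #251 ..." but not the
# earlier "(Setting basic error #251)" trace line.
ERROR_RE = re.compile(r"Error #(\d+)")


@dataclass
class LogEntry:
    timestamp: str
    pid: int
    program: str
    location: int   # location counter (hex in the log)
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return LogEntry(
        timestamp=f"{m['date']} {m['time']}",
        pid=int(m["pid"]),
        program=m["prog"],
        location=int(m["lc"], 16),
        message=m["msg"],
    )


def entries_for_pid(lines: Iterable[str], pid: int) -> List[LogEntry]:
    return [e for e in map(parse_line, lines) if e is not None and e.pid == pid]


def basic_errors(entries: Iterable[LogEntry]) -> List[int]:
    return [int(m.group(1)) for e in entries if (m := ERROR_RE.search(e.message))]
```

For example, `entries_for_pid(open("ashlog.log"), 5992)` would return just the XTRA2 lines; the file name there is only a placeholder.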


Unfortunately, there isn't any straightforward way for the server to tell the client to abort a GUI operation like XTREE, since the client is waiting on client-side events and isn't expecting anything from the server. About the only option here would be to terminate the connection. (In your example, the A-Shell process terminated and dropped back to the Linux shell, but the SSH connection remained alive. If it had exited entirely, the session would have been closed.) So one possibility would be for you to close the connection when you exit A-Shell. I'm guessing that you probably do that already for most user sessions, but not for administrative/developer sessions.

I agree that the situation isn't ideal from the user perspective. (It would be better to get some visual feedback about the error immediately.) However, in most cases, the only error that seems at all likely to occur while the user is in a GUI client-side wait state (perhaps sitting in an XTREE-based menu) is a spontaneous disconnect. And that should cause an immediate reaction from the client. Note that if you check the "Close Window on Disconnect" option on the Misc tab of the ATE configuration, it will display a message box informing the user that the connection has been lost. Otherwise the window will probably just disappear, leaving the user wondering what happened.

The situation you describe (sending a kill signal from another job) is (hopefully) rather rare, and could be dealt with by having your error trapping routine terminate the session on error 251.
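
The decision such an error trap would make can be sketched as follows. This models only the logic and is not A-Shell BASIC: `terminate_session` and `default_handler` are hypothetical hooks you would supply (for example, something that closes the connection rather than just dropping back to the shell).

```python
from typing import Callable, List

ERR_SIGTERM = 251  # BASIC error generated on SIGTERM (e.g. KILL .../K)


def make_error_trap(
    terminate_session: Callable[[int], None],
    default_handler: Callable[[int], None],
) -> Callable[[int], None]:
    """Return a hypothetical error trap that ends the session on error 251."""
    def trap(err: int) -> None:
        if err == ERR_SIGTERM:
            # End the whole session (connection included) so the GUI client
            # notices, instead of merely falling back to the shell.
            terminate_session(err)
        else:
            default_handler(err)
    return trap


if __name__ == "__main__":
    log: List[str] = []
    trap = make_error_trap(
        terminate_session=lambda e: log.append(f"terminate session ({e})"),
        default_handler=lambda e: log.append(f"normal handling ({e})"),
    )
    trap(251)
    trap(20)  # any other error number
    print(log)
```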

It's not out of the realm of possibility to upgrade the various GUI operations to monitor the TCP channel to the server while waiting on the user, which would theoretically allow the server to interrupt the client. But it would be complicated. For perspective, the comparison that is often cited is a web app: if the back end loses its connection or aborts, the client typically won't realize it either, until or unless there is a client-side timeout, which you can implement in ATE as well (see AG_IATIMEOUT).
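
For illustration, here is roughly what "monitor the TCP channel while waiting on the user" could look like in a generic client, written in Python with `select`. It is a hypothetical sketch, not how ATE works: `wait_for_event` watches a user-input channel and the server socket at the same time, and its optional timeout plays the role of a client-side idle timeout like the one AG_IATIMEOUT provides in ATE.

```python
import select
import socket
from typing import Optional, Union

FileLike = Union[int, socket.socket]


def wait_for_event(user_input: FileLike, server: socket.socket,
                   timeout: Optional[float] = None) -> str:
    """Hypothetical client wait that watches the user AND the server.

    Returns "server_closed", "server_data", "user", or "timeout".
    """
    readable, _, _ = select.select([user_input, server], [], [], timeout)
    if not readable:
        # Nothing happened before the idle timeout expired.
        return "timeout"
    if server in readable:
        # Peek so any real server message stays queued for normal processing;
        # an empty read means the server side closed the connection.
        data = server.recv(1, socket.MSG_PEEK)
        return "server_closed" if data == b"" else "server_data"
    return "user"
```

A client built along these lines could abandon a pending XTREE-style wait as soon as `server_closed` comes back, rather than discovering the dead connection only when it next tries to send something.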