For what it's worth, along with the fix for the ++IFDEF <structname> issue (discussed
here), I've added the following stats to the end of the LSX file in the hopes of perhaps making it easier to identify compiler performance issues, or at least quantify them. (This is from your prccstdsp program compiled under Windows) ...
Performance Statistics *:
======================================================
Phase 0 (/p, /px prescan): 3.2812 secs
Shake out unused routines (/px): 0.0312 secs (3614 kept, 7717 dropped)
Phase 1 (main compilation): 2.8906 secs
Phase 2 (emit RUN code): 0.1406 secs
Total elapsed time: 16.4653 secs
Labels/Functions: 99984 searches, 312.5000 ms
Symbol Definitions: 1150955 searches, 4218.7500 ms
Variable Definitions: 261632 searches, 937.5000 ms
Structure Definitions: 15041 searches, 0.0000 ms
* Total elapsed time should be accurate. All other time values are based on
the Windows kernel + user CPU clock counters, theoretically measuring
CPU rather than elapsed time, but may not be precise enough to measure
individual searches accurately.
As the asterisked note indicates, unfortunately, measuring the time used in routines that take almost no time per call, but which get called tens of thousands of times, potentially adding up to real time, is not very reliable when the time for a single call isn't significantly longer than the clock resolution. The Windows CPU clocks use 100-nanosecond units, but it's not clear whether the actual resolution is that fine, or whether, for example, each search for a structure name in the structure definition index takes significantly less than that, thus making it look like the 15041 searches took no time at all. Also, the total CPU time for the 4 search routines itemized is very close to the total CPU time for the entire compilation, which is ridiculous, because there's a lot of other code in the compilation that wasn't being separately itemized. So that strongly implies that whether or not a routine gets properly measured is a bit like Russian Roulette (either it gets off for free because the clock didn't change, or it gets hit with a big charge, most of which should have gone against prior routines.
Another detail which stands out is that the initial pre-scan (Phase 0) actually took longer than the main compilation (Phase 1), which surprised me. Partly that may be because the pre-scan had to look at 3 times as many routines as the main scan. And partly it may be because the pre-scan is having to build these various index trees, offloading some of the work from the main scan (which mostly then was doing a lot of searches of those trees). Or maybe there is some inefficiency there which needs to be squeezed out. But here we can't blame the clock resolution because it's measuring once at the start and once at the end of each phase (which are plenty long enough to make the clock resolution a non issue).
Another surprising result is that the total elapsed time is so much more than the CPU time. That suggests that the OS devoted almost 2/3 of its time to other background processes. (Not entirely out of the question, especially with Windows, but still a bit surprising.)
Anyway, I'll probably post the update later tonight or tomorrow, but now that you have your
workaround, and considering how apparently rare it is for the original problem to actually be triggered, it's not clear there's any urgency.