There are various implementations and flavors of "regular expressions", including POSIX and Perl. We've chosen the Perl flavor because it is generally considered better (more features, more predictable, faster) and seems to be more popular: Java, Python and .NET, for example, use similar or derived implementations. The specific implementation we are using is called PCRE (Perl Compatible Regular Expressions); see the PCRE website for more details.
A-Shell versions prior to 7.0.1777 used the original version of PCRE; later versions may, depending on the OS platform, use the newer version 2 (aka PCRE2). Under Windows, the original library is housed in pcre3.dll (think of it as version 1.3) and the newer one in pcre2-8.dll. In either case, the appropriate version is included with the standard A-Shell distribution and loaded dynamically the first time you access a regular expression function. For A-Shell/Linux, the library is installed via the package pcre or pcre-devel (version 1) or pcre2 / pcre2-devel (version 2). You can determine which version your A-Shell executable requires by using the command 'ldd ashell' to list its dependencies.
For details on the syntax of regular expressions, see the Perl Regular Expression documentation or any number of web sites which offer tutorials and examples.
The two most common uses of regular expressions are to extend the power and flexibility of string searches, and to check for valid syntax in a string. Another possible use is for parsing and extracting specific portions of strings, using the capture group mechanism to return the subexpression matches of interest.
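The three uses just described can be sketched as follows. This is only an illustration: it uses Python's `re` module, which follows Perl-style syntax but is not PCRE, and the sample text, the patterns, and the `is_iso_date` helper are assumptions made up for the example, not A-Shell functions.

```python
import re

# Hypothetical sample text for the illustration.
text = "Order 1042 shipped on 2024-03-15 to warehouse B7."

# 1. Flexible search: find the first run of digits, wherever it occurs.
first_number = re.search(r"\d+", text).group(0)

# 2. Validation: check that an entire string has the expected shape.
def is_iso_date(s):
    """Return True if s looks like YYYY-MM-DD (shape only, not calendar validity)."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None

# 3. Extraction: capture groups return the subexpressions of interest.
year, month, day = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()
```

Here `first_number` is the string "1042", `is_iso_date("2024-3-15")` is false because the month has only one digit, and the three captured groups are "2024", "03" and "15".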
The main downside of regular expressions is that they are rather cryptic, and a sufficiently complex expression can consume massive computing resources (although that is generally not an issue in common usage). As an example, a simple (incomplete) regular expression to match a word-delimited email address within a larger string is:
"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
A more complete expression for validating email addresses (based on RFC 2822) is:
"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|""(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"
While it may be doubtful that many users would be able to conjure up such an expression during an ad-hoc search, application developers may build up a collection of useful regular expressions (often you can just copy them from helpful websites devoted to the subject) that can be used for common search or validation purposes.
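As a rough illustration of how the simple email pattern above behaves in practice, the sketch below runs it through Python's `re` module (Perl-style syntax, but not PCRE and not an A-Shell function). The sample text is made up. Because the pattern lists only uppercase letters, it presumably relies on case-insensitive matching; `re.IGNORECASE` is the Python way to request that, and how the equivalent option is selected in A-Shell is not covered here.

```python
import re

# The simple email pattern from the text, written as a Python raw string.
SIMPLE_EMAIL = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# Hypothetical sample text containing two addresses.
text = "Contact sales@example.com or support@example.museum for help."

# Case-insensitive search across the whole string.
found = re.findall(SIMPLE_EMAIL, text, flags=re.IGNORECASE)

# The same search without the flag: lowercase addresses are not matched.
found_case_sensitive = re.findall(SIMPLE_EMAIL, text)
```

With the flag, `found` contains only "sales@example.com": the second address is skipped because the pattern allows at most four letters in the top-level domain, which is one reason it is described as incomplete. Without the flag, `found_case_sensitive` is empty.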
Regular expression processing internally consists of two separate operations:
• compiling the expression (checking for syntax errors)
• using the expression to match against subject string(s)
Because compiling the expression can be as CPU intensive as matching it against subject strings, both REGEX and INSTR() support means of reusing previously compiled expressions. In the default case, if the current pattern is identical to the previously used one, the previous compilation is reused automatically. This strategy works well when applying a single expression repeatedly against many subject strings (as when searching for a pattern in a text file). But it doesn't work so well if you are searching through a file or database and comparing each line/record against more than one regular expression. To maximize efficiency in such cases, as well as in cases where you have a collection of common patterns used throughout your application, you can precompile and store up to 20 patterns, which can then be used on demand without having to recompile them.
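The difference between the two reuse strategies can be sketched as follows. This is a conceptual model in Python, not A-Shell code: the `PatternCache` class, its method names and the slot numbering are all hypothetical, and the counter exists only to show how often a compile would be triggered.

```python
import re

MAX_SLOTS = 20  # the text above mentions storing up to 20 precompiled patterns


class PatternCache:
    """Hypothetical model of the two reuse strategies; not an A-Shell API."""

    def __init__(self):
        self._last_text = None      # pattern string used on the previous call
        self._last_compiled = None  # its compiled form
        self._slots = {}            # slot number -> compiled pattern
        self.compile_count = 0      # number of compilations, for demonstration

    def _compile(self, pattern):
        self.compile_count += 1
        return re.compile(pattern)

    def match(self, pattern, subject):
        """Default case: recompile only when the pattern text changes."""
        if pattern != self._last_text:
            self._last_compiled = self._compile(pattern)
            self._last_text = pattern
        return self._last_compiled.search(subject) is not None

    def precompile(self, slot, pattern):
        """Compile once and store the result under a slot number."""
        if not 1 <= slot <= MAX_SLOTS:
            raise ValueError("slot must be between 1 and %d" % MAX_SLOTS)
        self._slots[slot] = self._compile(pattern)

    def match_slot(self, slot, subject):
        """Match using a stored pattern; never compiles."""
        return self._slots[slot].search(subject) is not None


lines = ["alpha 1", "beta", "gamma 22"]

# One pattern against many lines: compiled once, then reused.
single = PatternCache()
for line in lines:
    single.match(r"\d+", line)

# Two patterns alternating per line: the "last pattern" cache is defeated.
alternating = PatternCache()
for line in lines:
    alternating.match(r"\d+", line)
    alternating.match(r"^b", line)

# The same two patterns stored in slots: compiled once each.
stored = PatternCache()
stored.precompile(1, r"\d+")
stored.precompile(2, r"^b")
hits = [(stored.match_slot(1, line), stored.match_slot(2, line)) for line in lines]
```

In this model the single-pattern loop compiles once, the alternating loop compiles six times (twice per line), and the slot-based version compiles only twice regardless of how many lines are processed.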
Note: the original implementation of REGEX in A-Shell 5.1.1100 treated a null pattern string as referring to the previously compiled pattern. This mechanism has been dropped, since it is somewhat confusing to implement at the application level, and also introduces the problem of having to specifically check for null patterns. The new implementation just compares the current pattern to the last one (for non-precompiled patterns) to determine when recompilation can be avoided. Null patterns return 0 (failed match) in all cases.
See the sample programs in EXLIB:[908,46].
See the thread "Substring search right-to-left" on the A-Shell Forum for an example of using REGEX to split a path spec into the directory and filename.
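For readers who just want the general idea, one way to split a path spec with a regular expression is sketched below. This is not the code from the forum thread: it is a Python illustration that assumes native-style paths using `/` or `\` as separators, and the `split_path` helper and the pattern are made up for the example.

```python
import re

# Group 1: everything up to and including the last separator (optional).
# Group 2: whatever follows it. The greedy .* is what pushes the split
# point to the right-most separator.
PATH_SPLIT = re.compile(r"^(.*[\\/])?([^\\/]*)$", flags=re.DOTALL)


def split_path(path):
    """Return (directory, filename); directory keeps its trailing separator."""
    m = PATH_SPLIT.match(path)
    return (m.group(1) or "", m.group(2))


unix_parts = split_path("/usr/local/bin/ashell")
windows_parts = split_path("C:\\vm\\miame\\dsk0\\test.run")
bare_parts = split_path("test.run")
```

Here `unix_parts` is `("/usr/local/bin/", "ashell")`, `windows_parts` is `("C:\\vm\\miame\\dsk0\\", "test.run")`, and a bare filename comes back with an empty directory.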