Regular Expressions Syntax
Oxygen XML Author Eclipse plugin uses the Java regular expression syntax, which is similar to the syntax used in Perl 5, with several exceptions. In particular, Oxygen XML Author Eclipse plugin does not support the following Perl constructs:
- The conditional constructs (?(condition)X) and (?(condition)X|Y).
- The embedded code constructs (?{code}) and (??{code}).
- The embedded comment syntax (?#comment).
- The preprocessing operations \l, \u, \L, and \U.
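As a minimal sketch of how these unsupported constructs surface in practice, the following Java snippet checks whether an expression compiles with java.util.regex. The class name UnsupportedConstructs and the helper compiles are hypothetical names chosen for illustration. The comments mode (?x) shown at the end is a standard Java feature offered here only as one possible substitute for (?#comment); it is not something the text above prescribes.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, for illustration only.
class UnsupportedConstructs {

    // Returns true if java.util.regex accepts the expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Perl-only constructs are expected to be rejected at compile time.
        System.out.println(compiles("a(?#comment)b"));  // embedded comment
        System.out.println(compiles("(a)?(?(1)b|c)"));  // conditional construct
        System.out.println(compiles("\\Uabc"));         // preprocessing operation

        // Comments mode: whitespace and "# ..." up to the line end are ignored.
        System.out.println("ab".matches("(?x) a # first letter\n b"));
    }
}
```

Checking an expression this way before pasting it into a search field can help when porting patterns written for Perl.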
When using regular expressions, note that some character sets defined by XPath, XML Schema, and Schematron differ slightly from the ones that Oxygen XML Author Eclipse plugin (Java) uses in text searches. The most common example is the \w and \W character sets. To ensure consistent results between the two, it is recommended that you use the following constructs:
- [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] instead of \w
- [\p{P}\p{Z}\p{C}] instead of \W
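The difference can be illustrated with a short Java sketch. It assumes that the XML Schema notion of a word character is "any character except punctuation (\p{P}), separators (\p{Z}), and other (\p{C})", as the constructs above express, and it spells that set as a negated Java character class. The negated class, the class name WordCharacterSets, and its helper methods are assumptions made for illustration, not syntax taken from the product.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class WordCharacterSets {

    // Java's default \w: ASCII letters, ASCII digits, and underscore.
    static final Pattern JAVA_WORD = Pattern.compile("\\w");

    // Assumption: a Java spelling of "any character except \p{P}, \p{Z}, \p{C}".
    static final Pattern SCHEMA_WORD = Pattern.compile("[^\\p{P}\\p{Z}\\p{C}]");

    static boolean isJavaWord(String s) {
        return JAVA_WORD.matcher(s).matches();
    }

    static boolean isSchemaWord(String s) {
        return SCHEMA_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";   // Latin small letter e with acute (a letter)
        String underscore = "_";    // classified as connector punctuation

        // A non-ASCII letter: outside Java's default \w, inside the Unicode-based set.
        System.out.println(isJavaWord(eAcute) + " " + isSchemaWord(eAcute));

        // The underscore: inside Java's default \w, excluded as punctuation.
        System.out.println(isJavaWord(underscore) + " " + isSchemaWord(underscore));
    }
}
```

The two characters fall on opposite sides of each definition, which is why the same \w expression can select different text in a Java search and in an XPath or XML Schema pattern.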
There are some other notable differences that may cause unexpected results, including the following:
- In Perl, \1 through \9 are always interpreted as back references. A backslash-escaped number greater than 9 is treated as a back reference if at least that many sub-expressions exist. Otherwise, it is interpreted, if possible, as an octal escape. In Java, octal escapes must always begin with a zero. Also in Java, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many sub-expressions exist at that point in the regular expression. Otherwise, the parser drops digits until the number is less than or equal to the existing number of groups or it is a single digit.
- Perl uses the g flag to request a match that resumes where the last match left off. In Java, this behavior is provided implicitly by the Matcher class: repeated calls to its find method resume where the last match left off, unless the matcher is reset.
- In Perl, embedded flags at the top level of an expression affect the whole expression. In Java, embedded flags always take effect at the point where they appear, whether they are at the top level or within a group. In the latter case, flags are restored at the end of the group just as in Perl.
- Perl is forgiving about malformed matching constructs, as in the expression *a, as well as dangling brackets, as in the expression abc], and treats them as literals. Java also accepts dangling brackets but is strict about dangling meta-characters such as +, ?, and *, and throws a PatternSyntaxException if it encounters them.
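The Java side of the differences listed above can be checked with a small sketch like the following. It uses only java.util.regex; the class name PerlJavaDifferences and the helpers findAll and compiles are hypothetical names introduced for illustration, and the sample expressions are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical class name, for illustration only.
class PerlJavaDifferences {

    // Collects successive matches; each find() call resumes after the previous match.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Returns true if java.util.regex accepts the expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Back references: only one group exists, so \11 is read as \1 followed by a literal 1.
        System.out.println("aa1".matches("(a)\\11"));

        // Resuming matches: repeated find() calls play the role of Perl's g flag.
        System.out.println(findAll("\\d+", "a1b22c333"));

        // Embedded flags: (?i) applies only from the point where it appears.
        System.out.println("aB".matches("a(?i)b"));
        System.out.println("AB".matches("a(?i)b"));

        // Inside a group, the flag is restored when the group ends.
        System.out.println("Ab".matches("(?:(?i)a)b"));
        System.out.println("AB".matches("(?:(?i)a)b"));

        // Dangling bracket is accepted; dangling meta-character is rejected.
        System.out.println(compiles("abc]"));
        System.out.println(compiles("*a"));
    }
}
```

When an expression that works in Perl fails or matches differently in a search, reducing it to one of these four cases is often a quick way to locate the cause.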