Advanced substitutions
Substring functions
The %subregex function supports a number of special replacements. These are (?MEMBER:ref), (?LIST:ref), (?INSTANCE), (?ERROR), (?DEBUG), and the regex conditional syntax (?(ref)true|false). All of these are case sensitive, as is the use of named references.
(?ERROR) adds any pattern error, and any recognized substring errors, to the beginning of the returned string.
(?DEBUG) adds extra information about each match, copy, and zScript call to the returned string. This information is in XML format and can be used to determine exactly what the pattern matched and what each match was replaced with.
(?INSTANCE) is replaced by the match counter: the number of times the pattern has matched so far.
(?MEMBER:ref) gives the position, within a list of alternatives, of the alternative that was matched for the reference.
(?LIST:ref) produces a |-separated list of every value captured to that reference so far.
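%subregex only runs inside CMud, so as an illustration here is a small Python sketch that emulates, under our reading of the descriptions above, how (?INSTANCE) and (?LIST:ref) would evaluate at each match. It is not CMud's implementation, and the function names emulate_instance and emulate_list are hypothetical. We assume the list grows by one entry per match, which is what the conditional examples in this article imply.

```python
import re


def emulate_instance(text: str, pattern: str) -> str:
    """Replace each match with the running match counter, like (?INSTANCE)."""
    counter = 0

    def repl(m: re.Match) -> str:
        nonlocal counter
        counter += 1  # 1 on the first match, 2 on the second, ...
        return str(counter)

    return re.sub(pattern, repl, text)


def emulate_list(text: str, pattern: str, group: int = 1) -> str:
    """Replace each match with a |-separated list of everything captured
    to `group` so far, like (?LIST:group) evaluated at the current match."""
    captured: list[str] = []

    def repl(m: re.Match) -> str:
        if m.group(group) is not None:  # the group may not take part in every match
            captured.append(m.group(group))
        return "|".join(captured)

    return re.sub(pattern, repl, text)
```

For example, emulate_instance("a-b-c", "-") gives "a1b2c", and emulate_list("xAyBz", "([A-Z])") gives "xAyA|Bz", because the second match sees both captured letters.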
Instancing
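Additionally, all references support specific and relative instancing: selecting the value a reference had on one particular match rather than on the current one. This is done by appending :instance to the reference, where instance is either a specific match number, such as 7, or an offset relative to the current match, such as -1 or +1.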
Examples
\'1:7' means the text captured to reference 1 on the 7th match.
(?MEMBER:3:-1) means the member value of reference 3 on the previous match.
(?LIST:x:+1) means the list for reference x on the next match.
The symbol $ can be used to indicate the final match. This is particularly useful with (?LIST) to get a complete list of everything captured to that reference.
(?LIST:7:$) means the list for reference 7 at the final match.
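To make instancing concrete, the Python sketch below (an emulation, not CMud code) resolves an instance suffix to an absolute match number and then substitutes the value a capture group had on that match. The names resolve_instance and sub_with_instance are hypothetical, and returning an empty string for an instance that does not exist (such as -1 on the first match) is an assumption.

```python
import re


def resolve_instance(spec: str, current: int, total: int) -> int:
    """Turn an instance suffix into an absolute, 1-based match number.
    '7' -> 7, '-1' -> previous match, '+1' -> next match, '$' -> final match."""
    if spec == "$":
        return total
    if spec.startswith(("+", "-")):
        return current + int(spec)  # relative to the current match
    return int(spec)  # specific match number


def sub_with_instance(text: str, pattern: str, group: int, spec: str) -> str:
    """Replace every match with what `group` captured on the match selected
    by `spec`; an instance that does not exist yields an empty string."""
    # Collect all matches first so that '+1' and '$' can look ahead.
    matches = list(re.finditer(pattern, text))
    captures = [m.group(group) or "" for m in matches]
    total = len(matches)
    counter = 0

    def repl(m: re.Match) -> str:
        nonlocal counter
        counter += 1
        n = resolve_instance(spec, counter, total)
        return captures[n - 1] if 1 <= n <= total else ""

    return re.sub(pattern, repl, text)
```

For example, sub_with_instance("a1b2c3", r"(\d)", 1, "-1") replaces each digit with the digit from the previous match, giving "ab1c2", while the suffix "$" gives "a3b3c3".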
Conditional Syntax
As with the standard regex conditional syntax, the condition is true when the reference was captured on the specified instance and false when it was not. Conditions can also test the match instance counter directly.
(?(>7)true) would substitute true when the instance counter is greater than 7.
(?(!=4)true|false) would substitute true on every match except the 4th, where it would substitute false.
The symbols ^ and $, meaning the start and end of the returned string, are also permitted in conditionals. This allows captured data to be placed at those locations.
(?($)(?LIST:4)) would put the list for reference 4 at the end of the string.
(?(^)(?LIST:4)) would put the list for reference 4 at the beginning of the string. Since that list is still empty at the start of the string, nothing would be added. An instance suffix is required to get a usable value, for example (?(^)(?LIST:4:$)).
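The counter-based conditions can be emulated the same way. This Python sketch handles only the two comparison forms shown above, > and !=, and ignores the ^ and $ forms; the names sub_conditional and parse_condition are hypothetical, and no other operators are assumed to exist.

```python
import operator
import re
from typing import Callable

# Only '>' and '!=' appear in the examples above; any other comparison
# operator would be an assumption and is deliberately left out.
OPS: dict[str, Callable[[int, int], bool]] = {
    "!=": operator.ne,
    ">": operator.gt,
}


def parse_condition(cond: str) -> tuple[Callable[[int, int], bool], int]:
    """Split a counter condition such as '>7' or '!=4' into (test, number)."""
    for sym in sorted(OPS, key=len, reverse=True):  # longest operators first
        if cond.startswith(sym):
            return OPS[sym], int(cond[len(sym):])
    raise ValueError(f"unsupported condition: {cond!r}")


def sub_conditional(text: str, pattern: str, cond: str,
                    if_true: str, if_false: str = "") -> str:
    """Emulate (?(cond)if_true|if_false) where cond tests the match counter."""
    test, value = parse_condition(cond)
    counter = 0

    def repl(m: re.Match) -> str:
        nonlocal counter
        counter += 1
        return if_true if test(counter, value) else if_false

    return re.sub(pattern, repl, text)
```

For example, sub_conditional("xxxxx", "x", "!=4", "T", "F") gives "TTTFT": every match becomes T except the 4th.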
MEMBER
The (?MEMBER) substitution is for use with patterns whose capture is one of a list of alternatives. It is functionally similar to the %ismember function, and was provided to speed up member lookups by eliminating the extra call to %ismember.
#SHOW %subregex("abcdef","(a|b|d|e)","(?MEMBER:1)")
displays: 12c34f
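As a cross-check of what (?MEMBER) computes, the same result can be reproduced in Python. In this sketch each alternative is wrapped in its own group, so Match.lastindex reports which alternative matched. The helper sub_member is hypothetical, and it treats the alternatives as literal strings (via re.escape), whereas a CMud pattern may use regex alternatives.

```python
import re


def sub_member(text: str, alternatives: list[str]) -> str:
    """Emulate %subregex(text, "(alt1|alt2|...)", "(?MEMBER:1)"): each match
    is replaced by the 1-based position of the alternative that matched."""
    # "(a)|(b)|(d)|(e)": exactly one group takes part in each match,
    # so m.lastindex is that alternative's position in the list.
    pattern = "|".join(f"({re.escape(alt)})" for alt in alternatives)
    return re.sub(pattern, lambda m: str(m.lastindex), text)


print(sub_member("abcdef", ["a", "b", "d", "e"]))  # prints 12c34f
```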
User comments
Seb: Wed Jul 23, 2008 10:12 am
You might want to explain the term 'instancing'. I worked out that you meant a form of 'slicing' where the result is one instance, but non-programmers don't stand a chance of understanding this article at the moment!
At least one example using each of these special replacements, plus the conditional syntax, would also be _very_ helpful.
That said, thanks for the documentation.

Vijilante: Thu Jul 24, 2008 11:15 pm
Of General note
I still have quite a bit to do on the documentation here. I put this up rather hastily to make sure that people could at least get a sense of the features currently available. I have some other more detailed posts in the CMud Beta Forum that discussed some of the design items for these features.
In a few of my tests I am finding some features lacking, and will be aiming at expanding and improving the points I consider to be failures. There is more yet to come as I work on further regex projects.
In specific
The "Instance" is a match counter. When your pattern has matched 12 times the instance will be 12; at the start of the string the instance is 0, as the pattern hasn't matched yet. Maybe I should just select a different word, but then I would likely also want to select a different code for the parser to recognize. I really think having distinct words for 'match' and 'how many matches' is needed. Simply looking up 'instance' in a dictionary and using 27,323 brain cells to consider the current (implied) implications should be enough for anyone that has English as a fourth language. I am pretty sure that out of the googol of brain cells each person is born with they can still muster enough to understand this word.

Seb: Fri Jul 25, 2008 10:23 am
Yes, it's not 'instance' that is terribly obscure (though you could probably use 'occurrence', which has fewer meanings), it is 'instancing'! Normally instancing, in computers, means creating an instance, doesn't it? Here you mean 'selecting an instance' (or occurrence) by that word.
But it's not really the vocab that is the problem, it's the fact that the terms are not explained, or that the explanation itself uses strange vocab:
Quote:
Instancing
Additionally all references support specific and relative instancing.

What is a 'reference' in this instance? (Ha! No pun intended!) OK, I can deduce what you mean by reference, but it's not automatic, i.e. it requires a few seconds thought. And remember, this manual should be for non-programmers!