CMUD Beta Forum
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Mon Mar 24, 2008 10:30 pm

%subregex thoughts
 
After running some timing tests, I realized I really have to rewrite how backreferences are handled to smooth them out. Right now they can cause a rather significant slowdown. I doubt I could finish an optimized rewrite of that handling before Zugg is ready to release a new version, but I will work on it for a later version.

I have also been kicking around the idea of putting something like the pattern conditional into the substitution string. For those who don't know, (?(1)yes|no) in a pattern matches 'yes' if backreference 1 was matched, and 'no' otherwise. Used in a substitution, it would insert those words under the same condition.

The reason I started thinking about that addition is that there are far too many times when I am matching against a list and then end up having to do %ismember on the matched part. This is essentially double matching. If I were instead able to use something like "(?:()item1|()item2|()item3)" for the pattern and then "(?(1)1)(?(2)2)(?(3)3)" for the substitution, then it's all done in one step. Obviously the conditional syntax doesn't quite lend itself to being an ismember, but since it is also an if, I think it might be the better way to add it. Another benefit I see is that it should eliminate some of the really messy situations where quotes have to be nested with %char(34) or other syntaxes.
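Python's `re.sub` has no conditionals in its replacement template either, but a replacement callback can emulate the proposed `(?(1)1)(?(2)2)(?(3)3)` substitution. A minimal sketch, using Python's `re` as a stand-in for PCRE; `ITEMS`, `ITEM_PATTERN` and `member_number` are hypothetical names:

```python
import re

ITEMS = ["item1", "item2", "item3"]

# Builds "(?:()item1|()item2|()item3)". Only the empty group in front of the
# alternative that actually matched takes part in the match, so its value is
# "" while the groups in front of the other alternatives stay None.
ITEM_PATTERN = re.compile("(?:" + "|".join("()" + re.escape(item) for item in ITEMS) + ")")


def member_number(m):
    """Emulate the substitution "(?(1)1)(?(2)2)(?(3)3)": emit n if group n took part."""
    return "".join(str(n) for n in range(1, len(ITEMS) + 1) if m.group(n) is not None)


print(ITEM_PATTERN.sub(member_number, "got item2 and item3 but not item4"))
# got 2 and 3 but not item4
```

Without word boundaries, `item1` would also match inside a longer word such as `item10`, so a real list would want `\b` around the alternation.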

I am wondering what people think about the addition before I really start coding any of it. I will also have to double-check that I am properly notified when a capture is skipped or bypassed in order to make that addition. I really have to check on that anyway, since I didn't put any handling for it in originally, and it might cause a problem depending on how, and whether, I am notified. I will also look into whether there is some way a list capture can tell me which of its items was used, but I won't try too hard for that if it is not obvious.
_________________
The only good questions are the ones we have never answered before.
ReedN
Wizard


Joined: 04 Jan 2006
Posts: 1279
Location: Portland, Oregon

Posted: Mon Mar 24, 2008 11:09 pm
 
I try to avoid backreferences on the principle that they slow things down. I've never had an issue doing the remaining checking in the trigger itself.
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

Posted: Mon Mar 24, 2008 11:25 pm
 
Vijilante is talking about "backreferences" within the substitution string, and since the whole purpose of %subregex is to replace a regular expression with other text, using the originally matched text is going to be *very* common. What ReedN is saying is like someone saying that they don't use %1..%99 in their trigger scripts. So I think you are missing the point of Vijilante's post. This has nothing to do with triggers. The %subregex function is like the "preg_replace" function in PHP and handling backreferences in the substitution string is very important.

Vijilante: I'll be happy to make any mods that you come up with. Right now I think it works fine for most people and is still a big improvement over the old version, so I wouldn't stress over it too much.

And I'd worry about the normal backreference stuff before worrying about pattern conditionals. I've never seen pattern conditionals before... does this syntax exist in any other programs? I'd like to keep %subregex working like normal PCRE or preg_replace as much as possible.
ReedN
Wizard


Joined: 04 Jan 2006
Posts: 1279
Location: Portland, Oregon

Posted: Mon Mar 24, 2008 11:38 pm
 
Ah yes, I misread Vijilante's post, I was thinking of triggers.
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Tue Mar 25, 2008 1:14 am
 
The conditional syntax is the same as is used in a pattern. A common usage is to put the capture you will condition on inside a lookahead or lookbehind. The capture then consumes none of the string but still gets set when the lookaround succeeds, and the rest of the pattern changes based on the condition.

Another common usage is the nested parenthesis example that is in nearly every regex book at this point.
PCRE documentation wrote:
Consider the following pattern, which contains non-significant white space to make it more readable (assume the PCRE_EXTENDED option) and to divide it into three parts for ease of discussion:

( \( )? [^()]+ (?(1) \) )

The first part matches an optional opening parenthesis, and if that character is present, sets it as the first captured substring. The second part matches one or more characters that are not parentheses. The third part is a conditional subpattern that tests whether the first set of parentheses matched or not. If they did, that is, if subject started with an opening parenthesis, the condition is true, and so the yes-pattern is executed and a closing parenthesis is required. Otherwise, since no-pattern is not present, the subpattern matches nothing. In other words, this pattern matches a sequence of non-parentheses, optionally enclosed in parentheses.
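Python's `re` module also supports `(?(1)...)` conditionals inside a pattern, so the PCRE example can be tried directly. A small sketch: `re.VERBOSE` plays the role of PCRE_EXTENDED, and `fullmatch` is used so that a bare substring, such as the `abc` inside `(abc`, does not count as a match:

```python
import re

# Same three parts as the PCRE example; VERBOSE makes the spaces insignificant.
OPTIONALLY_PARENTHESIZED = re.compile(r"( \( )? [^()]+ (?(1) \) )", re.VERBOSE)

for subject in ["(abc)", "abc", "(abc", "abc)"]:
    ok = OPTIONALLY_PARENTHESIZED.fullmatch(subject) is not None
    print(f"{subject!r}: {'match' if ok else 'no match'}")
```

Only the balanced `(abc)` and the bare `abc` match; an unbalanced parenthesis on either side fails.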


I was also just thinking of 2 things that would be pretty easily added at the same time. However they are not part of the regex syntax.

One would be a list of everything that a given capture caught during the entire string. For example "abc 123 456 def 789 ghi 034" with the pattern "(\d+)" could supply a substitution item of "123|456|789|034". It would be limited to only being tacked on to the beginning or end of the final string, but I can think of 1 or 2 cases where I thought to myself, if only I knew what it matched each time.
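Outside of a substitution, the list itself is a single pass over all matches. A minimal Python sketch of the idea; `capture_list` and its `sep` parameter are hypothetical, with `sep` standing in for the separator string mentioned below:

```python
import re


def capture_list(text, pattern, group=1, sep="|"):
    """Join every value that capture `group` took across the whole string.

    Matches where the group did not participate are skipped.
    """
    values = (m.group(group) for m in re.finditer(pattern, text))
    return sep.join(v for v in values if v is not None)


print(capture_list("abc 123 456 def 789 ghi 034", r"(\d+)"))  # 123|456|789|034
```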

The other would be a sort of debug. Again it would be tacked on to the beginning or end of the final substituted string. The format would be a record variable with keys of each match and values of what was put in its place.

If I did either or both of those secondary ideas I would probably just pick one of the sides for where they went, but allow them to be specified at any point in the substitution string. Whatever format I came up with for them would allow specifying a separator string. Perhaps it might be possible to have some sort of previous match (what did \1 mean 3 matches ago) syntax too, but I would really have to think about the requirements for that.
JQuilici
Adept


Joined: 21 Sep 2005
Posts: 250
Location: Austin, TX

Posted: Tue Mar 25, 2008 4:43 am
 
While you're working on the backreferences, please take a look at the %pat() function, as well - I imagine that they are very closely related. The following code used to work correctly in 2.18:
Code:
#show %subregex("The quick brown fox jumped over the lazy dog","(a|e|i|o|u)","%upper(%pat(1))")

and produced
Code:
ThE qUIck brOwn fOx jUmpEd OvEr thE lAzy dOg

but in 2.20, it just deletes all the vowels. This may be a problem with the reference, or subregex may simply be evaluating its third argument too early in 2.20.

(I just found this while working through examples from the manual page for %subregex. The second example on that page fails, too.)
_________________
Come visit Mozart Mud...and tell an imm that Aerith sent you!
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Tue Mar 25, 2008 10:14 am
 
The use of %pat was completely removed. When I wrote the subregex routine I had no way to know how to enable it, so I wrote a rather simple backreference replacement instead. Zugg decided he liked eliminating the confusion with %pat and kept what I wrote. The %pat confusion is that when you use %subregex from a trigger, %pat will likely already hold values from the trigger itself, and you may even be mixing some of those references in with the captures from the subregex.

I wrote a reasonably lengthy comment into the help that explained the new usage. Your example properly changed for 2.20 is
Code:
#show %subregex("The quick brown fox jumped over the lazy dog","(a|e|i|o|u)","%upper(\1)")
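The 2.20 behaviour is easier to see as two stages, assuming the evaluation order described later in this thread: the regex engine first expands `\1` inside the substitution text, and only afterwards is the whole result handed to the zScript parser. A `%pat(1)` in the third argument therefore has nothing from this match to refer to. A rough Python model; `toy_eval` is a hypothetical stand-in for the zScript parser that only understands `%upper(...)`:

```python
import re

# Stage 1: the regex engine only expands \1 in the substitution text;
# "%upper(...)" is still plain text at this point.
stage1 = re.sub(r"(a|e|i|o|u)", r"%upper(\1)", "The quick brown fox jumped over the lazy dog")


def toy_eval(script):
    """Stage 2: a toy stand-in for the zScript parser that only knows %upper(...)."""
    return re.sub(r"%upper\(([^()]*)\)", lambda m: m.group(1).upper(), script)


print(toy_eval(stage1))  # ThE qUIck brOwn fOx jUmpEd OvEr thE lAzy dOg
```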
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Tue Mar 25, 2008 12:19 pm
 
This is weird as a bag of snakes; I replied to a bunch of threads earlier (or I thought I did), including this one, yet my reply's now gone. I must've made the whole thing up =/

The simple answer, as Viji says, is just to replace %pat(1) with \1.
_________________
Rorso's syntax colouriser.

- Happy bunny is happy! (1/25)
JQuilici
Adept


Joined: 21 Sep 2005
Posts: 250
Location: Austin, TX

Posted: Tue Mar 25, 2008 2:21 pm
 
Aha....that'll teach me to use the help file from inside CMUD rather than coming to the website. Makes sense now.
Larkin
Wizard


Joined: 25 Mar 2003
Posts: 1113
Location: USA

Posted: Tue Mar 25, 2008 5:34 pm
 
The help files and web site draw from the same source, except that the web site shows these additional comments. I'm sure it'll get rolled into the help entry soon, especially after this thread, eh?
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Tue Mar 25, 2008 6:13 pm
 
I'm surprised Viji didn't just add it to the article, actually. I'm loath to do it myself because I haven't looked into \k and \K thoroughly.
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Tue Mar 25, 2008 9:05 pm
 
I didn't add it to the help because 2.18 is the public version and that is what the help has to reflect. I put the full explanation there as a comment to document the changes for the beta version, figuring that beta testers frequent the forums and would notice there was an update.

In any case, since no one seems to have an opinion about the idea of adding the conditional, I will let it stew some more while I look into how hard it would be to produce a number return from a capture pattern like (abc|def|ghi). That would be the equivalent of doing %ismember at the same time, which was really what got me thinking about it in the first place.
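For comparison, Python can already report a number from an alternation if each alternative gets its own group: `Match.lastindex` gives the index of the last group that closed. A sketch of that approach, which is a different technique from the single-group `(abc|def|ghi)` form being considered; `ALTERNATIVES` and `item_number` are hypothetical names:

```python
import re

# One group per alternative, so the match itself reports which list item
# was hit (1-based, like %ismember). lastindex is only meaningful here
# because the groups are flat, not nested.
ALTERNATIVES = re.compile(r"(abc)|(def)|(ghi)")


def item_number(m):
    return str(m.lastindex)


print(ALTERNATIVES.sub(item_number, "abc xyz ghi def"))  # 1 xyz 3 2
```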
JQuilici
Adept


Joined: 21 Sep 2005
Posts: 250
Location: Austin, TX

Posted: Tue Mar 25, 2008 9:23 pm
 
Vijilante wrote:
I was also just thinking of 2 things that would be pretty easily added at the same time. However they are not part of the regex syntax.

One would be a list of everything that a given capture caught during the entire string. For example "abc 123 456 def 789 ghi 034" with the pattern "(\d+)" could supply a substitution item of "123|456|789|034". It would be limited to only being tacked on to the beginning or end of the final string, but I can think of 1 or 2 cases where I thought to myself, if only I knew what it matched each time.

The other would be a sort of debug. Again it would be tacked on to the beginning or end of the final substituted string. The format would be a record variable with keys of each match and values of what was put in its place.

If I did either or both of those secondary ideas I would probably just pick one of the sides for where they went, but allow them to be specified at any point in the substitution string. Whatever format I came up with for them would allow specifying a separator string. Perhaps it might be possible to have some sort of previous match (what did \1 mean 3 matches ago) syntax too, but I would really have to think about the requirements for that.

I think this way lies madness.

As you mention, the trouble is that (a) you're inventing a new non-standard syntax, and (b) you really only want to dump this at the end of the line, not on each match. It's not clear what the right way to express this would be.

However, it should be possible to achieve all the results that you desire, plus a great many more (e.g. 'replace the end of the line with the value of \2 the next-to-last time \3 matched', or 'sum the numeric value of every-other match of \2'), simply by putting a custom function or two into the substitution string, along with the conditionals that you are considering adding (I'm completely in favor of adding them, BTW). Since there is a true 2-stage evaluation of the string on each match (regex engine replaces backreferences, then zScript expands variable & function references), your custom function can do whatever internal bookkeeping it wants on the matches.

For instance, consider the following:
Code:
#FUNC myFunc($op,$str) {#SWITCH ($op) (start) {myFuncList={};#RETURN ""} (end) {#RETURN @myFuncList} {#ADDITEM myFuncList $str;#RETURN " "}}
#SHOW %subregex("abc def ghi jkl mno pqr","((?P<startln>^)|(?P<match>\s?[agm]\w{2}\s?)|(?P<endln>$))","(?(startln) @myFunc(start)|(?(endln)@myFunc(end)|@myFunc(match,(?P=match))))")

Yes, it's evil, but it means that @myFunc(start) gets called at the beginning of the line, @myFunc(match,<text of match>) at each match, and @myFunc(end) at the end of the line. The code inside myFunc deletes each match (by returning a space), builds the list of matches, and dumps the list at the end of the line. If I haven't done something stupid, the result of that call should be:
Code:
def jkl pqr abc|ghi|mno

(Or it would, if conditionals were working in the replacement string.) ;)
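Python has no conditionals in replacement strings either, but `re.sub` with a stateful callable gets the same effect as `@myFunc`, with `Match.lastgroup` telling the callback which named alternative matched. A sketch under those assumptions; `MyFunc` and its `sep` parameter are hypothetical, and `sep` exists so the collected list is not glued onto the last word:

```python
import re

# Same alternation as the zScript example: an empty match at the start of
# the line, a word starting with a, g or m (with surrounding spaces), or an
# empty match at the end of the line.
LINE_PATTERN = re.compile(r"(?P<startln>^)|(?P<match>\s?[agm]\w{2}\s?)|(?P<endln>$)")


class MyFunc:
    """Stateful replacement callback standing in for @myFunc."""

    def __init__(self, sep=" "):
        self.sep = sep   # assumption: separator between the text and the list
        self.items = []

    def __call__(self, m):
        if m.lastgroup == "startln":
            self.items = []                          # myFuncList={}
            return ""
        if m.lastgroup == "endln":
            return self.sep + "|".join(self.items)   # dump the list at the end
        self.items.append(m.group("match").strip())  # #ADDITEM myFuncList $str
        return " "                                   # delete the match, keep a space


result = LINE_PATTERN.sub(MyFunc(), "abc def ghi jkl mno pqr").strip()
print(result)  # def jkl pqr abc|ghi|mno
```

The final `.strip()` removes the single space left where the leading `abc ` was deleted.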
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Tue Mar 25, 2008 11:44 pm
 
Ok. That is 1 vote for the conditionals. I have checked on a few of the things needed for that, and they should be pretty easy until I try to handle nesting them as in your example, but I already have thoughts on how to handle that.

Also, for your example, the way subregex works plus the addition of the conditional would result in "@myFunc(start)@myFunc(match, abc )def@myFunc(match, ghi )jkl@myFunc(match, mno )pqr@myFunc(end)" being passed to the zScript parser for evaluation and return. So it should actually work, but part of the idea was to make it use less script and be easier to use. For your example the final return would be "def jkl pqrabc|ghi|mno", because your $op=end #RETURN does not provide a separator.

On point A, yep. I would have to invent some syntax. I would probably want to keep it something like the regex pattern syntax, but different enough that it is not likely to ever clash with something that might be added to Perl or another language's regex system. Probably something like (?DEBUG=separator text) and (?LIST=name/number|separator text). If I can manage it, these would work for each substituted instance: (?MEMBER=name/number) and (?MATCHED=[i]name/number|relative/absolute instance).

On point B, no it isn't clear how to express it. I would probably just make it the end of the line all the time, and then document it. I don't really like doing something like that, but I definitely see a few uses for such a return value in my scripts. For example I have taken to using %subregex to remove all unwanted items from a list. Sometimes they are garbage, other times I just don't want to look at them in the following #FORALL. If I could split the list into 2 parts with 1 %subregex it would definitely shorten some of my scripts. I also want to try and keep that substitution string short, easy to read, and easy to debug.

You know the funniest thing is I used to hate %subregex. Every time I tried to use it I would lock up zMud, and then I didn't have much better luck with CMud. Not that long ago, around 2.14, it became my best friend for many parsing tasks.

I am still just thinking about many of the things, but since Zugg is giving me a bit of trust to let me write code that may be added to CMud I want to make sure it is done right.
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Sat Apr 05, 2008 2:53 am
 
I thought I would give those interested an update. I obviously didn't get anything done in time for 2.22.

I finished all the design work for the data structure to hold the parsed substitution string. I think JQuilici was right when he said, "I think this way lies madness." In any case the structure I designed should be rather efficient on both memory and speed, and will handle a bit more than I described earlier.

I am currently working on the parsing required to handle it. I have the basic code structure for it done, and nearly all of the simple backreference support is written. Right now I am aiming at getting the nasty part, nested stuff inside a conditional, done. For those that want a peek at the thinking behind it, this is most of the regex that will be responsible for parsing the substitution string.
Code:
   //octal 050=( 051=)
   '(?<bs>\*)?'+ //optional backslashes
   '(?:(?<p>\050)?(?(p)\?(?:(?<ref1>P=)|(?<mem>MEMBER:)|(?<list>LIST:)|'+ //matched p needs ref
   '(?<c1>\?\050(?<cins>[<=>$!^]{0,2}:)?)'+ //have the start of conditional
   '(?:ERROR|INSTANCE|DEBUG)\051(*ACCEPT)'+ // commands that require no ref
   // ACCEPT causes immediate success
   '(?<ar>))'+ //null capture to set the ar name for later test condition
   '|'+ // false side of condition on p
   '(?<ref2>\k|\g|\)'+ // back reference symbols
   '(?<ar>(?<b>{)|(?<l><)|(?<q>''))?)'+ //check for inclusion characters and set ar if we have any
   // made it this far we need ref
   '(?(cins)|'+ // have condition based on instance no ref wanted
   '(?<ref>'+regex.SubStringNames+'\d+))'+ //get the ref, names ends with a |
   '(?(ar):(?<rr>(?:+|-)?\d+))'+ //if absolute or relative referencing is allowed
   '(?(b)}|(?(l)>|(?(q)''|(?(p)\051)))))'+ // on condtional this closes the ref section
   '(?(c1)|(*ACCEPT))'+ // if we didn't start a conditional then it is done
I am actually planning to allow some extended referencing like \k'abc:7', which would mean use the value captured to abc on the 7th match instance. The conditionals will additionally have ^ for start of string, $ for end of string, and < <= > >= = == != <> comparisons on the instance, to allow you to replace in different ways at different instances.
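One way to think about `\k'abc:7'` is as a history of earlier matches kept while substituting. A Python sketch, assuming instances are counted from 1 in match order and that a reference to an instance not reached yet expands to nothing; `sub_with_history` and `ref` are hypothetical helpers, not the planned CMUD implementation. A single left-to-right pass like this can only look backwards, so a forward reference (the 7th instance while substituting the 3rd) would need a separate matching pass first:

```python
import re


def sub_with_history(text, pattern, make_replacement):
    """Run re.sub, passing each callback the groupdict() of every earlier match."""
    history = []

    def repl(m):
        out = make_replacement(m, history)
        history.append(m.groupdict())
        return out

    return re.sub(pattern, repl, text)


def ref(history, m, name, instance):
    """Hypothetical \\k'name:instance': the value `name` captured on the
    1-based match `instance`, or "" if that instance has not happened yet."""
    seen = history + [m.groupdict()]
    if 1 <= instance <= len(seen):
        return seen[instance - 1][name] or ""
    return ""


# Every value becomes whatever group n captured on the first match, like \k'n:1'.
print(sub_with_history("a=1 b=2 c=3", r"(?P<k>\w)=(?P<n>\d)",
                       lambda m, h: f"{m.group('k')}={ref(h, m, 'n', 1)}"))
# a=1 b=1 c=1
```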

I have also pretty much completely cleaned up the backslash replacements so that they only occur when escaping a valid replace command. I am trying to make it so that someone really doesn't have to think about whether they need one. The only place it gets tricky is within the conditionals, because I am not about to write a 200-line parsing routine to handle all the guessing and nesting. Those will require \(, \|, and \) for such characters within them.
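The rule that a backslash is only consumed when it escapes a valid replace command can be sketched as a small tokenizer. This Python version is an illustration only: it assumes the reference forms `\1`, `\g{2}`, `\k<name>`, `\k{name}`, `\k'name'` and `\\` for a literal backslash, leaves out conditionals and the `:instance` suffix, and `tokenize` is a hypothetical name:

```python
import re

# Assumed reference forms; any other backslash sequence stays literal text.
REF = re.compile(r"""
    \\ (?:
        (?P<num>\d+)                        # \1
      | g\{(?P<gnum>\d+)\}                  # \g{2}
      | k(?: <(?P<n1>\w+)>                  # \k<name>
           | \{(?P<n2>\w+)\}                # \k{name}
           | '(?P<n3>\w+)' )                # \k'name'
      | (?P<bs>\\)                          # \\ is one literal backslash
    )""", re.VERBOSE)


def tokenize(subst):
    """Split a substitution string into ("text", str) and ("ref", int|str) tokens.

    A backslash is consumed only when it starts one of the references above
    or is doubled; every other backslash is kept as ordinary text.
    """
    tokens, pos = [], 0
    for m in REF.finditer(subst):
        if m.start() > pos:
            tokens.append(("text", subst[pos:m.start()]))
        if m.group("bs") is not None:
            tokens.append(("text", "\\"))
        else:
            key = next(m.group(g) for g in ("num", "gnum", "n1", "n2", "n3")
                       if m.group(g) is not None)
            tokens.append(("ref", int(key) if key.isdigit() else key))
        pos = m.end()
    if pos < len(subst):
        tokens.append(("text", subst[pos:]))
    return tokens


print(tokenize(r"%upper(\1) costs \$5"))
```

With this rule, `\$` passes through untouched because `$` does not start a reference, so a user only needs to double a backslash when it would otherwise be read as one.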

On a much lighter note I finally figured out the right combination of things to use the SKIP verb instead of PRUNE during the matching. This looks like it picked up the speed a bit, and after I have thoroughly tested that change I may be able to shorten some other code for a little more speed. I have also gotten the naked alternation to work, which is a definite compatibility improvement.

Hopefully I will be able to get all of it written and tested in time for 2.23.
Taz
GURU


Joined: 28 Sep 2000
Posts: 1395
Location: United Kingdom

Posted: Sat Apr 05, 2008 3:10 am
 
Well you've got a month so that should be plenty of time.
_________________
Taz :)
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Tue Apr 15, 2008 11:41 pm
 
I haven't given up on this. I am still plugging away at it. I really have to laugh at Delphi sometimes. The current one I find totally funny is these few lines of code:
Code:
const OPLAST: Byte=109;

const OP_LENGTHS:array[0..OPLAST] of byte = (
Delphi gives a compiling error pointing at the OPLAST in "[0..OPLAST]". The error message is "[DCC Error] : E2026 Constant expression expected". Just silly, since "const" means constant. Evidently some constants are not constant enough.
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Sun Apr 20, 2008 1:49 pm
 
I am now into the final phases of writing this and need some input on a few items.

First is the new (?ERROR) function. Putting this into your substitution string will give you error information. It starts with any pattern compile error there might be, then as parsing of your substitution string continues it will report possible errors there. How to return it to you is the real question. I was planning on tacking it on at the front of the returned string, but then I got this bright idea.

What if I tacked it on so that the CMud parser would see it as a function reference? By that I mean doing something like
Code:
#SHOW %subregex("abc","(a","(?ERROR)x")
would pass '@SubRegexError("missing ) (2)")abc' back to the CMud parser. This would cause the #SHOW to actually display 'abc' unless you had the function to do something with the error text.

The downside to doing that is it would be very difficult to document, and extremely hard for newer users to understand. It also could cause further havoc for those of us with some threading usage if we do anything major in the function.

A simpler way is to just put a separator in. Does anyone have any thoughts on what that separator should be? Perhaps an XML format like '<error missing ) (2)/>abc'? Any thoughts on that?
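A well-formed variant of the XML idea might look like the sketch below. It assumes Python's `re` as the regex engine, so the message text comes from `re.error.msg` and the position from `re.error.pos` rather than from PCRE; `subregex_with_error` is a hypothetical name, and returning the input unchanged on a bad pattern mirrors the `abc` result described above:

```python
import re
from xml.sax.saxutils import escape, quoteattr


def subregex_with_error(text, pattern, template):
    """Hypothetical (?ERROR) behaviour: if the pattern does not compile, put an
    <error> element in front of the untouched text instead of substituting."""
    try:
        rx = re.compile(pattern)
    except re.error as exc:
        # escape() guards <, > and & in the message; quoteattr() quotes the position.
        tag = f"<error pos={quoteattr(str(exc.pos))}>{escape(exc.msg)}</error>"
        return tag + text
    return rx.sub(template, text)


print(subregex_with_error("abc", "(a", "x"))
```

Using a real end tag rather than `<error .../>` with free text inside keeps the message parseable even when it contains characters like `)` or quotes.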

Second is the (?DEBUG). I am now thinking about having it put some sort of marks into the final string to indicate what was done. I will also definitely have it put an extra set of quotes around the string so the CMud parser ignores the contents. For the common eval example, this first step would make
Code:
#SHOW %subregex("1 2 12","(\d+)","(?DEBUG)%eval(\1+10)")
display "%eval(1+10) %eval(2+10) %eval(12+10)". I think that is a helpful item for debugging.

The real question is how helpful marking the matched portions would be. What format is best for such marking? Does anyone want to see what the match was? I could make the return from the above example "match1=%eval(1+10)|nomatch1= |match2=%eval(2+10)|nomatch2= |match3=%eval(12+10)", but I think that looks horrible. It might also have problems unless I get the right combination of quotes. If I use some obscure symbols to separate the portions, then once again it is hard for newer users.

Again perhaps an XML format would be good, "<match1 1>%eval(1+10)</match1> <match2 2>%eval(2+10)</match2> <match3 12>%eval(12+10)</match3>". This is still a little tough on the eyes, but is actually easier to explain, and allows some flexibility to include what was matched and what the resulting substitution was.
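Both debug outputs can be reproduced with Python's `re` to see what they would look like. A sketch, assuming the marked form is exactly the XML-like format proposed above (which is not valid XML, since the matched text sits where an attribute would go); `debug_sub` is a hypothetical name:

```python
import re


def debug_sub(text, pattern, template):
    """Hypothetical (?DEBUG) output: expand each match but do not evaluate it;
    wrap it as <matchN matched-text>expansion</matchN> instead."""
    count = 0

    def repl(m):
        nonlocal count
        count += 1
        return f"<match{count} {m.group(0)}>{m.expand(template)}</match{count}>"

    return re.sub(pattern, repl, text)


# Unmarked stage-1 text, i.e. what the extra quotes would protect from the parser:
plain = re.sub(r"(\d+)", r"%eval(\1+10)", "1 2 12")
marked = debug_sub("1 2 12", r"(\d+)", r"%eval(\1+10)")
print(plain)   # %eval(1+10) %eval(2+10) %eval(12+10)
print(marked)  # <match1 1>%eval(1+10)</match1> <match2 2>%eval(2+10)</match2> <match3 12>%eval(12+10)</match3>
```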

Any thoughts on either item would be helpful.
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Sun Apr 20, 2008 2:35 pm
 
Seems to me like raising an event would be a much better course than using a function, if you decided to use that route.

However, it'll be obvious that there's an error in the regex when it starts substituting incorrectly. If you're manipulating a string in that way, you're going to be displaying it somewhere eventually. You'll notice that an incorrect string's being displayed and be able to backtrack to the problem and fix it. The only time this'd really be useful is if your %subregex was taking a dynamic regex rather than a fixed one, but I can't think of (m)any tasks that'd require that. Seems very niche.

EDIT: I did just think of one time you might use subregex where an error checker like this might be useful - when building a list to be passed to #forall. I guess it does have its uses.

Similarly with the debug option - I can't see it being all that useful. The only time you'd need it is if you built your regex wrong to capture what you wanted it to (perhaps using a greedy quantifier when you shouldn't have). Regex novices probably won't understand the debug output or how to apply that to their regex, and advanced users probably don't need the crutch.
Last edited by Fang Xianfu on Sun Apr 20, 2008 2:38 pm; edited 1 time in total
JQuilici
Adept


Joined: 21 Sep 2005
Posts: 250
Location: Austin, TX

Posted: Sun Apr 20, 2008 2:37 pm
 
Rather than stuffing these strings into the substitution, why not return them through additional arguments to the %subregex function? By that I mean, allow a call like:
Code:
$newStr = %subregex($string,$pattern,$subStr,$errorStr,$debugStr)

The $errorStr and $debugStr would get filled with the error string and the debug string (whatever we decide those should contain), but the return value ($newStr) would not contain either one - only the actual result of the substitution.

This has several advantages:
  1. If the error or debug arguments are not used, the implementation can avoid constructing them (for speed)
  2. No need to parse out the error or debug strings from the return value
  3. It becomes very easy to build #if tests on the error string (useful if the pattern comes from the user and might be bugged somehow). Imagine following my example with '#if ($errorStr) {//print error message} {//do something useful with $newStr}'.
  4. The subStr really is just the substitution string. Makes documentation easier, among other things.
  5. Since the additional args are optional, all existing calls to %subregex() work exactly as before.
  6. If we decide to get verbose in the error or debug output, we can make them multiple lines, stringlists, DB vars, or whatever makes sense, without screwing up the return value.
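Python functions do not have out-parameters, so the nearest equivalent of `%subregex($string,$pattern,$subStr,$errorStr,$debugStr)` is returning a small record. A sketch, assuming Python's `re` and the hypothetical names `subregex` and `SubResult`; it is not how CMUD passes arguments, only an illustration of points 1 to 3:

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class SubResult(NamedTuple):
    text: str                                # what %subregex would return
    error: Optional[str]                     # filled only when the pattern fails
    debug: Optional[List[Tuple[str, str]]]   # (matched, replacement) pairs


def subregex(text, pattern, template, *, want_debug=False):
    """Return the substituted text plus separate error and debug values."""
    try:
        rx = re.compile(pattern)
    except re.error as exc:
        return SubResult(text, f"{exc.msg} (position {exc.pos})", None)

    debug = [] if want_debug else None  # point 1: skip the work when not asked for

    def repl(m):
        out = m.expand(template)
        if debug is not None:
            debug.append((m.group(0), out))
        return out

    return SubResult(rx.sub(repl, text), None, debug)


new, err, dbg = subregex("abc", "(a", "x")
if err:
    print("bad pattern:", err)   # point 3: an ordinary if-test on the error
else:
    print(new)
```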

Thoughts?
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Sun Apr 20, 2008 4:47 pm
 
Quote:
Seems to me like raising an event would be a much better course than using a function, if you decided to use that route.
Not really something I have an option about. That would require Zugg to modify the handling for subregex, and it would also be very difficult for me to pass a signal up to his code to do that.

Quote:
Regex novices probably won't understand the debug output or how to apply that to their regex, and advanced users probably don't need the crutch.
You have no idea how many times while working out a detailed subregex I really needed to have some sort of debugging output. Small mistakes in the pattern like a missing ) are irritating and easy to make. Causing something to not match at all is harder to diagnose, and I still do it all the time. Getting more detailed information always helps to spot the problem.

Quote:
Rather than stuffing these strings into the substitution, why not return them through additional arguments to the %subregex function?
Zugg stated an aim of having 2.23 as a public release. The type of change required to do what you are suggesting is attractive, but it is a low-level parser change Zugg would have to make. I know he really doesn't like making such changes near a public release. Another reason not to do it is that one of my goals from the start of my subregex project was to make it a drop-and-go item for Zugg, and changing the structure of the function doesn't do that.

Quote:
If the error or debug arguments are not used, the implementation can avoid constructing them (for speed)
It is already done that way. When they are not requested they do not get built.

Quote:
If we decide to get verbose in the error or debug output, we can make them multiple lines, stringlists, DB vars, or whatever makes sense, without screwing up the return value.
That is the entire point of both of the items. My general assumption with them is that a user is having trouble with a specific subregex. They are looking to find out what is wrong and will add the (?ERROR) and/or (?DEBUG) to the substitution string so they can find out what is going on. For example, a few posts back I put up an initial version of the regex that handles the parsing I was contemplating. The final version of that regex is about 3 times as long, and it took me nearly 5 hours to perfect the conditional nesting portion of the regex. Along the way I made silly mistakes, like a misplaced ) or *, that led to the regex not compiling; that was the first reason I wanted to add an error item.

During debugging of the parsing I intentionally make mistakes to simulate bad user input, and make sure they will be handled correctly. Sometimes I accidentally make a mistake like forgetting the ( for a conditional's condition, and then I am looking at the test output asking why it isn't right. Once I spot the problem, that leads me to add another possible error check into the parsing pattern.

All such mistakes in the substitution string show up rather quickly because they become a text replacement. For example, this erroneous usage
Code:
#SHOW %subregex("abc","b","(?(1)x)")
would display "a(?(1)x)c". If a user instead does
Code:
#SHOW %subregex("abc","b","(?ERROR)(?(1)x)")
Then I would build the error message 'Invalid reference "1" at: 12'. I really do want some way to pass that back to the user, and that is what I need help deciding how to do.
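Checking references against the compiled pattern's group count is enough to build a message in that style. A minimal Python sketch of the same check applied to plain `\N` references, assuming 1-based positions that point at the first digit; conditional syntax such as `(?(1)x)` is not parsed here, and `check_refs` is a hypothetical name. The lookbehind and the `(?:\\\\)*` part make sure an escaped `\\1` is not reported:

```python
import re

# \N preceded by an even number of backslashes (so "\\1" is a literal "\1").
NUMERIC_REF = re.compile(r"(?<!\\)(?:\\\\)*\\(\d+)")


def check_refs(pattern, template):
    """Hypothetical (?ERROR) check: list \\N references to groups the pattern lacks.

    Positions are 1-based and point at the first digit of the reference.
    """
    groups = re.compile(pattern).groups
    return [f'Invalid reference "{m.group(1)}" at: {m.start(1) + 1}'
            for m in NUMERIC_REF.finditer(template)
            if int(m.group(1)) > groups]


print(check_refs("b", r"x\1"))  # ['Invalid reference "1" at: 3']
```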

Quote:
Since the additional args are optional, all existing calls to %subregex() work exactly as before.
When I am finished with this I will have to go fix all my usages of %subregex to change them from %pat to the new syntax. I am very tempted to write another regex to find all of them, and then see if Zugg will add it to the compatibility report. As it is something like that needs to be done anyway.
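A first cut at such a migration helper might look like this. It is a heuristic sketch in Python, not a CMUD feature: it only rewrites the numeric form `%pat(N)` and only on lines that call `%subregex`, so `%pat` in ordinary trigger code is left alone. Multi-line calls, or a `%pat` used outside the substitution argument on the same line, would still need a human check; `migrate_pat` is a hypothetical name:

```python
import re

# Only the plain numeric form %pat(N) is handled.
PAT_CALL = re.compile(r"%pat\((\d+)\)", re.IGNORECASE)


def migrate_pat(script):
    """Rewrite %pat(N) as \\N, but only on lines that call %subregex.

    Heuristic: %pat on other lines (ordinary trigger code) is left alone.
    """
    out = []
    for line in script.splitlines(keepends=True):
        if "%subregex(" in line.lower():
            line = PAT_CALL.sub(r"\\\1", line)
        out.append(line)
    return "".join(out)


old = '#show %subregex("The quick brown fox","(a|e|i|o|u)","%upper(%pat(1))")\n#show %pat(1)\n'
print(migrate_pat(old), end="")
```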
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Tue Apr 22, 2008 10:06 pm
 
I have more or less reached a decision with this, and the reason for it was pretty simple
Vijilante wrote:
My general assumption with them is that a user is having trouble with a specific subregex. They are looking to find out what is wrong and will add the (?ERROR) and/or (?DEBUG) to the substitution string so they can find out what is going on.
A user really shouldn't have either of these things in all the time. Sometimes it is helpful to talk about it to straighten my thoughts out.

I am still open to format specifics, but right now I am leaning towards an XML format for both because that makes both behave the same and is rather easy to put in. It also allows me to give some details about the behavior of the entire thing to the user.

Since Zugg announced a rough schedule for 2.23 I really can't offer too much more time for suggestions. Some of the code I adjusted in order to build this is a low level change in trigger functioning, and Zugg would want more time to evaluate those changes for a beta version. I really don't want to hold up a public release for my own crusade.

I will be spending about a day on testing and debugging with the list I have built and then adding the final format for those items. Somewhere in there I have to build Zugg a test package so he can evaluate it easily, and also write up some better documentation for it. Last call for suggestions.
All times are GMT
© 2009 Zugg Software. Hosted by Wolfpaw.net