|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Jul 06, 2007 4:58 pm
Regular expression syntax in CMUD v2.0 |
This is a continuation of the discussion about allowing a short-cut syntax for specifying regular expression patterns in the upcoming 2.0 version of CMUD.
The reason for allowing normal trigger patterns to contain "embedded" regular expression syntax is so that we don't need duplicate commands and functions: one for normal zScript trigger patterns, and one for regular expressions. Also, there are some commands that don't currently support regular expressions: for example, the (#IF @var =~ "pattern") expression.
Several people have suggested using the syntax "/regex/" to specify a regular expression. I really like this, but it causes a problem for any current trigger that contains a "/" character. And the "/" character is pretty common, especially on MUD prompts like "100/500hp 200/300 mana" etc.
I want to allow "embedded" regular expressions. So in the above example, CMUD might think that "/500hp 200/" was an embedded regular expression, and this would break a lot of existing triggers.
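To make the ambiguity concrete, here is a minimal Python sketch. It is not CMUD's parser; the rule "anything between two / characters is a regex" and the function name are assumptions made only for illustration. It shows how a naive scanner would read the prompt pattern above.

```python
import re

# Hypothetical naive rule: any text between two "/" characters is treated
# as an embedded regular expression. Illustration only, not CMUD code.
NAIVE_EMBEDDED = re.compile(r"/(.*?)/")

def find_naive_regex_spans(pattern: str) -> list[str]:
    """Return the substrings a naive /regex/ scanner would treat as regex."""
    return NAIVE_EMBEDDED.findall(pattern)

prompt_pattern = "100/500hp 200/300 mana"
print(find_naive_regex_spans(prompt_pattern))  # ['500hp 200']
```

The scanner picks out "500hp 200" as if it were a regular expression, even though the author of the trigger meant both slashes literally.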
Even though I personally like the look of "/regex/", maybe we can use something like "#regex#" instead? I think Perl uses that syntax (actually, I think Perl allows any delimiter to be used).
Related to this is how to specify options, such as case sensitivity. In Perl, PHP, and other languages that use regular expression syntax, you can do something like this: "/regex/i" to specify a case-insensitive regular expression. Again, this doesn't work very well for embedded regular expressions within a normal trigger pattern.
So, I'm looking for suggestions on how people would like to see this problem solved. I think it would be a really useful new feature if we can come up with a good syntax that isn't too kludged. |
|
|
|
Arminas Wizard
Joined: 11 Jul 2002 Posts: 1265 Location: USA
|
Posted: Fri Jul 06, 2007 5:38 pm |
Does the delimiter have to be a single char? What about %/regex/% with the %/ as the opener and the /% for the closing? Then tell the parser to look ahead if it finds a %/ and if it does not find a /% to treat the pattern normally instead of parsing for regex?
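A rough Python sketch of that look-ahead idea follows. The function name and the ("literal" / "regex") labels are hypothetical, and this only shows the fallback behaviour, not how CMUD would actually tokenize a pattern.

```python
OPEN, CLOSE = "%/", "/%"

def split_pattern(pattern: str) -> list[tuple[str, str]]:
    """Split a pattern into ("literal", text) and ("regex", text) pieces."""
    pieces = []
    pos = 0
    while pos < len(pattern):
        start = pattern.find(OPEN, pos)
        if start == -1:
            break  # no opener left
        # Look ahead for the closer, starting after the opener itself.
        end = pattern.find(CLOSE, start + len(OPEN))
        if end == -1:
            break  # opener without closer: treat the rest as a normal pattern
        if start > pos:
            pieces.append(("literal", pattern[pos:start]))
        pieces.append(("regex", pattern[start + len(OPEN):end]))
        pos = end + len(CLOSE)
    if pos < len(pattern):
        pieces.append(("literal", pattern[pos:]))
    return pieces

print(split_pattern(r"You have %/\d+/% gold"))
print(split_pattern("50%/100% done"))
```

In the second call the "%/" has no matching "/%", so the whole string comes back as a single literal piece and an existing trigger like that would keep working.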
|
|
_________________ Arminas, The Invisible horseman
Windows 7 Pro 32 bit
AMD 64 X2 2.51 Dual Core, 2 GB of Ram |
|
|
|
Thinjon100 Apprentice
Joined: 12 Jul 2004 Posts: 190 Location: Canada
|
Posted: Fri Jul 06, 2007 6:14 pm |
If you can use multiple characters, and you're going to encapsulate the global/insensitive/etc flags, you'll probably want to go with something more along the lines of %/regex/flags/%. The middle / should not clash with the regex, as a literal / inside the regex would already need to be escaped with a \.
This does raise one issue, though: if you're going to allow inline embedded regex, which escape character is used? In normal zScript the ~ is the escape, while in regex it's the more traditional \. Will the escape characters conflict, so that we have to double-escape things to keep zScript and regex from each assigning them their special functions? Or will the zScript parser ignore everything within an embedded regex, leaving us with only the backslash? |
|
_________________ If you're ever around Aardwolf, I'm that invisible guy you can never see. Wizi ftw! :) |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Jul 06, 2007 6:55 pm |
What I plan to do with the escape characters is to allow both. If you use the traditional \ character, it gets passed untouched to the regular expression. But CMUD will also parse the ~ character and will change it into a \ character. This lets people who want to use the ~ character consistently keep doing so. Of course, to embed a literal ~ character you can use ~~ or \~. Both of those will be parsed correctly.
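As a sketch of how that translation could behave, here is a small Python function. The name normalize_escapes is hypothetical, and two details are assumptions not stated above: ~~ is emitted as a plain ~, and a lone ~ at the very end of the pattern is left as a literal ~.

```python
import re

def normalize_escapes(regex_src: str) -> str:
    """Rewrite zScript-style ~ escapes as regex-style \\ escapes."""
    out = []
    i = 0
    while i < len(regex_src):
        ch = regex_src[i]
        nxt = regex_src[i + 1] if i + 1 < len(regex_src) else ""
        if ch == "\\" and nxt:
            out.append(ch + nxt)    # \X is passed through untouched
            i += 2
        elif ch == "~" and nxt == "~":
            out.append("~")         # ~~ is a literal tilde (assumed output form)
            i += 2
        elif ch == "~" and nxt:
            out.append("\\" + nxt)  # ~X becomes \X
            i += 2
        else:
            out.append(ch)          # ordinary character, or a trailing ~ or \
            i += 1
    return "".join(out)

print(normalize_escapes("~d+hp"))   # \d+hp
print(normalize_escapes("a~~b"))    # a~b
print(bool(re.fullmatch(normalize_escapes(r"a\~b"), "a~b")))  # True
```

Handling \X before ~X means a pattern written entirely with backslashes is never altered, which matches the "passed untouched" behaviour described above.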
The %/regex/flags/% idea is pretty good. I don't think that interferes with any existing syntax. The % is already used for function calls and for %1..%99, but using %/ should be easy to parse. The advantage of this over the #regex# syntax is that it allows easy implementation of the regex flags feature. |
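For illustration, the body between %/ and /% could be split into regex and flags roughly as below. This is a Python sketch under stated assumptions: the function names are hypothetical, the first unescaped / is taken as the separator, and only the "i" flag is mapped because it is the only one named in this thread.

```python
import re

# Only "i" (case-insensitive) is mentioned in the thread; any other
# letters would be an assumption, so they are rejected here.
FLAG_MAP = {"i": re.IGNORECASE}

def split_regex_and_flags(body: str) -> tuple[str, str]:
    """Split 'regex/flags' at the first "/" that is not escaped with a backslash."""
    i = 0
    while i < len(body):
        if body[i] == "\\":
            i += 2  # skip the escaped character, e.g. the "/" in "\/"
            continue
        if body[i] == "/":
            return body[:i], body[i + 1:]
        i += 1
    return body, ""  # no separator: plain %/regex/% form with no flags

def compile_embedded(body: str) -> re.Pattern:
    """Compile the text found between %/ and /% into a regex object."""
    source, flag_letters = split_regex_and_flags(body)
    flags = 0
    for letter in flag_letters:
        if letter not in FLAG_MAP:
            raise ValueError(f"unknown flag: {letter!r}")
        flags |= FLAG_MAP[letter]
    return re.compile(source, flags)

print(split_regex_and_flags("hello/i"))       # ('hello', 'i')
print(split_regex_and_flags(r"\d+\/\d+hp"))   # ('\\d+\\/\\d+hp', '')
print(compile_embedded("hello/i").search("HELLO") is not None)  # True
```

A prompt-style regex such as \d+\/\d+hp keeps its escaped slash, so it is read as having no flags at all.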
|
|
|
Zhiroc Adept
Joined: 04 Feb 2005 Posts: 246
|
Posted: Fri Jul 06, 2007 9:17 pm |
Well, as this is CMUD 2.0, if the parser is extended to allow for /regex/flags directly, without the use of quotes, then there is no confusion, which seems to be the best of both worlds.
#if ($str =~ "pattern")...
#if ($str =~ /regexp/flags)...
Or is changing the parser not really possible?
By the way, you can change the delimiters in Perl, but you then have to use the explicit match syntax: m#regexp#, or m(regexp), which demonstrates bracketing delimiters. |
|
|
|
Tech GURU
Joined: 18 Oct 2000 Posts: 2733 Location: Atlanta, USA
|
Posted: Fri Jul 06, 2007 11:49 pm |
Since using the /regex/ approach could cause issues with the existing triggers, I'm in favor of Arminas' suggestion. %/regex/flags% works for me.
|
|
_________________ Asati di tempari! |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Sat Jul 07, 2007 12:04 am |
Zhiroc: changing the parser like that doesn't allow the regex to be embedded into the middle of an existing pattern. It's not just the #IF statement that I'm trying to fix...I'm trying to come up with something general that will work for *all* trigger patterns to allow embedded regex syntax.
|
|
|
|
haiku Wanderer
Joined: 19 Nov 2004 Posts: 70
|
Posted: Sun Aug 05, 2007 1:10 am |
Ruby syntax for regex can be either /regex/ or %r{regex}. Would that help?
|
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Sun Aug 05, 2007 3:19 am |
I suppose it's a bit belated now, but I quite like that %r idea. It seems a very zScript-like syntax. I could imagine there being technical problems with that kind of function-ish syntax, though, since it's not something that's ever been used before.
|
|
|
|
Daagar Magician
Joined: 25 Oct 2000 Posts: 461 Location: USA
|
Posted: Sun Aug 05, 2007 2:36 pm |
Zugg's latest blog entry sounded like he already went with the %/regex/% format.