Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Sun Jul 29, 2007 9:55 pm
[1.34] Regex \b not quite right
When %q came up in discussion recently, I did a bit of testing and found that it and the underlying \b regex were not quite right. Patterns of:
Code:
something\b+
something\b?
something\b*
something\b{1}
something\b{0,1}
something\b h
None of them would match the string "something, h".
A pattern of "something\b" does match, though. For some reason the \b wildcard is not properly interpreting modifiers, and it also has trouble when followed by other text in the pattern.
Also, when used within parentheses, as in "something(\b)", the matching %nn value is always empty.
|
_________________ The only good questions are the ones we have never answered before.
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Sun Jul 29, 2007 10:32 pm
(\b) is always going to be empty because it's a 0-width operator. That's also why the last regex there doesn't work: the comma doesn't appear in the regex, so it doesn't match. \b doesn't match a character; it matches the 0-width gap between characters. That's probably why it doesn't like modifiers either, since you can't have two gaps next to each other with nothing between them. The simple answer to this problem is "Don't use modifiers with \b".
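The zero-width behaviour described above can be sketched outside CMUD. This is only an illustration using Python's re module, which is an assumption here: it is not PCRE and not CMUD's trigger engine, though it treats a bare \b the same way. It does not exercise the quantified forms, whose handling varies between engines.

```python
import re

text = "something, h"

# \b alone succeeds: a boundary sits between "g" and ",".
plain = re.search(r"something\b", text)

# Capturing \b yields an empty string, since the assertion consumes nothing.
captured = re.search(r"something(\b)", text)

# Following \b with " h" fails: the comma is still the next character to match.
no_comma = re.search(r"something\b h", text)

# Putting the comma in the pattern lets the whole string match.
with_comma = re.search(r"something\b, h", text)

print(plain.group(0))
print(repr(captured.group(1)))
print(no_comma)
print(with_comma.group(0))
```

The third pattern fails for the reason given in the post: \b consumes no text, so the pattern still has to account for the comma itself.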
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Mon Jul 30, 2007 4:30 am
I just looked up a few regex reference pages again and reread everything. Most of the quantifiers in my tests are legal, and all of the legal ones should actually match. It's probably just something overlooked in the regex system that keeps them from matching, and not a big deal. Just a matter of updating the Pattern Matching page, I guess.
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Mon Jul 30, 2007 4:54 am
That, or perhaps Zugg is using a different version of the regex system to the one you read about.
If you accept that \b is 0-width, it doesn't make much sense to be able to repeat it anyway. How can you match two boundaries in a row between the same two characters? It doesn't work. Using "test\b?" is nonsensical as well. Either you're interested in what's after the boundary, in which case you're saying "what comes after test is either a word, or it's not a word", and you could use a wildcard or a range to represent its possible values and capture them without using \b; or you're not, in which case \b ends the pattern and it's not needed.
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Mon Jul 30, 2007 5:17 pm
CMUD (and zMUD) use PCRE, the Perl Compatible Regular Expressions library. I'm sure you can google it to find more complete documentation. But my understanding of \b is the same as what Fang mentioned: it matches a zero-width boundary and doesn't actually match any characters. It's a special tag, like the ^ and $ anchors (which I think you can use \s and \e for, or something like that).
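As a rough illustration of the comparison with ^ and $, the sketch below shows that all three report empty matches. It uses Python's re module as a stand-in, which is an assumption: it is not PCRE, although it handles these three assertions the same way for this input.

```python
import re

text = "abc def"

# Each of these asserts something about a position, so every match is empty.
start = re.search(r"^", text).span()
end = re.search(r"$", text).span()
boundaries = [m.span() for m in re.finditer(r"\b", text)]

print(start, end, boundaries)
```

Every span has equal start and end offsets, which is what "zero-width" means in practice.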