Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Sun Jul 29, 2007 9:55 pm

[1.34] Regex \b not quite right
 
When %q came up in discussion recently, I did a bit of testing and found that it and the underlying \b regex were not behaving quite right. Patterns of:
Code:
something\b+
something\b?
something\b*
something\b{1}
something\b{0,1}
something\b h
None of these would match the string "something, h".

A pattern of "something\b" does match, though. For some reason the \b wildcard is not interpreting modifiers properly, and it also has trouble when followed by other text in the pattern.

Also, when used within parentheses, as in "something(\b)", the matching %nn value is always empty.
_________________
The only good questions are the ones we have never answered before.
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Sun Jul 29, 2007 10:32 pm
 
(\b) is always going to be empty because it's a 0-width operator. That's also why the last regex there doesn't work: the comma doesn't exist in the regex, so it doesn't match. \b doesn't match a character; it matches the 0-width gap between characters. That's probably why it doesn't like modifiers either, since you can't have two gaps next to each other between the same pair of characters. The simple answer to this problem is "Don't use modifiers with \b".
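To make the zero-width point concrete, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: CMUD uses PCRE, not Python, but the two treat a plain `\b` the same way in these simple cases. The variable names are made up for the example.

```python
import re

text = "something, h"

# \b consumes no characters, so the match ends right after "something".
plain = re.search(r"something\b", text)

# Capturing \b gives an empty string: the group matched, but it is 0 wide.
captured = re.search(r"something(\b)", text)

# After the boundary the next character is ",", not " ", so this fails.
missing_comma = re.search(r"something\b h", text)

# Spelling out the comma lets the rest of the pattern line up.
with_comma = re.search(r"something\b, h", text)

print(plain.span())             # (0, 9)
print(repr(captured.group(1)))  # ''
print(missing_comma)            # None
print(with_comma.group(0))      # something, h
```

The empty capture and the failed "something\b h" match are both consequences of \b matching a position rather than a character.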
_________________
Rorso's syntax colouriser.

- Happy bunny is happy! (1/25)
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Mon Jul 30, 2007 4:30 am
 
:)

I just looked up a few regex reference pages again and reread everything. Most of the quantifiers in my tests are legal, and all of the legal ones should match. It's probably just something overlooked in the regex system, and not a big deal. Just a matter of updating the Pattern Matching page, I guess.
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Mon Jul 30, 2007 4:54 am
 
That, or perhaps Zugg is using a different version of the regex system to the one you read about.

If you accept that \b is 0-width, it doesn't make much sense to be able to repeat it anyway. How can you match two boundary lines between things in a row? It doesn't work. Using "test\b?" is nonsensical as well, since you're either interested in what's after the boundary (in which case you're saying "what comes after test is either a word, or it's not a word" and you could use a wildcard or a range to represent its possible values and capture them without using \b) or you're not (\b ends the pattern) and it's not needed.
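As a rough sketch of that alternative, the pattern below drops the optional boundary and instead captures whatever non-word characters follow the word. It is written with Python's `re` module as a stand-in for PCRE, and the `trailing` helper and the `\W*` range are hypothetical choices for the example, not anything built into CMUD.

```python
import re

# Instead of "test\b?", say explicitly what may follow the word and capture it.
# \W* means "zero or more non-word characters", so it also allows nothing at all.
pattern = re.compile(r"(test)(\W*)")

def trailing(text):
    """Return the non-word characters found right after 'test', or None."""
    match = pattern.search(text)
    return match.group(2) if match else None

print(repr(trailing("test, h")))   # ', '
print(repr(trailing("test")))      # ''
print(repr(trailing("testing")))   # ''
print(trailing("no match here"))   # None
```

Unlike a captured \b, the second group here holds real characters when there are any, so the trigger can tell "test, h" apart from "testing".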
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

Posted: Mon Jul 30, 2007 5:17 pm
 
CMUD (and zMUD) use PCRE, the Perl Compatible Regular Expressions library. I'm sure you can google it to find more complete documentation. But my understanding of \b is the same as what Fang mentioned: it matches a zero-width boundary and doesn't actually match any characters. It's a special tag, like the ^ and $ anchors (which I think you can use \s and \e for, or something like that).
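For comparison, this small sketch lists where \b, ^ and $ match in the test string from the first post. It uses Python's `re` module, which is an assumption made for illustration; PCRE is expected to report the same positions for these basic anchors.

```python
import re

text = "something, h"

# Every place \b matches: each one is a position, not a character.
boundary_matches = list(re.finditer(r"\b", text))
boundary_positions = [m.start() for m in boundary_matches]

# The ^ and $ anchors behave the same way: they match a position of width 0.
start_span = re.search(r"^", text).span()
end_span = re.search(r"$", text).span()

print(boundary_positions)  # [0, 9, 11, 12]
print(start_span)          # (0, 0)
print(end_span)            # (12, 12)
print(all(m.start() == m.end() for m in boundary_matches))  # True
```

Position 9 is the gap between "g" and the comma, which is the boundary the pattern "something\b" relies on.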
All times are GMT
© 2009 Zugg Software. Hosted by Wolfpaw.net