|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Thu Jan 20, 2011 11:12 pm
[Solved] Regex Problems with the Repetition Indicator?
Pattern:
Code: |
\([\w ]+\){0,3}(?:\s)?(\w+)
|
Sample match test:
Code: |
(Sanctuary)(Red Aura) The High Priest of Althainia stands here, looking for someone to send on a quest.
|
%1='y'
Sample match test:
Code: |
(Sanctuary) The High Priest of Althainia stands here, looking for someone to send on a quest.
|
%1='the'
In both cases the expected match is 'the'. However, whenever I add more spell flags (word strings enclosed in parentheses), it captures the last letter of the first flag instead. Either there's something about repetition matching that I don't understand, or this is a bug in how CMUD interprets the regex. I figured using repetition would be a more elegant solution than (?:\([\w ]+\))?(?:\([\w ]+\))?(?:\([\w ]+\))?, but might I have to go that route to get the desired effect?
|
Last edited by chamenas on Fri Jan 21, 2011 6:24 am; edited 1 time in total |
|
|
|
charneus Wizard
Joined: 19 Jun 2005 Posts: 1876 Location: California
|
Posted: Thu Jan 20, 2011 11:51 pm |
This does appear to be a bug in the way CMUD interprets it, though odd. I tested this through RegExr (a free program in which I do all my regex testing), and in both cases, it captured the word 'The' correctly.
However, it's probably more correct to do your regex this way:
Code: |
^(?:\([\w ]+\)){0,3}(?:\s)?(\w+)
|
Hope this helps! |
|
|
|
Fizgar Magician
Joined: 07 Feb 2002 Posts: 333 Location: Central Virginia
|
Posted: Fri Jan 21, 2011 12:14 am |
Change your pattern to:
Code: |
(\([\w ]+\)){0,3}(?:\s)?(\w+)
|
Then use %2
*Edit*
I need to start refreshing before posting if I go AFK in the middle of posting, lol.
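For anyone comparing the %2 suggestion with the non-capturing version, here is a minimal sketch of how the group numbering shifts. It uses Python's re module as a stand-in for CMUD's regex engine, which is an assumption; group(1) and group(2) play the role of %1 and %2.

```python
import re

line = "(Sanctuary)(Red Aura) The High Priest of Althainia stands here."

# Capturing group around the flag: the flag becomes group 1, the word group 2.
capturing = re.search(r"(\([\w ]+\)){0,3}(?:\s)?(\w+)", line)

# Non-capturing group around the flag: the word stays in group 1.
non_capturing = re.search(r"(?:\([\w ]+\)){0,3}(?:\s)?(\w+)", line)

# A repeated capturing group keeps only its last repetition.
print(capturing.group(1))      # (Red Aura)
print(capturing.group(2))      # The
print(non_capturing.group(1))  # The
```

Both patterns match the same text; the only difference is which group number the word ends up in.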
|
_________________ Windows Vista Home Premium SP2 32-bit
AMD Athlon Dual Core 4400+ 2.31 GHz
3 GB RAM
CMUD 3.34 |
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Fri Jan 21, 2011 1:34 am |
The following works just fine:
Code: |
^(?:\[AFK\])?(?:\([\w ]+\))?(?:\([\w ]+\))?(?:\([\w ]+\))?(?:\s)?([\w']+)
|
Apparently CMUD prefers that the expression be in parentheses, even though technically, in regex I don't think that's required for the repeat operator.
Thankfully, I don't need to use %2, because regex also offers us the ?: operator. |
|
|
|
charneus Wizard
Joined: 19 Jun 2005 Posts: 1876 Location: California
|
Posted: Fri Jan 21, 2011 2:11 am |
If you do exactly what I showed you, you wouldn't need to use %2, and it's a bit neater.
|
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Fri Jan 21, 2011 2:18 am |
Oops, I didn't paste the right one. This is what I have right now:
Code: |
^(?:\[AFK\])?(?:\([\w ]+\)){0,3}(?:\s)?([\w']+)
|
That is what I meant to post. If you read my post, you'll see I commented about not needing %2 because of the ?: operator; I just copied and pasted the wrong regex.
I actually came to that pattern on my own and didn't realize that you had surrounded the first portion of the pattern in parentheses as well. When I first read yours, I only noticed that you had added a beginning-of-line anchor.
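A quick way to sanity-check the final pattern outside the client is sketched below. It assumes Python's re module behaves like CMUD's engine for this pattern, and every sample line other than the High Priest one is made up for illustration.

```python
import re

# Final pattern from this post: optional [AFK] tag, up to three
# parenthesised flags, optional whitespace, then the captured word.
FINAL = re.compile(r"^(?:\[AFK\])?(?:\([\w ]+\)){0,3}(?:\s)?([\w']+)")

def first_word(line):
    """Return the captured word, or None if the line does not match."""
    m = FINAL.search(line)
    return m.group(1) if m else None

samples = [
    "(Sanctuary)(Red Aura) The High Priest of Althainia stands here.",
    "[AFK](Sanctuary) Chamenas is resting here.",  # hypothetical line
    "D'artagnan is standing here.",                # hypothetical line
    "(A)(B)(C)(D) Mob",                            # hypothetical: four flags
]

for line in samples:
    print(repr(first_word(line)))
```

The last sample returns None: with the ^ anchor in place, a fourth flag exceeds {0,3} and the engine has nowhere else to start the match.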
|
|
|
Fizgar Magician
Joined: 07 Feb 2002 Posts: 333 Location: Central Virginia
|
Posted: Fri Jan 21, 2011 3:23 am |
Giving the pattern a capturing group rather than a non-capturing group was just laziness on my part, but you are 100% correct, so all's good; sorry about that. As for what CMUD prefers versus what regex requires: the way I was taught, unless you put the quantifier after a group, it applies only to what is directly before it. In your original regex that is the literal ")" (the escaped \)). See the image below.
1. The first "(" was matched.
2. The word Sanctuary was matched by the character class [\w ]+.
3. The first ")" was matched, but the {0,3} quantifier applies only to that ")", not to the opening "(" and what falls in between.
4. The pattern next tries to match an optional whitespace character followed by one or more word characters, but the next character in the string is the "(" before Red. So the engine backtracked: [\w ]+ gave up the final "y" of Sanctuary, \){0,3} matched zero times, and (\w+) captured that "y".
That said, there is no bug in CMUD regarding the original pattern; it was matching exactly what it was supposed to. Sorry about the blurred image, I scaled it down rather than going to the trouble of changing desktop resolutions before getting the shot.
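The steps above can be reproduced outside CMUD. This is a minimal sketch using Python's re module, on the assumption that it backtracks the same way as CMUD's engine for these patterns; the sample lines are shortened versions of the ones in the first post.

```python
import re

# Original pattern: {0,3} binds only to the escaped ")" right before it.
original = re.compile(r"\([\w ]+\){0,3}(?:\s)?(\w+)")
# Grouped pattern: {0,3} repeats the whole "(flag)" unit.
grouped = re.compile(r"(?:\([\w ]+\)){0,3}(?:\s)?(\w+)")

one_flag = "(Sanctuary) The High Priest of Althainia stands here."
two_flags = "(Sanctuary)(Red Aura) The High Priest of Althainia stands here."

for line in (one_flag, two_flags):
    o = original.search(line)
    g = grouped.search(line)
    # Print the whole match and the captured word for each pattern.
    print(repr(o.group(0)), repr(o.group(1)), "|", repr(g.group(0)), repr(g.group(1)))
```

With two flags, the original pattern's overall match is just "(Sanctuary" and the capture is "y", while the grouped pattern consumes both flags and captures "The".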
|
Last edited by Fizgar on Fri Jan 21, 2011 6:47 am; edited 1 time in total |
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Fri Jan 21, 2011 6:25 am |
Hmm, you're right, though we're already past that point. I'm unsure why I expected the repetition operator to know it was supposed to apply to a specific group of items unless I properly grouped them (as I did in the second example, which Charneus and I both arrived at). You give a good explanation of it, though, for anyone who might still be confused.
|
|
|
|
|
|