Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Tue Jun 10, 2008 11:23 pm

So I'm looking for a regex
 
that'll capture a variable number of words. There're a couple of options for this, the most obvious being ([\w ]+) but if the word is followed by a space (or more than one space), they'll all be captured as well. Something like ((?:\w+ )+) seems like it'll work better, but it'll still capture one too many spaces at the end. So, what regex will capture a string that both begins and ends with a word character? Here's some sample text for you to experiment with:

Code:
  [1533] bayberry bark      [2000] bellwort flower    [1244] black cohosh
  [1998] bloodroot leaf     [   8] blue ink           [   9] cloth
  [   1] crystal pentagon   [1999] echinacea          [2000] ginger root
  [1490] ginseng root       [1998] goldenseal root    [   1] green ink
  [1995] hawthorn berry     [ 505] irid moss          [ 407] kuzu root
  [2000] lady's slipper     [ 267] lobelia seed       [1994] myrrh gum
  [1909] prickly ash bark   [   7] red ink            [   1] rope
  [  64] skullcap           [ 552] valerian           [  19] venom sac
  [   3] yellow ink


My current train of thought involves \b, but I haven't actually tried anything with it yet. Answers on a postcard, please.
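
To make the problem concrete, here is a minimal Python sketch of the two patterns above run against the first sample line. Python's `re` module stands in for CMUD's regex engine here, which is an assumption, and the leading `\] ` is only there to anchor each attempt at the start of an item name.

```python
import re

# First line of the sample text; item names are padded with spaces.
line = "  [1533] bayberry bark      [2000] bellwort flower    [1244] black cohosh"

# ([\w ]+) runs on through the padding that follows each name.
naive = re.findall(r"\] ([\w ]+)", line)

# ((?:\w+ )+) requires a space after every word, so it ends on one space.
grouped = re.findall(r"\] ((?:\w+ )+)", line)

print(naive)
print(grouped)
```

Besides the trailing space, the second pattern also drops the final word of a line that has no trailing whitespace, since every word must be followed by a space.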
_________________
Rorso's syntax colouriser.

- Happy bunny is happy! (1/25)
Brenex
Beginner


Joined: 13 May 2008
Posts: 25

Posted: Wed Jun 11, 2008 12:05 am
 
My Lusternia one:

\[\s*(\d+)\]\s(\w+(?:\s+\w+)*)\s*

With Trigger Multiple Times in line selected it works perfectly fine in RegexBuddy without capturing the spaces.
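
A rough way to try this outside CMUD: the sketch below uses Python's `re.findall` as a stand-in for Trigger Multiple Times (an assumption; CMUD's engine may differ in details), with the literal brackets escaped.

```python
import re

line = "  [   1] crystal pentagon   [1999] echinacea          [2000] ginger root"

# One match per item: group 1 is the count, group 2 is the item name.
pattern = re.compile(r"\[\s*(\d+)\]\s(\w+(?:\s+\w+)*)\s*")
items = pattern.findall(line)

print(items)
```

The name group stops at the last word because `\s+\w+` can only continue when another word follows the spaces.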
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Wed Jun 11, 2008 4:27 am
 
First, don't bother trying to match it. The only things that are truly definite are that there will be at least one item per line, that the opening bracket will occur 2 spaces after the start of the line, and that the closing bracket will be 4 characters later. Use encapsulating triggers, then parse the recorded lines with %subregex.
_________________
The only good questions are the ones we have never answered before.
chamenas
Wizard


Joined: 26 Mar 2008
Posts: 1547

Posted: Wed Jun 11, 2008 12:58 pm
 
Vijilante wrote:
First, don't bother trying to match it. The only things that are truly definite are that there will be at least one item per line, that the opening bracket will occur 2 spaces after the start of the line, and that the closing bracket will be 4 characters later. Use encapsulating triggers, then parse the recorded lines with %subregex.


Could you explain why? Good learning experience!
_________________
Listen to my Guitar - If you like it, listen to more
Caled
Sorcerer


Joined: 21 Oct 2000
Posts: 821
Location: Australia

Posted: Wed Jun 11, 2008 2:07 pm
 
chamenas wrote:
Vijilante wrote:
First, don't bother trying to match it. The only things that are truly definite are that there will be at least one item per line, that the opening bracket will occur 2 spaces after the start of the line, and that the closing bracket will be 4 characters later. Use encapsulating triggers, then parse the recorded lines with %subregex.


Could you explain why? Good learning experience!

"The simplest answer is often the best."

Mind you, I do it the way Brenex suggested, but basically, there are certain problems with capturing rift data, and the way around them is to simply ignore the problem (multiple items per line). You either do this by firing the trigger multiple times per line, or you treat the entire block as a single line. Either or.
_________________
Athlon 64 3200+
Win XP Pro x64
Dharkael
Enchanter


Joined: 05 Mar 2003
Posts: 593
Location: Canada

Posted: Wed Jun 11, 2008 2:13 pm
 
I would use something like this.

Code:
^\s\s(?:\[((?>\s|\d(?!\s)){3,3}\d)\]\s\b([\w\s'\-]+)\b\s*)(?:\[((?>\s|\d(?!\s)){3,3}\d)\]\s\b([\w\s'\-]+)\b\s*)?(?:\[((?>\s|\d(?!\s)){3,3}\d)\]\s\b([\w\s'\-]+)\b\s*)?$

The heart of which is simply
Code:
(?:\[((?>\s|\d(?!\s)){3,3}\d)\]\s\b([\w\s'\-]+)\b\s*)

It matches between 1 and 3 of these groups, inclusive; the first must be at the start of the string, preceded by 2 spaces.

Fails pretty quick and removes the need for reparsing.
_________________
-Dharkael-
"No matter how subtle the wizard, a knife between the shoulder blades will seriously cramp his style."
Brenex
Beginner


Joined: 13 May 2008
Posts: 25

Posted: Wed Jun 11, 2008 3:39 pm
 
Dharkael, I'm using your trigger now, but as a heads up, it captures the whitespace in the numbers. Thanks for making me go look up Atomic Grouping and Look(?:aheads|behinds). I think I might read over regex again to learn some more new tricks.
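
The padding comes from the count group matching the full four-character field. A small Python sketch of just that group, with the atomic group swapped for a plain non-capturing group (an assumption made so it also runs on Python versions before 3.11, which lack atomic groups):

```python
import re

# Count field only: three characters that are padding or digits, then a final digit.
count_field = re.compile(r"\[((?:\s|\d(?!\s)){3}\d)\]")

raw = count_field.findall("  [   8] blue ink           [1533] bayberry bark")

# int() ignores surrounding whitespace, so the padding is easy to drop afterwards.
counts = [int(field) for field in raw]

print(raw)
print(counts)
```

In CMUD the equivalent clean-up would have to happen in the trigger's script, or the pattern could use `\s*(\d+)` for the count instead.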
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Wed Jun 11, 2008 3:58 pm
 
I really didn't want to use a function call for this if I didn't need to (yes, it's much simpler, but it's also probably much slower). However, Dharkael's trigger is an order of magnitude more complex than I asked for.

The main problem I originally wanted solving was avoiding capturing spaces in the words, and Dharkael's trigger should do that. My understanding is that his [\w\s'\-]+ will capture all those spaces, sure, until it gets to the opening [ of the next item (or the end of the line), and then realise that the gap between its final character (a space) and the next character ([ or $) isn't valid for \b, since both characters are non-word characters. So it backtracks to the last word-boundary, which was between the end of the final word and the beginning of the spaces, where \b matches successfully and \s* captures the rest of the spaces.

This does exactly what I wanted it to, and hoorays to it for that - but I wonder if there's a way to do it while avoiding the backtracking?
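
The effect described above can be seen in a small Python sketch (Python's `re` standing in for CMUD's engine, which is an assumption). The second pattern is one possible way to sidestep the backtracking, not something tested in CMUD: it only consumes a single space when another word follows it, so the padding is never captured in the first place.

```python
import re

line = "  [1909] prickly ash bark   [   7] red ink            [   1] rope"

# Greedy class plus \b: grabs the padding, then backs off to the last word boundary.
with_boundary = re.findall(r"\[\s*(\d+)\]\s\b([\w\s'\-]+)\b\s*", line)

# Words joined by single spaces: the padding is left for \s* to consume.
word_by_word = re.findall(r"\[\s*(\d+)\]\s(\w+(?: \w+)*)\s*", line)

print(with_boundary)
print(word_by_word)
```

Both give the same captures on this line; the difference is only in how much the engine has to give back before it gets there.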
oldguy2
Wizard


Joined: 17 Jun 2006
Posts: 1201

Posted: Wed Jun 11, 2008 9:05 pm
 
The one Brenex had wasn't bad but it won't work on things like "lady's slipper".

This one works fine and doesn't capture trailing spaces.

Code:
<trigger priority="10" repeat="true" regex="true" id="1">
  <pattern>\[\s*(\d+)\]\s(\w+'?(?>\s?\w+?)+)</pattern>
  <value>#addkey Rift %2 %1</value>
</trigger>
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

Posted: Wed Jun 11, 2008 9:20 pm
 
Off the top of my head.
Code:
#CLASS RiftCapture
#VAR Rift {} {}
#TRIG RiftCap {^Glancing into the rift you see:} {Rift=""}
#COND {} {#IF (%match(%line,"%dh")) {
 Rift=%subregex(@Rift,"\s*\[\s*(\d+)\] ([\w']+ ??)+\s*(?=\[)","\'2'=\'1'|")
 #CALL %vartype(Rift,5) //Not sure about the number for record var
 #STATE RiftCap 0
} {
 Rift=%concat(@Rift," ",%line)
}} {prompt|looplines|param=30|stop}
#CLASS 0
I am pretty sure you will find it is faster to do it this way. Adjusting the priorities and using an #ONINPUT for the state 0 trigger can make it even faster.
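
The same collect-then-parse idea, sketched in Python rather than zScript so it can be tried outside CMUD. The item pattern here is a simplified stand-in, not the one passed to %subregex above, and the dictionary plays the role of the Rift record variable.

```python
import re

# Lines as they would be recorded between the header and the prompt.
lines = [
    "  [1533] bayberry bark      [2000] bellwort flower    [1244] black cohosh",
    "  [2000] lady's slipper     [ 267] lobelia seed       [1994] myrrh gum",
    "  [   3] yellow ink",
]

# Treat the whole block as one string, then parse it in a single pass.
block = " ".join(lines)
item = re.compile(r"\[\s*(\d+)\] ([\w']+(?: [\w']+)*)")
rift = {name: int(count) for count, name in item.findall(block)}

print(rift)
```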
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Wed Jun 11, 2008 11:45 pm
 
I didn't really want to argue the relative merits of either method, since both are based on the same idea (a regex that doesn't capture the extra spaces) and beyond that, finding out which I prefer is a simple case of doing them both and seeing. But thanks for your regex suggestion, anyway. The ([\w']+ ??) is nice to look at.

And thanks for yours, Oldguy. It's pretty trivial to change Brenex's \w+ to [\w']+, and if you wanted to use this principle for other strings, not just the rift (the rift only came to mind because there was another thread about it), you'd need to do that anyway. Even if I don't end up using it exactly as it is there (not sure I prefer using a single optional apostrophe and optional spaces/word characters) it's definitely given me something to think about, and thanks for that.
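
A quick Python sketch of that change (again using `re.findall` as a stand-in for a repeating CMUD trigger, which is an assumption), showing what happens to "lady's slipper" before and after:

```python
import re

line = "  [2000] lady's slipper     [ 267] lobelia seed       [1994] myrrh gum"

# Brenex's pattern as posted: \w does not match the apostrophe.
original = re.compile(r"\[\s*(\d+)\]\s(\w+(?:\s+\w+)*)\s*")

# The same pattern with \w+ widened to [\w']+.
with_apostrophe = re.compile(r"\[\s*(\d+)\]\s([\w']+(?:\s+[\w']+)*)\s*")

before = original.findall(line)
after = with_apostrophe.findall(line)

print(before)
print(after)
```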
alluran
Adept


Joined: 14 Sep 2005
Posts: 223
Location: Sydney, Australia

Posted: Thu Jun 12, 2008 3:07 pm
 
I'd try:

Code:

^(?:\[\s{0,3}(\d+)\] (.*?)\s+)(?:\[\s{0,3}(\d+)\] (.*?)\s+)?(?:\[\s{0,3}(\d+)\] (.*?)\s+)?$


I dun see why people were trying all these super complex patterns...
May need a \s+ after the leading ^, not sure if the whitespace at start was in the output or not
_________________
The Drake Forestseer
Dharkael
Enchanter


Joined: 05 Mar 2003
Posts: 593
Location: Canada

Posted: Thu Jun 12, 2008 3:25 pm
 
Very simple... but it doesn't work.
alluran
Adept


Joined: 14 Sep 2005
Posts: 223
Location: Sydney, Australia

Posted: Thu Jun 12, 2008 3:47 pm
 
fires for me in 2.26
Brenex
Beginner


Joined: 13 May 2008
Posts: 25

Posted: Thu Jun 12, 2008 4:06 pm
 
It doesn't work because it doesn't take into account the two spaces the lines begin with. You have ^\[ and it should be ^\s\s\[
Fang Xianfu
GURU


Joined: 26 Jan 2004
Posts: 5155
Location: United Kingdom

Posted: Thu Jun 12, 2008 4:57 pm
 
No, it really doesn't work, even with that fix. That was one of the very first things I looked at, and it fires fine but captures wrong. If you give it the first line of example text I have there, you get %1=1533 and %2="bayberry" and that's it. I don't know why it's not backtracking to expand the non-greedy .*? match, but it's not.
Brenex
Beginner


Joined: 13 May 2008
Posts: 25

Posted: Thu Jun 12, 2008 6:38 pm
 
Weird, it works fine in RegexBuddy (bug?). I didn't test it in CMUD, since what I had changed it to worked fine anyway. Now that I've tried it in CMUD I see what you mean. This works for the first two:

Code:
^\s\s(?:\[\s{0,3}(\d+)\] (.*?)\s+)(?:\[\s{0,3}(\d+)\] (.*?)\s+)?


but adding:
Code:
(?:\[\s{0,3}(\d+)\] (.*?)\s+)?$


to make:
Code:
^\s\s(?:\[\s{0,3}(\d+)\] (.*?)\s+)(?:\[\s{0,3}(\d+)\] (.*?)\s+)?(?:\[\s{0,3}(\d+)\] (.*?)\s+)?$
doesn't work. Go figure, but then again I only picked up regex a few days ago anywho. Perhaps it doesn't work that way.
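
For comparison, here is how the full pattern behaves in Python's `re` module. This is not CMUD's engine, so it says nothing about why CMUD captures only "bayberry"; it only shows one property of the pattern itself: every group ends in a mandatory `\s+`, so with the `$` anchor it can only match a line that ends in whitespace.

```python
import re

pattern = re.compile(
    r"^\s\s(?:\[\s{0,3}(\d+)\] (.*?)\s+)"
    r"(?:\[\s{0,3}(\d+)\] (.*?)\s+)?"
    r"(?:\[\s{0,3}(\d+)\] (.*?)\s+)?$"
)

line = "  [1533] bayberry bark      [2000] bellwort flower    [1244] black cohosh"

no_trailing = pattern.match(line)     # line ends in a word character
trailing = pattern.match(line + " ")  # same line with one trailing space

print(no_trailing)
print(trailing.groups() if trailing else None)
```

Whether the lines CMUD sees carry trailing whitespace is worth checking before blaming the engine.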
All times are GMT