Zugg Software :: View topic

Posted: Tue Jun 10, 2008 11:23 pm

that'll capture a variable number of words. There are a couple of options for this, the most obvious being ([\w ]+), but if the last word is followed by a space (or more than one space), those spaces will be captured as well. Something like ((?:\w+ )+) seems like it'll work better, but it'll still capture one space too many at the end. So, what regex will capture a string that both begins and ends with a word character? Here's some sample text for you to experiment with:
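To make the trailing-space problem concrete, here's a rough Python sketch of the two patterns mentioned above. The sample string is made up for illustration (it is not the sample text from the post), and Python's re module stands in for the MUD client's regex engine, which may differ in details.

```python
import re

# Hypothetical input: an item name, padding spaces, then the next "[".
text = "myrtle leaf   ["

# ([\w ]+) takes every word character and space up to the bracket,
# so all of the padding ends up in the capture.
greedy = re.match(r"([\w ]+)", text).group(1)

# ((?:\w+ )+) requires exactly one space after each word, so it stops
# after the first padding space, which is still one space too many.
grouped = re.match(r"((?:\w+ )+)", text).group(1)

print(repr(greedy))
print(repr(grouped))
```

Neither capture ends on a word character, which is the problem the rest of the thread tries to solve.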

Beginner Joined: 13 May 2008 Posts: 25

My Lusternia one:

\[\s*(\d+)\]\s(\w+(?:\s+\w+)*)\s*

With Trigger Multiple Times in line selected it works perfectly fine in RegexBuddy without capturing the spaces.
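For anyone who wants to try this outside RegexBuddy, here's a minimal Python sketch. It assumes the square brackets in the pattern are meant as literals (so they're escaped here), uses re.findall as a stand-in for firing the trigger multiple times per line, and runs on a made-up sample line.

```python
import re

# Hypothetical rift line: two items, each padded with trailing spaces.
line = "  [1533] bayberry        [  20] myrtle leaf      "

# \w+(?:\s+\w+)* always ends on a word character; the final \s*
# eats the padding outside the capture group.
pattern = re.compile(r"\[\s*(\d+)\]\s(\w+(?:\s+\w+)*)\s*")

# findall returns one (count, name) tuple per item on the line.
items = pattern.findall(line)
print(items)
```

Each name comes back without trailing spaces, because the capture can only end after a \w+.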

SubAdmin Joined: 18 Nov 2001 Posts: 5182

First, don't bother trying to match it. The only things that are truly definite are that there will be at least one item per line, that the opening bracket will occur 2 spaces after the start of the line, and that the closing bracket will be 4 characters later. Use encapsulating triggers, then parse the recorded lines with %subregex.

Wizard Joined: 26 Mar 2008 Posts: 1547

Posted: Wed Jun 11, 2008 2:07 pm

Enchanter Joined: 05 Mar 2003 Posts: 593 Location: Canada

I would use something like this.

Beginner Joined: 13 May 2008 Posts: 25

Dharkael, I'm using your trigger now, but as a heads up, it captures whitespace in the numbers. Thanks for making me go look up Atomic Grouping and Look(?:aheads|behinds). I think I might read over regex again to learn some more new tricks.

Posted: Wed Jun 11, 2008 3:58 pm

I really didn't want to use a function call for this if I didn't need to (yes, it's much simpler, but it's also probably much slower). However, Dharkael's trigger is an order of magnitude more complex than I asked for.

The main problem I originally wanted solving was avoiding capturing spaces in the words, and Dharkael's trigger should do that. My understanding is that his [\w\s'\-]+ will capture all those spaces, sure, until it gets to the opening [ of the next item (or the end of the line), and then realise that the gap between its final character (a space) and the next character ([ or $) isn't valid for \b, since both characters are non-word characters. So it backtracks to the last word-boundary, which was between the end of the final word and the beginning of the spaces, where \b matches successfully and \s* captures the rest of the spaces.
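Here's a small Python sketch of the backtracking described above. Dharkael's full trigger isn't reproduced in this thread, so the pattern below is only a guess at its relevant part (the character class followed by \b and \s*), and the sample line is made up.

```python
import re

# Hypothetical rift line, including an item with an apostrophe.
line = "  [1533] bayberry        [  20] lady's slipper   "

# Assumed reconstruction: the greedy class first swallows the name AND
# the padding, then \b fails (space next to "[" or end of line), so the
# engine gives characters back until it reaches the end of the last word.
pattern = re.compile(r"\[\s*(\d+)\]\s([\w\s'\-]+)\b\s*")

items = pattern.findall(line)
print(items)
```

The captures end on a word character, but only after the engine has backed off over every padding space one at a time.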

This does exactly what I wanted it to, and hoorays to it for that - but I wonder if there's a way to do it while avoiding the backtracking?

Wizard Joined: 17 Jun 2006 Posts: 1201

The one Brenex had wasn't bad but it won't work on things like "lady's slipper".

This one works fine and doesn't capture trailing spaces.

SubAdmin Joined: 18 Nov 2001 Posts: 5182

Off the top of my head.

Posted: Wed Jun 11, 2008 11:45 pm

I didn't really want to argue the relative merits of either method, since both are based on the same idea (a regex that doesn't capture the extra spaces) and beyond that, finding out which I prefer is a simple case of doing them both and seeing. But thanks for your regex suggestion, anyway. The ([\w']+ ??) is nice to look at.

And thanks for yours, Oldguy. It's pretty trivial to change Brenex' \w+ to [\w']+, and if you wanted to use this principle for other strings, not just the rift (the rift only came to mind because there was another thread about it), you'd need to do that anyway. Even if I don't end up using it exactly as it is there (not sure I prefer using a single optional apostrophe and optional spaces/word characters), it's definitely given me something to think about, and thanks for that.
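As a quick illustration of that change, here's a Python sketch comparing Brenex's pattern with the [\w']+ variant. The brackets are assumed to be escaped literals and the sample line is invented for the example.

```python
import re

# Hypothetical rift line with an apostrophe in the first item name.
line = "  [  20] lady's slipper     [1533] bayberry"

# Original: \w+ cannot cross the apostrophe, so the capture stops early.
original = re.compile(r"\[\s*(\d+)\]\s(\w+(?:\s+\w+)*)\s*")

# Variant: allow apostrophes wherever a word character is allowed.
variant = re.compile(r"\[\s*(\d+)\]\s([\w']+(?:\s+[\w']+)*)\s*")

first_original = original.search(line).group(2)
items = variant.findall(line)

print(first_original)
print(items)
```

The same substitution works for any other characters that can appear inside an item name, such as a hyphen.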

Posted: Thu Jun 12, 2008 3:07 pm

I'd try:

Enchanter Joined: 05 Mar 2003 Posts: 593 Location: Canada

Very simple... but it doesn't work.

Posted: Thu Jun 12, 2008 3:47 pm

fires for me in 2.26

Beginner Joined: 13 May 2008 Posts: 25

It doesn't work because it doesn't take into account the two spaces the lines begin with. You have ^\[ and it should be ^\s\s\[

Posted: Thu Jun 12, 2008 4:57 pm

No, it really doesn't work, even with that fix. That was one of the very first things I looked at, and it fires fine but captures wrong. If you give it the first line of example text I have there, you get %1=1533 and %2="bayberry" and that's it. I don't know why it's not backtracking to expand the non-greedy .*? match, but it's not.

Beginner Joined: 13 May 2008 Posts: 25

Weird, it works fine in RegexBuddy (bug?). I didn't test it in CMUD, since what I had changed it to worked fine anyway. Now that I've tried it in CMUD I see what you mean. This works for the first two: