Register to post in forums, or Log in to your existing account
 

Play RetroMUD
Post new topic  Reply to topic     Home » Forums » CMUD Beta Forum
Dharkael
Enchanter


Joined: 05 Mar 2003
Posts: 593
Location: Canada

PostPosted: Wed May 07, 2008 4:44 am   

IDEA:Optimized Stringlists and Database Variables in Triggers
 
Inspired by this thread
I was thinking you could make an option to optimize stringlist and database in triggers by to reduce the number of backtrackings necessary to match and more importantly the number of backtracking necessary to fail.

Using a Patricia trie like the one shown here

Using that example you could take a string list containing :
romane
romanus
romulus
rubens
ruber
rubicon
rubicundus


Which normally would would make a regex like
Code:
(?:romane|romanus|romulus|rubens|ruber|rubicon|rubicundus)

you could transform it into something like
Code:
r(?>om(?>ulus|an(?>e|us))|ub(?>e(?>r|ns)|ic(?>on|undus)))

This transformation would lower the average time to match and even more so the time to fail.
On larger stringlist and DB variables the speed gains would, I think, be tremendous.
_________________
-Dharkael-
"No matter how subtle the wizard, a knife between the shoulder blades will seriously cramp his style."
Reply with quote
shalimar
GURU


Joined: 04 Aug 2002
Posts: 4691
Location: Pensacola, FL, USA

PostPosted: Wed May 07, 2008 5:19 am   
 
I can only imagine how ugly it would be for my thousand+ strong list..

Would it not be timely to form the regex you speak of from the list?
Especially when the list continues to expand?
_________________
Discord: Shalimarwildcat
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Wed May 07, 2008 6:38 am   
 
Well, I wouldn't be surprised if the PCRE library is already doing that kind of optimization itself, so I don't think CMUD needs to do it too. But Vijilante might have more insight into this. Remember that PCRE has a "compile" step that converts the initial regular expression into a compiled pattern. I think this is the point where PCRE is building this kind of search tree internally. CMUD keeps the trigger compiled until you change it, so this is basically already being done.
Reply with quote
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

PostPosted: Wed May 07, 2008 10:53 am   
 
The algorithm used for regex matching actually specifies a list syntax as an if..else chain. It is sometimes important to use the order of the items of the list to control matching, and the PCRE respects that.

With a short set of words like you have shown the gains would be small, but they would be there. With longer phrases the gains would be more noticeable, but with longer phrases you can often find common portions to group together and making some portions optional achieves nearly the same effect. Those small gains from controlling backtracing and the match path would probably be matched by a loss of speed from the extra layers of nesting. Making the net gain/loss near 0.
_________________
The only good questions are the ones we have never answered before.
Search the Forums
Reply with quote
Dharkael
Enchanter


Joined: 05 Mar 2003
Posts: 593
Location: Canada

PostPosted: Fri May 09, 2008 3:45 am   
 
shalimar wrote:
I can only imagine how ugly it would be for my thousand+ strong list..

It's not very pretty, you wouldn't want to write it by hand.
shalimar wrote:

Would it not be timely to form the regex you speak of from the list?
Especially when the list continues to expand?

Yeah it would definitely take more time to reformulate the regex, but with the right data structure, ie a Patricia Trie it doesn't take that long.
And unless your list is being changed every few lines or so you're still definitely gonna reap gains from it.

Vijilante wrote:
Those small gains from controlling backtracing and the match path would probably be matched by a loss of speed from the extra layers of nesting. Making the net gain/loss near 0.

That's definitely not the case, especially for larger lists.
Using Shalimar's data I filled 2 variables with both a standard alternation regex like the one CMUD currently uses on lists and one using the technique i described above.
Using the %regex function to test it out, The alternation regex's elapsed time was always a multiple of the Trie regex and the longer the string it was tested on the greater the magnitude
I wasn't really sure how to test the triggers so I made REGEX triggers with the same two variables and used CTRL-Q to test them while toggling between the enabled status of the triggers.
and using #CLR to reset each time. On my system the Trie regex returned approx 7.2 and Alternation regex returned 10.2
No doubt you guys have much better ways to test it.
But definitely it doesn't cancel out.
There is a difference in size as well.
The Alternation regex is 19289 characters
The Trie Regex is 16268
I have the code and can post it if you wan't but its pretty large because of the regex variables.

EDIT:I threw both patterns into RegexBuddy and to match large McSweeney bodyguard which i guess alphabetically is about halfway.
It took 671 steps to match the alternation pattern and only 55 to match the Trie pattern.
I really can't see how this could be a bad thing.
And in the case where there are no common prefixes whatsoever (the longer the list the more unlikely that is) the pattern it makes is identical to the naked alternation pattern.
Also I don't think the trie pattern needs to be sorted so that should save some time(although I think sorting might impove its speed, I'll have to consider and do some testing)
_________________
-Dharkael-
"No matter how subtle the wizard, a knife between the shoulder blades will seriously cramp his style."
Reply with quote
Seb
Wizard


Joined: 14 Aug 2004
Posts: 1269

PostPosted: Wed Jul 23, 2008 11:35 pm   
 
So, the PCRE hasn't compiled an optimal version of
Code:
(?:romane|romanus|romulus|rubens|ruber|rubicon|rubicundus)
...
Reply with quote
Display posts from previous:   
Post new topic   Reply to topic     Home » Forums » CMUD Beta Forum All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

© 2009 Zugg Software. Hosted by Wolfpaw.net