 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Jan 26, 2007 5:50 pm
Regular expression algorithm research |
Argh- I hit edit instead of quote- sick as a dog and trying to post doesn't work- Zugg, Vij, is there a rollback? left what I could of the original... (Tarn)
Given string A and string B, create a regular expression that will match both strings. It should create the most specific regex possible (in other words, not just something simple like .* which would match anything).
Seems like with all of the world using regular expression for several decades that such an algorithm should already exist. But I haven't had much luck searching for it.
....
For example, here is how I wanted it to work in the Trigger wizard:
1) Select New Trigger pattern.
2) From the pull-down list of past MUD lines, select a line like this:
Zugg tells you "Hi"
and click the Create Pattern button.
3) Now, select another line from the pull-down list. This time, select a line like this:
bortaS tells you "Hi there"
and click the Merge Pattern button.
Now I want the wizard to be smart enough to create a trigger like this:
%w tells you "*" |
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5187
|
Posted: Fri Jan 26, 2007 7:39 pm |
Given some time I could come up with something. I sort of did a portion of that way back in that trigger wizard I posted in the zMud beta forum. The one that nobody could understand. However, that had the user picking out the portions that should be wildcarded.
Just the quick thoughts off the top of my head are 2 sets of circumstances:
1. The single spaces between words that make up normal language.
Code: |
2. The column formatted
situaton, as seen
in this case. |
Obviously I exageratted it quite a bit, but looking for multiple spaces could give a good indication to use a fixed width pattern. Otherwise it is a matter of using a word by word comparison. |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Jan 26, 2007 8:41 pm |
I actually think it needs to be done on a character by character basis (for example, the trailing "s" in String A above). My guess is that I'll have to build some sort of tree structure, similar to what the regex code builds.
I am also going to search on "string diff" algorithms and see what I can find. But I think my dreams about what I wanted the trigger wizard to be might be a bit hard to achieve.
My idea was to allow the user to select multiple lines from the MUD and CMUD would automatically figure out the appropriate wildcards. But after looking at some regex tools that are available, I think I might be forced to make the user highlight the parts that are different.
But I haven't given up yet! |
|
|
 |
bortaS Magician

Joined: 10 Oct 2000 Posts: 320 Location: Springville, UT
|
Posted: Sat Jan 27, 2007 12:24 am |
Zugg,
There is a tool that I use called RegexBuddy. This could give you pretty good idea on how to do the trigger wizard. My favorite feature in this program is the Create tab. It gives you a tree with the current tags. I think something like this would be quite useful to the trigger wizard.
http://regexbuddy.com/
Btw, this is coded in Delphi. |
|
_________________ bortaS
~~ Crusty Klingon Programmer ~~ |
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Sat Jan 27, 2007 12:47 am |
Yep, that was one of the links that I saw. It's very close to what I'm doing for part of the trigger wizard. But it doesn't have the features that I want for selecting lines from the MUD. It's a good start, but I want more
Of course, CMUD won't have a lot of the stuff that RegexBuddy has. I won't be doing any of the "Use", "Library", "Grep", "Replace", etc stuff. It will be more geared towards trigger patterns. And it will have more features for dealing with the () saved subpatterns, like assigning a named pattern.
For example, here is how I wanted it to work in the Trigger wizard:
1) Select New Trigger pattern.
2) From the pull-down list of past MUD lines, select a line like this:
Zugg tells you "Hi"
and click the Create Pattern button.
3) Now, select another line from the pull-down list. This time, select a line like this:
bortaS tells you "Hi there"
and click the Merge Pattern button.
Now I want the wizard to be smart enough to create a trigger like this:
%w tells you "*"
Now, you can highlight the %w part of the trigger and select Save Pattern and it will put the () around it and mark this as subpattern %1. If you fill in the Name field for this subpattern (enter the name: Target) then the wizard will show this pattern:
($Target:%w) tells you "*"
You can also manually highlight text and convert it to wildcards, but I wanted this kind of automatted approach where the user just selects a bunch of lines from the MUD that they want to trigger on.
I'm not sure if I will be showing the full "tree view" of the trigger pattern hierarchy. Also, I have to decide if I'm just going to support the normal CMUD trigger syntax, or if I will also support RegEx, or both. For beginning CMUD users I want the trigger wizard to use the simpler CMUD trigger pattern syntax and not use RegExs. So a RegEx wizard might come later.
Btw, I find it highly amusing that they are selling RegexBuddy for the same price as CMUD, and yet CMUD will have something similar to RegexBuddy built in as a pretty minor feature compared to the rest of the program's features. However, I also find it highly annoying that they don't have a trial version. Unfortunately, they lost a potential customer (me) because of that. I don't buy *any* software I can't try first, no matter what kind of return policy it has.
Also, honestly, I'm not sure someone new to Regular Expressions would actually find RegexBuddy that much easier. The author still uses pretty "geeky" terminology for the descriptions, and the tree view can be a bit hard to follow and understand. Fortunately, since I'm not trying to build a completely general purpose "do it all" tool, I just need to focus on the trigger pattern and MUD aspects of this and try to make it easier for MUD players. |
|
Last edited by Zugg on Sat Jan 27, 2007 12:53 am; edited 1 time in total |
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Sat Jan 27, 2007 12:50 am |
Oh, and btw, it turns out that the "Merge Pattern" stuff is really similar to writing a "diff" program. I've got a couple of links to diff algorithms that I'm reading to try and understand it all. But it you think of a single character in a trigger pattern as a "line in a file", doing a "diff" between two trigger patterns is much like doing a diff between two different files. Then, instead of creating a list of inserted and deleted lines, I just use that information to create the pattern wildcards. The lines that are the same between the two become the literal characters in the pattern.
|
|
|
 |
bortaS Magician

Joined: 10 Oct 2000 Posts: 320 Location: Springville, UT
|
Posted: Sat Jan 27, 2007 1:29 am |
"Also, honestly, I'm not sure someone new to Regular Expressions would actually find RegexBuddy that much easier. The author still uses pretty "geeky" terminology for the descriptions, and the tree view can be a bit hard to follow and understand. Fortunately, since I'm not trying to build a completely general purpose "do it all" tool, I just need to focus on the trigger pattern and MUD aspects of this and try to make it easier for MUD players."
I figured as much, as your target audience is quite different from RegexBuddy. I just like the environment to test different things. I hear ya on the automated stuff for new users. That would make the trigger experience more pleasant for newbies. |
|
_________________ bortaS
~~ Crusty Klingon Programmer ~~ |
|
|
 |
Fang Xianfu GURU

Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Sat Jan 27, 2007 6:39 am |
Zugg wrote: |
Btw, I find it highly amusing that they are selling RegexBuddy for the same price as CMUD, and yet CMUD will have something similar to RegexBuddy built in as a pretty minor feature compared to the rest of the program's features. |
This is why we love Zuggsoft, I guess. |
|
|
 |
Rorso Wizard
Joined: 14 Oct 2000 Posts: 1368
|
Posted: Sat Jan 27, 2007 11:08 am |
The easiest way to understand the principle of diffing in my opinion is to think about it like this:
We have two sequences(lists) of data, A and B.
A = <2,3,4>
B = <2,4,5>
What a diff tool does is to find "the longest common subsequence". How does this work? Say we now have two sets, C and D. C contains all possible subsequences of A, and D contains all possible subsequences of B.
Then we have:
C = { <2>, <3>, <4>,
<2, 3>,
<2, 4>,
<3, 4>,
<2, 3, 4>
}
and
D = { <2>, <4>, <5>,
<2, 4>,
<2, 4, 5>,
<2, 5>,
<4, 5>
}
Now what needs to be done is find the longest common subsequence. As the subsequences must be common anyway
the first thing to easen this is to remove everything that is not common:
C = { <2>, <4>,
<2, 4>,
}
and
D = { <2>, <4>,
<2, 4>
}
Next you identify the longest common subsequence:
Answer = <2, 4>
So the sequence <2, 4> both have in common. This must then mean that line 3 was removed when making version B from A, and line 5 was inserted into B. |
|
|
 |
Seb Wizard
Joined: 14 Aug 2004 Posts: 1269
|
Posted: Sat Jan 27, 2007 2:08 pm |
Zugg wrote: |
For example, here is how I wanted it to work in the Trigger wizard:
1) Select New Trigger pattern.
2) From the pull-down list of past MUD lines, select a line like this:
Zugg tells you "Hi"
and click the Create Pattern button.
3) Now, select another line from the pull-down list. This time, select a line like this:
bortaS tells you "Hi there"
and click the Merge Pattern button.
Now I want the wizard to be smart enough to create a trigger like this:
%w tells you "*"
|
How would CMUD know you wanted anything between the double quotes? If you did a character by character diff, you would see the differences are in the first word and in the ' there'. So if CMUD is trying to create the most specific trigger that matches both lines (which is what you said you wanted), it should give the result:
%w tells you "Hi{ |there}"
Also, a couple of minor ideas you probably already considered:
It should be possible to type into both lines - both altering lines from the MUD, and just typing a line from scrath.
It should be possible to select a line or part of a line in the MUD output window and the Editor window, and to right-click on it and select Send to Trigger Wizard... Then it would ask you which line in the trigger wizard to replace.
It might be nice to allow more than just two lines, although it might make the diff a bit harder. |
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5187
|
Posted: Sat Jan 27, 2007 4:10 pm |
I think before you do too much programming for this you should research lexical analyzers. Here is my thnking about a few made up common lines:
Your stab scratches the pegasus.
Your slash mangles an ugly troll.
They are similar enough in wording that a trigger might be built by a difference assessment if it knows a little bit about English. The correct trigger would be:
^Your (?:stab|slash) (?:scratches|mangles) (?:a|an|the) .*\p$
Of course the basic difference perspective would result in: ^Your .*$ |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
 |
Fang Xianfu GURU

Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Sat Jan 27, 2007 4:23 pm |
Viji's hit the problem. Sometimes you'd want a trigger that does
(Your stab scratches the pegasus.|Your slash mangles an ugly troll.) and sometimes you'll want ^Your (?:stab|slash) (?:scratches|mangles) (?:a|an|the) .*\p$.
The most specific regex you can come up with using two lines will use (blah|blah) rather than (\d+) (and the corresponding CMUD syntax), and the system needs to know which to use when. Perhaps, using the example above, you want the trigger to fire ONLY when it's a pegasus or a troll you're fighting, rather than anything. |
|
|
 |
Tarn GURU
Joined: 10 Oct 2000 Posts: 873 Location: USA
|
Posted: Sat Jan 27, 2007 7:05 pm Re: Regular expression algorithm research |
Zugg wrote: |
Given string A and string B, create a regular expression that will match both strings. It should create the most specific regex possible (in other words, not just something simple like .* which would match anything).
Seems like with all of the world using regular expression for several decades that such an algorithm should already exist. But I haven't had much luck searching for it.
|
I wrote one of these a few years ago. What I did was to build up an edit script to transform one string into the other (op as each character is read are Equal/Accept, Substitute, Insert, Delete). Then each stretch of ops that weren't "Accept" were replaced with wildcards. Once you know where the wildcards are, you can see if all of the examples the user provided fit a narrower pattern (such as always being a positive integer).
There are lots of resources on the net and in print on construction of edit scripts (easiest way to visualize is as a 2D matrix with one string forming the basis of each axis, with the numbers in the matrix being either ops as you move through the string or net cumulative cost).
The problem gets harder when you have more than two strings, of course (I thought about some extensions but never got around to coding them). Even automating the simple cases would be nice, though.
I can dig out some links if you have trouble finding resources. Many common books like CRC's "Algorithms and Theory of Computation Handbook" also have summaries.
(crawling back into bed, because posting when feverish clearly isn't working.)
-Tarn |
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Mon Jan 29, 2007 8:14 pm |
Tarn: I'd be interested in any links that you have (but stay in bed and get better first!)
Rorso: I think you might have been reading the same web site as me. I found this great site that talks about How Diff Works that talks about how to computer the longest common sequence. Unfortunately, the author never finished the series of articles.
Seb: Yes, you will definitely be able to manually enter strings into both fields. And yes, there will be a right-click option for selecting text from the MUD window and making a trigger out of it. And yes, it should let you select as many lines as you want...refining the trigger pattern each time.
Vijilante: You are correct that the "art" of this is all in determining the best wildcard. The Diff algorithm will convert the strings into the "Edit Script" that Tarn mentioned. Once it has the list of Insert/Delete/Keep statements, then it will know which text is common between the strings, and which text isn't. So then it will need to apply some rules on the different text to create the best wildcards. My guess is that creating the wildcards will be less "algorithmic" and more subjective. I'll have to come up with some rules, like splitting text into words, and deciding when to use a wildcard like \w and when to use a string list.
Obviously this kind of fancy wizard is going to be complicated and will take a while to get working. I wonder what kind of priority this kind of feature should have compared to all of the other things I need to work on. Maybe I need to start with something simpler and then add the fancier "automatically create pattern based upon selected MUD lines" features later? |
|
|
 |
Seb Wizard
Joined: 14 Aug 2004 Posts: 1269
|
Posted: Tue Jan 30, 2007 12:37 am |
Yes, I think starting with something simpler might be a good idea. The idea of doing a diff between lines of text is a good one, but not really necessary - more, the icing on the cake.
How about a wizard that asks you for just one line of text, then asks you to select the parts of that line that can change. Is it possible to override the Ctrl-Left-Click behaviour, such that it is possible to select multiple pieces of text in the line at once? Otherwise, the wizard could ask you to select the first piece and click Next (to select the next one) or Done when you are finished selecting bits. If you click Next it asks you for the next one. Each one you select is coloured. When you click done it takes you through each one and asks you how this bit of information can change. Can this bit of text take just a few values, or can it be any word, any number, etc. It also asks you if you want to save the information to a parameter. If you say yes, it gives you the opportunity to name the parameter. Finally, it gives you the opportunity to put in some test strings and check they match, much like the zMUD Trigger Test tab. If they don't (or even if they do) you can go Back in the wizard and change stuff.
As usual with wizards though, it is nice to be able to see some relevant stuff from the previous screen (and something often not done by the designers). And going back 4 or 5 steps is rather annoying, so a way to just change what you want from the final screen is nice. |
|
|
 |
Seb Wizard
Joined: 14 Aug 2004 Posts: 1269
|
Posted: Tue Jan 30, 2007 1:11 am |
Hmm, actually what would make my suggestion in my last post more like what Zugg was thinking of before is being able to test multiple strings on the same page. And being able to tweak the pattern and seeing when any of the strings cease to match the pattern (immediately). I have a fair idea how I would lay out the GUI if I was doing it but I'm not sure if I can explain it (or even how it would really look once I finished). Still I'll have a go.
1. User clicks trigger wizard from a menu.
2. If some text is already selected in the MUD window that goes into a one line text box in a small window near the top of the screen.
3. The user is given the opportunity to either (a) select text from the MUD window by clicking a button and the wizard hides behind the MUD window (and when the user lets go of the mouse, focus returns to the trigger wizard and it comes to the front) or (b) type into the text box.
4. I sort of described this part (the selecting part) earlier, but I think it would be nice if when you press Next to get here, the window height is increased with the new part / panel. If it looks like there isn't going to be enough space for everything below, use a FoldPanelBar.
5. Add the trigger test folding panel bar to the bottom of the window with space for 3 strings or so.
Hmm, that maybe sounds a bit confused. Hopefully you get the general idea. |
|
|
 |
Rorso Wizard
Joined: 14 Oct 2000 Posts: 1368
|
Posted: Tue Jan 30, 2007 1:29 am |
Zugg wrote: |
Tarn: I'd be interested in any links that you have (but stay in bed and get better first!)
Rorso: I think you might have been reading the same web site as me. I found this great site that talks about How Diff Works that talks about how to computer the longest common sequence. Unfortunately, the author never finished the series of articles.
|
I haven't read that website but it looks very nice. I have read some websites about the general lcs algorithm.
If you have the book "Data Structures & Their Algorithms" by Harry R. Lewis and Larry Denenberg there is an exercise on page 478 that gives some insight(but not too much) about diff. More interesting is perhaps the reference in the reference list:
J. W Hunt and T. G. Szymanski "A fast Algorithm for Computing Longest Common Subsequences," Communications of the ACM 20 (1977), pp. 350-353( this might be the article: http://portal.acm.org/citation.cfm?id=359581.359603 )
In the red Dragon Book on page 155-156 there's an exercise about lcs. Nothing really special. There's a reference to the above article though. Also there is an algorithm showed to count the "edit distance" between two strings. Which is defined as "minimum number of character insertions, deletions, and replacements required to transform x into y". |
|
|
 |
mr_kent Enchanter
Joined: 10 Oct 2000 Posts: 698
|
Posted: Tue Jan 30, 2007 8:52 am |
Fang Xianfu wrote: |
Viji's hit the problem. Sometimes you'd want a trigger that does
(Your stab scratches the pegasus.|Your slash mangles an ugly troll.) and sometimes you'll want ^Your (?:stab|slash) (?:scratches|mangles) (?:a|an|the) .*\p$. |
Furthermore,
^Your * (?:scratches|mangles) (*).
Given even these very short example lines, there are just too many variants. Trying to automatically create the correct trigger that the user wants given several lines just seems to me a futile exercise. To account for every possible configuration, you'd need a drop down list to allow the user to choose the correct pattern and that goes way beyond a simple tool. In order to choose the correct one, the user would need to be able recognize the correct syntax anyway and if he/she knows that, it would be quicker to just select the text to be replaced by wildcards. This is a neat idea in my opinion, but not something to spend time on right now.
Just my thoughts on this. |
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5187
|
Posted: Wed Jan 31, 2007 4:58 am |
I think this should be a very low priority. There are just too many possible combinations to what someone wants and needs to do for it to be boiled down simply. I thought I came reasonably close nearly 2 years ago, but no one really seemed interested in playing around with it and giving me useful suggestion on how to make it better, so I dropped it. I even deleted my original files, and find that the pasted portion on the forum has some small copy/paste errors.
|
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
 |
Tarn GURU
Joined: 10 Oct 2000 Posts: 873 Location: USA
|
Posted: Mon Feb 05, 2007 8:00 pm |
Zugg wrote: |
Tarn: I'd be interested in any links that you have (but stay in bed and get better first!)
|
I live!
A web link that includes (JavaScript) source code if you View Source on the page:
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/
You can work examples and it will show deletions, insertions, substitutions, and accepts (the first two being dashes in either the first or second string as appropriate).
In print, Dan Gusfield, "Algorithms on Strings, Trees, and Sequences" is a five star book at Amazon (has one chapter on creating edit scripts, with good diagrams of the matrices during computation).
I don't know too much about the innards of diff tools, but believe that at least some of them use less intensive analysis of the strings they're comparing (because the most straightforward, powerful methods are of complexity len(str1) * len(str2) and that just isn't feasible for long source code files). This makes sense for diff because it's designed to look at files that are mostly identical, with small differences here and there. What we're discussing is more about discovering the nature of differences between significantly different (but short) bits of text.
-Tarn |
|
|
 |
Tarn GURU
Joined: 10 Oct 2000 Posts: 873 Location: USA
|
Posted: Mon Feb 05, 2007 8:08 pm |
Vijilante wrote: |
I think this should be a very low priority. There are just too many possible combinations to what someone wants and needs to do for it to be boiled down simply.
|
The fully general case is hard, but I think a few basic tools (say, automatic communication channel configuration) would go a long way for usability. The backend code could also help make other features such as mapper auto-config more powerful and easy to extend.
Quote: |
I thought I came reasonably close nearly 2 years ago, but no one really seemed interested in playing around with it and giving me useful suggestion on how to make it better, so I dropped it. I even deleted my original files, and find that the pasted portion on the forum has some small copy/paste errors. |
I can't find that post any more (but noticed it at the time). Got a link?
-Tarn |
|
|
 |
Rainchild Wizard

Joined: 10 Oct 2000 Posts: 1551 Location: Australia
|
Posted: Mon Feb 05, 2007 10:53 pm |
Why don't you throw this one to the community for now?
When zApp forms are implemented we'll be able to write our own packages that could make these kinds of wizards, provided we have necessary tools like 'get selected text from scrollback / clipboard' etc.
With a combination of zscript and vba we should be able to come up with something, which will do two things:
1) take some programming stress off you to concentrate on other important things which generate revenue directly
2) solidify the package library as the way of the future, giving them further reason to upgrade
And in 6 months if we haven't got something up and running and you've run out of things to do, you can come back to this and implement it then. |
|
|
 |
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|