Home » Forums » CMUD Beta Forum
tyebald
Newbie


Joined: 01 Apr 2010
Posts: 7

PostPosted: Sat Apr 03, 2010 4:02 am   

[3.14a] 'Clickable URLs' trigger fix(es)
 
This is definitely an issue for the URL trigger pattern. It may also be an issue for the email pattern: the email substitution exhibits the same potential for recursive substitution, but I have not yet tested it.

Duplication:
If a CMud user sets up window capturing in such a way that text gets #CAPtured from the main window into an alternate window, and then captured from the alternate window into another window, when the captured text causes the URL trigger to fire, CMud will enter an infinite loop.

Explanation:
The URL pattern matcher will invoke #SUB for any matching pattern it finds. However, it will leave the matching pattern in the "visible" text after the #SUB. This is fine in a single Window, but begins to fall apart once matching text gets captured to another Window. When #CAPtured, triggers will fire again if the Window is linked to the package containing the matching trigger pattern. For those that don't know, in the case of 'Clickable URLs', the default behavior for any newly-created Windows seems to be to link it and the two other default packages to the Window (unless steps are taken to prevent this), so there's a good chance this trigger pattern will be enabled.

Now, either the trigger does fire in the second window or it doesn't. I had the script debugger running against this issue at one point, but once the program enters the infinite loop, the debugger can't be brought into the foreground and my screen isn't big enough to be able to see everything all at once. And, there are other issues I've observed with #SUB (and am working on documenting), so it may not be firing at all in the second window. But once the third window is in the equation, the application hang is guaranteed.

Solution:
By creating a substituted pattern that does not, itself, match the pattern that initially triggered the substitution, this behavior can be mitigated. There are still #SUB issues relating to matching patterns being generated by output from "child" scripts (scripts executed by other scripts), or to the triggering pattern occurring multiple times on a single line (in the case of patterns which do not require matching the entire line, of course). But as long as the substituted text does not match the initial pattern, the recursion goes away.


This is the trigger I am currently using instead:

Code:
<trigger name="ActiveURL" priority="4880" trigontrig="false" regex="true" id="488">
  <pattern>\b((?:(https?|ftp|telnet)://|www\.|ftp\.)(?:www\.|ftp\.|)([\w\d#@%;$()~_?\+\-=&amp;\.]+)(?::\d+|)(?:/[\w\d:#@%;/$()~_?\+\-=&amp;\.]+|/|))(\s|[[:punct:]]|$)</pattern>
  <value>#SUB {%concat(~, %3, ~<~/a~>, %4, " ")}</value>
</trigger>

This trigger is contained in a Module that is contained in a Package that is linked to every Window (it's my Common Package). It does not have the recursion issue that causes the application hang. The RegEx pattern can likely be optimized. I've extended it to capture URLs that end with a /, so that the slash doesn't "dangle" on the outside of the substitution. The form of the #SUB command was just experimentation, based on the observed number of "concat" executions found in the compiled code between this approach and the original. The trailing space is a kludge, to handle the situation where multiple URLs on a single line trigger multiple substitutions.
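To illustrate why a substitution that no longer matches its own pattern stops the re-firing, here is a rough Python sketch. It is only a toy model, not how CMUD actually processes captures: the `URL` regex is a simplified stand-in for the trigger pattern, and `keep_full_url`, `keep_host_only`, and `capture_chain` are hypothetical names made up for the example. Each loop pass plays the role of one window receiving the captured line and running the trigger on it.

```python
import re

# Simplified stand-in for the URL trigger pattern (an assumption, not the shipped one).
# Group 1 is the host with any protocol and leading "www." stripped off.
URL = re.compile(r"(?:https?://|www\.)(?:www\.)?([\w-]+(?:\.[\w-]+)+)")

def keep_full_url(match):
    """Model of the problematic substitution: the visible text is still the full URL."""
    return match.group(0)

def keep_host_only(match):
    """Model of the fix: show only the host, which the pattern cannot match on its own."""
    return match.group(1)

def capture_chain(line, substitute, windows=3):
    """Run the trigger once per window; return the final text and the total firings."""
    fired = 0
    for _ in range(windows):
        line, count = URL.subn(substitute, line)
        fired += count
    return line, fired

if __name__ == "__main__":
    sample = "see http://www.example.com now"
    print(capture_chain(sample, keep_full_url))   # trigger fires again in every window
    print(capture_chain(sample, keep_host_only))  # trigger fires once, then goes quiet
```

In this model the full-URL substitution fires in every window the text reaches, while the host-only substitution fires once and is ignored afterwards.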


I would recommend either implementing a trigger pattern like the one above, or just disabling the triggers in the default 'Clickable URLs' package. The Help files accurately state that users should not modify the built-in packages, because any new updates can override those changes. And since the current default behavior will also link the default packages to any new Windows, until the triggers are either disabled or adjusted, this recursion issue will keep reintroducing itself.

Tyebald
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Sat Apr 03, 2010 5:37 pm   
 
Thanks for posting that. I'll take a look at your regex and will try to improve the pattern in the current package.

What does :punct: refer to in your above pattern?

Also, if you have a better pattern for the email matching, feel free to post it too. I'm certainly no regex expert.
tyebald
Newbie


Joined: 01 Apr 2010
Posts: 7

PostPosted: Wed Apr 07, 2010 2:40 am   
 
No worries. I'll take a look at the email one too. I think that one will be easier, as I think it will be simpler to extract a meaningful sub-string to use. I struggled trying to pull out just the ....name escapes me.... but the "main" part of the URL, without protocols, prefixes, or trailing content.

The [:name:] patterns are short-hand for POSIX character classes. I got tired of trying to read through [.\$% ...] to see if I had a missing value, and stumbled across this feature in the pcre site help, so I figured I'd give it a whirl. The last grouping,
Code:
(\s|[[:punct:]]|$)
, is my replacement for \b, because the \b word boundary pattern wouldn't include a trailing slash character as part of the URL. That's a personal preference issue, not a functional one. \b wouldn't trip up the pattern, but the final slash, if present, would wind up echo'd outside the substituted text. What I included in the above trigger matches a working version that would sub both ....uh....ok, had to remove the URL samples, 'cause they got me banned last time. :)
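The difference between ending on \b and ending on an explicit terminator group can be seen in a small Python sketch. A few assumptions: Python's `re` module has no `[[:punct:]]` class, so `string.punctuation` (ASCII punctuation) stands in for it, and `HOST_PATH` is a cut-down, hypothetical version of the URL body rather than the trigger pattern itself.

```python
import re
import string

# ASCII punctuation, escaped so it is safe inside a character class.
# This stands in for the POSIX [[:punct:]] class, which Python's re lacks.
PUNCT = re.escape(string.punctuation)

# Cut-down URL body (hypothetical): host, then an optional path or bare slash.
HOST_PATH = r"www\.[\w.-]+(?:/[\w./-]+|/|)"

# Variant 1: end on a word boundary.
with_boundary = re.compile(rf"({HOST_PATH})\b")

# Variant 2: end on whitespace, punctuation, or end of line.
with_terminator = re.compile(rf"({HOST_PATH})(\s|[{PUNCT}]|$)")

if __name__ == "__main__":
    line = "go to www.example.com/ now"
    # \b cannot sit between "/" and " ", so the slash is pushed out of group 1.
    print(with_boundary.search(line).group(1))
    # The terminator group accepts the space, so the slash stays in group 1.
    print(with_terminator.search(line).group(1))
```

With \b the regex has to back off to the boundary between "m" and "/", which is why the slash ends up outside the substituted text.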

There may still be legitimate URL patterns that should match but don't, with the above trigger pattern. To be honest, as I was working up my test jig to loop through all the URL patterns I could come up with, I stumbled across some additional issues with #SUB (at least, when invoked by a trigger) where it doesn't "see" text #ECHO'd (or #SAY'd or #SHOW'd, etc.) by a script if the script is executed via another script, rather than, say, from the command line. I'll cover that in detail in another post if that doesn't prove out to be user error. That's just my excuse for not having fully tested the trigger yet.
Vijilante
SubAdmin


Joined: 18 Nov 2001
Posts: 5182

PostPosted: Wed Apr 07, 2010 6:31 am   
 
Believe it or not, URL patterns are a bit of a sticky wicket. The gold standard for a regex that fully qualifies and parses all the different forms is about 5 pages long. Luckily, we don't have to parse the thing, just qualify it as a URL and make sure we have all the characters for it without extras.

Protocol: http allows for additional parameters after the path.
(?:(?:(https?)|news|ftp|telnet)://)?

Optional username and password. Not really valid for http, but keeping that from matching is beyond our needs.
(?:[a-zA-Z0-9_+=.-]+(?:\:[a-zA-Z0-9_+=.-]+)?@)?

Address: the required portion should be possessive and greedy, but we don't want to catch punctuation a user typed after the URL, such as:
"Go visit blah.blah.blah." so we can't make it possessive.
(?:(?:\d{1,3}\.){3}\d{1,3}|[a-zA-Z](?:[a-zA-Z0-9-]*\.[a-zA-Z0-9]+)+)

Optional port number
(?:\:\d+)?

Next is the path. This is all optional, and we will allow more characters than are traditionally used. Non-greedy * so trailing punctuation can be stolen back.
(?:/[^*:?"<>/ \011\012\015]*?)*

Condition on the http for optional parameters; maybe it should be more restrictive on the characters. Non-greedy + this time.
(?(1)(?:\?[^ \011\012\015]+?)?)

Finally steal back any trailing punctuation that a user may have typed after the URL. Use a look ahead because we don't want the character to be part of the match string. Matching the character is optional, but we can't use an optional section because we need to match to something here. Alternation list of possible characters or end anchors is the way to go.
(?=[?.!,"'`]*(?:\011| |$|\z))

Putting it all together:
(?:(?:(https?)|news|ftp|telnet)://)?(?:[a-zA-Z0-9_+=.-]+(?:\:[a-zA-Z0-9_+=.-]+)?@)?(?:(?:\d{1,3}\.){3}\d{1,3}|[a-zA-Z](?:[a-zA-Z0-9-]*\.[a-zA-Z0-9]+)+)(?:\:\d+)?(?:/[^*:?"<>/ \011\012\015]*?)*(?(1)(?:\?[^ \011\012\015]+?)?)(?=[?.!,"'`]*(?:\011| |$|\z))

I haven't run that through a full set of test cases. It probably matches a few things it shouldn't but is mostly restrictive enough. Here are the different lines I tested with.
Code:
http://blah.blah
http://blah.blah/
http://blah.blah/werpoi
no match the next, it is missing part of the address
ftp://123.123.123
ftp://123.123.123.123:7
abc@123.123.123.123
abc:def@123.123.123.123
abc@123.123.123.123. see that trailing period 3 times
abc@123.123.123.123.
http://blah.blah/weeero/23rd35y.
https://a.b?c.
https://a.b?c.def
https://a.b?c.def.,! give all 3 back!
nothing .in this. line. .should work!
but blah.blah will.
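For anyone who wants to poke at the combined pattern outside CMUD, here is a small Python harness that joins the pieces above and runs them over some of the test lines. It is a sketch under a couple of assumptions: Python's `re` is not the PCRE engine CMUD uses, so results could differ in edge cases, and `\z` is written as `\Z` because older Python versions only accept the uppercase form. `first_url` is a hypothetical helper name.

```python
import re

# The pieces from the post, joined in the same order.
# Assumption: \z is spelled \Z so the pattern compiles in Python's re.
URL = re.compile(
    r'(?:(?:(https?)|news|ftp|telnet)://)?'                # protocol; group 1 set only for http(s)
    r'(?:[a-zA-Z0-9_+=.-]+(?:\:[a-zA-Z0-9_+=.-]+)?@)?'     # optional user[:password]@
    r'(?:(?:\d{1,3}\.){3}\d{1,3}|[a-zA-Z](?:[a-zA-Z0-9-]*\.[a-zA-Z0-9]+)+)'  # IP or host name
    r'(?:\:\d+)?'                                          # optional port
    r'(?:/[^*:?"<>/ \011\012\015]*?)*'                     # optional path, non-greedy
    r'(?(1)(?:\?[^ \011\012\015]+?)?)'                     # query string, only if group 1 matched
    r'''(?=[?.!,"'`]*(?:\011| |$|\Z))'''                   # look ahead past trailing punctuation
)

def first_url(line):
    """Return the first URL-like match in the line, or None if nothing matches."""
    match = URL.search(line)
    return match.group(0) if match else None

if __name__ == "__main__":
    lines = [
        "http://blah.blah",
        "http://blah.blah/",
        "ftp://123.123.123",
        "abc@123.123.123.123. see that trailing period 3 times",
        "http://blah.blah/weeero/23rd35y.",
        "https://a.b?c.def.,! give all 3 back!",
        "nothing .in this. line. .should work!",
        "but blah.blah will.",
    ]
    for line in lines:
        print(repr(line), "->", first_url(line))
```

Printing the match next to each line makes it easy to see which trailing punctuation is handed back and which lines are rejected outright.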
tyebald
Newbie


Joined: 01 Apr 2010
Posts: 7

PostPosted: Fri Apr 09, 2010 4:42 am   
 
Sorry, wasn't trying to imply that Zugg's regex was insufficient to find matching URLs. It does fine at that, or well enough anyway (I'll take one line over 5 pages, personally). But once matched, there weren't any unique patterns (%1, %2, etc.) that would both not risk recursion (by being the same text string) and still be somewhat informative. It's the #SUBing back in of text that, itself, can still cause the original trigger to fire that causes the lock-up.

#SUBing in 'weblink' instead of the URL text would have accomplished the goal of not locking up. But I was aiming for a substring of the original pattern that would be a little more informative than just #SUBing in 'weblink', which meant reworking the regex pattern somewhat. What I aimed for was stripping off the leading protocol string, when present, and stripping off the leading 'www.', when present. And then I truncated the stuff at the end for good measure, since the full URL text still shows up when the MouseOver event fires. That leaves something that still kinda sorta looks like a URL, but will not match the original trigger pattern on its own. Which means if the substitution pattern happens to get its hands on the string of text again (which is what currently happens in the capturing scenario outlined above), it's no big deal because instead of recursing, it just ignores the string.

Tyebald