 |
ReedN Wizard
Joined: 04 Jan 2006 Posts: 1279 Location: Portland, Oregon
|
Posted: Wed Jun 18, 2008 9:41 am
What's the proper way to escape '@' in a regex trigger? |
I'm having an issue where I need to match '@' in the text, but the regex is treating it as a variable.
String I want to match:
S--@h++,H++,CE100%,W<-SE@16kts,C/S->SE@0,V -
Regex:
^S..@h.{2},H.{2},CE\d{1,3}\%,W<-\w{1,3}@\d{1,2}kts,C/S->\w{1,3}@\d{1,2},\w{1,5} -$
With the above regex the '@' is messing up the matching. I tried to put a '\' in front of it but that didn't work. The only thing I could think of to get it to work was to match the '@' with a '.', as in:
^S...h.{2},H.{2},CE\d{1,3}\%,W<-\w{1,3}.\d{1,2}kts,C/S->\w{1,3}.\d{1,2},\w{1,5} -$
Does anyone know the proper way of doing this when you have actual '@'s you want to match? |
|
|
 |
Fang Xianfu GURU

Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Wed Jun 18, 2008 10:52 am |
I think you need to quote them with ~ so that the CMUD parser will ignore them - this should be the only time that ~ is stripped too in case you need to use it. I don't have CMUD here to check that, but I guess you can try it and see :P
In case that doesn't work, in the meantime you can use a range to define a bunch of characters you don't want it to match, like [^\w\d!"£$%\^&*()_\-+=\[\]{}',.~#?/] or something. I don't think any of those need quoting other than the ones I've already done. |
|
|
 |
Larkin Wizard

Joined: 25 Mar 2003 Posts: 1113 Location: USA
|
Posted: Wed Jun 18, 2008 10:57 am |
I just tested in CMUD with the following code, and it fired just fine for me:
Code: |
#REGEX {^Hello \@ home\.$} {#SAY "Hiya."}
#SHOW "Hello @ home." |
Using ~ does not work in a regex pattern for escaping things like this.  |
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5182
|
Posted: Wed Jun 18, 2008 11:02 am |
It is a bit of a hoop to jump through. Use the octal code for them so CMud's parser has no idea what you are doing. The only time you really should have to do it is when the next characters could be interpreted as a variable name. In the case of your regex that would be the "@h", I changed them all anyways.
^S..\080h.{2},H.{2},CE\d{1,3}\%,W<-\w{1,3}\080\d{1,2}kts,C/S->\w{1,3}\080\d{1,2},\w{1,5} -$ |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
 |
Larkin Wizard

Joined: 25 Mar 2003 Posts: 1113 Location: USA
|
Posted: Wed Jun 18, 2008 1:49 pm |
I can see now that mine worked because I followed it with a space, but I think there's a bug here somewhere.
I tried this, and it didn't fire (which I consider to be a bug, personally):
Code: |
#REGEX {^Hello\@home\.$} {#SAY "Hiya."}
#SHOW "Hello@home." |
So, I tried this instead, and it still didn't fire for me:
Code: |
#REGEX {^Hello\080home\.$} {#SAY "Hiya."}
#SHOW "Hello@home." |
Am I missing something here? |
|
|
 |
ReedN Wizard
Joined: 04 Jan 2006 Posts: 1279 Location: Portland, Oregon
|
Posted: Wed Jun 18, 2008 2:06 pm |
I tried using the \080 but like Larkin I was unable to get that to work. I also verified that '~' doesn't work. It does work if I use [@], which matches a set of just one character. I guess that should be efficient enough, and it is what I'm currently using.
It does have me perplexed that there doesn't seem to be a way to escape this character in Cmud. Does this seem like a bug? |
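The [@] workaround above can be sanity-checked outside CMUD. This is a minimal sketch that uses Python's `re` module as a stand-in for PCRE (an assumption; CMUD is not involved, so it only confirms that the pattern is sound once each '@' reaches the regex engine as a literal). The names `LINE`, `PATTERN` and `line_matches` are made up for the illustration.

```python
import re

# The line the trigger should match, copied from the first post.
LINE = "S--@h++,H++,CE100%,W<-SE@16kts,C/S->SE@0,V -"

# Same pattern as the original, with each '@' wrapped in a one-character class.
PATTERN = re.compile(
    r"^S..[@]h.{2},H.{2},CE\d{1,3}\%,W<-\w{1,3}[@]\d{1,2}kts,"
    r"C/S->\w{1,3}[@]\d{1,2},\w{1,5} -$"
)

def line_matches(text: str) -> bool:
    """Return True when the whole line matches the trigger pattern."""
    return PATTERN.match(text) is not None

print(line_matches(LINE))
```

A one-character class costs essentially nothing in a regex engine, so this form is a reasonable choice whenever the host program's own parser gets in the way of a plain escape.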
|
|
 |
Rahab Wizard
Joined: 22 Mar 2007 Posts: 2320
|
Posted: Wed Jun 18, 2008 3:13 pm |
Zugg has already said in another posting that v2.28 will have a %quoteregex() function. Would this solve your problem?
|
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Wed Jun 18, 2008 5:04 pm |
It's a bug.
Normally you should use ~@Home because you are escaping the @ from CMUD parsing of the CMUD variable, which has nothing to do with regular expressions. The \ is used to escape characters within a regular expression from the PCRE engine.
Patterns are expanded by CMUD first to get the variable references, and that is what you want to stop from happening in this case, which is why you'd use a ~ character for it.
You can use the Compiled Pattern tab to see exactly what is happening with these two cases. When using the \@Home you will see that CMUD is still compiling a variable reference. When using the ~@Home you will see that the variable reference is gone, but the bug is that the ~ is not removed from the string pattern.
I'll try to get this bug fixed in 2.28.
Edited: I'm not sure why the ^Hello\080home\.$ doesn't work. I debugged CMUD and verified that it is sending that pattern directly to the PCRE engine. We might need Vijilante to advise us about why this doesn't work or if it's a bug in the PCRE.DLL. |
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Wed Jun 18, 2008 5:28 pm |
OK, I changed my mind. When I looked at the CMUD code, it was already properly handling the difference between a normal zScript trigger pattern and a regular expression. It just wasn't handling the \ properly in the code.
So, ignore what I said above. In a *regular expression* the proper way to quote a @ is using the \ just like you normally would. You don't use ~ to quote anything in a regular expression. Sorry for the confusion.
But this still doesn't explain the issue with \080. |
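The two-stage processing Zugg describes (variable references are expanded first, then the result goes to the regex engine) can be illustrated with a toy model. This is only a sketch in Python: `VARIABLE_REF`, `expand_variables` and `fires` are hypothetical names, CMUD's real parser is not shown in this thread and will differ, and the model assumes the intended behavior in which a backslash before '@' suppresses expansion.

```python
import re

# Hypothetical stand-in for CMUD's variable pass: '@name' is treated as a
# variable reference unless a backslash comes directly before the '@'.
VARIABLE_REF = re.compile(r"(?<!\\)@([A-Za-z_]\w*)")

def expand_variables(pattern: str, variables: dict) -> str:
    """Replace each unescaped @name with its value (empty if undefined)."""
    return VARIABLE_REF.sub(lambda m: variables.get(m.group(1), ""), pattern)

def fires(pattern: str, text: str, variables: dict) -> bool:
    """Expand variables first, then hand the result to the regex engine."""
    return re.search(expand_variables(pattern, variables), text) is not None

TEXT = "Hello@home."
NO_VARS = {}

# Unescaped: '@home' is consumed by the variable pass before the regex runs.
print(expand_variables(r"^Hello@home\.$", NO_VARS))
print(fires(r"^Hello@home\.$", TEXT, NO_VARS))

# Escaped with a backslash, or wrapped in a class: the '@' survives.
print(fires(r"^Hello\@home\.$", TEXT, NO_VARS))
print(fires(r"^Hello[@]home\.$", TEXT, NO_VARS))
```

In this model the [@] form works for the same reason it did for ReedN: the character after the '@' is ']', which cannot start a variable name, so the first pass leaves it alone.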
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5182
|
Posted: Wed Jun 18, 2008 7:10 pm |
I am examining the compiled pattern data from within my test app and it looks like the recent change to use the 30bit link size caused both octal and hex notations to break. It is going to take me a while to figure out why since this really shouldn't have happened.
|
|
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Wed Jun 18, 2008 7:48 pm |
Geez, you'd think something as widely used as the PCRE.DLL would be debugged by now. For CMUD use, I think I'd rather have the 30-bit for larger string-list patterns than the octal/hex notation though. But definitely let me know if you find a solution.
|
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5182
|
Posted: Wed Jun 18, 2008 7:54 pm |
Well, that was fun. I guess I shouldn't post in the morning on my first cup of coffee. Octal of course means base 8, so each digit has a valid range of 0 to 7. As you can see, 080 is invalid and should have been 100. Sadly it took me reading through all the PCRE source to find where the conversion is done to realize this. My statement about the hex notation being off was due to a hasty test. I am used to using 0x when programming, so I did \0x40; regex only uses the x, so it should have been \x40. Everything looks to be working right once I got my brain moving in the right directions.
^S..\100h.{2},H.{2},CE\d{1,3}\%,W<-\w{1,3}\100\d{1,2}kts,C/S->\w{1,3}\100\d{1,2},\w{1,5} -$ |
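The difference between \080, \100 and \x40 can be reproduced with a short test. This sketch uses Python's `re` module rather than PCRE or CMUD (an assumption made so the example is runnable); Python reads \080 as \0 followed by the literal text "80", and PCRE's documentation describes \0 in a similar way, but nothing here was run against PCRE.DLL. The helper name `fires` is made up for the illustration.

```python
import re

AT = "@"

# '@' is ASCII 64, which is 100 in octal and 40 in hex.
print(ord(AT), format(ord(AT), "o"), format(ord(AT), "x"))

def fires(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Valid octal and hex escapes both stand for a literal '@'.
print(fires(r"Hello\100home\.", "Hello@home."))
print(fires(r"Hello\x40home\.", "Hello@home."))

# \080 is not an '@': 8 is not an octal digit, so the escape ends after \0
# (a NUL character) and the "80" that follows is matched as literal text.
print(fires(r"Hello\080home\.", "Hello@home."))
print(fires(r"Hello\080home\.", "Hello\x0080home."))
```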
|
|
|
 |
Zugg MASTER

Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Wed Jun 18, 2008 10:22 pm |
Don't worry, I've had days like that too ;)
Thanks for giving all of us the explanation since nobody else noticed that \080 wasn't proper octal either. |
|
|
 |
ReedN Wizard
Joined: 04 Jan 2006 Posts: 1279 Location: Portland, Oregon
|
Posted: Thu Jun 19, 2008 1:13 am |
Where do you look up those codes?
|
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5182
|
Posted: Thu Jun 19, 2008 8:10 am |
Those codes are the ASCII numbers for the characters. You can get the number using #SHOW %ascii("character"). Then converting to octal or hex can be done in any number of ways. I tend to do it in my head, but you can use the calculator or various script snippets that are floating around.
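For anyone who would rather not convert by hand, the lookup and conversion can be scripted. This is a small Python sketch, not a CMUD script; `regex_escapes` is a hypothetical helper name, and it always emits three octal digits so the escape cannot be mistaken for a backreference.

```python
def regex_escapes(char: str) -> dict:
    """Return the decimal code plus octal and hex regex escapes for one ASCII character."""
    code = ord(char)  # raises TypeError unless char is exactly one character
    if code > 0x7F:
        raise ValueError("only ASCII characters are handled in this sketch")
    return {
        "decimal": str(code),
        "octal": "\\" + format(code, "03o"),  # zero-padded to three digits
        "hex": "\\x" + format(code, "02x"),   # zero-padded to two digits
    }

print(regex_escapes("@"))
```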
|
|
|
|
 |
mr_kent Enchanter
Joined: 10 Oct 2000 Posts: 698
|
Posted: Thu Jun 19, 2008 8:29 am |
This link was given on these forums at one point and I had saved it in my browser. Hope it helps.
|
|
|
 |
ReedN Wizard
Joined: 04 Jan 2006 Posts: 1279 Location: Portland, Oregon
|
Posted: Thu Jun 19, 2008 8:55 am |
I didn't think to use %ascii. And that link was helpful too, thanks!
Converting it isn't a problem. I'm an EE so converting bases is second nature. |
|
|
 |
|
|