Zugg Software :: View topic

SubAdmin Joined: 18 Nov 2001 Posts: 5182

I was doing some speed testing on different script structures to determine which is nanoseconds faster. These tests really don't matter much, but what does matter is what I found after bumping into the quota bug. This series of tests shows that the problem is with specific commands.

Procedure
1. Launch CMud
2. Close Sessions window (ESC)
3. Enter at the command line

SubAdmin Joined: 18 Nov 2001 Posts: 5182

One small note to add to this.

Wizard Joined: 11 Jul 2002 Posts: 1265 Location: USA

Wow, my computer is somewhat beefier than the one you are on and it dies at the exact same point.

Sorcerer Joined: 13 Jan 2006 Posts: 715

Same. Interesting. Using 2.13 still, but yeah.

Posted: Mon Dec 03, 2007 6:09 pm

I'll experiment with this and see what I can find. Some commands in CMUD require themselves to be run by the main thread, whereas other commands (that are threadsafe) can be run directly within the thread itself.

#ADD is a threadsafe command, so it runs within the thread and doesn't require any interaction with the main thread.

#NOOP is marked as *not* threadsafe (because many people still use it instead of #CALL to execute functions and COM stuff). So #NOOP gets run in the main thread. The way commands are run in the main thread is using the Delphi Synchronize method within the thread. This uses the Windows message queue to send a message to the main thread to run a command. CMUD looks for that message while it is waiting on the background thread and executes the command in the main thread.

So, #NOOP is going to use the Windows Message Queue, but #ADD is not. And that is why you might see different behavior with different commands. It's possible that 9999 is the magic limit for the Windows message queue. But if there is some other problem causing this, I'll see if it's something I can control.

SubAdmin Joined: 18 Nov 2001 Posts: 5182

Alright that makes a lot of sense. It does sort of make me curious as to why #CALL isn't similarly marked as not thread safe. It was one of the commands I tested in making my first post, and did not generate a failure.

What makes me even more curious is that you are saying #NOOP is flagged as not thread safe and therefore must be run by first synchronizing the thread running the script into the "main" thread, then performing the #NOOP from there and releasing the synchronization. As I understand the structure you are using, the "main" would be the one that the command line is run from.

So you are saying that the script run from the command line thread is attempting to synchronize with its own thread! Does that make my continuing confusion clear? Probably I am wrong about that structure and the "main" thread actually means a thread other than the one the command line runs in.

The same series of synchronize, do command messages would have to be made to complete the script run this way

Posted: Mon Dec 03, 2007 8:17 pm

The command line is running in its own thread too. When you type #THREAD, notice that it shows a single thread running the "#THREAD" command? That is the command line thread.

The "main" thread is the Windows thread that is created when CMUD first runs which handles all of the user interface and stuff like that. This "thread" is not shown by the #THREAD command. Only "script threads" are shown by #THREAD.

This is what allows the Shift-Esc feature. All scripts are run in their own threads, even the command line. And pressing Shift-Esc puts the command line into the background and stops blocking the main thread from processing Windows messages. Otherwise, the main thread waits for the command line thread to finish.

The #CALL and #NOOP are *both* marked as not-threadsafe. So there shouldn't be any difference between them.

When you put something within a #THREAD command, then it's always running in the background, so you don't get the crash. The crash only occurs when the main thread is *waiting* for another thread to complete and is blocking Windows messages (but Windows is continuing to accumulate new messages in its message queue). While it is waiting for a script to finish and blocking Windows messages, there seems to be a limit of 9999 messages in the queue before you get the Windows quota error. And when you call a non-threadsafe command, it is adding a message to the queue to synchronize the command.
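To make the failure mode above concrete, here is a minimal sketch in Python (not CMUD or Delphi code). It only models the idea: a non-threadsafe command posts one message per call, and if nothing pumps the queue the posts eventually fail. The names `run_loop`, `THREADSAFE` and `QUEUE_LIMIT` are made up for illustration, and the limit of 10,000 is an assumption taken from the documented default for posted messages per Windows queue, which lines up with the roughly 9999 iterations seen in the tests.

```python
import queue

# Assumption: Windows' documented default cap on posted messages per queue.
QUEUE_LIMIT = 10_000

# Per the posts above: #ADD is threadsafe, #NOOP and #CALL are not.
THREADSAFE = {"#ADD"}


def run_loop(command, iterations, main_thread_pumping=False):
    """Return the iteration at which posting fails, or None if it never does."""
    msg_queue = queue.Queue(maxsize=QUEUE_LIMIT)
    for i in range(1, iterations + 1):
        if command in THREADSAFE:
            # Runs inside the script thread; nothing is posted.
            continue
        if main_thread_pumping:
            # Model a main thread that is processing its messages.
            while not msg_queue.empty():
                msg_queue.get_nowait()
        try:
            # Non-threadsafe command: one message posted per call.
            msg_queue.put_nowait(("sync", i))
        except queue.Full:
            # Stands in for the Windows quota error.
            return i
    return None


print(run_loop("#ADD", 20_000))
print(run_loop("#NOOP", 20_000))
print(run_loop("#NOOP", 20_000, main_thread_pumping=True))
```

In this model only the non-threadsafe command with a blocked main thread ever hits the limit, which matches the pattern reported in the first post.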

I might be able to remove the Synchronize messages from the queue. Delphi actually uses two mechanisms to synchronize threads...it puts a message into the Windows Message Queue, but it also sets a Windows Event. While CMUD is waiting for the script to finish, it processes the Windows Event, but not the message queue. I don't know why Delphi is doing this redundant signalling. But I think it's the messages getting added to the queue that cause this.
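The double signalling described above can be sketched roughly like this, again in Python as a stand-in for the Delphi code. Everything here is hypothetical (`MainThreadModel`, `synchronize`, `wait_for_script`); it is only meant to show how a waiter that services the event but never reads the message queue still gets every call executed while the posted messages pile up.

```python
import queue
import threading


class MainThreadModel:
    """Toy model: calls are signalled twice, by a posted message and by an event."""

    def __init__(self):
        self.messages = queue.Queue()        # stands in for the Windows message queue
        self.sync_event = threading.Event()  # stands in for the Windows Event
        self.pending = queue.Queue()         # calls waiting to run in the main thread
        self.executed = 0

    def synchronize(self, func):
        """Called from the script thread; blocks until func has run in the main thread."""
        done = threading.Event()
        self.pending.put((func, done))
        self.messages.put("wake")  # signal 1: posted message, never read below
        self.sync_event.set()      # signal 2: the event the waiter actually uses
        done.wait()

    def wait_for_script(self, script):
        """Main thread: wait for the script, servicing the event but not the queue."""
        worker = threading.Thread(target=script)
        worker.start()
        while worker.is_alive() or not self.pending.empty():
            self.sync_event.wait(timeout=0.01)
            self.sync_event.clear()
            while True:
                try:
                    func, done = self.pending.get_nowait()
                except queue.Empty:
                    break
                func()
                self.executed += 1
                done.set()
        worker.join()


model = MainThreadModel()


def script():
    # A script that runs 50 non-threadsafe commands.
    for _ in range(50):
        model.synchronize(lambda: None)


model.wait_for_script(script)
print(model.executed, model.messages.qsize())
```

Every call still runs because of the event, so in this model the posted messages are pure leftover, which is why removing them looks like a candidate fix.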

But none of this really matters because Windows is going to generate plenty of other messages during a long loop that will end up causing the quota error no matter what I do...even if I remove the Synchronize messages from the queue. Removing those messages will just buy you a bit more time, but it will be very system dependent.

SubAdmin Joined: 18 Nov 2001 Posts: 5182

Ok. Now I have enough information to really be stubborn as a mule about this. Don't bother looking into this problem anytime today, I will be writing a lengthy detailed post right after this.

SubAdmin Joined: 18 Nov 2001 Posts: 5182

I am going to be taking this post step by step. I will make every attempt to explain things in this post clearly enough that a 2 year old can understand it. If it comes off the wrong way then I beg forgiveness, I am quite frustrated that I seem to be the only person that sees a problem with how CMud works now.

Before I even get into any of this I want to point out just why I am so irritated over this.

Posted: Tue Dec 04, 2007 12:10 am

SubAdmin Joined: 18 Nov 2001 Posts: 5182

I am going to have to think about this a lot more. I know there is a correctable bug in all of this; to borrow from the beholder image that so truly defines me here, all my eye stalks are standing on end. I am sorry you had to waste another good programming hour explaining how things work to me.

I really do recognize that some bugs are not correctable. They are limitations of the system within which the program is written. I will find a path to correcting this one though. That hyperbolic time curve I mention in my first post of this topic is more than incentive enough to find this bug. Your fixes to #NOOP for 2.15 are likely to make it harder though. The secondary bug in #NOOP's handling actually provided a very clean way to get at what I see as the main bug without involving other sections of CMud. In other words the bug in #NOOP vs #CALL provided an isolated way to access the deeper bug. I can probably work around this change.

Posted: Tue Dec 04, 2007 1:49 am

If you want to continue testing this in v2.15 with the change to #NOOP, just use '#NOOP 1' and that will do the same thing as #NOOP did in 2.14. As I mentioned, #NOOP compiles and processes its arguments at runtime, so the trick is to give it a simple argument that is quick and easy to compile and parse. But that will force it to run in the main thread. (although admittedly it makes your test with #NOOP take about twice as long since it *is* calling the compiler even with just the '1' argument)

As you might guess, '#CALL 1' will be much faster because the argument is compiled at compile-time and then #CALL is smart enough to know that it can execute '1' without calling the main thread. This is just one of the reasons that I keep telling people to use #CALL instead of #NOOP since #CALL is optimized to work better with CMUD while #NOOP just calls the dumb argument expansion that zMUD did.

If I learn more about how Delphi is implementing its Synchronize method, I'll let you know.

Posted: Tue Dec 04, 2007 2:37 am

I ran Spy++ on this to see what Messages were getting added to the Windows Queue. If you leave the mouse hovering over the command line, then there is a flood of WM_NCHITTEST messages from the command line window trying to determine where the mouse is located. That causes a slowdown. If you run your:

Posted: Tue Dec 04, 2007 3:13 am

Found the Delphi PostMessage routine. There is a procedure called WakeMainThread that does:

PostMessage(Handle, WM_NULL, 0, 0);

every time the Synchronize method is called. This might explain why it didn't show up in the Spy++ log. I have no idea why they are doing this, and I can't imagine that this actually adds a real message to the queue. My guess is that they are trying to trigger some sort of Windows API side-effect that I'm not aware of. I'm going to try and stop this method call to see what happens.

Now you've got me obsessed with this too ;)

Posted: Tue Dec 04, 2007 3:28 am

That was it!!!!!!

The Synchronize method was looking for an event handler assigned to the WakeMainThread variable. This variable was assigned to the TApplication.WakeMainThread method, which did the PostMessage(...WM_NULL...) that I showed above. According to their source code comments:

SubAdmin Joined: 18 Nov 2001 Posts: 5182

The short explanation of why I am bothered by this is this test. Enter at the command line

SubAdmin Joined: 18 Nov 2001 Posts: 5182

I was typing all that last post up for a while, and agonizing over every word. I post it, and then notice something odd...there is another post up ahead of my latest.

Kudos Zugg! I need hunt no more. You have once again surpassed all known standards. The new Zugg standard of excellence is set!

Posted: Tue Dec 04, 2007 11:51 pm

I know you probably do it for the love of MUDding and great software, but after tracking this down I'll be more than willing to buy/upgrade you to the latest Zugg product of your choice.

You truly are The Beholder.