|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Sun Dec 02, 2007 4:14 pm
[2.14]Not enough quota revisitted |
I was doing some speed testing on different script structures to determine which is nanoseconds faster. These tests really don't matter much, but what does matter is what I found after bumping into the quota bug. This series of tests shows that the problem is with specific commands.
Procedure
1. Launch CMud
2. Close Sessions window (ESC)
3. Enter at the command line
Code: |
$var=%secs;#LOOP 10000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 20000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 40000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 80000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 160000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 320000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 640000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 1280000 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 2560000 {};#SHOW (%secs-$var) |
You will see these have a very consistent progression on time used. There are no failures int this group on my system. You may repeat step 3 as many times as desired. I even ran the loop up a few more powers to 8192 and still the progression held smooth with no faults. Memory usage stayed totally unchanged.
4. Enter at the command line
Code: |
$t=1;$var=%secs;#WHILE ($t<10000) {#ADD $t 1};#SHOW (%secs-$var)
$t=1;$var=%secs;#WHILE ($t<20000) {#ADD $t 1};#SHOW (%secs-$var)
$t=1;$var=%secs;#WHILE ($t<40000) {#ADD $t 1};#SHOW (%secs-$var)
$t=1;$var=%secs;#WHILE ($t<80000) {#ADD $t 1};#SHOW (%secs-$var)
$t=1;$var=%secs;#WHILE ($t<160000) {#ADD $t 1};#SHOW (%secs-$var)
$t=1;$var=%secs;#WHILE ($t<320000) {#ADD $t 1};#SHOW (%secs-$var) |
Here again we see a nice linear progression. There are no faults, and the test takes over 30 seconds on my system.
5. Enter at the command line
Code: |
$var=%secs;#LOOP 1000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 2000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 4000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 8000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 16000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 32000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 64000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 128000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 256000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var)
$var=%secs;#LOOP 512000 {#IF (1) {#ADD $a 1}};#SHOW (%secs-$var) |
An even more complex test, but again a nice linear progression, and the final line there takes over 30 on my system by itself. Still no fault.
6.Enter at the command line
Code: |
$var=%secs;#LOOP 100 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 200 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 400 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 800 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 1600 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 3200 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 6400 {#NOOP};#SHOW (%secs-$var) |
On my system that is enough to generate the error (total is 12700). You will notice the time used tends to have an exponential growth rate instead of the clean linear progression seen in the first 2 tests. If you haven't gotten the error to pop up with this test, add the next power of 12800 (10000 by itself is enough to pop the error on my system and only takes 13 seconds). You can also back off 1 level to allow repeating of the test to better see the hyperbolic progression.
7. Once the the pops up select Continue Application
8. Enter at the command line
These last 2 steps demonstrate that the exception is actually thrown by the #SHOW. The cause of it is in something being held around from the #NOOP within the loop. You will see the final #SHOW from step 6 is actually in there. This mostly demonstrates that the quota error is not caused by time, number of messages waiting, or even #SHOW. Some commands are doing something wrong.
I would invite other beta testers to try out other commands, I tried a few and didn't really find a pattern in what fails and what doesn't. Out of the few I tried I actually didn't find anything else that caused the error or hyperbolic progression seen in step 6 with #NOOP. Of all the ones I tested I always stayed away from anything that would cause additional display and setting changes. I tested #SWITCH, #VAR, #CALL, #ADDITEM, and #DELITEM; as well as some combinations of them like the #IF above. |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Sun Dec 02, 2007 5:10 pm |
One small note to add to this.
Code: |
$var=%secs;#LOOP 9997 {#NOOP};#SHOW (%secs-$var) |
Does not cause the quota error. You can do it as many times as you want as long as each is run seperately.
Code: |
$var=%secs;#LOOP 9998 {#NOOP};#SHOW (%secs-$var) |
Does cause the quota error. So 9998 is the magic number for #NOOP. |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Arminas Wizard
Joined: 11 Jul 2002 Posts: 1265 Location: USA
|
Posted: Sun Dec 02, 2007 5:18 pm |
Wow, my computer is somewhat beefier than the one you are on and it dies at the exact same point.
|
|
_________________ Arminas, The Invisible horseman
Windows 7 Pro 32 bit
AMD 64 X2 2.51 Dual Core, 2 GB of Ram |
|
|
|
ralgith Sorcerer
Joined: 13 Jan 2006 Posts: 715
|
Posted: Sun Dec 02, 2007 10:04 pm |
Same. Interesting. Using 2.13 still, but yeah.
|
|
_________________ CrossOver: Windows Compatibility on Mac and Linux CMUD Advocate |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Mon Dec 03, 2007 6:09 pm |
I'll experiment with this and see what I can find. Some commands in CMUD require themselves to be run by the main thread, whereas other commands (that are threadsafe) can be run directly within the thread itself.
#ADD is a threadsafe command, so it runs within the thread and doesn't require any interaction with the main thread.
#NOOP is marked as *not* threadsafe (because many people still use it instead of #CALL to execute functions and COM stuff). So #NOOP gets run in the main thread. The way commands are run in the main thread is using the Delphi Synchronize method within the thread. This uses the Windows message queue to send a message to the main thread to run a command. CMUD looks for that message while it is waiting on the background thread and executes the command in the main thread.
So, #NOOP is going to use the Windows Message Queue, but #ADD is not. And that is why you might see different behavior with different commands. It's possible that 9999 is the magic limit for the Windows message queue. But if there is some other problem causing this, I'll see if it's something I can control. |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Mon Dec 03, 2007 7:35 pm |
Alright that makes a lot of sense. It does sort of make me curious as to why #CALL isn't similarly marked as not thread safe. It was one of the commands I tested in making my first post, and did not generate a failure.
What makes me even more curious is that you are saying #NOOP is flagged as not thread safe and therefore must be run by first synchronizing the thread running the script into the "main" thread. Then preforming the #NOOP from there and releasing the synchronization. As I understand the structure you are using, the "main" would be the one that the command line is run from.
So you are saying that the script run from the command line thread is attempting to synchonize with its own thread! Does that make my continuing confusion clear? Probably I am wrong about that structure and the "main" thread actually means a thread other then the one the command line runs in.
The same series of synchronize, do command messages would have to be made to complete the script run this way
Code: |
#THREAD a {$var=%secs;#LOOP 9998 {#NOOP};#SHOW (%secs-$var)} |
#NOOP is still not marked as not thread safe, this definitely is not in the main thread. This doesn't crash, and here we are guaranteed that the threading messages must be occuring. Did CMud change its mind about #NOOP being thread safe, or did it just start handling the messages correctly?
Whether I am right or wrong about the threading stucture you are using, the crash shouldn't occur. If I am right about the stucture then the running the loop from the command line should be faster, if I am wrong then both the command line and a background thread should show the same speed. I guess all I have ever been saying with tests that bump into this crash is that there is an inconsistency in the message handling. A message or 3 are not removed when they should be, or some messages are being generated when they shouldn't. |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Mon Dec 03, 2007 8:17 pm |
The command line is running in it's own thread too. When you type #THREAD, notice that it shows a single thread running the "#THREAD" command? That is the command line thread.
The "main" thread is the Windows thread that is created when CMUD first runs which handles all of the user interface and stuff like that. This "thread" is not shown by the #THREAD command. Only "script threads" are shown by #THREAD.
This is what allows the Shift-Esc feature. All scripts are run in their own threads, even the command line. And pressing Shift-Esc puts the command line into the background and stops blocking the main thread from processing Windows messages. Otherwise, the main thread waits for the command line thread to finish.
The #CALL and #NOOP are *both* marked as not-threadsafe. So there shouldn't be any difference between them.
When you put something within a #THREAD command, then it's always running in the background, so you don't get the crash. The crash only occurs when the main thread is *waiting* for another thread to complete and it blocking Windows messages (but Windows is continuing to accumulate new messages in it's message queue). While it is waiting for a script to finish and blocking Windows messages, I think there seems to be a limit of 9999 messages in the queue before you get the Windows quota error. And when you call a non-threadsafe command, it is adding a message to the queue to synchronize the command.
I might be able to remove the Synchronize messages from the queue. Delphi actually uses two mechanisms to synchronize threads...it puts a message into the Windows Message Queue, but it also sets a Windows Event. While CMUD is waiting for the script to finish, it processes the Windows Event, but not the message queue. I don't know why Delphi is doing this redundant signalling. But I think it's the messages getting added to the queue that cause this.
But none of this really matters because Windows is going to generate plenty of other messages during a long loop that will end up causing the quota error no matter what I do...even if I remove the Synchronize messages from the queue. Removing those messages will just buy you a bit more time, but it will be very system dependent. |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Mon Dec 03, 2007 8:35 pm |
Ok. Now I have enough information to really be stubborn as a mule about this. Don't bother looking into this problem anytime today, I will be writing a lengthy detailed post right after this.
|
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Mon Dec 03, 2007 11:18 pm |
I am going to be taking this post step by step. I will make every attempt to explain things in this post clearly enough that a 2 year old can understand it. If it comes off the wrong way then I beg forgiveness, I am quite frustrated that I seem to be the only person that sees a problem with how CMud works now.
Before I even get into any of this I want to point out just why I am so irritated over this.
Zugg wrote: |
Removing those messages will just buy you a bit more time, but it will be very system dependent. |
Arminas wrote: |
Wow, my computer is somewhat beefier than the one you are on and it dies at the exact same point. |
There is no system dependency here yet , and there is absolutely no reason that completed messages should not have been removed.
In all the tests involved to explain this you will only have to launch CMud once. I will dodge the crash assiduously. Your first few steps therefore are:
1. Launch CMud
2. Close Sessions window (ESC)
Each test from here on requires that you simply paste the entire portion of the code block into the command line. Press Enter, sit back and watch. Do not move the mouse, do not press another key.
The first test to be examined is 3 loops done from the command line
Code: |
$var=%secs;#LOOP 9990 {};#SHOW (%secs-$var)
$var=%secs;#LOOP 9990 {#CALL};#SHOW (%secs-$var)
$var=%secs;#LOOP 9990 {#NOOP};#SHOW (%secs-$var) |
The first loop of this proves that we can preform a loop of that magnitude from the command line. The second with #CALL establishes a time frame for including a command in the loop. The difference in time between the first and second loop is expectable, and generally looks to be reasonable. The third loop demonstrates that there is a problem. Notice the tremendous jump in time between the 2nd and 3rd loop. The commands #CALL and #NOOP are supposedly identical for the purposes of this test. Both handle unlimited parameters but have none in this test. Both are flagged as NON-THREADSAFE.
The first test comes down to just 1 question, "What is different between the handling of #CALL and #NOOP?"
Now I will move on to the second test.
Code: |
#THREAD a {$var=%secs;#LOOP 9990 {};#SHOW (%secs-$var)}
#THREAD b {#WAITTHREAD a;$var=%secs;#LOOP 9990 {#CALL};#SHOW (%secs-$var)}
#THREAD c {#WAITTHREAD b;$var=%secs;#LOOP 9990 {#NOOP};#SHOW (%secs-$var)} |
This second test confirms what was found in the first test, specifically #CALL and #NOOP are not the same. We see that the time on the first and second loops are virtually identical to those repesctive loops from test 1. The time on the third loop is vastly different between the 2 tests.
This second test raises a new question, "What is different between handling #NOOP in a background thread versus the command line thread?"
Now a few quotes that should help to clear up these questions, I will add/edit stuff in bold:
Quote: |
The command line is running in it's own thread too. |
That quote is critical! That tells us beyond a shadow of a doubt that the results of those 2 tests should be the same. They are both being run in a thread that is not the "main" thread. They both require the #NOOP command to synchronize to the "main" thread. We still have no idea how to answer that second question.
Quote: |
The way commands are run in the commandline(was main) thread is using the Delphi Synchronize method within the commandline thread. This uses the Windows message queue to send a message to the main thread to run a command. |
This goes a long ways to explain the difference in results between the #CALL and #NOOP loops in both tests. #NOOP is indeed preforming this extra synchronize step. We can clearly see how much this synchronization costs by comparing loop 2 and loop 3 of the second test(that is #CALL and #NOOP from a background thread). The quote hasn't completely answered the first question, but it tells us what to look at. This also points towards a new question, "Why are messages accummulating with test 1?"
Quote: |
The crash only occurs when the main(I am pretty sure you mean actual main) thread is *waiting* for another thread to complete and it is blocking Windows messages (but Windows is continuing to accumulate new messages in it's message queue). |
This means the "main" thread is waiting for the commandline thread and blocking Windows messages. The testing procedures works very specifically to eliminate extraneous Windows messages. The test point that causes the crash was rerun by 2 other systems that would generate fewer Windows messages then mine by virtue of higher speeds. Yet both other systems experienced the exact crash and non-crash points I demonstrated in a previous post. There is an obvious difference in the Windows messages that are occurring with different testers. The only logical conclusion from this is that it is not total messages in the queue that causes the crash, the limit of 10K messages is thread specific.
We also get a new question from this analysis, "Why would the "main" thread block messages regarding script exection that it is processing?" This might be the entire problem, the blocking with the commandline thread is causing a race condition on its own script execution messages.
The final quote reaffirms the analysis from the previous quote and points a big flashing neon sign at the crash bug.
Quote: |
All scripts are run in their own threads, even the command line. And pressing Shift-Esc puts the command line into the background and stops blocking the main thread from processing GLOWING NEON Windows SIGN FLASHES messages. Otherwise, the main thread waits for the command line thread to finish. |
You have definitely generated a race condition in the message handling because the main thread is waiting for the commandline thread to finish before removing some messages, and the script execution messages were included in those that don't get processed. Your stated condition for the main thread is to not process WINDOWS messages. Which answers the 3rd question, with you really didn't mean to do that.
Assuming you have set your message values according to MS recommendations you will have a clear break between user-defined messages and Windows messages, and can implement a change that corrects most of the bug in 1-5 lines.
That still leaves the first question I had to ask, "What is different between the handling of #CALL and #NOOP?" |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Dec 04, 2007 12:10 am |
Quote: |
The first test comes down to just 1 question, "What is different between the handling of #CALL and #NOOP?" |
OK, I looked at the code and I lied. Even though #CALL was marked as "not thread safe" in the CMD.DB command database, it was being handled differently than #NOOP. With #CALL, it wasn't executing in the main thread unless there was a specific argument that needed to be synchronized. In other words, #CALL delayed the decision as to whether it needed to be run in the main thread until the arguments were examined. Whereas #NOOP was being executed in the main thread no matter what. I have modified v2.15 so that #NOOP works more like #CALL. The only remaining difference between #CALL and #NOOP is that #CALL compiles it's arguments at compile-time, whereas #NOOP compiles it's arguments at runtime (and treats the argument as a literal string at compile time). You'll be able to see this difference in the Compiled Code if you compare #CALL and #NOOP with arguments (the difference should already show up in v2.14)
Quote: |
What is different between handling #NOOP in a background thread versus the command line thread? |
When executing any script in the command line, the main thread uses MsgWaitForMultipleObjects Windows API to wait until the command line thread is finished. During this wait, CMUD looks for any Synchronize signals that indicate something needs to be run in the main thread. If it sees this, then it runs the routine in the main thread and then waits again.
When executing a script using #THREAD to create a true background thread, the main thread never waits on it. MsgWaitForMultipleObjects is never called. The Synchronize is handled by Delphi by processing special messages in the Windows message queue. My guess is that there is less Windows overhead with a true background thread than there is when waiting on the command line thread to finish with MsgWaitForMultipleObjects. But I don't know enough of the low-level details of exactly how MsgWaitForMultipleObjects is implemented in Windows to answer the question.
Quote: |
Quote: |
The command line is running in it's own thread too. |
That quote is critical! That tells us beyond a shadow of a doubt that the results of those 2 tests should be the same. |
Not necessarily! With #THREAD, CMUD always creates a new Windows thread and spawns it without waiting for anything. With a command line script (or a trigger script, or macro script, or anything else), CMUD reuses an existing thread from it's pool (if available) and then waits on the thread to finish (either by terminating, or by suspending itself with a #waitxxx command).
Quote: |
Why are messages accummulating with test 1? |
Because of how Delphi is handling the Synchronize. As I mentioned before, it is doing *two* things: 1) it is raising a Windows Event signal, and 2) it is adding a message to the Windows message queue. When CMUD is waiting on a script thread to finish, it looks for the Windows Signal for Synchronize and handles it. But it does not remove any event from the Windows message queue. I don't have access to those Delphi messages and I don't even know what thread queue they are added to. But PeekMessage provides very little control over removing messages from queues of other threads.
Quote: |
Why would the "main" thread block messages regarding script exection that it is processing? |
Because it is *critical* not to cause another script to be processed while waiting for the previous script. Otherwise you could get scripts executed out of order. Also, it causes the exact problem that the original #WAIT command had in zMUD that we are trying to avoid. If you process a script while waiting for the previous script, then you can end up with a chain reaction of this:
Code: |
Running Script A
(waiting for Script A to finish)
Running Script B (while Script A is still running)
(waiting for Script B to finish)
Running Script C (while Script B is still running)
... |
If this happens, then you might never get back to the top level. If Script C gets into some sort of loop, then it will wait for Script C and never get back to whatever it was supposed to do after Script B or Script A were finished. The whole point of the new design is to prevent multiple scripts from ever running in parallel (unless forced using a #WAIT command or #THREAD command). We want CMUD to wait until the previous script is finished before executing a new script.
New scripts can be fired via a number of Windows messages. Pressing ENTER on the command line generates a Windows WM_KEYDOWN, WM_CHAR, and WM_KEYUP messages (among others). When CMUD processes the WM_CHAR message for the #13 key (Enter), it executes the script on the command line. When you press a Macro key, CMUD executes a script on the WM_KEYDOWN Windows message. When you click on a button, Windows generates a WM_LBUTTONDOWN message that causes CMUD to execute a script. Various CMUD dialogs and stuff are opened by processing Windows WM_COMMAND messages (which is what the menu system generates).
So just blocking "user" messages doesn't help at all.
Quote: |
Assuming you have set your message values according to MS recommendations you will have a clear break between user-defined messages and Windows messages |
Unfortunately, not true with Delphi. Delphi uses a lot of it's own internal messages as well. So there isn't any clear definition of what a "system message" might be, vs a CMUD message.
Finally, what you might not realize is how hard all of this is. The Windows MsgWaitForMultipleObjects API routine is all CMUD has to work with to wait for a script thread to finish. In this (and some related API calls), you can't specify any message filter that you want. Microsoft defined a couple of different message filters that can be used (like QS_KEY for keyboard messages, QS_MOUSE for mouse messages, etc). CMUD actually uses QS_ALLINPUT and then uses PeekMessage to determine if there is a WM_PAINT, WM_QUIT, or WM_MOUSEMOVE event, and if those are found, they are processed. Then it checks the AsyncKeyState to see if the ESC key is down to abort the wait loop.
One side effect "trick" of the Windows PeekMessage call is that it causes all messages in the Windows Message queue to be flagged as "seen", so the next call to MsgWaitForMultipleObjects in the waiting loop only returns if a NEW message is detected in the queue. So you don't get any race condition with this as long as you are sure to call PeekMessage if MsgWaitForMultipleObjects returns WAIT_OBJECT_0 + NumEvents (where NumEvents is the number of Event signals you are waiting for, like the signal for the thread termination, or the signal for the Synchronize method).
Anyway, so CMUD isn't getting a race condition because it's responding properly to the Event signals for Synchronization and not the Windows Messages.
The "quota" error must be generated by some side effect of how Delphi is creating the Synchronize messages, but I'd have to look deep in the Delphi source code to see exactly what they are doing that might be causing the problem.
If I can remove the messages from the queue using PeekMessage, then I will certainly do so. |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Tue Dec 04, 2007 1:28 am |
I am going to have to think about this a lot more. I know there is correctable bug in all of this; to borrow from the beholder image that so truly defines me here, all my eye stalks are standing on end. I am sorry you had to waste another good programming hour explaining how things work to me.
I really do recognize that some bugs are not correctable. They are limitations of the system within which the program is written. I will find a path to correcting this one though. That hyperbolic time curve I mention in my first post of this topic is more than incentive enough to find this bug. Your fixes to #NOOP for 2.15 are likely to make it harder though. The secondary bug in #NOOP's handling actually provided a very clean way to get at what I see as a the main bug without inolving other sections of CMud. In other words the bug in #NOOP vs #CALL provided an isolated way to access the deeper bug. I can probably work around this change. |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Dec 04, 2007 1:49 am |
If you want to continue testing this in v2.15 with the change to #NOOP, just use '#NOOP 1' and that will do the same thing as #NOOP did in 2.14. As I mentioned, #NOOP compiles and processes it's arguments at runtime, so the trick is to give it a simple argument that is quick and easy to compile and parse. But that will force it to run in the main thread. (although admittedly it makes your test with #NOOP take about twice as long since it *is* calling the compiler even with just the '1' argument)
As you might guess, '#CALL 1' will be much faster because the argument is compiled at compile-time and then #CALL is smart enough to know that it can execute '1' without calling the main thread. This is just one of the reasons that I keep telling people to use #CALL instead of #NOOP since #CALL is optimized to work better with CMUD while #NOOP just calls the dump argument expansion that zMUD did.
If I learn more about how Delphi is implementing it's Synchronize method, I'll let you know. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Dec 04, 2007 2:37 am |
I ran Spy++ on this to see what Messages were getting added to the Windows Queue. If you leave the mouse hovering over the command line, then there is a flood of WM_NCHITTEST messages from the command line window trying to determine where the mouse is located. That causes a slowdown. If you run your:
Code: |
$var=%secs;#LOOP 9997 {#NOOP};#SHOW (%secs-$var) |
test and make sure the mouse isn't over the command line, then it's much quicker. But it still crashes when you use 9998. And in that case, no messages are being sent to the window. So, even though the "quota" error is generated by the call to PostMessage, it doesn't seem to have anything to do with the message queue.
It is the #SHOW command at the end that calls PostMessage which is generating the quota error. #SHOW uses PostMessage to tell CMUD to display the next line of the input buffer (which #SHOW just added text to).
It sure would be nice if Microsoft would provide better error messages. I'd really like to know *WHAT* quota is being exceeded here. Since the error was returned from the Windows PostMessage, I was assuming it was a quota related to the message queue, but who knows.
I cannot find any reference in the Delphi source to any message being added to the queue to handle the Synchronize routine. It's possible they are using SendMessage instead of PostMessage to bypass the queue. When I looked at an internal Delphi WaitFor routine, it looks like this:
Code: |
function TThread.WaitFor: LongWord;
{$IFDEF MSWINDOWS}
var
H: array[0..1] of THandle;
WaitResult: Cardinal;
Msg: TMsg;
begin
H[0] := FHandle;
if GetCurrentThreadID = MainThreadID then
begin
WaitResult := 0;
H[1] := SyncEvent;
repeat
{ This prevents a potential deadlock if the background thread
does a SendMessage to the foreground thread }
if WaitResult = WAIT_OBJECT_0 + 2 then
PeekMessage(Msg, 0, 0, 0, PM_NOREMOVE);
WaitResult := MsgWaitForMultipleObjects(2, H, False, 1000, QS_SENDMESSAGE);
CheckThreadError(WaitResult <> WAIT_FAILED);
if WaitResult = WAIT_OBJECT_0 + 1 then
CheckSynchronize;
until WaitResult = WAIT_OBJECT_0;
end else WaitForSingleObject(H[0], INFINITE);
CheckThreadError(GetExitCodeThread(H[0], Result));
end;
{$ENDIF} |
And this code is *very* similar to my own code that calls MsgWaitForMultipleObjects. But notice the QS_SENDMESSAGE instead of the QS_ALLINPUT that I am using. I changed CMUD to use QS_SENDMESSAGE instead, but it still gives the quota error. But that makes me suspicious because of the comment in their source code about SendMessage being used. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Dec 04, 2007 3:13 am |
Found the Delphi PostMessage routine. There is a procedure called WakeMainThread that does:
PostMessage(Handle, WM_NULL, 0, 0);
everytime the Synchronize method is called. This might explain why it didn't show up in the Spy++ log. I have no idea why they are doing this, and I can't imagine that this actually adds a real message to the queue. My guess is that they are trying to trigger some sort of Windows API side-effect that I'm not aware of. I'm going to try and stop this method call to see what happens.
Now you've got me obsessed with this too |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Dec 04, 2007 3:28 am |
That was it!!!!!!
The Synchronize method was looking for a event handler assigned to the WakeMainThread variable. This variable was assigned to the TApplication.WakeMainThread method, which did the PostMessage(...WM_NULL...) that I showed above. According to their source code comments:
Code: |
{ Assign a method to WakeMainThread in order to properly force an event into
the GUI thread's queue. This will make sure that non-GUI threads can quickly
synchronize with the GUI thread even if no events are being processed due to
an idle state } |
So, in the CMUD Wait routine, I save the WakeMainThread variable and then assign NIL to it before going into my MsgWaitForMultipleObjects loop. When the loop is done, then I restore WakeMainThread from the saved value.
And that did it! Now when I run Vijilante's test:
Code: |
$var=%secs;#LOOP 100 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 200 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 400 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 800 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 1600 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 3200 {#NOOP};#SHOW (%secs-$var)
$var=%secs;#LOOP 6400 {#NOOP};#SHOW (%secs-$var) |
I get *linear* progression instead of exponential progression (and this is with #NOOP working the same as it was in 2.14).
So, it *was* some sort of Message Queue quote violation. Although I'm not sure why Windows was allowing a WM_NULL to be added to the queue over and over again, especially since there is no way to remove a WM_NULL using PeekMessage. Or maybe it was triggering some sort of side effect that was using up some other resource within Windows.
The reason it caused exponential time progression is the PeekMessage call within my Wait loop. This is needed to mark all messages as "seen" so that the MsgWaitForMultipleObjects routine would only return for NEW messages added to the queue. And since a WM_NULL message was being added to the queue for each call to #NOOP, the message queue was getting longer and longer and longer, making PeekMessage take longer and longer. So that actually makes sense.
Anyway, this should be fixed in 2.15. Not sure it was really something that normal users would run across, but it should make CMUD that much more stable and faster. Hopefully Vijilante will see this tonight so that he doesn't lose any more sleep over this weird behavior ;) |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Tue Dec 04, 2007 3:52 am |
The short explanation of why I am bothered by this is this test. Enter at the command line
Code: |
#TRIGGER "test1" {test} {#LOOP 5000 {#NOOP}};#TRIGGER "test2" {test} {#LOOP 5000 {#NOOP}} |
Followed by
Triggers runs in the the commandline thread, and we can see that this is horribly slow, but does not crash. The break between the 2 triggers is enough to allow whatever the quota problem is to clear out. I guess you will have to imagine testing it, because I still haven't imagined my way around your changes for 2.15. Ah, the era of imaginary testing is upon us.
I know that quota crash is hiding behind there, but I also know that no one is going to find it in common usage. I have a few scripts that work with my map that might actually cause it. Those scripts were actually designed to be run from the command line.
To me the crash isn't even secondary, the slowdown caused by a message build up prior to the crash is the issue. I want to find a way to avoid that hyperbolic slow down. Any sort of linear slow down is understandable and makes sense. I do have scripts that make heavy usage of loops and COM calls, I don't want to have them turn into an untraceable 100% CPU lockup like that regex I had. Obviously the current messaging prevents that, but that same messaging is what is causing the slowdown. I am just trying to find the cause without ever seeing a single piece of source code.
I don't want to be an ass about this, but I fear that is exactly what I am becoming. |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Tue Dec 04, 2007 4:01 am |
I was typing all that last post up for a while, and agonizing over every word. I post it, and then notice something odd...there is another post up ahead of my latest.
Kudos Zugg! I need hunt no more. You have once again surpassed all known standards. The knew Zugg standard of excellence is set! |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Tech GURU
Joined: 18 Oct 2000 Posts: 2733 Location: Atlanta, USA
|
Posted: Tue Dec 04, 2007 11:51 pm |
I know you probably do it for the love MUDding and great software, but after tracking this down I'll be more than willing to buy/upgrade you to the latest Zugg product of your choice.
You truly are The Beholder. |
|
_________________ Asati di tempari! |
|
|
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|