Zugg Software :: View topic

Posted: Sat Sep 29, 2007 6:04 am

#CW mage dodgerblue
#LOOP 25 {#SAY {the Mage}}

Not every instance of the word 'mage' gets colored.

Posted: Sat Sep 29, 2007 9:06 am

Confirmed. Oddly it worked fine in version 2.03

Posted: Mon Oct 01, 2007 10:52 pm

OK, I was able to use this simple test case to narrow down several problems with the threading system.

What is really hard in the threading system is to determine when a background thread is finished, without locking the user interface. Each thread sets an Event when it finishes (or suspends). The main thread can use the WaitForMultipleObjects Windows API call to wait until this event is set (using Multiple objects, it can also look for the Event that is set when the thread terminates).
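For anyone following along, here is a rough Python analogue of the event-based wait described above. It is only a sketch under assumptions: `threading.Event` stands in for the Win32 event object, a short timeout on `wait()` stands in for handing control back to the UI between checks, and the names `run_with_responsive_wait` and `pump_ui` are made up for illustration. It is not CMUD's actual code.

```python
import threading

def run_with_responsive_wait(work, pump_ui, poll_seconds=0.01):
    """Run work() on a background thread; call pump_ui() while waiting."""
    done = threading.Event()          # set by the worker when it finishes

    def worker():
        try:
            work()
        finally:
            done.set()                # always signal, even if work() raises

    thread = threading.Thread(target=worker)
    thread.start()
    pumps = 0
    # Wait in short slices so the "UI" gets a turn between checks.
    while not done.wait(timeout=poll_seconds):
        pump_ui()
        pumps += 1
    thread.join()
    return pumps

release = threading.Event()
ui_ticks = []

def pump_ui():
    ui_ticks.append(1)
    if len(ui_ticks) == 3:
        release.set()                 # let the background work end after 3 UI ticks

pumps = run_with_responsive_wait(work=release.wait, pump_ui=pump_ui)
```

The background job here cannot finish until the UI has been pumped three times, which shows that the main thread stays live while it waits.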

Now, if this method is used, then executing a long loop from the command line, like

Posted: Mon Oct 01, 2007 11:03 pm

Glad to hear you found the culprit. Keep up the good work!

SubAdmin Joined: 18 Nov 2001 Posts: 5182

Sounds like that is going to cover nearly all the crash bugs I hit. Nice work Shalimar! Small tests are always better than complex ones.

Posted: Tue Oct 02, 2007 12:34 am

I've narrowed it down now to two separate issues that I think are causing most of the problems being reported:

1) There are several variables that are "global" within each Window. For example, when a trigger fires, it saves the screen line number and the X position and length of the matching trigger text on the line. Then, commands like #CW use this information to determine what to color on the screen. When multiple threads are running at the same time, these variables need to be stored uniquely for *each* thread. So I need to create a set of "threadsafe" variables that are tied to the ThreadID so that one thread doesn't affect the variables in the window being used in another thread.

2) The line number in the output window has always been a bit problematic (and kludged). If the number of lines in the scrollback buffer is N, then line numbers go from 0 to N-1. When the scrollback is full and it starts discarding the first lines, the line numbers still go from 0 to N-1, but need to be adjusted as scrolling occurs. Currently, the screen keeps track of the "last line" cursor position, and when the scrollback scrolls and discards lines, this last-line position is adjusted.

But again, this needs to be saved uniquely for each thread. Otherwise, if one thread adds a line that scrolls the screen, then another thread no longer has the proper line number value.
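To make point 1 concrete, here is a minimal Python sketch of per-thread trigger match variables keyed by thread ID. The class and field names (`ThreadVars`, `MatchInfo`) are hypothetical and not taken from CMUD; the barrier is only there to force both threads to have stored their values before either reads them back.

```python
import threading
from dataclasses import dataclass

@dataclass
class MatchInfo:
    line: int = -1      # screen line where the trigger matched
    x: int = 0          # X position of the matched text
    length: int = 0     # length of the matched text

class ThreadVars:
    """One MatchInfo per thread, looked up by the calling thread's ID."""
    def __init__(self):
        self._lock = threading.Lock()
        self._by_thread = {}

    def current(self):
        tid = threading.get_ident()
        with self._lock:
            return self._by_thread.setdefault(tid, MatchInfo())

window_vars = ThreadVars()
seen = {}
barrier = threading.Barrier(2)

def trigger(name, line, x, length):
    info = window_vars.current()
    info.line, info.x, info.length = line, x, length
    barrier.wait()                    # both threads have now stored a match
    mine = window_vars.current()      # still this thread's own values
    seen[name] = (mine.line, mine.x, mine.length)

threads = [
    threading.Thread(target=trigger, args=("t1", 10, 4, 4)),
    threading.Thread(target=trigger, args=("t2", 11, 0, 4)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a single shared `MatchInfo` instead, whichever thread wrote last would win and the other thread's #CW would color the wrong spot.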

What I really need to do is fix the line numbers so that they are unique and continue to increase, even when scrolling. Then I just need to keep track of how many lines have been removed/scrolled from the scrollback buffer, and use CurrentLine-NumberScrolled to obtain the physical line number in the buffer.
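The ever-increasing line number idea can be sketched like this in Python. It is an illustration only: the `Scrollback` class and its method names are invented for the example, and the real screen buffer is obviously more involved. The point is that a stored line number stays valid after scrolling, and the physical index is just CurrentLine-NumberScrolled.

```python
from collections import deque

class Scrollback:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = deque()
        self.next_line = 0       # absolute number given to the next line added
        self.scrolled = 0        # how many lines have been discarded so far

    def add(self, text):
        number = self.next_line
        self.lines.append(text)
        self.next_line += 1
        if len(self.lines) > self.capacity:
            self.lines.popleft()     # oldest line scrolls off
            self.scrolled += 1
        return number                # never changes for this line

    def physical(self, line_number):
        index = line_number - self.scrolled
        if not 0 <= index < len(self.lines):
            return None              # scrolled away (or not added yet)
        return index

    def get(self, line_number):
        index = self.physical(line_number)
        return None if index is None else self.lines[index]

buf = Scrollback(capacity=3)
numbers = [buf.add(t) for t in ("a", "b", "c")]
saved = numbers[2]                   # a trigger remembers line 2 ("c")
buf.add("d")
buf.add("e")                         # two lines have scrolled off by now
```

After the two extra lines, `saved` is still 2, but it now maps to physical index 0, and line 0 is reported as gone.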

This is a pretty major change to the screen output routine, so I'm going to need to be very careful with it. Hopefully it doesn't cause a lot of weird side-effects. But using the full line number will help with a number of kludged areas in CMUD (still left from some zMUD code), and will make it easier to support in the future, so this is a good time to do it.

In any case, it will take me at least another day to get all of this fixed properly. I'm kind of surprised I didn't already run into this...it's pretty basic "thread-safe" kind of stuff. Thanks to Shalimar for posting such a good and simple test case for this stuff.

Wizard Joined: 14 Aug 2004 Posts: 1269

I read as far as here:

Posted: Tue Oct 02, 2007 1:32 am

Bah...this stuff is such a #$%* pain!

So I get all of the variables that need to be instanced by threadid into their own class and change all of the code to search for the variables associated with the current thread ID. When I run it, it still has the same problems. Upon debugging, I discover (and remember) that, of course, #CW always runs in the MAIN THREAD because it needs to access the user interface (to color the lines). So the #CW routine can't determine what thread's variables to access...it always just accesses the main thread variables.
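Here is a small Python sketch of that trap. A worker stores its value under its own thread ID, then hands a request to the main thread through a queue. If the main thread looks up variables by "whoever is running now", it finds its own. One possible way around it (an assumption on my part, the thread does not show how the real code resolves this) is to ship the caller's thread ID along with the request. All names here are hypothetical.

```python
import queue
import threading

thread_vars = {}              # thread id -> line the trigger matched on
ui_queue = queue.Queue()      # requests to be executed by the main thread

def worker(line):
    me = threading.get_ident()
    thread_vars[me] = line
    # Send the caller's thread id along with the UI request.
    ui_queue.put(("cw", me))

thread_vars[threading.get_ident()] = 0    # the main thread's own value

t = threading.Thread(target=worker, args=(10,))
t.start()
t.join()

command, caller_id = ui_queue.get()

# Naive lookup: keyed by the thread running the command, i.e. the main thread.
naive_line = thread_vars[threading.get_ident()]

# Lookup keyed by the thread that issued the command.
caller_line = thread_vars[caller_id]
```

The naive lookup returns the main thread's 0, while the lookup by caller ID returns the 10 that the worker stored.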

Bleh...I'm giving up for tonight and will think about it more tomorrow. My head is about to explode. If it wasn't 90% working, I'd just get rid of all of this threading crap. I'm not sure it's worth the headache (as much as I like #WAITFOR and the #WAIT fixes). But I'll see if I can figure out how to fix this tomorrow.

Seb: In theory, scrolling performance will be slightly improved, but I doubt it will be noticeable. It was only deducting 1 from the stored cursor variables, and was only doing this once each time the screen was scrolled. The scrolling performance is much more affected by the character-by-character code, rather than the line-by-line stuff.

Posted: Tue Oct 02, 2007 1:56 am

Here is the kind of code that is driving me crazy. Apparently, this is a really bad way to code when using threading. Unfortunately, I do it *everywhere* in CMUD.

The idea is that you have a routine that needs to mess with a global resource. Consider the routine for #LOOP, where it needs to set the global value of %repeatnum. Here is the kind of code that I have:

Posted: Tue Oct 02, 2007 5:37 am

I can imagine your frustration. I haven't done serious multi-threaded code for a while but I still get nightmares from when I did. It seems that you need a Singleton to manage variables like GlobalRepeatNum with appropriate getter/setter methods so that the class is always aware of how many intermittent updates occur and can account for them.

It's late for me so that idea may be a complete wash and you'll have something 20 times better in the morning.

Wizard Joined: 14 Aug 2004 Posts: 1269

Posted: Tue Oct 02, 2007 5:27 pm

No, because other parts of CMUD refer to the GlobalRepeatNum value when you access %repeatnum. I fixed this specific case by using the runtime stack like I do with %i, %j, etc., so that specific routine is working now. But there are a lot of other routines where I use this same bad logic for saving/restoring a global value.
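For readers who have not hit this before, the save/set/restore pattern on a global goes wrong as soon as two loops overlap. The Python sketch below forces that overlap with events and compares it with a per-thread stack, loosely in the spirit of the runtime-stack fix mentioned above. It is not the CMUD code; `global_repeat_num` and the function names are stand-ins.

```python
import threading

global_repeat_num = 0             # stand-in for the shared %repeatnum value
_per_thread = threading.local()   # each thread sees its own attributes

def loop_global(value, started, may_finish, seen, name):
    """Save/set/restore on a shared global: unsafe when loops overlap."""
    global global_repeat_num
    saved = global_repeat_num
    global_repeat_num = value
    started.set()
    may_finish.wait()                   # the loop body would run here
    seen[name] = global_repeat_num      # what the body sees near its end
    global_repeat_num = saved

def loop_stack(value, started, may_finish, seen, name):
    """Same loop, but the value lives on a stack owned by this thread."""
    stack = getattr(_per_thread, "stack", None)
    if stack is None:
        stack = _per_thread.stack = []
    stack.append(value)
    started.set()
    may_finish.wait()
    seen[name] = stack[-1]
    stack.pop()

def run_overlapped(loop):
    """Start A, then B, then let A finish before B (forced interleaving)."""
    seen = {}
    a_started, a_finish = threading.Event(), threading.Event()
    b_started, b_finish = threading.Event(), threading.Event()
    a = threading.Thread(target=loop, args=(5, a_started, a_finish, seen, "A"))
    b = threading.Thread(target=loop, args=(7, b_started, b_finish, seen, "B"))
    a.start()
    a_started.wait()
    b.start()
    b_started.wait()
    a_finish.set()
    a.join()
    b_finish.set()
    b.join()
    return seen

seen_global = run_overlapped(loop_global)
leftover = global_repeat_num      # should be 0 again, but is not
seen_stack = run_overlapped(loop_stack)
```

In the global version, A ends up seeing B's value, B sees A's restored value, and the global is left at a stale 5 instead of the original 0. In the stack version each loop sees its own value.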

The specific case of the #CW problem looks like this:

Posted: Tue Oct 02, 2007 5:38 pm

If it were me, I'd probably just scrap the whole threading thing. I don't think any performance boost is worth it (especially considering how cutting-edge the average MUD user's hardware is :P), and all the thread-related commands, including wait, can be done already through other methods.

But if you think you've found a solution, hooray! :)

Posted: Tue Oct 02, 2007 7:56 pm

That's the decision I've been struggling with. As we know, #WAIT didn't work properly without threads. In 1.34, #WAIT was implemented as a Windows message loop, which ends up causing its own problems, and prevents #WAIT from working properly in loops and stuff. And I think #WAITFOR is a very cool concept, especially for newbies, since it can simplify a lot of basic scripts. It also will allow the Send Delay (and planned separate Speedwalk Delay) to work properly on MUDs that have a minimum speedwalk time. So there are a lot of advantages to doing it, and I've certainly already put a lot of effort into it.

It's not really about any "performance boost". It's more for the features like #WAIT and #WAITFOR.

But if it's going to make CMUD horribly unstable and I can't find a way to fix it, then I'm going to be forced to scrap it. I'm going to put today and tomorrow into working on possible solutions and then I'll release a 2.05 beta for people to play with. If 2.05 works well enough, then I'll continue with the threading ideas.

One other option is to go ahead and prevent Windows messages from being handled during thread execution, even on the command line. This would make the UI pause if you do a long #LOOP command on the command line, but it would prevent parallel command line threads from causing trouble. If that was done, then the bad effects of non-thread-safe code would only be seen when using parallel threads started with one of the #WAIT commands. So this is an intermediate approach where we still use threads, but avoid any parallel execution for 99% of the scripts.

For now I'm keeping the parallel threads on the command line because it helps in tracking down the thread-safe issues.

Anyway, so I have a couple of options. I'll let you know how these fixes are going later today.

Posted: Tue Oct 02, 2007 8:42 pm

OK, I think my new Synchronize routine and thread-specific variables are working now. The code looks like this now:

Posted: Tue Oct 02, 2007 8:56 pm

Hmm, I can't just do the simple CurrentLine+LinesScrolled solution. Imagine that a trigger fires on line 10. So "10" is stored for the CurrentLine in the thread-specific variables. Now imagine that some other trigger fires and causes line 5 on the screen to be deleted/gagged (this doesn't happen much in real life, but it illustrates the problem). OK, now that line 5 was deleted, the "10" in the thread's CurrentLine no longer points to the proper line. It should now be "9" and not "10". In all of these cases the "LinesScrolled" was always zero.

So, I really need to still treat the "CurrentLine" in the thread variables as a screen "cursor" position. CMUD's screen routines already have code to manage a list of cursors. In the above case, CMUD would detect that line "10" was greater than the line being deleted and would adjust the cursor to properly point at line "9". So I think I just need to add a cursor for each thread so that the existing screen routines can keep it updated.
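The cursor idea, as I read it, looks roughly like this in Python. This is a sketch with invented names (`Screen`, `new_cursor`), and what happens to a cursor whose own line gets deleted is my assumption (it is set to `None` here); the thread does not say how CMUD handles that case.

```python
class Screen:
    def __init__(self, lines):
        self.lines = list(lines)
        self.cursors = []             # every registered cursor gets adjusted

    def new_cursor(self, line):
        cursor = {"line": line}       # e.g. one cursor per running thread
        self.cursors.append(cursor)
        return cursor

    def delete_line(self, index):
        del self.lines[index]
        for cursor in self.cursors:
            if cursor["line"] is None:
                continue
            if cursor["line"] > index:
                cursor["line"] -= 1   # lines below the deleted one move up
            elif cursor["line"] == index:
                cursor["line"] = None # assumption: the cursor's line is gone

    def insert_line(self, index, text):
        self.lines.insert(index, text)
        for cursor in self.cursors:
            if cursor["line"] is not None and cursor["line"] >= index:
                cursor["line"] += 1   # lines at or below the insert move down

screen = Screen(f"line {n}" for n in range(12))
trigger_cursor = screen.new_cursor(10)   # trigger fired on line 10
screen.delete_line(5)                    # another trigger gags line 5
after_delete = trigger_cursor["line"]
text_at_cursor = screen.lines[after_delete]
screen.insert_line(0, "inserted")
after_insert = trigger_cursor["line"]
```

After line 5 is deleted the cursor reads 9 and still points at the text that was originally on line 10, which is exactly the case a plain stored "10" gets wrong.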

This might actually be easier and safer because it doesn't involve any changing to the low-level screen routines. And the performance hit of keeping track of additional cursors shouldn't be noticeable. I'll try this and let you know. (Yeah, I know, this thread has really become a Blog post...sorry about that)

Enchanter Joined: 09 Sep 2007 Posts: 605

Since I know almost nothing about CMUD threading logic, I can only speak very approximately.

Maybe it is possible to make a trade-off, like introducing some restrictions on when a separate thread may be used? For example, do not start new threads in response to text from the MUD (pattern-based triggers) or to command line commands, and prohibit #WAITFOR in such triggers (or start one thread to simulate #WAITFOR behavior, which would maintain its own list of patterns and timeouts)? This could prevent UI-related commands from being messed up, as they would execute in the main thread only. As for UI pauses on long loops, consider them a penalty for using such loops. Threads that perform pure calculation, use #WAIT, or issue commands to the MUD could still be started. Though this is almost a new concept, and it would need a damn lot of effort to make it work.

Anyway I wish you, Zugg, good luck in finding any solution of threading issues.

Posted: Tue Oct 02, 2007 9:43 pm

Perhaps you could limit the threads to only be used in the new #WAITFOR instances? Are threads really needed for everything?

Wizard Joined: 14 Aug 2004 Posts: 1269

Sounds like you're making decent progress. This #CW routine does bring to mind what people say about "never use global variables!", but you seem to be doing the best you can with code that has some parts that were written many years ago. Refactoring everything to not use globals would have been a thankless task and this new Synchronize routine seems like the best option (and was what I was grasping at before). This threading stuff will be useful if it works properly so I think it is worth persevering with for a bit.

Posted: Tue Oct 02, 2007 10:36 pm

Arde and Shalimar: I've thought about that. The compiler could certainly flag which scripts use the various #WAIT commands and then execute them in threads. But one of the other features that I'm trying to maintain is to use threads for the command line, so that you can enter a command that takes a while, and then still continue to enter other commands. In previous versions, either the UI would hang if you entered a loop, or the loop needed to call ProcessMessages, which had the bad side effect of adding layers of window message loops, causing lots of other problems.

Also, this would prevent Lua scripts from also taking advantage of zs.waitfor since there would be no good way to tell before the Lua script was executed if it should be a thread or not.

Finally, if I don't fix these problems, then anything that used a thread would be flaky and mostly useless anyway. It would just be hiding the problems and confining them to #WAIT and #WAITFOR, which would just lead people towards recommending that those commands were avoided (in which case, why even have them).

Seb: The handling of cursor adjustment is done within the screen handling routine on an operation by operation basis. For example, when you #GAG a line, it calls the Screen.DeleteLine routine. The DeleteLine routine loops through the list of cursors and adjusts them as needed based upon which line was deleted. Same for the InsertLine routine, etc. This was all written a long time ago when it was *very* important to properly track the cursor location, because the same screen routines were used in the old zMUD settings editor in read/write mode. CMUD no longer uses the screen routines in write mode (it uses Scintilla editor instead), but the code to properly update cursors is still there.

Anyway, I've got the basic code implemented now. And since the Trigger processing already used a cursor in the past to handle this kind of stuff, I just had to reuse that cursor in the thread variables. So in normal cases, it isn't using any more cursors than it did in previous versions. Only if you have a bunch of parallel threads does it have more cursors to update. So there isn't any performance decrease. It's looking like it's working pretty well now. I even came up with a tougher test:

SubAdmin Joined: 18 Nov 2001 Posts: 5182

Just a wild thought here. The only things that truly need to be thread safe are display controls. We can wait for buttons and status bars to update until all threads are paused or stopped. The problems seem to be entirely a matter of coordinating in the main and child windows. The things acting on that display do not really need to be synchronized, they just can't conflict with each other.

How about making every command that interacts with the display actually pause the thread and return control to a handler? The handler would have initially created the threads and be responsible for all screen updates. As things return, the paused/stopped state of the thread would be used to determine whether they are done, and the return values they give would let the handler know what modifications to make. The handler is then assembling all the display information and in the end passing the resulting block to your old scrolling routines.

The flow would be something like this:
Incoming text received
Check for triggers
Trigger activated on lines 1 and 3 of 5
Trigger on line 3 says #CW for whatever, line 3 modified & stopped
Trigger on line 1 does #SAY something else, line 2 inserted
Check for triggers on new line 2 of 6
State advanced for trigger on line 1, check for matches new condition on lines 2-6
Trigger activated for line 3 of 6
Triggers stopped
Pass 6 lines to scrolling.

More or less what I am suggesting is to add a separate buffer area between the interpretation of ANSI, MXP, etc. and the actual display. Kind of a sandbox where one section of code is responsible for keeping all the threads coordinated, instead of making every thread responsible.

Posted: Tue Oct 02, 2007 11:11 pm

Well, for one thing, this would be a complete rewrite of the entire CMUD logic. So I'm not going to do something that extensive unless there isn't any other option.

Remember that it's the screen that does all of the ANSI and MXP parsing, and it's very important that the triggers only capture the processed text (unless they are ANSI color triggers). This has always been a basic fundamental way that zMUD and CMUD operate and it prevents the MUD coders from inserting control codes into the network stream to mess up triggers. Triggers always fire on exactly the text that you *see*, making it easier to create the proper triggers. Advanced users who want to mess with ANSI codes can deal with the extra complexities of ANSI triggers.

The things that act on the display *do* need to be synchronized, because they are accessing a global resource (whether it's the screen output buffer, or the network input buffer, or some intermediate buffer). Also, the more buffers and layers that you add, the slower it gets. The reason ANSI and MXP are processed by the screen output routines is because a) some ANSI codes directly affect the screen (such as gotoxy, deleteline, etc.) and b) it's faster to have a single buffer and single loop that handles the various processing. Every time you need to copy text to another buffer, you get a large slowdown.

Keep in mind that CMUD is essentially displaying text scrolling at something like 500 FPS (at least on my system). That's pretty darn fast, and it's really easy to add a layer that kills this performance.

Separating the screen buffer from some sort of internal ANSI/MXP buffer doesn't solve the problem. Just because it's not displayed on the screen doesn't mean it still doesn't need to be thread-safe and synchronized. It's still a global memory resource, and whenever multiple threads access a global resource, they have to be thread-safe.

Messing around with pausing threads is also very tricky. You need to use events to tell the main thread that a background thread has paused. And when the main thread is waiting for a background thread to pause, what does it do in the "wait" loop? If you allow it to process Windows messages (which you really must to prevent the program from hanging), then you end up with commands that execute within the wait loop which then create their own wait loop to wait for another thread to pause. It's really easy to end up with a really deep stack of wait loops that call other wait loops, which is similar to the problems created with the #WAIT command in previous versions. In fact, it was the triggers that were allowing Windows message processing in their wait loops that was causing some of the major problems in 2.04.
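The stacking of wait loops can be shown with a toy, single-threaded Python model. Everything here is invented for illustration: "messages" are plain callables in a queue, and a background job signals completion by posting a message. Three commands are entered in quick succession, and each one waits for its job while pumping messages.

```python
from collections import deque

messages = deque()     # pending "window messages" (plain callables here)
finished = set()       # ids of background jobs that have signalled completion
returned = []          # order in which the command handlers actually return
depth = 0
max_depth = 0

def wait_for(job):
    """Wait loop that pumps messages so the UI stays responsive."""
    global depth, max_depth
    depth += 1
    max_depth = max(max_depth, depth)
    while job not in finished:
        messages.popleft()()      # may run another command, which waits too
    depth -= 1

def command(job):
    def handler():
        # The "background thread" reports completion via a later message.
        messages.append(lambda: finished.add(job))
        wait_for(job)
        returned.append(job)
    return handler

# Three commands typed one after another.
messages.extend(command(j) for j in (1, 2, 3))
while messages:
    messages.popleft()()
```

The wait loops end up nested three deep, and command 1 cannot return until commands 3 and 2 have unwound, even though its own job finished first.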

Finally, I think the bottom line is that CMUD just needs to be thread-safe. No amount of messing around will get around this issue. If CMUD is going to allow multiple parallel threads of execution, then it needs to be thread-safe. Otherwise I should just get rid of threads entirely. When I rewrote the parser and added the compiler, I did all of that with threads in mind. I'd *never* be able to add stuff like #WAITFOR to zMUD the way its parser works. The database used in CMUD is also thread-safe, as are the network routines. So, all that is left is some of this old code that was ported from zMUD to execute some of the commands (like #CW). And using the new Synchronize method, I now have a way to move previously "global" variables into thread-safe storage. So I think it's a good solution.

Anyway, I appreciate the wild ideas, but that one is probably a bit too wild for what I need to be considering at this point. I'm not about to embark on a 6-month rewrite of CMUD at this point, just to discover the same or different issues when I'm done. I'd rather stick with fixing what is 90% working already.

Wizard Joined: 14 Aug 2004 Posts: 1269

Zugg, your cursor sounds similar to my array in functionality, but with a different data structure. Anyway, this all sounds very positive. Well done. :)

While we are sort of on the subject of screen updates, why is it when a large block of text whizzes past in CMUD, like when pressing Escape during a screen freeze, (and in zMUD from time to time) do lines not get #SUBbed? (Will this be fixed as a side-effect?) Or is it because there is a buffer that when it gets full is just dumped to the screen without processing triggers? (It might just be prompt triggers that are affected - not sure.)

Posted: Wed Oct 03, 2007 12:27 am

Zugg, it sounds like you made really solid progress and I'm looking forward to getting my hands on 2.05.

"There will feasting, and mudding and leveling of avatars." Book of MUD

Posted: Wed Oct 03, 2007 4:16 am

Seb: If it's only a problem in v2.00-2.04 then it's probably a side effect of all these thread problems. Also, if the message processing gets stuck, then the screen can stop updating (which is another problem I'm still looking into). I haven't seen this in zMUD myself.

There isn't any part of zMUD or CMUD that updates the screen without processing triggers. When text is added to the screen buffer, when it encounters a #13, it literally takes the line from the screen and passes it to the trigger processor. If the trigger fires, the #SUB command is directly modifying the screen buffer itself. So either the trigger doesn't fire, or it has lost its cursor position and isn't modifying the correct location.
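As a rough picture of that flow, here is a Python sketch in which text goes straight into the screen buffer and each carriage return (#13) hands the finished line to the trigger processor, which rewrites the buffer in place like #SUB would. The `Screen` class, the regex-based triggers and the handling of LF are all simplifications of mine, not the real routines.

```python
import re

class Screen:
    def __init__(self, triggers):
        self.lines = [""]          # screen buffer; the last entry is still open
        self.triggers = triggers   # list of (compiled pattern, replacement)

    def feed(self, text):
        for ch in text:
            if ch == "\r":                         # #13 ends the line
                self._fire_triggers(len(self.lines) - 1)
                self.lines.append("")
            elif ch != "\n":                       # LF is ignored in this sketch
                self.lines[-1] += ch

    def _fire_triggers(self, index):
        for pattern, replacement in self.triggers:
            # A #SUB-style action modifies the buffer line itself.
            self.lines[index] = pattern.sub(replacement, self.lines[index])

screen = Screen([(re.compile(r"\bgoblin\b"), "GOBLIN")])
screen.feed("A goblin arr")                  # packets can split a line anywhere
screen.feed("ives.\r\nYou see a gob")
```

The first line is substituted as soon as its #13 arrives, no matter how the packets were split, while the second line is left alone because it has not been terminated yet.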

Prompt triggers are completely different. Remember that in past versions, the prompt triggers would only fire at the end of a packet. Only in v2.03 (I think) and later does it also fire on "lines" that end in IAC EOR or IAC GA. But without these markers, there is no way to tell if a line is a prompt or not. And there are all sorts of ways the network might be combining packets so that it misses a prompt trigger without these markers. So maybe that's the reason it misses some prompts.

But normal triggers are fired on every #13 character (or #10 character if LF is being used as the newline). And the only way to "miss" a trigger is if your trigger processor goes into an infinite loop and never returns. Like if you have an infinite #WAIT or something, or triggers get disabled.

Another problem that could happen in zMUD is also related to loops (and #WAIT) when used outside of the #PRIORITY command. In those cases, zMUD calls the Windows "ProcessMessages" routine within the loop to prevent zMUD from hanging during a long loop. Now, if this loop happens within a trigger that has the Trigger on Trigger flag turned off, then triggers are essentially disabled (it was a global flag in zMUD) for the duration of that trigger. The "ProcessMessages" call within the loop could allow a line of text to be received from the MUD, and triggers wouldn't be run on that line because they are disabled. This was one of the bad parts of the message-driven zMUD method of doing stuff like loops. With the CMUD threaded method, this doesn't happen. And now that the messaging bug is fixed in v2.05, nothing can interrupt a trigger like this and there is no way to receive a line of text while a trigger is running, unless you suspend the thread via #WAIT, #WAITFOR, etc. Normal loops will not be interrupted and will block incoming text until the loop is finished.

Gee, that was probably a longer answer than you wanted.