|
Zugg |
Posted: Thu Jul 05, 2007 10:50 pm
New CMUD Feature: Sequential Scripting Threads! |
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Sat Jul 07, 2007 3:52 am |
OK, looks like I'm past my headaches. The thread system is working well now. I've got the #WAITTHREAD working and have tested two triggers going back and forth. Here is my test:
Code: |
#trigger thread1 {
#thread thread1
#show "begin 1"
#show "thread2"
#show "back in 1"
#waitthread thread2 susp
#show "back in 1 again"
#waitthread thread2 susp
#show "end 1"
}
#trigger thread2 {
#thread thread2
#show "begin 2"
#waitthread thread1 susp
#show "back in 2"
#waitthread thread1 susp
#show "end 2"
} |
For bonus points, see if you can figure out what the proper output should be when the "thread1" trigger fires. If your head hurts as much as mine does, then just look at this answer:
#show thread1
gives the following output:
Code: |
thread1
begin 1
thread2
begin 2
back in 1
back in 2
back in 1 again
end 2
end 1 |
If you stare at this for a while, you'll see that it is correct. You are probably asking for trouble if you really try to do something this complicated in your scripts, but at least the power is there if you really need it. Just keep in mind that you are on your own to synchronize your threads and deal with the headaches. CMUD tries to handle the simple cases, but I'm sure there are complex cases where you could end up with a dead-lock. If you get a thread dead-lock, then CMUD is going to hang and you'll have to use the Windows Task Manager to end the process. But so far I haven't been able to hang it tonight.
I've also got the #SIGNAL and #WAITSIGNAL stuff working. CMUD keeps a count of the number of threads that are waiting on a signal. A signal is like a boolean flag...it has a value of "On" or "Off". When you use the #SIGNAL command, the signal is turned on. When the proper number of threads have resumed, then the signal is automatically turned off. You can also turn it off manually using the command "#SIGNAL name 0" but I don't recommend messing with that. If you turn the signal on before any thread is waiting on it, then the first thread that issues a #WAITSIGNAL will immediately resume and the signal will be turned off. This works pretty much as expected and allows for some simple synchronization of parallel threads. But again, it's possible to create some pretty complex scripts that might get you in trouble.
What I decided to do with the local pattern variables from the "#WAITFOR pattern" command is to kludge it a bit. As suggested, the compiler will not complain about undefined local variables after a #WAITFOR command. When you reference a local variable that the compiler doesn't know about, the compiler generates some new code that references the local variable by name instead of by number. At runtime, the #WAITFOR command stuffs the subpatterns into new local variables, and when it executes the code to get the local variable by name, it retrieves the correct value.
This has the side effect of also increasing the number of %1..%99 parameters. For example, consider this:
Code: |
#trigger {You get (%d) coins.} {
#show %1 is the number of coins.
#show %2 is empty
#waitfor {You get (%d) exp.}
#show %1 is still the number of coins.
#show %2 is now the amount of exp
} |
I don't recommend relying upon this numbering. It's better to use named patterns, like this:
Code: |
#trigger {You get ($coins:%d) coins.} {
#show $coins is the number of coins.
#waitfor {You get ($exp:%d) exp.}
#show $coins is still the number of coins.
#show $exp is now the amount of exp
} |
Since this is more readable and less likely to break with future parser changes.
OK, that's about it for this week. This probably should have been a blog entry, but oh well. Next week I'm adding zScript Functions, and then getting deep into the Lua implementation (I've only toyed with Lua so far this week). It's a lot of fun adding these cool new features! |
|
|
|
oldguy2 Wizard
Joined: 17 Jun 2006 Posts: 1201
|
Posted: Sat Jul 07, 2007 5:45 am |
Very nice. I've always wanted something like #waitfor. I'm looking forward to it.
|
|
|
|
Nezic Apprentice
Joined: 10 Oct 2000 Posts: 119 Location: Colorado
|
Posted: Sat Jul 07, 2007 6:13 am |
If there is a possibility of deadlocks in scripting threads, is it possible to add the ability for CMud to display existing threads? Basically a list that shows all the running script threads, from which the user could select individual threads to kill manually if needed. If it's possible to track what any given thread is waiting for, you could even add a check for dead-locks (maybe only run on user request, or infrequently if it takes a lot of time to run) so that the user doesn't have to eyeball the list to pick out which threads are waiting on each other -- tricky if there are more than two in the circle.
I don't know if this is feasible or not, but I thought I'd throw the idea out there. (I bet it'd be useful during the development of complicated scripts too.) |
|
|
|
Taz GURU
Joined: 28 Sep 2000 Posts: 1395 Location: United Kingdom
|
Posted: Sat Jul 07, 2007 9:43 pm |
So I read your test and I can finally agree with the output!!
I think my brain is finally giving up on me because it took me about 10 minutes to sort out what is happening in that test.
The tricky bit is trying to figure out exactly what is being processed at the point that #show "thread2" happens because at that point thread1 is still running and is going to continue to run until it gets suspended. It makes you wonder for a while why "back in 1" doesn't show up before "begin 2". |
|
_________________ Taz :) |
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Sat Jul 07, 2007 10:14 pm |
It'd be nice if that example could be given with the #waitthread command in the help. While it takes a fair bit of head-bending to understand, it's a very good example of how #waitthread works. I can't think of a simpler way to explain it.
|
|
|
|
gmueller Apprentice
Joined: 06 Apr 2004 Posts: 173
|
Posted: Sun Jul 08, 2007 5:27 am |
Instead of making it regex vs. non-regex, why not think more generally and do something like this instead:
#WAITFOR @function_result()
where function_result is a function.
inside of @function_result you could test %line, against the %regex() or %match() expression of your choice.
Once the expression returns true it returns.
EDIT: {} could in this case refer to the "default" function of %match(%line,"whatever you typed")
so that #WAITFOR {blah} will be the same as: #WAITFOR @function()
#FUNCTION function {
%match(%line, blah)
} |
|
|
|
Hamstro Newbie
Joined: 03 Jan 2006 Posts: 9
|
Posted: Sun Jul 08, 2007 10:19 pm |
What great news! Zugg, this is an addition to CMUD that can potentially sell copies. Up until now I've not had much interest in changing from ZMUD to CMUD but this feature has changed my mind.
|
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Sun Jul 08, 2007 10:30 pm |
You're going to have to do so much updating to the CMUD section of the website for 2.0, Zugg... so many new features!
|
|
|
|
Zhiroc Adept
Joined: 04 Feb 2005 Posts: 246
|
Posted: Mon Jul 09, 2007 3:47 am |
As I look this over... these are not multiprocessing threads, but more like co-routines? In other words, only one thread is ever running in the CMUD interpreter, and only stops running if it ends or gets suspended?
|
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Mon Jul 09, 2007 7:03 am |
Sort of. Signals can make threads run simultaneously, but that's all.
|
|
|
|
Taz GURU
Joined: 28 Sep 2000 Posts: 1395 Location: United Kingdom
|
Posted: Mon Jul 09, 2007 1:06 pm |
Zhiroc wrote: |
As I look this over... these are not multiprocessing threads, but more like co-routines? In other words, only one thread is ever running in the CMUD interpreter, and only stops running if it ends or gets suspended? |
This is not what I understood from the very first post in this topic.
Zugg wrote: |
Everytime you execute a script in the new 2.0 version of CMUD, a background thread is created. This background thread can be stopped for any reason, such as waiting for a pattern from the MUD, or waiting for a timer to expire. While this background thread is paused, the rest of CMUD runs normally as expected. |
Which is what led me to say what I did.
Taz wrote: |
The tricky bit is trying to figure out exactly what is being processed at the point that #show "thread2" happens because at that point thread1 is still running and is going to continue to run until it gets suspended. It makes you wonder for a while why "back in 1" doesn't show up before "begin 2". |
My belief is that multiple threads run at the same time. Zugg correct me if I'm wrong. |
|
_________________ Taz :) |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Mon Jul 09, 2007 6:53 pm |
Quote: |
As I look this over... these are not multiprocessing threads, but more like co-routines? In other words, only one thread is ever running in the CMUD interpreter, and only stops running if it ends or gets suspended? |
No, they are real threads. Windows itself can only run one thread at a time too. Windows time-slices between threads. The trick is that you need to do something in the new CMUD to tell it to start time-slicing. Normal CMUD operation only has a single thread, and I'm trying to keep compatibility with that. But, as soon as you use something as simple as "#WAIT 0", that tells CMUD to allow additional threads to start running. Essentially, "#WAIT 0" suspends the current thread for 0ms, allows other threads to start and run, and then immediately resumes. So this is real threading, not just co-routines. You can prove it in the new version by running a system tool that shows you all of the threads running on your system.
The reason that "begin 2" is displayed before "back in 1" is that CMUD processes triggers immediately, whenever a new line of text is displayed on the screen. Background threads will continue to run, but in this case, the "#SHOW thread2" is being run within a thread, and that thread will pause while any triggers on the text are executed. Only when thread2 suspends itself with the #WAITTHREAD call can CMUD return to running the thread1 code.
This is really how you'd want it to work. If thread1 continued without waiting for the thread2 trigger, then we'd be back in the old days where nothing was synchronized again, and we all know how badly that worked. People expect their triggers to fire immediately, and not in parallel with some background threading model. So the trick is to provide the advantages of threads for those who want to use them, without breaking the normal way in which triggers are executed in priority order.
gmueller: No, it's a big change between allowing *any* function and just allowing a pattern. The reason for this is that the current #WAITFOR ties into the trigger system. Before CMUD loops through your existing triggers, it checks for any threads that are waiting for a pattern. If #WAITFOR allowed you to use any expression, then it would need to tie into the expression/variable update system, which is a completely different result.
Also, if you want to do something like that, it's easy to just do this:
Code: |
#UNTIL (@expression) {#WAIT 10} |
Since #WAIT works within loops now, that will essentially wait until an expression is true, but allows you to control the polling interval (10ms in this case) so that it doesn't take 100% of your CPU while waiting. It's possible I'll add a short-cut command for this, but I want to keep the #WAITFOR command tied into the trigger system, since that's the main simple function that a lot of people have asked for over the years. |
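For readers more used to general-purpose languages, here is a rough Python analogue of the polling idiom above. It is only a sketch: `wait_until`, `interval_s` and `timeout_s` are made-up names, and the timeout is an extra safety net that the #UNTIL one-liner does not have.

```python
import time

def wait_until(predicate, interval_s=0.01, timeout_s=1.0):
    """Poll predicate() every interval_s seconds.

    Returns True once the predicate holds, or False if timeout_s elapses first.
    Sleeping between checks is what keeps the loop from using 100% CPU.
    """
    deadline = time.monotonic() + timeout_s
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
    return True

# Example: wait for a condition that becomes true after roughly 50 ms.
start = time.monotonic()
became_true = wait_until(lambda: time.monotonic() - start > 0.05)
```

A shorter interval reacts faster but wakes the thread more often, which is the same trade-off as choosing the number passed to #WAIT.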
|
|
|
Zhiroc Adept
Joined: 04 Feb 2005 Posts: 246
|
Posted: Mon Jul 09, 2007 11:52 pm |
If there is true concurrency, then without a mutex or at least an atomic test and set operator, I don't think you can develop a reliable thread synchronization method from user code, which makes modifying non-local variables iffy at best...
|
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 10, 2007 12:23 am |
Zhiroc: You can speculate all you want, but there *is* true concurrency and you don't need mutex in your CMUD scripting code. CMUD is handling that internally for you. All accesses to the user interface and the underlying database are synchronized via critical sections. That's the whole point of developing CMUD so that it's threadsafe. I've written two threads that access the same non-local variable, and it works fine. Of course, if both threads *store* different values to the same variable, then the value of that variable is undefined since you don't know which thread will save the final value. But it doesn't crash or anything like that because the database access is synched properly.
Yes, this was tricky code to write, but you make it sound like it should be impossible, and it's far from impossible. It just takes some careful coding. |
|
|
|
Zhiroc Adept
Joined: 04 Feb 2005 Posts: 246
|
Posted: Tue Jul 10, 2007 3:15 am |
Zugg wrote: |
Of course, if both threads *store* different values to the same variable, then the value of that variable is undefined since you don't know which thread will save the final value. But it doesn't crash or anything like that because the database access is synched properly. |
That's what I meant though... a user algorithm can't be threadsafe. I expect that the engine wouldn't crash, but code as simple as counting in two threads (#VAR count {@count + 1}) can't be relied on to count correctly, since two threads could simultaneously read count as the same value, add one, and store the same value. Thus, you miss one.
Or something like #IF (%ismember()) {do something; #DELITEM} can have something remove the item from the list between the test and the code.
Or #IF (@thread_not_running) {#VAR thread_not_running 0; #VAR arg x; #RESUME thread} can change arg in an already-running thread.
Without being able to define critical regions in user code, I can't imagine writing multithreaded code that won't break eventually if it uses any non-local variables. It might work "most of the time" but you'll have race conditions. And debugging race conditions is my worst nightmare.
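To make the lost-update case concrete, here is a small Python sketch (not CMUD code; all names are invented for illustration). The first function replays the bad interleaving by hand so the result is deterministic, and the second shows the same counter with the read-modify-write step guarded by a lock.

```python
import threading

def lost_update():
    """Replay the bad interleaving by hand: both 'threads' read before either writes."""
    count = 0
    read_a = count      # thread A reads 0
    read_b = count      # thread B reads 0 before A has stored its result
    count = read_a + 1  # A stores 1
    count = read_b + 1  # B also stores 1, so one increment is lost
    return count

def locked_count(n_threads=4, n_incr=1000):
    """Same counter, but each read-modify-write happens while holding a lock."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_incr):
            with lock:  # only one thread at a time may update the counter
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]
```

Two increments in `lost_update` leave the counter at 1, while `locked_count` always ends at `n_threads * n_incr`.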
Actually, from 10+ yrs of writing MT code, odds are I'll still have race conditions due to my own locking errors, but that's my fault.
I believe at the minimum to do your own critical section, you need a function like %lock(x) where x is the name of a variable (so xxx not @xxx as the %push and %pop functions allow). This returns 1 if @x is 0, and sets @x to 1, or 0 otherwise. And this is guaranteed in the interpreter to be atomic and not allow interference by other threads.
An easier mechanism would be something like #SYNC (@var) {code}. This just ensures in the engine that only one thread will execute the code among all the others that use the same @var. I think that's easier on novice thread programmers. It means less concurrency, but that's not as important for CMUD code. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 10, 2007 6:21 am |
Well, stuff to think about for the future, but I'm not going to worry about it right now. If people start relying upon setting the same data variable from multiple threads, then they deserve these headaches. You *can* do it with the #WAITTHREAD method, because that effectively puts one thread to sleep, while the other thread writes to the variable. See the example I gave above for #WAITTHREAD...each of these threads could write to the same variable without trouble, because they are not running concurrently. They are using #WAITTHREAD to synchronize and ensure that only one accesses the data at once, which is really the same thing that would happen with any sort of lock or sync command.
The main purpose of the multithreading in v2.0 was to allow sequential scripts using #WAITFOR, and to allow the #WAIT command to finally work. Getting into true synchronized threading that allows multiple threads to write to the same variables is beyond the scope of this update. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 10, 2007 6:14 pm |
Actually, after thinking about this more overnight, I think Zhiroc is correct and I need to add a simple "critical section" method. Instead of using #SYNC, and staying away from the Mutex term (I think that term gives people headaches :), I decided to name it #SECTION:
#SECTION name {code}
The idea is that this creates a "named section", and if you have more than one section with the same name, only one will execute at a time. Essentially, this is a critical section. By using a "name" instead of tying it to a specific variable, it allows you to name your sections however you want. You can name them for the variable that you are changing, or anything else.
This is currently just a quick and simple way to avoid problems with multiple threads that write to the same variables. It uses the existing RTL_CRITICAL_SECTION features in the Windows API, so it was easy to add and should work pretty well. |
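As a rough illustration of the "named section" idea in a general-purpose language, here is a Python sketch. It is an assumption-laden analogue, not how CMUD implements #SECTION: `section` and `run_counters` are hypothetical names, and a re-entrant lock is used as a stand-in for a Windows critical section.

```python
import threading

_sections = {}                      # one lock per section name, global namespace
_sections_guard = threading.Lock()  # protects the registry itself

def section(name):
    """Return the lock for a named section, creating it on first use."""
    with _sections_guard:
        if name not in _sections:
            _sections[name] = threading.RLock()
        return _sections[name]

def run_counters(n_threads=4, n_incr=500):
    """Several threads update one shared value, each inside the same named section."""
    total = {"value": 0}

    def worker():
        for _ in range(n_incr):
            with section("total"):  # same name means mutually exclusive blocks
                total["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]
```

Blocks that use different names never wait on each other, which is why naming a section after the data it protects tends to work well.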
|
|
|
Zhiroc Adept
Joined: 04 Feb 2005 Posts: 246
|
Posted: Tue Jul 10, 2007 6:59 pm |
Cool. One question is whether the name uses the package/module/class namespace or is just a string. It might not be critical but it might be nice if packages that might be written by others could isolate themselves from another using the same names.
|
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 10, 2007 7:54 pm |
Nope, it's a global namespace. This is because you might actually want to protect code across packages. If a package designer wants something that just works within the package, then I'd suggest using a naming scheme, such as PackageName_SectionName or something like that.
|
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 10, 2007 8:41 pm |
Btw, here is another set of examples that show how #SECTION can be useful. Remember in the above example we saw that "thread2" got executed immediately (the "begin 2" happened as soon as "thread2" was displayed). With a #SECTION you have a bit more control over this. Consider the following example:
Without sections:
Code: |
#trig {test1} {
#show test2
#show in section 1
#show end of 1
}
#trig {test2} {
#show in section 2
#show end of 2
} |
Output when #SHOW test1 is entered:
Code: |
test1
test2
in section 2
end of 2
in section 1
end of 1 |
Notice that the first trigger waits until the second trigger is finished before continuing. Now look at the case when using a #SECTION:
Code: |
#trig {test1} {
#section test {
#show test2
#show in section 1
}
#show end of 1
}
#trig {test2} {
#section test {#show in section 2}
#show end of 2
} |
Now, the output is:
Code: |
test1
test2
in section 1
in section 2
end of 1
end of 2 |
Notice that the first trigger got to finish displaying "in section 1" before the second trigger had a chance to run. This is because the "#SHOW test2" was executed within the critical section named "test" and when the second trigger also tried to enter the same named section, it had to wait until the first trigger was done with the section.
Internally, the test2 trigger was suspended in this case, waiting for the section to be released. When the named section was released by the first test1 trigger, then the test2 trigger was resumed. Once test2 was resumed (and printed "in section 2"), then both test1 and test2 were running in parallel. In this case, test1 finished before test2, but in theory you can't count on which thread would finish first. In fact, if you keep sending "#SHOW test1" to the command line, you will see that sometimes "end of 1" is displayed first, and sometimes "end of 2" is displayed first. This shows that both threads are really running in parallel.
But the #section ensures that the "in section 1" is *always* displayed before the "in section 2".
You just need to be careful to keep your named section code short and fast. If you start doing something slow when multiple threads use the same named section, then you can get a bottleneck. In the worst case, if your named section never completes (like an infinite loop), then you can get a deadlock if other threads depend upon the section. And if a thread is locked waiting for a section that is never released, there is no way to terminate the locked thread (except by somehow terminating the thread that is holding the section lock).
Anyway, I'll try to include some of these examples in the documentation. But it should give a better understanding of how this all works. |
|
|
|
Zhiroc Adept
Joined: 04 Feb 2005 Posts: 246
|
Posted: Tue Jul 10, 2007 8:53 pm |
Life's never simple... just thought of a complication you'll have to deal with: nested #SECTIONs.
If they are allowed, then it might be that two #SECTIONs for the same name nest (usually because of something like two aliases, and one calling another, not because the user actually wrote it that way intentionally). If you don't handle this with reference counts or the like, then the nested one will block forever. Also, nesting is a bit tricky for users, since they have to obey a locking order. If one thread does: #SECTION a { #SECTION b {}} and another does #SECTION b { #SECTION a {}}, they will at some point deadlock.
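The usual way around the a/b versus b/a problem is to always take locks in one agreed order. A minimal Python sketch of that convention follows; the names `locks`, `with_sections` and `demo` are hypothetical, and sorting by name is just one possible ordering rule.

```python
import threading
from contextlib import ExitStack

locks = {"a": threading.Lock(), "b": threading.Lock()}

def with_sections(names, body):
    """Acquire the named locks in sorted order, run body(), release in reverse."""
    with ExitStack() as stack:
        for name in sorted(set(names)):  # one global order, whatever the caller wrote
            stack.enter_context(locks[name])
        return body()

def demo():
    """Two threads ask for the same locks in opposite orders, yet cannot deadlock."""
    log = []
    t1 = threading.Thread(target=with_sections, args=(["a", "b"], lambda: log.append("t1")))
    t2 = threading.Thread(target=with_sections, args=(["b", "a"], lambda: log.append("t2")))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(log)
```

Because both threads end up acquiring "a" before "b", neither can hold one lock while waiting for the other.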
If you don't allow it, then the error behavior has to be defined, and this could cause scripts to fail unexpectedly (especially if the same name was used by more than one package by coincidence).
It might be nice to have the name be optional, and that would mean some global, unnamed #SECTION.
And finally, I guess that a #SECTION would act like a #WAIT in that it will (or might) allow other threads to run? So would executing one in a trigger always allow the next input line to be triggered on by another thread, or only if the #SECTION blocks? |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 10, 2007 11:32 pm |
The critical section support in Windows already takes care of this. It keeps a reference count of the number of times a section is entered within the same thread. As long as there is the same number of EnterCriticalSection and LeaveCriticalSection calls, there isn't any problem. A thread cannot deadlock itself with this. Take a look at:
http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx
for more information on how the RTL_CRITICAL_SECTION works in Windows.
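The same re-entrant behaviour can be seen in other languages. As a loose Python analogue (not the Windows API, and with made-up function names), `threading.RLock` lets the owning thread enter again, while a plain `threading.Lock` does not:

```python
import threading

section_lock = threading.RLock()  # re-entrant: counts entries per owning thread

def inner():
    with section_lock:            # second entry by the same thread succeeds
        return "inner ran"

def outer():
    with section_lock:            # first entry
        return inner()

def plain_lock_would_block():
    """With a non-re-entrant lock, a second acquire by the same thread fails."""
    plain = threading.Lock()
    plain.acquire()
    got_it = plain.acquire(blocking=False)  # non-blocking, so we only observe the refusal
    plain.release()
    return got_it
```

This is the situation described for one alias calling another: nesting the same section in one thread is harmless as long as every entry is matched by an exit.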
Yes, users need to be careful when nesting in different locking orders. But this isn't anything I can worry about. Anyone playing with threads and synchronization always has to worry about this kind of stuff, no matter what programming language you are using.
I thought about making the name optional, but I think that leads to lazy programming. I want people using sections to *think* about what they are doing.
And finally, yes, a #section acts like a #wait in some cases. If the thread is suspended because it's waiting on the lock, then other threads can run. But this only happens if the thread gets suspended. If there isn't any lock, then execution proceeds as if the section was not there. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Wed Jul 11, 2007 3:58 pm |
Tarn: I think that's what we've already been talking about...you *can* have multiple threads running at the same time. In the above example, the triggers "test1" and "test2" are running at the same time, which is why sometimes you get "end of 1" before "end of 2", and sometimes you get "end of 2" before "end of 1". (Hmm, looks like Tarn deleted his post while I was replying ;)
In fact, I thought of something else last night...in all of the above examples, I have used "#SHOW whatever" to fire another trigger to get a second thread started. This is because triggers run in their own threads. But you don't want to use #SHOW just to start another thread...that's a kludge. You could use #FIRE or #RAISE to fire a trigger or raise an event, but what if you just wanted to run a particular Alias within a new thread?
So, I've extended the syntax of the #THREAD command a bit more. Here is the full syntax now:
Code: |
#THREAD ; displays a list of running threads
#THREAD Name ; sets the name of the current thread to "Name"
#THREAD {code} ; runs code in a new background thread
#THREAD Name {code} ; runs code in a new background thread, and gives the thread the name "Name" |
With this syntax, you can easily run any code, including an alias, as a new background thread. For example:
#THREAD {aliasname}
will run the specified Alias in a new thread. This spawns a new thread, but immediately returns control to the calling thread. Remember that in our previous examples, using #SHOW to fire a trigger caused the first thread to wait until the trigger was suspended? This was the case where "begin 2" was displayed before "back in 1" because trigger1 was waiting for trigger2 to be suspended. This was needed to maintain compatibility with existing scripts.
With the #THREAD command you can get around this. If you put "#SHOW thread2" within a #THREAD code block, then thread2 will start running independently of thread1.
This also lets you easily spawn a background task to perform some time-intensive calculations. Just keep in mind that as soon as you start using parallel threads like this, you need to worry about synchronization and need to be careful modifying global variables that might be modified by other scripts that are also running.
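For comparison, the "spawn and return immediately" pattern looks roughly like this in Python. This is an analogy only: `spawn` is a hypothetical helper, not anything CMUD exposes, and the summing job just stands in for a time-intensive calculation.

```python
import threading

def spawn(code, name=None):
    """Start code() in a new background thread and return to the caller at once."""
    worker_thread = threading.Thread(target=code, name=name, daemon=True)
    worker_thread.start()
    return worker_thread

results = []
worker = spawn(lambda: results.append(sum(range(1000))), name="crunch")

# The caller carries on here without waiting for the background work.

worker.join()  # only needed at the point where the result is actually required
```

Anything the background code shares with the caller, such as `results` here, is exactly the kind of global state that needs care once threads run in parallel.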
"Along with great power comes great responsibility!" |
|
|
|
Nick Gammon Adept
Joined: 08 Jan 2001 Posts: 255 Location: Australia
|
Posted: Wed Jul 11, 2007 10:49 pm |
I just want to point out that Lua supports co-routines, which are co-operative multi-tasking. Unlike native threads, a Lua function (which is running as a co-routine) can yield control back to its caller, giving you the ability to, effectively, pause a script in the middle.
There are advantages and disadvantages over pure threads - the advantage is that threads can actually be running "in the background" (assuming that is a good thing). The disadvantage with threads is that you need to worry about simultaneous access to variables, and then deadlocks if you start locking things. Real-life scripts (and not just examples) may soon need to explore the concept of the "deadly embrace" where two threads each lock a resource that the other one wants.
Co-routines avoid the deadly embrace problem, as they yield at known points. You can still make scripts that do things like display something and wait for a reply, with suitable use of co-routines and triggers which detect that a co-routine is running, and resume it at the appropriate point. |
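The "display something and wait for a reply" pattern can be sketched with a Python generator, which pauses at each `yield` much like a Lua co-routine. The names `script` and `drive` and the prompts are invented for illustration; this is not Lua and not any MUD client's API.

```python
def script():
    """A generator standing in for a co-routine: it pauses at each yield."""
    log = ["started"]
    reply = yield "What is your name?"   # pause until the caller sends a reply
    log.append(f"got {reply}")
    reply = yield "What is your quest?"  # pause again at a known point
    log.append(f"got {reply}")
    return log

def drive(replies):
    """Play the part of the trigger: resume the script once per incoming reply."""
    co = script()
    prompts = [next(co)]                 # run up to the first yield
    for reply in replies:
        try:
            prompts.append(co.send(reply))
        except StopIteration as done:    # the script has finished
            return prompts, done.value
    return prompts, None
```

Since the script only gives up control at its `yield` points, nothing else can touch its variables halfway through a step, which is why no locking is needed.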
|
|
|
Tarn GURU
Joined: 10 Oct 2000 Posts: 873 Location: USA
|
Posted: Thu Jul 12, 2007 4:06 am |
Zugg wrote: |
Tarn: I think that's what we've already been talking about...you *can* have multiple threads running at the same time. In the above example, the triggers "test1" and "test2" are running at the same time, which is why sometimes you get "end of 1" before "end of 2", and sometimes you get "end of 2" before "end of 1". (Hmm, looks like Tarn deleted his post while I was replying ;)
|
Yes, I saw the examples but wasn't quite making the connection. Nice new set of features. I expect the ability to pass control around to be more valuable to most scripters than the true multithreading.
One note on the unnamed blocks you wanted to avoid to force people to think about it: an unnamed block seems like a logical way to guarantee that nothing else is running (useful as a safe way for a package to interact with other packages).
You must have had some challenges along the way making all of this work on a stack-based VM.
-Tarn |
|
|
|
|
|
|
|
|