|
Arde Enchanter
Joined: 09 Sep 2007 Posts: 605
|
Posted: Tue Sep 16, 2008 9:04 pm
[2.36] Threading troubles - need help (#SECTION bug?) |
I want to ask for some help with my threading problems while Zugg is unable to respond on this having his hands full with the mapper. I know that it's all too complicated
I have a thread with internal queue (local string list) and an alias that adds elements to intermediate public queue. Once per specified amount of time the thread checks if there are any new elements in the public queue and if they are, moves them to internal queue.
ComLine -> Enqueue alias -> #SECTION{Public queue} <- THREAD ->(local queue)
Everything is cool while I'm calling enqueuing alias not so many times per second (about 3-4 commands per second), so basically my system works. But under some specific stress test conditions sometimes CMUD makes weird things. It can, for example, terminate running thread or, as I saw in my primary package, stop recognize existing aliases as if they were out of scope (ultimately this leads to a crash on CMUD close with messed up pointers). So I can't say that my script is working 100%, but I can't find any flaws in it as well. May be I just missed something?
At the end of this post you can see my threading script from which I've carved up all the unneccessary "meat" and to which I've add several tracing #PRINT statements. I've already check that with this shortened test script I see all the same symptoms I saw in its full version, so I guess that problem was isolated. Can you see any obvious errors in the threading algorithm, in the #SECTION code blocks? If you can't see any serious flaws, then I suspect that there is a problem with #SECTION command.
Usage:
run - start the "WorkingThread" thread
stop - stop the "WorkingThread" thread
enqueue X - this alias adds an integer with value X to the intermediate public queue (X must be >=0).
Stress test: make sure that you do not have "clear the command line" option enabled. Enter at the command line something like enqueue 5. Now, press your Enter key rapidly, 10 or more times per second - get your main thread busy! Do not hold the Enter key in pressed state, don't do #LOOP 15 {enqueue 5} - these cases seems work properly. MAY BE this all is system dependent things and on your more powerful computers you will not see anything - I really don't know. On my machine sooner or later I'll see that "WorkingThread" thread has stopped.
Analyzing logs
This I saw in the script debugger:
Code: |
10:46:17.453p | c untitled | [52] untitled Comline : start :
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.453p | n untitled | Exec Alias "Enqueue"
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.453p | d untitled | [52] untitled Comline : wait for section: : SectQueueAccess
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.468p | k untitled | [WorkingThread] Var "PendingInts" changed from "0|0|0|0|0|0" to ""
10:46:17.468p | k untitled | [52] Var "PendingInts" changed from "" to "0"
10:46:17.468p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0|0|0
10:46:17.468p | d untitled | [52] untitled Comline : terminated
10:46:17.468p | a untitled |====================================================== End of loop. Wait for 500 msec
10:46:17.562p | ---
10:46:17.562p | c untitled | [53] untitled Comline : start :
10:46:17.562p | n untitled | Exec Alias "Enqueue"
10:46:17.562p | k untitled | [53] Var "PendingInts" changed from "0" to "0|0"
10:46:17.562p | d untitled | [53] untitled Comline : stopped
10:46:17.656p | ---
10:46:17.656p | c untitled | [53] untitled Comline : start :
10:46:17.656p | n untitled | Exec Alias "Enqueue"
10:46:17.656p | k untitled | [53] Var "PendingInts" changed from "0|0" to "0|0|0"
10:46:17.656p | d untitled | [53] untitled Comline : stopped
10:46:17.968p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:17.968p | a untitled |PUBLIC queue: 0|0|0
10:46:17.984p | a untitled |Moving value from public queue to internal: 0
10:46:17.984p | a untitled |Moving value from public queue to internal: 0
10:46:17.984p | a untitled |Moving value from public queue to internal: 0
10:46:17.984p | k untitled | [WorkingThread] Var "PendingInts" changed from "0|0|0" to ""
10:46:17.984p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0|0|0|0|0
10:46:17.984p | a untitled |====================================================== End of loop. Wait for 500 msec
10:46:18.500p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:18.500p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0|0|0|0
10:46:18.500p | a untitled |====================================================== End of loop. Wait for 500 msec
10:46:19.000p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:19.015p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0|0|0
10:46:19.015p | a untitled |====================================================== End of loop. Wait for 500 msec
10:46:19.515p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:19.515p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0|0
10:46:19.515p | a untitled |====================================================== End of loop. Wait for 500 msec
10:46:20.031p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:20.031p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0
10:46:20.031p | a untitled |====================================================== End of loop. Wait for 500 msec
10:46:20.531p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:20.531p | a untitled |INTERNAL queue: 0|0|0|0|0|0
10:46:20.546p | a untitled |====================================================== End of loop. Wait for 500 msec
10:46:20.640p | ---
10:46:20.640p | c untitled | [53] untitled Comline : start :
10:46:20.640p | n untitled | Exec Alias "Enqueue"
10:46:20.640p | k untitled | [53] Var "PendingInts" changed from "" to "15"
10:46:20.640p | d untitled | [53] untitled Comline : stopped
10:46:20.718p | ---
10:46:20.718p | c untitled | [53] untitled Comline : start :
10:46:20.718p | n untitled | Exec Alias "Enqueue"
10:46:20.734p | k untitled | [53] Var "PendingInts" changed from "15" to "15|15"
10:46:20.734p | d untitled | [53] untitled Comline : stopped
10:46:20.843p | ---
10:46:20.843p | c untitled | [53] untitled Comline : start :
10:46:20.843p | n untitled | Exec Alias "Enqueue"
10:46:20.843p | k untitled | [53] Var "PendingInts" changed from "15|15" to "15|15|15"
10:46:20.843p | d untitled | [53] untitled Comline : stopped
10:46:20.953p | ---
10:46:20.953p | c untitled | [53] untitled Comline : start :
10:46:20.953p | n untitled | Exec Alias "Enqueue"
10:46:20.953p | k untitled | [53] Var "PendingInts" changed from "15|15|15" to "15|15|15|15"
10:46:20.953p | d untitled | [53] untitled Comline : stopped
10:46:21.062p | ---
10:46:21.062p | c untitled | [53] untitled Comline : start :
10:46:21.062p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:21.062p | a untitled |PUBLIC queue: 15|15|15|15
10:46:21.078p | d untitled | [53] untitled Comline : wait for section: : SectQueueAccess
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | k untitled | [WorkingThread] Var "PendingInts" changed from "15|15|15|15" to ""
10:46:21.093p | k untitled | [53] Var "PendingInts" changed from "" to "15"
10:46:21.093p | d untitled | [53] untitled Comline : terminated
10:46:21.093p | d untitled | [WorkingThread] untitled Comline : terminated |
As you can see, there is [WorkingThread] untitled Comline : terminated line at the end, although I did not tell CMUD terminate the thread.
The problem is here:
10:46:21.062p | c untitled | [53] untitled Comline : start :
10:46:21.062p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:21.062p | a untitled |PUBLIC queue: 15|15|15|15
10:46:21.078p | d untitled | [53] untitled Comline : wait for section: : SectQueueAccess
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | a untitled |Moving value from public queue to internal: 15
10:46:21.078p | k untitled | [WorkingThread] Var "PendingInts" changed from "15|15|15|15" to ""
10:46:21.093p | k untitled | [53] Var "PendingInts" changed from "" to "15"
10:46:21.093p | d untitled | [53] untitled Comline : terminated
10:46:21.093p | d untitled | [WorkingThread] untitled Comline : terminated
Although Zugg has told several months ago that it is normal when the main thread reusing another thread from the pool, I don't like that command line thread oftenly gets terminated as well:
10:46:17.453p | d untitled | [52] untitled Comline : wait for section: : SectQueueAccess
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.468p | k untitled | [WorkingThread] Var "PendingInts" changed from "0|0|0|0|0|0" to ""
10:46:17.468p | k untitled | [52] Var "PendingInts" changed from "" to "0"
10:46:17.468p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0|0|0
10:46:17.468p | d untitled | [52] untitled Comline : terminated
This how the log SHOULD looks like (please, ignore another integer values, only threads start/stopped/resumed flags matters):
Code: |
10:46:17.656p | c untitled | [53] untitled Comline : start :
10:46:17.656p | n untitled | Exec Alias "Enqueue"
10:46:17.656p | k untitled | [53] Var "PendingInts" changed from "0|0" to "0|0|0"
10:46:17.656p | d untitled | [53] untitled Comline : stopped
10:46:17.968p | e untitled | [WorkingThread] untitled Comline : resumed
10:46:17.968p | a untitled |PUBLIC queue: 0|0|0
|
You can see that here command line thread was properly stopped and the WorkingThread thread was properly resumed
Testing script:
Code: |
<class name="ThreadingTest" initenable="true" id="1">
<notes>ThreadingTest</notes>
<alias name="theThread" id="2">
<value><![CDATA[//CHECK IF THE WorkingThread THREAD IS ALREADY RUNNING. IF IT IS, THEN EXIT
#IF (%threadname("WorkingThread"))
{
#EXIT
}
// ==========================================================
// NEW THREAD - LAUNCHING
// Force CMUD to run this script in another thread and give it the name
#WAIT
#CALL %threadname(,"WorkingThread")
// ==========================================================
//LOCAL VARIABLES
$counter = 0 // Counter for #FORALL loops (we'll need index values later)
$position = -1 // Position of the first integer to remove from the queue
$tmpList = "" // Temporary string list for queue modification
$queueInts = "" // Queue of timings
$cachedInt = 0 // cached value for current integer
//THREAD - MAIN LOOP
#WHILE (@fStop == 0)
{
// First, check if we have any new integers in the public queue
#SECTION SectQueueAccess
{
#IF (%numitems(@PendingInts) > 0)
{
// We have. Add them to our internal queue
#PRINT %concat("PUBLIC queue: ", @PendingInts)
#FORALL @PendingInts {
#PRINT %concat("Moving value from public queue to internal: ", %i)
$queueInts = %additem(%i, $queueInts)
}
// Clear public queue
PendingInts = ""
}
}
//We have a queue of integers and will substract 1 from each integer each
//thread loop. When an integer becomes 0, it can be removed from the queue.
// ==========================================================
// CHECKING INT VALUES - IGNORE THIS BLOCK - IT WORKS NORMALLY
// Do we have any integers in our internal queue?
#IF (%numitems($queueInts) > 0)
{
// Yes, we do have some.
$tmpList = ""
$counter = 0
$position = -1
//Iterate throuh the list
#FORALL $queueInts {
//Remember current loop number for later use
$counter = $counter + 1
//Cache current EDT value for speed
$cachedInt = %int(%i)
#SWITCH ($cachedInt > 0)
{
//Int value is greater then 0. Decrease its value by 1
$tmpList = %additem(($cachedInt - 1), $tmpList)
}
($cachedInt == 0)
{
//Int value is equal to 0.
$tmpList = %additem($cachedInt, $tmpList)
#IF ($position == -1)
{
//Remember its position
$position = $counter
}
}
{#PRINT %concat("$cachedInt has unexpected value ",$cachedInt)}
}
//Move data from the temporary $tmpList queue to the $queueInts
$queueInts = $tmpList
$tmpList = ""
#PRINT %concat("INTERNAL queue: ", $queueInts)
//Check if there were some integers==0 in the previous FORALL loop?
#IF ($position <> -1)
{
// remove this entry from internal queue
#DELNITEM $queueInts $position
}
}
// ==========================================================
// DONE WITH THE QUEUE
#PRINT "====================================================== End of loop. Wait for 500 msec"
#WAIT 500
}
#PRINT "Stopping the thread."]]></value>
</alias>
<var name="fStop" type="Integer" usedef="true" id="3">
<value>0</value>
<default>0</default>
<notes>=1 if the thread must exit now</notes>
</var>
<var name="PendingInts" usedef="true" id="4">
<notes>Public queue of integers. Values are stored here until thread moves them to its internal queue</notes>
</var>
<alias name="Enqueue" id="5">
<value>//Check parameters
#IF ($value == "")
{
#PRINT "Parameter was not specified, set to 0"
$value = 0
}
// Is parameter a number?
#IF (%regex($value,"^\d+$"))
{
#IF ($value < 0)
{
$value = 0
#PRINT "Parameter must be >0, set to 0"
}
// Accessing public queue
#SECTION SectQueueAccess
{
PendingInts = %additem(%string($value), @PendingInts)
}
}
{
#PRINT "Parameter is not a number"
}</value>
<arglist>$value</arglist>
</alias>
<alias name="run" id="6">
<value>fStop=0
theThread</value>
</alias>
<alias name="stop" id="7">
<value>fStop=1</value>
</alias>
</class>
|
Edit: thread header renamed |
|
Last edited by Arde on Wed Sep 17, 2008 7:52 am; edited 2 times in total |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Tue Sep 16, 2008 11:25 pm |
I see a few problem lines in your debug log, and it looks to be a bug in CMud. This is definitely a very tight timing bug, but your procedure and script should make it replicatable. For ease of replication I would suggest adjusting the #WAIT 500 down to #WAIT 50. This looks to be problem spot
Code: |
10:46:21.062p | c untitled | [53] untitled Comline : start :
10:46:21.062p | e untitled | [WorkingThread] untitled Comline : resumed |
You will notice that the time stamps show the exact same moment. Looking at the time stamps for the WorkingThread's last going to sleep message it appears to have awakened on time. I am going to guess the very close timing of thread wake, and command line entry messages cause something to get crossed and makes the WorkingThread be considered part of the command line thread. Notice the indentation in those 2 debugger lines I pointed out. |
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
Arde Enchanter
Joined: 09 Sep 2007 Posts: 605
|
Posted: Wed Sep 17, 2008 7:56 am |
It appears that every time a thread waits when another thread will leave a #SECTION code block, one of these threads gets terminated at the end.
10:46:17.453p | c untitled | [52] untitled Comline : start :
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.453p | n untitled | Exec Alias "Enqueue"
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.453p | d untitled | [52] untitled Comline : wait for section: : SectQueueAccess
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.453p | a untitled |Moving value from public queue to internal: 0
10:46:17.468p | k untitled | [WorkingThread] Var "PendingInts" changed from "0|0|0|0|0|0" to ""
10:46:17.468p | k untitled | [52] Var "PendingInts" changed from "" to "0"
10:46:17.468p | a untitled |INTERNAL queue: 0|0|0|0|0|0|0|0|0
10:46:17.468p | d untitled | [52] untitled Comline : terminated
And while CMUD itself can run another thread for handling the command line instead of terminated one, the working thread can't be restarted automatically, making this error visible. In other words, when such things happens for command line thread, CMUD hides this error by launching another thread, but it can't do that for user threads. In my debug log posted above the main thread ID had get to 53 for only about 2 minutes!
So again, I think that there is a problem with the #SECTION command. I've asked Zugg about the #SECTION and he told me that it is just a wrapper for WinAPI calls. I'll hope that it is a bug in the wrapper, otherwise I'm not sure is this problem fixable at all.
Vijilante:
I have absolutely forgotten that I can reduce the number of milliseconds to wait. Thanks for the tip! |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Wed Sep 17, 2008 11:24 am |
Everything looks right in you most recent post.
I don't think the problem is with #SECTION directly. I don't really have time to explain it right now, but below is a test that isolates just the #SECTION command.
Code: |
#THREAD test {#SECTION a {#WHILE (%section(a)=1) {};#SHOW Releasing section a %section(a)}}
#SECTION a {#SHOW %section(a)} |
Code: |
0.0027 | c untitled | [5] untitled Comline : start :
0.0035 | c untitled | [test] : start : Compiled #THREAD command
0.0020 | d untitled | [5] untitled Comline : stopped
14.808 | ---
0.0013 | c untitled | [5] untitled Comline : start :
0.0005 | d untitled | [5] untitled Comline : wait for section: : a
0.0038 | a untitled |Releasing section a 2
0.0024 | d untitled | [test] : terminated
0.0026 | a untitled |1
0.0027 | d untitled | [5] untitled Comline : terminated |
|
|
_________________ The only good questions are the ones we have never answered before.
Search the Forums |
|
|
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|