|
Zugg |
Posted: Fri Jul 20, 2007 4:30 am
FASTER string lists and database variables in CMUD v2.0! |
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Mon Jul 23, 2007 10:21 pm |
OK, I have some database variable benchmarks for you today.
Here are the test scripts:
Code: |
#ALIAS makedb {
list=""
$last=%secs
#loop 1000 {#addkey list %i %i}
#show (%secs-$last)
}
#ALIAS searchdb {
$last=%secs
#loop 1000 {$a=%iskey(@list,500)}
#show (%secs-$last)
} |
(yes, I know I should probably be doing a lookup of a random key, but I didn't want to complicate the timing with the %random functions. I tested several key values, and didn't get very different results).
CMUD v1.34: makedb = 35125ms, searchdb = 25400 ms
CMUD v2.00: makedb = 28700ms, searchdb = 90ms
if I add the #BEGINUPDATE and #ENDUPDATE statements for the list, then the makedb time drops to 312 ms. So, the results for "makedb" are about the same as "makelist"...a factor of 100 improvement when using BeginUpdate/EndUpdate. Otherwise only a minor speed improvement for creating lists. But searching a database variable gives a better result than with string lists. We get more than a factor of 200 speed improvement instead of just a factor of 100. Also notice that retrieving the value from a database variable key is actually faster than searching a non-sorted string list. But if the string list is sorted, then %ismember is about the same as %iskey.
In CMUD v2.0, you cannot control the order in which keys are displayed or stored within the database variables. In other words, if you do this:
Code: |
#ADDKEY list name zugg
#ADDKEY list level 10 |
then if you display the variable with #VAR list, you cannot control which order the fields are displayed. In this example, the "name" key might be first, or the "level" key might be first. It depends upon the low-level hash function as to which is first in the list. And since the hash function might change in future versions, you should *never* assume that you know the order in which the keys will be displayed.
If you use the #SORT command on the database variable, the #SHOWDB command will always display the keys in alphabetical order. The %expanddb function will also sort the keys before creating the string result. Also, the keys will appear in sorted order when you edit the variable in the settings editor. By default, the #SORT flag is enabled for new database variables. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 24, 2007 1:29 am |
It would be nice if I could someday figure out what causes my scripting languages to be so slow (compared to Lua). In theory, CMUD and Lua are very similar: both use an internal compiled byte-code language. Scripts are compiled on-the-fly and then the byte-code is executed.
In the tests that I gave above, I tested my method using a trivial loop:
Code: |
#ALIAS testloop {
list=""
$last=%secs
#loop 1000 {$a=1}
#show (%secs-$last)
} |
This executes with a time of 0 or 15, which is the smallest time CMUD can currently measure (it's just using the Windows tick counter for the %secs function). But this tells me that the loop overhead is very small. Using the "$a=%iskey(@list,500)" script, given above, to test the %iskey speed *should* be testing my raw hash table functions. But I created a similar set of Lua scripts, just to compare:
Code: |
#ALIAS luamakelist {
list={}
last=zs.secs
for i=1,1000 {list["a"..i]=i}
print(last-zs.secs)
}
#ALIAS luasearchlist {
last=zs.secs
for i=1,1000 {a=list["a500"]}
print(last-zs.secs)
} |
Both of these run in 0-15 ms, even if I increase the loop from 1000 to 10,000. How is Lua doing this *so* fast? I know it's been heavily optimized over the years, but somehow I think it must be *really* smart and is somehow not actually doing the loop, or is only doing the loop once because it doesn't see any dependence upon the loop variable. Or maybe tables in Lua really are that fast.
In any case, even though I've gotten some huge speed benefits for zScript, it looks like Lua will still be much much faster.
Makes me think that there is some really basic problem with my byte-code execution that is causing it to be slow somehow. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 24, 2007 1:54 am |
Before making the same mistake that OldGuy2 did, I realized that I should really be comparing the Lua time with the optimized release (non-debug) times for CMUD. So, I compiled the new version using my release build script, and here are the *real* timing comparisons between the 1.34 release and the 2.00 release:
String Lists:
v1.34 makelist: 578 ms
v2.00 makelist: 656 ms
v2.00 makelist: 47 ms (with beginupdate/endupdate)
v1.34 searchlist: 359 ms
v2.00 searchlist: 265 ms
v2.00 searchlist: 47 ms (after using #SORT list command)
Database variables:
v1.34 makedb: 1015 ms
v2.00 makedb: 4093 ms
v2.00 makedb: 63 (with beginupdate/endupdate)
v1.34 searchdb: 360 ms
v2.00 searchdb: 31 ms
This is on an AMD Sempron 3100+ (1.80 Ghz) system with 2.0GB of RAM.
I'm a little confused about the speed *decrease* in the "makedb" alias on v2.0. I'm not sure why the basic routine without beginupdate/endupdate is so much slower. It must be the overhead in adding values to the hash table, so I'll need to look a bit closer at this.
Anyway, looks like it's a factor of 10 improvement, and not a factor of 100. |
|
|
|
forren Novice
Joined: 26 Apr 2007 Posts: 44
|
Posted: Tue Jul 24, 2007 12:26 pm |
Do the data records in Cmud 2.00 support lookups in relatively constant time?
|
|
|
|
Llwethen Novice
Joined: 08 Dec 2006 Posts: 37 Location: Lancaster,Oh
|
Posted: Tue Jul 24, 2007 12:49 pm |
Zugg wrote: |
This executes with a time of 0 or 15, which is the smallest time CMUD can currently measure (it's just using the Windows tick counter for the %secs function). But this tells me that the loop overhead is very small. Using the "$a=%iskey(@list,500)" script, given above, to test the %iskey speed *should* be testing my raw hash table functions. But I created a similar set of Lua scripts, just to compare:
|
Have you considered using nanoTime that is supported by the newer processors since the Pentium 3 I believe. I know Java and C# both have commands to access it. I've found it's much more enlightening when trying to figure out how time is being used. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Tue Jul 24, 2007 4:28 pm |
Yes, there are several things I can do to implement better timing...but it's never been a big enough issue to worry about in CMUD. When I care about real time profiling, I have a Delphi profiler tool that I use for serious stuff. The above stuff is good enough for basic comparisons.
Forren: I'm not sure what you mean...I've already said that CMUD 2.0 is using a hash table for the database variables, so doesn't that already mean that lookups are done in relatively constant time?
Also, before anyone gets confused, none of these changes have anything to do with the "Database Module" that is in CMUD. This module still doesn't use SQL and still hasn't been rewritten. This is a big job and won't be done for several months. The above changes only pertain to using database variables within zScript, such as using #ADDKEY, %db, %iskey, etc. I haven't even tested the existing database module yet to see if it still works with the above changes to the hash table. |
|
|
|
forren Novice
Joined: 26 Apr 2007 Posts: 44
|
Posted: Tue Jul 24, 2007 5:38 pm |
Zugg wrote: |
Forren: I'm not sure what you mean...I've already said that CMUD 2.0 is using a hash table for the database variables, so doesn't that already mean that lookups are done in relatively constant time?
|
Yeah, I didn't notice that before - this is great news, will be a big boost of speed to my system. At certain points I change every value to 1 in a 150-enty data record. |
|
|
|
oldguy2 Wizard
Joined: 17 Jun 2006 Posts: 1201
|
Posted: Wed Jul 25, 2007 2:17 am |
Hehe :-) Leave it to me to screw something up. It is odd that the times actually increase on makelist and makedb without the beginupdate and endupdate.
If I am just adding items to an afflictions stringlist one at a time through an alias, would it even be beneficial to use beginupdate and endupdate and based upon this slight decrease in speed will my system run slower since this is how I track afflictions?
Before anyone jumps on me, I never said I was an expert. I'm just trying to understand. Thanks. |
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Wed Jul 25, 2007 2:25 am |
It'll probably be beneficial to use #nosave since you probably won't log off with any afflictions, but it seems like #beginupdate and #endupdate are for when you're doing a lot of updates all at once. It might benefit if you're adding more than one affliction with the same trigger or alias, but other than that it'll probably cause problems because you might try curing an affliction without #endupdating (see this post). Perhaps you could do something where you always #endupdate the variable before you try pulling a value from it to cure anything and you might see a speed benefit there, but it's quite a complex problem. It'd probably need testing to see how it'd work.
Regardless, if you're worried about the speed decreases because you use database variables, you might find it beneficial to switch to string lists. It's probably worth testing all this stuff with your particular scripts. |
|
|
|
oldguy2 Wizard
Joined: 17 Jun 2006 Posts: 1201
|
Posted: Wed Jul 25, 2007 2:54 am |
Well I pretty much figured it was for adding a large amount of updates, but wondered about in a case like you pointed out where it adds a few at once. Some triggers do add about 3 afflictions at once, but I add them to the afflictions stringlist with a simple afflict alias such as afflict a, afflict b, afflict c and so on. Then to cure them I use #switch and %ismember to run through the list and cure. I am sure there is a better way to do it. However, I am not expert and this seemed the easiest way to me. So basically in my case I wouldn't really get any benefit out of it. By the way, I don't use a database just a simple stringlist. I was going to use databases but they seemed to be much slower originally.
With the way some classes of characters are in the MUD I play they afflict at tremendous speed so any speed benefit I can find I try to use it. Thanks for the response Fang. |
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Wed Jul 25, 2007 3:12 am |
It depends how much you want to limit your usage of your afflictions stringlist I guess - if the ONLY time you're accessing it is when you cure something, you might see a benefit from #beginupdating the variable as soon as you log in and then using #endupdate before your #switch command goes through to find something to cure (and then it'll #beginupdate again). But this'll cause problems if you wanted to use the afflictions variable for anything else - I used various different real-time displays of the afflictions I had, either using buttons or a window and the #clr command, or just printing a list into the main MUD window every time the list changed. Stuff like that would need to keep #endupdating so often I doubt there'd be much benefit.
|
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Wed Jul 25, 2007 4:53 pm |
I think that I've shown that database variables are always going to be faster than string lists in the 2.0 version. The only time a string list can equal the speed of the database variable is when the string list is sorted. The extra time involved in adding keys to a database variable compared to a string list isn't that much in practice, and remember that adding an item to a sorted string list will also take a bit longer (since it needs to get inserted into the correct spot in the list instead of just added to the end).
Think about a string list as a special case of a database variable. In fact, in v2.0, database variables are now displayed in human readable format as a string list:
Code: |
#ADDKEY dbvar Name Zugg
#ADDKEY dbvar Race Dwarf
#ADDKEY dbvar Level 10
#SHOW @dbvar
-> displays: Level=10|Name=Zugg|Race=Dwarf |
I've gotten rid of the non-printable characters that were used in database variables in zMUD. Anyway, a database variable is like a sorted string list, except that it is sorted by hash value.
Fang is correct that using #beginupdate/#endupdate around a single #additem or #addkey isn't going to help. It won't slow anything down, but it won't speed things up unless you are making multiple changes to the variable. However, if you are adding 3 items to the afflictions list, then beginupdate/endupdate will definitely help.
#NOSAVE would help some, but I've found that the biggest speed improvement comes from not recalculating the string value of the list each time it is changed. I'm still thinking about a better way to handle this. But it's a real problem with backwards compatibility. All previous scripts expect a string list to be a string value of the form "a|b|c...". And this is how it is saved to the database. So, each time you modify the list, CMUD needs to compute the new string value. So even though it's stored internally as a hash table, each time you modify the list, CMUD has to traverse the entire table to construct the string value, in case you use @list somewhere in your script and expect the normal string value.
Recomputing this string value each time the list changes is where the real slowdown occurs. The #beginupdate/#endupdate prevents this recalculating...the string value is only recalculated at the end by the #endupdate call. The #nosave just prevents the list from being saved to the database, but still requires the string value to be recomputed.
Anyway, I'm going to try and come up with a way so that it only needs to recompute the string value as needed, like when you actually reference the @list value, or when the background database needs to be saved. This should really help, and should make #nosave even more useful since a string list won't need to be recalculated during the background database save if the #nosave flag is enabled. But this kind of advanced caching gets complicated pretty fast, and could cause all sorts of weird side effects, so I'm trying to be careful. |
|
|
|
Seb Wizard
Joined: 14 Aug 2004 Posts: 1269
|
Posted: Thu Jul 26, 2007 7:50 pm |
Sounds really cool Zugg! I had noticed that searching string lists was pretty slow in zMUD, and so hadn't made such heavy use of it, but now it opens up more possibilities! #NOSAVE, #BEGINUPDATE/#ENDUPDATE are going to really useful for the intense processing stuff too. Keep it up!
|
|
|
|
Seb Wizard
Joined: 14 Aug 2004 Posts: 1269
|
Posted: Thu Jul 26, 2007 8:11 pm |
Zugg wrote: |
Anyway, I'm going to try and come up with a way so that it only needs to recompute the string value as needed, like when you actually reference the @list value, or when the background database needs to be saved. |
I was going to suggest this! |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Thu Jul 26, 2007 10:40 pm |
OK, I think I've got the new cache working. It only recomputes the string for a stringlist or database record when it's needed. This gives a very large speed boost to string lists and database variables. Here are some numbers:
makelist (before cache): 16500 ms
makelist (cache added): 96 ms
makelist (#NOSAVE): 94 ms
makelist (#BEGINUPDATE): 94 ms
searchlist: 250 ms
searchlist (#SORT): 47 ms
makedb (before cache): 28700 ms
makedb (cache added): 188 ms
makedb (#NOSAVE): 187 ms
makedb (#BEGINUPDATE): 187 ms
searchdb: 94 ms
So now that the cache has been added, it doesn't really matter if you use #NOSAVE or #BEGINUPDATE/#ENDUPDATE. You will always get the speed improvement anyway. In theory, the #NOSAVE and #BEGINUPDATE *should* be slightly faster because they are not causing any database update. But you can see that the database update isn't really a big issue, at least in the above test.
It's possible that this new cache might have bugs that could cause CMUD to think that your stringlist or database variable is empty. I've implemented this cache at a very low level, so the change *should* be transparent to the rest of CMUD. But because it's a low-level change, there might be some weird side effect. So far it's looking pretty good though. Just something to keep an eye out for in the beta testing period. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Jul 27, 2007 5:06 pm |
OK, now that the internal cache is working better, I'm not sure there is any need for #BEGINUPDATE/#ENDUPDATE anymore. It no longer makes any difference with the string lists and database records. So I think I'm going to remove these commands from the 2.0 version that I'm working on.
I'll keep the #NOSAVE option. Turns out that #NOSAVE doesn't have much effect on "makelist" and "makedb" because of the internal cache...the new value of the string list or database record isn't saved until the background save thread needs it. But I found another script that shows the difference with #NOSAVE on normal variables. Here is the test script:
Code: |
#ALIAS testnosave {
a=0
$last=%secs
#loop 1000 {a=@a+1}
#show (%secs-$last)
} |
When I run "testnosave" normally, I get a time of 150ms. If I then do "#NOSAVE a" and run "testnosave" again, then I get a time of 31ms.
I should mention that #NOSAVE might not actually work exactly how you would expect. CMUD *still* creates the variable and saves it to the *.PKG database file. What #NOSAVE does is that it prevents the database from being updated when the variable changes it's value. So the variable still exists in the package, but it just won't have the correct saved value. This might be a bit confusing to some people since they might think that "#NOSAVE" means that it won't save it to the database at all.
Anyway, it's still a useful option. And as I've said, setting the "Use Default" option for a variable turns on the "nosave" option for the variable. So between the "Use Default" option and the "#NOSAVE" command, I think I've got it covered. |
|
|
|
Tech GURU
Joined: 18 Oct 2000 Posts: 2733 Location: Atlanta, USA
|
Posted: Fri Jul 27, 2007 6:17 pm |
With the internal cache working better I agree it's a good thing to remove the BEGINUPDATE/ENDUPDATE.
I'm not so sure about the #NOSAVE though. I guess someone else would be able to make use of it. |
|
_________________ Asati di tempari! |
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Mon Jul 30, 2007 3:11 am |
I was just thinking about the new #sort command. Does this use the new cache as well, only sorting when you access the variable rather than when you update it?
|
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Mon Jul 30, 2007 5:32 pm |
Not really. When you use the #SORT command, the "sorted" property of the internal string list is set. When a string list is marked as sorted, adding an item to it causes it to get inserted into the proper location to maintain the sort. So adding items to a sorted string list is slower. When a string list is marked as sorted, it also uses a binary search method for locating items in the list (which is why %ismember is faster).
If you are going to add a lot of items to a string list, it's best to turn off sorting, then add all of the items, then turn sorting back on:
Code: |
#SORT list 0
...add items to list here..
#SORT list |
The internal cache is only used when converting the string list into a string value. Sorting doesn't have any effect on this, since converting to the string value just loops through the internal linked list starting at the first element. Doesn't matter if the linked list is sorted or not. |
|
|
|
Malaphus Rousseau Newbie
Joined: 25 Jun 2007 Posts: 6
|
Posted: Tue Jul 31, 2007 11:40 pm |
What's the ETA on 2.0 anyway? *hides*
|
|
|
|
Fang Xianfu GURU
Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Tue Jul 31, 2007 11:45 pm |
Answered this back on the first page :)
Fang Xianfu wrote: |
Zugg's last estimate was either before or after the Sony Fan Faire. That's August 2nd-5th, so it'll be around then :) |
Last estimate said definitely afterwards I think, to avoid leaving us with a buggy beta release and no Zugg to complain at for three days :P
It helps to read the blog, it's normally got this kind of info on it. |
|
|
|
|
|