Register to post in forums, or Log in to your existing account
 

Play RetroMUD
Post new topic  Reply to topic     Home » Forums » CMUD General Discussion
Delgar
Beginner


Joined: 22 Dec 2008
Posts: 21

PostPosted: Sat Dec 04, 2010 7:43 pm   

v3.32 Memory Leak [Confirmed Fix 3.33]
 
I couldn't find a specific topic on the General forums for this, so I am essentially moving this over from the Beta forums with some test code that actually reproduces the issue.

Background that I've see:

    1. Prior to the Beta that added the Json code, I was typically reaching a max memory usage of around 125MB after well more than a 24 hours of uptime.
    2. Not sure exactly when, but around the time of the Json code, I started reaching 250-300MB after 15-20 hours of uptime. It was annoying, but didn't seem to cause much of an issue. Others reported that they were seeing over 900MB usages.
    3. Around 2 weeks ago, I added my first GMCP trigger to collect character usage info for my status buttons rather than my MUD prompt trigger. I immediately started seeing usages of 300MB after only a few hours of usage and would easily hit 900MB in a typical weekend day of 12 hours online (though after about 600MB usage, random lockups became more frequent, so I restart when I reach about 600MB).
    4. I recently got access to the AardWolf test port to try to create a simple piece of code to replicate this issue and my results and code are below (this issue is very hard to see when not actually getting data from a MUD via GMCP).

Testing and Results (using Win7 "Working Set" task manager numbers when the CMUD process is idle; exited and restarted CMUD between each run, edited the scripts after recording the finished memory values for the next test before exiting CMUD for the current test):

    1. Standalone test not connected to a MUD:

      a. MainStuff/Triggers/GMCP class disabled.
      b. MainStuff/Triggers/Prompt trigger enabled.
      c. All other Classes and aliases enabled.
      d. alias command: standalone 221777 5
      Runtime 1.92 hours. Memory use started at 49,680K and finished at 90,996K which is ~21,518K/hr increase.


    2. Connected to test port for AardWolf killing a 5 mob room such that it takes a couple rounds and character took some damage.

      a. MainStuff/Triggers/<all classes> disabled.
      b. alias command: loopkill 500
      Runtime 1.72 hours. Memory use started at 41,360K and finished at 55,104K which is ~8,006K/hr increase.


    3. Same as 2.

      a. alias command: loopkill 1500
      Runtime 5.05 hours. Memory use started at 41,716K and finished at 65,568K which is ~4,723K/hr increase.
      This indicates that there is a static increase probably for the scroll-back buffer size, but still there is a significant increas in memory usage.


    4. Same as 2.

      a. MainStuff/Triggers/GMCP class enabled.
      b. MainStuff/Triggers/Prompt trigger disabled.
      c. All other Classes and aliases enabled.
      d. alias command: loopkill 500
      Runtime 1.72 hours. Memory use started at 42,860K and finished at 164,800K which is ~71,031K/hr increase.
      10 hours would give about 710MB increase in memory usage which is about what I'm currently seeing with all of my scripts active.



I hope this helps track down what data structures aren't being freed correctly for garbage collection.
OR
Someone point out what I am doing wrong that is causing the memory consumption (I believe all I am doing is copying data from one area into pre-existing variables). I did try access the GMCP fields directly from the buttons, but not all of them update when the GMCP data changes (HP updates "always" and Mana rarely/never changes with direct button access to GMCP structure which is not in the code below).

Let me know if there are questions about this testing and code.

Thanks,
-- Delgar

Code:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<cmud>
  <class name="MainStuff">
    <class name="Data">
      <var name="hp_cur">
        <value>1856</value>
        <json>1856</json>
      </var>
      <var name="mana_cur">
        <value>1060</value>
        <json>1060</json>
      </var>
      <var name="mv_cur">
        <value>1366</value>
        <json>1366</json>
      </var>
      <var name="hp_max">
        <value>1856</value>
        <json>1856</json>
      </var>
      <var name="mana_max">
        <value>1060</value>
        <json>1060</json>
      </var>
      <var name="mv_max">
        <value>1366</value>
        <json>1366</json>
      </var>
      <var name="tnl_cur">
        <value>3500</value>
        <json>3500</json>
      </var>
      <var name="tnl_max">
        <value>7000</value>
        <json>7000</json>
      </var>
      <var name="align">
        <value>22</value>
        <json>22</json>
      </var>
      <var name="hp_percent">100</var>
    </class>
    <class name="Buttons">
      <button type="Gauge" autosize="false" width="77" height="16" autopos="false" left="178" top="17" toolbar="2" margin="0" color="blue" textcolor="#80FFFF" gaugelowcol="red" gaugebackcol="#0000A0" priority="42">
        <caption>Align: @MainStuff/Data/align</caption>
        <value>align</value>
        <expr>@MainStuff/Data/align + 2500</expr>
        <gaugemax>5000</gaugemax>
        <gaugelow>2700</gaugelow>
      </button>
      <button name="HP" type="Gauge" autosize="false" width="177" height="16" autopos="false" toolbar="2" color="#0080C0" textcolor="white" gaugelowcol="#004080" gaugebackcol="red" priority="14">
        <caption>@MainStuff/Data/hp_cur  /  @MainStuff/Data/hp_max  HP (%format("&amp;5.1n", @MainStuff/Data/hp_percent)%)</caption>
        <expr>@MainStuff/Data/hp_cur</expr>
        <gaugemax>@MainStuff/Data/hp_max</gaugemax>
        <gaugelow>@MainStuff/Data/hp_max/6</gaugelow>
      </button>
      <button name="Mana" type="Gauge" autosize="false" width="177" height="16" autopos="false" top="17" toolbar="2" color="#0080C0" textcolor="white" gaugelowcol="#004080" gaugebackcol="red" priority="10">
        <caption>@MainStuff/Data/mana_cur  /  @MainStuff/Data/mana_max  Mana</caption>
        <expr>@MainStuff/Data/mana_cur</expr>
        <gaugemax>@MainStuff/Data/mana_max</gaugemax>
        <gaugelow>@MainStuff/Data/mana_max/6</gaugelow>
      </button>
      <button name="Move" type="Gauge" autosize="false" width="142" height="16" autopos="false" left="255" top="17" toolbar="2" color="#0080C0" textcolor="white" gaugelowcol="#800040" gaugebackcol="red" priority="12">
        <caption>@MainStuff/Data/mv_cur  /  @MainStuff/Data/mv_max  MV</caption>
        <expr>@MainStuff/Data/mv_cur</expr>
        <gaugemax>@MainStuff/Data/mv_max</gaugemax>
        <gaugelow>77</gaugelow>
      </button>
      <button name="TNL" type="Gauge" autosize="false" width="142" height="16" autopos="false" left="300" toolbar="2" color="#0080C0" textcolor="white" gaugelowcol="#800040" gaugebackcol="red" priority="52">
        <caption>@MainStuff/Data/tnl_cur  /  @MainStuff/Data/tnl_max  TNL</caption>
        <expr>@MainStuff/Data/tnl_cur</expr>
        <gaugemax>@MainStuff/Data/tnl_max</gaugemax>
        <gaugelow>77</gaugelow>
      </button>
    </class>
    <class name="Triggers">
      <class name="GMCP">
        <trigger name="GMCP Char Copy Values" type="GMCP" priority="1" newline="false" prompt="true">
          <pattern>char*</pattern>
          <value>// Copy the GMCP data into local variables.
MainStuff/Data/hp_cur     = %gmcp.char.vitals.hp
MainStuff/Data/mana_cur   = %gmcp.char.vitals.mana
MainStuff/Data/mv_cur     = %gmcp.char.vitals.moves

MainStuff/Data/hp_max     = %gmcp.char.maxstats.maxhp
MainStuff/Data/mana_max   = %gmcp.char.maxstats.maxmana
MainStuff/Data/mv_max     = %gmcp.char.maxstats.maxmoves

MainStuff/Data/tnl_cur    = %gmcp.char.status.tnl
MainStuff/Data/tnl_max    = %gmcp.char.base.perlevel

MainStuff/Data/align      = %gmcp.char.status.align

// Calculate the current percentage of HP remaining for self.
MainStuff/Data/hp_percent = 100.0 * (%float(%gmcp.char.vitals.hp)/%float(%gmcp.char.maxstats.maxhp))</value>
        </trigger>
      </class>
      <trigger name="Prompt" priority="770" prompt="true" enabled="false">
        <pattern>^~[{Fighting: |""}(%d)/(%d)hp (%d)/(%d)mn (%d)/(%d)mv {(%d)qt |""}(%d)tnl*~]{""| }~></pattern>
        <value>// Copy the GMCP data into local variables.
MainStuff/Data/hp_cur     = %1
MainStuff/Data/mana_cur   = %3
MainStuff/Data/mv_cur     = %5

MainStuff/Data/hp_max     = %2
MainStuff/Data/mana_max   = %4
MainStuff/Data/mv_max     = %6

MainStuff/Data/tnl_cur    = %8
// MainStuff/Data/tnl_max    = %gmcp.char.base.perlevel

// MainStuff/Data/align      = %gmcp.char.status.align

// Calculate the current percentage of HP remaining for self.
MainStuff/Data/hp_percent = 100.0 * (%float(%1)/%float(%2))</value>
      </trigger>
    </class>
    <alias name="chl">
      <value>#SEND {c 54 %-1} // Heal</value>
    </alias>
    <alias name="cfb">
      <value>#SEND {c 84} // Fire breath</value>
    </alias>
    <alias name="ceq">
      <value>#SEND {c 39} // Earthquake</value>
    </alias>
    <var name="stopkill">0</var>
    <alias name="loopkill">
      <value>$loop_cnt = 1
$max_loop = 2100

#IF ($loops != %null) {
  $max_loop = $loops
}

#ECHO {START: $max_loop loops: %time()}
#WHILE (($loop_cnt <= $max_loop) &amp;&amp; (@MainStuff/stopkill != 1)) {
  #ECHO {START LOOP: $loop_cnt}
  #SEND {reset}

  #SEND {restore}
  ceq
  #WAIT 1500
  #WHILE (%gmcp.char.status.state == 8) { #WAIT 1500 }
  #SEND {drop all}
  #SEND {junk all}
  #WAIT 777
  #SHOW {""}
  #ECHO {END LOOP: $loop_cnt}
  #SHOW {""}
  #SHOW {""}
  $loop_cnt = $loop_cnt + 1
}

#ECHO {DONE: $max_loop loops: %time()}
</value>
      <arglist>$loops</arglist>
    </alias>
    <alias name="cwr">
      <value>#SEND {c 82} // Word of recall</value>
    </alias>
    <alias name="standalone">
      <value><![CDATA[$loop_cnt = 1
$max_loop = 2100
$delay    = 500

#IF ($loops != %null) {
  $max_loop = $loops
}

#IF ($time != %null) {
  $delay = $time
}

#ECHO {START: $max_loop loops: %time()}
#WHILE (($loop_cnt <= $max_loop) && (@MainStuff/stopkill != 1)) {
  #ECHO {%concat("[",($loop_cnt*7),"/",($max_loop*7),"hp ",($loop_cnt*2),
                 "/",($max_loop*2),"mn ",$loop_cnt,"/",($max_loop),"mv ",
                 $loop_cnt,"qt ",(($loop_cnt-1)*3),"tnl] >")}
  #WAIT $delay
  $loop_cnt = $loop_cnt + 1
}

#ECHO {DONE: $max_loop loops: %time()}]]></value>
      <arglist>$loops,$time</arglist>
    </alias>
  </class>
</cmud>


Last edited by Delgar on Thu Dec 16, 2010 12:38 am; edited 1 time in total
Reply with quote
Rahab
Wizard


Joined: 22 Mar 2007
Posts: 2320

PostPosted: Mon Dec 06, 2010 1:27 pm   
 
How big is your scrollback buffer?
Reply with quote
Delgar
Beginner


Joined: 22 Dec 2008
Posts: 21

PostPosted: Mon Dec 06, 2010 4:02 pm   
 
Scrollback set to 4242 lines.

If this were a scrollback issue, all of the tests would show very similar increases.
Only when triggers are actually enabled, is this a big issue and it only became a huge issue to me when
I added a GMCP trigger.
If there is something that causes triggers (esp GMCP ones) to cause the scrollback buffer to take up memory
and not release it, then this is an issue that needs to be fixed.

Actually, if there is something that causes the scrollback buffer to take up memory and then not release it,
this also needs to be fixed.

One more test

    5. Same as 4 (GMCP enabled, prompt disabled).

      a. MainStuff/Buttons disabled.
      b. So, now there is nothing referencing the variables except the GMCP trigger.
      Runtime 1.67 hours. Memory use started at 42,080K and finished at 151,420K which is ~65,473K/hr increase.
      Not a big difference with or without buttons enabled.

_________________
Thanks,
-- Delgar
Reply with quote
Rahab
Wizard


Joined: 22 Mar 2007
Posts: 2320

PostPosted: Mon Dec 06, 2010 4:28 pm   
 
I wasn't suggesting that the scrollback buffer wasn't releasing memory. I just wanted to check whether the scrollback buffer was large enough to cause the problem. If triggers were turned off, you wouldn't be generating all those lines from the mud, which is what fills up the scrollback buffer. But 4242 lines is probably not the problem.

Are these scripts you posted the only scripts on the session you tested this on? You didn't explicitly say.
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Mon Dec 06, 2010 5:39 pm   
 
Now that you have a way to reproduce the problem, can you start playing with your GMCP triggers to see if you can track down the problem. Some things to try:

1) If you remove the buttons completely, does the memory still increase the same? Don't just disable the button...remove it completely.

2) Start removing some of the sections in the GMCP trigger. For example, how does the memory usage change if you remove the section for assigning the %gmcp.char.vitals data vs removing the %gmcp.char.maxstats data. For example, I'm wondering if there is a problem in CMUD accessing the %gmcp variables for a message that didn't cause the trigger. In other words, if the GMCP message is char.vitals but you access char.status.align, is that causing a memory leak?

3) Does the memory usage scale with the number of %gmcp variables referenced in your GMCP trigger?

Any way you can start simplifying your script to try and nail down the exact problem would be a huge help.
Reply with quote
Delgar
Beginner


Joined: 22 Dec 2008
Posts: 21

PostPosted: Mon Dec 06, 2010 6:38 pm   
 
Rahab,
I was just giving additional information and where my thought process was with regard to scrollback.
Yes, I do clean runs when trying to track down bugs, so what I posted is the only thing in CMUD other than the login triggers (user name and password).


Zugg,
I probably should have put this comment in my original that I did do a test where the GMCP trigger only copied the hp vitals and maxstats with no real change from the worst case test 4.
I will do some additional testing as requested, but I may not be able to post the results until Thursday (long runs and I'm going to be doing some traveling tomorrow and Wednesday with only sporadic Internet access). I may post interim results if anything interesting pops up.
Order of additional tests:

    1. Remove buttons completely and rerun test 4 (buttons may not get added back in for tests below if removing them makes no/little difference).
    2. Only collect vitals and rerun test 4 (removing all other data items).
    3. Only collect maxstats and rerun test 4 (removing all other data items).
    4. Only collect vitals.hp and rerun test 4 (removing all other data items). The maxstats are not changing at all, so this should cover the scenario of accessing a gmcp message that didn't cause the trigger causing a leak since vitals.hp will be one of the main triggers causing the gmcp message.


Any other specific combinations you would want?
_________________
Thanks,
-- Delgar
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Tue Dec 07, 2010 1:28 am   
 
Just keep removing stuff from the GMCP trigger until the memory usage goes away. If you can track it down to a single line in the script, that would be great. I'm not doing the December version until next week, so Thursday is fine.
Reply with quote
Delgar
Beginner


Joined: 22 Dec 2008
Posts: 21

PostPosted: Tue Dec 07, 2010 5:19 am   
 
Ok, I was actually able to get the testing done enough and the results are interesting (all modifications to script, including deletion of data fields done at the end of previous run after collecting memory information, so that the editor memory use won't impact readings, this was done in all testing so far).

Rates shown are the K per hour increase from start memory to the finish memory readings.
1. Full data copy from GMCP trigger, but buttons completely removed: 56,785K/hr.
2. Copy only vitals hp/mana/moves, and deleted the data fields prior to run: 21,146K/hr
3. Copy only max hp/mana/moves, and deleted the data fields prior to run: 18,029K/hr
4. Copy only vitals hp, and deleted the data fields prior to run: 4,393K/hr
5. Copy nothing, and deleted the data fields prior to run: essentially 0K/hr

You will notice that if you add #2 and #3 together and take into consideration that there were some fields that were being copied in #1 that didn't get copied in either #2 or #3 (i.e. tnl, align, ...), they add up to be approximately the increase rate of #1.
#4 is approximately one third of #2 and #3 (order of magnitude wise, I'm sure if I ran the tests longer they would be a lot closer).
Since having the GMCP trigger still firing, but not copying anything #5, does not cause any memory issue; it appears that any data fields being copied in the GMCP trigger are the culprits in not always being garbage collected when no longer being used.

When I get to a point where I have time and an Internet connection again (Thursday), I can do additional testing, but the only thing I can think of is to copy non-%gmcp fields from inside the GMCP trigger; other than that I don't know what more I can do since it appears that the copying of any data inside the GMCP trigger is what is causing the leak.
_________________
Thanks,
-- Delgar
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Tue Dec 07, 2010 6:09 pm   
 
I need to come up with a good way to test this offline. Are any of the Gurus interested in putting together a CMUD script that uses #SHOW to send GMCP data to CMUD in a loop of some kind so we can try to ramp this up. A 4MB/hour leak is awfully slow for debugging. We need a way to make this leak more obvious and easier to test.

Actually, here is another test for Delgar to try. In your test #4, what happens if you put your data assignment within your GMCP trigger into a loop. Something like this:
Code:
#LOOP 100 {
  MainStuff/Data/hp_cur     = %gmcp.char.vitals.hp
  }

Does that cause your memory usage to increase by a factor of 100?
Reply with quote
Delgar
Beginner


Joined: 22 Dec 2008
Posts: 21

PostPosted: Wed Dec 08, 2010 1:27 am   
 
Good idea (tried this from really slow hotel internet).
This actually found a much higher rate of memory leak.
I used 1000 instead of 100 and with only "loopkill 50" got a memory loss rate of 3,729,150K/hr (yes, 3.7GB/hr).
Usage went from 56,588K to 553,808K in 7 minutes (good thing I didn't try to run the full 500 loops, would have crashed my system if CMUD didn't crash or lock up first).

The test only ran for 23 of the 50 loops before the test port crashed (not due to this), but I figured that was long enough to get us the information needed.
_________________
Thanks,
-- Delgar
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Wed Dec 08, 2010 10:27 pm   
 
Wow, ok then...that's good info. I'll try putting a #LOOP into my own GMCP trigger and see what happens. It's possible the leak was just too small for me to notice in my normal testing before.
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Wed Dec 08, 2010 10:46 pm   
 
Woot!! That did it. Adding a #LOOP 1000 in my own GMCP trigger also shows the memory leak. So I should be able to track this down and fix it in the next update. Thanks for your patience and help on this.
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Thu Dec 09, 2010 6:29 pm   
 
Well, this is interesting. I played around with this memory leak a bit more and here is what I learned:

1) Doesn't matter if it's in a GMCP trigger or not. Just doing this:
Code:
#loop 1000 {RoomNum=%gmcp.Room.Info.num}

on the command line also shows the memory leak.

2) Doesn't matter if it's a local variable or not. Doing this:
Code:
#temp="";#loop 1 {$temp=%gmcp.Room.Info.num}

also shows the leak.

3) Doesn't even need to be the %gmcp variable! Doing this:
Code:
db.room.info.num="Test"
#loop 1000 {RoomNum=@db.room.info.num}

also shows the leak.

4) The leak gets worse the more "levels" deep you go with the database variable. Doing this:
Code:
db.a.b.c.d.e = "Test"
#loop 1000 {RoomNum=@db.a.b.c.d.e}

shows the memory leak is even worse than in (3)

Item (4) is probably why I never saw this leak in testing. In testing I rarely used more than one level deep database variables. This definitely seems to indicate a problem in the low-level JSON library. I'll be looking into this source code later today to see what I can find and hopefully can fix it. But now I've got a really simple way to reproduce the problem just from the command line, and that always makes debugging stuff like this a lot easier.

This could also explain why some other people have seen memory leaks they could not reproduce all the time since this leak can definitely occur outside of GMCP code. Somebody (sorry, I forget who) posted in the past that they thought GMCP just made the leak a bit more visible and they were correct. Since %gmcp variables tend to be several levels deep, they are certainly more prone to this leak that normal database variables.
Reply with quote
charneus
Wizard


Joined: 19 Jun 2005
Posts: 1876
Location: California

PostPosted: Thu Dec 09, 2010 7:11 pm   
 
That someone would have been me. I knew I wasn't crazy! I'm just glad that someone was finally able to figure out a way to reproduce it. Thanks!
Reply with quote
Shezmu
Wanderer


Joined: 23 Apr 2010
Posts: 53

PostPosted: Thu Dec 09, 2010 8:29 pm   
 
Can I suggest that Delgar gets a magic cookie or something? Maybe change his profile rank from Beginner to "Memory Leak stud." Well done, Delgar.
Reply with quote
Zugg
MASTER


Joined: 25 Sep 2000
Posts: 23379
Location: Colorado, USA

PostPosted: Thu Dec 09, 2010 9:28 pm   
 
Cool, I found the leak. It was actually a very common leak, which makes it surprising this was so hard to track down. The routine that handles the "dot" operation with a database variable (like @db.a) had a small leak in it. This leak was exponential based upon the depth of the value. So @db.a only leaked a couple of bytes. But @db.a.b leaked 2-times that amount, and @db.a.b leaked 4-times the amount, etc.

This leak applied to *any* "dot access" of a database variable. Doing something in a loop with a great depth was the key to finding it. Even doing:
Code:
#loop 1000 {a=@db.a}

really wasn't enough to track it down.

The other thing I learned when tracking this down was that the "FastMM4" library that I thought I was always using when debugging to track memory leaks had somehow gotten disabled at some point. I know it is not compatible with the madExcept library that is used to trap and send crash dumps, so I must have just disabled it and forgotten about it.

So I *thought* Delphi was always looking for memory leaks when I was debugging. But it wasn't. When I turned FastMM4 back on (and disabled madExcept), I actually found a few more memory leaks. Fortunately these were just leaks in some initialization routines (loading global default preferences), so it wasn't a leak that would cause CMUD to use more and more memory. It just meant that CMUD wasn't properly deallocating a small amount of variables when closing. This isn't a big deal in Windows since Windows will go ahead and free all memory when a program exits anyway. But I still spent the time to clean it up.

Once FastMM4 was enabled, tracking down the database "dot" leak was really trivial since the #loop caused a huge number of leaks. It quickly pointed me to the problem and it was a one-line fix.

In the future I'll try to remember to re-enable FastMM4 now and then to check for problems. For today I went ahead and logged into a MUD and ran around killing stuff for a few minutes to make sure there weren't any other leaks. I also played in the settings editor and a few other windows to check them too. No other leaks were found, so I think we are good now.

Look for the December CMUD update sometime next week.
Reply with quote
Rahab
Wizard


Joined: 22 Mar 2007
Posts: 2320

PostPosted: Fri Dec 10, 2010 3:37 pm   
 
Woot indeed! And big thanks to Delgar for diligently finding a method for reproducing the problem!
Reply with quote
Display posts from previous:   
Post new topic   Reply to topic     Home » Forums » CMUD General Discussion All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

© 2009 Zugg Software. Hosted by Wolfpaw.net