|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Aug 04, 2006 4:59 am
Another REALLY BAD day |
Man, this was one of the worst programming days in months!
I was working on the UI for the package tabs in the settings editor. The settings editor already supported tabs so this was going to be easy. It already had a database of "views" (each tab is a "view" with its own filter settings). All I needed to do was store this internal database into the SQL file so that any tabs that you define would be saved across sessions, and then set it up to automatically add views for each package.
This was going really well. Then, all hell broke loose.
I really don't understand how this happens, but all of a sudden, CMUD was crashing in completely different places. It was crashing in code that I hadn't even changed today.
I closed Delphi and rebooted the computer since sometimes Delphi can get itself corrupted in memory. That didn't help.
The crashes were mostly coming from the kbmMemTable in-memory database components. It looked like some kind of memory problem. Yeah, memory errors...just what I *LOVE* to work on.
So, I started with the usual culprits. I disabled the background thread that saves settings. That didn't help. I started commenting out the code I had added to save the Views table to the database. I got it back to where it was when I started the day and it STILL didn't work!
Even loading CMUD and immediately exiting without loading any files or running any scripts was crashing, and it was WORKING FINE YESTERDAY!
First I thought it was related to my memory leak "fix" in kbmMemTable. When I commented out that fix (which had been working fine for months), the problem seemed to go away. But only for a few minutes, then it was back. So it wasn't that.
It appeared that one of the internal database buffers was getting pointed to the same memory location as one of the data records. This stuff is a PAIN to debug because of how the pointers work. Even conditional breakpoints don't help much.
I remembered that CMUD was using the FastMM memory manager, so I went and looked for an update and found a newer version than what I was using. So I installed the update and then it was even worse! Now it was crashing in new places, and part of the kbmMemTable code was getting into an infinite loop because its linked list of records seemed to have a loop in it.
I removed the FastMM stuff and it went back to the previous crash, although it was still intermittent and flaky. At least with FastMM I was getting a consistent crash location.
So I put FastMM back in and tried to figure out what was causing the linked list to get a loop in it. Now I was debugging down at the low level of the memory manager, which is horrible since it gets called all the time and tracking down a single allocation/deallocation problem takes forever.
I found that with FastMM I was getting a consistent crash and the memory pointer had a consistent value. Without FastMM, the Delphi memory manager was returning different allocation pointers every time I ran the program (even without recompiling).
So, with the consistent pointer value, I was able to set up a conditional breakpoint in the memory manager to stop whenever that particular memory location was accessed.
The internal database has several cursor sets pointing at the same physical data. Also, the internal database is "versioned" which means that changes to records are stored in a linked list so that it always knows the original value. This is for Undo functions, as well as for the background process that needs to create the proper SQL statement (insert vs update vs delete) to update the SQL database.
What I noticed is that the memory location pointed to one of the versions of a particular database record. A routine called SetUnModified loops through the internal records and throws away changes. This routine is called when the database is first loaded into memory to indicate that the in-memory records are up to date (otherwise it thinks they are all new records and issues SQL insert commands instead of SQL updates).
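To make the versioning idea concrete, here is a minimal Python sketch of a record that keeps its previous versions in a linked list, plus a set_unmodified step that throws that history away. This is only a model of the behavior described above, not the kbmMemTable or CMUD code (which is Delphi); the names RecordVersion, Record, original and is_modified are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordVersion:
    value: str
    prev: Optional["RecordVersion"] = None  # next older version, None for the original


class Record:
    """A record whose edits are chained so the original value stays reachable."""

    def __init__(self, value: str):
        self.current = RecordVersion(value)

    def update(self, value: str) -> None:
        # Each change pushes a new version in front of the older ones.
        self.current = RecordVersion(value, prev=self.current)

    def original(self) -> str:
        # Walk to the end of the chain to find the value as first loaded.
        version = self.current
        while version.prev is not None:
            version = version.prev
        return version.value

    def is_modified(self) -> bool:
        return self.current.prev is not None

    def set_unmodified(self) -> None:
        # Discard the history: the current value becomes the new baseline.
        self.current.prev = None


record = Record("red")
record.update("green")
before = (record.original(), record.is_modified())
record.set_unmodified()
after = (record.original(), record.is_modified())
```

In this model a record with history would be written back with an update, and one without history needs no write at all; how the real code decides between insert, update and delete is not shown in this thread.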
When this record was updated, the previous version linked list was disposed. But one of the other cursor sets had a buffer still pointing to the previous version.
Now, if the other cursor set tried to access its buffer at this time, then it would give me a memory error. But it wasn't accessing its buffer. Instead, the database called the allocate routine to get a new record buffer, and the memory manager returned a pointer to the block of memory that it just deallocated.
So now the other cursor set buffer is pointing to valid data again, but it's not the record that it should be pointing to. This causes it to get confused and set the pointers to the linked list of previous versions of the old record instead of the new record, causing a loop in the linked list.
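The sequence above can be simulated safely in Python with a toy allocator that hands back the most recently freed block first. This is a sketch under assumptions: ToyHeap, has_loop and the addresses are hypothetical, and the final link step is just one plausible way a stale buffer could turn the version list into a loop, not the exact kbmMemTable code path.

```python
class ToyHeap:
    """Toy allocator that reuses the most recently freed block first."""

    def __init__(self):
        self.blocks = {}         # address -> record contents
        self.free_list = []      # freed addresses, reused last-in first-out
        self.next_addr = 0x1000

    def alloc(self, contents):
        if self.free_list:
            addr = self.free_list.pop()
        else:
            addr = self.next_addr
            self.next_addr += 0x10
        self.blocks[addr] = contents
        return addr

    def free(self, addr):
        del self.blocks[addr]
        self.free_list.append(addr)


def has_loop(heap, start):
    """Follow 'prev' links from start and report whether an address repeats."""
    seen = set()
    addr = start
    while addr is not None:
        if addr in seen:
            return True
        seen.add(addr)
        addr = heap.blocks[addr]["prev"]
    return False


heap = ToyHeap()
old_version = heap.alloc({"value": "record A, previous version", "prev": None})
stale_buffer = old_version               # a second cursor set keeps the raw address

heap.free(old_version)                   # the previous-version list is disposed
new_record = heap.alloc({"value": "record B", "prev": None})
aliased = (new_record == stale_buffer)   # the freed block came straight back

# The stale cursor still treats its buffer as the old version of record A and
# links it in as history, but that address is now record B itself.
heap.blocks[new_record]["prev"] = stale_buffer
looped = has_loop(heap, new_record)
```

Nothing crashes at the moment of reuse, which is why the failure shows up later and somewhere else: the stale address is valid again, it just belongs to a different record.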
So, what was the bug? The bug was in the SetUnModified routine of all places. I've had the same code for this routine for months! But what it wasn't doing was updating the buffers of the other cursor sets after it set the unmodified property of all the records. So the other cursor sets could still be pointing at the previous version of some records.
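Here is a rough Python model of that fix, assuming each cursor set remembers which record it is positioned on so that its buffer can be re-pointed after the history is discarded. Table, Cursor, Version and the refresh_cursors flag are hypothetical names for illustration; the real change was made in Delphi and is not shown in this thread.

```python
class Version:
    def __init__(self, value, prev=None):
        self.value = value
        self.prev = prev
        self.disposed = False    # stands in for "this memory has been freed"


class Table:
    def __init__(self, values):
        self.records = [Version(v) for v in values]   # current version of each record
        self.cursors = []

    def update(self, index, value):
        self.records[index] = Version(value, prev=self.records[index])

    def set_unmodified(self, refresh_cursors):
        for head in self.records:
            old = head.prev
            while old is not None:        # dispose every previous version
                old.disposed = True
                old = old.prev
            head.prev = None
        if refresh_cursors:               # the missing step
            for cursor in self.cursors:
                cursor.refresh()


class Cursor:
    def __init__(self, table, index):
        self.table = table
        self.index = index
        self.buffer = table.records[index]
        table.cursors.append(self)

    def refresh(self):
        # Re-point the buffer at the record's current version.
        self.buffer = self.table.records[self.index]


def buffer_is_stale(refresh_cursors):
    table = Table(["v1"])
    cursor = Cursor(table, 0)             # buffer points at "v1"
    table.update(0, "v2")                 # "v1" becomes a previous version
    table.set_unmodified(refresh_cursors)
    return cursor.buffer.disposed


stale_without_fix = buffer_is_stale(refresh_cursors=False)
stale_with_fix = buffer_is_stale(refresh_cursors=True)
```

Without the refresh, the cursor is left holding a disposed version; with it, every buffer points at live data before anything else gets a chance to allocate.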
I have no idea why this just now started causing problems. Fortunately, the new version of FastMM seems more sensitive to the problem and would crash consistently. The Delphi memory manager wasn't always returning the record that it just deallocated, so it would only cause the problem intermittently.
My guess is that this bug was the cause of some really obscure crashes in the 1.03 (and previous) versions of CMUD. And somehow the changes that I made caused CMUD to become more sensitive to this bug.
Yeah, that took forever to explain, didn't it? Well, it took TEN HOURS to figure this out. And it was some of the most frustrating hours that I've had in months. You should have heard me cursing the computer. I think I scared the cats (and probably even Chiara).
What made it the most frustrating is that I couldn't imagine that something this basic was still screwed up in CMUD. Usually when there are memory errors like this, it's pretty obvious because it crashes all over the place, especially when you add the background updating thread.
Maybe it was something related to the new code for filtering the database to only show the correct package. It's possible that when filtering, the database creates other buffers and this was enough to put it over the edge.
Anyway, I just couldn't understand how something that seemed to work fine yesterday could suddenly be so messed up and crash in places that I hadn't even changed.
Some days I just can't believe that I do this for "fun". |
|
|
|
Rainchild Wizard
Joined: 10 Oct 2000 Posts: 1551 Location: Australia
|
Posted: Fri Aug 04, 2006 6:42 am |
Yeah, pointer bugs and getting busted by the cops in Need for Speed are the two spots where the profanity level rises above and beyond the phrases found in Pulp Fiction _and_ South Park. It's amazing how many four letter words you can fit into a single breath and still not feel satisfied hehe.
And it's probably worse programming pointer stuff in Delphi compared to C++ because it's not as geared toward finding all those null or dereferenced pointers. I wonder if this could have been why some of our settings seem to get 'forgotten' from time to time. Nice and obscure :)
Time to have a pint of strong hobgoblin ale and soothe the cats' poor innocent ears :) |
|
|
|
Tech GURU
Joined: 18 Oct 2000 Posts: 2733 Location: Atlanta, USA
|
Posted: Fri Aug 04, 2006 8:41 am |
Hey Zugg... I feel your pain. That's why I'm not looking forward to beginning on my new project on Monday. Did I mention that we rearchitected most of it today (although admittedly, based on new requirements, it is the better solution) and I've got to get it all documented ASAP?
Delphi pointers are hell, but I'm not exactly having fun straddling the Mainframe, many many queues, EJBs, an app server and DB2 on the back end.
Hopefully you'll have a great night's sleep and knock it all out tomorrow. Every time I start entertaining the idea of going back to coding C++, all the *fond* memories of pointers and memory references that you bring up remind me why I'm still hesitant to do so. I'm glad you're not as faint of heart as I am.
|
|
_________________ Asati di tempari! |
|
|
|
slicertool Magician
Joined: 09 Oct 2003 Posts: 459 Location: USA
|
Posted: Fri Aug 04, 2006 10:14 pm |
I've had days like this... It almost makes me want to become a Luddite.
|
|
|
|
Taz GURU
Joined: 28 Sep 2000 Posts: 1395 Location: United Kingdom
|
Posted: Fri Aug 04, 2006 10:50 pm |
Zugg that sucks but I'm glad you got to the bottom of it and it's great that you did because that's one hell of a bug out of the way.
Rainchild wrote: |
It's amazing how many four letter words you can fit into a single breath and still not feel satisfied hehe. |
OMG!! I really thought I had some sort of issue, people tell me I have, or perhaps that is because it's the sort of behaviour that is supposed to be in private rather than public.
Rainchild wrote: |
Time to have a pint of strong hobgoblin ale and soothe the cats' poor innocent ears :) |
Oh WOW! Can't believe you can get that in Australia. Zugg, I don't know if you drink alcohol, or what sort if you do, but if you can get that stuff in America, believe me, as a very keen supporter of Real Ale: that is a damn fine specimen and well worth trying. |
|
_________________ Taz :) |
|
|
|
Baram Novice
Joined: 23 Apr 2006 Posts: 33 Location: Seoul, Korea
|
Posted: Sun Aug 06, 2006 4:54 am |
I know that feeling, though in my case it's scaring the dog, not the cat. What's worse for me is that I work better at night, then run into some major problem like that... well I guess it's not a problem for me, but my wife doesn't seem to like waking up at 3am to hear me cursing out the computer.
|
|
_________________ Joseph Monk
Working on yet unannounced MUD. |
|
|
|
|
|
|
|
|