|
Zugg |
Posted: Sat Feb 12, 2005 5:35 am
An SQL and ADO rant |
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Thu Feb 17, 2005 6:52 am |
I don't plan to use something like SqLite for the main mail store or contacts or anything like that. It's more likely to be used for the first-level cache of data. Remember that the backend store will be customizeable for the user. So, you can use whatever database you want. If you want the ultimate integrity at the sacrifice of some performance, then you'll use some big SQL database. If you want to use MSDE then you can use that. If you want to use Access (not sure why anyone would), you can use that. The back end is totally customizeable.
However, I have to choose *something* to cache the data in for the user interface. And that will probably be something very simple and very fast.
Don't worry, I'm certainly not going to risk *my* email either. Then again, almost anything is probably better than the PST file it is stored in right now.
As far as your zMUD maps, you'd find it almost unplayable, which is why it doesn't work that way. The only way to make the display and path finding fast was to load the entire map database into a memory cache. Then again, I was using ADO as the interface, and as I said, it will be interesting to play with the map database sometime in the future to see how some of these other systems might improve it. |
|
|
|
Castaway GURU
Joined: 10 Oct 2000 Posts: 793 Location: Swindon, England
|
Posted: Thu Feb 17, 2005 8:35 am |
Yup, I think row_number() didnt appear until the 1999 standard, I'll check up on that. (I also note, that DB2 has its own way to do this, AND supports the standard.. it has "FETCH FIRST XX ROWS ONLY' .. fun fun)
There is no link to the official standard, because you have to pay for the PDFs, in bits, from the ISO/ANSI group. I managed to find a piece or two of the 1999 standard (ansi-iso-9075) on the net to play with, but I'm not posting those URLs in a forum.. At the http://webstore.ansi.org/ansidocstore/default.asp, the pieces are available for $18 each. (Just search for "SQL").
Darker pointed to a nice set of comparisons, as you can see, theres no 'X supports better than Y', just X supports some things, Y others, they don't necessarily overlap.
Having looked for quite a while myself, and not found a comprehensive comparison table, of exactly which functions/statements each RDBMS supports, I started creating my own, just using the documentation of each DB as a guide. Like most such things though, it was tedious, and annoying. Anyhow, such as it is, you're welcome to it.. (I may well turn it into/add it to my Wiki, currently it's an Open/StarOffice spreadsheet. http://desert-island.dynodns.net/SQL/sql-comparison.sxc.
Oracle is surprisingly slow at implementing the standard, btw. Version 8 (we're using 9 at work, 10 is current I believe), didn't support JOIN, for example, it had some strange own way of doing those. Microsoft isnt all that bad at it, though they have added a bunch of their own stuff.
DB2 is my favourite for following the standard, at the mo, though it may be that I'm biased :) (I don't work there, just been using it for years)
Anyhow, any other questions, just ask.. I've been fiddling with SQL a fair bit (and tools such as sqlfairy, which attempt to parse the various DBs SQL.. )
Lady C. |
|
|
|
Castaway GURU
Joined: 10 Oct 2000 Posts: 793 Location: Swindon, England
|
Posted: Thu Feb 17, 2005 8:46 am |
Oops, missed the second page of this thread while posting that..
I've used SQLite lots, under linux, its very practical for small speedy database work. It gets a little unpractical for large ones, because the db is all in one file.
I'm not sure what all the hassle is with foreign keys, to be honest I've never used any. (I understand our stuff at work doesn't use them because it makes things lots slower..) - It seems like an extraneous feature to me, the programmer should also be able to keep track of keys themselves, and do it when they think its necessary enough to slow things down..
I still don't quite get the concern with not wanting to load an entire table.. Or rather, if I create a query which returns the whole table, then thats what I wanted to do. If you choose to display it in chunks, scrollable, can't you just use cursors? I thought thats what they were for. ie. don't do it in the SQL, let the database driver do it by only asking it for X rows ?
Lady C. |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Thu Feb 17, 2005 7:13 pm |
I'll take a look at your spreadsheet. I also thought about doing that, but as you said, it gets time consuming, and I've got other things I should be spending my time on
I agree with you about the keys. I've never had any trouble keeping track of them myself and somehow I feel better doing it that way rather than trusting the database to do it. I've never lost any data or gotten a corrupt database either. Maybe it's just because I don't understand the inner workings, but I'm paranoid and think that the more complicated it gets, the more of a chance for something to go wrong. So I tend to stay away from lots of the fancy features.
As far as loading an entire table...sure, if you *want* to do this, you certainly can. But let's look at a very practical example...the "posts" table in this current PHPBB database. It has posts from several years ago. If you try to do a "SELECT * FROM Posts" query (which I accidentally did the other day), you will wait several minutes to get that data all returned across the wire. In the meantime you are staring at an hourglass cursor waiting for all of this data. Once it is received and displayed in the Grid, then you can start scrolling through it.
But that's silly because there is no reason to ever want to manually scroll through thousands and thousands of records like that. You end up doing a search for something anyway, which could have been the original SQL query in the first place.
As far as cursors, I don't have any experience with the bigger databases. And my experience in accessing databases has always been via ADO. In order to just get data in chunks via a cursor, you must use a *server* side cursor. In ADO if you use a server cursor, most of the rest of the ADO "features" stop working. Things like Sort and Filtering only work with client cursors, because ADO needs all of the data in order to sort it or filter it (makes sense). Which means that when using a server cursor, you might as well just be doing straight SQL in the first place and not suffering the overhead of the ADO layer. And some databases like MySql don't even support server-side cursors I don't think.
There also isn't much of a standard on how to support server-side cursors. I know, for example, that MS SQL Server has a server-side cursor that can be used for large dataset paging, but it's not the server-side cursor that is used for the ADO support. The ADO server-side cursor doesn't let you access most of the MS-specific optimizations.
So basically, if you use a server-side cursor you are stuck with a single database provider and optimizing it for that specific database. There is really no way to write a "universal data access" application and use server-side cursors efficiently.
On top of that, from a scaling perspective, server-side cursors are bad. They eat up server resources like crazy. So once again, you are often stuck using a client-side cursor to prevent server load, and then end up waiting a long time if you accidentally make a huge query with no graceful way to cancel the query in the middle.
If there was universal SQL support for stuff like LIMIT and OFFSET (or ROWNUM and ROWOFFSET, or something like that), then we could implement paging on the client side easily with regular SQL commands. For example, I could do something like Google where only the first 20 rows are displayed in a grid, and when you click the Page Down button, it could fetch the next 20 rows from the server. This way you only ever need to wait a short amount of time to get the initial data and there is very little overhead. The only drawback to this approach is that it would be difficult to have a sliding scroll-bar that updates the grid as you drag. You'd have to wait till the user stopped dragging, then fetch the page of records they want to view.
Right now, implementing some "smart" SQL grid like this on the client is nearly impossible, except for a certain few databases that implement the needed SQL commands.
To conclude, be sure and think of this from the *user* perspective, and not the programmer perspective. I'm talking about an email client here. The user might potentially have thousands and thousands of email messages stored in their SQL message store on some local or remote machine. When they want to search for a past mail message, they are essentially doing an SQL query behind the scenes. But the user doesn't know about this, nor do they care. They want to enter some search criterion and get results back in a timely manner. If they make a typo and click the Submit button, it shouldn't if they just requested a million records. They expect to get something back quickly. Even if they request a million records, they will want to start looking at the first 20 right away. Why should they have to wait for the entire query to finish. Or why isn't there a good way to cancel the query in the middle. Even determining the size of the query and prompting the user for confirmation can take a long time since doing a COUNT on the server might still take a while if the query is complicated.
This is my general peeve with so-called "modern" client/server/multitier applications. It's hard to provide good feedback to the user on what they are doing if something is going to end up taking minutes to get data back. The user doesn't care about indexes or cursors...they just want their data. Now, if you are designing a very specific application with a very specific backend database, you can optimize this and pick the database so that this problem can be solved. But in eMobius, I'm allowing the end-user to choose what kind of backend database they want to use. Handling these situations in this general case is a lot harder, which is probably why nobody has done it before. |
|
|
|
Rainchild Wizard
Joined: 10 Oct 2000 Posts: 1551 Location: Australia
|
Posted: Thu Feb 17, 2005 11:49 pm |
The foreign keys, triggers, stored procedures, etc are all methods of maintaining data integrity, especially in a multi-user situation. A very simple example is:
User "A" starts adding a note to the "Rainchild" contact.
User "B" logs on and deletes the "Rainchild" contact.
User "A" submits the note.
If you have no foreign key checking, you now have a note pointing to a non-existant contact, which (depending on how you do your indexing) might end up as being attached to a new contact at a later date. If you do have foreign key checking, an error is thrown stating that you can't add the record because it violates a foreign key constraint.
It is also useful in the other direction, eg "Rainchild" has some purchase orders attached to his name, so when User "B" tries to delete the contact, an error is thrown saying "You can't delete the Rainchild contat because it is referenced in the purchase order and delivery tables".
If you didn't have your foreign key checking you would now have a whole lot of orphaned purchase orders / deliveries. This again may not apply so much to the email client, but its why they exist and are critical in a lot of places.
Maintaing your own keys via code is a lot more risky, eg what would happen if you:
- DELETE FROM contact WHERE name = 'Rainchild'
- DELETE FROM purchaseorders WHERE contact = 'Rainchild'
<network cable unplugged / power cut / something crashed / etc so this never gets executed>
- DELETE FROM deliveries WHERE contact = 'Rainchild'
Suddenly you get your orphaned records again.
If you do it in the other way eg:
- BEGIN TRANSACTION
- DELETE FROM purchaseorders WHERE contact = 'Rainchild'
<network cable unplugged / power cut / something crashed / etc so this never gets executed>
- DELETE FROM deliveries WHERE contact = 'Rainchild'
- DELETE FROM contact WHERE name = 'Rainchild'
- COMMIT
nothing untoward happens because the COMMIT is not received so the transaction rolls back and your data isn't deleted.
Now what's worse is, say I write a plugin for eMobius and do something like "DELETE FROM contact WHERE name = 'Rainchild'" ... I don't know you had other things tied to the contact, whoops, now I just messed up your database royally... so when you start talking about multiple programmers accessing the database it's only a matter of time before something breaks.
If you set up your relationships correctly then it can cascade delete or deny the delete. But to aid with it this is where stored procedures come in. If you deny write access to the tables and write stored procedures to check the data and apply "business rules" to it before anything touches the database, then you can simply things for other programmers / pieces of software / etc. Eg if you wanted to delete the "Rainchild" contact, you could have the stored procedure execute the 3 SQL's that needed executing (delete purchaseorder, delete delivery, delete contact). Likewise, if you add another table (say, a record of phone calls), you update this one stored procedure and you don't have to update the 400 other plugins to add a 4th delete SQL, as the database knows how to maintain it.
Likewise, I use views for this same reason, you can change the underlying table structure / add colums / etc but if you are calling the views from your program instead of directly accessing the tables, those changes won't break your SQL statements, since the 'business practice' is the views are to maintain backward compatibility for legacy support of applications.
My friend's studying to be a DBA so I cop a lot from him when I start asking questions like "who needs key constraints" and "why can't we select * from table" and "the database is just a little black box which stores whatever". Well you got the reason for key constraints above. Selecting * is bad because it's not optomized so the server has to do heaps more work. And the little black box is a misconception that programmers get and then wonder why a query fails and creates a record lock when you try to "update contact set fullname = firstname + ' ' + lastname" on a recordset containing 1million entries ;)
I try to think of a DB in an OO sense, that stored procedures are the "Set" functions, and views are the "Get" functions. The 'object' knows how to maintain itself, you just have to call the correct methods and it will do everything else internally. It'd probably offend my DBA friend to still have his precious databases referred to in a programming sense, but be less offensive than writing data directly to his tables ;) |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Feb 18, 2005 12:16 am |
Yes, I understand all of this. That is why there *IS* transaction support and foriegn key support in zApp. It just means you can't use something like SQLite for the MAIN database store for your email. But using SQLite for the local cache is fine. Nobody is going to be mucking with the local cache except for eMobius itself. Plugins that try to delete or change stuff in the cache are doomed to cause trouble and that kind of access probably won't be allowed. The cache isn't storing complicated stuff with lots of relationships, so it doesn't need a robust database.
In terms of the remote database, the use of constraints and keys is completely dependant upon the specific application being written and what it's purpose is. It depends on whether it is multi-user and how multiple users will be accessing it. In the case of email, for example, you might have multiple users *reading* email messages, but you don't run into the case where someone deletes something while someone else is modifying it. Email doesn't get modified. It gets stored on the database and then never changed. Flags like whether a message has been read are all user-specific and are not stored with the email message itself.
Also, in *real world* situations, nobody should be deleting any records from the database at all. Deletion is handled by setting a flag so that you can undelete stuff. Any plugin that thinks it can just do a DELETE on some table is already in trouble. I can't forsee any situation where a plugin is going to be deleting contracts or stuff like that.
Even with foreign keys, you can still end up with dangling data. For example, when I have a cache of mail messages, what happens when a mail message is deleted from the remote data store? Now my cache points to a non-existent record. No big deal...when that is discovered the entry is removed from the cache. The fact that there is no way to update all of the caches that might exist on multiple computers when a mail message is removed from the central store isn't a big problem.
Another example: the zMUD mapper. You have two rooms with an exit between them. Each room is a record in the Room table. The exit is a record in the Exit table and has fields FromRoom and ToRoom that point to the rooms on each end of the exit. If you try to make these into foreign keys, you run into all sorts of problems. You can no longer delete either room record, because there is an exit pointing to it. You don't want to delete the exit because the other room is still pointing to it. You need to update the ToRoom or FromRoom fields in the exit manually, and then delete the room. Foreign keys just get in the way in that situation.
Remember when deleting stuff in the mapper took forever? That is when I was using foriegn keys. When I took away the foriegn keys, performance went up by an order of magnitude at least. That was a case where performance mattered more than data integrity. So what if there ends up being a dangling exit in the database that doesn't point to any rooms. It's not a big deal.
So anyway, while I understand everything you said and agree with it from a high-level design point of view, when you really get down to optimizing performance you usually have to make some tradeoffs. And for a great number of programs, just sticking with plain local databases without anything fancy is fine. Think about the MUD List database in zMUD. It's a file on your local computer. You are not sharing it with other users. You don't need anything fancy for that. Only if you put this MUD List on some remote server and had multiple updates happening to it would you need to start worrying.
What I'm basically saying is that not everything needs to be client/server. All of this client/server/multitiered stuff is fine when necessary, but it isn't as necessary all the time as some people think. There is still a lot of power and functionality to be gained by local databases as ways to store large amounts of data rather than using text files or binary streams. For example, think of the zMUD settings file. Right now it's a binary stream of proprietary formatted data because at the time there were no good local database tools for Delphi and using ADO would have been overkill. Using something like SQLite I can put data like that into a database table and build a better data structure for the user settings. Then maybe settings files wouldn't get corrupted all the time. But in that single-user mode, something like SQLite is fine...SQL Server would be overkill.
As far as eMobius, there will be plenty of methods and properties for plugins to use. A plugin would NEVER NEVER do something like "DELETE FROM contracts...". They would call the eMobius method DeleteContract( "Rainchild"). In a sense, it's like calling a stored procedure. But the plugins don't have to care how it is implemented. |
|
|
|
Castaway GURU
Joined: 10 Oct 2000 Posts: 793 Location: Swindon, England
|
Posted: Fri Feb 18, 2005 11:33 am |
Zugg, somehow it sounds to me like you arent utilizing the actual database enough, whichever one it may be. Why rely on ADO sorting/filtering, when you can construct an SQL query that sorts and filters, and thus returns less data from the DB?
From the users point of view, if they are looking for an email from X, or with subject Y, then they enter that first, then you construct the query, send it to the DB, and display the results (which hopefully won't be lots) it may even be worth it to first construct a 'select count(*) where .. ' query first, and see how many rows it would be, and complain to the user if there are lots, that they should restrict some more, like dates or whatever...
I'm afraid I've not used ADO much at all, so I cant comment on it's handling of retrieving large blocks of results all at once. My perl experiences tell me it must be possible not to (or I havent tried on big enough tables yet.. I should do that before I make such claims .. And mebbe I will)
I wonder, if I attempt to use PerlScript with zApp, whether I can use Perls DBI code for the database stuff instead.. Hrmm..
Rainchild: Yes, I know what foreign keys are, and what problems they're supposed to solve. But still, with careful programming and use of transactions, one can replicate.. Maybe I'll think otherwise, when I ever have that much of a need of them. But generally, when one is creating a fixed frontend for a DB, one can create transactions, stored procedures etc, that do things atomically. (Plus, deleting contracts should just set a 'deleteddate' field, and leave the rest of the stuff alone.. but I see what you meant :)
Lady C. |
|
|
|
Tarn GURU
Joined: 10 Oct 2000 Posts: 871 Location: USA
|
Posted: Fri Feb 18, 2005 6:11 pm |
Zugg wrote: |
Also, in *real world* situations, nobody should be deleting any records from the database at all. Deletion is handled by setting a flag so that you can undelete stuff. Any plugin that thinks it can just do a DELETE on some table is already in trouble. I can't forsee any situation where a plugin is going to be deleting contracts or stuff like that.
|
I may misunderstand what you mean. If your comment about not wanting to really delete something is intended to apply to the actual mailstore database in eMobius, then I disagree.
Just as an example, I routinely send myself and several other people I work with very large attachements via email of data that will shortly be obsolete. These should be removable from the database- otherwise my mail store would become unmanageably large.
I've also worked with data that I've been asked not to retain- it was ok for me to look at while I was working on it, but I was asked to delete all copies afterwards.
Quote: |
As far as eMobius, there will be plenty of methods and properties for plugins to use. A plugin would NEVER NEVER do something like "DELETE FROM contracts...". They would call the eMobius method DeleteContract( "Rainchild"). In a sense, it's like calling a stored procedure. But the plugins don't have to care how it is implemented. |
As long as there's a method that really does do the deletion. I can easily imagine having a plugin to manage certain types of emails, including deleting them once they're no longer relevant.
-Tarn |
|
|
|
Zugg MASTER
Joined: 25 Sep 2000 Posts: 23379 Location: Colorado, USA
|
Posted: Fri Feb 18, 2005 8:14 pm |
Castaway: Here is an example. In the newer versions of Outlook, there is a "quick find" feature. When it is activated, you get a little search bar above your list of email. Normally all of the messages in the current folder are displayed in the message list. But if you enter a string into the quick-search field, then it narrows the view to only display messages that match the string.
So, in this case, the email client has already fetched ALL of the messages in the folder. Now you want to "filter" those messages and only display matching messages. Sure, you can create an SQL statement to do this, but you don't need to. The client already has all of the data it needs, so why query the server again? You just need to only display those records in the current dataset that match the criterion.
Same thing with sort. This is used when you click a column in your email message list to change the sort order. For example, instead of sorting by date, I click the Sender column to sort by sender. Again, the client already has all of the data...it doesn't need to ask the server again. It just needs to resort how the data is displayed to the user. It uses the local Sort property to do this.
Fortunately the ZeosLib stuff I'm using also implements all of this. And it seems to be smart. If all of the data is already at the client, it just filters or sorts the existing data. If it only has a partial set of data, then it forms an SQL query to send to the server.
But local filtering and sorting is *very* important for a desktop application like email where you want things to be fast. This is a perfect example where the standard client/server mentality breaks down. Just because you *can* create an SQL statement to do a query doesn't mean that you *should*.
As far as Perl and zApp, please give it a try and let me know. I don't use ActivePerl much myself, so I don't really know how it will work with zApp. It *should* work fine, and you *should* be able to use anything that you normally use, like the DBI stuff. It's more a matter of whether ActivePerl supports this and isn't really a zApp question. If it works in ActivePerl, then it should work in zApp. If it *does* work, I'd be interested in some comparisons of using the Perl DBI stuff vs using the zApp DB stuff. Which is faster? Which is easier? etc.
Tarn: I didn't mean that you couldn't delete stuff. I meant that a plugin should never be using a direct SQL DELETE statement. To delete a message they should use the messagestore API. That will let zApp take care of the details. Of course you eventually want to delete email. I think that current clients are a bit to quick to actually delete stuff though. Hard disks are a lot bigger these days, and I think the default should be to keep around deleted messages for a while. I know this has saved me *many* times when I went looking for something I deleted.
Obviously there will be settings for when to Purge deleted messages, and obviously you'll be able to perm delete something that you really need to get rid of. And yes, I can see plugins doing some filtering like this, but I'd be *really* careful about running a plugin that perm deleted a lot of stuff. So much chance of losing something important. But certainly additional spam filter plugins will be doing this.
My main point is that no plugin should be doing direct SQL calls on the backend database. For example, what if the user has their Mail Store set to use plain MBOX file formats? SQL statements aren't going to work with that. That's why the zMessageStore object was created...to isolate eMobius (and any plugins) from the details of the mail storage system. |
|
|
|
Kiasyn Apprentice
Joined: 05 Dec 2004 Posts: 196 Location: New Zealand
|
Posted: Sat Feb 19, 2005 10:59 am |
I look forward to ZQL...
|
|
|
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|