Elbonian database technology

The project I'm currently working on requires a database, and although I'm personally happy to use Oracle, the code I'm writing will probably be used elsewhere by other people, so I needed to keep the barrier to entry as low as possible. Most of the code is written in Perl, so whatever I chose as a database needed to have a DBI interface available for it.

Plan A was to use SQLite, a C library that implements a self-contained database engine and has a DBI interface. Great - zero setup, and it provides everything I need. Or so I thought. Unfortunately it has at best a tenuous grasp of SQL syntax and a frighteningly long list of bugs, most of which don't seem to be getting much attention, as the developers appear to be busy completely reimplementing it. In addition, it seems to behave quadratically on large queries (even when it does use an index), the query plan output is in some RISC-like pseudo-assembler that is completely undecipherable, and to top it all off it doesn't even support subselects, which I happen to need - and yes, I know about outer joins; they won't cut it in this particular case.

Plan B was to use MySQL. I chronicled my problems getting it to build in my last post, the most serious issue being the compiler bug that it tickled. What is really annoying about the MySQL docs is that they talk about all the features available in the alpha version of the software - unless you read the release notes very carefully it is far from clear that a particular feature isn't actually available in the production version! In my case, having gone through the pain of getting it working, I then found out that unless I use the latest beta I don't get subselects, which I've already said I need, and I'd rather not use beta software if I can avoid it.

OK, on to plan C - Postgres. At first glance it appears to be much more SQL99-compliant than MySQL. Download, build - yep, no problems. OK, last step - let's look at the Perl DBI interface for it:

Cursors

Although PostgreSQL has a cursor concept, it has not been used in the current
implementation. Cursors in PostgreSQL can only be used inside a transaction
block. Because only one transaction block at a time is allowed, this would
have implied the restriction, not to use any nested SELECT statements.
Hence the execute method fetches all data at once into data structures located
in the frontend application. This has to be considered when selecting large
amounts of data!

One of my tables will have 375,000 rows. Damn - Postgres is pretty much useless for my purposes as well. OK, back to plan B: I'll take a look at the latest MySQL beta (4.1.3-beta) - good, it looks like it has subselect support. Phew. OK, let's go look at the Perl DBI interface to it - hang on, what's this?

Note, that most attributes are valid only after a successful execute.
An undef value will returned in that case. The most important exception is
the mysql_use_result attribute: This forces the driver to use mysql_use_result
rather than mysql_store_result. The former is faster and less memory consuming,
but tends to block other processes. (That's why mysql_store_result is the default.)

Digging through the code and the MySQL docs reveals that by default DBD::mysql will also fetch the entire result set into memory. Aagh! At least I can turn it off, but having to specify mysql_store_result = false for each and every query is going to be a right pain. I could patch the code to allow a per-database setting for the attribute, but that means I'm either going to have to fork my own DBD::mysql driver or try to get the change accepted by the module maintainers. Ugh.
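To illustrate the pain, here's roughly what every streaming query would have to look like - a sketch only, with hypothetical connection details and table names, assuming DBD::mysql accepts mysql_use_result as a prepare() attribute:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details - adjust for your own server.
my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
                       'user', 'password', { RaiseError => 1 });

# Passing mysql_use_result as a prepare() attribute affects only this
# statement handle; other queries on $dbh still buffer as normal.
my $sth = $dbh->prepare('SELECT id, name FROM big_table',
                        { mysql_use_result => 1 });
$sth->execute();

# Rows are now streamed from the server one at a time instead of being
# slurped into client memory by mysql_store_result.
while (my $row = $sth->fetchrow_arrayref()) {
    print "@$row\n";
}
$sth->finish();
$dbh->disconnect();
```

Tolerable for one query; for a script full of them, not so much.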

As a long-time Oracle user I'm beginning to realise just how spoilt I have been: in that environment all this stuff 'just works'. Bearing in mind the type of issues that I've encountered in my explorations, I can only assume that most Open Source databases are used in fairly constrained circumstances - fetching the entire result set in one huge wodge doesn't seem particularly scalable, and subselects are a widely used SQL feature that I'd expect to be supported everywhere. I'm guessing, but it seems to me that mostly they must be used for OLTP-style applications (e.g. websites, bug databases etc.) where you are only fetching a small number of rows across a small number of joined tables, and generally via an index. I can't see how they could be suitable for the sort of heavy lifting that Oracle (or any of the other commercial RDBMSs) is often used for.

Categories : Tech, Perl, Work


Re: Elbonian database technology

There are at least two other options that may (or may not) work better for you. Both have Perl DBD drivers: SapDB (http://www.sapdb.org) [now apparently known as MaxDB] & Firebird (http://firebird.sourceforge.net/).

Re: Elbonian database technology

I'm not sure what your problem would be with PostgreSQL. 375,000 rows is no big deal - I've got one table with 1.2 million rows and I haven't had any problems. I'd imagine that if you asked around you'd find PostgreSQL databases that are much, much larger. Sub-selects have worked fine. I'm using it to replace an old Oracle 7 database.

Re: Elbonian database technology

Thanks for the tips on the alternative databases cp, I'll take a look.

Joseph, the issue with Postgres is not the number of rows in the database itself, it's the behaviour of the Perl DBI interface. If I do a select that matches every row in a 375,000-row table I'll end up with all 375,000 rows in memory in the client DBI script. The reason I originally looked at Postgres was precisely because of its good support for SQL99; it's just the Perl DBI interface that isn't up to the job.

Re: Elbonian database technology

We recently came across this problem in DBD::mysql. I was in shock. I simply expect databases to automatically have cursors, and I don't understand how they do not. This is simply bad, and at the moment these databases are just quick hacks. Sigh.

Re: Elbonian database technology

Hi Leon, nice of you to drop by ;-) I agree that many of the OSS databases lack a lot of the features that you would reasonably expect, despite all the hype they seem to engender.

Re: Elbonian database technology

I asked the DBD-Pg guys. Here is their response. Doesn't look good.

Re: Elbonian database technology

No it certainly doesn't look good :-( MySQL 4.1.3 supports transactions and nested subselects plus a load of other new goodies, and the version 5 alpha supports stored procedures, so I'm going with 4.1.3 as the best compromise.

I'm in the process of hacking the DBD::mysql driver so you can switch it into 'fetch only one row at a time' mode on a per-database handle basis, and I intend to submit the patch back to the developers. Give me a shout if you want a copy in the meantime.

Re: Elbonian database technology

The DBD::mysql driver can already be controlled on a per-database-handle basis. That is to say:

<code>local $dbh->{mysql_use_result} = 1;</code>

Works fine, without affecting the other connections.

Here's a couple of other points you may (or may not) have missed:

  • Once you start using mysql_use_result you can't use that handle for anything else until that query is over. You probably want to do a my $new_dbh = $dbh->clone before setting the mysql_use_result value and then use the $new_dbh for your SELECTs.
  • During the query, the table you're SELECTing from with mysql_use_result is locked for writing. This means that anything that tries to UPDATE that table will block. It also means that if you're not careful with your priorities, any SELECTs you do will block waiting for the higher-priority UPDATE to complete (this can result in deadlock if you're waiting for the result of a second select on the table before you complete the mysql_use_result SELECT).
Fun, isn't it.

Re: Elbonian database technology

Thanks for the useful info Mark - I'd just finished reading the DBD::mysql source and figured out that because it uses the standard DBI functions for attribute handling, it gets inheritance for free - however, having it confirmed by you was helpful :-) It might be worth asking the developers to make the docs a little clearer though.

The tip about cloning the handle is also a good one, I hadn't thought about that - if I understand what you are saying, once I have done an execute I have to complete all the fetching before using the handle for (say) another prepare - is that correct? So this is OK:

$dbh->{mysql_use_result} = 1;
my $q1 = $dbh->prepare(...);
my $q2 = $dbh->prepare(...);
$q1->execute(...);
while ($_ = $q1->fetchrow_arrayref()) {
...
}
$q2->execute(...);
while ($_ = $q2->fetchrow_arrayref()) {
...
}
my $q3 = $dbh->prepare(...);

But this isn't OK:

$dbh->{mysql_use_result} = 1;
my $q1 = $dbh->prepare(...);
$q1->execute(...);
while ($_ = $q1->fetchrow_arrayref()) {
...
my $q2 = $dbh->prepare(...);
$q2->execute(...);
while ($_ = $q2->fetchrow_arrayref()) {
...
}
...
}

As regards the locking issue, I was intending to use InnoDB tables, and according to the MySQL docs they support row-level locking rather than table locking as well as consistent non-locking reads, so I should be OK.
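Putting the local() trick and the handle clone together, I'm picturing something like the following - a sketch only, with hypothetical connection details and table name:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details.
my $dbh = DBI->connect('DBI:mysql:database=test', 'user', 'password',
                       { RaiseError => 1 });

# Clone the handle so the streaming SELECT doesn't tie up $dbh.
my $stream_dbh = $dbh->clone();

{
    # local() confines the setting to this block and this handle.
    local $stream_dbh->{mysql_use_result} = 1;

    my $sth = $stream_dbh->prepare('SELECT id FROM big_table');
    $sth->execute();
    while (my ($id) = $sth->fetchrow_array()) {
        # $dbh remains free for other queries while rows stream here.
    }
    $sth->finish();
}

$stream_dbh->disconnect();
$dbh->disconnect();
```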

Re: Elbonian database technology

Regarding DBD::Pg limitations, I think the reason the support in DBD::Pg is not very good is that libpq doesn't support cursors very well. Have a look at this thread.

Re: Elbonian database technology

Leon: the lack of cursors contributes significantly to the performance of mysql and others. It makes the protocol very simple.

Mark: You can avoid the lock-held-for-longer problem by adding SQL_BUFFER_RESULT to the SELECT (a feature mysql.com added some time ago at my request).
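For reference, SQL_BUFFER_RESULT goes immediately after the SELECT keyword; it tells the server to copy the result into a temporary table so the table locks can be released before the client starts fetching. A sketch (table name hypothetical, $dbh an already-connected DBD::mysql handle):

```perl
# SQL_BUFFER_RESULT buffers the result server-side, so combined with
# mysql_use_result the table isn't locked for the whole fetch loop.
my $sth = $dbh->prepare(
    'SELECT SQL_BUFFER_RESULT id, name FROM big_table'
);
```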

Alan: mysql's definition of beta is fairly conservative. Many sites use mysql beta releases in production.

[An aside: This whole thread would have been a valuable addition to the dbi-users mailing list. It would have been seen by many more people and so would have got more helpful comments and educated more people along the way.]

Re: Elbonian database technology

Thanks for the comments everyone. From my point of view the penalty in terms of flexibility that you pay for not supporting cursors doesn't seem worth the improvement in speed, but that's purely my perspective. Being able to use either model would be the ideal solution. Tim, I've read through the MySQL docs and I agree that their definition of beta is conservative, which is why I've decided to use 4.1.3-beta (Although I'd still prefer to be using Oracle ;-)

I wonder if it is worth posting some sort of summary of this thread to dbi-users?

Re: Elbonian database technology

Any comments on the recent developments at CA wrt Ingres?
http://opensource.ca.com/projects/ingres
I've not used Ingres myself but this seems like it could be interesting.

Re: Elbonian database technology

When CA announced the availability of the source (http://www3.ca.com/Press/PressRelease.asp?CID=61597) I went and had a look at the website. Unfortunately at the moment it is only available for Linux, although from reading the forum posts I expect this will change at some point in the future. As the site was so new there wasn't much feedback from people who'd tried to download and build Ingres, so after having a quick poke around I left ;-) If you are interested, you can find the Open Ingres website at http://opensource.ca.com/projects/ingres/.

Re: Elbonian database technology

Why would the Perl DBI interface need nested transactions to use a cursor? It seems to me you could do:

begin;
declare cursor bubba .. (complex SQL goes here)
fetch fetch fetch
update ... fetch insert ...
declare cursor bubba2
(do more stuff here)
commit / rollback;

just fine. I don't see this being a limit of PostgreSQL; I see it being a weird implementation behaviour in Perl's DBI interface that requires savepoints to do cursors properly, or someone misunderstanding things and believing you need them somehow for cursors to work. Or are you talking about updateable cursors or something? Don't know if anyone will see this or not...
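Spelled out as an actual DBI session, the manual-cursor pattern the commenter describes would look something like this - a sketch, assuming DBD::Pg and a hypothetical big_table, with DECLARE and FETCH issued as plain SQL rather than through any driver cursor support:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details. Cursors in PostgreSQL only live
# inside a transaction block, so AutoCommit is off.
my $dbh = DBI->connect('DBI:Pg:dbname=test', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do('DECLARE bubba CURSOR FOR SELECT id FROM big_table');

while (1) {
    # Pull the rows through in modest batches instead of all at once.
    my $rows = $dbh->selectall_arrayref('FETCH 1000 FROM bubba');
    last unless @$rows;
    for my $row (@$rows) {
        # process $row->[0] here
    }
}

$dbh->do('CLOSE bubba');
$dbh->commit();
$dbh->disconnect();
```

The client never holds more than one batch in memory, which is exactly what the driver-level cursor support would buy.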