Elbonian database technology

The project I'm currently working on requires a database, and although I'm personally happy to use Oracle, the code I'm writing will probably be used elsewhere by other people, so I needed to keep the barrier to entry as low as possible. Most of the code is written in Perl, so whatever I chose as a database needed to have a DBI interface available for it.

Plan A was to use SQLite, which is a C library that implements a self-contained database engine, and has a DBI interface. Great, zero setup and provides everything I need - or so I thought. Unfortunately it has at best a tenuous grasp of SQL syntax, a frighteningly long list of bugs, most of which don't seem to be getting much attention as it appears the developers are busy completely reimplementing it. In addition, it seems to behave quadratically on large queries (even when it does use an index), the query plan output is in some RISC-like pseudo-assembler that is completely undecipherable and to top it all off it doesn't even support subselects, which I happen to need - and yes I know about outer joins, they won't cut it in this particular case.

Plan B was to use MySQL. I chronicled my problems getting it to build in my last post, the most serious issue being the compiler bug that it tickled. What is really annoying about the MySQL docs is that they talk about all the features available the alpha version of the software - unless you read the release notes very carefully it is far from clear that a particular feature isn't actually available in the production version! In my particular case having gone through the pain of getting it working I then found out that unless I use the latest beta I don't get subselects, which I've already said need, and I'd rather not use beta software if I can avoid it.

OK, on to plan C - Postgres. At first glance it appears to be much more SQL99 compliant that MySQL. Download, build - yep, no problems. Ok, last step - let's look at the perl DBI interface for it:


Although PostgreSQL has a cursor concept, it has not been used in the current
implementation. Cursors in PostgreSQL can only be used inside a transaction
block. Because only one transaction block at a time is allowed, this would
have implied the restriction, not to use any nested SELECT statements.
Hence the execute method fetches all data at once into data structures located
in the frontend application. This has to be considered when selecting large
amounts of data!

One of my tables will have 375,000 rows. Damn, Postgres is pretty much useless for my purposes as well. Ok, back to plan B, I'll take a look at the latest MySQL beta (4.1.3-beta) - good, it looks like it has subselect support. Phew. OK, let's go look at the perl DBI interface to it - hang on, what's this?

Note, that most attributes are valid only after a successful execute.
An undef value will returned in that case. The most important exception is
the mysql_use_result attribute: This forces the driver to use mysql_use_result
rather than mysql_store_result. The former is faster and less memory consuming,
but tends to block other processes. (That's why mysql_store_result is the default.)

Digging through the code and the MySQL docs reveals that by default DBD::mysql will also fetch the entire result set into memory. Aagh! At least I can turn it off, but having to specify mysql_store_result = false for each and every query is going to be a right pain. I could patch the code to allow a per-database setting for the attribute, but that means I'm either going to have to fork my own DBD::mysql driver, or try to get the change accepted my the module maintainers. Ugh.

As a long-time Oracle user I'm beginning to realise just how spoilt I have been, in that environment all this stuff 'just works'. Bearing in mind the type of issues that I've encountered in my explorations, I can only assume that most Open Source databases are used in fairly constrained circumstances - fetching the entire result set in one huge wodge doesn't seem particularly scalable, and subselects are a pretty widely used SQL feature that I'd expect to be widely supported. I'm guessing, but it seems to me that mostly they must be used for OLTP-style applications (e.g. websites, bug databases etc) where you are only fetching a small number of rows across a small number of joined tables, and generally via an index. I can't see how they could be suitable for the sort of heavy lifting that Oracle (or any of the other commercial RDBMSs) are often used for.

Categories : Tech, Perl, Work