Neil Gunton has written an interesting article debunking some common Open Source myths. I agree with most of what he says, and have made similar points myself in the past. Some of the Slashdot thread ensuing from Neil's article is predictable - software is theft, all commercial software needs can be met by HobbyHackers, gratis vs libre blah blah blah - but some of the people commenting do understand and agree with Neil's points, which is refreshing. I don't believe that either closed or open development is necessarily better than the alternative approach. I've seen some goddam awful closed source products, and some equally bad open source ones. Some people inside Sun seem to believe that Closed Source is a dead development model, but I personally just don't buy it. Both open and closed source approaches can yield high quality products with short development times; the key factor is the team working on the project and the tools and techniques they are using, not whether they are from Lilliput or Blefuscu.
There seems also to be a great deal of confusion over the interpretation of the term 'Open Source'. Quite often people assume it necessarily implies Open Development, but I believe the two concepts are entirely different:
- Open Source
The source code for a piece of software is available without restriction or cost.
- Open Development
Anyone who is so inclined is able to contribute to a project without restriction.
Whilst Open Source undoubtedly exists, I'm not so sure that Open Development in the widest sense actually does. I've been looking at several Open Source databases over the last couple of days, and to take just one example, while MySQL is Open Source, it certainly isn't Open Development. Although it is possible to get patches back in to MySQL, it has to be done by submitting them to the MySQL developers, and when you look at the release notes and development roadmaps for MySQL it is quite clear that MySQL AB (the company) is in complete control of MySQL (the product). Even Linux isn't completely Open Development - Linus keeps a very firm rein on what goes in to Linux. As Bryan Cantrill points out, the Linux LTT folks have been trying to get their stuff integrated for more than five years.
That's not to say there aren't extremely good reasons for this state of affairs - letting everyone who had the whim integrate their changes willy-nilly would lead to anarchy and permanent brokenness. However, when evaluating the 'Openness' of a piece of software, the discussion seems to centre entirely around whether it is Open Source - possibly because it's a much easier, almost binary, distinction between 'Open Source' and 'Not Open Source'. Evaluating how a project ranks with regard to Open Development is much harder - there is a wide continuum of how easy it is to contribute to a project, and of the restrictions which control how you may contribute. MySQL is obviously not Open Development; Linux is more Open Development, but not completely so. Some 'Open Source' projects also require assignment of copyright to the project if any contributions are made. Again there are often very good reasons for this, but it doesn't seem exactly 'Open' if I contribute something and it ends up being owned by someone else.
At the moment Open Source is Soupe du Jour, and it is interesting to see how different companies react and adapt their business models to encompass it. It's often convenient for companies to be able to fill in the 'Open Source' checkbox whilst keeping tight control of their intellectual property. I'm not implying this is necessarily a bad thing; after all, they have a responsibility to their shareholders to protect their assets. However it's interesting to note the lack of discussion around what access to source code actually means in practical terms. It appears that most of the Open Source bigots can't and/or don't want to distinguish between Open Source and Open Development, and that some companies are taking advantage of this to jump on the Open Source bandwagon whilst continuing to behave exactly as they did before.
cp kindly commented on my last post regarding Open Source databases, and suggested two others I should go look at, SapDB/MaxDB and Firebird. I looked at MaxDB, and it is a re-branded and enhanced version of SAP DB, SAP AG's open source database. It has been taken over by MySQL AB, and it appears that it will eventually be merged into MySQL. It therefore doesn't look like a particularly good choice, as it might not be around in the future.
I then went and looked at Firebird. This started out as Borland's InterBase product, which they released as open source in 2000, then changed their minds and tried to close it back up. The result was a fork - the original Borland code was taken and rewritten in C++ (no, I have no idea why) by the Firebird community (including the original InterBase author) that developed around the original Borland code release. It's an interesting story that anyone who is thinking about open sourcing a large piece of software (*cough*) should be aware of. The documentation is a bit thin - the Firebird website relies heavily on the Borland docs - but it seems to be a mature product, so I'm going to give it a whizz.
The project I'm currently working on requires a database, and although I'm personally happy to use Oracle, the code I'm writing will probably be used elsewhere by other people, so I needed to keep the barrier to entry as low as possible. Most of the code is written in Perl, so whatever I chose as a database needed to have a DBI interface available for it.
Plan A was to use SQLite, which is a C library that implements a self-contained database engine, and has a DBI interface. Great, zero setup and provides everything I need - or so I thought. Unfortunately it has at best a tenuous grasp of SQL syntax and a frighteningly long list of bugs, most of which don't seem to be getting much attention as it appears the developers are busy completely reimplementing it. In addition, it seems to behave quadratically on large queries (even when it does use an index), the query plan output is in some RISC-like pseudo-assembler that is completely undecipherable, and to top it all off it doesn't even support subselects, which I happen to need - and yes I know about outer joins, they won't cut it in this particular case.
Plan B was to use MySQL. I chronicled my problems getting it to build in my last post, the most serious issue being the compiler bug that it tickled. What is really annoying about the MySQL docs is that they talk about all the features available in the alpha version of the software - unless you read the release notes very carefully it is far from clear that a particular feature isn't actually available in the production version! In my particular case, having gone through the pain of getting it working, I then found out that unless I use the latest beta I don't get subselects, which I've already said I need, and I'd rather not use beta software if I can avoid it.
OK, on to plan C - Postgres. At first glance it appears to be much more SQL99 compliant than MySQL. Download, build - yep, no problems. OK, last step - let's look at the Perl DBI interface for it:
Cursors Although PostgreSQL has a cursor concept, it has not been used in the current implementation. Cursors in PostgreSQL can only be used inside a transaction block. Because only one transaction block at a time is allowed, this would have implied the restriction, not to use any nested SELECT statements. Hence the execute method fetches all data at once into data structures located in the frontend application. This has to be considered when selecting large amounts of data!
One of my tables will have 375,000 rows. Damn, Postgres is pretty much useless for my purposes as well. OK, back to plan B, I'll take a look at the latest MySQL beta (4.1.3-beta) - good, it looks like it has subselect support. Phew. OK, let's go look at the Perl DBI interface to it - hang on, what's this?
Note, that most attributes are valid only after a successful execute. An undef value will returned in that case. The most important exception is the mysql_use_result attribute: This forces the driver to use mysql_use_result rather than mysql_store_result. The former is faster and less memory consuming, but tends to block other processes. (That's why mysql_store_result is the default.)
Digging through the code and the MySQL docs reveals that by default DBD::mysql will also fetch the entire result set into memory. Aagh! At least I can turn it off, but having to set the mysql_use_result attribute for each and every query is going to be a right pain. I could patch the code to allow a per-database setting for the attribute, but that means I'm either going to have to fork my own DBD::mysql driver, or try to get the change accepted by the module maintainers. Ugh.
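The difference between the two fetching strategies can be sketched as follows. This is only an illustration, written in Python against its bundled sqlite3 module rather than Perl DBI and DBD::mysql, purely because it runs without any setup; the samples table, the function names and the batch size are all made up for the example. The buffered version pulls the whole result set into memory before touching it, which is what the drivers above do by default, while the streamed version holds at most one batch of rows at a time.

```python
import sqlite3


def make_demo_db(row_count):
    """Build an in-memory table with row_count rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO samples (id, value) VALUES (?, ?)",
        ((i, i * 2) for i in range(row_count)),
    )
    conn.commit()
    return conn


def total_buffered(conn):
    """Fetch every row into a list first: memory grows with the result set."""
    rows = conn.execute("SELECT value FROM samples").fetchall()
    return sum(value for (value,) in rows)


def total_streamed(conn, batch_size=1000):
    """Fetch in fixed-size batches: at most batch_size rows are held at once."""
    cursor = conn.execute("SELECT value FROM samples")
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break  # result set exhausted
        total += sum(value for (value,) in batch)
    return total
```

Both functions return the same answer; the only difference is how much of the result set sits in the client's memory at any moment, which is what matters for a table with hundreds of thousands of rows.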
As a long-time Oracle user I'm beginning to realise just how spoilt I have been; in that environment all this stuff 'just works'. Bearing in mind the type of issues that I've encountered in my explorations, I can only assume that most Open Source databases are used in fairly constrained circumstances - fetching the entire result set in one huge wodge doesn't seem particularly scalable, and subselects are a pretty widely used SQL feature that I'd expect to be widely supported. I'm guessing, but it seems to me that mostly they must be used for OLTP-style applications (e.g. websites, bug databases etc) where you are only fetching a small number of rows across a small number of joined tables, and generally via an index. I can't see how they could be suitable for the sort of heavy lifting that Oracle (or any of the other commercial RDBMSs) are often used for.
I'm putting this here in the hope that it will help someone else once it gets indexed by Google, as I certainly couldn't find anything helpful myself. I've been trying to build and install MySQL 4.0.20 on Solaris 10 x86. It was building OK, but when I did make install I got the following error:
    Making install in extra
    make: Fatal error: Don't know how to make target `../mysys/libmysys.a'
    Current working directory /home1/tonic/infrastructure/bld/mysql-4.0.20/extra
    *** Error code 1
    make: Fatal error: Command failed for target `install-recursive'
The cause of the error was simple to fix but distinctly non-obvious to find. I was setting the INSTALL environment variable to point to the install-sh script that comes as part of MySQL to do the install (the Solaris install is incompatible with the GNU one), and that was the cause of the problem - the install-sh that ships as part of MySQL won't actually install it. So much for testing! I grabbed and built a copy of the GNU install utility from here, used that and it then installed fine.
Unfortunately even once built and installed, MySQL fails its test suite - the 'union' test fails because the found_rows() SQL function returns a large negative integer instead of the correct positive integer number of rows. I'm pretty certain it is an unsigned/signed 32-bit integer overflow problem as the value returned is (-2^32) + <correct row count>, but tracking it down could be tricky - it's probably the result of some sort of disagreement between gcc and our compiler.
This is all supposed to 'just work', sigh.
I've raised bugs for both of these:
Turns out the found_rows() issue is actually a bug in the Forte compiler optimizer on x86 - it's calculating an incorrect value when doing mixed 32/64 bit unsigned arithmetic - sigh. I've put some workarounds in the MySQL bug database and raised a Sun bug - the bugid is 5077233 for those of you with SunSolve access.
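As a rough illustration of how a value of that shape can arise (an assumption for the sake of the example, not a reconstruction of what the optimizer actually emitted): if the low 32 bits of a 64-bit count are right but the high 32 bits end up as all ones, reading the result as a signed 64-bit integer gives exactly (-2^32) + <correct row count>. The function names below are hypothetical.

```python
MASK_32 = 0xFFFFFFFF
MASK_64 = 0xFFFFFFFFFFFFFFFF


def as_signed_64(raw):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    raw &= MASK_64
    return raw - (1 << 64) if raw >= (1 << 63) else raw


def corrupt_high_word(row_count):
    """Hypothetical failure mode: low 32 bits correct, high 32 bits all ones."""
    return (MASK_32 << 32) | (row_count & MASK_32)


def observed_value(row_count):
    """What a client would see if the corrupted pattern were read as signed."""
    return as_signed_64(corrupt_high_word(row_count))
```

For any row count that fits in 32 bits, observed_value(n) equals n - 2**32, which matches the large negative number the 'union' test reported.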
I went down to Watford yesterday to meet up with Danese Cooper who was giving talks on both Open Source within Sun and Blogging. I didn't fancy driving the 7-8 hour round trip, so I decided to go by train. The first shock was the price - 187 Pounds (for my colonial colleagues that's 350 US Dollars). The Virgin Trains service was over half an hour late leaving, three-quarters of an hour late arriving, the buffet bar had no hot food or drinks and there was only one working toilet on the entire eight-carriage train. The only plus was that the seat I managed to grab had a table with a power socket so I could use my laptop on the way down (and on that front does anyone know the correct XOrg HorizSync and VertRefresh values for a Tecra M3?). I was complaining to the staff on the train about the toilet situation, and they said that it was 'normal' and that there wasn't any point in either them or me complaining as we would both be equally ignored by Virgin Trains' management. Virgin Trains often get panned in the UK Press, and I can fully understand why - boy I'm glad I'm not a shareholder.
Before listening to Danese I got to meet and chat with Dave Edmondson which I thoroughly enjoyed - Dave runs the PlanetSun Sun weblog aggregator, and like me works in Solaris Software (although a different part - at least until the next reorg ;-) Like many people in Sun, I know lots of people within the company who I've never physically met - for example when I was on the Solaris 10 cteam I worked with the rest of the group on a daily basis for 9 months before I actually met any of them in the flesh. As I said, this isn't unusual within Sun, in fact it has pretty much become the norm.
I also got to meet Danese - I'd heard her talking before at OSCON 2001, but I snuck away without introducing myself. She talked about Open Source at Sun, and the different business models that the various OSS companies were trying out. This was interesting for me as I'm working on the project to Open Source Solaris at the moment, and listening to her gave me a different perspective on some of the issues, which I found useful and thought-provoking. She also talked about blogging within Sun - Danese (in best American tradition ;-) has been doing a whistlestop tour of Europe evangelising blogging within Sun. She mentioned that she had nearly flamed me over my requiem for Open Source, but I still maintain my position. Danese told a little anecdote during her talk yesterday - she has a friend who teaches people in the third world how to farm sustainably even though he believes the planet has passed the point of no return and is doomed - he's trying to delay what he believes is the inevitable catastrophe. Perhaps I'm in the same position with respect to Open Source ;-)
I occasionally want to visit a website (e.g. The New York Times) that requires registration, and I object to having to give out my personal details and email address - I already get quite enough spam. I've come across the following services that can help out:
- bugmenot provides a pool of pre-registered usernames and passwords to commonly accessed websites. If you are feeling altruistic you can add your own for others to use.
- spamgourmet provides disposable email addresses - the first 3 emails sent to the disposable address are forwarded to your real email address, subsequent ones are discarded.
- mailinator provides another source of disposable email addresses, but with this site you don't even have to register first, you just make up an address and then go to the website to view the mails.
I'm part of a local Samba band, and now summer is here (hah!) we are in the thick of our gig season. We rely on performance fees collected over the summer to keep us going over the winter months. Yesterday we played at St Swithun's Community Centre in Wakefield. The weather was pretty poor, but at one point we had a group of 8-10 year old majorettes dancing round and round us as we played, which was kinda nice.
Today we played at Tameside Canal Festival at Portland Basin in Ashton. This event happens each year and is in aid of Willow Wood Hospice which cares for terminally ill people in the area. Just as we ended our first set it started to rain, so we played the second set inside the marquee. Danny, who dances with us on occasion, was there, and he got all the kids (and a few adults!) up and dancing round the marquee in a huge conga. To top it off, a bagpiper in full highland regalia who was also playing at the festival came in and started jamming with us on our last number. Mixing Samba and bagpipes is not entirely unknown - there is a band called MacUmba who specialise in Brazilian/Scottish fusion. I thought we played pretty damn well, and when we finished to cheers of 'More! More! Encore!' it topped off what was a really enjoyable gig. As soon as we finished James and I had to dash off to get him to his swimming test, so I'll find out if we managed to attract any new members at the next practice on Tuesday.
I took these photos a few weeks ago - we'd taken the kids to Lyme Park, a stately home near where we live, for an evening stroll and I took these on the way back from the Cage (a medieval hunting lodge in the grounds) to the house itself.