Sunday, May 9, 2010


My comments in response to nosql episode of Command Line podcast

There were few mistakes on perception of nosql databases; First of all, the advantage of nosql is not that it does something SQL databases "cannot do". It does distribution of data out of the box, that is, it is so simplified, ingrained in the product that you don't even think twice about them. But with SQL databases, sharding, distribution is an afterthought. Not that you cannot DO these with SQL databases, it's just that with nosql these tasks are SIMPLER. Included in the product from day one.

There are pedagocial issues at play here, which are almost as important as technological ones.

Same is true for basic CRUD operations. They are SIMPLER with nosql than they are with sql dbs. With Google Bigtable, I define Model classes in Python, send them over to the cloud, and I _have_ a database. Following through pointers, as in order.owner.address.street is very simple to do, and built-in, in contrast to SQL databases where you have to use something like Hibernate to achive the same result.

Plus, nosql makes you concious of sharding of data from day one; since joins are discouraged, you think distribution, and you have to think big. Sure, for small Web sites, small # of users you can use one database, and keep using joins, but you can also use one nosql shard, and use LESS complicated query (meaning no joins) and achieve same result.