For our new startup, we were looking for a good database in which we could store schema-less data for different entities. Last year I started reading the book MongoDB: The Definitive Guide, but I stopped because it didn’t sound very interesting. One year later, with a more solid knowledge of database systems, it sounded cool.

First of all, I read Bret Taylor’s article How FriendFeed uses MySQL to store schema-less data. He stores entities as schema-less, JSON-like objects (in FriendFeed’s case, zlib-compressed pickled Python dictionaries) in a MySQL column and maintains indexes in separate tables that refer back to those objects.

Although it is a brilliant technique, it has a few drawbacks. First, indexing: if you need to search on a field, every write has to update both the index table and the table that holds the JSON, which roughly doubles the query overhead of updates. Second, searching on unindexed fields is not practical at all. Third, it is not possible to return only specific fields from the database. You have to fetch the whole entity and process it on the client, which can add network transfer overhead.
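To make the first and third drawbacks concrete, here is a minimal sketch of the idea using Python’s built-in sqlite3 module instead of MySQL. The table names (entities, index_user_id), the user_id field and the helper functions are all made up for illustration, and plain JSON stands in for whatever serialization you would really use; this is not FriendFeed’s actual code.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory database with one entity table and one index table."""
    conn = sqlite3.connect(":memory:")
    # The entity itself is an opaque JSON string; the database knows nothing about its fields.
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # A hand-maintained index table (hypothetical name) for the one field we want to search on.
    conn.execute(
        "CREATE TABLE index_user_id ("
        "user_id TEXT NOT NULL, entity_id TEXT NOT NULL UNIQUE)"
    )
    conn.execute("CREATE INDEX idx_user_id ON index_user_id (user_id)")
    return conn


def save_entity(conn, entity):
    """Insert or update an entity; every write touches both tables."""
    with conn:  # one transaction covering the entity row and its index row
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )
        conn.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity["id"],))
        if "user_id" in entity:
            conn.execute(
                "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (entity["user_id"], entity["id"]),
            )


def find_by_user(conn, user_id):
    """Look up entity ids in the index table, then load the whole entities."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id AS i "
        "JOIN entities AS e ON e.id = i.entity_id "
        "WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    )
    # The full JSON body always comes back; picking single fields happens client-side.
    return [json.loads(body) for (body,) in rows]
```

Note that save_entity issues two or three statements per write, and that find_by_user can only hand back complete entities, so even if you want just a title you still transfer and parse the whole body.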

I knew that document databases weren’t so great when FriendFeed started to design their system in 2007, but today there are many good document databases out there. Therefore, I decided to give MongoDB a chance, and in fact, it amazed me. Here are a few of the impressions I got after reading the book:

  • Easy to install, almost no configuration.

  • Really easy to insert/query/update. SQL feels like a complete mess after seeing JSON-style documents and queries.

  • By default, an update query affects only a single document (the first match), in contrast to SQL, where an update without a “where” clause changes every row in the table and may cause disasters. However, you can always ask an update query to modify all the matching documents (the multi option).

  • Upsert queries exist: insert if the document does not exist, update if it does. MySQL has an equivalent, but it is not as practical.

  • Embedding documents is such a good idea compared to creating multiple tables in SQL, if one-to-many relationships dominate your data. Fields of embedded documents can be indexed.

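  • Query cursors are cool. You can skip 100 records and then get 5 records, and then skip another 100 and get another 5. Skipped records won’t be transferred over the network (although the server still has to walk past them, so large skips get slow). Cursors are not isolated snapshots, so documents inserted while you iterate may show up in your results, and tailable cursors on capped collections keep waiting for new documents. Don’t forget to close cursors.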

  • Indexes are cool in MongoDB. You can create ascending/descending indexes on several fields, and a compound index is used when the queried fields are a prefix of the indexed fields. Unique indexes exist.

  • Indexes are easy to create and remove, either programmatically or from the shell.

  • The Java driver is their first and most actively developed driver. Because of the nature of Java, it doesn’t look as good as the Scala DSLs and the Ruby driver, but it does the job. It has built-in connection pooling and is thread-safe.

  • Backups are so easy. Just back up the data files after stopping the database. mongodump can dump the database without stopping the server, but it slows the server down during the backup.

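  • Primary keys exist in all inserted documents by default. MongoDB adds an _id field to every inserted document that doesn’t already have one. By default it is a 12-byte ObjectId, usually shown as 24 hexadecimal characters, and the values are roughly in ascending order of creation. You can find the creation time of a document from the “_id” field, because its first four bytes are a timestamp, which is cool.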

  • If you don’t need a lot of many-to-many relationships, you can embed documents easily. If you do, DBRef is a good way to reference documents in other collections (tables).

  • MapReduce queries are not easy to learn and write. However, once you understand the concept well, they are very useful, mostly in background jobs that aggregate data from your database.

  • Scales very easily. Replication and sharding are easier than in MySQL (at least it seemed so). Sharding still does not seem like a piece of cake, however. I probably won’t need either for a long time.

  • Running stored JavaScript on the database looks cool, but I still could not find a use case, and I think it is not well supported by the language drivers.

  • The database is licensed under the GNU AGPL and the official drivers under the Apache License, so both the database and the drivers are open source.

  • foursquare uses MongoDB, which is cool.

  • It has a good “POJO (Java objects) to Mongo documents” mapping tool called Morphia, developed by developers of MongoDB. Although I have a few concerns about it, I will probably use it because it is pretty handy. It also has a Play! framework plugin, which is very useful for me.
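As a small illustration of the “_id” point in the list above: a default ObjectId is 12 bytes, and its first 4 bytes are the creation time in seconds since the Unix epoch (big-endian). The sketch below decodes that by hand in plain Python, without any MongoDB driver. The function name object_id_timestamp is my own, and the sample id is just an example value.

```python
from datetime import datetime, timezone


def object_id_timestamp(object_id_hex):
    """Return the creation time embedded in a 24-character hex ObjectId string."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is written as 24 hexadecimal characters")
    raw = bytes.fromhex(object_id_hex)  # raises ValueError on non-hex input
    # The first 4 bytes are a big-endian count of seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(object_id_timestamp("507f1f77bcf86cd799439011").isoformat())
```

The drivers usually expose the same information directly on their ObjectId type, so in real code you would normally use that instead of parsing the string yourself.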