When I start designing a data-driven system, whether it is for an interview, home project, or work-related purpose, the database type is the last thing I think of. The wrapper design pattern gives that flexibility.
Sometimes, the answer is obvious as it could be Relational, Document, or a hybrid of both.
- In this post, I will not discuss performance and scaling.
- When it comes to text search, considering text-search servers such as Solr or ElasticSearch is the right choice.
Resolving the argument
data relationship and application code simplicity
The data relationship is an important factor for choosing a data model. The possible data relationships are:
- one-to-one: For example, each country id is mapped into a country name.
- one-to-many: A hierarchial data model. e.g. user record and its group of related info such as gender, age, school…
- many-to-many: When entity A has many children from entity B and vice-versa. A classical example for that is the Student-Class relationship. A student can enroll to many classes and any class may include many enrolled students.
- many-to-one: The opossite direction of one-to-many: e.g. All the users at cetain age.
The document database is more efficient to use for one-to-many relationships because one query will retrieve all the entity’s related data. However, reading partial data is not efficient as the whole document will be loaded each time (See Data locality below).
Use-case: If you want to use the document database to store linkedIn profiles, and you are interested in retrieving the available users names only, this function is less efficient in the document model.
With either many-to-many or many-to-one relationships, the Document model is less appealing as the complexity of the joins will move to the application code.
With document databases, the document is stored as a continuous json or xml while in relational databases, the data is shredded over several tables. Retrieving the data in a document model is more efficient and easier for processing. The advantage comes in handy only when the user need large parts of the document at the same time.
Updating a specific document key requires rewriting the whole document. This is not the case with relational databases.
Declarative vs imperative query
The query language of the document model is imperative. i.e. the application code implements a logic to filter the desired query by iterating the documents and filter the desired ones. MapReduce comes in handy to eliminate such boilerplate code. However, it is not efficient as using a declarative SQL query language.
Imperative querying is less efficient as it doesn’t enable parallelism easily: It is hard to asynchronously iterate over documents to filter the desired ones. While it is less easier to run SQL query over SQL engine where the later has the flexibility to optemize it.
The document model places no restriction on the schema (and the JSON mode in the relational databases as in DB2/PostgresDb). As schemaless as it may seem, the application code still assumes a basic structure over the document.
Changing an attribute name in the document model doesn’t require down-time as in relational database.
With relational database, an UPDATE+ALTER query must execute over the database which requires downtime. while with document databases, the user applies the new schema for each record.