One of the most significant differences between the new generation of non-relational (also known as NoSQL) databases and the traditional RDBMS is the way in which consistency of data is handled. In a traditional RDBMS, all users see a consistent view of the data. Once a user commits a transaction, all subsequent queries will reflect that transaction, and no one will see partial results of a transaction. RDBMS transactions are typically described as “ACID” transactions. That is, they are:

- Atomic: the transaction is indivisible – either all of its statements succeed or none of them do.
- Consistent: the database moves from one consistent state to another.
- Isolated: the transaction’s partial results are invisible to other concurrent transactions.
- Durable: once committed, the transaction’s results survive any subsequent failure.
As databases become distributed across multiple hosts, maintaining ACID consistency becomes increasingly difficult. In a transaction that spans multiple independent databases, complex two-phase commit protocols must be employed. In the case of a truly clustered distributed database even more complex protocols are required, since the state of data in memory and the state of data in various transaction logs and data files must be maintained in a consistent state (cache fusion in Oracle RAC for instance).
In 2000, Eric Brewer outlined the CAP (AKA Brewer’s) Theorem. Simplistically, the CAP theorem says that a distributed database system can have at most two of the following three characteristics:

- Consistency: every node sees the same data at the same time.
- Availability: every request receives a response indicating success or failure.
- Partition tolerance: the system continues to operate even when network failures partition the nodes from one another.
Interpretation and implementations of CAP theorem vary, but most of the NoSQL database system architectures favour partition tolerance and availability over strong consistency.
A compromise between strong consistency and weak (no guarantees) consistency is eventual consistency.
The core of the eventual consistency concept is that although the database may have some inconsistencies at a point in time, it will eventually become consistent should all updates cease. That is, inconsistencies are transitory: eventually all nodes will receive the latest consistent updates.
BASE – Basically Available Soft-state Eventually consistent is an acronym used to contrast this approach with the RDBMS ACID transactions described above.
Not all implementations of eventual consistency are equal. In particular, an eventually consistent database may also elect to provide the following:

- Read-your-own-writes consistency: a session always sees its own updates, even if other sessions do not yet.
- Monotonic read consistency: once a session has seen a particular version of a data item, it never subsequently sees an older version.
NRW notation describes at a high level how a distributed database will trade off consistency, read performance and write performance. NRW stands for:

- N: the number of copies of each data item that the database maintains.
- R: the number of copies that a read must access before returning a result.
- W: the number of copies that must acknowledge a write before the write completes.
When N=W then the database will always write every copy before returning control to the client – this is more or less what traditional databases do when implementing synchronous replication. If you are more concerned about write performance than read performance, then you can set W=1, R=N. Then each read must access all copies to determine which is correct, but each write only has to touch a single copy of the data. Most NoSQL databases use N>W>1: more than one write must complete, but not all nodes need to be updated immediately. You can increase the level of consistency in roughly three stages:
1. If R=1, then the database will accept whatever value it reads first. This might be out of date if not all updates have propagated through the system.
2. If R>1, then the database will read more than one copy and pick the most recent (or otherwise most “correct”) value.
3. If W+R>N, then a read will always retrieve the latest value, although it may be mixed in with “older” values.
In other words, the number of copies you write and the number of copies you read is high enough to guarantee that you’ll always have at least one copy of the latest version in your read set. This is sometimes referred to as quorum assembly.
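The quorum arithmetic above can be sketched in a few lines of Python. The in-memory replica list, version numbers, and function names here are all hypothetical – the point is only to illustrate why W+R>N guarantees that the read set overlaps the latest write set:

```python
N = 3  # total copies of each data item
W = 2  # copies that must acknowledge a write
R = 2  # copies consulted on a read; W + R > N guarantees overlap

# Each replica holds a (version, value) pair; all start empty and stale.
replicas = [(0, None)] * N

def write(value, version):
    # Only the last W replicas acknowledge the write; the first N-W lag behind.
    for i in range(N - W, N):
        replicas[i] = (version, value)

def read():
    # Read a *different* subset: the first R replicas. Because W + R > N,
    # at least one replica in the read set must hold the latest write.
    return max(replicas[:R], key=lambda vv: vv[0])[1]

write("v1", version=1)
write("v2", version=2)
print(read())  # "v2" - the highest version in the read set wins
```

Even though the read set and write set were chosen from opposite ends of the replica list, they must share at least one member, so picking the highest version always yields the latest value.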
NoSQL databases generally try hard to be as consistent as possible, even when configured for weaker consistency. For instance, the read repair algorithm is often implemented to improve consistency when R=1. Although the application does not wait for all the copies of a data item to be read, the database will read all known copies in the background after responding to the initial request. If the application asks for the data item again, it will therefore see the latest version.
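The read repair idea can be illustrated with a toy sketch (this is assumed behaviour for illustration, not any particular product’s implementation): answer immediately from one copy, then reconcile stale copies so that a subsequent read sees the latest version:

```python
# (version, value) per copy - replica 0 is stale
replicas = [(1, "old"), (2, "new"), (2, "new")]

def read_with_repair():
    # R=1: respond immediately with the first copy, stale or not.
    version, value = replicas[0]
    # "Background" step: find the newest version across all known copies...
    latest = max(replicas, key=lambda vv: vv[0])
    # ...and overwrite any stale copies (the repair).
    for i, copy in enumerate(replicas):
        if copy[0] < latest[0]:
            replicas[i] = latest
    return value

print(read_with_repair())  # "old" - the first read may return stale data
print(read_with_repair())  # "new" - the repair has already propagated
```

In a real system the repair happens asynchronously after the response is sent; it is inlined here only to keep the sketch short.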
NoSQL databases can seem simplistic in some respects, but there are some really clever algorithms at work behind the scenes. For example, the vector clock algorithm can be used to ensure that updates are processed in order (monotonic consistency).
With vector clocks, each node participating in the cluster maintains a change number (or event count) similar to the System Change Number used in some RDBMSs. The “vector” is a list including the current node’s change number as well as the change numbers that have been received from other nodes. When an update is transmitted, the vector is included with the update and the receiving node compares that vector with other vectors that have been received to determine if updates are being received out of sequence. Out of sequence updates can be held until the preceding updates appear.
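The comparison step described above can be sketched as follows (an illustrative fragment, not tied to any particular database). Each vector is modelled as a map from node name to change number; two vectors are ordered only if one dominates the other in every position, otherwise the updates are concurrent and must be held or reconciled:

```python
def compare(a, b):
    """Compare two vector clocks (dicts of node -> change number).

    Returns "before", "after", "equal", or "concurrent".
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a's update precedes b's
    if b_le_a:
        return "after"       # a's update follows b's
    return "concurrent"      # out-of-sequence: neither dominates

print(compare({"n1": 1}, {"n1": 2}))                     # before
print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2}))   # concurrent
```

A "concurrent" result is the signal that an update has arrived out of sequence, so the receiving node can hold it until the preceding updates appear.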