commonly called "NoSQL" systems. Horizontal scaling allows
dozens or hundreds of machines to operate as a single database
system, with performance improving approximately linearly with the
number of machines. This is interesting because traditional
relational database systems have failed to scale well when their data
is distributed over many servers (with the exception of
read-mostly data warehousing). I have been studying scalable
datastores, and have written a paper comparing them:
This paper has now been published in ACM
SIGMOD Record. In the paper, I discuss scalable data
stores and categorize them into four groups:
- Key-value stores, including Voldemort, Riak, Redis, and Membase.
- Document stores, including CouchDB, MongoDB, Terrastore, and SimpleDB.
- Extensible record stores, including HBase, HyperTable, and Cassandra.
- Scalable RDBMSs, including Cluster, VoltDB, Clustrix, and ScaleDB.
I also wrote a paper summarizing the most important elements of
scalable database systems (in my opinion). This paper is
unpublished, but can be viewed here:
Any input on these papers is welcome. You can contact me at
rick(at)cattell.net. I will post corrections and revisions on
this web site (cattell.net/datastores).
If you'd like further reading on scalable SQL and NoSQL datastores,
you can click on the links above to learn more about specific
systems, or the links below for some other general references that I
You will find lots of claims about the performance and scalability
of systems out there, but few apples-to-apples comparisons. In
my opinion, the best scalability benchmark today is the Yahoo! Cloud Serving
Benchmark (YCSB). With the help of Roberto Zicari, I
interviewed two authors of the Yahoo benchmark paper:
is frequently updated with posts, videos, and articles on NoSQL
topics and systems.
has lots of good articles, upcoming events, and a complete list
of all the systems.
has regular postings, jobs, upcoming events, and links.
has good articles on scalability of databases and applications.
Anglade's NoSQL Tapes include great interviews of leading
- NoSQLWeekly.com offers
weekly newsletters on NoSQL systems from Rahul Chaudhary.
Sankar's blog had lots of good NoSQL and cloud computing
posts, along with web references.
Ellis has broad knowledge of NoSQL systems and writes good
posts on his blog. There are some interesting discussions
in the responses as well.
- Schooner has a
good blog on an important trend I don't cover in my paper:
effectively using solid state disks as a third level in the
has various papers and links on NoSQL systems as well as other
I'm trying to encourage others to run the Yahoo (YCSB) benchmark as
well. This will hopefully reduce some of the scalability
hype and confusion out there.
I've also recently written a paper with Mike Stonebraker
weighing the important factors in making a datastore scalable:
This paper has some discussion of SQL vs NoSQL scalability. A
revision of the paper has been published in Communications of the ACM (June