commonly called "NoSQL" systems. Horizontal scaling allows
dozens or hundreds of machines to operate as a single database
system, with performance improving approximately linearly with the
number of machines. This is interesting because traditional
relational database systems have failed to scale well when their
data is distributed over many servers (with the exception of
read-mostly data warehousing).
Scalable data stores can be categorized into four groups:
- Key-value stores, including Riak and Redis.
- Document stores, including CouchDB, MongoDB, and Terrastore.
- Extensible record stores, including HBase, HyperTable, and Cassandra.
- Scalable RDBMSs, including Cluster, VoltDB, Clustrix, and ScaleDB.
Here is a paper I wrote comparing these systems:
This paper was published in ACM
SIGMOD Record in 2010. Some of the system descriptions
are now out of date, so you should check the sources above or talk to me to learn
what has changed. I also wrote a paper on the 8 most
important elements of scalable database systems. This paper
is unpublished, but can be viewed here:
And finally, I wrote a paper with Mike Stonebraker, weighing the
important factors in making a datastore scalable:
This paper has some discussion of SQL vs NoSQL scalability.
It was published in Communications
of the ACM (June 2011).
Input on these papers and suggestions for this web page are
welcome. You can contact me at rick(at)cattell.net.
If you'd like further reading on scalable SQL and NoSQL datastores,
you can click on the links at the top of this page to learn more
about specific systems, or the links below for some other general
references that I like:
You will find lots of claims about the performance and scalability
of systems out there, but few apples-to-apples comparisons. In
my opinion, the best scalability benchmark today is the Yahoo
Cloud Serving Benchmark. With Roberto Zicari, I
interviewed two authors of the Yahoo benchmark paper:
has various papers and links on NoSQL systems as well as other
is frequently updated with posts, videos, and articles on NoSQL
topics and systems.
has lots of good articles, upcoming events, and a complete list
of all the systems.
has regular postings, jobs, upcoming events, and links.
has good articles on scalability of databases and applications.
Anglade's NoSQL Tapes include great interviews of leading
- NoSQLWeekly.com offers
weekly newsletters on NoSQL systems from Rahul Chaudhary.
Sankar's blog had lots of good NoSQL and cloud computing
posts, along with web references.
Ellis has broad knowledge in NoSQL systems and has good
posts on his blog. There are some interesting discussions
in the responses as well.
I'm trying to encourage others to run the Yahoo (YCSB) benchmark as
well. This will hopefully reduce some of the scalability
hype and confusion out there.