In recent years, a number of data storage systems have been developed
with excellent horizontal scaling properties. Most are
commonly called "NoSQL" systems. Horizontal scaling allows
dozens or hundreds of machines to operate as a single database
system, with performance improving approximately linearly with the
number of machines. This is notable because traditional
relational database systems have failed to scale well when their data
is distributed over many servers (with the exception of
read-mostly data warehousing).
Scalable data stores can be categorized into four groups:
- Key-value stores, including Riak, Redis, Voldemort, Membase, and Dynamo.
- Document stores, including CouchDB, MongoDB, Terrastore, and SimpleDB.
- Extensible record stores, including BigTable, HBase, HyperTable, and Cassandra.
- Scalable RDBMSs, including MySQL Cluster, VoltDB, Clustrix, and ScaleDB.
Here is a paper I wrote comparing these systems:
Datastore Comparison
This paper was published in ACM
SIGMOD Record in 2010. Some of the system descriptions
are now out of date, so you should check the sources above or talk to me to learn
what has changed. I also wrote a paper on the eight most
important elements of scalable database systems. This paper
is unpublished, but can be viewed here:
Requirements for Scalability
And finally, I wrote a paper with Mike Stonebraker, weighing the
important factors in making a datastore scalable:
CACM paper
This paper has some discussion of SQL vs NoSQL scalability.
It was published in Communications
of the ACM (June 2011).
Any input on these papers or suggestions for this web page are
welcome. You can contact me at rick(at)cattell.net.
If you'd like further reading on scalable SQL and NoSQL datastores,
you can click on the links at the top of this page to learn more
about specific systems, or the links below for some other general
references that I like:
- ODBMS.org
has various papers and links on NoSQL systems as well as other
object stores.
- NoSQL.mypopescu.com
is frequently updated with posts, videos, and articles on NoSQL
topics and systems.
- NoSQL-Database.org
has lots of good articles, upcoming events, and a complete list
of all the systems.
- NoSQLDatabases.com
has regular postings, jobs, upcoming events, and links.
- HighScalability.com
has good articles on scalability of databases and applications.
- Tim Anglade's NoSQL Tapes include great interviews of leading
NoSQL players.
- NoSQLWeekly.com offers weekly newsletters on NoSQL systems from
Rahul Chaudhary.
- Krishna Sankar's blog had lots of good NoSQL and cloud computing
posts, along with web references.
- Jonathan Ellis has broad knowledge in NoSQL systems and has good
posts on his blog. There are some interesting discussions
in the responses as well.
You will find lots of claims about the performance and scalability
of systems out there, but few apples-to-apples comparisons. In
my opinion, the best scalability benchmark today is the Yahoo
Cloud Serving Benchmark (YCSB). With Roberto Zicari, I
interviewed two authors of the Yahoo benchmark paper:
YCSB Interview
I'm trying to encourage others to run the Yahoo (YCSB) benchmark as
well. This will hopefully reduce some of the scalability
hype and confusion out there.