MongoDB still /dev/null webscale

Although still popular with many devs, MongoDB is still a /dev/null webscale database. I’ve found a few recent articles that help better illustrate some of the issues with mongodb in your projects. These issues are in line with my personal experiences using Mongodb 2.0 and 2.2:

Take a look at these recent articles:
Broken by Design: MongoDB Fault Tolerance
When PR says No but Engineering says Yes

A few key quotes:

So, let’s imagine that you’re building an Instagram clone that you’ll eventually sell to Facebook for $1B, and see if we can accomplish that with MongoDB. Your site probably has a front-end web server where users upload data, like putting up new pictures or sending messages or whatever else. You can’t really drop this data on the ground — if you could lose data without consequences, you would just do precisely that all the time, skip having to configure MongoDB, and read all those dev blogs. And if you possess the least bit of engineering pride, you’ll want to do an honest stab at storing this data properly. So you need to store the data, shown in yellow, in the database in a fault-tolerant fashion, or else face a lot of angry users. Luckily for you, MongoDB can be configured to perform replication. Let’s assume that you indeed configured MongoDB correctly to perform such replication. So, here’s the $1 billion dollar question: what does it mean when MongoDB says that a write (aka insert) is complete?

A. When it has been written to all the replicas.

B. When it has been written to a majority of the replicas.

C. When it has been written to one of the replicas.

D. When it hasn’t even left the client.

The answer is none of the above. MongoDB v2.0 will consider a write to be complete, done, finito as soon as it has been buffered in the outgoing socket buffer of the client host. Read that sentence over again. It looks like this:

The data never left the front-end box, no packet was sent on the network, and yet MongoDB believes it to be safely committed for posterity. It swears it mailed the check last week, while it’s sitting right there in the outgoing mail pile in the living room.

Now, if machines crashed infrequently or never, this design choice might make sense. People with limited life experience can perhaps claim that failures, while they might “theoretically” happen, are too rare in practice to worry about. This would be incorrect. Hardware fails. More often than you’d think. System restarts are quite common when rolling out updates. Application crashes can occur even when using scripting languages, because underneath every type-safe language are a slew of C/C++ libraries. The software stack may need a restart every now and then to stem resource leaks. Bianca Schroeder’s careful empirical studies show that failures happen all the time and if you don’t believe academic studies, you can talk to your ops team, especially over a beer or five. Failures are common; it’d be silly to pretend they would never strike your system.

Since NoSQL systems depend fundamentally on sharding data across large numbers of hosts, they are particularly prone to failures. The more hosts are involved in housing the data, the greater the likelihood that the system as a whole will encounter a failure. This is why RAID storage systems, which spread data across multiple disks, have to spend some effort making sure that they’re as reliable as a single large expensive disk. Imagine if you had invented Arrays of Inexpensive Disks (AID) back in the day, but weren’t clever enough to invent the part involving the “R”, namely, redundantly striping the data for fault-tolerance. I hate to break it to you but, far from applauding you for getting pretty close to a great idea, everyone would just make fun of you mercilessly. So, we can’t just not replicate the data, not if we want to retain our engineering pride.


Version 2.2 isn’t much better…

In November 2012, a new version of MongoDB was released with a significant change. MongoDB drivers now internally set WriteConcern to 1 instead of 0 by default. While this is a change for the better, it is still not sufficient. To wit:

  • This change does not make MongoDB fault-tolerant. A single failure can still lead to data loss.
  • It used to be that a single client failure would lead to data loss. Now, a single server failure can lead to data loss for the reasons I outlined above.
  • MongoDB is now a lot slower compared to v2.0. On the industry-standard YCSB benchmark, MongoDB used to be competitive with Cassandra, as seen in theperformance measurements we did when benchmarking HyperDex. Ever since the change, MongoDB can no longer finish the entire benchmark suite in the time allotted.


And this older article, which is still relevant in many ways, especially the comments about the culture and development priorities of 10gen, maker of mongodb. I paste the article here in full, since it’s currently posted anonymously on pastebin, and may not exist for long:

Don’t use MongoDB

I’ve kept quiet for awhile for various political reasons, but I now
feel a kind of social responsibility to deter people from banking
their business on MongoDB.

Our team did serious load on MongoDB on a large (10s of millions
of users, high profile company) userbase, expecting, from early good
experiences, that the long-term scalability benefits touted by 10gen
would pan out. We were wrong, and this rant serves to deter you
from believing those benefits and making the same mistake
we did. If one person avoid the trap, it will have been
worth writing. Hopefully, many more do.

Note that, in our experiences with 10gen, they were nearly always
helpful and cordial, and often extremely so. But at the same
time, that cannot be reason alone to supress information about
the failings of their product.

Why this matters

Databases must be right, or as-right-as-possible, b/c database
mistakes are so much more severe than almost every other variation
of mistake. Not only does it have the largest impact on uptime,
performance, expense, and value (the inherit value of the data),
but data has *inertia*. Migrating TBs of data on-the-fly is
a massive undertaking compared to changing drcses or fixing the
average logic error in your code. Recovering TBs of data while
down, limited by what spindles can do for you, is a helpless

Databases are also complex systems that are effectively black
boxes to the end developer. By adopting a database system,
you place absolute trust in their ability to do the right thing
with your data to keep it consistent and available.

Why is MongoDB popular?

To be fair, it must be acknowledged that MongoDB is popular,
and that there are valid reasons for its popularity.

  • It is remarkably easy to get running
  • Schema-free models that map to JSON-like structures

have great appeal to developers (they fit our brains),
and a developer is almost always the individual who
makes the platform decisions when a project is in
its infancy

  • Maturity and robustness, track record, tested real-world

use cases, etc, are typically more important to sysadmin
types or operations specialists, who often inherit the
platform long after the initial decisions are made

  • Its single-system, low concurrency read performance benchmarks

are impressive, and for the inexperienced evaluator, this
is often The Most Important Thing

Now, if you’re writing a toy site, or a prototype, something
where developer productivity trumps all other considerations,
it basically doesn’t matter *what* you use. Use whatever
gets the job done.

But if you’re intending to really run a large scale system
on Mongo, one that a business might depend on, simply put:


Why not?

**1. MongoDB issues writes in unsafe ways *by default* in order to
win benchmarks**

If you don’t issue getLastError(), MongoDB doesn’t wait for any
confirmation from the database that the command was processed.
This introduces at least two classes of problems:

  • In a concurrent environment (connection pools, etc), you may

have a subsequent read fail after a write has “finished”;
there is no barrier condition to know at what point the
database will recognize a write commitment

  • Any unknown number of save operations can be dropped on the floor

due to queueing in various places, things outstanding in the TCP
buffer, etc, when your connection drops of the db were to be KILL’d or
segfault, hardware crash, you name it

**2. MongoDB can lose data in many startling ways**

Here is a list of ways we personally experienced records go missing:

1. They just disappeared sometimes. Cause unknown.
2. Recovery on corrupt database was not successful,
pre transaction log.
3. Replication between master and slave had *gaps* in the oplogs,
causing slaves to be missing records the master had. Yes,
there is no checksum, and yes, the replication status had the
slaves current
4. Replication just stops sometimes, without error. Monitor
your replication status!

**3. MongoDB requires a global write lock to issue any write**

Under a write-heavy load, this will kill you. If you run a blog,
you maybe don’t care b/c your R:W ratio is so high.

**4. MongoDB’s sharding doesn’t work that well under load**

Adding a shard under heavy load is a nightmare.
Mongo either moves chunks between shards so quickly it DOSes
the production traffic, or refuses to more chunks altogether.

This pretty much makes it a non-starter for high-traffic
sites with heavy write volume.

**5. mongos is unreliable**

The mongod/config server/mongos architecture is actually pretty
reasonable and clever. Unfortunately, mongos is complete
garbage. Under load, it crashed anywhere from every few hours
to every few days. Restart supervision didn’t always help b/c
sometimes it would throw some assertion that would bail out a
critical thread, but the process would stay running. Double

It got so bad the only usable way we found to run mongos was
to run haproxy in front of dozens of mongos instances, and
to have a job that slowly rotated through them and killed them
to keep fresh/live ones in the pool. No joke.

**6. MongoDB actually once deleted the entire dataset**

MongoDB, 1.6, in replica set configuration, would sometimes
determine the wrong node (often an empty node) was the freshest
copy of the data available. It would then DELETE ALL THE DATA
ON THE REPLICA (which may have been the 700GB of good data)
AND REPLICATE THE EMPTY SET. The database should never never
never do this. Faced with a situation like that, the database
should throw an error and make the admin disambiguate by
wiping/resetting data, or forcing the correct configuration.
NEVER DELETE ALL THE DATA. (This was a bad day.)

They fixed this in 1.8, thank god.

**7. Things were shipped that should have never been shipped**

Things with known, embarrassing bugs that could cause data
problems were in “stable” releases–and often we weren’t told
about these issues until after they bit us, and then only b/c
we had a super duper crazy platinum support contract with 10gen.

The response was to send up a hot patch and that they were
calling an RC internally, and then run that on our data.

**8. Replication was lackluster on busy servers**

Replication would often, again, either DOS the master, or
replicate so slowly that it would take far too long and
the oplog would be exhausted (even with a 50G oplog).

We had a busy, large dataset that we simply could
not replicate b/c of this dynamic. It was a harrowing month
or two of finger crossing before we got it onto a different
database system.

**But, the real problem:**

You might object, my information is out of date; they’ve
fixed these problems or intend to fix them in the next version;
problem X can be mitigated by optional practice Y.

Unfortunately, it doesn’t matter.

The real problem is that so many of these problems existed
in the first place.

Database developers must be held to a higher standard than
your average developer. Namely, your priority list should
typically be something like:

1. Don’t lose data, be very deterministic with data
2. Employ practices to stay available
3. Multi-node scalability
4. Minimize latency at 99% and 95%
5. Raw req/s per resource

10gen’s order seems to be, #5, then everything else in some
order. #1 ain’t in the top 3.

These failings, and the implied priorities of the company,
indicate a basic cultural problem, irrespective of whatever
problems exist in any single release: a lack of the requisite
discipline to design database systems businesses should bet on.

Please take this warning seriously.

About these ads