Uncategorized —

Amazon reveals its distributed storage “Dynamo”

Amazon researchers are set to present a formal paper describing Dynamo, its …

Writing your own operating system can be anything from a summer hobby project to a full-fledged commercial endeavor taking many years to complete. One might think that with so many operating systems in existence today, that there is little or no need for new work on the subject. A company that would disagree is Amazon, who just published a paper (PDF) detailing a distributed storage project called Dynamo.

While Dynamo is not a complete operating system by itself, it is an intriguing solution to the problem of always-available, fault-tolerant online database storage. For the last year or so, Dynamo has been at the core of many of Amazon's web services, including the shopping cart, customer preferences, and the product catalog. It has also been used to power S3, Amazon's online web application storage service.

Dynamo runs on a cluster of hundreds of commodity PCs running Linux that are hooked up to an internal network. Because the system is not designed to be exposed to the Internet—according to Amazon CTO Werner Vogels, Dynamo is not intended to be offered as a public service—the designers say they don't need to focus on security issues, but can instead concentrate on performance and service availability. According to the authors of the paper, Amazon's customers "should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados."

The philosophy behind Dynamo is that service availability and responsiveness should always take priority, which leads to some interesting scenarios when nodes go down. The shopping cart service, for example, must always allow customers to add and remove items from their carts, even during network and server failures. To ensure this, Dynamo uses "optimistic replication" techniques, which copy data to many servers, ensuring that all replicas are updated to the proper values eventually, and using intelligent "conflict resolution" techniques when the data is inconsistent.

Amazon's platform architecture
Amazon's service architecture. Image courtesy Amazon.

The idea of a distributed operating system has been around in academia for a long time. Professor Andrew Tanenbaum, who once famously told Linus Torvalds that his operating system was "obsolete" before it was even completed, created a distributed OS called "Amoeba" back in 1985 that was intended to run on clusters of then-new 386 computers. The idea was to have all the functions of a standard operating system but abstract the operation (including program execution, memory access, and disk input and output) over as many PCs as possible, while still making the cluster appear as a single machine to the user. While the OS was a fascinating project, it never really addressed a commercial need and development stopped some time in 2001. Instead, development focus shifted over to more pedestrian ideas such as Beowulf-style computing clusters.

However, more recently large Internet companies such as Google have realized that writing their own infrastructure code was vital to staying ahead of the competition. Google operates its own gigantic farm of commodity PCs running Linux and a distributed "GoogleOS" that, among other things, has its own distributed file system called GFS. Amazon also wrote its own file system for Dynamo. To optimize performance, many of the smaller database transactions in Dynamo use the lightweight Berkeley Database (BDB), while others utilize MySQL.

The paper was written by Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels, and is scheduled to be presented this month at SOSP, the biennial ACM Symposium on Operating Systems Principles.

Channel Ars Technica