Reviews

Windows Azure Storage

What makes this paper special is that it is one of the few published papers about a production cloud blobstore. The 800-pound gorilla in this space is Amazon S3, but I find Windows Azure Storage (WAS) the more interesting system: it provides strong consistency and additional features like append, and it serves as the backend not just for WAS Blobs but also for WAS Tables (structured data access) and WAS Queues (message delivery). It also occupies a different design point than hash-partitioned blobstores like Swift and Rados.
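A rough sketch of that design-point difference. Hash-partitioned stores like Swift and Rados place each object by hashing its name. WAS instead splits its sorted namespace into contiguous key ranges (RangePartitions in the paper), each served by a partition server. The function and class names below are hypothetical, and the boundaries are made up for illustration. This is a simplification, not WAS's or Swift's actual placement code.

```python
import bisect
import hashlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash placement (simplified): hash the key and take it mod N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class RangePartitionMap:
    """Range placement: each partition owns a contiguous slice of sorted keys.

    boundaries[i] is the first key owned by partition i + 1, so there are
    len(boundaries) + 1 partitions in total. Partition 0 owns every key
    that sorts before boundaries[0].
    """

    def __init__(self, boundaries: list[str]):
        self.boundaries = sorted(boundaries)

    def lookup(self, key: str) -> int:
        # Number of boundaries <= key is exactly the owning partition index.
        return bisect.bisect_right(self.boundaries, key)

    def scan_partitions(self, start: str, end: str) -> range:
        """Partitions a range scan over [start, end] has to visit."""
        return range(self.lookup(start), self.lookup(end) + 1)
```

With `RangePartitionMap(["g", "n", "t"])`, keys that sort together land in the same partition, so a prefix or range scan touches only a few partitions, and a hot range can be split by adding a boundary. Hash placement spreads load evenly without any of that bookkeeping, but a range scan has to visit every partition.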

This paper, “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency” by Calder et al., was published at SOSP ’11.


Posted by andrew in Reviews, 0 comments

Mesos, Omega, Borg: A Survey

Google recently unveiled one of their crown jewels of system infrastructure: Borg, their cluster scheduler. This prompted me to re-read the Mesos and Omega papers, which deal with the same topic, and I thought it'd be interesting to compare and contrast these systems. Mesos gets credit for the groundbreaking idea of two-level scheduling, Omega improved upon this with an analogy from databases, and Borg can sort of be seen as the culmination of all these ideas.
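The "analogy from databases" in Omega is optimistic concurrency control: each scheduler plans against its own copy of the shared cluster state, then tries to commit its claims atomically and retries if another scheduler got there first. Below is a minimal sketch under simplifying assumptions: all names are hypothetical, resources are just CPU counts, and conflicts are detected with a per-machine version counter, which is one simple choice rather than a description of Omega's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CellState:
    """Shared cluster state: free CPUs per machine plus a per-machine version."""
    free_cpus: dict[str, int]
    versions: dict[str, int] = field(default_factory=dict)

    def snapshot(self) -> "CellState":
        # Each scheduler works on a private copy of the state.
        return CellState(dict(self.free_cpus), dict(self.versions))

    def commit(self, snap: "CellState", claims: dict[str, int]) -> bool:
        """Apply all claims iff no claimed machine changed since `snap` was taken."""
        for machine in claims:
            if self.versions.get(machine, 0) != snap.versions.get(machine, 0):
                return False  # conflict: another scheduler modified this machine
        for machine, cpus in claims.items():
            self.free_cpus[machine] -= cpus
            self.versions[machine] = self.versions.get(machine, 0) + 1
        return True


def schedule(cell: CellState, cpus_needed: int, max_retries: int = 3) -> Optional[str]:
    """Optimistic scheduler: pick a machine from a snapshot, then try to commit."""
    for _ in range(max_retries):
        snap = cell.snapshot()
        machine = next(
            (m for m, free in snap.free_cpus.items() if free >= cpus_needed), None
        )
        if machine is None:
            return None  # nothing fits
        if cell.commit(snap, {machine: cpus_needed}):
            return machine
        # Commit lost a race; loop around and replan against fresh state.
    return None
```

If two schedulers snapshot the same state and both claim machine `m1`, the first commit wins and bumps `m1`'s version, so the second commit is rejected and that scheduler has to replan. Mesos' two-level model avoids this kind of conflict by offering resources to one framework at a time, at the cost of each framework seeing only part of the cluster.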


Posted by andrew in Reviews, 0 comments

Paper review: Facebook f4

It’s been a while since I did one of these! I did a previous review of Facebook Haystack, which was designed as an online blob storage system. f4 is a sister system that works in conjunction with Haystack, and is intended for storage of warm rather than hot blobs. As is usual for Facebook, they came up with a system that is both eminently practical and tailored for their exact use case.

This paper, “f4: Facebook’s Warm BLOB Storage System” by Muralidhar et al., was published at OSDI ’14.


Posted by andrew in Reviews, 0 comments

MinuteSort with Flat Datacenter Storage

Microsoft Research recently crushed the world record for MinuteSort, sorting 1.4TB in a minute. This replaces the former record held by Yahoo’s 1406-node Hadoop cluster in the Daytona MinuteSort category, and means that Hadoop no longer holds any world sorting record titles.

I found MSR’s approach of “MinuteSort with Flat Datacenter Storage” (FDS) to be intriguing. Most of the prior sort winners (e.g. Hadoop, TritonSort) try to colocate computation and data, since you normally pay a throughput (and thus latency) cost to go over the network. FDS separates compute from storage, heavily provisioning a full-bisection-bandwidth network to match the I/O rate of the hard disks on storage nodes.
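A back-of-the-envelope version of that balancing act. Sorting 1.4TB in 60 seconds means moving data at roughly 23 GB/s in aggregate just to touch each byte once, and in an FDS-style design each storage node's network link has to keep up with its disks. The helper names, disk count, and per-disk throughput below are hypothetical round numbers for illustration, not figures from the paper.

```python
import math


def aggregate_gb_per_s(data_tb: float, seconds: float = 60.0) -> float:
    """GB/s needed to move `data_tb` terabytes (decimal units) once in `seconds`."""
    return data_tb * 1000.0 / seconds


def nic_gbit_per_s_needed(disks_per_node: int, disk_mb_per_s: float) -> float:
    """Network bandwidth (Gbit/s) that lets one node's NIC match its disks."""
    return disks_per_node * disk_mb_per_s * 8 / 1000.0


def storage_nodes_needed(
    aggregate: float, disks_per_node: int, disk_mb_per_s: float
) -> int:
    """Minimum storage nodes whose combined disk bandwidth covers `aggregate` GB/s."""
    per_node_gb_per_s = disks_per_node * disk_mb_per_s / 1000.0
    return math.ceil(aggregate / per_node_gb_per_s)
```

With hypothetical nodes of 10 disks at 100 MB/s each, every node needs about an 8 Gbit/s link to avoid bottlenecking on the network, and about 24 such nodes are needed just to stream 1.4TB once in a minute. A real sort reads and writes every byte, so the actual requirement is higher than this single-pass lower bound.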

I’m going to give a rundown of the paper, and then pull out salient points for Hadoop at the end.


Posted by andrew in Reviews, 0 comments

JVM Performance Tuning (notes)

A presentation by Attila Szegedi titled “Everything I Ever Learned about JVM Performance Tuning @twitter” has been floating around for a few months. I’ve restructured much of the content into a set of notes. This covers the basics of memory allocation and garbage collection in Java, the different garbage collectors available in HotSpot and how they can be tuned, and finally some anecdotes from Attila’s experiences at Twitter.

I’m still fuzzy on some things, so treat these notes as a best effort rather than ground truth. If more experienced people weigh in, I’ll fix things up. The very informative hour-long presentation is still highly recommended.


Posted by andrew in Reviews, 0 comments

Paper review: Facebook Haystack

This is a review of Facebook’s Haystack storage system, used to store the staggering number of photos uploaded to Facebook every day. Facebook Photos started out with an NFS appliance, but was forced to move to a custom solution for reasons of cost, scale, and performance. Haystack is an engineering solution that applies well-known techniques from GFS and log-structured filesystems to Facebook’s distributed, append-only, key-value blob workload. Its metadata management and CDN integration are somewhat novel.
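The core log-structured idea can be sketched as one large append-only volume file plus an in-memory index from key to (offset, size), so a read costs a single seek with no per-file filesystem metadata lookups. This is a toy under stated assumptions: the class name and the record header layout are hypothetical, and it omits deletes, checksums, and the other fields of Haystack's real needle format.

```python
import io
import struct
from typing import BinaryIO, Optional


class AppendOnlyVolume:
    """Toy log-structured blob store: records are only ever appended, and an
    in-memory index maps each key to the offset and size of its latest payload."""

    # Hypothetical record header: key (uint64) and payload size (uint32), big-endian.
    HEADER = struct.Struct(">QI")

    def __init__(self, f: BinaryIO):
        self.f = f
        self.index: dict[int, tuple[int, int]] = {}

    def put(self, key: int, data: bytes) -> None:
        self.f.seek(0, io.SEEK_END)
        self.f.write(self.HEADER.pack(key, len(data)))
        offset = self.f.tell()
        self.f.write(data)
        # A later write for the same key shadows the earlier record.
        self.index[key] = (offset, len(data))

    def get(self, key: int) -> Optional[bytes]:
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.f.seek(offset)  # one seek, one read
        return self.f.read(size)

    def rebuild_index(self) -> None:
        """Recover the in-memory index (e.g. after a restart) by scanning the log."""
        self.index.clear()
        self.f.seek(0)
        while True:
            header = self.f.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break  # end of log
            key, size = self.HEADER.unpack(header)
            self.index[key] = (self.f.tell(), size)
            self.f.seek(size, io.SEEK_CUR)  # skip over the payload
```

Usage: `vol = AppendOnlyVolume(io.BytesIO())`, then `vol.put(1, b"...")` and `vol.get(1)`. Overwrites never modify data in place; they append a new record and repoint the index, which is why such a store eventually needs compaction to reclaim space.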

The paper, “Finding a needle in Haystack: Facebook’s photo storage” by Beaver et al., was published at OSDI ’10.


Posted by andrew in Reviews, 0 comments