Paper review: Facebook f4

October 29, 2014

It's been a while since I did one of these! I did a previous review of Facebook Haystack, which was designed as an online blob storage system. f4 is a sister system that works in conjunction with Haystack, and is intended for storage of warm rather than hot blobs. As is usual for Facebook, they came up with a system that is both eminently practical and tailored for their exact use case.

This paper, "f4: Facebook's Warm BLOB Storage System" by Muralidhar et al., was published at OSDI '14.

In-memory Caching in HDFS: Lower latency, same great taste

May 3, 2014

My coworker Colin McCabe and I recently gave a talk at Hadoop Summit Amsterdam titled "In-memory Caching in HDFS: Lower latency, same great taste." I'm very pleased with how this feature turned out, since it was approximately a year-long effort going from initial design to production system. Combined with Impala, we showed up to a 6x performance improvement by running on cached data, and that number will only improve with time. Slides and video of our presentation are available online.

Two engineering principles

January 8, 2014

I received two interesting pieces of advice at the AMP Lab retreat this past week, which concisely state some of my favorite software engineering principles:

  1. Don't be a zealot. Prefer a given language, framework, or design because you understand in technical detail why it is the better choice, not out of technological fascination or fanboy-ism. The canonical examples here are programming language flamewars, e.g. Java vs. C++.
  2. Ruthlessly optimize for your requirements. This means first carefully defining those requirements, and then being completely unafraid to buck conventional wisdom when it's not a good match. This often means intentionally pruning out features, even common ones implemented by other systems.

Apache Hadoop committer

August 3, 2013

A quick post celebrating that I was recently made a committer on the Apache Hadoop project. I owe a big thanks to everyone who's reviewed my patches and helped me along the way (especially my colleagues ATM, Todd, and Colin here at Cloudera).

My very first patch was HDFS-1952 in May 2011, via a Hadoop hackathon hosted at Cloudera. It was the most promising newbie HDFS JIRA on the list, and I still remember all the basic issues I had checking out the repo, setting up Eclipse, using Ant, and generating the diff. Two years later, these things have gotten easier :)

Here's to many more contributions in the future!
