Grad school four months out

May 12, 2013

Here’s my account of leaving the PhD program at Berkeley to work at Cloudera. My experience might not be representative or generalize beyond my own situation, but I’m writing this because a number of people have asked me about the differences between grad school and industry. Choosing to leave Berkeley was a very personal decision, but fortunately I’m happy with how it’s turned out.

This also serves as my “Year in review: 2012” post, since this was the major change in my life last year.

Hadoop 101 slides

May 2, 2013

I gave a guest lecture on the Hadoop stack last week at Tapan Parikh’s INFO 206: Distributed Computing Applications and Infrastructure course at Berkeley. I took a more academic approach than most, starting with the original motivating problem of Google search before moving into a deep dive into HDFS and MapReduce and an overview of the rest of the Hadoop ecosystem.

A couple of students came up afterwards to say they enjoyed the talk, so I think it was well received.

Slides: PPTX with animations and PDF.

Highly-available audio in HDFS

April 1, 2013

Here on the HDFS team at Cloudera, we believe in eating our own dogfood. Since we value our (substantial) MP3 collections quite dearly, it’s only natural to store them in a high-performance, highly-available, enterprise-quality distributed filesystem like HDFS. Today, I’m announcing the next generation in aural HDFS enjoyment: listening to music directly from the Namenode web UI.

Paper review: DRAM errors in the wild

February 5, 2013

Today, I’m looking at an excellent study by Schroeder et al., “DRAM Errors in the Wild: A Large-Scale Field Study”. This is the definitive paper on the subject, covering two years, thousands of machines, and millions of DIMM hours. Memory errors are particularly important in the context of growing cluster sizes; one-in-a-million errors become common at scale.
