Two engineering principles

January 8, 2014

I received two interesting pieces of advice at the AMP Lab retreat this past week, which concisely state some of my favorite software engineering principles:

  1. Don’t be a zealot. Understand in technical detail why a given language, framework, or design should be preferred, rather than preferring it out of technological fascination or fanboy-ism. The canonical examples here are programming language flamewars, e.g. Java vs. C++.
  2. Ruthlessly optimize for your requirements. This means first carefully defining those requirements, and then being completely unafraid to buck conventional wisdom if it’s not a good match. This often means intentionally pruning out features, even common ones implemented by other systems.

Apache Hadoop committer

August 3, 2013

A quick post celebrating that I was recently made a committer on the Apache Hadoop project. I owe a big thanks to everyone who’s reviewed my patches and helped me along the way (especially my colleagues ATM, Todd, and Colin here at Cloudera).

My very first patch was HDFS-1952 in May 2011, via a Hadoop hackathon hosted at Cloudera. It was the most promising newbie HDFS JIRA on the list, and I still remember all the basic issues I had checking out the repo, setting up Eclipse, using Ant, and generating the diff. Two years later, these things have gotten easier :)

Here’s to many more contributions in the future!

Grad school four months out

May 12, 2013

Here’s my account of leaving the PhD program at Berkeley to work at Cloudera. My experience might not be representative or generalize beyond my own situation, but I’m writing this because a number of people have asked me about the differences between grad school and industry. Choosing to leave Berkeley was a very personal decision, but fortunately I’m happy with how it’s turned out.

This also serves as my “Year in review: 2012” post, since this was the major change in my life last year.

Hadoop 101 slides

May 2, 2013

I gave a guest lecture on the Hadoop stack last week at Tapan Parikh’s INFO 206: Distributed Computing Applications and Infrastructure course at Berkeley. I took a more academic approach than most, talking about the original motivating problem of Google search before moving into a deep dive into HDFS and MapReduce and an overview of the rest of the Hadoop ecosystem.

A couple of students came up afterwards to say they enjoyed the talk, so I think it was well-received.

Slides: PPTX with animations and PDF.
