Uncategorized

Blog refresh: WordPress

I’ve come full-circle. My very first websites circa-2005 were built with a CMS (Joomla or WordPress). I started messing with custom themes and plugins (which is how I really learned to code) then drank deeply of the semantic web koolaid and started hand-coded everything in XHTML, CSS, and PHP. I migrated to a static site generator seeking simplicity and reduced hosting costs, and now, umbrant.com is again powered by a full-fledged CMS: WordPress.

Continue reading →

Posted by andrew, 1 comment

Two engineering principles

I received two interesting pieces of advice at the AMP Lab retreat this past week, which concisely state some of my favorite software engineering principles:

  1. Don’t be a zealot. Understand in technical detail why a given language, framework, or design should be preferred, not because of technological fascination or fanboy-ism. The canonical examples here are programming language flamewars, e.g. Java vs. C++.
  2. Ruthlessly optimize for your requirements. This means first, carefully defining said requirements, but then being completely unafraid to buck conventional wisdom if it’s not a good match. This often means intentionally pruning out features, even common ones implemented by other systems.
Posted by andrew, 0 comments

Highly-available audio in HDFS

Here on the HDFS team at Cloudera, we believe in eating our own dogfood. Since we value our (substantial) MP3 collections quite dearly, it’s only natural to store them in a high performance, highly-available, enterprise-quality distributed filesystem like HDFS. Today, I’m announcing the next generation in aural HDFS enjoyment: listening to music directly from the Namenode web UI.

Continue reading →

Posted by andrew, 0 comments

External sorting of large datasets

This is a common interview question: how do you sort data that is bigger than memory? “Big data” in the range of tera or petabytes can now almost be considered the norm (think of Google saving every search, click, and ad impression ever), so this manifests in reality as well. This is also a canonical problem in the database world, where it is referred to as an “external sort”.

Your mind should immediately turn to divide and conquer algorithms, namely merge sort. Write out intermediate merged output to disk, and read it back in lazily for the next round. I decided this would be a fun implementation and optimization exercise to do in C. There will probably be a follow-up post, since there are lots of optimizations I haven’t yet implemented.

Continue reading →

Posted by andrew, 0 comments

Static website hosting on Amazon S3

Werner Vogels, Amazon CTO, posted on his blog about a month ago on “New AWS feature: Run your website from Amazon S3“. S3 now offers the ability to host static HTML pages directly from an S3 bucket, which is a great alternative for small blogs and sites (provided, of course, that you don’t actually need any dynamic content). This has the potential to greatly reduce your hosting costs. A small Dreamhost/Slicehost/Linode costs around $20 a month, and I used to run this site out of an extreme budget VPS (Virpus) which was only $5 a month, but I expect to be paying only a few cents per month for S3 (current pricing is just 15¢ per GB-month). Of course, you also gain best-of-class durability, fault-tolerance, and scalability from hosting out of S3, meaning that your little site should easily survive a slashdotting.

The difficulty here is that most of the popular blogging engines require a backing database, and do their content generation dynamically server side. That doesn’t fly with S3; since it is, after all, just a Simple Storage Service, content has to be static and pregenerated. I chose to use Hyde, a Python content generator that turns templates (based on the Django templating engine) into HTML. Hyde page templates are dynamic, written in Django’s templating language which supports variables, control flow, and hierarchal inheritance. Hyde will parse these templates, fill in the dynamic content, and finally generate static HTML pages suitable for uploading to S3. Ruby folks can check out Jekyll as an alternative.

Continue reading →

Posted by andrew, 0 comments