Apache Hadoop turned ten this year. To celebrate, Karthik and I gave a talk at USENIX ATC ’16 about open problems to solve in Hadoop’s second decade. This was an opportunity to revisit our academic roots and get a new crop of graduate students interested in the real distributed systems problems we’re trying to solve in industry.
This is a huge topic and we only had a 25-minute talk slot, so we were pitching problems rather than solutions. However, we did have some ideas in our back pocket, and the hallway track and the birds-of-a-feather session we hosted afterwards led to a lot of good discussion.
Karthik and I split the content thematically, which worked really well. I covered scalability, meaning sharded filesystems and federated resource management. Karthik addressed scheduling (unifying batch jobs and long-running services) and utilization (overprovisioning, preemption, and isolation).
I’m hoping to give this talk again in longer form, since I’m proud of the content.
Slides: pptx