Scaling GlusterFS @ Facebook
GlusterFS is an open-source, (mostly) POSIX-compliant distributed filesystem originally written by Gluster, Inc. and now maintained by Red Hat, Inc. Here at Facebook it had humble beginnings: a single rack of machines serving a single use case. Over the next four years it grew to thousands of machines hosting tens of petabytes of data. This presentation is the story of how that transformation occurred and what we did to make it happen.
In part 1, I'll go into the landscape before GlusterFS was on the scene (hint: lots of enterprise storage appliances) and some of the problems we suffered by relying too heavily on those solutions. Some examples: multi-day time-to-resolution for bugs, black-box architecture, slow-moving development cycles, proprietary hardware, and a lack of hooks for deep integration with our automation and monitoring infrastructure. I'll also touch on the first months of GlusterFS at Facebook, some (healthy) skepticism we encountered, and the steps we took to gain the confidence of the skeptics (though some are never really persuaded :) ).
I'll also dive into where GlusterFS fits into the storage landscape at Facebook, what kind of hardware we run it on, and what our customers typically look like, and give some rough numbers on the scale we operate at (node counts, QPS, and overall and per-namespace capacities).
In part 2, I'll go into specifics of what our GlusterFS stack and deployments actually look like today, the automation we've had to build to scale Gluster to thousands of nodes operating in multiple regions, and how we do highly available NFS (aka HA-NFS). I'll also get into some internal changes we've made to allow use at these scales: robust healing, DC-aware masterless geo-replication, throttling, automatic split-brain resolution, exposing thousands of counters, request sampling, IPv6 support, and the 100% mount-less methods our customers use to talk to GlusterFS clusters (almost all of these are already open-sourced, or in the process of being open-sourced).
Finally, I'll close with our current areas of focus and the challenges we face in taking Gluster to the next level of scaling: the road to 1,000-node volumes, large erasure-coded volumes, and improved multi-tenancy.