Ceph: Petabyte-Scale Storage for Large- and Small-Scale Deployments
Explanation and case studies of the Ceph distributed file system for system administrators
As the size and performance requirements of storage systems have increased, file system designers have looked to new architectures to facilitate system scalability. This talk will describe a deployable and highly scalable alternative to the current, feature-limited selection of file storage systems. Ceph is an open source distributed file system capable of managing many petabytes of storage with ease. The architecture leverages device intelligence to provide a reliable, scalable, and high-performance file service in a dynamic cluster environment. Ceph’s architecture consists of two main components: an object storage layer and a distributed file system constructed on top of that object store. The object store provides a generic, scalable cloud storage platform (much like Amazon S3) with support for snapshots and distributed computation. The distributed file system similarly provides advanced features such as per-directory snapshots and recursive accounting, which gives a convenient view of how much data is stored beneath any directory in the system. In addition to a standard file system interface with support in the mainline Linux kernel, we have also built interfaces that integrate directly with the Hadoop and Hypertable distributed computation and database systems. A distributed block device also provides shared, reliable storage for virtual machine instances in a cloud environment (much like Amazon EBS), with support in QEMU/KVM and the Linux kernel. The project is licensed under the LGPL/GPL and aims to play nicely with the larger open source cloud, data processing, and storage ecosystems.