From the Xen Summit - Block Storage For VMs With Ceph

These are my live notes from Sage Weil’s presentation at Xen Summit on Block Storage for VMs with Ceph. I typed as fast as I could! 🙂

Why care about another storage system?

requirements: orgs have diverse storage needs
time: orgs don’t have tons of time – so they need ease of administration, no manual anything, painless scaling (expanding AND contracting), and seamless migrations
cost: low cost per gigabyte, no vendor lock-in, a software solution that can run on commodity hardware

What is Ceph?

Open source – LGPLv2 (copyleft)
no copyright assignment
Distributed storage system – data center, TB to EB
Fault tolerant, runs on commodity hardware
Object storage model
- Pools
- objects
- Object storage cluster provides interface
  - client/server model doesn’t scale enough for modern applications
- ceph-osds are intelligent storage daemons (sit on a local fs: btrfs, xfs, ext4…)
the stack: disk, fs on it, osd on top of the fs; together these build up a storage cluster
The monitor cluster coordinates the storage cluster (maintains cluster state, provides consensus; monitors do not serve stored objects)
data distribution
- all objects replicated N times
- Objects auto placed, balanced, migrated in dynamic cluster
- consider physical infrastructure
- 3 approaches to storing data:
  - pick a spot, remember it
  - pick spot, write down where you put it
  - calculate the ideal place for the data, and put the data there (this is what CRUSH does)
CRUSH is a pseudo-random placement algorithm: no lookup, you can write rules to specify where data goes, and the mapping is stable
large clusters are inherently dynamic, OSD maps are versioned, any map update can trigger data migration
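
To make the "calculate where the data goes" idea concrete, here is a minimal sketch. It is NOT the CRUSH algorithm: it uses plain rendezvous (highest-random-weight) hashing as a stand-in, and the names `place`, `osds_v1` and `osds_v2` are made up for illustration. It only shows the properties from the notes: no lookup table, N replicas per object, and a stable mapping when the OSD map changes.

```python
import hashlib


def _score(object_name: str, osd_id: int) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osd_ids: list[int], replicas: int = 3) -> list[int]:
    """Pick `replicas` OSDs for an object purely by calculation (no lookup table).

    Stand-in for CRUSH: rendezvous hashing, with no placement rules and no
    awareness of racks or hosts.
    """
    ranked = sorted(osd_ids, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


# Two versions of a (hypothetical) OSD map: in the second one, osd 7 is gone.
osds_v1 = list(range(10))
osds_v2 = [osd for osd in osds_v1 if osd != 7]

objects = [f"obj-{i}" for i in range(1000)]
before = {name: place(name, osds_v1) for name in objects}
after = {name: place(name, osds_v2) for name in objects}

# Only objects that had a replica on the removed OSD need to migrate.
moved = [name for name in objects if before[name] != after[name]]
print(f"{len(moved)} of {len(objects)} objects changed placement")
```

Any client holding the same OSD list computes the same answer, which is why no central lookup is needed; a new map version only moves the objects that were touching the changed OSD.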

What does this mean for my cloud?

virtual disks – reliable, accessible from many hosts
appliances aren’t great for large clouds
avoids single server bottleneck
provides efficient management
RBD (RADOS block device) library aggregates OSDs into a virtual disk
RBD is in the standard Linux kernel (/dev/rbd0)
librbd, libvirt, QEMU/KVM, CloudStack, OpenStack
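
A rough sketch of how a virtual disk can be spread over many objects (and so over many OSDs). The talk did not go into this level of detail; the 4 MiB object size and the function name `extents` are my assumptions for illustration, and real RBD images have their own naming and striping options.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: image is cut into 4 MiB objects


def extents(offset: int, length: int, object_size: int = OBJECT_SIZE):
    """Split a byte range on the virtual disk into per-object pieces.

    Returns a list of (object_index, offset_within_object, piece_length).
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, object_size)
        piece = min(object_size - within, end - offset)
        pieces.append((index, within, piece))
        offset += piece
    return pieces


# An 8-byte write that straddles the boundary between object 0 and object 1.
print(extents(OBJECT_SIZE - 4, 8))
```

Each object index would then be placed on OSDs by calculation, so a busy virtual disk is not tied to a single server.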

For Xen?

Linux kernel driver
blktap – generic kernel driver, easy to integrate with librbd
rbd-fuse coming soon

libvirt

used by CloudStack, OpenStack
understands RBD images, but only usable with KVM right now.

librbd

shared library for storage management
I/O
has C, C++ & Python bindings
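
For a feel of the Python bindings, here is a small sketch using the `rados` and `rbd` modules that ship with Ceph. It was not shown in the talk. The config path, the pool name `rbd`, the image name and the helper name `write_hello` are placeholders I picked, and it needs a reachable cluster to actually run. The bindings may also have looked different at the time of the talk.

```python
def write_hello(conffile="/etc/ceph/ceph.conf", pool="rbd", image_name="notes-demo"):
    """Create a small RBD image, write a few bytes at offset 0, read them back.

    All three arguments are placeholder values; adjust them for a real cluster.
    """
    # Imported here so the sketch can be loaded on a machine without Ceph installed.
    import rados
    import rbd

    with rados.Rados(conffile=conffile) as cluster:       # connects on enter
        with cluster.open_ioctx(pool) as ioctx:           # I/O context for one pool
            rbd.RBD().create(ioctx, image_name, 64 * 1024 * 1024)  # 64 MiB image
            with rbd.Image(ioctx, image_name) as image:
                data = b"hello from librbd"
                image.write(data, 0)                      # write(data, offset)
                return image.read(0, len(data))           # read(offset, length)
```

The same create/open/write/read flow is what a hypervisor integration would drive through the C or C++ interface.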

Lots of stuff on the roadmap. 🙂

Very limited options for scalable open source storage, and proprietary solutions don’t scale and are expensive. Lock in for the vendor, and the money is made on support. The industry needs to have an alternative scale-out storage option.

We are hiring!

Q&A

q. what are the plans for geo-distribution?

a. depends on how far away the DCs are right now. 2 parts of the puzzle are on the roadmap – async replication and replication that provides zero latency between

q. does RBD on an OSD know about the others?

a. yes every client knows where all the OSDs are

q. what performance do you get for virtual machines?

a. totally depends on your hardware

q. is cloning new?

a. yes will be in .52

q. are there plans to cut out fs and write blobs directly?

a. need a fs to structure the layout (because you don’t know how big the blobs will be)

In the beginning Ceph wrote its own file system!

q. how do you see Ceph integrating with XenServer?

a. Make a component that integrates
