From the Xen Summit – Block Storage For VMs With Ceph

Posted by Gina Rosenthal in ceph

These are my live notes from Sage Weil’s presentation at Xen Summit on Block Storage for VMs with Ceph. I typed as fast as I could! 🙂

why care about another storage system?

  • requirements: orgs have diverse storage needs
  • time: orgs don’t have tons of time – so they need ease of administration, no manual anything, painless scaling (expanding AND contracting), and seamless migrations
  • cost (low cost per gigabyte, no vendor lock-in, a software solution that can run on commodity hardware)
what is ceph?
  • Open source – LGPLv2 (copyleft)
  • no copyright assignment
  • Distributed storage system – data center, TB to EB
  • Fault tolerant, runs on commodity hardware
  • Object storage model
    • Pools
    • objects
    • Object storage cluster provides interface
      • client/server model doesn’t scale enough for modern applications
    • ceph-osds are intelligent storage daemons (they sit on a local fs: btrfs, xfs, ext4, …)
  • disk, a fs on it, an OSD on top of the fs – that builds up the storage cluster
  • a monitor cluster coordinates the cluster (maintains cluster state, provides consensus, does not serve storage objects)
  • data distribution
    • all objects replicated N times
    • Objects auto placed, balanced, migrated in dynamic cluster
    • consider physical infrastructure
    • 3 approaches to storing data:
      • pick a spot, remember it
      • pick spot, write down where you put it
      • calculate the ideal place for the data, and put the data there (this is what CRUSH does)
  • CRUSH is a pseudo-random placement algorithm: no lookup tables, you can write rules to specify where data goes, and the mapping is stable (see the toy sketch after this list)
  • large clusters are inherently dynamic, OSD maps are versioned, any map update can trigger data migration
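This isn’t the real CRUSH algorithm (CRUSH walks a hierarchical map of the physical infrastructure and applies placement rules), but here’s my own toy Python sketch of the core idea: placement is a pure function of the object name and the set of OSDs, so every client computes the same answer with no lookup table. All of the names below are made up.

```python
import hashlib

# Toy stand-in for CRUSH (not the real algorithm): rank the OSDs for an
# object by hashing each (object, osd) pair and take the top N as replica
# targets. The ranking is a pure function of its inputs, so every client
# computes the same placement – no lookup table, no central directory.
def place(object_name, osds, replicas=3):
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.sha1(f"{object_name}:{osd}".encode()).hexdigest(),
    )
    return ranked[:replicas]

osds = [f"osd.{i}" for i in range(8)]            # small hypothetical cluster
print(place("rbd_data.0000000000000000", osds))  # same answer on every client
```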
What does this mean for my cloud?
  • virtual disks – reliable, accessible from many hosts
  • appliances aren’t great for large clouds
  • avoids single server bottleneck
  • provides efficient management
  • RBD (RADOS block device) library aggregates OSDs into a virtual disk
  • RBD is in the standard Linux kernel (/dev/rbd0)
  • librbd, libvirt/KVM, CloudStack, OpenStack (a minimal librbd sketch follows this list)
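To make that concrete, here’s a rough sketch (not from the talk) of provisioning a virtual disk with the rados/rbd Python bindings – the pool name rbd, the image name vm-disk-0, and the config file path are all just assumptions for illustration.

```python
import rados
import rbd

# Connect to the cluster (config file path is an assumption).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')              # hypothetical pool name
    try:
        # Create a 4 GiB image to hand to a VM as its virtual disk.
        rbd.RBD().create(ioctx, 'vm-disk-0', 4 * 1024**3)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```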
For Xen?
  • Linux kernel driver
  • blktap – generic kernel driver, easy to integrate with librbd
  • rbd-fuse coming soon
libvirt
  • used by CloudStack and OpenStack
  • understands RBD images, but only usable with KVM right now
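As a hedged illustration of what that looks like, a disk like the one sketched above could be attached to a running KVM guest with libvirt’s network-disk XML – the guest name, monitor hostname, and image name here are made up, and a cluster with cephx authentication would also need an <auth> element (not shown).

```python
import libvirt

# Hot-plug the (hypothetical) RBD image 'rbd/vm-disk-0' into a running KVM guest.
disk_xml = """
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='rbd/vm-disk-0'>
    <host name='ceph-mon.example.com' port='6789'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

conn = libvirt.open('qemu:///system')    # local KVM/QEMU hypervisor
dom = conn.lookupByName('my-guest')      # guest name is an assumption
dom.attachDevice(disk_xml)               # attach the RBD-backed disk
conn.close()
```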
librbd
  • shared library for storage management
  • I/O
  • has C, C++ & Python bindings
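Here’s a minimal I/O sketch with the Python binding (again, not from the talk – it reuses the hypothetical pool and image names from above):

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# Open the image and do raw block I/O at byte offsets, the same way a VM's
# virtual disk sees it (pool and image names are the hypothetical ones above).
image = rbd.Image(ioctx, 'vm-disk-0')
try:
    image.write(b'hello from librbd', 0)   # write 17 bytes at offset 0
    print(image.read(0, 17))               # read them back
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()
```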
Lots of stuff on the roadmap. 🙂
There are very limited options for scalable open source storage, and the proprietary solutions don’t scale and are expensive: lock-in benefits the vendor, and the money is made on support. The industry needs an alternative scale-out storage option.
Q&A
q: what are the plans for geo-distribution?
a: right now it depends on how far away the data centers are. Two parts of the puzzle are on the roadmap – async replication, and replication that provides zero latency between data centers
q: does RBD on an OSD know about the others?
a: yes, every client knows where all the OSDs are
q: what performance do you get for virtual machines?
a: it totally depends on your hardware
q: is cloning new?
a: yes, it will be in 0.52
q: are there plans to cut out the fs and write blobs directly?
a: you need a fs to structure the layout (because you don’t know how big the blobs will be)
(In the beginning, Ceph wrote its own file system!)
q: how do you see Ceph integrating with XenServer?
a: make a component that integrates

 
