Enterprise Cloud Summit @ Interop session: Understanding big data

Posted by Gina Rosenthal in interop

Here are my (mostly) unedited notes from the session Understanding big data by Bradford Stephens and Jeremy Edberg.

  • What is Big Data?
    • When you have so much info on 1 machine that you can’t get to it
    • When it took over an hour to process an hour’s worth of traffic – that is when Reddit knew they had a big data problem
  • Problems developers will see:
    • SQL stops working
    • Performance decays exponentially
  • Problems Ops will see:
    • SPOFs/Cascading failures
    • Keep hiring ops – can’t keep up with installations, deployments, failures, etc
  • Problems Biz will see:
    • Product failures – customers leave
    • Exponential hardware cost
  • Big data solutions are similar to moving to cloud
    • For dev:
      • MapReduce – Key-Value (Cassandra, HBase)
      • Distributed, scale-out systems (systems and applications have to be designed to work to scale, not traditional client-server design)
    • For Ops:
      • Automate ops with development (Puppet – auto-configure servers)
      • Build for seamless future
    • For the Biz:
      • Hire for scale from day 1 (hire the best engineers – you are building systems, so you need engineers, not hackers or script kiddies)
      • Clusters of commodity hardware (don’t need huge up front expenditures)
  • Big Data in action – Reddit
    • First tried to shard data across databases
    • Still couldn’t read data out fast enough
    • Eventually consistent reads were not good enough
  • Big Data in action – Drawn to Scale
    • A single node can’t keep up.
    • Moved to a sharded solution – the advantage is storing data across computers. The disadvantage is that you can’t run simple SQL queries across distributed nodes, and they can’t use distributed indexes
    • Used the hTable model to make the indexes distributed
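
To make the sharding trade-off from the Reddit and Drawn to Scale notes concrete, here is a minimal sketch in Python. It is not what either company actually ran; the names ShardedStore, shard_for and NUM_SHARDS are made up for illustration, and each "shard" is just an in-memory dict standing in for a separate database node. It shows why a lookup by key stays cheap (one shard) while a query on anything else has to visit every shard.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database node."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: dict) -> None:
        # A write touches exactly one shard.
        self.shards[shard_for(key, self.num_shards)][key] = value

    def get(self, key: str):
        # A lookup by key also touches exactly one shard.
        return self.shards[shard_for(key, self.num_shards)].get(key)

    def count_where(self, predicate) -> int:
        # A query on anything other than the key has no index to use,
        # so it must scan every shard and combine the partial results.
        return sum(
            1
            for shard in self.shards
            for value in shard.values()
            if predicate(value)
        )


store = ShardedStore()
for i in range(100):
    store.put(f"user:{i}", {"karma": i})

print(store.get("user:42"))                            # one shard consulted
print(store.count_where(lambda v: v["karma"] >= 90))   # all shards scanned
```

The count_where call is the part that hurts at scale: with real nodes it becomes a network round trip to every shard, which is the gap a distributed index is meant to close.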
