I don’t know why I feel compelled to write this. And I don’t think I’m the only one thinking about this.
Hendrik Volkmer published a great post this week entitled “There will be no reliable cloud (part 1)”. He talks about the boring bits of cloud – cloud for cloud’s sake. He starts out the post with this:
The main issue here is scale. Things (very generally) work very, very different at scale. And cloud infrastructures are all about scale. Keep in mind that complexity of systems does increase exponentially and thus the things that work fine with small systems might completely fail with bigger systems.
He goes on to talk about the difference between HA and Service Availability, as well as availability vs reliability. As someone with a strong SAN background, I really enjoyed his discussion…but I started thinking: why are we hung up on old HA architectures? It’s not that hard to understand why you would want to use new ways of doing things…does it freak us all out because how we learned to do it was the only way to do it (and let’s be honest, how many times was the *right* way circumvented to make sure stakeholders got the data they wanted as fast as possible)?
Anyways you should read and re-read that article. Really good stuff.
He links to a really great Cloudscaling presentation that explains how HA is not the only type of resiliency. I love this presentation because anyone who has had to architect, build, or maintain an HA environment can understand what they are talking about. And it makes sense. And what is fuzzy is simple enough to google in order to find more info. You don’t have to be a cloud scientist or cloud guru 42nd level chaotic neutral cleric to figure it out.
One question that came to my mind immediately, though, was about compliance regulations. Are the regulations written specifically for HA architectures vs resilient architectures? If you build this and document it correctly, would that be enough to satisfy an audit? (Seriously – those are questions – would love to hear others’ thoughts on that!)
Before I started writing this I tweeted that I was writing a cloud marketing post…so here’s where the marketing part comes in. I’m currently in product marketing for backup software that is actually used by many SPs. I hear and read lots of words about the cloud. Most times it’s hard to find the technical nitty-gritty: what you need to think about when architecting your move to “the cloud”, the trade-offs you will make, and what you really NEED to make the move happen. That’s because people are so busy trying to position stuff to make sure you don’t forget to take them with you on your cloud journey.
This is probably a good time to remind everyone that while I work for Dell, this is my personal blog.
It is ok if you miss the cloud bandwagon. Just make the time to create the plans that will take your organization to this new technical level we’re calling “cloud”. Make sure you understand that the ways we had to do things in the past aren’t the only valid ways to do them anymore. Duncan Epping had a post on software-defined storage that really said the same thing (and also helped to inspire this post). Don’t miss that train because you fell for the hype and didn’t learn about the new technologies.
I know I’ll do my part by not overhyping things – there is enough cool stuff going on in our field now that no one has time for the hype. We really just need to get back to basics…especially since it seems like the line for basic in the data center has changed quite a bit!
The point you’re raising about compliance is interesting, indeed. In my experience compliance documents seem to be answered the same way all the time, but most requirements are written broadly enough to actually allow for different approaches. And if your implementation actually works (that’s what an audit is supposed to show) then there should be no problem with either approach.
I recently watched a presentation about the NextGen US air traffic control system (which as one can imagine is pretty heavily regulated regarding reliability). It was interesting to see that while these guys are very conservative and only use stuff that has been used for years, they actually don’t trust vendor claims about availability and reliability. They set the system up and make it fail and see how it handles failures and how it recovers. So they are really after resiliency rather than availability – with pretty tight time constraints, though.
Hendrik, so sorry, I just saw your post (I always approve first-time posters manually). Very interesting about air traffic control!