Learning by building: an introduction to OpenStack


One does not simply learn how to build production-grade OpenStack!

This session is for anyone who's somewhere in the process of learning OpenStack, from "just tinkering" to "hey, I've got to build a real enterprise production environment." These are miles apart in terms of complexity and requirements, but there's no roadmap that tells you how to learn this stuff. I think the best way to learn is by building (and breaking) real systems... so that's exactly what we're going to do. This is the journey our teams take as they set out to learn OpenStack.

As with anything in OpenStack, there is no single "right way" to do this (after all, that's the point of an open source framework for cloud) - and that includes no "right way" to learn this stuff. But this is one way that's worked for me and a lot of other people.

We'll start by dipping our toes in the water with DevStack. This is a scripted, all-in-one installation of OpenStack that gets you up and running in minutes. You wouldn't use this for anything serious, since it runs VMs in emulation, but you can see what OpenStack *should* look like when it's built correctly (important for knowing when something breaks later!) and you can learn the CLI. You'll learn how to use Horizon, and you'll get comfortable with what the individual components are (but not necessarily how they work - that comes later). The barrier to entry here is low - all you need is a laptop with VirtualBox on it.
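Under the hood, DevStack is driven by a `local.conf` file and a single run of its setup script. A minimal sketch - all passwords and the host IP here are placeholder examples, not recommendations:

```ini
# local.conf - minimal DevStack configuration (all values are examples)
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
HOST_IP=10.0.2.15
```

Drop this into a cloned devstack directory, run `./stack.sh`, and roughly half an hour later you have an all-in-one cloud to poke at.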

Next, it's time to build OpenStack from scratch, by hand, using the official installation guide on openstack.org. In this step-by-step process, you're going to build each and every component and the relationships between them. You'll learn the intimate details of how the components work together (and there are a lot of them!). Disclaimer: this takes a LONG time. Don't feel bad if you lose a week or two doing this. For this process we'll want a bit of hardware (though it's not a strict requirement), and I'll give an overview of the lab environment that I use for my own work.
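To give a flavor of the manual build: every service gets registered in Keystone and tied to its API endpoints by hand. A sketch following the install guide's pattern for the Image service (the `controller` hostname and `RegionOne` region are the guide's example names):

```shell
# Register Glance in the service catalog and create its three endpoints.
openstack service create --name glance --description "OpenStack Image" image
openstack endpoint create --region RegionOne image public http://controller:9292
openstack endpoint create --region RegionOne image internal http://controller:9292
openstack endpoint create --region RegionOne image admin http://controller:9292
```

You'll repeat this dance - plus database setup, config file edits, and service restarts - for every single component.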

As we go through the build using the How-To tutorial, it's important that we document every step we take. I'll show some examples of what this looks like, but the reason it's so important is that we're going to use it as the basis for building in high availability later... because there IS no step-by-step How-To on making OpenStack HA!

Once we have an operational "How-To Cloud," we can launch VMs and start running applications in them. This is great, and it feels amazing when, after a week or two of work, you launch your first VM and can ping something! But this still isn't close to something we'd actually use in production. Why not? The architecture has no resiliency built in whatsoever, and the ancillary support services aren't built to be scalable. The good news is that now we know how all the pieces work together.
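For reference, that first milestone boils down to a handful of CLI calls. A hedged sketch, assuming a `cirros` image, a `private` tenant network, and a `public` external network already exist (names and addresses are examples):

```shell
# Boot a tiny test instance on the private network
openstack server create --flavor m1.tiny --image cirros --network private vm-test1
# Allocate a floating IP from the public pool and attach it
openstack floating ip create public
openstack server add floating ip vm-test1 203.0.113.10   # use the IP returned above
# The moment of truth
ping -c 3 203.0.113.10
```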

Now tear it down and do it all again! Fix the mistakes you made the first time...

So is it time for a proof of concept yet? Not yet! It's time to make our How-To cloud a scalable, highly available cloud. Now we get to learn about a bunch of software that isn't OpenStack-specific. Again, there's no "right way" to build this - but these are the pieces I've picked that have worked well for our customers.

Starting with storage: we're going to replace the built-in OpenStack storage with Ceph, using the Cinder RBD driver. Why? Ceph is a scale-out, object-based storage architecture that lets us grow our storage on commodity hardware. It's self-healing and (nearly) infinitely scalable. We'll spend a few minutes on the advantages of Ceph and why we're using it for our storage.
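In Cinder terms, swapping in Ceph mostly comes down to pointing the volume service at an RBD backend. A sketch of the relevant `cinder.conf` stanza - the pool name, user, and secret UUID are examples from a typical setup:

```ini
# cinder.conf - use a Ceph RBD pool as the volume backend
[DEFAULT]
enabled_backends = ceph

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
```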

Now we're going to "wrap" all our OpenStack services inside other applications that make them redundant. An architectural view of how OpenStack services work together is important here. We'll start with the basic OpenStack services we've already built and wrap them with Keepalived, HAProxy, Galera, and RabbitMQ clustering. We'll also touch on Corosync and Pacemaker, though that isn't the approach we're using in this case. Rolling these systems out involves tearing down and rebuilding a lot of what we just did, because there isn't a good tutorial here. We'll walk through the key components - specifically, attaching the service API endpoints to floating VRRP IPs and using HAProxy to load balance between redundant controllers. We'll roll these changes into the configuration documentation we wrote in the previous step, so we've got a step-by-step build for a full HA environment. With all this built, we'll run the system through a series of performance and redundancy tests to ensure it works as intended.
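To make the VRRP-plus-load-balancing pattern concrete, here's a sketch of the two key config fragments, assuming controllers at 10.0.0.11/.12 and a floating VIP of 10.0.0.10 (all addresses, interface names, and IDs are examples):

```
# /etc/keepalived/keepalived.conf - float the API VIP between controllers
# (the second controller runs the same block with state BACKUP / lower priority)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.0.0.10
    }
}

# /etc/haproxy/haproxy.cfg - balance the Keystone API across both controllers
listen keystone_public
    bind 10.0.0.10:5000
    balance source
    server controller1 10.0.0.11:5000 check inter 2000 rise 2 fall 5
    server controller2 10.0.0.12:5000 check inter 2000 rise 2 fall 5
```

Keystone's service catalog then advertises the VIP, so clients never talk to an individual controller directly.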

With a solid understanding of how OpenStack works, we'll now proceed with a POC (Proof of Concept) for our business. Step 1: forget everything we just did! We've now got some business decisions to make. Building a POC means building for production, because the assumption is that if your POC is successful, people are going to use it! And in that context, the last thing we want to do is be editing config files by hand, deploying services from scratch, etc. This is where the distributions come into play. 

The first thing we need to understand is the business requirements for the cloud. A POC inherently means we're trying to demonstrate that our technology solves a particular problem. In this portion of the session, we'll walk through common business requirements and what they mean in terms of technology decisions. OpenStack distributions, as well as freestanding automated deployment/operations technologies, play a major role in how we build our cloud. One example would be using Ansible, Chef, Puppet, or Juju to deploy OpenStack. Each has advantages and disadvantages, but those should be weighed against business requirements rather than technical preference.

Taking our POC cloud into production means adding new technologies as well as business processes. Key technical considerations include monitoring, tracking, and reporting systems. Business processes include user surveys, feature releases, resource consumption processes, etc. The two overlap when it comes to updates and upgrades of the cloud. Each of these items will be discussed with examples provided.

Session participants will leave with a tangible plan that enables them to get started with OpenStack, learn how the systems work together, attain expert-level knowledge, and deploy production-grade clouds in their companies.


Ballroom A
Saturday, March 4, 2017 - 11:30 to 12:30