Christopher Nolan, Darren Dao, Jeff Roberts
Modern web infrastructures should be able to scale easily, and reaching that nirvana of fast scaling takes automation and a number of critical tools. The core pieces are a CI/build/release tool (Jenkins), configuration management for both system and application (Chef, Puppet, etch), a node builder (Kickstart, Cobbler, cloud), orchestration (Capistrano, MCollective), and monitoring/metrics (Nagios, Graphite/collectd, Cacti). The last piece, which most companies don't realize is essential, is a Source of Truth that ties everything together.

Some companies manage node information on a tool-by-tool basis. Each tool has its own concept of the nodes on the network, and there is no convention for how nodes are grouped together and acted upon. If someone in app support spins up new instances for a service, they then have to let the monitoring folks know so that the instances can be added to monitoring.

Many companies use text files or XML files to describe hosts. Though better than nothing, this is probably the worst way to do things. The file must be maintained by hand and pushed to many places so that the various tools can use it. It isn't always consistent, and it is difficult to extend with detailed information.

Sometimes the problem is solved with elaborate hostname conventions. "Let's describe everything about a host in its hostname!" Bad idea. It leads to hostnames like: production-web-2gb-apache-proxy.rack7.slot8.cluster3.datacenter2.losangeles.domain.com. That's ugly and annoying, but we've all seen it.

Your network needs a Source of Truth. This SoT needs to support both automatic and manual descriptions of what it knows. Devices on your network should check in automatically and send as much information as possible. You should be able to layer information on top of these devices to describe their roles. You should have a number of ways to access this information, including a web UI, a command line, a RESTful interface, and APIs for popular languages like Perl and Ruby. This is why we use nventory.
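To make the two kinds of description concrete, here is a minimal in-memory sketch in Python. It is not nventory's API or data model. The class names, method names, hostnames, and fact keys are all hypothetical, chosen only to illustrate nodes checking in automatically, roles being layered on top, and a single query that every tool can share.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One device as the (hypothetical) Source of Truth sees it."""
    hostname: str
    facts: dict = field(default_factory=dict)  # reported by the node itself
    roles: set = field(default_factory=set)    # layered on by people or tools


class SourceOfTruth:
    """Toy in-memory registry; a real one would sit behind a web service."""

    def __init__(self):
        self._nodes = {}

    def check_in(self, hostname, **facts):
        """Automatic path: a node registers itself and refreshes its facts."""
        node = self._nodes.setdefault(hostname, Node(hostname))
        node.facts.update(facts)
        return node

    def assign_role(self, hostname, role):
        """Manual path: describe what a known node is for."""
        self._nodes[hostname].roles.add(role)

    def nodes_with_role(self, role):
        """One shared query for monitoring, CM, and orchestration alike."""
        return sorted(n.hostname for n in self._nodes.values() if role in n.roles)


sot = SourceOfTruth()

# Facts that would otherwise be crammed into the hostname are plain attributes.
sot.check_in("web01.example.com", memory_gb=2, datacenter="losangeles", rack=7)
sot.check_in("web02.example.com", memory_gb=2, datacenter="losangeles", rack=7)
sot.check_in("db01.example.com", memory_gb=16, datacenter="losangeles", rack=8)

# Roles are layered on top of whatever the nodes reported about themselves.
sot.assign_role("web01.example.com", "apache-proxy")
sot.assign_role("web02.example.com", "apache-proxy")

print(sot.nodes_with_role("apache-proxy"))
```

With a registry like this, the monitoring configuration and the orchestration target list can both be generated from the same `nodes_with_role` query, so a newly started instance shows up everywhere once it has checked in and been given a role.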
We then move into a detailed explanation of nventory, including its architecture, a demo, and real-world descriptions of how we use it at eHarmony to integrate with tools like Jenkins, Nagios, Chef, and etch, with lots of screenshots and examples.