PuppetConf 2012

""#monitoringsucks."" I can appreciate the sentiment. I used to use Nagios too! However, I can't agree that monitoring sucks. Monitoring is awesome! We observe our systems to understand their behaviour. We do this in various ways like reading logs or taking measurements and, more recently, storing them in a timeseries database such as collectd or graphite. However, the standard practice for alerting is still to check the measurement at the time that it is taken and it is this ""check script"" model of monitoring that is long due for an overhaul. In this talk, I'll start over from first principles: what do we want monitoring to do for us? I'll deconstruct the ""check script"" and rebase its essentials on the humble timeseries. I'll demonstrate simple aggregation and apply some maths and stats to show how monitoring can scale to cluster size without increasing maintenance costs. With worked examples based on real-world situations, you'll learn techniques that you can use to make your monitoring systems more usefu
