Difference between revisions of "Newsletter:1/Hacking Event Notifications"

From Zenoss Wiki
Author: Trelane

Revision as of 14:22, 20 September 2013

Hacking Event Notifications


Transforms let you manipulate an event to achieve a desired result. In layman's terms, if an event matches the specified criteria, the preset “manipulation” is applied to it.

The Zenoss Core Administration PDF defines a transform as follows: “Transform - Takes Python code that will be executed on the event only if it matches this mapping.”
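
To make the "match, then manipulate" idea concrete, here is a minimal stand-alone sketch. It is not Zenoss code: inside Zenoss the transform body runs with an event object already in scope, whereas here `SimpleNamespace` stands in for that object. The criteria, the field values, and the summary prefix are all hypothetical, chosen only for illustration.

```python
from types import SimpleNamespace

def matches(evt):
    # Hypothetical mapping criteria: only httpd events from the host "foo".
    return evt.device == "foo" and evt.component == "httpd"

def transform(evt):
    # Hypothetical "manipulation": tag the summary so readers of the
    # event console can tell this event is handled automatically.
    evt.summary = "[auto-restart pending] " + evt.summary

def apply_mapping(evt):
    # The transform runs only when the event matches the mapping.
    if matches(evt):
        transform(evt)
    return evt

evt = SimpleNamespace(device="foo", component="httpd", summary="httpd is down")
other = SimpleNamespace(device="bar", component="httpd", summary="httpd is down")
apply_mapping(evt)
apply_mapping(other)
print(evt.summary)
print(other.summary)
```

The event from the other host passes through unchanged, which is the whole point of tying a transform to a mapping.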

Meet Foo. Foo is a rickety old CentOS 5 box running an Apache httpd server. (Ed: what about nginx?) The Apache httpd server tends to crash, a lot, but since Foo isn’t mission critical, nobody is going to get time to work out why for several weeks.

Now meet Dick, the on-call engineer who is starting to look like a zombie because of sleep deprivation. That can’t be good. Let’s try to get Dick some sleep, and prevent a zombie uprising. Maybe Zenoss can help!

Zenoss 4.x’s Event trigger/notification system supports command notifications, so with five minutes of effort, we can get Dick more sleep! Let's see how.

Tutorial

First, set up a trigger for Foo that looks like this, with a Count of less than 5:

Countlessthan5.PNG
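
Since the screenshot carries the actual rule, here is a rough Python model of what such a trigger tests. The field names (`device`, `component`, `count`) and the host name `"foo"` are assumptions made for this sketch; the real rule is built in the Zenoss trigger editor and may include other conditions.

```python
from collections import namedtuple

# Stand-in for a Zenoss event; only the fields this sketch needs.
Event = namedtuple("Event", ["device", "component", "count"])

RESTART_LIMIT = 5  # the "Count of less than 5" from the trigger

def restart_trigger_matches(evt):
    # Fires only for httpd on foo while the event has repeated
    # fewer than RESTART_LIMIT times.
    return (
        evt.device == "foo"
        and evt.component == "httpd"
        and evt.count < RESTART_LIMIT
    )

first_failure = Event(device="foo", component="httpd", count=1)
fifth_failure = Event(device="foo", component="httpd", count=5)
print(restart_trigger_matches(first_failure))
print(restart_trigger_matches(fifth_failure))
```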

Next, make sure you exclude httpd on Foo from your main critical notification. Since Zenoss will execute every matching trigger, we don't want Foo's httpd crashing to hit the catch-all, because Dick will lose sleep!

Crittriggernofoo.PNG
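
The exclusion can be pictured as one extra "and not" clause on the catch-all. The sketch below assumes the catch-all matches on Critical severity (5 on the Zenoss severity scale) and that events are plain dictionaries with these hypothetical keys; your own catch-all trigger may test different fields.

```python
CRITICAL = 5  # Zenoss severity value for Critical

def catch_all_matches(evt):
    # Carve Foo's httpd out of the catch-all so that only the
    # dedicated Foo triggers react to it.
    is_foo_httpd = evt["device"] == "foo" and evt["component"] == "httpd"
    return evt["severity"] >= CRITICAL and not is_foo_httpd

foo_httpd = {"device": "foo", "component": "httpd", "severity": 5}
foo_disk = {"device": "foo", "component": "disk", "severity": 5}
print(catch_all_matches(foo_httpd))
print(catch_all_matches(foo_disk))
```

Note that only the httpd component is excluded: any other critical event on Foo still reaches the catch-all.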

Add another trigger that looks like this:

Countge5.PNG

This trigger will fire if, after attempting to restart Apache four times, Apache still won't restart. At this point we'll have to notify poor Dick that it's still a problem.

Next, copy the Zenoss SSH key to root@Foo, and set up a remote restart of Apache with the following command notification:

Trigger foo httpd restart.PNG
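
The exact command lives in the screenshot, so the following is only a guess at its shape: an SSH call to root@Foo that runs `service httpd restart`, the usual way to restart Apache on CentOS 5. The helper name and the `BatchMode=yes` option (which makes ssh fail rather than prompt for a password if the key is missing) are additions for this sketch. The code only builds and prints the command; it does not run it.

```python
import shlex

def build_restart_command(host="Foo", user="root", service="httpd"):
    # Hypothetical helper: assemble the argument list for the remote restart.
    remote = "service {0} restart".format(service)
    return ["ssh", "-o", "BatchMode=yes", "{0}@{1}".format(user, host), remote]

command = build_restart_command()

# Render the command as a single shell-safe line, the form you would
# paste into the notification's command field.
command_line = " ".join(shlex.quote(part) for part in command)
print(command_line)
```

It is worth running the printed line by hand as the zenoss user once, to confirm the key-based login works before relying on it at 3 a.m.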

To wrap this up, add the second trigger we created to the main e-mail notification, so that someone is told once Zenoss has counted five events saying the Web server is down:

Criticalnote.PNG

Zenoss will now attempt to restart the web server four times before notifying someone of the outage. The event will still show up in the event console so this problem can be addressed normally during working hours, but it should save Dick some sleep. Sleep, Dick, sleep.
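
As a sanity check on the two thresholds, this small simulation walks the event count from 1 to 6 and lists which notification would fire at each step. It models only the count conditions from the two triggers; the action labels are made up for the sketch.

```python
def restart_trigger(count):
    # First trigger: Count of less than 5.
    return count < 5

def escalate_trigger(count):
    # Second trigger: Count of 5 or more.
    return count >= 5

def actions_for(count):
    actions = []
    if restart_trigger(count):
        actions.append("restart httpd")
    if escalate_trigger(count):
        actions.append("e-mail on-call")
    return actions

timeline = [(count, actions_for(count)) for count in range(1, 7)]
for count, actions in timeline:
    print(count, actions)
```

Counts 1 through 4 each produce a restart, and from the fifth event onward only the e-mail fires, so the two triggers never overlap.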