ZenOSS Logical Model

From Zenoss Wiki

Logical Model

Below is the ball of wire that is ZenOSS. While confusing-looking, it is a huge improvement over the earlier architecture in both speed and scalability.

Note: Some of this information may require updating for version 4.2.3+.

[Figure: Zenoss-flow-diagram.png]

Event subsystem

The event subsystem is built around rabbitmq-server, a message queuing daemon. Inside rabbit are a number of queues, which exchange messages across an internal bus known as the exchange. The name of the exchange in ZenOSS v4.1 is 'zenoss'. There are several situations in which rabbit can become backed up and stop passing messages.


To check the message queues, execute:

# rabbitmqctl list_queues -p /zenoss

This must be done as root.

Example output:

Listing queues ...
zenoss.queues.zep.migrated.summary 0
zenoss.queues.zep.migrated.archive 0
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.heartbeats 0
zenoss.queues.zep.zenevents 4189532
zenoss.queues.gom.8e294eb0-f44a-11e0-a475-00221903d46c 0
zenoss.queues.zep.modelchange 0
zenoss.queues.zep.signal 0
zenoss.queues.gom.events.fanout 0
...done.
Note the line 'zenoss.queues.zep.zenevents 4189532'. It clearly shows that rabbit is on its way to being in over its head. Do not restart rabbit in a case like this. Doing so will cause rabbit to hang on restart and complicate the purging.
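
Spotting a backed-up queue by eye works for a short listing, but the same check can be scripted. The following is a minimal sketch, not part of ZenOSS: the function names `parse_queue_depths` and `backed_up_queues` are hypothetical, and the threshold of 100000 messages is an arbitrary assumption that should be tuned per installation. It only parses text in the two-column format shown in the example output above.

```python
def parse_queue_depths(output):
    """Parse `rabbitmqctl list_queues` text into a {queue_name: depth} dict."""
    depths = {}
    for line in output.splitlines():
        parts = line.split()
        # Data lines are "<queue name> <message count>". Banner lines such as
        # "Listing queues ..." and "...done." do not match that shape.
        if len(parts) == 2 and parts[1].isdigit():
            depths[parts[0]] = int(parts[1])
    return depths


def backed_up_queues(depths, threshold=100000):
    """Return (name, depth) pairs above threshold, deepest first.

    The default threshold is an assumption for illustration only.
    """
    over = [(name, depth) for name, depth in depths.items() if depth > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Abbreviated copy of the example output shown above.
SAMPLE = """Listing queues ...
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.zenevents 4189532
zenoss.queues.zep.signal 0
...done.
"""

if __name__ == "__main__":
    for name, depth in backed_up_queues(parse_queue_depths(SAMPLE)):
        print("%s is backed up: %d messages" % (name, depth))
```

In practice the text would come from running the rabbitmqctl command above as root and passing its output to `parse_queue_depths`.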


To purge a large queue use the flusher.py script located in $ZENHOME. The script must be run as the zenoss user.

flusher.py {
#!/usr/bin/env python
# Purge every message from the RabbitMQ queues named on the command line.

import sys
from amqplib.client_0_8.connection import Connection

import Globals
from Products.ZenUtils.GlobalConfig import getGlobalConfiguration

if len(sys.argv) < 2:
    print >> sys.stderr, "Usage: flusher.py <queue_name> [...]"
    sys.exit(1)

# Read the AMQP connection settings from the global configuration,
# falling back to the defaults given here.
global_conf = getGlobalConfiguration()
hostname = global_conf.get('amqphost', 'localhost')
port = global_conf.get('amqpport', '5672')
username = global_conf.get('amqpuser', 'zenoss')
password = global_conf.get('amqppassword', 'zenoss')
vhost = global_conf.get('amqpvhost', '/zenoss')
ssl = global_conf.get('amqpusessl', '0')
use_ssl = ssl in ('1', 'True', 'true')

conn = Connection(host="%s:%s" % (hostname, port),
                  userid=username,
                  password=password,
                  virtual_host=vhost,
                  ssl=use_ssl)
channel = conn.channel()
try:
    for queue in sys.argv[1:]:
        print "Purging queue: %s" % queue
        channel.queue_purge(queue)
finally:
    # Close only after every queue has been purged; closing inside the
    # loop would break the purge of any queue after the first.
    channel.close()
    conn.close()
}

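In our example, the queue can be cleared by running `./flusher.py zenoss.queues.zep.zenevents` from $ZENHOME as the zenoss user. From a root shell, you can check the queues and then run the script as user zenoss in one line, as follows. Note the single quotes, which ensure that $ZENHOME is expanded in the zenoss user's environment rather than root's:

# rabbitmqctl list_queues -p /zenoss
# su - zenoss -c 'cd $ZENHOME && ./flusher.py zenoss.queues.zep.zenevents'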

Once a queue has been cleared, any daemons associated with that queue need to be restarted, as they will hold a large cache of the queue which can only be purged by a restart.

Warning: Don't purge queues needlessly. It should only be done if something has gone wrong and you're out of options. Chances are that if the queues are insane, something is broken elsewhere and it will happen again.

Warning: Restarting ZenOSS will usually make queue and event problems worse. Don't restart anything you do not have to. This is especially true of collectors.

Note: See Queue_Troubleshooting if queue pileup ensues.

Zeneventserver

Zeneventserver is the Java and Erlang portion of ZenOSS. Zeneventserver serves as the primary means of processing all events and communicating changes around the model. Zeneventserver is also responsible for communicating event system changes to connected clients via JSON calls.

Depends on:
  • zends / mysqld
  • memcached
  • rabbitmq-server

If any of these daemons fail to start or are left in an abnormal state, zeneventserver will fail to start.

Depended on by:
  • zenhub
  • zeneventd
  • zenactions

If zeneventserver fails to start or is in an undead state, these daemons will fail to start and/or operate properly.
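
The two dependency lists above imply a start order. The following minimal sketch works that order out with a depth-first topological sort. It is purely illustrative and is not how the ZenOSS start scripts decide their ordering; the `DEPENDS_ON` table and the `start_order` function are hypothetical names, and "zends" stands in for the "zends / mysqld" entry.

```python
# Dependencies as listed on this page: daemon -> daemons it needs first.
# "zends" stands in for "zends / mysqld" (an assumption for brevity).
DEPENDS_ON = {
    "zeneventserver": ["zends", "memcached", "rabbitmq-server"],
    "zenhub": ["zeneventserver"],
    "zeneventd": ["zeneventserver"],
    "zenactions": ["zeneventserver"],
}


def start_order(depends_on):
    """Return daemons ordered so each comes after everything it depends on."""
    order = []
    visiting = set()  # daemons on the current recursion path, for cycle checks

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError("dependency cycle at %s" % name)
        visiting.add(name)
        for dep in depends_on.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for name in sorted(depends_on):
        visit(name)
    return order


if __name__ == "__main__":
    print(" -> ".join(start_order(DEPENDS_ON)))
```

Reading the result in reverse gives a plausible stop order, and the same table shows at a glance which daemons are affected when zeneventserver is unhealthy.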

Zeneventd

Zeneventd is a Python-based daemon in ZenOSS that is exclusively responsible for processing event transforms. Event transforms were processed by zenhub in the past, which often resulted in an overloaded zenhub. Zeneventd, like zenhub, supports workers and is notoriously slow when bombarded. Efficient transforms are key to proper zeneventd performance. In a worst-case scenario, more workers are better, even if some are occasionally idle. AnyWorker is supported but has caused problems in the past.

Zenhub

Zenhub is a Python-based daemon in ZenOSS that is responsible for messaging between performance daemons, the event system, and the ZODB. Zenhub primarily handles daemon configuration and is usually the cause of daemon configuration problems. Each collector, in the adopted ZenOSS topology, has its own dedicated zenhub to avoid queue invalidation issues during mass daemon configuration (e.g. restarting a collector). Two to four workers per zenhub is generally enough for a fully loaded collector. AnyWorker is supported but has caused problems in the past.

Zenactiond

Zenactiond is the newfangled replacement for zenactions. Zenactiond is responsible for executing actions associated with notifications, such as paging, email and executing commands. Zenactiond will periodically inspect the 'Signal' queue for signal messages and dump them into its share of memcached. At that point zenactiond will act on the messages as instructed in the associated notification. Zenactiond will continue to act on repeating events until it receives a signal to stop. That said, in situations where zeneventd, zeneventserver or zenhub are plugged up with high queues or stop processing messages, zenactiond will continue to page and harass notification recipients until the queues are flushed and zenactiond is restarted.