ZenOSS Logical Model

From Zenoss Wiki
Revision as of 19:14, 23 November 2012 by Hackman238


Ball of Wire Model (diagram)

Above is the ball of wire model that is ZenOSS. You might be asking yourself, "How can this actually work?" The truth of the situation is that this model works magically - not one of those pesky "well thought-out" models we all hate so much.
The layers of the new model include:
  • Event system
  • Relstorage system
  • Memcache system

Zenoss-flow-diagram.png

Event subsystem

The event subsystem is built around rabbitmq-server, a message queuing daemon. Inside rabbit are a number of queues that receive messages routed through an internal bus known as an exchange. The name of the exchange in ZenOSS v4.1 is 'zenoss'. In more than one situation rabbit can become constipated and stop passing messages.

To check the message queues, execute `rabbitmqctl list_queues -p /zenoss`. This must be run as root.
Example output:
Listing queues ...
zenoss.queues.zep.migrated.summary 0
zenoss.queues.zep.migrated.archive 0
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.heartbeats 0
zenoss.queues.zep.zenevents 4189532
zenoss.queues.gom.8e294eb0-f44a-11e0-a475-00221903d46c 0
zenoss.queues.zep.modelchange 0
zenoss.queues.zep.signal 0
zenoss.queues.gom.events.fanout 0
...done.

Note: observe the line 'zenoss.queues.zep.zenevents 4189532'. It clearly shows that rabbit is in way over its head and possibly drowning in its own queue. Do not restart rabbit in a case like this: doing so will cause rabbit to hang on restart and complicate the purging.
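A listing like the one above can be checked mechanically. Below is a hypothetical helper (not part of ZenOSS) that flags queues whose depth exceeds a threshold, given the text output of `rabbitmqctl list_queues`; the threshold value is an assumption you should tune per site.

```python
# Hypothetical helper: flag queues whose depth exceeds a threshold,
# given the output of `rabbitmqctl list_queues -p /zenoss`.
THRESHOLD = 100000  # illustrative cutoff; tune for your site

def deep_queues(listing, threshold=THRESHOLD):
    """Return (queue, depth) pairs whose depth exceeds threshold."""
    flagged = []
    for line in listing.splitlines():
        parts = line.split()
        # Queue lines are exactly "<name> <depth>"; skip the
        # "Listing queues ..." and "...done." chatter.
        if len(parts) == 2 and parts[1].isdigit():
            name, depth = parts[0], int(parts[1])
            if depth > threshold:
                flagged.append((name, depth))
    return flagged

# Example against a listing like the one shown above:
sample = """Listing queues ...
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.zenevents 4189532
...done."""
print(deep_queues(sample))  # [('zenoss.queues.zep.zenevents', 4189532)]
```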

To purge a large queue use the flusher.py script located in $ZENHOME. The script must be run as the zenoss user.
flusher.py:

#!/usr/bin/env python
"""Purge one or more RabbitMQ queues using the AMQP settings from
the Zenoss global configuration."""

import sys
from amqplib.client_0_8.connection import Connection

import Globals
from Products.ZenUtils.GlobalConfig import getGlobalConfiguration

if len(sys.argv) < 2:
    print >> sys.stderr, "Usage: flusher.py <queue_name> [...]"
    sys.exit(1)

# Read the AMQP connection settings from the Zenoss global config.
global_conf = getGlobalConfiguration()
hostname = global_conf.get('amqphost', 'localhost')
port = global_conf.get('amqpport', '5672')
username = global_conf.get('amqpuser', 'zenoss')
password = global_conf.get('amqppassword', 'zenoss')
vhost = global_conf.get('amqpvhost', '/zenoss')
ssl = global_conf.get('amqpusessl', '0')
use_ssl = ssl in ('1', 'True', 'true')

conn = Connection(host="%s:%s" % (hostname, port),
                  userid=username,
                  password=password,
                  virtual_host=vhost,
                  ssl=use_ssl)
channel = conn.channel()
for queue in sys.argv[1:]:
    print "Purging queue: %s" % queue
    channel.queue_purge(queue)
channel.close()
conn.close()
In our example the zenoss user can clear this queue by executing `python flusher.py zenoss.queues.zep.zenevents`. As root you can do this with one line that runs the command as the zenoss user:
# rabbitmqctl list_queues -p /zenoss 
# su - zenoss -c "cd $ZENHOME; ./flusher.py zenoss.queues.zep.zenevents" 
Once a queue has been cleared, any daemons associated with that queue need to be restarted, as they hold a large cache of the queue's contents which can only be purged by a restart.
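The purge-then-restart step can be sketched as a small script run as the zenoss user. Everything here is illustrative: the daemon list, the fallback $ZENHOME of /opt/zenoss, and the assumption that daemon control scripts live in $ZENHOME/bin on a standard install.

```python
# Illustrative sketch: after purging a queue, bounce the daemons that
# consume it so their cached view of the queue is discarded.
import os
import subprocess

# Adjust to whichever queue was purged; these are examples only.
DAEMONS = ["zeneventd", "zeneventserver"]

def control_command(daemon, zenhome="/opt/zenoss", action="restart"):
    """Build the argv for a daemon control script under $ZENHOME/bin."""
    return [os.path.join(zenhome, "bin", daemon), action]

def restart_all(daemons, zenhome=None):
    """Restart each daemon via its control script."""
    zenhome = zenhome or os.environ.get("ZENHOME", "/opt/zenoss")
    for daemon in daemons:
        subprocess.call(control_command(daemon, zenhome))

print(control_command("zeneventd"))  # ['/opt/zenoss/bin/zeneventd', 'restart']
```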

Warning: Don't purge queues needlessly. It should only be done when something has gone wrong and you're out of options. Chances are that if the queues are insane, something is broken elsewhere and it will happen again.

Warning: Restarting ZenOSS will usually make queue and event problems worse. Don't restart anything you don't have to. This is especially true of collectors.

See Queue_Troubleshooting if queue pileup ensues.

ZenEventServer

Zeneventserver is the Java and Erlang portion of ZenOSS. Zeneventserver serves as the primary means of processing all events and communicating changes around the model. Zeneventserver is also responsible for communicating event system changes to connected clients via JSON calls.
Depends on:
  • zends
  • memcached
  • rabbitmq-server
If any of these daemons fail to start or are left in an abnormal state, zeneventserver will fail to start.
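A pre-flight check along these lines can confirm the dependencies are at least accepting connections before zeneventserver is started. This is a sketch, not shipped Zenoss code; the ports are common defaults (zends 13306, memcached 11211, rabbitmq-server 5672) and may differ on your install.

```python
# Illustrative pre-flight check for zeneventserver's dependencies.
import socket

# Assumed default ports; verify against your own configuration.
DEPENDENCIES = {"zends": 13306, "memcached": 11211, "rabbitmq-server": 5672}

def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connect to host:port succeeds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

def missing_dependencies(host="localhost", deps=DEPENDENCIES):
    """Names of dependencies that are not accepting connections."""
    return sorted(name for name, port in deps.items()
                  if not is_listening(host, port))
```

If `missing_dependencies()` returns a non-empty list, starting zeneventserver is pointless until those daemons are healthy.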
Depended on by:
  • zenhub
  • zeneventd
  • zenactions
If zeneventserver fails to start or is in an undead state, these daemons will fail to start and/or operate properly.

ZenEventD

Zeneventd is a Python-based daemon in ZenOSS that is exclusively responsible for processing event transforms. Event transforms were processed by zenhub in the past, but this often resulted in an overloaded zenhub. Zeneventd, like zenhub, supports workers and is notoriously slow when bombarded. In a worst-case scenario, more workers are better, even if some are occasionally idle. AnyWorker is supported but has caused problems in the past.
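The "more workers" advice follows from simple queueing arithmetic. The sketch below is an illustration of the scaling, not zeneventd internals: with one shared backlog, N workers finish in roughly backlog/N time, so a temporarily idle worker is cheap insurance against an event storm.

```python
# Illustration only: why extra workers help a slow consumer.
# Events are split as evenly as possible among workers running in
# parallel, so the busiest worker sets the wall-clock pace.
def makespan(num_events, num_workers, cost_per_event=1.0):
    """Estimated wall time to drain a backlog with N parallel workers."""
    base, extra = divmod(num_events, num_workers)
    busiest = base + (1 if extra else 0)
    return busiest * cost_per_event

print(makespan(4000, 1))  # 4000.0
print(makespan(4000, 4))  # 1000.0
print(makespan(4001, 4))  # 1001.0 -- one worker gets the remainder
```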

ZenHub

Zenhub is a Python-based daemon in ZenOSS that is responsible for messaging between the performance daemons, the event system and the ZODB. Zenhub primarily handles daemon configuration and is usually the cause of daemon configuration problems. In the adopted ZenOSS topology, each collector has its own dedicated zenhub to avoid queue invalidation issues during mass daemon configuration (e.g. restarting a collector). Two to four workers per zenhub is generally enough for a fully loaded collector. AnyWorker is supported but has caused problems in the past.
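On a standard install the worker count is set in $ZENHOME/etc/zenhub.conf. The excerpt below assumes the stock `workers` option name; verify it against the config file shipped with your version.

```
# $ZENHOME/etc/zenhub.conf (excerpt)
# Two to four workers per fully loaded collector, per the guidance above.
workers 4
```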

ZenActionD

Zenactiond is the newfangled replacement for the ever-so-hated zenactions. Zenactiond is responsible for executing the actions associated with notifications, such as paging, email and running commands. Zenactiond periodically inspects the 'Signal' queue for signal messages and dumps them into its share of memcached, at which point it acts on the messages as instructed by the associated notification. Zenactiond will continue to act on repeating events until it receives a signal to stop. That said, in situations where zeneventd, zeneventserver or zenhub are plugged up with high queues or stop processing messages, zenactiond will continue to page and otherwise harass notification recipients until the queues are flushed and zenactiond is restarted.

Relstorage subsystem

The relstorage subsystem is composed of MySQL server in a tortilla wrapper (zends), Zope and the RelStorage adapter module.