Latest revision as of 22:55, 5 June 2014
Logical Model
Below is the ball of wire that is ZenOSS. While it looks confusing, it is a huge improvement over the earlier architecture in both speed and scalability.
Note: Some of this information may require updating for version 4.2.3+.

Event subsystem
The event subsystem is built around rabbitmq-server, a message-queuing daemon. Inside rabbit are a number of queues that exchange messages across an internal bus known as the exchange; in ZenOSS v4.1 the exchange is named 'zenoss'. In more than one situation rabbit can become constipated and stop passing messages.
To check the message queues, execute the following as root:
# rabbitmqctl list_queues -p /zenoss
Example output:
Listing queues ...
zenoss.queues.zep.migrated.summary 0
zenoss.queues.zep.migrated.archive 0
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.heartbeats 0
zenoss.queues.zep.zenevents 4189532
zenoss.queues.gom.8e294eb0-f44a-11e0-a475-00221903d46c 0
zenoss.queues.zep.modelchange 0
zenoss.queues.zep.signal 0
zenoss.queues.gom.events.fanout 0
...done.
Note: the line 'zenoss.queues.zep.zenevents 4189532' clearly shows that rabbit is in over its head. Do not restart rabbit in a case like this. Doing so will cause rabbit to hang on restart and complicate the purging.
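A quick way to spot a backed-up queue is to scan this listing programmatically. The sketch below is hypothetical (the `find_backed_up_queues` helper and the 100,000-message threshold are our own choices, not part of ZenOSS); it simply parses the plain-text output of `rabbitmqctl list_queues`:

```python
# Hypothetical helper: flag queues whose depth exceeds a threshold, given
# the plain-text output of `rabbitmqctl list_queues -p /zenoss`.
def find_backed_up_queues(listing, threshold=100000):
    backed_up = []
    for line in listing.splitlines():
        parts = line.split()
        # Queue lines look like: '<queue_name> <message_count>'
        if len(parts) == 2 and parts[1].isdigit():
            name, depth = parts[0], int(parts[1])
            if depth > threshold:
                backed_up.append((name, depth))
    return backed_up

# Abbreviated sample taken from the example output above.
sample = """Listing queues ...
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.zenevents 4189532
zenoss.queues.zep.signal 0
...done."""

for name, depth in find_backed_up_queues(sample):
    print("%s is backed up: %d messages" % (name, depth))
```

Feed it the real `rabbitmqctl` output (e.g. via `subprocess`) and treat any hit as a cue to investigate before queues grow to the point where a restart becomes dangerous.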
To purge a large queue use the flusher.py script located in $ZENHOME. The script must be run as the zenoss user.
flusher.py:
#!/usr/bin/env python
import sys
from amqplib.client_0_8.connection import Connection
import Globals
from Products.ZenUtils.GlobalConfig import getGlobalConfiguration

if len(sys.argv) < 2:
    print >> sys.stderr, "Usage: flusher.py <queue_name> [...]"
    sys.exit(1)

global_conf = getGlobalConfiguration()
hostname = global_conf.get('amqphost', 'localhost')
port = global_conf.get('amqpport', '5672')
username = global_conf.get('amqpuser', 'zenoss')
password = global_conf.get('amqppassword', 'zenoss')
vhost = global_conf.get('amqpvhost', '/zenoss')
ssl = global_conf.get('amqpusessl', '0')
use_ssl = True if ssl in ('1', 'True', 'true') else False

conn = Connection(host="%s:%s" % (hostname, port), userid=username,
                  password=password, virtual_host=vhost, ssl=use_ssl)
channel = conn.channel()
for queue in sys.argv[1:]:
    print "Purging queue: %s" % queue
    channel.queue_purge(queue)
channel.close()
conn.close()
In our example, one can clear this queue by executing `env python flusher.py zenoss.queues.zep.zenevents` as the zenoss user. As root, this can be done in one line by running the command as the zenoss user:
$ rabbitmqctl list_queues -p /zenoss
$ su - zenoss -c "cd $ZENHOME; ./flusher.py zenoss.queues.zep.zenevents"
Once a queue has been cleared, any daemons associated with that queue need to be restarted, as they hold a large cache of the queue that can only be purged by a restart.
Warning: Don't purge queues needlessly. It should only be done if something has gone wrong and you're out of options. Chances are if the queues are insane something is broken elsewhere and it will happen again.
Warning: Restarting ZenOSS will usually make queue and event problems worse. Don't restart anything you do not have to. This is especially true of collectors.

Zeneventserver
Zeneventserver is the Java and Erlang portion of ZenOSS. Zeneventserver serves as the primary means of processing all events and communicating changes around the model. Zeneventserver is also responsible for communicating event system changes to connected clients via JSON calls.
- Depends on:
- zends / mysqld
- memcached
- rabbitmq-server
If any of these daemons fail to start or are left in an abnormal state zeneventserver will fail to start.
- Depended on by:
- zenhub
- zeneventd
- zenactions
If zeneventserver fails to start or is in an undead state these daemons will fail to start and/or operate properly.
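As a rough illustration, the dependencies above can be sanity-checked with a quick TCP probe before blaming zeneventserver itself. This is a hypothetical helper, not part of ZenOSS, and it assumes the stock ports (3306 for zends/mysqld, 11211 for memcached, 5672 for rabbitmq-server); adjust hosts and ports for your deployment.

```python
# Hypothetical probe: report which zeneventserver dependencies refuse a
# TCP connection. Assumed default ports; adjust for your deployment.
import socket

DEPENDENCIES = {
    'zends/mysqld': ('127.0.0.1', 3306),
    'memcached': ('127.0.0.1', 11211),
    'rabbitmq-server': ('127.0.0.1', 5672),
}

def down_services(services, timeout=2.0):
    """Return names of services whose TCP port refuses a connection."""
    down = []
    for name, (host, port) in sorted(services.items()):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success, an errno on failure.
            if sock.connect_ex((host, port)) != 0:
                down.append(name)
        finally:
            sock.close()
    return down

if __name__ == '__main__':
    for name in down_services(DEPENDENCIES):
        print("%s appears to be down" % name)
```

A port answering only proves something is listening, not that the daemon is healthy, but it quickly separates "dependency never started" from "zeneventserver is wedged".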
Zeneventd
Zeneventd is a python based daemon in ZenOSS that is exclusively responsible for processing event transforms. Event transforms were processed by zenhub in the past, but this often resulted in an overloaded zenhub. Zeneventd, like zenhub, supports workers and is notoriously slow when bombarded. Efficient transforms are key to proper zeneventd performance. In a worst case scenario, more workers are better, even if some are occasionally idle. AnyWorker is supported but has caused problems in the past.
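Transforms are short Python snippets that zeneventd executes with the event bound to `evt`. The sketch below is illustrative only: `FakeEvent` stands in for the real event object so the snippet can be run outside ZenOSS, and the specific summary/severity rewrite is made up. The point is that an efficient transform sticks to cheap in-memory operations; expensive work (model queries, RRD lookups, external calls) is multiplied across every matching event and is exactly what backs zeneventd up.

```python
# Stand-in for the real event object zeneventd hands to a transform as `evt`.
class FakeEvent(object):
    def __init__(self, summary, severity):
        self.summary = summary
        self.severity = severity

evt = FakeEvent("threshold of high load exceeded: current value 98.70", 4)

# The transform body itself: cheap string tests and attribute assignments,
# no model queries or I/O, so it stays fast under an event storm.
if evt.summary.startswith("threshold of high load"):
    evt.severity = 3  # downgrade Error to Warning
    evt.summary = evt.summary.split(":")[0]  # strip the noisy value suffix

print(evt.severity, evt.summary)
```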
Zenhub
Zenhub is a python based daemon in ZenOSS that is responsible for messaging between performance daemons, the event system and the ZODB. Zenhub primarily handles daemon configuration and is usually the cause of daemon configuration problems. Each collector, in the adopted ZenOSS topology, has its own dedicated Zenhub to avoid queue invalidation issues during mass daemon configuration (e.g. restarting a collector). Two to four workers per zenhub is generally enough for a fully loaded collector. AnyWorker is supported but has caused problems in the past.
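Worker counts are set per daemon in the config files under $ZENHOME/etc. A minimal sketch of the relevant line in zenhub.conf, assuming the stock `workers` option name (verify against your version's `zenhub help`):

```
# $ZENHOME/etc/zenhub.conf
# Number of worker processes; two to four per fully loaded collector.
workers 4
```

The daemon must be restarted for the change to take effect.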
Zenactiond
Zenactiond is the newfangled replacement for zenactions. Zenactiond is responsible for executing the actions associated with notifications, such as paging, email and executing commands. Zenactiond periodically inspects the 'Signal' queue for signal messages and dumps them into its share of memcached. At that point zenactiond acts on the messages as instructed by the associated notification. Zenactiond will continue to act on repeating events until it receives a signal to stop. That said, in situations where zeneventd, zeneventserver or zenhub are plugged up with high queues or stop processing messages, zenactiond will continue to harass notification recipients with pages until the queues are flushed and zenactiond is restarted.