Queue Troubleshooting

From Zenoss Wiki


At times queues can pile up and event flow can stop entirely. The first step is to determine which queue is piling up. As the root user on the master, run `/usr/sbin/rabbitmqctl list_queues -p /zenoss`. Once you've identified the problem queue, look at that queue's associated consumer. See below for the most common cases, or ZenOSS Logical Model for advanced troubleshooting.
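
To make the backed-up queue easier to spot, the listing can be filtered down to non-empty queues and sorted by depth. This is a minimal sketch, assuming the default `list_queues` output of one queue name and one message count per line; `nonempty_by_depth` is a hypothetical helper name, not part of Zenoss or RabbitMQ.

```shell
# Hypothetical helper: reads `rabbitmqctl list_queues` output on stdin,
# keeps only lines whose last field is a positive integer (which also
# drops banner lines such as "Listing queues ..." and "...done."),
# and prints "depth name" with the deepest queue first.
nonempty_by_depth() {
    awk 'NF >= 2 && $NF ~ /^[0-9]+$/ && $NF > 0 { print $NF, $1 }' | sort -k1,1nr
}

# Usage, as root on the master:
#   /usr/sbin/rabbitmqctl list_queues -p /zenoss | nonempty_by_depth
```

Running it twice a minute or so apart gives a rough view of which queue is growing rather than merely non-empty.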

Symptom: queue not being consumed:

  • If Zep.RawEvents is filling and Zep.ZenEvents is not, the problem is likely zeneventd.
  • If Zep.ZenEvents is filling, the problem is likely zeneventserver.
  • If Zep.Signal is filling, the problem is likely zenactiond.
  • If Gom.{hash} is filling, the problem is likely zengomd.
  • If the Heartbeat queue is filling, the problem is likely zeneventserver.
  • If the celery queue is filling, the problem is likely zeneventserver.
  • If the hub.invalidation queue is filling faster than it empties for a significant period of time and the hub/collectors weren't just restarted, more hub invalidation workers might be needed.
  • If the hub.invalidation queue is filling faster than it empties for a significant period of time and the hub/collectors were just restarted, the queue may take time to empty.
  • If the hub.collectorcall queue is filling faster than it empties for a significant period of time, a collector daemon might be down. Check the collector.
  • If the tune queue depth stays above 0 for several minutes, zentune may have hung.
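
The list above can be condensed into a lookup. This is only a sketch: `suspect_daemon` is a hypothetical helper, and it matches by case-insensitive substring on the assumption that the names printed by `list_queues` may carry a prefix or differ in case from the names used here.

```shell
# Hypothetical helper: maps a queue name to the likely culprit named in
# the list above. Note that the Zep.RawEvents rule only applies when
# Zep.ZenEvents is not also filling; this function does not check that.
suspect_daemon() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *zep.rawevents*)     echo zeneventd ;;
        *zep.zenevents*)     echo zeneventserver ;;
        *zep.signal*)        echo zenactiond ;;
        *gom.*)              echo zengomd ;;
        *heartbeat*)         echo zeneventserver ;;
        *celery*)            echo zeneventserver ;;
        *hub.invalidation*)  echo "hub invalidation workers" ;;
        *hub.collectorcall*) echo "collector daemon" ;;
        *tune*)              echo zentune ;;
        *)                   echo unknown ;;
    esac
}

# Usage:
#   suspect_daemon Zep.Signal
```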


Once you've determined which daemon is at fault for the pileup, diagnose it. See below for common cases.

Daemon Problems:

Zeneventd isn't consuming its queue:
  • Check to make sure zeneventd is running and hasn't hung.
  • Verify that there aren't any transforms that could be significantly bogging zeneventd down or triggering a bug.
  • Start by looking at zeneventd.log. If nothing stands out, optionally run `zeneventd debug` to increase log verbosity.
Zenactiond isn't consuming its queue:
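
As a quick first pass over the log, the recent error lines can be counted before reading the file in detail. This is a sketch under assumptions: `recent_log_errors` is a hypothetical helper, the log is assumed to live at `$ZENHOME/log/zeneventd.log`, and problems are assumed to show up as lines containing `ERROR`, `CRITICAL`, or `Traceback`.

```shell
# Hypothetical helper: prints how many of the last N lines (default 500)
# of a log file mention ERROR, CRITICAL, or a Python traceback.
recent_log_errors() {
    log_file=$1
    lines=${2:-500}
    # grep exits non-zero when the count is 0; that is not an error here.
    tail -n "$lines" "$log_file" | grep -c -E 'ERROR|CRITICAL|Traceback' || true
}

# Usage, as the zenoss user (log path is an assumption):
#   zeneventd status
#   recent_log_errors "$ZENHOME/log/zeneventd.log"
```

The same helper works for any of the daemon logs mentioned on this page.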
  • Check to make sure zenactiond is running and hasn't hung.
  • Verify that zenactiond isn't having a problem sending messages to the local postfix daemon. Double check cpolicyd.
  • Start by looking at zenactiond.log. If nothing stands out, optionally run `zenactiond debug` to increase log verbosity.
  • Don't start zenactiond without flushing the Zep.Signal queue if the queue is very large. Otherwise zenactiond will act on those events and could send a large number of pages or emails.
Zeneventserver isn't consuming its heartbeat/zenevent queue:
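
The last point can be turned into a guard that refuses to start zenactiond while the signal queue is still deep. This is a sketch, not a Zenoss tool: `queue_depth` and `safe_to_start` are hypothetical helpers, the `zep.signal` substring and the threshold of 100 are assumptions, and how the queue is actually flushed depends on the RabbitMQ version in use, so that step is left out.

```shell
# Hypothetical helper: reads `rabbitmqctl list_queues` output on stdin and
# prints the depth of the first queue whose name contains the given
# substring (case-insensitive), or 0 if no such queue is listed.
queue_depth() {
    awk -v pat="$1" '
        BEGIN { pat = tolower(pat); depth = 0 }
        NF >= 2 && $NF ~ /^[0-9]+$/ && index(tolower($1), pat) { depth = $NF; exit }
        END { print depth }
    '
}

# Hypothetical helper: succeeds only when depth ($1) is at or below
# the threshold ($2).
safe_to_start() {
    [ "$1" -le "$2" ]
}

# Usage (the threshold of 100 is an arbitrary example value):
#   depth=$(/usr/sbin/rabbitmqctl list_queues -p /zenoss | queue_depth zep.signal)
#   if safe_to_start "$depth" 100; then
#       zenactiond start
#   else
#       echo "signal queue holds $depth messages; flush it first"
#   fi
```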
  • Check to make sure zeneventserver is running and hasn't hung.
  • Check zeneventserver.log to see whether zeneventserver is having a problem accessing zends/mysql or amqp.
  • Check zends/mysql to see whether the maximum connections limit has been reached. Consider the zeneventserver JDBC workaround if the primary consumer is zep.
  • If the log looks normal, check the process list in zends/mysql. Look for deadlocks or table optimizes. Other problem processes are possible, but these are the most common.
  • If there are no obvious problems, consider stopping zeneventserver and restarting it. Some zeneventserver processes can run for long periods and, as a result, keep it from promptly processing other queues. One example of this is the ModelState consumer.
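
The database checks above can be scripted roughly as follows. This is a sketch under assumptions: both helpers are hypothetical, the client is shown as `mysql` (on ZenDS installs the client command and credentials may differ), and the process list is assumed to be in the client's tab-separated batch format with the standard columns Id, User, Host, db, Command, Time, State, Info.

```shell
# Hypothetical helper: given the current connection count and the
# configured maximum, prints the usage as a whole percentage.
conn_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Hypothetical helper: reads tab-separated SHOW FULL PROCESSLIST rows on
# stdin and prints "Id Time Info" for non-sleeping threads that are
# running an OPTIMIZE or have been running longer than N seconds
# (default 60, an arbitrary example value).
slow_or_optimize() {
    awk -F '\t' -v max="${1:-60}" '
        $6 ~ /^[0-9]+$/ && $5 != "Sleep" &&
        ($6 + 0 > max + 0 || tolower($8) ~ /^optimize/) { print $1, $6, $8 }
    '
}

# Usage (credentials omitted):
#   mysql -N -B -e "SHOW STATUS LIKE 'Threads_connected'"
#   mysql -N -B -e "SHOW VARIABLES LIKE 'max_connections'"
#   mysql -N -B -e 'SHOW FULL PROCESSLIST' | slow_or_optimize 60
```

For deadlocks, the output of `SHOW ENGINE INNODB STATUS` includes a section on the latest detected deadlock, which is usually easier to read directly than to parse.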