Zenoss tuning


Zope

Zope is the foundation of the object database and the web UI. Proper tuning of Zope is crucial for a positive user experience.

Key items to tune:

  • python-check-interval
  • zserver-threads
  • pool-size
  • cache-local-mb
  • cache-size
  • cache-module-name
  • cache-servers


python-check-interval

The python-check-interval setting controls how often the Python interpreter checks for pending signals and thread switches, so it directly affects threading behavior. The value should be recalculated periodically as load changes.

To calculate the value, SSH to the master, su to the zenoss user, and run the following in zendmd:

import math; from test import pystone; int(math.ceil(sum(s[1] for s in (pystone.pystones() for i in range(3)))/150.0))

The returned value is your python-check-interval. Compute this number after Zenoss has settled from a startup or reconfiguration period; otherwise the value will be less accurate.
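For example, if zendmd returns 1448 (a hypothetical result; yours will differ), set that number in $ZENHOME/etc/zope.conf:

python-check-interval 1448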

Recommended value: calculated as above


zserver-threads

The zserver-threads value sets the number of threads Zope will use to service web requests. Setting this value too high or too low can cause web UI problems.

Recommended value: 5

Note: Generally speaking, the value should not be greater than 5.
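In $ZENHOME/etc/zope.conf this is a single directive, for example:

zserver-threads 5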


pool-size

The pool-size option sets the number of connections held open to the ZODB for Zope. Instances with fewer than 50 users should omit this option altogether; systems with more than 50 users should set it to 10.

Recommended (fewer than 50 users) value: comment out the option

Recommended (more than 50 users) value: 10

Note: In v4.x this option is no longer present in zope.conf by default; it can be added to the <zodb_db main> stanza (see the sketch below).
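A partial sketch of where the option could be added in v4.x (the surrounding directives are illustrative, taken from the example further down this page):

<zodb_db main>
  mount-point /
  pool-size 10
  ...
</zodb_db>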

cache-local-mb

The cache-local-mb value configures how much local caching (in MB) each Zope instance will use in addition to the cache-servers (if set). Depending on the size of the instance and the number of Zope instances, this value can vary.

Recommended single zope configuration values:

  • 1-500 devices: 1536
  • 501-1000 devices: 2560
  • 1001-2000 devices: 3072
  • 2001+ devices: 4096

Recommended dual zope configuration values:

  • 1-500 devices: 768
  • 501-1000 devices: 1705
  • 1001-2000 devices: 2048
  • 2001+ devices: 3072


Note: See Multi-Zope for more information.
Note: Many Zope configurations will require special evaluation.

Warning: Don't overcommit your available memory!
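As a quick sanity check on memory use: a dual zope configuration at 2001+ devices reserves roughly 2 x 3072 MB = 6 GB of RAM for local ZODB caching alone, before the Zope processes themselves, memcached, and the database are counted.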


cache-size

The cache-size value dictates how many objects each Zope instance should cache. Try to keep this value at roughly 115% of the total number of objects in the global catalog.

To calculate the value, SSH to the master, su to the zenoss user, and run the following in zendmd:

len(dmd.global_catalog)

The returned value is the size of the global catalog in entries. Take 115% of this value for your cache-size.
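For example (hypothetical numbers): if len(dmd.global_catalog) returns 875,000 entries, then 875,000 x 1.15 = 1,006,250, so a cache-size of roughly 1,000,000 is reasonable; this matches the zope.conf example below.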

Recommended value: 115% of the global catalog length, calculated as above.

Note: Excess allocation offers no benefit.


cache-module-name

The cache-module-name entry instructs zope to use a specific caching plugin. In all cases use memcache.

Recommended value: memcache


cache-servers

The cache-servers entry directs the caching plugin to a specific cache server. All but advanced ZenOSS configurations should set this to 127.0.0.1:11211, the localhost memcached server.

Recommended value: 127.0.0.1:11211

Note: Some distributed installations run memcached on a dedicated box; in that case, point this setting at that server's IP address and ensure port 11211 is open.
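A quick way to confirm that the cache server is reachable from the Zenoss master (assuming nc is installed):

echo stats | nc 127.0.0.1 11211

A healthy memcached responds with a list of STAT lines terminated by END.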


Zope.conf examples

File: $ZENHOME/etc/zope.conf

zodb_db stanza

<zodb_db main>
  mount-point /
  cache-size 1000000 #<- Len of global catalog is 875,000  ->#
  %import relstorage
  <relstorage>
    cache-local-mb 4096 #<- Device count equals 12,500 ->#
    cache-servers 127.0.0.1:11211 #<- memcached running on the master, default memcached port 11211 ->#
    cache-module-name memcache #<- using memcached plugin ->#
    keep-history false
    <mysql>
      host localhost
      port 3306
      db zodb
      user zenoss
      passwd zenoss
    </mysql>
  </relstorage>
</zodb_db>
Note: This is only a partial example of zope.conf.


Zeneventserver

Zeneventserver is the core of the ZenOSS v4.x event processing system.


Key items to tune:

  • Optimize flag
  • JDBC test on borrow workaround


Optimize flag

By default Zeneventserver is configured to issue an 'optimize table' against several key tables. Under certain circumstances this can cause the system to stop processing events while the tables are locked. In theory, optimizing these tables yields only a minor performance improvement, since the tables in question are frequently partitioned off.

In $ZENHOME/etc/zeneventserver.conf, uncomment 'zep.database.optimize_minutes' and set it to 0.

Recommended value: 0
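The resulting line in $ZENHOME/etc/zeneventserver.conf should look something like this (assuming the stock properties-style format):

zep.database.optimize_minutes=0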


JDBC test on borrow workaround

By default Zeneventserver does not validate the status of reused connections. In some circumstances this can cause zeneventserver to lose connectivity with mysqld/zends.

In $ZENHOME/etc/zeneventserver.conf, uncomment 'zep.jdbc.pool.test_on_borrow' and set it to true.

Recommended value: true
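Similarly, the resulting line should look something like:

zep.jdbc.pool.test_on_borrow=true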


JVM Memory Limit

By default Zeneventserver will use all the memory in your system, and perhaps even the memory in servers nearby. (Just kidding.) In any case, it's a good idea to define a maximum allocation for the Java VM.

Edit the $ZENHOME/bin/zeneventserver script and add JVM_ARGS="$JVM_ARGS -Xmx768m" after the other JVM arguments. Save the file and restart zeneventserver.

Recommended value (small environments): 768m

Recommended value (medium environments): 1024m

Recommended value (large environments): 4096m

Recommended value (jumbo environments): 8192m
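For example, the added line for a medium environment would be (only the new line is shown; the rest of the stock script is assumed unchanged):

JVM_ARGS="$JVM_ARGS -Xmx1024m"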


Zeneventd

Zeneventd is a daemon that consumes from the zep.rawevents queue and populates the zep.zenevents RabbitMQ queue. The main purpose of zeneventd is to execute transforms against events before they are persisted. In Zenoss Core, zeneventd is a single-threaded, non-worker daemon. More than one zeneventd instance may be spawned to work through a high rawevents queue.


Key items to tune:

  • cachesize
  • workers


cachesize

This value tells zeneventd how large its cache should be. Depending on the event rate it may need to be increased; as a guideline, use 20,000 for small instances, 75,000 for medium instances, and 150,000 for large instances.

Recommended (small instance) value: 20000

Recommended (medium instance) value: 75000

Recommended (large instance) value: 150000
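For example, a medium instance would carry this line in $ZENHOME/etc/zeneventd.conf (assuming the standard 'option value' format of the Zenoss daemon config files):

cachesize 75000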

workers

This value dictates how many workers Zeneventd will spawn. This option is available only if the EnterpriseCollectors ZenPack is installed.


Zenhub

Zenactiond

Zengomd

memcached

Commandline Arguments

Memcached comes equipped with basic documentation about its command-line arguments. Run memcached -h or man memcached for up-to-date documentation. The service strives to have mostly sensible defaults.
When setting up memcached for the first time, you should pay attention to -m, -d, and -v. -m tells memcached how much RAM to use for item storage (in megabytes). Note carefully that this isn't a global memory limit, so memcached will use a few percent more memory than you tell it to. Set this to a safe value: setting it to less than 48 megabytes does not work properly in 1.4.x and earlier; it will still use the memory.
-d tells memcached to daemonize. If you're running from an init script you may not be setting this. If you're using memcached for the first time, it can be educational to start the service without -d and watch it.
-v controls verbosity to STDOUT/STDERR. Multiple -v's increase verbosity. A single one prints extra startup information, and multiple will print increasingly verbose information about requests hitting memcached. If you're curious to see if a test script is doing what you expect it to, running memcached in the foreground with a few verbose switches is a good idea.
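Putting those flags together, a first manual run might look like this (foreground, verbose, with an illustrative cache size), followed by the daemonized form for normal use:

memcached -m 2048 -vv
memcached -d -m 2048

If you start memcached as root, also add -u <user>, since it will refuse to run as root otherwise.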

Everything else comes with sensible defaults; you should alter these only if necessary.



Init Scripts


If you have installed memcached from your OS's package management system, odds are it already comes with an init script. These scripts usually provide an alternative way to configure the startup options memcached receives, such as an /etc/sysconfig/memcached file (see the example below). Check for these before editing init scripts or writing your own.
If you're building memcached yourself, the 'scripts/' directory in the source tarball contains several examples of init scripts.
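On RHEL/CentOS-style packages the init script typically reads /etc/sysconfig/memcached, which looks roughly like this (values illustrative):

PORT="11211"
USER="memcached"
MAXCONN="1024"
CACHESIZE="2048"
OPTIONS=""

CACHESIZE maps to -m, MAXCONN to -c, and anything in OPTIONS is passed through verbatim.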


Multiple Instances

Running multiple local instances of memcached is trivial. If you're maintaining a developer environment or a localhost test cluster, simply change the port it listens on, e.g.: memcached -p 11212.
There is an unmerged (as of this writing) set of example init scripts for managing multiple instances over at bug 82. This will likely be merged (in some fashion) for 1.4.6.



Networking

By default memcached listens on TCP and UDP ports, both 11211. -l allows you to bind to specific interfaces or IP addresses. Memcached does not spend much, if any, effort in ensuring its defensibility from random internet connections, so you must not expose memcached directly to the internet or to any untrusted users. Using SASL authentication here helps, but should not be totally trusted.

TCP

-p changes where it will listen for TCP connections. When changing the port via -p, the port for UDP will follow suit.

UDP


-U modifies the UDP port, defaulting to on. UDP is useful for fetching or setting small items, not as useful for manipulating large items. Setting this to 0 will disable it, if you're worried.

Unix Sockets

If you wish to restrict a daemon to be accessible by a single local user, or just don't wish to expose it via networking, a unix domain socket may be used. -s <file> is the parameter you're after. If you enable this, TCP/UDP will be disabled.
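For example (the socket path and permissions are illustrative):

memcached -d -s /var/run/memcached/memcached.sock -a 0700 -m 512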

Connection Limit

By default the max number of concurrent connections is set to 1024. Configuring this correctly is important. Extra connections to memcached may hang while waiting for slots to free up. You may detect if your instance has been running out of connections by issuing a stats command and looking at "listen_disabled_num". That value should be zero or close to zero.
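An illustrative check (output shown is hypothetical):

$ echo stats | nc localhost 11211 | grep listen_disabled_num
STAT listen_disabled_num 0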
Memcached can scale with a large number of connections very simply. The amount of memory overhead per connection is low (even lower if the connection is idle), so don't sweat setting it very high.
Let's say you have 5 webservers, each running Apache. Each Apache process has a MaxClients setting of 12. This means that the maximum number of concurrent connections you may receive is 5 x 12 (60). Always leave a few extra slots open if you can, for administrative tasks, adding more webservers, crons/scripts/etc.



Threading

Threading is used to scale memcached across CPUs. Its model is "worker threads", meaning that each thread makes itself available to process as much work as possible. Since libevent allows good scalability with concurrent connections, each thread is able to handle many clients.
This is different from some webservers, such as apache, which use one process or one thread per active client connection. Since memcached is highly efficient, low numbers of threads are fine. In webserver land, it means it's more like nginx than apache.
By default 4 threads are allocated. Unless you are running memcached extremely hard, you should not set this number to be any higher. Setting it to very large values (80+) will not make it run any faster.
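The thread count is controlled with -t; for example, to state the default explicitly:

memcached -d -m 1024 -t 4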


Inspecting Running Configuration

$ echo "stats settings" | nc localhost 11211
STAT maxbytes 67108864
STAT maxconns 1024
STAT tcpport 11211
STAT udpport 11211
STAT inter NULL
STAT verbosity 0
STAT oldest 0
STAT evictions on
STAT domain_socket NULL
STAT umask 700
STAT growth_factor 1.25
STAT chunk_size 48
STAT num_threads 4
STAT stat_key_prefix :
STAT detail_enabled no
STAT reqs_per_event 20
STAT cas_enabled yes
STAT tcp_backlog 1024
STAT binding_protocol auto-negotiate
STAT auth_enabled_sasl no
STAT item_size_max 1048576
END

Cool, huh? Between 'stats' and 'stats settings', you can double-check that what you're telling memcached to do is what it's actually trying to do.

The memcached configuration file can be found at the following path:

/etc/sysconfig/memcached 

rabbitmq-server

RabbitMQ is known to refuse to start if the hostname of the server is changed. You can prevent this with the following steps.

(Do not perform these steps if you are an Enterprise customer and are intending to use a High Availability setup.)

echo 'NODENAME=rabbit@localhost' > /etc/rabbitmq/rabbitmq-env.conf


service rabbitmq-server restart


rabbitmqctl add_user zenoss zenoss
rabbitmqctl add_vhost /zenoss
rabbitmqctl set_permissions -p /zenoss zenoss '.*' '.*' '.*'


service rabbitmq-server restart


rm -f /var/log/rabbitmq/*zenoss-base*
rm -rf /var/lib/rabbitmq/mnesia/*zenoss-base*
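You can verify the result with standard rabbitmqctl commands:

rabbitmqctl list_users
rabbitmqctl list_vhosts
rabbitmqctl list_permissions -p /zenoss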


Adding a sudoers entry for the zenoss user. Run the following commands on the Zenoss master server.

 su - zenoss
 sudo visudo

Add the following line to the file:

 %zenoss ALL = NOPASSWD: /usr/sbin 

Zends

The ZenDS configuration below is for a 15,000-device Zenoss instance with a fairly high flow rate. Tune innodb_buffer_pool_size and innodb_buffer_pool_instances for your hardware. Avoid changing innodb_thread_concurrency and thread_pool_size.

[mysqld]
socket = /var/lib/zends/zends.sock
pid-file = /var/run/zends/zends.pid
basedir = /opt/zends
datadir = /opt/zends/data
port = 13306
user = zenoss
innodb_file_per_table
skip_external_locking

#
# Per the current Zenoss Resource Manager Install Guide,
# please size innodb_buffer_pool_size according to the following
# guidelines:
#
# Deployment Size       Value of innodb_buffer_pool_size
# --------------------  --------------------------------
#    1 to  250 devices   512M
#  250 to  500 devices   768M
#  500 to 1000 devices  1024M
# 1000 to 2000 devices  2048M
#
# If ZenDS is installed on a dedicated system, this can (and should) be set
# to as much as 75% of available memory on the system.
#
innodb_buffer_pool_size = 16G
# log file size should be 25% of buffer pool size
innodb_log_file_size = 512M
innodb_additional_mem_pool_size = 32M
innodb_log_buffer_size = 8M
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2

# In previous releases of MySQL, this was recommended to be set to 2 times the
# number of CPUs, however the default and recommended option in 5.5 is to not
# set a bound on the thread pool size.
innodb_thread_concurrency = 0

# Setting this setting to 0 is recommended in virtualized environments. If
# running virtualized, it is recommended to uncomment the setting below when
# seeing database performance issues.
#innodb_spin_wait_delay = 0

# In large installs, there were a significant number of mutex waits on the 
# adaptive hash index, and this needed to be disabled.
#innodb_adaptive_hash_index = OFF

# Use the Barracuda file format which enables support for dynamic and 
# compressed row formats.
innodb_file_format = Barracuda

# Enable the thread pool plug-in - recommended on 5.5.16 and later.
plugin-load = thread_pool.so
thread_pool_size = 32

# Disable the query cache - it provides negligible performance improvements
# and leads to significant thread contention under load.
query_cache_size = 0
query_cache_type = OFF

max_allowed_packet = 64M
wait_timeout = 86400

# 1,000 is usually enough for moderately large deployments
max_connections = 1000

# Enable dedicated purge thread. (default is 0)
innodb_purge_threads = 1

# Introduce operation lag to allow purge operations. (default is 0)
innodb_max_purge_lag = 0

# Set buffer pool instances (cpu core count for physical machines, subtract one for VMs)
innodb_buffer_pool_instances = 16

[client]
socket = /var/lib/zends/zends.sock
user = zenoss

[mysql]
max_allowed_packet = 64M
prompt = "zends> "

[mysqldump]
max_allowed_packet = 64M

If using ZenOSS Analytics, be sure to include the following under [mysqld]:

open_files_limit=200000
event_scheduler=ON

# Need to tweak key_buffer_size, 25% of system memory for analytics
key_buffer_size = 64G

# This setting is for Innodb files, Analytics 4.x uses very few of these
innodb_buffer_pool_size = 8G

mysqld

The base config that comes with MySQL is not optimized for performance and therefore requires tweaking to obtain maximum performance with Zenoss.


The following config is a good base config to use, even for a smaller instance of Zenoss (/etc/my.cnf):

[mysqld]
skip_external_locking
innodb_buffer_pool_size = 512M
innodb_log_file_size = 64M
innodb_additional_mem_pool_size = 32M
innodb_log_buffer_size = 8M
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
innodb_thread_concurrency = 8
innodb_file_format = Barracuda
innodb_purge_threads = 1
innodb_max_purge_lag = 100000
query_cache_size = 0
query_cache_type = OFF
max_allowed_packet = 64M
wait_timeout = 86400
interactive_timeout = 86400
max_connections = 500
max_user_connections = 500

[client]
user = zenoss

[mysql]
max_allowed_packet = 64M

[mysqldump]
max_allowed_packet = 64M


After adding the above options you must do the following (this must be done any time innodb_log_file_size changes):

service mysql stop
rm /var/lib/mysql/ib_logfile*
service mysql start


If running a larger instance then you can increase the following settings (the values shown here are examples, adjust according to your needs):

innodb_buffer_pool_size = 4096M
innodb_log_file_size = 512M
max_user_connections = 1500


You can also optionally add the following settings to potentially improve performance on busier databases:

innodb_purge_threads = 1
innodb_max_purge_lag = 100000


It is also highly recommended that you perform the tuning described in the Optimize flag and JDBC test on borrow workaround sections of this page.

zenSOS

Not yet complete.