===== Configure nginx_status and a Nagios check to monitor it (+ performance data) =====
==== Install / configure Nginx status ====
Requirement: Nginx must be compiled with the "//--with-http_stub_status_module//" option.
Create the configuration file: /etc/nginx/sites-enabled/status
server {
    listen 127.0.0.1:80;
    listen [::1]:80;
    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        allow ::1;
        deny all;
    }
}
service nginx restart
URL: http://127.0.0.1/nginx_status
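The status URL returns a small plain-text report. The following is a minimal sketch (Python 3 syntax, unlike the Python 2 plugin further down) of how that report can be turned into numbers. The sample text and the function name ''parse_stub_status'' are assumptions made for illustration; the sample values mirror the ones used in the plugin's help text.

```python
import re

# Sample stub_status output (illustrative values, not taken from a live server).
SAMPLE = (
    "Active connections: 1 \n"
    "server accepts handled requests\n"
    " 64 64 64 \n"
    "Reading: 0 Writing: 1 Waiting: 0 \n"
)


def parse_stub_status(text):
    """Turn stub_status text into a dict of integer counters."""
    active = re.search(r"Active connections:[ \t]*(\d+)", text)
    # The three totals sit alone on their own line: accepts, handled, requests.
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:[ \t]*(\d+)[ \t]+Writing:[ \t]*(\d+)[ \t]+Waiting:[ \t]*(\d+)",
        text,
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

The plugin below reads the same four lines by position instead of using regular expressions; either approach depends on the report keeping this layout.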
==== Check for "Nginx status" (check_nginx_status) ====
The Python check is compatible with PNP4Nagios because it emits performance data.
Example: monitoring active connections.
/usr/local/nagios/libexec/check_nginx_status -H 127.0.0.1 -t active_conns -w 150 -c 200
NginxStatus.ActiveConnections OK [ 91 ] | ac=91;acc=86; han=96; req=196; err=2; rpc=5; rps=2; cps=10; dreq=25; dcon=20; read=0; writ=1; wait=0; ct=20ms;
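Everything after the ''|'' in the output line is the performance data. As a rough sketch (Python 3 syntax), this is one way to split that part into label/value pairs for manual inspection. The helper name ''parse_perfdata'' is hypothetical, and the parsing rules are inferred only from the sample line above, not from any PNP4Nagios code.

```python
import re

# The sample output line shown above.
SAMPLE_LINE = (
    "NginxStatus.ActiveConnections OK [ 91 ] | ac=91;acc=86; han=96; req=196; "
    "err=2; rpc=5; rps=2; cps=10; dreq=25; dcon=20; read=0; writ=1; wait=0; ct=20ms;"
)


def parse_perfdata(line):
    """Return {label: (value, unit)} for the part after the '|' separator."""
    _, sep, perf = line.partition("|")
    if not sep:
        return {}  # no performance data in this line
    values = {}
    for item in perf.split(";"):
        label, eq, raw = item.strip().partition("=")
        # Accept an integer optionally followed by a unit such as "ms".
        match = re.fullmatch(r"(\d+)([A-Za-z%]*)", raw)
        if eq and match:
            values[label] = (int(match.group(1)), match.group(2))
    return values


if __name__ == "__main__":
    for label, (value, unit) in parse_perfdata(SAMPLE_LINE).items():
        print(label, value, unit)
```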
Example nrpe.cfg entry
command[check_nginx_status]=/usr/local/nagios/libexec/check_nginx_status -H 127.0.0.1 -t active_conns -w 15 -c 30
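The ''-w'' and ''-c'' values are compared against the measured value, and the result is reported through the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below (Python 3 syntax) mirrors the "greater than or equal" comparison the plugin uses; the function name ''evaluate'' is an assumption made for illustration.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warning, critical):
    """Map a measured value to a Nagios state; critical is checked first."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


if __name__ == "__main__":
    # Thresholds from the nrpe.cfg entry above: -w 15 -c 30
    for active_conns in (3, 20, 91):
        print(active_conns, evaluate(active_conns, 15, 30))
```

With the thresholds of the NRPE entry, 91 active connections would be reported as CRITICAL, 20 as WARNING and 3 as OK.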
Example service configuration (Nagios server).
define service{
use local-service,srv-pnp
host_name busindre.com
service_description Nginx ac
check_command check_nrpe!check_nginx_status
}
Naturally, the check can also be run directly from the Nagios server, without NRPE, if nginx status is configured to accept requests from outside.
#!/usr/bin/python
#
# (c) copyright 2012-2015 dogtown@mare-system.de
#
# License: GPL v2
#
# dload: https://bitbucket.org/maresystem/dogtown-nagios-plugins
#
#
# credits / this plugin is inspired by:
# check_nginx
# yangzi2008@126.com
# http://exchange.nagios.org/directory/Plugins/Web-Servers/nginx/check_nginx/details
#
# check_nginx_status.pl
# regis.leroy@gmail.com
# http://exchange.nagios.org/directory/Plugins/Web-Servers/nginx/check_nginx_status-2Epl/details
#
# gustavo chaves for finding and fixing some bugs
#
# reimplemented by dogtown with more features
#
# TODO:
# - make -a work
# - make -t test1,test2,test available w/ -w 1,2,3 -c 1,2,3
#
#
import string, urllib2, getopt, sys, time, os
version = "0.3.0.59 - rc-2 - 2015-02-02"
### default_values
# default status url
url = "/nginx_status"
# default host to check
host = "localhost"
# default nginx_port; is set to 443 if -s is used
port = 80
# default result-file for calculations, see -r/-n options
# set to 0 if you want to deactivate this feature globally,
# can be turned on using -r if so
result_file = "/tmp/check_nginx.results"
# definitions
def usage():
    print """
check_nginx_status is a Nagios-Plugin
to monitor nginx status and alerts on various values,
based on the output from HttpStubStatus - Module
it also creates, based on the returned values, a csv to store data
Usage:
check_nginx_status [-H|--HOST] [-p|--port] [-u|--url] [-a|--auth] [-s|--ssl]
[-t|--test] [-w|--warning] [-c|--critical]
[-o|--output] [-r|--resultfile][-n|--noresult]
[-h|--help] [-v|--version] [-d|--debug]
Options:
--help|-h)
print check_nginx_status help
--HOST|-H)
Sets nginx host
Default: localhost
--port|-p)
Sets connection-port
Default: 80/http, 443/https
--ssl|-s)
Turns on SSL
Default: off
--url|-u)
Sets nginx status url path.
Default: /nginx_status
--auth|-a)
Sets nginx status BasicAuth user:password.
Default: off
***
--test|-t)
Sets the test(check)_value for w/c
if used, -w/-c is mandatory
Default: checktime
possible Values:
active_conns -> active connections
accepts_err -> difference between accepted and
handled requests (should be 0)
requests -> check for requests/connection
reading -> actual value for reading headers
writing -> value for active requests
waiting -> actual keep-alive-connections
checktime -> checks if this check need more than
given -w/-c milliseconds
--calculated checks ---------------
rps -> requests per seconds
cps -> connections per second
dreq -> delta requests to the previous one
dcon -> delta connections to the previous one
these checks are calculated at runtime with a timeframe
between the latest and the current check; time is
extracted from the timestamp of the result_file
to disable calculation (no files are written) use -n;
you cannot use -t [rps,cps,dreq,dcon] with -n; this
will raise an error and the plugin returns UNKNOWN
see -r - option for an alternate filepath for temporary results
--warning|-w)
Sets a warning level for selected test(check)
Default: off
--critical|-c)
Sets a critical level for selected test(check)
Default: off
--debug|-d)
turn on debugging - messages (use this for manual testing,
never via nagios-checks; beware of the messy output)
Default: off
--version|-v)
display version and exit
--output|-o)
output only values from selected tests in perfdata; if used w/out -t
the check returns the value for active connections
--resultfile|-r)
/path/to/check_nginx.results{.csv}
please note, besides the values from the current check
(e.g. check_nginx.results) a second
file is created, if not existent, and written on each plugin-run
(check_nginx.results.csv), containing a historic view of all
extracted values
default: /tmp/check_nginx.results{.csv}
--noresult|-n)
never write a results-file; CANNOT be used with calculated checks
-t [rps|cps|dreq|dcon]
default: off
*** ) -> please don't use this option, not implemented or not functional
Examples:
just get all perfdata, url is default (/nginx_status)
./check_nginx_status --HOST www.example.com
just get active connections perfdata
./check_nginx_status -H www.example.com -o
check plugin_checktime, alert on > 10ms (warning) or > 50ms (critical) and output
only perfdata for that value
./check_nginx_status -H www.example.com -u /status -t checktime -w 10 -c 50 -o
check for requests per second, alert on > 300 (warning) / 1000 (critical) requests per second
./check_nginx_status -H www.example.com -u /status -t rps -w 300 -c 1000
Check for accepts_errors
./check_nginx_status -H www.example.com -t accepts_err -w 1 -c 50
Performancedata:
NginxStatus.Check OK | ac=1;acc=64; han=64; req=64; err=0; rpc=1; rps=0; cps=0; dreq=1;
dcon=1; read=0; writ=1; wait=0; ct=6ms;
ac -> active connections
acc -> totally accepted connections
han -> totally handled connections
req -> total requests
err -> diff between acc - han, thus errors
rpc -> requests per connection (req/han)
rps -> requests per second (calculated) from last checkrun vs actual values
cps -> connections per second (calculated) from last checkrun vs actual values
dreq -> request-delta from last checkrun vs actual values
dcon -> accepted-connection-delta from last checkrun vs actual values
read -> reading requests from clients
writ -> reading request body, processes request, or writes response to a client
wait -> keep-alive connections, actually it is ac - (read + writ)
ct -> checktime (connection time) for this check
rps/cps/dreq/dcon are always set to 0 if -n is used
Nginx-Config
be sure to have your nginx compiled with Status-Module
(--with-http_stub_status_module), you might want to test
your installation with nginx -V
http://wiki.nginx.org/HttpStubStatusModule
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
Requirements:
nginx compiled with HttpStubStatusModule (see Nginx-Config)
python 2.x
this plugin is not yet compatible with python 3.x,
but should be converted easily, using 2to3
Docs & Download:
https://bitbucket.org/maresystem/dogtown-nagios-plugins
"""
def ver():
    print """
check_nginx_status
version : %s
usage : check_nginx_status -h
""" % version
def print_debug(dtext):
    if debug == 1:
        print "[d] %s" % dtext
    return(0)
def calculate():
    rps = cps = dreq = dcon = 0
    if result_file != 0:
        if not os.path.isfile(result_file):
            print_debug("no result_file found, creating with next run :: %s " % result_file)
        else:
            # read the values stored by the previous run
            try:
                f = open(result_file, "r")
                ro = f.readline().split(";")
                f.close()
                o_ac = int(ro[0])
                o_acc = int(ro[1])
                o_han = int(ro[2])
                o_req = int(ro[3])
                o_err = int(ro[4])
                o_rpc = int(ro[5])
                o_rps = int(ro[6])
                o_cps = int(ro[7])
                o_dreq = int(ro[8])
                o_dcon = int(ro[9])
                o_read = int(ro[10])
                o_writ = int(ro[11])
                o_wait = int(ro[12])
                o_ct = int(ro[13])
                # elapsed time is taken from the mtime of the result_file
                now = int(time.time())
                last = int(os.path.getmtime(result_file))
                dtime = now - last
                # a smaller counter than last time means nginx was restarted
                if req >= o_req:
                    dreq = req - o_req
                else:
                    dreq = req
                rps = int(dreq / dtime)
                if acc >= o_acc:
                    dcon = acc - o_acc
                else:
                    dcon = acc
                cps = int(dcon / dtime)
            except:
                print_debug("cannot read/process result_file :: %s \n use -r" % result_file)
                #return(rps, cps, dreq, dcon)
    else:
        if test in ("rps", "cps", "dreq", "dcon"):
            print "NginxStatus.%s UNKNOWN - noresult selected (-n), cannot calculate test_results" % (test)
            sys.exit(3)
        print_debug("noresult selected, return 0_values")
        return(rps, cps, dreq, dcon)
    print_debug("writing result_file -> %s" % result_file)
    try:
        f = open(result_file, "w")
        f.write("%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s;\n" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct))
        f.close()
    except:
        print_debug("cannot create result_file :: %s \n use -r" % result_file)
        return(rps, cps, dreq, dcon)
    csv = "%s.csv" % result_file
    if not os.path.isfile(csv):
        try:
            print_debug("creating result_file.csv -> %s" % result_file)
            f = open(csv, "w")
            f.write(""""active conns"; "accepted"; "handled"; "requests"; "req_errors"; "reqs per conn"; "reqs per sec"; "conns per sec"; "delta reqs"; "delta conns"; "reading"; "writing"; "waiting"; "checktime"; \n""")
            f.close()
        except:
            print "ERR.cannot create result_file.csv :: %s \n use -r" % csv
    print_debug("writing result_file.csv -> %s.csv" % result_file)
    try:
        f = open(csv, "a")
        # one row per plugin run, terminated by a newline
        f.write("%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s;\n" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct))
        f.close()
    except:
        print "ERR.cannot write result_file.csv :: %s \n use -r" % csv
    return(rps, cps, dreq, dcon)
#### main
exot = 0
ssl = 0
debug = 0
test = 0
w = c = 0
output = 0
user = passwd = 0
msg = "CheckNginx - UNKNOWN"
perfdata = ""
try:
    iv = sys.argv[1]
except:
    print """
usage: check_nginx_status -h
-----------------------------------------------------"""
    usage()
    sys.exit(3)
try:
    # long options that take a value need a trailing "=";
    # -a/--auth is parsed but not implemented yet (see TODO)
    options,args = getopt.getopt(sys.argv[1:],"ndovshH:p:u:a:w:c:t:r:",["help","ssl","debug","version","HOST=","port=","auth=","test=","url=","warning=","critical=","output","resultfile=","noresult"])
except getopt.GetoptError:
    usage()
    sys.exit(3)
for name,value in options:
    if name in ("-H","--HOST"):
        host = "%s" % value
    elif name in ("-u","--url"):
        url = value
    elif name in ("-a","--auth"):
        user, passwd = value.split(":")
    elif name in ("-s","--ssl"):
        ssl = 1
    elif name in ("-r","--resultfile"):
        result_file = "%s" % value
    elif name in ("-t","--test"):
        test = "%s" % value
    elif name in ("-d","--debug"):
        debug = 1
    elif name in ("-o","--output"):
        output = 1
    elif name in ("-n","--noresult"):
        result_file = 0
    elif name in ("-p","--port"):
        try:
            port = int(value)
        except:
            print("""%s Usage.ERROR - -p [PORT] must be an Integer """ % msg)
            exot = 3
    elif name in ("-w","--warning"):
        try:
            w = int(value)
        except:
            print("""%s Usage.ERROR -w [WARNING] must be an Integer """ % msg)
            exot = 3
    elif name in ("-c","--critical"):
        try:
            c = int(value)
        except:
            print("""%s Usage.ERROR -c [CRITICAL] must be an Integer """ % msg)
            exot = 3
    elif name in ("-v","--version"):
        ver()
        sys.exit(0)
    else:
        usage()
        ver()
        sys.exit(0)
if exot != 0:
    sys.exit(exot)
# creating test-url
if host.find("http") > -1:
    print("""%s Usage.ERROR - use -H [hostname], NOT -H [http://hostname] (%s)""" % (msg, host))
    sys.exit(3)
if ssl == 1:
    turl = "https://%s" % host
    print_debug("setting HTTP**S**")
else:
    turl = "http://%s" % host
    print_debug("setting HTTP")
if port != 80:
    turl = "%s:%s" % (turl, port)
    print_debug("setting Port: %s" % port)
curl = "%s%s" % (turl, url)
print_debug("final url to fetch: %s" % curl)
# start_time for checktime-calculation
st = time.time()
try:
    req = urllib2.Request(curl)
    response = urllib2.urlopen(req)
    status = response.readlines()
    print_debug("returned_status from url: \n %s" % " ".join(status))
    response.close()
    #~ if 'user' in dir() and 'passwd' in dir():
    #~ passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    #~ passman.add_password(None, curl, user, passwd)
    #~ authhandler = urllib2.HTTPBasicAuthHandler(passman)
    #~ opener = urllib2.build_opener(authhandler)
    #~ urllib2.install_opener(opener)
    # TODO: http://stackoverflow.com/questions/2712524/handling-urllib2s-timeout-python
except Exception:
    print "%s: Error while getting Connection :: %s " % (msg, curl)
    sys.exit(3)
if len(status) == 0:
    print "%s: No values found in %s " % (msg , curl)
    sys.exit(3)
# end_time for checktime-calculation
et = time.time()
try:
    ct = int((et - st)*1000)
    # line 1: "Active connections: N"
    l1 = status[0]
    ac = int(l1.split(":")[1].strip())
    # line 3: accepted, handled and requests totals
    l2 = status[2]
    acc, han, req = l2.split()
    acc = int(acc)
    han = int(han)
    req = int(req)
    err = acc - han
    rpc = int(req/han)
    # line 4: "Reading: N Writing: N Waiting: N"
    l3 = status[3]
    read = int((l3.split("Reading:")[1]).split()[0])
    writ = int((l3.split("Writing:")[1]).split()[0])
    wait = int((l3.split("Waiting:")[1]).split()[0])
except:
    print "%s: Error while trying to convert values from status_url %s " % (msg , curl)
    for line in status:
        print " :: %s" % line.strip()
    sys.exit(3)
# calculate results, if wanted
rps, cps, dreq, dcon = calculate()
# creating needed output
print_debug("""-- status-report (perfdata)---
active_conns : %s
accepted conns : %s
handled : %s
requests : %s
accept_errors : %s
req per conn : %s
req per second : %s
conn per second : %s
delta requests : %s
delta conns : %s
reading : %s
writing : %s
waiting : %s
checktime : %s ms
""" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct ))
#~ if test == 0:
#~ if w == 0 or c == 0:
#~ pass
#~ else:
#~ test = "checktime"
if test != 0:
    if w == 0:
        print("""Usage.ERROR :: -w [WARNING] must be set and Integer (cannot be 0)""")
        sys.exit(3)
    if c == 0:
        print("""Usage.ERROR :: -c [CRITICAL] must be set and Integer (cannot be 0)""")
        sys.exit(3)
# default test_text
tt = "unknown"
# checking which test to perform
if test == 0:
    ta = ac
    tt = "Check"
elif test == "active_conns":
    ta = ac
    tt = "ActiveConnections"
elif test == "accepts_err":
    ta = err
    tt = "AcceptErrors"
elif test == "requests":
    ta = req
    tt = "Requests/Connection"
elif test == "reading":
    ta = read
    tt = "Reading"
elif test == "writing":
    ta = writ
    tt = "Writing"
elif test == "waiting":
    ta = wait
    tt = "Waiting"
# calculated checks
elif test == "rps":
    ta = rps
    tt = "Req_per_second"
elif test == "cps":
    ta = cps
    tt = "Conn_per_second"
elif test == "dreq":
    ta = dreq
    tt = "Delta_Requests"
elif test == "dcon":
    ta = dcon
    tt = "Delta_Conn"
else:
    ta = ct
    tt = "CheckTime"
print_debug("set test: %s" % tt)
dt = "NginxStatus.%s" % tt
# creating perfdata
if output == 1:
    if test == 0:
        perfdata="active_conns=%s;" % ta
    else:
        perfdata = "%s=%s;" % (test, ta)
else:
    perfdata = "ac=%s;acc=%s; han=%s; req=%s; err=%s; rpc=%s; rps=%s; cps=%s; dreq=%s; dcon=%s; read=%s; writ=%s; wait=%s; ct=%sms;" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct )
print_debug("perfdata: %s" % perfdata)
if test == 0:
    print "%s OK [ %s ] | %s" % (dt, ta, perfdata)
    sys.exit(0)
if ta >= c:
    print "%s CRITICAL: %s | %s" % (dt, ta, perfdata)
    sys.exit(2)
elif ta >= w:
    print "%s WARNING: %s | %s" % (dt, ta, perfdata)
    sys.exit(1)
else:
    print "%s OK [ %s ] | %s" % (dt, ta, perfdata)
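The calculated checks (rps, cps, dreq, dcon) compare the current counters with those stored by the previous run. The following standalone sketch (Python 3 syntax, not part of the plugin) isolates that idea. The function name ''delta_and_rate'' is hypothetical, and the guard for a zero-second interval is an addition of this sketch; the plugin itself relies on its ''except'' clause in that case.

```python
def delta_and_rate(previous_total, current_total, elapsed_seconds):
    """Return (delta, per_second) for a monotonically growing counter.

    If the counter went backwards, nginx was presumably restarted, so the
    current value is treated as the whole delta (same rule as the plugin).
    """
    if current_total >= previous_total:
        delta = current_total - previous_total
    else:
        delta = current_total
    # Assumption of this sketch: report a rate of 0 when no time has passed.
    if elapsed_seconds <= 0:
        return delta, 0
    return delta, delta // elapsed_seconds


if __name__ == "__main__":
    # 300 new requests over 60 seconds
    print(delta_and_rate(100, 400, 60))
    # counter reset after a restart: 50 requests over 10 seconds
    print(delta_and_rate(400, 50, 10))
```

Because the rate is an integer division, fewer requests than elapsed seconds yields 0, which is worth keeping in mind when choosing ''-w''/''-c'' for ''-t rps'' on low-traffic hosts.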