===== Configure nginx_status and a Nagios check to monitor it (+ performance data) =====

==== Install / configure Nginx status ====

Requirement: Nginx must be compiled with the "//--with-http_stub_status_module//" option.

Create the configuration file: /etc/nginx/sites-enabled/status

<code>
server {
    listen 127.0.0.1:80;
    listen [::1]:80;

    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
</code>

Restart Nginx so the new configuration is loaded:

<code>
service nginx restart
</code>

URL: http://127.0.0.1/nginx_status

==== Check for "Nginx status" (check_nginx_status) ====

The check is written in Python and is compatible with PNP4Nagios because it outputs performance data.

Example: monitoring active connections.

<code>
/usr/local/nagios/libexec/check_nginx_status -H 127.0.0.1 -t active_conns -w 150 -c 200
NginxStatus.ActiveConnections OK [ 91 ] | ac=91;acc=86; han=96; req=196; err=2; rpc=5; rps=2; cps=10; dreq=25; dcon=20; read=0; writ=1; wait=0; ct=20ms;
</code>

Example entry in nrpe.cfg:

<code>
command[check_nginx_status]=/usr/local/nagios/libexec/check_nginx_status -H 127.0.0.1 -t active_conns -w 15 -c 30
</code>

Example service definition (on the Nagios server):

<code>
define service{
    use                  local-service,srv-pnp
    host_name            busindre.com
    service_description  Nginx ac
    check_command        check_nrpe!check_nginx_status
}
</code>

The check can also be run without NRPE, directly from the Nagios server, if nginx status is configured to accept requests from outside.
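To make clearer what the check reads from the status URL, here is a minimal sketch that parses a typical stub_status response into the counters used in the performance data. It is written for Python 3 (the plugin itself targets Python 2), and the function name ''parse_stub_status'', the sample text and the dictionary keys are assumptions chosen for illustration, not part of the plugin.

```python
# Sketch only: a Python 3 parser for stub_status text.
# SAMPLE_STATUS and parse_stub_status are hypothetical names.
SAMPLE_STATUS = """Active connections: 1
server accepts handled requests
 64 64 64
Reading: 0 Writing: 1 Waiting: 0
"""


def parse_stub_status(text):
    """Turn stub_status text into a dict of integer counters.

    The keys mirror the perfdata labels (ac, acc, han, req, ...).
    """
    lines = text.strip().splitlines()

    # First line: "Active connections: N"
    ac = int(lines[0].split(":")[1])

    # Third line: accepted, handled and total requests
    acc, han, req = (int(v) for v in lines[2].split())

    # Fourth line: "Reading: a Writing: b Waiting: c"
    tokens = lines[3].split()
    activity = dict(zip(tokens[0::2], tokens[1::2]))

    return {
        "ac": ac,
        "acc": acc,
        "han": han,
        "req": req,
        "err": acc - han,                  # accepted but not handled
        "rpc": req // han if han else 0,   # requests per connection
        "read": int(activity["Reading:"]),
        "writ": int(activity["Writing:"]),
        "wait": int(activity["Waiting:"]),
    }


if __name__ == "__main__":
    print(parse_stub_status(SAMPLE_STATUS))
```

In a live setup the text would come from the status URL instead of the sample string; the parsing step stays the same.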
<code python>
#!/usr/bin/python
#
# (c) copyright 2012-2015 dogtown@mare-system.de
#
# License: GPL v2
#
# dload: https://bitbucket.org/maresystem/dogtown-nagios-plugins
#
#
# credits / this plugin is inspired by:
#   check_nginx
#       yangzi2008@126.com
#       http://exchange.nagios.org/directory/Plugins/Web-Servers/nginx/check_nginx/details
#
#   check_nginx_status.pl
#       regis.leroy@gmail.com
#       http://exchange.nagios.org/directory/Plugins/Web-Servers/nginx/check_nginx_status-2Epl/details
#
#   gustavo chaves for finding and fixing some bugs
#
# reimplemented by dogtown with more features
#
# TODO:
#   - make -a work
#   - make -t test1,test2,test available w/ -w 1,2,3 -c 1,2,3
#
#

import string, urllib2, getopt, sys, time, os

version = "0.3.0.59 - rc-2 - 2015-02-02"

### default_values

# default status url
url = "/nginx_status"

# default host to check
host = "localhost"

# default nginx port; with -s the https default (443) applies
port = 80

# default result-file for calculations, see -r/-n options
# set to 0 if you want to deactivate this feature globally,
# can be turned on using -r if so
result_file = "/tmp/check_nginx.results"


# definitions

def usage():
    print """

    check_nginx_status is a Nagios-Plugin
    to monitor nginx status and alerts on various values,
    based on the output from HttpStubStatus - Module

    it also creates, based on the returned values, a csv to store data

    Usage:

        check_nginx_status [-H|--HOST] [-p|--port] [-u|--url] [-a|--auth] [-s|--ssl]
                           [-t|--test] [-w|--warning] [-c|--critical]
                           [-o|--output] [-r|--resultfile] [-n|--noresult]
                           [-h|--help] [-v|--version] [-d|--debug]

    Options:

        --help|-h)
            print check_nginx_status help

        --HOST|-H)
            Sets nginx host
            Default: localhost

        --port|-p)
            Sets connection-port
            Default: 80/http, 443/https

        --ssl|-s)
            Turns on SSL
            Default: off

        --url|-u)
            Sets nginx status url path.
            Default: /nginx_status

        --auth|-a)
            Sets nginx status BasicAuth user:password.
            Default: off
            ***

        --test|-t)
            Sets the test(check)_value for w/c
            if used, -w/-c is mandatory
            Default: checktime
            possible Values:

                active_conns    -> active connections
                accepts_err     -> difference between accepted and
                                   handled requests (should be 0)
                requests        -> check for requests/connection
                reading         -> actual value for reading headers
                writing         -> value for active requests
                waiting         -> actual keep-alive-connections
                checktime       -> checks if this check need more than
                                   given -w/-c milliseconds

            --calculated checks ---------------

                rps             -> requests per seconds
                cps             -> connections per second
                dreq            -> delta requests to the previous one
                dcon            -> delta connections to the previous one

                these checks are calculated at runtime with a timeframe
                between the latest and the current check; time is
                extracted from the timestamp of the result_file

                to disable calculation (no files are written) use -n;
                you cannot use -t [rps,cps,dreq,dcon] with -n; this
                will raise an error and the plugin returns UNKNOWN

                see -r - option for an alternate filepath for temporary results

        --warning|-w)
            Sets a warning level for selected test(check)
            Default: off

        --critical|-c)
            Sets a critical level for selected test(check)
            Default: off

        --debug|-d)
            turn on debugging - messages (use this for manual testing,
            never via nagios-checks; beware of the messy output)
            Default: off

        --version|-v)
            display version and exit

        --output|-o)
            output only values from selected tests in perfdata; if used w/out -t
            the check returns the value for active connections

        --resultfile|-r)
            /path/to/check_nginx.results{.csv}
            please note, beside the values from the actual check
            (e.g. check_nginx.results) a second file is created,
            if not existent, and written on each plugin-run
            (check_nginx.results.csv), containing a historic view
            on all extracted values
            default: /tmp/check_nginx.results{.csv}

        --noresult|-n)
            never write a results-file; CANNOT be used with calculated checks
            -t [rps|cps|dreq|dcon]
            default: off

        *** ) -> please dont use this option, not implemented or not functional

    Examples:

        just get all perfdata, url is default (/nginx_status)
        ./check_nginx_status --HOST www.example.com

        just get active connections perfdata
        ./check_nginx_status -H www.example.com -o

        check for plugin_checktime, error > 10ms (warning) or 50ms (error) and output
        only perfdata for that values
        ./check_nginx_status -H www.example.com -u /status -t checktime -w 10 -c 50 -o

        check for requests per second, alert on > 300/1000 requests per second
        ./check_nginx_status -H www.example.com -u /status -t rps -w 300 -c 1000

        Check for accepts_errors
        ./check_nginx_status -H www.example.com -t accepts_err -w 1 -c 50

    Performancedata:

        NginxStatus.Check OK | ac=1;acc=64; han=64; req=64; err=0; rpc=1; rps=0; cps=0; dreq=1; dcon=1; read=0; writ=1; wait=0; ct=6ms;

            ac      -> active connections
            acc     -> totally accepted connections
            han     -> totally handled connections
            req     -> total requests
            err     -> diff between acc - han, thus errors
            rpc     -> requests per connection (req/han)
            rps     -> requests per second (calculated) from last checkrun vs actual values
            cps     -> connections per second (calculated) from last checkrun vs actual values
            dreq    -> request-delta from last checkrun vs actual values
            dcon    -> accepted-connection-delta from last checkrun vs actual values
            read    -> reading requests from clients
            writ    -> reading request body, processes request, or writes response to a client
            wait    -> keep-alive connections, actually it is ac - (read + writ)
            ct      -> checktime (connection time) for this check

        rps/cps/dreq/dcon are always set to 0 if -n is used

    Nginx-Config

        be sure to have your nginx compiled with Status-Module
        (--with-http_stub_status_module), you might want to test
        your installation with nginx -V
        http://wiki.nginx.org/HttpStubStatusModule

        location /nginx_status {
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            deny all;
        }

    Requirements:

        nginx compiled with HttpStubStatusModule (see Nginx-Config)

        python 2.x

        this plugin is not yet compatible with python 3.x,
        but should be converted easily, using 2to3

    Docs & Download:

        https://bitbucket.org/maresystem/dogtown-nagios-plugins

    """


def ver():
    print """
    check_nginx_status

    version : %s

    usage   : check_nginx_status -h

    """ % version


def print_debug(dtext):
    if debug == 1:
        print "[d] %s" % dtext
    return(0)


def calculate():
    # uses the module-level values parsed from the status page (ac, acc, ...)
    rps = cps = dreq = dcon = 0

    if result_file != 0:
        if not os.path.isfile(result_file):
            print_debug("no result_file found, creating with next run :: %s " % result_file)
        else:
            # the try was commented out in the published script, which left
            # the except below without a matching try (a syntax error)
            try:
                f = open(result_file, "r")
                ro = f.readline().split(";")
                f.close()
                o_ac = int(ro[0])
                o_acc = int(ro[1])
                o_han = int(ro[2])
                o_req = int(ro[3])
                o_err = int(ro[4])
                o_rpc = int(ro[5])
                o_rps = int(ro[6])
                o_cps = int(ro[7])
                o_dreq = int(ro[8])
                o_dcon = int(ro[9])
                o_read = int(ro[10])
                o_writ = int(ro[11])
                o_wait = int(ro[12])
                o_ct = int(ro[13])

                # elapsed time since the previous run = age of the result_file
                now = int(time.time())
                last = int(os.path.getmtime(result_file))
                dtime = now - last

                # a smaller counter means nginx was restarted in between
                if req >= o_req:
                    dreq = req - o_req
                else:
                    dreq = req
                rps = int(dreq / dtime)

                if acc >= o_acc:
                    dcon = acc - o_acc
                else:
                    dcon = acc
                cps = int(dcon / dtime)
            except:
                print_debug("cannot read/process result_file :: %s \n use -r" % result_file)
                #return(rps, cps, dreq, dcon)
    else:
        if test in ("rps", "cps", "dreq", "dcon"):
            print "NginxStatus.%s UNKNOWN - noresult selected (-n), cannot calculate test_results" % (test)
            sys.exit(3)
        print_debug("noresult selected, return 0_values")
        return(rps, cps, dreq, dcon)

    print_debug("writing result_file -> %s" % result_file)
    try:
        f = open(result_file, "w")
        f.write("%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s;\n" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct))
        f.close()
    except:
        print_debug("cannot create result_file :: %s \n use -r" % result_file)
        return(rps, cps, dreq, dcon)

    csv = "%s.csv" % result_file
    if not os.path.isfile(csv):
        try:
            print_debug("creating result_file.csv -> %s" % result_file)
            f = open(csv, "w")
            f.write('"active conns"; "accepted"; "handled"; "requests"; "req_errors"; '
                    '"reqs per conn"; "reqs per sec"; "conns per sec"; "delta reqs"; '
                    '"delta conns"; "reading"; "writing"; "waiting"; "checktime"; \n')
            f.close()
        except:
            print "ERR.cannot create result_file.csv :: %s \n use -r" % csv

    print_debug("writing result_file.csv -> %s.csv" % result_file)
    try:
        f = open(csv, "a")
        # one row per run; the newline keeps rows from running together
        f.write("%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s; %s;\n" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct))
        f.close()
    except:
        print "ERR.cannot write result_file.csv :: %s \n use -r" % csv

    return(rps, cps, dreq, dcon)


#### main

exot = 0
ssl = 0
debug = 0
test = 0
w = c = 0
output = 0
user = passwd = 0
msg = "CheckNginx - UNKNOWN"
perfdata = ""

try:
    iv = sys.argv[1]
except:
    print """
usage: check_nginx_status -h
-----------------------------------------------------"""
    usage()
    sys.exit(3)

try:
    # options that take a value need ":" (short) or "=" (long)
    options, args = getopt.getopt(
        sys.argv[1:],
        "ndovshH:p:u:a:w:c:t:r:",
        ["help", "ssl", "debug", "version", "HOST=", "port=", "auth=",
         "test=", "url=", "warning=", "critical=", "output",
         "resultfile=", "noresult"])
except getopt.GetoptError:
    usage()
    sys.exit(3)

for name, value in options:
    if name in ("-H", "--HOST"):
        host = "%s" % value

    elif name in ("-u", "--url"):
        url = value

    elif name in ("-a", "--auth"):
        # parsed but not used yet, see TODO
        user, passwd = value.split(":")

    elif name in ("-s", "--ssl"):
        ssl = 1

    elif name in ("-r", "--resultfile"):
        result_file = "%s" % value

    elif name in ("-t", "--test"):
        test = "%s" % value

    elif name in ("-d", "--debug"):
        debug = 1

    elif name in ("-o", "--output"):
        output = 1

    elif name in ("-n", "--noresult"):
        result_file = 0

    elif name in ("-p", "--port"):
        try:
            port = int(value)
        except:
            print("""%s Usage.ERROR - -p [PORT] must be an Integer """ % msg)
            exot = 3

    elif name in ("-w", "--warning"):
        try:
            w = int(value)
        except:
            print("""%s Usage.ERROR -w [WARNING] must be an Integer """ % msg)
            exot = 3

    elif name in ("-c", "--critical"):
        try:
            c = int(value)
        except:
            print("""%s Usage.ERROR -c [CRITICAL] must be an Integer """ % msg)
            exot = 3

    elif name in ("-v", "--version"):
        ver()
        sys.exit(0)

    else:
        usage()
        ver()
        sys.exit(0)

if exot != 0:
    sys.exit(exot)

# creating test-url
if host.find("http") > -1:
    print("""%s Usage.ERROR - use -H [hostname], NOT -H [http://hostname] (%s)""" % (msg, host))
    sys.exit(3)

if ssl == 1:
    turl = "https://%s" % host
    print_debug("setting HTTP**S**")
else:
    turl = "http://%s" % host
    print_debug("setting HTTP")

if port != 80:
    turl = "%s:%s" % (turl, port)
    print_debug("setting Port: %s" % port)

curl = "%s%s" % (turl, url)
print_debug("final url to fetch: %s" % curl)

# start_time for checktime-calculation
st = time.time()
try:
    req = urllib2.Request(curl)
    response = urllib2.urlopen(req)
    status = response.readlines()
    print_debug("returned_status from url: \n %s" % " ".join(status))
    response.close()
    #~ if 'user' in dir() and 'passwd' in dir():
    #~     passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    #~     passman.add_password(None, curl, user, passwd)
    #~     authhandler = urllib2.HTTPBasicAuthHandler(passman)
    #~     opener = urllib2.build_opener(authhandler)
    #~     urllib2.install_opener(opener)
    # TODO: http://stackoverflow.com/questions/2712524/handling-urllib2s-timeout-python
except Exception:
    print "%s: Error while getting Connection :: %s " % (msg, curl)
    sys.exit(3)

if len(status) == 0:
    print "%s: No values found in %s " % (msg, curl)
    sys.exit(3)

# end_time for checktime-calculation
et = time.time()

try:
    ct = int((et - st) * 1000)

    # line 1: "Active connections: N"
    l1 = status[0]
    ac = int(l1.split(":")[1].strip())

    # line 3: accepted, handled and total requests
    l2 = status[2]
    acc, han, req = l2.split()
    acc = int(acc)
    han = int(han)
    req = int(req)
    err = acc - han
    rpc = int(req / han)

    # line 4: "Reading: a Writing: b Waiting: c"
    l3 = status[3]
    read = int((l3.split("Reading:")[1]).split()[0])
    writ = int((l3.split("Writing:")[1]).split()[0])
    wait = int((l3.split("Waiting:")[1]).split()[0])
except:
    print "%s: Error while trying to convert values from status_url %s " % (msg, curl)
    for line in status:
        print " :: %s" % line.strip()
    sys.exit(3)

# calculate results, if wanted
rps, cps, dreq, dcon = calculate()

# creating needed output
print_debug("""-- status-report (perfdata)---

    active_conns    : %s
    accepted conns  : %s
    handled         : %s
    requests        : %s
    accept_errors   : %s
    req per conn    : %s
    req per second  : %s
    conn per second : %s
    delta requests  : %s
    delta conns     : %s
    reading         : %s
    writing         : %s
    waiting         : %s
    checktime       : %s ms

""" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct))

#~ if test == 0:
#~     if w == 0 or c == 0:
#~         pass
#~     else:
#~         test = "checktime"

if test != 0:
    if w == 0:
        print("""Usage.ERROR :: -w [WARNING] must be set and Integer (cannot be 0)""")
        sys.exit(3)
    if c == 0:
        print("""Usage.ERROR :: -c [CRITICAL] must be set and Integer (cannot be 0)""")
        sys.exit(3)

# default test_text
tt = "unknown"

# checking which test to perform
if test == 0:
    ta = ac
    tt = "Check"

elif test == "active_conns":
    ta = ac
    tt = "ActiveConnections"

elif test == "accepts_err":
    ta = err
    tt = "AcceptErrors"

elif test == "requests":
    # note: this compares the total request count (req), not rpc
    ta = req
    tt = "Requests/Connection"

elif test == "reading":
    ta = read
    tt = "Reading"

elif test == "writing":
    ta = writ
    tt = "Writing"

elif test == "waiting":
    ta = wait
    tt = "Waiting"

# calculated checks
elif test == "rps":
    ta = rps
    tt = "Req_per_second"

elif test == "cps":
    ta = cps
    tt = "Conn_per_second"

elif test == "dreq":
    ta = dreq
    tt = "Delta_Requests"

elif test == "dcon":
    # the published script compared dreq here
    ta = dcon
    tt = "Delta_Conn"

else:
    ta = ct
    tt = "CheckTime"

print_debug("set test: %s" % tt)
dt = "NginxStatus.%s" % tt

# creating perfdata
if output == 1:
    if test == 0:
        perfdata = "active_conns=%s;" % ta
    else:
        perfdata = "%s=%s;" % (test, ta)
else:
    perfdata = "ac=%s;acc=%s; han=%s; req=%s; err=%s; rpc=%s; rps=%s; cps=%s; dreq=%s; dcon=%s; read=%s; writ=%s; wait=%s; ct=%sms;" % (ac, acc, han, req, err, rpc, rps, cps, dreq, dcon, read, writ, wait, ct)

print_debug("perfdata: %s" % perfdata)

# without -t there is nothing to compare: report OK with the perfdata
if test == 0:
    print "%s OK [ %s ] | %s" % (dt, ta, perfdata)
    sys.exit(0)

if ta >= c:
    print "%s CRITICAL: %s | %s" % (dt, ta, perfdata)
    sys.exit(2)
elif ta >= w:
    print "%s WARNING: %s | %s" % (dt, ta, perfdata)
    sys.exit(1)
else:
    print "%s OK [ %s ] | %s" % (dt, ta, perfdata)
    sys.exit(0)
</code>
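The calculated checks (rps, cps, dreq, dcon) and the final warning/critical comparison can be hard to follow inside the script, so the sketch below isolates that logic. It is written for Python 3, and the function names ''counter_delta'', ''per_second'' and ''nagios_state'' as well as the sample numbers are hypothetical. One deliberate difference is an assumption of this sketch: it returns a rate of 0 when no time has elapsed between two runs, rather than dividing by zero.

```python
# Sketch only (Python 3): delta, rate and threshold logic in isolation.
# All function names and sample values are illustrative assumptions.

def counter_delta(current, previous):
    """Delta between two readings of a growing counter.

    If the counter went backwards (for example after an nginx
    restart), the current value is taken as the delta.
    """
    return current - previous if current >= previous else current


def per_second(delta, elapsed_seconds):
    """Integer rate; 0 when no time has elapsed between runs."""
    if elapsed_seconds <= 0:
        return 0
    return delta // elapsed_seconds


def nagios_state(value, warning, critical):
    """Map a value to (exit_code, label); higher values are worse."""
    if value >= critical:
        return 2, "CRITICAL"
    if value >= warning:
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    # two readings of the total request counter, 300 seconds apart
    dreq = counter_delta(current=1960, previous=460)
    rps = per_second(dreq, elapsed_seconds=300)
    code, label = nagios_state(rps, warning=300, critical=1000)
    print("NginxStatus.Req_per_second %s [ %s ] | rps=%s; dreq=%s;"
          % (label, rps, rps, dreq))
    raise SystemExit(code)
```

The exit codes follow the usual Nagios convention also used by the script: 0 for OK, 1 for WARNING, 2 for CRITICAL (and 3 for UNKNOWN, not shown here).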