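Nagios Data Collector
Introduction to Nagios
Nagios is a free, open-source network monitoring tool. It monitors the state of Windows, Linux, and Unix hosts, network devices such as switches and routers, and peripherals such as printers. When a system or service enters an abnormal state, it alerts operations staff immediately by email or SMS, and it sends a recovery notification once the state returns to normal. Nagios itself can only be installed on Linux or other Unix-like machines. Chinese-language material on Nagios is scarce; for full coverage, see the official documentation at https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/toc.html . Installing and configuring Nagios is outside the scope of this document. To set up a Nagios environment quickly, you can use the Docker image at https://github.com/ethnchao/docker-nagios/ , which also provides documentation in Chinese.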
Introduction to nagflux
nagflux is an open-source tool written in Go that exports data from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch. For details, see https://github.com/Griesbacher/nagflux .
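Using Nagios
nagflux continuously reads data from the spoolfiles produced by Nagios and from livestatus, converts it to the target protocol, and sends it to InfluxDB/Elasticsearch. Nagios is usually installed under /usr/local/nagios, although the path can differ between versions and images. The installation directory contains the following: etc holds the configuration files, bin holds the Nagios binaries, and libexec holds the plugins.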
root@754b2f150617:/usr/local/nagios# ls -l
total 36
drwxrwxr-x 1 nagios nagcmd 4096 May 31 2018 bin
drwxrwxr-x 1 nagios nagcmd 4096 Mar 18 00:28 etc
drwxrwxr-x 1 nagios nagcmd 4096 Mar 13 11:38 libexec
drwxrwxr-x 1 nagios nagcmd 4096 May 31 2018 sbin
drwxrwxr-x 1 nagios nagcmd 4096 May 31 2018 share
drwxrwxr-x 1 nagios nagcmd 4096 Mar 18 11:22 var
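Before pointing nagflux at livestatus, it can help to confirm that the socket answers queries. The following is a minimal sketch, not part of nagflux: the function names are hypothetical, the socket path is the one used later in this document's config.gcfg, and it assumes the livestatus broker module is loaded and the `hosts` table exposes the usual `name` and `state` columns.

```python
import socket


def build_query(table, columns):
    """Build a Livestatus query that asks for JSON output."""
    lines = [
        f"GET {table}",
        "Columns: " + " ".join(columns),
        "OutputFormat: json",
    ]
    # A Livestatus request is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path, query):
    """Send one query to the Livestatus UNIX socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


# Usage on the Nagios server (path taken from this document's setup):
#   reply = query_livestatus("/usr/local/nagios/var/livestatus",
#                            build_query("hosts", ["name", "state"]))
#   print(reply)
```

If the call raises a connection error, the socket path or the broker_module line in nagios.cfg is the first thing to check.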
Spoolfiles come in two types, host and service. Their locations are set in /usr/local/nagios/etc/nagios.cfg with the following directives:
service_perfdata_file=/var/spool/nagios/graphios/service-perfdata
host_perfdata_file=/var/spool/nagios/graphios/host-perfdata
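To see which spoolfile paths a given nagios.cfg actually sets, the two directives can be read out with a few lines of Python. This is an illustrative sketch only: `read_perfdata_paths` is a hypothetical helper, and it assumes the plain `key=value` layout shown above.

```python
def read_perfdata_paths(cfg_text):
    """Return the perfdata file paths found in nagios.cfg-style text."""
    wanted = ("host_perfdata_file", "service_perfdata_file")
    paths = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in wanted:
            paths[key.strip()] = value.strip()
    return paths


sample = """\
service_perfdata_file=/var/spool/nagios/graphios/service-perfdata
host_perfdata_file=/var/spool/nagios/graphios/host-perfdata
"""
print(read_perfdata_paths(sample))
```

On a real server, pass the contents of /usr/local/nagios/etc/nagios.cfg instead of `sample`.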
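The spoolfiles are continually overwritten, so they must be moved out promptly for nagflux to process. Moving them requires the command definitions below. $TIMET$ is one of the built-in Nagios macros; each move appends a Unix-timestamp suffix such as .1584502640 to the file name.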
root@754b2f150617:/usr/local/nagios/etc# cat mycommand.cfg
define command {
command_name move_perf_host
command_line /bin/mv /var/spool/nagios/graphios/host-perfdata /var/spool/nugflux/host-perfdata.$TIMET$
}
define command {
command_name move_perf_service
command_line /bin/mv /var/spool/nagios/graphios/service-perfdata /var/spool/nugflux/service-perfdata.$TIMET$
}
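The effect of the two commands above can be reproduced outside Nagios to see the resulting file names. The sketch below is an assumption-laden stand-in for `/bin/mv ... .$TIMET$`: `move_with_timestamp` is a hypothetical helper, and the demo runs in a temporary directory so that no real spoolfile is touched.

```python
import os
import shutil
import tempfile
import time


def move_with_timestamp(src, dest_dir, now=None):
    """Move src into dest_dir, appending a Unix-timestamp suffix like $TIMET$."""
    stamp = int(time.time() if now is None else now)
    dest = os.path.join(dest_dir, f"{os.path.basename(src)}.{stamp}")
    shutil.move(src, dest)
    return dest


# Demo in a throwaway directory with a fixed timestamp.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "host-perfdata")
    open(src, "w").close()
    out_dir = os.path.join(tmp, "nugflux")
    os.mkdir(out_dir)
    moved = move_with_timestamp(src, out_dir, now=1584502640)
    print(os.path.basename(moved))  # host-perfdata.1584502640
```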
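Nagios runs these commands only when nagios.cfg references them through host_perfdata_file_processing_command and service_perfdata_file_processing_command. For the command file itself to be loaded, also add the following line to nagios.cfg.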
cfg_file=/usr/local/nagios/etc/mycommand.cfg
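Once the configuration is complete, restart Nagios and check that files keep appearing in /var/spool/nugflux/ (or whichever directory you configured; make sure the Nagios process has write permission to it).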
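The check described above can be scripted. The following sketch is illustrative: both helper names are hypothetical, the file-name pattern follows the command definitions in this document, and `os.access` tests permissions for the user running the script, so it should be run as the Nagios user.

```python
import os
import re

# Matches names such as host-perfdata.1584502640 or service-perfdata.1584502650.
SUFFIX = re.compile(r"^(host|service)-perfdata\.(\d+)$")


def pending_spoolfiles(names):
    """Return (name, timestamp) pairs for moved spoolfiles, oldest first."""
    found = []
    for name in names:
        match = SUFFIX.match(name)
        if match:
            found.append((name, int(match.group(2))))
    return sorted(found, key=lambda item: item[1])


def check_spool_dir(path):
    """Report whether path is writable by the current user and what it holds."""
    writable = os.access(path, os.W_OK | os.X_OK)
    return writable, pending_spoolfiles(os.listdir(path))


# Usage on the Nagios server:
#   writable, files = check_spool_dir("/var/spool/nugflux")
```

Running `check_spool_dir` a few times in a row shows whether new timestamped files keep arriving.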
Using nagflux
Copy the nagflux binary to the Nagios server and start it with the command below. If -configPath is omitted, nagflux uses config.gcfg in the current directory.
./nagflux -configPath=/path/to/config.gcfg
With the parameters unrelated to our dataway removed, config.gcfg looks as follows; the important settings are commented.
root@754b2f150617:/usr/local/nugflux# cat config.gcfg
[main]
NagiosSpoolfileFolder = "/var/spool/nugflux" # destination directory of the moved spoolfiles
NagiosSpoolfileWorker = 1 # number of goroutines reading files
InfluxWorker = 2 # number of goroutines sending to dataway
MaxInfluxWorker = 5 # maximum number of goroutines sending to dataway
DumpFile = "nagflux.dump" # file that stores data when sending fails
NagfluxSpoolfileFolder = "/usr/local/nugflux/spool" # folder for files in nagflux's own CSV import format
FieldSeparator = "&" # CSV field separator; do not change
BufferSize = 10000 # message queue size of each InfluxWorker
FileBufferSize = 65536 # read buffer size for files
# If the performancedata does not have a certain target set with NAGFLUX:TARGET.
# The following field will define the target for this data.
# "all" sends the data to all Targets(every Influxdb, Elasticsearch...)
# a certain name will direct the data to this certain target
DefaultTarget = "all" # do not change
[Log]
# leave empty for stdout
LogFile = "" # log file name; if empty, logs go to the terminal
# List of Severities https://godoc.org/github.com/kdar/factorlog#Severity
MinSeverity = "DEBUG" # log level
[Monitoring]
# leave empty to disable
# PrometheusAddress = ":8080"
PrometheusAddress = ":10000" # nagflux also implements a Prometheus exporter; this address exposes nagflux memory usage, number of files processed, and similar metrics
[Livestatus]
# tcp or file
Type = "file" # do not change; "file" means a UNIX socket file
# tcp: 127.0.0.1:6557 or file /var/run/live
Address = "/usr/local/nagios/var/livestatus" # set through the following line in nagios.cfg:
# broker_module=/usr/local/lib/mk-livestatus/livestatus.o /usr/local/nagios/var/livestatus
# The number of minutes to wait for livestatus to come up; if set to 0 the detection is disabled
MinutesToWait = 2 # minutes to wait when Address is not found; do not change
# Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
# If left empty Nagflux will try to detect it on its own, which will not always work.
Version = "" # do not change
[ModGearman "example"] # ModGearman settings; do not change
Enabled = false
Address = "127.0.0.1:4730"
Queue = "perfdata"
# Leave Secret and SecretFile empty to disable encryption
# If both are filled the Secret will be used
# Secret to encrypt the gearman jobs
Secret = ""
# Path to a file which holds the secret to encrypt the gearman jobs
SecretFile = "/etc/mod-gearman/secret.key"
Worker = 1
[InfluxDBGlobal] # InfluxDBGlobal settings; do not change
CreateDatabaseIfNotExists = true
NastyString = ""
NastyStringToReplace = ""
HostcheckAlias = "hostcheck"
ClientTimeout = 5
[InfluxDB "nagflux"]
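Enabled = true # whether this target is enabled
Version = 1.0 # must be greater than 0.9, otherwise no data is uploaded; do not change
Address = "http://10.100.64.106:19528" # dataway address
Arguments = "precision=ms&db=xx&rp=xx" # URL parameters for dataway; add more as needed
# precision: timestamp precision; db: the token used for dataway authentication; rp: retention policy
StopPullingDataIfDown = true # whether to stop reading spoolfiles after a dataway ping fails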
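The Arguments value is an ordinary URL query string, so it can be checked before nagflux is started. The sketch below is only an illustration: `parse_arguments` is a hypothetical helper, and the `xx` values are the placeholders from the configuration above, not real tokens.

```python
from urllib.parse import parse_qs


def parse_arguments(arguments):
    """Split an Arguments string into a {name: value} dict (first value wins)."""
    parsed = parse_qs(arguments, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


args = parse_arguments("precision=ms&db=xx&rp=xx")
print(args)  # {'precision': 'ms', 'db': 'xx', 'rp': 'xx'}

# Flag parameters this document describes that are missing from the string.
missing = [name for name in ("precision", "db", "rp") if name not in args]
print(missing)  # []
```

An empty `missing` list means the three parameters described above are all present; whether their values are valid is still up to dataway.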