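Recently my website has occasionally become unreachable. Each time I noticed, I logged in to the server and restarted the nginx process by hand.
So I decided to introduce monit: it checks whether the HTTPS port responds normally and restarts the nginx service automatically when it does not.
monit can monitor services such as Apache and MySQL to keep downtime to a minimum, and it automatically starts a monitored process when that process is in an abnormal state.
Linux also offers similar facilities such as watchdog and systemd, but their feature sets differ from monit's.
The system used here is Rocky Linux 8.7; the contents of /etc/os-release are shown below.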
NAME="Rocky Linux"
VERSION="8.7 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.7"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.7 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.7"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.7"
Installing monit
# yum install monit -y
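Starting monit at boot
Use the systemctl command to configure automatic startup at boot.
The is-enabled subcommand shows the current setting: disabled means the monit service does not start at boot, and enabled means it does.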
# systemctl is-enabled monit
disabled
Use the enable subcommand to turn on automatic startup at boot.
# systemctl enable monit
Created symlink /etc/systemd/system/multi-user.target.wants/monit.service → /usr/lib/systemd/system/monit.service.
Check the boot-time setting of the monit service again; it now reports enabled.
# systemctl is-enabled monit
enabled
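The check-then-enable steps above can be folded into one small helper, which may be handy in a provisioning script. This is only a sketch: the function name ensure_enabled is made up for illustration, and it relies on nothing beyond the systemctl is-enabled and systemctl enable subcommands already used above.

```shell
#!/bin/sh
# ensure_enabled UNIT (hypothetical helper, not part of monit or systemd)
# Enables UNIT at boot only when `systemctl is-enabled` does not already
# report "enabled", so running it repeatedly is harmless.
ensure_enabled() {
    if [ "$(systemctl is-enabled "$1" 2>/dev/null)" = "enabled" ]; then
        echo "$1: already enabled"
    else
        systemctl enable "$1" >/dev/null 2>&1 && echo "$1: now enabled"
    fi
}
```

Run as root, ensure_enabled monit should print one of the two messages depending on the current state.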
Configuring monit
monit's configuration file is /etc/monitrc. With the comment lines and blank lines stripped out, its contents are as follows.
# cat /etc/monitrc |grep -v '^#' | grep -v '^$'
set daemon 30 # check services at 30 seconds intervals
set log syslog
set httpd port 2812 and
    use address localhost  # only accept connection from localhost (drop if you use M/Monit)
    allow localhost        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'
    #with ssl {            # enable SSL/TLS and set path to server certificate
    #    pemfile: /etc/ssl/certs/monit.pem
    #}
include /etc/monit.d/*
This shows that after reading /etc/monitrc, monit also reads every file under /etc/monit.d/. So I create an nginx.conf file in /etc/monit.d/ and put the nginx-related settings there.
The resulting /etc/monit.d/nginx.conf looks like this.
# vi /etc/monit.d/nginx.conf
check host sys-blog.net with address sys-blog.net
    start program = "/usr/bin/systemctl start nginx" with timeout 60 seconds
    stop program = "/usr/bin/systemctl stop nginx"
    if failed host sys-blog.net port 443 protocol https for 3 cycles then restart
After editing the configuration, run monit -t to check that the syntax is correct.
# monit -t
Control file syntax OK
Once the syntax check passes, restart the monit service to activate monitoring of nginx.
# systemctl restart monit
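The two steps above (syntax check, then restart) can be chained so that a broken configuration never gets loaded. A minimal sketch, assuming that monit -t exits with a non-zero status when the control file has errors; restart_monit_if_valid is a hypothetical name chosen for this example.

```shell
#!/bin/sh
# restart_monit_if_valid (hypothetical helper)
# Runs the syntax check first and restarts monit only when the check succeeds.
# Assumption: `monit -t` returns a non-zero exit status on a syntax error.
restart_monit_if_valid() {
    if monit -t >/dev/null 2>&1; then
        systemctl restart monit
    else
        echo "monit -t reported errors; monit was not restarted" >&2
        return 1
    fi
}
```

The function returns 1 without touching the running service when the check fails, so it can be used safely at the end of a deployment script.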
Once monitoring is active, the nginx access log shows monit requesting the site every 30 seconds (per the set daemon 30 setting in /etc/monitrc) as a health check.
# grep "Monit/5.30.0" /var/log/nginx/access.log
47.92.126.152 - - [29/Oct/2023:11:30:56 +0800] "GET / HTTP/1.1" 200 62272 "-" "Monit/5.30.0" "-"
47.92.126.152 - - [29/Oct/2023:11:31:26 +0800] "GET / HTTP/1.1" 200 62295 "-" "Monit/5.30.0" "-"
47.92.126.152 - - [29/Oct/2023:11:31:56 +0800] "GET / HTTP/1.1" 200 62272 "-" "Monit/5.30.0" "-"
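To see at a glance whether these health checks keep succeeding, the access log can be summarized by status code. The following is a minimal sketch, assuming the log format shown above (the user agent is the third quoted field); monit_status_counts is a hypothetical name, and the sample input is copied from the log lines above.

```shell
#!/bin/sh
# monit_status_counts (hypothetical helper)
# Reads nginx access-log lines on stdin and prints "<status> <count>" for
# requests whose user agent starts with "Monit/".
# Splitting each line on double quotes: field 3 is " <status> <bytes> ",
# field 6 is the user agent.
monit_status_counts() {
    awk -F'"' '$6 ~ /^Monit\// {
        split($3, part, " ")      # part[1] = status code, part[2] = body bytes
        hits[part[1]]++
    }
    END { for (code in hits) print code, hits[code] }'
}

# Sample input: the three log lines shown above.
monit_status_counts <<'EOF'
47.92.126.152 - - [29/Oct/2023:11:30:56 +0800] "GET / HTTP/1.1" 200 62272 "-" "Monit/5.30.0" "-"
47.92.126.152 - - [29/Oct/2023:11:31:26 +0800] "GET / HTTP/1.1" 200 62295 "-" "Monit/5.30.0" "-"
47.92.126.152 - - [29/Oct/2023:11:31:56 +0800] "GET / HTTP/1.1" 200 62272 "-" "Monit/5.30.0" "-"
EOF
```

On the server the same function could be fed the real log with monit_status_counts < /var/log/nginx/access.log.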
Next, stop nginx by hand and confirm that monit starts it again. After running systemctl stop nginx, /var/log/monit.log shows the following.
# tail -f /var/log/monit.log
~ omitted ~
[2023-10-29T11:49:31+0800] warning : 'sys-blog.net' failed protocol test [HTTP] at [sys-blog.net]:443 [TCP/IP TLS] -- Connection refused
[2023-10-29T11:50:01+0800] warning : 'sys-blog.net' failed protocol test [HTTP] at [sys-blog.net]:443 [TCP/IP TLS] -- Connection refused
[2023-10-29T11:50:31+0800] error : 'sys-blog.net' failed protocol test [HTTP] at [sys-blog.net]:443 [TCP/IP TLS] -- Connection refused
[2023-10-29T11:50:31+0800] info : 'sys-blog.net' trying to restart
[2023-10-29T11:50:31+0800] info : 'sys-blog.net' stop: '/usr/bin/systemctl stop nginx'
[2023-10-29T11:50:31+0800] info : 'sys-blog.net' start: '/usr/bin/systemctl start nginx'
[2023-10-29T11:51:01+0800] info : 'sys-blog.net' connection succeeded to [sys-blog.net]:443 [TCP/IP TLS]
~ omitted ~
The monit log above shows that after three consecutive failed checks, monit ran the stop and start commands to restart nginx, and the next check succeeded.
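The "for 3 cycles then restart" rule amounts to counting consecutive failures and acting once the count reaches three. With a 30-second cycle, that puts the restart roughly 60 to 90 seconds after nginx goes down, which matches the timestamps in the log. The sketch below only illustrates that counting logic; it is not how monit is implemented, and the function names and the replayed probe results are made up for the example.

```shell
#!/bin/sh
# Sketch of the "for 3 cycles then restart" rule; this is not monit's implementation.
THRESHOLD=3   # matches "for 3 cycles" in /etc/monit.d/nginx.conf

# next_count COUNT STATUS: prints the new consecutive-failure count.
# STATUS is a probe exit code: 0 resets the count, anything else increments it.
next_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(($1 + 1))
    fi
}

# decide COUNT: prints "restart" once COUNT reaches THRESHOLD, otherwise "wait".
decide() {
    if [ "$1" -ge "$THRESHOLD" ]; then
        echo restart
    else
        echo wait
    fi
}

# Replay a made-up sequence of probe results, one per cycle.
# A real probe could be something like: curl -fsS -o /dev/null https://sys-blog.net/
count=0
for status in 1 1 0 1 1 1; do
    count=$(next_count "$count" "$status")
    echo "probe=$status failures=$count action=$(decide "$count")"
done
```

Only the last cycle prints action=restart, because the success in the third cycle resets the counter. This is why a single slow response does not trigger a restart.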
Common commands
Here are two commonly used monit commands.
Use monit summary to view an overview.
# monit summary
Monit 5.30.0 uptime: 0m
┌─────────────────────────────────┬────────────────────────────┬───────────────┐
│ Service Name                    │ Status                     │ Type          │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ moto001                         │ OK                         │ System        │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ mysql                           │ OK                         │ Process       │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ sys-blog.net                    │ OK                         │ Remote Host   │
└─────────────────────────────────┴────────────────────────────┴───────────────┘
Use monit status to view detailed status.
# monit status
Monit 5.30.0 uptime: 0m
Remote Host 'sys-blog.net'
status OK
monitoring status Monitored
monitoring mode active
on reboot start
port response time 153.430 ms to sys-blog.net:443 type TCP/IP using TLS (certificate valid for 208 days) protocol HTTP
data collected Sun, 29 Oct 2023 12:13:53
Process 'mysql'
status OK
monitoring status Monitored
monitoring mode active
on reboot start
pid 1057
parent pid 1
uid 27
effective uid 27
gid 27
uptime 1d 22h 45m
threads 33
children 0
cpu 0.1%
cpu total 0.1%
memory 8.5% [144.7 MB]
memory total 8.5% [144.7 MB]
security attribute -
filedescriptors 50 [0.2% of 32184 limit]
total filedescriptors 50
read bytes 903.2 B/s [89.9 MB total]
disk read bytes 0 B/s [51.3 MB total]
disk read operations 0.3 reads/s [25288 reads total]
write bytes 17.7 kB/s [1.5 GB total]
disk write bytes 32.8 kB/s [2.6 GB total]
disk write operations 3.8 writes/s [290820 writes total]
data collected Sun, 29 Oct 2023 12:13:53
System 'moto001'
status OK
monitoring status Monitored
monitoring mode active
on reboot start
load average [0.00] [0.02] [0.00]
cpu 0.7%usr 0.3%sys 0.0%nice 0.0%iowait 0.1%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
memory usage 1.1 GB [68.3%]
swap usage 0 B [0.0%]
uptime 1d 22h 45m
boot time Fri, 27 Oct 2023 13:28:35
filedescriptors 1984 [1.2% of 169338 limit]
data collected Sun, 29 Oct 2023 12:13:53
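Summary
For a personal blog, using monit to monitor and restart services is practical and has a low barrier to entry. Enterprise systems, however, more often adopt a dedicated monitoring platform such as Zabbix.
For more monit configuration options, see the official documentation.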
https://mmonit.com/monit/documentation/monit.html