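Recently my website has occasionally become unreachable. Each time I noticed, I logged in to the server and restarted the nginx process by hand.

So I decided to introduce monit: it monitors whether the HTTPS port is reachable and automatically restarts the nginx service when it is not.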

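monit can monitor services such as Apache and MySQL to keep downtime to a minimum, and it automatically starts a monitored process when that process enters an abnormal state.
Linux offers similar facilities such as watchdog and systemd, but their feature sets differ from what monit provides.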

The system used here is Rocky Linux 8.7. The contents of /etc/os-release are as follows.

NAME="Rocky Linux"
VERSION="8.7 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.7"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.7 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.7"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.7"

Installing monit

# yum install monit -y
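A note on where the package comes from: on RHEL-family distributions monit is normally shipped in the EPEL repository rather than the base repositories, so the command above presumably ran with EPEL already enabled. If it reports that no matching package exists, a sketch like the following may help. The function name install_monit_from_epel is made up for illustration.

```shell
# Hypothetical helper (not part of the original setup): enable EPEL first,
# then install monit. Assumes the monit package is provided by EPEL.
install_monit_from_epel() {
    dnf install -y epel-release || return 1
    dnf install -y monit
}
```

Run install_monit_from_epel as root. On Rocky Linux 8, yum and dnf are interchangeable here.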

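Starting monit at boot

Use systemctl to configure monit to start at boot.

The is-enabled subcommand shows the current setting: disabled means the monit service is not started at boot, enabled means it is.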

# systemctl is-enabled monit
disabled

Use the enable subcommand to switch it to start at boot.

# systemctl enable monit
Created symlink /etc/systemd/system/multi-user.target.wants/monit.service → /usr/lib/systemd/system/monit.service.

Check the boot setting of the monit service again; it is now reported as enabled.

# systemctl is-enabled monit
enabled
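The check-then-enable sequence above can be folded into a small helper for provisioning scripts. This is only a sketch; the function name ensure_enabled is an assumption, and since systemctl enable is safe to repeat anyway, the helper mainly makes the current state visible.

```shell
# Hypothetical helper: enable a systemd unit only if it is not already enabled.
ensure_enabled() {
    local unit="$1"
    local state
    # is-enabled exits non-zero for disabled units, so ignore its exit status.
    state="$(systemctl is-enabled "$unit" 2>/dev/null || true)"
    if [ "$state" = "enabled" ]; then
        echo "$unit is already enabled"
    else
        systemctl enable "$unit"
    fi
}
```

Calling ensure_enabled monit as root mirrors the steps shown above. Note that enabling a unit does not start it; systemctl enable --now monit would do both in one step.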

Configuring monit

monit's configuration file is /etc/monitrc. With full-line comments and blank lines filtered out, it looks like this.

# grep -Ev '^(#|$)' /etc/monitrc
set daemon  30              # check services at 30 seconds intervals
set log syslog
set httpd port 2812 and
    use address localhost  # only accept connection from localhost (drop if you use M/Monit)
    allow localhost        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'
    #with ssl {            # enable SSL/TLS and set path to server certificate
    #    pemfile: /etc/ssl/certs/monit.pem
    #}
include /etc/monit.d/*

As the include line shows, after reading /etc/monitrc monit also reads every file under /etc/monit.d/. So I create an nginx.conf file in /etc/monit.d/ and put the nginx-related settings there.

The finished /etc/monit.d/nginx.conf looks like this.

# vi /etc/monit.d/nginx.conf
check host sys-blog.net with address sys-blog.net
    start program = "/usr/bin/systemctl start nginx" with timeout 60 seconds
    stop program  = "/usr/bin/systemctl stop nginx"
    if failed host sys-blog.net port 443 protocol https for 3 cycles then restart
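To get a feel for what the "if failed ... protocol https" test looks at, a similar probe can be run by hand. The sketch below uses curl as a stand-in; it is not how monit implements the check, and the function name https_ok and the 5-second timeout are assumptions for illustration. One difference to keep in mind: curl verifies the server certificate by default, while monit may not, depending on its SSL settings.

```shell
# Hypothetical manual probe that approximates an HTTPS health check with curl.
# Exits 0 when the server answers with an HTTP status below 400.
https_ok() {
    local host="$1"
    local timeout="${2:-5}"   # seconds; an assumed default, not a monit value
    curl --silent --fail --output /dev/null --max-time "$timeout" "https://${host}/"
}
```

For example, https_ok sys-blog.net && echo up || echo down prints "up" while nginx is serving the site and "down" once the connection is refused.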

After editing the configuration, run monit -t to check that the syntax is correct.

# monit -t
Control file syntax OK

Once the syntax check passes, restart the monit service to activate monitoring of nginx.

# systemctl restart monit
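The two steps (syntax check, then restart) can be chained so that a broken configuration never reaches the running daemon. A minimal sketch, with the function name apply_monit_config chosen here purely for illustration:

```shell
# Hypothetical helper: restart monit only when the control file passes
# the syntax check performed by "monit -t".
apply_monit_config() {
    if monit -t; then
        systemctl restart monit
    else
        echo "monit configuration has errors; not restarting" >&2
        return 1
    fi
}
```

Run apply_monit_config as root after each change under /etc/monit.d/. If a full restart is not wanted, monit reload is another option, since it asks the running daemon to reread its configuration.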

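Once monitoring is active, the nginx access log shows monit requesting the site every 30 seconds as a health check (the interval comes from the set daemon 30 setting in /etc/monitrc).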

# grep "Monit/5.30.0" /var/log/nginx/access.log
47.92.126.152 - - [29/Oct/2023:11:30:56 +0800] "GET / HTTP/1.1" 200 62272 "-" "Monit/5.30.0" "-"
47.92.126.152 - - [29/Oct/2023:11:31:26 +0800] "GET / HTTP/1.1" 200 62295 "-" "Monit/5.30.0" "-"
47.92.126.152 - - [29/Oct/2023:11:31:56 +0800] "GET / HTTP/1.1" 200 62272 "-" "Monit/5.30.0" "-"
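The 30-second spacing can also be checked mechanically instead of by eye. The helper below is a sketch under a few assumptions: the name monit_check_gaps is invented, the log uses the timestamp layout shown above, and all entries fall on the same day (a gap that crosses midnight would come out negative).

```shell
# Hypothetical helper: print the gap in seconds between consecutive Monit
# requests in an nginx access log. Assumes "[dd/Mon/yyyy:HH:MM:SS zone]"
# timestamps and entries from a single day.
monit_check_gaps() {
    local logfile="${1:-/var/log/nginx/access.log}"
    grep 'Monit/' "$logfile" \
        | sed -n 's/^[^[]*\[[^:]*:\([0-9:]*\) .*/\1/p' \
        | awk -F: '{ s = $1 * 3600 + $2 * 60 + $3
                     if (NR > 1) print s - prev
                     prev = s }'
}
```

Run against the three log lines above, it prints 30 twice, which matches the set daemon 30 interval.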

Next, stop the nginx service manually to confirm that monit brings it back. After running systemctl stop nginx, /var/log/monit.log shows the following.

# tail -f /var/log/monit.log
~ omitted ~
[2023-10-29T11:49:31+0800] warning  : 'sys-blog.net' failed protocol test [HTTP] at [sys-blog.net]:443 [TCP/IP TLS] -- Connection refused
[2023-10-29T11:50:01+0800] warning  : 'sys-blog.net' failed protocol test [HTTP] at [sys-blog.net]:443 [TCP/IP TLS] -- Connection refused
[2023-10-29T11:50:31+0800] error    : 'sys-blog.net' failed protocol test [HTTP] at [sys-blog.net]:443 [TCP/IP TLS] -- Connection refused
[2023-10-29T11:50:31+0800] info     : 'sys-blog.net' trying to restart
[2023-10-29T11:50:31+0800] info     : 'sys-blog.net' stop: '/usr/bin/systemctl stop nginx'
[2023-10-29T11:50:31+0800] info     : 'sys-blog.net' start: '/usr/bin/systemctl start nginx'
[2023-10-29T11:51:01+0800] info     : 'sys-blog.net' connection succeeded to [sys-blog.net]:443 [TCP/IP TLS]
~ omitted ~

The monit log above shows that after three consecutive failed checks, monit ran the stop and start commands to restart nginx, and the next check succeeded.
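How long an outage lasts before the restart fires follows from the two settings involved, set daemon 30 and for 3 cycles. The arithmetic below is a rough sketch that ignores the time each check itself takes; the variable names are chosen only for illustration.

```shell
# Values taken from the configuration shown earlier.
interval=30   # seconds between checks ("set daemon 30")
cycles=3      # consecutive failures required ("for 3 cycles")

# The first failed check lands anywhere from just after the outage starts
# to one full interval later; the restart fires on the last required failure.
min_delay=$(( (cycles - 1) * interval ))
max_delay=$(( cycles * interval ))

echo "restart fires ${min_delay}s to ${max_delay}s after the outage begins"
```

This agrees with the log: the first failure was recorded at 11:49:31 and the restart was attempted at 11:50:31, 60 seconds later.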

Common commands

Here are two commonly used monit commands.

Use monit summary to see an overview.

# monit summary
Monit 5.30.0 uptime: 0m
┌─────────────────────────────────┬────────────────────────────┬───────────────┐
│ Service Name                    │ Status                     │ Type          │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ moto001                         │ OK                         │ System        │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ mysql                           │ OK                         │ Process       │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ sys-blog.net                    │ OK                         │ Remote Host   │
└─────────────────────────────────┴────────────────────────────┴───────────────┘

Use monit status to see detailed status.

# monit status
Monit 5.30.0 uptime: 0m

Remote Host 'sys-blog.net'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  port response time           153.430 ms to sys-blog.net:443 type TCP/IP using TLS (certificate valid for 208 days) protocol HTTP
  data collected               Sun, 29 Oct 2023 12:13:53

Process 'mysql'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          1057
  parent pid                   1
  uid                          27
  effective uid                27
  gid                          27
  uptime                       1d 22h 45m
  threads                      33
  children                     0
  cpu                          0.1%
  cpu total                    0.1%
  memory                       8.5% [144.7 MB]
  memory total                 8.5% [144.7 MB]
  security attribute           -
  filedescriptors              50 [0.2% of 32184 limit]
  total filedescriptors        50
  read bytes                   903.2 B/s [89.9 MB total]
  disk read bytes              0 B/s [51.3 MB total]
  disk read operations         0.3 reads/s [25288 reads total]
  write bytes                  17.7 kB/s [1.5 GB total]
  disk write bytes             32.8 kB/s [2.6 GB total]
  disk write operations        3.8 writes/s [290820 writes total]
  data collected               Sun, 29 Oct 2023 12:13:53

System 'moto001'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  load average                 [0.00] [0.02] [0.00]
  cpu                          0.7%usr 0.3%sys 0.0%nice 0.0%iowait 0.1%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
  memory usage                 1.1 GB [68.3%]
  swap usage                   0 B [0.0%]
  uptime                       1d 22h 45m
  boot time                    Fri, 27 Oct 2023 13:28:35
  filedescriptors              1984 [1.2% of 169338 limit]
  data collected               Sun, 29 Oct 2023 12:13:53

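Summary

For a personal blog, using monit to monitor and restart services is practical and has a low barrier to entry. Enterprise systems, however, more often adopt dedicated monitoring software such as Zabbix.

See the official documentation for more monit configuration options.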

https://mmonit.com/monit/documentation/monit.html