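The host check check-host-alive uses check_interval 5, so the host is checked every five minutes. After the first non-OK result, the checks come 10 seconds apart; once ten checks in a row are non-OK, the host enters the HARD state and a notification is sent.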

check_interval                  5
max_check_attempts       10
notification_interval          25

nagios.log

2008-12-02.10:11:45 [1228183905] HOST ALERT: ssorc.tw;DOWN;SOFT;1;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:11:55 [1228183915] HOST ALERT: ssorc.tw;DOWN;SOFT;2;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:12:05 [1228183925] HOST ALERT: ssorc.tw;DOWN;SOFT;3;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:12:15 [1228183935] HOST ALERT: ssorc.tw;DOWN;SOFT;4;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:12:25 [1228183945] HOST ALERT: ssorc.tw;DOWN;SOFT;5;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:12:35 [1228183955] HOST ALERT: ssorc.tw;DOWN;SOFT;6;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:12:45 [1228183965] HOST ALERT: ssorc.tw;DOWN;SOFT;7;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:12:55 [1228183975] HOST ALERT: ssorc.tw;DOWN;SOFT;8;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:13:05 [1228183985] HOST ALERT: ssorc.tw;DOWN;SOFT;9;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:13:15 [1228183995] HOST ALERT: ssorc.tw;DOWN;HARD;10;CRITICAL – Plugin timed out after 10 seconds
2008-12-02.10:13:15 [1228183995] HOST NOTIFICATION: nagios-admin-email-cross;ssorc.tw;DOWN;host-notify-by-email;CRITICAL – Plugin timed out after 10 seconds
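
The SOFT to HARD progression in the log above can be sketched in a few lines of Python. This is a simplified model, not Nagios code: the function name `host_failure_timeline` and its parameters are made up for illustration, every check is assumed to fail, and the 10-second spacing is taken from the log rather than from any config value.

```python
def host_failure_timeline(max_check_attempts, retry_seconds):
    """Return (offset_seconds, state_type, attempt) for consecutive failed checks.

    Assumptions (illustrative only): every check fails, retries are evenly
    spaced, and the state turns HARD on the attempt that equals
    max_check_attempts.
    """
    timeline = []
    for attempt in range(1, max_check_attempts + 1):
        state_type = "HARD" if attempt == max_check_attempts else "SOFT"
        offset = (attempt - 1) * retry_seconds
        timeline.append((offset, state_type, attempt))
    return timeline


# Values from this post: max_check_attempts 10, retries about 10 seconds apart.
timeline = host_failure_timeline(max_check_attempts=10, retry_seconds=10)
for offset, state_type, attempt in timeline:
    print(f"+{offset:3d}s DOWN;{state_type};{attempt}")
```

Under these assumptions the tenth attempt lands 90 seconds after the first failure, which matches the log: SOFT;1 at 10:11:45 and HARD;10 at 10:13:15.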

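When the host is DOWN and other services on it are monitored, HTTP for example, those services also go Critical. They are only shown in red; no notification is sent for them.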

2008-12-02.10:13:15 [1228183995] SERVICE ALERT: ssorc.tw;HTTP;CRITICAL;HARD;1;CRITICAL – Socket timeout after 10 seconds

When the host comes back UP, only the Host UP notification is sent.

2008-12-02.10:32:25 [1228185145] HOST ALERT: ssorc.tw;UP;HARD;1;PING OK – Packet loss = 0%, RTA = 29.41 ms
2008-12-02.10:32:25 [1228185145] HOST NOTIFICATION: nagios-admin-email-cross;ssorc.tw;UP;host-notify-by-email;PING OK – Packet loss = 0%, RTA = 29.41 ms
2008-12-02.10:32:25 [1228185145] SERVICE ALERT: ssorc.tw;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK – 69255 bytes in 2.491 seconds
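
The ALERT lines quoted so far follow a fixed, semicolon-separated layout, so they are easy to pick apart when checking state type and attempt number. Below is a minimal sketch that assumes exactly the layout seen in these excerpts; `parse_alert` and `ALERT_RE` are hypothetical names, not part of Nagios.

```python
import re

# Matches "[epoch] HOST ALERT: ..." or "[epoch] SERVICE ALERT: ...".
# NOTIFICATION lines are intentionally not matched.
ALERT_RE = re.compile(r"\[(\d+)\] (HOST|SERVICE) ALERT: (.*)")


def parse_alert(line):
    """Split one ALERT line into a dict, or return None if it does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    epoch, kind, rest = match.groups()
    # Service alerts carry one extra field: the service description.
    field_count = 5 if kind == "HOST" else 6
    fields = rest.split(";", field_count - 1)
    if len(fields) != field_count:
        return None
    state, state_type, attempt, output = fields[-4:]
    return {
        "epoch": int(epoch),
        "kind": kind,
        "host": fields[0],
        "service": fields[1] if kind == "SERVICE" else None,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


host_alert = parse_alert(
    "2008-12-02.10:13:15 [1228183995] HOST ALERT: "
    "ssorc.tw;DOWN;HARD;10;CRITICAL – Plugin timed out after 10 seconds"
)
service_alert = parse_alert(
    "2008-12-02.10:13:15 [1228183995] SERVICE ALERT: "
    "ssorc.tw;HTTP;CRITICAL;HARD;1;CRITICAL – Socket timeout after 10 seconds"
)
print(host_alert["state_type"], host_alert["attempt"])
print(service_alert["service"], service_alert["state_type"], service_alert["attempt"])
```

Parsed this way, the two lines from 10:13:15 show the point made above: the host reaches HARD on attempt 10, while the HTTP service is already logged as HARD on attempt 1.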

As for check-host-alive and check_ping, they are the same thing; only the criteria they judge by are different.

For service monitoring:

max_check_attempts         6
normal_check_interval       5
retry_check_interval          1
notification_interval          25

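The service is checked every five minutes. After the first non-OK result the interval changes to one minute; when six checks in a row are non-OK, the first notification is sent. The check interval then goes back to five minutes, and if the service is still not OK after 25 minutes, a second notification is sent.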

Timeline (gap = time since the previous check; Alert1 to Alert2 spans 25 min):

gap      result    state          event
-        OK
5 min    not OK    SOFT 1
1 min    not OK    SOFT 2
1 min    not OK    SOFT 3
1 min    not OK    SOFT 4
1 min    not OK    SOFT 5
1 min    not OK    SOFT 6/HARD    Alert1
5 min    not OK    HARD
5 min    not OK    HARD
5 min    not OK    HARD
5 min    not OK    HARD
5 min    not OK    HARD           Alert2
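
The same schedule can be worked out with a little arithmetic. The sketch below is an idealized model, not how Nagios computes it internally: `service_schedule` is a made-up helper, all values are whole minutes counted from the first failed check, and the time each check takes to run is ignored.

```python
def service_schedule(max_check_attempts, retry_interval, notification_interval, horizon):
    """Return (hard_at, notifications) in minutes after the first failed check.

    Assumptions (illustrative only): the service stays non-OK the whole time,
    all arguments are whole minutes, and checks take no time to run.
    """
    # The first failure is attempt 1, so only (max_check_attempts - 1) retries
    # are needed before the state turns HARD.
    hard_at = (max_check_attempts - 1) * retry_interval
    # First notification at the HARD state, then one every notification_interval.
    notifications = list(range(hard_at, horizon + 1, notification_interval))
    return hard_at, notifications


# Values from the service definition in this post.
hard_at, notifications = service_schedule(
    max_check_attempts=6,
    retry_interval=1,
    notification_interval=25,
    horizon=60,
)
print("HARD after", hard_at, "minutes")
print("notifications at minutes", notifications)
```

With the values above, the HARD state and first notification come 5 minutes after the first failure, and the second notification comes at minute 30, that is, 25 minutes later.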


Log

2008-12-02.11:33:35 [1228188815] SERVICE ALERT: ssorc.tw;HTTP;CRITICAL;SOFT;1;CRITICAL – Socket timeout after 10 seconds
2008-12-02.11:34:45 [1228188885] SERVICE ALERT: ssorc.tw;HTTP;CRITICAL;SOFT;2;CRITICAL – Socket timeout after 10 seconds
2008-12-02.11:35:55 [1228188955] SERVICE ALERT: ssorc.tw;HTTP;CRITICAL;SOFT;3;CRITICAL – Socket timeout after 10 seconds
2008-12-02.11:36:55 [1228189015] SERVICE ALERT: ssorc.tw;HTTP;CRITICAL;SOFT;4;CRITICAL – Socket timeout after 10 seconds
2008-12-02.11:38:05 [1228189085] SERVICE ALERT: ssorc.tw;HTTP;CRITICAL;SOFT;5;CRITICAL – Socket timeout after 10 seconds
2008-12-02.11:38:55 [1228189135] SERVICE ALERT: ssorc.tw;HTTP;CRITICAL;HARD;6;CRITICAL – Socket timeout after 10 seconds
2008-12-02.11:38:55 [1228189135] SERVICE NOTIFICATION: nagios-admin-email-cross;ssorc.tw;HTTP;CRITICAL;notify-by-email;CRITICAL – Socket timeout after 10 seconds
2008-12-02.12:04:05 [1228190645] SERVICE NOTIFICATION: nagios-admin-email-cross;ssorc.tw;HTTP;CRITICAL;notify-by-email;CRITICAL – Socket timeout after 10 seconds
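
To compare the log with the configured intervals, the gaps can be computed from the epoch timestamps in square brackets. A small sketch follows; the numbers are copied from the log lines above, and `gaps` is just an illustrative helper.

```python
# Epoch timestamps of the six SERVICE ALERT lines above (SOFT 1 to HARD 6).
alert_epochs = [
    1228188815,
    1228188885,
    1228188955,
    1228189015,
    1228189085,
    1228189135,
]
# Epoch timestamps of the two SERVICE NOTIFICATION lines.
notification_epochs = [1228189135, 1228190645]


def gaps(epochs):
    """Return the differences in seconds between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(epochs, epochs[1:])]


check_gaps = gaps(alert_epochs)
notify_gaps = gaps(notification_epochs)
print("seconds between checks:", check_gaps)
print("seconds between notifications:", notify_gaps)
```

The retries land 50 to 70 seconds apart rather than exactly 60, and the second notification follows 1510 seconds (25 minutes 10 seconds) after the first, so the log agrees with the configured one-minute retry and 25-minute notification interval only approximately.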

2010-09-10 update: a diagram of the host notification cycle:  nagios.txt

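In Nagios 3, the host definition has a retry_interval parameter that sets how often the host is checked while it is in the SOFT state. In Nagios 2 it looks like the only option is 10 seconds!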
