[Zabbix] Experience building and optimizing Zabbix 5.4 with a PostgreSQL database

Before getting to the main part, let me share how we deployed this Zabbix instance from scratch. The old instance runs 5.0 and had been upgraded many times, starting from 3.2. Because of the many problems that piled up, the team decided to tear it down and rebuild it for 5.4, and in the end settled on PostgreSQL.
Changing the database backend was really just an experiment; we had never used Postgres before. In the past we had used:
+ MySQL InnoDB: textbook setup, with the usual optimizations such as one file per table. Everything was fine until the data reached about 100 GB; then problems started to appear: MySQL sometimes maxed out the CPU, and despite a lot of tuning, slow queries still showed up regularly. MySQL is also fragmented into two branches, one led by Oracle and one by MariaDB, so working out the right tuning numbers for each took the team a lot of effort, and if we're going to spend that much effort on a DB, we might as well quit and become DBAs, who earn far more than DevOps engineers or sysadmins 😀 The team began toying with the idea of switching to a different database backend.
+ Oracle Database: the team has two Oracle DBAs, so we figured switching to Oracle would work better. However, the deployment ran into these problems:
– Zabbix has no prebuilt packages for Oracle: no problem, download the source and build it yourself; that part is easy.
– Zabbix's large tables: history* and trends*, about 7 tables, hold the main data and are very heavy. On MySQL we had seen that Zabbix selects with WHERE clock ..., so the clock column of each table was range-partitioned by day, one partition per day ==> import the schema from the Zabbix source, then drop all the default tables and recreate them partitioned. Still easy.
– The plan was for the Zabbix DB to share hardware with our existing 5-node RAC database cluster, but it turned out Zabbix needs a different character set from Oracle's default, and DBAs and developers usually won't change that:

    Incorrect parameter "NLS_NCHAR_CHARACTERSET" value: "AL16UTF16" instead "AL32UTF8, UTF8".

So we built a separate Oracle instance. After a week in operation, with 50 more hosts added, the queue was frequently > 10 minutes. There were no slow queries, but the Zabbix server occasionally failed to select data because the returned data type didn't match. For example, the message column of the alerts table was nvarchar, yet the Zabbix server errored when selecting it 😐 We had to drop the whole table to change that column to varchar.
– Performance was much worse than MySQL. In the web UI, clicking the Hosts menu took 5 seconds to show results; with debug enabled, a single click sometimes ran up to 6,000 SQL statements against the database, and that number kept growing as more hosts were added.

==> Worn out, we decided to return to a backend that Zabbix supports natively. The choice was between MySQL and PostgreSQL, both natively supported by Zabbix, and having suffered enough with MySQL, we switched to Postgres.
Environment: Ubuntu 20
DB: PostgreSQL 12

Installation follows the official Zabbix documentation. When importing the database schema into Postgres, partition the following tables on the clock column:

'TRENDS', 'TRENDS_UINT', 'HISTORY', 'HISTORY_LOG', 'HISTORY_STR', 'HISTORY_TEXT', 'HISTORY_UINT','ALERTS'

This is the DDL for the history table; the other tables are similar:

-- Table: public.history

-- DROP TABLE public.history;

CREATE TABLE IF NOT EXISTS public.history
(
    itemid bigint NOT NULL,
    clock integer NOT NULL DEFAULT 0,
    value double precision NOT NULL DEFAULT '0'::double precision,
    ns integer NOT NULL DEFAULT 0
) PARTITION BY RANGE (clock);

ALTER TABLE public.history
    OWNER to zabbix;
-- Index: history_1

-- DROP INDEX public.history_1;

CREATE INDEX history_1
    ON public.history USING btree
    (itemid ASC NULLS LAST, clock ASC NULLS LAST)
;

-- Partitions SQL

CREATE TABLE IF NOT EXISTS public.history_p_2021_06_01 PARTITION OF public.history
    FOR VALUES FROM (0) TO (1622566800);

ALTER TABLE public.history_p_2021_06_01
    OWNER to zabbix;

...

ALTER TABLE public.history_p_2021_07_31
    OWNER to zabbix;

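As you can see, there is one partition per day, each covering a range of clock (clock is epoch time). In a range partition, the FROM value is inclusive and the TO value is exclusive.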
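To see where a boundary like 1622566800 comes from, here is a minimal sketch that computes the [from, to) epoch range of one daily partition. It assumes the partitions are cut at local midnight in UTC+7 (Vietnam time); the helper name `day_partition_bounds` is hypothetical and not part of Zabbix or PostgreSQL.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

# Assumption: day boundaries are local midnight in UTC+7
LOCAL_TZ = timezone(timedelta(hours=7))


def day_partition_bounds(day: date, tz: timezone = LOCAL_TZ) -> Tuple[int, int]:
    """Return (from_epoch, to_epoch) for the partition holding `day`.

    PostgreSQL range partitions include FROM and exclude TO, so rows with
    from_epoch <= clock < to_epoch land in this partition.
    """
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


if __name__ == "__main__":
    lo, hi = day_partition_bounds(date(2021, 6, 1))
    print(f"history_p_2021_06_01: FROM ({lo}) TO ({hi})")
```

The TO value matches the upper bound of history_p_2021_06_01 in the DDL above; that first partition uses FROM (0) instead, so it also catches any older rows.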
Next, tune the PostgreSQL configuration values. You can use this site to generate settings that match your hardware:
https://pgtune.leopard.in.ua/#/

After partitioning, let Zabbix push data in, then try selecting history by clock:

explain (format json) select * from history where clock >= 1623762000
and clock <= 1623804210

Wtf, the query above still does a full scan: it walks through every existing partition before returning 😐
After some digging, it turned out a setting was missing in /etc/postgresql/12/main/postgresql.conf:

constraint_exclusion = partition        # on, off, or partition

Edit the config, restart PostgreSQL, and run the query again to see which partitions it scans:

explain (format json) select * from history where clock >= 1623762000
and clock <= 1623804210

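Now it's OK: only the 2 partitions matching the clock range are scanned.
Querying looks fine for now. (Note: in PostgreSQL 11 and later, pruning of declaratively partitioned tables like these is governed by enable_partition_pruning, which is on by default, while constraint_exclusion mainly affects inheritance-based partitioning; if pruning doesn't kick in, check both settings.)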
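The pruning result can be reproduced by hand: a query on clock >= lo AND clock <= hi only needs the partitions whose [from, to) range overlaps it. This is a small model of the idea, not PostgreSQL's planner code; it again assumes UTC+7 day boundaries, and `daily_partitions` / `partitions_for_range` are hypothetical helpers.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import List, Tuple

LOCAL_TZ = timezone(timedelta(hours=7))  # assumption: partitions cut at local midnight, UTC+7


def daily_partitions(first: date, days: int) -> List[Tuple[str, int, int]]:
    """Build (name, from_epoch, to_epoch) for `days` consecutive daily partitions."""
    parts = []
    for i in range(days):
        day = first + timedelta(days=i)
        start = datetime.combine(day, time.min, tzinfo=LOCAL_TZ)
        end = start + timedelta(days=1)
        parts.append((day.strftime("history_p_%Y_%m_%d"),
                      int(start.timestamp()), int(end.timestamp())))
    return parts


def partitions_for_range(parts: List[Tuple[str, int, int]], lo: int, hi: int) -> List[str]:
    """Names of partitions that can hold rows with lo <= clock <= hi (TO is exclusive)."""
    return [name for name, frm, to in parts if frm <= hi and to > lo]


if __name__ == "__main__":
    june = daily_partitions(date(2021, 6, 1), 30)
    # The same range as the EXPLAIN query above
    print(partitions_for_range(june, 1623762000, 1623804210))
```

For the query above this prints ['history_p_2021_06_15', 'history_p_2021_06_16'], the two partitions the plan touches.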
Next comes automatic partition creation: inserting a row whose clock falls outside every partition's range fails, because the DB doesn't know which partition to put it in. So we wrote a Python script that creates the partitions for these tables automatically, following the rule: in month T, create the partitions for month T+1.
Here is the script; replace the IP, user and password before using it. It is written in Python 3 and needs psycopg2 to connect to Postgres (CentOS/RHEL: yum install -y python3-psycopg2; Ubuntu: apt install -y python3-psycopg2).
Run it at the beginning of each month.

import calendar
import datetime
import logging
import time

import psycopg2

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.INFO,
    datefmt='%Y-%m-%d %H:%M:%S')

# Tables partitioned BY RANGE (clock); Postgres folds unquoted names to lowercase
LISTABLES = ['TRENDS', 'TRENDS_UINT', 'HISTORY', 'HISTORY_LOG', 'HISTORY_STR',
             'HISTORY_TEXT', 'HISTORY_UINT', 'ALERTS']


def next_month(today):
    """(year, month) of the month after `today`; December rolls over to January of next year."""
    if today.month == 12:
        return today.year + 1, 1
    return today.year, today.month + 1


def local_midnight_epoch(day):
    """Epoch seconds of 00:00 on `day` in the server's local timezone."""
    return int(time.mktime(day.timetuple()))


def gen_sql(conn, curs, table):
    logging.info('Generating SQL for table: {}'.format(table))
    year, month = next_month(datetime.date.today())
    days_of_month = calendar.monthrange(year, month)[1]
    for i in range(days_of_month):
        basedate = datetime.date(year, month, 1 + i)
        epochtime_from = local_midnight_epoch(basedate)
        epochtime_to = local_midnight_epoch(basedate + datetime.timedelta(days=1))
        # e.g. history_p_2021_08_01 holds rows with epochtime_from <= clock < epochtime_to
        partitionname = '{}_p_{}'.format(table.lower(), basedate.strftime('%Y_%m_%d'))
        # IF NOT EXISTS makes re-running the script in the same month harmless
        sql = ("CREATE TABLE IF NOT EXISTS public.{0} PARTITION OF public.{1} "
               "FOR VALUES FROM ({2}) TO ({3})").format(
                   partitionname, table.lower(), epochtime_from, epochtime_to)
        logging.info(sql)
        curs.execute(sql)
        conn.commit()
        logging.info('SQL executed')


conn = psycopg2.connect(
    host="postgres-IP",
    database="zabbix",
    user="zabbix",
    password="password")
logging.info('Connected to database')
try:
    with conn.cursor() as curs:
        for table in LISTABLES:
            gen_sql(conn, curs, table)
finally:
    conn.close()

Stop Zabbix from converting 1000 -> 1K when displaying values on graphs

When drawing graphs, Zabbix automatically converts values above 1000 into 1K, 2K and so on. For disk space, for example, the item value is in bytes, but the graph shows KB, MB, GB for us. This is a Zabbix feature, but not every unit needs this conversion: showing a connection count as 1k or 9.9k connections doesn't look great, and the same goes for things like file counts or message queue lengths.

The fix is as follows:
– Taking connections as the example, change the item's unit to: connections
– In the Zabbix frontend source directory, edit the following file:

/include/func.inc.php

Find the line

$blacklist = ['%', 'ms', 'rpm', 'RPM'];

Add the connections unit to it and save:

$blacklist = ['%', 'ms', 'rpm', 'RPM', 'connections'];

Done
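What the blacklist changes can be illustrated with a simplified model of the conversion: a value gets a K/M/G prefix unless its unit is blacklisted. This is only a sketch, not Zabbix's PHP code; the function `format_value`, the base-1000 scaling and the 2-decimal rounding are assumptions (for byte units Zabbix scales by 1024, which is left out here).

```python
BLACKLIST = {'%', 'ms', 'rpm', 'RPM', 'connections'}  # mirrors the edited $blacklist
PREFIXES = ['', 'K', 'M', 'G', 'T']


def _num(value: float) -> str:
    """Format with at most 2 decimals and no trailing zeros."""
    return f"{round(value, 2):.2f}".rstrip("0").rstrip(".")


def format_value(value: float, unit: str) -> str:
    """Simplified model: scale by 1000 and add a prefix unless the unit is blacklisted."""
    if unit in BLACKLIST:
        return f"{_num(value)} {unit}"
    step = 0
    while abs(value) >= 1000 and step < len(PREFIXES) - 1:
        value /= 1000
        step += 1
    return f"{_num(value)} {PREFIXES[step]}{unit}"


if __name__ == "__main__":
    print(format_value(9900, "qps"))          # 9.9 Kqps
    print(format_value(9900, "connections"))  # 9900 connections
```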

Extend/expand/grow disk space on a Linux virtual machine

The server has a 500 GB disk that needs to grow to 1 TB.
The server runs CentOS 7 on VMware.
1. In VMware, edit the server's disk and increase it to 1 TB.
2. On the CentOS server:
– Rescan the disks:

[root@mm-adap1 ~]# ls /sys/class/scsi_device/
0:0:0:0  3:0:0:0
[root@mm-adap1 ~]# echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/rescan 
[root@mm-adap1 ~]# echo 1 > /sys/class/scsi_device/3\:0\:0\:0/device/rescan

[root@mm-adap1 ~]# fdisk -l

Disk /dev/sda: 1099.5 GB, 1099511627776 bytes, 2147483648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000add72

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200   629145599   313523200   8e  Linux LVM
/dev/sda3       629145600  1048575999   209715200   8e  Linux LVM

Disk /dev/mapper/centos_mm--adap1-root: 500 GB, 1098425303040 bytes, 2145361920 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
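The fdisk header can be sanity-checked with a little arithmetic: sectors × sector size gives bytes, and dividing by 10^9 or 2^30 gives GB or GiB. A small sketch using the numbers printed above (`sectors_to_sizes` is a hypothetical helper):

```python
from typing import Tuple


def sectors_to_sizes(sectors: int, sector_size: int = 512) -> Tuple[int, float, float]:
    """Return (bytes, GB, GiB) for `sectors` logical sectors."""
    size = sectors * sector_size
    return size, size / 10**9, size / 2**30


if __name__ == "__main__":
    size, gb, gib = sectors_to_sizes(2147483648)
    print(f"/dev/sda: {size} bytes = {gb:.1f} GB = {gib:.0f} GiB")
    # /dev/sda3 ends at sector 1048575999, i.e. the old partitions fill the first 1048576000 sectors
    _, _, used_gib = sectors_to_sizes(1048576000)
    print(f"existing partitions end at {used_gib:.0f} GiB")
```

The disk is now 1024 GiB, and the existing partitions end exactly at 500 GiB, so everything after that is the free space the new partition will use.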

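The new capacity is visible. The disk to work on is /dev/sda; create a new partition (sda4) in the free space with fdisk, accepting the default first and last sectors so it covers all of the newly added space (the sector numbers in the transcript below will differ on your disk):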

[root@mm-adap1 ~]# fdisk /dev/sda
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 4): 4
First sector (2048-20971519, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-20971519, default 20971519):
Using default value 20971519
Partition 1 of type Linux and of size 10 GiB is set

Command (m for help): t
Selected partition 4
Hex code (type L to list all codes): 8e
Changed type of partition 'Linux' to 'Linux LVM'

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Re-read the partition table, then initialize the physical volume:

partprobe
[root@mm-adap1 ~]# pvcreate /dev/sda4
  Physical volume "/dev/sda4" successfully created.

Get the current volume group information:

[root@mm-adap1 ~]# vgdisplay 
  --- Volume group ---
  VG Name               centos_mm-adap1
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               498.99 GiB
  PE Size               4.00 MiB
  Total PE              127742
  Alloc PE / Size       127742 / 498.99 GiB
  Free  PE / Size       0 / 0   
  VG UUID               eyBefC-npQ3-Kl7Q-lHeC-j08s-OLCG-zl33QK

This gives VG Name = centos_mm-adap1; add the newly created sda4 to that volume group:

[root@mm-adap1 ~]# vgextend centos_mm-adap1 /dev/sda4
  Volume group "centos_mm-adap1" successfully extended

Likewise, extend the logical volume:

[root@mm-adap1 ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/centos_mm-adap1/root
  LV Name                root
  VG Name                centos_mm-adap1
  LV UUID                dzkVGt-aTiT-bWyI-kYDm-cUES-7WEm-tAtFva
  LV Write Access        read/write
  LV Creation host, time mm-adap1, 2020-10-18 16:46:50 +0700
  LV Status              available
  # open                 1
  LV Size                498.99 GiB
  Current LE             127742
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:0
   
[root@mm-adap1 ~]# lvextend /dev/centos_mm-adap1/root /dev/sda4
  Size of logical volume centos_mm-adap1/root changed from 498.99 GiB (127742 extents) to <1022.99 GiB (261885 extents).
  Logical volume centos_mm-adap1/root successfully resized.

Check the mount:

[root@mm-adap1 ~]# mount | grep -i mapper
/dev/mapper/centos_mm--adap1-root on / type ext4 (rw,relatime,data=ordered)

We can see the LVM device /dev/mapper/centos_mm--adap1-root.
Now resize the filesystem. This partition uses ext4, so use resize2fs (for XFS, use xfs_growfs on the mount point instead):

[root@mm-adap1 ~]# resize2fs /dev/mapper/centos_mm--adap1-root
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/mapper/centos_mm--adap1-root is mounted on /; on-line resizing required
old_desc_blocks = 63, new_desc_blocks = 128
The filesystem on /dev/mapper/centos_mm--adap1-root is now 268170240 blocks long.
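The LVM and resize2fs numbers agree with each other: extents × PE size (4 MiB, from vgdisplay) gives the LV size, and dividing that by the ext4 block size gives the block count resize2fs reports. A sketch, assuming a 4 KiB ext4 block size (the common default, not shown in the output above); the helper names are hypothetical.

```python
PE_SIZE_MIB = 4    # "PE Size 4.00 MiB" from vgdisplay
BLOCK_SIZE = 4096  # assumption: ext4 was created with 4 KiB blocks


def extents_to_gib(extents: int, pe_size_mib: int = PE_SIZE_MIB) -> float:
    """LV size in GiB for a given number of physical extents."""
    return extents * pe_size_mib / 1024


def extents_to_blocks(extents: int, pe_size_mib: int = PE_SIZE_MIB,
                      block_size: int = BLOCK_SIZE) -> int:
    """Number of filesystem blocks that fit in the LV."""
    return extents * pe_size_mib * 1024 * 1024 // block_size


if __name__ == "__main__":
    print(f"before: {extents_to_gib(127742):.2f} GiB")  # 498.99 GiB
    print(f"after:  {extents_to_gib(261885):.2f} GiB")  # 1022.99 GiB
    print(f"ext4 blocks: {extents_to_blocks(261885)}")   # 268170240
```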

Checking again, / is now about 1 TB:

[root@mm-adap1 ~]# df -h
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/centos_mm--adap1-root 1007G  107G  857G  12% /
devtmpfs                            16G     0   16G   0% /dev
tmpfs                               16G     0   16G   0% /dev/shm
tmpfs                               16G  169M   16G   2% /run
tmpfs                               16G     0   16G   0% /sys/fs/cgroup
/dev/sda1                          976M  100M  810M  11% /boot
tmpfs                              3.2G     0  3.2G   0% /run/user/0

Deploying a simple storage system with DRBD

Goal: an NFS server for storing data (mostly images uploaded from the web).
Chosen technology: 2-node DRBD, exported to users over NFS, with failover between the primary and secondary node handled by keepalived.

Step 1: Configure DRBD
– Install 2 Ubuntu 20.04 machines:

10.144.136.41 mefin-ntl-drbd-01
10.144.136.42 mefin-ntl-drbd-02

Each machine has 2 disks:
- /dev/sda, 40 GB, for the OS
- /dev/sdb, 500 GB, for the file server. Just attach this disk; it gets partitioned and formatted in later steps

– Set up name resolution for the two hosts in /etc/hosts, on both machines:

# cat /etc/hosts
127.0.0.1 localhost
10.144.136.41 mefin-ntl-drbd-01
10.144.136.42 mefin-ntl-drbd-02
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

– Partition /dev/sdb, on both machines:

root@mefin-ntl-nfs01:~# fdisk /dev/sdb
Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-1048575999, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-1048575999, default 1048575999): 

Created a new partition 1 of type 'Linux' and of size 500 GiB.

Command (m for help): t
Selected partition 1
Hex code (type L to list all codes): 8e
Changed type of partition 'Linux' to 'Linux LVM'.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.

– Create the LVM volume, on both machines:

pvcreate /dev/sdb1
vgcreate storevolume /dev/sdb1
vgdisplay
lvcreate -n storevolumelogic -l 100%FREE storevolume

– Install DRBD, on both machines:

apt-get install -y drbd-utils
rm -rf /etc/drbd.d/*

– Create the DRBD resource by writing the configuration file, on both machines:

cat >/etc/drbd.d/global_common.conf <<EOL
resource nfs-fintech {
        protocol C;
#        handlers {
#                pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
#                pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
#                local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
#                outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";      
#        }
#        startup {
#                degr-wfc-timeout 2;
#                become-primary-on mefin-ntl-nfs01;
#        }
disk {
        on-io-error             detach;
        no-disk-flushes ;
        no-disk-barrier;
        c-plan-ahead 0;
        c-fill-target 5M;
        c-min-rate 2400M;
        c-max-rate 3600M;
} 
net {
        # max-epoch-size          20000;
        max-buffers             36k;
        sndbuf-size            9072k ;
        rcvbuf-size            9072k;
}

        syncer {
                rate 4096M;
                verify-alg sha1;
                al-extents 257;
                c-fill-target 24M;
                c-min-rate 600M;
                c-max-rate 720M;
        }

        on mefin-ntl-drbd-01 {
                device  /dev/drbd0;
                disk    /dev/mapper/storevolume-storevolumelogic;
                address 10.144.136.41:7788;
                meta-disk internal;
        }

        on mefin-ntl-drbd-02 {
                device  /dev/drbd0;
                disk    /dev/mapper/storevolume-storevolumelogic;
                address 10.144.136.42:7788;
                meta-disk internal;
        }
}
EOL

– Initialize the DRBD metadata from the configuration file, on both machines:

#drbdadm create-md nfs-fintech
WARN:
  You are using the 'drbd-peer-outdater' as fence-peer program.
  If you use that mechanism the dopd heartbeat plugin program needs
  to be able to call drbdsetup and drbdmeta with root privileges.

  You need to fix this with these commands:
  dpkg-statoverride --add --update root haclient 4750 /lib/drbd/drbdsetup-84

initializing activity log
initializing bitmap (16000 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.

– Start DRBD on both machines:

systemctl start drbd

– On node 1, promote node 1 to primary:

drbdadm primary nfs-fintech --force

After this step the 2 machines start their initial sync. Check the cluster state and the sync progress like this:

root@mefin-ntl-nfs01:~# cat /proc/drbd 	
version: 8.4.11 (api:1/proto:86-101)	
srcversion: FC3433D849E3B88C1E7B55C 	
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----	
    ns:92560384 nr:0 dw:0 dr:92562504 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:431707484	
	[==>.................] sync'ed: 17.7% (421588/511980)M
	finish: 2:54:07 speed: 41,316 (38,940) K/sec

Wait for the sync to complete; the disk state of both nodes should be UpToDate/UpToDate:

root@mefin-ntl-drbd-02:~# cat /proc/drbd 
version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C 
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:277046300 nr:7370548 dw:284415448 dr:131389 al:72035 bm:0 lo:4 pe:4 ua:0 ap:4 ep:1 wo:d oos:0
root@mefin-ntl-drbd-02:~#
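The fields worth watching in /proc/drbd are cs (connection state), ro (local/peer role), ds (local/peer disk state) and, during a resync, the sync'ed percentage. A minimal parsing sketch for the DRBD 8.4 format shown above; `parse_drbd` is a hypothetical helper that assumes a single resource, and the sample is copied from the output earlier in this section.

```python
import re

STATUS_RE = re.compile(r"\b(cs|ro|ds):(\S+)")
SYNC_RE = re.compile(r"sync'ed:\s*([\d.]+)%")

SAMPLE = """ 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:92560384 nr:0 dw:0 dr:92562504 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:431707484
\t[==>.................] sync'ed: 17.7% (421588/511980)M
"""


def parse_drbd(text: str) -> dict:
    """Extract cs/ro/ds (and sync progress, if present) from /proc/drbd text."""
    info = {key: value for key, value in STATUS_RE.findall(text)}
    match = SYNC_RE.search(text)
    if match:
        info["synced_pct"] = float(match.group(1))
    return info


if __name__ == "__main__":
    print(parse_drbd(SAMPLE))
```

On a real node, pass the contents of open("/proc/drbd").read() instead of SAMPLE; the ro value is the same field the keepalived check script below relies on.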

Create a filesystem on the DRBD device and mount it; do this on the primary node only:

mkfs.ext4 /dev/drbd0
mount /dev/drbd0 /srv

If the mount succeeds, the DRBD part is done. Unmount it (umount /srv) before moving on to the next step.

Install the NFS server and keepalived on both machines:

apt install -y nfs-server keepalived
systemctl enable keepalived

Configure NFS to export /srv.
However, don't let nfs-server start at boot (systemctl disable nfs-server); keepalived takes care of starting it.
The NFS exports file, on both machines:

root@mefin-ntl-drbd-01:~# cat /etc/exports 
# /etc/exports: the access control list for filesystems which may be exported
#		to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
/srv *(rw,sync)

Configure keepalived to check the primary node and automatically mount the volume and start NFS.
On the primary node:

cat >/etc/keepalived/keepalived.conf <<EOL
global_defs {
  # Keepalived process identifier
  router_id nfsserver
  enable_script_security
  script_user root
}

# Script to check the DRBD role, the mount and NFS
vrrp_script check_nfs {
  script "bash /etc/keepalived/trackdrbd.sh"
  interval 2
  weight 50
}

# Virtual interface - The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_01 {
  state MASTER
  interface ens160
  virtual_router_id 152
  priority 110

  virtual_ipaddress {
        10.144.136.40/26
  }
  track_script {
        check_nfs
  }
    notify_master /etc/keepalived/notify_master.sh
    notify_backup /etc/keepalived/notify_backup.sh
    notify_stop /etc/keepalived/notify_stop.sh
  authentication {
        auth_type PASS
        auth_pass secret
  }
}
EOL

keepalived configuration file for the Secondary:

global_defs {
  # Keepalived process identifier
  router_id nfsserver
  enable_script_security
  script_user root
}

# Script to check the DRBD role, the mount and NFS
vrrp_script check_nfs {
  script "bash /etc/keepalived/trackdrbd.sh"
  interval 2
  weight 50
}

# Virtual interface - The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_01 {
  state BACKUP
  interface ens160
  virtual_router_id 152
  priority 100

  virtual_ipaddress {
        10.144.136.40/26
  }
  track_script {
        check_nfs
  }
    notify_master /etc/keepalived/notify_master.sh
    notify_backup /etc/keepalived/notify_backup.sh
    notify_stop /etc/keepalived/notify_stop.sh
  authentication {
        auth_type PASS
        auth_pass secret
  }
}

The check script used by keepalived:

root@mefin-ntl-drbd-01:~# cat /etc/keepalived/trackdrbd.sh
#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
MOUNTPOINT=/srv
VOLUMENAME=nfs-fintech

# Mount the DRBD device and start NFS; a non-zero exit makes keepalived drop the weight bonus
mount_and_serve() {
	if mount /dev/drbd0 "$MOUNTPOINT" && systemctl start nfs-server; then
		echo "Mount OK"
	else
		echo "Mount Fail"
		exit 97
	fi
}

if [ ! -e /proc/drbd ]; then
	echo "DRBD not running"
	exit 99
fi

# e.g. "ro:Primary/Secondary" (local role / peer role)
CURRENT_ROLE=$(grep -oE "ro:[A-Za-z]+/[A-Za-z]+" /proc/drbd)

case "$CURRENT_ROLE" in
	"ro:Secondary/Secondary")
		# Neither node is primary: promote this one
		drbdadm primary "$VOLUMENAME" && mount_and_serve
		;;
	"ro:Secondary/Primary")
		echo "second node, do nothing"
		exit 97
		;;
	ro:Primary/*)
		if grep -qs "$MOUNTPOINT " /proc/mounts; then
			echo "MOUNTPOINT exist"
			cat /srv/system/flag_nodelete
			exit 0
		else
			echo "MOUNTPOINT not exist, doing now"
			mount_and_serve
		fi
		;;
	"ro:Secondary/Unknown")
		echo "Primary not found, promote to pri and mount system"
		drbdadm primary "$VOLUMENAME" && mount_and_serve
		;;
	*)
		# Any other state
		echo "$CURRENT_ROLE"
		exit 98
		;;
esac

The notify scripts are empty files with no real use; you can comment those lines out of the keepalived configuration.
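How the weight in vrrp_script decides failover can be modelled with simple arithmetic: with a positive weight, keepalived raises a node's priority by that weight while its check script exits 0, and the node with the highest effective priority holds the VIP. A sketch with this setup's numbers (priority 110/100, weight 50); `effective_priority` and `vip_owner` are hypothetical helpers, and the model ignores preemption and other keepalived details.

```python
from typing import Dict, Tuple

WEIGHT = 50  # weight of vrrp_script check_nfs


def effective_priority(base: int, script_exit_code: int, weight: int = WEIGHT) -> int:
    """Positive weight: priority is raised by `weight` only while the script succeeds."""
    return base + weight if script_exit_code == 0 else base


def vip_owner(nodes: Dict[str, Tuple[int, int]]) -> str:
    """nodes maps name -> (base priority, check script exit code)."""
    return max(nodes, key=lambda name: effective_priority(*nodes[name]))


if __name__ == "__main__":
    # Normal: primary is mounted (exit 0), secondary sees Secondary/Primary (exit 97)
    print(vip_owner({"drbd-01": (110, 0), "drbd-02": (100, 97)}))  # drbd-01 (160 vs 100)
    # Primary broken (DRBD not running, exit 99), secondary promoted itself (exit 0)
    print(vip_owner({"drbd-01": (110, 99), "drbd-02": (100, 0)}))  # drbd-02 (110 vs 150)
```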

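Fixing the "dh key too small" error on Ubuntu

When making HTTP requests on Ubuntu (here Ubuntu 20), you often hit this error:
curl: (35) error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small
(other languages such as Java and Python report similar errors; Python uses the system OpenSSL, so the fix below applies to it, while Java has its own TLS stack and is configured separately)
The cause: the server offers Diffie-Hellman parameters smaller than OpenSSL's default security level on Ubuntu 20.04 (SECLEVEL=2, which requires at least 2048-bit DH) allows. The fix:
Edit /etc/ssl/openssl.cnf
Add at the top of the file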

openssl_conf = default_conf

Add at the end of the file

[default_conf]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
CipherString = DEFAULT:@SECLEVEL=1

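That's it. If your openssl.cnf already has these sections (the Ubuntu 20.04 default may already contain a [system_default_sect] with SECLEVEL=2), change the existing CipherString line instead of adding a second copy. Keep in mind this lowers the TLS security level for every OpenSSL-based program on the machine.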
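If only one Python client needs to talk to the legacy server, an alternative to the system-wide change is to lower the security level for that client's SSL context only. A sketch using the standard ssl module; it assumes Python is linked against OpenSSL 1.1.0 or newer (where @SECLEVEL is understood in cipher strings), and the URL in the usage note is a placeholder.

```python
import ssl
import urllib.request


def legacy_dh_context() -> ssl.SSLContext:
    """Default verified context that also accepts weaker (e.g. 1024-bit) DH parameters."""
    ctx = ssl.create_default_context()
    # Same cipher string as the openssl.cnf fix, applied to this context only
    ctx.set_ciphers("DEFAULT:@SECLEVEL=1")
    return ctx


def fetch(url: str) -> bytes:
    """GET `url` using the relaxed context; certificates are still verified."""
    with urllib.request.urlopen(url, context=legacy_dh_context(), timeout=10) as resp:
        return resp.read()


if __name__ == "__main__":
    ctx = legacy_dh_context()
    print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

Usage: fetch("https://legacy.example.com/") with your real endpoint. Hostname and certificate checks stay enabled; only the minimum key strength is relaxed.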

[Ubuntu 20.04] Dealing with an oversized /var/log/journal/

One fine day, while reviewing the servers (Ubuntu 20), I found the /var/log/journal/ directory had grown huge:

root@mefin-ntl-rke-03:/var/log/journal/c27b494019ba448580a9cc1dce75e9c6# ll -h
total 4.1G
drwxr-sr-x+ 2 root systemd-journal 4.0K Apr 19 09:04  ./
drwxr-sr-x+ 3 root systemd-journal 4.0K Dec 25 03:57  ../
-rw-r-----+ 1 root systemd-journal 128M Jan 17 18:59 'system@16ca61f2092a4b948aef42e48d42a204-000000000016a25b-0005b8db6105890b.journal'
-rw-r-----+ 1 root systemd-journal 128M Jan 20 19:41 'system@16ca61f2092a4b948aef42e48d42a204-000000000019cff5-0005b91758244858.journal'
-rw-r-----+ 1 root systemd-journal  96M Jan 22 10:13 'system@16ca61f2092a4b948aef42e48d42a204-00000000001d04e2-0005b954467d0756.journal'
-rw-r-----+ 1 root systemd-journal 128M Jan 25 10:14 'system@16ca61f2092a4b948aef42e48d42a204-00000000001eb945-0005b97492f226fb.journal'
-rw-r-----+ 1 root systemd-journal 128M Jan 28 09:13 'system@16ca61f2092a4b948aef42e48d42a204-000000000021e99d-0005b9b0ef1f24bf.journal'
-rw-r-----+ 1 root systemd-journal 128M Jan 31 09:43 'system@16ca61f2092a4b948aef42e48d42a204-00000000002512a1-0005b9ec6d5a71fd.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb  3 08:59 'system@16ca61f2092a4b948aef42e48d42a204-0000000000284669-0005ba2935649832.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb  6 09:26 'system@16ca61f2092a4b948aef42e48d42a204-00000000002b714f-0005ba64eef8ba89.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb  9 10:04 'system@16ca61f2092a4b948aef42e48d42a204-00000000002ea4a7-0005baa1a94c7b25.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb 12 10:41 'system@16ca61f2092a4b948aef42e48d42a204-000000000031d940-0005bade8af14a24.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb 15 11:28 'system@16ca61f2092a4b948aef42e48d42a204-0000000000350db4-0005bb1b67cc8363.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb 18 12:12 'system@16ca61f2092a4b948aef42e48d42a204-0000000000384357-0005bb586b233c8e.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb 21 12:51 'system@16ca61f2092a4b948aef42e48d42a204-00000000003b7884-0005bb9561602aa0.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb 24 13:24 'system@16ca61f2092a4b948aef42e48d42a204-00000000003ead3f-0005bbd2472c1033.journal'
-rw-r-----+ 1 root systemd-journal 128M Feb 27 14:00 'system@16ca61f2092a4b948aef42e48d42a204-000000000041e12c-0005bc0f151bf192.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar  2 14:22 'system@16ca61f2092a4b948aef42e48d42a204-00000000004516a6-0005bc4befd2240e.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar  5 15:01 'system@16ca61f2092a4b948aef42e48d42a204-0000000000484968-0005bc8897dc0e69.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar  8 15:52 'system@16ca61f2092a4b948aef42e48d42a204-00000000004b7dfa-0005bcc57c237e46.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar 11 16:31 'system@16ca61f2092a4b948aef42e48d42a204-00000000004eb422-0005bd028f33fd7d.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar 14 17:15 'system@16ca61f2092a4b948aef42e48d42a204-000000000051e8c9-0005bd3f71effd5c.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar 17 17:27 'system@16ca61f2092a4b948aef42e48d42a204-0000000000551e1c-0005bd7c6ab93047.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar 20 12:30 'system@16ca61f2092a4b948aef42e48d42a204-0000000000584ff3-0005bdb8ef2a619b.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar 23 11:36 'system@16ca61f2092a4b948aef42e48d42a204-00000000005b5d6c-0005bdf1212af3ec.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar 26 11:00 'system@16ca61f2092a4b948aef42e48d42a204-00000000005e8698-0005be2cbb27a4e8.journal'
-rw-r-----+ 1 root systemd-journal 128M Mar 29 10:58 'system@16ca61f2092a4b948aef42e48d42a204-000000000061b1d7-0005be68910b83e8.journal'
-rw-r-----+ 1 root systemd-journal 128M Apr  1 10:57 'system@16ca61f2092a4b948aef42e48d42a204-000000000064e13a-0005bea4e5b125de.journal'
-rw-r-----+ 1 root systemd-journal 128M Apr  4 10:49 'system@16ca61f2092a4b948aef42e48d42a204-000000000068105c-0005bee13c8b5363.journal'
-rw-r-----+ 1 root systemd-journal 128M Apr  7 10:36 'system@16ca61f2092a4b948aef42e48d42a204-00000000006b3e96-0005bf1d7697c73e.journal'
-rw-r-----+ 1 root systemd-journal 128M Apr 10 10:26 'system@16ca61f2092a4b948aef42e48d42a204-00000000006e6c4d-0005bf59a16d76d4.journal'
-rw-r-----+ 1 root systemd-journal 128M Apr 13 10:27 'system@16ca61f2092a4b948aef42e48d42a204-0000000000719a63-0005bf95d9ad57ee.journal'
-rw-r-----+ 1 root systemd-journal 128M Apr 16 08:25 'system@16ca61f2092a4b948aef42e48d42a204-000000000074ca08-0005bfd237815e30.journal'
-rw-r-----+ 1 root systemd-journal 128M Apr 19 09:04 'system@16ca61f2092a4b948aef42e48d42a204-000000000077ec2e-0005c00cdc182970.journal'
-rw-r-----+ 1 root systemd-journal  32M Apr 19 20:13  system.journal
root@mefin-ntl-rke-03:/var/log/journal/c27b494019ba448580a9cc1dce75e9c6#

These logs are produced by systemd and can be read with journalctl. Whenever something happens, systemd records it here in case an admin with too much free time wants to dig in. But every convenience comes at a price, and the price here is a crazy amount of disk space. Not every server has a few hundred GB or a few TB to spare for storing whatever it likes, so poor sysadmins have to rack their brains tuning the config to save space: when money is tight, don't go for the fancy stuff.

How to handle this directory:
– If you don't need these logs and want a quick fix, you can delete the files in this directory (or let journald do it with journalctl --vacuum-size=500M), but don't delete the directory itself.
– If you'd rather tune the config once and stop dealing with this every so often:
Edit /etc/systemd/journald.conf; its default contents look like this:

#  This file is part of systemd.
# 
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See journald.conf(5) for details.

[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
#SplitMode=uid
#SyncIntervalSec=5m
#RateLimitIntervalSec=30s
#RateLimitBurst=10000
#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100
#MaxRetentionSec=
#MaxFileSec=1month
#ForwardToSyslog=yes
#ForwardToKMsg=no
#ForwardToConsole=no
#ForwardToWall=yes
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg
#LineMax=48K
#ReadKMsg=yes

What some of the settings mean:

Storage=volatile|persistent|auto|none
# none = disable journal storage entirely
# volatile = keep logs only in memory (/run/log/journal); still readable with journalctl
# persistent = store on disk in /var/log/journal; if the directory doesn't exist, systemd creates it,
# if it exists but can't be written to, logs go to /run/log/journal
# auto = store on disk, but only if /var/log/journal already exists; otherwise use /run/log/journal

SplitMode=uid|none
# controls whether systemd splits log files per user, so each user can read their own logs (default: uid)

SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=, SystemMaxFiles=, RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize=, RuntimeMaxFiles=
# the most important settings for protecting your scarce disk space
# System* apply to persistent storage (/var/log/journal), Runtime* to volatile storage (/run/log/journal)
# SystemMaxUse and RuntimeMaxUse cap the space (e.g. 1G, 2G) the logs may use
# SystemKeepFree and RuntimeKeepFree set how much space to leave free for everything else
# SystemMaxFileSize is the maximum size of each individual journal file
# SystemMaxFiles is the maximum number of journal files kept
# apply changes with: systemctl restart systemd-journald
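journalctl --disk-usage reports the journal's current size directly; the same check against a SystemMaxUse= budget can be sketched in Python, e.g. for a monitoring script. It assumes journald's size suffixes are base-1024 (K, M, G, T, as documented in journald.conf(5)); `parse_size` and `journal_usage` are hypothetical helpers.

```python
import os

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """'1G' -> 1073741824; a bare number is taken as bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def journal_usage(directory: str = "/var/log/journal") -> int:
    """Total size in bytes of *.journal files under `directory` (recursively)."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".journal"):
                total += os.path.getsize(os.path.join(root, name))
    return total


if __name__ == "__main__":
    limit = parse_size("1G")  # e.g. SystemMaxUse=1G
    used = journal_usage()
    print(f"journal uses {used / 1024**3:.2f} GiB of a {limit / 1024**3:.0f} GiB budget")
```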

Renaming a VIP or other object on F5

SSH into the F5 device
and enter the tmsh shell

tmsh

Enable moving (renaming) objects:

modify /sys db mcpd.mvenabled value true

For example, to rename the VIP virtual_server100 to VIP1000:

mv ltm virtual virtual_server100 VIP1000

Disable the move feature again:

modify /sys db mcpd.mvenabled value false
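When you rename several objects, it is easy to leave mcpd.mvenabled switched on at the end. The following is a minimal sketch that only generates the tmsh lines for a batch of renames. It does not connect to the device, so you paste its output into the tmsh prompt. The `mv` syntax is copied from the example above. The `component` parameter for other object types (e.g. `ltm pool`) is a hypothetical extension, so verify it on your BIG-IP version.

```python
from typing import Dict, List

def f5_bulk_rename(renames: Dict[str, str], component: str = "ltm virtual") -> List[str]:
    """tmsh lines: enable mv once, rename every object, then disable mv again."""
    for name in (*renames.keys(), *renames.values()):
        # tmsh splits arguments on whitespace, so such names cannot be passed as-is.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid object name: {name!r}")
    cmds = ["modify /sys db mcpd.mvenabled value true"]
    cmds += [f"mv {component} {old} {new}" for old, new in renames.items()]
    cmds.append("modify /sys db mcpd.mvenabled value false")
    return cmds

print("\n".join(f5_bulk_rename({"virtual_server100": "VIP1000"})))
```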

For more details, see the video.

hpssacli – HP Smart array CLI commands and download link

HP Smart array CLI commands

Used to check the RAID arrays and the disks attached to the RAID controller on HP servers.

Download HP Smart array CLI

HP Smart array CLI commands (these should apply to any system with the CLI installed)

Show configuration

/opt/hp/hpssacli/bin/hpssacli ctrl all show config

Controller status

/opt/hp/hpssacli/bin/hpssacli ctrl all show status

Show detailed controller information for all controllers

/opt/hp/hpssacli/bin/hpssacli ctrl all show detail

Show detailed controller information for controller in slot 0

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail

Rescan for New Devices

/opt/hp/hpssacli/bin/hpssacli rescan

Physical disk status

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show status
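The status commands are handy for a cron job or monitoring check. The following is a minimal sketch that reports every line whose status is not `OK`. It assumes `show status` prints the usual `<object>: <status>` lines, for example `physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 300 GB): OK`. Check the exact wording your hpssacli version prints. `find_problems` and `check` are hypothetical names.

```python
import shutil
import subprocess
from typing import List, Tuple

HPSSACLI = "/opt/hp/hpssacli/bin/hpssacli"

def find_problems(output: str) -> List[Tuple[str, str]]:
    """Return (object, status) for every '<object>: <status>' line not reporting OK."""
    problems = []
    for line in output.splitlines():
        # rpartition: the object description itself contains ':' (e.g. 1I:1:1)
        name, sep, status = line.strip().rpartition(": ")
        if sep and status != "OK":
            problems.append((name, status))
    return problems

def check(*args: str) -> List[Tuple[str, str]]:
    result = subprocess.run([HPSSACLI, *args], capture_output=True, text=True, check=True)
    return find_problems(result.stdout)

# Only runs on a host where hpssacli is actually installed.
if __name__ == "__main__" and shutil.which(HPSSACLI):
    for problem in check("ctrl", "all", "show", "status") + check("ctrl", "slot=0", "pd", "all", "show", "status"):
        print("%s: %s" % problem)
```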

Show detailed physical disk information

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show detail

Logical disk status

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld all show status

View Detailed Logical Drive Status

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 show

Create New RAID 0 Logical Drive

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:2 raid=0

Create New RAID 1 Logical Drive

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1

Create New RAID 5 Logical Drive

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,2I:1:6,2I:1:7,2I:1:8 raid=5
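Before running a create command, it is worth checking the drive list against the RAID level. The following is a minimal sketch that builds the same command line as the examples above. The minimum drive counts are the general RAID rules, not values read from hpssacli. The even-count rule for RAID 1 assumes mirrored pairs. `create_ld_command` is a hypothetical name.

```python
from typing import Sequence

HPSSACLI = "/opt/hp/hpssacli/bin/hpssacli"
MIN_DRIVES = {"0": 1, "1": 2, "5": 3}  # generic RAID minimums

def create_ld_command(slot: int, drives: Sequence[str], raid: str) -> str:
    """Build an 'hpssacli ... create type=ld' line after basic sanity checks."""
    if raid not in MIN_DRIVES:
        raise ValueError(f"RAID {raid} not handled in this sketch")
    if len(set(drives)) != len(drives):
        raise ValueError("a drive is listed twice")
    if len(drives) < MIN_DRIVES[raid]:
        raise ValueError(f"RAID {raid} needs at least {MIN_DRIVES[raid]} drives, got {len(drives)}")
    if raid == "1" and len(drives) % 2:
        raise ValueError("RAID 1 mirrors drives in pairs, so use an even number")
    return f"{HPSSACLI} ctrl slot={slot} create type=ld drives={','.join(drives)} raid={raid}"

print(create_ld_command(0, ["1I:1:1", "1I:1:2", "2I:1:6", "2I:1:7", "2I:1:8"], "5"))
```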

Delete Logical Drive

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 delete

Add New Physical Drive to Logical Volume

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

Add Spare Disks

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array all add spares=2I:1:6,2I:1:7

Enable Drive Write Cache

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify dwc=enable

Disable Drive Write Cache

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify dwc=disable

Erase Physical Drive

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd 2I:1:6 modify erase

Turn on LED blinking for the physical disks of logical drive 2

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 modify led=on

Turn off LED blinking for the physical disks of logical drive 2

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 modify led=off

Modify smart array cache read and write ratio (cacheratio=readratio/writeratio)

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify cacheratio=100/0

Enable smart array write cache when no battery is present (No-Battery Write Cache option)

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify nbwc=enable

Disable the Smart Array cache (array accelerator) for a specific logical drive

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=disable

Enable the Smart Array cache (array accelerator) for a specific logical drive

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=enable

Enable SSD Smart Path

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array a modify ssdsmartpath=enable

Disable SSD Smart Path

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array a modify ssdsmartpath=disable

Deploying Network Load Balancing on Windows Server

Topology:

10.144.139.17 and 10.144.139.18 are two Windows machines (running Active Directory). The goal is to create a virtual IP (VIP) shared by both; the VIP is 10.144.139.25.

Test client: the Windows machine 10.144.141.190, connecting to the VIP through a Juniper SRX 4100 firewall.

Configuration:

To load-balance the two Windows machines, install the Network Load Balancing feature on both.

On 10.144.139.17, once the feature is installed, open Network Load Balancing Manager, right-click and choose New Cluster.

Enter 10.144.139.17, click Connect -> Next, then enter the VIP shared by the two hosts, 10.144.139.25, with its subnet mask.

The Full Internet name can be left empty. Set Cluster operation mode to Multicast: with Unicast, the host lost its network connection entirely. Then click Next through to the end.

Back in Network Load Balancing Manager, right-click the cluster, choose Add Host, enter host 10.144.139.18 and repeat the same steps.

Switch to 10.144.139.18, right-click and choose Connect to Existing.

Then enter IP 10.144.139.17 and you are done.

Ping 10.144.139.25 from both hosts (.17 and .18) and from another machine on the same LAN: all OK.

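However, 10.144.141.190 still cannot ping the new virtual IP because the SRX does not learn the MAC address of 10.144.139.25 (in multicast mode the VIP answers ARP with a multicast MAC, which the SRX does not install as a dynamic ARP entry). Configure the SRX firewall: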

– From 10.144.141.190, the ping to 10.144.139.25 fails. Capture on the firewall:

# run monitor traffic interface reth0.1561
verbose output suppressed, use <detail> or <extensive> for full protocol decode
Address resolution is ON. Use <no-resolve> to avoid any reverse lookup delay.
Address resolution timeout is 4s.
Listening on reth0.1561, capture size 96 bytes
01:54:24.337022  In arp who-has 10.144.139.25 tell 10.144.139.1
01:54:24.337254  In arp reply 10.144.139.25 (03:bf:0a:90:8b:19) is-at 03:bf:0a:90:8b:19
01:54:24.337256  In arp reply 10.144.139.25 (03:bf:0a:90:8b:19) is-at 03:bf:0a:90:8b:19

Here reth0.1561 is the interface for the 10.144.139.xxx subnet, and the virtual MAC of .25 is 03:bf:0a:90:8b:19.
Add a static ARP entry on the SRX:

# set interfaces reth0.1561 family inet address 10.144.139.1/26 arp 10.144.139.25 multicast-mac 03:bf:0a:90:8b:19
# commit

Ping again: .190 now reaches .25.
Done
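The MAC in the capture is not random. Windows NLB builds the cluster MAC from the VIP: 03-BF followed by the four IP octets in hex in multicast mode (02-BF in unicast mode). The following is a short sketch using Python's standard `ipaddress` module that derives the value for the SRX `arp ... multicast-mac` line. It also shows why only the client outside 10.144.139.0/26 was affected: its traffic is routed through the SRX. `nlb_mac` is a hypothetical name.

```python
import ipaddress

def nlb_mac(vip: str, mode: str = "multicast") -> str:
    """Windows NLB cluster MAC: 03:bf (multicast) or 02:bf (unicast) + VIP octets in hex."""
    prefix = {"unicast": "02:bf", "multicast": "03:bf"}[mode]
    octets = ipaddress.IPv4Address(vip).packed
    return prefix + ":" + ":".join(f"{b:02x}" for b in octets)

# Subnet taken from the SRX line: address 10.144.139.1/26
lan = ipaddress.ip_interface("10.144.139.1/26").network

print(nlb_mac("10.144.139.25"))                       # 03:bf:0a:90:8b:19, matches the capture
print(ipaddress.ip_address("10.144.141.190") in lan)  # False: the client is routed via the SRX
```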

Decode escape string

When you come across text like:

\x22Signature\x22 : \x225382d1d011c45f6822442ebc1c3e499c\x22\x0A}

\x0A  \x22OrderId\x22 : \x22200914388560\x22,\x0A  \x22LocalDate\x22 : \x2220200914001319\x22,\x0A  \x22Signature\x22 : \x22\x22\x0A

In Python 3 you can decode it with:

value.encode('utf8').decode('unicode_escape')
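`unicode_escape` works for the ASCII samples above, but it decodes each `\xNN` as a separate Latin-1 character. As a result, escaped UTF-8 text such as Vietnamese comes out garbled: `Ti\xE1\xBA\xBFng` becomes `Tiáº¿ng` instead of `Tiếng`. Strings like these usually come from web server access logs; nginx, for instance, escapes `"` and non-printable bytes as `\xNN`. That origin is an assumption about your logs. The following is a minimal sketch that turns each `\xNN` back into a byte and then decodes the whole result as UTF-8. `decode_hex_escapes` is a hypothetical name.

```python
import re

# A literal backslash, 'x', and two hex digits.
_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")

def decode_hex_escapes(value: str) -> str:
    """Replace every literal \\xNN with byte 0xNN, then decode the bytes as UTF-8."""
    raw = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), value.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")

line = r"\x22Signature\x22 : \x225382d1d011c45f6822442ebc1c3e499c\x22\x0A}"
print(decode_hex_escapes(line))
print(decode_hex_escapes(r"Ti\xE1\xBA\xBFng"))  # Tiếng
```

Unlike `unicode_escape`, this leaves other backslash sequences and any non-ASCII characters already in the string untouched.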