各位用户为了找寻关于PostgreSQL+Pgpool实现HA主备切换的操作的资料费劲了很多周折。这里教程网为您整理了关于PostgreSQL+Pgpool实现HA主备切换的操作的相关资料,仅供查阅,以下为您介绍关于PostgreSQL+Pgpool实现HA主备切换的操作的详细内容
PostgreSQL流复制实现HA主备切换
环境说明和主机规划
基础环境配置(所有主机操作)
配置HOSTS
echo -e "10.0.0.11 mastern10.0.0.12 slaven10.0.0.13 pool" >> /etc/hosts # 执行一次即可
配置统一的时间(若已配置,请忽略)
? 1 2 3yum install -y ntpdate && ntpdate ntp1.aliyun.com
echo -e
"# sync time from ntp1.aliyun.comn5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com > /dev/null 2>&1
"
>> /var/spool/cron/root # 写入定时任务,执行一次即可
创建postgres用户
useradd postgres && echo "your_password" | passwd --stdin postgres
配置免密钥登陆
? 1 2 3 4 5 6su - postgres
ssh-keygen -t rsa -f /home/postgres/.ssh/id_rsa -P
""
cd ~/.ssh/
ssh-copy-id postgres@master # 三台主机执行
scp authorized_keys postgres@slave:~/.ssh # 只在master主机执行
scp authorized_keys postgres@pool:~/.ssh # 只在master主机执行
安装Postgresql数据库(PG9.6)
yum install -y https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm
yum install -y postgresql96-server postgresql96-contrib postgresql96 postgresql96-libs
创建统一的目录结构
mkdir /data1/pg_{data,bin,logs} -p
chown -R postgres.postgres /data1/
修改系统变量
? 1 2 3 4 5 6 7vi /etc/profile #增加以下内容
export PGHOME=/usr/pgsql-9.6/
export PGDATA=/data1/pg_data
export PGPORT=54321
export PATH=$PATH:$PGHOME/bin
# 生效
source /etc/profile
PostgreSQL流复制结构(master和slave主机操作)
master主机操作
初始化系统
/usr/pgsql-9.6/bin/postgresql96-setup initdb
vi /usr/lib/systemd/system/postgresql-9.6.service
修改postgresql-9.6.service
内容如下:
? 1 2 3 4 5# Include the
default
config:
.include /usr/lib/systemd/system/postgresql-9.6.service
[Service]
Environment=PGDATA=/data1/pg_data
重启PG服务
? 1 2 3 4systemctl daemon-reload
su - postgres -c
'/usr/pgsql-9.6/bin/initdb -D /data1/pg_data'
systemctl restart postgresql-9.6
systemctl enable postgresql-9.6.service
修改系统配置(以下用postgres用户操作)
? 1 2 3 4 5 6 7 8 9 10cp /data1/pg_data/pg_hba.conf{,.bak}
cat >/data1/pg_data/pg_hba.conf<<EOF
local
all
all
trust
host
all
all
10.0.0.11/32 trust
host
all
all
10.0.0.12/32 trust
host
all
all
0.0.0.0/0 md5
host
all
all
::1/128 trust
host replication stream_replication 0.0.0.0/0 md5
EOF
#host replication stream_replication 0.0.0.0/0 md5 为流复制用户
64G
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22cp /data1/pg_data/postgresql.conf{,.bak}
cat >/data1/pg_data/postgresql.conf<<EOF
listen_addresses =
'*'
port = 54321
max_connections = 256
shared_buffers = 16GB
effective_cache_size = 48GB
work_mem = 64MB
maintenance_work_mem = 2GB
min_wal_size = 2GB
max_wal_size = 4GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
wal_level = hot_standby
wal_log_hints =
on
max_wal_senders = 1
hot_standby =
on
logging_collector =
on
log_directory =
'pg_log'
EOF
#操作完记得重启 pg_ctl restart
128G
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18listen_addresses =
'*'
port = 54321
max_connections = 256
shared_buffers = 32GB
effective_cache_size = 96GB
work_mem = 128MB
maintenance_work_mem = 2GB
min_wal_size = 2GB
max_wal_size = 4GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
wal_level = hot_standby
wal_log_hints =
on
max_wal_senders = 1
hot_standby =
on
logging_collector =
on
log_directory =
'pg_log'
在主库中创建流复制用户(stream_replication)和PGPool用户(srcheck)
CREATE USER stream_replication replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';
CREATE USER srcheck replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';
修改主库pg_hba.conf文件(已操作见cat >/data1/pg_data/pg_hba.conf<<EOF)
host replication stream_replication 0.0.0.0/0 md5
slave主机操作
初始化系统
/usr/pgsql-9.6/bin/postgresql96-setup initdb
vi /usr/lib/systemd/system/postgresql-9.6.service
修改postgresql-9.6.service
内容如下:
? 1 2 3 4 5# Include the
default
config:
.include /usr/lib/systemd/system/postgresql-9.6.service
[Service]
Environment=PGDATA=/data1/pg_data
重启PG服务
systemctl daemon-reload
基础备份复制到备库服务器
rm -rf /data1/pg_data # 如果没有重要数据可操作,主要为同步主库路径
su - postgres -c 'pg_basebackup -D $PGDATA --format=p -h master -p 54321 -U stream_replication -W'
修改备库配置信息
cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf
vi $PGDATA/recovery.conf
增加以下内容
? 1 2 3 4 5 6 7standby_mode=
'on'
primary_conninfo =
'host=master port=54321 user=stream_replication password=your_password'
restore_command =
''
recovery_target_timeline =
'latest'
# 重启PG服务
systemctl restart postgresql-9.6
systemctl enable postgresql-9.6.service
验证
主节点执行
? 1 2 3create
table
test (id int4, create_time
timestamp
(0) without
time
zone);
insert
into
test
values
(1, now());
select
*
from
test;
备节点执行
select * from test;
其他查询
进入测试数据库test,主库上执行如下命令返回f,备库上返回t。 select pg_is_in_recovery();
执行如下命令查看快照,它返回主库记录点、备库记录点;主库每增加一条写入,记录点的值就会加1。
select txid_current_snapshot();
执行如下命令可以查看主备同步状态。
select * from pg_stat_replication;
字段state显示的同步状态有:startup(连接中)、catchup(同步中)、streaming(同步);字段sync_state显示的模式有:async(异步)、sync(同步)、potential(虽然现在是异步模式,但是有可能升级到同步模式)。
主备切换
假设主库崩溃了,备库如何从只读状态切换为读写状态呢?只要把备库的postgresql.conf中hot_standby修改为off,并且删除recovery.conf,然后重启库就可以提供服务了。
PGPool2(pool主机操作)
安装PGPool2
? 1 2 3yum install -y http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm
yum -y install pgpool-II-pg96 pgpool-II-pg96-debuginfo pgpool-II-pg96-devel pgpool-II-pg96-extensions
systemctl enable pgpool.service #开启自动启动
添加Pgpool-II运行用户
? 1 2 3useradd postgres # 环境准备时已操作
chown -R postgres.postgres /etc/pgpool-II
chown -R postgres.postgres /var/run/pgpool/
配置pool_hba.conf
cp /etc/pgpool-II/pool_hba.conf{,.bak}
vi /etc/pgpool-II/pool_hba.conf
增加内容
host all all 0.0.0.0/0 md5
配置pcp.conf
主节点登陆后执行:
? 1 2 3 4 5 6 7 8postgres=#
select
rolname,rolpassword
from
pg_authid;
rolname | rolpassword
--------------------+-------------------------------------
pg_signal_backend |
srcheck | md5662c10f61b27a9ab38ce69157186b25f
postgres | md5d3612d57ee8d4c147cf27b11e3a0974d
stream_replication | md59279ef6b904bc483e4f85e6d44cfc0ed
(4
rows
)
vi /etc/pgpool-II/pool_passwd
增加SQL执行结果的内容,形式为$rolname:$rolpassword例如:
srcheck:md5662c10f61b27a9ab38ce69157186b25f
或者:
pg_md5 -u postgres your_password
vi /etc/pgpool-II/pcp.conf ## 加入 postgres:上一命令的输出
配置pgpool.conf
cp /etc/pgpool-II/pgpool.conf{,.bak}
vi /etc/pgpool-II/pgpool.conf
内容如下:
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54# CONNECTIONS
listen_addresses =
'*'
port = 54321
socket_dir =
'/var/run/pgpool'
pcp_listen_addresses =
'*'
pcp_port = 9898
pcp_socket_dir =
'/var/run/pgpool'
# - Backend
Connection
Settings -
backend_hostname0 =
'master'
backend_port0 = 54321
backend_weight0 = 1
backend_data_directory0 =
'/data1/pg_data'
backend_flag0 =
'ALLOW_TO_FAILOVER'
backend_hostname1 =
'slave'
backend_port1 = 54321
backend_weight1 = 1
backend_data_directory1 =
'/data1/pg_data'
backend_flag1 =
'ALLOW_TO_FAILOVER'
# - Authentication -
enable_pool_hba =
on
pool_passwd =
'pool_passwd'
# FILE LOCATIONS
pid_file_name =
'/var/run/pgpool/pgpool.pid'
logdir =
'/data1/pg_logs'
replication_mode =
off
load_balance_mode =
on
master_slave_mode =
on
master_slave_sub_mode =
'stream'
sr_check_period = 5
sr_check_user =
'srcheck'
sr_check_password =
'123456'
sr_check_database =
'postgres'
# HEALTH
CHECK
健康检查
health_check_period = 10
health_check_timeout = 20
health_check_user =
'srcheck'
health_check_password =
'123456'
health_check_database =
'postgres'
# FAILOVER
AND
FAILBACK
failover_command =
'/data1/pg_bin/failover_stream.sh %H'
failover_stream.sh脚本
? 1 2 3 4 5 6vim /data1/pg_bin/failover_stream.sh
chmod 777 /data1/pg_bin/failover_stream.sh
chmod u+s /sbin/ifconfig
chmod u+s /usr/sbin
pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 & ## 启动
pgpool -m fast stop ## 关闭
failover_stream.sh内容:
? 1 2 3 4 5 6 7 8 9 10 11#! /bin/sh
# Failover command
for
streaming replication.
# Arguments: $1: new master hostname.
new_master=$1
trigger_command=
"$PGHOME/bin/pg_ctl promote -D $PGDATA"
# Prompte standby
database
.
/usr/bin/ssh -T $new_master $trigger_command
exit 0;
登陆设置
当执行pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &后可查看集群状态:
? 1 2 3 4 5 6 7 8[postgres@pool pgpool-II]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | master | 54321 | up | 0.500000 |
primary
| 0 |
false
| 0
1 | slave | 54321 | up | 0.500000 | standby | 0 |
true
| 0
(2
rows
)
如果未发现集群状态,请在master和slave主机分别执行以下操作:
? 1 2 3[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 0
[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 1
#详情查询命令pcp_attach_node
HA切换
模拟master主机宕机
Master端:
? 1 2 3[postgres@master ~]$ pg_ctl stop
waiting
for
server
to
shut down.... done
server stopped
当前集群状态
? 1 2 3 4 5 6 7 8 9 10[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
psql (9.6.1)
Type
"help"
for
help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0 | master | 5432 | down| 0.500000 | standby | 0 |
false
| 0
1 | slave | 5432 | up | 0.500000 |
primary
| 0 |
true
| 0
(2
rows
)
发现master已经是standby了,且down机了
修改master,启动
当master主机宕机后,此时slave主机PG数据库成为主库,修改master成为slave的从库即可
? 1 2 3 4 5[postgres@master ~]$ vim recovery.conf
standby_mode=
'on'
primary_conninfo =
'host=slave port=54321 user=stream_replication password=your_password'
restore_command =
''
recovery_target_timeline =
'latest'
同步时间线
? 1 2 3 4 5 6 7#如果报时间线冲突落后,先停掉pg服务,然后执行同步时间线,否知直接看状态
[postgres@master ~]$ pg_rewind
--target-pgdata=/data1/pg_data --source-server='host=slave port=54321 user=postgres dbname=postgres'
servers diverged
at
WAL position 0/5000098
on
timeline 1
rewinding
from
last
common
checkpoint
at
0/5000028
on
timeline 1
Done!
# 重新启动数据库
[postgres@master ~]$ pg_ctl start
再次查看当前状态
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0 | master | 5432 | down| 0.500000 | standby | 0 |
false
| 0
1 | slave | 5432 | up | 0.500000 |
primary
| 0 |
true
| 0
(2
rows
)
#注意虽然master已经启动了,但是还是down,需要手动将master节点添加进pgpool,master的node_id是0,所以-n 0
[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 54321 -n 0
#提示输入密码,输入pcp管理密码
#查看当前状态
[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0 | master | 5432 | up | 0.500000 | standby | 0 |
false
| 0
1 | slave | 5432 | up | 0.500000 |
primary
| 0 |
true
| 0
(2
rows
)
现在两个节点都是up了。
主从两节点pgpool健康检查脚本(pgpool_check.sh)
说明:此脚本是基于PGpool只安装到master和slave两个主机上的情况下使用,在master主机有了pgpool进程后,可在slave主机执行sh pgpool_check.sh & 即可
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20#! /bin/bash
#
Check
Master host pgpool-process
while
true
do
pgcount=$(nmap 10.0.0.11|egrep
'9898|9999'
|wc -l)
if [ $pgcount -eq 2 ] ;
then
echo
'Master host pgpool is GOOD!!!'
> /dev/
null
2>&1
else
echo -e
"Master host pgpool is 33[31m BAD!!! 33[0m"
echo -e
"Master host pgpool is 33[31m BAD!!! 33[0m"
echo -e
"Master host pgpool is 33[31m BAD!!! 33[0m"
echo -e
"SYSTEM WILL DO THE SHELL : 33[34m su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &' 33[0m"
su - postgres -c
'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &'
pgport=$(netstat -lntup|egrep
'9898|9999'
|wc -l)
[ $pgport -gt 0 ] && echo -e
"Slave host pgpool is 33[32m RUNNING!!! 33[0m"
exit 0
fi
done
以上为个人经验,希望能给大家一个参考,也希望大家多多支持。如有错误或未考虑完全的地方,望不吝赐教。
原文链接:https://blog.csdn.net/weixin_42509278/article/details/81484655