PostgreSQL 9.2 will be released with a new replication feature: cascading replication. It improves streaming across wide networks by shortening the distance between servers. For example: a master server in San Francisco, a slave in Chicago and a cascaded standby in Barcelona.
It has other benefits too: when you need to add nodes to the cluster, you can take backups from a slave without directly affecting the master in terms of I/O or traffic, while also minimizing the number of connections to it.
The setup is easy and intuitive. It is the same as setting up a slave against a master, except that primary_conninfo points to the slave server instead.
Things to consider when cascading replication:
- One of the 9.0 replication limitations -fixed in 9.1- was that a vacuum on the master could cancel a long-running query on the slave. The fix is enabled with hot_standby_feedback on the slave server and adds some extra communication between the servers (if you have a very slow connection, you may want to disable it).
- Synchronous replication only applies between the master and one slave; the cascaded slave is not affected by this setup (a cascaded server cannot be synchronous).
- The control function pg_xlog_replay_pause() (which stops the replay of incoming WAL records) only affects the server on which you execute it. If that server has cascaded servers behind it, they will keep streaming and applying the records, because the function pauses replay, not streaming. So in an a->b->c chain, pausing replay on b can leave a = c while b differs from both. If you want to stop cascading to all downstream hosts (streaming included, not just replay), stop the slave or change primary_conninfo on the cascaded servers. This is useful if you run BI queries on the cascaded servers and don't want them affected by writes: you can delay replication entirely until your reports finish.
- If you want to use archiving, it must be set up on the master server. Archiving is necessary if you want to extend availability in case of a long slave downtime, if you want a delayed server, or if you want incremental backups for PITR in case of disaster.
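As a sketch of the replay-pause behavior described above, these are the control functions you would run from psql on the intermediate standby (available since 9.1; downstream cascaded standbys keep streaming from it while replay is paused):

```sql
-- On the standby whose replay you want to pause (e.g. the slave on port 8889):
SELECT pg_xlog_replay_pause();

-- Verify the state:
SELECT pg_is_xlog_replay_paused();   -- returns true while paused

-- Resume replay when you are done:
SELECT pg_xlog_replay_resume();
```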
In our test we'll have 5 servers, which I'll label with the port number and data directory to avoid confusion. You could run PostgreSQL on the same port on different hosts, but to make the distinction clear I will use a different port for each:
- Master (8888, data_master)
- Slave (8889, data_slave)
- Cascaded slave (8890, data_cascade_slave)
- Delayed standby server using omnipitr-restore (8891, data_delayed)
- Delayed hot-standby slave using a handmade script (8892, data_streaming_delayed)
To take a backup for replication, first create a user with replication privileges and add it to pg_hba.conf on the master. From psql, run:
CREATE USER repl WITH REPLICATION PASSWORD 'password';
local   replication   repl                  md5
host    replication   repl   127.0.0.1/32   md5
host    replication   repl   0.0.0.0/0      md5
*NOTE: if you want to set up several slaves in one go, make a backup locally on the master, compress it and then ship it to the slaves. Backup and compress command:
bin/pg_basebackup -Ft -p8888 -Urepl -Ddata_slave -z -Z9
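With -Ft -z, pg_basebackup leaves a base.tar.gz inside the target directory, which then has to be unpacked into each slave's (empty) data directory. A minimal sketch of that unpacking step, using scratch directories under /tmp to stand in for the real master and slave hosts (the paths are hypothetical):

```shell
# Simulate the tarball that pg_basebackup -Ft -z produces; in production you
# would scp the real base.tar.gz from the master to each slave instead.
mkdir -p /tmp/demo_backup /tmp/demo_data_slave
echo "9.2" > /tmp/demo_backup/PG_VERSION
tar -czf /tmp/base.tar.gz -C /tmp/demo_backup PG_VERSION

# On the slave: unpack into the empty data directory and restrict permissions,
# since postgres refuses to start on a world-readable data directory.
tar -xzf /tmp/base.tar.gz -C /tmp/demo_data_slave
chmod 700 /tmp/demo_data_slave
ls /tmp/demo_data_slave
```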
Master (8888) configuration:
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 30
archive_mode = on
archive_command = 'cp %p /opt/pg92beta/archives_master/%f'
local   replication   repl                  md5
host    replication   repl   127.0.0.1/32   md5
host    replication   repl   0.0.0.0/0      md5
max_wal_senders should be set to at least the number of slaves connected to this server (cascaded servers don't count, since they connect to the other slaves). If you don't know how many slaves you will need, you can set a higher value (10, for example). That won't affect performance: a WAL sender process only runs when a slave is actually connected.
wal_keep_segments is optional, but it is very convenient to keep a large number of segments available on the master to update the slaves. If a slave goes through a not-so-long downtime, it can catch up directly through streaming replication instead of the archived files. Keep in mind that this is not a time value: 30 segments may represent just a few hours of activity on a busy server, or a few days on a quiet one.

archive_mode and archive_command are in charge of copying those segments to another destination, to be used for incremental backups or to update slaves that were down for a long period.
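As a back-of-the-envelope check, you can translate wal_keep_segments into an approximate time window from your WAL generation rate (the rate below is an assumed figure, not a measurement; check your own server's pg_xlog turnover):

```shell
# Each WAL segment is 16 MB by default.
SEGMENTS=30
SEGMENT_MB=16
RATE_MB_PER_HOUR=120   # hypothetical WAL generation rate of your master

RETAINED_MB=$((SEGMENTS * SEGMENT_MB))
HOURS=$((RETAINED_MB / RATE_MB_PER_HOUR))
echo "wal_keep_segments=${SEGMENTS} retains ${RETAINED_MB} MB, roughly ${HOURS} hours of WAL"
```

A slave that stays down longer than that window can no longer catch up via streaming and must fall back to the archived segments.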
Slave (8889) Configuration:
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 30
hot_standby = on
hot_standby_feedback = on
standby_mode = on
primary_conninfo = 'user=postgres host=localhost port=8888'
trigger_file = '/tmp/promote_me'
This is a classic setup of an asynchronous slave server through streaming replication. No delay is specified: it just replicates as fast as it can. The hot_standby_feedback option lets the slave tell the master about its long-running queries, so the master avoids vacuuming away rows they still need.

Cascaded slave (8890) configuration:
standby_mode = on
primary_conninfo = 'user=postgres host=localhost port=8889'
The configuration of this host is like any other slave's; the difference is that primary_conninfo points to the slave server from which you took the backup.
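To verify that the chain is actually streaming, you can inspect pg_stat_replication on each sending server; a sketch with the ports of this test setup (expected row counts assume the setup above is running):

```sql
-- On the master (8888): one WAL sender, for the slave on 8889.
SELECT application_name, client_addr, state, sent_location, replay_location
  FROM pg_stat_replication;

-- On the slave (8889): one WAL sender as well, for the cascaded standby
-- on 8890, since cascaded servers connect here instead of to the master.
SELECT application_name, state, sent_location
  FROM pg_stat_replication;

-- On the cascaded standby (8890): confirm it is in recovery.
SELECT pg_is_in_recovery();   -- returns t
```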
Delayed Slave (8891) configuration:
restore_command = '/opt/pg92beta/omniti/bin/omnipitr-restore -D /opt/pg92beta/data_delayed -l /opt/pg92beta/omniti-restore.log -s /opt/pg92beta/archives_master -w 300 %f %p'
archive_cleanup_command = '/opt/pg92beta/bin/pg_archivecleanup /opt/pg92beta/archives_master %r'
standby_mode = on
You can download omnipitr-restore at https://github.com/omniti-labs/omnipitr, where you will also find documentation on using OmniPITR with a delay.
restore_command is in charge of applying the corresponding segments, and the "-w 300" option is the "wait" that implements the delay. The server will be in standby mode, which means it cannot be queried.
Streaming delayed hot-standby slave (8892) configuration:
The configuration is almost the same as for the other slaves; just keep an eye on postgresql.conf:
#...same configuration as the other slaves...
hot_standby_feedback = off   # not necessary, since the server is delayed
Script for delayed using recovery_target_time:
#!/bin/bash
HOME_PG=/opt/pg92beta
DATA_PG=$HOME_PG/data_streaming_delayed
DELAYED_DATE=$(date "+%Y-%m-%d %H:%M:%S %Z" --date="5 minutes ago")
RECOVERY_FILE_LOCATION=$DATA_PG/recovery.conf
RECOVERY_FILE_LOCATION_DONE=$DATA_PG/recovery.done
CONTROL_COMMAND=$HOME_PG/control_streaming_delayed.sh
PORT=8892

res=$($CONTROL_COMMAND stop)
# If a previous start finished recovery, recovery.conf was renamed to
# recovery.done; put it back so the server enters recovery again.
if [ -f $RECOVERY_FILE_LOCATION_DONE ] ; then
  mv $RECOVERY_FILE_LOCATION_DONE $RECOVERY_FILE_LOCATION
fi
sed -i "s/.*recovery_target_time[ |\=].*/recovery_target_time = '$DELAYED_DATE'/g" $RECOVERY_FILE_LOCATION
# Taken from the docs:
#   If this recovery target is not the desired stopping point,
#   then shut down the server, change the recovery target settings
#   to a later target and restart to continue recovery.
res=$($CONTROL_COMMAND start)
control_streaming_delayed.sh is just a one-line script (--mode=fast is required to cancel running queries when restarting):
su postgres -c "bin/pg_ctl -D data_streaming_delayed --mode=fast -l logfile_streaming_delayed $1"
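To keep the delay window bounded, the script above has to be re-run periodically. For example, a crontab entry like the following (path taken from this test setup; adjust it to yours) re-executes it every 5 minutes:

```
*/5 * * * * /opt/pg92beta/change_recovery_target_time.sh
```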
This configuration has its cons:
The "delay" window isn't exact. Why? You need to cron the script -which rewrites recovery_target_time- every X minutes, and during that interval the delay keeps growing until the script runs again. So if you configured "30 minutes ago" and the script runs every 5 minutes, the actual delay will be between 30 and 35 minutes. The script executes a stop/start, which is necessary to load the new configuration. DO NOT RUN A RELOAD, because the slave would leave recovery mode. Keep this periodic restart in mind if you plan to run queries on that server.
But its advantages are:
You can query the server. You can simply stop refreshing recovery_target_time while you run your reports against it. And since replay is paused, your queries will perform better.
After every start, you will find something like:
LOG:  database system was shut down in recovery at 2012-07-04 11:14:39 UTC
LOG:  entering standby mode
LOG:  redo starts at 0/1E000020
LOG:  consistent recovery state reached at 0/1E010B48
LOG:  database system is ready to accept read only connections
LOG:  recovery stopping before commit of transaction 14024, time 2012-07-04 11:14:18.319328+00
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.
Do not execute pg_xlog_replay_resume() if you want to keep the server delayed. If you execute it, the server will replay everything up to the last record received and come up as a standalone server, which isn't our objective.
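If you only want to see how far replay lags behind what has been received -without resuming anything- the read-only location functions are safe to call. A sketch (on a paused delayed standby the two locations will differ):

```sql
-- Last WAL location received and synced from the upstream server:
SELECT pg_last_xlog_receive_location();

-- Last WAL location actually replayed (frozen while replay is paused):
SELECT pg_last_xlog_replay_location();

-- Timestamp of the last replayed transaction, i.e. the effective delay:
SELECT pg_last_xact_replay_timestamp();
```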
Let’s see an example:
root@lapp /opt/pg92beta# bin/psql -p8888 -Upostgres
psql (9.2beta2)
Type "help" for help.
postgres=# insert into check_time values (now());
INSERT 0 1
postgres=# select * from check_time ;
              i
----------------------------
 2012-07-04 11:19:47.667041
 2012-07-04 11:43:23.794384
(2 rows)
postgres=# \q
root@lapp /opt/pg92beta# ./change_recovery_target_time.sh
root@lapp /opt/pg92beta# tail data_streaming_delayed/pg_log/postgresql-2012-07-04_11
postgresql-2012-07-04_110653.log  postgresql-2012-07-04_110853.log  postgresql-2012-07-04_111440.log  postgresql-2012-07-04_114335.log
postgresql-2012-07-04_110808.log  postgresql-2012-07-04_111315.log  postgresql-2012-07-04_111954.log
root@lapp /opt/pg92beta# tail data_streaming_delayed/pg_log/postgresql-2012-07-04_114335.log
LOG:  database system was shut down in recovery at 2012-07-04 11:43:33 UTC
LOG:  entering standby mode
LOG:  redo starts at 0/1E000020
LOG:  consistent recovery state reached at 0/1E012734
LOG:  database system is ready to accept read only connections
LOG:  recovery stopping before commit of transaction 14027, time 2012-07-04 11:43:23.797137+00
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.
root@lapp /opt/pg92beta# bin/psql -p8892 -Upostgres
psql (9.2beta2)
Type "help" for help.
postgres=# select * from check_time ;
              i
----------------------------
 2012-07-04 11:19:47.667041
(1 row)
postgres=# \q
For large amounts of data, you may want a longer delay (6 hours, 1 day) because of the time spent starting an instance. Based on some experience, 1 day should be fine to detect problems or to work with reports.