Not long ago, I set up synchronous, streaming replication in PostgreSQL 9.1 in order to build a Hot Standby system. This is the story of that journey.
This post is mostly based on the 5-minute simple replication tutorial from the PostgreSQL wiki, but with a few modifications and clarifications. There's also lots of good information in the Streaming Replication wiki from the PostgreSQL wiki.
For the remainder of this post, assume that you have a Master at 126.96.36.199 (host named
db0) and a Standby at 188.8.131.52 (host named
Configure the Master
postgresql.conf, you need to set the following:
listen_address = '*' wal_level = hot_standby max_wal_senders = 3
There is one caveat of running a hot standby system: if the standby crashes, the master will continue to wait for a response on every transaction. This is a Bad Thing. You should adequately monitor this so you know when it happens!
pg_hba.conf, you need to add this:
host replication all 184.108.40.206/32 trust
This gives your standby server permissions to connect to the master.
Configure the Standby
Now, on the standby machine, edit
postgresql.conf so that it includes:
hot_standby = on
Then, add a
recovery.conf file. This ususally goes in PostgreSQL's data directory, which defaults to
/var/lib/postgresql/9.1/main on Ubuntu. This file should contain:
standby_mode = 'on' primary_conninfo = 'host=220.127.116.11 application_name=db1'
Just to make sure the appropriate processes are running, verify that there's a
sender process that will run on the master and a
receiver process that will run on the standby. You can check for those with
ps -ef | grep sender and
ps -ef | grep receiver, respectively.
At this point, we've got to make sure the Standby contains an exact copy of the data from the Master. Shut down the PostgreSQL service on the master and standby (e.g. with a command such as
invoke-rc.d postgresql stop). Then copy data files from the master to the standby. You must exclude any config files if you've put any there (we haven't in this tutorial) as well as the
rsync -av --exclude pg_xlog /var/lib/postgresql/9.1/main 18.104.22.168:/var/lib/postgresql/9.1/main
Then, start the standby and the master (
invoke-rc.d postgresql start on both systems).
Now it's time to test this setup. Log into the master (
ssh email@example.com) and create a sample database:
$ su postgres $ createdb sampledb
Now, access the postgresql command prompt:
$ psql -d sampledb
Create a simple table and populate it with some data:
CREATE table samples ( id integer PRIMARY KEY, name varchar(25), stuff text ); INSERT into samples values (1, 'foo', 'lots of foo'); INSERT into samples values (2, 'bar', 'lots of bar'); INSERT into samples values (3, 'baz', 'lots of baz'); INSERT into samples values (4, 'bin', 'lots of bin');
Now, in another shell, log into your standby (
ssh firstname.lastname@example.org), and fire up the
$ su postgres $ psql -d sampledb
Look at a list of the tables:
sampledb=# \d List of relations Schema | Name | Type | Owner --------+---------+-------+---------- public | samples | table | postgres (1 row)
Then query the
sampledb=# select * from samples; id | name | stuff ----+------+------------- 1 | foo | lots of foo 2 | bar | lots of bar 3 | baz | lots of baz 4 | bin | lots of bin (4 rows)
Now, try to insert some more data:
sampledb=# insert into samples values (5, 'asdf', 'more asdf'); ERROR: cannot execute INSERT in a read-only transaction
Whoops! The standby server gives you Read-Only access to the data.
Checking the WAL Status
On the master, you can run the following command to see the current WAL write location:
Then on the standby, you can run:
If you get the same values, then replication is up-to-date... otherwise there is some lag. We haven't set up synchronous replication, yet, so in a production environment (with a lot of data getting written to the database) it's possible the Standby would lag behind the master. If you need both systems to always be idendical, you can flip on synchronous replication.
Enabling Synchronous Replaction just takes a couple of additional steps. Keep in mind though, that this should only be done when there's little latency between the master and the standby! That's because each query on the master will wait for a responce from the standby, which has to go over the network. I would only configure synchronous replication if both the master and standby are connected over a local (preferably private) network.
Ok, to make it happen, add the following to
postgresql.conf on the Master:
synchronous_standby_names = 'db1'
Where the value here is a comma-separated list of your standby servers. You can have more than one, but only the first one will be used. If for some reason the first one dies, the second will become the standby.
Restart postgresql, then check out the status of your data synchronization (still on the master):
$ su posgresql $ psql
SELECT usename, application_name, client_addr, client_hostname, sync_state FROM pg_stat_replication;
You should see something like this:
usename | application_name | client_addr | client_hostname | sync_state ----------+------------------+-----------------+-----------------+------------ postgres | db1 | 22.214.171.124 | | sync
Now, just for fun, go back and insert a bunch of data in your master, then query on your standby machine. You should see the same results on both machines!
Now that you've got this set up, it's probably not a bad idea to read more about what's going on. The official PostgreSQL Docs make a good reference:
- 18.5. Write Ahead Log
- 18.6. Replication
- 25. High Availability, Load Balancing, and Replication
- Basically the whole Server Administration Section.
comments powered by Disqus