Best method for PostgreSQL incremental backup
PostgresqlBackupPostgresql Problem Overview
I am currently using pg_dump
piped to gzip
piped to split
. But the problem with this is that all output files are always changed. So checksum-based backup always copies all data.
Are there any other good ways to perform an incremental backup of a PostgreSQL database, where a full database can be restored from the backup data?
For instance, if pg_dump
could make everything absolutely ordered, so all changes are applied only at the end of the dump, or similar.
Postgresql Solutions
Solution 1 - Postgresql
Update: Check out Barman for an easier way to set up WAL archiving for backup.
You can use PostgreSQL's continuous WAL archiving method. First you need to set wal_level=archive
, then do a full filesystem-level backup (between issuing pg_start_backup()
and pg_stop_backup()
commands) and then just copy over newer WAL files by configuring the archive_command
option.
Advantages:
- Incremental, the WAL archives include everything necessary to restore the current state of the database
- Almost no overhead, copying WAL files is cheap
- You can restore the database at any point in time (this feature is called PITR, or point-in-time recovery)
Disadvantages:
- More complicated to set up than pg_dump
- The full backup will be much larger than a pg_dump because all internal table structures and indexes are included
- Does not work well for write-heavy databases, since recovery will take a long time.
There are some tools such as pitrtools and omnipitr that can simplify setting up and restoring these configurations. But I haven't used them myself.
Solution 2 - Postgresql
Also check out http://www.pgbackrest.org
pgBackrest is another backup tool for PostgreSQL which you should be evaluating as it supports:
-
parallel backup (tested to scale almost linearly up to 32 cores but can probably go much farther..)
-
compressed-at-rest backups
-
incremental and differential (compressed!) backups
-
streaming compression (data is compressed only once at the source and then transferred across the network and stored)
-
parallel, delta restore (ability to update an older copy to the latest)
-
Fully supports tablespaces
-
Backup rotation and archive expiration
-
Ability to resume backups which failed for some reason
-
etc, etc..
Solution 3 - Postgresql
Another method is to backup to plain text and use rdiff to create incremental diffs.