PGSQL Phriday #002: PostgreSQL Backup and Restore

Always have a backup! You'll need one if you ever find yourself working on a restore.

This blog post is part of #PGSQLPhriday. The original post by ads can be seen here.

Here are the questions that ads asked in the post.

Describe how you do backups for your PostgreSQL databases.

Which tool(s) are you using, where do you store backups, how often do you do backups?

Are there any recommendations you can give the reader how to improve their backups?

Any lesser known features in your favorite backup tool?

Any new and cool features in a recently released version?

Bonus question: Is pg_dump a backup tool?

If you go back to ads’ original blog post, he asks, “Backups? Do you need that?”. When I read this as the topic for #PGSQLPhriday, I thought to myself, this guy is crazy; he must have had too much ice cream if he believes we don’t need backups! But then I realized he was just using that as a headline to get me to read more. Thank you, ads, for making my heart skip a beat. I owe you a good ice cream at the next event for that one.

Let’s Talk Backups

I’m still fairly new to Postgres, having only started about five years ago. I began with PG 9.6, and we quickly moved that environment to PG 10. When I arrived at the company, pg_dump was the only backup tool we were using. The DB at that time was still around 50GB, so it was reasonable to do the backups in a timeframe that worked for us. A dump was done every night and stored off on another server. After some time we started to test pg_basebackup, which allowed us to take a full server backup each night. It was a huge improvement in both speed and the ability to handle much larger data sizes.
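As a rough sketch, a nightly job along these lines covers both approaches. The database name, paths, and connection details here are placeholders, not the actual environment described above:

```shell
# Hypothetical nightly backup script; database name and paths are
# placeholders, not the real setup described in this post.
DB=appdb
STAMP=$(date +%Y%m%d)
DUMP_OUT="/backups/${DB}_${STAMP}.dump"
BASE_OUT="/backups/base_${STAMP}"

# Logical dump in custom format (-Fc): compressed, and pg_restore can
# later pull individual tables or schemas out of it.
if command -v pg_dump >/dev/null 2>&1; then
    pg_dump -Fc -d "$DB" -f "$DUMP_OUT" || echo "pg_dump failed" >&2
fi

# Physical full-cluster copy: much faster at larger data sizes, but it
# restores the whole data directory, not individual objects.
if command -v pg_basebackup >/dev/null 2>&1; then
    pg_basebackup -D "$BASE_OUT" -Ft -z -P || echo "pg_basebackup failed" >&2
fi

echo "dump target: $DUMP_OUT"
```

The custom format (-Fc) is what makes the dump useful for selective restores later, which matters for the single-table scenario discussed below.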

The downside to this approach is that you have to restore the full server directory, either to another server or over the top of your existing one. In the world of the DBA, one of the more frequent tasks is restoring a single table that got dropped by a DBA, engineer, sysadmin, or sometimes even the CTO. When you simply need to restore one table, a base backup makes that pretty hard, since you HAVE to restore the entire server directory to another location.

Now don’t get me wrong, it’s not easy to do that with pg_dump either, but with a dump you can at least restore to the same server under a new database name, which usually makes restoring just that table a lot faster. There is also a good chance the dump will restore the data quickly: if you already have the structure of the table, you can restore only the data to the original database and not wait for things like indexes or keys to be rebuilt from the dump.
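A sketch of that single-table scenario, assuming a custom-format dump; the dump file, table, and database names here are invented for illustration:

```shell
# Hypothetical single-table restore; file, table, and database names
# are placeholders. Assumes the dump was taken with pg_dump -Fc.
DUMP=/backups/appdb_20230101.dump
TABLE=orders

if command -v pg_restore >/dev/null 2>&1; then
    # Option 1: restore just the one table into a scratch database
    # on the same server, then copy rows back as needed.
    pg_restore -d scratchdb -t "$TABLE" "$DUMP" || echo "restore failed" >&2

    # Option 2: if the table structure still exists, load only the rows
    # and skip waiting for indexes and keys to be rebuilt.
    pg_restore -d appdb -t "$TABLE" --data-only "$DUMP" || echo "restore failed" >&2
fi

echo "restore target table: $TABLE"
```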

Regardless of the tool or the database (SQL Server, PG, MySQL, ES), here are some key items I always make sure of in any of my database situations:

  1. Test all backups! You have NOT backed up if you don’t test the backup by restoring it. This doesn’t have to be every file, but you should have an audit restore running every X days/weeks that can test a backup file for you. Just do it!
  2. Make sure others can back up and restore, and write down the backup and restore procedure. Even if you have a seasoned team of DBAs, you never know when the whole group will be unavailable. Sometimes you need the person who knows the least about restoring data to be able to restore the data. We always maintain documentation on how to restore the DB and make sure several teams in the company have both the permissions and the understanding to do it.
  3. Know how you plan to restore the DB. 99.99999% of the time you are not going to restore over the top of the DB/server you are already using. Most of the time you are going to restore to a new server, a new system, or a different DB. Think about this when you design your backup and restore process. A lot of tools will talk about how quickly they can put the whole drive back in place from just a few hours ago, but rarely does a customer want everything back to a few hours ago. Typically just one aspect of the database has been impacted, and you don’t want to roll back all the other pieces of the system. When designing your strategy, think about what a restore looks like and what you will typically use it for.
  4. Don’t just back up the DATA! Many database systems have configurations and settings that are key to making the database work properly. Perhaps you have a custom configuration DB in your system that the application needs in order to work. In PG we would back up the PG conf file as well with our backup scripts, which allowed us to restore the server if needed. You don’t need to back it up as frequently as the data files, but it should still get backed up.
  5. Stop backing up only to the server! Yes, I still see this at customer locations. It’s perfectly fine to write the backup to a local disk and THEN synchronize it to the cloud, S3, or a tape drive (they still exist). Just make sure you are actually doing that step and have alerts if it fails. And going back to #1 in this post, you should run your test restore from your cloud backups, not off the local disk. If you only have backups on a local server and a local drive, even in a data center, do yourself a favor: get an S3 bucket and synchronize the backups to it. Recently a customer of mine couldn’t retrieve their data from the data center because the data center got hit with ransomware; everything was locked down and nothing was allowed out. If we had had a backup in S3 or elsewhere in the cloud, we would have had a copy and could have kept moving forward and rebuilding. This almost put the company out of business, but luckily we found a backup outside the data center that was a few months old.
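Points 4 and 5 can be sketched as one extra step at the end of a backup script. The config path and bucket name below are made up; the point is that the conf file rides along with the data and everything lands offsite, with an alert if the sync fails:

```shell
# Hypothetical end-of-backup step for points 4 and 5; the config path
# and bucket name are placeholders.
BACKUP_DIR=/backups
CONF=/etc/postgresql/15/main/postgresql.conf
BUCKET=s3://example-pg-backups

# Point 4: keep a copy of the configuration next to the data backups.
if [ -f "$CONF" ]; then
    cp "$CONF" "$BACKUP_DIR/postgresql.conf.$(date +%Y%m%d)"
fi

# Point 5: push everything offsite, and raise an alert if the sync fails.
if command -v aws >/dev/null 2>&1; then
    aws s3 sync "$BACKUP_DIR" "$BUCKET" || echo "ALERT: s3 sync failed" >&2
fi

echo "offsite sync: $BACKUP_DIR -> $BUCKET"
```

And per point 1, the audit restore should pull its test file back down from the bucket, not from the local disk.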

I have so many more stories I could tell about backups and restores! I will keep this post relatively short and dedicate a future post just to these items. I hope this list helps you build your backup strategy. I would love to hear about issues you have had with backups and restores!

PGSQL Phriday #001: Truths and Lies!

This post is part of #PGSQLPhriday! Here are the rules/requirements for the post.

“For this first event, I want you to write a blog post that shares three of your go-to Postgres best practices, but write it in the form of the game Two Truths and a Lie.

For example, perhaps there are three configuration settings you always modify whenever setting up a new cluster. You could share why these settings are important to modify and how you find the correct values, but for one of them, use Ilya’s comical approach to teach us the correct way to set the value. (i.e., “I never set shared_buffers to more than 128MB to conserve server resources.”)”

You can read the full post from Ryan here: https://www.softwareandbooz.com/pgsql-phriday-001-invite/

For my three tips, I wanted to talk about logical replication and configuration. I do a lot of work with DW/reporting/replication, so I figured it would be good to cover some of the key parameters you need to know when setting up logical replication. I will keep this in the form of two truths and a lie, and I’ll reveal at the bottom which is which.

wal_level = logical

Probably the most important setting for logical replication is wal_level: for logical replication it has to be set to logical. I typically have this set to “logical” on most of my servers; I rarely have a server that does not participate in logical replication, and it’s best to just set this up front. If you know you won’t be doing any replication, then something like minimal is a good idea, but I personally don’t run into that scenario often. Changing this setting requires a server restart, which is one of the reasons I keep it on my main checklist, making sure it’s configured right when the server first starts.
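In postgresql.conf that is a one-line change (shown as a sketch; remember it only takes effect after a full restart):

```
# postgresql.conf -- requires a full server restart to take effect
wal_level = logical
```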

max_logical_replication_workers = 1 

Keeping this at a low number means you are not utilizing as many server resources doing silly things like moving data!   You should keep this as low as possible on your subscriber,  this way you can automatically throttle your logical replication and it can always stay behind.  

max_replication_slots = 10+ 

The default for max_replication_slots is 10, which I frequently find very low for the servers I set up for logical replication. Since this also requires a restart, I will frequently start it out at 25 so I can avoid restarting simply to add new replications. Along with this parameter, you should always raise “max_wal_senders”: it should be the same number plus any physical replication slots you have as well. Make sure these are moved together.
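As a sketch in postgresql.conf; the exact numbers are illustrative, so size them for your own slot count:

```
# postgresql.conf -- both require a restart, so leave headroom up front
max_replication_slots = 25   # default is 10; extra room avoids restarts later
max_wal_senders = 30         # the slot count plus any physical standbys
```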

Let’s talk about the truths and the lie now! Pretty sure most of you out there sensed the lie already, but let me make it very clear.

max_logical_replication_workers = 1 

Please do NOT make this a low number. You want it to be higher so that your logical replication can keep up, and if you have a LOT of publications/subscriptions, you should think about increasing it. Be careful, though: these workers are taken from “max_worker_processes”, and if that number isn’t higher as well, you will run into problems. It’s hard to give specific guidelines here, since it really depends on how much replication work you are doing and how many resources you have on the server. These are numbers to adjust, test, and see what works for your system.
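For example, a subscriber doing real replication work might look more like this in postgresql.conf (the numbers are purely illustrative, not a recommendation):

```
# postgresql.conf -- logical workers come out of the max_worker_processes pool
max_worker_processes = 16
max_logical_replication_workers = 8   # high enough for subscriptions to keep up
```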

Thank you to Ryan for starting up #PGSQLPhriday! It was nice to get writing again, and I am looking forward to more of these in the future!

Community events  

Many years ago, I started attending conferences, mostly in the SQL Server space and related to the PASS community. We built a family in that community, and I’m proud to say I still know many of that family to this day. I’ve been helping to organize the PGConf NYC event for the past two years, and I’m starting to see this community grow. Last year was still right at the end of the pandemic, and we were still limited in some ways. This year it was nice to see Timescale throw an after-party and to see lots of people going off for drinks and dinner.

When I got to my hotel room Thursday night, I was super energized and focused on what I needed to do to help keep this community going. I remember those feelings from my days at the Summit, spending time talking to friends and mentors. I made a quick list of things we need to do to grow this community and continue to build that feeling of family.

  1. Embrace the new people! We have a lot of people who have been in this community for a long time and know all about Postgres. But we are the fastest-growing DB on the market, and that means lots of new faces that need to learn about Postgres. We need more 101 sessions and more sessions that welcome them into the fold.
  2. More community events. We need more user groups and smaller events, spreading the word and building communities all over the world, so that when they come together they already feel like family.
  3. More sponsors focused on the community. My hat’s off to Timescale, as they put on an after-party for the community at this event. I know some of the other sponsors wanted to do this as well, and I hope next year we have several events to choose from. I urge all the sponsors and vendors out there to find ways to connect with the community: not through the product, just connecting with people.
  4. More speakers. We have lots of amazing long-time speakers who know PG extremely well. We need to start getting more people in a position to speak so we can grow and get more diverse sessions. If anyone is interested in learning to speak or in speaking, please reach out to me. I’m happy to coach you, work with you, and help get you started down that path. It will forever change your career, I promise.

These were the key things I wrote down after the first day and things I intend to focus on in the next 6 months.  I would love to hear from others that might have suggestions for things to add to my list or for things that you would like us to do better for future events.