destroytoday.com

Upgrading the production database

A couple weeks ago, I got an email from Heroku indicating that the database version I use for Cushion (Postgres 9.5) will no longer be supported on their system, so I would need to upgrade. I love building apps, I love running them, and I love chatting with users—I do not love devops. I’m incredibly risk-averse, so the thought of copying all of Cushion’s data to a new versioned database and making sure that it still works… that makes my insides liquify. (Sorry.) I’m also overly considerate, so the thought of having any downtime for Cushion’s users makes me feel terrible—even though it’s only a couple minutes on a Saturday.

Since the most recent version supported by Heroku was Postgres 12—a significant jump from 9.5—I planned to simply upgrade to 9.6 in order to minimize surprises. In the week between planning to upgrade and actually upgrading, however, Heroku had also deprecated 9.6. I didn’t realize this until I executed the command to upgrade and received an error, so I had to temporarily abort and prepare for the bigger jump. When setting up a local Postgres 12 database, I hit a snag with one of Cushion’s database migrations. This migration was actually related to a feature I decided to abandon three years ago.

There was a time when I considered pivoting Cushion towards a more “automated” system, where it would run on bank feeds and measure a user’s “financial cushion” from their actual bank accounts. In theory, it sounded perfect, but in practice—especially when I was only months away from running out money—I realized it was exponentially more stressful to visually see that I was running out of money. To shield others from ever experiencing that same stress, I determined that I should no longer pursue that direction.

Now that this abandoned feature was preventing me from migrating the database and testing Cushion with Postgres 12, I decided that I’d rather remove the migrations than fix them. If I want Cushion to be a “lifetime project”, I need to take the KonMari method to my code—does this abandoned feature that reminds me of my panic attack bring me joy? Hell no! Then I must thank it and let it go. I removed the migrations and any lingering references to that feature until all of Cushion’s tests passed. It felt good.

Once again, I was ready to upgrade the database. I wrote a document to list each step in the process, and took notes throughout.

  1. Scale down the workers to prevent running scheduled jobs using the database

  2. Create the new database with Postgres 12

  3. Enable maintenance mode for Cushion and the invoice server

  4. Copy the data from the old database to the new one

  5. Promote the new database

  6. Disable maintenance mode

  7. Scale up the workers

I felt more confident having the steps in front of me, but still hit a couple hiccups along the way. As soon as I enabled maintenance mode for Cushion, I realized that my custom maintenance mode page no longer existed from when I migrated Cushion’s marketing site. This resulted in a 404, which I definitely didn’t want users to see when they were expecting the app. I quickly disabled maintenance mode, authored a dead-simple “offline for maintenance” page, made a note of this misstep, and continued the upgrade.

Next, I copied the data to the new database, which I thought would be the most excruciating step, but it actually finished in less than a minute. I was shocked, but relieved. After verifying the new database, I promoted it, disabled maintenance mode, and scaled up the workers. Everything seemed perfectly fine, and I felt like I could finally take a deep breath. I continued to monitor everything until I got a message from a user indicating that they couldn’t download an invoice as a PDF.

At first, I was terrified, but I quickly reminded myself that I did everything by the books and verified the new database. I don’t need to panic—I simply need to troubleshoot what might be the issue. Almost instantly, I knew what was up. Because Cushion’s invoice server is separate from the main app, it relies on a separate set of environment variables, like the one pointing to the database. Promoting Cushion’s database updates its own environment variables, but because the invoice server is separate, it still pointed to the old database. This is why when I checked several invoices on the invoice server, they appeared as expected, but when the user attempting to download a new invoice, it 404’d. After rolling my eyes at myself, I pointed the invoice server to the new database, double-checked that the user could download their invoice, and added an extra step to my upgrade sequence.

I can’t tell you how relieved that this is all behind me. I’m the kind of person who gets anxiety from knowing something is lingering. If I could, I’d much rather tackle everything immediately, so it doesn’t weigh me down. For a production database upgrade, however, I’d much rather wait for a Saturday, when folks shouldn’t be working—only me.