Will Hopkins

Links and essays


Replicating Postgres

I finally finished my albatross project: replicating all our Postgres shards to Databricks. This is great news.

In broad strokes, we export each Postgres database daily from its backup snapshot to Parquet, then ingest those Parquet files with Databricks. Previously we had a hacked-together version of the same process built only from AWS tools, but stitching the pieces together was much harder. AWS tends to take a LEGO approach (a bunch of building blocks you can recombine into bigger structures), but because the blocks are so general, it can be hard to make them do exactly what you need.

The funny part is that it's much, MUCH faster than the AWS attempt. It's also quite cheap, except for exporting the snapshots. AWS charges $0.01/GB of provisioned storage rather than of data actually stored, so depending on how aggressively you pre-provision your storage, the export can cost a lot more than strictly necessary.
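To make the over-provisioning point concrete, here is a rough back-of-the-envelope sketch. It assumes the flat $0.01/GB rate quoted above is charged on provisioned storage for every export, one export per shard per day, and a 30-day month. The shard names and sizes are hypothetical, not our real numbers.

```python
# Rough sketch of snapshot-export cost. All shard sizes are HYPOTHETICAL.
# Assumes a flat $0.01/GB charged on *provisioned* storage for each export.

PRICE_PER_GB = 0.01  # USD per GB per export (rate quoted in the post)

# Hypothetical shards: name -> (provisioned GB, GB actually used)
SHARDS = {
    "shard_a": (2000, 500),
    "shard_b": (1000, 800),
}


def export_cost(gb: float, exports: int = 1, price_per_gb: float = PRICE_PER_GB) -> float:
    """Cost in USD of exporting `gb` gigabytes `exports` times."""
    return gb * price_per_gb * exports


def monthly_costs(shards: dict[str, tuple[float, float]], days: int = 30) -> tuple[float, float]:
    """Return (cost billed on provisioned storage, cost if it were billed on used storage)."""
    billed = sum(export_cost(provisioned, days) for provisioned, _ in shards.values())
    if_used = sum(export_cost(used, days) for _, used in shards.values())
    return billed, if_used


if __name__ == "__main__":
    billed, if_used = monthly_costs(SHARDS)
    print(f"billed: ${billed:.2f}/month, if billed on used storage: ${if_used:.2f}/month")
```

With these made-up sizes, the daily exports come to about $900/month, versus about $390/month if only the used storage counted. The gap is pure pre-provisioned headroom, which is why trimming provisioned storage matters more than usual here.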

At any rate, the project was a success, even though we tried just about every alternative before landing on this one. I'm looking forward to our post-mortem on the project so the next one goes faster.