March 27, 2021

Sanity Backup Function with GitHub Actions and Artifacts

By Jérôme Pott

When using Sanity, our content is stored safely and in multiple copies in Google Cloud. Thanks to the document history, we can restore our documents to a previous state. However, deleted documents and datasets cannot be recovered.

Even if those scenarios are unlikely to happen, it is worth creating a simple backup routine, just in case. And the method I'm going to show you here is easy to set up, won't cost you any money, and doesn't require you to register with a third-party service (I assume that all my readers have a GitHub account 🙃).

TL;DR

https://github.com/sanity-io/github-action-sanity#backup-routine

Ways to back up Sanity datasets

There are three ways to back up datasets:

  1. cURL request to an export URL endpoint
  2. Using the Sanity CLI
  3. Using the @sanity/export npm package

In another blog post, I explained how to use the @sanity/export npm package inside a serverless function to back up content to Google Drive or Dropbox.

There is, however, an easier way: GitHub Actions (GA). Here are the advantages:

  • Backup files are stored alongside your studio code.
  • They only require a few lines of YAML config.
  • They support CRON jobs.
  • They are cheap (execution time + storage).
  • We can make use of the GitHub ecosystem (notifications for failed workflows, access management, etc.)

Going all in on GitHub Actions

There is a GitHub Action that wraps the Sanity CLI. Basically, it means that we can run sanity dataset export inside our GA workflow.
Before we can export the dataset, we need to generate a read token from the Sanity project dashboard and store it as a secret in the GitHub repository.

This is what the first workflow step looks like:

- name: Export dataset
  uses: sanity-io/github-action-sanity@v0.1-alpha
  env:
    SANITY_AUTH_TOKEN: ${{ secrets.SANITY_AUTH_TOKEN }}
  with:
    args: dataset export production backups/backup.tar.gz

Then we need to upload the generated backup file so that it will be available for download as a workflow artifact. For this, we use the upload-artifact action and we specify the same path as above: backups/backup.tar.gz.

By default, this step passes even if GitHub cannot find our generated backup file. That is why I recommend setting the if-no-files-found option to error.

And here are the details of the step:

- name: Upload backup.tar.gz
  uses: actions/upload-artifact@v2
  with:
    name: backup-tarball
    path: backups/backup.tar.gz
    # Fails the workflow if no files are found; defaults to 'warn'
    if-no-files-found: error

In addition to running the backup routine on a schedule, you can also add an option to trigger the backup process manually from the GA dashboard. This can be useful in various situations, e.g. right after content editors have added a large amount of data, or right before manipulating datasets.

Here's an example of a workflow triggered manually or by a CRON job:

on:
  schedule:
    # Runs at 04:00 UTC on the 1st and 17th of every month
    - cron: '0 4 */16 * *'
  workflow_dispatch:
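
Put together, the trigger and the two steps above could form a single workflow file. This is a sketch rather than the exact file used for this post: the file name, the workflow and job names, the ubuntu-latest runner, and the checkout step (added so the CLI can find the studio's project configuration) are assumptions. The action versions are the ones pinned in this post; newer major versions have been released since, so check each action's README before copying.

```yaml
# .github/workflows/backup.yml (hypothetical file name)
name: Backup routine

on:
  schedule:
    # Runs at 04:00 UTC on the 1st and 17th of every month
    - cron: '0 4 */16 * *'
  # Allows manual runs from the Actions tab
  workflow_dispatch:

jobs:
  backup-dataset:
    runs-on: ubuntu-latest
    steps:
      # Assumption: the studio code lives in this repository
      - uses: actions/checkout@v2

      - name: Export dataset
        uses: sanity-io/github-action-sanity@v0.1-alpha
        env:
          SANITY_AUTH_TOKEN: ${{ secrets.SANITY_AUTH_TOKEN }}
        with:
          args: dataset export production backups/backup.tar.gz

      - name: Upload backup.tar.gz
        uses: actions/upload-artifact@v2
        with:
          name: backup-tarball
          path: backups/backup.tar.gz
          # Fails the workflow if no files are found; defaults to 'warn'
          if-no-files-found: error
```

Keep in mind that scheduled workflows only run from the repository's default branch, so the file has to be merged there before the CRON trigger takes effect.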

Conclusion

We now have a solid backup routine in place. You can of course tune the frequency of the backups to your needs. Make sure to also read the latest information from GitHub about pricing, size limits, and file retention. For example, as of this writing, backup files are automatically deleted after 90 days in public repositories. I personally think that 90 days is long enough, maybe even too long. If you want to keep backup files for a shorter time, you can change the retention period in the repository settings under Actions.
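
Retention can also be set per artifact instead of for the whole repository. Here is a minimal sketch, assuming the retention-days input of the upload-artifact action is available in the version you pin; the value of 30 days is only an illustration:

```yaml
- name: Upload backup.tar.gz
  uses: actions/upload-artifact@v2
  with:
    name: backup-tarball
    path: backups/backup.tar.gz
    if-no-files-found: error
    # Assumed value: keep this artifact for 30 days instead of the default
    retention-days: 30
```

The value cannot exceed the retention limit configured for the repository.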

Finally, if you would like to see the workflow described in this post along with the generated artifacts, you can visit this page: https://github.com/mornir/movies-studio/actions/workflows/main.yml
