January 28, 2022

Multi-environment deployments

By Daniel Favand

When a project grows beyond a proof-of-concept, your team needs to be able to work on features in parallel and validate changes before launching. With Continuous Integration and Continuous Delivery (CI/CD) pipelines and Sanity's dataset management features, you can implement an automated development and deployment pipeline for your content service.

What are you trying to achieve?

Before building out an automated system, it's important to clarify the use cases you want to solve for. Here are some common goals we'll address in this article:

  • Developing and validating changes to the editor experience, independently
  • Testing bulk migrations and building integrations
  • Validating changes in data structure against various clients (web, mobile, signage, etc.)

For the remainder of the guide we’ll be using the fictional company Candicorp as our scenario. Candicorp is an established manufacturer of delightful sweets with a dedicated fan base, and is ready to take the step into a content-first strategy by leveraging modern development strategies and the power of Sanity’s Content Lake!

  • Candicorp manages content in Sanity for use on its website, iOS app, Android app, product packaging, and in-store signage
  • Candicorp authors want to be able to use editorial workflows to draft and schedule content
  • Candicorp developers want to test and validate new content structures and new application code, and to work on new features in parallel
  • Candicorp will use GitHub Actions as a CI/CD pipeline

What services are we managing?

There are three separate components or services, with their own development lifecycles and requirements:

  • Sanity Studio. The studio is a single-page application (SPA) written in React that provides the user interface for your authors and editors. Being a single-page application, the studio can be hosted anywhere and can be developed following common Git branching strategies. The studio provides a window into your Content Lake, and the schemas you define in the studio determine what content an editor can see and change in the Content Lake.
  • Sanity Content Lake. The Content Lake is where your data is stored and managed for you by Sanity. Although production content (including drafts) should be kept in a "production" dataset, your project can contain multiple datasets for development and testing. You can create, clone, and remove datasets through the Sanity APIs. Although most projects will use the studio to author content, the Content Lake can store arbitrary JSON data from any source.
  • Your client applications. These may be websites, mobile apps, digital signage, email campaigns – anything that queries content in the Content Lake.

Protip

Practically, the Studio and Content Lake environments and lifecycles are usually linked.

Our initial Candicorp environments.

In our Candicorp example, we'll start out with a single production deployment of each service and client application, as shown in the diagram above. When creating a new Sanity project, we can create a dataset named production and deploy a build of the studio on Sanity's studio hosting service with the following Sanity CLI commands.

First, we create a project named Candicorp with a production dataset:

sanity init --create-project "Candicorp" --dataset "production"

Second, build the studio and deploy it:

sanity deploy

Following the prompts of the command line interface, we’ll pick a suitable subdomain and end up with a studio hosted at <subdomain>.sanity.studio.

This is great for authors, who should generally be using the production studio to edit, review, and publish content in the production environment. It is less suitable for developers, who need a way to create and validate new features without affecting production.

Studio and Dataset Environments

To give developers more flexibility to create and validate changes, we can create multiple environments for both the Studio and the Content Lake.

Since the Studio is a single-page application, you can have multiple versions and deployments, each with its own configuration. You can host these Studios on any infrastructure you have access to, although hosts that support branch-based workflows, like Netlify and Vercel, make this particularly easy. Because each deployment can have a different configuration, you can set a Studio deployment to connect to any dataset. In practice, it's useful to give the studio deployment and the dataset the same name to keep things organized.

Content Lake environments are called datasets. These are managed through the Sanity project management interface at sanity.io/manage, through the CLI, or through the Sanity API directly. The number of datasets available in a project depends on your plan and quotas.

Our Studio environments are based on git branches. Our datasets are named after git branches, but are all cloned from production.

Branch-based workflows

The conventions and tooling that are used for Git-based software development can be employed to manage the development of a Sanity project. In a branch-based workflow, features can be developed and validated in isolation. CI/CD platforms like GitHub Actions, GitLab Pipelines, Jenkins, and CircleCI make it easy to implement workflows that are triggered by the creation, deletion, and merging of Git branches.

Our Studio repository branches and Content Lake datasets can follow the same lifecycle.

In our Candicorp scenario, we will have a production studio and a production dataset. We will also have a staging studio and a staging dataset. (Note that this dataset is for testing structures and integrations, not for editing draft content.) Finally, when we create a new branch to develop a new feature in the studio, we will create a studio deployment and a dataset in a CI pipeline.

The developer will use the branch studio and dataset to test and validate their changes. They can share the deployment URL with other developers and the editorial team so that everyone involved can see and test the new feature.

When the work is complete, the studio branch will be merged back into the staging branch, and the related dataset deleted. When the team is ready, they can merge the staging branch into production and the new studio will be live.

Sanity Tools for CI/CD Pipelines

Studio

While Sanity provides hosting for a single version of your studio, you can also deploy studio environments to infrastructure of your choosing.

The easiest way to do this is to connect a studio repository to a service, like Netlify or Vercel, that will automatically build and deploy your studio for every branch and merge request. You can also use Kubernetes or other platforms.

For Candicorp, we'll use Netlify to build and deploy the Studio for each branch. We create a new site in Netlify, connect it to our Studio's GitHub repository, and turn on branch builds in the settings. We'll use the build command below. Netlify's build environment exposes the branch name in the HEAD variable, and ${HEAD##*/} keeps only the part after the last "/", so a branch named feature/new-nav sets the dataset name to new-nav:

SANITY_STUDIO_API_DATASET=${HEAD##*/} sanity build
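
To make the ${HEAD##*/} expansion concrete, here is a minimal shell sketch of how a branch name could be turned into a dataset name. The dataset_from_branch helper is hypothetical (it is not part of the Sanity CLI), and the extra clean-up step assumes that dataset names may only contain lowercase letters, digits, underscores, and dashes; check the naming rules for your project before relying on it.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Sanity CLI.
# Prints a dataset name derived from a Git branch name.
dataset_from_branch() {
  # ${1##*/} keeps only the text after the last "/" (same expansion as ${HEAD##*/}).
  # The tr calls lowercase the name and replace every character outside
  # a-z, 0-9, "_" and "-" with "-" (assumed naming rules, see the note above).
  printf '%s' "${1##*/}" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_-' '-'
  printf '\n'
}
```

For example, dataset_from_branch feature/candy-finder prints candy-finder. If you add clean-up like this, use the same derivation in the build command and in your CI scripts, so the Studio and the pipeline agree on the dataset name.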

Content Lake

You can clone and delete datasets using our APIs and CLI directly. For example, you might clone your production dataset into a new dataset using the name of the branch in your studio repository. That way, your development environment for that branch automatically has the latest production data. Then, when the branch is merged, delete the branch dataset automatically, so it doesn't count against your quota.

For clients on an enterprise plan, the fastest and easiest way to clone a dataset is with our cloud clone functionality. If you don’t have access to cloud clone, exporting and reimporting your dataset manually also works, although it takes more time and resources in your CI/CD environment. Datasets can also be deleted via the CLI and API.

Protip

If a dataset already exists, you will have to delete it before cloning from production again.

Gotcha

Be sure not to delete your production dataset! Add a check for the dataset name in the script itself in case you accidentally trigger a job on your production dataset.
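
As a rough illustration of such a check, the sketch below wraps the delete command in a guard. The function names, the list of protected names, and the DRY_RUN variable are all assumptions made for this example; only the sanity dataset delete command itself comes from the Sanity CLI.

```shell
#!/bin/sh
# Hypothetical guard: succeeds (exit status 0) for datasets that must never be deleted.
is_protected_dataset() {
  case "$1" in
    production|main|staging) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around the destructive step.
safe_delete_dataset() {
  # Refuse empty names as well, so a missing variable cannot slip through.
  if [ -z "$1" ] || is_protected_dataset "$1"; then
    echo "Refusing to delete dataset '$1'" >&2
    return 1
  fi
  # With DRY_RUN=1 the command is only printed, which lets us try the guard
  # without touching a real project.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sanity dataset delete $1"
  else
    sanity dataset delete "$1"
  fi
}
```

Keeping the check inside the script, rather than only in the CI trigger configuration, means it still applies when someone runs the script by hand or changes the trigger later.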

You may also want to automatically clone your production dataset into a staging dataset. You could do this on a cron job or whenever a branch is merged in, but make sure you have a way to trigger it manually, in case you need to quickly do testing in a staging environment.

For Candicorp, we will call the Sanity API from GitHub Actions workflows to create and delete the branch dataset.

We'll create two GitHub workflows: one to create the dataset when a developer pushes to a feature branch, and another to delete the dataset when the branch's pull request is closed. Because the first workflow runs on every push, each push replaces the branch dataset with a fresh copy of production. We'll only run these on branches that start with feature/ to ensure we don't accidentally delete our production or staging dataset. There are a couple of steps to set this up:

  1. The workflows will need the Sanity project ID and an API token in order to call the Sanity APIs. We can find the project ID in the Sanity management console at manage.sanity.io, and we can create a token under the “API” section of the management console. It should have the “manage datasets” permission.
  2. We save both the project ID and the API token in the secrets configuration for our GitHub repository, using the names SANITY_PROJECT and SANITY_TOKEN.
  3. To create the workflow that will generate our dataset when we push a new feature branch to GitHub, we create a file, .github/workflows/prepare-dataset.yml, in our repository:
name: Prepare Sanity Dataset
on:
  push:
    branches:
      # only run on feature branches, excluding production and staging just to be safe
      - 'feature/**'
      - '!production'
      - '!main'
      - '!staging'
jobs:
  prepare-dataset:
    runs-on: ubuntu-latest
    env:
      # define SANITY_TOKEN and SANITY_PROJECT in GitHub secrets config
      SANITY_TOKEN: ${{ secrets.SANITY_TOKEN }}
      SANITY_PROJECT: ${{ secrets.SANITY_PROJECT }}
    steps:
      - name: Double-check branch name
        # a branch such as "feature/production" would trigger this action
        # and resolve to a protected dataset name, so we cancel the job if that happens
        run: |
          if [ "${GITHUB_REF##*/}" = "production" ]; then exit 1; fi
          if [ "${GITHUB_REF##*/}" = "main" ]; then exit 1; fi
          if [ "${GITHUB_REF##*/}" = "staging" ]; then exit 1; fi
      - name: Delete Existing Dataset
        # ${GITHUB_REF##*/} extracts everything after the last "/" for the dataset name
        # curl exits 0 even on an HTTP error, so the job continues
        # when there is no dataset to delete yet
        run: |
          curl --request DELETE \
          --url "https://api.sanity.io/v2021-06-07/projects/$SANITY_PROJECT/datasets/${GITHUB_REF##*/}" \
          --header "Authorization: Bearer $SANITY_TOKEN"
      - name: Clone dataset from production
        run: |
          curl --request PUT \
          --url "https://api.sanity.io/v2021-06-07/projects/$SANITY_PROJECT/datasets/production/copy" \
          --header "Authorization: Bearer $SANITY_TOKEN" \
          --header 'Content-Type: application/json' \
          --data "{\"targetDataset\": \"${GITHUB_REF##*/}\"}"

  4. To handle deleting a dataset when a pull request is closed (which includes being merged), we commit a second workflow file, .github/workflows/cleanup-dataset.yml, to our repository:

name: Delete Sanity Dataset
on:
  # for pull_request events, GITHUB_HEAD_REF holds the name of the
  # pull request's source branch, which we use to find the dataset
  pull_request:
    types:
      # runs when the PR is closed, whether or not it was merged
      - closed
jobs:
  cleanup-dataset:
    # only clean up after feature branches, so closing a PR from staging
    # or another long-lived branch never deletes that branch's dataset
    if: startsWith(github.head_ref, 'feature/')
    runs-on: ubuntu-latest
    env:
      # define SANITY_TOKEN and SANITY_PROJECT in GitHub secrets config
      SANITY_TOKEN: ${{ secrets.SANITY_TOKEN }}
      SANITY_PROJECT: ${{ secrets.SANITY_PROJECT }}
    steps:
      - name: Double-check branch name
        # a branch such as "feature/production" would trigger this action
        # and resolve to a protected dataset name, so we cancel the job if that happens
        run: |
          if [ "${GITHUB_HEAD_REF##*/}" = "production" ]; then exit 1; fi
          if [ "${GITHUB_HEAD_REF##*/}" = "main" ]; then exit 1; fi
          if [ "${GITHUB_HEAD_REF##*/}" = "staging" ]; then exit 1; fi
          # log the refs to make the job easier to debug
          echo "GITHUB_REF $GITHUB_REF"
          echo "GITHUB_REF_NAME $GITHUB_REF_NAME"
          echo "GITHUB_HEAD_REF $GITHUB_HEAD_REF"
      - name: Delete Existing Dataset
        # ${GITHUB_HEAD_REF##*/} extracts everything after the last "/" for the dataset name
        run: |
          curl --request DELETE \
          --url "https://api.sanity.io/v2021-06-07/projects/$SANITY_PROJECT/datasets/${GITHUB_HEAD_REF##*/}" \
          --header "Authorization: Bearer $SANITY_TOKEN"

If we want to clone a dataset locally, we can use this CLI command in the context of a studio project. (The example GitHub workflows use curl to call the API directly, so we don't have to install any dependencies in CI.)

sanity dataset copy source-dataset-name target-dataset-name 

To delete a dataset from our local command line, we can use this command:

sanity dataset delete dataset-name

If you don't have access to enterprise plan cloud clone features, you can also manually export a dataset and reimport it, although this will take longer and require more resources in your environment:

sanity dataset export source-dataset-name source-dataset-name.tar.gz
sanity dataset create target-dataset-name --visibility public
sanity dataset import source-dataset-name.tar.gz target-dataset-name

Integrating with websites and other clients

Generally, editorial teams should draft, review, schedule, and publish content using the production dataset. When integrated with apps (web, mobile, email, etc.), content preview becomes a function of the production app. This allows editorial processes to move independently of app code releases.

When working on new features, however, it can be helpful to be able to change the mapping:

  • Using production content with staging code to test new code against existing content
  • Using staging content against production code to validate new structures with existing code
  • Using development content with development code to test both content structures and code

You can switch which dataset is used when you configure the Sanity client.
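
The client libraries take the dataset name as a configuration value. The same idea can be seen at the HTTP level, where the dataset is part of the query URL. The sketch below assumes the standard query endpoint and the API version used elsewhere in this guide; the sanity_query_url helper, the SANITY_DATASET variable, and the "product" document type are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: prints the HTTP query endpoint for a project and dataset.
# $1 = project ID, $2 = dataset name (falls back to "production" when omitted)
sanity_query_url() {
  printf 'https://%s.api.sanity.io/v2021-06-07/data/query/%s\n' "$1" "${2:-production}"
}

# Example call (needs network access and a real project ID):
#   curl --get "$(sanity_query_url "$SANITY_PROJECT" "$SANITY_DATASET")" \
#     --data-urlencode 'query=*[_type == "product"][0...5]'
```

Reading the dataset name from an environment variable with a production fallback keeps the application code identical across environments; only the deployment configuration changes.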

For Candicorp, we will build the website so that when an author is logged in, they can toggle between “live” and “draft” views on the site itself. When the development team is working on a new site feature, they'll point the local development website to pull from the production dataset, unless they're also working on a new content structure. In that case, they'll configure the local development website to pull from a different dataset that contains testing content for that feature.

Putting it all together

With Sanity's dataset management features and your CI/CD pipeline, you can automate the creation of development environments and reduce the friction of developing features as a team. You don't have to worry about working on proofs-of-concept in a shared environment and getting in the way of other users. But because the dataset is hosted in the Sanity Content Lake, you can also collaborate on new features very easily.

Our fictional Candicorp developers have a simple, streamlined way to create new environments, reducing the friction of making new features and deploying them. This makes it easier for the business to iterate on new ideas and confidently test new features.