
Migrating Data

Occasionally, your data structure just needs to change.

Adding new fields and data structures to your project is easy, but sometimes you get it wrong and need to make changes. If it's early days and you don't have much data, this is simple: you just make the required changes and fix a few fields by hand.

For projects with considerable amounts of content, however, this quickly becomes impractical. You then need to write code to reshape your data. Sanity provides two workflows for doing this: migrating using the API, or exporting and importing datasets with the CLI.

Migrate using the API

This is really the only way to go about this if you have a live installation and want to avoid stopping the world for your editors.

The example below shows how you might go about renaming a field called name to fullname for the author document type. It works in batches of 100 and continues patching until no more documents are returned.

The code can also safely run while people are editing. If a document is edited between being fetched and the patch being submitted, the migration will fail, but you can re-run the script until it passes.

Migration scripts can be run with the sanity exec command. The example below uses the getCliClient method to fetch the necessary credentials from sanity.cli.ts, which is automatically generated when first setting up a studio project and should be located in the project root. Read more about sanity.cli.ts (or .js) here.

import { getCliClient } from 'sanity/cli'

const client = getCliClient()

// Go to your project folder and run this script in your terminal with:
// `sanity exec migrations/renameField.js --with-user-token`
//
// This example shows how you may write a migration script that renames a field (name => fullname)
// on a specific document type (author).
// This will migrate documents in batches of 100 and continue patching until no more documents are
// returned from the query.
//
// This script can safely be run, even if documents are being concurrently modified by others.
// If a document gets modified in the time between fetch => submit patch, this script will fail,
// but can safely be re-run multiple times until it eventually runs out of documents to migrate.

// A few things to note:
// - This script will exit if any of the mutations fail due to a revision mismatch (which means the
//   document was edited between fetch => update)
// - The query must eventually return an empty set, or else this script will continue indefinitely

// Fetch documents that match the precondition for the migration.
// NOTE: This query should eventually return an empty set of documents to mark the migration
// as complete
const fetchDocuments = () =>
  client.fetch(`*[_type == 'author' && defined(name)][0...100] {_id, _rev, name}`)

const buildPatches = docs =>
  docs.map(doc => ({
    id: doc._id,
    patch: {
      set: {fullname: doc.name},
      unset: ['name'],
      // this will cause the transaction to fail if the document has been
      // modified since it was fetched.
      ifRevisionID: doc._rev
    }
  }))

const createTransaction = patches =>
  patches.reduce((tx, patch) => tx.patch(patch.id, patch.patch), client.transaction())

const commitTransaction = tx => tx.commit()

const migrateNextBatch = async () => {
  const documents = await fetchDocuments()
  const patches = buildPatches(documents)
  if (patches.length === 0) {
    console.log('No more documents to migrate!')
    return null
  }
  console.log(
    `Migrating batch:\n %s`,
    patches.map(patch => `${patch.id} => ${JSON.stringify(patch.patch)}`).join('\n')
  )
  const transaction = createTransaction(patches)
  await commitTransaction(transaction)
  return migrateNextBatch()
}

migrateNextBatch().catch(err => {
  console.error(err)
  process.exit(1)
})

Export, reshape, import

As noted previously, this method requires that you stop the world for editors – at least for the data types that you are going to be re-importing.

There are three steps involved:

1. Export

sanity dataset export [DATASET] [DESTINATION]

Running sanity dataset export --help will give you an overview of the available options as well as some examples. Note that you can run this command with --types if you only want to export certain document types. This command is run in the terminal and requires you to be in your project folder.

E.g., sanity dataset export production ./production.tar.gz will export the production dataset to a tar file called production.tar.gz in the current folder.

2. Reshape

Use whatever tooling you like to reshape your data. This could be done by running a find-replace to rename a _type, using a CLI command like ndjson-cli, or writing a small script.
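
For example, a small Node.js script along these lines could handle the same rename as the API example above. This is a minimal sketch that assumes you have already extracted the exported tarball and that the documents live in a file called data.ndjson; the file names and the name => fullname rename are only illustrative.

import fs from 'node:fs'
import readline from 'node:readline'

// Read the exported documents line by line, rewrite the ones that match,
// and write everything to a new ndjson file.
const input = fs.createReadStream('data.ndjson')
const output = fs.createWriteStream('data.reshaped.ndjson')
const lines = readline.createInterface({input, crlfDelay: Infinity})

lines.on('line', line => {
  const doc = JSON.parse(line)
  if (doc._type === 'author' && doc.name !== undefined) {
    doc.fullname = doc.name
    delete doc.name
  }
  output.write(`${JSON.stringify(doc)}\n`)
})

lines.on('close', () => output.end())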

3. Import

sanity dataset import [FILE | FOLDER | URL] [TARGET_DATASET]

E.g., sanity dataset import ../staging.tar.gz production will import the tar file staging.tar.gz in the parent directory into the production dataset.

If you encounter errors because a document ID already exists you can use the --replace flag to replace the existing document with the document being imported. Get a full list of import options and additional examples of commands by running sanity dataset import --help. This command is run in the terminal and requires you to be in your project folder. Refer to the importing data documentation for additional details.

Gotcha

Image assets are tied to datasets. Importing your data to a completely new dataset after reshaping is a good idea as you can verify that everything worked out as planned before switching over your front-ends. Make sure to bring your assets along for the ride if this is the workflow you have chosen; references will break unless assets are included in the import or the ndjson file includes absolute filenames.
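
For instance, you could create a scratch dataset and import the reshaped export there first; the dataset name and file name below are just examples:

sanity dataset create migration-test
sanity dataset import ./production-reshaped.tar.gz migration-test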

Cleaning up data

Sometimes you want to clean up test content or the like. If you need to delete multiple documents, rather than submitting an API request for each _id, you can provide a GROQ query. For example, say you want to delete all documents of the feature type that have a viewCount of less than 5. That can be done with an authenticated POST request to the Mutations API:

{
  "mutations": [
    {
      "delete": {
        "query": "*[_type == 'feature' && viewCount < 5]"
      }
    }
  ]
}

To learn more about this, please visit the documentation on delete mutations.
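
If you prefer to run the same cleanup from a script, the JavaScript client can issue an equivalent delete-by-query mutation. The sketch below assumes a configured client; the project ID, dataset, API version, and token are placeholders you need to replace with your own values.

import {createClient} from '@sanity/client'

// Placeholder configuration: replace with your own project ID, dataset,
// and a token with write access.
const client = createClient({
  projectId: 'your-project-id',
  dataset: 'production',
  apiVersion: '2023-01-01',
  token: process.env.SANITY_WRITE_TOKEN,
  useCdn: false
})

// Delete every feature document with a viewCount below 5, matching the mutation above.
client
  .delete({query: "*[_type == 'feature' && viewCount < 5]"})
  .then(result => console.log('Deleted', result))
  .catch(err => {
    console.error(err)
    process.exit(1)
  })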

Deleting unused assets

If you are looking to clean up your assets, you can use a script to delete unused assets in your dataset.
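
One way to do this is to query for asset documents that nothing references and delete them in a transaction. The sketch below uses the same CLI client setup as the migration script above and can be run with sanity exec; treat the query and the single-transaction delete as a starting point rather than the only approach.

import {getCliClient} from 'sanity/cli'

const client = getCliClient()

// Find image and file assets that no other document references.
const query = `*[_type in ["sanity.imageAsset", "sanity.fileAsset"] && count(*[references(^._id)]) == 0]._id`

const deleteUnusedAssets = async () => {
  const ids = await client.fetch(query)
  if (ids.length === 0) {
    console.log('No unused assets found!')
    return
  }
  console.log(`Deleting ${ids.length} unused assets`)
  const tx = ids.reduce((trx, id) => trx.delete(id), client.transaction())
  await tx.commit()
  console.log('Done!')
}

deleteUnusedAssets().catch(err => {
  console.error(err)
  process.exit(1)
})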
