Converting string tags to reference IDs in Sanity schema using migration script

27 replies
Last updated: Jun 2, 2022
When I first set up my schema, I added tags in the form of strings - just giving a quick string array I could reference down the line. I'm now rethinking that and want to set up a separate tags schema. Is there an easy way to quickly convert my existing strings into references, or am I looking at a manual job to do that? 😂
AI Update

Good news - you don't have to do this manually! Sanity's migration toolkit is perfect for this exact scenario. Here's how to handle converting your string arrays to references:

The Approach

You'll use defineMigration to create a script that:

  1. Creates new tag documents from your unique strings
  2. Updates your content documents to reference those tags instead

Step-by-Step Process

1. Create your new tag schema first

// schemas/tag.ts
export default {
  name: 'tag',
  type: 'document',
  fields: [
    {
      name: 'title',
      type: 'string'
    }
  ]
}

2. Generate a migration

npx sanity@latest migration create convertTagsToReferences

3. Write the migration script

import {at, createIfNotExists, defineMigration, patch, set} from 'sanity/migrate'

// Short random key for array items; every item in a Sanity array needs a unique _key
const randomKey = () => Math.random().toString(36).slice(2, 11)

export default defineMigration({
  title: 'Convert string tags to references',
  documentTypes: ['yourDocumentType'], // whatever type has the tags

  async *migrate(documents, context) {
    // First pass: collect all unique tag strings
    const uniqueTags = new Set<string>()

    for await (const doc of documents()) {
      if (Array.isArray(doc.tags)) {
        for (const tag of doc.tags) {
          // Skip anything that is not a string (for example, already-converted references)
          if (typeof tag === 'string') uniqueTags.add(tag)
        }
      }
    }

    // Create tag documents and build a mapping from tag string to document ID
    const tagMap = new Map<string, string>()
    for (const tagString of uniqueTags) {
      // Assumes tags contain only letters, digits, and whitespace;
      // adjust the slug rule if your tags contain other characters
      const tagId = `tag-${tagString.toLowerCase().replace(/\s+/g, '-')}`
      tagMap.set(tagString, tagId)

      // createIfNotExists leaves an existing tag document untouched on re-runs
      yield createIfNotExists({_type: 'tag', _id: tagId, title: tagString})
    }

    // Second pass: replace each string array with an array of references
    for await (const doc of documents()) {
      if (Array.isArray(doc.tags) && doc.tags.every((tag) => typeof tag === 'string')) {
        const tagRefs = doc.tags.map((tagString) => ({
          _type: 'reference',
          _ref: tagMap.get(tagString),
          _key: randomKey()
        }))

        yield patch(doc._id, [at('tags', set(tagRefs))])
      }
    }
  }
})

4. Test it first (dry run)

sanity migration run convertTagsToReferences

Migrations run as a dry run by default, so this shows you what would happen without making changes.

5. Run it for real

sanity migration run convertTagsToReferences --no-dry-run --dataset <your-dataset>

Update Your Schema

Don't forget to update your original document schema to use references:

{
  name: 'tags',
  type: 'array',
  of: [{type: 'reference', to: [{type: 'tag'}]}]
}

Pro Tips

  • The migration tool batches mutations automatically, which reduces the risk of hitting rate limits
  • Always test with a dry run first
  • Back up your dataset before running migrations on production
  • The migration can be made idempotent (safe to run more than once) if it skips documents whose tags are already references
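
As a rough illustration of the idempotency tip, here is a minimal sketch of a guard that only lets through fields still in the old string-array format. The helper name hasUnconvertedTags is hypothetical and not part of Sanity's API; it assumes converted fields hold reference objects rather than strings.

```typescript
// Hypothetical guard: true only when the value is a non-empty array made up
// entirely of strings, i.e. the field still holds the old format.
function hasUnconvertedTags(tags: unknown): tags is string[] {
  return (
    Array.isArray(tags) &&
    tags.length > 0 &&
    tags.every((tag) => typeof tag === 'string')
  )
}

// Old format: should be migrated
console.log(hasUnconvertedTags(['news', 'dev']))

// Already converted (reference objects) or missing: should be skipped
console.log(hasUnconvertedTags([{_type: 'reference', _ref: 'tag-news', _key: 'a1'}]))
console.log(hasUnconvertedTags(undefined))
```

Calling a check like this before emitting a patch means a second run finds nothing left to change.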

This approach saves you from hours of manual work and ensures consistency across all your documents. The migration toolkit is specifically designed for these kinds of schema refactors!

You could write a migration script. It's not entirely trivial, but it should be doable with a bit of fiddling.
You might want to back up your dataset before running your migration. 🙂
I'll give that a read through, thank you! 😄
user F, where can I find more information on the options available in the patch object (following the migrating-data link)? In some cases, my tags array has a couple of strings, so I suspect I'd need to map through them and create/link a reference for each?
Right, that’s correct.
So I would do it like this: first query tag documents so you can map a string to a reference ID. Then do the migration. For each tag as a string, look up the reference ID in the map, and create an object like this:
{ _type: 'reference', _ref: 'id-of-the-document', _key: nanoid() }
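
A minimal sketch of that lookup step, assuming the map from tag string to document ID has already been built. The names toReferences and makeKey are hypothetical, and makeKey is only a stand-in for nanoid() so the example has no dependencies.

```typescript
// Shape of a reference item inside an array field
interface ReferenceItem {
  _type: 'reference'
  _ref: string
  _key: string
}

// Hypothetical stand-in for nanoid(): the index keeps keys unique within one array
const makeKey = (index: number): string =>
  `k${index}-${Math.random().toString(36).slice(2, 8)}`

// Convert tag strings to reference objects using a string -> document ID lookup.
// Tags with no matching document are collected separately instead of producing
// a reference whose _ref is undefined.
function toReferences(
  tags: string[],
  idByTag: Map<string, string>,
): {refs: ReferenceItem[]; missing: string[]} {
  const refs: ReferenceItem[] = []
  const missing: string[] = []
  tags.forEach((tag, index) => {
    const id = idByTag.get(tag)
    if (id) refs.push({_type: 'reference', _ref: id, _key: makeKey(index)})
    else missing.push(tag)
  })
  return {refs, missing}
}

const idByTag = new Map([['news', 'category-news']])
const result = toReferences(['news', 'typo'], idByTag)
console.log(result.refs, result.missing)
```

Reporting the unmatched strings, rather than silently writing broken references, makes it easier to spot tags that still need a document created for them.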
okay, would it be sensible to make each of my categories manually in advance of trying to migrate in that case? At the minute, my "categories" is just an empty, freshly made schema.
Ah yes, absolutely.
I would definitely recommend that to make the migration simpler. 🙂
So just to get a clearer idea in my head here then:
• find all documents to update
• find all categories and their IDs
• loop through each document and set the category using the matching tag string (can I do multiple of these in one patch, or would I need a patch per change?)
Right now, is your category field an array of strings or a single string?
the existing one uses the following schema:
{
  title: "Tags",
  name: "tags",
  type: "tags",
  options: {
    closeMenuOnSelect: true,
    frozen: false,
  },
},

I'd be changing it to the following:
{
  title: "Categories",
  name: "Categories",
  type: "array",
  of: [
    {
      type: "reference",
      to: [{ type: "category" }],
    },
  ],
},
Ah yes, you’re using this plugin: https://github.com/pcbowers/sanity-plugin-tags
Essentially the same thing from a schema standpoint.
Right, so you create all your categories. Then in your migration script, you start by querying your categories to create a hash mapping their string to their Sanity ID. Then during your migration, for each document you patch, you map its array of categories to transform it into an array of reference objects.
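
The hash described here could be sketched roughly as follows. This assumes the category documents were fetched with a GROQ query along the lines of *[_type == "category"]{_id, title}; the title field name and the buildCategoryMap helper are assumptions made for illustration.

```typescript
// Minimal shape of a category document as returned by the assumed query
interface CategoryDoc {
  _id: string
  title: string
}

// Build a lookup from normalized title to document ID
function buildCategoryMap(categories: CategoryDoc[]): Map<string, string> {
  const map = new Map<string, string>()
  for (const category of categories) {
    const key = category.title.trim().toLowerCase()
    // Keep the first document for a given title; later duplicates are ignored
    if (!map.has(key)) map.set(key, category._id)
  }
  return map
}

const lookup = buildCategoryMap([
  {_id: 'category-news', title: 'News'},
  {_id: 'category-dev', title: ' Dev '},
])
console.log(lookup.get('news'), lookup.get('dev'))
```

If the titles are normalized like this, the old tag strings need the same trim and lowercase treatment before they are looked up.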
and that map can be done within a single patch instance per document?
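import {nanoid} from 'nanoid'

// CATEGORIES_MAP is the lookup from category string to category document ID,
// built beforehand by querying the category documents
const buildPatches = docs =>
  docs.map(doc => ({
    id: doc._id,
    patch: {
      set: {
        Categories: doc.Categories.map(category => ({
          _type: 'reference',
          _ref: CATEGORIES_MAP[category],
          _key: nanoid()
        }))
      },
      // Only apply the patch if the document has not changed since it was fetched
      ifRevisionID: doc._rev,
    },
  }))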
I’d say something like this.
okay, let me get some categories set up, and I'll give that a try. Thank you!
Really, back up your dataset before.
Is that sanity dataset export?
I believe so.
That worked perfectly, thanks for all your help!
Yay! Amazing. 😄
