
Migrating text to block content in Sanity.io using a script.

9 replies
Last updated: Sep 2, 2021
Hey, I’ve got a bunch of documents (Pages) with arrays of blocks, which all have a common object (Content). The Content object has a `text` object which needs to be migrated to a `block` object, as one or more plain text paragraphs.
There are few enough that I could do it by hand, but I thought I should learn how. I’ve read this article: https://www.sanity.io/docs/migrating-data, but I’m a bit stuck on how to work with blocks. It seems like it’d be really onerous. Does anyone have any pointers?
Sep 1, 2021, 8:40 PM
Hey (Removed Name)! Correct me if I'm wrong, but the text already exists inside those documents in your dataset, right? If so, you'll want to use mutations to create the block content from said text. I'll let you know here if I can find a specific example.
Sep 1, 2021, 8:51 PM
Yes, that’s exactly right. So each page is built out of a big array with items like:

{
  "_key": "d14fa57f4452",
  "_type": "contentWithList",
  "backgroundColour": {
    "title": "Dark Grey",
    "value": "#333f4c"
  },
  "content": {
    "_type": "titleTextCta",
    "content": "Do a bunch of stuff!/nAnd do a bunch more",
    "link": {
      "_type": "linkChoices",
      "link": "mailto:client@example.com",
      "linkStyle": "link",
      "linkTitle": "Make an enquiry"
    },
    "title": "We deliver outcomes",
    "titleType": "H2"
  },
  "listColour": "green",
  "listItems": [
    "content.",
    "some more content.",
    "More random stuff."
  ]
}
They are of lots of different types, but (almost) all have a content object like the above item. So in this instance, I’d want to split the content on `/n` and create a new block per bit of content. But in lots of instances it’s just one block.
Sep 1, 2021, 9:06 PM
By the way, this is super unimportant, I’m sure you’ve got lots of more important things to get to. I have few enough records that I’m going to do this by hand. I was just curious.
Sep 1, 2021, 9:07 PM
Is the text you want to migrate always `content.content`?
Sep 2, 2021, 4:48 AM
Going from text (a relatively simple schema type, as it’s just a string value) to portable text (a potentially complex array of objects) takes a bit of reworking to get what you’re after, as the objects need keys, there are marks involved, etc. Luckily, the text schema type is distinguished by only one thing—the new line—making it relatively easy to parse.
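To make the target shape concrete, here is a minimal sketch of the conversion on its own, without any Sanity client. It assumes paragraphs are separated by a real newline character. `textToBlocks` and `randomKey` are hypothetical names, and `randomKey` is only a dependency-free stand-in for the nanoid setup used in the script further down. Unlike that script, this sketch also trims lines and drops empty ones.

```javascript
// Stand-in key generator (assumption): 12 hex characters.
// Keys only need to be unique within the array they live in.
const randomKey = () =>
  Array.from({ length: 12 }, () => Math.floor(Math.random() * 16).toString(16)).join('')

// Hypothetical helper: turn a plain string into an array of Portable Text
// blocks, one block per non-empty line.
const textToBlocks = (text, makeKey = randomKey) =>
  text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => ({
      _key: makeKey(),
      _type: 'block',
      style: 'normal',
      markDefs: [],
      // A plain paragraph is a single span with no marks
      children: [{ _key: makeKey(), _type: 'span', marks: [], text: line }],
    }))

const blocks = textToBlocks('Do a bunch of stuff!\nAnd do a bunch more')
console.log(JSON.stringify(blocks, null, 2))
```

Passing the key generator as a parameter makes it easy to swap in nanoid, or a fixed generator when testing.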
If you haven’t already done these changes by hand, maybe you can give this a try. I’d recommend trying on a non-production dataset first as it modifies live data. I’m assuming that you have documents with an object named `content`, and in that object is a field named `content` of type `text`.
1. You’ll need to install nanoid in your studio folder: `yarn add nanoid` or `npm install nanoid`, depending on your package manager of choice.
2. You’ll want to change your schema type from `text` to block content. At this point you’ll be getting an “Invalid property value” error in the studio (if you happen to check it), but that’s okay; just don’t click `Reset value`.
3. Save the script following this list in your studio folder (put it wherever you’d like, just be sure to modify the path when you run it). I put it in a `scripts` folder. If my assumptions were right about your naming conventions, you should only have to change the `TYPE` variable near the start, but if you want to consider all documents you can always change the filter in `fetchDocuments()`. You mentioned earlier that you want to break on a single new line, so that’s how I wrote this up. Often convention calls for a new paragraph after two new lines; if that’s the case, change `const paragraphs = doc.content.split('\n')` to `const paragraphs = doc.content.split('\n\n')`.
4. Run the script with `sanity exec scripts/textToBlock.js --with-user-token`

// scripts/textToBlock.js

/* eslint-disable no-console */
import { customAlphabet } from 'nanoid'
import sanityClient from 'part:@sanity/base/client'
const client = sanityClient.withConfig({ apiVersion: '2021-09-01' })

const nanoid = customAlphabet('0123456789abcdef', 12)

const TYPE = 'contentWithList' // document _type to consider

const fetchDocuments = () => client.fetch(`*[_type == "${TYPE}"][0..50] {_id, _rev, 'content': content.content}`)

const buildPatches = docs =>
  docs.map(doc => {
    const paragraphs = doc.content.split('\n')
    const output = paragraphs.map((paragraph) => ({
      _key: nanoid(),
      _type: 'block',
      markDefs: [],
      style: 'normal',
      children: [
        {
          _key: nanoid(),
          _type: 'span',
          marks: [],
          'text': paragraph,
        }
      ]
    }))

    return {
      id: doc._id,
      patch: {
        set: {
          "content.content": output,
        },
        ifRevisionID: doc._rev,
      }
    }
  })

const createTransaction = patches =>
  patches.reduce((tx, patch) => tx.patch(patch.id, patch.patch), client.transaction())

const commitTransaction = tx => tx.commit()

const migrateNextBatch = async () => {
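  const documents = await fetchDocuments()
  // The query returns the same documents on every pass, so skip any whose
  // content.content is no longer a string (already migrated or missing);
  // otherwise the second pass would call .split() on an array and throw.
  const patches = buildPatches(documents.filter(doc => typeof doc.content === 'string'))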
  if (patches.length === 0) {
    console.log('No more documents to migrate!')
    return null
  }
  console.log(
    `Migrating batch:\n %s`,
    patches.map(patch => `${patch.id} => ${JSON.stringify(patch.patch)}`).join('\n')
  )
  const transaction = createTransaction(patches)
  await commitTransaction(transaction)
  return migrateNextBatch()
}

migrateNextBatch().catch(err => {
  console.error(err)
  process.exit(1)
})
Hopefully this works (on a non-production dataset 😉).
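One caveat: the script assumes `content.content` sits at the top level of each document. In the sample posted earlier in the thread, the object is an item inside an array on a page. A rough sketch of how the patch could target each array item by its `_key` follows. The array field name `sections` is an assumption (the thread never names it), and `buildNestedPatch` is a hypothetical helper. The path format `field[_key=="..."].content.content` follows Sanity's patch path syntax.

```javascript
// ASSUMPTION: the page's array field is called "sections"; adjust to your schema.
const ARRAY_FIELD = 'sections'

// Build one patch per page, with one `set` entry per array item whose
// content.content is still a plain string. `toBlocks` converts a string
// into an array of blocks.
const buildNestedPatch = (page, toBlocks) => {
  const set = {}
  for (const item of page[ARRAY_FIELD] || []) {
    const text = item.content && item.content.content
    if (typeof text !== 'string') continue // missing or already migrated
    set[`${ARRAY_FIELD}[_key=="${item._key}"].content.content`] = toBlocks(text)
  }
  if (Object.keys(set).length === 0) return null // nothing to change on this page
  return { id: page._id, patch: { set, ifRevisionID: page._rev } }
}

// Example with a trivial converter, just to show the generated paths
const page = {
  _id: 'page-1',
  _rev: 'rev-1',
  sections: [
    {
      _key: 'd14fa57f4452',
      _type: 'contentWithList',
      content: { _type: 'titleTextCta', content: 'One\nTwo' },
    },
    { _key: 'aaaaaaaaaaaa', _type: 'other' },
  ],
}
const result = buildNestedPatch(page, text => text.split('\n'))
console.log(Object.keys(result.patch.set))
```

The returned object has the same `{ id, patch }` shape the script passes to `tx.patch()`, so it could slot into `buildPatches` if the query also returned the array field.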
Sep 2, 2021, 5:58 AM
Oh fab! Thank you (Removed Name)! So we’re okay to manually create the shape of the block content and just pass in our own IDs? That’s a lot simpler than I was expecting.
It’d be great to have a page in the docs talking about that, or including this kind of snippet. I’ve left a bit of feedback there along those lines and linking back to here.
Sep 2, 2021, 10:06 AM
Yes, you’ve nailed it. Part of the beauty of Portable Text is that it makes your content so malleable. There are a few requirements for your data to be well-formed, but nothing preventing you from building block content from a bunch of strings, as we’ve done here. I used nanoid to make the keys and followed the convention (I think) of 12-character hexadecimal, but I think you have quite a bit of freedom as long as they’re unique within the array.
I saw what I now know is your feedback. 🙂 Thank you for that. I agree that the more examples of this kind of thing, the better, and will work on a guide or update to the docs. Thanks (Removed Name)!
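Since the keys only need to be unique within their array, a quick check before committing hand-built blocks can catch mistakes. This is a minimal sketch, not an official validator: `keyProblems` and `checkBlockKeys` are hypothetical helpers, and they only look at `_key` presence and uniqueness.

```javascript
// List items in `items` whose _key is missing or repeats an earlier one.
const keyProblems = (items, label) => {
  const seen = new Set()
  const problems = []
  items.forEach((item, index) => {
    if (typeof item._key !== 'string' || item._key === '') {
      problems.push(`${label}[${index}] has no _key`)
    } else if (seen.has(item._key)) {
      problems.push(`${label}[${index}] repeats _key "${item._key}"`)
    } else {
      seen.add(item._key)
    }
  })
  return problems
}

// Check the top-level array first, then each block's own children array.
const checkBlockKeys = blocks => [
  ...keyProblems(blocks, 'blocks'),
  ...blocks.flatMap((block, index) =>
    Array.isArray(block.children)
      ? keyProblems(block.children, `blocks[${index}].children`)
      : []
  ),
]

const sample = [
  { _key: 'a1', _type: 'block', children: [{ _key: 's1', _type: 'span', text: 'One' }] },
  { _key: 'a1', _type: 'block', children: [{ _key: 's1', _type: 'span', text: 'Two' }] },
]
console.log(checkBlockKeys(sample))
```

In the sample, the two blocks share a `_key` and get flagged, while the spans do not, because each span key is only compared against others in the same `children` array.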
Sep 2, 2021, 8:39 PM
