# Course: Refactoring content for migration
https://www.sanity.io/learn/course/refactoring-content

No matter where your content is, Sanity provides options to ensure its smooth transfer to the Content Lake. These are general strategies for developers migrating content.

---

## Navigation

**Track:** [Replatforming from a legacy CMS to a Content Operation System](https://www.sanity.io/learn/track/replatforming-to-sanity) · [View as markdown](https://www.sanity.io/learn/track/replatforming-to-sanity.md)

## Contents

1. [Introduction to content migrations](https://www.sanity.io/learn/course/refactoring-content/introduction-to-content-migrations) · [markdown](https://www.sanity.io/learn/course/refactoring-content/introduction-to-content-migrations.md)
2. [General migration principles](https://www.sanity.io/learn/course/refactoring-content/general-migration-principles) · [markdown](https://www.sanity.io/learn/course/refactoring-content/general-migration-principles.md)
3. [Content normalization](https://www.sanity.io/learn/course/refactoring-content/content-normalization) · [markdown](https://www.sanity.io/learn/course/refactoring-content/content-normalization.md)
4. [Deterministic and consistent IDs](https://www.sanity.io/learn/course/refactoring-content/deterministic-and-consistent-ids) · [markdown](https://www.sanity.io/learn/course/refactoring-content/deterministic-and-consistent-ids.md)
5. [Setting created and modified dates](https://www.sanity.io/learn/course/refactoring-content/setting-created-and-modified-dates) · [markdown](https://www.sanity.io/learn/course/refactoring-content/setting-created-and-modified-dates.md)
6. [Validating incoming content](https://www.sanity.io/learn/course/refactoring-content/validating-incoming-content) · [markdown](https://www.sanity.io/learn/course/refactoring-content/validating-incoming-content.md)
7. [Scripting content migrations](https://www.sanity.io/learn/course/refactoring-content/scripting-content-migrations) · [markdown](https://www.sanity.io/learn/course/refactoring-content/scripting-content-migrations.md)
8. [Uploading assets efficiently](https://www.sanity.io/learn/course/refactoring-content/uploading-assets-efficiently) · [markdown](https://www.sanity.io/learn/course/refactoring-content/uploading-assets-efficiently.md)
9. [Migrating to block content](https://www.sanity.io/learn/course/refactoring-content/migrating-to-block-content) · [markdown](https://www.sanity.io/learn/course/refactoring-content/migrating-to-block-content.md)
10. [Reducing SEO impact](https://www.sanity.io/learn/course/refactoring-content/reducing-seo-impact) · [markdown](https://www.sanity.io/learn/course/refactoring-content/reducing-seo-impact.md)
11. [Conclusion](https://www.sanity.io/learn/course/refactoring-content/conclusion) · [markdown](https://www.sanity.io/learn/course/refactoring-content/conclusion.md)

---

## Lesson 1: Introduction to content migrations
https://www.sanity.io/learn/course/refactoring-content/introduction-to-content-migrations

Gain the technical know-how to successfully migrate content to Sanity, adapting to unique project needs and confidently handling transactions and mutations.

This course is for developers migrating content from another platform into Sanity. It unpacks the different technical aspects and steps of the migration process and gives concrete examples of how to approach them.


Every re-platforming project looks different. You will need to adapt the examples to your specific use case, type of content, and any other constraints of your existing stack. That said, this course draws on experience from many content migration projects, gained both firsthand and through our work onboarding customers.


The following lessons refer to transactions and mutations: making changes to your content stored in a dataset in Content Lake with the API. The documentation for transactions and mutations is an excellent companion to this course:


> [!TIP]
> See [Transactions](https://www.sanity.io/learn/content-lake/transactions) in the documentation


> [!TIP]
> See [Patches](https://www.sanity.io/learn/content-lake/http-patches) via the HTTP API


Note that this course is mostly theoretical, and the tasks will prompt you/your team to think and plan for an upcoming migration project. It's up to you to capture these thoughts in a document and/or as issues in a project management system. This course can also be run as a workshop.


---

## Lesson 2: General migration principles
https://www.sanity.io/learn/course/refactoring-content/general-migration-principles

A developer guide to content migration covering idempotent scripts with incremental complexity and considered error handling.

Let’s start with some high-level principles for engineering successful content migrations:


- Idempotent migration scripts

- Incremental complexity

- Graceful error handling

- Create/update your schema concurrently


## Write idempotent migration scripts


When you write a program or script that can be re-run multiple times with the same result, it is called "idempotent." In our experience, content migration scripts usually have to be run multiple times throughout a re-platforming project. 


You want idempotency so that your script does not create new copies of the same records, pages, or documents each time it runs. A key element of achieving idempotency is to have deterministic (that is, stable) IDs for your content records. This makes it possible to skip, or intentionally overwrite, existing documents when you re-run your script(s).


> [!TIP]
> Jump to [Scripting content migrations](https://www.sanity.io/learn/course/refactoring-content/scripting-content-migrations) for details on available methods to write content to Sanity at scale.


- [ ] Figure out what you can use as a source for generating stable record IDs in your legacy system


## Increase complexity incrementally


Most re-platforming projects and content migrations have a lot of "unknown unknowns." So expect a degree of trial and error and learning by doing. 


Another benefit to deterministic IDs and idempotent scripts is that you are free from the pressure of getting everything right in one script execution. Your first migration might only stage the documents' `_id` and `_type` fields. The next can add the `slug`. The next can add the `title`. And so on. 


- [ ] Figure out what content in your project would be the simplest to start with


Incrementally building out documents, coupled with real-time feedback from the Sanity Studio updating with incoming changes, makes for a satisfying—perhaps even addictive—feedback loop.


## Handle failures gracefully


Your script will fail! But that's to be expected. You probably aren't dealing with a perfect, consistent, and predictable corpus of structured content (yet 😉). So let’s prepare for things to fail from the outset and handle those errors gracefully to save you time, effort, and frustration.


Your legacy content source is likely unreliable, and content rot will set in over time. Your existing API might return records for an image’s metadata, but the image binary is missing from the filesystem. Your source likely does not have referential integrity between documents (if it has ways to express such relationships at all!), so there may be broken or missing cross-references to taxonomies or authors.


Every retrieval for a record or an asset must handle a response where the content is corrupted, misshaped, or references missing content. Your migration script must handle these cases by ignoring or recreating that content.


Consider the control flow of your migration script. Simply logging errors to the console might not help. You may wish to `throw` on errors so that scripts will not proceed with importing invalid data.
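
The control-flow choice described above can be sketched as follows. This is a minimal example under assumptions: the `LegacyRecord` shape and the `transformRecord` and `transformAll` helpers are hypothetical and not part of any Sanity package. It collects every failure and throws once at the end, so that no document is written while the source still contains invalid records.

```typescript
// Hypothetical shape of a record from a legacy system
type LegacyRecord = {id?: number; title?: string}
type SanityDoc = {_id: string; _type: string; title: string}

// Throws with a descriptive message instead of returning a half-built document
function transformRecord(record: LegacyRecord): SanityDoc {
  if (typeof record.id !== 'number') {
    throw new Error('Record is missing a numeric id')
  }
  if (typeof record.title !== 'string' || record.title.trim() === '') {
    throw new Error(`Record ${record.id} is missing a title`)
  }
  return {_id: `post-${record.id}`, _type: 'post', title: record.title.trim()}
}

// Transforms every record, gathers all failures, and throws before anything is written
function transformAll(records: LegacyRecord[]): SanityDoc[] {
  const docs: SanityDoc[] = []
  const failures: string[] = []

  for (const record of records) {
    try {
      docs.push(transformRecord(record))
    } catch (error) {
      failures.push(error instanceof Error ? error.message : String(error))
    }
  }

  if (failures.length > 0) {
    throw new Error(`${failures.length} record(s) failed:\n${failures.join('\n')}`)
  }
  return docs
}
```

Reporting all failures in one pass, rather than stopping at the first, tends to shorten the fix-and-rerun loop. Whether to abort or to skip bad records is a project decision.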


- [ ] As you know your legacy content system, what failures can you already anticipate during migration?


## Create/update your schema concurrently


You might already have completed the [Hello, Structured Content](https://www.sanity.io/learn/course/hello-structured-content) course and have an idea for the content model. Maybe you have even started to configure it. What's good to know is that the Sanity Studio schema is decoupled from Content Lake. This has the following implications for your migration projects:


- Since the Content Lake is “schemaless,” you can create new documents with any structure.

- You can configure the Sanity Studio schema after the fact to match the structure of your documents in the Content Lake.

- You *don't* get schema validation in the Content Lake, so you must handle validation in your migration scripts and with the CLI tooling for bulk-validating documents against the Studio schema.


It’s best to enter a migration project with a reasonably planned-out content model based on the ideas of structured content, but leave room for changes based on what you learn when you’re getting hands-on with the migration process.


- [ ] Plan and configure your foundational content model in your Sanity Studio


---

## Lesson 3: Content normalization
https://www.sanity.io/learn/course/refactoring-content/content-normalization

Migrating is an opportunity not only to move your content to Sanity, but your content strategy to structured content.

As mentioned in the [Re-platforming to Sanity](https://www.sanity.io/learn/course/re-platforming-to-sanity) course, migrating your content to Sanity is a golden opportunity to mature your content model and bring structure to increase the reusability of your legacy content. In data and database parlance, this is akin to *data normalization*. In fact, it is precisely that.


There are some typical examples where some content normalization can be rewarding:


- **Translating "page templates" into a structured content model**, sometimes splitting content for a template into separate dedicated types.

- **Stripping HTML out of string values** or converting it to Sanity’s presentation-agnostic Portable Text format.

- **Keeping only the best resolution of duplicated images** (because some CMSes require you to upload different resolutions of the same image)

- [ ] Identify opportunities for content normalization in the content you are to migrate.


## Example: Page templates into content types


Your existing website-centric CMS has likely stored what would be considered structured content as web pages, resulting in content that looks like this:


```json
{
  "type": "page",
  "id": 4014,
  "template": "staff-profile-page",
  "title": "Emkay Petersen"
}
```

This content is not a web `page`. It's a `person`! Storing this as structured content doesn't prevent it from being queried into a web page. This same document would be better remodeled into a Sanity document like this:


```json
{
  "_type": "person",
  "_id": "person-4014",
  "name": "Emkay Petersen"
}
```

Note in this example that the change from `title` to `name` is subtle but meaningful!
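
The remodeling above can be sketched as a small mapping step. This is a minimal example under assumptions: the `LegacyPage` type, the `TEMPLATE_TO_TYPE` table, and the `pageToDocument` function are hypothetical names introduced for illustration, and your own templates will need their own field mappings.

```typescript
// Hypothetical shape of a legacy "page" record
type LegacyPage = {type: string; id: number; template: string; title: string}

// Hypothetical mapping from legacy page templates to Sanity document types
const TEMPLATE_TO_TYPE: Record<string, string> = {
  'staff-profile-page': 'person',
}

function pageToDocument(page: LegacyPage) {
  const _type = TEMPLATE_TO_TYPE[page.template]
  if (!_type) {
    // Surface unmapped templates instead of silently importing them as pages
    throw new Error(`No content type mapped for template "${page.template}"`)
  }

  const base = {_type, _id: `${_type}-${page.id}`}
  // A person has a name; other types keep a title
  return _type === 'person' ? {...base, name: page.title} : {...base, title: page.title}
}
```

Throwing on unknown templates is one option. It makes the list of templates you still have to model visible early.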


You can learn more about structured content modeling in the [Hello, Structured Content](https://www.sanity.io/learn/course/hello-structured-content) course.


## Example: Getting rid of HTML


Don’t get us wrong: We love HTML! But it works best as a rendering language in a browser and not as the storage format for your content. The same goes for HTML-like formats like Markdown, MDX, etc.


Depending on the content, you might want to get rid of HTML altogether. Your old web-centric CMS has typically allowed rich text editing in fields that you would rather keep as plain text, so that you control the rendering wherever this content is displayed:


```json
{
  "type": "post",
  "id": 4014,
  "title": "<em>Disarm <span style=\"font-color: red\">you</span> with a <strong>smile</strong>.</em>"
}
```

By including an HTML-stripping step in your migration script, you can get this clean content:


```json
{
  "_type": "post",
  "_id": "post-4014",
  "title": "Disarm you with a smile."
}
```
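
A stripping step like the one described above could look like the sketch below. It is deliberately naive: it uses regular expressions and a small, hypothetical `ENTITIES` table rather than a real HTML parser, so treat it as a starting point for simple inline markup only. Malformed or deeply nested HTML is better handled by a dedicated parser.

```typescript
// Entities this sketch knows how to decode; extend for your own content
const ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' ',
}

function stripHtml(html: string): string {
  return html
    .replace(/<[^>]*>/g, '') // drop anything that looks like a tag
    .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (entity) => ENTITIES[entity] ?? entity)
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim()
}

const title = stripHtml(
  '<em>Disarm <span style="font-color: red">you</span> with a <strong>smile</strong>.</em>',
)
// title is "Disarm you with a smile."
```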

For cases where you want to keep the information about the semantic rich-text formatting, embeds, and such, go to the [Migrating to block content](https://www.sanity.io/learn/course/refactoring-content/migrating-to-block-content) lesson.


- [ ] Think about how you want to deal with HTML content in your migration script(s)


---

## Lesson 4: Deterministic and consistent IDs
https://www.sanity.io/learn/course/refactoring-content/deterministic-and-consistent-ids

Reusing existing values from your content source helps prevent duplicate data and lets you optimistically set strong references.

In the Content Lake, a document ID (stored in the `_id` attribute) can be any string value that is unique within the dataset. It is good to know that Sanity Studio automatically generates IDs using the [Universally Unique ID (UUID) specification](https://en.wikipedia.org/wiki/Universally_unique_identifier), while Content Lake uses [the NATS Unique ID](https://github.com/nats-io/nuid) algorithm when it creates document IDs itself.


However, when *migrating* content into the Content Lake, it is preferable to reuse a unique value from the source content. This helps with your script’s idempotency and allows you to construct references during migration without the need to query the dataset in advance.


Imagine your existing data source has documents like this:


```json
[
  {
    "type": "post",
    "id": 234,
    "authors": [123]
  },
  {
    "type": "user",
    "id": 123
  }
]
```

You could convert this into Sanity documents with deterministic IDs and optimistic references like this:


```json
[
  {
    "_type": "post",
    "_id": "post-234",
    "authors": [{"_ref": "author-123", "_type": "reference"}]
  },
  {
    "_type": "author",
    "_id": "author-123"
  }
]
```

To successfully write a new document that contains a strong reference, the referenced document must already exist in the dataset or be created within the same transaction.


> [!TIP]
> [Learn more about how references work in the Content Lake](https://www.sanity.io/learn/docs/studio/connected-content)


An added benefit to predetermining these IDs is that if the incoming data were separated—users and posts—you could create all of the “author” documents in one pass. Then, all “posts” in the next, and the references should be written successfully.
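
The two-pass idea above can be sketched like this. It is a minimal example under assumptions: the `LegacyRecord` type and the `authorId` and `toSanityDocuments` helpers are hypothetical. The sketch also adds a `_key` to each reference, since Sanity Studio expects object items in arrays to carry a unique `_key`.

```typescript
// Hypothetical shapes of the incoming legacy records
type LegacyRecord =
  | {type: 'post'; id: number; authors: number[]}
  | {type: 'user'; id: number}

// One function owns the ID pattern, so documents and references cannot drift apart
const authorId = (userId: number) => `author-${userId}`

function toSanityDocuments(records: LegacyRecord[]) {
  const authors = records
    .filter((r): r is Extract<LegacyRecord, {type: 'user'}> => r.type === 'user')
    .map((user) => ({_type: 'author', _id: authorId(user.id)}))

  const posts = records
    .filter((r): r is Extract<LegacyRecord, {type: 'post'}> => r.type === 'post')
    .map((post) => ({
      _type: 'post',
      _id: `post-${post.id}`,
      authors: post.authors.map((id) => ({
        _type: 'reference',
        _ref: authorId(id), // built without querying the dataset
        _key: authorId(id), // unique within this array
      })),
    }))

  // Write `authors` first, then `posts`, so strong references resolve
  return {authors, posts}
}
```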


As mentioned, these IDs can be any string value. Still, a pattern we have often seen, which is easy to reason about, is to indicate the document content type and then use whatever unique identifier you can get or generate from the document. Here are some patterns and examples that you can use as inspiration:


- `contentType-recordID` 👉 `post-234`

- `uniqueSlug` 👉 `hello-world`

- `contentType-slug-publishDate` 👉 `post-hello-world-2010-10-11`

> [!TIP]
> [Learn more about how IDs in Content Lake work ](https://www.sanity.io/learn/docs/content-lake/ids)


You might want your document IDs to follow the UUID pattern but be deterministically generated from a string. Different packages on npm can do this for you:


```typescript
import getUuid from 'uuid-by-string';

const uuidId = getUuid(record.uniqueSlug);
// d3486ae9-136e-5856-bc42-212385ea7970
```

The minor drawback with this approach is that you must pass the values through this function when creating references in other documents.


- [ ] Review the different content types in your legacy CMS and think about their ID scheme. Are there unique values you can use?


---

## Lesson 5: Setting created and modified dates
https://www.sanity.io/learn/course/refactoring-content/setting-created-and-modified-dates

While the Content Lake stores date time values for document operations, it may be better to write your own for editorial purposes.

Your existing content source likely contains dates and times for when documents were created and last modified. They may look something like this in a record:


```json
{
  "type": "post",
  "id": 234,
  "created": "2013-11-05T13:25:02Z",
  "modified": "2015-11-05T04:12:23Z"
}
```

The Content Lake also stores information about when documents were created and modified. 


- You can set the `_createdAt` attribute the first time you create a document, but it cannot be modified in subsequent mutations (without deleting and recreating the document). 

- Whenever a document changes, the Content Lake writes to the `_updatedAt` attribute, which reflects when the *mutation* was completed.


Both `_createdAt` and `_updatedAt` are considered “automatic” values and not "editorial" values; that is, these attributes can’t be overridden by a content creator using Sanity Studio.


For example, you may want to use a “last modified” date to display when a document was last significantly updated, but the `_updatedAt` field will automatically update every time a change is made.


So, if a “last modified” date is essential for editorial reasons, it is best to create a [`datetime`](https://www.sanity.io/docs/datetime-type) field in the document schema definition and populate it during migration.


```json
{
  "_type": "post",
  "_id": "post-234",
  "title": "Today is the greatest",
  "created": "2013-11-05T13:25:02Z",
  "modified": "2015-11-05T04:12:23Z"
}
```

With the above data sent in a `create` or `createOrReplace` mutation, the `_createdAt` and `_updatedAt` fields will be populated in the response. 
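
Legacy dates are not always clean ISO strings, so it can help to normalize them before writing the editorial fields. The sketch below assumes a hypothetical `toIsoDatetime` helper built on the JavaScript `Date` parser. Note that `Date` interprets date-time strings without a timezone offset as local time, so you may need to handle your source's timezone explicitly.

```typescript
// Parses a legacy date value and returns an ISO 8601 string, or throws if it is unusable
function toIsoDatetime(value: unknown, fieldName: string): string {
  if (typeof value !== 'string' && typeof value !== 'number') {
    throw new Error(`${fieldName}: expected a string or number, got ${typeof value}`)
  }
  const date = new Date(value)
  if (Number.isNaN(date.getTime())) {
    throw new Error(`${fieldName}: could not parse "${value}" as a date`)
  }
  return date.toISOString()
}

// Hypothetical legacy record, matching the example earlier in this lesson
const legacy = {id: 234, created: '2013-11-05T13:25:02Z', modified: '2015-11-05T04:12:23Z'}

const doc = {
  _type: 'post',
  _id: `post-${legacy.id}`,
  created: toIsoDatetime(legacy.created, 'created'),
  modified: toIsoDatetime(legacy.modified, 'modified'),
}
```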


You can import and create documents with any data independently of your Sanity Studio schema, but we include the corresponding schema definition for educational purposes. Note that the  `_createdAt` and `_updatedAt` attributes are **not** part of the schema definition:


```typescript:postType.ts
import { defineType, defineField } from 'sanity'

export const post = defineType({
  name: 'post',
  type: 'document',
  title: 'Post',
  fields: [
    defineField({
      name: 'title',
      type: 'string',
      title: 'Post title',
    }),
    defineField({
      name: 'created',
      type: 'datetime',
      title: 'Post creation date',
      description: 'The editorial creation date for this post',
    }),
    defineField({
      name: 'modified',
      type: 'datetime',
      title: 'Manual modified date',
      description: 'The editorial modified date for this post',
    }),
  ]
})
```

- [ ] Identify if you have editorial date/datetime fields that need to be editable by content teams in the studio.


---

## Lesson 6: Validating incoming content
https://www.sanity.io/learn/course/refactoring-content/validating-incoming-content

Never trust your existing content source. Validate all data during a migration to avoid future headaches.

The formatting of your existing content may be problematic or insufficient. Examples include:


- You may need to escape HTML entities or process Markdown formatting.

- You may want to trim strings to remove whitespace.

- Integers may need to be converted to strings – or vice-versa.
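
Fixes like the ones listed above are often small, pure functions that run before a document is built. The helpers below, `cleanString` and `toInteger`, are hypothetical names for a minimal sketch of two of them.

```typescript
// Collapses internal whitespace and trims both ends
function cleanString(value: string): string {
  return value.replace(/\s+/g, ' ').trim()
}

// Accepts an integer or a numeric string and returns an integer, or throws
function toInteger(value: unknown): number {
  // Empty strings are rejected explicitly, because Number('') would be 0
  const parsed = typeof value === 'string' && value.trim() !== '' ? Number(value) : value
  if (typeof parsed !== 'number' || !Number.isInteger(parsed)) {
    throw new Error(`Expected an integer, got ${JSON.stringify(value)}`)
  }
  return parsed
}
```

Keeping these helpers free of side effects makes them easy to unit test against samples of your real legacy data.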


## Use TypeScript


TypeScript lets you add data type definitions to JavaScript. It's beneficial for migration projects because you probably want to go from messy and idiosyncratic content (we assume!) to tidy and structured content.


### Sanity TypeGen


If you have started configuring your content model for Sanity Studio, you can use Sanity TypeGen to generate types for it and use those types for the output in your migration scripts.


> [!TIP]
> Check out the [Generating types](https://www.sanity.io/learn/course/day-one-with-sanity-studio/generating-types) lesson.


Example of using generated types in a migration script:


```typescript
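import { createOrReplace, defineMigration } from 'sanity/migration'
import type { Post } from './sanity.types' // From Sanity TypeGen
import { wpDataTypeFetch } from '../migrationUtils' // Helper from the WordPress course; adjust the path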

export default defineMigration({
  title: 'Import WP JSON data',

  async *migrate(documents) {
    const wpType = "posts";
    let page = 1;
    let hasMore = true;

    while (hasMore) {
      try {
        const wpData = await wpDataTypeFetch(wpType, page);

        if (Array.isArray(wpData) && wpData.length) {
          for (const wpDoc of wpData) {
            const doc: Post = {
              _id: `post-${wpDoc.id}`,
              _type: "post",
              // Add other required fields here based on wpDoc structure
            };

            yield createOrReplace(doc);
          }
          page++;
        } else {
          hasMore = false;
        }
      } catch (error) {
        console.error(`Error fetching data for page ${page}:`, error);
        hasMore = false; // Stop the loop in case of an error
      }
    }
  },
});
```

### Runtime validation


You can also consider a runtime validation library such as Zod to validate, catch, transform or throw errors for any unexpected problems with incoming data.


> [!TIP]
> [Zod is a popular library](https://zod.dev/) for validating content at run time.
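
To illustrate what runtime validation buys you, here is a dependency-free sketch of the kind of check a library like Zod would express more concisely. It is not Zod's API: the `LegacyPost` type and the `validateLegacyPost` and `parseLegacyPost` functions are hypothetical and hand-written for this example.

```typescript
// Hypothetical shape this migration expects from the legacy API
type LegacyPost = {id: number; title: string; authors: number[]}

// Returns a list of problems; an empty list means the value is a usable LegacyPost
function validateLegacyPost(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) {
    return ['record is not an object']
  }
  const record = value as Record<string, unknown>
  const problems: string[] = []

  if (typeof record.id !== 'number' || !Number.isInteger(record.id)) {
    problems.push('id must be an integer')
  }
  if (typeof record.title !== 'string' || record.title.trim() === '') {
    problems.push('title must be a non-empty string')
  }
  if (!Array.isArray(record.authors) || !record.authors.every((a) => typeof a === 'number')) {
    problems.push('authors must be an array of numbers')
  }
  return problems
}

// Narrows unknown input to LegacyPost, or throws with every problem listed
function parseLegacyPost(value: unknown): LegacyPost {
  const problems = validateLegacyPost(value)
  if (problems.length > 0) {
    throw new Error(`Invalid post: ${problems.join('; ')}`)
  }
  return value as LegacyPost
}
```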


## Validate post-migration data


Your Sanity Studio schema types should also include validation rules on all content. This way, as your content is migrated into the Content Lake, you can validate all new documents against these rules from the terminal by running:


```sh
npx sanity@latest documents validate
```

> [!TIP]
> See the documentation about [Validation](https://www.sanity.io/learn/studio/validation) on schema types and [validating all documents with the CLI](https://www.sanity.io/docs/documents#0ee72ece4609).


---

## Lesson 7: Scripting content migrations
https://www.sanity.io/learn/course/refactoring-content/scripting-content-migrations

Sanity's API-first design allows you to write content – even in huge volumes – however you prefer. The CLI Migration tooling offers several conveniences that make it a great fit.

There are different ways to run content migration scripts with the Sanity CLI:


- Recommended: Create and run migration script with `sanity migration`

- Executing custom scripts with `sanity exec --with-user-token`

- Generating an NDJSON file and importing it with `sanity dataset import`


## Using the migration tooling


The Sanity CLI contains helpful migration tooling. Its primary use case is schema and content migrations of documents already in a dataset, like changing a field name or turning a string into an array of strings. However, migration scripts can also retrieve content from an external data source and write new documents.


> [!TIP]
> Take the [Handling schema changes confidently](https://www.sanity.io/learn/course/handling-schema-changes-confidently) course for a more thorough introduction.


> [!TIP]
> See the [Migrations CLI command reference](https://www.sanity.io/learn/cli-reference/cli-migrations) in the documentation to learn what the tooling can do.


The benefits of using the migration tooling include:


- Less scaffolding and great abstractions for creating and changing documents in the Content Lake

- Automatically batching mutations into transactions to avoid hitting rate limits.

- Dry-run by default with visual feedback

- Validate your migrated documents against a Sanity Studio schema


You can **create** a new migration script by running the following from the command line:


```sh
npx sanity@latest migration create
```

The following is a highly simplified example of a migration script that retrieves content from an API and continues to paginate until no more results are returned.


> [!TIP]
> This example comes from the [Migrating content from WordPress to Sanity](https://www.sanity.io/learn/course/migrating-content-from-wordpress-to-sanity) course.


```typescript:/migrations/moving-from-wp/posts.ts
import type { SanityDocumentLike } from 'sanity'
import { createOrReplace, defineMigration } from 'sanity/migration'
import { wpDataTypeFetch } from '../migrationUtils'

export default defineMigration({
  title: 'Import WP JSON data',
  
  async *migrate(documents) {
    const wpType = "posts";
    let page = 1;
    let hasMore = true;

    while (hasMore) {
      try {
        const wpData = await wpDataTypeFetch(wpType, page);

        if (Array.isArray(wpData) && wpData.length) {
          for (const wpDoc of wpData) {
            const doc: SanityDocumentLike = {
              _id: `post-${wpDoc.id}`,
              _type: "post",
              // Add other required fields here based on wpDoc structure
            };

            yield createOrReplace(doc);
          }
          page++;
        } else {
          hasMore = false;
        }
      } catch (error) {
        console.error(`Error fetching data for page ${page}:`, error);
        hasMore = false; // Stop the loop in case of an error
      }
    }
  },
});
```

Running this script with the Sanity CLI migration tooling will build a series of mutations based on content returned from `wpDataTypeFetch` and, when necessary, automatically batch them into transactions.


This script is the basic building block of using migration scripts to write new documents from an external source – it’s up to you to query your data source, add validation, extra attributes, error handling, and more.


- [ ] Adapt this script to make a simple migration from your CMS 


### Adding asynchronous actions


By default, each individual document mutation is "staged" into the transaction sequentially—one at a time. This is due to the [generator/iterator](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*) pattern the migration tooling uses. So, if you add an asynchronous function call – such as a fetch for an image upload or external document – it can slow down the migration. 


This can be overcome by yielding an array of mutations instead of a single mutation. This is preferable when including image uploads in your migration script. Keep in mind that uploading an image creates [an asset metadata document](https://www.sanity.io/learn/docs/apis-and-sdks/image-metadata) in the Content Lake, so be careful to avoid rate limits by throttling the number of concurrent operations while creating your array of mutations.


An example of this is shown in the [Migrating content from WordPress to Sanity](https://www.sanity.io/learn/course/migrating-content-from-wordpress-to-sanity) course.
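
As a rough illustration of yielding arrays, the sketch below prepares a small batch of documents concurrently and yields them together. It is written as a standalone async generator so it can run on its own. The `createOrReplace` function here is a local stand-in (in a real migration script you would import it from `sanity/migration`), and `chunk`, `migrateInBatches`, and the batch size of 5 are assumptions made for this example.

```typescript
type SanityDoc = {_id: string; _type: string; [key: string]: unknown}

// Local stand-in for illustration only; use createOrReplace from 'sanity/migration' in a real script
type Mutation = {type: 'createOrReplace'; document: SanityDoc}
const createOrReplace = (document: SanityDoc): Mutation => ({type: 'createOrReplace', document})

// Splits a list into groups of at most `size` items
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Prepares each batch concurrently, then yields the whole batch as one array of mutations
async function* migrateInBatches<T>(
  records: T[],
  prepare: (record: T) => Promise<SanityDoc>, // e.g. uploads an image, then builds the document
  batchSize = 5,
): AsyncGenerator<Mutation[]> {
  for (const batch of chunk(records, batchSize)) {
    // At most `batchSize` asynchronous operations are in flight at once
    const docs = await Promise.all(batch.map(prepare))
    yield docs.map((doc) => createOrReplace(doc))
  }
}
```

Here the batch size doubles as a simple concurrency cap: a larger batch means more parallel work but also more simultaneous requests.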


## Alternatively, DIY a CLI script


Custom scripts that use your Sanity Studio’s CLI configuration – `sanity.cli.ts` – and your terminal’s authenticated session (`npx sanity@latest login`) can be run with the following:


```sh
npx sanity@latest exec ./path-to-your-script --with-user-token
```

This may be beneficial for more complex data structures or use cases where you want more low-level control. In that instance, you would also need to be careful to avoid rate limits by throttling the number of mutations concurrently sent to the Content Lake.


> [!TIP]
> [Look into technical limits to learn more](https://www.sanity.io/learn/docs/content-lake/technical-limits).


The script below is the same basic example as above but uses the CLI client to create a single transaction. 


Note that this single transaction could become too large and, when committed, be rejected by the Content Lake. 


Also, running it from the command line will *immediately* write this content to the dataset, with the only visual feedback being the included console logs.


```typescript
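import type {SanityDocumentLike} from 'sanity'
import {getCliClient} from 'sanity/cli'
import {wpDataTypeFetch} from './migrationUtils' // Same helper as in the migration example; adjust the path

const client = getCliClient()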

async function importData() {
  const transaction = client.transaction()
  const wpType = 'posts'
  let page = 1
  let hasMore = true

  while (hasMore) {
    try {
      const wpData = await wpDataTypeFetch(wpType, page)

      if (Array.isArray(wpData) && wpData.length) {
        for (const wpDoc of wpData) {
          const doc: SanityDocumentLike = {
            _id: `post-${wpDoc.id}`,
            _type: 'post',
            // Add other required fields here based on wpDoc structure
          }

          transaction.createOrReplace(doc)
        }
        page++
      } else {
        hasMore = false
      }
    } catch (error) {
      console.error(`Error fetching data for page ${page}:`, error)
      hasMore = false // Stop the loop in case of an error
    }
  }

  try {
    await transaction.commit()
    console.log('Data imported successfully')
  } catch (error) {
    console.error('Error committing transaction:', error)
  }
}

importData()
```
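
One way to reduce the risk of an oversized transaction in a script like the one above is to commit documents in smaller, sequential transactions. The sketch below assumes a narrow `ClientLike` interface that stands in for the real client (only `transaction()`, `createOrReplace()`, and `commit()` are used). The `commitInChunks` name and the default chunk size of 100 are arbitrary choices for illustration, not recommended values.

```typescript
type SanityDoc = {_id: string; _type: string; [key: string]: unknown}

// Minimal slice of the client API this sketch relies on
interface TransactionLike {
  createOrReplace(doc: SanityDoc): unknown
  commit(): Promise<unknown>
}
interface ClientLike {
  transaction(): TransactionLike
}

// Commits documents in sequential transactions of at most `chunkSize` documents each
async function commitInChunks(
  client: ClientLike,
  docs: SanityDoc[],
  chunkSize = 100,
): Promise<number> {
  let transactions = 0

  for (let i = 0; i < docs.length; i += chunkSize) {
    const transaction = client.transaction()
    for (const doc of docs.slice(i, i + chunkSize)) {
      transaction.createOrReplace(doc)
    }
    await transaction.commit() // wait for this chunk before starting the next
    transactions++
  }

  return transactions
}
```

Because the documents use deterministic IDs and `createOrReplace`, a failure halfway through can be recovered from by simply running the script again.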

### Hosting a migration script on a server


The script above might also be modified to use a configured Sanity Client with a token if you plan to have a cloud-hosted migration script that external tools can access. 


For continuous migrations or regular content imports from an external source—like a podcast episode feed, product stock levels, or property availability—this may be preferable to performing migrations locally.


## Or, write an NDJSON import file


The Sanity CLI includes tooling to export an entire dataset (content and images) to a single file. The text content is stored in NDJSON format, that is, newline-delimited JSON:


```json:production.ndjson
{ "_type": "post", "_id": "post-435", "title": "A Model for Reality" }
{ "_type": "post", "_id": "post-436", "title": "Halo [Breathe]" }
{ "_type": "post", "_id": "post-437", "title": "Smiling Through The Pain" }
```

In some instances, you may prefer to write a content migration script which creates a file in this format, and then use the CLI to import it all in one go.


This might be necessary if your legacy CMS doesn't have (good) export APIs, or if you need to lift content out of a database directly. If you have a lot of content, it might also be more efficient, since it lets you generate the file with a faster programming language.


This requires more manual steps and does not offer a "dry run," but has the benefit of being able to upload images as part of the import process.


The source of images can be marked as from a URL or file path, which will be fetched and uploaded as part of the import process.


> [!TIP]
> See `sanity dataset import` [in the documentation](https://www.sanity.io/docs/dataset#9c9aab5198aa) for more details about importing text and images from NDJSON.
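
Producing the file contents is mostly a matter of serializing one document per line. The sketch below assumes a hypothetical `toNdjson` helper and uses the `_sanityAsset` convention (`image@` followed by a URL or file path) to mark an image source for the importer. Check the import documentation linked above for the exact format before relying on it.

```typescript
type SanityDoc = {_id: string; _type: string; [key: string]: unknown}

// One JSON document per line, with no pretty-printing inside a document
function toNdjson(docs: SanityDoc[]): string {
  return docs.map((doc) => JSON.stringify(doc)).join('\n') + '\n'
}

const ndjson = toNdjson([
  {_type: 'post', _id: 'post-435', title: 'A Model for Reality'},
  {
    _type: 'post',
    _id: 'post-436',
    title: 'Halo [Breathe]',
    // Assumed convention: the importer fetches and uploads this image (placeholder URL)
    mainImage: {_type: 'image', _sanityAsset: 'image@https://example.com/halo.jpg'},
  },
])
```

The resulting string can be written to a file (for example with `fs.writeFileSync` in Node) and passed to `sanity dataset import`.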


---

## Lesson 8: Uploading assets efficiently
https://www.sanity.io/learn/course/refactoring-content/uploading-assets-efficiently

Effortlessly manage and transform high-resolution images with Sanity's asset pipeline, avoid unnecessary uploads, and optimize content migration with metadata and in-memory cache. 

Sanity comes with a capable asset pipeline that allows the content team to upload one high-resolution image and developers to transform it on-demand to whatever size and format they need. Gone are the days when content teams had to upload or manage different-sized duplicates of the same image!


Sanity will also extract metadata from an image, which can be used to tailor the presentation or query for image assets using GROQ. So, moving your images into Sanity has many upsides.


## Upload images with Sanity Client


Migrating an asset into Sanity is made convenient with the `client.assets.upload()` method in the JavaScript client. If all you have is a URL to the image, this is the minimum amount of code required to upload it in a Node script:


```typescript
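import {Readable} from 'node:stream'
import type {UploadClientConfig} from '@sanity/client'
import {getCliClient} from 'sanity/cli'

// Assumes the script runs with `sanity exec`; any configured Sanity Client works
const client = getCliClient()

async function uploadImage(url: string, metadata: UploadClientConfig) {
  const {body} = await fetch(url)
  if (!body) {
    throw new Error(`No response body for ${url}`)
  }

  return client.assets.upload('image', Readable.fromWeb(body), metadata)
}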
```

Once again, remember that the image might not exist, and the URL could be broken. Improve the code above by assuming nothing about the response to your fetch!
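
A more defensive variant might look like the sketch below. It is one possible approach, not the method from the course: it buffers the image in memory instead of streaming it, and it takes the upload function as a parameter so the example stays self-contained. `uploadImageSafely` and `UploadFn` are hypothetical names. In a real script, `upload` could wrap `client.assets.upload('image', body, options)`.

```typescript
// Hypothetical signature for whatever performs the actual upload
type UploadFn = (body: Buffer, options: {filename?: string}) => Promise<{_id: string}>

// Returns the uploaded asset, or null when the image cannot be retrieved
async function uploadImageSafely(
  url: string,
  upload: UploadFn,
  fetchFn: typeof fetch = fetch,
): Promise<{_id: string} | null> {
  let response: Response
  try {
    response = await fetchFn(url)
  } catch (error) {
    console.warn(`Could not reach ${url}:`, error)
    return null
  }

  if (!response.ok) {
    console.warn(`Skipping ${url}: HTTP ${response.status}`)
    return null
  }

  const contentType = response.headers.get('content-type') ?? ''
  if (!contentType.startsWith('image/')) {
    console.warn(`Skipping ${url}: unexpected content type "${contentType}"`)
    return null
  }

  const body = Buffer.from(await response.arrayBuffer())
  if (body.length === 0) {
    console.warn(`Skipping ${url}: empty response body`)
    return null
  }

  return upload(body, {filename: url.split('/').pop()})
}
```

Returning `null` lets the calling code decide whether a missing image should fail the whole document or just leave the image field empty.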


Helpfully, images uploaded to the Content Lake are given deterministic IDs based on the image itself. Uploading the same image binary multiple times will always result in the same ID and will not create duplicate documents.


### Avoiding unnecessary uploads


However, uploading the same image *every time* you run your migration script is not ideal. It’s slow and unnecessary. This can be countered by taking metadata from your data source and saving it on the asset documents created in your dataset for every uploaded file. There’s a dedicated `source` key on asset documents that we can use.


For example, your existing image may have a record like this:


```json
{
  "type": "image",
  "id": 647,
  "url": "http://www.example.com/image.jpg"
}
```

You could now call the function above using this metadata to write the “source” of the image when uploading.


```typescript
uploadImage(
  doc.url,
  {
    source: {
      name: "Legacy CMS",
      id: doc.id,
      url: doc.url
    }
  }
)
```

Now, every image that is uploaded contains queryable metadata with values that match your existing data source.


So, instead of constantly uploading every image, you could query the dataset at the beginning of your migration script to create an “in-memory cache” of all existing images.


```typescript
type ExistingImage = {_id: string; sourceId: number}

// A template literal is required for a multi-line string
const query = `*[
  _type == "sanity.imageAsset" 
  && defined(source.id)
]{
  _id, 
  "sourceId": source.id
}`

const existingImages = await client.fetch<ExistingImage[]>(query)
```

Now, during your migration script, it’s easier to check if the image already exists in the dataset by looking for its source. If found, reference its `_id` in the dataset. If not, upload it!
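
The lookup described above could be sketched as follows. This is a minimal illustration rather than a prescribed implementation: `buildImageCache`, `getOrUploadImage`, and the `upload` callback are hypothetical names, and the legacy record is assumed to have the numeric `id` and `url` fields shown earlier.

```typescript
type ExistingImage = {_id: string; sourceId: number}
type LegacyImage = {id: number; url: string}

// Build a lookup table from legacy ID (stored in source.id) to asset _id
function buildImageCache(existingImages: ExistingImage[]): Map<number, string> {
  return new Map(
    existingImages.map((image): [number, string] => [image.sourceId, image._id]),
  )
}

// Return the cached asset _id, or upload the image and remember its new _id
async function getOrUploadImage(
  cache: Map<number, string>,
  legacy: LegacyImage,
  upload: (legacy: LegacyImage) => Promise<{_id: string}>,
): Promise<string> {
  const cachedId = cache.get(legacy.id)
  if (cachedId) {
    return cachedId
  }

  const asset = await upload(legacy)
  cache.set(legacy.id, asset._id)
  return asset._id
}
```

Writing new uploads back into the `Map` also prevents the same image from being uploaded twice within a single run when several documents share it.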


### Reference images in documents


Once an image is uploaded, you only need to have its `_id` field to set the reference:


```json
// 1. Query your existing data source:
{
  "type": "post",
  "id": 4986,
  "featuredMedia": 104
}

// 2. Upload image and get its Sanity-generated _id from the response

// 3. createOrReplace your new Sanity document:
{
  "_type": "post",
  "_id": "post-4986",
  "featuredMedia": {
    "_type": "image",
    "asset": {
      "_ref": "image-b7e1c5136d3b935ebed18298bead5fa1cda2785e-946x473-jpg",
      "_type": "reference"
    }
  }
}
```

### Avoiding rate limits


When uploading many assets, it is important to limit the number of concurrent uploads to avoid hitting rate limits. A common way to do this is with a library like [p-limit](https://www.npmjs.com/package/p-limit).


It will allow you to prepare any number of asset uploads in advance, but then control how many are performed concurrently.
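
To show the idea without a dependency, here is a small hand-rolled limiter. It is a sketch of the general technique, not p-limit's implementation, and `createLimiter` is a hypothetical name. In a real script, the library is likely the better choice.

```typescript
// Returns a `limit` function that runs at most `concurrency` tasks at once
// and queues the rest until a slot frees up
function createLimiter(concurrency: number) {
  let active = 0
  const waiting: Array<() => void> = []

  return async function limit<T>(task: () => Promise<T>): Promise<T> {
    if (active >= concurrency) {
      // All slots are taken: wait until a finished task hands over its slot
      await new Promise<void>((resolve) => waiting.push(resolve))
    } else {
      active++
    }

    try {
      return await task()
    } finally {
      const next = waiting.shift()
      if (next) {
        // Pass the slot straight to the next waiter, so `active` is unchanged
        next()
      } else {
        active--
      }
    }
  }
}

// Example usage inside an async function, with a hypothetical `uploadImage`:
//   const limit = createLimiter(3)
//   const assets = await Promise.all(
//     urls.map((url) => limit(() => uploadImage(url, {})))
//   )
```

With p-limit, `pLimit(3)` returns a comparable `limit` function, so the usage pattern stays the same: wrap each upload in `limit(() => ...)` and pass the resulting promises to `Promise.all`.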


> [!TIP]
> See [Technical limits](https://www.sanity.io/learn/content-lake/technical-limits) for more details about rate limits during mutations.


> [!TIP]
> The lesson [Uploading assets performantly](https://www.sanity.io/learn/course/migrating-content-from-wordpress-to-sanity/uploading-assets-performantly) demonstrates how to do everything in this lesson – including throttled concurrent uploads – within the context of a WordPress migration script.


### Consider if you need to upload everything


There are benefits to hosting images within the Content Lake, but there may be instances where you have huge volumes of images with a high turnover already stored on a third-party CDN with stable URLs. Examples include real estate websites where listing data and images typically come from external tools.


In this instance, it may be best to write the URL to an image as a string on a document and still serve the image from its original CDN without actually uploading the images to the Content Lake.


---

## Lesson 9: Migrating to block content
https://www.sanity.io/learn/course/refactoring-content/migrating-to-block-content

Convert HTML to presentation-agnostic Portable Text, even handling complex block content from WordPress' Gutenberg editor.

Portable Text is a presentation-agnostic, open-source format for block content: a mix of rich-text paragraphs and specialized content blocks such as images, videos, and call-to-action objects. It also lets you define editable object data inline or as text annotations. Portable Text can be rendered in different ways, and for most front-end frameworks, rendering is a matter of mapping its data to component props (instead of awkwardly injecting HTML).


The Portable Text Editor makes editing block content in Sanity Studio simple and relieves content teams of learning specialized syntax, custom tags, or dealing with HTML embed code.


That said, scripting HTML content into block content requires a little more finesse, but it will be worth it!


Migrating from HTML-formatted rich text and block content will be more difficult the more presentation-focused your source content is. For example, if you’re migrating from WordPress and have content stored in the Classic Editor, converting basic HTML into rich text and block content should be reasonably straightforward.


However, if you’re using WordPress’s block editor (aka Gutenberg), your documents likely have complex HTML structures, which will take more effort to recreate and refactor into Portable Text. 


### Migrating to Portable Text with Block Tools


Fortunately, you can use [@portabletext/block-tools](https://github.com/portabletext/editor/tree/main/packages/block-tools) to simplify deserializing an HTML string to Portable Text, and it can leverage schema types from your Sanity Studio. The [readme](https://github.com/portabletext/to-html/) provides simplified examples of migrating an HTML string to block content.


This tool will handle the basics of rich text formatting, such as headings, paragraphs, and lists, without configuration. However, more complex objects, like images, must be parsed from the HTML and turned into block content. Block tools exposes the incoming HTML through the [HTML Node API](https://developer.mozilla.org/en-US/docs/Web/API/Node) (not to be confused with Node.js), which lets you access elements as JavaScript objects. 


Below is a simplified example of intercepting a `<figure>` element in the HTML, retrieving the URL and alt text from an `<img>` tag inside, and creating a new block with its URL and alt text.


- [ ] Install JSDOM: `npm install -D jsdom`

- [ ] Install Block Tools: `npm install -D @portabletext/block-tools`


The deserialize function is synchronous, so you must post-process these blocks to upload any images found in the content. 


Again, optimize your upload script by leveraging an in-memory cache to avoid re-uploading the same image every time the migration script is run. Also, rate limits can be avoided by throttling the number of concurrent uploads in a parallelized operation.


```typescript
import {JSDOM} from 'jsdom'
import {htmlToBlocks} from '@portabletext/block-tools'

export async function htmlToBlockContent(html: string) {
  // Assumes `blockContentSchema` is the compiled block content type from your Studio schema
  const blocks = htmlToBlocks(html, blockContentSchema, {
    parseHtml: (html) => new JSDOM(html).window.document,
    rules: [
      {
        deserialize(node, next, block) {
          const el = node as HTMLElement

          if (node.nodeName.toLowerCase() === 'figure') {
            const img = el.querySelector('img')
            const imgSrc = img?.getAttribute('src')

            if (!img || !imgSrc) {
              return undefined
            }

            const altText = img.getAttribute('alt')

            return block({
              _type: 'image',
              url: imgSrc,
              altText,
            })
          }

          return undefined
        },
      },
    ],
  })

  // Insert your own logic to upload any blocks
  // where block._type == "image" and change
  // them to an asset reference!

  return blocks
}
```
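
The post-processing step left as a comment above might look something like the sketch below. It assumes the returned image blocks have the `_type`, `url`, and `altText` fields set by the rule in the example. `resolveImageBlocks` and the `uploadByUrl` callback (which resolves to an asset `_id`) are hypothetical names.

```typescript
type Block = {_type: string; _key?: string; [key: string]: unknown}

async function resolveImageBlocks(
  blocks: Block[],
  uploadByUrl: (url: string) => Promise<string>,
): Promise<Block[]> {
  const resolved: Block[] = []

  for (const block of blocks) {
    // Leave text blocks and anything without a temporary URL untouched
    if (block._type !== 'image' || typeof block.url !== 'string') {
      resolved.push(block)
      continue
    }

    const assetId = await uploadByUrl(block.url)

    // Drop the temporary `url` and keep the rest (such as _key and altText)
    const {url, ...rest} = block
    resolved.push({
      ...rest,
      asset: {_type: 'reference', _ref: assetId},
    })
  }

  return resolved
}
```

The loop uploads one image at a time for clarity. `uploadByUrl` is the natural place to plug in the in-memory cache and throttling discussed in the previous lesson.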


---

## Lesson 10: Reducing SEO impact
https://www.sanity.io/learn/course/refactoring-content/reducing-seo-impact

Confidently migrate content into Sanity, maintain SEO standing, manage redirects, and prevent broken links. Ensure visual consistency with automated testing.

Migrating content into Sanity is just one half of the equation. If your front end is a website with an established link structure and good SEO standing, you must have automated, external link checking and downtime monitoring.


### Redirects


If your URL structure changes as part of the re-platforming, ensure you have redirects from old pages to new ones. Use a tool like [Screaming Frog](https://www.screamingfrog.co.uk/seo-spider/) to crawl your existing website to find every route that needs recreating.


Redirects can be modeled as Sanity Studio schema types and implemented in any framework that supports redirects.
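
As a rough illustration, redirect documents fetched from the dataset could be cleaned up before being handed to a framework. Everything here is an assumption for the sake of the sketch: the `source`, `destination`, and `permanent` field names, the `RedirectRule` output shape, and the function names are hypothetical, and your framework will have its own configuration format.

```typescript
type RedirectDoc = {source: string; destination: string; permanent?: boolean}
type RedirectRule = {source: string; destination: string; statusCode: 301 | 302}

// Ensure a leading slash and strip trailing slashes so paths compare reliably
function normalizePath(path: string): string {
  const trimmed = path.trim()
  if (!trimmed) return ''
  const withSlash = trimmed.startsWith('/') ? trimmed : `/${trimmed}`
  return withSlash.replace(/\/+$/, '') || '/'
}

function toRedirectRules(docs: RedirectDoc[]): RedirectRule[] {
  const seen = new Set<string>()
  const rules: RedirectRule[] = []

  for (const doc of docs) {
    const source = normalizePath(doc.source)
    const destination = doc.destination.trim()
    if (!source || !destination) continue

    // Skip redirects that point at themselves, which would loop
    if (destination.startsWith('/') && normalizePath(destination) === source) continue

    // The first rule for a given source wins; later duplicates are ignored
    if (seen.has(source)) continue
    seen.add(source)

    rules.push({
      source,
      destination,
      // Treat redirects as permanent unless explicitly marked otherwise
      statusCode: doc.permanent === false ? 302 : 301,
    })
  }

  return rules
}
```

Filtering out self-referencing and duplicate rules before deployment helps avoid redirect loops that a crawler or link checker would otherwise only surface later.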


> [!TIP]
> See the guide [Managing redirects with Sanity](https://www.sanity.io/guides/managing-redirects-with-sanity) on Sanity Exchange


It may be part of your re-platforming strategy to reduce the amount of content you have online, but you still should not be generating a large volume of unhandled, broken links.


Set up a link-checking service to catch broken internal links. If you do not already have one, the sooner you set it up, the better.


- [ ] Do an audit of the existing URL structure and which URLs should be handled with redirects.


### Visual regression testing


Another option, if your existing front end is going to be re-used with Sanity as the new content source, is to perform automated visual regression testing. 


Malformed or invalid data may break page layouts – and your application may be far too large to manually check every page individually. An automated tool may catch these broken layouts and alert you to deeply nested visual problems.


---

## Lesson 11: Conclusion
https://www.sanity.io/learn/course/refactoring-content/conclusion

You're now equipped to navigate content migration with a unique, well-documented strategy, adopting new technologies smoothly and predictably.

After working through these lessons and completing the tasks, you should be well-prepared to take on the content migration. 


Ultimately, your strategy for re-platforming your content is as unique as the content and your organization. Involving the right people at all levels, documenting the journey, and incrementally adopting these new technologies should result in a predictable and smoother process.


---

## Related Resources

- [Track overview](https://www.sanity.io/learn/track/replatforming-to-sanity.md)
- [All courses and lessons](https://www.sanity.io/learn/sitemap.md)
- [Complete content for LLMs](https://www.sanity.io/learn/llms-full.txt)