Best way to convert HTML to a PortableText object?

5 replies
Last updated: Dec 3, 2020
Hi folks! Does anyone know a simple way of converting HTML to a PortableText object? A library of some sort? I'm trying to import some data from an old site into Sanity, and some of the content was built with a WYSIWYG editor that saved it as HTML...
AI Update

There's a dedicated package for exactly this. The @sanity/block-tools package (also published on npm as @portabletext/block-tools) provides the htmlToBlocks function, which converts HTML to Portable Text.

Here's how to get started:

Installation:

npm install -D @portabletext/block-tools jsdom

You'll need JSDOM (or another DOM parser) because Node.js has no built-in DOM, and the conversion needs one to parse the HTML structure.

Basic Usage:

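The htmlToBlocks function handles common HTML elements like headings, paragraphs, bold, italic, and lists automatically. For a simple migration, you pass it the HTML string, the compiled schema type of your Portable Text field, and a parseHtml option that returns a DOM document (for example, one created with JSDOM).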
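
As a rough sketch of how the conversion could sit inside a migration script: the function below maps legacy records to Sanity documents and calls the converter as htmlToBlocks(html, blockContentType, { parseHtml }), which is the call shape shown in the @sanity/block-tools README. The converter, the DOM parser, and the schema type are passed in as parameters so the sketch does not depend on a particular package version. LegacyPost, SanityPost, migratePosts, and the "post" type name are made up for illustration.

```typescript
// Loosely typed on purpose: the real packages ship richer types that may
// differ between versions, so tighten these to match what you install.
type PortableTextBlock = { _type: string };

type HtmlToBlocks = (
  html: string,
  blockContentType: any,
  options: { parseHtml: (html: string) => any }
) => PortableTextBlock[];

// Hypothetical shape of a row exported from the old site.
interface LegacyPost {
  id: number;
  title: string;
  html: string;
}

// Hypothetical target document; "post" and "body" are assumed schema names.
interface SanityPost {
  _id: string;
  _type: "post";
  title: string;
  body: PortableTextBlock[];
}

interface ConverterDeps {
  htmlToBlocks: HtmlToBlocks; // e.g. the export from block-tools
  blockContentType: any; // compiled type of the Portable Text field
  parseHtml: (html: string) => any; // e.g. (html) => new JSDOM(html).window.document
}

function migratePosts(posts: LegacyPost[], deps: ConverterDeps): SanityPost[] {
  return posts.map((post): SanityPost => ({
    // A deterministic _id makes re-running the migration overwrite, not duplicate.
    _id: `legacy-post-${post.id}`,
    _type: "post",
    title: post.title,
    body: deps.htmlToBlocks(post.html, deps.blockContentType, {
      parseHtml: deps.parseHtml,
    }),
  }));
}
```

In a real script you would pass in htmlToBlocks from block-tools and a JSDOM-based parseHtml. The block-tools README shows one way to obtain blockContentType by compiling your schema with @sanity/schema.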

Custom Rules:

If your WYSIWYG editor created more complex HTML (like custom classes, <figure> tags, or embedded images), you can define custom deserialization rules to handle specific elements. This lets you:

  • Extract data from HTML attributes
  • Transform elements into custom Portable Text block types
  • Handle inline styles and convert them to decorators
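
A custom rule is an object with a deserialize(el, next, block) method that returns undefined for elements it does not handle. Here is a minimal sketch of a rule that turns <img> tags into image blocks, assuming your Portable Text field allows an image type with an alt field (both are assumptions about your schema). ElementLike and BlockHelper are simplified stand-in types, not the library's own. The _sanityAsset value follows the image@<url> convention that the Sanity CLI import understands.

```typescript
// Minimal view of the node a rule receives; jsdom elements satisfy it.
// Text nodes have no tagName, so the rule checks for it first.
interface ElementLike {
  tagName?: string;
  getAttribute(name: string): string | null;
}

// Stand-in for the `block` helper that block-tools passes to each rule.
type BlockHelper = (props: Record<string, unknown>) => unknown;

const imageRule = {
  deserialize(el: ElementLike, _next: unknown, block: BlockHelper): unknown {
    // Returning undefined lets the default rules handle the node.
    if (!el.tagName || el.tagName.toLowerCase() !== "img") return undefined;

    const src = el.getAttribute("src");
    if (!src) return undefined;

    return block({
      _type: "image",
      // Picked up by the CLI import, which fetches and uploads the file.
      // The URL should be absolute for that to work.
      _sanityAsset: `image@${src}`,
      alt: el.getAttribute("alt") || "", // assumes an `alt` field in the schema
    });
  },
};
```

The rule would then be passed alongside the parser, along the lines of htmlToBlocks(html, blockContentType, { parseHtml, rules: [imageRule] }).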

Important Note:

The htmlToBlocks function is synchronous, so if your HTML contains images or other assets, you'll need to handle uploading those to Sanity separately and then reference them in your Portable Text blocks.
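
If you write through the API rather than the CLI import, the upload step could be a second, asynchronous pass over the converted blocks. The sketch below assumes a custom rule left a temporary sourceUrl property on each image block (a made-up field name) and takes the uploader as a parameter. With @sanity/client, that uploader would typically wrap client.assets.upload and resolve to the asset document's _id.

```typescript
type Block = { _type: string; [key: string]: unknown };

// Hypothetical uploader: takes the original image URL, resolves to the
// _id of the uploaded asset document.
type UploadImage = (sourceUrl: string) => Promise<string>;

async function attachImageAssets(
  blocks: Block[],
  uploadImage: UploadImage
): Promise<Block[]> {
  const result: Block[] = [];

  // Upload one image at a time to keep the sketch simple and avoid
  // flooding the API; batching is possible but not shown here.
  for (const current of blocks) {
    const sourceUrl = current.sourceUrl;
    if (current._type !== "image" || typeof sourceUrl !== "string") {
      result.push(current);
      continue;
    }

    const assetId = await uploadImage(sourceUrl);

    // Replace the temporary URL with a reference to the uploaded asset.
    const updated: Block = {
      ...current,
      asset: { _type: "reference", _ref: assetId },
    };
    delete updated.sourceUrl;
    result.push(updated);
  }

  return result;
}
```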

Sanity has a great course on migrating WordPress content that walks through the entire process, which should be helpful even if you're not coming from WordPress specifically.

sanity have their own lib for it 🙂 https://www.npmjs.com/package/@sanity/block-tools
Amazing! I have no idea how i managed to miss that - only found libs for going the other way around 😄 Thanks a lot user V!
no problem! it also supports custom rules if you want to convert some tags to a custom block type. quite handy 👌
Might be useful to read through a higher-level guide like https://www.sanity.io/guides/how-to-migrate-your-html-blog-content-from-ghost - even if it's not the same source as for your migration it probably covers relevant issues 🙂
It looks great - both the library and the post - i was going to run the import through the api, but converting all the data to ndjson and using the cli seems like a better idea! Thank you both! ❤️
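
For anyone taking the same NDJSON route, here is a minimal sketch of the serialization step. NDJSON is one JSON document per line, so JSON.stringify on each document plus a newline is enough. SanityDocument and toNdjson are illustrative names, not part of any Sanity package.

```typescript
// Every document needs at least _type; a stable _id lets you re-run the import.
type SanityDocument = { _id: string; _type: string; [key: string]: unknown };

function toNdjson(docs: SanityDocument[]): string {
  // JSON.stringify escapes newlines inside strings, so each document
  // is guaranteed to stay on a single line.
  return docs.map((doc) => JSON.stringify(doc)).join("\n") + "\n";
}
```

The resulting string can be written to a file and imported with something like `sanity dataset import posts.ndjson production`, where the file name and dataset name are placeholders.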
