How to handle <img /> tags with htmlToBlocks from @sanity/block-tools

5 replies
Last updated: Nov 3, 2023
Hello all, I'm using htmlToBlocks from @sanity/block-tools and can't figure out how to handle <img /> tags that are in the HTML.
It's being used for a content migration tool, and I have everything in place to download assets from the previous site, upload them into Sanity, and refer to them from documents. But when I convert HTML to Portable Text using htmlToBlocks, it ignores <img /> tags.
Can I somehow add support for it, or is it an option I have to enable somewhere?
Do I need to try something funky like replacing <img /> in the HTML with <a href /> with a particular className and then fixing it in the resulting JSON so it turns into an image?
Appreciate any input on this!
AI Update

Based on the Sanity documentation, htmlToBlocks from @sanity/block-tools (now @portabletext/block-tools) doesn't handle <img /> tags by default. You need to add custom deserialization rules to handle them; no funky workarounds needed!

Adding Support for <img> Tags

You can pass custom rules to htmlToBlocks to intercept specific HTML elements. Here's how to handle images:

For Images Inside <figure> Tags

import {htmlToBlocks} from '@portabletext/block-tools'
import {JSDOM} from 'jsdom'

const blocks = htmlToBlocks(html, blockContentSchema, {
  parseHtml: (html) => new JSDOM(html).window.document,
  rules: [
    {
      deserialize(node, next, block) {
        const el = node as HTMLElement
        
        if (node.nodeName.toLowerCase() === 'figure') {
          const url = el.querySelector('img')?.getAttribute('src')
          
          if (!url) {
            return undefined
          }
          
          return block({
            _type: 'externalImage', // Temporary type
            url,
          })
        }
        
        return undefined
      },
    },
  ],
})

For Standalone <img> Tags

If your images aren't wrapped in <figure> tags, check for img directly:

rules: [
  {
    deserialize(node, next, block) {
      if (node.nodeName.toLowerCase() === 'img') {
        const el = node as HTMLElement
        const url = el.getAttribute('src')
        
        if (!url) return undefined
        
        return block({
          _type: 'externalImage',
          url,
          alt: el.getAttribute('alt') || '',
        })
      }
      
      return undefined
    },
  },
]
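
If the source HTML mixes both cases, the two rules above can share one lookup. The following is a minimal sketch, not part of block-tools: extractImage and the ElementLike interface are hypothetical names introduced here, and ElementLike only stands in for the DOM element type so the snippet runs without JSDOM.

```typescript
// Minimal structural type (an assumption for this sketch) so the code runs
// without a DOM library; elements produced by JSDOM expose the same members.
interface ElementLike {
  nodeName: string
  getAttribute(name: string): string | null
  querySelector(selector: string): ElementLike | null
}

interface ExtractedImage {
  url: string
  alt: string
}

// Hypothetical helper: returns image data for <img> or <figure> containing
// an <img>, and undefined for anything else or when src is missing.
function extractImage(el: ElementLike): ExtractedImage | undefined {
  const tag = el.nodeName.toLowerCase()
  const img = tag === 'img' ? el : tag === 'figure' ? el.querySelector('img') : null
  const url = img?.getAttribute('src')
  if (!img || !url) return undefined
  return {url, alt: img.getAttribute('alt') ?? ''}
}
```

Inside a rule's deserialize function this could be called as `const image = extractImage(node as HTMLElement)` and followed by `return image ? block({_type: 'externalImage', ...image}) : undefined`, assuming your temporary externalImage type carries url and alt.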

Post-Processing: Uploading Images

Since the deserialize method is synchronous, you need to post-process blocks to upload images and create proper Sanity asset references:

// Step 1: Extract URLs with htmlToBlocks
let blocks = htmlToBlocks(html, blockContentSchema, { 
  parseHtml: (html) => new JSDOM(html).window.document,
  rules: [/* your rules */]
})

// Step 2: Upload images and create references
const blocksWithUploads = blocks.map((block) =>
  async () => {
    if (block._type !== 'externalImage' || !('url' in block)) {
      return block
    }
    
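    // Download the image first, then upload its bytes to Sanity
    // (upload expects the file contents, not the pending fetch promise)
    const response = await fetch(block.url)
    const imageAsset = await client.assets.upload(
      'image',
      Buffer.from(await response.arrayBuffer()),
    )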
    
    // Return proper image block with reference
    return {
      _key: block._key,
      _type: 'image',
      asset: {
        _ref: imageAsset._id,
        _type: 'reference'
      }
    }
  }
)

blocks = await Promise.all(blocksWithUploads.map(fn => fn()))
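
The swap from placeholder to image block can also be kept separate from the upload itself, which makes it easy to test without a Sanity client. This is a minimal sketch, assuming the uploads have already produced a lookup from image URL to asset _id; Block, isExternalImage, and swapExternalImages are illustrative names, not block-tools or client APIs.

```typescript
// Loose block shape assumed for this sketch.
type Block = {_type: string; _key?: string; [field: string]: unknown}
type ExternalImageBlock = Block & {_type: 'externalImage'; url: string}

// Type guard: narrows a block to the temporary externalImage placeholder.
function isExternalImage(block: Block): block is ExternalImageBlock {
  return block._type === 'externalImage' && typeof block.url === 'string'
}

// Replaces each placeholder whose URL has an uploaded asset with an image
// block that references it. Everything else is returned unchanged.
function swapExternalImages(
  blocks: Block[],
  assetIdByUrl: Record<string, string>,
): Block[] {
  return blocks.map((block) => {
    if (!isExternalImage(block)) return block
    const assetId = assetIdByUrl[block.url]
    if (!assetId) return block // upload missing or failed: keep the placeholder
    return {
      _key: block._key,
      _type: 'image',
      asset: {_type: 'reference', _ref: assetId},
    }
  })
}
```

Keeping unresolved placeholders in the output, rather than dropping them, leaves a trace of which images still need attention after the migration run.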

Complete Example from the Migration Guide

The WordPress to Sanity migration course shows a full implementation with rate limiting and caching:

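import {htmlToBlocks} from '@portabletext/block-tools'
import type {SanityClient} from '@sanity/client'
import {JSDOM} from 'jsdom'
import pLimit from 'p-limit'

// Post, blockContentSchema, and sanityUploadFromUrl are project-specific
// and assumed to be defined elsewhere in the migration script.
export async function htmlToBlockContent(
  html: string,
  client: SanityClient,
  imageCache: Record<string, string>, // image URL -> uploaded asset _id
): Promise<Post['content']> {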
  // Convert HTML to Portable Text
  let blocks = htmlToBlocks(html, blockContentSchema, {
    parseHtml: (html) => new JSDOM(html).window.document,
    rules: [
      {
        deserialize(node, next, block) {
          const el = node as HTMLElement
          if (node.nodeName.toLowerCase() === 'figure') {
            const url = el.querySelector('img')?.getAttribute('src')
            if (!url) return undefined
            
            return block({
              _type: 'externalImage',
              url,
            })
          }
          return undefined
        },
      },
    ],
  })

  // Upload images with rate limiting
  const limit = pLimit(2)
  const blocksWithUploads = blocks.map((block) =>
    limit(async () => {
      if (block._type !== 'externalImage' || !('url' in block)) {
        return block
      }

      // Check cache first
      if (imageCache[block.url]) {
        return {
          _key: block._key,
          _type: 'image',
          asset: { _ref: imageCache[block.url], _type: 'reference' }
        }
      }

      // Upload and cache
      const imageDocument = await sanityUploadFromUrl(block.url, client)
      if (imageDocument) {
        imageCache[block.url] = imageDocument._id
        return {
          _key: block._key,
          _type: 'image',
          asset: { _ref: imageDocument._id, _type: 'reference' }
        }
      }
      
      return block
    }),
  )

  return await Promise.all(blocksWithUploads)
}
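
Caching helps most when the same image appears more than once. One way to see that up front, sketched here under the assumption that placeholders carry a string url field as in the rules above, is to collect the distinct URLs before uploading anything. ContentBlock and collectImageUrls are hypothetical names used only for this illustration.

```typescript
// Loose block shape assumed for this sketch.
type ContentBlock = {_type: string; [field: string]: unknown}

// Returns each externalImage URL once, in order of first appearance.
function collectImageUrls(blocks: ContentBlock[]): string[] {
  const urls: string[] = []
  for (const block of blocks) {
    const url = block.url
    if (
      block._type === 'externalImage' &&
      typeof url === 'string' &&
      urls.indexOf(url) === -1
    ) {
      urls.push(url)
    }
  }
  return urls
}
```

Comparing the length of this list with the number of placeholder blocks gives a quick sense of how many uploads a cache would save.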

Key Takeaways

  • Custom rules let you deserialize <img> tags into temporary block types
  • Post-processing is required to upload images (the deserialize method is synchronous)
  • Store URLs temporarily, then map over blocks to upload the images and replace each placeholder with an image block that references the uploaded asset
  • Consider using rate limiting (like p-limit) and caching for performance

Check out the full Converting HTML to Portable Text lesson and the @sanity/block-tools documentation for more details!

Original thread
Have you created a rule for handling them? What does your code look like so far?
Now that you point it out, I see the documentation on rule! I'm digging into that. Thanks!
That was the missing link. Got it fixed now. Thank you so much for your reply!! 🙏
Excellent!
Just leaving this link here because I didn't know what was being talked about and it's interesting! https://www.npmjs.com/package/@sanity/block-tools#rules
