Preserving HTML ID attributes in block content using the block tools library

5 replies
Last updated: Oct 26, 2022
I'm using htmlToBlocks from the block tools library (https://www.npmjs.com/package/@sanity/block-tools) to translate existing HTML to block content/Portable Text. I have HTML elements with important ID attributes that are being stripped out during the transformation. Is there a way to preserve HTML id attributes in block content?
For example, this HTML:

<p id="important">Important paragraph</p>
results in this block content object (the id="important" attribute is nowhere to be found):

[
  {
    "_type": "block",
    "markDefs": [],
    "style": "normal",
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "Important paragraph",
        "_key": "55a9bea915f10"
      }
    ],
    "_key": "55a9bea915f1"
  }
]
AI Update

Yes, you can preserve HTML id attributes in block content, but you'll need to store them as custom data on the block object itself. By default, htmlToBlocks doesn't preserve arbitrary HTML attributes like id because standard Portable Text blocks don't have a built-in place for them.

Here's how to solve this:

Solution: Store IDs as Custom Block Data

You can extend your htmlToBlocks rules to extract the id attribute and store it as a custom property on the block object:

import {htmlToBlocks} from '@sanity/block-tools'
import {JSDOM} from 'jsdom'
import {Schema} from '@sanity/schema'

// Define your schema with the custom id field
const defaultSchema = Schema.compile({
  name: 'myBlog',
  types: [
    {
      type: 'object',
      name: 'blogPost',
      fields: [
        {
          title: 'Body',
          name: 'body',
          type: 'array',
          of: [
            {
              type: 'block',
              // Add custom fields to your block type
              fields: [
                {
                  name: 'htmlId',
                  type: 'string',
                  title: 'HTML ID'
                }
              ]
            }
          ]
        }
      ]
    }
  ]
})

const blockContentType = defaultSchema
  .get('blogPost')
  .fields.find((field) => field.name === 'body').type

// Custom rules to preserve id attributes
const customRules = [
  {
    deserialize(el, next, block) {
      // Handle any element with an id attribute
      if (el.getAttribute && el.getAttribute('id')) {
        const id = el.getAttribute('id')
        
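        // For block-level elements (p, h1-h6, div)
        if (['P', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'DIV'].includes(el.tagName)) {
          // Headings keep their level as the block style; everything else is "normal"
          const style = /^H[1-6]$/.test(el.tagName) ? el.tagName.toLowerCase() : 'normal'

          // Return a plain text block. The `block` helper wraps custom block-level
          // objects (such as code blocks), so it is not used to build text blocks here.
          return {
            _type: 'block',
            style,
            markDefs: [],
            htmlId: id, // Store the id as custom data
            children: next(el.childNodes)
          }
        }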
      }
      
      return undefined
    }
  }
]

// Use it
const html = '<p id="important">Important paragraph</p>'
const blocks = htmlToBlocks(html, blockContentType, {
  rules: customRules,
  parseHtml: (html) => new JSDOM(html).window.document
})

This will produce:

[
  {
    "_type": "block",
    "markDefs": [],
    "style": "normal",
    "htmlId": "important",
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "Important paragraph",
        "_key": "55a9bea915f10"
      }
    ],
    "_key": "55a9bea915f1"
  }
]

Alternative: Use Annotations for Inline Elements

If you need to preserve IDs on inline elements (like <span id="xyz">), you'd want to use annotations instead, which allow you to attach structured data to text spans through the markDefs array.
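As a rough sketch of what such a rule might look like: the example below assumes that a rule can return the same `__annotation` shape that block-tools' built-in link rule produces, and it uses a hypothetical annotation type named `htmlAnchor`, which would also need to be declared under the block's annotations in your schema. Treat it as a starting point to verify against your installed version, not as the library's documented API.

```javascript
// Sketch of an htmlToBlocks rule for inline <span> elements that carry an id.
// ASSUMPTION: the `__annotation` return shape mirrors the built-in link rule.
// ASSUMPTION: `htmlAnchor` is a hypothetical annotation type defined in your schema.
const inlineIdRule = {
  deserialize(el, next) {
    // Text nodes have no tagName or getAttribute, so let other rules handle them
    if (!el.tagName || typeof el.getAttribute !== 'function') {
      return undefined
    }
    const id = el.getAttribute('id')
    if (el.tagName.toLowerCase() !== 'span' || !id) {
      return undefined
    }
    return {
      _type: '__annotation',
      markDef: {
        _type: 'htmlAnchor', // hypothetical annotation type
        _key: Math.random().toString(36).slice(2, 14), // random key for the markDef
        htmlId: id
      },
      // Let the remaining rules deserialize the span's contents
      children: next(el.childNodes)
    }
  }
}
```

The rule would go in the same rules array as the block-level rule, so that elements it does not match fall through to the others.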

Schema Configuration

Make sure your Portable Text schema includes the custom field:

{
  name: 'body',
  type: 'array',
  of: [
    {
      type: 'block',
      fields: [
        {
          name: 'htmlId',
          type: 'string',
          title: 'HTML ID',
          description: 'Preserved from original HTML'
        }
      ]
    }
  ]
}

Rendering the IDs

When rendering your Portable Text on the frontend, you can read the custom htmlId property in your block components and write the ID back to the HTML. For example, with @portabletext/react:

<PortableText
  value={content}
  components={{
    block: {
      normal: ({value, children}) => (
        <p id={value.htmlId}>{children}</p>
      )
    }
  }}
/>
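The component map above only covers the normal style, so headings would still lose their IDs unless they get the same treatment. Outside React, the same idea can be sketched as a plain function. This is a minimal illustration that assumes blocks shaped like the JSON shown earlier and ignores marks and lists; `renderBlock` and `escapeHtml` are hypothetical helper names, not part of any Sanity package.

```javascript
// Escape text so it is safe inside HTML content and attribute values
const escapeHtml = (s) =>
  String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')

// Render one text block to an HTML string, restoring the id from htmlId if set.
// ASSUMPTION: only paragraph and h1-h6 styles are handled; marks are ignored.
function renderBlock(block) {
  const tag = /^h[1-6]$/.test(block.style) ? block.style : 'p'
  const idAttr = block.htmlId ? ` id="${escapeHtml(block.htmlId)}"` : ''
  const text = (block.children || [])
    .map((child) => escapeHtml(child.text))
    .join('')
  return `<${tag}${idAttr}>${text}</${tag}>`
}

// Usage with the block from the example above
console.log(
  renderBlock({
    _type: 'block',
    style: 'normal',
    htmlId: 'important',
    children: [{_type: 'span', marks: [], text: 'Important paragraph'}]
  })
)
```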

This approach gives you full control over preserving and rendering HTML IDs while keeping your content structured and reusable across different platforms.

Hey user Q! Have you specified any rules to deal with these attributes?
Yes, my setup looks very similar to Mark’s code sandbox above.
I have rules like this…

  ...
  {
    deserialize(el, _next, _block) {
      if (el.tagName.toLowerCase() !== "sup") {
        return undefined;
      }
      return {
        _type: "span",
        marks: ["superscript"],
        text: el.textContent,
      };
    },
  },
  ...
Do you have an example of what a rule to preserve ID attributes would look like? I’m surprised this is a “special HTML case” haha.
Got it! They won't be picked up by default since attributes on elements are usually visual concerns. You need a few things. First, you need a custom decorator set up in order for the PT editor in the Studio to display/edit the attribute. Second, you'd need to add logic to check for the attribute (which is likely el.attributes or something similar), then return an object like this:
{
  _type: 'span',
  marks: ['<custom-decorator-name>'],
  text: el.textContent
}
Great! Thanks for the links and example!
You're welcome!
