How to save id-attributes in blocks when importing HTML with block-tools

19 replies
Last updated: Feb 14, 2022
Hello. I'm importing old HTML into block text with block-tools. I need to keep the id attributes in Sanity (because the HTML has in-page links), but I can't figure out how to do it properly. I understand I have to create a mark and a markDef somehow. I have read this example: https://github.com/sanity-io/sanity/tree/next/packages/@sanity/block-tools#rules
Feb 9, 2022, 10:24 AM
Here is my code:

const result = blockTools.htmlToBlocks(content, schema, {
  parseHtml: html => new JSDOM(html).window.document,
  rules: [
    {
      deserialize(el, next, block) {
        let idVal = '';

        if (el.tagName.toLowerCase() === 'a') {
          return;
        }

        _forEach(el.attributes, (attr) => {
          if (attr.name.toLowerCase() === 'id') {
            idVal = attr.value;
          }
        });

        // ??? What should I do now?
      }
    }
  ]
});
Feb 9, 2022, 10:28 AM
Can somebody help me? This is a major blocker for me.
Feb 9, 2022, 2:16 PM
Hey Kai 👋. I'm taking a look at this. Would you be able to provide an example of the HTML you want to import?
Feb 9, 2022, 2:51 PM
I have an example of creating markDefs with block-tools here: https://gist.github.com/d4rekanguok/8a6c698d16ef6666196ae028c04066bc
The trick seems to be returning a node of type __annotation.
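A minimal sketch of the object such a rule could return, assuming the __annotation shape described here (a markDef plus the already-deserialized children). The html_id annotation name, the makeKey helper (a stand-in for blockTools.randomKey(12)) and htmlIdAnnotation are hypothetical names for illustration, not block-tools APIs:

```javascript
// Hypothetical key generator standing in for blockTools.randomKey(12):
// returns a string of `length` lowercase hex characters.
function makeKey(length = 12) {
  let key = "";
  while (key.length < length) {
    key += Math.floor(Math.random() * 16).toString(16);
  }
  return key;
}

// Builds the value a deserialize() rule could return to wrap its children
// in an "html_id" annotation. block-tools is expected to turn this into a
// markDef on the block plus a matching key in each child span's marks.
function htmlIdAnnotation(idVal, children, keyFn = makeKey) {
  return {
    _type: "__annotation",
    markDef: {
      _type: "html_id", // must match an annotation name in the block schema
      _key: keyFn(12),
      html_id: idVal,
    },
    children, // in a real rule: next(el.childNodes)
  };
}

// Usage with placeholder children instead of next(el.childNodes).
const annotated = htmlIdAnnotation(
  "toc2",
  [{ _type: "span", text: "How we handle your personal information" }],
  () => "k1"
);
console.log(JSON.stringify(annotated, null, 2));
```

Inside deserialize(el, next, block), children would come from next(el.childNodes), and the markDef type has to match an annotation declared in the block schema.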
Feb 9, 2022, 4:54 PM
user E
It is just very basic HTML with nothing special, but it contains standard in-page links. Example:
...<a href="#toc2">How we handle your personal information</a>...
...<h2 id="toc2">How we handle your personal information</h2>...

The block-tools parser keeps href="#toc2", but id="toc2" is not saved, so I cannot recreate the same page.
Feb 10, 2022, 7:35 AM
user G
Thank you for your answer 🙂, but block-tools already saves href by default. It is the id attributes that get lost, so the in-page links do not work.
Feb 10, 2022, 7:37 AM
I've just given it a quick shot, and it looks like el.id correctly logs the id attribute. If that doesn't work, could you share the relevant excerpt of your HTML so we can give it a try?
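A minimal sketch of that lookup as a helper, assuming el is a DOM element from JSDOM (where el.id reflects the id attribute). findHtmlId is a hypothetical name; the plain objects in the usage lines only imitate the two properties it reads, so the example runs without JSDOM:

```javascript
// Returns the id of a DOM-like element, or "" if it has none.
// Prefers the el.id property and falls back to scanning el.attributes
// the way the _forEach loop in the question does (case-insensitive).
function findHtmlId(el) {
  if (!el || typeof el !== "object") return "";
  if (typeof el.id === "string" && el.id !== "") return el.id;
  for (const attr of Array.from(el.attributes || [])) {
    if (attr.name.toLowerCase() === "id") return attr.value;
  }
  return "";
}

// Mock elements standing in for JSDOM nodes.
const heading = { tagName: "H2", id: "toc2", attributes: [{ name: "id", value: "toc2" }] };
const legacy = { tagName: "H2", attributes: [{ name: "ID", value: "toc3" }] };
console.log(findHtmlId(heading)); // toc2
console.log(findHtmlId(legacy)); // toc3
```

Returning an empty string for nodes without an id (including text nodes, which have no attributes) keeps a check like if (!idVal) return; working unchanged.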
Feb 10, 2022, 8:02 AM
const htmlToBlocks = (content) => {
  const result = blockTools.htmlToBlocks(content, schema, {
    parseHtml: html => new JSDOM(html).window.document,
    rules: [
      {
        deserialize(el, next, block) {
          let idVal = '';

          if (el.tagName.toLowerCase() !== 'h2') {
            return;
          }

          _forEach(el.attributes, (attr) => {
            if (attr.name.toLowerCase() === 'id') {
              idVal = attr.value;
            }
          });

          if (!idVal) {
            return;
          }

          const result = {
            _type: 'block',
            children: [
              {
                _type: 'span',
                text: el.textContent,
                htmlId: idVal,
              }
            ],
            style: 'h2'
          };

          return block(result);
        }
      }
    ]
  });

  return result;
}
Feb 11, 2022, 6:49 AM
Hi again. I found that I can just put arbitrary values into the JSON. But because that does not follow any schema, I don't think it is a legitimate way.
Is there a proper way to save ids in blocks?


const htmlToBlocks = (content) => {
  return blockTools.htmlToBlocks(content, schema, {
    parseHtml: html => new JSDOM(html).window.document,
    rules: [
      {
        deserialize(el, next, block) {
          let idVal = '';

          if (el.tagName.toLowerCase() !== 'h2') {
            return;
          }

          _forEach(el.attributes, (attr) => {
            if (attr.name.toLowerCase() === 'id') {
              idVal = attr.value;
            }
          });

          if (!idVal) {
            return;
          }

          const result = {
            _type: 'block',
            children: [
              {
                _type: 'span',
                text: el.textContent,
                htmlId: idVal,
              }
            ],
            style: 'h2'
          };

          return block(result);
        }
      }
    ]
  });
}
Feb 11, 2022, 7:16 AM
The previous code outputs the following:

{
  "_key": "30f3d9e74d36",
  "_type": "block",
  "children": [
    {
      "_key": "30f3d9e74d360",
      "_type": "span",
      "htmlId": "toc1",
      "marks": [],
      "text": "Using Polar services in short"
    }
  ],
  "markDefs": [],
  "style": "h2"
},
Feb 11, 2022, 7:17 AM
Of course, I would then have to find a way to parse that kind of JSON somehow. 🙂
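If some content was already imported with htmlId sitting directly on the spans, as in the output above, a sketch like this could move each value into a markDef and reference it from the span's marks. It assumes an annotation named html_id with a field html_id (the names used in the schema posted later in this thread); migrateSpanHtmlIds and keyFn are hypothetical:

```javascript
// Hypothetical migration: moves the non-schema htmlId property from each
// span into an "html_id" markDef and references it from span.marks.
// keyFn must return a key that is unique within the block.
function migrateSpanHtmlIds(block, keyFn) {
  const markDefs = [...(block.markDefs || [])];
  const children = (block.children || []).map((child) => {
    if (child._type !== "span" || !child.htmlId) return child;
    const { htmlId, ...rest } = child; // drop htmlId from the span
    const key = keyFn();
    markDefs.push({ _key: key, _type: "html_id", html_id: htmlId });
    return { ...rest, marks: [...(rest.marks || []), key] };
  });
  return { ...block, children, markDefs };
}

// Usage with the block shown above.
const oldBlock = {
  _key: "30f3d9e74d36",
  _type: "block",
  children: [
    { _key: "30f3d9e74d360", _type: "span", htmlId: "toc1", marks: [], text: "Using Polar services in short" },
  ],
  markDefs: [],
  style: "h2",
};
let n = 0;
const migrated = migrateSpanHtmlIds(oldBlock, () => `id${n++}`);
// migrated.children[0].marks is ["id0"]
// migrated.markDefs is [{ _key: "id0", _type: "html_id", html_id: "toc1" }]
console.log(JSON.stringify(migrated, null, 2));
```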
Feb 11, 2022, 7:19 AM
If you want to keep it as a regular block, you'd be better off creating an annotation. Instead of returning a block, you can return an object of type '__annotation' with your custom markDef:

return {
  _type: '__annotation',
  markDef: {
    _type: 'htmlId',
    _key: randomKey(12),
    htmlId: idVal,
  },
  children: next(el.childNodes)
}
You'd have to define this annotation in your block schema:


marks: {
  annotations: [
    {
      name: 'htmlId',
      type: 'object',
      fields: [ { name: 'htmlId', type: 'string' } ]
    }
  ]
}
Feb 11, 2022, 7:25 AM
Alternatively, you can create a custom heading block type with that property defined:

return block({
  _type: 'customHeading',
  htmlId: idVal,
  /* etc */
})
and define it in your schema:


{
  type: 'array',
  of: [
    { type: 'block' },
    { type: 'customHeading', /* your custom props */ }
  ]
}
Feb 11, 2022, 7:25 AM
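Finally, I found a working solution:

// Assumed imports (not shown in the original post); _forEach is presumably lodash's forEach.
import blockTools from '@sanity/block-tools';
import { JSDOM } from 'jsdom';
import _forEach from 'lodash/forEach';

// `schema` is the compiled block content type passed to htmlToBlocks (defined elsewhere).
const htmlToBlocks = (content) => {
  return blockTools.htmlToBlocks(content, schema, {
    parseHtml: html => new JSDOM(html).window.document,
    rules: [
      {
        deserialize(el, next, block) {
          let idVal = '';

          // Only handle <h2>; returning undefined lets block-tools try its other rules.
          if (el.tagName.toLowerCase() !== 'h2') {
            return;
          }

          // Collect the element's id attribute.
          _forEach(el.attributes, (attr) => {
            if (attr.name.toLowerCase() === 'id') {
              idVal = attr.value;
            }
          });

          if (!idVal) {
            return;
          }

          // The span references the html_id markDef through this shared key.
          const markKey = blockTools.randomKey(12);

          const result = {
            _type: 'block',
            style: 'h2',
            children: [
              {
                _type: 'span',
                text: el.textContent,
                marks: [markKey]
              }
            ],
            markDefs: [
              {
                _key: markKey,
                _type: "html_id",
                html_id: idVal
              }
            ]
          };

          return block(result);
        }
      }
    ]
  });
}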
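The rule above only handles h2. A sketch of the same block shape for any heading level, assuming the block type allows the h1 to h6 styles (Sanity's default block styles include them). headingStyle and headingBlockWithId are hypothetical helpers; the key is passed in so that a real rule could supply blockTools.randomKey(12):

```javascript
// Maps an HTML tag name to a block style ("h1".."h6"), or null otherwise.
function headingStyle(tagName) {
  const tag = String(tagName || "").toLowerCase();
  return /^h[1-6]$/.test(tag) ? tag : null;
}

// Builds the same block shape as the solution above for any heading level:
// one span whose marks reference an "html_id" markDef by its _key.
function headingBlockWithId(tagName, text, idVal, markKey) {
  const style = headingStyle(tagName);
  if (!style || !idVal) return null; // leave it to other rules / defaults
  return {
    _type: "block",
    style,
    children: [{ _type: "span", text, marks: [markKey] }],
    markDefs: [{ _key: markKey, _type: "html_id", html_id: idVal }],
  };
}

const b3 = headingBlockWithId("H3", "Using Polar services in short", "toc1", "k1");
console.log(b3.style); // h3
```

Inside deserialize(el, next, block), the rule would call headingBlockWithId(el.tagName, el.textContent, idVal, blockTools.randomKey(12)) and return block(result) when the result is not null, or undefined otherwise. Like the original rule, it flattens the heading to el.textContent, so inline formatting inside headings is not preserved.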
Feb 14, 2022, 9:54 AM
Schema:

// Icon imports assumed from react-icons (not shown in the original post).
import { GiScales } from "react-icons/gi";
import { BsHash } from "react-icons/bs";

export default {
  title: "Legal texts",
  name: "legal_text",
  type: "document",
  icon: GiScales,
  i18n: true,
  fields: [
    {
      title: "Title",
      name: "title",
      type: "string",
    },
    {
      title: "Content",
      name: "content",
      type: "array",
      of: [
        {
          type: "block",
          marks: {
            annotations: [
              {
                name: "link",
                type: "object",
                title: "Link",
                fields: [
                  {
                    title: "href",
                    name: "href",
                    type: "string"
                  }
                ]
              },
              {
                name: "html_id",
                type: "object",
                title: "html Id",
                icon: BsHash,
                fields: [
                  {
                    title: "#",
                    name: "html_id",
                    type: "string"
                  }
                ]
              }
            ]
          }
        }
      ]
    },
    {
      title: "Key",
      name: "key",
      type: "string"
    }
  ]
};
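To recreate the page, the html_id markDef has to be turned back into an id attribute when rendering. A minimal renderer sketch, assuming the two annotations in the schema above (link with href, html_id with html_id); blockToHtml and escapeHtml are hypothetical helpers, and a real project would more likely register custom mark serializers in its Portable Text renderer:

```javascript
// Escapes text and attribute values for safe inclusion in HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders one block to HTML: an html_id markDef becomes the id of the
// block element, link markDefs become <a href>. Other marks are ignored.
function blockToHtml(block) {
  const defs = new Map((block.markDefs || []).map((d) => [d._key, d]));
  let id = "";
  const inner = (block.children || [])
    .map((span) => {
      let html = escapeHtml(span.text || "");
      for (const key of span.marks || []) {
        const def = defs.get(key);
        if (!def) continue; // decorators such as "strong" are skipped here
        if (def._type === "html_id" && !id) id = def.html_id;
        if (def._type === "link") html = `<a href="${escapeHtml(def.href)}">${html}</a>`;
      }
      return html;
    })
    .join("");
  const tag = /^h[1-6]$/.test(block.style) ? block.style : "p";
  const idAttr = id ? ` id="${escapeHtml(id)}"` : "";
  return `<${tag}${idAttr}>${inner}</${tag}>`;
}

const tocHeading = {
  _type: "block",
  style: "h2",
  children: [{ _type: "span", text: "How we handle your personal information", marks: ["m1"] }],
  markDefs: [{ _key: "m1", _type: "html_id", html_id: "toc2" }],
};
console.log(blockToHtml(tocHeading));
// <h2 id="toc2">How we handle your personal information</h2>
```

The sketch hoists the first html_id found in a block's spans onto the block element, so links such as href="#toc2" point at the heading again.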
Feb 14, 2022, 9:55 AM
Thank you for your help 🙂
Feb 14, 2022, 9:56 AM
That's great! I'm glad you got this working.
user G
Thank you for helping! 🙏
Feb 14, 2022, 2:52 PM
user U
Figured it out! Nice work. I didn't think of just returning the whole block with the markDef defined 😅
Feb 14, 2022, 2:59 PM
