htmlToBlocks - A Custom Implementation in C#

39 replies
Last updated: Apr 16, 2020
Hi guys! At our company, we are trying to adopt the Portable Text format and move away from plain HTML. Our backend is in C#, so unfortunately we cannot (and would rather not) use this https://www.npmjs.com/package/@sanity/block-tools?activeTab=readme#htmltoblockshtml-blockcontenttype-options-html-deserializer package directly, but we are using it as a reference for a custom implementation in C#. I am acting as an intermediary here and may have more questions about this in the future. Our backend developer has a question about the reasoning behind:

{
    "_type": "block",
    "markDefs": [],
    "style": "normal",
    "children": []
},
These empty blocks. Under what conditions are they inserted when parsing HTML to Portable Text?

PS: Would anyone here be interested in porting (or helping to port) block-tools's htmlToBlocks to C#? ❤️
Apr 16, 2020, 11:31 AM
Hi, those blocks aren't valid. You probably need to normalize the end result. Either make them include an empty span, or just remove them.
Apr 16, 2020, 11:34 AM
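As a rough illustration of the two options suggested above (drop the empty blocks, or give them an empty span), here is a minimal sketch in plain JavaScript. The helper name normalizeEmptyBlocks and its mode parameter are made up for this example and are not part of @sanity/block-tools; it also does not add _key values, which a real normalizer may need to do.

```javascript
// Hypothetical helper, not part of @sanity/block-tools.
// mode "remove" drops text blocks that have no children;
// mode "pad" gives them a single empty span instead.
function normalizeEmptyBlocks(blocks, mode = 'remove') {
  const isEmpty = (block) =>
    block._type === 'block' &&
    (!Array.isArray(block.children) || block.children.length === 0)

  if (mode === 'remove') {
    return blocks.filter((block) => !isEmpty(block))
  }

  // Return copies so the input array and its blocks are left untouched.
  return blocks.map((block) =>
    isEmpty(block)
      ? {...block, children: [{_type: 'span', marks: [], text: ''}]}
      : block
  )
}
```

Calling normalizeEmptyBlocks(blocks) removes the empty blocks, while normalizeEmptyBlocks(blocks, 'pad') keeps them as empty paragraphs. Which of the two is right depends on whether blank paragraphs in the source HTML carry any meaning.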
Did you get those from C# or from the JS function?
Apr 16, 2020, 11:35 AM
Used the linked JS block-tools package, without custom rules.
Apr 16, 2020, 11:36 AM
How did you call it?
Apr 16, 2020, 11:36 AM
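// Assumes blockTools is the @sanity/block-tools module, JSDOM comes from jsdom,
// and blockContentType is the block content type taken from the compiled schema.
function convertHTMLtoPortableText (HTMLDoc) {
  return blockTools.htmlToBlocks(HTMLDoc, blockContentType, {
    // rules,
    parseHtml: html => new JSDOM(html).window.document
  })
}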
Apr 16, 2020, 11:37 AM
Hmm... that's strange, because that function should already normalize it.
Apr 16, 2020, 11:38 AM
HTML:
<div>
  <h3>Some text</h3>
  <div>
    <div>
      <h4>Some other text</h4>
    </div>
  </div>
</div>
Output:

[
  {
    "_type": "block",
    "markDefs": [],
    "style": "h3",
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "Some text"
      }
    ]
  },
  {
    "_type": "block",
    "markDefs": [],
    "style": "normal",
    "children": []
  },
  {
    "_type": "block",
    "markDefs": [],
    "style": "h4",
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "Some other text"
      }
    ]
  }
]
Apr 16, 2020, 11:40 AM
Yes, this is the expected output, except that the middle block should have been normalized to contain an empty span as its only child.
Apr 16, 2020, 11:41 AM
The same also happens for:
<div>
  <h3>Some text</h3>
  <h4>Some other text</h4>
</div>
Apr 16, 2020, 11:41 AM
What is the result of that?
Apr 16, 2020, 11:41 AM
the same as above
Apr 16, 2020, 11:42 AM
Hmm..I'm not getting those results...that's weird.
Apr 16, 2020, 11:46 AM
[
  {
    "_key": "randomKey0",
    "_type": "block",
    "children": [{"_key": "randomKey00", "_type": "span", "marks": [], "text": "Some text"}],
    "markDefs": [],
    "style": "h3"
  },
  {
    "_key": "randomKey1",
    "_type": "block",
    "children": [{"_key": "randomKey10", "_type": "span", "marks": [], "text": ""}],
    "markDefs": [],
    "style": "normal"
  },
  {
    "_key": "randomKey2",
    "_type": "block",
    "children": [{"_key": "randomKey20", "_type": "span", "marks": [], "text": "Some other text"}],
    "markDefs": [],
    "style": "h4"
  }
]

Apr 16, 2020, 11:46 AM
Maybe something with JSDOM?
Apr 16, 2020, 11:47 AM
I'm using "jsdom": "^12.0.0",
Apr 16, 2020, 11:47 AM
Or no...that's so weird. It should just normalize it anyway.
Apr 16, 2020, 11:50 AM
using 15.2.1 here.
Apr 16, 2020, 11:50 AM
Which version of block-tools btw?
Apr 16, 2020, 11:52 AM
It's really strange that it doesn't normalize, because the exported function should do that.
Apr 16, 2020, 11:55 AM
I think I figured it out!


const data = fs.readFileSync(path.join(__dirname,"/data/test.html"), {encoding: "utf-8"})
blockTools.htmlToBlocks(data, blockContentType, {
  parseHtml: html => new JSDOM(html).window.document
})

Apr 16, 2020, 11:55 AM
If there are line breaks in the input file, empty blocks will be added.
Apr 16, 2020, 11:56 AM
<div><h3>Some text</h3><h4>Some other text</h4></div>
meaning this does not insert empty blocks
Apr 16, 2020, 11:57 AM
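One possible workaround for the observation above is to strip the line breaks and indentation between tags before converting. This is only a sketch, assuming the empty blocks come from whitespace-only text between tags, which the thread does not confirm. The function name stripWhitespaceBetweenTags is made up for this example.

```javascript
// Hypothetical pre-processing step: removes whitespace-only text that sits
// between two tags, so "<div>\n  <h3>" becomes "<div><h3>".
// Caveat: this also drops a meaningful space between inline elements,
// e.g. "<b>a</b> <i>b</i>", so it is only safe for block-level markup.
function stripWhitespaceBetweenTags(html) {
  return html.replace(/>\s+</g, '><').trim()
}
```

The result can then be passed to htmlToBlocks in place of the raw file contents. Text inside elements, such as the space in "Some text", is left alone because the pattern only matches whitespace that directly follows a closing ">" and directly precedes an opening "<".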
Right, but it should still be normalized, so that's weird.
Apr 16, 2020, 11:58 AM
By the way, do you have any suggestions for helping someone port this to C#? I showed them the https://github.com/portabletext/portabletext specs, but that may not be enough to implement it from scratch. Do you know anyone with C# experience (maybe at Sanity) who would be willing to help out?
Apr 16, 2020, 12:08 PM
Sorry, I don't know.
Apr 16, 2020, 12:09 PM
Maybe there’s something here that can help? https://github.com/oslofjord/sanity-linq
Apr 16, 2020, 12:11 PM
Yeah, I linked that too. As I understand it, you can use it to reverse engineer things, but it is not straightforward. It would be helpful to provide a better starting point, not specifically for C# but for anyone who would like to implement a converter in another language. 🙂 I think it could increase the number of companies considering migrating to Sanity.
Apr 16, 2020, 12:13 PM
I find this the biggest pain point in the whole process. It is a breeze to write schemas and generate block content if you are already in Sanity, but getting there can be hard, especially if your current CMS only delivers HTML. 😕
Apr 16, 2020, 12:15 PM
Absolutely! Better tooling and docs around portable text is on the list.
Apr 16, 2020, 12:15 PM
I really like Sanity, and lobby for it at my company, but this is a turning point for us.
Right now we cannot let our old CMS go yet, so the current workflow is to listen to changes, convert the HTML to a more "sane" structure, save it in another database, and use a GraphQL endpoint to fetch that data.
Apr 16, 2020, 12:17 PM
The "converter" is written in C#, and we think it would be cumbersome to add JS to the backend as well, as the backend developers (understandably) prefer a single-language codebase.
Apr 16, 2020, 12:19 PM
I could do it with the given npm package in an additional step, but that would complicate the publishing pipeline, and I would bear sole responsibility for the correct data transformation, even though I am supposed to work only on the frontend (though not strictly).
Apr 16, 2020, 12:20 PM
I can see that. I guess we're a bit biased towards JS since much of our stuff is written in it. Then again, the logic behind serialization and deserialization of Portable Text should be pretty similar in any language.
Apr 16, 2020, 12:25 PM
So it's something we could take a closer look at.
Apr 16, 2020, 12:25 PM
If nothing else, to make it easier for the community to contribute tooling in their favorite languages.
Apr 16, 2020, 12:25 PM
(:javascript: 🤘)
Apr 16, 2020, 12:26 PM
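To give a feel for the point that the serialization logic is much the same in any language, here is a minimal sketch that turns blocks like the ones earlier in this thread into HTML. It is written in JavaScript only to match the other snippets here. The function names blocksToHtml and escapeHtml are made up for this example and are not the API of any Sanity package. It only handles heading and normal styles plus the strong and em decorators, and it ignores lists, annotations in markDefs, and custom block types.

```javascript
// Hypothetical sketch, not the API of any Sanity package.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

function blocksToHtml(blocks) {
  // Heading styles map straight to tag names; everything else becomes a paragraph.
  const tagForStyle = (style) => (/^h[1-6]$/.test(style) ? style : 'p')
  // Only these two decorators are handled; other marks are ignored.
  const decoratorTags = new Map([['strong', 'strong'], ['em', 'em']])

  return blocks
    .filter((block) => block._type === 'block')
    .map((block) => {
      const inner = (block.children || [])
        .map((span) =>
          (span.marks || [])
            .filter((mark) => decoratorTags.has(mark))
            .reduce(
              (html, mark) =>
                `<${decoratorTags.get(mark)}>${html}</${decoratorTags.get(mark)}>`,
              escapeHtml(span.text || '')
            )
        )
        .join('')
      const tag = tagForStyle(block.style)
      return `<${tag}>${inner}</${tag}>`
    })
    .join('')
}
```

The same walk (loop over blocks, pick a tag from the style, wrap each span in its marks) carries over to C# with little more than a dictionary and a StringBuilder. The harder direction is HTML to blocks, where the rules for nesting and whitespace have to be decided.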
I guess if we can help somehow, I can ask my boss and the back-end developer if we could contribute. (We're from Trondheim, by the way.)
Apr 16, 2020, 12:28 PM
That would be awesome! (I was actually born there, you know.)
Apr 16, 2020, 12:29 PM
I will ask around then, and get back to you soon.
Apr 16, 2020, 12:34 PM
