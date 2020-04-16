htmlToBlocks - A Custom Implementation in C#
Hi guys! At our company, we are trying to adopt the Portable Text format, moving away from plain HTML. Our backend is in C#, so unfortunately we cannot (or would rather not) use this https://www.npmjs.com/package/@sanity/block-tools?activeTab=readme#htmltoblockshtml-blockcontenttype-options-html-deserializer package directly; instead we use it as a reference for a custom implementation in C#. I'm acting as an intermediary here and may have more questions about this in the future. Our backend developer has a question about the reasoning behind this block in the output:
{ "_type": "block", "markDefs": [], "style": "normal", "children": [] },
PS: Is anyone here interested in porting (or helping to port) block-tools's htmlToBlocks to C#? ❤️
Hi, those blocks aren't valid. You probably need to normalize the end result. Either make them include an empty span, or just remove them.
Did you get those from C# or from the JS function?
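A minimal sketch of that normalization step in JavaScript, assuming the blocks are plain objects shaped like the htmlToBlocks output in this thread. The helper name `normalizeBlocks` and its `mode` argument are hypothetical, not part of @sanity/block-tools, and `_key` fields are left out for brevity. The same loop should translate directly to C#.

```javascript
// Hypothetical helper, not part of @sanity/block-tools.
// mode "pad": give empty text blocks a single empty span.
// any other mode (e.g. "remove"): drop empty text blocks entirely.
function normalizeBlocks(blocks, mode = "pad") {
  const result = [];
  for (const block of blocks) {
    const isEmptyTextBlock =
      block._type === "block" &&
      (!Array.isArray(block.children) || block.children.length === 0);
    if (!isEmptyTextBlock) {
      result.push(block);
    } else if (mode === "pad") {
      // Copy the block instead of mutating the input array's objects.
      result.push({ ...block, children: [{ _type: "span", marks: [], text: "" }] });
    }
    // Otherwise the empty block is simply not copied over.
  }
  return result;
}

// The problematic output from this thread (keys omitted).
const example = [
  { _type: "block", markDefs: [], style: "h3", children: [{ _type: "span", marks: [], text: "Some text" }] },
  { _type: "block", markDefs: [], style: "normal", children: [] },
  { _type: "block", markDefs: [], style: "h4", children: [{ _type: "span", marks: [], text: "Some other text" }] },
];

console.log(normalizeBlocks(example, "pad").length);    // 3
console.log(normalizeBlocks(example, "remove").length); // 2
```

Padding keeps the block count (and any intentional blank lines) intact, while removal gives cleaner content; which one is preferable likely depends on whether blank paragraphs carry meaning in the source HTML.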
Used the linked JS block-tools package, without custom rules.
How did you call it?
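const { JSDOM } = require('jsdom')

// blockTools (@sanity/block-tools) and blockContentType (the block array type
// from the compiled schema) are set up elsewhere.
function convertHTMLtoPortableText (HTMLDoc) {
  return blockTools.htmlToBlocks(HTMLDoc, blockContentType, {
    // rules,
    parseHtml: html => new JSDOM(html).window.document
  })
}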
Hmm... that's strange, because that function should already normalize it.
HTML:
<div> <h3>Some text</h3> <div> <div> <h4>Some other text</h4> </div> </div> </div>
[ { "_type": "block", "markDefs": [], "style": "h3", "children": [ { "_type": "span", "marks": [], "text": "Some text" } ] }, { "_type": "block", "markDefs": [], "style": "normal", "children": [] }, { "_type": "block", "markDefs": [], "style": "h4", "children": [ { "_type": "span", "marks": [], "text": "Some other text" } ] } ]
Yes, this is the expected output except that middle block should have been normalized with an empty span as children.
Also the same for
<div> <h3>Some text</h3> <h4>Some other text</h4> </div>
What is the result of that?
the same as above
Hmm... I'm not getting those results... that's weird.
[ { "_key": "randomKey0", "_type": "block", "children": [{"_key": "randomKey00", "_type": "span", "marks": [], "text": "Some text"}], "markDefs": [], "style": "h3" }, { "_key": "randomKey1", "_type": "block", "children": [{"_key": "randomKey10", "_type": "span", "marks": [], "text": ""}], "markDefs": [], "style": "normal" }, { "_key": "randomKey2", "_type": "block", "children": [{"_key": "randomKey20", "_type": "span", "marks": [], "text": "Some other text"}], "markDefs": [], "style": "h4" } ]
Maybe something with JSDOM?
I'm using "jsdom": "^12.0.0",
Or no... that's so weird. It should just normalize it anyway.
using 15.2.1 here.
Which version of block-tools btw?
It's really strange that it doesn't normalize, because the exported function should do that.
I think I figured it out!
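const fs = require("fs")
const path = require("path")
const { JSDOM } = require("jsdom")

// blockTools and blockContentType are set up elsewhere.
const data = fs.readFileSync(path.join(__dirname, "/data/test.html"), { encoding: "utf-8" })
const blocks = blockTools.htmlToBlocks(data, blockContentType, {
  parseHtml: html => new JSDOM(html).window.document
})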
If there are line breaks in the input file, empty blocks will be added.
This version, without line breaks, does not insert empty blocks:
<div><h3>Some text</h3><h4>Some other text</h4></div>
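If the extra blocks do come from whitespace-only text nodes (line breaks and indentation) between tags, one possible workaround is to collapse that whitespace before calling htmlToBlocks. This is a sketch under that assumption; `collapseInterTagWhitespace` is a hypothetical helper, not a block-tools option.

```javascript
// Hypothetical helper, not a block-tools API.
// Removes whitespace runs that contain a newline and sit between ">" and "<".
// Spaces on the same line (e.g. "<b>a</b> <i>b</i>") are kept, but
// "<b>a</b>\n<i>b</i>" would lose its separating whitespace, so only use it
// on markup where newlines between inline tags carry no meaning.
function collapseInterTagWhitespace(html) {
  return html.replace(/>\s*\n\s*</g, "><");
}

const pretty = `<div>
  <h3>Some text</h3>
  <h4>Some other text</h4>
</div>`;

const compact = collapseInterTagWhitespace(pretty);
console.log(compact);
// <div><h3>Some text</h3><h4>Some other text</h4></div>
```

It could then be applied as `htmlToBlocks(collapseInterTagWhitespace(data), blockContentType, { parseHtml: ... })`. It only hides the symptom, so normalizing the resulting blocks is still worth doing.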
Right, but it should still be normalized, so that's weird.
By the way, do you have any suggestions for helping someone port this to C#? I showed them the https://github.com/portabletext/portabletext specs, but they may not be enough to implement it from scratch. Do you know anyone with C# experience (maybe at Sanity) who would be willing to help out?
Sorry, I don't know.
Maybe there’s something here that can help? https://github.com/oslofjord/sanity-linq
Yeah, I linked that too. As I understand it, you can use it to reverse-engineer things, but it's not straightforward. Would it be possible, if not specifically for C#, to provide a better starting point for those who would like to implement a converter in other languages? 🙂 I think it could broaden the number of companies considering migrating to Sanity.
I find this the biggest pain point in the whole process. Writing schemas and generating block content is a breeze if you're already in Sanity, but getting there can be hard, especially if your current CMS only delivers HTML. 😕
Absolutely! Better tooling and docs around portable text is on the list.
I really like Sanity, and lobby for it at my company, but this is a turning point for us.
We can't let go of our old CMS yet, so the current workflow is to listen for changes, convert the HTML to a more "sane" structure, save it in another database, and fetch that data through a GraphQL endpoint.
The "converter" is written in C#, and we think it would be cumbersome to add JS to the backend as well, since the backend developers (understandably) prefer a single-language codebase.
I could do it with the npm package as an additional step, but that would complicate the publishing pipeline, and I would be solely responsible for the correctness of the data transformation, even though I'm supposed (though not strictly) to work only on the frontend.
I can see that. I guess we're a bit biased towards JS since much of our stuff is written in it. Then again, the “logic” behind serializing and deserializing Portable Text should be pretty similar in any language.
So it's something we could take a closer look at.
If nothing else, making it easier for the community to contribute tooling in their favorite languages.
(:javascript: 🤘)
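To illustrate how language-neutral that logic can be, here is a heavily simplified sketch of a deserialization core, assuming a tiny hypothetical node format instead of a real DOM. It is not how @sanity/block-tools is implemented; it only shows the basic idea of mapping block-level elements to styled blocks, flattening container elements, and skipping whitespace-only text, which is roughly what a C# port would need to start from.

```javascript
// Hypothetical node format (not a real DOM API):
//   element: { tag: "h3", children: [...] }
//   text:    { text: "..." }
const BLOCK_STYLES = {
  p: "normal", h1: "h1", h2: "h2", h3: "h3", h4: "h4", h5: "h5", h6: "h6",
};

// Concatenate all text below a node (marks are ignored in this sketch).
function collectText(node) {
  if (node.text !== undefined) return node.text;
  return (node.children || []).map(collectText).join("");
}

function makeBlock(style, text) {
  return {
    _type: "block",
    markDefs: [],
    style,
    children: [{ _type: "span", marks: [], text }],
  };
}

// Walk the tree depth-first and append one block per block-level element.
function toBlocks(node, blocks = []) {
  if (node.text !== undefined) {
    // Whitespace-only text (line breaks, indentation) never becomes a block.
    // Other loose text becomes its own "normal" block (a simplification).
    if (node.text.trim() !== "") blocks.push(makeBlock("normal", node.text));
    return blocks;
  }
  const style = BLOCK_STYLES[node.tag];
  if (style) {
    blocks.push(makeBlock(style, collectText(node)));
  } else {
    // Containers such as <div> are flattened into their children.
    for (const child of node.children || []) toBlocks(child, blocks);
  }
  return blocks;
}

// The nested example from this thread, including the line-break text nodes.
const tree = {
  tag: "div",
  children: [
    { text: "\n  " },
    { tag: "h3", children: [{ text: "Some text" }] },
    { text: "\n  " },
    {
      tag: "div",
      children: [{ tag: "div", children: [{ tag: "h4", children: [{ text: "Some other text" }] }] }],
    },
    { text: "\n" },
  ],
};

const result = toBlocks(tree);
console.log(JSON.stringify(result.map((b) => b.style))); // ["h3","h4"]
```

Marks, annotations, lists, merging adjacent inline text into a single block, and custom rules are all left out; a real port would need them.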
If we can help somehow, I can ask my boss and the backend developer whether we could contribute. (We're from Trondheim, by the way.)
That would be awesome! (I was actually born there, you know.)
I'll ask around, then, and get back to you soon.
