
Understanding Empty Blocks in HTML to Portable Text Conversion

39 replies. Last updated: Dec 1, 2025

Hi guys! At our company, we are trying to adopt the Portable Text format and move away from plain HTML. Our backend is in C#, so unfortunately we cannot (and would rather not) use the https://www.npmjs.com/package/@sanity/block-tools?activeTab=readme#htmltoblockshtml-blockcontenttype-options-html-deserializer package directly, but we are using it as a reference for a custom implementation in C#. I am acting as an intermediary here and may have more questions about this in the future. Our backend developer has a question about the reasoning behind:

{
    "_type": "block",
    "markDefs": [],
    "style": "normal",
    "children": []
},
These empty blocks. What are the requirements for these to be inserted when parsing html to Portable Text?

PS: Anyone here who would be interested in porting (or helping port) block-tools's htmlToBlocks to C#? ❤️
AI Update

Great question about empty blocks in Portable Text! Understanding when and why these get created during HTML parsing is important for implementing this correctly in C#.

When Empty Blocks Are Created

During HTML to Portable Text conversion, empty blocks are generated based on specific parsing rules. The htmlToBlocks function (note: @sanity/block-tools has been deprecated in favor of @portabletext/block-tools, but the core logic is similar) creates empty blocks in these scenarios:

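1. Block-level HTML elements with no text content, for example <p></p> or <div></div>.

2. Self-closing or void elements that represent block boundaries, for example <br> or <hr>.

3. Elements with only formatting/annotation children but no actual text, for example <p><strong></strong></p>.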

4. Structural preservation during initial parsing: The parser creates a block object for every block-level HTML element it encounters, regardless of content, to maintain a 1:1 structural mapping during the initial parse phase. This ensures the HTML's document structure is faithfully represented before any cleanup.

The Parsing Algorithm's Behavior

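The htmlToBlocks deserializer, as described above, follows this general pattern: it creates a block for every block-level element, converts that element's text into child spans, and filters out empty blocks in a cleanup pass.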

Requirements for Your C# Implementation

When implementing this in C#, you should:

During Parsing (create empty blocks):

// Pseudo-code for C# implementation
var blocks = new List<Block>();

foreach (var element in blockLevelElements)
{
    var block = new Block
    {
        _type = "block",
        _key = GenerateKey(),
        style = MapStyleFromTag(element.TagName), // h1 -> "h1", p -> "normal"
        markDefs = new List<MarkDef>(),
        children = new List<Span>()
    };

    // Process direct text children only. Nested inline elements (<strong>, <a>, ...)
    // need recursive handling that maps them to marks, which is omitted here.
    // The children list may stay empty.
    foreach (var child in element.ChildNodes)
    {
        if (child.NodeType == NodeType.Text && !string.IsNullOrWhiteSpace(child.TextContent))
        {
            block.children.Add(CreateSpan(child));
        }
    }

    blocks.Add(block); // Add even if children is empty
}
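
To make the parsing phase concrete, here is a minimal, self-contained sketch. It is not how block-tools is implemented: it assumes the input is well-formed XHTML so that the standard System.Xml.Linq parser can stand in for a real HTML parser, and the Block, Span, MapStyle and NewKey names are hypothetical. Marks and annotations are ignored; every text node simply becomes a plain span.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical minimal models; property names mirror the Portable Text JSON keys.
public sealed class Span
{
    public string _type { get; init; } = "span";
    public string _key { get; init; } = "";
    public string text { get; init; } = "";
    public List<string> marks { get; init; } = new();
}

public sealed class Block
{
    public string _type { get; init; } = "block";
    public string _key { get; init; } = "";
    public string style { get; init; } = "normal";
    public List<object> markDefs { get; init; } = new();
    public List<Span> children { get; init; } = new();
}

public static class ParseSketch
{
    // Assumed tag-to-style mapping; extend it to match your own schema.
    private static string MapStyle(string tag)
    {
        var name = tag.ToLowerInvariant();
        return name switch
        {
            "h1" or "h2" or "h3" or "h4" or "h5" or "h6" or "blockquote" => name,
            _ => "normal",
        };
    }

    // Assumption: any reasonably unique string works as a key in this sketch.
    private static string NewKey() => Guid.NewGuid().ToString("N").Substring(0, 12);

    // Phase 1: one block per child element of the root, even if it ends up with no spans.
    public static List<Block> ParseBlocks(string xhtml)
    {
        var root = XElement.Parse(xhtml);
        var blocks = new List<Block>();
        foreach (var element in root.Elements())
        {
            var block = new Block { _key = NewKey(), style = MapStyle(element.Name.LocalName) };
            // DescendantNodes() also reaches text nested inside inline elements such as <strong>.
            foreach (var textNode in element.DescendantNodes().OfType<XText>())
            {
                if (!string.IsNullOrWhiteSpace(textNode.Value))
                    block.children.Add(new Span { _key = NewKey(), text = textNode.Value });
            }
            blocks.Add(block); // added even when children is empty
        }
        return blocks;
    }
}
```

Calling ParseSketch.ParseBlocks on a wrapper such as <div><p>Hello</p><p></p></div> returns two blocks, and the second one has an empty children list. For real-world HTML you would swap XElement for a tolerant HTML parser.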

After Parsing (filter empty blocks):

// Post-processing cleanup
blocks = blocks.Where(block =>
{
    // Blocks without a children property (images, custom types) are always kept
    if (block.children == null)
        return true;

    // Drop text blocks with no children at all
    if (block.children.Count == 0)
        return false;

    // Drop text blocks whose children contain only whitespace text
    var allText = string.Concat(block.children.Select(c => c.text ?? ""));
    return !string.IsNullOrWhiteSpace(allText);
}).ToList();

Important: Keep blocks that don't have a children property at all (like image blocks or other custom block types) - only filter blocks where children exists but is empty or contains only whitespace.
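
If your blocks are held as loosely typed JSON rather than a single Block class, the same rule can be sketched with System.Text.Json.Nodes (.NET 6 or later). BlockFilter, ShouldKeep and RemoveEmptyBlocks are hypothetical names. The sketch also assumes that a non-span child (an inline custom object) counts as content, which goes slightly beyond the whitespace check described above.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

public static class BlockFilter
{
    // Reads a string property, or returns null if it is missing or not a string.
    private static string? GetString(JsonObject obj, string name) =>
        obj[name] is JsonValue v && v.TryGetValue<string>(out var s) ? s : null;

    // Returns true when the block should stay in the final output.
    public static bool ShouldKeep(JsonNode? block)
    {
        if (block is not JsonObject obj)
            return false;

        // No "children" array at all (image or other custom block types): always keep.
        if (obj["children"] is not JsonArray children)
            return true;

        // Text block: keep it if any child is a non-span inline object (assumption)
        // or a span that carries non-whitespace text.
        return children.Any(child =>
            child is JsonObject c &&
            (GetString(c, "_type") != "span" || !string.IsNullOrWhiteSpace(GetString(c, "text"))));
    }

    // Phase 2: drop empty text blocks, keep everything else in its original order.
    public static List<JsonNode?> RemoveEmptyBlocks(JsonArray blocks) =>
        blocks.Where(ShouldKeep).ToList();
}
```

Returning a List instead of a new JsonArray avoids re-parenting the nodes, since a JsonNode can only belong to one parent at a time.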

Why This Two-Phase Approach?

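The empty blocks exist as an intermediate artifact because the parser first mirrors the HTML structure one-to-one, creating a block for every block-level element, and only cleans up afterwards.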

The blocks with "children": [], "markDefs": [], and "style": "normal" are valid Portable Text structures - they represent empty paragraphs in the source HTML. However, they're typically undesirable in the final output since they'd render as empty paragraphs.
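
For the C# side, a small model can reproduce that exact shape. The sketch below is one possible mapping with System.Text.Json. The class and property names are hypothetical, and _key is left out only because the block in the question does not show one.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical span model; JsonPropertyName maps C# names to Portable Text keys.
public sealed class PortableTextSpan
{
    [JsonPropertyName("_type")] public string Type { get; set; } = "span";
    [JsonPropertyName("text")] public string Text { get; set; } = "";
    [JsonPropertyName("marks")] public List<string> Marks { get; set; } = new();
}

// Hypothetical block model; the defaults describe an empty "normal" paragraph.
public sealed class PortableTextBlock
{
    [JsonPropertyName("_type")] public string Type { get; set; } = "block";
    [JsonPropertyName("markDefs")] public List<object> MarkDefs { get; set; } = new();
    [JsonPropertyName("style")] public string Style { get; set; } = "normal";
    [JsonPropertyName("children")] public List<PortableTextSpan> Children { get; set; } = new();
}

public static class EmptyBlockDemo
{
    // Serializes a default block, which has the same four properties as the empty block in the question.
    public static string SerializeEmptyBlock() =>
        JsonSerializer.Serialize(new PortableTextBlock());
}
```

A default PortableTextBlock serializes with an empty markDefs array, style "normal" and an empty children array, so an empty paragraph needs no special casing in the model itself; whether it reaches the final output is decided by the filter step.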

Regarding Your C# Port

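For your C# implementation, I'd recommend the two-phase approach described above: create a block for every block-level element while parsing, then filter out empty text blocks while keeping blocks that have no children property.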

The key insight is that empty blocks are a byproduct of the structural parsing algorithm, not a bug, but they should generally be filtered out before returning the final Portable Text array.
