Converting Markdown to Sanity: HTML intermediary vs direct conversion?

32 replies
Last updated: May 18, 2020
yeah the markdown process is quite daunting. Working on that now myself. I’m considering generating html from the markdown, then converting that to Sanity, but I’m not sure whether that’s better than trying to convert markdown directly. This is getting me much of the way: https://www.sanity.io/guides/how-to-migrate-your-html-blog-content-from-ghost
AI Update

I hear you—the markdown migration process can definitely be tricky! You're actually on a good path, and there are multiple approaches depending on your content structure.

Markdown → HTML → Portable Text vs. Direct Markdown → Portable Text

Both approaches work, but here's what to consider:

Going the HTML route (what you're exploring):

  • Use @sanity/block-tools with the htmlToBlocks function
  • This is actually a solid approach because HTML is more structured than markdown
  • The Ghost HTML migration guide you linked uses this pattern and it's battle-tested
  • The package handles basic rich text formatting (headings, paragraphs, lists) automatically

Direct markdown route:

  • You'd need to parse markdown to MDAST (Markdown Abstract Syntax Tree) using something like remark
  • Then write custom transformation logic to convert MDAST to Portable Text blocks
  • There isn't a well-maintained "markdown-to-portable-text" package in the ecosystem
  • The @portabletext packages mostly work in reverse (Portable Text → other formats)
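
To give a sense of what that custom transformation logic involves, here is a minimal sketch that walks a hand-written MDAST tree and emits Portable Text blocks. It assumes only paragraphs, headings, bold, italic, and inline code; lists, links, and images are left out. The helper names (key, toSpans, mdastToBlocks) are made up for illustration, and in practice the tree would come from a parser such as remark rather than being written by hand.

```javascript
const {randomUUID} = require('crypto')

// Short random key; Portable Text expects a unique _key on blocks and spans.
const key = () => randomUUID().replace(/-/g, '').slice(0, 12)

// MDAST inline node types mapped to Portable Text decorator marks.
const MARKS = {strong: 'strong', emphasis: 'em'}

// Flatten MDAST inline nodes into spans, carrying the active marks down.
function toSpans(nodes, marks = []) {
  return nodes.flatMap((node) => {
    if (node.type === 'text') {
      return [{_type: 'span', _key: key(), text: node.value, marks: [...marks]}]
    }
    if (node.type === 'inlineCode') {
      return [{_type: 'span', _key: key(), text: node.value, marks: [...marks, 'code']}]
    }
    if (MARKS[node.type]) {
      return toSpans(node.children, [...marks, MARKS[node.type]])
    }
    // Unhandled inline nodes (links, images, ...) are skipped in this sketch.
    return []
  })
}

// Convert top-level MDAST paragraphs and headings into Portable Text blocks.
function mdastToBlocks(root) {
  return root.children
    .filter((node) => node.type === 'paragraph' || node.type === 'heading')
    .map((node) => ({
      _type: 'block',
      _key: key(),
      style: node.type === 'heading' ? `h${node.depth}` : 'normal',
      markDefs: [],
      children: toSpans(node.children),
    }))
}

// Hand-written MDAST for "# Hello" followed by "Some **bold** text"
const tree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Hello'}]},
    {
      type: 'paragraph',
      children: [
        {type: 'text', value: 'Some '},
        {type: 'strong', children: [{type: 'text', value: 'bold'}]},
        {type: 'text', value: ' text'},
      ],
    },
  ],
}

const blocks = mdastToBlocks(tree)
console.log(JSON.stringify(blocks, null, 2))
```

Even this reduced version needs explicit decisions about keys, marks, and unsupported nodes, which is the work the HTML route hands off to existing tooling.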

My recommendation: Stick with the HTML approach you're exploring. Here's why:

  1. Better tooling support: @sanity/block-tools has proven utilities for HTML → Portable Text conversion
  2. Easier to handle complex content: Markdown parsers vary in how they handle edge cases, but HTML is more standardized
  3. Custom deserialization rules: You can define custom rules to handle specific HTML elements like <figure> or <img> tags and transform them into custom block types
  4. The Ghost guide pattern works: Convert markdown → HTML (using marked, remark-html, or similar), then use htmlToBlocks

Quick implementation pattern:

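import {htmlToBlocks} from '@sanity/block-tools'
import {JSDOM} from 'jsdom'
import {marked} from 'marked' // or your preferred markdown parser

// yourMarkdownContent: the markdown source as a string
// blockContentType: the compiled block content (array) type from your schema

// Convert markdown to HTML first
const html = marked(yourMarkdownContent)

// Then to Portable Text; parseHtml supplies a DOM, since Node has no built-in DOMParser
const blocks = htmlToBlocks(html, blockContentType, {
  parseHtml: (htmlString) => new JSDOM(htmlString).window.document
})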
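
A custom deserialization rule for the pattern above could look roughly like the sketch below. It follows the deserialize(el, next, block) rule shape that htmlToBlocks accepts through its rules option. The figureImage type and its url and alt fields are hypothetical names, so swap in whatever your schema defines. Markdown images usually end up inside a paragraph in the generated HTML, so treat this as a starting point that may need extra handling.

```javascript
// A rule object in the shape htmlToBlocks accepts via its `rules` option.
// The `figureImage` block type and its `url`/`alt` fields are hypothetical.
const imageRule = {
  deserialize(el, next, block) {
    // Only handle <img>; returning undefined lets other rules try the element.
    if (!el.tagName || el.tagName.toLowerCase() !== 'img') {
      return undefined
    }
    return block({
      _type: 'figureImage',
      url: el.getAttribute('src'),
      alt: el.getAttribute('alt') || '',
    })
  },
}

// Usage (not run here):
// htmlToBlocks(html, blockContentType, {parseHtml, rules: [imageRule]})
```

Keeping the source URL on the block at this stage leaves the actual asset upload for a later pass.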

Pro tips:

  • Test with your most complex markdown first (nested lists, code blocks, images, inline styles)
  • You may need custom deserialization rules for specific markdown features
  • Keep your original markdown files as backup during migration
  • Remember that htmlToBlocks is synchronous, so you'll need to handle image uploads and asset references in a separate post-processing step
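
One possible shape for that post-processing step is sketched below. It assumes your deserialization rule left placeholder blocks carrying the original image URL (the figureImage type and url field are hypothetical) and rewrites them to use the _sanityAsset convention, which sanity dataset import resolves by fetching and uploading the file.

```javascript
// Replace placeholder image blocks with image objects that the import tool
// can resolve. `figureImage` and its `url` field are hypothetical names for
// whatever your custom deserialization rule emitted.
function attachImageAssets(blocks) {
  return blocks.map((block) => {
    if (block._type !== 'figureImage' || !block.url) {
      return block
    }
    return {
      _type: 'image',
      // Preserve the array key if the placeholder already had one.
      ...(block._key ? {_key: block._key} : {}),
      // `image@<url>` asks `sanity dataset import` to fetch and upload the file.
      _sanityAsset: `image@${block.url}`,
    }
  })
}
```

If you upload assets yourself through the client instead, the same loop could swap in an asset reference rather than the _sanityAsset string.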

The two-step process (markdown → HTML → Portable Text) gives you more control and better debugging when things don't convert perfectly. Good luck with your migration!

Great blog post. I was looking at making a lot of very similar decisions, but they took it much further. I wouldn’t mind helping out if you are making your converter open source
I’d be happy to share once I get it working. Still trying to figure out how to run a node script with modules correctly 🤦
SyntaxError: Cannot use import statement outside a module
this sorta thing
ahh just use
const myLib = require('my-package')
when I do that it eventually fails on importing things inside sanity libraries, etc
You could also try running your script with
sanity exec path/to/your-script.js
oh, interesting, I’ll give that a shot!
That could fix import errors since it lets you use the `import lib from "package"` syntax
you’re a lifesaver, that did it
I mean I now have bugs to fix but it’s working
Haha glad I could help
In that guide, can you figure out what to do with `ndjson` once you’ve populated `ndSerialize`? I’m not understanding how I get that to a file from here
I just used
sanity dataset import myfile.ndjson my-dataset-name
I can’t figure out how to actually write out the ndjson from the processing script though, I’m sure something trivial is escaping me
I’m not sure where you’re getting `ndSerialize` from. What I did was read the `md` file with node, extract out the content, and write to a `.ndjson` file. From there I used `sanity dataset import` to upload the `.ndjson` file
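
For anyone following along, the "read the md file and extract the content" step might look something like this rough sketch. It assumes flat key: value front matter between --- fences, and the post type and markdownBody field are hypothetical names. A real YAML parser would be the safer choice for anything nested. In a real script the source string would come from fs.readFileSync.

```javascript
// Split a markdown file's text into front matter fields and body.
// Assumes flat `key: value` front matter between `---` fences; use a real
// YAML parser for nested values.
function parseMarkdownFile(source) {
  const match = source.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/)
  if (!match) {
    return {data: {}, body: source}
  }
  const data = {}
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(':')
    if (idx === -1) continue
    data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }
  return {data, body: match[2]}
}

// Build a Sanity document; the `post` type and field names are hypothetical.
function toDocument(source) {
  const {data, body} = parseMarkdownFile(source)
  return {_type: 'post', title: data.title || '', markdownBody: body.trim()}
}

const sample = '---\ntitle: Hello world\ndate: 2020-05-18\n---\n\nFirst paragraph.\n'
console.log(toDocument(sample))
```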
oh sorry, in this guide they generate the ndjson file. I have a similar case where I have a handful of preprocessing steps I want to do so I have a nice clean ndjson at the end. That post leaves out what to do at the very end to get the file. I can ask in the group here if you aren’t sure, though
Oh from the article! Let me look again and see
What do you mean by “get the file”? They also used `sanity dataset import` after generating the ndjson file
I was expecting the script to at the very end write to disk, somehow
it’s not clear it actually does that
I see. They don’t actually show any function calls, they just make the functions
const ndSerialize = ndjson.serialize();
let ndJsonLines = '';
ndSerialize.on('data', (line) => (ndJsonLines += `${line}`));
const writeJson = (obj) => ndSerialize.write(obj);
and you call `writeJson`, I get that part, it seems like it’s loading `ndSerialize` with each line, but it never seems like that’s closed out and written to disk
I see, you’re on the right track but your logic is off. They are using the `processPost` function for every object in their `mux-blog.ghost.json` file
yep, I’m working on a json export I got from gatsby’s graphql so I don’t have to think too hard about the logic of finding all my posts and associated files
sure, but where does the ndjson end up? I basically am doing this script, but like is there a file that should be output or?
You have to decide the path and create the stream to write to a file
ahhhhhh ok. They write to `ndJsonLines` 🤦
Exactly
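
For reference, the missing "write it to disk" step can be sketched with Node's built-in fs module alone. The function names and output path here are hypothetical and this is not the guide's code. With the guide's snippet, the accumulated ndJsonLines string could be passed to fs.writeFileSync in the same way once all posts have been processed.

```javascript
const fs = require('fs')

// Serialize documents as newline-delimited JSON: one object per line.
function toNdjson(docs) {
  return docs.map((doc) => JSON.stringify(doc) + '\n').join('')
}

// Write the serialized documents to a path of your choosing.
function writeNdjson(filePath, docs) {
  fs.writeFileSync(filePath, toNdjson(docs), 'utf8')
}

// Example usage; the documents and output file name are hypothetical:
// writeNdjson('posts.ndjson', [{_type: 'post', title: 'Hello'}])
```

The resulting file is what gets passed to sanity dataset import.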
that’s all the final puzzle pieces, thank you, now to figure out how to handle the markdown body cleanly
Awesome! Let me know when you work through it!
Hey user M, any progress?
yeah made good progress on the import script last week, about the only thing left I have is the markdown/html/body so I think I’ll be tackling that this week
