Migrating from Ghost: Problem Solving

32 replies
Last updated: May 18, 2020
yeah the markdown process is quite daunting. Working on that now myself. I’m considering generating HTML from the markdown, then importing to Sanity, but not quite sure if that’s better than trying to convert markdown directly. This is getting me much of the way: https://www.sanity.io/guides/how-to-migrate-your-html-blog-content-from-ghost
AI Update

Great question! Converting from your markdown source is generally the better path, rather than treating rendered HTML (like a Ghost export) as your starting point. Here's why and how to approach it:

Starting from markdown preserves the semantic structure better and avoids the presentational quirks that creep into exported HTML. The pipeline below does pass through HTML, but only as a clean, programmatically generated in-memory intermediate, so it doesn't reintroduce those edge cases.

The Best Tools for the Job

@sanity/block-tools (now also published as @portabletext/block-tools) is your primary library here. It includes htmlToBlocks which works with the Ghost guide you linked, but more importantly for your case, you can use it with remark/rehype to process markdown directly.

The typical workflow looks like:

  1. Markdown → MDAST (using remark-parse)
  2. MDAST → HAST (using remark-rehype)
  3. HAST → HTML (using rehype-stringify)
  4. HTML → Portable Text (using htmlToBlocks from @sanity/block-tools)

Here's a basic implementation:

import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'
import {htmlToBlocks} from '@sanity/block-tools'
import {Schema} from '@sanity/schema'
import {JSDOM} from 'jsdom'

// Load your Sanity schema
const defaultSchema = Schema.compile({
  name: 'default',
  types: [/* your schema types */]
})

// Grab the compiled Portable Text (block array) type from the schema.
// If it lives as a field on a document type instead, use e.g.:
//   defaultSchema.get('post').fields.find((f) => f.name === 'body').type
const blockContentType = defaultSchema.get('blockContent') // or whatever your PT type is called

async function markdownToPortableText(markdown) {
  // Convert markdown to HTML via remark
  const file = await unified()
    .use(remarkParse)
    .use(remarkRehype)
    .use(rehypeStringify)
    .process(markdown)
  
  const html = String(file)
  
  // Convert HTML to Portable Text blocks
  const blocks = htmlToBlocks(html, blockContentType, {
    parseHtml: (htmlString) => new JSDOM(htmlString).window.document
  })
  
  return blocks
}
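
From there, each converted post just needs to be wrapped in a document object for import. Here's a minimal sketch of that step; the post type, _id, and field names are assumptions, so match them to your own schema:

const blocks = await markdownToPortableText('# Hello\n\nSome **markdown** content.')

// Hypothetical document shape – adjust _type and fields to your schema
const doc = {
  _id: 'imported-post-hello',
  _type: 'post',
  title: 'Hello',
  body: blocks,
}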

Why Starting from Markdown Is Better

As a general content-migration principle, migrations should be incremental and idempotent. Converting from your markdown source:

  • Preserves semantic meaning better (headings, lists, emphasis)
  • Avoids HTML-specific quirks like inline styles or presentational markup
  • Gives you cleaner Portable Text that's easier to query with GROQ
  • Reduces transformation complexity and potential bugs

The @sanity/block-tools package handles basic rich text formatting like headings, paragraphs, and lists automatically. You can also add custom deserialization rules to handle specific markdown patterns that need special treatment (like code blocks, images, or custom syntax).
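
For instance, here's a sketch of a custom rule (adapted from the block-tools README) that turns <pre><code> elements into a standalone code object. It assumes your schema defines a code object type (for example via @sanity/code-input); adjust the _type and fields to match:

const blocks = htmlToBlocks(html, blockContentType, {
  parseHtml: (htmlString) => new JSDOM(htmlString).window.document,
  rules: [
    {
      // Custom rule: deserialize <pre><code>…</code></pre> into a code block object
      deserialize(el, next, block) {
        if (!el.tagName || el.tagName.toLowerCase() !== 'pre') {
          return undefined // let the default rules handle everything else
        }
        const code = el.children[0]
        const childNodes =
          code && code.tagName.toLowerCase() === 'code' ? code.childNodes : el.childNodes
        let text = ''
        childNodes.forEach((node) => {
          text += node.textContent
        })
        // block() returns this as its own top-level block instead of
        // appending it to the children of a default text block
        return block({_type: 'code', code: text})
      },
    },
  ],
})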

Alternative: Keep Markdown Native

If you want to keep markdown as-is in Sanity and render it later, consider sanity-plugin-markdown. This stores markdown natively and provides a markdown editor in the Studio. However, you lose the powerful querying benefits of Portable Text.
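
If you go that route, a minimal setup sketch based on the plugin's documented usage (the post type and field names below are placeholders):

// sanity.config.js – register the plugin so the `markdown` type is available
import {defineConfig, defineType, defineField} from 'sanity'
import {markdownSchema} from 'sanity-plugin-markdown'

export default defineConfig({
  projectId: 'your-project-id', // placeholder
  dataset: 'production',
  plugins: [markdownSchema()],
  schema: {
    types: [
      defineType({
        name: 'post',
        type: 'document',
        fields: [
          defineField({name: 'title', type: 'string'}),
          // Stored as a raw markdown string, edited with the plugin's editor
          defineField({name: 'body', type: 'markdown'}),
        ],
      }),
    ],
  },
})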

The guide you linked is solid for HTML sources (like Ghost exports), but since you're starting with markdown, the remark/rehype ecosystem gives you a mature, well-maintained foundation for your migration scripts. The intermediate HTML step is quick and leverages the robust htmlToBlocks function that already handles edge cases well.

Great blog post. I was looking at making a lot of very similar decisions, but they took it much further. I wouldn’t mind helping out if you are making your converter open source
I’d be happy to share once I get it working. Still trying to figure out how to run a node script with modules correctly 🤦
SyntaxError: Cannot use import statement outside a module
this sorta thing
ahh just use
const myLib = require('my-package')
when I do that it eventually fails on importing things inside sanity libraries, etc
You could also try running your script with
sanity exec path/to/your-script.js
oh, interesting, I’ll give that a shot!
That could fix import errors since it lets you use the
import lib from "package"
syntax
you’re a lifesaver, that did it
I mean I now have bugs to fix but it’s working
Haha glad I could help
In that guide, can you figure out what to do with ndjson once you’ve populated ndSerialize? I’m not understanding how I get that to a file from here
I just used
sanity import myfile.ndjson my-dataset-name
I can’t figure out how to actually write out the ndjson from the processing script though, I’m sure something trivial is escaping me
I’m not sure where you’re getting ndSerialize from. What I did was read the md file with node, extract out the content, and write to a .ndjson file. From there I used sanity import to upload the .ndjson file
oh sorry, in this guide they generate the ndjson file. I have a similar case where I have a handful of preprocessing I want to do so I have a nice clean ndjson at the end. That post leaves out what to do at the very end to get the file. I can ask in the group here if you aren’t sure, though
Oh from the article! Let me look again and see
What do you mean by “get the file”? They also used sanity dataset import after generating the ndjson file
I was expecting the script to at the very end write to disk, somehow
it’s not clear it actually does that
I see. They don’t actually show any function calls, they just make the functions
const ndSerialize = ndjson.serialize();
let ndJsonLines = '';
ndSerialize.on('data', (line) => (ndJsonLines += `${line}`));
const writeJson = (obj) => ndSerialize.write(obj);
and you call writeJson, I get that part. It seems like it’s loading ndSerialize with each line, but it never seems like that’s closed out and written to disk
I see, you’re on the right track but your logic is off. They are using the processPost function for every object in their mux-blog.ghost.json file
yep, I’m working on a json export I got from gatsby’s graphql so I don’t have to think too hard about the logic of finding all my posts and associated files
sure, but where does the ndjson end up? I basically am doing this script, but like is there a file that should be output or?
You have to decide the path and create the stream to write to a file
ahhhhhh ok. They write to ndJsonLines 🤦
Exactly
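For reference, one way to get those lines onto disk is to pipe the serializer straight into a file stream rather than only accumulating them in the ndJsonLines string. A minimal sketch, assuming the same ndjson package the guide uses (newer releases call serialize() stringify()) and a hypothetical posts array plus a processPost that calls writeJson for each post:
import fs from 'fs'
import ndjson from 'ndjson'

const ndSerialize = ndjson.serialize()
const out = fs.createWriteStream('production.ndjson')

// Every line the serializer emits goes straight into the output file
ndSerialize.pipe(out)
const writeJson = (obj) => ndSerialize.write(obj)

// Placeholders: processPost should build the Sanity document for one post
// and call writeJson(doc) when it's done
posts.forEach((post) => processPost(post))
ndSerialize.end()

out.on('finish', () => console.log('wrote production.ndjson'))
// afterwards: sanity dataset import production.ndjson my-dataset-name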
that’s all the final puzzle pieces, thank you, now to figuring out how to handle the markdown body cleanly
Awesome! Let me know when you work through it!
Hey user M, any progress?
yeah made good progress on the import script last week, about the only thing left I have is the markdown/html/body so I think I’ll be tackling that this week
