How to import a Jekyll blog to Sanity?

32 replies
Last updated: Aug 18, 2022
To user Y or anyone who knows: Is there a recommended way to import a Jekyll blog to Sanity? I just tried the “markdown-to-sanity” plugin, but it gave me an error.
AI Update

I found some helpful information about importing Jekyll blogs to Sanity! While there isn't a specific official "markdown-to-sanity" plugin in Sanity's documentation, there are several proven approaches you can use to migrate your Jekyll content.

The best solution is to use @portabletext/block-tools, which is designed specifically for converting HTML to Sanity's Portable Text format. Since Jekyll posts are markdown with YAML front matter, here's the workflow:

  1. Parse the markdown files - Use gray-matter to extract front matter (title, date, etc.) and the markdown body
  2. Convert markdown to HTML - Use a library like marked
  3. Convert HTML to Portable Text - Use @portabletext/block-tools

Here's a basic migration script structure:

import {JSDOM} from 'jsdom'
import {htmlToBlocks} from '@portabletext/block-tools'
import matter from 'gray-matter'
import {marked} from 'marked'
import fs from 'fs'

// Parse your Jekyll post
const fileContent = fs.readFileSync('your-post.md', 'utf-8')
const {data: frontMatter, content: markdown} = matter(fileContent)

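// Convert markdown to HTML
const html = marked.parse(markdown)

// Convert HTML to Portable Text blocks.
// blockContentSchema is not defined in this snippet: it must be the compiled
// block content type from your own schema (for example, the type of a post's body field).
const blocks = htmlToBlocks(html, blockContentSchema, {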
  parseHtml: (html) => new JSDOM(html).window.document,
  rules: [
    // Add custom rules for images, code blocks, etc.
    {
      deserialize(node, next, block) {
        if (node.nodeName.toLowerCase() === 'img') {
          return block({
            _type: 'image',
            url: node.getAttribute('src'),
            alt: node.getAttribute('alt'),
          })
        }
        return undefined
      }
    }
  ]
})
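
The script above stops once it has Portable Text blocks. As a rough sketch of the remaining step, the front matter and blocks could be assembled into a document object ready to be written as one ndjson line. It relies on Jekyll's `YYYY-MM-DD-title.md` filename convention; the helper names and every field name (`post`, `title`, `slug`, `publishedAt`, `tags`, `body`) are assumptions for illustration and would need to match your own schema.

```javascript
// Split a Jekyll filename such as "2019-03-14-my-first-post.md" into date and slug.
// Returns null for files that do not follow the post naming convention.
function parseJekyllFilename(filename) {
  const match = /^(\d{4}-\d{2}-\d{2})-(.+)\.(md|markdown|html)$/.exec(filename)
  if (!match) return null
  return {date: match[1], slug: match[2]}
}

// Front matter tags may be a YAML list or a space-separated string.
function toList(value) {
  if (Array.isArray(value)) return value
  if (typeof value === 'string') return value.split(/\s+/).filter(Boolean)
  return []
}

// Assemble a plain document object. All field names are assumptions
// about the target schema, not something the import tool requires.
function toPostDocument(filename, frontMatter, blocks) {
  const parsed = parseJekyllFilename(filename)
  if (!parsed) throw new Error(`Not a Jekyll post filename: ${filename}`)
  return {
    // A deterministic _id means re-running the script yields the same ID for the same post.
    _id: `jekyll-${parsed.date}-${parsed.slug}`,
    _type: 'post',
    title: frontMatter.title,
    slug: {_type: 'slug', current: parsed.slug},
    // Prefer the front matter date, fall back to the date in the filename.
    publishedAt: new Date(frontMatter.date || parsed.date).toISOString(),
    tags: toList(frontMatter.tags),
    body: blocks,
  }
}
```

Each document would then be serialized with `JSON.stringify` onto its own line of the ndjson file. Slugs containing characters that are not valid in a document ID would need extra cleanup, which this sketch does not attempt.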

Avoiding Import Errors

A common issue when running migration scripts is module import errors. The solution is to run your script using Sanity's CLI:

sanity exec path/to/your-migration-script.js --with-user-token

This lets you use ES6 import syntax and access Sanity's libraries directly without configuration headaches.

Handling Images

For images in your posts, you'll need to:

  1. Upload them to Sanity's asset pipeline using the client
  2. Replace URLs with Sanity asset references
  3. Use an in-memory cache to avoid re-uploading duplicates

The "uploading assets efficiently" guide covers this in detail.
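
The in-memory cache from step 3 can be sketched without any Sanity-specific code. In the sketch below, `uploadFn` is a hypothetical stand-in for whatever actually uploads a file (in a real script it might fetch the image and pass it to the client's asset upload), and the `alt` field on the image value is an assumption about the schema.

```javascript
// Wrap any async uploader so each source URL is uploaded at most once.
// `uploadFn` is hypothetical: it takes a URL and resolves to the uploaded asset document.
function createCachedUploader(uploadFn) {
  const cache = new Map() // source URL -> Promise of the uploaded asset

  return function uploadOnce(url) {
    if (!cache.has(url)) {
      // Store the promise itself so concurrent calls for the same URL share one upload.
      cache.set(url, uploadFn(url))
    }
    return cache.get(url)
  }
}

// Build an image value that points at an uploaded asset by reference (step 2).
// The `alt` field is an assumption about the schema.
function toImageValue(asset, alt) {
  return {
    _type: 'image',
    asset: {_type: 'reference', _ref: asset._id},
    alt,
  }
}
```

Note that a failed upload stays in the cache as a rejected promise in this version, so a retry strategy would need to remove the entry first.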

About That Plugin Error

If you can share the specific error message you're getting with the "markdown-to-sanity" plugin, I can help troubleshoot it! It might be a versioning issue, configuration problem, or the plugin might be outdated. The @portabletext/block-tools approach above is the current recommended method and is actively maintained by the Sanity team.

Original thread
What error?
That seems like a good old bug
It seems to come from the fact you have <figure> elements without a <figcaption>, which is why it fails. Arguably, the library should be less strict on this.
Someone should tell its creator 🧐
😅
Do you want me to PR it?
for sure – it would be appreciated!
I’ll try to get it onto npm soonish
Sweet, thanks y’all!
Just published 0.1.1 with Kitty’s fix. Try it out and see if it works better!
Yes, it produced the file! In formatting it for importing, my images are within text blocks like:

{"_type":"block","markDefs":[],"style":"normal","children":[{"_type":"image","_sanityAsset":{"src":"url","alt":"Book Gift Packaging from Amazon"}}]}
How should we handle the alt text attribute?
There’s a lot of regex work to do here.
I have a separate schema type called “graphicImage” and I was thinking maybe it’d work like this?:

{"_type":"block","markDefs":[],"style":"normal","children":[{"_type":"graphicImage","_sanityAsset":"image@url","alt":"Book Gift Packaging from Amazon"}]}
Why regex, out of curiosity?
☝️ I also wonder about this. Since you have JSON you can for example use GROQ (your mileage might vary).
This won’t be the exact query, since we only have the block object, but hopefully you can see how it works.
https://groq.dev/oU3Sy68yC8xdpcZOaOeEKa
I thought I needed to format the ndjson file for easier importing, via guidance at:
https://www.sanity.io/docs/importing-data
Yup! With groq-cli you can input and output ndjson too
For example
cat production.ndjson|groq '*[]{...,children[]{...,_type == "image" => {...,"_type": "graphicImage"}}}' -n > production_edited.ndjson

So I have approximately 650 posts like this in a file:
https://gist.github.com/mvellandi/0228c9f53e0da2f5d25ddecc60b8a1c4
And you’re recommending I use the groq-cli to format it like your example?
Yes, pretty much. Just worried that trying to do this with Regex will be a bit painful. Also, it’s already structured, so you shouldn’t need to. I got this to work at least:
cat production.ndjson|groq '*[_type == "post"]{...,body{...,children[]{..., _type == "img" => {...,"_type": "graphicImage"}}}}' -n > production_edited.ndjson

Okay, so do I need to then still rename the “asset” key to “_sanityAsset”? I’m supposing I should be trying to build “graphicImage” types and its structure, keys/values.
and convert urls to image@weburl or image@filepath
The _sanityAsset thing is a way for our import tool to automatically upload those photos for you; it will automatically fix the object to be asset._ref when it imports. If that makes sense.
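
To make the image@weburl / image@filepath convention concrete, here is a small sketch of a helper that builds the string stored under _sanityAsset. The function name and its validation rules are illustrative assumptions; only the `image@<url>` and `image@file:///<path>` shapes come from the import format itself.

```javascript
// Build the value the import tool looks for under `_sanityAsset`.
// `kind` is "image" or "file"; `location` is an http(s)/file URL or an absolute POSIX path.
// The helper name and its checks are assumptions made for this example.
function toSanityAsset(location, kind = 'image') {
  if (kind !== 'image' && kind !== 'file') {
    throw new TypeError(`Unknown asset kind: ${kind}`)
  }
  // Already a URL: use it as-is.
  if (/^(https?|file):\/\//i.test(location)) {
    return `${kind}@${location}`
  }
  // Absolute local path: turn it into a file:// URL.
  if (location.startsWith('/') && !location.startsWith('//')) {
    return `${kind}@file://${location}`
  }
  throw new Error(`Expected a URL or an absolute path, got: ${location}`)
}

// Example of an object as it might appear in the ndjson before import:
const beforeImport = {
  _type: 'image',
  _sanityAsset: toSanityAsset('https://example.com/photo.jpg'),
}
```

After the import runs, the `_sanityAsset` key is replaced by an `asset` reference to the uploaded file, as described above.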
Ah, yes. But that can also be done in GROQ. Two sec.
Hmm. I’m realising that your images are inline in text blocks. It will work, but you probably would want them in their own blocks in between paragraphs?
I wish we had a simpler abstraction for HTML->Portable Text. But Markdown/HTML can be so messy/unstructured.
Yes, my use case is articles with inline body images
aight
I think maybe this should do the trick?
Note: I hoisted the alt up to the image object (since asset will be overwritten in the import)

cat production.ndjson|groq '*[_type == "post"]{...,body{...,children[]{..., _type == "img" => {...,"_type": "graphicImage", "_sanityAsset": "image@" + asset.src,"alt": asset.alt, "asset": null}}}}' -n > production_edited.ndjson
The GROQ query:

*[_type == "post"]{
  ...,
  body{
    ...,
    children[]{
      ..., 
      _type == "img" => {
        ...,
        "_type": "graphicImage", 
        "_sanityAsset": "image@" + asset.src,
        "alt": asset.alt,
        "asset": null
      }
    }
  }
}
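
For anyone who would rather run this as a script than through groq-cli, the same rewrite can be approximated in plain JavaScript. This is a sketch that mirrors the query above under the same assumptions about the data (post documents with a body array whose children carry `asset.src` and `asset.alt`); the function names are made up for the example. Two differences from the query: it drops the `asset` key instead of setting it to null, and it passes non-post documents through unchanged rather than filtering them out.

```javascript
// Rewrite one child of a block, mirroring the `_type == "img" => {...}` branch.
function rewriteChild(child) {
  if (child._type !== 'img' || !child.asset || !child.asset.src) return child
  const {asset, ...rest} = child
  return {
    ...rest,
    _type: 'graphicImage',
    _sanityAsset: 'image@' + asset.src,
    alt: asset.alt, // hoisted, because `asset` is overwritten during import
  }
}

// Apply the rewrite to every block in a post's body.
function rewritePost(doc) {
  if (doc._type !== 'post' || !Array.isArray(doc.body)) return doc
  return {
    ...doc,
    body: doc.body.map((block) =>
      Array.isArray(block.children)
        ? {...block, children: block.children.map(rewriteChild)}
        : block
    ),
  }
}

// Transform a whole ndjson string: one JSON document per non-empty line.
function rewriteNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.stringify(rewritePost(JSON.parse(line))))
    .join('\n')
}
```

The result of `rewriteNdjson` can be written back to a file such as production_edited.ndjson and imported as before.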

Well, this approach is certainly nicer than regex. So this query of yours should conform with this:

export default {
  name: "graphicImage",
  type: "object",
  title: "image",
  fields: [
    {
      name: "image",
      type: "image",
      title: "Image",
      options: {
        hotspot: true,
      },
    },
    {
      name: "alt",
      type: "string",
      title: "Alt Text",
    },
  ],
};
Takk skal du ha (Thanks). I’ll advise when everything’s working 🙂
1. With the markdown-to-sanity tool, is there a possibility of converting additional front matter fields like categories and tags?
2. How should one handle HTML href links from posts to other posts? I’m assuming these would turn into references…somehow. If it’s just for a website, I’m okay with hardcoding the URL path. Just curious.
3. Are there any plausible reasons why data imports would time out other than size? After trying to import ~630 records with 8 MB of assets and getting timeouts after 11 min, I split up the ndjson file into 5 files. I’m still getting the following error with just one file of 120 records:
Error: connect ETIMEDOUT 52.46.128.194:80
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1161:16)
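
The file splitting mentioned in question 3 can be scripted rather than done by hand. The sketch below only shows the chunking step; the function name and batch size are arbitrary choices for the example, and it does not address whatever is causing the timeout itself.

```javascript
// Split an ndjson string into chunks of at most `maxDocs` documents each.
// Each returned chunk is itself valid ndjson (one JSON document per line).
function splitNdjson(text, maxDocs) {
  if (!Number.isInteger(maxDocs) || maxDocs < 1) {
    throw new RangeError('maxDocs must be a positive integer')
  }
  const lines = text.split('\n').filter((line) => line.trim() !== '')
  const chunks = []
  for (let i = 0; i < lines.length; i += maxDocs) {
    chunks.push(lines.slice(i, i + maxDocs).join('\n') + '\n')
  }
  return chunks
}
```

Each chunk can then be written to its own file and imported separately, which also makes it easier to see which batch a failure belongs to.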
