Importing a Jekyll blog to Sanity using GROQ queries and graphicImage schema type

32 replies

Last updated: Aug 18, 2022

user Y

Or anyone who knows: is there a recommended way to import a Jekyll blog to Sanity? I just tried the “markdown-to-sanity” plugin, but it gave me an error.

Aug 17, 2022, 2:08 PM

What error?

Aug 17, 2022, 2:08 PM

That seems like a good old bug

Aug 17, 2022, 2:13 PM

It seems to come from the fact you have <figure> elements without a <figcaption>, which is why it fails. Arguably, the library should be less strict on this.

Aug 17, 2022, 2:14 PM

Someone should tell its creator 🧐

Aug 17, 2022, 2:15 PM

😅

Aug 17, 2022, 2:15 PM

Do you want me to PR it?

Aug 17, 2022, 2:15 PM

for sure – it would be appreciated!

Aug 17, 2022, 2:17 PM

Done: https://github.com/kmelve/markdown-to-sanity/pull/19

Aug 17, 2022, 2:18 PM

I’ll try to get it onto npm soonish

Aug 17, 2022, 2:27 PM

Sweet, thanks y’all!

Aug 17, 2022, 2:31 PM

Just published 0.1.1 with Kitty’s fix. Try it out and see if it works better!

Aug 17, 2022, 3:21 PM

Yes, it produced the file! While formatting it for importing, I noticed my images are inside text blocks, like this:

{"_type":"block","markDefs":[],"style":"normal","children":[{"_type":"image","_sanityAsset":{"src":"url","alt":"Book Gift Packaging from Amazon"}}]}

How should we handle the alt text attribute?
There’s a lot of regex work to do here.
I have a separate schema type called “graphicImage”, and I was thinking maybe it’d work like this:

{"_type":"block","markDefs":[],"style":"normal","children":[{"_type":"graphicImage","_sanityAsset":"image@url","alt":"Book Gift Packaging from Amazon"}]}

Aug 17, 2022, 9:16 PM
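Because the markdown-to-sanity output is already JSON, the reshaping proposed above can be done without regex. This is a minimal sketch that assumes every image child carries `{ src, alt }` inside `_sanityAsset`, exactly as in the first example. The function names are illustrative.

```javascript
// Turn inline image children into the proposed "graphicImage" shape.
// Assumes child._sanityAsset is an object { src, alt } as in the example.
function toGraphicImage(child) {
  if (
    child._type !== "image" ||
    !child._sanityAsset ||
    typeof child._sanityAsset !== "object"
  ) {
    return child; // leave text spans and other children untouched
  }
  const { src, alt } = child._sanityAsset;
  return {
    ...child,
    _type: "graphicImage",
    _sanityAsset: `image@${src}`, // the import tool's "image@<url>" convention
    alt, // alt text moves up next to _sanityAsset
  };
}

// Apply the conversion to every child of one block.
function reshapeBlock(block) {
  if (!Array.isArray(block.children)) return block;
  return { ...block, children: block.children.map(toGraphicImage) };
}

const line =
  '{"_type":"block","markDefs":[],"style":"normal","children":[{"_type":"image","_sanityAsset":{"src":"url","alt":"Book Gift Packaging from Amazon"}}]}';
// Prints the same JSON as the graphicImage example above.
console.log(JSON.stringify(reshapeBlock(JSON.parse(line))));
```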

Why regex, out of curiosity?

Aug 18, 2022, 7:16 AM

☝️ I also wonder about this. Since you have JSON, you could for example use GROQ (your mileage may vary).
This won’t be the exact query, since we only have the block object, but hopefully you can see how it works.
https://groq.dev/oU3Sy68yC8xdpcZOaOeEKa

Aug 18, 2022, 7:22 AM

I thought I needed to format the ndjson file for easier importing, following the guidance at:
https://www.sanity.io/docs/importing-data

Aug 18, 2022, 7:53 AM

Yup! With groq-cli you can input and output ndjson too

Aug 18, 2022, 7:56 AM

For example

cat production.ndjson|groq '*[]{...,children[]{...,_type == "image" => {...,"_type": "graphicImage"}}}' -n > production_edited.ndjson

Aug 18, 2022, 7:59 AM

So I have approximately 650 posts like this in a file:
https://gist.github.com/mvellandi/0228c9f53e0da2f5d25ddecc60b8a1c4
And you’re recommending I use groq-cli to format it like your example?

Aug 18, 2022, 8:02 AM

Yes, pretty much. I’m just worried that doing this with regex will be a bit painful. Also, the data is already structured, so you shouldn’t need to. I got this to work, at least:

cat production.ndjson|groq '*[_type == "post"]{...,body{...,children[]{..., _type == "img" => {...,"_type": "graphicImage"}}}}' -n > production_edited.ndjson

Aug 18, 2022, 8:13 AM

Okay, so do I then still need to rename the “asset” key to “_sanityAsset”? I’m supposing I should be trying to build “graphicImage” types with their structure and keys/values.

Aug 18, 2022, 8:22 AM

and convert URLs to image@weburl or image@filepath

Aug 18, 2022, 8:24 AM
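The image@weburl / image@filepath conversion mentioned above could look roughly like this. It rests on two assumptions that don't come from the thread. First, absolute http(s) URLs are used as-is. Second, site-relative Jekyll paths such as `/assets/img/foo.jpg` exist in a local checkout whose location you pass in as the hypothetical `siteRoot`. Those local paths are written as `file://` URLs after `image@`.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Build a "_sanityAsset" value from an image src found in a Jekyll post.
// siteRoot is a hypothetical parameter: the local folder of the Jekyll site.
function toSanityAsset(src, siteRoot) {
  if (/^https?:\/\//i.test(src)) {
    return `image@${src}`; // remote image: the import tool fetches it
  }
  // Site-relative path: resolve it under siteRoot and express it as a file:// URL.
  const localPath = path.join(siteRoot, src.replace(/^\/+/, ""));
  return `image@${pathToFileURL(localPath).href}`;
}
```

For example, `toSanityAsset("/assets/img/gift.jpg", "/home/me/blog")` gives `image@file:///home/me/blog/assets/img/gift.jpg` on a POSIX system.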

The _sanityAsset key is a way for our import tool to upload those photos for you automatically: when it imports, it turns the object into one with a proper asset._ref. If that makes sense.

Aug 18, 2022, 8:24 AM
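Conceptually, the import tool replaces each `_sanityAsset` value with a reference to the uploaded asset, as described above. The sketch below is only a mental model of that step. It is not the import tool's actual code. `uploadAsset` is a hypothetical stand-in that returns an asset document `_id`, and real uploads would be asynchronous. One consequence this model shows is that any existing `asset` key on the same object ends up replaced.

```javascript
// Mental model only: walk a document and swap "_sanityAsset" for an asset reference.
// uploadAsset(spec) is hypothetical and returns the _id of the uploaded asset.
function resolveSanityAssets(value, uploadAsset) {
  if (Array.isArray(value)) {
    return value.map((item) => resolveSanityAssets(item, uploadAsset));
  }
  if (value === null || typeof value !== "object") return value;

  const result = {};
  for (const [key, child] of Object.entries(value)) {
    if (key === "_sanityAsset") continue; // consumed below
    result[key] = resolveSanityAssets(child, uploadAsset);
  }
  if (typeof value._sanityAsset === "string") {
    // "image@https://..." -> upload, then point "asset" at the new asset document
    result.asset = { _type: "reference", _ref: uploadAsset(value._sanityAsset) };
  }
  return result;
}

// Fake uploader for illustration: derives a made-up id from the file name.
const fakeUpload = (spec) => "image-" + spec.split("/").pop().replace(/\W/g, "-");
```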

Ah, yes. But that can also be done in GROQ. Two sec.

Aug 18, 2022, 8:25 AM

Hmm. I’m realising that your images are inline in text blocks. That will work, but you’d probably want them in their own blocks between paragraphs?

Aug 18, 2022, 8:28 AM
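If you did want images in their own blocks between paragraphs, as suggested above, one way is to split each text block at its image children. This is a sketch that assumes image children have `_type` "graphicImage". The split text blocks drop the original `_key` so they don't end up sharing one key, which means you would need to assign new keys before importing.

```javascript
// Split a text block at its image children so each image becomes its own body item.
function splitOutImages(block, imageType = "graphicImage") {
  if (block._type !== "block" || !Array.isArray(block.children)) return [block];
  if (!block.children.some((child) => child._type === imageType)) return [block];

  // Drop _key so the resulting text blocks don't share one key.
  const { _key, ...rest } = block;
  const out = [];
  let run = [];
  const flush = () => {
    if (run.length > 0) {
      out.push({ ...rest, children: run }); // keeps style and markDefs
      run = [];
    }
  };
  for (const child of block.children) {
    if (child._type === imageType) {
      flush(); // close the text run before the image
      out.push(child); // the image becomes a top-level body item
    } else {
      run.push(child);
    }
  }
  flush();
  return out;
}

// Apply to a whole Portable Text body array.
const splitBody = (body) => body.flatMap((block) => splitOutImages(block));
```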

I wish we had a simpler abstraction for HTML->Portable Text. But Markdown/HTML can be so messy/unstructured.

Aug 18, 2022, 8:29 AM

Yes, my use case is articles with inline body images

Aug 18, 2022, 8:29 AM

aight

Aug 18, 2022, 8:29 AM

I think maybe this should do the trick?
Note: I hoisted the alt up to the image object (since asset will be overwritten in the import).

cat production.ndjson|groq '*[_type == "post"]{...,body{...,children[]{..., _type == "img" => {...,"_type": "graphicImage", "_sanityAsset": "image@" + asset.src,"alt": asset.alt, "asset": null}}}}' -n > production_edited.ndjson

The GROQ query:

*[_type == "post"]{
  ...,
  body{
    ...,
    children[]{
      ..., 
      _type == "img" => {
        ...,
        "_type": "graphicImage", 
        "_sanityAsset": "image@" + asset.src,
        "alt": asset.alt,
        "asset": null
      }
    }
  }
}

Aug 18, 2022, 8:32 AM
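The GROQ query above translates fairly directly into plain JavaScript applied to one NDJSON document. That can help when checking what the projection does to a single post. In this sketch, the field names (`_type` "img", `asset.src`, `asset.alt`) are taken from the query. Like `*[_type == "post"]`, it drops non-post documents (it returns null for them). Two differences are deliberate. Where the query sets `asset` to null, this version removes the key. It also skips "img" children that have no `asset.src`.

```javascript
// Rough JS equivalent of the GROQ projection for one document.
function reshapePost(doc) {
  if (doc._type !== "post") return null; // *[_type == "post"] filters these out
  if (!Array.isArray(doc.body)) return doc;
  return {
    ...doc,
    body: doc.body.map((block) =>
      Array.isArray(block.children)
        ? { ...block, children: block.children.map(reshapeImg) }
        : block
    ),
  };
}

// _type == "img" => {..., "_type": "graphicImage", "_sanityAsset": ..., "alt": ...}
function reshapeImg(child) {
  if (child._type !== "img" || !child.asset || !child.asset.src) return child;
  const { asset, ...rest } = child; // drop "asset"; the import recreates it
  return {
    ...rest,
    _type: "graphicImage",
    _sanityAsset: "image@" + asset.src,
    alt: asset.alt, // hoisted, since "asset" is overwritten on import
  };
}
```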

Well, this approach is certainly nicer than regex. So this query of yours should conform to this schema:

export default {
  name: "graphicImage",
  type: "object",
  title: "image",
  fields: [
    {
      name: "image",
      type: "image",
      title: "Image",
      options: {
        hotspot: true,
      },
    },
    {
      name: "alt",
      type: "string",
      title: "Alt Text",
    },
  ],
};

Aug 18, 2022, 8:48 AM
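One detail the thread leaves open is that this schema stores the picture in a nested field named `image` (type: "image"), while the query above puts `_sanityAsset` directly on graphicImage. This sketch assumes the import tool needs `_sanityAsset` on the image-typed object. Under that assumption, it moves the key into a nested `image` field and leaves `alt` at the graphicImage level to match the schema's `alt` field. It is worth confirming against the import documentation before relying on it.

```javascript
// Move "_sanityAsset" from graphicImage into its nested "image" field,
// assuming the import tool expects it on the object whose type is "image".
function toSchemaShape(child) {
  if (child._type !== "graphicImage" || typeof child._sanityAsset !== "string") {
    return child; // not a graphicImage produced by the query; leave as-is
  }
  const { _sanityAsset, ...rest } = child;
  return {
    ...rest, // keeps _type and alt
    image: { _type: "image", _sanityAsset },
  };
}
```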

Takk skal du ha (Thanks). I’ll advise when everything’s working 🙂

Aug 18, 2022, 9:00 AM

1. With the markdown-to-sanity tool, is it possible to convert additional frontmatter fields like categories and tags?
2. How should one handle HTML href links from posts to other posts? I’m assuming these would turn into references…somehow. If it’s just for a website, I’m okay with hardcoding the URL path. Just curious.

Aug 18, 2022, 1:17 PM
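For question 2, one option is to rewrite link annotations that point at other posts into references and leave every other link as a hardcoded href. Everything specific in this sketch is a hypothetical assumption, not something from the thread:

- Jekyll permalinks end in the post slug, as in `/2020/05/12/my-post.html`.
- Imported posts get the `_id` `post-<slug>`.
- The schema has an `internalLink` annotation with a `reference` field.

The markDef `_key` is kept so the span marks that use it still resolve.

```javascript
// Extract a slug from a site-relative href such as "/2020/05/12/my-post.html".
function slugFromHref(href) {
  if (!href.startsWith("/") || href.startsWith("//")) return null; // external link
  const pathOnly = href.split(/[?#]/)[0].replace(/\/+$/, "");
  const last = pathOnly.split("/").pop();
  return last ? last.replace(/\.html$/, "") : null;
}

// Rewrite "link" markDefs that point at known posts into hypothetical
// "internalLink" annotations; unknown links keep their original href.
function rewriteLinks(block, knownSlugs) {
  if (!Array.isArray(block.markDefs)) return block;
  return {
    ...block,
    markDefs: block.markDefs.map((def) => {
      if (def._type !== "link" || typeof def.href !== "string") return def;
      const slug = slugFromHref(def.href);
      if (!slug || !knownSlugs.has(slug)) return def;
      return {
        _key: def._key, // span marks refer to this key
        _type: "internalLink",
        reference: { _type: "reference", _ref: `post-${slug}` },
      };
    }),
  };
}
```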

3. Are there any plausible reasons, other than size, why data imports would time out? After trying to import ~630 records with 8 MB of assets and getting timeouts after 11 minutes, I split the ndjson file into 5 files. I’m still getting the following error with just one file of 120 records:

Error: connect ETIMEDOUT 52.46.128.194:80
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1161:16)

Aug 18, 2022, 1:29 PM
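The error shows a connection attempt to port 80, which is plain HTTP. Because the import tool has to fetch every remote image@ URL, one thing that may be worth ruling out is an image host that is slow or no longer reachable. That is an assumption to check, not a diagnosis. This sketch counts the hosts referenced by `_sanityAsset` values in an NDJSON file so they can be tested individually.

```javascript
// Count remote hosts referenced by "_sanityAsset" values in an NDJSON string.
// Local file:// specs are ignored; only http(s) URLs are counted.
function collectAssetHosts(ndjsonText) {
  const counts = new Map();
  const visit = (value) => {
    if (Array.isArray(value)) return value.forEach(visit);
    if (value === null || typeof value !== "object") return;
    const spec = value._sanityAsset;
    if (typeof spec === "string" && /^\w+@https?:\/\//.test(spec)) {
      const url = new URL(spec.slice(spec.indexOf("@") + 1));
      const key = `${url.protocol}//${url.host}`;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    Object.values(value).forEach(visit); // recurse into nested objects/arrays
  };
  for (const line of ndjsonText.split("\n")) {
    if (line.trim() !== "") visit(JSON.parse(line));
  }
  return counts;
}
```

For example: `console.log(collectAssetHosts(require("fs").readFileSync("production.ndjson", "utf8")))`.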
