Importing a Jekyll blog to Sanity using GROQ queries and graphicImage schema type

32 replies
Last updated: Aug 18, 2022
To user Y or anyone who knows: Is there a recommended way to import a Jekyll blog to Sanity? I just tried the “markdown-to-sanity” plugin, but it gave me an error.
Aug 17, 2022, 2:08 PM
What error?
Aug 17, 2022, 2:08 PM
That seems like a good old bug
Aug 17, 2022, 2:13 PM
It seems to come from the fact you have <figure> elements without a <figcaption>, which is why it fails. Arguably, the library should be less strict on this.
Aug 17, 2022, 2:14 PM
Someone should tell its creator 🧐
Aug 17, 2022, 2:15 PM
😅
Aug 17, 2022, 2:15 PM
Do you want me to PR it?
Aug 17, 2022, 2:15 PM
for sure – it would be appreciated!
Aug 17, 2022, 2:17 PM
I’ll try to get onto npm soonish
Aug 17, 2022, 2:27 PM
Sweet, thanks y’all!
Aug 17, 2022, 2:31 PM
Just published 0.1.1 with Kitty’s fix. Try it out and see if it works better!
Aug 17, 2022, 3:21 PM
Yes, it produced the file! In formatting it for importing, my images are within text blocks like:

{"_type":"block","markDefs":[],"style":"normal","children":[{"_type":"image","_sanityAsset":{"src":"url","alt":"Book Gift Packaging from Amazon"}}]}
How should we handle the alt text attribute?
There’s a lot of regex work to do here.
I have a separate schema type called “graphicImage” and I was thinking maybe it’d work like this?:

{"_type":"block","markDefs":[],"style":"normal","children":[{"_type":"graphicImage","_sanityAsset":"image@url","alt":"Book Gift Packaging from Amazon"}]}
Aug 17, 2022, 9:16 PM
Why regex, out of curiosity?
Aug 18, 2022, 7:16 AM
☝️ I also wonder about this. Since you have JSON, you can, for example, use GROQ (your mileage may vary).
This won’t be the exact query, since we only have the block object, but hopefully you can see how it works.
https://groq.dev/oU3Sy68yC8xdpcZOaOeEKa
Aug 18, 2022, 7:22 AM
I thought I needed to format the ndjson file for easier importing, via guidance at:
https://www.sanity.io/docs/importing-data
Aug 18, 2022, 7:53 AM
Yup! With groq-cli you can input and output ndjson too
Aug 18, 2022, 7:56 AM
For example
cat production.ndjson|groq '*[]{...,children[]{...,_type == "image" => {...,"_type": "graphicImage"}}}' -n > production_edited.ndjson

Aug 18, 2022, 7:59 AM
So I have approximately 650 posts like this in a file:
https://gist.github.com/mvellandi/0228c9f53e0da2f5d25ddecc60b8a1c4
And you’re recommending I use the groq-cli to format it like your example?
Aug 18, 2022, 8:02 AM
Yes, pretty much. Just worried that trying to do this with Regex will be a bit painful. Also, it’s already structured, so you shouldn’t need to. I got this to work at least:
cat production.ndjson|groq '*[_type == "post"]{...,body{...,children[]{..., _type == "img" => {...,"_type": "graphicImage"}}}}' -n > production_edited.ndjson

Aug 18, 2022, 8:13 AM
Okay, so do I need to then still rename the “asset” key to “_sanityAsset”? I’m supposing I should be trying to build “graphicImage” types and its structure, keys/values.
Aug 18, 2022, 8:22 AM
and convert urls to image@weburl or image@filepath
Aug 18, 2022, 8:24 AM
The _sanityAsset thing is a way for our import tool to automatically upload those photos for you; it will automatically fix the object to be asset._ref when it imports. If that makes sense.
Aug 18, 2022, 8:24 AM
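To make the "image@weburl or image@filepath" idea concrete, here is a minimal sketch of building the _sanityAsset value. The helper name toSanityAsset is hypothetical (it is not part of any Sanity package), and it assumes local sources are absolute POSIX paths.

```javascript
// Hypothetical helper: build the string the import tool expects in `_sanityAsset`.
// Assumption: `src` is either an http(s) URL or an absolute POSIX file path.
function toSanityAsset(src, kind = "image") {
  if (/^https?:\/\//.test(src)) {
    // Remote asset: image@https://example.com/picture.jpg
    return `${kind}@${src}`;
  }
  // Local asset: image@file:///absolute/path/picture.jpg
  return `${kind}@file://${src}`;
}

console.log(toSanityAsset("https://example.com/picture.jpg"));
console.log(toSanityAsset("/tmp/picture.jpg"));
```

During import, the tool uploads the file and swaps the _sanityAsset key for an asset reference, so the string only needs to be valid at import time.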
Ah, yes. But that can also be done in GROQ. Two sec.
Aug 18, 2022, 8:25 AM
Hmm. I’m realising that your images are inline in text blocks. It will work, but you probably would want them in their own blocks in between paragraphs?
Aug 18, 2022, 8:28 AM
I wish we had a simpler abstraction for HTML->Portable Text. But Markdown/HTML can be so messy/unstructured.
Aug 18, 2022, 8:29 AM
Yes, my use case is articles with inline body images
Aug 18, 2022, 8:29 AM
aight
Aug 18, 2022, 8:29 AM
I think maybe this should do the trick?
Note: I hoisted the alt up to the image object (since asset will be overwritten in the import)

cat production.ndjson|groq '*[_type == "post"]{...,body{...,children[]{..., _type == "img" => {...,"_type": "graphicImage", "_sanityAsset": "image@" + asset.src,"alt": asset.alt, "asset": null}}}}' -n > production_edited.ndjson
The GROQ query:

*[_type == "post"]{
  ...,
  body{
    ...,
    children[]{
      ..., 
      _type == "img" => {
        ...,
        "_type": "graphicImage", 
        "_sanityAsset": "image@" + asset.src,
        "alt": asset.alt,
        "asset": null
      }
    }
  }
}

Aug 18, 2022, 8:32 AM
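For anyone who would rather not install groq-cli, the same rewrite can be sketched in plain JavaScript. This is an approximation of the query above, not a drop-in replacement: the function names are made up for illustration, it assumes body is either a single block or an array of blocks, it drops the asset key instead of setting it to null, and it passes non-post documents through unchanged where the GROQ filter would exclude them.

```javascript
// Turn an "img" child into a "graphicImage" object, hoisting `alt` out of `asset`.
function rewriteChild(child) {
  if (child._type !== "img" || !child.asset || !child.asset.src) return child;
  const { asset, ...rest } = child; // `asset` is dropped; the importer recreates it
  return {
    ...rest,
    _type: "graphicImage",
    _sanityAsset: "image@" + asset.src,
    alt: asset.alt,
  };
}

// Rewrite every child of one block; blocks without children are left alone.
function rewriteBlock(block) {
  if (!Array.isArray(block.children)) return block;
  return { ...block, children: block.children.map(rewriteChild) };
}

// Rewrite the body of a post document (assumed to be a block or an array of blocks).
function rewritePost(doc) {
  if (doc._type !== "post" || doc.body == null) return doc;
  const body = Array.isArray(doc.body)
    ? doc.body.map(rewriteBlock)
    : rewriteBlock(doc.body);
  return { ...doc, body };
}

// Apply the rewrite to ndjson text: one JSON document per non-empty line.
function rewriteNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.stringify(rewritePost(JSON.parse(line))))
    .join("\n");
}
```

To use it on a file, read production.ndjson with fs.readFileSync, pass the text to rewriteNdjson, and write the result to production_edited.ndjson.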
Well, this approach is certainly nicer than regex. So this query of yours should conform with this:

export default {
  name: "graphicImage",
  type: "object",
  title: "image",
  fields: [
    {
      name: "image",
      type: "image",
      title: "Image",
      options: {
        hotspot: true,
      },
    },
    {
      name: "alt",
      type: "string",
      title: "Alt Text",
    },
  ],
};
Aug 18, 2022, 8:48 AM
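One thing worth checking against this schema: the query puts _sanityAsset directly on the graphicImage object, while the schema keeps the asset inside a nested image field. A rough sketch of the shape the schema seems to expect is below. The builder function is hypothetical, and it assumes the import tool replaces _sanityAsset wherever it appears in the document, which is how the importing-data guide describes it.

```javascript
// Hypothetical builder for a graphicImage value matching the schema above:
// the asset lives under the `image` field, `alt` sits next to it.
function buildGraphicImage(src, alt) {
  return {
    _type: "graphicImage",
    image: {
      _type: "image",
      // Replaced by an asset reference when the file is imported.
      _sanityAsset: "image@" + src,
    },
    alt,
  };
}

console.log(JSON.stringify(buildGraphicImage("https://example.com/a.jpg", "Book Gift Packaging from Amazon")));
```

If the flat shape from the query is kept instead, the schema would need the image fields on graphicImage itself instead of on a nested image field.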
Takk skal du ha (Thanks). I’ll advise when everything’s working 🙂
Aug 18, 2022, 9:00 AM
1. With the markdown-to-sanity tool, is there a possibility of converting additional frontmatter fields like categories and tags?
2. How should one consider HTML href links from posts to other posts? I’m assuming these would turn to references…somehow. If it’s just for a website, I’m okay if I hardcode the URL path. Just curious.
Aug 18, 2022, 1:17 PM
3. Are there any plausible reasons why data imports would time out other than size? After trying to import ~630 records with 8 MB of assets and getting timeouts after 11 minutes, I split the ndjson file into 5 files. I’m still getting the following error with just one file of 120 records:
Error: connect ETIMEDOUT 52.46.128.194:80
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1161:16)
Aug 18, 2022, 1:29 PM
