Content Lake (Datastore)

Importing Data

How to import data in bulk, including file and image assets.

There are two ways to import data into your Sanity project.

The recommended way of importing data is to use the Command Line Interface. You can run sanity dataset import --help for a quick summary of syntax and options. Your other option is to use one of our client libraries and handle it yourself.

Import using the CLI

The Sanity import tool operates on newline-delimited JSON (NDJSON) files, where each line is a valid JSON object representing a document you want to import.

Documents should follow the structure of your data model – most importantly, every document must have a _type attribute. The _id field is optional – but helpful – if you want to make references between documents or re-import your data later, replacing documents from an old import. _ids in Sanity are usually GUIDs, but any string containing only letters, numbers, hyphens, and underscores is valid.

During import, all references are automatically set to weak, then flipped to strong after all documents are in place. This ensures that you can import documents that reference other documents in any order you like.
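For example, a file like the following would import cleanly even though the first document references one that only appears on the next line (the ids and the director field are hypothetical):

```json
{"_id": "movie_123", "_type": "movie", "director": {"_type": "reference", "_ref": "person_456"}}
{"_id": "person_456", "_type": "person", "name": "Gareth Edwards"}
```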

Assets (images and files) are stored using references in Sanity. To make it easy to import these and refer to them within your documents, you can use a special _sanityAsset property where you would normally put a _ref. For instance, let's say you want your document to end up like this:

{
  "_id": "movie_123",
  "_type": "movie",
  "title": "Rogue One",
  "poster": {
    "_type": "image",
    "asset": {
      "_ref": "image_234",
      "_type": "reference"
    }
  }
}

This is what your ready-to-import document should look like:

{
  "_id": "movie_123",
  "_type": "movie",
  "title": "Rogue One",
  "poster": {
    "_type": "image",
    "_sanityAsset": "image@file:///local/path/to/rogue-one-poster.jpg"
  }
}

However, since NDJSON (Newline Delimited JSON) uses the newline character as its delimiter, your ndjson file must be structured with exactly one document on each line, like this:

{"_id": "movie_123", "_type": "movie", "title": "Rogue One", "poster": {"_type": "image", "_sanityAsset": "image@file:///local/path/to/rogue-one-poster.jpg"}}
{"_id": "another_movie", "_type": "movie"}
{"_id": "yet_another_movie", "_type": "movie"}

Note that you need to prefix the asset URL with a type declaration – either image@ or file@.

If your asset is on the internet, use image@https://example.com/path/to/rogue-one-poster.jpg instead of image@file:///local/path/to/rogue-one-poster.jpg.

Once you have prepared your ndjson file, you can run the import using the Sanity CLI.

sanity dataset import <file> <targetDataset>

E.g.:

sanity dataset import my-data-dump.ndjson production

# or

sanity dataset import staging.tar.gz production

Import using a client library

If you prefer not to use our CLI import tool, you may of course do the import yourself with help from one of our client libraries.

There are some common pitfalls to keep in mind:

  • Concurrency. While you may have thousands of documents to import, you shouldn't trigger thousands of requests in parallel. Doing so will exceed the API rate limit, and requests may start failing. We advise using a queue with a reasonably low concurrency.
    Use a library to keep your import below our API rate limit:
const {default: PQueue} = require('p-queue')

// At most one request in flight, and at most 25 requests started per second
const queue = new PQueue({
  concurrency: 1,
  intervalCap: 25,
  interval: 1000
})

queue.add(() => client.create(...))
queue.add(() => client.patch('id').inc('visits').commit())
  • API usage limits. Importing large data sets can quickly cause a lot of requests, especially if you import a single document per request. It is usually a good idea to send multiple mutations within a single transaction.
  • Mutation size limits. While it's a good idea to do multiple mutations per transaction, you need to make sure that the size of the request is within our limits, in terms of byte size.
  • Mutation visibility. A Sanity client will use the visibility mode of sync by default, which means that it will wait for the documents to be searchable before returning. This should not be necessary when importing large datasets, so we recommend you use deferred. If you have a lot of documents, it can take a little while for them to be searchable, but the import job will move along much faster.
  • References. If you are referring to one document from another, they either need to be imported in the right order, or the reference needs to be flagged as weak by setting its _weak property to true. After importing, you probably want to remove the _weak property in order to prevent referenced documents from being deleted.

  • Assets. Since assets (e.g., files and images) in Sanity are stored using references, you'll need to upload the assets first and put the returned document ID in your reference.

With this in mind, do check out our client libraries documentation to see how to perform mutations.
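Putting the pitfalls above together, here is a hedged sketch of a batched import. It assumes `client` is a configured @sanity/client instance and `docs` is an array of documents with _id and _type set; the batch size of 100 is an arbitrary starting point, not an official limit, so tune it to keep each request within the mutation size limit:

```javascript
const BATCH_SIZE = 100 // assumption: adjust so each request stays within size limits

async function importInBatches(client, docs) {
  for (let i = 0; i < docs.length; i += BATCH_SIZE) {
    // Many mutations, one transaction, one request
    const tx = docs
      .slice(i, i + BATCH_SIZE)
      .reduce((trx, doc) => trx.createOrReplace(doc), client.transaction())

    // 'deferred' skips waiting for the documents to become searchable,
    // which speeds up large imports considerably
    await tx.commit({visibility: 'deferred'})
  }
}
```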
