Discussion about importing data and handling duplicate IDs in Sanity dataset

51 replies
Last updated: Mar 19, 2022
Hey dear Sanity people!

sanity dataset import <file> --replace
doesn't seem to work properly? I still get the warning about duplicate IDs.
I have already deleted all documents of the type I am trying to import. So I assume that IDs can only be used once, even after deletion?


sanity help dataset import
confirms that my syntax is correct. Any input?
Mar 19, 2022, 7:07 PM
Might it be duplicate ids in the dataset you’re importing?
Mar 19, 2022, 7:09 PM
The duplicate IDs are there on purpose, so that I can use
--replace
to overwrite entries.
Currently working on an import script and planned to use this for testing
👍
Have I misunderstood the concept?

Thank you for your fast answer,
user Y
!
Mar 19, 2022, 7:12 PM
You get an error when the CLI introspects your
test-out.ndjson
file. The duplicate IDs are within that file, so you have to remove those duplicates before you can import it into the dataset, if that makes sense.
Mar 19, 2022, 7:14 PM
(The
--replace
is for when you have matching IDs between your source file and the remote dataset)
Mar 19, 2022, 7:14 PM
But the CLI error could have been clearer about this
Mar 19, 2022, 7:15 PM
That is clear, but I thought that the
--replace
flag was actually intended to be used to replace entries with duplicate ids?
Mar 19, 2022, 7:15 PM
I have tried to follow this tip:
Mar 19, 2022, 7:15 PM
Ahhh..
Mar 19, 2022, 7:16 PM
--replace
= if you have a local document with an
_id
that matches a document in your remote dataset, it will be overwritten with the local document
Your error is because within your
ndjson
file, there are documents with duplicate IDs.
Mar 19, 2022, 7:16 PM
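A minimal sketch of one way to clean such a file before importing it, assuming the goal is to keep only the last document for each repeated _id. The function name dedupeNdjson is hypothetical and not part of the Sanity CLI; documents without an _id are passed through unchanged.

```javascript
// Hypothetical helper (not part of the Sanity CLI): removes documents with a
// repeated _id from NDJSON text. The last copy of each _id wins, but it keeps
// the position of the first copy so the overall order stays stable.
function dedupeNdjson(ndjsonText) {
  const docs = []
  const indexById = new Map() // _id -> position in docs
  for (const line of ndjsonText.split('\n')) {
    if (line.trim() === '') continue // skip blank lines
    const doc = JSON.parse(line)
    const hasId = typeof doc._id === 'string'
    if (hasId && indexById.has(doc._id)) {
      docs[indexById.get(doc._id)] = doc // later copy replaces the earlier one
      continue
    }
    if (hasId) indexById.set(doc._id, docs.length)
    docs.push(doc)
  }
  return docs.map((doc) => JSON.stringify(doc)).join('\n')
}

// Usage with a small in-memory sample
const sample = '{"_id":"a","title":"old"}\n{"_id":"b"}\n{"_id":"a","title":"new"}\n'
console.log(dedupeNdjson(sample))
```

Keeping the last copy mirrors what an overwrite would do, but whether the first or the last duplicate should survive depends on your data, so treat that choice as an assumption.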
Okay.
Mar 19, 2022, 7:16 PM
You mean duplicated ids within the source file.
Mar 19, 2022, 7:16 PM
That makes a lot of sense.
Mar 19, 2022, 7:16 PM
Yes! So the error should probably say:
Error: Found 4 duplicate IDs within your source file:

Mar 19, 2022, 7:17 PM
Definitely 🙂
Mar 19, 2022, 7:17 PM
Another interesting point:
Before that, I struggled a bit to confirm that I was using the right syntax, specifically where the flag should be placed.
Mar 19, 2022, 7:18 PM
E.g.

sanity dataset import <file> dataset production --replace
sanity dataset import --replace <file> dataset production
sanity dataset --replace import <file> dataset production

Mar 19, 2022, 7:19 PM
I feel that the example hidden at
sanity help dataset import
could be added to the tip directly, or the error message could be adapted, since the flag simply gets ignored.
Mar 19, 2022, 7:20 PM
Or actually leads to some interesting output:
Mar 19, 2022, 7:20 PM
The import was successful now :party_parrot:
Mar 19, 2022, 7:22 PM
One additional question, if I might:
Mar 19, 2022, 7:23 PM
I have created a post-category type.
If each of those imported documents should have fields that refer to categories, how would I go about that?

Look at the source of the categories, create an object including all of them (including refs, ids, ...) and selectively add them to the documents with a switch?
Mar 19, 2022, 7:24 PM
Or is there a "more established" way of doing that?
Mar 19, 2022, 7:25 PM
(I feel such cases could also be added to the docs)
Mar 19, 2022, 7:25 PM
Ah, and one personal note for you:

Thank you for your delete snippet, really helpful!

Since API versioning was introduced, it no longer works, because a version now needs to be specified for this operation. You could update it like so, if you find the time:


import sanityClient from 'part:@sanity/base/client'

const client = sanityClient.withConfig({ apiVersion: '2021-06-07' })

https://www.sanity.io/schemas/delete-documents-by-filter-698e1f26
Mar 19, 2022, 7:29 PM
Awesome, thank you!
Mar 19, 2022, 7:34 PM
I have created a post-category type.
If each of those imported documents should have fields that refer to categories, how would I go about that?
Look at the source of the categories, create an object including all of them (including refs, ids, ...) and selectively add them to the documents with a switch?
Hm. It’s a bit hard to say without more context. So, how to prepare a source file with references between documents?
Mar 19, 2022, 7:34 PM
More like:
Type blogPost has a reference to postCategory.

During programmatic import, how could I "add" postCategories to the imported posts? They are references, after all. So the solution would be to get the specific required values for those refs and add them based on the string?

E.g.:
Mar 19, 2022, 7:37 PM
// Add a category entry when the source post is tagged as a tutorial
if (bp.category === 'tutorial') {
  document.categories.push({ _ref: "someId" })
}

Mar 19, 2022, 7:38 PM
Does that make sense?
Mar 19, 2022, 7:38 PM
Programmatically "filling in" the correct references to the array.
Mar 19, 2022, 7:38 PM
Yes, it does make sense!
Just remember to add the type key too!

// Reference objects need a _type of "reference" alongside the _ref
if (bp.category === 'tutorial') {
  document.categories.push({ _type: "reference", _ref: "someId" })
}
Mar 19, 2022, 7:39 PM
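For more than one category, the same idea can be sketched with a lookup table instead of a chain of if statements. The names categoryIds and toCategoryRefs and the placeholder IDs are hypothetical; the real _id values would come from the existing postCategory documents. The _key field is added on the assumption that the references live in an array, where each item is expected to carry a unique key.

```javascript
// Hypothetical lookup from the category string in the source data to the _id
// of an existing postCategory document. The IDs below are placeholders.
const categoryIds = new Map([
  ['tutorial', 'someId'],
  ['news', 'someOtherId'],
])

// Turns a list of category strings into reference objects.
// Strings without a known ID are skipped rather than guessed.
function toCategoryRefs(categoryNames) {
  return categoryNames
    .filter((name) => categoryIds.has(name))
    .map((name) => ({
      _key: categoryIds.get(name), // unique per array item as long as names are not repeated
      _type: 'reference',
      _ref: categoryIds.get(name),
    }))
}

// Usage: build the array for one imported post
const refs = toCategoryRefs(['tutorial', 'unknown-category'])
console.log(JSON.stringify(refs))
```

Skipping unknown strings keeps the import from producing references that point nowhere; logging them instead would be an equally reasonable choice.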
Ah, yes.
Mar 19, 2022, 7:40 PM
Awesome, so quite intuitive.
Mar 19, 2022, 7:40 PM
Thank you for your help, Knut!
Mar 19, 2022, 7:40 PM
For sure!
Mar 19, 2022, 7:40 PM
Interestingly they turn up as "empty references".
Mar 19, 2022, 8:09 PM
What a manually added reference in the schema looks like:
Mar 19, 2022, 8:09 PM
What the empty references look like:
Mar 19, 2022, 8:10 PM
I tried to force it with the
_id
so those can likely be ignored.
Mar 19, 2022, 8:12 PM
Directly from the `ndjson`:

"post_categories":[{"_rev":"ym1xCkF4NoCiAWtcIehMpe","_type":"reference"},{"_rev":"ym1xCkF4NoCiAWtcIehNCO","_type":"reference"}]
Mar 19, 2022, 8:20 PM
For some reason the
_id
seems to be the
_rev
from the screenshot above?
Mar 19, 2022, 8:23 PM
And what is up with the difference between
_rev
and
_ref
if acquired through
vision
?
Mar 19, 2022, 8:25 PM
It was actually not possible to add the references afterwards with
--replace
if the documents (blogPosts) already existed.
Mar 19, 2022, 8:41 PM
I had to delete them all again, and on a fresh import the categories were created successfully.
Mar 19, 2022, 8:41 PM
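On the _rev versus _ref question: _rev is the revision ID of a document, while _ref is the field of a reference that holds the _id of the document it points to. The objects in the ndjson excerpt above carry a _rev but no _ref, which is one plausible reason they show up as empty references. A rough sketch of a check that could be run over each document before import follows; the function name findEmptyReferences is hypothetical.

```javascript
// Hypothetical pre-import check: recursively collects the paths of objects
// that have _type "reference" but no string _ref.
function findEmptyReferences(value, path = '') {
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findEmptyReferences(item, `${path}[${i}]`))
  }
  if (value === null || typeof value !== 'object') return []
  const problems = []
  if (value._type === 'reference' && typeof value._ref !== 'string') {
    problems.push(path)
  }
  for (const [key, child] of Object.entries(value)) {
    problems.push(...findEmptyReferences(child, path ? `${path}.${key}` : key))
  }
  return problems
}

// Usage: the first entry mimics the excerpt above (a _rev, but no _ref)
const post = {
  _id: 'post-1',
  post_categories: [
    { _rev: 'ym1xCkF4NoCiAWtcIehMpe', _type: 'reference' },
    { _type: 'reference', _ref: 'someId' },
  ],
}
console.log(findEmptyReferences(post)) // [ 'post_categories[0]' ]
```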