Modifying a query to exclude drafts when finding duplicate articles in a dataset.

8 replies
Last updated: Apr 12, 2021
Hi everybody! I am trying to find and delete duplicate articles in my dataset and I’m using this query: https://www.sanity.io/schemas/groq-query-to-find-duplicate-data-Q1kbbo7fUQAjtPldAqGhY The question is, how to modify this query to not return drafts? (Are drafts considered as duplicates?) I’ve tried adding
&& !(_id in path('drafts.**'))
but operation times out even for finding one duplicate.
Apr 12, 2021, 12:43 PM
Good catch, User! I had thought
_id != ^._id
would exclude drafts but clearly it doesn’t. It should work if you append the code you wrote—`&& !(_id in path('drafts.**'))`—to the filter:

*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[defined(duplicates)]
Apr 12, 2021, 2:10 PM
user A
thanks for the reply. When I’m trying to get one duplicate using this query, operation times out.. Our dataset is quite big..
*[_type == $type] {

"duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}

}[defined(duplicates)][0]
Apr 12, 2021, 2:15 PM
This is good feedback. Thank you. I can see now that a more robust equijoin would be using
[count(duplicates) > 0]
as the second filter (
[defined(duplicates)]
) only works on v1 of the API. I’ll update the snippet but am getting some errors in the community studio at the moment.
For a large dataset I wonder if you might need to add a slice. Something like:


*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[count(duplicates) > 0][0..50][0]
Apr 12, 2021, 2:29 PM
I’ve just tried
*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[count(duplicates) > 0][0..50][0]
and

*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[count(duplicates) > 0][0]
on my dataset but they both return time out..
Apr 12, 2021, 2:37 PM
Shoot. I’ll keep working at it, User. If I could ask, which version of the API were you using?
Apr 12, 2021, 2:41 PM
I am using v1 API, I’ll try to run query on latest version..
Apr 12, 2021, 2:43 PM
Just tried running these queries with the v2021-03-25 and they work🎉. Thanks User!
Apr 12, 2021, 3:24 PM
Phew! Thanks for sticking with it. Hopefully I can amend it to be more robust. Your use-case feedback is a huge help. 😄
Apr 12, 2021, 3:26 PM

Sanity– build remarkable experiences at scale

Sanity is a modern headless CMS that treats content as data to power your digital business. Free to get started, and pay-as-you-go on all plans.

Was this answer helpful?