Modifying a query to exclude drafts when finding duplicate articles in a dataset.

8 replies

Last updated: Apr 12, 2021

Hi everybody! I am trying to find and delete duplicate articles in my dataset and I’m using this query: https://www.sanity.io/schemas/groq-query-to-find-duplicate-data-Q1kbbo7fUQAjtPldAqGhY The question is, how to modify this query to not return drafts? (Are drafts considered as duplicates?) I’ve tried adding

&& !(_id in path('drafts.**'))

but operation times out even for finding one duplicate.

Apr 12, 2021, 12:43 PM

Good catch, User! I had thought

_id != ^._id

would exclude drafts but clearly it doesn’t. It should work if you append the code you wrote—`&& !(_id in path('drafts.**'))`—to the filter:

*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[defined(duplicates)]

Apr 12, 2021, 2:10 PM

user A

thanks for the reply. When I’m trying to get one duplicate using this query, operation times out.. Our dataset is quite big..

*[_type == $type] {

"duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}

}[defined(duplicates)][0]

Apr 12, 2021, 2:15 PM

This is good feedback. Thank you. I can see now that a more robust equijoin would be using

[count(duplicates) > 0]

as the second filter (

[defined(duplicates)]

) only works on v1 of the API. I’ll update the snippet but am getting some errors in the community studio at the moment.
For a large dataset I wonder if you might need to add a slice. Something like:

*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[count(duplicates) > 0][0..50][0]

Apr 12, 2021, 2:29 PM

I’ve just tried

*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[count(duplicates) > 0][0..50][0]

and

*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[count(duplicates) > 0][0]

on my dataset but they both return time out..

Apr 12, 2021, 2:37 PM

Shoot. I’ll keep working at it, User. If I could ask, which version of the API were you using?

Apr 12, 2021, 2:41 PM

I am using v1 API, I’ll try to run query on latest version..

Apr 12, 2021, 2:43 PM

Just tried running these queries with the v2021-03-25 and they work🎉. Thanks User!

Apr 12, 2021, 3:24 PM

Phew! Thanks for sticking with it. Hopefully I can amend it to be more robust. Your use-case feedback is a huge help. 😄

Apr 12, 2021, 3:26 PM

Sanity– build remarkable experiences at scale

Sanity is a modern headless CMS that treats content as data to power your digital business. Free to get started, and pay-as-you-go on all plans.

Get started for free Explore the demo

Platform

Features

Use cases

Integrations

Learn

Build and share

Frameworks

Discover

Case studies

Popular guides

Modifying a query to exclude drafts when finding duplicate articles in a dataset.

Sanity– build remarkable experiences at scale

Was this answer helpful?