GROQ query to find duplicate data

By Geoff Ball

GROQ query that finds duplicate data in a given field.

GROQ query in Vision or Sanity CLI

// GROQ query
*[_type == $type] {
  "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]}
}[count(duplicates) > 0]

// Params
{
  "type": "post",
  "findDups": "title"
}

This GROQ query takes advantage of Equijoins to search all documents of a particular type and return the _id and duplicated field so that action can be taken. One might use this query in Vision to see duplicates at a glance (or make modifications in the Studio), or from the Sanity CLI (or a program like Insomnia or Postman) where mutations could be made directly.

It is recommended to use a newer version of the GROQ API (e.g., v2021-03-25) due to speed enhancements made since v1. In the event that the query times out when used with a large dataset, you may want to add a slice to the end of the query (e.g., [0..50]).

This query depends on two params:


"type": "post"
: The document type to search. Change post to your desired document type.
"findDups": "title"
: The field to search for duplicates. Change title to the name (in the schema) of the field in which to search for duplicates.

The user can quickly change the document type (type) and field to scan for duplicates (findDups).

The query comes in three parts:

  1. *[_type == $type]: Filter for all documents that have a _type of $type (pulled in from params).
  2. { "duplicates": *[_type == $type && @[$findDups] == ^[$findDups] && _id != ^._id && !(_id in path('drafts.**'))] {_id, $findDups: @[$findDups]} }: Run another query that returns documents of that same _type that have duplicated data in the field named in findDups but whose _ids don't match. From that, return only _id and the field named in params as findDups (for brevity) and assign it all to the key duplicates.
  3. [count(duplicates) > 0]: From all the data assigned to duplicates, return only those that actually have duplicates. There's no need to see the rest.

Contributor

Other schemas by author