Filtering out duplicate slugs after a large data import
Great question! Finding duplicate slugs after a large import is definitely something that comes up. Here's how you can tackle this:
Finding Duplicates with GROQ
The most straightforward approach is to query for all documents with slugs and then identify duplicates programmatically. Unfortunately, GROQ doesn't have built-in aggregation functions like SQL's GROUP BY and HAVING, so you'll need to process the results:
*[defined(slug.current)] {
  _id,
  _type,
  "slug": slug.current
}

Then in your code (JavaScript/Node.js), you can find duplicates:
import {createClient} from '@sanity/client'

// Adjust projectId, dataset, and apiVersion for your project
const client = createClient({
  projectId: 'your-project-id',
  dataset: 'production',
  apiVersion: '2023-01-01',
  useCdn: false, // skip the CDN so results reflect the latest data
})

const results = await client.fetch('*[defined(slug.current)] { _id, _type, "slug": slug.current }')

// Group documents by slug
const slugMap = {}
results.forEach(doc => {
  if (!slugMap[doc.slug]) {
    slugMap[doc.slug] = []
  }
  slugMap[doc.slug].push(doc)
})

// Keep only slugs used by more than one document
const duplicates = Object.entries(slugMap)
  .filter(([slug, docs]) => docs.length > 1)
  .map(([slug, docs]) => ({ slug, docs }))

console.log('Duplicates found:', duplicates)

Alternative: Check by Document Type
If you want to narrow it down by document type:
*[_type == "yourDocumentType" && defined(slug.current)] {
_id,
"slug": slug.current
}Prevention: Slug Uniqueness Validation
For the future, you can add a custom validation function to your schema to prevent duplicate slugs from being created. The slug field type in Sanity has built-in support for uniqueness checking in the Studio UI, but it doesn't enforce it at the API level during imports.
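As an illustration, here's a sketch of a custom isUnique check that looks across all document types (the schema name, field names, and apiVersion below are placeholders, so adapt them to your project). Note that this only runs when editors set slugs in the Studio; documents written directly through the API during an import still bypass it:

// schemas/yourDocumentType.js — illustrative names, adapt to your schema
import {defineField, defineType} from 'sanity'

// Returns true if no other document (draft or published) already uses this slug
async function isUniqueAcrossAllDocuments(slug, context) {
  const {document, getClient} = context
  const client = getClient({apiVersion: '2023-01-01'})
  const id = document._id.replace(/^drafts\./, '')
  const params = {draft: `drafts.${id}`, published: id, slug}
  const query = `!defined(*[!(_id in [$draft, $published]) && slug.current == $slug][0]._id)`
  return client.fetch(query, params)
}

export default defineType({
  name: 'yourDocumentType',
  type: 'document',
  fields: [
    defineField({name: 'title', type: 'string'}),
    defineField({
      name: 'slug',
      type: 'slug',
      options: {source: 'title', isUnique: isUniqueAcrossAllDocuments},
      validation: (Rule) => Rule.required(),
    }),
  ],
})

The default check only looks at documents of the same type, so a cross-type check like this is mainly useful if your slugs need to be unique across your whole dataset.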
Hope this helps you clean up those duplicates! Let us know if you need help with the cleanup script once you've identified them.