How to delete unused assets from a large dataset without timeout errors?
I understand your pain! That query times out because evaluating count(*[references(^._id)]) for every asset means running a reference scan across your entire dataset for each individual asset. The total work grows roughly with the number of assets times the number of documents, so on a large dataset it quickly blows past the query timeout.
Here are a few approaches to solve this:
1. Use a Script with Batching (Recommended)
Instead of doing this in a single GROQ query, use a script that processes assets in smaller batches and uses the client's API more efficiently. Here's a pattern:
import {createClient} from '@sanity/client'

const client = createClient({
  projectId: 'your-project-id',
  dataset: 'your-dataset',
  token: 'your-token',
  apiVersion: '2024-01-01',
  useCdn: false
})

async function deleteUnusedAssets() {
  // Get all assets
  const assets = await client.fetch(
    `*[_type in ["sanity.imageAsset", "sanity.fileAsset"]] {_id, _type}`
  )
  console.log(`Found ${assets.length} total assets`)

  // Check each asset in batches
  const batchSize = 10
  let deleted = 0

  for (let i = 0; i < assets.length; i += batchSize) {
    const batch = assets.slice(i, i + batchSize)

    await Promise.all(batch.map(async (asset) => {
      // Check if this specific asset is referenced
      const references = await client.fetch(
        `count(*[references($id)])`,
        { id: asset._id }
      )

      if (references === 0) {
        await client.delete(asset._id)
        deleted++
        console.log(`Deleted ${asset._id}`)
      }
    }))

    console.log(`Processed ${Math.min(i + batchSize, assets.length)}/${assets.length}`)
  }

  console.log(`Deleted ${deleted} unused assets`)
}

deleteUnusedAssets()

2. Build an Index First
Build an in-memory index of which assets are in use, then delete based on that: fetch your documents once, collect every _ref value they contain, and remove any asset whose ID never appears in that set. (For very large datasets, paginate the document query or scan a dataset export instead of fetching everything in one request.)
// Collect every asset ID that is referenced anywhere in a document tree
function collectRefs(node, refs) {
  if (Array.isArray(node)) node.forEach((item) => collectRefs(item, refs))
  else if (node && typeof node === 'object') {
    if (typeof node._ref === 'string') refs.add(node._ref)
    Object.values(node).forEach((value) => collectRefs(value, refs))
  }
}

// First, fetch all non-asset documents and index the asset IDs they reference
const documents = await client.fetch(
  `*[!(_type in ["sanity.imageAsset", "sanity.fileAsset"])]`
)
const referencedAssets = new Set()
documents.forEach((doc) => collectRefs(doc, referencedAssets))

// Then get all assets
const allAssets = await client.fetch(
  `*[_type in ["sanity.imageAsset", "sanity.fileAsset"]]._id`
)

// Find the difference
const unusedAssets = allAssets.filter((id) => !referencedAssets.has(id))

// Delete the unused ones
for (const assetId of unusedAssets) {
  await client.delete(assetId)
}

3. Use Sanity Functions for Background Processing
For really large datasets, consider using Sanity Functions to run this as a background job with the 900-second timeout and 10GB memory limits, which are much more generous than the API query limits. This is particularly useful if you're dealing with thousands of assets.
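Whatever runtime ends up doing the work, it also helps to make the job resumable: process the assets in pages ordered by _id, so a run that gets cut off can pick up where it stopped instead of starting over. Here's a minimal sketch of that idea, reusing the configured client and the per-asset reference check from option 1 (the function name and the page size of 100 are just illustrative):

async function deleteUnusedAssetsInPages() {
  let lastId = ''
  let deleted = 0
  while (true) {
    // Fetch the next page of asset IDs after the last one we processed
    const page = await client.fetch(
      `*[_type in ["sanity.imageAsset", "sanity.fileAsset"] && _id > $lastId]
        | order(_id) [0...100] {_id}`,
      {lastId}
    )
    if (page.length === 0) break
    for (const asset of page) {
      // Same reference check as in option 1, one asset at a time
      const references = await client.fetch(`count(*[references($id)])`, {id: asset._id})
      if (references === 0) {
        await client.delete(asset._id)
        deleted++
      }
    }
    lastId = page[page.length - 1]._id
    console.log(`Processed up to ${lastId}, deleted ${deleted} so far`)
    // Persist lastId somewhere (file, env var, document) if you want true resumability
  }
}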
Why Your Query Times Out
The issue with *[_type in ["sanity.imageAsset", "sanity.fileAsset"] && count(*[references(^._id)]) == 0] is that for every asset, it scans your entire dataset looking for references. With thousands of assets and documents, this becomes a massive number of operations.
The batching approach works better because:
- You're making targeted queries with parameters (references($id))
- The query planner can optimize better
- You can add delays/throttling if needed
- You get progress feedback
Additional Tips
- Run this during off-peak hours if you have active users
- Consider doing a dry run first (log what would be deleted without actually deleting; see the sketch after these tips)
- Keep the batch size small (5-10) to avoid rate limits
- Add error handling and logging to track progress
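For example, a dry-run switch is easy to bolt onto the option 1 script. A sketch, assuming the same configured client as above (DRY_RUN, maybeDeleteAsset, and the 500 ms pause are just illustrative names and numbers):

const DRY_RUN = true // flip to false once the dry-run log looks right
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Drop-in replacement for the callback passed to batch.map() in option 1
async function maybeDeleteAsset(asset) {
  const references = await client.fetch(`count(*[references($id)])`, {id: asset._id})
  if (references > 0) return
  if (DRY_RUN) {
    console.log(`[dry run] would delete ${asset._id}`)
    return
  }
  await client.delete(asset._id)
  console.log(`Deleted ${asset._id}`)
}

In the batch loop you would then run Promise.all(batch.map(maybeDeleteAsset)) and follow each batch with await sleep(500) to stay clear of rate limits.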
Hope this helps! Let me know how it goes.