How to delete unused assets from large dataset without timeout errors?

1 reply
Last updated: Mar 23, 2021
Hey everybody, can somebody advise me on how to delete unused assets from a large dataset? My query gives me a timeout error. It works fine for smaller datasets. I would appreciate any help!
*[ _type in ["sanity.imageAsset", "sanity.fileAsset"] && count(*[ references(^._id) ]) == 0 ] [0..99]
AI Update

I understand your pain! That query times out because evaluating count(*[references(^._id)]) for every asset means running a reference scan across your entire dataset for each individual asset. The work grows roughly with the number of assets multiplied by the number of documents, so on a large dataset it simply can't finish within the query timeout.

Here are a few approaches to solve this:

1. Process Assets in Batches with a Script

Instead of doing this in a single GROQ query, use a script that checks and deletes assets in small batches with targeted, parameterized queries. Here's a pattern:

import {createClient} from '@sanity/client'

const client = createClient({
  projectId: 'your-project-id',
  dataset: 'your-dataset',
  token: 'your-token',
  apiVersion: '2024-01-01',
  useCdn: false
})

async function deleteUnusedAssets() {
  // Get all assets
  const assets = await client.fetch(
    `*[_type in ["sanity.imageAsset", "sanity.fileAsset"]] {_id, _type}`
  )
  
  console.log(`Found ${assets.length} total assets`)
  
  // Check each asset in batches
  const batchSize = 10
  let deleted = 0
  
  for (let i = 0; i < assets.length; i += batchSize) {
    const batch = assets.slice(i, i + batchSize)
    
    await Promise.all(batch.map(async (asset) => {
      // Check if this specific asset is referenced
      const references = await client.fetch(
        `count(*[references($id)])`,
        { id: asset._id }
      )
      
      if (references === 0) {
        await client.delete(asset._id)
        deleted++
        console.log(`Deleted ${asset._id}`)
      }
    }))
    
    console.log(`Processed ${Math.min(i + batchSize, assets.length)}/${assets.length}`)
  }
  
  console.log(`Deleted ${deleted} unused assets`)
}

deleteUnusedAssets()

2. Build an Index First

Build the set of referenced asset IDs once, then delete every asset that isn't in it:

// First, fetch every document that isn't an asset (page this query too if the dataset is huge)
const documents = await client.fetch(
  `*[!(_type in ["sanity.imageAsset", "sanity.fileAsset"])]`
)

// Walk each document's JSON and record every _ref it contains
const referencedIds = new Set()
const collectRefs = (value) => {
  if (Array.isArray(value)) {
    value.forEach(collectRefs)
  } else if (value && typeof value === 'object') {
    if (typeof value._ref === 'string') referencedIds.add(value._ref)
    Object.values(value).forEach(collectRefs)
  }
}
documents.forEach(collectRefs)

// Then get all asset IDs
const allAssets = await client.fetch(
  `*[_type in ["sanity.imageAsset", "sanity.fileAsset"]]._id`
)

// Find the difference: assets that no document references
const unusedAssets = allAssets.filter((id) => !referencedIds.has(id))

// Delete them one at a time (throttle if you hit rate limits)
for (const assetId of unusedAssets) {
  await client.delete(assetId)
}

3. Use Sanity Functions for Background Processing

For very large datasets, consider using Sanity Functions to run this as a background job with the 900-second timeout and 10GB memory limits, which are much more generous than the API query limits. This is particularly useful if you're dealing with thousands of assets.

Why Your Query Times Out

The issue with *[_type in ["sanity.imageAsset", "sanity.fileAsset"] && count(*[references(^._id)]) == 0] is that for every asset, it scans your entire dataset looking for references. With thousands of assets and tens of thousands of documents, that adds up to many millions of reference checks in a single query.

The batching approach works better because:

  • You're making targeted queries with parameters (references($id))
  • The query planner can optimize better
  • You can add delays/throttling if needed
  • You get progress feedback

Additional Tips

  • Run this during off-peak hours if you have active users
  • Consider doing a dry run first (log what would be deleted without actually deleting); see the sketch after this list
  • Keep the batch size small (5-10) to avoid rate limits
  • Add error handling and logging to track progress
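
Here's a minimal sketch of a dry-run switch for the batching script above (the DRY_RUN flag and the deleteAsset helper are just illustrative names I've made up, not part of the Sanity client):

// Set DRY_RUN=true in your environment to log deletions without performing them
const DRY_RUN = process.env.DRY_RUN === 'true'

async function deleteAsset(id) {
  if (DRY_RUN) {
    console.log(`[dry run] would delete ${id}`)
    return
  }
  await client.delete(id)
  console.log(`Deleted ${id}`)
}

Then call deleteAsset(asset._id) instead of client.delete(asset._id) inside the batch loop, and run once with DRY_RUN=true to review the list before deleting anything.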

Hope this helps! Let me know how it goes.

1 reply
Found a way around it: I just needed to reduce the batch size from [0..99] to [0..1].
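
In other words, keep the original query but fetch only a couple of unused asset IDs per request and repeat until nothing comes back. A rough sketch of that loop, assuming the same client setup as in the answer above:

// Fetch a tiny page of unused assets per request so the reference check finishes before the timeout
const query = `*[_type in ["sanity.imageAsset", "sanity.fileAsset"]
  && count(*[references(^._id)]) == 0][0..1]._id`

let ids = await client.fetch(query)
while (ids.length > 0) {
  await Promise.all(ids.map((id) => client.delete(id)))
  ids = await client.fetch(query)
}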
