Bulk optimized images not showing in Studio after re-importing .tar.gz dataset

2 replies
Last updated: Nov 26, 2021
Hi, I am trying to bulk download the .tar.gz of a dataset in Sanity, optimize the images manually (the files in the images folder) without renaming anything, and then repack the folder as a .tar.gz (same folder structure, with assets.json, data.ndjson, and the images folder). I have been trying for a while now, but the issue is that when I import into a new dataset, the image assets aren't showing up in Sanity Studio or on the front end. I do see the images in the media gallery plugin inside Sanity, as well as via GraphQL on the main dataset.
AI Update

I understand the issue you're experiencing. When you optimize images manually and repack them, the SHA1 hash in the asset IDs no longer matches the modified image files, which is why the images aren't showing up in your documents even though they appear in the media gallery.

The Problem: Asset ID Hash Mismatch

Sanity image asset IDs follow a specific format: image-<sha1-hash>-<dimensions>-<format> (e.g., image-abc123def456-1920x1080-jpg). This SHA1 hash is a cryptographic fingerprint of the original image file contents. When you optimize the images:

  1. The file content changes (even if dimensions/format stay the same)
  2. The SHA1 hash of the optimized file is now different
  3. But the assets.json and data.ndjson files still reference the old hash
  4. When Sanity tries to match documents to assets during import, it looks for files matching those old hashes
  5. The mismatch means your document references point to assets that (from Sanity's perspective) don't exist, as the short sketch below illustrates
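To make the mismatch concrete, here is a minimal Node.js sketch (the file names are hypothetical) showing that optimizing a file in place changes its SHA1, and therefore the hash the asset ID is derived from:

// Hashing the same image before and after optimization yields different SHA1s,
// so the hash embedded in the exported asset ID no longer matches the file.
// File names below are hypothetical.
const crypto = require('crypto')
const fs = require('fs')

const sha1 = (filePath) =>
  crypto.createHash('sha1').update(fs.readFileSync(filePath)).digest('hex')

console.log(sha1('images/original.jpg'))  // e.g. abc123def456...
console.log(sha1('images/optimized.jpg')) // a different hash: the reference breaks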

The images appear in the media gallery plugin because Sanity successfully imported the image files from the images folder and created new asset documents for them, with new IDs based on the optimized files' SHA1 hashes. However, your content documents (in data.ndjson) still reference the old asset IDs from before optimization, so those references are broken.
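You can confirm this from the new dataset: the imported asset documents exist, but under new hash-based IDs. A quick check, assuming client is a configured @sanity/client instance pointed at the new dataset:

// List the asset documents the import created; their _id values won't match
// the _ref values still stored in your content documents.
client
  .fetch('*[_type == "sanity.imageAsset"]{_id, originalFilename}')
  .then((assets) => console.log(assets))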

The Solution: Update Asset References

You have a few options:

Option 1: Use Sanity's Image Pipeline (Recommended)

Instead of manually optimizing, let Sanity's image pipeline handle optimization on demand. Sanity automatically serves optimized images via URL parameters without changing the source files:

// Images are optimized at request time by Sanity's CDN
const imageUrlBuilder = require('@sanity/image-url')

const builder = imageUrlBuilder(client) // client: a configured @sanity/client

builder
  .image(asset) // an image asset document or reference from your content
  .width(800)
  .format('webp')
  .quality(80)
  .url()

This way you keep original assets unchanged and get optimized delivery automatically through Sanity's CDN.

Option 2: Map Old to New Asset IDs After Import

If you must optimize manually:

  1. Before optimizing, note the mapping of old filename → old asset ID from your original export
  2. Optimize images and import the tar.gz to your new dataset
  3. After import, query the new dataset to get the new asset IDs for each imported image
  4. Run a content migration script to update all image references in your documents from old asset IDs to new ones

This would involve creating a migration that patches all documents with image fields:

// Migration sketch using @sanity/client; 'imageField' is a placeholder for
// your actual image field name, and the asset IDs below are examples.
const sanityClient = require('@sanity/client')
const client = sanityClient({projectId: '<project-id>', dataset: '<dataset>', token: process.env.SANITY_TOKEN, useCdn: false})

const assetMapping = {
  'image-oldHash123-1920x1080-jpg': 'image-newHash456-1920x1080-jpg',
  // ... map all your assets
}

;(async () => {
  // Find every document that references an image asset via the placeholder field
  const docs = await client.fetch('*[defined(imageField.asset._ref)]{_id, "ref": imageField.asset._ref}')
  for (const doc of docs) {
    const newRef = assetMapping[doc.ref]
    // Point the document at the re-uploaded (optimized) asset
    if (newRef) await client.patch(doc._id).set({'imageField.asset._ref': newRef}).commit()
  }
})()
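To build the mapping itself, one option (an assumption on my part, not something from the thread) is to match assets across the two datasets by originalFilename, which works here because the files were not renamed. Run this inside the async IIFE above; oldClient and newClient are hypothetical clients, one per dataset:

// Build the old → new ID mapping by matching on originalFilename.
const query = '*[_type == "sanity.imageAsset"]{_id, originalFilename}'
const [oldAssets, newAssets] = await Promise.all([oldClient.fetch(query), newClient.fetch(query)])

const newIdByName = Object.fromEntries(newAssets.map((a) => [a.originalFilename, a._id]))
const assetMapping = Object.fromEntries(oldAssets.map((a) => [a._id, newIdByName[a.originalFilename]]))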

Option 3: Recalculate Hashes Before Repacking

This is the most complex but "correct" approach if you want to maintain the export/import workflow:

  1. After optimizing each image, calculate its new SHA1 hash
  2. Rename the optimized image file to match the new hash format: <sha1>.<extension>
  3. Update the corresponding entry in assets.json with:
    • New _id (with new hash)
    • New sha1hash value
    • New size (file size in bytes)
    • Updated path to match new filename
  4. Update all references to that asset ID throughout data.ndjson to use the new hash-based ID
  5. Repack as tar.gz and import

This requires careful scripting to ensure consistency across all three pieces (the image files, assets.json, and data.ndjson); a sketch of the per-image hashing step follows.
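Here is a minimal sketch of steps 1-2 for a single image, assuming the <sha1>.<extension> naming described above (the file name is hypothetical; verify the pattern against the actual files in your own export before relying on it):

// Recompute the SHA1 of an optimized image and rename it to match.
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

const oldPath = 'images/oldHash123.jpg' // hypothetical optimized file
const newHash = crypto.createHash('sha1').update(fs.readFileSync(oldPath)).digest('hex')
fs.renameSync(oldPath, path.join('images', `${newHash}.jpg`))
// Remember to update the matching assets.json entry (_id, sha1hash, size, path)
// and every reference in data.ndjson to use the new hash.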

Why This Design Exists

The SHA1 hash system ensures content integrity and enables Sanity's deduplication—if the same image is uploaded twice, Sanity recognizes it by hash and stores only one copy. Changing file contents breaks this chain of trust.

For most use cases, Sanity's built-in image optimization via the CDN is the better approach—it's automatic, cached globally, supports WebP/AVIF formats, and doesn't require manual intervention or breaking asset references. The original high-quality assets remain unchanged in the Content Lake while optimized versions are served on-demand based on your URL parameters.

Original thread
Hi, could someone help please? I am trying to launch today.
const fs = require('fs')
const ndjson = require('ndjson')

// Recursively rewrite relative file URLs ('file://./') to absolute ones
function rebuildFileAssetUrls(val) {
  if (typeof val === 'string') {
    return val.replace('file://./', `file://${__dirname}/`)
  }
  if (Array.isArray(val)) {
    return val.map((item) => rebuildFileAssetUrls(item))
  }
  if (typeof val === 'object' && val !== null) {
    const newItem = {}
    for (const [key, child] of Object.entries(val)) {
      newItem[key] = rebuildFileAssetUrls(child)
    }
    return newItem
  }
  // Primitives (numbers, booleans, null, undefined) pass through unchanged
  return val
}

;(async () => {
  // Read the export's data.ndjson into memory
  const data = await new Promise((resolve) => {
    const temp = []
    fs.createReadStream('./data.ndjson')
      .pipe(ndjson.parse())
      .on('data', (obj) => temp.push(obj))
      .on('end', () => resolve(temp))
  })

  const writeStream = fs.createWriteStream('./output_0.ndjson')
  const serialize = ndjson.stringify()

  // Each serialized line is stringified JSON with a trailing newline delimiter
  serialize.pipe(writeStream)

  writeStream.on('finish', () => {
    console.log('wrote all data to file')
  })

  data.forEach((item) => {
    serialize.write(rebuildFileAssetUrls({...item}))
  })

  serialize.end()
})()

Leaving this script here for anyone who has this problem again.

for this "amazing" community of 13 K members ! thank you very much!!!!
