Bulk optimized images not showing in Studio after re-importing .tar.gz dataset
I understand the issue you're experiencing. When you optimize images manually and repack them, the SHA1 hash in the asset IDs no longer matches the modified image files, which is why the images aren't showing up in your documents even though they appear in the media gallery.
The Problem: Asset ID Hash Mismatch
Sanity image asset IDs follow a specific format: image-<sha1-hash>-<dimensions>-<format> (e.g., image-abc123def456-1920x1080-jpg, with the hash shortened here; a real SHA1 hash is 40 hexadecimal characters). This SHA1 hash is a cryptographic fingerprint of the original image file contents. When you optimize the images:
- The file content changes (even if dimensions/format stay the same)
- The SHA1 hash of the optimized file is now different
- But the assets.json and data.ndjson (or data.jsonld) files still reference the old hash
- When Sanity tries to match documents to assets during import, it looks for files matching those old hashes
- The mismatch means your document references point to assets that (from Sanity's perspective) don't exist
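To make the ID format concrete, here is a minimal sketch that splits an asset ID into its parts. The function name `parseImageAssetId` is made up for illustration, and the pattern assumes the `image-<sha1>-<width>x<height>-<format>` layout described above.

```javascript
// Split a Sanity image asset ID into its parts.
// ASSUMPTION: IDs follow image-<sha1>-<width>x<height>-<format> as described above.
function parseImageAssetId(id) {
  const match = /^image-([a-zA-Z0-9]+)-(\d+)x(\d+)-([a-z0-9]+)$/.exec(id)
  if (!match) {
    throw new Error(`Not a recognizable image asset ID: ${id}`)
  }
  const [, sha1, width, height, format] = match
  return { sha1, width: Number(width), height: Number(height), format }
}

const parsed = parseImageAssetId('image-abc123def456-1920x1080-jpg')
console.log(parsed)
```

Only the `sha1` segment depends on the file bytes, which is why re-compressing an image invalidates the ID even when the dimensions and format are unchanged.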
Why They Show in Media Gallery
The images appear in the media gallery plugin because Sanity successfully imported the image files from the images folder and created new asset documents for them with new IDs based on the optimized files' SHA1 hashes. However, your content documents (in data.jsonld) still reference the old asset IDs from before optimization, so those references are broken.
The Solution: Update Asset References
You have a few options:
Option 1: Optimize Images Within Sanity (Recommended)
Instead of manually optimizing, let Sanity's Image Pipeline handle optimization on-demand. Sanity automatically serves optimized images via URL parameters without changing the source files:
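// Images are optimized at request time; the stored source asset is not modified
import imageUrlBuilder from '@sanity/image-url'
// `client` is your configured Sanity client, `asset` an image field or asset reference
const url = imageUrlBuilder(client)
  .image(asset)
  .width(800)
  .format('webp')
  .quality(80)
  .url()
This way you keep the original assets unchanged and get optimized delivery automatically through Sanity's CDN.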
Option 2: Map Old to New Asset IDs After Import
If you must optimize manually:
- Before optimizing, note the mapping of old filename → old asset ID from your original export
- Optimize images and import the tar.gz to your new dataset
- After import, query the new dataset to get the new asset IDs for each imported image
- Run a content migration script to update all image references in your documents from old asset IDs to new ones
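The mapping from the first and third steps could be built along these lines. This is only a sketch: it assumes you have two lists of asset documents (for example from a GROQ query such as `*[_type == "sanity.imageAsset"]{_id, originalFilename}` run against each dataset) and that `originalFilename` survived the optimize and re-import round trip. The helper name `buildAssetMapping` and the sample data are hypothetical.

```javascript
// Build an old-ID -> new-ID lookup by joining two asset lists on a shared key.
// ASSUMPTION: each asset looks like { _id, originalFilename } and the filename
// is unique and unchanged after optimization; swap in another key if it is not.
function buildAssetMapping(oldAssets, newAssets) {
  const newIdByFilename = new Map(
    newAssets.map(asset => [asset.originalFilename, asset._id])
  )
  const mapping = {}
  const unmatched = []
  for (const asset of oldAssets) {
    const newId = newIdByFilename.get(asset.originalFilename)
    if (newId) {
      mapping[asset._id] = newId
    } else {
      unmatched.push(asset._id) // no counterpart found; review these by hand
    }
  }
  return { mapping, unmatched }
}

// Hypothetical sample data standing in for the two query results
const oldAssets = [
  { _id: 'image-oldHash123-1920x1080-jpg', originalFilename: 'hero.jpg' },
  { _id: 'image-oldHash789-800x600-png', originalFilename: 'logo.png' },
]
const newAssets = [
  { _id: 'image-newHash456-1920x1080-jpg', originalFilename: 'hero.jpg' },
]
console.log(buildAssetMapping(oldAssets, newAssets))
```

Returning the unmatched IDs separately makes it easier to spot assets that were renamed or dropped before any document is patched.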
The migration in the last step patches every document that has image fields:
// Pseudocode for migration: `documents` stands for the documents you fetched
const assetMapping = {
  'image-oldHash123-1920x1080-jpg': 'image-newHash456-1920x1080-jpg',
  // ... map all your assets
}
// Update all documents with image references
documents.forEach(doc => {
  const oldRef = doc.imageField?.asset?._ref
  // Only rewrite refs that have a mapping, so unmapped refs are not set to undefined
  if (oldRef && assetMapping[oldRef]) {
    doc.imageField.asset._ref = assetMapping[oldRef]
  }
})
Option 3: Recalculate Hashes Before Repacking
This is the most complex but "correct" approach if you want to maintain the export/import workflow:
- After optimizing each image, calculate its new SHA1 hash
- Rename the optimized image file to match the new hash format: <sha1>.<extension>
- Update the corresponding entry in assets.json with:
  - New _id (with new hash)
  - New sha1hash value
  - New size (file size in bytes)
  - Updated path to match new filename
- Update all references to that asset ID throughout data.jsonld to use the new hash-based ID
- Repack as tar.gz and import
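The hashing part of these steps can be sketched with Node's built-in `crypto` module. The helper name `rehashAsset` is made up for illustration, and the sketch assumes optimization kept the pixel dimensions and file format, so only the hash segment of the ID changes. It does not touch the file name or `path`; check your own export for the exact layout before renaming anything.

```javascript
const { createHash } = require('node:crypto')

// Compute the fields that change when an asset's bytes change.
// ASSUMPTION: dimensions and format are unchanged, so the new ID reuses them.
function rehashAsset(oldId, optimizedBytes) {
  const match = /^image-[a-zA-Z0-9]+-(\d+x\d+)-([a-z0-9]+)$/.exec(oldId)
  if (!match) {
    throw new Error(`Unexpected asset ID: ${oldId}`)
  }
  const [, dimensions, format] = match
  const sha1hash = createHash('sha1').update(optimizedBytes).digest('hex')
  return {
    _id: `image-${sha1hash}-${dimensions}-${format}`,
    sha1hash,
    size: optimizedBytes.length, // file size in bytes
  }
}

// Stand-in bytes; in practice read the optimized file, e.g. with fs.readFileSync
const optimized = Buffer.from('optimized image bytes')
console.log(rehashAsset('image-abc123def456-1920x1080-jpg', optimized))
```

Collecting the returned `_id` values next to the old IDs gives you the old-to-new lookup needed for the reference update in the following step.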
This requires careful scripting to ensure consistency across all three files (images, assets.json, and data.jsonld).
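For the reference update, one possible approach is to walk every document in the NDJSON file and swap any `_ref` that appears in an old-to-new lookup. The names `remapRefs` and `remapNdjson` are hypothetical, and the sketch assumes references are stored as `{ _ref: '<asset id>' }` objects, as in the migration example earlier; verify this against your own export first.

```javascript
// Recursively copy a value, replacing any `_ref` string found in the mapping.
// ASSUMPTION: asset references are plain { _ref: '<asset id>' } objects.
function remapRefs(value, mapping) {
  if (Array.isArray(value)) {
    return value.map(item => remapRefs(item, mapping))
  }
  if (value !== null && typeof value === 'object') {
    const result = {}
    for (const [key, child] of Object.entries(value)) {
      const isMappedRef =
        key === '_ref' &&
        typeof child === 'string' &&
        Object.prototype.hasOwnProperty.call(mapping, child)
      result[key] = isMappedRef ? mapping[child] : remapRefs(child, mapping)
    }
    return result
  }
  return value // strings, numbers, booleans, null pass through untouched
}

// Apply the remap to NDJSON text: one JSON document per non-empty line.
function remapNdjson(ndjsonText, mapping) {
  return ndjsonText
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.stringify(remapRefs(JSON.parse(line), mapping)))
    .join('\n')
}
```

Because the walk is not tied to a field name, it also covers images nested in arrays or portable text blocks, while references missing from the lookup are left as they were.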
Why This Design Exists
The SHA1 hash system ensures content integrity and enables Sanity's deduplication—if the same image is uploaded twice, Sanity recognizes it by hash and stores only one copy. Changing file contents breaks this chain of trust.
For most use cases, Sanity's built-in image optimization via the CDN is the better approach—it's automatic, cached globally, supports WebP/AVIF formats, and doesn't require manual intervention or breaking asset references. The original high-quality assets remain unchanged in the Content Lake while optimized versions are served on-demand based on your URL parameters.