Skip to content
Sanity
Get started
  • Sanity Studio - Flexible editing environment
  • Content Lake - Real-time database
  • Developer experience - Tooling you love
  • Structured content - The philosophy behind Sanity
  • Review changes - View edits & rollback instantly
  • Image pipeline - On-demand transformations
  • E-commerce - Better shopping experiences
  • Marketing sites - Control your story
  • Products & services - Innovate and automate
  • Mobile apps - Content backend for every OS
  • Aether - Unique digital shopping experience
  • Morning Brew - Omnichannel media distribution
  • InVision - Delivering exceptional customer experiences
  • DataStax - Personalization for global audience
  • React
  • Gatsby
  • Next
  • Nuxt
  • Eleventy
  • Netlify
  • Vercel
  • Algolia
  • Documentation
  • Reference
  • Guides
  • Resource library
  • Headless CMS
  • Tools & plugins
  • Project showcase
  • Schemas & snippets
  • Agency partners
  • Technology partners
  • Get support
  • Share your work
  • 5 Disadvantages Of Wordpress That Are Holding You Back
EnterprisePricing
Contact salesLog inGet started
Published October 27th 2020

Mendoza - The totally non-human readable diff format for structured JSON documents

Mendoza is a new, super efficient format for expressing differences between JSON documents.

Magnus Holm

Magnus Holm is a principal engineer with special focus on the document store.

When we started work on the recently released feature Review Changes, we needed a way to keep a significant part of the edit history of a document in the browser memory to be able to respond quickly to different user interface states. As the user picked various document versions to compare we wanted to be able to quickly reconstruct a specific section of the history of a document.

Review Changes for Sanity Studio. Powered by Mendoza.

For text diffs, we use the diff-match-patch format, and we just assumed someone would have implemented a similarly efficient and compact diff format for JSON documents, but no such luck. If we wanted a general JSON diff format that was super compact and fast to apply, we would have to invent it ourselves. And thus, Mendoza, the totally non-human readable diff format for structured JSON documents, was born.

Mendoza is:

  • Lightweight JSON format
  • A flexible format that can accommodate more advanced encodings in the future
  • As a Go library for encoding and decoding
  • A JavaScript library for decoding
  • Efficient handling of the renaming of fields
  • Efficient handling of reordering of arrays
  • Not designed to be human-readable

Mendoza differs (hah!) from normal diffs as they are:

  • Made for humans to read and understand and based on simple operations (like keep, insert, and delete text)
  • Possible to apply even if the source has changed a bit by including some of the contexts around every part that has changed
  • Designed for text, and not structured documents

Now, this is great when you are collaborating with humans on code development and use something like git to track your changes. What it isn’t great for, however, is expressing differences between structured documents (such as JSON) in a compact manner that can be efficiently transferred over the network and parsed in JavaScript inside of browsers.

Most diffs aren't meant for machines

Most diff formats are made to be human-readable. Take these two documents, where a key and the array have some changes between them:

{
  "name": "Bob",
  "EXPERIMENTAL_foobar": {"key": "123"},
  "values": [
    {"_key": "abc"},
    {"_key": "def"},
    {"_key": "ghi"}
  ]
}
{
  "name": "Bob",
  "foobar": {"key": "123"},
  "values": [
    {"_key": "ghi"},
    {"_key": "abc"},
    {"_key": "def"}
  ]
}

If these where two commits, the Git diff between them would be expressed like this:

--- left.json	2020-10-06 21:49:06.000000000 +0200
+++ right.json	2020-10-06 21:49:35.000000000 +0200
@@ -1,9 +1,9 @@
 {
   "name": "Bob",
-  "EXPERIMENTAL_foobar": {"key": "123"},
+  "foobar": {"key": "123"},
   "values": [
+    {"_key": "ghi"},
     {"_key": "abc"},
-    {"_key": "def"},
-    {"_key": "ghi"}
+    {"_key": "def"}
   ]
-}
+}

This makes it somewhat practical for humans to understand what is going on when the latter change is applied. But as you can see, in terms of pure data, there is a lot of repetition going on. and expressing all changes with only plusses and minuses isn't very efficient.

The same diff with Mendoza is expressed like this:

[19,0,10,0,14,"foobar",11,2,21,2,3,21,0,2,15]

Mendoza constructs a minimal recipe for transforming a document into another. All it really does is to compare two JSON documents and figure out the most minimal way to express their difference as strings and integers in an array. You can use this difference to reconstruct the first document to the other.

How to read a Mendoza patch (even though you shouldn't)

A Mendoza patch consists of an array of integers and strings. The integers are opcodes (short for “operation codes”), 8-bit numbers that correspond to an operation. Opcodes take parameters as strings, positive numbers, or JSON values (that is: the actual data that is changing). The list of available opcodes is as follows, notice that 10-18 are composites of the preceding ones:

  • 0 Value​
  • 1 Copy​
  • 2 Blank​
  • 3 ReturnIntoArray​
  • 4 ReturnIntoObject​
  • 5 ReturnIntoObjectSameKey​
  • 6 PushField​
  • 7 PushElement​
  • 8 PushParent​
  • 9 Pop​
  • 10 PushFieldCopy
  • 11 PushFieldBlank
  • 12 PushElementCopy
  • 13 PushFieldBlank
  • 14 ReturnIntoObjectPop
  • 15 ReturnIntoObjectSameKeyPop
  • 16 ReturnIntoArrayPop
  • 17 ObjectSetFieldValue
  • 18 ObjectCopyField
  • 19 ObjectDeleteField​
  • 20 ArrayAppendValue​
  • 21 ArrayAppendSlice​
  • 22 StringAppendString​

Mendoza reads these opcodes from the patch and produces the resulting document from them. Depending on the patch, Mendoza might choose not to strictly follow the opcodes but take a simpler path. If every field and value has changed, for example, it’s more efficient just to replace the whole document with the new data without going through all the operations. Or if you have a list of objects where one has moved to another position and changed a key-value, Mendoza will manage to go back to the original and represent the change in a cheap way.

Run your own Mendoza

Mendoza is implemented in Go and can be found in this GitHub repository. We have also made a parser for Mendoza patches in JavaScript, that you can use in your own application.

Of course, you can dive into the source code for the Sanity Studio and explore how Mendoza is used there. If you want a slightly simpler use-case, you can also check out how we’re using Mendoza to simulate a part of Sanity’s real-time datastore in the browser to power the real-time preview for Next.js.

Why “Mendoza”?

Naming a project is always difficult. Since this project is focused on representing changes between JSON documents I naturally started thinking about names like "JSON differ, JSON changes, JSON patch, …". However, most of these names have already been used by existing projects. While I was muttering "JSON, JSON, JSON" it suddenly turned into "JSON, JSON, Jason, Jason Mendoza".

Jason Mendoza is a character from the show The Good Place, and while this project has little in common with the stupidest DJ from Florida, at least it's short and catchy.

Some caveats

Since a Mendoza patch is just describing the effect of a change it is also limited to work for the documents it was based on. It doesn’t come with guarantees for consistency if the document you apply it on has changed from the original in meanwhile. This is one of the tradeoffs that we needed to do to make it really compressed.

Page content

  • Most diffs aren't meant for machines
  • How to read a Mendoza patch (even though you shouldn't)
  • Run your own Mendoza
  • Why “Mendoza”?
  • Some caveats

Platform

Structured ContentDeveloper experienceContent LakeSanity StudioSecurity & Compliance
  • Sanity vs Contentful
  • Sanity vs Strapi
  • Sanity vs Wordpress
  • Sanity vs Adobe Experience Manager
  • Sanity vs Hygraph
  • Sanity vs Sitecore
  • Sanity vs Storyblok
  • Sanity vs Contentstack
  • Sanity vs Prismic
  • Sanity vs Drupal
  • Sanity vs ButterCMS

Resources

Documentation
  • React Blog
  • Gatsby Blog
  • Next.js Landing Pages
  • Progressive Web Application
  • Single Page Application
  • Svelte & Typescript App
  • Vue & Tailwind Blog
  • Developer Portfolio Templates
  • Form validation with Yup
  • Live Preview with Next.js and Sanity.io
Resource library
  • Agency partners
  • Technology partners
  • Blog Template
  • Personal Website Template
  • Developer Portfolio Templates
  • All Templates
Case Studies
  • Headless CMS
  • What is an API CMS
  • Static Sites 101
  • Headless SEO
  • Localization
  • GraphQL vs REST
  • What is a DXP?
  • Typescript 101
  • Content as a Service
  • Ecommerce SEO
  • React CMS
  • Next.JS CMS
  • CMS for Shopify
  • Content platform
  • Multilingual CMS
  • Static Site CMS
  • Gatsby CMS
  • Node CMS
  • E-commerce CMS
  • Vue CMS
  • Angular CMS
  • GraphQL CMS
  • Newspaper CMS
  • Magazine CMS
  • CMS for apps
  • Remix CMS

Company

Contact SalesEnterpriseCareersTerms of ServiceAccessibility Statement

Stay connected

  • GitHub
  • Slack
  • Twitter
  • YouTube
  • Stack Overflow
  • Blog RSS
  • Newsletter
©Sanity 2023