We're open sourcing GROQ: A query language for JSON documents

Today we're open sourcing the spec for Sanity's query language GROQ. We're also developing tooling so you can use its powerful features for filtering, projection and sorting anywhere you might be stuck with JSON document collections.

So what can you use GROQ for? Say you have a perplexing urge to filter and transform those 20,000 posts sitting in a Sanity dataset. First of all, you only want the ones with “JavaScript” in the title. Oh, and you only need those with more than four code examples in the body text. And you don't need all the fields, just the title and the names of the referenced authors. And yeah, order the results by publish date while you're at it, HAL.

The following is how you would do this today with Sanity’s query language GROQ:

// indented for readability
*[
  title match "JavaScript"
  && count(body[_type == "code"]) > 4
] | order(publishedAt asc) {
  title,
  authors[]->{name}
}
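To see what each part of that query does, here is a rough plain-TypeScript equivalent run against a handful of in-memory documents. It is only a sketch under stated assumptions, not how GROQ is evaluated: the document shapes (posts with a `body` array of blocks and `authors` as `{_ref}` references, plus separate author documents) are made up for illustration, and `match` is approximated as a case-insensitive whole-word check, whereas GROQ's own `match` has its own tokenization rules. Sorting happens before the projection, since the projected objects no longer carry `publishedAt`.

```typescript
// Hypothetical document shapes, made up for this sketch.
type Ref = { _ref: string };
type Block = { _type: string };
type Doc = {
  _id: string;
  _type: string;
  title?: string;
  name?: string;
  publishedAt?: string;
  body?: Block[];
  authors?: Ref[];
};

const dataset: Doc[] = [
  { _id: "a1", _type: "author", name: "Ada" },
  { _id: "a2", _type: "author", name: "Grace" },
  {
    _id: "p1", _type: "post", title: "Modern JavaScript tips",
    publishedAt: "2019-03-01",
    body: Array.from({ length: 5 }, () => ({ _type: "code" })),
    authors: [{ _ref: "a1" }],
  },
  {
    _id: "p2", _type: "post", title: "Intro to JavaScript",
    publishedAt: "2018-11-12",
    body: [...Array.from({ length: 6 }, () => ({ _type: "code" })), { _type: "block" }],
    authors: [{ _ref: "a1" }, { _ref: "a2" }],
  },
  {
    _id: "p3", _type: "post", title: "JavaScript in one example",
    publishedAt: "2017-01-01",
    body: [{ _type: "code" }],
    authors: [{ _ref: "a2" }],
  },
];

// Rough stand-in for `title match "JavaScript"`: case-insensitive whole-word test.
function matches(text: string | undefined, word: string): boolean {
  if (text === undefined) return false;
  return text.toLowerCase().split(/\W+/).includes(word.toLowerCase());
}

// Stand-in for `->`: find the document whose _id equals the reference's _ref.
function deref(ref: Ref): Doc | undefined {
  return dataset.find((d) => d._id === ref._ref);
}

// ISO 8601 dates of equal length sort correctly with plain string comparison.
const byPublishedAt = (x: Doc, y: Doc): number => {
  const a = x.publishedAt ?? "";
  const b = y.publishedAt ?? "";
  return a < b ? -1 : a > b ? 1 : 0;
};

const result = dataset
  // *[title match "JavaScript" && count(body[_type == "code"]) > 4]
  .filter((d) =>
    matches(d.title, "JavaScript") &&
    (d.body ?? []).filter((b) => b._type === "code").length > 4)
  // | order(publishedAt asc)
  .sort(byPublishedAt)
  // {title, authors[]->{name}}
  .map((d) => ({
    title: d.title,
    authors: (d.authors ?? []).map((r) => ({ name: deref(r)?.name })),
  }));

console.log(JSON.stringify(result, null, 2));
```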

GROQ is short for Graph-Relational Object Queries. It was born out of the work on our real-time document store, which Simen Svale Skogsrud and Alexander Staubo started developing in 2015. Erik Grinaker has since contributed follow-up work on consistency, and Magnus Holm has recently started building a parser toolkit that lets us easily build GROQ parsers in different languages.

Today, millions of GROQ queries serve content from Sanity to websites, apps, and other systems every day. Since GROQ is a general-purpose way of querying and reshaping collections of JSON documents, it's useful in its own right and doesn't need to be tied to our systems.

Now that GROQ has been in production for a while and we are pinning down how we want it to work, it's time to open source our work on it.

Check out our introduction to GROQ, the specification, and the CLI tool.

The specification

To kick off the open sourcing of GROQ, we are publishing the working draft of the GROQ specification. While we have thorough reference documentation for how GROQ works with our hosted content store, there are still some corners we want to sort out before bumping the spec to 1.0. We obviously didn't get everything right on the first try. The specification allows implementations to add their own features and functions. For example, in Sanity we have added the identity() function, which makes it possible to query with user permissions.

Of course, the specification is also written and stored in our Sanity project. We took inspiration from GraphQL's specification and used spec-md, the excellent tool by GraphQL creator Lee Byron, to generate the user-friendly website. A small script fetches the documents from Sanity and converts them into the Markdown files that are checked into the GitHub repository.

You are welcome to post questions in the repository, and we will answer them on a best effort basis.

The Parser and the CLI

Our new backend engineer, Magnus Holm, has done some pretty interesting work with parsing technologies and is working on Glush, a “parser-parser” that we can use to generate a GROQ parser (or parsers for other languages). We used it to generate a GROQ parser for JavaScript that runs in Node or the browser. This opens up all sorts of interesting use cases that we can't wait to explore.

We have wrapped the parser in a CLI tool that lets you query (ND)JSON from a file, a URL, or standard input (that is, data piped in from another command). Say you want to quickly output only the titles of the todos from an API that haven't been completed. You can do something like this:

$ curl -s https://jsonplaceholder.typicode.com/todos | groq "*[completed == false]{title}" --pretty
[
  {
    "title": "delectus aut autem"
  },
  {
    "title": "quis ut nam facilis et officia qui"
  },
  {
    "title": "fugiat veniam minus"
  },
  {
    "title": "laboriosam mollitia et enim quasi adipisci quia provident illum"
  },
  {
    "title": "qui ullam ratione quibusdam voluptatem quia omnis"
  },
  {
    "title": "illo expedita consequatur quia in"
  },
  {
    "title": "molestiae perspiciatis ipsa"
  },
  {
    "title": "et doloremque nulla"
  },
  // ...
]
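The todo query is about as small as GROQ gets: a filter followed by a projection. As a sketch, the same two steps in plain TypeScript look like this; the `Todo` shape mirrors the fields JSONPlaceholder returns, but the rows below are a small made-up sample, not the real API response.

```typescript
// Shape of one todo from the API used above.
type Todo = { userId: number; id: number; title: string; completed: boolean };

// Made-up sample rows standing in for the API response.
const todos: Todo[] = [
  { userId: 1, id: 1, title: "delectus aut autem", completed: false },
  { userId: 1, id: 2, title: "made-up finished task", completed: true },
  { userId: 1, id: 3, title: "fugiat veniam minus", completed: false },
];

// *[completed == false]{title}
const openTodos = todos
  // Filter: keep only documents where the predicate is true.
  .filter((t) => t.completed === false)
  // Projection: build a new object that contains only `title`.
  .map((t) => ({ title: t.title }));

console.log(JSON.stringify(openTodos, null, 2));
```

Using `=== false` rather than a falsy check mirrors `completed == false` in GROQ, where a document whose `completed` is missing or `null` does not equal `false` and is therefore not selected.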

This is only a simple example; there's way more you can do with GROQ. Say you wanted to quickly count the Grass- and Poison-type Pokémon in the Pokédex:

$ groq '{ "grass": count(*["Grass" in type]), "poison": count(*["Poison" in type]) }' \
  --url https://raw.githubusercontent.com/Biuni/PokemonGO-Pokedex/master/pokedex.json \
  --pretty --primary pokemon

> {
    "grass": 14,
    "poison": 33
  }
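The Pokédex query shows two more things: a query can return a single object rather than an array, and `in` tests whether a value occurs in an array. A TypeScript sketch of the same logic follows. It assumes, as the flag name suggests, that `--primary pokemon` makes the array under the file's top-level `pokemon` key the set of documents `*` ranges over; the four entries are a tiny hand-picked sample, not the real file.

```typescript
// Tiny sample shaped like the Pokédex file: a top-level object whose
// `pokemon` key holds the array of documents.
type Pokemon = { name: string; type: string[] };
const pokedex: { pokemon: Pokemon[] } = {
  pokemon: [
    { name: "Bulbasaur", type: ["Grass", "Poison"] },
    { name: "Charmander", type: ["Fire"] },
    { name: "Ekans", type: ["Poison"] },
    { name: "Tangela", type: ["Grass"] },
  ],
};

// Assumed effect of `--primary pokemon`: `*` ranges over pokedex.pokemon.
const docs = pokedex.pokemon;

// count(*[T in type]): number of documents whose `type` array contains T.
const countWithType = (t: string): number =>
  docs.filter((p) => p.type.includes(t)).length;

// { "grass": ..., "poison": ... }: the query builds one object, not an array.
const summary = { grass: countWithType("Grass"), poison: countWithType("Poison") };
console.log(JSON.stringify(summary, null, 2));
```

Because `in` compares strings exactly, a misspelled type name does not raise an error; it simply counts zero.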

Or, let's query the Nobel Prize API for female laureates in physics:

$ groq '*[gender == "female" && "physics" in prizes[].category]{firstname, surname}' \
  --url http://api.nobelprize.org/v1/laureate.json \
  --pretty --primary laureates
  
> [
    {
      "firstname": "Marie",
      "surname": "Curie, née Sklodowska"
    },
    {
      "firstname": "Maria",
      "surname": "Goeppert Mayer"
    },
    {
      "firstname": "Donna",
      "surname": "Strickland"
    }
  ]
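In the Nobel query, `prizes[].category` traverses the `prizes` array and collects each prize's `category`, so `"physics" in prizes[].category` asks whether any of a laureate's prizes was in physics. Here is a TypeScript sketch of that logic; the records are trimmed down to the fields the query uses, and their exact shape is an assumption rather than the API's full schema.

```typescript
type Prize = { year: string; category: string };
type Laureate = { firstname: string; surname: string; gender: string; prizes: Prize[] };

// Small sample, trimmed to the fields the query touches.
const laureates: Laureate[] = [
  { firstname: "Marie", surname: "Curie, née Sklodowska", gender: "female",
    prizes: [{ year: "1903", category: "physics" }, { year: "1911", category: "chemistry" }] },
  { firstname: "Dorothy", surname: "Crowfoot Hodgkin", gender: "female",
    prizes: [{ year: "1964", category: "chemistry" }] },
  { firstname: "Albert", surname: "Einstein", gender: "male",
    prizes: [{ year: "1921", category: "physics" }] },
];

// prizes[].category: traverse the array and collect each element's `category`.
const categories = (l: Laureate): string[] => l.prizes.map((p) => p.category);

// *[gender == "female" && "physics" in prizes[].category]{firstname, surname}
const femalePhysics = laureates
  .filter((l) => l.gender === "female" && categories(l).includes("physics"))
  .map((l) => ({ firstname: l.firstname, surname: l.surname }));

console.log(JSON.stringify(femalePhysics, null, 2));
```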

We can also pipe the result of one GROQ query into another. Let's say we want to count how many physics laureates have been men:

$ groq '*[gender == "male" && "physics" in prizes[].category]{firstname, surname}' \
  --url http://api.nobelprize.org/v1/laureate.json --primary laureates | groq 'count(*)'
  
> 206
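Piping works because the second `groq` treats the JSON array written by the first as its documents, so `count(*)` is just the length of that array. A TypeScript sketch of the two stages, using a small hand-made sample of laureate records (shape assumed, trimmed to the fields used):

```typescript
type Laureate = {
  firstname: string;
  surname: string;
  gender: string;
  prizes: { category: string }[];
};

// Hand-made sample with a reduced, assumed shape.
const laureates: Laureate[] = [
  { firstname: "Albert", surname: "Einstein", gender: "male", prizes: [{ category: "physics" }] },
  { firstname: "Marie", surname: "Curie, née Sklodowska", gender: "female",
    prizes: [{ category: "physics" }, { category: "chemistry" }] },
  { firstname: "Niels", surname: "Bohr", gender: "male", prizes: [{ category: "physics" }] },
  { firstname: "Linus", surname: "Pauling", gender: "male",
    prizes: [{ category: "chemistry" }, { category: "peace" }] },
];

// First query: *[gender == "male" && "physics" in prizes[].category]{firstname, surname}
function firstQuery(docs: Laureate[]): { firstname: string; surname: string }[] {
  return docs
    .filter((l) => l.gender === "male" && l.prizes.map((p) => p.category).includes("physics"))
    .map((l) => ({ firstname: l.firstname, surname: l.surname }));
}

// Second query: count(*), where * is whatever array the first query produced.
function secondQuery(docs: unknown[]): number {
  return docs.length;
}

// The shell pipe hands over JSON text, so round-trip through JSON to mimic it.
const piped: unknown[] = JSON.parse(JSON.stringify(firstQuery(laureates)));
const maleCount = secondQuery(piped);
console.log(maleCount); // prints 2
```

The same number could also be computed in a single query by wrapping the filter in `count(*[...])`, as the Pokédex example does; piping is handy when you want to inspect or reuse the intermediate result.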

What's next?

As much as we try to avoid clichés, we have to point out that this is just the beginning for GROQ. We will continue working on the specification, and the tooling for it, making it even better and more capable. We are very excited about the prospect of running GROQ in the browser and what it can let us do for Sanity Studio and other applications. We hope it can be useful for you as well, and can't wait to see what you'll do with it.
