Guide

Introduction to content modeling

This guide introduces you to how you can approach content modeling for your digital projects. How you organize and structure your content is arguably as important as, if not more important than, the technology choices you make in any given project.

Knut Melvær

Knut runs developer relations at Sanity.io.

Because your texts, images, and other media are the true interfaces between whatever you do, and whomever you're doing it for. That's where meaning is communicated. And that's making sure it's kept access

We dare say that with Sanity.io, you have one of the most flexible and versatile tools for structuring your content. You can do it the way you need it for your code, while generating a user-friendly interface for editors. Before we dive into the ins and outs of content modeling, allow us to paint a picture of what this flexibility with Sanity entails.

The Sanity Studio supports nested content structures out of the box. You define your content model in plain JavaScript objects, which in turn generates the editorial interface. You can define custom content models, which can be reused across document types. You can add references from any level to one or multiple document types. They are indexed and queryable and will keep you from accidentally deleting documents that have references to them.

Rich text as structured content

Another important idea behind Sanity is that rich text is treated as a first class citizen of structured content. Behind the rich text editor you will find deeply typed Portable Text, which in essence is an array of objects with the necessary properties to describe the semantics of the rich text. It's not bound to only the typographic semantics we are used to from word processors, such as emphasis or bold. It may as well be the semantics of voice assistants, or some other interface beyond the strictly typographic realm. You can at any time add additional inline data-objects and block types.
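To make the idea of "an array of objects" concrete, here is a minimal sketch of what a short Portable Text value can look like, together with a small function that flattens it to plain text. The sample content, the _key values, and the toPlainText helper are illustrative assumptions, not part of Sanity's libraries:

```javascript
// A hand-written sample of Portable Text: one block with two spans.
// The _key values are made up for illustration.
const bio = [
  {
    _type: 'block',
    _key: 'a1',
    style: 'normal',
    markDefs: [],
    children: [
      { _type: 'span', _key: 'a1s1', text: 'Knut runs ', marks: [] },
      { _type: 'span', _key: 'a1s2', text: 'developer relations', marks: ['strong'] }
    ]
  }
]

// Hypothetical helper: join the text of all spans in each block,
// and separate blocks with a blank line.
// Non-block items (for example images) are skipped.
function toPlainText (blocks = []) {
  return blocks
    .filter(block => block._type === 'block' && Array.isArray(block.children))
    .map(block => block.children.map(child => child.text || '').join(''))
    .join('\n\n')
}

console.log(toPlainText(bio)) // prints "Knut runs developer relations"
```

Because the marks live next to the text instead of inside it, the same value could just as well be rendered to HTML, to speech markup, or to something else entirely.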

With Portable Text and Sanity you can decide how tight you want the connection between content and presentation. The point is to always keep the content structured so it's easy to adapt when the presentation changes. There are helper libraries that make rendering Portable Text in popular frontend frameworks easier.

Sanity’s schema types come in three flavors:

You can find all of Sanity’s types in the reference documentation. These are the building blocks on which you define your content model. You can also publish your own custom schema types on npm and let other people download them to their studios. For example, you can get document types for podcasts, episodes, and sponsors by running sanity install podcast in your project.

Querying your content model

With GROQ you can filter your documents by any field and join documents in different ways. You can project only the fields you need, rename them in the query, and also do sub-queries. It frees you to build a content model that makes sense from an editorial perspective. GROQ gives you the best from NoSQL datastores and relational databases. We use it to power the GraphQL API, which you can also use with Sanity if you wish.
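As a rough illustration of filtering, projecting, renaming, and joining in one query, here is a sketch that builds a query URL. The article type, its fields, the project ID, and the dataset name are all hypothetical, and the URL shape is an assumption about the HTTP query endpoint; check the API reference for the version prefix that applies to your project:

```javascript
// A GROQ query that filters, projects, renames, and follows a reference.
// The document type and field names are hypothetical.
const query = `*[_type == "article" && defined(author)]{
  title,
  "authorName": author->name
}`

// Hypothetical project settings; replace with your own.
const projectId = 'your-project-id'
const dataset = 'production'

// Build the URL for the HTTP query endpoint. The GROQ string must be URL-encoded.
const url = `https://${projectId}.api.sanity.io/v1/data/query/${dataset}?query=${encodeURIComponent(query)}`

console.log(url)
```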

You can also override how documents are listed out with Structure Builder, which also lends you flexibility in terms of workflows.

We made all of this to make it easy to get started with structured content, without painting yourself into a corner. That being said, it is worth taking the time to think about your content model, and to do so together with those you will be working with. There's always a cost that comes with changing your content model, since you have to update your code and run migrations. That's why it makes sense to us to also do the content modelling in code. You can put it into version control and more easily bootstrap it. You can also generate content models programmatically, and it makes it easier to add customizations, like previews and custom inputs.
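Since a content model is just JavaScript objects, generating parts of it can be as simple as mapping over a list. The following is a minimal sketch; the stringField helper and the address type are made up for illustration:

```javascript
// Hypothetical helper that builds a plain string field definition.
const stringField = (name, title) => ({ name, type: 'string', title })

// Generate several similar fields from a small list instead of typing them out.
const addressFields = [
  ['street', 'Street'],
  ['city', 'City'],
  ['country', 'Country']
].map(([name, title]) => stringField(name, title))

// The resulting object has the same shape as a hand-written object type.
const address = {
  name: 'address',
  type: 'object',
  title: 'Address',
  fields: addressFields
}

console.log(address.fields.map(field => field.name)) // [ 'street', 'city', 'country' ]
```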

General approaches to content modelling

Ultimately, content modelling is about capturing all your texts, images, videos, and files into buckets that make sense to those who will work with them, from designing and developing products and services to managing and sustaining this content over a longer time. Traditionally, content management systems have been used to power a website with a certain design, and we still see a lot of CMSs focused around the idea of a website, even those that claim to be “headless”. In most cases this forces you to think about your content in terms of pages and page layout, instead of structuring your content in a way that makes it reusable across many contexts, and hence, worth more.

Breaking out of the presentational cage

You will soon discover that some assumptions that you carry over from these CMSs will be challenged. This is a good thing. That doesn't mean you have to strive for a completely agnostic content model and not think about presentation at all. The point is that you should be circumspect about where in your content model you put these assumptions.

Begin with asking these questions:

  • Is there any content that is reused across pages, products, or services? (e.g. people, products, services, events, sponsors, contacts, vendors, and so on)
  • Are there any relationships where references can be used? (e.g. an article can reference a person as the author)
  • Do you need to structure content in taxonomies or hierarchies? (e.g. a document type for tableOfContents with an array of references to the document type for chapters)
  • What kind of media and content types do you need in your rich text fields? (e.g. references to products with some extra fields that can be embedded in marketing material)
  • How much, and what kind of, editorial control do you really need over how the content is presented? Can this be solved programmatically where the content is rendered? (e.g. a list of articles should be generated based on the visitor's browsing history on your site)

Chances are that with structured content you gain possibilities you haven't considered before, because it was too painful to parse rich text from HTML in a meaningful way, or too costly to query complex content models with RESTful APIs for just a small experiment.

Content-first, fewer surprises

By adopting a structured content approach, you will also be able to start the content work much earlier and way faster. Not only is this something that's recommended by content strategists, it also makes it easier for your developers and designers to anticipate what they need to do in order to build whatever you want to build. It will be easier to facilitate a multidisciplinary design process, where you can formulate rules and guidelines for your content, that in turn can be embedded in the Sanity Studio. For example, "give the editor a warning if their title is getting too long".
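A rule like the title warning could be expressed on a field roughly like this. This is a minimal sketch, assuming the validation API's max and warning methods; the 60-character limit and the message are arbitrary choices for illustration:

```javascript
// A title field that warns (rather than blocks publishing) when the title gets long.
// The limit and the wording are made up; pick whatever your style guide says.
const titleField = {
  name: 'title',
  type: 'string',
  title: 'Title',
  validation: Rule => Rule.max(60).warning('Shorter titles are usually easier to scan')
}
```

Using a warning instead of an error keeps the rule advisory: the editor gets the hint but can still publish.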

Having your content structured in a sensible way not only keeps you from doing double bookkeeping (for example, where you have to update a product name across all marketing copy instead of using a reference embedded in rich text and changing it in just one place), it also sets you up for the time when you need to redesign a website or add a new service.

Use the right tools for the right things

Sanity has a lot of features, but they all revolve around making it as easy to work with structured content as possible, without making too many assumptions about how you want to do things. In a way, Sanity is a "CMS construction kit" where the hard parts are solved, and the parts you would want control over are available for customization.

There are many things it doesn't do. You can't directly render a website with Sanity, but you can use its APIs to get whatever data you need in one of the many excellent web frameworks, be it Gatsby, Nuxt.js, create-react-app, or Laravel. You shouldn't deprecate a service that works for you, just because you decided on a new CMS. Do you have a Product Information Management (PIM) system? Cool, sync the relevant product information in real-time and let your editors make references to those documents, and get a unified API to query products and marketing at the same time.

You don't have to reinvent the whole wheel either. Content modelling is its own discipline, and there are loads of smart people who have written about it and given it a lot of thought. Search for "content strategy", and check out books like Designing Connected Content (2018) to learn more about structured content.

Example: Making a content model for a single track one-day conference

You probably don't need a content backend for a conference just now, but bear with us, it is a good case for showing the different features and ways to approach content modelling with Sanity. For a conference we need content on speakers, sessions (i.e. talks, panels and so on), code of conduct, sponsors, organizers, the program, and some general info about the time and place, as well as where to find tickets.

Event information

We begin with the event information. The conference should have a name, a description, an image, a schedule, a venue, some keywords, and organizers. We begin with adding a file called eventInformation.js to the schemas-folder and put in the following:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event information',
  fields: []
}

Here we have defined a new document type without any fields. Before we add those, let's import this file to schema.js, so it's loaded in the Studio, and we can follow the changes as we go along.

import createSchema from 'part:@sanity/base/schema-creator'
import schemaTypes from 'all:part:@sanity/base/schema-type'

import eventInformation from './eventInformation'

export default createSchema({
    name: 'events',
    types: schemaTypes.concat([
      eventInformation
    ])
})

If you start the Studio now it will complain about eventInformation not having any fields, so let's add those. We begin with the event name field, which is of the type string:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event Information',
  fields: [
    {
      name: 'name',
      type: 'string',
      title: 'Event name'
    }
  ]
}

Notice that we don't name it eventName, because you will most probably get it in code as eventInformation.name. With this configuration your studio will look something like this:

The Studio with an event name field

We can go on and add the description field. This content will probably be used for SEO-fields and be shown on Google and similar. It therefore makes sense that it's plain text, but we should allow for a bit more space and new lines, hence we use the text type:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event Information',
  fields: [
    {
      name: 'name',
      type: 'string',
      title: 'Event name'
    },
    {
      name: 'description',
      type: 'text',
      title: 'Description',
      description: 'Describe your event for search engines and social media.'
    }
  ]
}

Notice that we added a description key. You can use it to include small hints and tips for the editors, which is a great way to embed your style guide into the editorial interface.

The Studio with a description field

Now we want to add an image to promote the event. You could just add an image type like with the other fields, but it makes sense to make a field that we can reuse across different document types. We will therefore make a new file, mainImage.js, which we'll import into schema.js, the same way as with eventInformation.js.

// mainImage.js
export default {
  name: 'mainImage',
  title: 'Image',
  type: 'image',
  options: {
    hotspot: true
  },
  fields: [
    {
      name: 'caption',
      type: 'string',
      title: 'Caption',
      options: {
        isHighlighted: true
      }
    },
    {
      name: 'alt',
      type: 'string',
      title: 'Alternative text',
      description: 'Important for SEO and accessibility.',
      options: {
        isHighlighted: true
      }
    }
  ],
  preview: {
    select: {
      imageUrl: 'asset.url',
      title: 'caption'
    }
  }
}

There are some different things going on here. First you'll notice the options: { hotspot: true } configuration. This adds an interface for setting a crop and hotspot on the image. The crop/hotspot-selection will be saved to the document with a reference to the image asset. This lets you reuse the same image with different crops and hotspots across your dataset. With the asset pipeline you can get any dimension and crop of your image on demand, so you'll never need to upload more than one version.

The hotspot and crop interface
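To give an idea of what "on demand" can mean in practice, here is a sketch that requests a smaller rendition by appending query parameters to an asset URL. It assumes the image API accepts w, h, and fit as parameters; the resizedImageUrl helper and the asset URL are made-up placeholders, and a dedicated helper library is likely the better choice if you also want the saved crop and hotspot applied:

```javascript
// Hypothetical helper: append image transformation parameters to an asset URL.
// Assumes the image API accepts `w`, `h`, and `fit` as query parameters.
function resizedImageUrl (assetUrl, { width, height }) {
  const url = new URL(assetUrl)
  url.searchParams.set('w', String(width))
  url.searchParams.set('h', String(height))
  url.searchParams.set('fit', 'crop')
  return url.toString()
}

// The asset URL below is a made-up placeholder.
const thumbnail = resizedImageUrl(
  'https://cdn.sanity.io/images/your-project-id/production/abc123-2000x1500.jpg',
  { width: 400, height: 300 }
)

console.log(thumbnail)
```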

Additionally, there are some fields for caption and alternative text. The isHighlighted: true option will make these fields show under the image field, instead of being hidden behind an edit-button.

The image input field with additional fields

Lastly, there is a configuration for preview. Here we tell the Studio which fields it should select for the title and imageUrl where those can be previewed. More on this later. When this file is imported in schema.js, you can use it in eventInformation.js:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event Information',
  fields: [
    {
      name: 'name',
      type: 'string',
      title: 'Event name'
    },
    {
      name: 'description',
      type: 'text',
      title: 'Description',
      description: 'Describe your event for search engines and social media.'
    },
    {
      name: 'image',
      type: 'mainImage',
      title: 'Event image',
      description: 'The highest resolution'
    }
  ]
}

Notice how we use the name: 'mainImage' from mainImage.js as type: 'mainImage' in eventInformation.js.

We have found it to be a good practice to keep custom object types in their own files. So to add a schedule field with from and to times, we'll first create a new file called schedule.js, and place some datetime fields within:

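// schedule.js
import { isSameDay } from 'date-fns'

export default {
  name: 'schedule',
  type: 'object',
  title: 'Schedule',
  validation: Rule => Rule.custom(schedule => {
    // Skip the check until both times have been set
    if (!schedule || !schedule.from || !schedule.to) return true
    // Datetime values are stored as ISO strings, so convert them to Date objects
    return isSameDay(
      new Date(schedule.from),
      new Date(schedule.to)
    ) || 'Only one-day events are supported'
  }),
  fields: [
    {
      name: 'from',
      type: 'datetime',
      title: 'From'
    },
    {
      name: 'to',
      type: 'datetime',
      title: 'To'
    }
  ]
}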

Here we have yet another new thing: validation. Remember that we are making a one-day conference? Here we have installed date-fns via npm, and use its isSameDay function to check if the from and to dates are… on the same day. If they are not, isSameDay returns false and the expression falls back on the string, which explains what you need to fix before you can publish the changes. Remember that validations should be helpful and can be demotivating if you are overly zealous.

The Studio with custom validation returning an error
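The "true, or else a message" pattern used by the custom rule can be tried outside the Studio. Below is a minimal sketch with no dependencies: sameCalendarDay is a hypothetical stand-in for the date-fns function, and validateSchedule is a made-up name for the function you would pass to Rule.custom:

```javascript
// Hypothetical stand-in for date-fns' isSameDay: compares calendar days in local time.
function sameCalendarDay (a, b) {
  return (
    a.getFullYear() === b.getFullYear() &&
    a.getMonth() === b.getMonth() &&
    a.getDate() === b.getDate()
  )
}

// Returns true when valid, or an error message string when not.
function validateSchedule (schedule) {
  // Nothing to compare until both values are set.
  if (!schedule || !schedule.from || !schedule.to) return true
  return (
    sameCalendarDay(new Date(schedule.from), new Date(schedule.to)) ||
    'Only one-day events are supported'
  )
}

// Datetime fields are stored as ISO strings, so the samples are built the same way.
const morning = new Date(2019, 5, 1, 9, 0).toISOString()
const evening = new Date(2019, 5, 1, 17, 0).toISOString()
const nextDay = new Date(2019, 5, 2, 9, 0).toISOString()

console.log(validateSchedule({ from: morning, to: evening })) // true
console.log(validateSchedule({ from: morning, to: nextDay })) // the error message string
```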

A conference event tends to happen at a place, and people tend to want to know where that place is, so let's add a venue.js with some spatial information:

// venue.js
export default {
  name: 'venue',
  type: 'object',
  title: 'Venue',
  fields: [
    {
      name: 'name',
      type: 'string',
      title: 'Name of venue'
    },
    {
      name: 'city',
      type: 'string',
      title: 'City'
    },
    {
      name: 'postCode',
      type: 'string',
      title: 'Post code'
    },
    {
      name: 'country',
      type: 'string',
      title: 'Country'
    }
  ]
}

This object is fairly straightforward. You can get fancier and add the geopoint type to get coordinates, and install the google-maps input plugin for the studio. We'll keep this simple for now, and include it in our eventInformation.js:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event Information',
  fields: [
    {
      name: 'name',
      type: 'string',
      title: 'Event name'
    },
    {
      name: 'description',
      type: 'text',
      title: 'Description',
      description: 'Describe your event for search engines and social media.'
    },
    {
      name: 'image',
      type: 'mainImage',
      title: 'Event image',
      description: 'The highest resolution'
    },
    {
      name: 'venue',
      type: 'venue',
      title: 'Venue',
      description: 'Where will the event take place?'
    }
  ]
}

We could go all out and make a new document type for tickets, but we'll keep that simple for now, and just add a simple URL field to the schema where we can place whatever third-party ticketing service we prefer:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event Information',
  fields: [
    {
      name: 'name',
      type: 'string',
      title: 'Event name'
    },
    {
      name: 'description',
      type: 'text',
      title: 'Description',
      description: 'Describe your event for search engines and social media.'
    },
    {
      name: 'image',
      type: 'mainImage',
      title: 'Event image',
      description: 'The highest resolution'
    },
    {
      name: 'venue',
      type: 'venue',
      title: 'Venue',
      description: 'Where will the event take place?'
    },
    {
      name: 'ticket',
      type: 'url',
      title: 'Ticket link'
    }
  ]
}

You can't promote a conference without keywords?! We could just add a string field and ask you to put in some comma-separated keywords, but let's make this interesting. Instead, we will add an array of strings:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event Information',
  fields: [
    {
      name: 'name',
      type: 'string',
      title: 'Event name'
    },
    {
      name: 'description',
      type: 'text',
      title: 'Description',
      description: 'Describe your event for search engines and social media.'
    },
    {
      name: 'image',
      type: 'mainImage',
      title: 'Event image',
      description: 'The highest resolution'
    },
    {
      name: 'venue',
      type: 'venue',
      title: 'Venue',
      description: 'Where will the event take place?'
    },
    {
      name: 'ticket',
      type: 'url',
      title: 'Ticket link'
    },
    {
      name: 'keywords',
      type: 'array',
      title: 'Keywords',
      description: 'Add keywords that describe your event.',
      of: [{ type: 'string' }],
      options: {
        layout: 'tags'
      }
    }
  ]
}

We have named all the other fields in the singular, but for arrays we have found it a good convention to name the field in the plural, hence keywords, and not keyword. Additionally, we have added options: { layout: 'tags' } to get a better interface for this particular case:

The Studio with an array of string fields with a tags layout

This will produce a proper array of strings:

{
  "keywords": [
    "conference",
    "vue",
    "javascript"
  ]
}

This is handy for when you want to query it. For example, with the GROQ query *["vue" in keywords] you will find this document, and you don't have to split a string in the frontend and risk all sorts of unforeseen problems.
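To illustrate the kind of problem a comma-separated string invites, here is a small sketch that compares the two shapes in plain JavaScript. Both sample documents are made up:

```javascript
// Sample documents: one stores keywords as an array, the other as a comma-separated string.
const withArray = { keywords: ['conference', 'vue', 'javascript'] }
const withString = { keywords: 'conference, vue ,javascript' }

// With an array, membership is a direct check.
const arrayHasVue = withArray.keywords.includes('vue')

// With a string you must split and clean up first; a naive split leaves stray whitespace.
const naive = withString.keywords.split(',')
const naiveHasVue = naive.includes('vue')
const cleanedHasVue = naive.map(keyword => keyword.trim()).includes('vue')

console.log(arrayHasVue, naiveHasVue, cleanedHasVue) // true false true
```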

Lastly we want to add some information about who organizes this conference, and this is where things get interesting. Remember the point about reusing content by adding references? This is the time for that. An organizer is usually a person, and in many cases that person may also be a speaker, or something else. And chances are that you want kinda the same information about all persons connected to this event. So before we add a field for the event organizers, we have to make a new document type for person.

Person

We'll add a new file called person.js and kick off with this simple schema:

// person.js
export default {
  type: 'document',
  name: 'person',
  title: 'Person',
  fields: [
    {
      name: 'name',
      title: 'Name',
      type: 'string'
    }
  ]
}

It's kinda nice to have images of persons as well, so let's add that. Now we can reuse the mainImage type we have already made:

// person.js
export default {
  type: 'document',
  name: 'person',
  title: 'Person',
  fields: [
    {
      name: 'name',
      title: 'Name',
      type: 'string'
    },
    {
      name: 'image',
      title: 'Image',
      type: 'mainImage'
    }
  ]
}

There is a good chance that you will make some sort of dedicated page for a person. If you want to refer to a document in some sort of URL-scheme you can use the slug type:

// person.js
export default {
  type: 'document',
  name: 'person',
  title: 'Person',
  fields: [
    {
      name: 'name',
      title: 'Name',
      type: 'string'
    },
    {
      name: 'image',
      title: 'Image',
      type: 'mainImage'
    },
    {
      name: 'slug',
      title: 'Slug',
      type: 'slug',
      description: 'Some frontends will require a slug to be set to be able to show the person',
      options: {
        source: 'name',
        maxLength: 96
      }
    },
  ]
}

For the slug we can set some options that let you generate a slug from another field in the document, and define a maxLength. You can also add your own slug-generator if you wish (check the reference docs).

The slug field with a generate button
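If you do want your own slug generator, it could look something like the sketch below. The slugify function is a deliberately simple, ASCII-only assumption (names with other characters would need a transliteration step first), and wiring it in through options.slugify should be checked against the reference docs:

```javascript
// Hypothetical custom slug function: lowercase, hyphen-separated, ASCII only.
function slugify (input, maxLength = 96) {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything else into a single hyphen
    .replace(/^-+|-+$/g, '') // trim leading and trailing hyphens
    .slice(0, maxLength)
}

// How it might be wired into the slug field's options.
const slugField = {
  name: 'slug',
  type: 'slug',
  title: 'Slug',
  options: {
    source: 'name',
    slugify: input => slugify(input)
  }
}

console.log(slugify('The Main Organizer')) // the-main-organizer
```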

Note that this slug field produces a data structure that looks like this:

{
  "slug": {
    "_type": "slug",
    "current": "the-main-organizer"
  }
}

To query for this document, you can run this GROQ query: *[slug.current == "the-main-organizer"] or *[slug.current match "the-main*"].

The last field we want to give the person is for a biography. And this we want to be rich text. As we mentioned earlier, Sanity uses Portable Text to store rich text. To activate the rich text editor, you'll need to make an array of the type block. We will keep things tidy and create a file called bioPortableText.js:

// bioPortableText.js

export default {
  name: 'bioPortableText',
  type: 'array',
  title: 'Biography',
  of: [
    {
      title: 'Block',
      type: 'block',
      styles: [],
      lists: [],
      marks: {
        decorators: [
          {title: 'Strong', value: 'strong'},
          {title: 'Emphasis', value: 'em'},
          {title: 'Code', value: 'code'}
        ]
      }
    }
  ]
}

There's a lot going on here. What we have done is to strip back most of the default styles, list types, and basic formatting in the rich text field, keeping strong, emphasis and code. You can learn more about how to configure and customize the rich text field in the documentation. We will return to some other aspects of it later in this guide as well.

The Studio with the rich text field

Now that we have the document type for person in place, we can go ahead and add the organizers field in eventInformation.js:

// eventInformation.js
export default {
  name: 'eventInformation',
  type: 'document',
  title: 'Event Information',
  fields: [
    /*
     * the other fields
     * ...
     */
    {
      name: 'organizers',
      type: 'array',
      title: 'Organizers',
      description: 'Publish one or more persons and set a reference to them here.',
      validation: Rule => Rule.unique().error('You can only have one of a person'),
      of: [
        {
          type: 'reference',
          to: [{ type: 'person' }]
        }
      ]
    }
  ]
}

Here we have an array of the type reference to the type person. We also have a validation that keeps us from adding the same reference to a person twice.

The array of references to persons field

If you now inspect this document, you will find a data structure like this:

{
  "organizers": [
    {
      "_key": "438d7bb74d4e",
      "_ref": "6f4149cb-dcc3-4d00-8b2b-b091f57a9ff4",
      "_type": "reference"
    },
    {
      "_type": "reference",
      "_key": "514eb90585cb",
      "_ref": "8f97c677-3713-4204-99f8-00140b27c95a"
    }
  ]
}

The `_ref` here points to the document ID of each of these persons. It's easily queryable as well. To get the event information document with the list of the organizers' names and image URLs, you could run this GROQ query:

*[_type == "eventInformation"]{
  name,
  organizers[]->{
    name,
    "imageUrl": image.asset->url
  }
}

The arrow (->) tells the API that you want to follow the reference and return the fields you specify after it (if you leave it open, you'll get the whole document). In GraphQL it looks like this:

{
  allEventInformation {
    name
    organizers {
      name
      image {
        asset {
          url
        }
      }
    }
  }
}
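If it helps to picture what following a reference means, here is a sketch that does the same thing by hand over an in-memory set of documents. It is only an illustration of the idea, not how the API is implemented; the shortened document IDs, the sample names, and the followReferences helper are made up:

```javascript
// A tiny in-memory "dataset" keyed by document id (ids shortened for readability).
const documents = {
  'person-1': { _id: 'person-1', _type: 'person', name: 'Ada' },
  'person-2': { _id: 'person-2', _type: 'person', name: 'Grace' }
}

const eventInformation = {
  _type: 'eventInformation',
  name: 'Example Conf',
  organizers: [
    { _type: 'reference', _key: 'k1', _ref: 'person-1' },
    { _type: 'reference', _key: 'k2', _ref: 'person-2' }
  ]
}

// Hypothetical helper: look up each reference and keep only the requested fields.
function followReferences (references, fields) {
  return references.map(reference => {
    const target = documents[reference._ref] || {}
    return Object.fromEntries(fields.map(field => [field, target[field]]))
  })
}

const result = {
  name: eventInformation.name,
  organizers: followReferences(eventInformation.organizers, ['name'])
}

console.log(JSON.stringify(result))
// {"name":"Example Conf","organizers":[{"name":"Ada"},{"name":"Grace"}]}
```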

Now that we have event information, as well as persons in place, let's go on and add the content model for the program.

The program

If you have ever organized an event, you know that the program will change both in terms of its contents and order. It's also possible that you want to present the talks and sessions independently of when they're scheduled to appear in the program. And you perhaps want to offer different ways of browsing presenters in relation to their appearances and so on. And what you do want to avoid is to juggle a lot of timestamps manually when you move things around. Fortunately, with structured content, this isn't hard to achieve.

Document type for sessions

First we need a document type for what the program should be filled with, namely the talks, panels, and the breaks. There are different ways you can solve this, but for now we will just create a type that we call session. A session will have a title, an image, an array of references to persons, a sessionType, a short summary, and a long description. These are largely the same kinds of fields we have been through with the event information:

// session.js
export default {
  name: 'session',
  type: 'document',
  title: 'Session',
  fields: [
    {
      name: 'title',
      type: 'string',
      title: 'Title'
    },
    {
      name: 'image',
      type: 'mainImage',
      title: 'Image'
    },
    {
      name: 'persons',
      type: 'array',
      title: 'Persons',
      description: 'Who is responsible for this session?',
      of: [
        {
          name: 'person',
          type: 'personReference',
          title: 'Person'
        },
      ]
    },
    {
      name: 'sessionType',
      type: 'string',
      title: 'Session type',
      options: {
        list: [
          { value: 'keynote', title: 'Keynote' },
          { value: 'talk', title: 'Talk' },
          { value: 'break', title: 'Break' },
          { value: 'firesideChat', title: 'Fireside Chat' },
          { value: 'panel', title: 'Panel' }
        ]
      }
    },
    {
      name: 'summary',
      type: 'text',
      title: 'Short summary',
      description: 'For previews, social media etc.'
    },
    {
      name: 'description',
      type: 'bodyPortableText',
      title: 'Description'
    },
  ],
  preview: {
    select: {
      title: 'title',
      sessionType: 'sessionType',
      persons: 'persons',
      firstPerson: 'persons.0.name', // name of the first entry in the persons array
      media: 'image' // the image field defined above
    },
    prepare ({title, media, sessionType, persons, firstPerson}) {
      const renderPersons = (persons = [], firstPerson) => {
        if (!persons || persons.length === 0) return ''
        if (persons.length === 1) return firstPerson
        return `${firstPerson} and ${persons.length - 1} more`
      }
      return {
        title,
        media,
        subtitle: `${sessionType} ${renderPersons(persons, firstPerson)}`
      }
    }
  }
}

There are two things to note in this schema. The first is the definition for sessionType: this is a string type, but with an options.list that lets us predefine a dropdown with values that the editor can select from. We could also add sessionType as its own document type that we make references to, allowing editors to add their own types. In this case, though, we don't want the additional complexity. And you can still query documents by the value of sessionType (*[sessionType in ["talk", "keynote"]]).

The other thing we want you to take a closer look at is the preview. This doesn't have so much to do with the content model per se, but it is a consideration that you need to make for those who will use the Studio. In the select object we choose which of the fields we want to use in the preview, and assign them to a key. We destructure these keys in the prepare function, and return an object. For the subtitle, we have made a small function that lets us check if there is more than one person, and render a preview based on that.
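The subtitle logic is easy to try on its own. This sketch pulls the two functions out of the schema and runs them on made-up values shaped like what select would hand to prepare:

```javascript
// The same subtitle logic as in the preview, pulled out so it can be run on its own.
function renderPersons (persons = [], firstPerson) {
  if (!persons || persons.length === 0) return ''
  if (persons.length === 1) return firstPerson
  return `${firstPerson} and ${persons.length - 1} more`
}

function prepare ({ title, media, sessionType, persons, firstPerson }) {
  return {
    title,
    media,
    subtitle: `${sessionType} ${renderPersons(persons, firstPerson)}`
  }
}

// Made-up selected values for a session with three persons.
const preview = prepare({
  title: 'Opening keynote',
  sessionType: 'keynote',
  persons: [{ _ref: 'a' }, { _ref: 'b' }, { _ref: 'c' }],
  firstPerson: 'Ada'
})

console.log(preview.subtitle) // keynote Ada and 2 more
```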

If you think about it, a program for a one-day single-track conference is really just a schedule of titles in a certain order, each with a set duration.