Published August 30, 2023

Search Taxonomy Setup & Use

By Andy Fitzgerald

A search taxonomy (also called a "thesaurus") allows you to manage and accommodate the language variations your online visitors enter into your site search tool. Maybe potential customers for your artisanal dungarees search for "coverall" or "romper" or "onesie." Perhaps you host information on how to make the most of space in one's garage ... but you have a contingent of visitors who consistently search for "car hole." Search taxonomies help you anticipate these variants, and, where appropriate, guide users to the terms and language you recommend.

In this guide you'll learn what makes a thesaurus different from other kinds of taxonomy structures, like type, topic, or faceted taxonomies, and you'll learn how to set up a thesaurus in the Sanity Taxonomy Manager plugin. You'll also find tips on how to integrate your custom thesaurus into managed or self-hosted search tools in order to deliver a more effective site search experience for your users.

The search tool on the Nielsen Norman Group website understands that I'm probably not there for "Dairy Study" information and returns content on "Diary Studies" instead

Site search tools include SaaS products like Algolia and Elasticsearch and front end solutions like Pagefind and FlexSearch. They all work by indexing site content, querying that index in response to visitor search terms, and then returning relevant results. A well-tuned search experience balances the precision of the results returned (do all these results match what the user has in mind?) and the recall of the result set (did the search return everything your site has on the topic?).
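To make the two measures concrete, here is a minimal sketch of how precision and recall could be computed for a single query. The function name and the page IDs are hypothetical and only serve the illustration.

```typescript
// Precision: what share of the returned results are relevant?
// Recall: what share of the relevant pages were returned?
function precisionAndRecall(
  returned: string[],
  relevant: string[],
): { precision: number; recall: number } {
  const relevantSet = new Set(relevant);
  const hits = returned.filter((id) => relevantSet.has(id)).length;
  return {
    precision: returned.length === 0 ? 0 : hits / returned.length,
    recall: relevant.length === 0 ? 0 : hits / relevant.length,
  };
}

// Hypothetical data: the search returned four pages, three of them relevant,
// while the site holds six relevant pages in total.
const score = precisionAndRecall(
  ["p1", "p2", "p3", "p9"],
  ["p1", "p2", "p3", "p4", "p5", "p6"],
);
console.log(score); // precision 0.75, recall 0.5
```

A thesaurus mostly helps recall (more relevant pages are found for a variant term), but it can also help precision by steering an ambiguous or misspelled query to the concept the visitor meant.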

A search thesaurus helps your search tool increase the precision and recall of its results by allowing you to specify alternative labels and hidden labels that match the terms for which your users are searching—even if (especially if) those terms are "wrong."

Alternative Labels

Alternative labels include synonyms, near-synonyms, abbreviations, and acronyms of a concept. Alternative labels in your thesaurus help your search tool bridge the gaps between what users enter (term variants and misspellings) and what the search index picks up from crawling your content and extracting keywords and concepts.

Many modern search tools have built-in functionality for dealing with common synonyms and typos. When your content is in a specialized domain, however, these tools may need additional guidance. On the Mayo Clinic website, for example, a site search for "Lou Gehrig's" returns both results for amyotrophic lateral sclerosis (ALS) which include the eponym "Lou Gehrig's" in result text, and those with no mention of "Lou Gehrig" at all.

Mayo Clinic web search returns results about ALS for the search term "Lou Gehrig," even if the name isn't in the result, and handles the misspelling "Lou Gherig" by suggesting the correct spelling and linking to those results

Hidden Labels

Hidden labels include misspelled variants and other terms you don't want to be otherwise visible. For example, if a site visitor misspells "Lou Gehrig's" as "Lou Gherig's," the Mayo Clinic search tool suggests the correct spelling and offers a link to the properly spelled term. This serves a dual function of getting visitors the content they want, and educating them about the correct spelling.

Alternatively, a search for "handicap parking pass" on the Washington State Department of Licensing website returns results for "disabled parking permits," but otherwise hides references to a search query which is now considered outdated and unacceptable for referring to individuals or accessible environments.

A search for "handicap parking pass" on the Washington State Department of Licensing website returns results for "disabled parking permits," but does not list or repeat the no longer acceptable term "handicap"
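The difference between the two label types can be sketched as a small lookup on the query side. This is an illustration under assumptions, not how any particular search tool works: the `Concept` shape, the `matchQuery` function, and the sample labels are hypothetical.

```typescript
interface Concept {
  prefLabel: string;
  altLabel?: string[];
  hiddenLabel?: string[];
}

type MatchKind = "preferred" | "alternative" | "hidden";

interface LabelMatch {
  concept: Concept;
  kind: MatchKind;
}

const normalize = (s: string): string => s.trim().toLowerCase();

// Returns the concept a query refers to, plus which kind of label matched.
function matchQuery(query: string, concepts: Concept[]): LabelMatch | null {
  const q = normalize(query);
  for (const concept of concepts) {
    if (normalize(concept.prefLabel) === q) {
      return { concept, kind: "preferred" };
    }
    if ((concept.altLabel ?? []).some((l) => normalize(l) === q)) {
      return { concept, kind: "alternative" };
    }
    if ((concept.hiddenLabel ?? []).some((l) => normalize(l) === q)) {
      return { concept, kind: "hidden" };
    }
  }
  return null;
}

// Sample data for illustration only.
const concepts: Concept[] = [
  {
    prefLabel: "Amyotrophic lateral sclerosis",
    altLabel: ["ALS", "Lou Gehrig's disease"],
    hiddenLabel: ["Lou Gherig's disease"],
  },
];

const match = matchQuery("lou gherig's disease", concepts);
// A results page could search on match.concept.prefLabel and, because the
// match kind is "hidden", avoid echoing the visitor's wording back.
console.log(match?.kind, match?.concept.prefLabel);
```

Keeping the match kind lets the results page decide what to display: an alternative label can be shown as is, while a hidden label is used for matching only.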

Integrating with a Search Tool

Due to the wide and ever-evolving range of search tools currently on the market, there isn't a single standard way to integrate a thesaurus managed in Sanity with site search tools. Many SaaS tools have APIs that can ingest thesaurus data as JSON or CSV, and have varying levels of support for alternative and hidden labels. Some tools also allow you to treat narrower terms as synonyms for their broader parent categories.
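As a rough sketch of the last point, the function below folds each concept's direct narrower terms into the synonym list of its broader parent, then serializes the result as JSON and as CSV. The `ThesaurusConcept` shape, the `broader` field holding a parent ID, and the sample hierarchy are assumptions made for the example; real tools each define their own import format.

```typescript
interface ThesaurusConcept {
  id: string;
  prefLabel: string;
  altLabel?: string[];
  broader?: string; // id of the parent concept, if any (assumed shape)
}

interface SynonymRow {
  term: string;
  synonyms: string[];
}

// Each concept keeps its own alternative labels; a parent additionally
// receives the preferred labels of its direct children (one level only).
function withNarrowerAsSynonyms(concepts: ThesaurusConcept[]): SynonymRow[] {
  const synonymsById = new Map<string, string[]>();
  for (const c of concepts) {
    synonymsById.set(c.id, [...(c.altLabel ?? [])]);
  }
  for (const c of concepts) {
    if (c.broader !== undefined) {
      synonymsById.get(c.broader)?.push(c.prefLabel);
    }
  }
  return concepts.map((c) => ({
    term: c.prefLabel,
    synonyms: synonymsById.get(c.id) ?? [],
  }));
}

// Hypothetical hierarchy for illustration.
const rows = withNarrowerAsSynonyms([
  { id: "c1", prefLabel: "Workwear" },
  {
    id: "c2",
    prefLabel: "Dungarees",
    altLabel: ["Coverall", "Romper", "Onesie"],
    broader: "c1",
  },
]);

const asJson = JSON.stringify(rows);
// Naive CSV: assumes labels contain no commas, quotes, or line breaks.
const asCsv = rows.map((r) => [r.term, ...r.synonyms].join(",")).join("\n");
console.log(asJson);
console.log(asCsv);
```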

Front end search tools targeted toward static site generators such as Pagefind or FlexSearch allow you to customize what gets entered into the search index with data tags in your page templates or via custom additions to the search index. Terms can be weighted and shown (or not) on your search result page based on how you configure the results page on your site.
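The index-side approach can be sketched without committing to any one tool. The example below is a deliberately tiny stand-in for a real index, not Pagefind's or FlexSearch's API: it adds thesaurus labels to the searchable text of a page while keeping them out of what the results page displays. All type and function names are hypothetical.

```typescript
interface PageLabels {
  pref: string;
  alt: string[];
  hidden: string[];
}

interface SitePage {
  url: string;
  title: string;
  body: string;
  labels: PageLabels[];
}

interface IndexEntry {
  url: string;
  title: string;
  searchText: string; // used for matching only, never displayed
}

// Append every label of every tagged concept to the text that gets indexed.
function buildEntry(page: SitePage): IndexEntry {
  const parts: string[] = [page.title, page.body];
  for (const l of page.labels) {
    parts.push(l.pref, ...l.alt, ...l.hidden);
  }
  return {
    url: page.url,
    title: page.title,
    searchText: parts.join(" ").toLowerCase(),
  };
}

// Naive substring search; results expose only the url and title.
function searchIndex(
  entries: IndexEntry[],
  query: string,
): { url: string; title: string }[] {
  const q = query.trim().toLowerCase();
  return entries
    .filter((e) => e.searchText.includes(q))
    .map((e) => ({ url: e.url, title: e.title }));
}

// Illustrative page tagged with a concept whose hidden label is "car hole".
const entries = [
  buildEntry({
    url: "/guides/garage-storage",
    title: "Make the most of your garage space",
    body: "Shelving, hooks, and overhead racks for small garages.",
    labels: [{ pref: "Garage", alt: [], hidden: ["car hole"] }],
  }),
];

const results = searchIndex(entries, "Car Hole");
console.log(results);
```

The visitor who searches for "car hole" gets the garage guide, and the term itself never appears in the displayed result.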

Some potential benefits of managing thesaurus terms in Sanity with the Taxonomy Manager plugin include:

  • A standards-compliant data structure (SKOS), which offers interoperability between standards-compliant information systems
  • Term management in Sanity Studio, which allows producers and vocabulary managers to manage and use thesaurus terms in the same environment in which they produce content
  • The flexibility of the GROQ query language, which offers robust tooling for transforming thesaurus terms into the structures required by different search tools

You may not want to manage thesaurus terms in Sanity if:

  • Your search tool offers a dashboard for managing synonyms that meets your workflow management and data interoperability requirements
  • You already use a separate standalone tool as a single source of truth for your digital ecosystem's controlled vocabularies

Setting Up a Search Taxonomy

1. Install Sanity Taxonomy Manager

The Taxonomy Manager plugin allows you to create standards-compliant relationships that help keep your taxonomy interoperable and reusable.

2. Create a new concept scheme

Concept Schemes are used to create multiple taxonomies in a single project, and, where needed, use the same concepts across them. This gives you a single source of truth for each concept you define, and allows you to establish semantic relationships between individual taxonomies.

Add a new Concept Scheme with either the global "new document" button, or the "new document" button in the Concept Schemes list view

Protip

Sometimes the concept scheme you need is the one you already have: if you need to create synonyms for a set of concepts in an existing taxonomy, create them there. Every concept created with Sanity Taxonomy Manager has Alternative Label and Hidden Label fields, so there's no reason your Type or Topic taxonomy can't also be your thesaurus.

3. Name and describe your thesaurus

Add a clear name and describe the purpose and goals of your thesaurus to users. Tagging content with managed terms may be new to your content creators: good descriptions can help users understand why the tagging step is important.

4. Add Concepts and Preferred Labels

"Concepts" are the central modeling metaphor in Simple Knowledge Organization System (SKOS) taxonomies. A concept's "Preferred Label" corresponds to the ISO 2788/5964 standard's idea of "term." For each concept for which you need synonyms, click "Add Concept" and provide a Preferred Label.

5. Add Alternative and Hidden Labels

Once you have a concept and its Preferred Label defined, add Alternative Labels and Hidden Labels as necessary.

Gotcha

Preferred, alternative, and hidden label sets must not overlap. Taxonomy Manager will show a validation error if you accidentally duplicate a label across label sets.
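If you prepare labels outside the Studio (in a spreadsheet import, for instance), a similar check can be run before loading them. The sketch below is not the plugin's validation code; the `ConceptLabels` shape and the case-insensitive comparison are assumptions.

```typescript
interface ConceptLabels {
  prefLabel: string;
  altLabel?: string[];
  hiddenLabel?: string[];
}

// Returns every label (normalized) that occurs more than once across the
// preferred, alternative, and hidden labels of a single concept.
// Note: this also flags a label repeated within the same set.
function findOverlappingLabels(concept: ConceptLabels): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  const all = [
    concept.prefLabel,
    ...(concept.altLabel ?? []),
    ...(concept.hiddenLabel ?? []),
  ];
  for (const label of all) {
    const key = label.trim().toLowerCase();
    if (seen.has(key)) {
      duplicates.add(key);
    } else {
      seen.add(key);
    }
  }
  return Array.from(duplicates);
}

const overlaps = findOverlappingLabels({
  prefLabel: "Sphygmomanometer",
  altLabel: ["Blood Pressure Cuff", "BP Cuff"],
  hiddenLabel: ["bp cuff"], // duplicates an alternative label
});
console.log(overlaps); // ["bp cuff"]
```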

6. Add the thesaurus to a Sanity document schema

You're now ready to publish your thesaurus, integrate it into your content schema, and start tagging content. Taxonomy Manager includes two helper functions for ensuring that only the appropriate concepts are available for a given field.
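To show roughly what this kind of filtering involves, the sketch below hand-writes a reference field that only offers concepts from one concept scheme, using Sanity's standard `filter` and `filterParams` reference options. It does not use the plugin's helper functions, which are the more convenient route and are covered in the plugin docs. The `skosConcept` type name, the field title, and the `conceptReferenceField` function are assumptions made for the example; the scheme ID is the one used in the query later in this guide.

```typescript
// Hypothetical helper: returns a plain Sanity field definition for an array
// of references restricted to the concepts of a single concept scheme.
function conceptReferenceField(name: string, schemeId: string) {
  return {
    name,
    title: "Search terms",
    type: "array",
    of: [
      {
        type: "reference",
        // Assumed document type name for concepts.
        to: [{ type: "skosConcept" }],
        options: {
          // GROQ filter: keep only concepts referenced by the given scheme.
          filter:
            '_id in *[_type == "skosConceptScheme" && schemeId == $schemeId][0].concepts[]._ref',
          filterParams: { schemeId },
          // Keep editors from creating new concepts from within the field.
          disableNew: true,
        },
      },
    ],
  };
}

const searchTermsField = conceptReferenceField("searchTerms", "69d9c8");
console.log(JSON.stringify(searchTermsField, null, 2));
```

The returned object can be placed in the `fields` array of a document type in your schema.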

Adding Synonyms to Your Search Index

As noted above, adding synonyms to your search index is entirely dependent on the affordances provided by your search tool. The flexibility of the GROQ query language, however, means that getting your synonym data into a format that can be used by your search tool need not be a daunting task.

Here, for example, is a query for formatting synonyms for Algolia's synonym API:

*[_type == "skosConceptScheme" && schemeId == "69d9c8"].concepts[]->
  {
    "objectID": conceptId,
    "type": "oneWaySynonym",
    "input": prefLabel,
    "synonyms": coalesce(
      altLabel[] + hiddenLabel[],
      altLabel[],
      hiddenLabel[]
    )
  }

This data matches the shape Algolia specifies for creating or updating a one-way synonym with their API:

"data": [
  {
    "objectID": "f3ac93",
    "type": "oneWaySynonym",
    "input": "Bilirubinometer",
    "synonyms": [
      "Jaundice Meter",
      "Bilimeter"
    ]
  },
  {
    "objectID": "e84bd2",
    "type": "oneWaySynonym",
    "input": "Sphygmomanometer",
    "synonyms": [
      "Blood Pressure Cuff",
      "BP Cuff"
    ]
  },
  {
    "objectID": "caefe4",
    "type": "oneWaySynonym",
    "input": "Tongue Depressor",
    "synonyms": [
      "Spatula",
      "Popsicle Stick"
    ]
  }
]
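If you would rather fetch the raw concepts and shape them in application code, the same transformation might look like the sketch below. It uses the field names Algolia documents for synonym objects (`objectID` and `synonyms`) and skips concepts that have no alternative or hidden labels, a case in which the `coalesce` call in the query returns null. The `ConceptDoc` shape and the `toOneWaySynonyms` function are assumptions, and the "Stethoscope" entry is invented sample data.

```typescript
interface ConceptDoc {
  conceptId: string;
  prefLabel: string;
  altLabel?: string[] | null;
  hiddenLabel?: string[] | null;
}

interface OneWaySynonym {
  objectID: string;
  type: "oneWaySynonym";
  input: string;
  synonyms: string[];
}

// Merge alternative and hidden labels into one synonym list per concept,
// dropping concepts that end up with no synonyms at all.
function toOneWaySynonyms(concepts: ConceptDoc[]): OneWaySynonym[] {
  return concepts
    .map((c) => ({
      objectID: c.conceptId,
      type: "oneWaySynonym" as const,
      input: c.prefLabel,
      synonyms: [...(c.altLabel ?? []), ...(c.hiddenLabel ?? [])],
    }))
    .filter((s) => s.synonyms.length > 0);
}

const synonymObjects = toOneWaySynonyms([
  {
    conceptId: "f3ac93",
    prefLabel: "Bilirubinometer",
    altLabel: ["Jaundice Meter"],
    hiddenLabel: ["Bilimeter"],
  },
  // Hypothetical concept without any synonyms: it is left out of the result.
  { conceptId: "000000", prefLabel: "Stethoscope", altLabel: null },
]);
console.log(JSON.stringify(synonymObjects, null, 2));
```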

Learn More

Find more examples, applications, and tips in the Sanity Taxonomy Manager Docs.
