How to search nested string fields in complex Portable Text with custom blocks?

8 replies
Last updated: Dec 10, 2021
Hello, I'm building a search engine over a complex text structure (lots of blocks inside the text). Is there an easy way to walk through all descendant string fields and run a match on them, or do I have to write out every possible path? An example of my output block looks like this:

"body": [
  // CLASSICAL BLOCK TYPE
  // `pt::text(body)` makes it easy to check matches
  {
    "_type": "block",
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "nisi aliquam sequi voluptas quia ut rem esse quae qui voluptatem officia consectetur incididunt Neque et iure voluptatem ipsum ab amet ipsa occaecat ullam cupidatat ut velit cupidatat sequi nisi nostrum irure consequatur Quis aliquip commodi suscipit iste consectetur sequi velit ipsa enim dolor Neque voluptatem sit ipsam enim eiusmod ipsam doloremque aperiam"
      }
    ],
    "markDefs": [],
    "style": "pd-s pm-s"
  },

  // DIPTYCH TYPE
  {
    "_type": "diptych",
    "left": [
      {
        "_type": "asideText",
        "excerpt": "Quis minim nisi ad quia irure voluptatem veniam id nulla magna fugiat quasi ut voluptatem est laboris sequi nulla ea numquam commodi magnam qui ut dolore dicta est magna adipisci numquam velit ut labore qui perspiciatis velit minim et aute quia nulla incidunt Neque Sed molestiae explicabo voluptatem",
        "surtitle": "sit  qui",
        "title": "laborum unde Ut mollit et"
      }
    ],
    "right": [
      {
        "_type": "imagesCompo",
        "mainImage": {
          "_type": "image",
          "asset": {
            "_ref": "image-b2b2275f06bd2728f18eed194b5f734d244e593a-240x314-jpg",
            "_type": "reference"
          }
        }
      }
    ]
  },

  // PROCESS TYPE
  {
    "_type": "process",
    "description": "corporis adipisci molestiae totam est ab sit error vel vel Sed odit ut mollit reprehenderit eiusmod eu dolorem voluptatem dicta explicabo exercitation nostrud cupidatat ut porro minim iste pariatur anim commodi architecto irure porro ad fugit incididunt ad",
    "surtitle": "ad culpa architecto",
    "title": "eum beatae Ut elit fugiat Nemo"
  },

  // INTRO TYPE
  {
    "_type": "intro",
    "chapters": [
      "qui non incididunt eiusmod cupidatat",
      "doloremque corporis quia",
      "quasi aute",
      "voluptatem fugiat dolor adipisci"
    ],
    "description": "vel aliquid nostrud labore ex eiusmod numquam molestiae mollit enim autem vel dolore voluptas velit quaerat pariatur ut adipisci nulla non sit doloremque totam in Ut ad numquam consequatur cillum Duis quae Lorem sed consequat consequatur commodi eius enim veniam ad unde incididunt exercitationem ad inventore velit nostrum fugit",
    "title": "aliqua"
  }
]
I removed some fields that aren't useful here, but my structure also contains string fields used for display/config.

For example, for the diptych type I need to walk through all the text children, but they can be in either the left or the right field. So the generated GROQ query would look something like:

score(
  left[].title match $value,
  left[].surtitle match $value,
  left[].excerpt match $value,
  right[].title match $value,
  right[].surtitle match $value,
  right[].excerpt match $value
)
It starts to grow really fast, and I have a lot of block types with many different fields.
I'm also worried about the performance of running a score() function with a lot of match clauses.

I'm thinking about two possible solutions:

1. A function that walks through a path, something like walk(['left', 'right'], ['title', 'surtitle', 'excerpt']). According to the docs, this doesn't exist right now, and maybe it's not relevant to have something like that in GROQ (let me know if I'm going too far).

2. A way to write custom GROQ functions under a custom namespace. That would allow me to implement a custom ept::text(body) (for extended-portable-text) and handle my fields more precisely. Again, maybe I'm going too far with GROQ's possibilities; let me know if this is relevant.
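Until something like walk() exists, one workaround is to generate the repetitive terms in application code. A minimal TypeScript sketch; walkScoreTerms and buildScore are hypothetical helper names, not part of GROQ or any Sanity client. It only saves typing: the generated query is the same one you would write by hand, so it runs exactly like it.

```typescript
// Hypothetical client-side stand-in for the proposed walk(): expands every
// combination of array field and string field into one `match` term.
function walkScoreTerms(
  arrayFields: string[],
  stringFields: string[],
  prefix = "", // e.g. 'body[_type == "diptych"].' to scope terms to one block type
  param = "$value",
): string[] {
  const terms: string[] = [];
  for (const arrayField of arrayFields) {
    for (const stringField of stringFields) {
      terms.push(`${prefix}${arrayField}[].${stringField} match ${param}`);
    }
  }
  return terms;
}

// Wraps the generated terms in a single score(...) call, one term per line.
function buildScore(terms: string[]): string {
  return `score(\n  ${terms.join(",\n  ")}\n)`;
}

const diptychScore = buildScore(
  walkScoreTerms(["left", "right"], ["title", "surtitle", "excerpt"]),
);
console.log(diptychScore);
// score(
//   left[].title match $value,
//   left[].surtitle match $value,
//   left[].excerpt match $value,
//   right[].title match $value,
//   right[].surtitle match $value,
//   right[].excerpt match $value
// )
```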

Maybe I should open a GitHub discussion, since these could be improvements to the language. What do you think?
AI Update

I feel your pain! Searching through complex nested structures with lots of block types is definitely challenging. Unfortunately, GROQ doesn't currently have a built-in "walk" function that can automatically traverse arbitrary paths, but there are several strategies to make this more manageable.

Current Approaches:

  1. Use pt::text() for standard Portable Text blocks: As you've discovered, pt::text() is great for extracting plain text from standard Portable Text structures. This handles the traversal automatically for those classic block types:
*[_type == "post"] | score(pt::text(body) match $value)
  2. Flatten your searchable text at write time (recommended): Instead of trying to walk through all possible paths at query time, consider creating a computed field that aggregates all searchable text when documents are saved. You could use a Sanity Function triggered by document events to automatically update a searchableText field whenever a document is published:
// In your sanity.blueprint.ts (illustrative shape only; check the Sanity Functions docs for the exact blueprint and handler API)
export default {
  name: 'search-indexer',
  functions: [
    {
      name: 'index-searchable-text',
      trigger: 'document.publish',
      handler: async (event, context) => {
        const doc = event.document;
        const searchableText = extractAllText(doc.body); // Your custom extraction logic
        
        await context.client.patch(doc._id)
          .set({ searchableText })
          .commit();
      }
    }
  ]
}

Then your GROQ query becomes simply:

*[_type == "yourType"] | score(searchableText match $value) | order(_score desc)

This approach with Sanity Functions is the modern, recommended way to handle this kind of automation - it runs on Sanity's infrastructure, scales automatically, and doesn't require external hosting like webhooks would.

  3. Array flattening with the [] operator: For your specific structure, you can use GROQ's array traversal. The match operator works with tokenized text:
*[_type == "yourType"] | score(
  body[_type == "diptych"].left[].title match $value,
  body[_type == "diptych"].left[].surtitle match $value,
  body[_type == "diptych"].left[].excerpt match $value,
  body[_type == "diptych"].right[].title match $value,
  body[_type == "process"].description match $value,
  body[_type == "intro"].description match $value
) | order(_score desc)
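The Sanity Function example leaves extractAllText(doc.body) as custom extraction logic. One way to write it is a recursive walk over the body, which is also the client-side version of the "walk through all descendant string fields" idea from the question. A minimal TypeScript sketch under these assumptions: every string value is searchable except keys starting with an underscore (_type, _key, _ref, ...) and a hand-maintained denylist of display/config keys. The denylist entries (style, marks, markDefs, listItem) are guesses based on the example body, not a Sanity-provided list.

```typescript
// Minimal JSON value type for document content.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Display/config keys whose strings should not be searched.
// Assumption based on the example body; adjust per schema.
const SKIP_KEYS = new Set(["style", "marks", "markDefs", "listItem"]);

// Recursively pushes every searchable string found under `value` into `out`.
function collectStrings(value: Json, out: string[]): void {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      // Skip system fields (_type, _key, _ref, ...) and config fields.
      if (key.startsWith("_") || SKIP_KEYS.has(key)) continue;
      collectStrings(child, out);
    }
  }
  // Numbers, booleans and null carry no searchable text and are ignored.
}

// Flattens a whole body (standard and custom blocks) into one string.
function extractAllText(body: Json): string {
  const out: string[] = [];
  collectStrings(body, out);
  return out.join(" ");
}
```

A denylist keeps the walk generic, but any new config string field added to a block type must be added to SKIP_KEYS or it will leak into the search text; a per-block-type allowlist is the safer, more verbose alternative.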

Regarding Your Proposed Solutions:

  1. Custom walk function: This doesn't exist in GROQ currently, and you're right that it would be a language-level enhancement. This would be worth proposing in the Sanity GROQ GitHub discussions.

  2. Custom GROQ functions under namespaces: Unfortunately, GROQ doesn't support user-defined functions or custom namespaces. The namespace functions like pt::text() are built into the GROQ engine itself.

Performance Considerations:

Multiple match operations in a score() function shouldn't cause major performance issues for moderately sized datasets, but if you're worried about scaling, the "flatten at write time" approach with Sanity Functions is likely to perform better, since you do the heavy lifting once when content changes rather than on every search query.

My Recommendation:

Use Sanity Functions to create a preprocessing step that flattens all searchable text into a dedicated field when documents are created or updated. This gives you:

  • Best query performance
  • Simplified GROQ queries
  • Full control over what gets indexed
  • Ability to add field weighting by concatenating important fields multiple times
  • No external infrastructure to maintain
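The field-weighting bullet (concatenating important fields multiple times) could look like this when building the searchableText value. A minimal TypeScript sketch; buildWeightedText and WeightedField are hypothetical names. It assumes score() ranks a document higher when query terms occur more often in the field, which this thread does not establish, so check the ranking effect on real data before relying on it.

```typescript
// One field to index and how many times to repeat it (hypothetical scheme).
interface WeightedField {
  text: string | undefined; // undefined for optional fields that may be empty
  weight: number; // number of repetitions; 0 drops the field
}

// Repeats each field `weight` times and joins everything with spaces, so that
// terms from important fields occur more often in the flattened text.
function buildWeightedText(fields: WeightedField[]): string {
  const parts: string[] = [];
  for (const { text, weight } of fields) {
    if (!text) continue; // skip missing or empty fields
    for (let i = 0; i < weight; i++) parts.push(text);
  }
  return parts.join(" ");
}

// Example: weight a process block's title twice as heavily as its description.
const processText = buildWeightedText([
  { text: "Process title", weight: 2 },
  { text: "Long description", weight: 1 },
]);
console.log(processText); // prints: Process title Process title Long description
```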

You should definitely open a GitHub discussion about the walk function idea though - it's a legitimate use case that others would benefit from!

Hi! This is exactly how score() is intended to be used. I don’t know exactly how well it scales with the number of fields, but 10-20 fields probably shouldn't be a problem. Have you tried it out yet?
In theory, you could rewrite it as:

score(
  [...left[].title, ...left[].surtitle, ...left[].excerpt] match $value,
  [...right[].title, ...right[].surtitle, ...right[].excerpt] match $value
)
but we are a bit restrictive about which expressions we support inside score(), and this is rejected right now. It is something we could support, though. Of course, it would perform identically, since it’s just another way to write the original query.
I totally see that it becomes verbose to write out the full GROQ. We have been talking about possible ways to support reusable fragments or functions. But this stuff is still on the drawing board. You should definitely feel free to open a GitHub issue about this.
Thanks for the answer. I wasn't thinking about the [] match $value syntax; this would already help.
There is just one thing I don't get in your snippet: what's the purpose of the ... syntax on only the title and excerpt fields?
[...left[].title, left[].surtitle, ...left[].excerpt] match $value
That’s a typo; it's supposed to be there. (Fixed it.)
And regarding performance, which is faster:
1. building a big mapping with nested select() calls and then using score() to match on a single field:
*[_type in ['page', 'article']]{
  _id,
  title,
  'body': select(
    _type == 'page' => content.body,
    _type == 'article' => select(
      content.articleContent[]._type == 'diptych' => content.articleContent[].left[].title
      ...
    ),
  )
}
|score(body match $value)
|order(_score desc)
[_score > 0]
{_id, title}
2. or using multiple match clauses:

*[_type in ['page', 'article']]
|score(content.body match $value, content.articleContent[].left[].title match $value)
|order(_score desc)
[_score > 0]
{_id, title}
You can’t score() on a projection, actually. The { ... } projection can only follow the | score() call.
Oh yeah, I forgot that, so that settles the question 😅
Thanks for the answers 🙂
