Excluding noindex pages from next-sitemap in a custom implementation

1 reply
Last updated: Jun 22, 2023
Hi all 👋 so we are using the next-sitemap, but I'd like to exclude all pages that are set to noindex. I have been told that "As discussed, this looks relatively tricky to achieve with the current implementation (next-sitemap) due to the fact that it executes after the main site build, and outside the scope where access to Sanity is available." but I'd like to verify this. It seems like a pretty obvious thing most people would want so I'm surprised this is considered a tricky thing to achieve. Any tips?
AI Update

You're right to question this - it's definitely a common need! The challenge with next-sitemap is real, but there are good solutions available.

The issue is that next-sitemap typically runs as a post-build script, outside of your Next.js runtime environment where you'd normally have access to your Sanity client. However, there are a few approaches to solve this:

Option 1: Use Next.js's Built-in sitemap.ts

If you're using Next.js 13.3 or later with the App Router, the best approach is to ditch next-sitemap entirely and use Next.js's built-in sitemap.ts file. This runs server-side where you have full access to Sanity:

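// app/sitemap.ts
import type { MetadataRoute } from 'next'
import { client } from '@/sanity/lib/client'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  // Published pages that have a slug and are not flagged noindex.
  // Adjust `seo.noIndex` to match the field name in your schema.
  const pages: { slug: string; _updatedAt: string }[] = await client.fetch(`
    *[_type == "page" && defined(slug.current) && !(_id in path("drafts.**")) && seo.noIndex != true] {
      "slug": slug.current,
      _updatedAt
    }
  `)

  return pages.map((page) => ({
    url: `https://yourdomain.com/${page.slug}`,
    lastModified: page._updatedAt,
  }))
}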

This approach gives you complete control and direct Sanity access, making it trivial to exclude noindex pages.
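One edge case worth guarding in the mapping step: a slug that is missing or carries a leading slash would produce URLs such as https://yourdomain.com/null or a double slash. A minimal sketch of a defensive mapper follows; `toSitemapEntries` is a hypothetical helper name, and it assumes every page lives at `/<slug>` under the site root.

```javascript
// Hypothetical helper, not part of Next.js or Sanity.
// Assumes each page is served at `<siteUrl>/<slug>`.
function toSitemapEntries(pages, siteUrl) {
  // Make sure the base ends with a slash so relative resolution appends to it.
  const base = siteUrl.endsWith('/') ? siteUrl : `${siteUrl}/`

  return pages
    // Skip documents without a usable slug instead of emitting ".../null".
    .filter((page) => typeof page.slug === 'string' && page.slug.length > 0)
    .map((page) => ({
      // Strip leading slashes so the slug resolves relative to the base.
      url: new URL(page.slug.replace(/^\/+/, ''), base).toString(),
      lastModified: page._updatedAt,
    }))
}
```

Inside sitemap.ts this would replace the inline `pages.map(...)` call, for example `return toSitemapEntries(pages, 'https://yourdomain.com')`.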

Option 2: Fetch Data in next-sitemap Config

You can make next-sitemap work with Sanity by fetching data directly in the config file:

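// next-sitemap.config.js
const { createClient } = require('@sanity/client')

const client = createClient({
  projectId: 'your-project-id',
  dataset: 'production',
  useCdn: false,
  apiVersion: '2024-01-01',
})

// Fetch the noindex paths once and reuse the promise for every transform call.
// Assumes pages are served at `/${slug}`; adjust to your routing.
let noIndexPathsPromise
const getNoIndexPaths = () => {
  noIndexPathsPromise ??= client
    .fetch(`*[_type == "page" && seo.noIndex == true && defined(slug.current)].slug.current`)
    .then((slugs) => new Set(slugs.map((slug) => `/${slug}`)))
  return noIndexPathsPromise
}

/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://yourdomain.com',
  generateRobotsTxt: true,
  // The documented config shape is a plain object, so the Sanity query runs
  // inside the async transform hook rather than around the export.
  transform: async (config, path) => {
    const noIndexPaths = await getNoIndexPaths()

    // Match the whole path: a substring check would also drop /about-us
    // when only /about is marked noindex.
    if (noIndexPaths.has(path)) {
      return null // returning null excludes the page
    }

    return {
      loc: path,
      changefreq: config.changefreq,
      priority: config.priority,
      lastmod: config.autoLastmod ? new Date().toISOString() : undefined,
    }
  },
}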

The transform function returning null is the key - this tells next-sitemap to exclude that URL from the sitemap.
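How the path is compared against the noindex slugs matters as much as returning null. A substring test such as `path.includes(slug)` would also exclude unrelated routes that merely contain the slug. The sketch below shows one way to match whole paths instead; `slugToPath` and `buildNoIndexMatcher` are hypothetical helper names, and it assumes pages are routed at `/<slug>`.

```javascript
// Hypothetical helpers, not part of next-sitemap.
// Normalise a slug or path to the form "/segment/segment" (no trailing slash).
function slugToPath(slug) {
  const trimmed = String(slug).replace(/^\/+|\/+$/g, '')
  return `/${trimmed}`
}

// Build a predicate that reports whether a sitemap path is marked noindex.
function buildNoIndexMatcher(noIndexSlugs) {
  // Drop null/empty slugs, then store normalised paths for exact lookups.
  const blocked = new Set(noIndexSlugs.filter(Boolean).map(slugToPath))
  return (path) => blocked.has(slugToPath(path))
}
```

Within transform, the check would then read `if (isNoIndex(path)) return null`, where `isNoIndex` is the function returned by `buildNoIndexMatcher`.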

Option 3: Use additionalPaths with Sanity Data

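Another next-sitemap approach is to supply the indexable paths yourself. Note that additionalPaths only adds entries; it does not remove pages that next-sitemap discovers from the build output, so combine it with the exclude option or the transform filter from Option 2 if the noindex pages are also built statically:

// next-sitemap.config.js (reuses the Sanity client created in Option 2)
module.exports = {
  siteUrl: 'https://yourdomain.com',
  additionalPaths: async (config) => {
    // Query inside the async hook so the exported config stays a plain object.
    const indexablePages = await client.fetch(`
      *[_type == "page" && defined(slug.current) && seo.noIndex != true] {
        "slug": slug.current,
        _updatedAt
      }
    `)

    return indexablePages.map((page) => ({
      loc: `/${page.slug}`,
      lastmod: page._updatedAt,
    }))
  },
}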

Bottom line: The native Next.js sitemap approach (Option 1) is cleaner and more maintainable if you're on App Router. If you're staying with next-sitemap, Options 2 and 3 both work: the config file is just a Node.js module, so it can create a Sanity client and run queries from its async transform and additionalPaths functions. The person who told you it's tricky might not have realized that!

We made our own that runs in preBuildUtilities.js 🙂 It’s a very simple loop through all the document types we want included; we exclude the hidden ones, can rank them according to their _search_priority_ attribute, etc.
In the end it’s just a
fs.writeFileSync('./public/sitemap.xml', sitemap)
to write it to the filesystem.
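For anyone wanting to try the same idea, here is a rough sketch of what such a pre-build step might look like. It is not the poster's actual script: the `hidden` and `search_priority` field names, the `/<slug>` URL shape, and the choice to interpret "rank" as sort order are all assumptions.

```javascript
// Hypothetical sketch of a hand-rolled sitemap builder.
// Escape characters that are not allowed raw inside XML text nodes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// docs: documents already fetched from Sanity (assumed fields: slug,
// _updatedAt, hidden, search_priority). Returns the sitemap as a string.
function buildSitemapXml(docs, siteUrl) {
  const urls = docs
    // Leave out hidden documents and anything without a slug.
    .filter((doc) => !doc.hidden && doc.slug)
    // Highest search_priority first; a missing value counts as 0.
    .sort((a, b) => (b.search_priority ?? 0) - (a.search_priority ?? 0))
    .map((doc) => {
      const loc = escapeXml(`${siteUrl}/${doc.slug}`)
      const lastmod = escapeXml(doc._updatedAt)
      return `  <url><loc>${loc}</loc><lastmod>${lastmod}</lastmod></url>`
    })

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n')
}
```

The returned string is what would be passed as `sitemap` to the `fs.writeFileSync('./public/sitemap.xml', sitemap)` call mentioned above.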
