Technical SEO For E-commerce
Before diving deeper into technical SEO topics, it’s important to remember that high-level decisions like the tech stack you choose matter a lot for technical SEO.
Sanity provides a combination of a headless CMS and static site generator, which loads your site in an SEO-friendly way without too many dynamic elements that are hard for Google to understand. Choosing to go with a headless CMS and static site generator minimizes problems with search engine crawling and rendering.
Using Sanity with Shopify, for example, enables you to completely customize Liquid-based templates or create your own with front-end frameworks like Hydrogen or Next.js. Sanity Connect provides users a turnkey, bi-directional sync of Shopify data that enables you to create richer storefronts using a fully customizable content workspace, Sanity Studio.
Technical SEO involves optimizing factors that are not visible to users but make a big difference for search engines.
We divide technical SEO into four sub-areas that reflect the search engine ranking process: crawling, indexing, rendering, and user experience.
Before Google can add any new pages or files to its database (called “the index”), it needs to find the content. When we talk about “crawling,” we mean the process of sending a program to a website to find, download, and store information like text, images, videos, metadata, etc.
There are two types of sitemaps: XML and HTML. The former is a feed for Google, while the latter is a page with links to all pages of a domain that users can use to navigate.
Google crawls the web through hyperlinks, references that point from site to site and page to page. When you create a new site that doesn’t have any references yet, you can help Google find it by creating a Google Search Console account and submitting an XML sitemap.
XML sitemaps are feeds of URLs specifically for Google. You can create XML sitemaps for pages, images, videos, or news articles. To avoid sending Google mixed signals, it’s critical to only include URLs in XML sitemaps that return a 200 status code, meaning no redirects, pages with errors, or parameter URLs. A single XML sitemap is limited to 50 MB (uncompressed) and 50,000 URLs, so overstepping the limit typically isn’t a problem for sites with fewer than 50,000 pages. Regular e-commerce sites shouldn’t have to worry about XML sitemap size.
With Sanity, you can create an automated XML sitemap by:
- Using GraphQL or GROQ to pull a list of URL slugs
- Merging domain and slugs
- Deploying your sitemap through your front-end
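The steps above can be sketched roughly as follows. This is a minimal illustration, not Sanity's own implementation: the `SlugEntry` shape, the GROQ query in the comment, and the function names are assumptions, and the fetch from Sanity is left out so the example stays self-contained.

```typescript
// Hypothetical shape of what a GROQ query such as
//   *[_type in ["product", "category"]]{ "slug": slug.current, _updatedAt }
// might return. The document types and field names are assumptions.
interface SlugEntry {
  slug: string;
  updatedAt?: string; // ISO 8601 date or timestamp
}

// Escape characters that must not appear raw inside XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Merge the domain with each slug and wrap the result in sitemap markup.
function buildSitemapXml(baseUrl: string, entries: SlugEntry[]): string {
  const origin = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const urls = entries.map(({ slug, updatedAt }) => {
    const path = slug.replace(/^\/+/, ""); // drop leading slashes
    const loc = `<loc>${escapeXml(`${origin}/${path}`)}</loc>`;
    const lastmod = updatedAt ? `<lastmod>${escapeXml(updatedAt)}</lastmod>` : "";
    return `  <url>${loc}${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

const sitemap = buildSitemapXml("https://www.domain.com/", [
  { slug: "shirts", updatedAt: "2024-01-15" },
  { slug: "/shirts/blue-oxford" },
]);
console.log(sitemap);
```

Your front-end would then serve the returned string at a URL such as /sitemap.xml with an XML content type.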
While crawling your e-commerce site, Google uses internal links to find all pages on your domain. Pages without internal links are called “orphaned pages” and should be avoided, except for edge cases like campaign pages that provide no value for SEO. Unlike XML sitemaps, internal links also carry ranking signals: Google uses the anchor text and the relationship between the linking page and the linked page as a ranking factor.
We distinguish between editorial links that are placed directly in the content, and programmatic link modules like “other customers also bought.” Both should help users find relevant information and navigate through the site, but adding link modules is much more scalable than editorial links since they’re replicated on all pages of the same type.
There is no optimal number of editorial links and link modules, but having none is not great for SEO. A good rule of thumb is to place at least 3 internal links in blog articles and link to related categories and products.
In the Sanity Studio, internal linking is standard and offers some great added capabilities. By using strong, bi-directional references to create internal links, Sanity can automatically update your internal links when you change a URL.
The “status code” of a page is the response of the server that hosts it. A status code 200 says, “everything is ok, go ahead and load this page.”
When crawlers follow the network of internal links on a website, they encounter pages with different responses. Ideally, all pages have a status code of 200, but that’s rarely the case.
3xx status codes are reserved for redirects, meaning the page now lives under a different address. The most popular ones are 302 (temporary, old address will return shortly) and 301 (permanent, the old address will never return). Except for short-term campaigns, always aim to use 301 redirects for SEO.
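4xx status codes tell users and search engines that content can’t be served at the requested URL: a 404 means the page was not found, and a 410 means it is permanently gone. A 410 status code is the best choice when a store phases a product out. But if a product is only unavailable for a short time and will return to stock, keep the page live with a 200 status code and mark the product as out of stock instead of returning a 4xx error.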
Lastly, 5xx status codes tell a browser that the server is not available. Of all status codes (except for 200s), 5xx errors are the most problematic because the server isn’t reachable, likely due to an outage. If your site accumulates too many 5xx errors, Google might choose to remove affected pages from the index altogether.
Good SEO hygiene means avoiding 5xx status codes as much as possible while keeping 3xx and 4xx status codes to a minimum. Sanity users can produce and save redirect documents in the Sanity Studio, then use those documents to implement the redirects in the front-end framework.
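As a rough sketch of that last step, redirect documents could be mapped to the rule objects that Next.js accepts from the `redirects()` function in its config. The `RedirectDoc` shape and its field names are assumptions about how you might model redirects in the Studio, not a built-in Sanity schema.

```typescript
// Hypothetical redirect document as an editor might save it in the Studio.
interface RedirectDoc {
  from: string;
  to: string;
  permanent: boolean;
}

// Rule shape that Next.js accepts from redirects() in its config file.
interface RedirectRule {
  source: string;
  destination: string;
  permanent: boolean;
}

function toRedirectRules(docs: RedirectDoc[]): RedirectRule[] {
  const seenSources = new Set<string>();
  const rules: RedirectRule[] = [];
  for (const doc of docs) {
    // Skip self-redirects (they would loop) and duplicate sources (first one wins).
    if (doc.from === doc.to || seenSources.has(doc.from)) continue;
    seenSources.add(doc.from);
    rules.push({ source: doc.from, destination: doc.to, permanent: doc.permanent });
  }
  return rules;
}

console.log(
  toRedirectRules([
    { from: "/summer-sale", to: "/sale", permanent: false },
    { from: "/old-shirts", to: "/shirts", permanent: true },
  ])
);
```

Note that Next.js answers `permanent: true` with a 308 and `permanent: false` with a 307 rather than 301 and 302; these are the permanent and temporary counterparts of those codes.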
The robots.txt file — the most powerful text file on the web — is a guide for what pages, directories, and subdomains bots are allowed to crawl. You can use the robots.txt to exclude parts of your site you don’t want Google to find. Though not all bots obey it, all major search engines follow the robots.txt instructions.
Robots.txt influences crawling but not indexing. A search engine can find a page via means other than links on the site, such as links from other sites, and choose to index the page even if it hasn’t crawled it. However, in the majority of cases, pages blocked in the robots.txt don’t rank well.
The syntax has three basic commands: user-agent (the bot), allow, and disallow. For example, you can exclude Googlebot from your site completely with two simple lines of code (though you never, ever want to see this):
User-agent: Googlebot
Disallow: /
Using “/” after the disallow or allow command either excludes or includes the whole site for a bot. You don’t need to explicitly allow the site for bots since it’s regarded as the default.
Most of the time, robots.txt is used to exclude certain directories or subdomains from being crawled. For example, if you don’t want any bot to crawl an API directory, use:
User-agent: *
Disallow: /api/
This command excludes any page within the /api/ directory. Using /api would match any page in the /api directory but also any directory starting with “/api” like “/apical”.
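The difference between the two patterns comes down to prefix matching, which a few lines can illustrate. This is a simplified sketch with a hypothetical `isBlocked` helper; real robots.txt parsers also handle wildcards and the precedence between allow and disallow rules.

```typescript
// Simplified matcher: a disallow rule blocks every path that starts with it.
// An empty rule blocks nothing. Wildcards ("*", "$") are not handled here.
function isBlocked(path: string, disallowRule: string): boolean {
  return disallowRule !== "" && path.startsWith(disallowRule);
}

console.log(isBlocked("/api/orders", "/api/")); // true
console.log(isBlocked("/apical", "/api/"));     // false
console.log(isBlocked("/apical", "/api"));      // true
```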
Robots.txt files should also reference XML sitemaps. A simple command is enough:
Sitemap: https://www.domain.com/sitemap.xml
After crawling a site, Google chooses which pages to index, or add to its database of pages that might be relevant for a search keyword. By default, Google will index every page it encounters that doesn’t have a noindex meta robots tag or a canonical tag pointing at another page. But there might be instances where Google chooses not to index a page, for example, when it has no content.
Two directives tell search engines to exclude a page from their index: the meta robots tag and the X-Robots-Tag header. The meta robots tag is part of the HTML head section and can be found in the source code. Bots will still crawl the page but not index it.
It looks like this:
<meta name="robots" content="noindex">
The X-Robots-Tag is an HTTP header in the server’s response to a browser and is not visible in the source code.
It looks like this:
X-Robots-Tag: noindex
Unless you can’t add it to the source code of a page, using a meta robots tag is the better solution. You can use it for pages that don’t add SEO value, like paid campaigns or short-term marketing campaign pages.
However, noindexing your “about us” or “terms of service” page is not a good idea since it will prevent searchers from finding critical information that might foster or destroy their trust in your site.
Canonical tags indicate that the current URL is a duplicate (known as duplicate content) and that another URL should be used for ranking. Canonical tags should be used for product facets like blue shirts or URLs containing tracking parameters like domain.com/url?tracking=1 (recognizable by the “?” in the URL).
In the source code, it looks like this:
<link rel="canonical" href="https://www.domain.com/shirts" />
In some cases, it makes sense to intentionally let Google index facets, but this advanced SEO tactic is outside the scope of this article.
Sanity provides granular index management options. When using Next.js, for example, you can simply define an indexing or SEO object to manage noindex or canonical tags and assign them to any page type you’d like.
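A minimal sketch of such an SEO object and the tags it could produce is shown here. The `SeoSettings` field names are assumptions; in a real Next.js project you would more likely pass these values to the framework's head or metadata handling than build HTML strings by hand.

```typescript
// Hypothetical SEO object attached to a page document; field names are assumptions.
interface SeoSettings {
  noindex?: boolean;
  canonicalUrl?: string;
}

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Turn the SEO object into the head tags described in this section.
function renderSeoTags(seo: SeoSettings): string[] {
  const tags: string[] = [];
  if (seo.noindex) {
    tags.push('<meta name="robots" content="noindex">');
  }
  if (seo.canonicalUrl) {
    tags.push(`<link rel="canonical" href="${escapeAttr(seo.canonicalUrl)}" />`);
  }
  return tags;
}

// A tracking-parameter URL pointing at its clean equivalent:
console.log(renderSeoTags({ canonicalUrl: "https://www.domain.com/shirts" }).join("\n"));
```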
One important part of indexing is Google’s understanding of language and markets. When people search for something, it’s imperative that Google shows results in the local language. This is more complex than most people would assume and one of the most common SEO problems.
For sites that operate in more than one language, using hreflang tags increases the likelihood that Google will rank a site in the right country. For that to work, every page needs to have hreflang tags that point at its equivalents in other languages.
For example, if page A is in English but also available in French and Spanish, page A needs to have 3 hreflang tags: one pointing at itself in English (this is what we call a self-referencing hreflang tag), one pointing at the version in French, and one pointing at the version in Spanish. The French and the Spanish versions need to follow the same principle.
Every page needs to have a self-referencing hreflang tag, meaning the hreflang tag needs to point at the URL it’s on and indicate its language.
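The English, French, and Spanish example could be generated along these lines. This is a sketch under assumptions: the `LanguageVersion` type, the function name, and the URLs are hypothetical. The helper refuses to build a tag set that lacks the self-referencing entry.

```typescript
interface LanguageVersion {
  hreflang: string; // language code such as "en", "fr", or "es"
  url: string;
}

// Build the tag set for one page. The same full set goes on every language
// version, so each page points at its equivalents and at itself.
function hreflangTagsFor(pageUrl: string, versions: LanguageVersion[]): string[] {
  const hasSelfReference = versions.some((v) => v.url === pageUrl);
  if (!hasSelfReference) {
    throw new Error(`Missing self-referencing hreflang entry for ${pageUrl}`);
  }
  return versions.map(
    (v) => `<link rel="alternate" hreflang="${v.hreflang}" href="${v.url}" />`
  );
}

const shirtPages: LanguageVersion[] = [
  { hreflang: "en", url: "https://www.domain.com/en/shirts" },
  { hreflang: "fr", url: "https://www.domain.com/fr/shirts" },
  { hreflang: "es", url: "https://www.domain.com/es/shirts" },
];

// Tags for the English page: three entries, one of which points at itself.
console.log(hreflangTagsFor("https://www.domain.com/en/shirts", shirtPages).join("\n"));
```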
There are two types of rendering: server-side (SSR) and client-side (CSR). Server-side rendering is a pre-rendering method that delivers a fully formed page, ready to be displayed, to the user’s browser. Client-side rendering instead builds the page in the browser with JavaScript, which Google has to execute before it can see the content. The trade-off is that SSR can be resource-intensive and less well suited to heavily dynamic pages.
Most shop systems use SSR, meaning SSR vs. CSR is not a problem most of the time. If you’re unsure whether Google can render all elements on your page, select “test live URL” in Search Console’s inspection tool and look for elements that don’t appear in the screenshot view.
Since Sanity allows you to use frameworks like Gatsby or Next.js, server-side rendering is basic functionality that can be adjusted for different routes. You can even render pages at the CDN level with most providers. In plain terms, Sanity offers a range of simple ways to enable SSR.
Faster rendering times make the experience more user-friendly and allow Google to crawl more pages over time.
The final step of the search engine ranking process is adding user experience signals to the technical understanding of a site. In recent years, Google has evolved by quantifying UX and calling it page experience.
Most page experience signals are straightforward and can be solved by out-of-the-box shopping systems, like having a secure connection (default HTTPS) and avoiding intrusive interstitials and popups. But two signals need closer examination.
Core Web Vitals (CWV) is a set of three metrics Google uses to understand the quality of user experience on a web page. Before CWV, Google measured page speed. But since speed is a subjective and contextual metric, Google now measures Core Web Vitals.
Google uses field instead of laboratory data to measure CWV. That means it doesn’t matter how well your page performs theoretically but how well it actually performs when users engage with it. Google gets field data from Chrome (the Chrome User Experience Report, or “CrUX”) and provides that data for free in Looker Studio (formerly Data Studio) or PageSpeed Insights.
Keep in mind that a site needs enough traffic for Google to have a statistically relevant sample of field data.
The three Core Web Vitals are:
- Largest Contentful Paint, which measures how fast the most important element of a page loads (often the hero image in e-commerce)
- First Input Delay, which measures the time between interaction and response on a page
- Cumulative Layout Shift, which measures the shift of elements after a page loads
Instead of relative numbers, Google sets absolute thresholds for Core Web Vitals. As soon as your page meets them, there is no more optimization potential for SEO (though you could argue that a better experience is always better for conversion rate optimization), meaning that improving Core Web Vitals beyond the thresholds doesn’t yield better search performance.
Make sure to measure CWV for both mobile and desktop. Google reports performance for Core Web Vitals and Page Experience metrics over time in Search Console under Page Experience.
The maximum thresholds for Core Web Vitals are:
- Largest Contentful Paint: 2.5 seconds
- First Input Delay: 100 milliseconds
- Cumulative Layout Shift: 0.1 (a unitless score, not a time)
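Because the thresholds are absolute, checking a page against them is a simple comparison. The sketch below assumes you already have field metrics for a page, for example from a CrUX export; the `FieldMetrics` type and function name are hypothetical.

```typescript
// Field metrics for one page. The shape is an assumption for illustration.
interface FieldMetrics {
  lcpSeconds: number;      // Largest Contentful Paint
  fidMilliseconds: number; // First Input Delay
  cls: number;             // Cumulative Layout Shift (unitless score)
}

// Maximum values that still count as passing, as listed above.
const MAX_GOOD: FieldMetrics = { lcpSeconds: 2.5, fidMilliseconds: 100, cls: 0.1 };

// Return the names of the metrics that exceed their threshold.
function failingVitals(metrics: FieldMetrics): string[] {
  const failing: string[] = [];
  if (metrics.lcpSeconds > MAX_GOOD.lcpSeconds) failing.push("LCP");
  if (metrics.fidMilliseconds > MAX_GOOD.fidMilliseconds) failing.push("FID");
  if (metrics.cls > MAX_GOOD.cls) failing.push("CLS");
  return failing;
}

// A slow hero image and a shifting layout: LCP and CLS fail, FID passes.
console.log(failingVitals({ lcpSeconds: 3.1, fidMilliseconds: 80, cls: 0.25 }));
```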
Using Sanity as a headless CMS, you have full control over how your store is displayed and which elements are loaded. Custom object and rendering rules make optimizing Core Web Vitals easy. Since all page elements have predefined positions and sizes, you don’t need to worry about layout shifts. Since time-to-first-byte is short with server-side rendering, you don’t have to worry about large paints or input delays.
The second Page Experience metric worth zooming in on is Mobile Friendliness. Many users browse the web on smartphones, but the experience is very different from desktop. To account for this difference, Google pushes sites to meet mobile-friendly criteria. As for other Page Experience metrics, Google provides a report for mobile-friendliness in Search Console.
First, pages should be responsive and easy to use. Elements that are too small, especially buttons or fonts, are frustrating to users. Completing everyday tasks on mobile should not be difficult or cumbersome. The rendered width of a page should not exceed the width of a smartphone’s screen.
Second, avoid a mobile subdomain. Even though Google supports mobile subdomains and can understand them, the setup is much more prone to bugs and requires considerable overhead.
You can also use Google’s mobile friendliness tool outside of Search Console.
Schema and structured data can improve Google’s understanding of your site and bring in more traffic by enriching the way your site looks on search. Think about it like a dictionary for your code.
The implementation of schema is straightforward but technical. There are several types of schema, but these ones are the most important for e-commerce:
- Pros and Cons
Be mindful of where you add Schema and related content. Category pages, product pages, and blog articles might all benefit from FAQ schema, for example. But HowTo schema might not be a good choice for category pages. Product-related Schema should only be used on product pages, not on categories. Ideally, test different types of Schema against each other.
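To make the idea concrete, here is a minimal sketch of Product schema rendered as a JSON-LD script tag for a product page. The `ProductInput` type and its fields are assumptions about your product data; the `@type`, `offers`, and `availability` names come from the schema.org vocabulary.

```typescript
// Hypothetical product data as your front-end might receive it.
interface ProductInput {
  name: string;
  description: string;
  imageUrl: string;
  sku: string;
  url: string;
  price: number;
  currency: string; // ISO 4217 code such as "USD"
  inStock: boolean;
}

// Build a JSON-LD script tag that describes the product and its offer.
function productJsonLd(product: ProductInput): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: product.name,
    description: product.description,
    image: product.imageUrl,
    sku: product.sku,
    offers: {
      "@type": "Offer",
      url: product.url,
      price: product.price.toFixed(2),
      priceCurrency: product.currency,
      availability: product.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
  // Escape "<" so product text can never close the script tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

console.log(
  productJsonLd({
    name: "Blue Oxford Shirt",
    description: "A classic button-down shirt.",
    imageUrl: "https://www.domain.com/images/blue-oxford.jpg",
    sku: "SHIRT-001",
    url: "https://www.domain.com/shirts/blue-oxford",
    price: 49,
    currency: "USD",
    inStock: true,
  })
);
```

Google’s Rich Results Test can confirm whether markup like this is eligible for enriched search results.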
Ready to grow your e-commerce business?
Create seamless, connected e-commerce experiences that rank on Google with Sanity and Shopify. Sign up today to unlock the full potential of your online store!