What Is an XML Sitemap?
An XML sitemap is a file that lists the URLs on your website that you want search engines to know about. Think of it as a table of contents for your site, written in a format that search engine crawlers can easily read and process.
While Google can discover pages by following links, a sitemap provides a direct, comprehensive list of your content. This is especially valuable for:
- New websites with few external backlinks
- Large sites with thousands of pages where some content might be buried deep in the navigation
- Sites with dynamic content that changes frequently
- Pages with few internal links that Googlebot might otherwise miss
Having a sitemap doesn't guarantee that every URL will be indexed, but it ensures that Google at least knows these pages exist and should be considered for crawling.
XML Sitemap Format
The basic structure of an XML sitemap follows the Sitemaps protocol (sitemaps.org). Here's what a simple sitemap looks like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-one</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/page-two</loc>
    <lastmod>2025-02-01</lastmod>
  </url>
</urlset>
Key Elements
- <loc> (required): The full URL of the page. Must include the protocol (https://) and be properly encoded.
- <lastmod> (recommended): The date the page was last meaningfully modified. Use ISO 8601 format (YYYY-MM-DD). This is the most useful optional tag: Google uses it to prioritize which pages to re-crawl.
- <changefreq> (optional): How frequently the page is likely to change (always, hourly, daily, weekly, monthly, yearly, never). Google has stated it largely ignores this tag, so it's not critical to include.
- <priority> (optional): A value between 0.0 and 1.0 indicating the page's relative importance within your site. Like <changefreq>, Google mostly ignores this. Don't spend time fine-tuning it.
In practice, <loc> and <lastmod> are the only tags that really matter. Focus on getting those right.
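If you want to sanity-check those two tags before publishing, a small validator can catch the most common slips. The following is a minimal sketch, not an official validator: the function name `validate_entry` is made up for illustration, and it only accepts the date-only YYYY-MM-DD form of <lastmod>, so values that include a time component would be flagged even though the protocol allows them.

```python
from datetime import date
from typing import List, Optional
from urllib.parse import urlsplit


def validate_entry(loc: str, lastmod: Optional[str] = None) -> List[str]:
    """Return a list of problems found in one sitemap entry (empty if none)."""
    problems = []
    parts = urlsplit(loc)
    # <loc> must be an absolute URL that includes the protocol.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("loc must be an absolute http(s) URL")
    if lastmod is not None:
        try:
            # Assumption: only the date-only form is expected here.
            date.fromisoformat(lastmod)
        except ValueError:
            problems.append("lastmod is not a valid YYYY-MM-DD date")
    return problems


print(validate_entry("https://example.com/page-one", "2025-01-15"))
print(validate_entry("/page-one", "15/01/2025"))
```

The first call reports no problems; the second reports both a relative URL and a badly formatted date.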
How to Generate Sitemaps
Content Management Systems
Most popular CMS platforms have built-in or plugin-based sitemap generation:
- WordPress: Plugins like Yoast SEO or Rank Math generate sitemaps automatically. WordPress 5.5+ also includes a basic built-in sitemap at /wp-sitemap.xml.
- Shopify: Generates a sitemap automatically at /sitemap.xml for all products, collections, pages, and blog posts.
- Squarespace, Wix, and similar platforms: Handle sitemap generation automatically with no configuration needed.
Static Site Generators and Frameworks
- Next.js: Use the next-sitemap package or the built-in sitemap support in the App Router.
- Jekyll: The jekyll-sitemap plugin generates a sitemap at build time.
- Hugo: Has built-in sitemap generation with configurable templates.
- Rails: Gems like sitemap_generator create dynamic sitemaps that can handle large sites with millions of URLs.
Custom or Manual Generation
For custom-built sites, you can generate sitemaps programmatically by querying your database for all public URLs and outputting them in the XML format described above. Many web frameworks have libraries or gems to streamline this.
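As a rough illustration of the programmatic approach, here is a minimal Python sketch using only the standard library. The `pages` list stands in for whatever your database query returns, and the function name `build_sitemap` is hypothetical; adapt both to your own stack.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (bytes) from an iterable of (url, lastmod) pairs.

    lastmod is a YYYY-MM-DD string, or None to omit the tag.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text when serializing.
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Placeholder data; in a real site this would come from your database.
pages = [
    ("https://example.com/page-one", "2025-01-15"),
    ("https://example.com/search?q=a&b=c", None),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml.decode("utf-8"))
```

Building the document with an XML library rather than string concatenation means special characters in URLs, such as the ampersand above, are escaped for you.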
Submitting Your Sitemap to Google
Once your sitemap is generated, you need to tell Google where to find it.
Method 1: Google Search Console
- Log into Google Search Console.
- Navigate to "Sitemaps" in the left sidebar.
- Enter your sitemap URL (e.g., https://example.com/sitemap.xml) and click "Submit."
Google Search Console (GSC) will show you the submission status, the number of URLs discovered, and any errors found in the sitemap.
Method 2: robots.txt
Add a Sitemap directive to your robots.txt file:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
This is a passive method — search engines that read your robots.txt will discover the sitemap URL. It's good practice to include this even if you've already submitted through GSC.
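To confirm that the directive is picked up the way a crawler would read it, you can parse the file with Python's standard `urllib.robotparser` module (its `site_maps()` method requires Python 3.8 or later). This is a small sketch; the helper name `sitemaps_from_robots` is made up for illustration, and the robots.txt body is the example from above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))
# prints ['https://example.com/sitemap.xml']
```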
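Method 3: Ping (deprecated)
Older guides recommend sending a GET request to a ping endpoint to notify Google that your sitemap has changed:
https://www.google.com/ping?sitemap=https://example.com/sitemap.xml
Google deprecated this endpoint in 2023 and it no longer works, so don't build automated workflows around it. To signal updates, keep your <lastmod> values accurate and rely on the Search Console submission and the robots.txt reference described above.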
Sitemap Index Files for Large Sites
A single sitemap file has two limits:
- Maximum 50,000 URLs per file
- Maximum 50 MB (uncompressed) per file
For sites that exceed these limits, use a sitemap index file — a sitemap that points to other sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2025-02-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2025-02-18</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2025-02-25</lastmod>
  </sitemap>
</sitemapindex>
Even if your site is below the 50,000 URL limit, splitting sitemaps by content type (pages, products, blog posts) is a good organizational practice. It makes it easier to monitor indexing by section in Google Search Console and keeps individual files manageable.
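The splitting step can be sketched in a few lines of Python. This is an illustration under assumptions: the product URLs and the `sitemap-products-N.xml` file names are invented, and the sketch only enforces the URL-count limit, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls):
    """Build a sitemap index (bytes) pointing at the given child sitemaps."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical site with 120,000 product URLs.
urls = [f"https://example.com/products/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
children = [
    f"https://example.com/sitemap-products-{i}.xml"
    for i in range(1, len(chunks) + 1)
]
index_xml = build_index(children)
print(len(chunks), [len(c) for c in chunks])
# prints 3 [50000, 50000, 20000]
```

Each chunk would then be written out as its own sitemap file, with the index listing all of them.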
Common Sitemap Mistakes
Including Non-Indexable URLs
Your sitemap should only contain URLs that you actually want indexed. Don't include:
- Pages with noindex directives
- URLs blocked by robots.txt
- Redirect URLs (include the final destination instead)
- Pages behind authentication
- Parameter-based duplicate URLs
Including non-indexable URLs wastes crawl budget and sends mixed signals to Google. If you're telling Google about a URL in your sitemap but also telling it not to index that URL, it creates confusion.
Inaccurate <lastmod> Dates
A common mistake is setting <lastmod> to the current date/time every time the sitemap is regenerated, even if the page content hasn't changed. Google uses lastmod to decide which pages to re-crawl. If every page always shows "just modified," Google will eventually learn to distrust your lastmod values and ignore them entirely.
Only update <lastmod> when the page content has actually changed in a meaningful way.
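One way to enforce this is to store a hash of each page's content and bump the date only when the hash changes. The sketch below assumes a hypothetical record layout (a dict with `content_hash` and `lastmod` keys) and treats any change to the hashed text as meaningful; in practice you would probably hash only the main content, not navigation or timestamps.

```python
import hashlib
from datetime import date


def update_lastmod(record, new_content, today=None):
    """Update record["lastmod"] only if the content hash changed.

    `record` is a dict with "content_hash" and "lastmod" keys
    (a hypothetical schema). Returns True if the record was modified.
    """
    new_hash = hashlib.sha256(new_content.encode("utf-8")).hexdigest()
    if new_hash == record.get("content_hash"):
        return False  # unchanged: keep the existing lastmod
    record["content_hash"] = new_hash
    record["lastmod"] = (today or date.today()).isoformat()
    return True


record = {"content_hash": None, "lastmod": None}
update_lastmod(record, "First version", today=date(2025, 1, 15))
# Regenerating the sitemap later with identical content leaves the date alone.
update_lastmod(record, "First version", today=date(2025, 2, 1))
print(record["lastmod"])
# prints 2025-01-15
```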
Stale URLs
Over time, sitemaps can accumulate URLs for pages that no longer exist (returning 404 errors) or that have been redirected. Regularly audit your sitemap to remove:
- Dead URLs (404s)
- Redirected URLs (replace with the final destination)
- Pages you've intentionally removed or archived
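An audit like this is easy to script. The sketch below parses a sitemap and groups its URLs by HTTP status. To stay self-contained it takes the status lookup as a parameter: `get_status` is a placeholder for whatever HTTP client you use (ideally one that does not follow redirects), and the statuses in the example are invented.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(sitemap_xml, get_status):
    """Group sitemap URLs by status, given a get_status(url) -> int callable."""
    report = {"ok": [], "redirect": [], "dead": [], "other": []}
    root = ET.fromstring(sitemap_xml)
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        status = get_status(url)
        if status == 200:
            report["ok"].append(url)
        elif status in (301, 302, 307, 308):
            report["redirect"].append(url)
        elif status in (404, 410):
            report["dead"].append(url)
        else:
            report["other"].append(url)
    return report


SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/page-one</loc></url>"
    "<url><loc>https://example.com/old-page</loc></url>"
    "<url><loc>https://example.com/gone</loc></url>"
    "</urlset>"
)
# Invented statuses standing in for real HTTP responses.
FAKE_STATUSES = {
    "https://example.com/page-one": 200,
    "https://example.com/old-page": 301,
    "https://example.com/gone": 404,
}
report = audit_sitemap(SAMPLE, FAKE_STATUSES.get)
print(report["redirect"], report["dead"])
```

Anything in the `redirect` or `dead` buckets is a candidate for replacement or removal.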
Exceeding Size Limits
If your sitemap exceeds 50,000 URLs or 50 MB (uncompressed), Google will reject it. Use a sitemap index file to split your URLs across multiple sitemap files. Gzip compression (Google supports the .xml.gz format) reduces transfer size, but the 50 MB limit applies to the uncompressed file, so compression alone won't bring an oversized sitemap under the limit.
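A pre-publish check for these limits might look like the sketch below. The function names are hypothetical, and counting `<loc>` tags is a crude stand-in for properly parsing the file. The size is measured before compression, in line with the "uncompressed" wording of the limit given earlier.

```python
import gzip

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured before compression


def within_limits(sitemap_xml: bytes) -> bool:
    """Check the URL count and uncompressed size of one sitemap file."""
    url_count = sitemap_xml.count(b"<loc>")  # rough count, one per entry
    return url_count <= MAX_URLS and len(sitemap_xml) <= MAX_BYTES


def compress(sitemap_xml: bytes) -> bytes:
    """Gzip the sitemap so it can be served as .xml.gz."""
    return gzip.compress(sitemap_xml)


small = (
    b"<urlset>"
    + b"<url><loc>https://example.com/a</loc></url>" * 3
    + b"</urlset>"
)
print(within_limits(small), len(compress(small)) < len(small))
```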
Not Updating After Site Changes
If you redesign your site, migrate to new URLs, or change your URL structure, your sitemap must be updated to reflect the new URLs. An outdated sitemap pointing to old URLs leads to wasted crawls and delayed indexing of your new pages.
Sitemap Best Practices
To get the most indexing value from your sitemaps, follow these guidelines:
- Generate sitemaps dynamically. Don't maintain them by hand. Use your CMS, framework, or a build step to generate sitemaps automatically from your actual content.
- Include only canonical, indexable URLs. Every URL in your sitemap should return a 200 status code and be the canonical version of that page.
- Use accurate <lastmod> dates. Only update when content genuinely changes.
- Submit through Google Search Console. Don't just add a robots.txt reference; actively submit and monitor your sitemap in GSC.
- Monitor for errors. GSC will flag issues with your sitemap (invalid URLs, unreachable pages, format errors). Check periodically and fix problems promptly.
- Keep sitemaps in sync with your site. When pages are added, removed, or redirected, the sitemap should reflect those changes within hours, not days or weeks.
- Use gzip compression for large sitemaps. This reduces file size and transfer time for both Google and your server.
Sitemaps and Index Monitoring
A sitemap tells Google what you want indexed — but it doesn't guarantee everything will be. That's why pairing your sitemap with index monitoring is valuable. By comparing the URLs in your sitemap against what's actually in Google's index, you can quickly identify gaps. Tools like Indexed let you track exactly which of your sitemap URLs are indexed and alert you when pages drop out, giving you a clear picture of how effectively Google is processing your content.
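At its core, that comparison is a set difference. The sketch below assumes you already have a list of indexed URLs from somewhere, such as an export from Search Console or a monitoring tool; the function name `index_coverage` and the sample data are made up for illustration.

```python
def index_coverage(sitemap_urls, indexed_urls):
    """Return (coverage ratio, sorted list of sitemap URLs not indexed)."""
    wanted = set(sitemap_urls)
    indexed = wanted & set(indexed_urls)
    missing = sorted(wanted - indexed)
    coverage = len(indexed) / len(wanted) if wanted else 0.0
    return coverage, missing


# Invented sample data.
sitemap_urls = [
    "https://example.com/page-one",
    "https://example.com/page-two",
    "https://example.com/blog/post",
    "https://example.com/products/widget",
]
indexed_urls = [
    "https://example.com/page-one",
    "https://example.com/blog/post",
    "https://example.com/products/widget",
]
coverage, missing = index_coverage(sitemap_urls, indexed_urls)
print(coverage, missing)
# prints 0.75 ['https://example.com/page-two']
```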