Why Your Pages Might Not Be Indexed
Getting your pages into Google's index should be straightforward, but a surprising number of things can go wrong. Some issues are obvious configuration mistakes; others are subtle problems that silently prevent indexing for weeks or months.
This guide covers the most common indexing problems, how to diagnose each one, and what to do about it.
Noindex Meta Tags
What It Is
A noindex directive tells Google not to include a page in its index. It can appear as a meta tag in the HTML <head>:
<meta name="robots" content="noindex">
Or as an HTTP response header:
X-Robots-Tag: noindex
This is the most direct way to exclude a page from search results — and one of the most common accidental causes of indexing problems.
How to Diagnose
- Check the page source for noindex in any meta robots tag.
- Use Google Search Console's URL Inspection tool — it will explicitly tell you if a noindex directive was detected.
- Check your HTTP response headers using browser developer tools or a tool like curl -I.
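The checks above can be sketched as a small script. This is a rough first-pass detector, not a full parser: the regex assumes the meta tag's name attribute appears before content, and the header names come from whatever HTTP client you use.

```python
import re

def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the page carries a noindex directive in either
    an X-Robots-Tag response header or a meta robots tag."""
    # Check HTTP headers first (header names are case-insensitive).
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Rough check for <meta name="robots" content="..."> in the HTML.
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html, re.IGNORECASE)
    return bool(meta and "noindex" in meta.group(1).lower())

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page, {}))                                      # True
print(has_noindex("<html></html>", {"X-Robots-Tag": "noindex"}))  # True
```

For real pages, pair this with the response from `curl -I` or your HTTP library of choice; the URL Inspection tool remains the authoritative check.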
How to Fix
Remove the noindex directive from the page. If you're using a CMS like WordPress, check your page settings — many SEO plugins have a per-page noindex toggle that may have been enabled accidentally. After removing the directive, request indexing through Google Search Console.
Robots.txt Blocking
What It Is
The robots.txt file tells search engine crawlers which paths on your site they're allowed to access. A Disallow rule prevents Googlebot from even crawling the page, which means it can never be indexed.
User-agent: *
Disallow: /private/
How to Diagnose
- Check your robots.txt file (usually at yoursite.com/robots.txt).
- Use Google Search Console's robots.txt Tester to see if specific URLs are blocked.
- In the URL Inspection tool, look for "Blocked by robots.txt" as the crawl status.
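You can also test URLs programmatically with Python's standard-library robots.txt parser. The rules are fed in inline here to keep the example self-contained; in practice you would point it at your live robots.txt with set_url() and read(). The domain is a placeholder.

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# In production: rp.set_url("https://yoursite.com/robots.txt"); rp.read()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("Googlebot", "https://yoursite.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "https://yoursite.com/blog/post"))     # True
```

Note that robotparser implements the standard matching rules; Googlebot additionally supports wildcards, so the GSC Tester is still the final word for Google-specific behavior.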
How to Fix
Update your robots.txt to allow crawling of the pages you want indexed. Be careful with broad rules like Disallow: /, which blocks everything. After making changes, use the robots.txt Tester in GSC to verify your rules work as intended.
Important note: If a page is disallowed in robots.txt but has external links pointing to it, Google may still index the URL (showing just the title and a snippet saying "No information is available for this page") without ever crawling the content. This is rarely what you want.
Canonical Issues
What It Is
The canonical tag (<link rel="canonical" href="...">) tells Google which version of a page is the "primary" one. If the canonical points to a different URL, Google will typically index the canonical version and skip the current one.
How to Diagnose
- Inspect the page source for the <link rel="canonical"> tag. Verify it points to the current page's URL, not a different one.
- Check Google Search Console's URL Inspection tool — it shows the "Google-selected canonical," which may differ from what you declared.
- Watch for trailing slash mismatches (/page vs /page/), protocol differences (http vs https), and www vs non-www variations.
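A sketch of that comparison, using only the standard library: extract the canonical link, then normalize protocol, www prefix, and trailing slash before comparing. The URLs and normalization rules are illustrative; match them to your own preferred URL format.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical" and self.canonical is None:
            self.canonical = a.get("href")

def normalize(url: str) -> str:
    """Ignore protocol, a leading www, and trailing slashes for comparison."""
    p = urlparse(url)
    host = p.netloc.lower().removeprefix("www.")
    return host + (p.path.rstrip("/") or "/")

html = '<head><link rel="canonical" href="https://www.example.com/page/"></head>'
finder = CanonicalFinder()
finder.feed(html)
page_url = "http://example.com/page"
print(normalize(finder.canonical) == normalize(page_url))  # True
```

A mismatch after normalization means the canonical genuinely points elsewhere and Google will likely index that other URL instead.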
How to Fix
Ensure every page's canonical tag points to its own URL (the version you want indexed). If you have duplicate pages, choose one canonical version and point all duplicates to it. Make sure your canonical URLs are consistent with your preferred URL format (HTTPS, with or without www, with or without trailing slashes).
Thin or Duplicate Content
What It Is
Google may choose not to index pages it considers thin (too little useful content) or duplicate (too similar to other pages already in the index). This frequently affects:
- Category or tag pages with only a title and no unique content
- Auto-generated pages with boilerplate text and minimal variation
- Paginated listing pages
- Pages with content scraped or syndicated from other sources
How to Diagnose
- Check Google Search Console's "Pages" report for pages listed as "Duplicate without user-selected canonical" or "Crawled - currently not indexed."
- Compare similar pages on your site — if they share 80%+ of their content, Google may treat them as duplicates.
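As a rough way to apply that 80% rule of thumb, difflib can score the text overlap between two pages' extracted content. This is only a heuristic; Google's duplicate detection is more sophisticated, and the sample strings here are invented.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Text-overlap ratio between two pages' visible content, 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

page_a = "Red widgets in many sizes. Free shipping on all orders over $50."
page_b = "Blue widgets in many sizes. Free shipping on all orders over $50."
print(similarity(page_a, page_b) > 0.8)  # True -- likely near-duplicates
```

Run this over stripped page text (not raw HTML, where shared templates inflate the score) to flag candidate duplicates for consolidation or canonicalization.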
How to Fix
- Add unique, valuable content to thin pages or consolidate them.
- Use canonical tags to point duplicate pages to the preferred version.
- Consider using noindex on low-value pages (like certain tag or filter pages) to focus Google's attention on your important content.
Server Errors (5xx)
What It Is
When Googlebot encounters a 5xx server error (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable), it can't access the page content. If these errors persist, Google will eventually drop the page from its index.
How to Diagnose
- Check Google Search Console's "Pages" report for "Server error (5xx)" entries.
- Review your server logs for 5xx errors, particularly during the times Googlebot is crawling.
- Use a monitoring service to track uptime and catch intermittent server issues.
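A quick way to review server logs for this, assuming the common Apache combined log format (the sample log lines below are fabricated): filter for Googlebot requests that returned a 5xx status.

```python
import re

# Matches the request and status fields of a combined-format log line.
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def googlebot_5xx(log_lines):
    """Return the paths that returned a 5xx status to Googlebot."""
    hits = []
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = LOG_LINE.search(line)
        if m and m.group("status").startswith("5"):
            hits.append(m.group("path"))
    return hits

logs = [
    '66.249.66.1 - - [01/Mar/2025] "GET /pricing HTTP/1.1" 503 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Mar/2025] "GET /blog HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]
print(googlebot_5xx(logs))  # ['/pricing']
```

Matching on the Googlebot user-agent string is a convenience, not verification; for certainty, confirm the IPs belong to Google via reverse DNS.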
How to Fix
Fix the underlying server issue — this could be anything from resource exhaustion to application bugs to hosting problems. Once resolved, Googlebot will re-crawl and can re-index the pages. If the errors were temporary, recovery is usually automatic within a few crawl cycles.
Crawl Budget Exhaustion
What It Is
For large sites, Google allocates a limited crawl budget — the number of pages it will crawl within a given period. If your crawl budget is consumed by low-priority pages, important pages may not get crawled frequently enough to stay indexed.
How to Diagnose
- In Google Search Console, check the "Crawl Stats" report under Settings to see crawl activity trends.
- If you notice important pages aren't being crawled while unimportant pages (faceted navigation, expired content, parameter variations) are consuming crawl activity, you likely have a crawl budget problem.
How to Fix
- Noindex or remove low-value pages that waste crawl budget.
- Use robots.txt to block crawling of non-essential URL patterns.
- Improve site speed — faster response times allow Googlebot to crawl more pages in the same timeframe.
- Reduce URL bloat from parameter variations, session IDs, or infinite calendar/filter combinations.
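For example, a robots.txt along these lines keeps crawlers out of parameter and infinite-space URLs (the paths and parameter names are placeholders for your own patterns; Googlebot supports the * wildcard in Disallow rules):

```
User-agent: *
Disallow: /search
Disallow: /*?sort=
Disallow: /*?sessionid=
Disallow: /calendar/
```

Test any wildcard rules before deploying, since an over-broad pattern can block pages you want indexed.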
Manual Actions
What It Is
A manual action is a penalty applied by a human reviewer at Google when your site is found to violate Google's spam policies. Manual actions can result in specific pages or your entire site being demoted or removed from search results.
How to Diagnose
- Check Google Search Console under "Security & Manual Actions" > "Manual actions." If there's a penalty, it will be listed here with details.
How to Fix
Address the specific issue described in the manual action notice (thin content, unnatural links, cloaking, etc.), then submit a reconsideration request through GSC. Recovery can take weeks to months.
Soft 404s
What It Is
A soft 404 occurs when a page returns a 200 OK status code but the content looks like an error page to Google — for example, a "Product not found" message on an otherwise working URL. Google treats these as errors and won't index them.
How to Diagnose
- Check Google Search Console's "Pages" report for "Soft 404" entries.
- Visit the flagged URLs yourself to see if the content is actually empty or error-like.
How to Fix
- If the page truly doesn't exist, return a proper 404 or 410 status code instead of a 200.
- If the page does have valid content, make sure it's substantial enough that Google doesn't mistake it for an error page. Very short pages with generic headings can trigger soft 404 classification.
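The first fix is mostly a matter of returning the right status code from your application. A minimal sketch of the pattern, where PRODUCTS and render() stand in for your own data layer and templating:

```python
PRODUCTS = {"widget-1": "Red widget"}

def render(body: str) -> str:
    return f"<html><body>{body}</body></html>"

def product_page(slug: str) -> tuple:
    """Return (status_code, body). A missing product gets a real 404,
    not a 200 with an error message (which Google flags as a soft 404)."""
    product = PRODUCTS.get(slug)
    if product is None:
        return 404, render("Product not found")
    return 200, render(product)

print(product_page("widget-1")[0])  # 200
print(product_page("gone")[0])      # 404
```

The same principle applies in any framework: the "not found" branch must set the status code explicitly rather than rendering an error message into a 200 response.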
Redirect Chains
What It Is
A redirect chain occurs when URL A redirects to URL B, which redirects to URL C, and so on. Googlebot will follow up to about 10 redirects, but long chains waste crawl budget and can cause Googlebot to give up before reaching the final destination.
How to Diagnose
- Use a redirect checker tool or browser developer tools (Network tab) to trace the redirect path from the original URL.
- Check Google Search Console for "Redirect error" entries in the Pages report.
- Tools like Indexed can help you monitor whether pages behind redirects maintain their index status over time.
How to Fix
- Update redirect chains so each URL points directly to the final destination (A -> C instead of A -> B -> C).
- When migrating URLs, update internal links to point to the final URL rather than relying on redirects.
- Audit your redirects periodically to catch chains that accumulate over time.
A Systematic Approach to Diagnosing Indexing Issues
When you discover a page isn't indexed, work through this checklist:
- Check robots.txt — is the URL blocked from crawling?
- Check for noindex — is there a noindex tag or header?
- Check the canonical — does it point to a different URL?
- Check server response — is the page returning a 200 status code?
- Check for redirects — does the URL redirect, and if so, does the chain resolve cleanly?
- Check content quality — is the page thin, duplicate, or auto-generated?
- Check GSC manual actions — is there a penalty on the site?
- Check GSC URL Inspection — what does Google specifically report for this URL?
Addressing indexing problems promptly is important because the longer a page stays deindexed, the more organic traffic you lose. Setting up automated monitoring — whether through Google Search Console alerts or a dedicated tool — ensures you catch these issues before they significantly impact your site's visibility.