Crawling & Indexing
Robots.txt, sitemaps, crawl budget, and indexing issues.
-
GSC reports resource failures despite 200 OK in server logs
A WooCommerce site logs clean 200 responses to Googlebot, but GSC flags resource failures caused by CDN interception or timing gaps between crawl and render.
-
Noindex vs. robots.txt disallow for millions of stub pages
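Noindex and robots.txt disallow do different jobs: disallow stops Googlebot from crawling a URL, which can still be indexed from links, while noindex removes a page from the index but only works if Googlebot can crawl the page and see the tag. Verify you actually have a crawl budget problem before blocking stub pages at scale.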
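The interaction can be modeled roughly as follows. This is a simplified sketch using Python's standard `urllib.robotparser`. The `indexing_outcome` helper and its return strings are hypothetical. Python's parser also differs from Google's: it does not support `*`/`$` wildcards and it applies the first matching rule rather than the most specific one. Check real rules in Search Console's robots.txt report.

```python
from urllib.robotparser import RobotFileParser


def indexing_outcome(robots_txt: str, url: str, has_noindex: bool,
                     user_agent: str = "Googlebot") -> str:
    """Simplified model of how a disallow rule and a noindex tag interact."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The page is never fetched, so a noindex tag on it is never seen.
        # The bare URL can still be indexed if other pages link to it.
        return "not crawled; noindex unseen; may still be indexed from links"
    if has_noindex:
        # Crawlable page with noindex: Google sees the tag and drops the page.
        return "crawled; noindex seen; dropped from index"
    return "crawled; eligible for indexing"
```

In this model, a stub page that is both disallowed and tagged noindex lands in the first branch. Its noindex does nothing, which is why the two should not be combined on the same URLs.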
-
OpenAI crawl activity tripled after GPT-5, led by search bot
OAI-SearchBot now generates more log events than GPTBot after a 3.5x post-GPT-5 surge. Each bot obeys its own robots.txt user-agent group, so you need to manage them separately.
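The minimal sketch below shows separate groups for the two OpenAI bots, checked with Python's `urllib.robotparser`. It assumes a policy chosen only for illustration: per OpenAI's crawler documentation, GPTBot collects training data and OAI-SearchBot surfaces sites in ChatGPT search, and this example opts out of the former and keeps the latter. The `crawl_permissions` helper and the paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block GPTBot (training), allow OAI-SearchBot (search),
# and give every other crawler a single shared rule.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /cart/
"""


def crawl_permissions(robots_txt: str, url: str, agents: list) -> dict:
    """Return whether each user-agent token may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

For example, `crawl_permissions(EXAMPLE_ROBOTS_TXT, "https://example.com/blog/post", ["GPTBot", "OAI-SearchBot", "Googlebot"])` allows OAI-SearchBot and Googlebot but not GPTBot. Googlebot falls through to the `*` group because no named group matches it.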
-
ChatGPT uses SerpAPI to pull Google results, not its own crawler
ChatGPT pulls results from SerpAPI rather than its own index, so your Google rankings directly determine whether ChatGPT surfaces your content.
-
AI bot traffic starves Googlebot of crawl budget on large sites
AI crawler traffic is consuming server bandwidth and crawl budget on large sites, potentially throttling Googlebot discovery and indexing of important pages.
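A first step in checking whether AI crawlers are crowding out Googlebot is to count requests per bot in the access logs. The sketch below assumes the combined log format, where the user agent is the last quoted field. The token list and the `count_bot_hits` name are illustrative. User-agent strings can be spoofed, so verify Googlebot hits (for example via reverse DNS) before drawing conclusions.

```python
import re
from collections import Counter

# Assumption: combined log format, with the user agent as the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')
# Illustrative tokens; extend with any other crawlers you see in your logs.
BOT_TOKENS = ("Googlebot", "GPTBot", "OAI-SearchBot")


def count_bot_hits(log_lines):
    """Count requests per known bot token; everything else goes to 'other'."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not match the assumed format
        user_agent = match.group(1)
        for token in BOT_TOKENS:
            if token in user_agent:
                counts[token] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

Comparing the AI-bot totals with Googlebot's over the same window, alongside the Crawl Stats report in GSC, shows whether the two actually move against each other.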
-
Mueller doubts freshness-based sitemap splits speed crawling
Mueller doubts freshness-based sitemap splits influence crawl frequency, questioning a widely used enterprise SEO tactic with no confirmed crawl benefit.
-
Blocking CSS and JS in robots.txt breaks indexing instead of saving crawl budget
Blocking CSS and JS in robots.txt stops Googlebot from rendering pages correctly, which can hurt indexing while doing little for crawl budget. Improve cache headers on static assets instead.
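The cache-header alternative can be sketched as follows. It assumes a build step that fingerprints asset filenames (for example `app.3f9a1c2b.css`), which lets a server choose `Cache-Control` values per path. The helper name, the filename convention, and the max-age values are all illustrative assumptions, not Google guidance. The assets themselves stay crawlable.

```python
import re

# Hypothetical convention: fingerprinted names like app.3f9a1c2b.css or vendor.0a1b2c3d4e.js
FINGERPRINTED = re.compile(r"\.[0-9a-f]{8,}\.(?:css|js)$")
STATIC_EXTENSIONS = (".css", ".js", ".woff2", ".png", ".jpg", ".svg")


def cache_control_for(path: str) -> str:
    """Pick an illustrative Cache-Control value for a request path."""
    if FINGERPRINTED.search(path):
        # Content changes produce a new filename, so a long, immutable cache is safe.
        return "public, max-age=31536000, immutable"
    if path.endswith(STATIC_EXTENSIONS):
        # Unversioned static files: cache for a day (assumed value).
        return "public, max-age=86400"
    # HTML and other dynamic responses: revalidate on every request.
    return "no-cache"
```

Fingerprinting is what makes the one-year `immutable` value safe: a changed file gets a new URL, so no cache ever serves stale CSS or JS.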
-
Wildcard DNS lets Googlebot index phantom subdomains as real pages
Wildcard DNS can cause Googlebot to index phantom subdomains as real pages, wasting crawl budget and creating duplicate content signals that may hurt rankings.
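One way to check a domain for wildcard DNS is to resolve a few random subdomains that almost certainly do not exist. If they all resolve, a wildcard record is likely. This minimal sketch uses the standard library's `socket.getaddrinfo`. The names `has_wildcard_dns`, the `probe-` prefix, and the injectable `resolver` parameter (added so the check can be tested offline) are hypothetical. Some ISP resolvers answer nonexistent names with their own IPs, so confirm any positive result against the authoritative DNS.

```python
import secrets
import socket
from typing import Callable


def resolves(hostname: str) -> bool:
    """True if the system resolver returns any address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def has_wildcard_dns(domain: str,
                     resolver: Callable[[str], bool] = resolves,
                     probes: int = 3) -> bool:
    """Probe random subdomains; if every one resolves, suspect a wildcard record."""
    for _ in range(probes):
        label = "probe-" + secrets.token_hex(8)  # random, almost certainly unused
        if not resolver(f"{label}.{domain}"):
            return False
    return True
```

A positive result suggests that any subdomain Googlebot discovers, including typos and scraped junk, will serve a page. Removing the wildcard record or returning 404s for unknown hosts closes that gap.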
-
Indexing API bypasses 'Discovered - currently not indexed' queue
The Indexing API achieved 94% indexation in 48 hours versus 8.4% via sitemap, but using it outside its documented scope (job posting and livestream pages only) risks future enforcement.
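A publish call to the Indexing API is a `POST` to `https://indexing.googleapis.com/v3/urlNotifications:publish`. The JSON body contains `url` and `type` (`URL_UPDATED` or `URL_DELETED`), and the call is authorized with an OAuth 2.0 service-account token. The sketch below only builds the request with `urllib.request`, and obtaining the token is out of scope. The `page_type` guard is a hypothetical safeguard that mirrors Google's documented restriction to `JobPosting` and `BroadcastEvent` pages.

```python
import json
import urllib.request

ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
# Hypothetical guard reflecting the documented scope of the Indexing API.
ALLOWED_PAGE_TYPES = {"JobPosting", "BroadcastEvent"}


def build_publish_request(page_url: str, access_token: str, page_type: str,
                          deleted: bool = False) -> urllib.request.Request:
    """Build (but do not send) an Indexing API publish request."""
    if page_type not in ALLOWED_PAGE_TYPES:
        raise ValueError(f"{page_type} pages are outside the Indexing API's documented scope")
    body = {"url": page_url, "type": "URL_DELETED" if deleted else "URL_UPDATED"}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Sending the request would be `urllib.request.urlopen(req)`. Failing loudly on other page types keeps the shortcut from spreading to pages the API was never meant for.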
-
Mueller lists nine reasons Google overrides your rel=canonical
John Mueller listed nine scenarios where Google picks a different canonical than your tag, from JS rendering failures to URL parameter pattern inference.
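A few of these cases are easy to catch automatically: a missing or conflicting `rel=canonical`, and a mismatch with the Google-selected canonical shown in URL Inspection. This sketch parses raw HTML with the standard library's `html.parser`. The class and function names are hypothetical, and `google_selected` must be copied in by hand. The parser does not execute JavaScript, so it cannot see canonicals injected at render time.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects href values of <link rel="canonical"> tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        rels = attr_map.get("rel", "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonicals.append(attr_map["href"].strip())


def canonical_warnings(html: str, google_selected: str = None) -> list:
    """Flag missing, conflicting, or overridden canonicals in a page's raw HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    found = finder.canonicals
    warnings = []
    if not found:
        warnings.append("no rel=canonical in raw HTML (JS-injected tags need a rendering check)")
    elif len(set(found)) > 1:
        warnings.append(f"conflicting canonicals: {sorted(set(found))}")
    if google_selected and found and google_selected not in found:
        warnings.append(f"Google selected {google_selected}, page declares {found[0]}")
    return warnings
```

An empty list means only that the raw HTML is consistent. Rendering failures and parameter-pattern inference still have to be checked in URL Inspection.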