Crawling & Indexing
Robots.txt, sitemaps, crawl budget, and indexing issues.
- News
Silent soft 404s caused 90% traffic loss after site migration
Pages returned HTTP 200 but Google's quality classifiers flagged them as soft 404s, deindexing thousands of URLs before the traffic drop was visible.
- News
1,800 pages deindexed overnight despite clean GSC signals
A WordPress site lost all indexed pages while GSC showed no errors. The blank canonicals looked technical, but the community and Google's own docs point to quality.
- News
Empty category pages trigger soft 404s with no clean fix
Google flags out-of-stock category pages as soft 404s, but removing them risks slow re-indexing when products return. Each option has real tradeoffs.
- News
GSC reports resource failures despite 200 OK in server logs
A WooCommerce site logs clean 200 responses to Googlebot, but GSC flags resource failures caused by CDN interception or timing gaps between crawl and render.
- News
Noindex vs. robots.txt disallow for millions of stub pages
Noindex and robots.txt disallow have different effects on crawling and indexing. Verify you have a crawl budget problem before blocking stub pages at scale.
- News
OpenAI crawl activity tripled after GPT-5, led by search bot
OAI-SearchBot now generates more log events than GPTBot after a 3.5x post-GPT-5 surge, and each bot has its own robots.txt directive you need to manage.
- News
ChatGPT uses SerpAPI to pull Google results, not its own crawler
ChatGPT pulls results from SerpAPI, not its own index, so your Google rankings directly determine whether AI platforms surface your content.
- News
AI bot traffic starves Googlebot of crawl budget on large sites
AI crawler traffic is consuming server bandwidth and crawl budget on large sites, potentially throttling Googlebot discovery and indexing of important pages.
- News
Mueller doubts freshness-based sitemap splits speed crawling
Mueller doubts freshness-based sitemap splits influence crawl frequency, questioning a widely used enterprise SEO tactic with no confirmed crawl benefit.
- News
Blocking CSS and JS in robots.txt breaks indexing, not saves
Blocking CSS and JS in robots.txt breaks Googlebot's page rendering and indexing, not crawl budget. Improve cache headers on static assets instead.
- News
Wildcard DNS lets Googlebot index phantom subdomains as real pages
Wildcard DNS can cause Googlebot to index phantom subdomains as real pages, wasting crawl budget and creating duplicate content signals that may hurt rankings.
- News
Indexing API bypasses 'Discovered - currently not indexed' queue
Indexing API achieved 94% indexation in 48 hours versus 8.4% via sitemap, but bypassing documented restrictions for JobPosting-only pages risks future enforcement.
- News
Mueller lists nine reasons Google overrides your rel=canonical
John Mueller listed nine scenarios where Google picks a different canonical than your tag, from JS rendering failures to URL parameter pattern inference.