Battling Next.js SEO Issues on a Government Jobs Aggregator
Next.js is the default choice for React-based web applications, and Vercel makes deployment effortless. But “effortless” hides a minefield of SEO pitfalls, ones that only surface at scale, across different page types, and under the unforgiving lens of Googlebot’s rendering pipeline. (If you are weighing framework options, see our Astro vs Next.js comparison for SEOs for a side-by-side breakdown.)
This case study follows a government jobs aggregator we will call GovJobsHub, a Next.js App Router site on Vercel with roughly 20,000 pages. The site aggregates federal, state, and local government job listings into programmatically generated pages organized by location, category, and agency. It is the kind of site where technical SEO determines whether tens of thousands of pages get indexed or disappear into a crawl budget black hole.
We will walk through every pitfall we encountered, explain why each one happens in the Next.js + Vercel stack, and show how to configure the stack properly from the start.
Understanding the Page Types
Before diagnosing problems, you need to understand that different page types on a Next.js site can have completely different rendering behaviors, crawl characteristics, and SEO requirements. This was our most important lesson: never assume one working page type means they all work.
GovJobsHub has six distinct page types:
Job Detail Pages (/jobs/[id])
Individual job listings. Each has a title, description, salary range, location, agency, and application deadline. These are the most valuable pages for Google Jobs rich results. Count: ~15,000 pages, constantly churning as listings expire and new ones appear.
SEO requirements: JobPosting structured data, proper 410 status on expiry, fresh content signals, unique meta descriptions.
Location Pages (/jobs/[state], /jobs/[state]/[city])
Aggregation pages listing jobs by geography. State pages show all jobs in that state. City pages narrow further. Count: ~2,500 pages (50 states + major cities).
SEO requirements: Unique content beyond the job list itself, BreadcrumbList schema, proper pagination, no thin-content signals.
Category Pages (/jobs/category/[slug])
Jobs grouped by field: IT, healthcare, law enforcement, administration. Count: ~200 pages.
SEO requirements: Similar to location pages. Category descriptions must not be boilerplate.
Agency Pages (/jobs/federal/[agency])
Federal jobs grouped by agency: VA, DOD, USPS, etc. Count: ~150 pages.
SEO requirements: Agency-specific context, not just a filtered job list.
Hub Pages (/jobs, /jobs/federal, /jobs/remote)
Top-level entry points that aggregate across all jobs or major segments. Count: ~10 pages.
SEO requirements: Strong internal linking, SSR mandatory, canonical management for pagination.
Static Pages (homepage, /about, /faq, /checker)
Marketing and utility pages. Count: ~10 pages.
SEO requirements: Standard on-page SEO, FAQPage schema where applicable.
Each of these page types had different problems. That is the nature of Next.js at scale: the framework’s flexibility means each route can end up with a different rendering strategy, and silent inconsistencies compound.
Pitfall 1: The Rendering Gap Across Page Types
This was the most damaging issue and the hardest to detect.
GovJobsHub’s job detail pages were fully server-rendered. You could curl the URL and see complete job descriptions, salary data, and structured markup in the raw HTML. These pages looked great to Googlebot.
But the main /jobs hub page, the highest-traffic listing page and the root of the site’s internal link graph, told a different story. The raw HTML contained React Server Component flight data: serialized self.__next_f.push() arrays instead of actual job cards. The content only appeared after JavaScript execution.
The location pages were a mix. State-level pages (/jobs/california) rendered server-side. But city-level pages (/jobs/california/los-angeles) used a hybrid approach where the job list was delivered as serialized JSON in RSC payloads rather than rendered HTML.
Why This Happens
In the Next.js App Router, any component marked with 'use client' renders on the client. If your job listing grid is a Client Component, maybe because it has sorting, filtering, or pagination interactions, the actual job data is not in the initial HTML. The server sends a placeholder and the RSC payload, and the client hydrates it.
The insidious part: this works perfectly in the browser. You never notice unless you view source or disable JavaScript. As Sam Torres noted in her JavaScript SEO AMA, rendering queues, not crawl budget, are the real bottleneck for JavaScript-heavy sites.
How to Detect It
# Fetch raw HTML and check for actual content vs RSC payloads
curl -s https://yoursite.com/jobs | grep -c "self.__next_f.push"
curl -s https://yoursite.com/jobs | grep -c "<article"
# If you see many __next_f.push calls and zero <article> tags,
# the page depends on client-side rendering

Run this check against every page type. Do not assume, verify. For a more structured approach, Sitebulb’s rendering comparison (response vs. render) covers this analysis in depth.
How to Fix It
Move SEO-critical content out of Client Components. In the App Router, the default is Server Components: content renders on the server unless you explicitly opt out with 'use client'. The fix is architectural:
// BAD: Job list in a Client Component
'use client'
export function JobList({ jobs }) {
// This content is NOT in initial HTML
return jobs.map(job => <JobCard key={job.id} job={job} />)
}
// GOOD: Server Component with client interactivity separated
// JobList is a Server Component (default)
export function JobList({ jobs }) {
// This content IS in initial HTML
return (
<div>
{jobs.map(job => <JobCard key={job.id} job={job} />)}
{/* Only the interactive filter is a Client Component */}
<JobFilter />
</div>
)
}

The principle: render the content on the server, hydrate only the interactivity on the client. Every page type that you want indexed should have its primary content in Server Components.
Pitfall 2: _rsc Parameter Pollution
When a user navigates between pages on a Next.js App Router site, the framework fetches an optimized RSC payload by appending ?_rsc=XXXXX to the URL. This is an internal mechanism; it is not meant to be seen by search engines.
But Googlebot sees everything. It discovers these _rsc URLs during JavaScript rendering, follows them, and attempts to index them. The result: thousands of “Duplicate, Google chose different canonical” entries in Search Console.
GovJobsHub had over 1,300 of these entries within three months of launch.
Why It Is Hard to Fix
This is a framework-level issue with no clean solution:
- robots.txt Disallow: /*?_rsc=: Google still discovers and reports the URLs. They show as “Indexed, though blocked by robots.txt” instead of disappearing.
- Middleware redirect: Next.js strips _rsc from the NextRequest object before middleware processes it. You literally cannot see the parameter in middleware code.
- next.config.js redirects: Using has conditions to redirect _rsc URLs reduced errors from 1,300 to about 400, but did not eliminate the problem.
- Disabling prefetch: Setting prefetch={false} on all <Link> components prevents _rsc requests entirely but sacrifices the performance benefits of prefetching.
The Pragmatic Approach
There is no silver bullet. The combination that worked best for GovJobsHub:
- robots.txt Disallow: blocks most crawling of these URLs
- Canonical tags on every page: pointing to the clean URL without parameters
- Selective prefetch disabling: turn off prefetch on pages with dozens of internal links (like listing pages) where the _rsc generation is heaviest (see the sketch after this list)
- Accept the noise: some _rsc entries in Search Console are cosmetic. Focus on whether your clean URLs are indexed correctly, not on eliminating every duplicate report
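For the selective prefetch item, a minimal sketch of what that looks like in practice (the JobCardLink component is illustrative, not from the GovJobsHub codebase):

// Job-card links on listing pages opt out of prefetching to avoid dozens of _rsc
// requests per page view; primary navigation links keep the default prefetch behavior.
import Link from 'next/link'

export function JobCardLink({ job }: { job: { id: string; title: string } }) {
  return (
    <Link href={`/jobs/${job.id}`} prefetch={false}>
      {job.title}
    </Link>
  )
}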
# robots.txt
User-agent: *
Disallow: /*?_rsc=
Disallow: /*&_rsc=

The Bigger Question
This issue is tracked in multiple GitHub discussions with hundreds of participants and no official resolution from the Next.js team. Making matters worse, Google recently removed its JavaScript SEO guidance, leaving practitioners without an official render validation framework. If you are building a site where clean index coverage matters (and at 20,000 pages, it absolutely does), you need to account for this as a known, ongoing maintenance burden.
Setting It Up Right: Rendering Strategy Per Page Type
The core mistake is treating all pages the same. Each page type on a programmatic site needs its own rendering strategy based on content volume, update frequency, and SEO value.
Here is what worked for GovJobsHub after the fixes:
| Page Type | Rendering | Revalidation | Rationale |
|---|---|---|---|
| Job detail (/jobs/[id]) | ISR | 24 hours | High volume, moderate churn. Cannot SSG 15K pages at build time. |
| State pages (/jobs/[state]) | SSG | Build time | 50 pages, stable URLs, high SEO value. Pre-build all of them. |
| City pages (/jobs/[state]/[city]) | ISR | 48 hours | ~2,500 pages, moderate churn. Too many for full SSG. |
| Category pages (/jobs/category/[slug]) | SSG | Build time | ~200 pages, stable. Pre-build all. |
| Agency pages (/jobs/federal/[agency]) | SSG | Build time | ~150 pages, stable. Pre-build all. |
| Hub pages (/jobs, /jobs/federal) | ISR | 1 hour | High traffic, content changes with each new listing. |
| Static pages | SSG | Build time | Rarely changes. |
The Decision Framework
Use SSG (generateStaticParams) when:
- Page count is under 500
- URLs are stable and predictable
- Content changes infrequently (weekly or less)
- Pages are high SEO value (location hubs, category landing pages)
Use ISR when:
- Page count is in the thousands
- Content updates daily but not in real-time
- You need fresh content without full rebuilds
- Set revalidate shorter than your content’s lifespan
Never use client-side rendering for:
- Any page you want indexed
- Any page with structured data
- Any page that is a target for internal linking
// Example: generateStaticParams for state pages
// This pre-builds all 50 state pages at build time
export async function generateStaticParams() {
return US_STATES.map(state => ({
state: state.slug,
}))
}
// Example: ISR for job detail pages
// Revalidates every 24 hours
export const revalidate = 86400

Pitfall 3: Robots.txt and Meta Robots Contradictions
GovJobsHub had a resume checker tool at /checker. The robots.txt blocked it with Disallow: /checker/. But the page’s HTML included <meta name="robots" content="index, follow">. These directives conflict: robots.txt prevents crawling, while the meta tag (which Googlebot never sees because it cannot crawl the page) says to index it.
This is not just a GovJobsHub problem. It is a pattern on Next.js sites where robots.txt is managed in one file and meta robots are set in page-level metadata, two different systems with no built-in consistency check.
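There is no built-in check, but a small audit script can catch the contradiction before Google does. A minimal sketch, assuming you maintain the list of disallowed prefixes by hand to mirror robots.txt (the origin and paths are illustrative):

// audit-robots.mjs: warn when a robots.txt-blocked path still declares "index" in its meta robots
const ORIGIN = 'https://www.yoursite.com'
const BLOCKED_PREFIXES = ['/checker', '/api'] // keep in sync with robots.txt Disallow rules

for (const path of BLOCKED_PREFIXES) {
  const html = await (await fetch(`${ORIGIN}${path}`)).text()
  const content = html.match(/<meta\s+name="robots"\s+content="([^"]*)"/i)?.[1] ?? ''
  if (content.includes('index') && !content.includes('noindex')) {
    console.warn(`${path} is disallowed in robots.txt but its meta robots says "${content}"`)
  }
}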
Other robots.txt Mistakes
Blocking static assets: Several Next.js sites block /_next/static/ in robots.txt, thinking they are hiding implementation details. This prevents Googlebot from loading CSS and JavaScript needed to render pages. Only block /_next/data/ if you want to prevent JSON endpoint crawling.
Missing _rsc blocking: As covered above, _rsc parameters should be disallowed.
Overly broad API blocking: Disallow: /api/ blocks all API routes, but some sites serve structured data or public content through API routes that should be crawlable.
Proper robots.txt for Next.js on Vercel
User-agent: *
Allow: /_next/static/
Allow: /_next/image/
Disallow: /_next/data/
Disallow: /api/
Disallow: /*?_rsc=
Disallow: /*&_rsc=
Sitemap: https://www.yoursite.com/sitemap.xml

Pair this with consistent meta robots in your layout:
// app/layout.tsx — default for all pages
export const metadata = {
robots: {
index: true,
follow: true,
},
}
// app/admin/layout.tsx — override for non-public sections
export const metadata = {
robots: {
index: false,
follow: false,
},
}

Pitfall 4: Soft 404s and Wrong Status Codes
The SALT.agency study of 50 Next.js sites found that 41 out of 50 failed to return proper 404 status codes for non-existent URLs. GovJobsHub was among them initially.
The problem manifests differently per page type:
Job Detail Pages
When a job listing expires, what should happen? The page should return 410 Gone, telling Google the content existed but has been permanently removed. Instead, GovJobsHub was returning 200 with a “This job is no longer available” message. Google kept these pages in the index with stale JobPosting structured data, wasting crawl budget and showing expired listings in search results. This matters even more than you might expect: Google may skip JavaScript rendering entirely for non-200 pages, so getting the status code right determines whether your error handling is even rendered.
Dynamic Route Catchalls
Requesting /jobs/not-a-real-state returned a 200 status code with a generic “No jobs found” page instead of a 404. At scale, this means any URL under /jobs/ appears valid to crawlers, encouraging them to waste budget on non-existent paths.
The Fix
// app/jobs/[id]/page.tsx
import { notFound } from 'next/navigation'
export default async function JobPage({ params }) {
const job = await getJob(params.id)
if (!job) {
notFound() // Returns 404 status code
}
if (job.expired) {
// For expired content, return 410 Gone
return new Response(null, { status: 410 })
}
return <JobDetail job={job} />
}

Verify per page type. Curl non-existent URLs under each route pattern and check the status code:
curl -o /dev/null -s -w "%{http_code}" https://yoursite.com/jobs/fake-id-12345
curl -o /dev/null -s -w "%{http_code}" https://yoursite.com/jobs/not-a-state
curl -o /dev/null -s -w "%{http_code}" https://yoursite.com/jobs/category/fake

Every one of those should return 404, not 200.
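For the catchall problem on location routes, the same notFound() pattern applies. A sketch, reusing the US_STATES lookup from the generateStaticParams example (the StateJobList component and the lookup module path are hypothetical):

// app/jobs/[state]/page.tsx: validate the slug before rendering
import { notFound } from 'next/navigation'
import { US_STATES } from '@/lib/states' // assumed shared lookup, also used by generateStaticParams
import { StateJobList } from '@/components/StateJobList' // hypothetical listing component

export default async function StatePage({ params }: { params: { state: string } }) {
  const state = US_STATES.find(s => s.slug === params.state)
  if (!state) {
    notFound() // /jobs/not-a-real-state now returns a real 404 instead of a "No jobs found" page
  }
  return <StateJobList state={state} />
}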
Pitfall 5: Missing and Broken Structured Data
GovJobsHub’s structured data situation was a mixed bag. Job detail pages had solid JobPosting schema. Everything else was bare.
What Was Missing
| Page Type | Had | Needed |
|---|---|---|
| Job detail | JobPosting | Already good |
| Location pages | Organization only | JobPosting aggregate, BreadcrumbList |
| Category pages | Organization only | BreadcrumbList |
| Hub pages | Organization, WebSite | BreadcrumbList |
| FAQ page | Nothing | FAQPage |
| All pages | Nothing | BreadcrumbList |
Next.js-Specific JSON-LD Gotcha
In Next.js, there is no Metadata API field for JSON-LD, so you cannot drop it into the <head> the way you might in a traditional HTML site. Instead, the JSON-LD <script> tag must be rendered within a Server Component in the page body:
// app/jobs/[id]/page.tsx
export default async function JobPage({ params }) {
const job = await getJob(params.id)
const jsonLd = {
'@context': 'https://schema.org',
'@type': 'JobPosting',
title: job.title,
description: job.description,
datePosted: job.postedDate,
validThrough: job.expiryDate,
hiringOrganization: {
'@type': 'Organization',
name: job.agency,
},
jobLocation: {
'@type': 'Place',
address: {
'@type': 'PostalAddress',
addressLocality: job.city,
addressRegion: job.state,
addressCountry: 'US',
},
},
}
return (
<>
<script
type="application/ld+json"
dangerouslySetInnerHTML={{
// Sanitize to prevent XSS — replace < with unicode escape
__html: JSON.stringify(jsonLd).replace(/</g, '\\u003c'),
}}
/>
<JobDetail job={job} />
</>
)
}

Critical: The JSON.stringify XSS sanitization (replacing < with \u003c) is not optional. Without it, malicious job descriptions could inject scripts via structured data.
BreadcrumbList for Hierarchical Pages
Every page with a position in the site hierarchy should have BreadcrumbList schema. For a site with /jobs/california/los-angeles, that means:
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{ "@type": "ListItem", "position": 1, "name": "Jobs", "item": "https://www.govJobshub.com/jobs" },
{ "@type": "ListItem", "position": 2, "name": "California", "item": "https://www.govJobshub.com/jobs/california" },
{ "@type": "ListItem", "position": 3, "name": "Los Angeles" }
]
}

Implement this as a reusable Server Component that takes the breadcrumb trail as a prop. Use it on every page type except the homepage.
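A minimal sketch of such a component (the Breadcrumbs name and Crumb shape are assumptions, not the site’s actual code):

// components/Breadcrumbs.tsx: Server Component that emits BreadcrumbList JSON-LD
type Crumb = { name: string; url?: string }

export function Breadcrumbs({ trail }: { trail: Crumb[] }) {
  const jsonLd = {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: trail.map((crumb, i) => ({
      '@type': 'ListItem',
      position: i + 1,
      name: crumb.name,
      // the final crumb (the current page) omits item, as in the example above
      ...(crumb.url ? { item: crumb.url } : {}),
    })),
  }
  return (
    <script
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd).replace(/</g, '\\u003c') }}
    />
  )
}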
Pitfall 6: Boilerplate Content at Scale
Every state page on GovJobsHub had an “About Government Jobs in [State]” section that read almost identically:
“Government jobs in [State] offer competitive salaries, excellent benefits, and job security. Browse our latest listings from federal, state, and local agencies.”
That sentence appeared, with minor variations, on 50 state pages, hundreds of city pages, and dozens of category pages. At scale, this is a thin content signal. Google’s quality systems look for pages that add substantive, unique value. When the only difference between /jobs/california and /jobs/texas is the state name in a template sentence, both pages risk being classified as low-quality.
The Fix: Data-Driven Unique Content
Replace boilerplate with programmatically generated content that is genuinely unique per page:
// Generate unique location context
function getLocationContent(state: string, stats: StateStats) {
return {
intro: `${state} has ${stats.activeListings.toLocaleString()} open government positions across ${stats.agencyCount} agencies. The average salary is $${stats.avgSalary.toLocaleString()}.`,
topAgencies: `The largest employers are ${stats.topAgencies.slice(0, 3).join(', ')}.`,
trends: stats.monthOverMonth > 0
? `Listings are up ${stats.monthOverMonth}% compared to last month.`
: `Listings are down ${Math.abs(stats.monthOverMonth)}% compared to last month.`,
}
}

Even two or three sentences of unique, data-driven content per page significantly differentiate them. The key is pulling from real data (job counts, salary ranges, top employers, trending categories), not just swapping a place name into a template.
Category and Agency Pages
The same principle applies. A category page for “IT & Technology” should reference the specific agencies hiring for tech roles, the salary range for that category, and any notable trends. An agency page for the VA should mention its hiring volume, locations, and most common position types.
This content does not need to be hand-written. It needs to be data-driven and genuinely different per page.
Pitfall 7: Page Churn and the Expiring Content Problem
A job board is not a blog. Content does not accumulate; it churns. GovJobsHub has roughly 15,000 job detail pages at any given time, but individual listings have a lifespan of 30 to 90 days. That means 2,000 to 5,000 pages expire every month and roughly the same number of new pages appear.
This creates a cascade of SEO problems that static content sites never face.
The Index Bloat Cycle
Here is what happens without intervention:
- A job listing is posted. ISR generates the page. Googlebot crawls it. It enters the index with JobPosting rich results.
- 60 days later, the listing expires. The source data is removed.
- But the ISR cache still serves the old page. Googlebot crawls the cached version and sees active content.
- Eventually ISR revalidates and the page updates, but to what? If the code renders a “This job is no longer available” message with a 200 status code, Google keeps the URL in the index as a soft 404.
- Meanwhile, the expired listing still has JobPosting structured data in Google’s cache, showing in search results with stale salary, location, and apply links.
At scale, this means hundreds of expired listings sitting in Google’s index at any given time, damaging user trust and wasting crawl budget.
The Fix: A Page Lifecycle Strategy
Every page type with expiring content needs a defined lifecycle:
Active listing (200 OK):
- Full content, JobPosting schema, in sitemap
- ISR revalidation every 24 hours
Expired listing (410 Gone):
- Return 410 status immediately, not a soft 404, not a redirect
- Strip JobPosting schema
- Remove from sitemap on next generation
- Trigger on-demand ISR revalidation so the 410 is served immediately, not after the next cache interval
// app/jobs/[id]/page.tsx
import { notFound } from 'next/navigation'

export default async function JobPage({ params }) {
const job = await getJob(params.id)
if (!job) {
notFound() // 404 for never-existed
}
if (job.status === 'expired') {
// Return 410 Gone — this listing existed but is permanently removed
return new Response('This job listing has been removed.', {
status: 410,
headers: { 'Content-Type': 'text/html' },
})
}
return <JobDetail job={job} />
}

The Google Indexing API
For job boards specifically, Google offers the Indexing API, which supports URL_DELETED notifications. This is dramatically faster than waiting for Googlebot to recrawl: deletions are processed within minutes, not days.
// Notify Google when a listing expires
import { google } from 'googleapis'

async function notifyGoogleOfRemoval(url: string) {
const auth = new google.auth.GoogleAuth({
scopes: ['https://www.googleapis.com/auth/indexing'],
})
const client = await auth.getClient()
await client.request({
url: 'https://indexing.googleapis.com/v3/urlNotifications:publish',
method: 'POST',
data: {
url,
type: 'URL_DELETED',
},
})
}

GovJobsHub was not using this API at all initially. After implementing it, stale listings were deindexed within hours instead of lingering for weeks.
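Tying the pieces together: when a listing is marked expired, you can revalidate its page so the 410 is served immediately and notify the Indexing API in the same step. A sketch, assuming it runs inside a route handler or server action (onJobExpired is a hypothetical name):

import { revalidatePath } from 'next/cache'

export async function onJobExpired(jobId: string) {
  // Bust the ISR cache so the next crawl sees the 410 instead of the stale listing
  revalidatePath(`/jobs/${jobId}`)
  // Tell Google to drop the URL (notifyGoogleOfRemoval defined above)
  await notifyGoogleOfRemoval(`https://www.govJobshub.com/jobs/${jobId}`)
}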
Sitemap Freshness
Your sitemap must reflect page removals quickly. If you generate sitemaps at build time, expired listings stay in the sitemap until the next deploy. For a job board, sitemaps should be generated dynamically or regenerated on a schedule shorter than your content’s average lifespan.
At minimum, run sitemap regeneration daily. Include only active listings. Set lastmod to the listing’s actual post date, not the sitemap generation time.
Pitfall 8: Filter Pills, the Hidden Client Rendering Trap
GovJobsHub has filter pills on every listing page. Users click pills to filter by job category (IT, Healthcare, Law Enforcement), location type (Remote, On-site, Hybrid), salary range, and agency. These pills are a standard UI pattern: small, rounded chips that toggle on and off.
They are also an SEO disaster in a typical Next.js implementation.
The Rendering Problem
Filter pills are interactive. Users click them. They toggle state. They update the job list below. In Next.js, this means they are almost always implemented as Client Components:
// Typical implementation — entirely client-rendered
'use client'
export function FilterPills({ categories, activeFilters, onToggle }) {
return (
<div className="flex gap-2 flex-wrap">
{categories.map(cat => (
<button
key={cat.slug}
onClick={() => onToggle(cat.slug)}
className={activeFilters.includes(cat.slug) ? 'active' : ''}
>
{cat.name} ({cat.count})
</button>
))}
</div>
)
}

The problem: none of this renders in the initial HTML. Googlebot sees an empty <div> where the pills should be. The category names, the job counts, the entire navigational structure of the filter UI: all invisible to crawlers.
This matters because those pill labels are often keyword-rich terms that help search engines understand the page’s topic. “IT & Technology (342 jobs)” is a strong relevance signal for a listing page. When it is client-rendered, that signal disappears.
The URL Parameter Problem
Clicking pills typically updates the URL: /jobs?category=IT&location=remote. Each combination is a unique URL that Googlebot can discover and attempt to index. With 20 categories, 3 location types, and 5 salary ranges, that is potentially hundreds of filtered URL variations per listing page, most of which contain duplicate or near-duplicate content.
GovJobsHub had over 800 filtered URL variations discovered in Search Console, each generating a “Duplicate, Google chose different canonical” warning.
The Fix: Server-Rendered Pills with Client Interactivity
Separate the rendering from the interaction:
// Server Component — renders pill labels and counts in initial HTML
export function FilterPills({ categories, activeFilters }) {
return (
<div className="flex gap-2 flex-wrap">
{categories.map(cat => (
<PillButton
key={cat.slug}
slug={cat.slug}
label={`${cat.name} (${cat.count})`}
isActive={activeFilters.includes(cat.slug)}
/>
))}
</div>
)
}
// Client Component — only handles the click interaction
'use client'
import { useRouter } from 'next/navigation'

export function PillButton({ slug, label, isActive }) {
const router = useRouter()
return (
<button
onClick={() => {
// Update URL params and re-fetch
const params = new URLSearchParams(window.location.search)
if (isActive) {
params.delete('category', slug)
} else {
params.append('category', slug)
}
router.push(`?${params.toString()}`, { scroll: false })
}}
className={isActive ? 'active' : ''}
>
{label}
</button>
)
}

Now the pill labels and counts render in the initial HTML (visible to Googlebot), while the click behavior hydrates on the client.
Managing Filtered URLs
Even with server-rendered pills, the URL parameter problem remains. The fix is canonical management:
// All filtered views canonical to the unfiltered URL
export async function generateMetadata({ searchParams }) {
const hasFilters = Object.keys(searchParams).some(
key => ['category', 'location', 'salary'].includes(key)
)
return {
alternates: {
canonical: 'https://www.yoursite.com/jobs', // Always clean URL
},
// Noindex filtered views to prevent duplicate content
...(hasFilters && {
robots: { index: false, follow: true },
}),
}
}

The follow: true is important because even though the filtered page is noindexed, you still want Googlebot to follow the links on it to discover individual job detail pages.
Pitfall 9: Core Web Vitals Across Page Types
The SALT.agency study of 50 Next.js sites found that only 3 out of 50 passed LCP and only 1 out of 50 passed all three Core Web Vitals thresholds. GovJobsHub was not an outlier: it failed LCP on listing pages and had INP issues on interactive pages.
LCP: The Image Problem
Listing pages have hero images and dozens of job cards with agency logos. The default behavior of next/image is to lazy-load everything. But the hero image is above the fold; it should not be lazy-loaded.
// BAD: Hero image lazy-loads by default
<Image src={heroImage} alt="..." width={1200} height={600} />
// GOOD: Hero image preloaded with priority
<Image src={heroImage} alt="..." width={1200} height={600} priority />

This single prop (priority) was the difference between a 3.8s and a 2.1s LCP on GovJobsHub’s listing pages.
INP: The Hydration Problem
Interactive pages (those with search filters, sorting, and pagination) had poor Interaction to Next Paint (INP) scores. The cause: heavy hydration. When the client-side JavaScript boots up and hydrates Server Component output, the main thread is blocked. Any user interaction during hydration (clicking a filter, typing in search) queues behind the hydration work.
Mitigations:
- Reduce Client Component scope: hydrate only the interactive parts, not the entire page
- Use React.lazy and dynamic imports: defer hydration of below-the-fold interactive components (see the sketch after this list)
- Avoid CSS-in-JS: libraries like Styled Components inject styles at runtime, causing layout recalculations that block the main thread
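A sketch of the second mitigation using next/dynamic, which wraps React.lazy with Next.js-aware code splitting (the SalaryCalculator widget is hypothetical):

// Defer a below-the-fold interactive widget so its JS does not compete with initial hydration
import dynamic from 'next/dynamic'

const SalaryCalculator = dynamic(() => import('@/components/SalaryCalculator'), {
  // Reserve space while the chunk loads to avoid layout shift
  loading: () => <div style={{ minHeight: 320 }} />,
})

export function JobDetailExtras() {
  return <SalaryCalculator />
}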
CLS: The Dynamic Content Problem
Location pages that load job counts and statistics asynchronously caused Cumulative Layout Shift. The page renders, then numbers pop in and push content down.
Fix: Reserve space for dynamic content using CSS min-height or skeleton placeholders that match the final content dimensions. Better yet, fetch the data on the server so it is in the initial render.
Test Per Page Type
CWV scores vary dramatically across page types. The homepage might score 95 on Lighthouse while listing pages score 45. Test every template independently:
# Test each page type with Lighthouse CLI
lighthouse https://yoursite.com/ --output=json
lighthouse https://yoursite.com/jobs --output=json
lighthouse https://yoursite.com/jobs/california --output=json
lighthouse https://yoursite.com/jobs/12345 --output=json

Pitfall 10: Vercel-Specific Issues
Vercel makes Next.js deployment simple, but its platform constraints create SEO-specific challenges that are not obvious until you hit them.
ISR Cache Staleness
Vercel’s ISR implementation caches pages on its Edge Network. When revalidate is set to 86400 (24 hours), the page can serve stale content for up to 24 hours after the source data changes. For a job board, this means:
- Expired job listings still appear in search results with active JobPosting schema
- Google crawls the cached page and sees content that no longer exists
- When the cache finally revalidates, the page updates, but Google may not re-crawl for days
Fix: Use on-demand revalidation. When a job listing is removed from the database, call Vercel’s revalidation API:
// API route: /api/revalidate
import { revalidatePath } from 'next/cache'

export async function POST(request: Request) {
const { path, secret } = await request.json()
if (secret !== process.env.REVALIDATION_SECRET) {
return new Response('Unauthorized', { status: 401 })
}
await revalidatePath(path)
return Response.json({ revalidated: true })
}

Serverless Function Timeouts
Vercel’s default function timeout is 10 seconds on the Hobby plan, 60 seconds on Pro. Pages that query large datasets, like a hub page aggregating 20,000 job listings for sorting and filtering, can timeout during SSR.
Fix: Pre-compute aggregations. Do not query the full dataset on every request. Build summary data at deploy time or via a scheduled job, and have the SSR page read from the pre-computed summary.
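One way to do that on Vercel, sketched with a cron-triggered route handler (the route path and summary helpers are illustrative assumptions):

// app/api/rebuild-job-summary/route.ts: scheduled aggregation
// Triggered by a Vercel cron entry in vercel.json, e.g.
//   { "crons": [{ "path": "/api/rebuild-job-summary", "schedule": "0 * * * *" }] }
import { NextResponse } from 'next/server'
import { computeJobSummary, saveJobSummary } from '@/lib/summary' // assumed helpers

export async function GET() {
  const summary = await computeJobSummary() // aggregate counts, salaries, top agencies once
  await saveJobSummary(summary)             // SSR pages read this small summary, not the full dataset
  return NextResponse.json({ ok: true, updatedAt: new Date().toISOString() })
}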
Middleware Limitations for SEO
Vercel middleware runs at the Edge, which means:
- No access to Node.js APIs (no fs, no database drivers)
- _rsc parameters are stripped from the request before middleware sees them
- Response body cannot be modified (only headers and redirects)
If you need server-side SEO logic, like conditionally setting X-Robots-Tag headers based on content state, you need to do it in the route handler or page component, not middleware.
www vs non-www
Vercel does not automatically redirect between www and non-www. Both versions serve content, creating duplicate pages. Configure this in vercel.json or via middleware:
// middleware.ts — redirect non-www to www
import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'
export function middleware(request: NextRequest) {
const hostname = request.headers.get('host') || ''
if (hostname === 'govJobshub.com') {
return NextResponse.redirect(
new URL(request.url.replace('govJobshub.com', 'www.govJobshub.com')),
301
)
}
}

Pitfall 11: Internal Linking at Scale
With 20,000 pages, internal linking is not something you do manually. It is an architectural decision that determines which pages get crawled, how link equity flows, and which pages rank.
The Hub-and-Spoke Problem
GovJobsHub’s initial linking structure was flat: the main /jobs page linked to paginated results, and each job card linked to a detail page. Location pages and category pages existed but were poorly connected to the job detail pages and to each other.
This created a shallow hub-and-spoke pattern where:
- Job detail pages were 3+ clicks from the homepage
- Location pages did not link to related category pages
- Category pages did not link to related location pages
- No cross-linking between related geographic areas
The Fix: Programmatic Cross-Linking
Build internal links into your templates:
// On a state page (/jobs/california), link to:
// 1. Child city pages
// 2. Related category pages for that state
// 3. Neighboring state pages
// 4. Parent hub page
import Link from 'next/link'

function StatePageLinks({ state, topCities, topCategories }) {
return (
<>
<nav aria-label="Cities in this state">
<h2>Top Cities in {state.name}</h2>
<ul>
{topCities.map(city => (
<li key={city.slug}>
<Link href={`/jobs/${state.slug}/${city.slug}`}>
{city.name} ({city.jobCount} jobs)
</Link>
</li>
))}
</ul>
</nav>
<nav aria-label="Job categories in this state">
<h2>Popular Categories in {state.name}</h2>
<ul>
{topCategories.map(cat => (
<li key={cat.slug}>
<Link href={`/jobs/category/${cat.slug}`}>
{cat.name}
</Link>
</li>
))}
</ul>
</nav>
</>
)
}

Crawl Depth Matters
The goal: every page should be reachable within 3 clicks from the homepage. For a 20,000-page site, that requires deliberate hub-and-spoke architecture with cross-links between spokes.
Homepage links to hub pages (jobs, federal, remote) and featured locations/categories. Hub pages link to location and category index pages. Location and category pages link to individual job listings and cross-link to each other. Job detail pages link back to their location and category parents.
Setting It Up Right: Sitemaps for 20K Pages
Next.js’s built-in sitemap.ts works for small sites. At 20,000 pages, you need a sitemap index that splits pages by type.
The Problem
A single sitemap file with 20,000 URLs is technically valid (the limit is 50,000), but it is harder to debug and monitor. When Google reports indexing issues, a monolithic sitemap gives you no granularity about which page types are affected.
The Fix: Sitemap Index by Page Type
// app/sitemap.ts — generates sitemap index
import { MetadataRoute } from 'next'
export default function sitemap(): MetadataRoute.Sitemap {
return [
// Return sitemap index entries
// Each points to a type-specific sitemap
]
}
// app/sitemaps/jobs/sitemap.ts
// app/sitemaps/locations/sitemap.ts
// app/sitemaps/categories/sitemap.ts

Or use the next-sitemap package, which handles sitemap index generation, splitting, and per-route configuration automatically.
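For reference, one type-specific sitemap might look like this (getActiveJobs is an assumed data helper that returns only non-expired listings):

// app/sitemaps/jobs/sitemap.ts (as in the structure above)
import type { MetadataRoute } from 'next'
import { getActiveJobs } from '@/lib/jobs'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const jobs = await getActiveJobs()
  return jobs.map(job => ({
    url: `https://www.govJobshub.com/jobs/${job.id}`,
    lastModified: new Date(job.postedDate), // the listing's post date, not generation time
  }))
}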
Priority and Changefreq per Page Type
| Page Type | Priority | Changefreq |
|---|---|---|
| Homepage | 1.0 | daily |
| Hub pages | 0.9 | daily |
| State pages | 0.8 | daily |
| Category pages | 0.8 | weekly |
| City pages | 0.7 | daily |
| Agency pages | 0.7 | weekly |
| Job detail pages | 0.6 | weekly |
| Static pages | 0.5 | monthly |
Note: Google has stated it largely ignores priority and changefreq, but other search engines (Bing, Yandex) still use them, and they help with debugging.
lastmod Must Be Accurate
Do not set lastmod to the current build time for every page. Use the actual content modification date. For ISR pages, this means tracking when the underlying data last changed, not when the cache was last generated.
Setting It Up Right: Canonical URLs
Canonical mismanagement is the silent killer of large Next.js sites. GovJobsHub had three distinct canonical problems.
Problem 1: Pagination Canonicals
Paginated pages (/jobs?page=2, /jobs?page=3) should each have a self-referencing canonical. The second page of results is not a duplicate of the first; it is a distinct page with different content. But some Next.js SEO guides incorrectly suggest pointing all paginated pages to page 1.
// Correct: self-referencing canonical on paginated pages
export async function generateMetadata({ searchParams }) {
const page = searchParams.page || '1'
const canonicalUrl = page === '1'
? 'https://www.yoursite.com/jobs'
: `https://www.yoursite.com/jobs?page=${page}`
return {
alternates: {
canonical: canonicalUrl,
},
}
}

Problem 2: Parameter Pollution
Beyond _rsc, other query parameters can create duplicates: ?sort=salary, ?filter=remote, ?q=engineer. Each parameter combination is a unique URL to Google.
Rule: Pages with sort/filter parameters should canonical to the unfiltered version. Search result pages should be noindexed.
// Canonical always points to clean URL
export async function generateMetadata({ searchParams }) {
return {
alternates: {
canonical: 'https://www.yoursite.com/jobs',
},
// Noindex search results
...(searchParams.q && {
robots: { index: false, follow: true },
}),
}
}

Problem 3: Trailing Slashes
Next.js uses 308 redirects (not 301) for trailing slash normalization. Pick one format and enforce it in next.config.js:
// next.config.js
module.exports = {
trailingSlash: false, // /jobs, not /jobs/
}

The Setup Checklist
If you are starting a Next.js + Vercel project today and SEO matters, configure these before writing a single page component.
1. Rendering Defaults
- All page content in Server Components by default
- 'use client' only for interactive UI elements (filters, modals, forms)
- Audit with curl or view-source: before launch: if content is not in raw HTML, it is not server-rendered
2. Metadata Configuration
- Use the Metadata API (metadata object or generateMetadata function), not next/head
- Set defaults in app/layout.tsx, override per route
- Every page gets: title, description, canonical, robots, Open Graph
3. robots.txt
- Allow /_next/static/ and /_next/image/
- Disallow /_next/data/, /api/, /*?_rsc=
- Include sitemap reference
4. Sitemap
- Split by page type for sites over 1,000 pages
- Accurate lastmod from content timestamps, not build time
- Submit to Search Console immediately
5. Structured Data
- JSON-LD in Server Components, not Client Components
- Sanitize with .replace(/</g, '\\u003c')
- Match schema type to page type (JobPosting, BreadcrumbList, FAQPage, etc.)
6. Status Codes
- Use notFound() for missing content
- Return 410 for expired content
- Verify every dynamic route pattern returns 404 for invalid params
7. Vercel Configuration
- www/non-www redirect in middleware or vercel.json
- On-demand ISR revalidation for content removals
- Function timeout appropriate for your data volume
8. Internal Linking
- Programmatic cross-links between page types
- Maximum 3 clicks from homepage to any page
- Use <Link>, never router.push(), for navigation between indexable pages
9. Core Web Vitals
- priority on above-the-fold next/image components
- Use next/font for web fonts
- Avoid CSS-in-JS libraries
- Test every page template, not just the homepage
10. Monitoring
- Google Search Console coverage report: check weekly for the first 3 months
- Log file analysis if possible: see which pages Googlebot actually crawls
- Automated crawl audits with tools like Screaming Frog (or its MCP server for AI-assisted audits)
- CrUX data for real-user CWV per page type
Results and Takeaways
After implementing the fixes described above, GovJobsHub saw measurable improvements over 8 weeks:
- Indexed pages increased from ~4,000 to ~14,000 (of 20,000 total)
- _rsc duplicate entries in Search Console dropped from 1,300 to under 200
- Soft 404 errors eliminated entirely
- Average LCP on listing pages improved from 3.8s to 2.1s
- JobPosting rich results started appearing for individual job listings within 3 weeks of schema implementation
What We Learned
1. Audit every page type independently. The rendering strategy, status codes, structured data, and CWV scores can be completely different across page types on the same Next.js site. A passing grade on the homepage means nothing for your listing pages.
2. Next.js defaults are not SEO defaults. The framework does not block you from doing SEO well, but it does not do it for you. Every SEO requirement (rendering strategy, canonical management, structured data, status codes) needs explicit configuration.
3. _rsc is a fact of life. There is no clean fix. Budget time for managing the noise in Search Console and implement the mitigation stack (robots.txt + canonicals + selective prefetch disabling).
4. Vercel adds a caching layer you must account for. ISR staleness, function timeouts, and middleware limitations are platform constraints that affect SEO. On-demand revalidation is not optional for sites with expiring content.
5. Programmatic content needs programmatic quality. Templated pages at scale require data-driven unique content, not just name-swapped boilerplate. Every page type needs enough unique content to justify its existence in the index.
6. Internal linking is architecture, not afterthought. At 20,000 pages, you cannot manually link. Build cross-linking into your templates and ensure crawl depth stays under 3 clicks for every page.
The Next.js + Vercel stack is powerful, and it can absolutely support large-scale SEO. But it requires deliberate configuration at every layer: rendering, caching, metadata, structured data, and crawl management. The pitfalls are real, well-documented, and largely avoidable if you know where to look.