Optimizing Your Next.js Sitemap with next-sitemap: A Complete Guide
Table of Contents
- Introduction
- Why Sitemaps Matter for SEO
- The Problem with Default Configurations
- My Optimized Configuration
- Key Configuration Decisions
  - 1. Comprehensive Exclusions
  - 2. Data-Driven Priority System with Critical Bug Fix
  - 3. Robust Dynamic Content Handling with Error Recovery
  - 4. Enhanced robots.txt
- Results and Benefits
  - SEO Benefits
  - Performance Benefits
- Common Pitfalls to Avoid
  - 1. Including Build Artifacts
  - 2. Ignoring Dynamic Content
  - 3. Wrong Priorities
  - 4. Missing Error Handling
  - 5. CRITICAL: Root Path Matching Bug
  - 6. Regex Issues with Apostrophes
  - 7. Inconsistent Slug Generation
- Advanced Tips
  - 1. Conditional Content Inclusion
  - 2. Custom Change Frequencies
- Monitoring and Maintenance
  - 1. Google Search Console
  - 2. Regular Audits
  - 3. Automated Testing
- Debugging and Testing Your Sitemap
  - 1. Test Priority Assignments
  - 2. Check for Duplicate Entries
  - 3. Validate Content Sources
  - 4. Manual Sitemap Inspection
  - 5. Unit Testing Your Configuration
- Conclusion
- Resources
Introduction
A well-configured sitemap is crucial for SEO success, helping search engines discover and index your content efficiently. However, many Next.js websites have poorly optimized sitemaps that include build artifacts, static assets, and other files that shouldn't be indexed.
In this post, I'll walk you through optimizing your Next.js sitemap using next-sitemap, sharing the exact configuration I use for this website and the reasoning behind each decision.
Why Sitemaps Matter for SEO
Before diving into the technical details, let's understand why sitemaps are essential:
- Content Discovery: Help search engines find all your pages, especially dynamic content
- Crawl Efficiency: Guide crawlers to prioritize important content
- Metadata Communication: Provide information about page importance, update frequency, and modification dates
- Performance: Reduce server load by preventing crawlers from accessing unnecessary files
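For reference, here's roughly what a single entry in a generated sitemap.xml looks like (the URL is a placeholder) — it's exactly this per-URL metadata that the rest of this post tunes:

```xml
<url>
  <loc>https://www.example.com/blog/my-post</loc>
  <lastmod>2024-01-15T00:00:00.000Z</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>
```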
The Problem with Default Configurations
Most Next.js websites using next-sitemap start with a basic configuration like this:
```js
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://example.com',
  generateRobotsTxt: true,
};
```

While this works, it often results in sitemaps that include:
- Next.js build artifacts (/_next/static/chunks/...)
- Image files and static assets
- API routes that shouldn't be indexed
- Component files and internal routes
My Optimized Configuration
Here's the complete next-sitemap.config.js configuration I use for this website:
```js
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://www.yiminyang.dev',
  generateIndexSitemap: false,
  generateRobotsTxt: true,
  exclude: [
    '/blocked',
    '/blocked/*',
    '/api/*',
    '/_next/*', // Next.js build artifacts
    '/static/*', // Static assets
    '*.js', // JavaScript files
    '*.css', // CSS files
    '*.map', // Source maps
    '*.json', // JSON files (manifests, etc.)
    '*.ico', // Favicon files
    '*.png', // Image files
    '*.jpg', // Image files
    '*.jpeg', // Image files
    '*.gif', // Image files
    '*.svg', // SVG files (unless they're pages)
    '*.webp', // Image files
    '/playground/games/memory-card-game/memorycardgame', // Component file, not page
    '/playground/games/whack-a-mole/whack-a-mole', // Component file, not page
    '/playground/text-transformations/*/[A-Z]*', // Component files (capitalized)
    '/playground/tools/qr-code-generator/[content]/*', // Dynamic route internals
  ],
  robotsTxtOptions: {
    policies: [
      {
        userAgent: '*',
        disallow: ['/blocked', '/api', '/_next', '/static'],
        allow: ['/playground', '/blog', '/talks'],
      },
    ],
    additionalSitemaps: ['https://www.yiminyang.dev/sitemap.xml'],
  },
  changefreq: 'weekly',
  priority: 0.7,
  sitemapSize: 5000,
  // Custom transformation for specific pages
  transform: async (config, path) => {
    const pathPriorities = {
      main: { paths: ['/', '/about', '/blog', '/talks'], priority: 1.0, changefreq: 'weekly' },
      content: { paths: ['/blog/', '/talks/'], priority: 0.8, changefreq: 'monthly' },
      playground: { paths: ['/playground/'], priority: 0.6, changefreq: 'monthly' },
    };
    for (const [, { paths, priority, changefreq }] of Object.entries(pathPriorities)) {
      if (
        paths.some((p) => {
          if (p === '/') {
            // Special case: root path only matches exactly
            return path === '/';
          }
          return path === p || (p.endsWith('/') && path.startsWith(p));
        })
      ) {
        return {
          loc: path,
          changefreq,
          priority,
          lastmod: new Date().toISOString(),
        };
      }
    }
    return {
      loc: path,
      changefreq: config.changefreq,
      priority: config.priority,
      lastmod: new Date().toISOString(),
    };
  },
  additionalPaths: async () => {
    const fs = require('fs');
    const path = require('path');
    const result = [];
    // Helper function to safely process files
    const safeProcess = async (description, processor) => {
      try {
        await processor();
      } catch (error) {
        console.warn(`Could not load ${description} for sitemap:`, error.message);
      }
    };
    // Add blog posts
    await safeProcess('blog posts', () => {
      const matter = require('gray-matter');
      const postsDir = path.join(process.cwd(), 'public', 'content', 'blog', 'posts');
      if (!fs.existsSync(postsDir)) return;
      fs.readdirSync(postsDir)
        .filter((file) => file.endsWith('.mdx'))
        .forEach((file) => {
          try {
            const filePath = path.join(postsDir, file);
            const { data } = matter(fs.readFileSync(filePath, 'utf8'));
            if (data.publish !== false) {
              const slug = file.replace(/\.mdx$/, '').toLowerCase();
              result.push({
                loc: `/blog/${slug}`,
                changefreq: 'monthly',
                priority: 0.8,
                lastmod: data.modifiedDate || data.date || new Date().toISOString(),
              });
            }
          } catch (error) {
            console.warn(`Could not process blog post ${file}:`, error.message);
          }
        });
    });
    // Add talks with improved regex handling
    await safeProcess('talks', () => {
      const talksPath = path.join(process.cwd(), 'lib', 'data', 'talks.ts');
      if (!fs.existsSync(talksPath)) return;
      const content = fs.readFileSync(talksPath, 'utf8');
      const talkMatches = content.match(/{\s*id:\s*\d+,[\s\S]*?}/g) || [];
      talkMatches.forEach((match) => {
        try {
          const extractField = (field) => {
            // Improved regex to handle apostrophes within quoted strings
            const doubleQuoteMatch = match.match(new RegExp(`${field}:\\s*"((?:[^"\\\\]|\\\\.)*)"`));
            const singleQuoteMatch = match.match(new RegExp(`${field}:\\s*'((?:[^'\\\\]|\\\\.)*)'`));
            const backtickMatch = match.match(new RegExp(`${field}:\\s*\`((?:[^\`\\\\]|\\\\.)*)\``));
            return doubleQuoteMatch?.[1] || singleQuoteMatch?.[1] || backtickMatch?.[1];
          };
          const title = extractField('title');
          if (!title) return;
          const date = extractField('date') || new Date().toISOString();
          const slug =
            extractField('slug') ||
            title
              .toLowerCase()
              .replace(/[^\w\s-]/g, '') // Remove special characters except spaces and hyphens
              .replace(/\s+/g, '-') // Replace spaces with hyphens
              .replace(/-+/g, '-') // Replace multiple hyphens with single hyphen
              .trim()
              .replace(/^-+|-+$/g, ''); // Remove leading/trailing hyphens
          result.push({
            loc: `/talks/${slug}`,
            changefreq: 'monthly',
            priority: 0.8,
            lastmod: new Date(date).toISOString(),
          });
        } catch (error) {
          console.warn('Could not process talk:', error.message);
        }
      });
    });
    return result;
  },
};
```

Key Configuration Decisions
1. Comprehensive Exclusions
The exclude array is crucial for keeping your sitemap clean:
```js
exclude: [
  '/_next/*', // Next.js build artifacts
  '*.js', // JavaScript files
  '*.css', // CSS files
  '*.png', // Image files
  // ... other static assets
];
```

Why this matters: Without these exclusions, your sitemap might include URLs like:
```text
/_next/static/chunks/240a8089e20a3158.js
/favicon.ico
/apple-icon.png
```
These files shouldn't be indexed by search engines as they're not content pages.
2. Data-Driven Priority System with Critical Bug Fix
I use a data-driven approach in the transform function that includes a critical fix for root path matching:
```js
transform: async (config, path) => {
  const pathPriorities = {
    main: { paths: ['/', '/about', '/blog', '/talks'], priority: 1.0, changefreq: 'weekly' },
    content: { paths: ['/blog/', '/talks/'], priority: 0.8, changefreq: 'monthly' },
    playground: { paths: ['/playground/'], priority: 0.6, changefreq: 'monthly' },
  };
  for (const [, { paths, priority, changefreq }] of Object.entries(pathPriorities)) {
    if (
      paths.some((p) => {
        if (p === '/') {
          // Special case: root path only matches exactly
          return path === '/';
        }
        return path === p || (p.endsWith('/') && path.startsWith(p));
      })
    ) {
      return {
        loc: path,
        changefreq,
        priority,
        lastmod: new Date().toISOString(),
      };
    }
  }
  return {
    loc: path,
    changefreq: config.changefreq,
    priority: config.priority,
    lastmod: new Date().toISOString(),
  };
};
```

Critical Bug Fix: The special handling for the root path (/) is essential. Without it, the root path would match ALL paths (since every path starts with /), causing incorrect priority assignments. This bug can significantly impact your SEO by giving wrong priorities to your pages.
This data-driven approach reduces code duplication and makes it easy to adjust priorities and change frequencies for different content types.
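To illustrate that flexibility: adding a new content tier is a one-line change. The /docs/ tier below is hypothetical — substitute whatever section your site actually has:

```js
const pathPriorities = {
  main: { paths: ['/', '/about', '/blog', '/talks'], priority: 1.0, changefreq: 'weekly' },
  content: { paths: ['/blog/', '/talks/'], priority: 0.8, changefreq: 'monthly' },
  playground: { paths: ['/playground/'], priority: 0.6, changefreq: 'monthly' },
  docs: { paths: ['/docs/'], priority: 0.7, changefreq: 'weekly' }, // hypothetical new tier
};
```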
3. Robust Dynamic Content Handling with Error Recovery
The additionalPaths function uses a sophisticated approach with centralized error handling:
```js
additionalPaths: async () => {
  const fs = require('fs');
  const path = require('path');
  const result = [];
  // Helper function to safely process files
  const safeProcess = async (description, processor) => {
    try {
      await processor();
    } catch (error) {
      console.warn(`Could not load ${description} for sitemap:`, error.message);
    }
  };
  // Add blog posts
  await safeProcess('blog posts', () => {
    const matter = require('gray-matter');
    const postsDir = path.join(process.cwd(), 'public', 'content', 'blog', 'posts');
    if (!fs.existsSync(postsDir)) return;
    fs.readdirSync(postsDir)
      .filter((file) => file.endsWith('.mdx'))
      .forEach((file) => {
        try {
          const filePath = path.join(postsDir, file);
          const { data } = matter(fs.readFileSync(filePath, 'utf8'));
          if (data.publish !== false) {
            const slug = file.replace(/\.mdx$/, '').toLowerCase();
            result.push({
              loc: `/blog/${slug}`,
              changefreq: 'monthly',
              priority: 0.8,
              lastmod: data.modifiedDate || data.date || new Date().toISOString(),
            });
          }
        } catch (error) {
          console.warn(`Could not process blog post ${file}:`, error.message);
        }
      });
  });
  // Add talks with improved regex handling
  await safeProcess('talks', () => {
    const talksPath = path.join(process.cwd(), 'lib', 'data', 'talks.ts');
    if (!fs.existsSync(talksPath)) return;
    const content = fs.readFileSync(talksPath, 'utf8');
    const talkMatches = content.match(/{\s*id:\s*\d+,[\s\S]*?}/g) || [];
    talkMatches.forEach((match) => {
      try {
        const extractField = (field) => {
          // Improved regex to handle apostrophes within quoted strings
          const doubleQuoteMatch = match.match(new RegExp(`${field}:\\s*"((?:[^"\\\\]|\\\\.)*)"`));
          const singleQuoteMatch = match.match(new RegExp(`${field}:\\s*'((?:[^'\\\\]|\\\\.)*)'`));
          const backtickMatch = match.match(new RegExp(`${field}:\\s*\`((?:[^\`\\\\]|\\\\.)*)\``));
          return doubleQuoteMatch?.[1] || singleQuoteMatch?.[1] || backtickMatch?.[1];
        };
        const title = extractField('title');
        if (!title) return;
        const date = extractField('date') || new Date().toISOString();
        const slug =
          extractField('slug') ||
          title
            .toLowerCase()
            .replace(/[^\w\s-]/g, '') // Remove special characters except spaces and hyphens
            .replace(/\s+/g, '-') // Replace spaces with hyphens
            .replace(/-+/g, '-') // Replace multiple hyphens with single hyphen
            .trim()
            .replace(/^-+|-+$/g, ''); // Remove leading/trailing hyphens
        result.push({
          loc: `/talks/${slug}`,
          changefreq: 'monthly',
          priority: 0.8,
          lastmod: new Date(date).toISOString(),
        });
      } catch (error) {
        console.warn('Could not process talk:', error.message);
      }
    });
  });
  return result;
};
```

Key improvements:
- safeProcess helper: Centralizes error handling to prevent build failures
- Improved regex: Handles apostrophes within quoted strings correctly
- Consistent slug generation: Uses the same logic as the Next.js app to prevent duplicate entries
- Multiple content sources: Handles both blog posts and talks with appropriate error recovery
4. Enhanced robots.txt
The robotsTxtOptions configuration creates a comprehensive robots.txt:
```text
User-agent: *
Allow: /playground
Allow: /blog
Allow: /talks
Disallow: /blocked
Disallow: /api
Disallow: /_next
Disallow: /static

Sitemap: https://www.yiminyang.dev/sitemap.xml
```

This explicitly tells crawlers what to index and what to avoid.
Results and Benefits
After implementing this configuration and fixing critical bugs, my website's sitemap went from including unwanted build artifacts to a clean, focused list of 59 relevant URLs:
- 4 Main pages (priority 1.0)
- 8 Blog posts (priority 0.8)
- 29 Talk pages (priority 0.8)
- 13 Playground tools (priority 0.6)
- 5 Other pages (legal, contact, etc.)
Critical Bug Fixes Applied:
- Fixed root path matching that was causing incorrect priority assignments
- Resolved duplicate entries caused by inconsistent slug generation
- Improved regex handling for apostrophes in titles
SEO Benefits
- Cleaner crawling: Search engines focus on actual content, not build artifacts
- Better prioritization: Important pages get higher priority scores (fixed the root path bug)
- Accurate metadata: Real publication dates instead of build timestamps
- No duplicate entries: Consistent slug generation prevents confusion
- Reduced server load: Fewer unnecessary requests from crawlers
Performance Benefits
- Smaller sitemap files: Only relevant URLs are included
- Faster generation: Efficient exclusion patterns and error handling
- Better caching: Static sitemap generation during build
- Robust error recovery: Build doesn't fail if content sources are unavailable
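If you haven't wired this up yet: next-sitemap runs as a separate step after next build, typically via a postbuild script in package.json, so the sitemap is regenerated on every deploy:

```json
{
  "scripts": {
    "build": "next build",
    "postbuild": "next-sitemap"
  }
}
```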
Common Pitfalls to Avoid
1. Including Build Artifacts
Always exclude /_next/* and static assets. These files change with every build and shouldn't be indexed.
2. Ignoring Dynamic Content
Don't forget to handle dynamic routes like [slug] pages. Use additionalPaths to include them.
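Here's a minimal sketch of that pattern — getAllPostSlugs is a hypothetical helper standing in for however your app enumerates its [slug] values:

```js
// Minimal sketch: next-sitemap can't see dynamic routes at build time,
// so enumerate them yourself and return one sitemap entry per page.
additionalPaths: async (config) => {
  const slugs = await getAllPostSlugs(); // hypothetical: filesystem, CMS, database...
  return slugs.map((slug) => ({
    loc: `/blog/${slug}`,
    changefreq: 'monthly',
    priority: 0.8,
  }));
},
```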
3. Wrong Priorities
Avoid giving all pages the same priority. Use a tiered system that reflects your content hierarchy.
4. Missing Error Handling
Always wrap dynamic content discovery in try-catch blocks to prevent build failures.
5. CRITICAL: Root Path Matching Bug
One of the most dangerous bugs in sitemap configurations is improper root path handling. Here's the complete context:
The Problem: Without special handling, the root path / will match ALL paths because every path starts with /. Here's what happens:
```js
// This logic has a fatal flaw
const pathPriorities = {
  main: { paths: ['/', '/about', '/blog'], priority: 1.0 },
  content: { paths: ['/blog/', '/talks/'], priority: 0.8 },
};
for (const [, { paths, priority }] of Object.entries(pathPriorities)) {
  if (paths.some((p) => path.startsWith(p))) { // BUG IS HERE!
    return { priority };
  }
}
// What happens:
// path = '/blog/my-post'
// '/blog/my-post'.startsWith('/') → true (WRONG!)
// Root path '/' matches everything, so ALL pages get priority 1.0
```

The Fix: Handle the root path as a special case that only matches exactly:
```js
// Special case for root path
for (const [, { paths, priority, changefreq }] of Object.entries(pathPriorities)) {
  if (
    paths.some((p) => {
      if (p === '/') {
        // Special case: root path only matches exactly
        return path === '/';
      }
      // For other paths, use normal prefix matching
      return path === p || (p.endsWith('/') && path.startsWith(p));
    })
  ) {
    return {
      loc: path,
      changefreq,
      priority,
      lastmod: new Date().toISOString(),
    };
  }
}
// Now it works correctly:
// path = '/' → matches '/' exactly → priority 1.0 ✓
// path = '/blog/my-post' → doesn't match '/' → continues to check '/blog/' → priority 0.8 ✓
```

Why this matters: Without this fix, ALL your pages would get the wrong priority (usually the first one in your list), which can severely impact SEO rankings.
6. Regex Issues with Apostrophes
When parsing dynamic content, simple regex patterns can break on apostrophes:
```js
// Breaks on "Here's How"
const match = content.match(/title:\s*['"`](.*?)['"`]/);
```

The fix: Use separate patterns for each quote type:

```js
const doubleQuoteMatch = match.match(/title:\s*"((?:[^"\\]|\\.)*)"/);
const singleQuoteMatch = match.match(/title:\s*'((?:[^'\\]|\\.)*)'/);
const backtickMatch = match.match(/title:\s*`((?:[^`\\]|\\.)*)`/);
```

7. Inconsistent Slug Generation
If your sitemap generates slugs differently than your Next.js app, you'll get duplicate entries:
```js
// Simple replacement
const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, '-');

// Match your app's logic
const slug = title
  .toLowerCase()
  .replace(/[^\w\s-]/g, '') // Remove special characters except spaces and hyphens
  .replace(/\s+/g, '-') // Replace spaces with hyphens
  .replace(/-+/g, '-') // Replace multiple hyphens with single hyphen
  .trim()
  .replace(/^-+|-+$/g, ''); // Remove leading/trailing hyphens
```

Advanced Tips
1. Conditional Content Inclusion
```js
if (data.publish !== false && !data.draft) {
  result.push({
    loc: `/blog/${slug}`,
    // ... rest of config
  });
}
```

2. Custom Change Frequencies
```js
// More frequent updates for time-sensitive content
if (path.includes('/news/')) {
  return {
    loc: path, // transform must return loc for the entry to be emitted
    changefreq: 'daily',
    priority: 0.9,
  };
}
```

Monitoring and Maintenance
1. Google Search Console
Submit your sitemap to Google Search Console and monitor:
- Index coverage
- Crawl errors
- Sitemap processing status
2. Regular Audits
Periodically check your sitemap for:
- Unwanted URLs
- Missing important pages
- Incorrect priorities or dates
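Much of that audit can be scripted. Here's a rough sketch of a standalone Node script (the unwanted patterns are assumptions — tune them to your site):

```js
// audit-sitemap.js — quick sanity check over the generated sitemap
const fs = require('fs');

const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
const urls = (sitemap.match(/<loc>(.*?)<\/loc>/g) || []).map((tag) => tag.replace(/<\/?loc>/g, ''));

// Patterns that should never appear in a content sitemap
const unwanted = ['/_next/', '/api/', '.js', '.css', '.map'];
const offenders = urls.filter((url) => unwanted.some((pattern) => url.includes(pattern)));

console.log(`Total URLs: ${urls.length}`);
if (offenders.length > 0) {
  console.warn('Unwanted URLs found:', offenders);
}
```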
3. Automated Testing
Consider adding tests to verify your sitemap configuration:
```js
// Example test
test('sitemap excludes build artifacts', () => {
  const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
  expect(sitemap).not.toContain('/_next/');
  expect(sitemap).not.toContain('.js');
});
```

Debugging and Testing Your Sitemap
Based on the critical bugs I discovered in my own configuration, here's how to properly test your sitemap:
1. Test Priority Assignments
Create a simple test to verify your transform function works correctly:
```js
// Test your transform function
const testPaths = ['/', '/blog', '/blog/test-post', '/playground/tool'];
testPaths.forEach(async (path) => {
  const result = await transform(config, path);
  console.log(`${path}: priority ${result.priority}, changefreq ${result.changefreq}`);
});
```

Expected output:

```text
/: priority 1.0, changefreq weekly
/blog: priority 1.0, changefreq weekly
/blog/test-post: priority 0.8, changefreq monthly
/playground/tool: priority 0.6, changefreq monthly
```
2. Check for Duplicate Entries
```js
// Check for duplicates in your sitemap
const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
const urls = sitemap.match(/<loc>(.*?)<\/loc>/g) || [];
const uniqueUrls = new Set(urls);
if (urls.length !== uniqueUrls.size) {
  console.error('Duplicate URLs found in sitemap!');
  // Find duplicates
  const duplicates = urls.filter((url, index) => urls.indexOf(url) !== index);
  console.log('Duplicates:', duplicates);
}
```

3. Validate Content Sources
Test that your dynamic content discovery works:
```js
// Test your additionalPaths function
const paths = await additionalPaths();
console.log(`Found ${paths.length} dynamic paths`);
paths.forEach((path) => {
  console.log(`${path.loc}: priority ${path.priority}`);
});
```

4. Manual Sitemap Inspection
Always manually review your generated public/sitemap.xml:
- Check URL count: Does it match your expectations?
- Verify priorities: Are main pages getting priority 1.0?
- Look for unwanted URLs: Any build artifacts or component files?
- Check for duplicates: Same content with different URLs?
5. Unit Testing Your Configuration
Here's a comprehensive test suite for your sitemap config:
```js
describe('Sitemap Configuration', () => {
  test('excludes build artifacts', () => {
    const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
    expect(sitemap).not.toContain('/_next/');
    expect(sitemap).not.toContain('.js');
    expect(sitemap).not.toContain('.css');
  });
  test('includes main pages with correct priority', () => {
    const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
    expect(sitemap).toContain('<priority>1.0</priority>');
  });
  test('no duplicate URLs', () => {
    const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
    const urls = sitemap.match(/<loc>(.*?)<\/loc>/g) || [];
    const uniqueUrls = new Set(urls);
    expect(urls.length).toBe(uniqueUrls.size);
  });
  test('root path gets correct priority', async () => {
    const result = await transform(config, '/');
    expect(result.priority).toBe(1.0);
    expect(result.changefreq).toBe('weekly');
  });
  test('blog posts get correct priority', async () => {
    const result = await transform(config, '/blog/test-post');
    expect(result.priority).toBe(0.8);
    expect(result.changefreq).toBe('monthly');
  });
});
```

Conclusion
A well-configured sitemap is a powerful SEO tool that helps search engines understand and index your content effectively. By excluding unwanted files, setting appropriate priorities, and handling dynamic content properly, you can significantly improve your website's search engine visibility.
The configuration I've shared has helped this website maintain clean, focused sitemaps that guide search engines to the most important content while avoiding unnecessary crawling of build artifacts and static assets.
Remember to regularly audit your sitemap and adjust the configuration as your website evolves. What works for one site might need tweaking for another, but the principles remain the same: keep it clean, prioritize correctly, and focus on content that matters to your users.