Optimizing Your Next.js Sitemap with next-sitemap: A Complete Guide

Introduction

A well-configured sitemap is crucial for SEO success, helping search engines discover and index your content efficiently. However, many Next.js websites have poorly optimized sitemaps that include build artifacts, static assets, and other files that shouldn't be indexed.

In this post, I'll walk you through optimizing your Next.js sitemap using next-sitemap, sharing the exact configuration I use for this website and the reasoning behind each decision.

Why Sitemaps Matter for SEO

Before diving into the technical details, let's understand why sitemaps are essential:

Content Discovery: Help search engines find all your pages, especially dynamic content
Crawl Efficiency: Guide crawlers to prioritize important content
Metadata Communication: Provide information about page importance, update frequency, and modification dates
Performance: Reduce wasted crawl requests by pointing crawlers at real pages instead of unnecessary files

The Problem with Default Configurations

Most Next.js websites using next-sitemap start with a basic configuration like this:

javascript

/** @type {import('next-sitemap').IConfig} */
module.exports = {
    siteUrl: 'https://example.com',
    generateRobotsTxt: true,
};

While this works, it often results in sitemaps that include:

Next.js build artifacts (/_next/static/chunks/...)
Image files and static assets
API routes that shouldn't be indexed
Component files and internal routes

My Optimized Configuration

Here's the complete next-sitemap.config.js configuration I use for this website:

javascript

/** @type {import('next-sitemap').IConfig} */
module.exports = {
    siteUrl: 'https://www.yiminyang.dev',
    generateIndexSitemap: false,
    generateRobotsTxt: true,
    exclude: [
        '/blocked',
        '/blocked/*',
        '/api/*',
        '/_next/*', // Next.js build artifacts
        '/static/*', // Static assets
        '*.js', // JavaScript files
        '*.css', // CSS files
        '*.map', // Source maps
        '*.json', // JSON files (manifests, etc.)
        '*.ico', // Favicon files
        '*.png', // Image files
        '*.jpg', // Image files
        '*.jpeg', // Image files
        '*.gif', // Image files
        '*.svg', // SVG files (unless they're pages)
        '*.webp', // Image files
        '/playground/games/memory-card-game/memorycardgame', // Component file, not page
        '/playground/games/whack-a-mole/whack-a-mole', // Component file, not page
        '/playground/text-transformations/*/[A-Z]*', // Component files (capitalized)
        '/playground/tools/qr-code-generator/[content]/*', // Dynamic route internals
    ],
    robotsTxtOptions: {
        policies: [
            {
                userAgent: '*',
                disallow: ['/blocked', '/api', '/_next', '/static'],
                allow: ['/playground', '/blog', '/talks'],
            },
        ],
        additionalSitemaps: ['https://www.yiminyang.dev/sitemap.xml'],
    },
    changefreq: 'weekly',
    priority: 0.7,
    sitemapSize: 5000,
    // Custom transformation for specific pages
    transform: async (config, path) => {
        const pathPriorities = {
            main: { paths: ['/', '/about', '/blog', '/talks'], priority: 1.0, changefreq: 'weekly' },
            content: { paths: ['/blog/', '/talks/'], priority: 0.8, changefreq: 'monthly' },
            playground: { paths: ['/playground/'], priority: 0.6, changefreq: 'monthly' },
        };

        for (const [, { paths, priority, changefreq }] of Object.entries(pathPriorities)) {
            if (
                paths.some((p) => {
                    if (p === '/') {
                        // Special case: root path only matches exactly
                        return path === '/';
                    }
                    return path === p || (p.endsWith('/') && path.startsWith(p));
                })
            ) {
                return {
                    loc: path,
                    changefreq,
                    priority,
                    lastmod: new Date().toISOString(),
                };
            }
        }

        return {
            loc: path,
            changefreq: config.changefreq,
            priority: config.priority,
            lastmod: new Date().toISOString(),
        };
    },
    additionalPaths: async () => {
        const fs = require('fs');
        const path = require('path');
        const result = [];

        // Helper function to safely process files
        const safeProcess = async (description, processor) => {
            try {
                await processor();
            } catch (error) {
                console.warn(`Could not load ${description} for sitemap:`, error.message);
            }
        };

        // Add blog posts
        await safeProcess('blog posts', () => {
            const matter = require('gray-matter');
            const postsDir = path.join(process.cwd(), 'public', 'content', 'blog', 'posts');

            if (!fs.existsSync(postsDir)) return;

            fs.readdirSync(postsDir)
                .filter((file) => file.endsWith('.mdx'))
                .forEach((file) => {
                    try {
                        const filePath = path.join(postsDir, file);
                        const { data } = matter(fs.readFileSync(filePath, 'utf8'));

                        if (data.publish !== false) {
                            const slug = file.replace(/\.mdx$/, '').toLowerCase();
                            result.push({
                                loc: `/blog/${slug}`,
                                changefreq: 'monthly',
                                priority: 0.8,
                                lastmod: data.modifiedDate || data.date || new Date().toISOString(),
                            });
                        }
                    } catch (error) {
                        console.warn(`Could not process blog post ${file}:`, error.message);
                    }
                });
        });

        // Add talks with improved regex handling
        await safeProcess('talks', () => {
            const talksPath = path.join(process.cwd(), 'lib', 'data', 'talks.ts');

            if (!fs.existsSync(talksPath)) return;

            const content = fs.readFileSync(talksPath, 'utf8');
            const talkMatches = content.match(/{\s*id:\s*\d+,[\s\S]*?}/g) || [];

            talkMatches.forEach((match) => {
                try {
                    const extractField = (field) => {
                        // Improved regex to handle apostrophes within quoted strings
                        const doubleQuoteMatch = match.match(new RegExp(`${field}:\\s*"((?:[^"\\\\]|\\\\.)*)"`));
                        const singleQuoteMatch = match.match(new RegExp(`${field}:\\s*'((?:[^'\\\\]|\\\\.)*)'`));
                        const backtickMatch = match.match(new RegExp(`${field}:\\s*\`((?:[^\`\\\\]|\\\\.)*)\``));

                        return doubleQuoteMatch?.[1] || singleQuoteMatch?.[1] || backtickMatch?.[1];
                    };

                    const title = extractField('title');
                    if (!title) return;

                    const date = extractField('date') || new Date().toISOString();
                    const slug =
                        extractField('slug') ||
                        title
                            .toLowerCase()
                            .replace(/[^\w\s-]/g, '') // Remove special characters except spaces and hyphens
                            .replace(/\s+/g, '-') // Replace spaces with hyphens
                            .replace(/-+/g, '-') // Replace multiple hyphens with single hyphen
                            .trim()
                            .replace(/^-+|-+$/g, ''); // Remove leading/trailing hyphens

                    result.push({
                        loc: `/talks/${slug}`,
                        changefreq: 'monthly',
                        priority: 0.8,
                        lastmod: new Date(date).toISOString(),
                    });
                } catch (error) {
                    console.warn('Could not process talk:', error.message);
                }
            });
        });

        return result;
    },
};

Key Configuration Decisions

1. Comprehensive Exclusions

The exclude array is crucial for keeping your sitemap clean:

javascript

exclude: [
    '/_next/*', // Next.js build artifacts
    '*.js', // JavaScript files
    '*.css', // CSS files
    '*.png', // Image files
    // ... other static assets
];

Why this matters: Without these exclusions, your sitemap might include URLs like:

/_next/static/chunks/240a8089e20a3158.js
/favicon.ico
/apple-icon.png

These files shouldn't be indexed by search engines as they're not content pages.
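
To make the filtering concrete, here is a minimal sketch of how wildcard exclusions like these could be applied to a list of candidate paths. It assumes a simplified matcher in which `*` matches any run of characters; `globToRegExp` and `isExcluded` are hypothetical helpers written for this post, not next-sitemap's internal implementation.

```javascript
// Hypothetical helper: turn a pattern with `*` wildcards into an anchored RegExp.
// Assumption: `*` matches any run of characters, everything else is literal.
function globToRegExp(pattern) {
    const escaped = pattern
        .split('*')
        .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
        .join('.*');
    return new RegExp(`^${escaped}$`);
}

// Returns true when the path matches at least one exclusion pattern.
function isExcluded(path, patterns) {
    return patterns.some((pattern) => globToRegExp(pattern).test(path));
}

const patterns = ['/_next/*', '/api/*', '*.ico', '*.png'];
const candidates = [
    '/',
    '/blog/my-post',
    '/_next/static/chunks/240a8089e20a3158.js',
    '/favicon.ico',
    '/apple-icon.png',
];

const kept = candidates.filter((path) => !isExcluded(path, patterns));
console.log(kept); // [ '/', '/blog/my-post' ]
```

Only the two content pages survive; the build chunk and the icon files are dropped before they ever reach the sitemap.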

2. Data-Driven Priority System with Critical Bug Fix

I use a data-driven approach in the transform function that includes a critical fix for root path matching:

javascript

transform: async (config, path) => {
    const pathPriorities = {
        main: { paths: ['/', '/about', '/blog', '/talks'], priority: 1.0, changefreq: 'weekly' },
        content: { paths: ['/blog/', '/talks/'], priority: 0.8, changefreq: 'monthly' },
        playground: { paths: ['/playground/'], priority: 0.6, changefreq: 'monthly' },
    };

    for (const [, { paths, priority, changefreq }] of Object.entries(pathPriorities)) {
        if (
            paths.some((p) => {
                if (p === '/') {
                    // Special case: root path only matches exactly
                    return path === '/';
                }
                return path === p || (p.endsWith('/') && path.startsWith(p));
            })
        ) {
            return {
                loc: path,
                changefreq,
                priority,
                lastmod: new Date().toISOString(),
            };
        }
    }

    return {
        loc: path,
        changefreq: config.changefreq,
        priority: config.priority,
        lastmod: new Date().toISOString(),
    };
};

Critical Bug Fix: The special handling for the root path (/) is essential. Without it, / would match every path (since every path starts with /), so the first tier would win every time and every page would be assigned priority 1.0. At that point the priority values no longer distinguish your pages from one another.

This data-driven approach reduces code duplication and makes it easy to adjust priorities and change frequencies for different content types.
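
Because the lookup is just data plus a small matcher, it can also be pulled out into a plain function and exercised without running next-sitemap at all. The sketch below copies the tiers from my configuration; `matches` and `resolveTier` are hypothetical names, and the defaults mirror the top-level `priority` and `changefreq` values.

```javascript
// Tiers copied from the configuration above.
const pathPriorities = {
    main: { paths: ['/', '/about', '/blog', '/talks'], priority: 1.0, changefreq: 'weekly' },
    content: { paths: ['/blog/', '/talks/'], priority: 0.8, changefreq: 'monthly' },
    playground: { paths: ['/playground/'], priority: 0.6, changefreq: 'monthly' },
};

// Exact match for plain entries, prefix match for entries ending in '/',
// and the root path only ever matches itself.
function matches(pattern, path) {
    if (pattern === '/') return path === '/';
    return path === pattern || (pattern.endsWith('/') && path.startsWith(pattern));
}

// Returns the first tier whose patterns match, or the supplied defaults.
function resolveTier(path, defaults = { priority: 0.7, changefreq: 'weekly' }) {
    for (const { paths, priority, changefreq } of Object.values(pathPriorities)) {
        if (paths.some((pattern) => matches(pattern, path))) {
            return { priority, changefreq };
        }
    }
    return defaults;
}

console.log(resolveTier('/blog/my-post')); // { priority: 0.8, changefreq: 'monthly' }
console.log(resolveTier('/contact')); // { priority: 0.7, changefreq: 'weekly' }
```

Keeping the matcher separate from the tier table means a new content type only needs one more entry in `pathPriorities`.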

3. Robust Dynamic Content Handling with Error Recovery

The additionalPaths function uses a sophisticated approach with centralized error handling:

javascript

additionalPaths: async () => {
    const fs = require('fs');
    const path = require('path');
    const result = [];

    // Helper function to safely process files
    const safeProcess = async (description, processor) => {
        try {
            await processor();
        } catch (error) {
            console.warn(`Could not load ${description} for sitemap:`, error.message);
        }
    };

    // Add blog posts
    await safeProcess('blog posts', () => {
        const matter = require('gray-matter');
        const postsDir = path.join(process.cwd(), 'public', 'content', 'blog', 'posts');

        if (!fs.existsSync(postsDir)) return;

        fs.readdirSync(postsDir)
            .filter((file) => file.endsWith('.mdx'))
            .forEach((file) => {
                try {
                    const filePath = path.join(postsDir, file);
                    const { data } = matter(fs.readFileSync(filePath, 'utf8'));

                    if (data.publish !== false) {
                        const slug = file.replace(/\.mdx$/, '').toLowerCase();
                        result.push({
                            loc: `/blog/${slug}`,
                            changefreq: 'monthly',
                            priority: 0.8,
                            lastmod: data.modifiedDate || data.date || new Date().toISOString(),
                        });
                    }
                } catch (error) {
                    console.warn(`Could not process blog post ${file}:`, error.message);
                }
            });
    });

    // Add talks with improved regex handling
    await safeProcess('talks', () => {
        const talksPath = path.join(process.cwd(), 'lib', 'data', 'talks.ts');

        if (!fs.existsSync(talksPath)) return;

        const content = fs.readFileSync(talksPath, 'utf8');
        const talkMatches = content.match(/{\s*id:\s*\d+,[\s\S]*?}/g) || [];

        talkMatches.forEach((match) => {
            try {
                const extractField = (field) => {
                    // Improved regex to handle apostrophes within quoted strings
                    const doubleQuoteMatch = match.match(new RegExp(`${field}:\\s*"((?:[^"\\\\]|\\\\.)*)"`));
                    const singleQuoteMatch = match.match(new RegExp(`${field}:\\s*'((?:[^'\\\\]|\\\\.)*)'`));
                    const backtickMatch = match.match(new RegExp(`${field}:\\s*\`((?:[^\`\\\\]|\\\\.)*)\``));

                    return doubleQuoteMatch?.[1] || singleQuoteMatch?.[1] || backtickMatch?.[1];
                };

                const title = extractField('title');
                if (!title) return;

                const date = extractField('date') || new Date().toISOString();
                const slug =
                    extractField('slug') ||
                    title
                        .toLowerCase()
                        .replace(/[^\w\s-]/g, '') // Remove special characters except spaces and hyphens
                        .replace(/\s+/g, '-') // Replace spaces with hyphens
                        .replace(/-+/g, '-') // Replace multiple hyphens with single hyphen
                        .trim()
                        .replace(/^-+|-+$/g, ''); // Remove leading/trailing hyphens

                result.push({
                    loc: `/talks/${slug}`,
                    changefreq: 'monthly',
                    priority: 0.8,
                    lastmod: new Date(date).toISOString(),
                });
            } catch (error) {
                console.warn('Could not process talk:', error.message);
            }
        });
    });

    return result;
};

Key improvements:

safeProcess helper: Centralizes error handling to prevent build failures
Improved regex: Handles apostrophes within quoted strings correctly
Consistent slug generation: Uses the same logic as the Next.js app to prevent duplicate entries
Multiple content sources: Handles both blog posts and talks with appropriate error recovery
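
One detail worth guarding in the blog-post branch is the type of the front matter date. YAML parsers commonly turn an unquoted date such as 2024-01-15 into a Date object, while a quoted one stays a string, and a typo can produce something unparseable. The sketch below normalizes all of these to an ISO string. `toLastmod` is a hypothetical helper, not part of next-sitemap or gray-matter; the field names `modifiedDate` and `date` are the ones used in my config.

```javascript
// Hypothetical helper: pick the best available date from front matter and
// normalize it to an ISO 8601 string. Falls back to `fallback` (build time by
// default) when no usable date is present.
function toLastmod(data, fallback = new Date()) {
    const candidates = [data.modifiedDate, data.date];
    for (const value of candidates) {
        if (!value) continue;
        const parsed = value instanceof Date ? value : new Date(value);
        // Invalid dates have a NaN timestamp; skip them instead of throwing later.
        if (!Number.isNaN(parsed.getTime())) return parsed.toISOString();
    }
    return fallback.toISOString();
}

console.log(toLastmod({ date: '2024-01-15' }));
// 2024-01-15T00:00:00.000Z
console.log(toLastmod({ date: 'not a date', modifiedDate: new Date(Date.UTC(2024, 2, 1)) }));
// 2024-03-01T00:00:00.000Z
```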

4. Enhanced robots.txt

The robotsTxtOptions configuration creates a comprehensive robots.txt:

txt

User-agent: *
Allow: /playground
Allow: /blog
Allow: /talks
Disallow: /blocked
Disallow: /api
Disallow: /_next
Disallow: /static

Sitemap: https://www.yiminyang.dev/sitemap.xml

This explicitly tells crawlers which paths they may crawl and which ones to skip.
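
To show how the options map onto that file, here is a rough sketch of a renderer that takes the same `policies` shape and produces the text above. `renderRobotsTxt` is a hypothetical function for illustration only; it is not how next-sitemap generates the file internally.

```javascript
// Hypothetical renderer mirroring the shape of robotsTxtOptions.policies.
// It is not next-sitemap's implementation, only an illustration of the mapping.
function renderRobotsTxt(policies, sitemaps = []) {
    const lines = [];
    for (const { userAgent, allow = [], disallow = [] } of policies) {
        lines.push(`User-agent: ${userAgent}`);
        allow.forEach((rule) => lines.push(`Allow: ${rule}`));
        disallow.forEach((rule) => lines.push(`Disallow: ${rule}`));
        lines.push(''); // blank line between groups
    }
    sitemaps.forEach((url) => lines.push(`Sitemap: ${url}`));
    return lines.join('\n') + '\n';
}

const text = renderRobotsTxt(
    [
        {
            userAgent: '*',
            disallow: ['/blocked', '/api', '/_next', '/static'],
            allow: ['/playground', '/blog', '/talks'],
        },
    ],
    ['https://www.yiminyang.dev/sitemap.xml']
);
console.log(text);
```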

Results and Benefits

After implementing this configuration and fixing critical bugs, my website's sitemap went from including unwanted build artifacts to a clean, focused list of 59 relevant URLs:

4 Main pages (priority 1.0)
8 Blog posts (priority 0.8)
29 Talk pages (priority 0.8)
13 Playground tools (priority 0.6)
5 Other pages (legal, contact, etc.)

Critical Bug Fixes Applied:

Fixed root path matching that was causing incorrect priority assignments
Resolved duplicate entries caused by inconsistent slug generation
Improved regex handling for apostrophes in titles

SEO Benefits

Cleaner crawling: Search engines focus on actual content, not build artifacts
Better prioritization: Important pages get higher priority scores (fixed the root path bug)
Accurate metadata: Blog posts and talks carry their real publication dates instead of build timestamps
No duplicate entries: Consistent slug generation prevents confusion
Reduced server load: Fewer unnecessary requests from crawlers

Performance Benefits

Smaller sitemap files: Only relevant URLs are included
Faster generation: Efficient exclusion patterns and error handling
Better caching: Static sitemap generation during build
Robust error recovery: Build doesn't fail if content sources are unavailable

Common Pitfalls to Avoid

1. Including Build Artifacts

Always exclude /_next/* and static assets. These files change with every build and shouldn't be indexed.

2. Ignoring Dynamic Content

Don't forget to handle dynamic routes like [slug] pages. Use additionalPaths to include them.

3. Wrong Priorities

Avoid giving all pages the same priority. Use a tiered system that reflects your content hierarchy.

4. Missing Error Handling

Always wrap dynamic content discovery in try-catch blocks to prevent build failures.
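
A stripped-down version of the pattern looks like this. It is only a sketch: `collectPaths` and the three sources are hypothetical stand-ins for the file-reading code in my real config.

```javascript
// Minimal version of the safeProcess pattern: run each content source in
// isolation so one failure is logged and skipped instead of aborting the build.
async function collectPaths(sources) {
    const result = [];
    for (const { description, load } of sources) {
        try {
            result.push(...(await load()));
        } catch (error) {
            console.warn(`Could not load ${description} for sitemap:`, error.message);
        }
    }
    return result;
}

// Hypothetical sources: the second one fails, the others still contribute.
const sources = [
    { description: 'blog posts', load: async () => [{ loc: '/blog/hello' }] },
    {
        description: 'talks',
        load: async () => {
            throw new Error('talks.ts not found');
        },
    },
    { description: 'tools', load: () => [{ loc: '/playground/tools/qr' }] },
];

collectPaths(sources).then((paths) => console.log(paths.map((p) => p.loc)));
// [ '/blog/hello', '/playground/tools/qr' ]
```

The failing source produces a warning, and the sitemap is still generated from whatever could be loaded.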

5. CRITICAL: Root Path Matching Bug

One of the most dangerous bugs in sitemap configurations is improper root path handling. Here's the complete context:

The Problem: Without special handling, the root path / will match ALL paths because every path starts with /. Here's what happens:

javascript

// This logic has a fatal flaw
const pathPriorities = {
    main: { paths: ['/', '/about', '/blog'], priority: 1.0 },
    content: { paths: ['/blog/', '/talks/'], priority: 0.8 },
};

for (const [, { paths, priority }] of Object.entries(pathPriorities)) {
    if (paths.some((p) => path.startsWith(p))) {
        // BUG IS HERE!
        return { priority };
    }
}

// What happens:
// path = '/blog/my-post'
// '/blog/my-post'.startsWith('/') → true (WRONG!)
// Root path '/' matches everything, so ALL pages get priority 1.0

The Fix: Handle the root path as a special case that only matches exactly:

javascript

// Special case for root path
for (const [, { paths, priority, changefreq }] of Object.entries(pathPriorities)) {
    if (
        paths.some((p) => {
            if (p === '/') {
                // Special case: root path only matches exactly
                return path === '/';
            }
            // For other paths, use normal prefix matching
            return path === p || (p.endsWith('/') && path.startsWith(p));
        })
    ) {
        return {
            loc: path,
            changefreq,
            priority,
            lastmod: new Date().toISOString(),
        };
    }
}

// Now it works correctly:
// path = '/' → matches '/' exactly → priority 1.0 ✓
// path = '/blog/my-post' → doesn't match '/' → continues to check '/blog/' → priority 0.8 ✓

Why this matters: Without this fix, every page gets the priority of whichever tier lists / (here 1.0), so the priority values in your sitemap no longer tell crawlers anything about which pages matter most.

6. Regex Issues with Apostrophes

When parsing dynamic content, simple regex patterns can break on apostrophes:

javascript

// Breaks on "Here's How": the apostrophe is treated as the closing quote, so only "Here" is captured
const match = content.match(/title:\s*['"`](.*?)['"`]/);

The fix: Use separate patterns for each quote type:

javascript

const doubleQuoteMatch = match.match(/title:\s*"((?:[^"\\]|\\.)*)"/);
const singleQuoteMatch = match.match(/title:\s*'((?:[^'\\]|\\.)*)'/);
const backtickMatch = match.match(/title:\s*`((?:[^`\\]|\\.)*)`/);
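
To see the difference on real input, the following sketch runs both approaches against a sample entry. The entry itself is made up for illustration, and `extractField` here takes the source string as a parameter so it can run on its own.

```javascript
// Sample entry in the shape the talks parser expects (hypothetical data).
const entry = `{ id: 1, title: "Here's How We Ship", date: '2024-05-01' }`;

// Naive pattern: any quote character may close the string, so the apostrophe
// inside the double-quoted title ends the match early.
const naive = entry.match(/title:\s*['"`](.*?)['"`]/);

// Per-quote patterns: the closing quote must be the same kind as the opening one.
function extractField(source, field) {
    const patterns = [
        new RegExp(`${field}:\\s*"((?:[^"\\\\]|\\\\.)*)"`),
        new RegExp(`${field}:\\s*'((?:[^'\\\\]|\\\\.)*)'`),
        new RegExp(`${field}:\\s*\`((?:[^\`\\\\]|\\\\.)*)\``),
    ];
    for (const pattern of patterns) {
        const match = source.match(pattern);
        if (match) return match[1];
    }
    return undefined;
}

console.log(naive[1]); // Here
console.log(extractField(entry, 'title')); // Here's How We Ship
console.log(extractField(entry, 'date')); // 2024-05-01
```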

7. Inconsistent Slug Generation

If your sitemap generates slugs differently than your Next.js app, you'll get duplicate entries:

javascript

// Naive replacement: collapses every non-alphanumeric run into a hyphen
const naiveSlug = title.toLowerCase().replace(/[^a-z0-9]+/g, '-');

// Match your app's logic
const slug = title
    .toLowerCase()
    .replace(/[^\w\s-]/g, '') // Remove special characters except spaces and hyphens
    .replace(/\s+/g, '-') // Replace spaces with hyphens
    .replace(/-+/g, '-') // Replace multiple hyphens with single hyphen
    .trim()
    .replace(/^-+|-+$/g, ''); // Remove leading/trailing hyphens
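
Running both strategies on the same title shows how the two URLs drift apart. The title below is a made-up example, and the function names are mine.

```javascript
// The two slug strategies from above, wrapped as functions for comparison.
function naiveSlug(title) {
    return title.toLowerCase().replace(/[^a-z0-9]+/g, '-');
}

function appSlug(title) {
    return title
        .toLowerCase()
        .replace(/[^\w\s-]/g, '')
        .replace(/\s+/g, '-')
        .replace(/-+/g, '-')
        .trim()
        .replace(/^-+|-+$/g, '');
}

const title = "Here's How: Next.js & SEO"; // hypothetical title
console.log(naiveSlug(title)); // here-s-how-next-js-seo
console.log(appSlug(title)); // heres-how-nextjs-seo
```

If the app serves the second URL while the sitemap lists the first, crawlers are sent to a page that does not exist, and the real page may be listed a second time by automatic page discovery.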

Advanced Tips

1. Conditional Content Inclusion

javascript

if (data.publish !== false && !data.draft) {
    result.push({
        loc: `/blog/${slug}`,
        // ... rest of config
    });
}

2. Custom Change Frequencies

javascript

// More frequent updates for time-sensitive content
if (path.includes('/news/')) {
    return {
        loc: path, // required: the URL this entry describes
        changefreq: 'daily',
        priority: 0.9,
        lastmod: new Date().toISOString(),
    };
}

Monitoring and Maintenance

1. Google Search Console

Submit your sitemap to Google Search Console and monitor:

Index coverage
Crawl errors
Sitemap processing status

2. Regular Audits

Periodically check your sitemap for:

Unwanted URLs
Missing important pages
Incorrect priorities or dates
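
Part of this audit can be scripted. The sketch below works on the sitemap XML as a string; `auditSitemap` is a hypothetical helper, and the asset pattern is an assumption based on the exclusions in my config.

```javascript
// Hypothetical audit helper: takes sitemap XML as a string and reports
// URLs that look like assets plus priorities outside the 0.0-1.0 range.
function auditSitemap(xml) {
    const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);
    const priorities = [...xml.matchAll(/<priority>(.*?)<\/priority>/g)].map((m) => Number(m[1]));

    const assetPattern = /\/_next\/|\.(js|css|map|json|ico|png|jpe?g|gif|svg|webp)$/i;
    return {
        total: urls.length,
        unwanted: urls.filter((url) => assetPattern.test(url)),
        badPriorities: priorities.filter((p) => Number.isNaN(p) || p < 0 || p > 1),
    };
}

const sample = `
<urlset>
  <url><loc>https://example.com/</loc><priority>1.0</priority></url>
  <url><loc>https://example.com/favicon.ico</loc><priority>0.7</priority></url>
  <url><loc>https://example.com/_next/static/chunks/app.js</loc><priority>1.5</priority></url>
</urlset>`;

console.log(auditSitemap(sample));
```

For a real audit, pass in the contents of public/sitemap.xml (for example via `fs.readFileSync`) instead of the inline sample.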

3. Automated Testing

Consider adding tests to verify your sitemap configuration:

javascript

const fs = require('fs');

// Example test (run it after the build has generated public/sitemap.xml)
test('sitemap excludes build artifacts', () => {
    const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
    expect(sitemap).not.toContain('/_next/');
    expect(sitemap).not.toContain('.js');
});

Debugging and Testing Your Sitemap

Based on the critical bugs I discovered in my own configuration, here's how to properly test your sitemap:

1. Test Priority Assignments

Create a simple test to verify your transform function works correctly:

javascript

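// Test your transform function
// Assumes this script sits next to next-sitemap.config.js
const config = require('./next-sitemap.config.js');

(async () => {
    const testPaths = ['/', '/blog', '/blog/test-post', '/playground/tool'];
    // for...of awaits each call in turn; forEach would not wait for async callbacks
    for (const path of testPaths) {
        const result = await config.transform(config, path);
        console.log(`${path}: priority ${result.priority}, changefreq ${result.changefreq}`);
    }
})();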

Expected output:

/: priority 1, changefreq weekly
/blog: priority 1, changefreq weekly
/blog/test-post: priority 0.8, changefreq monthly
/playground/tool: priority 0.6, changefreq monthly

2. Check for Duplicate Entries

javascript

const fs = require('fs');

// Check for duplicates in your sitemap
const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
const urls = sitemap.match(/<loc>(.*?)<\/loc>/g) || [];
const uniqueUrls = new Set(urls);

if (urls.length !== uniqueUrls.size) {
    console.error('Duplicate URLs found in sitemap!');
    // Find duplicates
    const duplicates = urls.filter((url, index) => urls.indexOf(url) !== index);
    console.log('Duplicates:', duplicates);
}

3. Validate Content Sources

Test that your dynamic content discovery works:

javascript

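// Test your additionalPaths function
// Assumes this script sits next to next-sitemap.config.js
const config = require('./next-sitemap.config.js');

(async () => {
    const paths = await config.additionalPaths(config);
    console.log(`Found ${paths.length} dynamic paths`);
    paths.forEach((entry) => {
        console.log(`${entry.loc}: priority ${entry.priority}`);
    });
})();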

4. Manual Sitemap Inspection

Always manually review your generated public/sitemap.xml:

Check URL count: Does it match your expectations?
Verify priorities: Are main pages getting priority 1.0?
Look for unwanted URLs: Any build artifacts or component files?
Check for duplicates: Same content with different URLs?
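
The first two checks are easier with a quick tally of entries per priority value. This is a small sketch with a hypothetical helper, `countByPriority`, run here against an inline sample rather than a real sitemap file.

```javascript
// Hypothetical helper: count sitemap entries per <priority> value so the
// totals can be compared against what you expect for each tier.
function countByPriority(xml) {
    const counts = {};
    for (const [, entry] of xml.matchAll(/<url>([\s\S]*?)<\/url>/g)) {
        const match = entry.match(/<priority>(.*?)<\/priority>/);
        const key = match ? match[1] : 'none';
        counts[key] = (counts[key] || 0) + 1;
    }
    return counts;
}

const sample = `
<url><loc>https://example.com/</loc><priority>1.0</priority></url>
<url><loc>https://example.com/blog/a</loc><priority>0.8</priority></url>
<url><loc>https://example.com/blog/b</loc><priority>0.8</priority></url>
<url><loc>https://example.com/legal</loc></url>`;

console.log(countByPriority(sample)); // { '1.0': 1, '0.8': 2, none: 1 }
```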

5. Unit Testing Your Configuration

Here's a comprehensive test suite for your sitemap config:

javascript

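const fs = require('fs');
// Assumes the test file sits next to next-sitemap.config.js; adjust the path otherwise
const config = require('./next-sitemap.config.js');
const { transform } = config;

describe('Sitemap Configuration', () => {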
    test('excludes build artifacts', () => {
        const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
        expect(sitemap).not.toContain('/_next/');
        expect(sitemap).not.toContain('.js');
        expect(sitemap).not.toContain('.css');
    });

    test('includes main pages with correct priority', () => {
        const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
        // Accept both "1" and "1.0": a numeric 1.0 is serialized as "1" by JavaScript
        expect(sitemap).toMatch(/<priority>1(\.0)?<\/priority>/);
    });

    test('no duplicate URLs', () => {
        const sitemap = fs.readFileSync('public/sitemap.xml', 'utf8');
        const urls = sitemap.match(/<loc>(.*?)<\/loc>/g) || [];
        const uniqueUrls = new Set(urls);
        expect(urls.length).toBe(uniqueUrls.size);
    });

    test('root path gets correct priority', async () => {
        const result = await transform(config, '/');
        expect(result.priority).toBe(1.0);
        expect(result.changefreq).toBe('weekly');
    });

    test('blog posts get correct priority', async () => {
        const result = await transform(config, '/blog/test-post');
        expect(result.priority).toBe(0.8);
        expect(result.changefreq).toBe('monthly');
    });
});

Conclusion

A well-configured sitemap is a powerful SEO tool that helps search engines understand and index your content effectively. By excluding unwanted files, setting appropriate priorities, and handling dynamic content properly, you can significantly improve your website's search engine visibility.

The configuration I've shared has helped this website maintain clean, focused sitemaps that guide search engines to the most important content while avoiding unnecessary crawling of build artifacts and static assets.

Remember to regularly audit your sitemap and adjust the configuration as your website evolves. What works for one site might need tweaking for another, but the principles remain the same: keep it clean, prioritize correctly, and focus on content that matters to your users.