Why Sitemaps Matter More Than You Think

A sitemap is not just an SEO checkbox. It is a direct communication channel with search engine crawlers. Google finds your sitemap through the Sitemap directive in robots.txt or a submission in Search Console, then uses it to understand the site structure, discover new pages, and prioritize crawling. Without a sitemap, Google relies on following links, which means orphan pages (pages with no internal links) are unlikely to be discovered, let alone indexed.

JekCMS generates sitemaps dynamically. No static XML files to maintain. Every new post, category, or page is automatically included within minutes.

XML Sitemap Structure

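A valid XML sitemap follows the sitemaps.org protocol. Only the location (loc) is required for each URL entry; the last modification date, change frequency hint, and priority hint are optional. Google has stated that it ignores changefreq and priority, and that it uses lastmod only when the value is consistently accurate: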

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/blog/my-post</loc>
        <lastmod>2026-03-15</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>

JekCMS generates separate sitemaps for different content types: sitemap-posts.xml, sitemap-pages.xml, sitemap-categories.xml, and sitemap-tags.xml. A master sitemap.xml index links to all of them.
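
As a minimal sketch of how such a master index could be produced, the snippet below builds a sitemapindex document from a list of child sitemap file names. The function name buildSitemapIndex, its parameters, and the single shared lastmod value are assumptions made for illustration, not JekCMS's actual code.

```php
<?php
// Hypothetical helper (not part of JekCMS): returns a sitemap index as a string.
// $lastmod is assumed to be an already formatted date such as "2026-03-15".
function buildSitemapIndex(string $baseUrl, array $files, string $lastmod): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($files as $file) {
        // Build an absolute URL and escape it for XML.
        $loc = htmlspecialchars(rtrim($baseUrl, '/') . '/' . $file, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "    <sitemap>\n";
        $xml .= "        <loc>{$loc}</loc>\n";
        $xml .= "        <lastmod>{$lastmod}</lastmod>\n";
        $xml .= "    </sitemap>\n";
    }
    $xml .= "</sitemapindex>\n";
    return $xml;
}

// Usage: print an index that links to the four child sitemaps named above.
echo buildSitemapIndex('https://example.com', [
    'sitemap-posts.xml',
    'sitemap-pages.xml',
    'sitemap-categories.xml',
    'sitemap-tags.xml',
], '2026-03-15');
```

In a real handler the lastmod of each child would more likely come from the newest record of that content type; a single shared date is used here only to keep the sketch short.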

Dynamic Generation in PHP

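<?php
// includes/sitemap-handler.php
// Assumes $db (a database wrapper exposing fetchAll) and SITE_URL are set up by the bootstrap.

// Match on the requested file name only, so query strings cannot select a type.
$path = (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$types = [
    'sitemap-posts'      => 'posts',
    'sitemap-pages'      => 'pages',
    'sitemap-categories' => 'categories',
    'sitemap-tags'       => 'tags',
];
$name = basename($path, '.xml');
if (!isset($types[$name])) {
    // Unknown sitemap: return 404 instead of an empty urlset.
    http_response_code(404);
    exit;
}
$type = $types[$name];

header('Content-Type: application/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8"?>';
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

if ($type === 'posts') {
    $posts = $db->fetchAll(
        "SELECT slug, updated_at FROM posts WHERE status='published' ORDER BY updated_at DESC"
    );
    foreach ($posts as $post) {
        // Escape the whole URL for XML, not just the slug.
        $loc = htmlspecialchars(SITE_URL . '/blog/' . $post['slug'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        echo "<url><loc>" . $loc . "</loc>";
        echo "<lastmod>" . date('Y-m-d', strtotime($post['updated_at'])) . "</lastmod></url>";
    }
}
// Branches for pages, categories, and tags are omitted here.

echo '</urlset>';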

Image Sitemap

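Google Images can be a significant traffic source. Adding image entries to your sitemap helps Google find and index your images. The image namespace must be declared on the urlset element, and Google now reads only image:loc (it deprecated image:title and the other descriptive image tags in 2022):

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://example.com/blog/my-post</loc>
        <image:image>
            <image:loc>https://example.com/uploads/images/photo.avif</image:loc>
        </image:image>
    </url>
</urlset>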
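
Generating these entries from PHP might look like the sketch below. The function names xmlEscape and buildImageUrlEntry are hypothetical, and the code assumes the caller already has the page URL and its image URLs in hand. The returned fragment belongs inside a urlset element that declares the xmlns:image namespace.

```php
<?php
// Hypothetical helpers (not part of JekCMS) for emitting one <url> entry with images.

// Escape a value for safe use inside XML element content.
function xmlEscape(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Build a <url> entry containing one <image:image> block per image URL.
function buildImageUrlEntry(string $pageUrl, array $imageUrls): string
{
    $xml  = "<url>\n";
    $xml .= "    <loc>" . xmlEscape($pageUrl) . "</loc>\n";
    foreach ($imageUrls as $imageUrl) {
        $xml .= "    <image:image>\n";
        $xml .= "        <image:loc>" . xmlEscape($imageUrl) . "</image:loc>\n";
        $xml .= "    </image:image>\n";
    }
    $xml .= "</url>\n";
    return $xml;
}

// Usage: one post with a single image.
echo buildImageUrlEntry(
    'https://example.com/blog/my-post',
    ['https://example.com/uploads/images/photo.avif']
);
```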

robots.txt

The robots.txt file tells crawlers which parts of your site to crawl and which to ignore:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /includes/
Disallow: /config/
Disallow: /cache/

Sitemap: https://example.com/sitemap.xml

Critical: the Sitemap directive must use an absolute URL. Relative URLs are invalid and Google will ignore them.
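
A quick way to catch this mistake is to scan robots.txt for Sitemap lines and flag any that are not absolute. The following is a rough sketch under simple assumptions: findSitemapDirectives is a hypothetical name, "absolute" is taken to mean an http or https scheme plus a host, and anything after a # is treated as a comment.

```php
<?php
// Hypothetical checker (not part of JekCMS): lists Sitemap directives found in
// a robots.txt body and reports whether each one is an absolute http(s) URL.
function findSitemapDirectives(string $robotsTxt): array
{
    $results = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Drop trailing comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        // The field name is matched case-insensitively.
        if (stripos($line, 'sitemap:') !== 0) {
            continue;
        }
        $url    = trim(substr($line, strlen('sitemap:')));
        $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        $host   = (string) parse_url($url, PHP_URL_HOST);
        $results[] = [
            'url'      => $url,
            'absolute' => in_array($scheme, ['http', 'https'], true) && $host !== '',
        ];
    }
    return $results;
}

// Usage: report any directive that crawlers would likely ignore.
$robots = "User-agent: *\nDisallow: /admin/\n\nSitemap: https://example.com/sitemap.xml\n";
foreach (findSitemapDirectives($robots) as $directive) {
    echo $directive['url'] . ($directive['absolute'] ? " OK\n" : " NOT ABSOLUTE\n");
}
```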

Crawl Budget Optimization

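For sites with thousands of pages, crawl budget matters. Google limits how much it crawls a site based on how much load the server can handle and how much demand there is for the content. Optimize by: blocking non-essential pages in robots.txt, using noindex meta tags for thin content (on pages that stay crawlable, since a crawler blocked by robots.txt never sees the tag), keeping each sitemap file under 50,000 URLs and 50 MB uncompressed (the sitemaps.org protocol limit), and serving fast responses (slow sites get crawled less).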
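
Staying under the per-file URL limit is mostly a matter of chunking. The sketch below splits a list of URLs into numbered sitemap files. The function name planSitemapFiles and the file naming scheme (prefix plus a counter) are assumptions for illustration, and the sketch does not check the file size limit.

```php
<?php
// Hypothetical helper (not part of JekCMS): splits a URL list into sitemap
// files that each hold at most $limit URLs.
const MAX_URLS_PER_SITEMAP = 50000; // per-file URL limit from the sitemaps.org protocol

function planSitemapFiles(array $urls, string $prefix, int $limit = MAX_URLS_PER_SITEMAP): array
{
    $files = [];
    // array_chunk returns consecutive slices of at most $limit items each.
    foreach (array_chunk($urls, $limit) as $index => $chunk) {
        $files[sprintf('%s-%d.xml', $prefix, $index + 1)] = $chunk;
    }
    return $files;
}

// Usage: show how many URLs would land in each file.
$urls = array_fill(0, 120001, 'https://example.com/blog/post');
foreach (planSitemapFiles($urls, 'sitemap-posts') as $file => $chunk) {
    echo $file . ': ' . count($chunk) . " URLs\n";
}
```

Each generated file would then be listed in the sitemap index, so crawlers find all of them from a single entry point.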

After fixing robots.txt on three of our sites (blocking /admin/, /api/, /cache/), we saw a 40% increase in organic page crawls. Google had been spending crawl budget on admin pages that should never have been crawled.