A deep walkthrough of the JekCMS media pipeline — from upload validation and EXIF stripping to AVIF/WebP conversion, four-size thumbnail generation, picture element output, and orphaned file cleanup.
The Problem With Unmanaged Media
Six months into running our first batch of JekCMS sites, the uploads folder on one installation had grown to 4.2 GB. The site had maybe 800 posts. That is 5.25 MB per post on average, which is absurd for a blog. The culprit was obvious: authors were uploading 4000x3000 JPEG photos straight from their phones, and the system stored them at full resolution.
The same image served as a 200x200 thumbnail in the sidebar, a 400px card image on the homepage, and an 800px hero on the post page — all from the same 4.8 MB source file. Every page load transferred 10-15x more data than necessary.
That is when we built the media pipeline. Every image that enters JekCMS now goes through validation, EXIF stripping, format conversion, and multi-size thumbnail generation before a single byte hits permanent storage.
Upload Flow: What Happens When You Hit Submit
The upload process has five distinct stages. Understanding them helps you debug issues when something goes wrong.
Stage 1: Validation
Before any processing begins, the uploaded file goes through validation checks:
private function validateUpload(array $file): void
{
    // 1. Check PHP upload errors
    if ($file['error'] !== UPLOAD_ERR_OK) {
        throw new MediaException($this->uploadErrorMessage($file['error']));
    }

    // 2. Check file size (max 10MB for images)
    $maxSize = 10 * 1024 * 1024;
    if ($file['size'] > $maxSize) {
        throw new MediaException('File exceeds 10MB limit');
    }

    // 3. Verify MIME type from file content, not extension
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($file['tmp_name']);
    $allowed = ['image/jpeg', 'image/png', 'image/gif', 'image/webp', 'image/avif'];
    if (!in_array($mime, $allowed, true)) {
        throw new MediaException('File type not allowed: ' . $mime);
    }

    // 4. Verify it is actually a valid image
    $info = getimagesize($file['tmp_name']);
    if ($info === false) {
        throw new MediaException('File is not a valid image');
    }

    // 5. Check for embedded PHP code (security)
    // The pattern and message below are assumed; they flag a literal "<?php" open tag.
    $content = file_get_contents($file['tmp_name']);
    if (preg_match('/<\?php/i', $content)) {
        throw new MediaException('File contains embedded PHP code');
    }
}
The MIME type check uses finfo to read the actual file header, not the extension or the browser-provided content type. Someone renaming malware.php to malware.jpg fails this check because the file header will not match any image MIME type.
The PHP code scan is a basic defense against embedded PHP in image files — a known attack vector where a valid JPEG has PHP code appended to it. If the server is misconfigured to execute PHP in the uploads directory, this file could run arbitrary code. Our .htaccess also blocks PHP execution in uploads, but defense in depth is the right approach.
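The uploadErrorMessage() helper called in step 1 is not shown in this article. A minimal sketch of what such a helper might look like, assuming it simply maps PHP's UPLOAD_ERR_* constants to readable messages (the standalone function name and the message wording are illustrative, not JekCMS source):

```php
<?php
// Hypothetical stand-in for uploadErrorMessage(); the wording is illustrative.
function upload_error_message(int $code): string
{
    // The UPLOAD_ERR_* constants are defined by PHP core.
    $messages = [
        UPLOAD_ERR_INI_SIZE   => 'File exceeds upload_max_filesize in php.ini',
        UPLOAD_ERR_FORM_SIZE  => 'File exceeds the MAX_FILE_SIZE set in the form',
        UPLOAD_ERR_PARTIAL    => 'File was only partially uploaded',
        UPLOAD_ERR_NO_FILE    => 'No file was uploaded',
        UPLOAD_ERR_NO_TMP_DIR => 'Server is missing a temporary folder',
        UPLOAD_ERR_CANT_WRITE => 'Server failed to write the file to disk',
        UPLOAD_ERR_EXTENSION  => 'A PHP extension stopped the upload',
    ];
    return $messages[$code] ?? 'Unknown upload error (code ' . $code . ')';
}
```

Keeping the mapping in one place means authors see an actionable message instead of a bare error code when an upload fails before validation even starts.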
Stage 2: EXIF Stripping
JPEG images from phones carry EXIF metadata: GPS coordinates, device model, date/time, and sometimes the photographer's name. Publishing this data is a privacy risk. A photo of your office reveals your exact location to anyone who downloads it and checks the EXIF.
private function stripExifData(string $sourcePath, string $mime): string
{
    // Only JPEG uploads are re-encoded here; other formats pass through untouched
    if ($mime !== 'image/jpeg') {
        return $sourcePath;
    }
    $image = imagecreatefromjpeg($sourcePath);
    if ($image === false) {
        return $sourcePath;
    }
    // Re-saving through GD drops all EXIF metadata
    $stripped = tempnam(sys_get_temp_dir(), 'exif_');
    imagejpeg($image, $stripped, 95);
    imagedestroy($image);
    return $stripped;
}
The approach is straightforward: load the JPEG with GD and re-save it. GD does not preserve EXIF data, so the re-saved file is clean. We save at quality 95 here to keep this intermediate file close to the source; the real compression happens during format conversion, so there is no reason to compress hard twice.
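One side effect of re-saving through GD is worth guarding against: phone cameras usually record rotation in the EXIF Orientation tag instead of rotating the pixels, so dropping the tag can leave portrait photos lying on their side. A minimal sketch of a guard, assuming the exif extension is available. Both function names are hypothetical, and only the three pure-rotation values (3, 6, 8) are handled, not the mirrored ones:

```php
<?php
// Map an EXIF Orientation value to the angle to pass to imagerotate().
// imagerotate() turns the image counter-clockwise for positive angles.
function rotation_for_orientation(int $orientation): int
{
    switch ($orientation) {
        case 3:  return 180;  // stored upside down
        case 6:  return -90;  // needs a 90 degree clockwise turn
        case 8:  return 90;   // needs a 90 degree counter-clockwise turn
        default: return 0;    // 1 (normal), mirrored values, or unknown
    }
}

// Hypothetical helper: call on the GD image before imagejpeg() in stripExifData().
function apply_exif_orientation($image, string $sourcePath)
{
    if (!function_exists('exif_read_data') || !function_exists('imagerotate')) {
        return $image; // cannot read or apply orientation; leave pixels untouched
    }
    $exif = @exif_read_data($sourcePath);
    $orientation = is_array($exif) ? (int) ($exif['Orientation'] ?? 1) : 1;
    $angle = rotation_for_orientation($orientation);
    if ($angle === 0) {
        return $image;
    }
    $rotated = imagerotate($image, $angle, 0);
    return $rotated !== false ? $rotated : $image;
}
```

The orientation has to be read from the original file, before the stripped copy replaces it, because the tag no longer exists afterwards.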
Stage 3: Format Conversion
JekCMS converts all uploaded images to AVIF as the primary format, with WebP as a fallback. The original file is discarded after successful conversion.
Why AVIF first? Because compression is measurably better. In our tests across 500 images from real JekCMS sites, AVIF at quality 80 produced files 38% smaller than WebP at quality 85, with comparable visual quality. That is not marginal — it is more than a third less bandwidth per image.
private function convertToModernFormats(string $sourcePath, string $destBase): array
{
    $image = $this->loadImage($sourcePath);
    $results = [];

    // Try AVIF first (best compression); record the file only if encoding succeeded
    if (function_exists('imageavif') && imageavif($image, $destBase . '.avif', 80)) {
        $results['avif'] = $destBase . '.avif';
    }

    // WebP fallback (wider browser support)
    if (function_exists('imagewebp') && imagewebp($image, $destBase . '.webp', 85)) {
        $results['webp'] = $destBase . '.webp';
    }

    imagedestroy($image);

    // No modern format was written: keep the source file as-is
    if (empty($results)) {
        $results['original'] = $sourcePath;
    }
    return $results;
}
Note the quality values: AVIF at 80, WebP at 85. I tested quality values from 50 to 100 in increments of 5 across 100 photos. AVIF at 80 and WebP at 85 were the sweet spots: lowering quality further produced visible artifacts in exchange for only marginal extra savings.
Stage 4: Thumbnail Generation
Each uploaded image produces four thumbnail sizes:
- thumbnail — 400x400, center-cropped. Used in admin panels, small widgets, and author avatars.
- medium — 800x800, proportionally scaled (no crop). The workhorse size for post cards and listing pages.
- large — 1600x1600, proportionally scaled. Used for the main content area on single post pages.
- pinterest — 1000x1500, center-cropped at 2:3 ratio. Specifically for Pinterest sharing where vertical images get 2-3x more engagement.
private function generateThumbnails(string $sourcePath, string $destBase): array
{
    $sizes = [
        'thumbnail' => ['width' => 400,  'height' => 400,  'crop' => true],
        'medium'    => ['width' => 800,  'height' => 800,  'crop' => false],
        'large'     => ['width' => 1600, 'height' => 1600, 'crop' => false],
        'pinterest' => ['width' => 1000, 'height' => 1500, 'crop' => true],
    ];
    $results = [];
    $source = $this->loadImage($sourcePath);
    $srcW = imagesx($source);
    $srcH = imagesy($source);

    foreach ($sizes as $name => $config) {
        if ($srcW <= $config['width'] && $srcH <= $config['height']) {
            continue; // Skip if source is smaller than target
        }
        $thumb = $config['crop']
            ? $this->cropResize($source, $srcW, $srcH, $config['width'], $config['height'])
            : $this->fitResize($source, $srcW, $srcH, $config['width'], $config['height']);
        $thumbPath = $destBase . '-' . $name;

        if (function_exists('imageavif')) {
            imageavif($thumb, $thumbPath . '.avif', 80);
            $results[$name . '_avif'] = $thumbPath . '.avif';
        }
        if (function_exists('imagewebp')) {
            imagewebp($thumb, $thumbPath . '.webp', 85);
            $results[$name . '_webp'] = $thumbPath . '.webp';
        }
        imagedestroy($thumb);
    }
    imagedestroy($source);
    return $results;
}
The "skip if source is smaller" check is critical. If someone uploads a 300x200 image, generating a 1600x1600 "large" version would upscale it, producing a blurrier image in a bigger file. It is better to serve the full-size conversion from Stage 3 in that case.
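The cropResize() and fitResize() helpers are not shown here. The geometry they need can be sketched as two pure functions, under the assumption that "fit" preserves the aspect ratio inside the bounding box and "crop" takes the largest centered region with the target ratio. Both function names are illustrative:

```php
<?php
// Largest size with the source aspect ratio that fits inside maxW x maxH.
function fit_dimensions(int $srcW, int $srcH, int $maxW, int $maxH): array
{
    $scale = min($maxW / $srcW, $maxH / $srcH, 1.0); // 1.0 cap: never upscale
    return [
        max(1, (int) round($srcW * $scale)),
        max(1, (int) round($srcH * $scale)),
    ];
}

// Centered source rectangle [x, y, w, h] whose aspect ratio matches dstW:dstH.
function center_crop_rect(int $srcW, int $srcH, int $dstW, int $dstH): array
{
    $targetRatio = $dstW / $dstH;
    if ($srcW / $srcH > $targetRatio) {
        // Source is wider than the target: trim left and right
        $cropH = $srcH;
        $cropW = (int) round($srcH * $targetRatio);
    } else {
        // Source is taller than the target: trim top and bottom
        $cropW = $srcW;
        $cropH = (int) round($srcW / $targetRatio);
    }
    return [intdiv($srcW - $cropW, 2), intdiv($srcH - $cropH, 2), $cropW, $cropH];
}
```

For a 4000x3000 source, fit_dimensions() gives 800x600 for the medium size, and center_crop_rect() selects the middle 3000x3000 square for the thumbnail. Those values would then feed GD's imagecopyresampled() as the source rectangle and destination size.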
Stage 5: Storage and Database Registration
The final stage writes files to disk and records the upload in the media table:
uploads/
  images/
    2026/
      03/
        my-photo.avif               full-size AVIF
        my-photo.webp               full-size WebP
        my-photo-thumbnail.avif     400x400
        my-photo-thumbnail.webp
        my-photo-medium.avif        800x800
        my-photo-medium.webp
        my-photo-large.avif         1600x1600
        my-photo-large.webp
        my-photo-pinterest.avif     1000x1500
        my-photo-pinterest.webp
The database stores only the base path without extension: images/2026/03/my-photo. The get_featured_image() helper constructs the full URL with the appropriate size suffix and format at render time. This means you can change the serving logic (add a CDN, change format preference) without touching the database.
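As a sketch of that render-time construction (not the actual get_featured_image() source), a helper along these lines would combine base path, size, and format. The name media_url() and its parameters are assumptions for illustration, and the uploads URL is passed in rather than read from UPLOADS_URL so the snippet stands alone:

```php
<?php
// Hypothetical URL builder: base path + optional size suffix + format extension.
function media_url(string $uploadsUrl, string $basePath, string $size = 'medium', string $format = 'avif'): string
{
    $sizes = ['full', 'thumbnail', 'medium', 'large', 'pinterest'];
    if (!in_array($size, $sizes, true)) {
        throw new InvalidArgumentException('Unknown size: ' . $size);
    }
    // The full-size file has no suffix; every other size is "-{name}"
    $suffix = $size === 'full' ? '' : '-' . $size;
    return rtrim($uploadsUrl, '/') . '/' . ltrim($basePath, '/') . $suffix . '.' . $format;
}

echo media_url('https://site.com/uploads', 'images/2026/03/my-photo', 'thumbnail');
// https://site.com/uploads/images/2026/03/my-photo-thumbnail.avif
```

Because the suffix and extension are appended here and nowhere else, switching the preferred format or the host is a one-line change.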
The Picture Element Output
Serving the right format to the right browser requires the HTML <picture> element:
function get_featured_picture(array $post, string $size = 'medium'): string
{
    $basePath = $post['featured_image'] ?? '';
    if (empty($basePath)) return '';

    $baseUrl = UPLOADS_URL . '/' . $basePath;
    $suffix = ($size !== 'full') ? '-' . $size : '';

    // Width/height hints per size. Cropped sizes are exact; the fitted sizes
    // (medium, large, full) keep the 16:10 hints from the original mapping.
    $dims = [
        'thumbnail' => [400, 400],
        'medium'    => [800, 500],
        'pinterest' => [1000, 1500],
    ];
    [$width, $height] = $dims[$size] ?? [1600, 1000];

    return sprintf(
        '<picture>
    <source srcset="%s" type="image/avif">
    <source srcset="%s" type="image/webp">
    <img src="%s" alt="%s" loading="lazy" width="%d" height="%d">
</picture>',
        $baseUrl . $suffix . '.avif',
        $baseUrl . $suffix . '.webp',
        $baseUrl . $suffix . '.webp',
        htmlspecialchars($post['title'] ?? ''),
        $width,
        $height
    );
}
The browser picks the first <source> it supports. Current versions of Chrome, Edge, Firefox, and Safari (16 and later) support AVIF. Safari 14 and 15 fall back to WebP. The <img> tag carries the alt, loading, width, and height attributes.
The explicit width and height prevent Cumulative Layout Shift (CLS). Without them, the browser does not know how much space to reserve for the image, causing content to jump.
CDN Integration
For sites using a CDN (Cloudflare, BunnyCDN, KeyCDN), the media URL construction needs one change:
define('CDN_URL', 'https://cdn.example.com');
// In get_featured_image()
$baseUrl = defined('CDN_URL') && CDN_URL
    ? CDN_URL . '/uploads/' . $basePath
    : UPLOADS_URL . '/' . $basePath;
The CDN pulls from origin on the first request and caches the response at edge nodes worldwide. Subsequent requests are served from the nearest edge. With AVIF files already small, the CDN's primary benefit is latency reduction rather than bandwidth savings.
One thing to watch: CDN cache invalidation. When you re-upload an image with the same filename (replacing a photo, for example), the CDN may continue serving the old cached version. Include a version query parameter or use content-based filenames to avoid this:
// Content-based filename approach
$hash = substr(md5_file($sourcePath), 0, 8);
$filename = $slug . '-' . $hash;
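The version query parameter alternative can be sketched as follows, assuming the file's modification time is an acceptable version stamp and that the CDN includes the query string in its cache key. Both are assumptions, and versioned_url() is a hypothetical helper:

```php
<?php
// Append "v=<mtime>" so a replaced file gets a new URL and bypasses stale caches.
function versioned_url(string $url, string $localPath): string
{
    // Missing file: return the URL unchanged rather than emit a bogus version
    $mtime = is_file($localPath) ? filemtime($localPath) : false;
    if ($mtime === false) {
        return $url;
    }
    $separator = strpos($url, '?') === false ? '?' : '&';
    return $url . $separator . 'v=' . $mtime;
}
```

Content-based filenames are the more robust of the two options, since they work even when a CDN is configured to ignore query strings.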
Garbage Collection for Orphaned Files
Over time, files accumulate in the uploads directory that no post or page references. This happens when authors upload an image and decide not to use it, or when posts are deleted without cleaning up their media. These orphaned files waste disk space.
class MediaGarbageCollector
{
    public function collectOrphans(): array
    {
        // 1. Gather all referenced paths from the database
        $referenced = [];
        $posts = $this->db->fetchAll(
            "SELECT featured_image FROM posts WHERE featured_image != ''"
        );
        foreach ($posts as $p) {
            $referenced[$p['featured_image']] = true;
        }
        $media = $this->db->fetchAll("SELECT path FROM media");
        foreach ($media as $m) {
            $referenced[$m['path']] = true;
        }

        // 2. Scan uploads directory
        $orphans = [];
        $iterator = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($this->uploadsDir, FilesystemIterator::SKIP_DOTS)
        );
        foreach ($iterator as $file) {
            if ($file->isDir()) continue;
            $relativePath = str_replace(
                $this->uploadsDir . DIRECTORY_SEPARATOR, '', $file->getPathname()
            );
            // Normalize Windows separators to forward slashes
            $relativePath = str_replace('\\', '/', $relativePath);

            // Strip size suffix and extension to get base path
            $basePath = preg_replace(
                '/-(thumbnail|medium|large|pinterest)\.(avif|webp)$/', '', $relativePath
            );
            $basePath = preg_replace('/\.(avif|webp|jpg|jpeg|png|gif)$/', '', $basePath);

            if (!isset($referenced[$basePath])) {
                $orphans[] = [
                    'path' => $relativePath,
                    'size' => $file->getSize(),
                    'modified' => $file->getMTime(),
                ];
            }
        }
        return $orphans;
    }
}
The garbage collector does not delete files automatically. It returns a list that an admin reviews before confirming deletion. Automatic deletion is dangerous — images might be referenced from custom HTML blocks, email templates, or external content that the scanner cannot detect.
We run this scan weekly via cron. Typical results: 3-5% of files in uploads are orphaned on any given site, corresponding to 50-200 MB on sites with 1000+ posts.
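For the review step, it can help to hide files that were uploaded recently, since an author may have uploaded an image for a post that has not been saved yet. A minimal sketch that filters the collectOrphans() result by age and totals the reclaimable space. The seven-day grace period and the function name are assumptions, not JekCMS behavior:

```php
<?php
// Summarize orphans older than a grace period; input rows use the
// 'path', 'size', and 'modified' keys returned by collectOrphans().
function summarize_orphans(array $orphans, int $now, int $graceDays = 7): array
{
    $cutoff = $now - $graceDays * 86400;
    // Keep only files last modified before the cutoff
    $stale = array_values(array_filter(
        $orphans,
        fn(array $o): bool => $o['modified'] < $cutoff
    ));
    return [
        'files' => $stale,
        'count' => count($stale),
        'bytes' => array_sum(array_column($stale, 'size')),
    ];
}
```

Passing the current time in as a parameter keeps the function deterministic, which makes it easy to check against a fixed list before pointing it at a real uploads directory.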
Common Path Pitfalls
The single most frequent media bug in JekCMS installations is the double uploads/ path. This happens when the database stores a path like uploads/images/2026/03/photo and the URL builder prepends UPLOADS_URL (which already ends in /uploads). The result: https://site.com/uploads/uploads/images/2026/03/photo.avif — a 404.
The rule is simple: the database stores paths relative to the uploads directory, never including the uploads/ prefix itself. The correct format is images/2026/03/photo. The URL builder adds the UPLOADS_URL prefix at render time.
// Correct: database has "images/2026/03/photo"
$url = UPLOADS_URL . '/' . $path;
// Result: https://site.com/uploads/images/2026/03/photo.avif
// Wrong: database has "uploads/images/2026/03/photo"
$url = UPLOADS_URL . '/' . $path;
// Result: https://site.com/uploads/uploads/images/2026/03/photo.avif (404!)
We added a safety check in the upload function to strip the prefix if it accidentally gets included:
if (strpos($path, 'uploads/') === 0) {
    $path = substr($path, 8); // 8 = strlen('uploads/')
}
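A slightly more defensive variant, sketched here as a standalone function, also handles a leading slash, Windows separators, and a prefix that was duplicated more than once. The name normalize_media_path() is illustrative:

```php
<?php
// Reduce any stored path to the form "images/2026/03/photo".
function normalize_media_path(string $path): string
{
    $path = ltrim(str_replace('\\', '/', $path), '/');
    // Strip every leading "uploads/" so "uploads/uploads/x" also becomes "x"
    while (strpos($path, 'uploads/') === 0) {
        $path = ltrim(substr($path, strlen('uploads/')), '/');
    }
    return $path;
}
```

The trade-off is that a real subfolder named uploads directly inside the uploads directory would be stripped too, so this only suits installations that never create one.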
Real Numbers From Production
After deploying the media pipeline to all JekCMS installations, here is the measured impact:
- Average image size: 3.8 MB (original JPEG) to 142 KB (AVIF medium thumbnail) — 96% reduction
- Homepage weight (12 post cards): 18 MB to 1.7 MB
- Largest Contentful Paint: 1.4 seconds improvement average across all sites
- Storage per 1000 posts: 3.2 GB to 890 MB (all thumbnail sizes included)
- Upload processing time: 800ms average for a 4 MB JPEG (EXIF strip + AVIF/WebP + 4 thumbnail sizes)
The 800ms processing time is the tradeoff. Uploading is slower because the server is doing real work. But that 800ms saves megabytes of bandwidth on every page load for the lifetime of that image. Over a month, a single popular post card image viewed 10,000 times saves roughly 36 GB of transfer — from one image.
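The arithmetic behind these figures can be checked in a few lines, using decimal units (1 MB = 1000 KB) and the averages quoted above:

```php
<?php
$originalKb = 3.8 * 1000;   // average original JPEG, in KB
$avifKb     = 142;          // average AVIF medium thumbnail, in KB
$views      = 10000;

// Percentage saved per image, and total transfer saved across all views
$reduction = (1 - $avifKb / $originalKb) * 100;
$savedGb   = ($originalKb - $avifKb) * $views / 1000 / 1000;

printf("Reduction: %.1f%%\n", $reduction);                    // Reduction: 96.3%
printf("Saved over %d views: %.1f GB\n", $views, $savedGb);   // Saved over 10000 views: 36.6 GB
```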
What We Would Do Differently
If I were building this pipeline from scratch today, I would add two features. First, background processing: move the conversion and thumbnail generation to a queue instead of doing it synchronously during upload. The 800ms delay is acceptable, but it adds up when bulk-uploading 50 images for a gallery page.
Second, responsive srcset generation. Instead of fixed thumbnail sizes, generate multiple widths (320, 640, 960, 1280, 1600) and let the browser pick the best one using srcset with width descriptors. This would serve even smaller files on mobile devices, where a 400px-wide card does not need an 800px image.
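A rough sketch of what that output could look like, assuming width-suffixed files such as my-photo-640w.avif exist on disk. That naming scheme, the function names, and the sizes value passed by the caller are all hypothetical, since this feature is not built yet:

```php
<?php
// Build a srcset string with width descriptors, e.g. "…-320w.avif 320w, …-640w.avif 640w".
function build_srcset(string $baseUrl, array $widths, string $format): string
{
    $candidates = [];
    foreach ($widths as $w) {
        // Each candidate is "URL WIDTHw", the width-descriptor form of srcset
        $candidates[] = sprintf('%s-%dw.%s %dw', $baseUrl, $w, $format, $w);
    }
    return implode(', ', $candidates);
}

// Emit a <picture> element where the browser chooses both format and width.
function responsive_picture(string $baseUrl, string $alt, string $sizes): string
{
    $widths = [320, 640, 960, 1280, 1600];
    return sprintf(
        '<picture><source type="image/avif" srcset="%s" sizes="%s">'
        . '<source type="image/webp" srcset="%s" sizes="%s">'
        . '<img src="%s-960w.webp" alt="%s" loading="lazy"></picture>',
        build_srcset($baseUrl, $widths, 'avif'), htmlspecialchars($sizes),
        build_srcset($baseUrl, $widths, 'webp'), htmlspecialchars($sizes),
        $baseUrl, htmlspecialchars($alt)
    );
}
```

The sizes attribute tells the browser how wide the image will be laid out, for example "(max-width: 600px) 100vw, 400px" for a card, and the browser then picks the smallest candidate that covers that width at the device's pixel density.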
Both features are on our roadmap for later this year. For now, the four-size approach covers 95% of use cases well enough that the additional complexity is not justified.