{"id":629,"date":"2025-06-12T18:15:46","date_gmt":"2025-06-12T18:15:46","guid":{"rendered":"https:\/\/mouryasolutions.in\/blog\/?p=629"},"modified":"2025-06-12T18:15:47","modified_gmt":"2025-06-12T18:15:47","slug":"removing-duplicate-images-in-php-using-md5-hashing","status":"publish","type":"post","link":"https:\/\/mouryasolutions.in\/blog\/removing-duplicate-images-in-php-using-md5-hashing\/","title":{"rendered":"Removing Duplicate Images in PHP Using MD5 Hashing"},"content":{"rendered":"<div class=\"post-content\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>When working with image directories\u2014especially in content-heavy applications or media galleries\u2014duplicate files can waste storage space and clutter your filesystem. Thankfully, with a few lines of PHP, we can automatically detect and move these duplicate files using a hash-based approach.<\/p>\n\n\n\n<p>In this post, we\u2019ll walk through a simple PHP script that finds duplicate images by their file content (not just the name) using the <code>md5<\/code> hashing algorithm and moves them into a separate folder.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2>\u2705 What This Script Does<\/h2>\n\n\n\n<ul>\n<li>Scans every file in a given directory (here, an <code>images\/<\/code> folder).<\/li>\n\n\n\n<li>Computes an MD5 checksum for each file.<\/li>\n\n\n\n<li>If the checksum has already been seen, it treats the file as a duplicate.<\/li>\n\n\n\n<li>Moves duplicates to a separate directory (<code>duplicate_images\/<\/code>).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2>\ud83d\udee0\ufe0f The PHP Script<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;?php\n\/\/ Convert forward slashes to the current platform directory separator\nfunction platformSlashes($path) {\n    return str_replace('\/', DIRECTORY_SEPARATOR, $path);\n}\n\n$mdir = \"D:\\\\justest\\\\\"; \/\/ Base directory (with trailing separator)\n$dir = $mdir . \"images\"; \/\/ Directory containing images\n\n$checksums = array();\n\nif ($h = opendir($dir)) {\n    while (($file = readdir($h)) !== false) {\n        \/\/ Build the full, platform-normalized path to this entry\n        $main_dir = platformSlashes(\"{$dir}\/{$file}\");\n\n        \/\/ Skip directories, including the . and .. entries\n        if (is_dir($main_dir)) continue;\n\n        \/\/ Generate MD5 hash of the file contents\n        $hash = hash_file('md5', $main_dir);\n\n        \/\/ Skip files that could not be read\n        if ($hash === false) continue;\n\n        \/\/ Destination path for duplicates, built from the base directory\n        \/\/ so that \"images\" elsewhere in the path or file name is left untouched\n        $dup_dir = platformSlashes(\"{$mdir}duplicate_images\/{$file}\");\n\n        \/\/ Check if this hash has already been encountered\n        if (in_array($hash, $checksums)) {\n            \/\/ Move the duplicate file to another folder\n            rename($main_dir, $dup_dir);\n        } else {\n            \/\/ Store the hash to detect future duplicates\n            $checksums&#91;] = $hash;\n        }\n    }\n    closedir($h);\n}\n\n\/\/ Output the checksums (optional)\nprint_r($checksums);\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2>\ud83d\udcc2 Folder Structure Before &amp; After<\/h2>\n\n\n\n<p><strong>Before:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>D:\\justest\\\n\u2502\n\u251c\u2500\u2500 images\\\n\u2502   \u251c\u2500\u2500 img1.jpg\n\u2502   \u251c\u2500\u2500 img1_copy.jpg  \u2190 duplicate\n\u2502   \u251c\u2500\u2500 img2.png\n<\/code><\/pre>\n\n\n\n<p><strong>After:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>D:\\justest\\\n\u2502\n\u251c\u2500\u2500 images\\\n\u2502   \u251c\u2500\u2500 img1.jpg\n\u2502   \u251c\u2500\u2500 img2.png\n\u2502\n\u251c\u2500\u2500 duplicate_images\\\n\u2502   \u251c\u2500\u2500 img1_copy.jpg\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2>\ud83d\udd12 Why Use Hashing?<\/h2>\n\n\n\n<p>Using a hashing function like <code>md5<\/code> lets us compare files based on <strong>content<\/strong> rather than name or size alone. 
While <code>md5<\/code> is not suitable for cryptographic security, it&#8217;s fast and works well as a checksum for spotting accidental duplicates. If the files come from untrusted sources, where a collision could be crafted deliberately, pass <code>'sha256'<\/code> to <code>hash_file()<\/code> instead.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2>\ud83d\udccc Notes<\/h2>\n\n\n\n<ul>\n<li>Make sure the <code>duplicate_images<\/code> directory exists before running the script, since <code>rename()<\/code> fails when the target directory is missing, or add logic to create it (for example with <code>mkdir()<\/code>).<\/li>\n\n\n\n<li>The logic is platform-independent, but the example uses a Windows-style base path. Change <code>$mdir<\/code> accordingly on Linux\/macOS.<\/li>\n\n\n\n<li>To delete duplicates instead of moving them, replace the <code>rename()<\/code> call with <code>unlink($main_dir)<\/code>.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2>\ud83d\udca1 Wrapping Up<\/h2>\n\n\n\n<p>With this quick script, you can clean up your image directories automatically, saving storage and keeping your media organized. You can extend it further by integrating it into your CMS, running it from a cron job, or adding a logging system.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>When working with image directories\u2014especially in content-heavy applications or media <a href=\"https:\/\/mouryasolutions.in\/blog\/removing-duplicate-images-in-php-using-md5-hashing\/\">Read More<i class=\"fa fa-long-arrow-right\" aria-hidden=\"true\"><\/i><\/a><!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt 
--><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[10],"tags":[],"_links":{"self":[{"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/posts\/629"}],"collection":[{"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/comments?post=629"}],"version-history":[{"count":1,"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/posts\/629\/revisions"}],"predecessor-version":[{"id":630,"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/posts\/629\/revisions\/630"}],"wp:attachment":[{"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/media?parent=629"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/categories?post=629"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mouryasolutions.in\/blog\/wp-json\/wp\/v2\/tags?post=629"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}