Delete duplicate e-mail messages

If you need to delete duplicate e-mail messages on an IMAP server, take a look at this useful IMAP de-duplicator script:

IMAP de-duplicator – IMAPdedup

As IMAPdedup is a command-line tool (a Python script), it’s particularly useful for:

  • automated deletion of duplicates (it can be called from other scripts)
  • very large mailboxes or accounts with many subfolders (no user intervention is required)
  • servers on which you have console/shell access (you can run the script on the server itself, which speeds up de-duplication further)

I also found that it deals relatively well with failures (e.g. when a folder is read-only and hence messages can’t be deleted): it simply reports them on the screen and carries on.
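Since failures are only reported on the screen, an unattended run (from cron, say) may want to keep that output somewhere. Here is a minimal sketch of a logging wrapper. The names IMAPDEDUP, LOGFILE and run_logged are my own illustrative choices and not part of IMAPdedup:

```shell
#!/bin/sh
# Sketch: run IMAPdedup unattended and keep everything it prints.
# IMAPDEDUP and LOGFILE are assumed names, not part of IMAPdedup itself.
IMAPDEDUP="${IMAPDEDUP:-imapdedup.py}"
LOGFILE="${LOGFILE:-$HOME/imapdedup.log}"

# run_logged ARGS...: run the tool with ARGS, append its stdout and stderr
# to LOGFILE under a timestamp line, and return the tool's exit status.
run_logged() {
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$IMAPDEDUP" "$@" 2>&1
    } >> "$LOGFILE"
}
```

A call would then look like `run_logged -s my.server.com -x -u mylogin -w mypass INBOX`. Whether IMAPdedup signals a failed deletion through its exit status is something to check for your version, so the log file is the safer place to look.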

Here’s a quick’n’dirty shell script to de-dup the inbox and all subfolders of the specified account:

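#!/bin/sh
# Delete all duplicate messages in all folders of said account.
# Note that we connect through SSL (-x) to the default port.
# Caution: a password passed via -w is visible in the process list.

SERVER="my.server.com"
USER="mylogin"
PASS="mypass"

# List the folders (-l) and read them one per line, so that folder
# names containing spaces are passed on intact.
imapdedup.py -s "$SERVER" -x -u "$USER" -w "$PASS" -l |
while IFS= read -r folder
do
 # stdin comes from /dev/null so the folder list is not consumed here.
 imapdedup.py -s "$SERVER" -x -u "$USER" -w "$PASS" "$folder" < /dev/null
done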
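Before letting a script like this loose on a big account, it can be worth previewing what would be removed. The sketch below wraps the loop in a function that takes an optional extra option. It assumes your version of IMAPdedup supports -n (dry run), so check the tool's help output first. The function name dedup_all and the IMAPDEDUP variable are my own illustrative names:

```shell
#!/bin/sh
# Sketch: preview duplicates before deleting anything.
# Assumes imapdedup.py supports -n (dry run); verify with its help output.
IMAPDEDUP="${IMAPDEDUP:-imapdedup.py}"   # assumed variable, eases testing

SERVER="my.server.com"
USER="mylogin"
PASS="mypass"

# dedup_all [EXTRA_OPTION]: list all folders, then process them one by one.
# Pass -n to only report duplicates; pass nothing to really delete them.
dedup_all() {
    "$IMAPDEDUP" -s "$SERVER" -x -u "$USER" -w "$PASS" -l |
    while IFS= read -r folder
    do
        # "$@" expands to nothing when no extra option was given.
        "$IMAPDEDUP" -s "$SERVER" -x -u "$USER" -w "$PASS" "$@" "$folder" < /dev/null
    done
}
```

Running `dedup_all -n` first and `dedup_all` afterwards gives you a chance to read through the report before any message is touched.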

If you only have to de-duplicate messages in a small folder, you could also use the following de-duplication add-on for Mozilla Thunderbird:

Remove Duplicate Messages Add-on for Thunderbird

Note, however, that the ‘Remove Duplicate Messages’ add-on is intended for interactive use only, not for batch processing. I also noticed that it fails to clean large mail folders (e.g. ones containing 50,000 messages).