Delete duplicate e-mail messages

If you need to delete duplicate e-mail messages on an IMAP server, take a look at this useful IMAP de-duplicator script:

IMAP de-duplicator – IMAPdedup

As IMAPdedup is a command line interface tool (a python script), it’s particularly useful for:

  • automated deletion of duplicates (as it can be called from other scripts)
  • extraordinarily big mailboxes or if you have many subfolders (as there’s no intervention by the user required)
  • if you have console/shell access to the IMAP server (as you can then run the script on the server itself, speeding the de-duplication process further up)

I also found that it deals relatively well with failures (e.g. when a folder is read-only and hence messages can’t be deleted): It simply reports them on the screen and carries on.

Here’s a quick’n’dirty bash script to de-dup the inbox and all subfolders of the specified account:

#!/bin/sh
# Delete all duplicate messages in all folders of said account.
# Note that we connect through SSL (-x) to the default port.

SERVER="my.server.com"
USER="mylogin"
PASS="mypass"

for folder in `imapdedup.py -s $SERVER -x -u $USER -w $PASS -l`;
do
 imapdedup.py -s $SERVER -x -u $USER -w $PASS $folder
done

If you only have to de-duplicate messages in a small folder, you could also use the following de-duplication add-on for Mozilla Thunderbird:

Remove Duplicate Messages Add-on for Thunderbird

Note however that the ‘Remove Duplicate Messages’ add-on is intended for interactive use only, not for batch processing. I also noticed that it fails at cleaning big mail folders (e.g. containing 50’000 messages).

 

Notes on tracing code execution in Django and Python « SaltyCrane Blog

Eliot from the SaltyCrane blog wrote a nice Django management command that allows to easily trace a Django runserver simply by executing ./manage.py trace runserver. Works great!

Django trace tool, django-trace is [..] a Django management command that uses sys.settrace with other Django management commands. https://github.com/saltycrane/django-trace.

via Notes on tracing code execution in Django and Python « SaltyCrane Blog.

Skype 5.3.0.116 – a memory hog with memory leaks

Just take a look at the following screenshot I just took, showing two Skype 5.3.0.116 instances running on a current Windows 7 box with 4 GB of RAM:

That’s 330 MB of private memory for each instance at this very moment! Note that these numbers are steadily growing (at about 2 KB/s) for both processes – for no apparent reason. A hint, that there’s likely a memory leak somewhere in Skype.

Let’s hope Microsoft will rewrite Skype from scratch (The current code-base probably isn’t worth refactoring). I’m confident they don’t lack the human and financial resources to do it. It can only get better.

Kimai – Open Source Time Tracking Tool

So far, I’ve always used “good old” spreadsheets for time tracking on projects. Custom ones I pimped up with some nifty formulae, but still just spreadsheets. Advantage: I can easily adjust them to any special needs anytime – be it the inclusion or exclusion of specific work or just a customization of the sheet’s design or layout. The price for this flexibility is the generally higher effort to track the time “manually” rather than using a specialized time tracking tool – which makes time tracking a tedious task.

Of course I’ve evaluated many proprietary and open source time tracking tools over the years, but so far, none of them managed to fully convince me.

Today, I’ve just stumbled over Kimai – an open source, web-based time tracking tool written in PHP. And so far, Kimai looks promising. Installation is dead easy – just make sure you’ve compiled PDO support into PHP (Gentooers: enable the PDO flag for dev-lang/php and remerge php), else the nice web-based installation wizard will abort without printing any error message.

Once you’ve logged in, you’ll be presented a very clean, intuitive GUI where you can setup customers, projects and tasks. On the top-right there’s a big push-button to start/stop/pause the time tracking.

During my quick evaluation, I haven’t found the functionality yet to export the timesheets, but as far as I know, such functionality will be provided by extensions that can be installed. Let’s see. [Addition 20091009: There’s a stats extension quick-hack for Kimai 0.8.x that can be used to list and print selected reports. To use it, simply download it, extract it in the extensions folder and navigate to {Kimai install folder}/extensions/stats/]

Here’s a screenshot of Kimai 0.8.1.890:

Kimai 0.8.1.890 Screenshot
Kimai 0.8.1.890 Screenshot

With the currently still very limited feature-set, Kimai doesn’t compete with full-grown project management solutions (I’ve recently seen a quick demo of a very sophisticated and cool, Django-based project management solution I’m not allowed to tell any details about yet). But it looks like a promising start. I hope the Kimai project will gain momentum, grow and mature as there’s definitely a need for open source time tracking tools – particularly web-based ones.

P.S. I haven’t had the time yet to audit Kimai’s source code, but if the orderly, clean GUI is any indication, it can’t be too bad.

Django custom model field for an unsigned BIGINT data type

Web 2.0 social media platforms tend to think “big”. They hence often use big integers (8 bytes / 64 bits long instead of just 4 bytes / 32 bits like a normal integer) for user IDs (or sometimes message IDs) to be prepared for even the most extreme potential future growth of their user base. Usually, these big integers are unsigned, allowing for up to 18’446’744’073’709’551’615 UIDs to be stored (which is probably enough to register the inhabitants of quite a few other blue planets too ;).

Facebook, with currently more than 300 million active users, also uses  a 64 bit unsigned integer for storing user IDs and expects Facebook applications to be able to handle this. Of course, 300 M user IDs would still easily fit into a 32 bit unsigned integer, but Facebook already goes beyond the 32 bit limit by issuing 15 digit UIDs like 100’000’xxx’xxx’xxx to registered test users (which allows Facebook to better distinguish between test accounts and real accounts).

Now if you happen to use Django to build your Facebook application, this fact needs special attention as Django doesn’t support 64 bit integer field types for its ORM models by default. As a Django developer, you could thus resort to using a CharField for storing Facebook UIDs (which would be odd) or, better, define a custom model field you can use in your models instead of IntegerField. Fortunately, Django offers an elegant way to define custom model fields. You can write your custom PositiveBigIntegerField by simply subclassing (extending, inheriting from) models.PositiveIntegerField:

So, in your models.py add the following code:

from django.db import models
from django.db.models.fields import PositiveIntegerField

class PositiveBigIntegerField(PositiveIntegerField):
    """Represents MySQL's unsigned BIGINT data type (works with MySQL only!)"""
    empty_strings_allowed = False

    def get_internal_type(self):
        return "PositiveBigIntegerField"

    def db_type(self):
        # This is how MySQL defines 64 bit unsigned integer data types
        return "bigint UNSIGNED"

class Mytest(models.Model):
    """Just a test model"""

    huge_id = PositiveBigIntegerField()

    def __unicode__(self):
        return u'id: %s, huge_id: %s' % (self.id, self.huge_id)

(NB: The “Mytest” class is just for testing the PositiveIntegerField definition, it’s not part of the PositiveIntegerField definition.)

Note that this solution only works for MySQL as a database backend (as MySQL supports the “bigint UNSIGNED” data type for columns which isn’t defined in the SQL standard).

For testing, define a “Mytest” model as shown above and execute “python manager.py syncdb” to create a new myapp_mytest table with an unsigned bigint(20) column named huge_id. Register this new model “Mytest” in admin.py, restart runserver and you’ll be able to enter 64 bit integer values through Django’s admin application.

The only minor “issue” is that Django admin’s CSS class (.vIntegerField) used for HTML form input fields representing integer values defines the width as “5em” which is a bit too narrow to display the entire 64 bit integer. This can be adjusted however (e.g. by writing your own ModelForm and telling ModelAdmin to use that, see the Django admin documentation and the Widget.attrs documentation).

P.S. Note that for Django to be able to access and use a “bigint UNSIGNED” data type, you don’t necessarily need to define a PositiveBigIntegerField and adjust your models. Instead, you could simply adjust the column type in MySQL accordingly as a quick-fix. If you use syncdb (like most Django devs) and want it to create your tables and columns correctly however, defining a custom model type as described is the way to go and strongly recommended for consistency and QA.

Thunderbird Add-on: S/MIME Security for Multiple Identities

I’ve just found and installed the following add-on for Mozilla Thunderbird:

S/MIME Security for Multiple Identities

It allows you to use a different S/MIME certificate for each of the different identities (i.e. “e-mail address aliases” or “profile aliases”) you defined in your Thunderbird profile.

It’s currently still marked as an experimental add-on and I’ve noticed a minor glitch in v0.3.0 when using it (see my add-on review), but this might also be related to the fact that I also use the Virtual Identity add-on (another nice add-on which allows you to use an arbitrary sender address for sending messages).

The “S/MIME Security for Multiple Identities” add-on is very convenient if you have multiple e-mail accounts and want to use S/MIME message signing and/or encryption with all of them.

Bonus hint: You can get your own, personal S/MIME certificates for free at Thawte (for e-Mail only) or StartCom/StartSSL (also offers free class 1 SSL/TLS certificates for FTP servers, web servers etc. -> the latter don’t “work” with Internet Explorer, however).

Simultaneously using multiple accounts with Skype 4.0

As an addendum to my earlier post, I’ve just noticed that a) Skype 4.0 Beta 2 runs pretty stable on Windows Vista 64 and b) has built-in support for managing multiple Skype accounts! With the help of this feature, you could for example set up a private account and a business account and use both of them at the same time, using the same Windows user account. The setup is straightforward:

1) Install Skype 4.0 Beta 2

2) Create a shortcut to Skype.exe and place it on the quick launch bar. Rename the shortcut to “Skype Private Account”, for example. Start Skype using this shortcut and setup your first account (in this case, your private account).

3) Create another shortcut to Skype.exe (add it to the quick launch bar, too) and name it “Skype Business Account”. Open the “Properties” dialog of this shortcut by right-clicking on it. In the “Target” text field of the properties dialog box, append ” /secondary” (without double quotes) to the Skype.exe path that is already there. For example, in my case, the “Target” text field contains:

“C:\Program Files (x86)\Skype\Phone\Skype.exe” /secondary

Rename this shortcut to “Skype Business Account” (right-click->rename). Having done this, start Skype using this shortcut. Skype will then prompt you to enter the credentials of another Skype account of yours (in this example, of your business account).

You can also choose different icons for the two shortcuts. Further, I’d assume the /secondary feature isn’t limited to managing two Skype accounts, though I haven’t tested it with more than two accounts. The main advantage of this “/secondary” feature is that you don’t need to have a separate Windows user account for each of your Skype accounts. Note however that, even with this solution, a new instance of Skype will be created for each of your Skype accounts – every instance consuming about 40 MB of RAM.

I think that’s a very useful feature and I like it a lot.

Google Chrome from a business and “techie” view

If Google will really deliver what it promises with its new Chrome browser plans (Google blog) (personally, I have no doubts about this), the line between web applications and standalone applications will further blur and hereby enable a better, seamless user experience and probably a whole new class of powerful applications.

Some thoughts:

  • From a technical point of view, Google’s Chrome will be the WebOS others have been dreaming about for a long time already. It basically offers memory management, process management, markup renderers, a GUI and a VM with a JIT compiler (V8).
  • It will finally unify the ideas behind the WebOS, “The network is the computer”, cloud computing, SaaSRIA and virtualization.
  • Actually, it’s astonishing it took so long for someone to come up with something like this. The concepts as such are not new at all, but the combination of all these different concepts is what makes the thing cool. It’s typical for a good idea that, once you’ve heard of it, you almost can’t imagine living without it anymore, as it seems all so natural.
  • Detachable tabs on top: Not a new idea either. For example, I remember that the Fluxbox window manager actually offered the same feature back in 2001/2002 (or even earlier) already. I remember it as I used it myself too (and I liked it a lot, despite of its “suboptimal” scalability), as illustrated in these animations:
    Fluxbox Window Grouping Feature (2002) 1/2 (small animated GIF screenshot)
    Fluxbox Window Grouping Feature (2002) 2/2 (large animated GIF screenshot)  

    I guess there were other window managers and GUIs that had the same features even before fluxbox had them.

  • With this move, Google will be gradually taking control and power away from traditional Desktop OS manufacturers such as Microsoft and Apple. Being open source, Chrome and its components like V8 will be the “Linux of the web” and thus a big threat particularly to Microsoft that still generates most of its revenue with Windows and standalone applications like Office.
  • The ongoing process, that (desktop) operating systems are becoming commodities more and more, will further be accelerated. Will there be an “unsacred” alliance between Apple and Microsoft to fight these tendencies or will they shift their businesses further into the “web” application (SaaS), content (music, videos, TV, e-books, multimedia etc.) and lifestyle (design, hardware, ethics) spaces?
  • Of course that’s in the best interest of Google (as their business is data/content and webapps/SaaS). I wouldn’t call this move an evil move, but it’s definitely not a friendly move in the eyes of the competition.
  • From a “techie” point of view, this move will enable many interesting applications in the future. As the framework will be open source, the dev community will potentially be as vital and dynamic as in other high-profile OSS projects (like Mozilla, Linux)
  • What about the Mozilla, Safari, IE, Opera camps? They will have to adapt themselves to the concept and try to top it. IE (and perhaps also Safari) might try to take the “embrace and extend” route.
  • With the birth of the WebOS, there will probably be a need of an open, standardized webapp GUI toolkit and webapp GUI guidelines soon (and there’s a big potential for conflicts here). Who will provide these? What will be the roles of the current big players? Also, standardized, open specs for user authentication and user data exchange will be required – here, there’s already some progress with OpenID, OAuth etc.
  • I like that Google communicates its plans using an easy-to-follow cartoon and that they give credit to individual internal and external contributors and players (though I assume there were much more people involved in the process than those mentioned)
  • The thing that disappoints me a bit is that when talking about V8, they only talk about targeting JavaScript. I’d prefer a more generic approach providing a VM and JIT for various languages (similarly to a CLI VM – why not re-use/extend Mono, for example?). Maybe that’s what V8 actually provides and they just don’t emphasize it at this point in order to not confuse or upset end-users, devs, big players etc..
  • Taking a look at the big picture, it seems that there’s a very pragmatic driver behind this whole development: It’s the laziness of us end-users (just as a fact, not meant in a negative sense – being “lazy” is usually quite rational). Or in other words: The information takes the line of the least resistance. And so far, that line for the “Network OS” happens to be the web, i.e. basically HTTP, despite of its known shortcomings.

[UPDATE 20080902: Corrected a typo. And here’s a statement regarding the Google Chrome news by John Lilly, CEO of Mozilla Corp.]