Thursday, July 1, 2010

Converting a string to a unique integer in Perl

I'm writing some Perl code for a project I'm working on. One of my needs is to convert a string into a 32-bit integer. The catch is that the conversion must be deterministic: the same string must always produce the same integer (i.e. a hash function).

I looked at using CRC32, but the resulting integer wasn't always the same each time. I looked at using MD5, but an MD5 digest is a 32-character hex string (128 bits), which is far too large for a 32-bit integer.
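To make the size problem concrete, here is a minimal sketch that prints the length of an MD5 hex digest. It assumes the core Digest::MD5 module is available; the input string is only a placeholder.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# "example string" is an arbitrary placeholder input.
my $digest = md5_hex("example string");

# Every MD5 hex digest is 32 characters long, and each hex character is 4 bits.
my $chars = length($digest);
my $bits  = $chars * 4;

print "$chars hex characters, $bits bits\n";
```

A 32-bit integer only has room for 8 of those hex characters.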

I decided to do something a bit weird: because an unsigned 32-bit integer can hold values up to 4,294,967,295, any 8-digit decimal number fits. So I could take the first 8 characters of the MD5 hex digest and convert each of the letters (a-f) to a corresponding digit.
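The arithmetic behind that can be checked with a small sketch. The variable names are mine, chosen for illustration; the only assumption is that the mapped result is always 8 decimal digits.

```perl
use strict;
use warnings;

# Largest value the mapping can produce: eight digits, each at most 9.
my $worst_case   = 99_999_999;
my $max_unsigned = 2**32 - 1;    # 4,294,967,295
my $max_signed   = 2**31 - 1;    # 2,147,483,647

print "fits in an unsigned 32-bit integer\n" if $worst_case <= $max_unsigned;
print "fits in a signed 32-bit integer\n"    if $worst_case <= $max_signed;
```

Both checks pass, so the result is safe whether the integer ends up stored as signed or unsigned.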

So the function looks like this in Perl:

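use Digest::MD5 qw(md5_hex);

# Function to convert a string into a 32-bit integer (same input, same output)
sub convert_string {
    my ($str) = @_;
    my $md5str = md5_hex($str);              # 32 hex characters
    my $md5strsub = substr $md5str, 0, 8;    # keep the first 8
    $md5strsub =~ tr/a-f/1-6/;               # map the hex letters a-f to digits 1-6
    return $md5strsub;
}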
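
Here is a short usage sketch, repeated in full so it runs on its own. It also includes a variant, convert_string_hex, which is my own hypothetical addition rather than part of the approach above: it reads the same 8 characters as a hexadecimal number, which uses the full unsigned 32-bit range instead of only 8 decimal digits.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Same approach as above: first 8 hex characters, letters mapped to digits.
sub convert_string {
    my ($str) = @_;
    my $md5strsub = substr(md5_hex($str), 0, 8);
    $md5strsub =~ tr/a-f/1-6/;
    return $md5strsub;
}

# Hypothetical variant (not the method above): parse the same 8 characters as hex.
sub convert_string_hex {
    my ($str) = @_;
    return hex(substr(md5_hex($str), 0, 8));
}

# The repeated input should print the same values on both lines.
for my $s ("hello", "hello", "world") {
    printf "%-6s %s %u\n", $s, convert_string($s), convert_string_hex($s);
}
```

Either way the result is a hash, so two different strings can still map to the same integer; the mapping of letters to digits makes that somewhat more likely, because for example "a" and "1" in the digest both become 1.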