Skip to main content

Converting string to unique integer in perl

I'm writing some perl code for a project I'm working on. One of my needs is to convert a string into a 32-bit integer. The catch is that the conversion must be deterministic (i.e. a hash function).

I looked at using CRC32 but the resulting integer wasn't always the same each time. I looked at using MD5, but MD5 produces a very long hex string that will overflow a 32-bit integer.

I decided to do something a bit weird; because a 32 bit integer has up to 4,294,967,295 values (signed), I could take the first 8 characters in the MD5 value and then convert any of the letters to a corresponding number.

So the function looks like this in perl:

sub convert_string
# Function to convert string into a unique 32-bit integer
{
 my ($str) = @_;
 
 my $md5str = md5_hex($str);
 my $md5strsub = substr $md5str, 0, 8;
 $md5strsub =~ tr/a-f/1-6/;
 return $md5strsub;
}

Comments

Popular posts from this blog

Building a Hadoop cluster

I've recently had to build a Hadoop cluster for a class in information retrieval . My final project involved building a Hadoop cluster. Here are some of my notes on configuring the nodes in the cluster. These links on configuring a single node cluster and multi node cluster were the most helpful. I downloaded the latest Hadoop distribution then moved it into /hadoop. I had problems with this latest distribution (v.21) so I used v.20 instead. Here are the configuration files I changed: core-site.xml: fs.default.name hdfs://master:9000 hadoop.tmp.dir /hadoop/tmp A base for other temporary directories. hadoop-env.sh: # Variables required by Mahout export HADOOP_HOME=/hadoop export HADOOP_CONF_DIR=/hadoop/conf export MAHOUT_HOME=/Users/rpark/mahout PATH=/hadoop/bin:/Users/rpark/mahout/bin:$PATH # The java implementation to use. Required. export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home hdfs-site...

The #1 Mistake Made by Product People at All Levels

In my 20+ year career in product management for B2B enterprise companies, I have seen product managers at every level make a certain kind of mistake. It is so easy to make that I occasionally make it myself when I'm not careful. What is this mistake? It is to spend too much time on tasks and deliverables that are not core to the product function, which is to to determine and define products to be built. If you keep falling into this trap then ultimately you can't be effective at your job and your company won't sell compelling products. Your primary job as a product manager is to figure out what your market and customers need and make sure it gets built. If you aren't careful, you can spend all of your time performing tasks in support of products such as sales enablement, customer success, product marketing, and pre-sales. How Do You Know This Is Happening? It is easy to fall into this trap for many reasons. Here are a few scenarios that come to mind: Product Marketing ...

Connecting to SQL Server from OS X perl

I've been spending my coding time in the offhours working on Perl instead of Ruby. My coding time in general has been very limited, which is part of the reason for the length of time between updates. :) My latest project is to pull data out of a Microsoft SQL Server database for analysis. I'm using perl for various reasons: I need a crossplatform environment, and I need certain libraries that only work on perl. Some of the target users for my code run on Windows. I know that Ruby runs on Windows but it's not the platform of choice for Ruby developers. The vast majority seem to develop either on OS X or Linux. So Ruby on Windows isn't at the maturity that ActiveState perl is on Windows. In fact, I don't even run native perl anymore on my MacBook Pro. I've switched over to ActiveState perl because I don't need to compile anything every time I want to install new CPAN libraries. And because it's ActiveState, I'm that much more confident it will w...