
Posts

Converting string to unique integer in Perl

I'm writing some Perl code for a project I'm working on. One of my needs is to convert a string into a 32-bit integer. The catch is that the conversion must be deterministic (i.e. a hash function). I looked at using CRC32 but the resulting integer wasn't always the same each time. I looked at using MD5, but MD5 produces a 32-character hex string (128 bits) that will overflow a 32-bit integer. I decided to do something a bit weird: because an unsigned 32-bit integer can hold values up to 4,294,967,295, I could take the first 8 characters of the MD5 value and then convert any of the letters to a corresponding digit. The result has at most 8 decimal digits, so it always fits. So the function looks like this in Perl:

    use Digest::MD5 qw(md5_hex);

    # Function to convert string into a unique 32-bit integer
    sub convert_string {
        my ($str) = @_;
        my $md5str = md5_hex($str);              # 32 hex characters
        my $md5strsub = substr $md5str, 0, 8;    # keep the first 8
        $md5strsub =~ tr/a-f/1-6/;               # map hex letters a-f to digits 1-6
        return $md5strsub;
    }
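For comparison, here is a minimal sketch of the same idea in Ruby, the language used elsewhere on this blog. It is not the function from the post: the name string_to_uint32 is hypothetical, and instead of translating letters to digits it parses the first 8 hex characters as a base-16 number. Eight hex digits are exactly 32 bits, so the result always fits in an unsigned 32-bit integer. As with any 32-bit hash, two different strings can still produce the same value, so "unique" only holds in the probabilistic sense.

```ruby
require "digest/md5"

# Hypothetical helper (not from the original post): map a string to a
# deterministic unsigned 32-bit integer.
def string_to_uint32(str)
  # The first 8 hex characters of the MD5 digest hold exactly 32 bits,
  # so parsing them as base 16 never exceeds 4,294,967,295.
  Digest::MD5.hexdigest(str)[0, 8].to_i(16)
end

puts string_to_uint32("hello")
```

Parsing as hex keeps all 32 bits of the prefix, whereas mapping a-f onto 1-6 folds distinct prefixes together (for example, "a" and "1" become the same digit).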

Relevance of "Social Studies" as a college major to product management

Whenever people ask me what I majored in at college, I always hesitate a bit before answering. The real answer is Social Studies, though I typically don't give this answer anymore. People would either give me a puzzled look or a snide comment, such as "Oh, I took that in 5th grade." So I started answering that I majored in sociology. Not many people really know what sociology is either, but at least it doesn't sound like a 5th grade subject. Here's a good explanation of what the Social Studies program is about: Social Studies is a unique program of study at Harvard College. ...It reflects the belief that the study of the social world requires an integration of the disciplines of history and political science, sociology and economics, anthropology and philosophy. Concerned with the fragmentation caused by increasing disciplinary specialization, the faculty and students of Social Studies seek an integrated approach to the study of social phenomena that synthes...

Measuring memory per Unix process

Wow, it's been a long time since I've written anything here. Since my last blog post, I've been taking two classes (in two very different topics), along with the other daily responsibilities of working, raising two children, etc. For a work-related requirement, I had to figure out how to measure memory consumed by a specific Unix/Linux process. This is more difficult than it may seem. For one thing, apparently the memory statistics given by Linux are meaningless. This is mainly because the ps command (and the VSZ metric specifically) only lists the size of the address space referenced by a process, not the actual memory size itself. This page suggests the use of smaps, where /proc/$pid/smaps provides the actual amount of memory used by a process. Because the output of smaps is pretty lengthy, someone wrote a Python script called mem_usage.py to make the output more understandable. The main issue is that smaps only exists in Linux, and I had a requirement to measu...
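To make the smaps idea concrete, here is a rough Ruby sketch of how the per-mapping figures could be totalled. This is not mem_usage.py; the function names smaps_total_kb and process_memory_kb are hypothetical, and the sketch assumes the usual smaps layout in which each mapping reports lines such as "Rss:   40 kB" and "Pss:   20 kB".

```ruby
# Sum one field (in kB) across all mappings in smaps-formatted text.
# The function name and the default field are assumptions for illustration.
def smaps_total_kb(smaps_text, field = "Rss")
  smaps_text.each_line.sum do |line|
    match = line.match(/\A#{Regexp.escape(field)}:\s+(\d+)\s+kB/)
    match ? match[1].to_i : 0
  end
end

# Read /proc/<pid>/smaps and total the requested field (Linux only).
def process_memory_kb(pid, field = "Rss")
  smaps_total_kb(File.read("/proc/#{pid}/smaps"), field)
end

# Only attempt the read where smaps exists; other Unix systems lack it.
if File.readable?("/proc/self/smaps")
  puts "Rss of this process: #{process_memory_kb('self')} kB"
end
```

Passing "Pss" instead of "Rss" totals the proportional set size, which divides shared pages among the processes that map them.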

Merits of Customer Development

I wouldn't be a card-carrying product manager without some thoughts on product management. My latest product management interest has been following a trend known as Customer Development, advocated by Steve Blank. It's a business model for developing new products, mostly applicable to startups but I think it also applies to more established companies. The main message is actually fairly intuitive: you develop a successful product by continually iterating it in a tight feedback loop between developing the product, getting customer input, and then making changes. Rather than spend a lot of time upfront in creating the product, you develop a minimum viable product (MVP): the product with just the necessary features to get money and feedback from early adopters. Then you let your early adopter customers tell you what works well and what needs to be changed. This model sounds intuitive but by far the most prevalent development model for Silicon Valley startups looks somethi...

Displaying Ruby code on a blog

I spent a bit of time trying to figure out the best way to display Ruby code on this blog. Initially I just converted the code to Courier font but it looked ugly and was hard to work with. I eventually found this link on Stack Overflow: http://stackoverflow.com/questions/1644201/how-can-i-display-code-better-on-my-blogger-blog There were a few other methods I found, such as this strategy to convert the code via a Ruby script to HTML and then put it in the clipboard. But I liked the Stack Overflow method the best because all I need to do is add these HTML tags to the code: <pre class="brush: ruby" name="code"> (Code) </pre> Here's an example of what it looks like:

    class Foo
      def bar
      end
    end

And here is how to widen the main text column so the code doesn't constantly get wrapped: http://johndeeremom.blogspot.com/2008/07/how-to-widen-your-columns-on-blogger.html

Build simple PDF search engine in Ruby (Part 1)

I decided to build a simple Ruby search engine to search through PDFs. The main application was that I wanted a quick way to search through songsheets on my church's Web site. I didn't want to repeatedly look through different PDFs to find the song I was interested in. I was mostly inspired by this example of someone who had written a search engine in 200 lines of Ruby. I knew my program would be much easier because it didn't need to support any crawling; just indexing and querying. The first challenge was to find a Ruby library that would parse PDFs. I ultimately settled on this because it was easy to work with. It's basically just a Ruby wrapper around pdftohtml that provides high level access to the text objects of a PDF. I don't care about layout, graphics, etc. so this was sufficient. The PDF code mostly works without problems but it assumes that the directory for pdftohtml exists in $PATH. I used MacPorts to compile pdftohtml so it was stored in /opt...
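The "just indexing and querying" part can be sketched as a small in-memory inverted index. This is only an illustration, not the code from the post: the class name TinyIndex, its methods, and the sample file names are all hypothetical, and it assumes the text of each PDF has already been extracted into a plain string.

```ruby
require "set"

# Hypothetical inverted index: each word maps to the set of documents
# that contain it. Text extraction from PDFs is assumed to happen elsewhere.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Index the plain text already extracted from one document.
  def add(doc_name, text)
    tokenize(text).each { |word| @postings[word] << doc_name }
  end

  # Return the sorted names of documents containing every query word.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?

    words.map { |word| @postings.fetch(word, Set.new) }.reduce(:&).to_a.sort
  end

  private

  # Lowercase the text and split it into simple word tokens.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

# Made-up sample data standing in for extracted songsheet text.
index = TinyIndex.new
index.add("amazing_grace.pdf", "Amazing grace, how sweet the sound")
index.add("how_great.pdf", "How great thou art")
p index.search("amazing grace")
```

Intersecting the per-word sets means a query only matches documents that contain all of its words, which suits looking up a song by a remembered phrase.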

Purpose of this blog

Hi, I wanted to create this blog to document some of the coding work I'd like to do, mostly in Ruby and iPhone development. This is mostly for me so I don't forget what I work on, but I hope it's helpful to anyone out there. I'm still primarily interested in Ruby on Rails but Sinatra is looking very interesting as a simple Ruby framework for building applications without a database. Stay tuned for more stuff!