I wouldn't be a card-carrying product manager without some thoughts on product management.
My latest product management interest has been following a trend known as Customer Development, advocated by Steve Blank. It's a business model for developing new products, mostly applicable to startups but I think it also applies to more established companies.
The main message is actually fairly intuitive: you develop a successful product by continually iterating it in a tight feedback loop between developing the product, getting customer input, and then making changes. Rather than spend a lot of time upfront in creating the product, you develop a minimum viable product (MVP): the product with just the necessary features to get money and feedback from early adopters. Then you let your early adopter customers tell you what works well and what needs to be changed.
This model sounds intuitive but by far the most prevalent development model for Silicon Valley startups looks something like this:
1. Get excited about an idea. Start doing some research regarding markets, customers, pricing, etc.
2. Build the product, along with accompanying sales tools (demo, PowerPoint slides, data sheets, etc.). Start building a sales force to sell the product.
3. Work with a small group of alpha/beta customers. Enlist a PR agency to start building "buzz."
4. Officially release the product in a public launch event, hopefully getting lots of attention from a site such as Digg or TechCrunch. Go full steam ahead in selling and marketing the product.
Steve Blank calls the model I just described the Product Development model, and he labels it "the leading cause of startup death". You can read more about why on his blog, starting with this link.
I've started reading Steve's book, Four Steps to the Epiphany. I first heard about the book on Marc Andreessen's blog. I was intrigued by his recommendation because Marc has some really great thoughts on the idea of product-market fit - the idea that what matters most in determining the success of a product is how much it fits what the market needs. It turns out that Marc was borrowing concepts from Steve Blank's book!
The book is laid out more like a manual but from what I can tell, it has great content.
In fact, I don't know why we don't hear more about Steve Blank and Customer Development. The model makes so much sense because it basically says that entrepreneurs may have vision but they aren't fortune tellers. They can't predict exactly what people need so there is a constant need to go back and iterate.
Maybe this model isn't so popular because it flies in the face of conventional wisdom? Imagine if all startups started to follow this model of starting with a minimum viable product and holding off on enlisting the professional sales force, marketing team, PR agency, etc. upfront. This would certainly change the economics of the startup industry in regions such as Silicon Valley.
Here's another fascinating article in line with the idea of Customer Development. It seems almost strange that a Web site could make 50 changes to its production system every day, but if you take the time to read through the article, it makes a lot of sense.
http://timothyfitz.wordpress.com/2009/02/10/continuous-deployment-at-imvu-doing-the-impossible-fifty-times-a-day/
What happens when a product manager likes to play around with Ruby coding in the off hours
Friday, November 13, 2009
Thursday, November 12, 2009
Displaying Ruby code on a blog
I spent a bit of time trying to figure out the best way to display Ruby code on this blog. Initially I just converted the code to Courier font but it looked ugly and was hard to work with.
I eventually found this link on Stack Overflow:
http://stackoverflow.com/questions/1644201/how-can-i-display-code-better-on-my-blogger-blog
There were a few other methods I found, such as this strategy to convert the code via a Ruby script to HTML and then put it in the clipboard. But I liked the Stack Overflow method the best because all I need to do is add these HTML tags to the code:
<pre class="brush: ruby" name="code">
(Code)
</pre>
Here's an example of what it looks like:
class Foo
  def bar
  end
end
And here is how to widen the main text column so the code doesn't constantly get wrapped:
http://johndeeremom.blogspot.com/2008/07/how-to-widen-your-columns-on-blogger.html
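Adding the tags by hand gets tedious, so a small Ruby helper can do the wrapping. This is only a rough sketch: highlight_block is a made-up name, and escaping the code with ERB::Util.html_escape is my assumption about what the blog editor needs, not something taken from the Stack Overflow answer.

```ruby
require 'erb'

# Hypothetical helper: wrap source code in the <pre> tags shown above.
def highlight_block(code, brush = 'ruby')
  # Escape &, <, > and quotes so the browser displays the code literally
  # instead of treating parts of it as HTML.
  escaped = ERB::Util.html_escape(code)
  %(<pre class="brush: #{brush}" name="code">\n#{escaped}\n</pre>)
end

puts highlight_block("puts 'yes' if 1 < 2")
```

The output can be pasted straight into the HTML view of the post editor.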
Wednesday, November 11, 2009
Build simple PDF search engine in Ruby (Part 1)
I decided to build a simple Ruby search engine to search through PDFs.
The main application was that I wanted a quick way to search through songsheets on my church's Web site. I didn't want to repeatedly look through different PDFs to find the song I was interested in.
I was mostly inspired by this example of someone who had written a search engine in 200 lines of Ruby. I knew my program would be much easier because it didn't need to support any crawling; just indexing and querying.
The first challenge was to find a Ruby library that would parse PDFs. I ultimately settled on this because it was easy to work with. It's basically just a Ruby wrapper around pdftohtml that provides high level access to the text objects of a PDF. I don't care about layout, graphics, etc. so this was sufficient.
The PDF code mostly works without problems but it assumes that the directory for pdftohtml exists in $PATH. I used MacPorts to compile pdftohtml so it was stored in /opt/local/bin, and TextMate didn't recognize /opt/local/bin in my $PATH. I did some research and discovered this page that says I need to create a file called ~/.MacOSX/environment.plist and explicitly set the PATH variable:
{
PATH = "/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin";
}
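Before editing environment.plist, it can help to confirm from inside Ruby that the problem really is the PATH. The following is a minimal sketch using only the standard library. The helper name which is hypothetical, and it assumes the binary is literally called pdftohtml.

```ruby
# Hypothetical helper: return the full path of an executable found on PATH,
# or nil if no directory on PATH contains it.
def which(command, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

location = which('pdftohtml')
if location
  puts "pdftohtml found at #{location}"
else
  warn "pdftohtml is not on PATH: #{ENV['PATH']}"
end
```

Running this from TextMate and again from a terminal shows whether the two environments see different PATH values.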
The actual indexing code is straightforward. It's mostly based on the saush search engine article. I won't rehash that article here; the key point is that the engine is built on an inverted index, which maps each word to the songs it appears in. The search engine saves the inverted index in a SQLite database using the DataMapper library.
There are three main "tables": Song, Word, and Location. Song and Word have a many-to-many relationship, where a song has multiple words and a word is used in multiple songs. Location is the mapping table between Song and Word.
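To make the three tables concrete, here is a plain-Ruby stand-in that keeps the same information in a Hash instead of SQLite. It is a sketch, not the real index: TinyIndex and the file names are made up, stemming and chord filtering are left out, and the search simply requires every query word to be present.

```ruby
# In-memory stand-in for the Song/Word/Location tables.
class TinyIndex
  def initialize
    # word => array of [song_title, position] pairs (the "Location" rows)
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # Record the position of every word in the song.
  def add(title, text)
    tokenize(text).each_with_index do |word, position|
      @postings[word] << [title, position]
    end
  end

  # Titles of the songs that contain every word of the query.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?
    words.map { |word| @postings.fetch(word, []).map(&:first).uniq }
         .reduce { |common, titles| common & titles }
  end

  private

  # Same character filter as the real indexer, minus stemming.
  def tokenize(text)
    text.gsub(/[^0-9A-Za-z_\s]/, '').split.map(&:downcase)
  end
end

index = TinyIndex.new
index.add('song_a.pdf', 'Amazing grace, how sweet the sound')
index.add('song_b.pdf', 'How great thou art')
p index.search('how sweet') # => ["song_a.pdf"]
p index.search('how')       # => ["song_a.pdf", "song_b.pdf"]
```

Keeping the position alongside each word is what later makes ranking by word location and word distance possible.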
Here is the indexing library. Note that it uses DataMapper, so it relies on the dm-core and dm-timestamps libraries, as well as stemmer and pdf-struct (the PDF library mentioned earlier). The saush search engine uses dm-more, but I couldn't get it to load properly. In the end, dm-timestamps was all I needed out of dm-more.
Here is the code for index.rb:
require 'rubygems'
require 'dm-core'
require 'dm-timestamps'
require 'dm-aggregates'
require 'stemmer'
require 'pdf-struct'

DBLOC = 'songdb.sqlite3'
DataMapper.setup(:default, 'sqlite3:///' + DBLOC)

class String
  def words
    words = self.gsub(/[^0-9A-Za-z_\s]/,"").split # self is the string; no need for parms
    # Get rid of all non-word and non-space characters and split on spaces
    d = []
    words.each { |word| d << word.downcase.stem unless word =~ /^[A-G]+[bgm]?$/ } # Ignore guitar chords
    return d
  end
end

class Song
  include DataMapper::Resource
  property :id, Serial
  property :title, String, :length => 255
  has n, :locations
  has n, :words, :through => :locations
  property :created_at, DateTime
  property :updated_at, DateTime

  def self.find(title)
    song = first(:title => title)
    song = new(:title => title) if song.nil?
    return song
  end

  def refresh
    update( {:updated_at => DateTime.parse(Time.now.to_s)})
  end
end

class Word
  include DataMapper::Resource
  property :id, Serial
  property :stem, String
  has n, :locations
  has n, :songs, :through => :locations

  def self.find(word)
    wrd = first(:stem => word)
    wrd = new(:stem => word) if wrd.nil?
    return wrd
  end
end

class Location
  include DataMapper::Resource
  property :id, Serial
  property :position, Integer
  belongs_to :word
  belongs_to :song
end

DataMapper.auto_migrate! if ARGV[0] == 'reset' # This issues the necessary Create statements and wipes out existing database
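The chord filter in String#words is easy to try on its own. The sketch below reuses the same two regular expressions but skips stemming so it needs no gems. The helper name lyric_words and the sample lines are made up for illustration.

```ruby
CHORD = /^[A-G]+[bgm]?$/ # same pattern as in String#words

# Hypothetical helper: strip punctuation, drop chord-like tokens, downcase.
def lyric_words(line)
  line.gsub(/[^0-9A-Za-z_\s]/, '')
      .split
      .reject { |word| word =~ CHORD }
      .map(&:downcase)
end

p lyric_words('G  Am  F#m  C/G')          # => []
p lyric_words('Amazing grace, how sweet') # => ["amazing", "grace", "how", "sweet"]
p lyric_words('D7 A mighty fortress')     # => ["d7", "mighty", "fortress"]
```

The last line shows two limits of this pattern: chords with numbers such as D7 slip through, and a capitalized "A" at the start of a lyric line is treated as a chord. For song search that trade-off seems acceptable, since "a" carries little meaning anyway.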
The actual indexing code goes through each PDF. It extracts the words from the song (except the guitar chords) and creates a space-delimited string of words. Then it goes through the string, creating the Word or Song objects if necessary and creating the many-to-many relationship between Word and Song.
Code for pdfindex.rb:
#!/usr/bin/ruby
require 'rubygems'
require 'fileutils'
require 'logger'
require 'index'

SONGDIR = '/Users/rpark/ruby/pdfsearch/'
LOGFILE = 'songsearch.log'
LASTRUN = 'lastrun'

class SongSearch
  def process(file) # returns string of all stemmed words in song
    array = []
    document = PDF::Extractor.open(file)
    document.elements.each do |element|
      array << element.content
    end
    return array.join(" ").words # .join creates a string separated by delimiter
  rescue => e
    #puts "Exception in parsing #{e}"
    @log.debug "Exception in parsing #{e}"
    nil
  end

  def index(words, filename)
    if words.nil?
      #puts "ERROR parsing #{filename}"
      @log.debug "ERROR parsing #{filename}"
      return
    end
    print "Indexing #{filename}: "
    logmsg = "Indexing #{filename}: "
    song = Song.find(filename)
    unless song.new?
      print "Overwriting... "
      logmsg += "Overwriting... "
      song.refresh
      song.locations.destroy!
    end
    words.each_with_index { |word, index|
      loc = Location.new(:position => index)
      loc.word, loc.song = Word.find(word), song
      loc.save
    }
    puts "#{words.size.to_i} words"
    @log.debug logmsg + "#{words.size.to_i} words"
  end

  def cycle
    lastrun = File.mtime(LASTRUN)
    @log = Logger.new(LOGFILE, 'monthly')
    Dir.glob(SONGDIR + "*.pdf") { |file|
      index(process(file), file) if File.mtime(file) > lastrun # Only process newer songs
    }
    FileUtils.touch LASTRUN
  end
end

search = SongSearch.new
search.cycle
The digger code searches through the song database. A song is searched for by passing a string to the search method of a Digger object. It returns a list of songs that the string can be found in, along with a score.
Code for digger.rb:
#!/usr/bin/ruby
require 'index'

class Digger
  SEARCH_LIMIT = 19

  def search(for_text)
    @search_params = for_text.words
    return [] if @search_params.empty? # nothing searchable in the query
    wrds = []
    @search_params.each { |param| wrds << "stem = '#{param}'" }
    word_sql = "select * from words where #{wrds.join(" or ")}"
    @search_words = repository(:default).adapter.query(word_sql)
    return [] if @search_words.empty? # none of the words are in the index
    tables, joins, ids = [], [], []
    @search_words.each_with_index { |w, index|
      tables << "locations loc#{index}"
      joins << "loc#{index}.song_id = loc#{index+1}.song_id"
      ids << "loc#{index}.word_id = #{w.id}"
    }
    joins.pop # the last join refers to a table alias that doesn't exist
    @common_select = "from #{tables.join(', ')} where #{(joins + ids).join(' and ')} group by loc0.song_id"
    rank[0..SEARCH_LIMIT]
  end

  def rank
    merge_rankings(frequency_ranking, location_ranking, distance_ranking)
  end

  # Add up the scores from each ranking and sort by highest total score
  def merge_rankings(*rankings)
    r = {}
    rankings.each { |ranking| r.merge!(ranking) { |key, oldval, newval| oldval + newval} }
    r.sort {|a,b| b[1] <=> a[1]}
  end

  def frequency_ranking
    freq_sql= "select loc0.song_id, count(loc0.song_id) as count #{@common_select} order by count desc"
    list = repository(:default).adapter.query(freq_sql)
    rank = {}
    list.size.times { |i| rank[list[i].song_id] = list[i].count.to_f/list[0].count.to_f }
    #puts freq_sql
    #puts list
    #puts rank.inspect
    return rank
  end

  def location_ranking
    total = []
    @search_words.each_with_index { |w, index| total << "loc#{index}.position + 1" }
    loc_sql = "select loc0.song_id, (#{total.join(' + ')}) as total #{@common_select} order by total asc"
    list = repository(:default).adapter.query(loc_sql)
    rank = {}
    list.size.times { |i| rank[list[i].song_id] = list[0].total.to_f/list[i].total.to_f }
    #puts loc_sql
    #puts list
    #puts rank.inspect
    return rank
  end

  def distance_ranking
    return {} if @search_words.size == 1
    dist, total = [], []
    @search_words.each_with_index { |w, index| total << "loc#{index}.position" }
    total.size.times { |index| dist << "abs(#{total[index]} - #{total[index + 1]})" unless index == total.size - 1 }
    dist_sql = "select loc0.song_id, (#{dist.join(' + ')}) as dist #{@common_select} order by dist asc"
    list = repository(:default).adapter.query(dist_sql)
    rank = Hash.new
    list.size.times { |i| rank[list[i].song_id] = list[0].dist.to_f/list[i].dist.to_f }
    #puts dist_sql
    #puts list
    #puts rank.inspect
    return rank
  end
end
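The scoring in digger.rb is easier to follow without the SQL. The sketch below mirrors the idea with plain hashes: each ranking is scaled so the best song gets 1.0, then the scores are added per song. The helper names normalize_high and normalize_low and all the numbers are made up. Only merge_rankings follows the method of the same name above.

```ruby
# Higher raw value is better (e.g. match counts): divide by the best value.
def normalize_high(raw)
  best = raw.values.max.to_f
  raw.transform_values { |value| value / best }
end

# Lower raw value is better (e.g. summed word positions): divide the best
# value by each value. Assumes no raw value is zero.
def normalize_low(raw)
  best = raw.values.min.to_f
  raw.transform_values { |value| best / value }
end

# Add up the scores per song id and sort from highest to lowest.
def merge_rankings(*rankings)
  merged = {}
  rankings.each { |ranking| merged.merge!(ranking) { |_id, old, new| old + new } }
  merged.sort_by { |_id, score| -score }
end

# Made-up raw values for three song ids.
frequency = normalize_high({ 1 => 4, 2 => 2, 3 => 1 })
location  = normalize_low({ 1 => 10, 2 => 8, 3 => 20 })

merge_rankings(frequency, location).each do |id, score|
  puts "song #{id}: #{score.round(2)}"
end
```

Because every ranking tops out at 1.0, each one carries equal weight in the total. Weighting one ranking more heavily would just mean multiplying its scores before merging.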
Note: the biggest disadvantage with this search method is that it doesn't show the search string in its context in the song. Rather than continue with this approach, my thinking is to use a search engine such as Solr to do the search, so I can show the search string within the song.
