Saturday, November 29, 2014

Creating a Hackintosh

I've always wanted to create a "Hackintosh", i.e. a standard PC that runs OS X. My PC is over 5 years old so it was time for a refresh. I figured this was the best time to give the Hackintosh a go.


  • CPU: Intel Quad Core i7 4790 3.6 GHz
  • Motherboard: GIGABYTE GA-Z97-HD3
    • Audio: ALC 887
    • Network: Realtek 8111F-VL
  • Network Card: 4 Antennas 802.11ac WiFi BCM94360CD Wireless Network Card
  • Graphics Card: Nvidia GTX 750
  • Memory: Corsair Vengeance DDR3-1600 32 GB (4x8 GB)
  • Hard Drive: Seagate ST3000DM001 3 TB SATA3 7200 rpm
  • DVD: Samsung SH-224DB 24X

BIOS Changes

The first step was to change the BIOS settings to support OS X. Disabling VT-d is the only setting that is clearly required; the rest are less clear-cut, but other builders had used them so I thought they were worth trying.

F7 to load Optimized Defaults
  • M.I.T.
    • Advanced Frequency Settings
      • Extreme Memory Profile (X.M.P.): Enabled
    • Miscellaneous Settings
      • PCIe Slot Configuration: Gen3
  • BIOS Features
    • Intel Virtualization Technology: Disabled
    • VT-d: Disabled
  • Peripherals
    • XHCI Mode: Auto
    • XHCI Hand-off: Enabled
    • EHCI Hand-off: Enabled

Installing Clover

Initially I installed UniBeast onto a USB drive to run the OS X installer but couldn't get past some errors. In retrospect, the errors were caused by the GTX 750 card.

I decided to install Clover instead because it would give me more control over the installation process. I followed Steps 1 and 2 to download OS X Yosemite and create the bootable USB drive. I won't repeat the steps here because I followed them exactly. I chose RealtekRTL81xx.kext as the network card driver and otherwise kept the configuration minimal.

When I booted the Clover USB drive I added these settings to the boot options:
-v nv_disable=1

These options disabled the Nvidia driver and enabled verbose mode so I could see the boot messages. Clover will hang and reboot unless you disable the Nvidia driver.

I then followed the instructions in Step 3 to use Disk Utility to create a partition on the hard drive then install OS X Yosemite on the drive.

When the install finished, it restarted the machine. Since the next step automatically runs stage 2 of the install process (create the recovery partition and complete the install), it was important to again add "nv_disable=1" to the boot options during this restart. Otherwise the boot would fail and the installer would not go to stage 2, requiring me to start over.


I followed Step 4 in reformatting the EFI partition to FAT32 and installing Clover on the hard drive so it could boot without the USB drive. I added RealtekRTL81xx.kext and FakeSMC.kext to /Volumes/EFI/EFI/Clover/kexts/10.10.

I also followed the instructions to run Clover Configurator and generate a new config.plist to install in /Volumes/EFI/EFI/Clover.

Note that I removed nv_disable=1 and added nvda_drv=1 to the boot options because of the next step.
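For reference, the boot arguments live under Clover's Boot section of config.plist. A minimal sketch of that fragment, assuming Clover's standard plist layout (surrounding keys omitted):

    <key>Boot</key>
    <dict>
        <key>Arguments</key>
        <string>nvda_drv=1</string>
    </dict>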

Installing Video and Audio Drivers

I installed this Nvidia experimental driver to get the GTX 750 working. Note that it only works with 10.10.0; I had previously installed 10.10.1 and had to start all over again to get this driver installed. Once you install this driver, you must activate it with the nvda_drv=1 boot option.

There is now an Nvidia driver for 10.10.1 but I'm not sure if I'm ready to move to it yet. There are reports that the DVI port doesn't work yet, which is what I use. This comment may come in handy.

To get sound working, I used this audio guide to point me to the right place. I wound up pursuing the cloverALC/Clover option and downloaded the file from this site. After unzipping it, I ran the script and answered "Yes" to all the questions.

To get the network card working, I downloaded the .zip file from RehabMan's fork of Mieze's Realtek RTL8111 Network Driver.

Wednesday, February 20, 2013

Using chef to build out a Hadoop cluster

After not posting for a while, I have about 3-4 posts that I'd like to get out there. The first is about using chef to build a Hadoop cluster.

Chef is a configuration management tool that allows one to automate the process of provisioning servers. I had to create a Hadoop cluster of 4-5 servers and I wanted to use this opportunity to automate the process with chef.

I had to perform a series of the same steps on these Linux nodes:
  • Install ruby and chef
  • Install Sun Java
  • Install VMware Tools
  • Install NTP
  • Add its hostname to a shared /etc/hosts file
  • Configure passwordless ssh login

Installing Chef and Ruby

I followed the steps in this link.

The first step is to sign up for a Hosted Chef account on the Opscode site. An account is free for 5 nodes or less. Perform the following steps:
  1. Create a new organization
  2. Select "Generate knife config" to download knife.rb
  3. Select "Regenerate validation key" to download (validator).pem

  4. Click on your account and click "get private key" to download (private key).pem
  5. Install ruby and chef on your first host. Once you do the first host, you can quickly bootstrap the others. Install Ruby:
    sudo apt-get update
    sudo apt-get install ruby ruby-dev libopenssl-ruby rdoc ri irb build-essential wget ssl-cert git-core
    Install rubygems:
    cd /tmp
    tar zxf rubygems-1.8.10.tgz
    cd rubygems-1.8.10
    sudo ruby setup.rb --no-format-executable
    Install chef:
    sudo gem install chef
    cd ~
    git clone
    mkdir -p ~/chef-repo/.chef
    cp (private key).pem ~/chef-repo/.chef
    cp (validator).pem ~/chef-repo/.chef
    cp knife.rb ~/chef-repo/.chef
    Connect to Hosted Chef and configure workstation as a client:
    cd ~/chef-repo
    knife configure client ./client-config
    sudo mkdir /etc/chef
    sudo cp -r ~/chef-repo/client-config/* /etc/chef
    sudo chef-client
    Once the client is installed on the first host, you can bootstrap the clients on the other hosts by using this command, as described here. This assumes you have created a user called hadoop who is the main hadoop user.
    knife bootstrap (node IP) -x hadoop -P (password) --sudo
    Repeat this for all of your other chef nodes.

    Installing some Chef recipes

    Now that chef is installed on all the nodes, it's time to run some chef recipes. A recipe is a set of configuration instructions. In my case, I want to install some packages. I started with VMware Tools, Sun Java, and NTP.
    Start by creating a new cookbook:
    knife cookbook create MYCOOKBOOK
    Then download some existing cookbooks from the Chef Repository.
    knife cookbook site install vmtools
    knife cookbook site install java
    knife cookbook site install ntp
    Add these recipes to each node's run list:
    knife node run_list add NODE_NAME "recipe[java::sun]"
    knife node run_list add NODE_NAME "recipe[vmtools]"
    knife node run_list add NODE_NAME "recipe[ntp]"
    You'll then need to run "sudo chef-client" on each node to execute the run list and install these packages.

    Populate /etc/hosts

    The next step is to create a recipe that will populate the /etc/hosts file from the Chef repository. One of Hadoop's requirements is to store the name-IP mapping for every node in the cluster in /etc/hosts. The easiest way to do this is to populate /etc/hosts from the list of hosts that Chef knows about.
    So start by creating your new recipe in your cookbook. I call it "hosts":
    knife cookbook create hosts
    Your cookbook will now have a subdirectory called hosts with some skeleton files already created. Create your default ruby script in hosts/recipes/default.rb:
    # Gets list of names from all nodes in repository and rewrites /etc/hosts
    hosts = {}
    search(:node, "name:*", %w(ipaddress fqdn)) do |n|
      hosts[n["ipaddress"]] = n
    end
    template "/etc/hosts" do
      source "hosts.erb"
      mode 0644
      variables(:hosts => hosts)
    end
    Now edit the template, stored in hosts/templates/default/hosts.erb:
     localhost
    <% @hosts.keys.sort.each do |ip| %>
    <%= ip %> <%= @hosts[ip]["fqdn"] %>
    <% end %>
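    To sanity-check the template logic outside of Chef, you can render the same ERB with plain Ruby. The hostnames and IPs below are made-up placeholders standing in for what Chef's search() would return:

    ```ruby
    require "erb"

    # Fake data standing in for the Chef search results.
    @hosts = {
      "" => { "fqdn" => "node1.example.com" },
      "" => { "fqdn" => "node2.example.com" },
    }

    template = <<~ERB localhost
      <% @hosts.keys.sort.each do |ip| %>
      <%= ip %> <%= @hosts[ip]["fqdn"] %>
      <% end %>
    ERB

    # Render the template; each name-IP pair becomes one /etc/hosts line.
    puts ERB.new(template).result(binding)
    ```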
    Now deploy it to all your hosts:
    knife node run_list add NODE_NAME "recipe[hosts]"
    Don't forget to run "sudo chef-client" on each node.
    You should also upload this recipe to the Chef server:
    knife cookbook upload hosts

    Installing passwordless ssh login

    A Hadoop cluster requires passwordless ssh login between the master and its slave nodes. The easiest way to do this is to have each node create its own SSH keys with an empty password, and then copy the public keys for all nodes to the master node.

    So create a recipe to create the SSH key with empty password. I call it "sshlogin":
    knife cookbook create sshlogin
    Create your default ruby script in sshlogin/recipes/default.rb:
    # Create empty RSA password
    execute "ssh-keygen" do
      command "sudo -u hadoop ssh-keygen -q -t rsa -N '' -f /home/hadoop/.ssh/id_rsa"
      creates "/home/hadoop/.ssh/id_rsa"
      action :run
    end
    # Copy public key to node1; if key doesn't exist in authorized_keys, append it to this file
    execute "copy-pubkey" do
      command <<-EOF
    cat /home/hadoop/.ssh/ | sudo -u hadoop ssh hadoop@node1 "(cat > /tmp/tmp.pubkey; mkdir -p .ssh; touch .ssh/authorized_keys; grep #{node[:fqdn]} .ssh/authorized_keys > /dev/null || cat /tmp/tmp.pubkey >> .ssh/authorized_keys; rm /tmp/tmp.pubkey)"
      EOF
      action :run
    end
    Note that when you run this recipe on each host, it will prompt you to type the password of node1 each time because you are essentially scp'ing the key to this master node.
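    The grep-then-append pattern is what keeps the recipe idempotent: the key is only added if the host's fqdn isn't already in authorized_keys. You can see the behavior with a throwaway file (the hostname and key below are fake placeholders):

    ```shell
    #!/bin/sh
    # Simulate the idempotent append to authorized_keys.
    fqdn="node2.example.com"                        # placeholder hostname
    pubkey="ssh-rsa AAAAB3FAKEKEY hadoop@$fqdn"     # placeholder public key
    authfile="$(mktemp)"
    # First run appends the key; second run finds the fqdn and does nothing.
    grep "$fqdn" "$authfile" > /dev/null || echo "$pubkey" >> "$authfile"
    grep "$fqdn" "$authfile" > /dev/null || echo "$pubkey" >> "$authfile"
    grep -c "$fqdn" "$authfile"   # prints 1: appended exactly once
    ```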
    Now you can deploy this recipe:
    knife cookbook upload sshlogin
    knife node run_list add node2 "recipe[sshlogin]"
    Type this command to run the recipes on each host:
    sudo chef-client

    Monday, May 16, 2011

    Building a Hadoop cluster

    I recently took a class in information retrieval, and my final project involved building a Hadoop cluster.

    Here are some of my notes on configuring the nodes in the cluster.

    These links on configuring a single node cluster and multi node cluster were the most helpful.

    I downloaded the latest Hadoop distribution then moved it into /hadoop. I had problems with this latest distribution (0.21) so I used 0.20 instead.

    Here are the configuration files I changed:


    # Variables required by Mahout
    export HADOOP_HOME=/hadoop
    export HADOOP_CONF_DIR=/hadoop/conf
    export MAHOUT_HOME=/Users/rpark/mahout
    # The java implementation to use.  Required.
    export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
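    A minimal core-site.xml along the lines of the linked guides might look like this; the master hostname and port here are assumptions, so use your own values:

    <!-- conf/core-site.xml -->
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:54310</value>
      </property>
    </configuration>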

    Be sure to enable password-less ssh between master and slaves. Use this command to create an SSH key with an empty password:
    ssh-keygen -t rsa -P ""

    Enable password-less ssh login for the master to itself:
    cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys

    Then copy to each slave and append it to each slave's authorized_keys file.

    I ran into a few errors along the way. Here is an error that gave me a lot of trouble in the datanode logs:
    2011-05-08 01:04:30,032 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1804860059826635300_1001 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_1804860059826635300_1001 is valid, and cannot be written to.

    The solution was to use hostnames every time I referenced a host, either itself or a remote host. I set a host's own name in /etc/hostname and the others in /etc/hosts. I used these hostnames in /hadoop/conf/masters, slaves, and the various conf files.

    Every so often I ran into this error in the datanode logs:
    ... ERROR org.apache.hadoop.dfs.DataNode: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 308967713; datanode namespaceID = 113030094
            at org.apache.hadoop.dfs.DataStorage.doTransition(
            at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(
            at org.apache.hadoop.dfs.DataNode.startDataNode(
            at org.apache.hadoop.dfs.DataNode.(
            at org.apache.hadoop.dfs.DataNode.makeInstance(
            at org.apache.hadoop.dfs.DataNode.createDataNode(
            at org.apache.hadoop.dfs.DataNode.main(

    I fixed this by deleting tmp/dfs/data on the datanodes where I saw the error. Unfortunately, I had to reformat the HDFS volume after I did this.

    I had to raise the ulimit for open files. On Ubuntu nodes I edited /etc/security/limits.conf:
    rpark  soft nofile  8192
    rpark  hard nofile  8192

    For OS X nodes I just edited ~/.profile:
    ulimit -n 8192

    I ran into this error when copying data into HDFS:
    could only be replicated to 0 nodes, instead of 1

    The solution was simply to wait for the datanode to start up. I usually saw the error when I immediately copied data into HDFS after starting the cluster.

    Port 50070 on the namenode gave me a Web UI to tell me how many nodes were in the cluster. This was very useful.

    Working with VMware vShield REST API in perl

    Here is an overview of how to use perl code to work with VMware's vShield API.

    vShield App and Edge are two security products offered by VMware. vShield Edge has a broad range of functionality such as firewall, VPN, load balancing, NAT, and DHCP. vShield App is a NIC-level firewall for virtual machines.

    We'll focus today on how to use the API to programmatically make firewall rule changes. Here are some of the things you can do with the API:
    • List the current firewall ruleset
    • Add new rules
    • Get a list of past firewall revisions
    • Revert to a previous ruleset revision
    vShield API documentation is available here.

    Before we get into the API itself, let's look at what the firewall ruleset looks like. It's formatted as XML; for example, a rule's application entry looks like this:
    <Application type="UNICAST">LDAP over SSL</Application>
    Here are some notes about the XML configuration:
    • The API works mainly with container objects. A container can range from a datacenter or cluster all the way down to a port group or IP address.
    • Every container object must be listed in the <containerassociation> section.
    • Container objects have instance IDs. The instance ID is also referred to as the managed object ID (MOID).
    • Every firewall rule has its own ID as well as precedence and position fields.
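    Pulling those notes together, the overall shape of a ruleset looks roughly like the sketch below. Only the element names used elsewhere in this post are certain; the root element name and all the values are assumptions for illustration:

    <FirewallConfiguration>
      <ContainerAssociation>
        <Container id="">
          <IPAddress></IPAddress>
        </Container>
      </ContainerAssociation>
      <RuleSet>
        <Rule>
          <ID>1001</ID>
          <Position>50</Position>
          <Source ref="" exclude="false"/>
          <Protocol>TCP</Protocol>
          <Application type="UNICAST">LDAP over SSL</Application>
        </Rule>
      </RuleSet>
    </FirewallConfiguration>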

    If you want to edit a firewall ruleset, you must specify which ruleset you want. Every object has its own ruleset. So you could edit a ruleset at the datacenter level, cluster level, etc. all the way down to the port group level.

    For simplicity let's work with the ruleset at the datacenter level because this will cover all VMs in that datacenter.

    The first thing to do is get the object ID for the datacenter. If you don't already know this then you must look it up. There are two places you can find it:
    1. Use the Managed Object Browser in vCenter Server, located at https://<vcenter IP>/mob
    2. Query your vCenter Server with the vSphere SDK

    Either way you must have access to vCenter Server.

    Here is some perl code to query vCenter. The code assumes that $dc_name is set to the name of your datacenter. You can find this name in the vSphere client.
    use VMware::VIRuntime;

    $ENV{'VI_SERVER'} = $vc_ip;
    $ENV{'VI_USERNAME'} = $vc_user;
    $ENV{'VI_PASSWORD'} = $vc_pass;
    # read/validate options and connect to the server
    Opts::parse();
    Opts::validate();
    Util::connect();
    my $view = Vim::find_entity_views(view_type => 'Datacenter');
    foreach my $datacenter (@$view) {
     if (lc($datacenter->{name}) eq lc($dc_name)) {
      return $datacenter->{mo_ref}->{value};
     }
    }
    return "Not_found";

    Note that this code requires the .pm files from the perl SDK. You can find these in lib/VMware/share/VMware/ in the perl SDK tarball.

    Once you have the object ID for your datacenter, you can use it to create the vShield URL that you will need to access the datacenter's firewall ruleset:
    $url = "https://" . $vsm_ip . "/api/1.0/zones/" . $moid . "/firewall/rules";
    Note that this URL is to access the ruleset in vShield App. If you want to access the ruleset in vShield Edge instead, simply change the "zones" in the URL to "network". So the resulting vShield Edge URL looks like this:
    $url = "https://" . $vsm_ip . "/api/1.0/network/" . $moid . "/firewall/rules";
    Now that you have the URL, use it to get the ruleset with a simple HTTP GET using Basic Authentication:
    use LWP::UserAgent;
    use HTTP::Request;

    $ua = LWP::UserAgent->new;
    $request = HTTP::Request->new(GET => $url);
    $request->authorization_basic($vsm_user, $vsm_pass);
    $response = $ua->request($request);

    $response now contains the XML ruleset. Copy it to a variable such as $ruleset and use your favorite XML library to work directly with each rule. I found that using XML::LibXML provides the best routines for both parsing and editing the XML.

    This code iterates through each rule, loading the source address and protocol into variables.
    my $parser = new XML::LibXML;
    my $tree = $parser->parse_string($ruleset);
    my $root = $tree->documentElement();
    foreach my $rule_ref ($root->findnodes('RuleSet/Rule')) {
     my $rule_src = $rule_ref->findvalue('Source/@ref');
     my $rule_prot = $rule_ref->findvalue('Protocol');
    }

    Note that the source address is accessible as an attribute named "ref" in the source tag. XPath syntax uses '@' to access XML attributes.

    The vShield API has certain restrictions when it comes to adding firewall rules. You can't just add a rule to the existing ruleset. Every time you update the ruleset with new rules, you replace all of the old rules.

    The proper way to add a rule is to load the existing rules into memory as an XML tree, add the new rules to the tree, then post the updated tree back as the new ruleset.

    This sample code illustrates how to add a new rule. Note that only a few of the fields are included here but every field in the rule is required, with the exception of Notes. You will get an error if you leave out a required field.
    my $rule_ref = XML::LibXML::Element->new("Rule");
    my $id_el = XML::LibXML::Element->new("ID");
    $id_el->appendTextNode("0");   # 0 tells vShield to generate a new rule ID
    $rule_ref->appendChild($id_el);
    my $src_el = XML::LibXML::Element->new("Source");
    $src_el->setAttribute("ref", $src_ip);
    $src_el->setAttribute("exclude", "false");
    $rule_ref->appendChild($src_el);
    my $rule_root = $root->findnodes('RuleSet')->get_node(1);
    $rule_root->appendChild($rule_ref);

    Here are some notes:
    • To add a new rule, specify an ID of 0. When vShield adds the new rule to the ruleset, it will automatically generate a new ID.
    • The Position field is required but you can set it to any value. I set it to a default of 50. vShield Manager rewrites this field every time you move rules around in the vShield GUI.
    After you add the rule to the XML tree, you must also add a new container object for the IP addresses referenced by the rule:
    my $contain_root = $root->findnodes('ContainerAssociation')->get_node(1);
    my $contain_el = XML::LibXML::Element->new("Container");
    $contain_el->setAttribute("id", $ip_addr);
    my $ip_addr_el = XML::LibXML::Element->new("IPAddress");
    $ip_addr_el->appendTextNode($ip_addr);
    $contain_el->appendChild($ip_addr_el);
    $contain_root->appendChild($contain_el);

    When you're done updating the XML tree, post the complete ruleset:
    $ua = LWP::UserAgent->new;
    $request = HTTP::Request->new(POST => $self->{url});
    $request->authorization_basic($self->{vsm_user}, $self->{vsm_pass});
    $request->content_type('text/xml');
    $request->content($tree->toString);
    $response = $ua->request($request);

    Tuesday, October 26, 2010

    Using multiple versions of Ruby on the same host

    I've recently come across a tool called RVM or Ruby Version Manager. It enables you to run different versions of Ruby on the same host.

    RVM uses git so my first step was to install git with the Homebrew package manager. Homebrew is an increasingly popular alternative to MacPorts and Fink.

    Note that you'll need to install Xcode first.

    /usr/bin/ruby -e "$(curl -fsSL"
    brew install git

    Then I just followed the instructions available here.

    bash < <( curl )
    source ~/.rvm/scripts/rvm
    rvm install jruby,1.9.2-head

    Here is what the output looks like:

    info: Downloading jruby-bin-1.5.1, this may take a while depending on your connection...
    info: Extracting jruby-bin-1.5.1 ...
    info: Building Nailgun
    info: Installing JRuby to /Users/rpark/.rvm/rubies/jruby-1.5.1
    info: Importing initial gems...
    info: Installing rake
    info: Installing Ruby from source to: /Users/rpark/.rvm/rubies/ruby-1.9.2-head
    info: Downloading source from
    info: Copying from repo to src path...
    info: Running autoconf
    info: Configuring ruby-1.9.2-head, this may take a while depending on your cpu(s)...
    info: Compiling ruby-1.9.2-head, this may take a while depending on your cpu(s)...
    info: Installing ruby-1.9.2-head
    info: Installation of ruby-1.9.2-head is complete.
    info: Updating rubygems for /Users/rpark/.rvm/gems/ruby-1.9.2-head@global
    info: Updating rubygems for /Users/rpark/.rvm/gems/ruby-1.9.2-head
    info: adjusting shebangs for ruby-1.9.2-head (gem irb erb ri rdoc testrb rake).
    info: Importing initial gems...

    The install is done. When you want to switch between different versions of ruby, just use the "rvm use" command:

    rvm use 1.9.2-head
    11:10:23[~:15]$ which ruby
    11:11:06[~:16]$ which gem
    rvm use jruby
    11:10:01[~:11]$ which ruby
    11:10:06[~:12]$ which gem

    Then add this to the end of ~/.profile:

    [[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm"

    RVM allows you to use different gemsets with each ruby version. One use case is working with Rails 2.3.3 and Rails 3.0. I didn't need this feature but it's nice to know it's there.

    One final step is TextMate integration. This is described here.

    rvm wrapper 1.9.2-head

    Then run this script:

    #!/usr/bin/env bash
    mkdir -p /Library/Application\ Support/TextMate/
    sudo chown -R $(whoami) /Library/Application\ Support/TextMate
    cd /Library/Application\ Support/TextMate/
    if [[ -d Bundles/.svn ]] ; then
      cd Bundles && svn up
    else
      if [[ -d Bundles ]] ; then
        mv Bundles Bundles.old
      fi
      svn co
    fi
    exit 0

    Open up Shell Variables in TextMate's Preferences -> Advanced

    Set TM_RUBY to /Users/rpark/.rvm/bin/textmate_ruby

    Final step:

    cd /Applications/ ; mv Builder.rb Builder.rb.backup

    Monday, October 4, 2010

    Connecting to JDBC data source from OS X perl

    Here's another post on a similar topic. I recently had to figure this out and had already documented it, so I'm sharing it here too so I don't forget.

    Here are instructions on connecting to a JDBC data source from OS X perl. I used DBD::JDBC.

    The process mainly involves setting up a local Java server that provides the front end for a JDBC driver. The Java code implements the JDBC connection to the data source. The perl code talks to the Java server on localhost via DBI so it can access the database.

    Here are the steps:

    1. Check out any documentation about required setup for your JDBC data source. For example, you may need to install an SSL certificate from the server to your client. You may also need to ensure that your server permits database access at all from your client or IP address.

    2. Download your JDBC driver and put any .jar files into a lib directory. For example, VJDBC is a JDBC driver that enables you to establish JDBC connections over Java RMI. It comes with vjdbc.jar.

    3. Install the perl modules Convert::BER, DBI, and DBD::JDBC. You can either install the modules from CPAN or install ActiveState perl. If you install from CPAN, you'll need to install Xcode first so you have a C compiler. If you use ActiveState, use PPM to install the required modules. Note that you'll still need to download the DBD::JDBC tarball from CPAN and install it manually because it doesn't show up in ppm.

    4. Copy dbd_jdbc.jar from the DBD::JDBC tarball into the same lib directory as the other .jar files you've collected.

    5. Download the latest version of log4j and copy log4j.jar into the same lib directory as above. You should now have at least 3 jar files: dbd_jdbc, log4j-1.2.16, vjdbc.

    6. Set your CLASSPATH variable to point to your jar files. If you want this variable to be always set, add it to your .profile.
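    Step 6 might look like this, assuming the lib directory assembled in steps 2-5 (the directory path is a placeholder, and the jar names/versions will vary with what you downloaded):

    ```shell
    #!/bin/sh
    # Point CLASSPATH at the collected jar files.
    LIB="$HOME/jdbc/lib"    # assumption: wherever you put the jars
    export CLASSPATH="$LIB/dbd_jdbc.jar:$LIB/log4j-1.2.16.jar:$LIB/vjdbc.jar"
    echo "$CLASSPATH"
    ```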

    7. Start up the Java server with this command:

    java -Djdbc.drivers=com.sourcefire.vjdbc.VirtualDriver -Ddbd.port=9001 com.vizdom.dbd.jdbc.Server

    Note that I set the port to 9001. This is completely arbitrary and can be whatever you want it to be. But whatever you set this to, make sure you indicate this port in your perl code.

    You'll probably see something like this:

    log4j:WARN No appenders could be found for logger (com.vizdom.dbd.jdbc.Server).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See for more info.

    You're seeing this because the code is looking for a log4j config file. Copy one into the directory where you started the Java server and this error message won't appear.

    After this, your host is now accepting JDBC connections on port 9001.

    8. Here's some sample code that should work:

    use strict;
    use DBI;

    my $user = "user";
    my $pass = "password";
    my $host = "";
    my $port = 9001;
    my $url = "jdbc:vjdbc:rmi://$host:2000/VJdbc,eqe";  # Get this URL from JDBC data src
    my %properties = ('user' => $user,
                      'password' => $pass,
                      '' => $host,
                      'host.port' => $port);
    my $dsn = "dbi:JDBC:hostname=localhost;port=$port;url=$url";
    my $dbh = DBI->connect($dsn, undef, undef,
              { PrintError => 0, RaiseError => 1, jdbc_properties => \%properties })
              or die "Failed to connect: ($DBI::err) $DBI::errstr\n";
    my $sql = qq/select * from table/;
    my $sth = $dbh->prepare($sql);
    $sth->execute();
    my @row;
    while (@row = $sth->fetchrow_array) {
     print join(", ", @row), "\n";
    }
    One last point: your URL and JDBC properties hash are configured with the IP address of the actual JDBC data source. But your data source is configured to talk with localhost. So you establish a JDBC connection to the Java server on localhost on port 9001, and the Java server in turn establishes a connection to the actual data source using the URL.

    Connecting to SQL Server from OS X perl

    I've been spending my coding time in the offhours working on Perl instead of Ruby. My coding time in general has been very limited, which is part of the reason for the length of time between updates. :)

    My latest project is to pull data out of a Microsoft SQL Server database for analysis. I'm using perl for various reasons: I need a cross-platform environment, and I need certain libraries that only work with perl. Some of the target users for my code run on Windows.

    I know that Ruby runs on Windows but it's not the platform of choice for Ruby developers. The vast majority seem to develop either on OS X or Linux. So Ruby on Windows isn't at the maturity that ActiveState perl is on Windows.

    In fact, I don't even run native perl anymore on my MacBook Pro. I've switched over to ActiveState perl because I don't need to compile anything every time I want to install new CPAN libraries. And because it's ActiveState, I'm that much more confident it will work on other platforms.

    The bottom line is that perl makes the most sense for what I'm trying to do. I vastly prefer ruby to perl but I don't mind working in perl when I have to.

    So how to connect to SQL Server from perl? My first thought was that I could use ODBC. My research quickly took me to DBD::ODBC in CPAN. After spending some time Googling for other examples and trying to get it working, I wasn't getting anywhere.

    It took me some more research until I realized that DBD::ODBC is only one piece of the whole picture. I also need an ODBC driver manager and an ODBC driver.

    The two main ODBC driver managers for Unix/Linux are unixODBC and iODBC. Fortunately, iODBC is already included in OS X.

    For the ODBC driver itself, I wound up using FreeTDS. Because FreeTDS is only available as source code, I had to use MacPorts to download and compile the code. MacPorts installs everything in /opt/local by default.

    So here was the process:

    1. Install FreeTDS. If you are using ActiveState perl as I am, you must force MacPorts to build FreeTDS as 32-bit because ActiveState is 32-bit only. If you go with the default of compiling FreeTDS as 64-bit (or x86_64) then you will get this error when you call the FreeTDS library code from ActiveState perl:

    [iODBC][Driver Manager]dlopen(/opt/local/lib/, 6): no suitable image found. Did find:
    /opt/local/lib/ mach-o, but wrong architecture (SQL-00000) at line 26

    So to fix this error, edit the MacPorts configuration file:

    sudo vi /opt/local/etc/macports/macports.conf

    Uncomment this line:

    set build_arch i386

    Once you build and install FreeTDS, ensure that you can use it to talk with the database. I used tsql:

    TDSVER=8.0 tsql -H -p 1433 -U 'DOMAIN\user' -P 'password'

    2. After you do this, test your ODBC driver manager and driver. I used iODBC because it comes with OS X. It comes with a utility called iodbctest. I used similar parameters to tsql when testing:

    iodbctest "Driver=/opt/local/lib/;Server=;Port=1433;TDS_Version=8.0;uid=DOMAIN\user;pwd=password;Database=Database"

    Note a few things:
    • Other sites tell you to make entries in odbc.ini, odbcinst.ini, or freetds.conf. If you set all parameters on the command line then you don't need to tweak these other config files.
    • The Driver parameter is set to the full path for This is the actual ODBC driver. The name or path may differ, depending on your OS and ODBC driver software.
    • I haven't created an ODBC data source on my Windows SQL Server host so far. Some sites say you need to do this but I found it worked without this.
    3. Use the perl code with DBD::ODBC to call the database. Here's some sample code. Note how I first populate the data source ($dsn) with the parameters I used with iodbctest, and then I pass this to the DBI->connect method.
    use strict;
    use DBI;

    my $user = 'user';
    my $pass = 'password';
    my $driver = "/opt/local/lib/";
    my $db_server = "";
    my $db_name = 'Database';
    my $port = 1433;
    my $tds_version = "8.0";
    my $dsn = join "", ("dbi:ODBC:",
                        "Driver=$driver;",
                        "Server=$db_server;",
                        "Port=$port;",
                        "TDS_Version=$tds_version;",
                        "uid=$user;",
                        "pwd=$pass;",
                        "Database=$db_name");
    my $db_options = {PrintError => 1, RaiseError => 1, AutoCommit => 0, };
    my $dbh = DBI->connect($dsn, undef, undef, $db_options);
    $dbh->{LongReadLen} = 32768;
    my $sql = qq/select * from table/;
    my $sth = $dbh->prepare($sql);
    $sth->execute();
    my @row;
    while (@row = $sth->fetchrow_array) {
        print join(", ", @row), "\n";
    }
    Note on 12/29/10: I added the $dbh->{LongReadLen} = 32768; line after the DBI->connect because I was getting "Data truncated" errors when accessing ntext fields. This link helped me solve the problem.