Saturday, 20 February 2010

Great Intro to Clojure Video

This is a really great 40-minute introduction to Clojure:

http://parleys.com/#id=1518&st=5&sl=66

Thursday, 18 February 2010

Object to Classes - Use Maps

In the 10 years that I have worked as a computer programmer I have worked almost exclusively with object-oriented languages: primarily Java, but also Ruby, Groovy, and Perl (though object-oriented Perl wasn't much fun!). Thinking in objects therefore comes very naturally to me. Part of the challenge of learning Clojure is that it isn't an object-oriented language. I'm having to rewire my brain to work in the functional paradigm, something that is both a challenge and great fun.

I read about Abstraction Barriers in SICP and wanted to apply it to the tool I'm currently developing in Clojure. I created a new namespace to encapsulate the concept of a 'version' (as in a software version - I need functions to manipulate version strings like 1.0.1-b01-SNAPSHOT). Based on what I had read in SICP I created functions like 'make-version' to act as the abstraction barrier. But then the question arose - what should this actually return? Initially, again inspired by SICP, make-version created a closure with a message-passing dispatch function:
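(defn make-version [major minor patch build snapshot]
  ;; returns a closure that dispatches on a selector keyword (SICP-style message passing)
  (fn [selector]
    (cond
     (= :major selector) major
     (= :minor selector) minor
     (= :patch selector) patch
     (= :build selector) build
     (= :snapshot selector) snapshot
     :else (throw (IllegalArgumentException. "Selector not recognized")))))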
But it occurred to me that by doing this I would lose all the benefits of the data structures Clojure provides with respect to concurrency and so on. So I thought about the implementation a bit more, and did some reading on the Clojure Google group. After searching the group for "data abstraction" I found a post by Rich Hickey that switched on a light bulb in my head:
 "I know people usually think of collections when they see vector/map/ set, and they think classes and types define something else. However, the vast majority of class and type instances in various languages are actually maps, and what the class/type defines is a specification of what should be in the map. Many of the languages don't expose the instances as maps as such and in failing to do so greatly deprive the users of the language from writing generic interoperable code. 
Classes and types usually create desert islands. When you say:
//Java 
class Foo {int x; int y; int z;} 

--Haskell 
Foo = Foo {x :: int, y :: int, z :: int}
you end up with types with a dearth of functionality. Sure, you might get hashCode and equals for free, or some other free stuff by deriving from Eq or Show, but the bottom line is you are basically starting from scratch every time. No existing user code can do anything useful with your instances."
[...snip...]
"I guess I want to advocate - don't merely replicate the things with which you are familiar. Try to do things in the Clojure way. If your logical structure is a mapping of names to values, please use a map. Positional data is fragile, non-self-descriptive and unmanageable after a certain length - look at function argument lists. Note that using maps doesn't preclude also having positional constructors, nor does it dictate a space cost for repeating key names - e.g. structmaps provide positional constructors and shared key storage. "
As a follow-up, Stuart Sierra posted a link to a blog entry he had written about how to model data. In it he directly contrasts the OO mindset with the Clojure / functional mindset:
"So here’s a slightly radical notion: don’t use classes to model the real world. Treat data as data. Every modern programming language has at least a few built-in data structures that usually provide all the semantics you need."
It all boils down to this:
"It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures."(Alan Perlis)
In OO programming we create data structures at the drop of a hat, but that's not the Clojure way. The Clojure way is to use the data structures provided by the language, and the vast library of functions that know how to manipulate them.

So yes, it's fine to have an abstraction barrier with functions like make-version. But there is no need to create custom data structures using closures, message-passing, and dispatch functions. Just use a map. It's simple, it works, and it's the Clojure way.
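A minimal sketch of what that might look like for the version namespace, assuming version strings of the form MAJOR.MINOR.PATCH[-BUILD][-SNAPSHOT] as in 1.0.1-b01-SNAPSHOT. make-version keeps its role as the abstraction barrier but now returns a map; parse-version, version->string and next-patch are hypothetical helpers for illustration, not code from the tool itself:

```clojure
;; Sketch only: make-version is the constructor (the abstraction barrier);
;; parse-version, version->string and next-patch are hypothetical helpers.
;; Assumed string format: MAJOR.MINOR.PATCH[-BUILD][-SNAPSHOT]

(defn make-version
  "Returns a plain map; keywords such as :major double as accessors."
  [major minor patch build snapshot]
  {:major major :minor minor :patch patch :build build :snapshot snapshot})

(def version-re
  ;; The negative lookahead stops a bare "SNAPSHOT" being read as a build id.
  #"(\d+)\.(\d+)\.(\d+)(?:-(?!SNAPSHOT$)([^-]+))?(-SNAPSHOT)?")

(defn parse-version
  "Parses s into a version map, or returns nil if s doesn't match."
  [s]
  (when-let [[_ major minor patch build snapshot] (re-matches version-re s)]
    (make-version (Long/parseLong major)
                  (Long/parseLong minor)
                  (Long/parseLong patch)
                  build
                  (boolean snapshot))))

(defn version->string
  "Inverse of parse-version."
  [{:keys [major minor patch build snapshot]}]
  (str major "." minor "." patch
       (when build (str "-" build))
       (when snapshot "-SNAPSHOT")))

(defn next-patch
  "Bumps the patch number with the generic update-in; no custom selector needed."
  [v]
  (update-in v [:patch] inc))

(version->string (next-patch (parse-version "1.0.1-b01-SNAPSHOT")))
;=> "1.0.2-b01-SNAPSHOT"
```

Because a version is just a map, keywords act as accessors ((:major v)), and equality, printing, assoc and update-in all work with no extra code, which is exactly what the closure-based version gave up.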

Wednesday, 17 February 2010

mvn clojure:swank throws java.lang.NumberFormatException: Invalid number: 2009-09-14

For some reason the 'swank' target of the clojure-maven-plugin isn't working for me on Windows. At home, on Ubuntu, it works fine. At work, with the same code base, it doesn't work. It throws the following exception:

Exception in thread "main" clojure.lang.LispReader$ReaderException: java.lang.NumberFormatException: Invalid number: 2009-09-14

I decided to work around this by extending my 'clj' starter script. I added a --pom switch which tells the script to build the classpath using the pom file in the current directory. It makes use of the build-classpath target of the maven-dependency-plugin. So now at work I can start up a Swank server simply by typing:

clj --pom "c:\dev\nwalex.com\clojure-scripts\start-swank.clj"

I have this aliased to 'swank' in my .aliases file in Cygwin.

Anyway, here's the full script:
#!/bin/bash

function init_classpath {
    # set up the classpath dynamically. Note this includes the jline jar
    CLASSPATH=""
    for jarfile in `ls -l ~/.clojure-classpath/ | pcol 11`; do
        JAR=`cygpath --windows $jarfile`
        CLASSPATH="$CLASSPATH;$JAR"
    done
}

function init_classpath_from_pom {
    echo "Initializing classpath from pom.xml..."

    # create the classpath file (discarding both stdout and stderr from mvn)
    mvn dependency:build-classpath -Dmdep.outputFile=classpath > /dev/null 2>&1

    # store it and remove the file
    CLASSPATH=`cat classpath`
    rm classpath

    # also add everything under the clojure source directories
    CLASSPATH="$CLASSPATH;./src/main/clojure;./src/test/clojure"
}
 
if [ $# -eq 0 ] ; then
    init_classpath
    stty -icanon min 1 -echo
    java -Djline.terminal=jline.UnixTerminal -cp "$CLASSPATH" jline.ConsoleRunner clojure.main
else
    TMPFILE=""
    while [ $# -gt 0 ] ; do
        case "$1" in
        --pom)
            init_classpath_from_pom
            ;;
        -cp|--classpath)
            CLASSPATH="$CLASSPATH;$2"
            shift
            ;;
        -e)
            TMPFILE="/tmp/$(basename $0).$$.tmp"
            /bin/echo "$2" > $TMPFILE
            ARGS=$TMPFILE
            break
            ;;
        *)
            ARGS="$ARGS $1"
            ;;
        esac
        shift
    done

    if [ "$CLASSPATH" == "" ] ; then
        init_classpath
    fi

    if [ "$ARGS" != "" ]; then 
        ARGS=`cygpath --windows $ARGS`
    fi
 
    java -cp "$CLASSPATH" clojure.main $ARGS
    if [ "$TMPFILE" != "" ] ; then
        rm $TMPFILE
    fi
fi
start-swank.clj is simply:
(require 'swank.swank)
(swank.swank/start-server "nul" :encoding "utf-8" :port 4005)
Update 1: Updated script to include the src/main/clojure and src/test/clojure directories on the classpath.

Sunday, 24 January 2010

Speaking in Sentences

I have done quite a lot of programming in Clojure over the last couple of days, and have finally moved past the initial, hesitant stage where I would sit for ages trying to work out what I needed to type. I'm no longer looking up individual words in the dictionary. Now I'm speaking in sentences.

This is a very important point to reach because it means I have moved beyond the initial frustrating stage where, sat with my fingers on the keyboard in front of Emacs, connected to a repl, I just didn't know what to type. Every little step forward was painful. As the code flows more and more easily from my fingertips, so it becomes more and more satisfying.

Today, for example, I was trying to convert a series of 163 mp3 files into an audiobook on iTunes, consisting of 15 chapters. Since I couldn't find any nice friendly tools on the Mac, I switched over to my Ubuntu PC to do some proper work. Previously I would have written a little Perl script to automate the tasks I wanted to do, but this time I decided to do it in Clojure. And what a joyful experience that turned out to be.

Let me give you an example. I had already, a few years ago, sorted the 163 mp3 files into 15 different folders. The files are an audio recording of a General Semantics seminar given by Alfred Korzybski himself in 1949. Each folder contained the files for individual lectures. The first thing I wanted to do was tidy up the file names to remove spaces. So I kicked off a repl and started to write some code in Emacs.

First I got a handle on all the mp3 files:
(defn has-ext?
  "Does the file end with the extension ext?"
  [ext file]
  (.endsWith (.getName file) (str "." ext)))

(def mp3? (partial has-ext? "mp3"))

;; in-dir is the directory-checking multimethod from the 'Abstracting' post
(defn mp3s
  [dir]
  (filter mp3? (file-seq (in-dir dir))))
At the repl I was then able to do something like:
(def mp3files (mp3s "/path/to/top/level"))
mp3files then contains a sequence of java.io.File objects, one for each file. So now I can use it to rename the files:
(defn replace-str
  "Wraps String.replaceAll; pattern is a regular expression"
  [pattern in with]
  (.replaceAll in pattern with))

(defn replace-whitespace
  "Replace each whitespace character in 'in' with 'with'"
  [in with]
  (replace-str "\\s" in with))

(defn rename-file
  "Rename the file using the single-arg fn f to transform the file name"
  [file f]
  (.renameTo file (java.io.File. (.getParentFile file) (f (.getName file)))))

(defn rename-files
  "Rename all the files in fseq using f to transform each name.
   doall forces the lazy map, so the renames also happen outside the repl."
  [fseq f]
  (doall (map #(rename-file % f) fseq)))
With these functions, replacing the whitespace was as simple as typing the following in the repl:
com.nwalex.gs> (rename-files mp3files #(replace-whitespace %1 "-"))
This kind of dynamic programming is so much more satisfying than trying to write a Perl script that will finally (hopefully) work only once all the code has been written. Clojure lends itself to writing a little bit of code at a time, sketching out your solution and evolving it. Once I had the handle to the java.io.Files, I could play around with them, experiment with how to extract the names etc. It's just an incredibly satisfying way to work.
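One way to lean further into that repl-driven style is to compute the renames as data before performing them, so the new names can be checked first. A hedged sketch; rename-plan and apply-plan! are hypothetical helpers, not part of the code above:

```clojure
(import 'java.io.File)

;; Hypothetical helpers: the renames are first computed as plain data,
;; then applied in a separate, explicitly eager step.

(defn rename-plan
  "Pairs each File with the File it would be renamed to, without touching
   the disk, so the new names can be inspected at the repl first."
  [files f]
  (for [^File file files]
    [file (File. (.getParentFile file) (f (.getName file)))]))

(defn apply-plan!
  "Performs the planned renames. doall forces the lazy map so every rename
   runs immediately; returns one .renameTo boolean per file."
  [plan]
  (doall (map (fn [[^File from ^File to]] (.renameTo from to)) plan)))
```

At the repl this might look like (def plan (rename-plan mp3files #(replace-whitespace % "-"))), then inspecting plan, and only then calling (apply-plan! plan).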

I'm now beginning to think in Clojure. The next stage is to gain more familiarity with the core and contrib APIs. I'm speaking in sentences; now I need to improve my vocabulary.

Saturday, 23 January 2010

"Seeing" by Jose Saramago


I finished reading "Seeing" by Jose Saramago this morning. What a wonderful book. It is a sequel of sorts to "Blindness", though I didn't realize this until halfway through.

Words to describe this book: funny, intelligent, gripping, angry, sad. The phrase 'biting satire' is too mild. This doesn't so much bite as dismember with ruthless precision. The world Saramago describes doesn't seem too far removed from our own.

I went straight from the coffee shop where I devoured the last 30-odd pages to Waterstones to find more by the same author. I left with two of his books in hand, eagerly anticipating what he will serve up next.

Abstracting

In this blog entry I want to demonstrate the elegance of Clojure by showing how a function I wrote evolved. It is part of a little tool that I have previously written in Perl and in Groovy, and that I'm now rewriting in Clojure. The aim of the function is to recursively find all the pom files under a specific directory.

Here is version 1:
(import 'java.io.File)

;; pom-file? and not-under-target? are defined further down, so declare them first
(declare pom-file? not-under-target?)

(defn pom-files-under
  "Find all the pom files under dir, recursively"
  [dir]
  (filter pom-file? (file-seq (File. dir))))

(defn pom-file?
  [file]
  (and (= "pom.xml" (.getName file))
       (not-under-target? file)))

(defn not-under-target?
  [file]
  (not (.contains (.getAbsolutePath file) "target")))
This worked, but I wasn't particularly happy with it. There is no checking that the parameter to pom-files-under is actually a directory and the pom-file? method is ugly. So I scrapped that code and started again. Here is version 2:
(defmulti in-dir
  "Abstracts over the concept of a directory, always returning a java.io.File
that is guaranteed to be a directory"
  class)

(defmethod in-dir String [s]
  (in-dir (java.io.File. s)))

(defmethod in-dir java.io.File [f]
  (if (.isDirectory f)
    f
    (throw (IllegalArgumentException. "File is not a directory"))))

(defn has-name? [name file]
  (= name (.getName file))
  )

;; partial application of has-name? checking if file name is pom.xml
(def pom? (partial has-name? "pom.xml"))

(defn not-under-target?
  [file]
  (not (.contains (.getAbsolutePath file) "target")))

(defn find-files
  "Find files in directory that match predicates pred & others"
  [in-dir pred & others]
  (filter pred (file-seq in-dir))
  )

(defn pom-files
  "Find all pom files recursively in directory in-dir"
  [dir]
  (find-files (in-dir dir) pom?)
  )
This seems much more elegant. The in-dir multimethod now guarantees to return a java.io.File object that represents a directory. And the find-files function now takes multiple predicates, the idea being that you supply the directory and the predicates, and it returns the files that match all of them. The only problem is that find-files doesn't actually work yet: it ignores every predicate after the first. I couldn't for the life of me work out how to implement that functionality, so I posted a plea for help on the Clojure Google Group. The advice I got back really opened my eyes to the kind of abstraction possible in Clojure.

The first solution I implemented based on this advice was:
(defn find-files
  "Find files in directory that match predicates pred & others"
  [in-dir pred & others]
  (reduce (fn [xs f] (filter f xs)) (file-seq in-dir) (cons pred others)))
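The reduce threads the file sequence through one filter per predicate, so only items that pass every predicate survive. Running the same expression over plain numbers makes this easy to see; filter-all is a hypothetical name used only for this sketch:

```clojure
(defn filter-all
  "Stand-alone version of the reduce in find-files: each step wraps the
   running sequence in another (lazy) filter, one per predicate."
  [coll preds]
  (reduce (fn [xs f] (filter f xs)) coll preds))

(filter-all (range 20) [even? #(> % 10)])
;=> (12 14 16 18)
```

With an empty predicate list the reduce simply returns the original sequence; find-files avoids that case because its [pred & others] signature requires at least one predicate.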
This fixed the find-files function, making it apply all predicates. Then a discussion started about how to abstract this out further, leading to the following comment from Perry Trolard:
I think it's easier to think about combining predicates separately from your file-filtering code.
Then Sean Devlin followed up with this code to combine predicates:
(defn every-pred?
    "Mimics AND"
    [& preds]
    (fn [& args] (every? #(apply % args) preds)))

(defn any-pred?
    "Mimics OR"
    [& preds]
    (fn [& args] (some #(apply % args) preds))) 
I incorporated this into my code, and did a bit more abstracting, leading to this final version (the rest of the code remained the same):
(def target? (partial has-name? "target"))

(defn not-under-target?
  [file]
  (not (target? (.getParentFile file))))

(defn every-pred?
    "Mimics AND"
    [& preds]
    (fn [& args] (every? #(apply % args) preds))) 

(defn pom-files
  "Find all pom files recursively in directory in-dir"
  [dir]
  (filter (every-pred? pom? not-under-target?) (file-seq (in-dir dir))))
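For readers on a later Clojure: clojure.core gained every-pred and some-fn in Clojure 1.3, and for single-argument predicates like these they behave like every-pred? and any-pred?. A sketch of pom-files using the built-in, assuming Clojure 1.3 or later and omitting the in-dir check for brevity:

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.File)

(defn has-name? [name ^File file]
  (= name (.getName file)))

(def pom? (partial has-name? "pom.xml"))
(def target? (partial has-name? "target"))

(defn not-under-target?
  "True unless the file's immediate parent directory is named target."
  [^File file]
  (let [parent (.getParentFile file)]
    (not (and parent (target? parent)))))

(defn pom-files
  "Find all pom files recursively under dir (a String or java.io.File).
   Unlike the version above, dir is not checked with in-dir here."
  [dir]
  (filter (every-pred pom? not-under-target?) (file-seq (io/file dir))))
```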
This process of abstracting really opened my eyes to what is possible in Clojure. I love the elegance and expressiveness possible in this language. The only problem is, the more I program in it, the less I want to go back to my day job of programming in Java! It seems so clunky and primitive now!

Friday, 22 January 2010

Setting up the Clojure Classpath for Utility Scripts in Cygwin

I'm beginning to really enjoy programming in Clojure, but it's still a struggle. The best way to improve is to use it every day, and so I decided to configure my environment to make it easy to write scripts in Clojure. Over the last few years, whenever I have learned a new language, I have used it to write little utility scripts for day-to-day tasks at work. I have scripts written in Bash, Perl, Groovy, and JRuby. I decided it was time to add Clojure to the list.

I use Cygwin at work, so my starting point was the clj Bash script at http://en.wikibooks.org/wiki/Clojure_Programming/Getting_Started#Create_clj_Script. I used this one:
CLASSPATH="/path/to/clojure.jar"
if [ $# -eq 0 ] ; then
    JLINE="/path/to/jline.jar"
    CLASSPATH=$JLINE:$CLASSPATH
    java -cp $CLASSPATH jline.ConsoleRunner clojure.main --repl
else
    TMPFILE=""
    while [ $# -gt 0 ] ; do
        case "$1" in
        -cp|-classpath)
            CLASSPATH=$CLASSPATH:$2
            shift
            ;;
        -e)
            TMPFILE="/tmp/$(basename $0).$$.tmp"
            /bin/echo $2 > $TMPFILE
            ARGS=$TMPFILE
            break
            ;;
        *)
            ARGS="$ARGS $1"
            ;;
        esac
        shift
    done
 
    java -cp $CLASSPATH clojure.main $ARGS
    if [ "$TMPFILE" != "" ] ; then
        rm $TMPFILE
    fi
fi
I had to edit it a bit to make it Windows-friendly, but the script basically worked. However, I wasn't satisfied. I didn't want to have to manually edit the script, or an environment variable, simply to add jar files to the classpath. I wanted a simpler, more obvious way. And so I created a directory in my home directory called .clojure-classpath. The contents of the directory are:
$ ls -l
total 24
-rw-r--r-- 1 alexanc mkgroup-l-d 82 Jan 22 11:16 README
lrwxrwxrwx 1 alexanc mkgroup-l-d 92 Jan 22 12:06 clojure-contrib.jar -> /cygdrive/c/m2repository/org/clojure/clojure-contrib/1.1.0-RC3/clojure-contrib-1.1.0-RC3.jar
lrwxrwxrwx 1 alexanc mkgroup-l-d 68 Jan 22 12:06 clojure.jar -> /cygdrive/c/m2repository/org/clojure/clojure/1.1.0/clojure-1.1.0.jar
lrwxrwxrwx 1 alexanc mkgroup-l-d 79 Jan 22 12:42 commons-exec-1.0.1.jar -> /cygdrive/c/m2repository/commons-exec/commons-exec/1.0.1/commons-exec-1.0.1.jar
lrwxrwxrwx 1 alexanc mkgroup-l-d 60 Jan 22 12:06 jline.jar -> /cygdrive/c/m2repository/jline/jline/0.9.94/jline-0.9.94.jar
lrwxrwxrwx 1 alexanc mkgroup-l-d 82 Jan 22 12:08 swank-clojure.jar -> /cygdrive/c/m2repository/swank-clojure/swank-clojure/1.1.0/swank-clojure-1.1.0.jar
I already have a mountain of jar files in my local Maven repository so, in order to make them available to Clojure, I simply created symbolic links to them. Then I modified the clj Bash script to automajically add all the jar files in this directory to the classpath prior to starting up Clojure. My new, modified version of clj is as follows:
#!/bin/bash
 
# set up the classpath dynamically. Note this includes the jline jar
CLASSPATH=""
for jarfile in `ls -l ~/.clojure-classpath/ | pcol 11`; do
    JAR=`cygpath --windows $jarfile`
    CLASSPATH="$CLASSPATH;$JAR"
done

if [ $# -eq 0 ] ; then
    stty -icanon min 1 -echo
    java -Djline.terminal=jline.UnixTerminal -cp "$CLASSPATH" jline.ConsoleRunner clojure.main
else
    TMPFILE=""
    while [ $# -gt 0 ] ; do
        case "$1" in
        -cp|-classpath)
            CLASSPATH="$CLASSPATH;$2"
            shift
            ;;
        -e)
            TMPFILE="/tmp/$(basename $0).$$.tmp"
            /bin/echo "$2" > $TMPFILE
            ARGS=$TMPFILE
            break
            ;;
        *)
            ARGS="$ARGS $1"
            ;;
        esac
        shift
    done
 
    java -cp "$CLASSPATH" clojure.main $ARGS
    if [ "$TMPFILE" != "" ] ; then
        rm $TMPFILE
    fi
fi
The only non-standard thing in the script is the call to pcol which is a little Perl script I wrote to split a line on whitespace and print the column specified by the number. This modified version of clj:
  1. Lists all the files in the .clojure-classpath directory, and gets the full path from the output of ls.
  2. Converts the path to a Windows friendly path.
  3. Dynamically adds every file to the classpath.
  4. Starts Clojure.
That worked beautifully. Now I can run stand-alone scripts by simply typing: clj [script_name].

The next problem I had to solve was how to edit my scripts in Emacs using the same classpath. Fortunately I knew that I could do this by starting an instance of a Swank-clojure server from within the repl on the command line. Note that the swank-clojure.jar is in the .clojure-classpath directory. The script to start the server is simply:
(require 'swank.swank)
(swank.swank/start-server "nul" :encoding "utf-8" :port 4005)
I can start the server by:
  1. Running: clj start-swank.clj
  2. From a running repl loading start-swank.clj.
  3. Entering the commands directly into the running repl:
user=> (require 'swank.swank)
nil
user=> (swank.swank/start-server "nul" :encoding "utf-8" :port 4005)
Connection opened on local port  4005
#
Bingo. I can now edit my scripts in Emacs using Slime, with exactly the same classpath that will be used to run them. And since the .clojure-classpath directory contains a symbolic link to the directory I keep the scripts in (not shown in the listing above - I copied that before I added the link to the directory), all the scripts I write will automatically be on the classpath, and hence I can start building up a library of utility functions.
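To confirm from inside Slime that the Swank server really sees the jars from .clojure-classpath, the classpath can be inspected directly. A small sketch; classpath-entries and on-classpath? are hypothetical helper names, not part of swank-clojure:

```clojure
(import '(java.io File) '(java.util.regex Pattern))

(defn classpath-entries
  "The JVM's java.class.path property, split into individual entries.
   File/pathSeparator is \";\" on Windows and \":\" elsewhere."
  []
  (seq (.split (System/getProperty "java.class.path")
               (Pattern/quote File/pathSeparator))))

(defn on-classpath?
  "True if any classpath entry ends with suffix, e.g. \"swank-clojure.jar\"."
  [suffix]
  (boolean (some #(.endsWith ^String % suffix) (classpath-entries))))
```

Running (doseq [e (classpath-entries)] (println e)) at the clj repl and again in Slime should print the same list, since the Swank server runs inside the JVM that clj started.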

I am delighted with this set up, and can't wait to start writing some utility scripts in Clojure using Emacs.