Wednesday, February 29, 2012

Clojure in pure Python is a great idea

Edit: Here's the link

Not good enough for me to do it myself, but I'd be very happy if someone else did it. Why?

You'd give up Java interoperability, but you'd gain interoperability with native code. Python offers excellent C++ interoperability via Boost.Python and other methods, as well as excellent (but limited by the constraints of C) interoperability with Fortran. Further, rpy more or less embeds R in Python. It would make Clojure much more suitable for numerical work. For that matter, you could call all of scipy from Clojure. Awesome.


Another reason is that it would reduce the amount of competition between languages. I hate language wars because they're a waste of time. I might prefer one language for a given task, but I use many, so using one doesn't mean you can't use another. Anything that allows you to write in both Python and Clojure is helpful - let the developer make the choice, even on individual sections of code.

Another big advantage is getting us away from Java. The build tools, the classpath, the .com.lengthy.verbose.wtf.on.and.on are inherited from Java, and seriously suck. I guess tail call optimization is possible.

So you could say I'm a fan, provided that it is possible to implement nearly all of Clojure, to the point that you can write Clojure code for the JVM and be confident that it will run on the Python VM. It would be necessary to add functions such as Math/pow to get to that point. A lot of work, but definitely valuable. I hope it is viewed as an additional option for Clojure developers, not a threat to the JVM.

{Why am I writing this here? I put all comments on my blog. I don't have the time or interest to get into an internet shouting match.}

Wednesday, February 22, 2012

Preliminary thoughts on Scala vs Clojure

I've been using Scala for a little over a week now. Here are some preliminary thoughts on Scala vs Clojure.

(i) Scala is a better Java. Scala is still Java, in the sense that it is an OOP language requiring you to put a lot of effort into determining what is public and what is private. I've written about alternative approaches to OOP before [here, here, and here]. I don't want to go through the same discussion here (and don't claim to have fully grasped all the issues). I'll just say that I'm not smart enough to keep track of all the things you have to keep track of to do Java-style OOP. I can write pure functions that operate on data structures. I can't keep track of how all the data and methods and private etc. interact with one another. I couldn't do it when I tried Java and I can't do it now.

(ii) I miss macros. That surprised me because I don't write a lot of macros in Clojure. A lot of what I have done in Clojure makes use of Incanter, so it was just simple library calls. I don't even understand macros fully. Yet I miss them when they're not available. I spent a few minutes looking at Scala macros. That was enough.

(iii) Syntax is a double-edged sword. I'm a fan of "everything's a list". If anything, I'd prefer to cut down on the amount of syntax that ends up in Clojure programs. Yet "everything's a list" just can't be sold to others. Scala, on the other hand, probably has the best syntax I've seen. I'd imagine that Scala makes a very good first impression. I haven't talked to anyone else about Scala so I don't know.

(iv) Scala performance is better than Clojure's. It's too easy to write slow code in Clojure. The various benchmarks I've seen support this. There are a lot of good things I can say about Incanter, but you wouldn't choose it for performance. I'm not trying to start a flamewar; it's just an observation, and I'd be happy for someone to point out how I'm wrong. By that I don't mean a list of ways to tune my Clojure code. Tuning is a PITA that should be done by the author of the library. If performance is critical, I can't see a reasonable argument for using Clojure rather than Scala.

(v) I do mostly numerical programming. The JVM is a serious drawback for both languages. Beyond that, I don't see much interest from the Clojure community in numerical computing. All I see is Incanter, which doesn't seem to be very active, doesn't offer great performance, and is far from a complete solution relative to R, Matlab, and Scipy. Clojure is focused on other areas, and that's fine, but this is my comparison of Clojure and Scala, and it's what matters to me. On the other hand, Scala is very active in this area. Five years from now Scala will probably be a strong competitor to Matlab, while offering a much better language. It would be interesting to compare Clojure against SBCL for numerical computing.

(vi) Both have adequate IDE support in Eclipse. Counterclockwise is pretty good. The Scala IDE is just unbelievable. You don't need to debug Scala code: you just look at the bottom of the screen to see if there are any errors. You can even understand Scala error messages.

(vii) I probably have a preference for static typing overall, but it depends on what I'm doing and my mood.

(viii) Clojure has some rough edges. The transition to 1.3 coinciding with changes to Clojure/core left me frustrated. I use Counterclockwise with 1.2 and the old Clojure/core. I value my time. I don't think Clojure is where it needs to be in terms of documentation. Also, I really, really wish that those who write Clojure documentation would understand that NOT ALL CLOJURE USERS ARE CURRENTLY EMPLOYED AS ENTERPRISE JAVA DEVELOPERS! I tried Java years ago, hated it, and moved on. I know nothing about Java. It took me quite a while to understand how the hideous "com.Enterprise.Verbose.For.No.Reason" thing works when using libraries. I had to look in a Java book to get the necessary background. I have had a much smoother experience with Scala. Maybe the difference is that I've learned the Java approach by using Clojure.

(ix) Clojure has an impressive set of books available. They all do a decent job of explaining the language. Yet none of them can compete with Programming in Scala by Odersky, Spoon and Venners. You learn a lot more than Scala in that book. I'd highly recommend it to any intermediate programmer, even those who don't plan to use Scala. There are excellent resources available for either language so I don't see this as a reason to choose Scala.

Conclusion: I like the Clojure language better. I'd probably be more productive in Scala because it has a stronger future for numerical computing. I could sell others on Scala, but probably not on Clojure, due to syntax. Neither is going to replace my current combination of R + Fortran for everyday work.

That's a lot more than I planned to write. I'll update as things come to mind or as I learn I was wrong.

Tuesday, February 21, 2012

I understand C pointer syntax, but it doesn't make sense

No, C pointer syntax does not make sense, even if the author of this post understands what it represents. Let me begin by saying that the concepts underlying C pointers are not difficult to understand. You don't even have to know how to program to understand such a simple concept.

The problem is that you have to memorize a bunch of rules that violate good programming language design to use pointers in C. It's such a poor design that anyone with an ounce of programming ability has to say to herself, "That cannot possibly be the correct syntax - you just don't write programs like that!" Then you get used to it, and even though it doesn't make any sense, you're used to it, so you can do it.

What's wrong with float *pf? Well, for one thing, you're all of a sudden using a non-alphanumeric character to name a variable. That's completely inconsistent with the rest of C syntax. It's acceptable for Lisp, but not for C, because * is not acceptable as part of a variable name.

The second thing that's wrong with it is that it implicitly defines another variable, pf. Actually, it also defines **pf and ***pf as well. You can define **pf as (sort of but not really) a matrix, and then that defines *pf for you. One of the first things you learn when programming in C is that you have to define variables before you use them. Except sometimes you don't. They decided to throw type inference into the mix when it comes to pointers. They cleverly hide the type inference part by talking about things like indirection and addresses.

The third thing is that it is just weird to say you can define *pf as a float, and if you drop the first character, you get a variable that's completely different. How about pf for the float and pf.pointer for the pointer?

The fourth thing, which some people think makes sense, is that you use *pf to define a float and then *pf to dereference a pointer. Yeah, that makes sense. They've overloaded the (potentially non-alphanumeric) first character of a variable's name. Bjarne Stroustrup's proposal for overloading whitespace was a joke. C pointer notation is much worse, and unfortunately it was not a joke.

C pointer syntax is Hungarian notation taken to an extreme. There is nothing good about it. If you're able to use C pointer syntax correctly after seeing it for the first time, you're just memorizing language rules without making any attempt to understand the underlying concepts, and you should think seriously about whether programming is for you.

Clearly the author of the blog post I linked above does have programming ability. That's why he struggled for so long. He now understands something that makes no sense.

Friday, February 17, 2012

Scala, clusters, and numerical computing

This post is mostly for my own reference, but I suspect that others might be interested as well. First, a brief explanation. In the C++ (i.e., native code) world, it is easy to set up a Linux cluster and run jobs using MPI. Searching for "Java MPI" is unlikely to inspire you to use Java or other JVM languages if you currently use MPI to run jobs on a cluster.

"Cluster" as I'm using it here refers to a group of different physical computers in different locations each working on different parts of the same job. This is usually called a "Beowulf cluster". Simulations can be easy to parallelize because each iteration is independent of all the others, and if you're doing 100,000 simulations on 100 processors, there is never a need for two processors to work on the same job.
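To make the "each iteration is independent" point concrete, here is a minimal sketch using only the Scala standard library. It runs on threads within a single machine, not on a cluster, so it only illustrates how the work splits into chunks that never need to talk to each other. The names `simulate` and `runAll`, and the toy simulation itself, are made up for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object IndependentSims {
  // Hypothetical simulation: the mean of `draws` standard normal draws.
  // Each replication gets its own seeded generator, so no state is shared.
  def simulate(seed: Long, draws: Int): Double = {
    val rng = new Random(seed)
    Iterator.fill(draws)(rng.nextGaussian()).sum / draws
  }

  // Split `total` replications into roughly `workers` non-overlapping chunks
  // and run each chunk as its own Future.
  def runAll(total: Int, workers: Int, draws: Int): Vector[Double] = {
    val chunkSize = math.max(1, math.ceil(total.toDouble / workers).toInt)
    val chunks = (0 until total).grouped(chunkSize).toVector
    val jobs = chunks.map { ids =>
      Future(ids.map(id => simulate(id.toLong, draws)).toVector)
    }
    // Future.sequence keeps the chunks in their original order.
    Await.result(Future.sequence(jobs), 1.minute).flatten
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(total = 1000, workers = 4, draws = 100)
    println(s"replications: ${results.size}, grand mean: ${results.sum / results.size}")
  }
}
```

Because every replication is seeded by its own index, the results do not depend on how many chunks the work is split into, which is the property that makes this kind of job easy to spread over many processors.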

It's good to see that Scala has at least two options for running jobs on clusters:

akka - Also available for Java
Spark - Runs on top of Apache Mesos, which itself works with other languages such as Python, Java, and C++

I've not used either yet, but hopefully sometime in the next couple of weeks I will have time to compare them. Here are some links to other numerical computing projects that I have checked out or plan to check out.

scalanlp Contains some optimization routines, linear algebra, etc.
scalalab 
JGAP for Scala*
Parallel Colt (Java)
Stochastic Simulation in Java (Java)
jblas (Java)
Rserve (Java)
Call Octave from Java (Java)
Call Scilab from Java (Java)
JPOP* (Java)
JavaNNLS* (Java)
Apache Commons Math (Java) Already part of scalalab, but Clojure or Java users might be interested
Mantissa (Java)
Java Genetic Algorithms Package* (Java)
Michael Flanagan's library* (Java)
Java Matrix Benchmark

* These might be good, but they are smaller projects for which I have little information about the author, so you probably want to investigate before using for anything serious.

Conclusion: the momentum for serious numerical computing on the JVM is with Scala. I'm more comfortable with Clojure than with Scala, but both Clojure and Scala are so much better than the alternatives that I'd be happy to work full time with either. Clojure isn't there yet, so let's see how far I can get with Scala. Most of the numerical library links are Java libraries, and will also work with Clojure.

Thursday, February 16, 2012

Some more Scala experimentation

Ended up with some free time to play with Scala again today. Like many programmers, I learn a language by doing the things I do in other languages.

Today's project was to try out jblas. I gained a greater appreciation of the language from just a few lines of code. The (simple) challenge: create matrix A and matrix B filled with random numbers, then multiply them. As a baseline, in R this can be done using three straightforward lines of code:

A <- matrix(rnorm(12),4,3)
B <- matrix(rnorm(12),3,4)
C <- A %*% B


Here is the Scala code:

object tryblas {
  import org.jblas.DoubleMatrix
 
  def main(args: Array[String]) {
    val A = DoubleMatrix.randn(4,3)
    val B = DoubleMatrix.randn(3,4)
    val C = A.mmul(B)
  }
}

Type inference means I don't have to declare that A is a DoubleMatrix - the language figured that out for me! It's more lines than the R program, but not by much. There's a little overhead in a Scala program, and you have to import the jblas library. The Scala IDE even allows you to see the inferred type of A, B, or C by placing the cursor over the name. Impressive! The Java approach for some reason seems reasonable when I use Scala.
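For anyone who wants to see the inference without installing jblas, here is a small sketch using a made-up `Mat` class. `Mat` is a hypothetical stand-in that only tracks dimensions; it is not the jblas API. The point is just that the inferred and the explicitly annotated declarations mean the same thing to the compiler.

```scala
object InferenceDemo {
  // Hypothetical stand-in for a matrix class; it tracks only the shape.
  final case class Mat(rows: Int, cols: Int) {
    // Shape of the product; fails if the inner dimensions do not agree.
    def mmul(that: Mat): Mat = {
      require(cols == that.rows, "inner dimensions must agree")
      Mat(rows, that.cols)
    }
  }

  def main(args: Array[String]): Unit = {
    val a = Mat(4, 3)       // type Mat is inferred from the right-hand side
    val b: Mat = Mat(3, 4)  // the same kind of declaration, written out explicitly
    val c = a.mmul(b)       // inferred as Mat from the return type of mmul
    println(c)              // prints Mat(4,4)
  }
}
```

Adding the annotation is optional for local values, but it can still be worth writing out on public methods, where it doubles as documentation.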

Wednesday, February 15, 2012

Calling R from Scala

My main programming language is still R. Incanter, while a nice attempt, is definitely not anywhere close to allowing Clojure to be a replacement for R. The JVM is not suitable for the numerical programming I do, and I find myself spending too much time trying to figure out how to change the code to speed it up.

Thus, as I've started learning Scala, I want to make sure I've got the backup of easy interoperability with R. That's basically what I did when I started learning Clojure. Here is my attempt to do the same thing in Scala. I'm still a beginner, so I won't claim to be able to write good Scala code yet.

My preference when using JVM languages is Eclipse. For Scala, there is an excellent IDE available. I followed the usual instructions to install it. (There is plenty of information on the internet, but it's basically pasting the link they provide into Eclipse, clicking some buttons, and restarting.)

I switched to the Scala perspective, started a new Scala project, and added a new file under src. I then started typing. I was really impressed with the Scala IDE. It's still early in this adventure, but based on what I'd read, I was expecting it to be a piece of garbage. It was incredible for the small things I did with it. I especially like how it points out errors while you type. It's just a pleasure to use.

To access R, I downloaded the two jar files that are made available for Rserve. I use Linux, so I opened R in a terminal window, did "library(Rserve)", then "Rserve()".

Back in Eclipse, I right-clicked on the project name, went to "Build Path", then "Add External Archives...". I chose REngine.jar and then RserveEngine.jar. Those names appeared under "Referenced Libraries" below the project name.

After a few small mistakes, this is the program I came up with. (It's released under the GPL version 2 or greater if you're wondering.)

object rs {
  import org.rosuda.REngine
  import org.rosuda.REngine.Rserve.RConnection
 
  def main(args: Array[String]) {
    val c = new RConnection
    val d = c.eval("rnorm(10)")
    val e = d.asDoubles
 
    for (i <- 0 until 10) {
        println(e(i))
    }
  }
}

Edit: Scala has a better way to print an array than the C-style for loop that I used. You can replace the for loop with  

println(e.mkString("\n"))

I chose "Run" from the menu, followed by "Run" again. After a brief wait (presumably for compilation), I got the following output:

1.1357939434863817
0.45168475066687624
0.8312101791389808
-0.8825102796935446
0.9597726735219193
0.5916220359121344
-0.8979526251577276
0.2217958737969706
-0.013414880990143055
0.021108802828942192

That's it! Super easy. I called R from Scala, had R generate 10 random numbers, shipped those random numbers back to Scala, and printed them out. I can now do anything with "e" that I would do if I had created the random numbers in Scala.
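As a rough illustration of what "anything" can look like, the sketch below computes a mean and a sample variance from a plain Array[Double]. The array here is hard-coded, hypothetical data standing in for the values that came back from R, so the example runs without Rserve; `mean` and `sampleVariance` are helper names I made up.

```scala
object SummaryDemo {
  // Arithmetic mean of a non-empty array.
  def mean(xs: Array[Double]): Double = xs.sum / xs.length

  // Sample variance (divides by n - 1); needs at least two values.
  def sampleVariance(xs: Array[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical values standing in for the array returned by asDoubles.
    val e = Array(1.0, 2.0, 3.0, 4.0)
    println(s"mean: ${mean(e)}")
    println(s"variance: ${sampleVariance(e)}")
  }
}
```

Once the numbers are in an ordinary Scala array, nothing about the code needs to know they originally came from R.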

The only drawback is all the syntax. I'm resigned to the fact that most programmers think syntax is good.