Friday, February 17, 2012

Scala, clusters, and numerical computing

This post is mostly for my own reference, but I suspect that others might be interested as well. First, a brief explanation. In the C++ (i.e., native code) world, it is easy to set up a Linux cluster and run jobs using MPI. If you currently use MPI to run jobs on a cluster, what you find when searching for "Java MPI" is unlikely to inspire you to switch to Java or other JVM languages.

"Cluster" as I'm using it here refers to a group of separate physical computers, possibly in different locations, each working on a different part of the same job. This is usually called a "Beowulf cluster". Simulations can be easy to parallelize because each simulation run is independent of all the others, and if you're doing 100,000 simulations on 100 processors, there is never a need for two processors to work on the same simulation.
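
To make the independence point concrete, here is a minimal sketch in plain Scala. It only shows how the work splits up: the `simulate` function is a made-up toy, the names `partition` and `runAll` are my own, and local `Future`s stand in for cluster nodes. It is not how Akka or Spark distribute work.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object IndependentSimulations {
  // Hypothetical toy simulation: the mean of 10 uniform draws.
  // Seeding by the simulation id makes every run independent and repeatable.
  def simulate(id: Int): Double = {
    val rng = new Random(id.toLong)
    Seq.fill(10)(rng.nextDouble()).sum / 10
  }

  // Split the ids 0 until total into at most `workers` contiguous,
  // non-overlapping chunks, so no two workers share a simulation.
  def partition(total: Int, workers: Int): Seq[Range] = {
    val size = (total + workers - 1) / workers // ceiling division
    (0 until total by size).map(start => start until math.min(start + size, total))
  }

  // Run each chunk as its own task (a local stand-in for a cluster node)
  // and concatenate the results in id order.
  def runAll(total: Int, workers: Int): Seq[Double] = {
    val jobs = partition(total, workers).map(chunk => Future(chunk.map(simulate)))
    Await.result(Future.sequence(jobs), 5.minutes).flatten
  }
}
```

Calling `IndependentSimulations.runAll(100000, 100)` hands each of the 100 tasks its own block of 1,000 simulations. Because the tasks never communicate, the same split would work whether the workers are threads or separate machines.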

It's good to see that Scala has at least two options for running jobs on clusters:

Akka - Also available for Java
Spark - Runs on top of Apache Mesos, which itself works with other languages such as Python, Java, and C++

I've not used either yet, but hopefully sometime in the next couple of weeks I will have time to compare them. Here are some links to other numerical computing projects that I have checked out or plan to check out.

scalanlp Contains some optimization routines, linear algebra, etc.
scalalab 
JGAP for Scala*
Parallel Colt (Java)
Stochastic Simulation in Java (Java)
jblas (Java)
Rserve (Java)
Call Octave from Java (Java)
Call Scilab from Java (Java)
JPOP* (Java)
JavaNNLS* (Java)
Apache Commons Math (Java) Already part of scalalab, but Clojure or Java users might be interested
Mantissa (Java)
Java Genetic Algorithms Package* (Java)
Michael Flanagan's library* (Java)
Java Matrix Benchmark

* These might be good, but they are smaller projects for which I have little information about the author, so you probably want to investigate before using them for anything serious.

Conclusion: the momentum for serious numerical computing on the JVM is with Scala. I'm more comfortable with Clojure than with Scala, but both Clojure and Scala are so much better than the alternatives that I'd be happy to work full time with either. Clojure isn't there yet, so let's see how far I can get with Scala. Most of the numerical library links are Java libraries, and will also work with Clojure.
