My Corner of the World: 2012

Wednesday, July 04, 2012

Blog discontinued

This blog is no longer active. Some of my posts are still getting hits, so I'll leave it up for anyone wanting to use it as a reference, but I probably won't respond to comments. I place everything in every post in the public domain. You can use the information in any way you wish, with or without attribution, but if you do attribute something to this blog, it has to be an exact quote.

Monday, June 18, 2012

Not much to blog about

I haven't posted much related to programming lately. Looks like I've gone about as far as I can with Clojure, given the limitations of the JVM. It's really a shame because I'd love to use it more. There's no way around it, Java is simply not suitable for numerical computing due to the lack of libraries (for what I do, not necessarily what anyone else does) and the speed difference between Java and native for a handful of tasks like linear algebra. I've done a lot of research on it and concluded that Java will always be a good deal slower for some things (jblas may be a step in the right direction).

Maybe clojure-py is the answer. It offers a native Clojure solution. For now, due to time constraints, I've decided to set Clojure aside. This will most likely be the end of blogging about my Clojure and Lisp experiences. It's been a great journey and I hope that in the future I will be able to program in Lisp on a regular basis. I may occasionally comment on a piece of news in the Lisp world, but that's it.

Where will I go? The honest answer is that my current combination of R and Fortran works well. R is not a bad language if, like me, you believe in functional programming.

For a while, Julia was getting a lot of attention. It has potential. In a year or two it might be worthwhile to dig into it. It has a lot of features, but there's not a lot that's new, so I have no motivation to work with it now.

Haskell and OCaml would be great languages to learn for the sake of learning, but given my time constraints right now, I'm not sure the benefits would outweigh the costs. (F# is not going to happen either.) I don't see either as realistic replacements for R.

How about Scheme, Common Lisp, or newLISP? Each has its uses, and I still use newLISP, but they also have drawbacks. Scheme documentation is crap, plain and simple. I know, Racket has good documentation of the language, blah blah blah, I've tried Scheme and I'm not smart enough for it. Common Lisp could be great, but a language that doesn't evolve is useless. I'll use the next revision of the Common Lisp standard (LOL) that has been updated for the 21st Century. newLISP is a scripting language, and I love using it in that capacity, but it will take more than that to replace R.

I'm not interested in Go. It's a better C. I have as much use for a better C as I do for a better Mandarin. If you don't speak Mandarin, and don't have a reason to speak Mandarin, a better Mandarin is not going to excite you.

I may give D another try. I like the language. It provides little in the way of numerical libraries so my best guess is that I would be better off with Clojure than with D.

That's where things sit. When I have something worth blogging about, I will do so.

Monday, May 14, 2012

Clojure with Python syntax, part II

A couple months ago, I posted some thoughts I was having about writing a DSL that would make it possible to write Clojure without the s-expressions, the nesting, and back-and-forth indentation style that is so common in Lisp programs. I said the following would be the goals of the project, if I were to do it:

Eliminate the Lisp syntax barrier to getting programmers to try Clojure.
Make Clojure acceptable to universities that want to avoid Lisp syntax.
Provide a stepping stone so that programmers can learn Clojure concepts, then when they want more control over their language, they can start writing Clojure without giving up any of their existing code.

The post was motivated by my attempts to get others interested in Clojure. I'm a true believer in Clojure - I don't think there's another language that offers anything close to the experience. It's just a well thought out language that was implemented by someone with a lot of programming experience. The meaning of the term "pragmatic" has changed in recent years to mean "excuse for hypocrisy", so I won't say Clojure is pragmatic, but it certainly is a practical approach to functional programming. I think it is the best choice for those learning to program as well as those who want to get work done.

That is why I was disappointed at the way Clojure's syntax, which is quite an improvement on the Lisps that came before, is still viewed as clumsy and hard to read. I've always loved the consistency of Lisp syntax, and have always felt that the advantages of consistency outweigh any of the complaints. Too bad that's not true for everyone.

I've come to accept that humans treat their first experience as the default. Once you've reached puberty, you can forget about speaking a foreign language without an accent, because you will always, to some extent, try to put the new language into the form of the old. An accent is nothing more than pronouncing the new language using the "idiomatic" form of your native tongue.

I see no compelling reason that Lisp syntax should be a barrier to using Clojure. If your "default" is that a language should look like Python, then you should be able to write Clojure in that form. You'd lose the power of macros, but how is that a problem for someone who doesn't know Clojure or any other Lisp? When they're ready to take the leap to start writing macros, they can start working with s-expressions.

I did some searching on Google before my previous post. I had hoped this would already have been done, but I couldn't find anything. Richard Gabriel and Guy Steele even have a section in this paper devoted to alternative Lisp syntax. The problem is that all of those alternatives are ugly, much worse than Scheme or Common Lisp, so it's not at all surprising that none of them took off.

Basically every similar project has the same fatal flaw. They're all trying to be Lisp without parentheses. In my experience, the objection to Lisp has little to do with parentheses. Even when you end up with something like this: )))))))), the problem is that you've nested eight things in one statement, usually with varying levels of indentation, so it's not really the eight parens that cause the grumbling. Python, Ruby, or C programmers are unlikely to have eight levels of nesting in their programs.

The thing to realize with alternative Lisp syntax is that you can't save macros or prefix notation. Clojure != macros. Clojure != prefix notation. Clojure does so much so well that there is no reason to fear the loss of macros. I know about languages like Dylan. Dylan is not in any way worthy of comparison with Clojure. As a matter of fact, Clojure with Python syntax wouldn't even be a Lisp anymore, and that's okay, because I don't really care about Lisp. I could still write my Clojure with parens, it wouldn't make any difference how others write theirs, since it would still be Clojure. I can sit in my basement on Saturday night as a hobbyist writing macros, but I'd rather use Clojure all day at the office, even if I'm not using Lisp.

I haven't had time to dig into it as deeply as I'd like. Here's what I have concluded on the basis of my research:

I don't know anything about writing a DSL, parsing, or code generation.
I'm an intermediate Clojure user, so I don't yet understand some of the features of the language, and I don't understand most of the language as well as I should.
I don't use Python much these days (I don't think I've written a line of Python code in the last year). I don't understand what is so readable about Python, it looks the same to me as any other language.
Nobody would care if I did write it. I could show it to the people I know, but being realistic, few people would use it.

Combining my current time constraints with the huge cost of learning everything I'd have to learn, and comparing with the benefits, I can't justify doing it. I nevertheless think it would be a worthwhile project for someone who has no kids and is looking for a way to kill some time.

Sunday, May 13, 2012

Thanks to StackOverflow? Not really, to be honest

I always find it interesting when I read praise being heaped onto StackOverflow and StackExchange. Let's see, they run a business, it's been successful, and most of the contributions are made by others. A fanboy even recently posted on Hacker News about how good it felt to get a signed letter from Joel Spolsky thanking him for his help. He should get a letter. He's put a lot of money in Joel's pocket. Let's not act silly, it was started as a business venture by Jeff and Joel, and if it hadn't worked, they'd have killed it right away, no matter how helpful it was.

Since my last blog post on stackoverflow, about out-of-hand closing of questions, I've come to another realization: I no longer use stackoverflow. Let me explain. I will search for a problem I'm having, and Google will regularly point me to stackoverflow. Almost without fail, if there's something helpful on stackoverflow, it has a date prior to 2011.

Why would that be? The last couple years there has been an attempt to close as many questions as possible. The questions that you are allowed to ask are all things that can be found in the documentation. For R, there's almost nothing that would be allowed on stackoverflow that you can't find by typing ?search-term. Even worse, a lot of the answers on stackoverflow are not good, because they're either noise written in an attempt to get points, or they're just wrong. There are exceptions, but answers usually only point you in the right direction, so you have to consult the documentation anyway.

The goal is to eliminate discussion. That means you have someone asking a specific question with a single, simple answer, and someone writes out that answer. The problem is that there are very few interesting programming questions that fit into such a format. "How can I debug a program in R?" is an example of a question that is considered inappropriate. It's very important, it's not easy for a new R programmer to figure out on her own, yet it's off-limits because it might result in someone stating an opinion. Something that can be posted is, "What is wrong with this call: x <- solve(y,z,linpack=TRUE)?" Snooze. It'll get four answers. Two will be general discussion, one will be an off-the-wall, totally wrong answer from someone looking for points, and one will be from someone summarizing ?solve. Yeah, not the world's biggest innovation.

These days I only visit the site to look at answers to questions that are at least a couple of years old. I don't have much reason to visit a website that is a substitute for looking things up in the documentation. Perhaps a new website, that allows for interesting questions, will appear. The strategy worked for Jeff and Joel, they made a lot of money before they decided to strictly enforce the "I'm too lazy to read the documentation" limitation on questions, hopefully someone else will take the idea and run with it.

Tuesday, April 10, 2012

clojure-py 0.2 is out

And now things are starting to get interesting. Multimethods are implemented. I would have a hard time using Clojure without them. Various other improvements are also present but that's the big one for me.

It's still much too early to think about putting it to use on projects you use to feed yourself. It's not too early to start writing software in it, so that you can find, report, and fix bugs. And documenting. And writing tests. And writing examples.

For those of us who whine about the JVM, it's time to start looking seriously at clojure-py. Please excuse me so I can go drink a beer to celebrate. As the saying goes, it's five o'clock somewhere.

Sunday, April 08, 2012

In Defense of R

This post has been planned for several weeks now. After I saw this presentation I was in the mood to finish it. Although I question the choice of John Cook given the other speakers at the conference (to my knowledge, he's not involved with the core R development), I'm not responding to anything he said, as everything here was outlined weeks ago. I had never heard of John Cook before this presentation. His description of R as "a strange, deeply flawed language" does capture the views of the critics.

Some of the criticisms thrown at R include: the syntax is inconsistent; the language is poorly designed - by statisticians, heaven forbid, rather than computer scientists; too much stuff was added to the language through time rather than being part of an initial grand design; and it is slow, slow, slow.

The last criticism is correct. Loops are slow relative to Fortran or C++, for sure, but that reflects the interactive nature of the language. The same is true for Python, Ruby, and a whole lot of other languages. A JIT compiler is on the way, I've read. There already is a JIT compiler, but I'm talking about one that gives 10x or 20x or better speedups for some tasks relative to the current version of R. Since I don't know a lot about it I'll let you use Google to find information if you're interested.

The other criticisms I've listed are not accurate descriptions of what the critics dislike. What they're really saying is "R is flawed because I tried to do X and the behavior was not what I expected" and "R is flawed because it does a lot of things I don't need and don't understand."

Not all criticisms can be put into one of those two categories. R's approach to OOP (S3 or S4) leaves something to be desired. The language offers no support for the tail calls that you get in a language like Clojure or Scala. Those are perfectly reasonable statements of areas in which R is lacking.

The source of most criticisms of the design of R is that the language does so much. Nothing else that I've tried comes close. There are a ton of packages available. You can see the R source code. You can get detailed information on the internals of R. The language supports functional, object oriented, and FORTRAN 66 programming styles. It's easy to interface with C++ and Fortran code when you need speed. The metaprogramming capabilities are astonishing.

And not to be lost is that this is all functionality of a statistical programming language. Statistical analysis is not something bolted on top of a language designed by computer scientists, or web developers, or someone who decided to improve the state of imperative programming. The core of the language is functions for analyzing data. You can load a dataset, do some manipulations, and estimate a probit model without loading any external packages. There are an enormous number of tradeoffs involved with writing a language that does everything I've described here (and I've only scratched the surface). I rarely see the critics acknowledging that - yet it is precisely the many things that R does that make it so much better than the alternatives.

Lists are a good example of something that seems "weird" to the C or Matlab programmer. They are very important in R. Someone who has programmed in Lisp already knows all about R lists, but most of the critics do not have that background, so they think lists are a flaw rather than a feature. Making matters worse is that the term "list" is used in different ways, so even if they've seen a list, it probably wasn't important to the language and wasn't used the same way.

In R you can do:

x <- list(x=rnorm(100),z="Blackbeard the Pirate")

Matlab programmers think R is an inferior program because they're used to jamming everything into a matrix or array form. They see someone using a list and can't understand why you'd use such a complicated, verbose solution. The Matlab solution is easier. You create the vector x and then write a comment so that you can keep track in your head that x corresponds to Blackbeard the Pirate. It's much less typing.

The Matlab programmer then sees data frame objects everywhere when using R. In Matlab, you store data in a matrix, so to get your data out you take a slice from a matrix. In R, a data frame is a list with elements that have equal length. Not understanding what R lists are, and not willing to spend time learning what a data frame is, the Matlab programmer gives up and says "R has a lot of useful functionality, but it's strange." And they're right. What they don't realize is that "strange" is a reflection of their limited knowledge, not a reflection of the poor design of the R language. It's not so easy to keep track of the index of the column holding each variable when you've got 600 variables.

That's a simplistic example, but one I've seen many times. Different == wrong. Going a little deeper, a lot of newbies coming from other languages don't realize that vectors hold more than just numbers and characters. A vector can hold NA values for missing data. A vector can have one element (a scalar). It can have zero elements. You can work with NULL objects in vectors. For instance, you can do this

c(1,NULL,2)

and you'll get back a vector with two elements. Or you can do this

c(1,NA,2)

and you'll get back a vector with three elements. I realize that these behaviors confuse newbies, but given just how helpful they can be when you need them (which is all the time for me), you come to view them as a feature of the language, not a flaw in the language's design.

A third example is

cbind(1:6,1:3)

The output is

     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    1
[5,]    5    2
[6,]    6    3

Sometimes recycling of elements gives a warning, sometimes it doesn't. If you try

cbind(1:6,1:4)

you get a warning message:

Warning message:
In cbind(1:6, 1:4) :
  number of rows of result is not a multiple of vector length (arg 2)

Recycling means R won't identify some of your bugs. When you need recycling, though, it is very nice to have.
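
For readers who have not met recycling before, here is a rough sketch of the rule in Python. The helper name cbind_recycle is made up for illustration, and this is not how R implements cbind. It only mimics the behavior shown above, assuming every column is non-empty: shorter columns are repeated to the length of the longest one, with a warning when the lengths don't divide evenly.

```python
import warnings
from itertools import cycle, islice


def cbind_recycle(*columns):
    """Bind columns side by side, recycling shorter ones (hypothetical helper)."""
    # Assumes every column has at least one element.
    n_rows = max(len(col) for col in columns)
    for position, col in enumerate(columns, start=1):
        # Mimic R: warn when a column does not fit a whole number of times.
        if n_rows % len(col) != 0:
            warnings.warn(
                "number of rows of result is not a multiple of "
                f"vector length (arg {position})"
            )
    # Repeat each column until it has n_rows entries, then pair up the rows.
    recycled = [list(islice(cycle(col), n_rows)) for col in columns]
    return list(zip(*recycled))


rows = cbind_recycle(range(1, 7), range(1, 4))
for row in rows:
    print(row)
```

Calling cbind_recycle(range(1, 7), range(1, 5)) still returns six rows, but it also emits the warning, which is the same tradeoff as in R: the result is usable, and it is up to you to notice the message.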

These are three examples of language design decisions, not language design flaws. Someone coming from another language may not anticipate that R works the way it does. That argument doesn't mean much to me. Everyone would laugh at me if I said the design of Java is deeply flawed because I've been using R for years and the OOP approach in R doesn't work in Java. I think it's no less silly to criticize the design of R because it doesn't do things the way you expect. Maybe there needs to be a better way to communicate to newbies what the language is doing. Maybe R should have a newbie mode with lots of warnings. Maybe it should come with better documentation. But to criticize the language design on that basis is absurd.

The reason R is heavily used is not just because of the existing packages. It's also not because statisticians don't know anything about software development. It's because R is a programming language for statistical analysis. You don't get that with SAS, SPSS, or Stata; what you get is a way to call prewritten functions and something that sort of resembles a programming language. You feel shortchanged when you use them. I've put much effort into fitting my problem into the limitations of those programs. In a few cases I even had to change what I was doing due to their limitations. Python, Java, C++, Ruby, or whatever happens to be the latest fashion are general programming languages. You can write yourself a library if you want, but that's far from what R offers.

Julia is the current flavor of the month. I looked at it, and when I can actually do something useful, maybe I'll give it a try. It's easy for a language to appear clean and elegant if it doesn't do anything useful. Use it for a year as a replacement for everything you do in R, then get back to me about how clean and elegant it is. How does it handle missing data? How does package management work? How's the documentation system? Clojure is my favorite language, but I tried to use it as a replacement for R, and I didn't get anywhere. And not just because of the existing collection of R packages. I was doing so much with Rserve that I finally realized I was better off going back to R.

R should get some credit for being the language that offers the right set of tradeoffs for many of us. The Bjarne Stroustrup quote, “There are only two kinds of languages: the ones people complain about and the ones nobody uses” comes to mind when reading criticisms of R.

Monday, March 26, 2012

Some people are genuinely too stupid to run a business

Here's the link.

And they were hoping to accomplish what, exactly? I've never heard of geeklist before, and I don't expect them to be in business very long. If I had given them funding, I'd be kicking myself.

Even if they thought they were right (they're not) you have to know how to pick your battles. Some people are just too emotional for the business world.

Sunday, March 18, 2012

Clojure with Python syntax

This post is of the brainstorming variety. It is a project that I might undertake in the future, when I have time.

Many programmers (not me, but apparently a large number) find Python to be a readable language, and Lisp to not be readable. You give up macros when using Python, but maybe that's not a big deal to many developers. Maybe if you're running a company with a lot of turnover, macros aren't a big deal. Or maybe you just don't understand or don't care about macros. I love them but they can be hard to get right.

So what if there was a language with Python syntax, complete with whitespace rules, that compiled to Clojure? My google-fu did not find any such projects. Clojure is a really good language even without macros. The code generated by the new language would be trivially interoperable with any other Clojure code.

It would have to make available everything that is in Clojure. I have no idea how much work it would be to:

Parse a Pythonesque language
Provide access to everything in Clojure core
Keep up with all new releases of Clojure
Write documentation
Produce Clojure code that is easy to read and modify (should not be difficult)

I don't know anything about designing a language, I don't know a lot about Python, and I don't have a solid understanding of many parts of Clojure. I wouldn't actually be implementing anything, and I would be borrowing the syntax of an existing language, but that doesn't mean it would be something that could be done in a weekend. I'd only call it a language because it technically would be a different language.

The goals would be:

Eliminate the Lisp syntax barrier to getting programmers to try Clojure.
Make Clojure acceptable to universities that want to avoid Lisp syntax.
Provide a stepping stone so that programmers can learn Clojure concepts, then when they want more control over their language, they can start writing Clojure without giving up any of their existing code.

Any constructive feedback, positive or negative, is welcome. I'd especially appreciate links to a project that already does what I'm proposing. And suggestions for the best way to proceed. I want to be clear that this is only something I've been thinking about the last couple months as a way to bring Clojure to the world, it would not compete with Clojure, and I haven't done any work on it.

Wednesday, March 14, 2012

What I learned showing Clojure to others

Here are a few things I've noticed when I've shown (or attempted to show) Clojure to others. I'm not saying that my sample (n=4) is representative, so if you think that is too small a sample size, there's no point reading further. I sat down at the computer to demonstrate to two, and had lengthy discussions with the other two.

1. (Not important) The name is unusual, and they've never heard of it. When I tell them that it runs on the JVM, you have access to all Java libraries, and that there is a large, helpful community with four - soon to be five - books published by major publishers, they view it as credible. Outstanding libraries written in Clojure, like Incanter, also help.

2. (Critical) After they realize it's not a toy language, the next question is ease of use. I was surprised that ease of use has nothing to do with the language itself. How can I make an executable that I can share with others? How can I access existing libraries? And it has to be something automatic, where you write the code, make the executable, and then you can run it by clicking the mouse or issuing a simple command.

This is where Clojure fails, big time. The claim is made that you don't need to know Java to use Clojure. That's false. You need a lot of knowledge of the Java build system, how to set the classpath, etc. True, you don't need to know much about Java the language, but you need to know a lot about Java the development environment. And you're not likely to learn much about it from the Clojure community, unfortunately, because it's expected you can learn it yourself. This is a big burden to impose on someone who wants to learn Clojure.

Counterclockwise helps a lot. It's dead simple to get started with a feature-filled Eclipse development environment. Got some Java libraries you want to access? Click a couple times to add them to your project. Want to use infix notation and a random number generator? Add the Incanter .jar executable in a couple clicks. Don't want to mess with the classpath? You don't have to.

Unfortunately I still haven't got the whole making a .jar figured out myself. There's still a long way to go in this area. Clojure has to free itself from the Java approach or it's only going to appeal to Enterprise Java developers looking for a good Lisp. The requirement that you understand and accept the Java approach was, to be honest, a dealbreaker. None of the four had much experience with Java (mostly by choice) and they were not interested in learning a whole new approach to software development plus Java.

3. (Sort of important) The parenthesis issue is really a non-issue. Whitespace is actually a bigger deal in Clojure than in Python. There's just so dang much indentation, and indentation is syntax, and there are so many levels of indentation in both directions in a single block of code.

4. (Very important) The typical approach to writing in any Lisp is to do a lot of nesting. "let" helps, but all the indentation I referred to above is due to nesting. You have to follow everything in a given line plus everything in indented lines below it to know what's going on. You don't usually see something like that in imperative code. Another thing is that using operators in prefix form is stupid. "+" is an infix operator. It should be possible to make infix operators the default.

5. (Hard to classify) They don't understand the importance of functional programming.

6. (Important) They like the idea of a customizable programming language. They customize everything else in their lives, so they were very open to the idea that you can customize your language. I think everyone would like to change something in their current language, so macros have appeal.

The most important thing I took away is that their evaluation of Clojure had almost nothing to do with the language itself, beyond the concept of macros, which they liked but didn't really understand. I think it's important to explain before you even start why the syntax is "funny". There's no reason to hide macros, they are a killer feature.

I realized that most of the world doesn't care all that much about programming languages. They are not interested in learning about new programming concepts, and if they've never thought about the basics of functional programming, they're not going to care if a new language supports it.

It's also obvious why Python is popular. If you like imperative programming, and you don't want to customize your language, Python can probably make you happy. I don't see Clojure having the same broad appeal any time soon.

Monday, March 12, 2012

A quick look at clojure-py

I played around with clojure-py this weekend. I jotted down some initial thoughts on the advantages of Clojure in pure Python in an earlier post.

Before I start, I want to point out that this is a very young project. My goal was only to get a feel for how things will work when there is a stable, well-tested release. I was impressed with what I saw and will happily await future releases. One to two years is a reasonable time frame for a production quality release on a project of this size. Users (myself included) have no reason to expect any specific functionality to work. We should expect bugs. I want to clarify these points because sometimes readers see something negative like "x doesn't work" and think I've written "Those incompetent developers haven't even implemented x. What a joke. Sigh."

I followed the instructions on the project wiki to install using easy_install. Consistent with the name, the install was easy. I started the REPL at the (Linux) command line using "clojurepy". I was greeted by

clojure-py 0.1.0
user=>

The starting point with any language has to be "Hello, World!":

(println "Hello, World!")

Okay, that's great, but what about something real? There are supposed to be 350 Clojure functions implemented, but I'm not sure which ones they are (reduce and ref are not).

Functions

Define a function:

(defn f [x y z] (+ x y z))

Call it:

(f 3 4.2 7.6)
(f 3, 4.2, 7.6)

clojure-py lets you use commas between arguments. The prefix notation is kind of ugly. Incanter offers infix notation, complete with operator precedence, so you could instead define f using

(defn f [x y z] ($= x + y + z))

Unfortunately ref is not yet implemented, so the Incanter infix library cannot be used. (I'm sure it could be made to work, but at this early stage it's wiser to let the language catch up.)

Multiple Arity

Let's try out multiple arity functions:

(defn sum
  ([x y]
    (+ x y))
  ([x y z]
    (+ x y z)))

(sum 1 2)
(sum 1 2 3)

That works. Now I can avoid prefix operators. This is not a functional solution, and it's pretty lengthy by Lisp standards, so let me use less code to sum an arbitrary number of arguments:

(defn sum [& stuff] (apply + stuff))

The [& stuff] indicates that the function takes a variable number of arguments, which will all be put into "stuff". (apply + stuff) applies the function "+" to all the elements of "stuff". Test it out:

(sum 1 2 3)
(sum 1 2 47.56 13 12 11 10 9)
(sum)
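
For anyone coming at this from the Python side, the variadic definition corresponds roughly to the sketch below. The name total is only for illustration (it avoids shadowing Python's built-in sum), and this is a comparison, not anything clojure-py generates.

```python
def total(*stuff):
    """Add up any number of arguments; with none, return 0."""
    # *stuff collects the arguments into a tuple, like [& stuff] in Clojure.
    return sum(stuff)


print(total(1, 2, 3))
print(total(1, 2, 47.56, 13, 12, 11, 10, 9))
print(total())
```

As with (sum) in the Clojure version, calling total() with no arguments is fine and gives zero.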

It works. clojure-py is clearly beyond the initial prototype stage; one could write complicated programs at this point.

Accessing Python Libraries

Enough with the pure Clojure stuff. It's nice to see that clojure-py is at such an advanced state but I can do all that and more on the JVM. The cool thing about clojure-py is that you can import existing Python libraries. That may not be helpful in a lot of areas (Java probably has more good quality libraries than any other language) but for numerical computing it is huge.

That includes scipy, Rpy2 to call R, and f2py to call Fortran, among many others. Presumably that also means you could call PyCUDA from clojure-py, adding GPU computing to the things you can do from Clojure. I can only imagine the possibilities for using Clojure for GPU metaprogramming. CUDA programming is not fun.

Python Standard Library

First I pulled some examples from the Python standard library tutorial. I created a namespace and added everything from the math library:

(ns tryclojure (:require math))

(math/cos 1.5)
(math/cos (/ math/pi 4))
(math/log 1024 2)

(random/choice ["apple" "pear" "banana"])

Oops, that last call throws an error. Let's add random to our namespace, but we want only two functions:

(ns tryclojure
  (:require [random :only [choice random]]))

(random/choice ["apple" "pear" "banana"])

(random/sample (py/range 10) 10)

(random/random)

The second example shows how to access functions like range that don't have to be imported into Python. You do not have to explicitly import them into clojure-py either.
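
For comparison, here is roughly the same session in plain Python, using only the standard library. It is a sketch of what the clojure-py calls above map onto, not output from clojure-py itself. Note that in Python sample has to be imported explicitly along with choice and random.

```python
import math
from random import choice, random, sample

print(math.cos(1.5))
print(math.cos(math.pi / 4))
print(math.log(1024, 2))  # log of 1024 in base 2

fruit = choice(["apple", "pear", "banana"])  # one element picked at random
shuffled = sample(range(10), 10)  # range is a builtin, so no import is needed
draw = random()  # float in the interval [0.0, 1.0)

print(fruit, shuffled, draw)
```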

Numpy

Now give numpy a try. I'd rather write np than numpy, so I can use the :as keyword.

(ns tryclojure
(:require [numpy :as np]))

(np/arange 10)

(* 3 (np/arange 10))

(def x (np/array [1 2 3]))
(def y (np/array [4 5 6]))
(* x y)

(def a (np/array [[1 2] [3 4]]))
(def b (transpose a))
(dot a b)

I'm using arrays rather than matrices, so matrix multiplication is done with dot. (* a b) is element-by-element multiplication. The numpy syntax requires only two arguments to dot, so if you want to compute a*b*b*a, you have to do (dot (dot (dot a b) b) a). You should avoid nesting like that unless you have a very good reason. Nesting like that is only a little easier to follow than GOTO statements in idiomatic FORTRAN 66.

Threading Macro

Let's see if the threading macro is available. Don't worry if you don't know what it is, but it does simplify things. An example is (-> 3.0 (/ 2)), which returns 1.5. It evaluates the first argument and uses it as the first argument of the next call.

(-> (dot a b) (dot b) (dot a))

I could also just use temporary variables in a let statement, which is verbose, but fine if you're a Clojure newbie or plan to share code with a Clojure newbie:

(let [temp-1 (dot a b)
      temp-2 (dot temp-1 b)]
  (dot temp-2 a))
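
If the threading macro is new to you, this Python sketch may help show what it is doing. Everything here is an assumption for illustration: thread_first is a hypothetical helper I made up (Python has no such builtin), and dot is a tiny pure-Python stand-in for numpy's dot on lists of rows so the example runs without numpy installed.

```python
import operator


def dot(a, b):
    """Matrix product of two lists of rows (stand-in for numpy.dot on 2-D arrays)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def thread_first(value, *steps):
    """Pass value through each step as that step's first argument.

    Each step is a tuple of (function, extra_args...), mirroring (-> value (f extra...)).
    """
    for func, *args in steps:
        value = func(value, *args)
    return value


a = [[1, 2], [3, 4]]
b = [list(col) for col in zip(*a)]  # transpose of a

nested = dot(dot(dot(a, b), b), a)                       # the hard-to-read version
threaded = thread_first(dot(a, b), (dot, b), (dot, a))   # same computation, left to right
half = thread_first(3.0, (operator.truediv, 2))          # 3.0 divided by 2

print(nested)
print(threaded)
print(half)
```

Both nested and threaded compute a*b*b*a; the threaded version just lets you read the steps in the order they happen.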

Conclusions

If you want to add Clojure syntax and basic functionality to Python, it looks like you've already got most of what you need. I hope clojure-py eventually implements all of JVM Clojure and then provides a way to pass messages from one implementation to the other. Great work by the developers.

Sunday, March 04, 2012

I don't think web developers should talk about programming

Web developers, I'll let you in on a secret. I know you've never heard this outside of your small circle of friends. It'll probably come as a shock.

Programming is not a subset of web development. It is possible to write programs that are not part of a website.

I know you won't believe me.

Today's award for "Headline I'd Never Write" goes to this gem:

What Every Programmer Should Know About SEO

No, I didn't read the article. The answer is "absolutely nothing". I'm not about to read something written by someone who worked his way through a Ruby on Rails tutorial and thinks he can tell me something interesting. Why not "What Every Shoe Salesman Should Know About SEO"? When you write R and Fortran code most of the day, SEO is just as relevant.

I don't even see why every web developer should be concerned about SEO. I know some folks who work on complicated web apps, but it wouldn't be very helpful for their work to know anything about SEO, because that's not their department.

Not that I haven't seen similar headlines many, many times before. I can't find the link, but there was a blog post a few months ago that said Scala and Clojure have nothing to offer because all programmers are web developers and web developers don't need to worry about concurrency.

Wednesday, February 29, 2012

Clojure in pure Python is a great idea

Edit: Here's the link

Not good enough for me to do it myself, but I'd be very happy if someone else did it. Why?

You'd give up Java interoperability, but you'd gain interoperability with native code. Python offers excellent C++ interoperability via Boost and other methods, as well as excellent (but limited by the constraints of C) interoperability with Fortran. Further, rpy embeds, more or less, R in Python. It would make Clojure much more suitable for numerical work. For that matter, you could call all of scipy from Clojure. Awesome.

Another reason is that it would reduce the amount of competition between languages. I hate language wars because they're a waste of time. I might prefer one language for a given task, but I use many, so using one doesn't mean you can't use another. Anything that allows you to write in both Python and Clojure is helpful - let the developer make the choice, even on individual sections of code.

Another big advantage is getting us away from Java. The build tools, the classpath, the .com.lengthy.verbose.wtf.on.and.on are inherited from Java, and seriously suck. I guess tail call optimization is possible.

So you could say I'm a fan, provided that it is possible to implement nearly all of Clojure, to the point that you can be confident that you will be able to write Clojure code for the JVM and expect it to run on the Python VM. It would be necessary to add functions such as Math/pow to get to that point. A lot of work, but definitely valuable. I hope it is viewed as an additional option for Clojure developers, not a threat to the JVM.

{Why am I writing this here? I put all comments on my blog. I don't have the time or interest to get into an internet shouting match.}

Wednesday, February 22, 2012

Preliminary thoughts on Scala vs Clojure

I've been using Scala for a little over a week now. Here are some preliminary thoughts on Scala vs Clojure.

(i) Scala is a better Java. Scala is still Java, in the sense that it is an OOP language requiring you to put a lot of effort into determining what is public and what is private. I've written about alternative approaches to OOP before [here, here, and here]. I don't want to go through the same discussion here (and don't claim to have fully grasped all the issues). I'll just say that I'm not smart enough to keep track of all the things you have to keep track of to do Java-style OOP. I can write pure functions that operate on data structures. I can't keep track of how all the data and methods and private etc. interact with one another. I couldn't do it when I tried Java and I can't do it now.

(ii) I miss macros. That surprised me because I don't write a lot of macros in Clojure. A lot of what I have done in Clojure makes use of Incanter, so it was just simple library calls. I don't even understand macros fully. Yet I miss them when they're not available. I spent a few minutes looking at Scala macros. That was enough.

(iii) Syntax is a double-edged sword. I'm a fan of "everything's a list". If anything, I'd prefer to cut down on the amount of syntax that ends up in Clojure programs. Yet "everything's a list" just can't be sold to others. Scala, on the other hand, probably has the best syntax I've seen. I'd imagine that Scala makes a very good first impression. I haven't talked to anyone else about Scala so I don't know.

(iv) Scala performance is better than Clojure's. It's too easy to write slow code in Clojure. The various benchmarks I've seen support this. There are a lot of good things I can say about Incanter, but you wouldn't choose it for performance. I don't mean this in a way to start a flamewar, it's just an observation, and I'd be happy for someone to point out how I'm wrong. By that I don't mean a list of ways to tune my Clojure code. Tuning is a PITA that should be done by the author of the library. If performance is critical, I can't see a reasonable argument for using Clojure rather than Scala.

(v) I do mostly numerical programming. The JVM is a serious drawback for both languages. Beyond that, I don't see much interest from the Clojure community in numerical computing. All I see is Incanter, which doesn't seem to be very active, doesn't offer great performance, and is far from a complete solution relative to R, Matlab, and Scipy. Clojure is focused on other areas, and that's fine, but this is my comparison of Clojure and Scala, and it's what matters to me. On the other hand, Scala is very active in this area. Five years from now Scala will probably be a strong competitor to Matlab, while offering a much better language. It would be interesting to compare Clojure against SBCL for numerical computing.

(vi) Both have adequate IDE support in Eclipse. Counterclockwise is pretty good. The Scala IDE is just unbelievable. You don't need to debug Scala code: you just look at the bottom of the screen to see if there are any errors. You can even understand Scala error messages.

(vii) I probably have a preference for static typing overall, but it depends on what I'm doing and my mood.

(viii) Clojure has some rough edges. The transition to 1.3 coinciding with changes to Clojure/core left me frustrated. I use Counterclockwise with 1.2 and the old Clojure/core. I value my time. I don't think Clojure is where it needs to be in terms of documentation. Also, I really, really wish that those who write Clojure documentation would understand that NOT ALL CLOJURE USERS ARE CURRENTLY EMPLOYED AS ENTERPRISE JAVA DEVELOPERS! I tried Java years ago, hated it, and moved on. I know nothing about Java. It took me quite a while to understand how the hideous "com.Enterprise.Verbose.For.No.Reason" thing works when using libraries. I had to look in a Java book to get the necessary background. I have had a much smoother experience with Scala. Maybe the difference is that I've learned the Java approach by using Clojure.

(ix) Clojure has an impressive set of books available. They all do a decent job of explaining the language. Yet none of them can compete with Programming in Scala by Odersky, Spoon and Venners. You learn a lot more than Scala in that book. I'd highly recommend it to any intermediate programmer, even those who don't plan to use Scala. There are excellent resources available for either language so I don't see this as a reason to choose Scala.

Conclusion: I like the Clojure language better. I'd probably be more productive in Scala because it has a stronger future for numerical computing. I could sell others on Scala, but probably not on Clojure, due to syntax. Neither is going to replace my current combination of R + Fortran for everyday work.

That's a lot more than I planned to write. I'll update as things come to mind or as I learn I was wrong.

Tuesday, February 21, 2012

I understand C pointer syntax, but it doesn't make sense

No, C pointer syntax does not make sense, even if the author of this post understands what it represents. Let me begin by saying that the concepts underlying C pointers are not difficult to understand. You don't even have to know how to program to understand such a simple concept.

The problem is that you have to memorize a bunch of rules that violate good programming language design to use pointers in C. It's such a poor design that anyone with an ounce of programming ability has to say to herself, "That cannot possibly be the correct syntax - you just don't write programs like that!" Then you get used to it, and even though it doesn't make any sense, you're used to it, so you can do it.

What's wrong with float *pf? Well, for one thing, you're all of a sudden using a non-alphanumeric character to name a variable. That's completely inconsistent with the rest of C syntax. It's acceptable for Lisp, but not for C, because * is not acceptable as part of a variable name.

The second thing that's wrong with it is that it implicitly defines another variable, pf. Actually, it also defines **pf and ***pf as well. You can define **pf as (sort of but not really) a matrix, and then that defines *pf for you. One of the first things you learn when programming in C is that you have to define variables before you use them. Except sometimes you don't. They decided to throw type inference into the mix when it comes to pointers. They cleverly hide the type inference part by talking about things like indirection and addresses.

The third thing is that it is just weird to say you can define *pf as a float, and if you drop the first character, you get a variable that's completely different. How about pf for the float and pf.pointer for the pointer?

The fourth thing, which some people think makes sense, is that you use *pf to define a float and then *pf to dereference a pointer. Yeah, that makes sense. They've overloaded the (potentially non-alphanumeric) first character of a variable's name. Bjarne Stroustrup's proposal for overloading whitespace was a joke. C pointer notation is much worse, and unfortunately it was not a joke.

C pointer syntax is Hungarian notation taken to an extreme. There is nothing good about it. If you're able to use C pointer syntax correctly after seeing it for the first time, you're just memorizing language rules without making any attempt to understand the underlying concepts, and you should think seriously about whether programming is for you.

Clearly the author of the blog post I linked above does have programming ability. That's why he struggled for so long. He now understands something that makes no sense.

Friday, February 17, 2012

Scala, clusters, and numerical computing

This post is mostly for my own reference, but I suspect that others might be interested as well. First, a brief explanation. In the C++ (i.e., native code) world, it is easy to set up a Linux cluster and run jobs using MPI. Searching for "Java MPI" is unlikely to inspire you to use Java or other JVM languages if you currently use MPI to run jobs on a cluster.

"Cluster" as I'm using it here refers to a group of different physical computers in different locations each working on different parts of the same job. This is usually called a "Beowulf cluster". Simulations can be easy to parallelize because each iteration is independent of all the others, and if you're doing 100,000 simulations on 100 processors, there is never a need for two processors to work on the same job.

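To make the "each iteration is independent" point concrete, here is a small Python sketch of the pattern. It is not MPI, akka, or Spark: the threads in a ThreadPoolExecutor are only standing in for cluster nodes, and the function names (run_chunk, estimate_pi) and the toy pi-estimation problem are my own assumptions for illustration. The structure is what matters: split 100,000 simulations into 100 chunks, give each chunk its own seed, and combine the results only at the end.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(seed, n_sims):
    """Run n_sims independent simulations and count points inside the unit circle."""
    rng = random.Random(seed)  # private generator, so chunks never share state
    hits = 0
    for _ in range(n_sims):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_sims=100_000, chunks=100):
    per_chunk = total_sims // chunks
    # Each chunk needs only its seed and its size; no chunk talks to another.
    with ThreadPoolExecutor(max_workers=8) as pool:
        hits = list(pool.map(run_chunk, range(chunks), [per_chunk] * chunks))
    return 4.0 * sum(hits) / (per_chunk * chunks)


PI_ESTIMATE = estimate_pi()
print(PI_ESTIMATE)
```

On a real cluster the pool.map call is the part a framework would replace; run_chunk would stay the same.
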
It's good to see that Scala has at least two options for running jobs on clusters:

akka - Also available for Java
Spark - Runs on top of Apache Mesos, which itself works with other languages such as Python, Java, and C++

I've not used either yet, but hopefully sometime in the next couple weeks I will have time to compare them. Here are some links to other numerical computing projects that I have or plan to check out.

scalanlp Contains some optimization routines, linear algebra, etc.
scalalab
JGAP for Scala*
Parallel Colt (Java)
Stochastic Simulation in Java (Java)
jblas (Java)
Rserve (Java)
Call Octave from Java (Java)
Call Scilab from Java (Java)
JPOP* (Java)
JavaNNLS* (Java)
Apache Commons Math (Java) Already part of scalalab, but Clojure or Java users might be interested
Mantissa (Java)
Java Genetic Algorithms Package* (Java)
Michael Flanagan's library* (Java)
Java Matrix Benchmark

* These might be good, but they are smaller projects for which I have little information about the author, so you probably want to investigate before using for anything serious.

Conclusion: the momentum for serious numerical computing on the JVM is with Scala. I'm more comfortable with Clojure than with Scala, but both Clojure and Scala are so much better than the alternatives that I'd be happy to work full time with either. Clojure isn't there yet, so let's see how far I can get with Scala. Most of the numerical library links are Java libraries, and will also work with Clojure.

Thursday, February 16, 2012

Some more Scala experimentation

Ended up with some free time to play with Scala again today. Like many programmers, I learn a language by doing the things I do in other languages.

Today's project was to try out jblas. I gained a greater appreciation of the language from just a few lines of code. The (simple) challenge: create matrix A and matrix B filled with random numbers, then multiply them. As a baseline, in R this can be done using three straightforward lines of code:

A <- matrix(rnorm(12),4,3)

B <- matrix(rnorm(12),3,4)

C <- A %*% B

Here is the Scala code:

object tryblas {
  import org.jblas.DoubleMatrix
  
  def main(args: Array[String]) {
    val A = DoubleMatrix.randn(3,4)
    val B = DoubleMatrix.randn(4,3)
    val C = A.mmul(B)
  }
}

Type inference means I don't have to specify that A is "new DoubleMatrix" - the language figured that out for me! It's more lines than the R program, but not by much. There's a little overhead in a Scala program, and you have to import the jblas library. The Scala IDE even allows you to see the inferred type of A, B, or C by placing the cursor over the name. Impressive! The Java approach for some reason seems reasonable when I use Scala.

Wednesday, February 15, 2012

Calling R from Scala

My main programming language is still R. Incanter, while a nice attempt, is definitely not anywhere close to allowing Clojure to be a replacement for R. The JVM is not suitable for the numerical programming I do, and I find myself spending too much time trying to figure out how to change the code to speed it up.

Thus, as I've started learning Scala, I want to make sure I've got the backup of easy interoperability with R. That's basically what I did when I started learning Clojure. Here is my attempt to do the same thing in Scala. I'm still a beginner, so I won't claim to be able to write good Scala code yet.

My preference when using JVM languages is Eclipse. For Scala, there is an excellent IDE available. I followed the usual instructions to install it. (There is plenty of information on the internet, but it's basically pasting the link they provide into Eclipse, clicking some buttons, and restarting).

I switched to the Scala perspective, started a new Scala project, and added a new file under src. I then started typing. I was really impressed with the Scala IDE. It's still early in this adventure, but based on what I'd read, I was expecting it to be a piece of garbage. It was incredible for the small things I did with it. I especially like how it points out errors while you type. It's just a pleasure to use.

To access R, I downloaded the two jar files that are made available for Rserve. I use Linux, so I opened R in a terminal window, did "library(Rserve)", then "Rserve()".

Back in Eclipse, I right-clicked on the project name, went to "Build Path", then "Add External Archives...". I chose Rengine.jar and then RserveEngine.jar. Those names appeared under "Referenced Libraries" below the project name.

After a few small mistakes, this is the program I came up with. (It's released under the GPL version 2 or greater if you're wondering.)

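object rs {
  import org.rosuda.REngine
  import org.rosuda.REngine.Rserve.RConnection

  def main(args: Array[String]) {
    // Connect to the Rserve instance started from the R session
    val c = new RConnection
    // Have R generate 10 standard normal draws
    val d = c.eval("rnorm(10)")
    // Convert the R result to a Scala Array[Double]
    val e = d.asDoubles

    for (i <- 0 until 10) {
      println(e(i))
    }
  }
}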

Edit: Scala has a better way to print an array than the C-style for loop that I used. You can replace the for loop with

println(e.mkString("\n"))

I chose "Run" from the menu, followed by "Run" again. After a brief wait (presumably for compilation), I got the following output:

1357939434863817

45168475066687624

8312101791389808

-0.8825102796935446

9597726735219193

5916220359121344

-0.8979526251577276

2217958737969706

-0.013414880990143055

021108802828942192

That's it! Super easy. I called R from Scala, had R generate 10 random numbers, shipped those random numbers back to Scala, and printed them out. I can now do anything with "e" that I would do if I had created the random numbers in Scala.

The only drawback is all the syntax. I'm resigned to the fact that most programmers think syntax is good.

Friday, January 20, 2012

stackoverflow needs tweaking

stackoverflow is a wonderful site. It's the first place I go when I have a question related to programming.

Unfortunately, I think it needs a few tweaks. My sample may be small, but lately (the last few months) I've been seeing too many cases where there was a cry of "Duplicate! Duplicate! Duplicate!" or even closing the question. That wouldn't be a bad thing if Question A was 'What does "Error: Variable not defined" mean?' and Question B was 'I got this error "Error: Variable not defined". What does it mean?'

The problem is that in many cases those voting to close are trolling rather than looking out for the site. In some cases it is probably just trolling by the immature, with too much time on their hands and too little common sense, but in others it is regular users having fun with power.

Here is one good example. The OP lists a couple of things he's been doing to debug his programs and asks for other suggestions. If that's not a perfect question for stackoverflow, nothing is.

Yet it was closed on the grounds that "We expect answers to generally involve facts, references, or specific expertise; this question will likely solicit opinion, debate, arguments, polling, or extended discussion." Good chance that the trolls voting to close have never written a line of R code.

Here is another example. It was reopened, but as user Dan Burton wrote in his comment, "I checked the profiles of the 5 people that closed this, I'm rather appaled that none of them have made a single contribution to the scala tag."

By the standard that seems to be in place right now, even the question I started with, where the user gives an error message, should be closed. There are multiple reasons you might get a certain error message. That leads to opinions rather than facts, because you'd have to make a judgement about the most likely reasons for a certain error to occur in a given language. It might also be interpreted as a question not about a programming language, but rather what is wrong with the programmer's training that would make him arrive at that error message. (If you think I'm stretching things, dig around on stackoverflow and look at the closed questions.)

My goal, though, is not to talk about these specific cases. Maybe they are terrible examples - whatever. My point is that the same thing happens again and again, and it is creating a hostile environment, particularly for new programmers. It used to be that I could recommend stackoverflow for someone learning a language, but I've stopped. It is very discouraging to ask what you think is a reasonable question and then be subjected to a public flogging. (Even when the question is not closed, the answers are harsher than a couple of years ago.)

Traditional trolling is bad enough, but at least you can delete or ignore those comments and get on with your business. This new-fangled trolling puts your own posts on the line, and the "closed" message is an official "you don't belong here" message, sent by stackoverflow.com, that the whole world can see. That's kind of hard to ignore, particularly if you used your real name.

Maybe one solution would be banning the close trolls for a week if they vote to close two questions in a 30-day period that are then reopened. Cries of duplication are not as easy to monitor, but they're more in line with traditional trolling, so I don't know that it is as much of a concern. Whatever the solution is, something has to be done to make the site friendly again.

Edit: Just came across this gem. Somebody named Michael Petrotta explained the reason for his vote to close: "@Jay: I was one of the people who voted to close. I thought you might be interested in my reasons. I don't program in C++, and I don't feel much, one way or the other, for the language or its adherents. I read your "question", and saw an attack on a project, with a lot of nasty name-calling. I read the answers so far, and saw a bunch of defensive, content-light replies. I didn't see any value in the question or its answers to date, and I voted to close."

So you've got someone who admits he knows nothing about C++, doesn't care about C++, and he's voting to close the discussion? And he's allowed to continue trolling?

Saturday, January 14, 2012

Use Counterclockwise or Clooj if you want to start with Clojure

It appears that I didn't do a good job of writing my previous post about Emacs and Clojure. I was trying to give a flavor of what someone new to Clojure would think when encountering "Emacs + Clojure is the best development environment". Seriously, getting it set up sucks. Really bad. I speak from experience, having tried to get someone set up using Clojure and Emacs recently.

What you should do is use Counterclockwise. It's wonderful. It's easy to set up. It has lots of features. It's a joy to program in Clojure when using Counterclockwise. That's what my friend is now using.

The other alternative is Clooj. That is truly the easiest way to get going. Just download and run. The disadvantage is that Swing apps don't look that great on Linux. I've gone through all the guides but the fonts still aren't what I want. To my knowledge it doesn't work with dual monitors. But if you want to get started quickly, you cannot possibly do better than Clooj.

Note: I'm not saying you shouldn't use anything else, just that these are awesome for beginners, and others may or may not be.

Friday, January 13, 2012

Is this a good introduction to Clojure?

So I wanted to help someone get started with Clojure today. For those who haven't done much with Java before, the incredible overhead associated with doing the most trivial tasks leads to a bad impression.

To make matters worse, I was trying to set up Emacs for use with Clojure. I'm not an Emacs noob. I used ESS as my main development environment several years ago, but was not impressed, and moved on.

[Edit: Used bold font to emphasize that I'm writing from the perspective of someone who decided he'd try Clojure using Emacs. My personal recommendation is to use Counterclockwise or Clooj.]

This is what someone new to Clojure would experience.

Go to the main Clojure page instructions.

"Install clojure-mode."

Don't know what that means, but I'll click the link.

"If you use package.el, you can install with M-x package-install clojure-mode. Otherwise you can do a manual install by downloading clojure-mode.el and placing it in the ~/.emacs.d/ directory, creating it if it doesn't exist. Then add this to the file ~/.emacs.d/init.el:"

Well, not being a total Emacs noob, I tried M-x package-install clojure mode. I got [No Match] in return.

What is clojure-mode.el? Where do I download it? Google found it for me. I downloaded it to the ~/.emacs.d directory. I added the lines to init.el (though I had to create that file, something the instructions didn't mention).

The vast majority of new Clojure users would have given up by now. Users should not have to figure things out for themselves - that is the purpose of documentation.

So let's move on. There's a discussion of how to set up package.el. Okay, that doesn't apply to me.

Then there's a discussion of clojure-test-mode. It says "This source repository also includes clojure-test-mode.el, which provides support for running Clojure tests (using the clojure.test framework) via SLIME and seeing feedback in the test buffer about which tests failed or errored. The installation instructions above should work for clojure-test-mode as well."

So do I need it? Is it something that most users install? I'm going to skip it. I can do tests when I get to that point.

Then there's something about paredit, which I hate, but it's recommended for all users.

Following that, there's a section titled "Basic REPL", which says, "Use M-x run-lisp to open a simple REPL subprocess using Leiningen. Once that has opened, you can use C-c C-r to evaluate the region or C-c C-l to load the whole file. If you don't use Leiningen, you can set inferior-lisp-program to a different REPL command"

The only way you'd have any idea what that means is if you knew beforehand. That may have set a record for the least informative paragraph ever written. I get tired when I read something like that. What is meant by "different REPL command"? Heck with it, I'm returning to the main page. {Note: this is explained on the main page, but the reader isn't told that.}

I created a new clojure file in Emacs (ending with .clj). There's no automatic indentation! How on earth do you write Lisp code without automatic indentation?

Eventually I got it sort of working, though I never got Clojure 1.3.0 to work. This is not a pretty introduction to Clojure. {I'm not saying it's my introduction to Clojure. It is the introduction to Clojure for anyone who reads "Emacs + Clojure is the best alternative".} There's a lot of room for improvement.

I recommend immediately eliminating all references to Emacs when talking about Clojure, with an exception for "Don't waste your time with Emacs if you want to learn Clojure."

Friday, January 06, 2012

The book the world needs

The world needs "Introduction to Functional Programming Using Clojure".

This is needed for at least two reasons. First, the existing books on Clojure require an understanding of basic functional programming. All of the currently available books do a good job of explaining the language. For the average Java or Python programmer, though, it would be tough to learn Clojure if the only resource were one of those books (or all of them, for that matter).

Second, "introductions" to functional programming are usually too academic. SICP, for as much as I love that book, is a book about computer science, not software development. Scheme will never, ever be a programming language embraced by industry. That doesn't mean there's anything wrong with Scheme, but it does mean that if you use Scheme, the perception will be that functional programming is not something you'd do with a "real" language, so I guess that is something wrong with Scheme.

I've been reading about Standard ML recently, and while I have learned a lot, I can't imagine anyone taking ML (SML or OCaml) seriously. I mean, in the sense that the perception would be any better than the perception of Scheme. Moreover, I've tried to read about Haskell, but it's hard to go more than about five minutes without falling asleep. There's just too much religion about pure functional programming. I can see why it appeals to mathematicians, and I can see the point of Steve Yegge's post on Haskell.

Even worse, there's discussion of type systems, which is an advanced topic. It's my belief that dynamic typing is better for beginners. Don't get me wrong: on technical grounds, the ML and Haskell crowds might be right. I'm not sufficiently informed to take sides in the debate. It's just a matter of perception, and ML and Haskell are non-starters IMO.

Clojure's not-so-secret weapon is the JVM. It can run any Java libraries, and thus it has instant credibility. Unlike Scala, it has dynamic typing. It has the coolness factor, too, because of all the concurrency-related stuff that makes it appear to be cutting edge. The language has an outstanding community with members that have - get this - people skills. The creator gives interesting talks. He is happy to incorporate academic ideas, yet comes at everything from a practical perspective, based on years of real world software development.

Perception matters. Clojure may be what those of us who believe in functional programming have been waiting for, in terms of acceptance. The fans of Scheme, Common Lisp, Haskell, ML, and all other functional languages should be happy about this. Clojure is the first functional programming language that doesn't have to worry about the perception problem. Once a functional programming language, as opposed to functional features in non-functional languages, gains acceptance, developers will ask "Which functional language should I use?" Then the door is open for all the other languages.

So back to the book. I promise I'll preorder it when it becomes available. You can even use my title.