Wednesday, July 04, 2012

Blog discontinued

This blog is no longer active. Some of my posts are still getting hits, so I'll leave it up for anyone wanting to use it as a reference, but I probably won't respond to comments. I place everything in every post in the public domain. You can use the information in any way you wish, with or without attribution, but if you do attribute something to this blog, it has to be an exact quote.

Monday, June 18, 2012

Not much to blog about

I haven't posted much related to programming lately. Looks like I've gone about as far as I can with Clojure, given the limitations of the JVM. It's really a shame because I'd love to use it more. There's no way around it, Java is simply not suitable for numerical computing due to the lack of libraries (for what I do, not necessarily what anyone else does) and the speed difference between Java and native for a handful of tasks like linear algebra. I've done a lot of research on it and concluded that Java will always be a good deal slower for some things (jblas may be a step in the right direction).

Maybe clojure-py is the answer. It offers a native Clojure solution. For now, due to time constraints, I've decided to set Clojure aside. This will most likely be the end of blogging about my Clojure and Lisp experiences. It's been a great journey and I hope that in the future I will be able to program in Lisp on a regular basis. I may occasionally comment on a piece of news in the Lisp world, but that's it.

Where will I go? The honest answer is that my current combination of R and Fortran works well. R is not a bad language if, like me, you believe in functional programming.

For a while, Julia was getting a lot of attention. It has potential. In a year or two it might be worthwhile to dig into it. It has a lot of features, but there's not a lot that's new, so I have no motivation to work with it now.

Haskell and OCaml would be great languages to learn for the sake of learning, but given my time constraints right now, I'm not sure the benefits would outweigh the costs. (F# is not going to happen either.) I don't see either as a realistic replacement for R.

How about Scheme, Common Lisp, or newLISP? Each has its uses, and I still use newLISP, but they also have drawbacks. Scheme documentation is crap, plain and simple. I know, Racket has good documentation of the language, blah blah blah, I've tried Scheme and I'm not smart enough for it. Common Lisp could be great, but a language that doesn't evolve is useless. I'll use the next revision of the Common Lisp standard (LOL) that has been updated for the 21st Century. newLISP is a scripting language, and I love using it in that capacity, but it will take more than that to replace R.

I'm not interested in Go. It's a better C. I have as much use for a better C as I do for a better Mandarin. If you don't speak Mandarin, and don't have a reason to speak Mandarin, a better Mandarin is not going to excite you.

I may give D another try. I like the language. It provides little in the way of numerical libraries so my best guess is that I would be better off with Clojure than with D.

That's where things sit. When I have something worth blogging about, I will do so.

Monday, May 14, 2012

Clojure with Python syntax, part II

A couple of months ago, I posted some thoughts I was having about writing a DSL that would make it possible to write Clojure without the s-expressions, the nesting, and the back-and-forth indentation style that is so common in Lisp programs. I said the following would be the goals of the project, if I were to do it:
  • Eliminate the Lisp syntax barrier to getting programmers to try Clojure.
  • Make Clojure acceptable to universities that want to avoid Lisp syntax.
  • Provide a stepping stone so that programmers can learn Clojure concepts, then when they want more control over their language, they can start writing Clojure without giving up any of their existing code.
The post was motivated by my attempts to get others interested in Clojure. I'm a true believer in Clojure - I don't think there's another language that offers anything close to the experience. It's just a well thought out language that was implemented by someone with a lot of programming experience. The meaning of the term "pragmatic" has changed in recent years to mean "excuse for hypocrisy", so I won't say Clojure is pragmatic, but it certainly is a practical approach to functional programming. I think it is the best choice for those learning to program as well as those who want to get work done.

That is why I was disappointed at the way Clojure's syntax, which is quite an improvement on the Lisps that came before, is still viewed as clumsy and hard to read. I've always loved the consistency of Lisp syntax, and have always felt that the advantages of consistency outweigh any of the complaints. Too bad that's not true for everyone.

I've come to accept that humans treat their first experience as the default. Once you've reached puberty, you can forget about speaking a foreign language without an accent, because you will always, to some extent, try to put the new language into the form of the old. An accent is nothing more than pronouncing the new language using the "idiomatic" form of your native tongue.

I see no compelling reason that Lisp syntax should be a barrier to using Clojure. If your "default" is that a language should look like Python, then you should be able to write Clojure in that form. You'd lose the power of macros, but how is that a problem for someone who doesn't know Clojure or any other Lisp? When they're ready to take the leap to start writing macros, they can start working with s-expressions.

I did some searching on Google before my previous post. I had hoped this would already have been done, but I couldn't find anything. Richard Gabriel and Guy Steele even have a section in this paper devoted to alternative Lisp syntax. The problem is that all of those alternatives are ugly, much worse than Scheme or Common Lisp, so it's not at all surprising that none of them took off.

Basically every similar project has the same fatal flaw. They're all trying to be Lisp without parentheses. In my experience, the objection to Lisp has little to do with parentheses. Even when you end up with something like this: )))))))), the problem is that you've nested eight things in one statement, usually with varying levels of indentation, so it's not really the eight parens that cause the grumbling. Python, Ruby, or C programmers are unlikely to have eight levels of nesting in their programs.

The thing to realize with alternative Lisp syntax is that you can't save macros or prefix notation. Clojure != macros. Clojure != prefix notation. Clojure does so much so well that there is no reason to fear the loss of macros. I know about languages like Dylan. Dylan is not in any way worthy of comparison with Clojure. As a matter of fact, Clojure with Python syntax wouldn't even be a Lisp anymore, and that's okay, because I don't really care about Lisp. I could still write my Clojure with parens, it wouldn't make any difference how others write theirs, since it would still be Clojure. I can sit in my basement on Saturday night as a hobbyist writing macros, but I'd rather use Clojure all day at the office, even if I'm not using Lisp.

I haven't had time to dig into it as deeply as I'd like. Here's what I have concluded on the basis of my research:
  • I don't know anything about writing a DSL, parsing, or code generation.
  • I'm an intermediate Clojure user, so I don't yet understand some of the features of the language, and I don't understand most of the language as well as I should.
  • I don't use Python much these days (I don't think I've written a line of Python code in the last year). I don't understand what is so readable about Python, it looks the same to me as any other language.
  • Nobody would care if I did write it. I could show it to the people I know, but being realistic, few people would use it.
Combining my current time constraints with the huge cost of learning everything I'd have to learn, and comparing with the benefits, I can't justify doing it. I nevertheless think it would be a worthwhile project for someone who has no kids and is looking for a way to kill some time.

Sunday, May 13, 2012

Thanks to StackOverflow? Not really, to be honest

I always find it interesting when I read praise being heaped onto StackOverflow and StackExchange. Let's see, they run a business, it's been successful, and most of the contributions are made by others. A fanboy even recently posted on Hacker News about how good it felt to get a signed letter from Joel Spolsky thanking him for his help. He should get a letter. He's put a lot of money in Joel's pocket. Let's not act silly, it was started as a business venture by Jeff and Joel, and if it hadn't worked, they'd have killed it right away, no matter how helpful it was.

Since my last blog post on stackoverflow, about out-of-hand closing of questions, I've come to another realization: I no longer use stackoverflow. Let me explain. I will search for a problem I'm having, and Google will regularly point me to stackoverflow. Almost without fail, if there's something helpful on stackoverflow, it has a date prior to 2011.

Why would that be? The last couple years there has been an attempt to close as many questions as possible. The questions that you are allowed to ask are all things that can be found in the documentation. For R, there's almost nothing that would be allowed on stackoverflow that you can't find by typing ?search-term. Even worse, a lot of the answers on stackoverflow are not good, because they're either noise written in an attempt to get points, or they're just wrong. There are exceptions, but answers usually only point you in the right direction, so you have to consult the documentation anyway.

The goal is to eliminate discussion. That means you have someone asking a specific question with a single, simple answer, and someone writes out that answer. The problem is that there are very few interesting programming questions that fit into such a format. "How can I debug a program in R?" is an example of a question that is considered inappropriate. It's very important, it's not easy for a new R programmer to figure out on her own, yet it's off-limits because it might result in someone stating an opinion. Something that can be posted is, "What is wrong with this call: x <- solve(y,z,linpack=TRUE)?" Snooze. It'll get four answers. Two will be general discussion, one will be an off-the-wall, totally wrong answer from someone looking for points, and one will be from someone summarizing ?solve. Yeah, not the world's biggest innovation.

These days I only visit the site to look at answers to questions that are at least a couple of years old. I don't have much reason to visit a website that is a substitute for looking things up in the documentation. Perhaps a new website, one that allows for interesting questions, will appear. The strategy worked for Jeff and Joel: they made a lot of money before they decided to strictly enforce the "I'm too lazy to read the documentation" limitation on questions. Hopefully someone else will take the idea and run with it.

Tuesday, April 10, 2012

clojure-py 0.2 is out

And now things are starting to get interesting. Multimethods are implemented. I would have a hard time using Clojure without them. Various other improvements are also present but that's the big one for me.

It's still much too early to think about putting it to use on projects you use to feed yourself. It's not too early to start writing software in it, so that you can find, report, and fix bugs. And documenting. And writing tests. And writing examples.

For those of us who whine about the JVM, it's time to start looking seriously at clojure-py. Please excuse me so I can go drink a beer to celebrate. As the saying goes, it's five o'clock somewhere.

Sunday, April 08, 2012

In Defense of R

This post has been planned for several weeks now. After I saw this presentation I was in the mood to finish it. Although I question the choice of John Cook given the other speakers at the conference (to my knowledge, he's not involved with the core R development), I'm not responding to anything he said, as everything here was outlined weeks ago. I had never heard of John Cook before this presentation. His description of R as "a strange, deeply flawed language" does capture the views of the critics.

Some of the criticisms thrown at R include: the syntax is inconsistent; the language is poorly designed - by statisticians, heaven forbid, rather than computer scientists; too much stuff was added to the language through time rather than being part of an initial grand design; and it is slow, slow, slow.

The last criticism is correct. Loops are slow relative to Fortran or C++, for sure, but that reflects the interactive nature of the language. The same is true for Python, Ruby, and a whole lot of other languages. A JIT compiler is on the way, I've read. There already is a JIT compiler, but I'm talking about one that gives 10x or 20x or better speedups for some tasks relative to the current version of R. Since I don't know a lot about it I'll let you use Google to find information if you're interested.

The other criticisms I've listed are not accurate descriptions of what the critics dislike. What they're really saying is "R is flawed because I tried to do X and the behavior was not what I expected" and "R is flawed because it does a lot of things I don't need and don't understand."

Not all criticisms can be put into one of those two categories. R's approach to OOP (S3 or S4) leaves something to be desired. The language offers no support for tail calls that you get in a language like Clojure or Scala. Those are perfectly reasonable statements of areas in which R is lacking.

The source of most criticisms of the design of R is that the language does so much. Nothing else that I've tried comes close. There are a ton of packages available. You can see the R source code. You can get detailed information on the internals of R. The language supports functional, object oriented, and FORTRAN 66 programming styles. It's easy to interface with C++ and Fortran code when you need speed. The metaprogramming capabilities are astonishing.
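As a small taste of the metaprogramming mentioned above, here is a minimal sketch using only base R. The variable and function names (expr, show_arg) are made up for illustration.

```r
# Capture an expression without evaluating it
expr <- quote(a + b * 2)
class(expr)                      # "call": code is an ordinary R object

# Evaluate the captured code later, supplying the variables in a list
eval(expr, list(a = 1, b = 3))   # 7

# A function can inspect the code it was called with, not just the value
show_arg <- function(arg) deparse(substitute(arg))
show_arg(x + y)                  # "x + y"; x and y never need to exist
```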

And not to be lost is that this is all functionality of a statistical programming language. Statistical analysis is not something bolted on top of a language designed by computer scientists, or web developers, or someone who decided to improve the state of imperative programming. The core of the language is functions for analyzing data. You can load a dataset, do some manipulations, and estimate a probit model without loading any external packages. There are an enormous number of tradeoffs involved with writing a language that does everything I've described here (and I've only scratched the surface). I rarely see the critics acknowledging that - yet it is precisely the many things that R does that make it so much better than the alternatives.
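To show what "without loading any external packages" looks like, here is a rough sketch. The choice of the built-in mtcars dataset, the derived kpl column, and the model itself are my own assumptions for illustration, not a serious analysis.

```r
# mtcars ships with R in the datasets package, which is attached by default
cars <- mtcars

# A simple manipulation: convert miles per gallon to km per litre (approximate factor)
cars$kpl <- cars$mpg * 0.4251

# Probit model for manual transmission (am = 1) as a function of fuel economy
fit <- glm(am ~ kpl, data = cars, family = binomial(link = "probit"))

coef(fit)       # intercept and slope on the probit scale
summary(fit)    # standard errors, z values, deviance
```

Everything used here (glm, binomial, summary, the dataset) is part of a stock R installation.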

Lists are a good example of something that seems "weird" to the C or Matlab programmer. They are very important in R. Someone who has programmed in Lisp already knows all about R lists, but most of the critics do not have that background, so they think lists are a flaw rather than a feature. Making matters worse is that the term "list" is used in different ways, so even if they've seen a list, it probably wasn't important to the language and wasn't used the same way.
In R you can do:

x <- list(x=rnorm(100),z="Blackbeard the Pirate")

Matlab programmers think R is an inferior program because they're used to jamming everything into a matrix or array form. They see someone using a list and can't understand why you'd use such a complicated, verbose solution. The Matlab solution is easier. You create the vector x and then write a comment so that you can keep track in your head that x corresponds to Blackbeard the Pirate. It's much less typing.
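For anyone who hasn't used lists, a minimal sketch of getting the pieces back out by name. The outer name obs is my own choice, to avoid having both a list and an element called x; set.seed is there only to make the draws reproducible.

```r
set.seed(1)
obs <- list(x = rnorm(100), z = "Blackbeard the Pirate")

length(obs)     # 2: one numeric vector and one character string
names(obs)      # "x" "z"
length(obs$x)   # 100
obs[["z"]]      # "Blackbeard the Pirate"
mean(obs$x)     # the data and its label travel together in one object
```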

The Matlab programmer then sees data frame objects everywhere when using R. In Matlab, you store data in a matrix, so to get your data out you take a slice from a matrix. In R, a data frame is a list with elements that have equal length. Not understanding what R lists are, and not willing to spend time learning what a data frame is, the Matlab programmer gives up and says "R has a lot of useful functionality, but it's strange." And they're right. What they don't realize is that "strange" is a reflection of their limited knowledge, not a reflection of the poor design of the R language. It's not so easy to keep track of the index of the column holding each variable when you've got 600 variables.
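A quick sketch of the "data frame is a list" point. The variables income and age are hypothetical, invented for this example.

```r
df <- data.frame(income = c(40, 55, 72), age = c(31, 45, 52))

is.list(df)          # TRUE: a data frame is a list underneath
length(df)           # 2: one list element per column
df$income            # pick a column by name, no index bookkeeping
df[["age"]]          # list-style access works as well
sapply(df, length)   # every element has the same length, here 3
```

With 600 variables, df$income still finds the right column, which is the whole point.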

That's a simplistic example, but one I've seen many times. Different == wrong. Going a little deeper, a lot of newbies coming from other languages don't realize that vectors hold more than just numbers and characters. A vector can hold NA values for missing data. A vector can have one element (a scalar). It can have zero elements. You can work with NULL objects in vectors. For instance, you can do this

c(1,NULL,2)

and you'll get back a vector with two elements. Or you can do this

c(1,NA,2)

and you'll get back a vector with three elements. I realize that they confuse newbies, but given just how helpful it can be when you need it (which is all the time for me) you view it as a feature of the language, not a flaw in the language's design.
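Here is a short sketch of the difference, in base R only. The names a and b are arbitrary.

```r
a <- c(1, NULL, 2)   # NULL contributes nothing
b <- c(1, NA, 2)     # NA is a placeholder for a missing value

length(a)              # 2
length(b)              # 3
is.na(b)               # FALSE TRUE FALSE
mean(b)                # NA: missingness propagates unless you say otherwise
mean(b, na.rm = TRUE)  # 1.5

length(5)              # 1: a "scalar" is just a vector with one element
length(numeric(0))     # 0: a vector with zero elements is still a vector
```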

A third example is

cbind(1:6,1:3)

The output is

      [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    1
[5,]    5    2
[6,]    6    3

Sometimes recycling of elements gives a warning, sometimes it doesn't. If you try

cbind(1:6,1:4)

you get a warning message:

Warning message:
In cbind(1:6, 1:4) :
  number of rows of result is not a multiple of vector length (arg 2)

Recycling means R won't identify some of your bugs. When you need recycling, though, it is very nice to have.
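The same recycling rule shows up in ordinary arithmetic. A minimal sketch follows; the tryCatch wrapper and the name msg are my own additions, used only to capture the warning text instead of printing it.

```r
1:6 + 1:3   # 2 4 6 5 7 9: the shorter vector is reused, no warning
1:6 * 2     # 2 4 6 8 10 12: a single number is recycled across the vector

# Lengths that don't divide evenly still compute, but R warns about it
msg <- tryCatch(1:6 + 1:4, warning = function(w) conditionMessage(w))
msg         # the text of the warning R raised
```

The second line is recycling you use every day without thinking about it. The first is the one that can hide a bug.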



These are three examples of language design decisions, not language design flaws. Someone coming from another language may not anticipate that R works the way it does. That argument doesn't mean much to me. Everyone would laugh at me if I said the design of Java is deeply flawed because I've been using R for years and the OOP approach in R doesn't work in Java. I think it's no less silly to criticize the design of R because it doesn't do things the way you expect. Maybe there needs to be a better way to communicate to newbies what the language is doing. Maybe R should have a newbie mode with lots of warnings. Maybe it should come with better documentation. But to criticize the language design on that basis is absurd.

The reason R is heavily used is not just because of the existing packages. It's also not because statisticians don't know anything about software development. It's because R is a programming language for statistical analysis. You don't get that with SAS, SPSS, or Stata; what you get is a way to call prewritten functions and something that sort of resembles a programming language. You feel shortchanged when you use them. I've put much effort into fitting my problem into the limitations of those programs. In a few cases I even had to change what I was doing due to their limitations. Python, Java, C++, Ruby, or whatever happens to be the latest fashion are general programming languages. You can write yourself a library if you want, but that's far from what R offers.

Julia is the current flavor of the month. I looked at it, and when I can actually do something useful, maybe I'll give it a try. It's easy for a language to appear clean and elegant if it doesn't do anything useful. Use it for a year as a replacement for everything you do in R, then get back to me about how clean and elegant it is. How does it handle missing data? How does package management work? How's the documentation system? Clojure is my favorite language, but I tried to use it as a replacement for R, and I didn't get anywhere. And not just because of the existing collection of R packages. I was doing so much with Rserve that I finally realized I was better off going back to R.

R should get some credit for being the language that offers the right set of tradeoffs for many of us. The Bjarne Stroustrup quote, “There are only two kinds of languages: the ones people complain about and the ones nobody uses” comes to mind when reading criticisms of R.

Monday, March 26, 2012

Some people are genuinely too stupid to run a business

Here's the link.

And they were hoping to accomplish what, exactly? I've never heard of geeklist before, and I don't expect them to be in business very long. If I had given them funding, I'd be kicking myself.

Even if they thought they were right (they're not) you have to know how to pick your battles. Some people are just too emotional for the business world.