Data Analysis in Python, the Literate Way
Our lab uses Mathematica quite a bit for data analysis and building models. There aren’t that many other people in psychology at NYU (or elsewhere) who use Mathematica. Part of the reason is the large number of libraries that exist for Matlab and specifically help with fMRI analysis or experiment design. I guess Mathematica works particularly well for the specific kind of work we do. Two of its key advantages are the interactive notebook and the high-quality, flexible graphics system.
For those who haven’t used it, the Mathematica notebook allows you to combine text, code, and figures/plots in a single multi-media document. This is very helpful for building up and testing complex bits of code and for exploratory data analysis. Rather than cutting and pasting bits of code from a text editor into an interactive interpreter (as in Matlab, Python, or R), the notebook allows code and graphics to coexist in line with one another. It turns out it is very helpful to keep the graphics from an analysis tied to the code that generated them (along with plain text describing them). For example, if I write some code for a data analysis, I can have the plot of the data appear directly below the code itself in the notebook. Later, when I’m going back through the analysis (perhaps weeks or even months later), I can more easily make sense of what goes with what. It’s basically just a more logical way to relate the _outputs_ of a computation to the code that generated them (i.e., “literate programming”).
Anyway, as great as Mathematica is, there are a couple of important drawbacks.
First is that Mathematica isn’t free (as in beer or as in speech). This isn’t such a big deal for us (NYU has a university-wide academic license which keeps the cost lower). However, it is hard to involve undergraduates in the research process since it might be too costly to buy licenses for all of them. I’m also reluctant to ask students in a class I’m teaching to shell out for a license (even though Wolfram Research has made this easier lately via their per-semester licensing deals). In addition, I worry that my psych undergrad or grad students are less likely to encounter Mathematica again (whereas they are somewhat more likely to run into R, SPSS, Matlab, or Python).
A second disadvantage to Mathematica is that it isn’t always as fast as something like Matlab or Python. I’m not entirely sure why that is (some matrix computations are optimized), but it likely has to do with the very powerful symbolic computation tools that Mathematica provides. In many cases, those tools make programming much easier. However, once a complex model or simulation is set up, we often find it is more effective to translate it into Python, which runs fast enough for most of our work.
Finally, the Mathematica programming language is pretty old. It doesn’t have very clean object-oriented design patterns and the syntax can be a bit obtuse. For example, compare these two statements which do the same thing, one in Python and one in Mathematica:
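    # Python 3 (print is a function); loops over 1..10 to match the Mathematica version
    for i in range(1, 11):
        if i > 5:
            print("This number is greater than 5", i)
        else:
            print("This number is not greater than 5", i)

    (* Mathematica: Module needs a list of local variables as its first argument *)
    myfunction := Module[{i},
      For[i = 1, i <= 10, i++,
        If[i > 5,
          Print["This number is greater than 5 ", i],
          Print["This number is not greater than 5 ", i]]]]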
The bottom line is that Python’s language is cleaner, it is more contemporary, it runs faster, the number of available libraries is immense (at least equal to, if not exceeding, the functionality in Mathematica), and it is free/open source. We use Python internally for all our experiments (check out our simple API for developing psychology experiments, PyPsyExp).
Given all this, wouldn’t it be great if Python had a notebook interface?
Well, recently, the possibility of leveraging some of the benefits of Mathematica’s “computational notebook” framework in Python has emerged (thanks to Jay Martin for telling me about this!). In particular, IPython (an “enhanced” Python shell) has added a web-based notebook framework. I’ve been playing with the bleeding-edge version on GitHub lately and I’m impressed (thus, this blog post!).
The basic idea of the system is that you launch a small web server running on your computer (using the command ipython notebook --pylab inline). Then, you point your favorite browser (I’ve found things work very well in Chrome) at the local URL it prints out (e.g., http://127.0.0.1:8888). From there, the web application serves up an interactive notebook instantiated as a web page. It might not seem like a web-based interface would be very useful, but advances in AJAX have enabled complex, fully dynamic applications that run in your browser (think Facebook or Google Docs).
The current notebook format feels quite a bit like Mathematica’s notebook interface. There is the concept of a “cell”, which links a bit of executable code to the resulting output. Cells can also hold plain text, Markdown, LaTeX, or other types of text. In addition, a system has been worked out for showing graphics from pylab/matplotlib, perhaps the most ubiquitous data plotting library for Python.
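To make the cell idea concrete, here is a minimal sketch of the kind of code I might put in a single cell. The data are made up (a damped cosine standing in for a real measurement) and the variable names are just for illustration. It assumes matplotlib is installed; when the notebook is started with inline plotting, the figure should show up directly below the cell.

```python
import math

import matplotlib.pyplot as plt

# Hypothetical data: a damped oscillation standing in for a real measurement
xs = [0.1 * k for k in range(100)]
ys = [math.exp(-0.3 * x) * math.cos(2.0 * x) for x in xs]

# Build the figure; with inline plotting the notebook renders it under the cell
fig, ax = plt.subplots()
ax.plot(xs, ys, label="damped cosine")
ax.set_xlabel("time (arbitrary units)")
ax.set_ylabel("response")
ax.legend()
```

Keeping the data generation and the plotting in the same cell is what ties the figure to the code that produced it.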
However, at the current stage, Mathematica’s notebook format is still much more refined. For example, in IPython notebooks you can’t change the color of cells, can’t collapse/hide chunks of cells/code at a time, and can’t execute multiple cells (or groups of cells) at once. In addition, since Mathematica has a much more robust graphics system, it is easier to export the resulting graphics files for “clean up” in Illustrator. Since IPython Notebook is a web-based app, all graphics are converted into something like .png files for display, which are harder to edit afterwards (although, of course, you can use matplotlib to write to a file on your local disk). Despite these limitations, development of IPython Notebook seems active (at least judging by the discussion on GitHub), and I’m sure many of these things will be addressed as time goes on.
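As a rough sketch of that last workaround, the snippet below writes a figure to disk as PDF and SVG using matplotlib’s savefig, which picks the output format from the file extension. Both are vector formats, so they should be easier to touch up in Illustrator than a .png. The helper name save_vector_copies, the file name, and the placeholder data are my own inventions for illustration, and I write to a temporary directory only to keep the example self-contained.

```python
import os
import tempfile

import matplotlib.pyplot as plt


def save_vector_copies(fig, basename, out_dir):
    """Save fig as PDF and SVG in out_dir and return the written paths.

    Hypothetical helper for illustration; not part of matplotlib or IPython.
    """
    paths = []
    for ext in ("pdf", "svg"):
        path = os.path.join(out_dir, basename + "." + ext)
        fig.savefig(path)  # output format is inferred from the extension
        paths.append(path)
    return paths


# Placeholder data, only here so there is something to draw
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])

out_dir = tempfile.mkdtemp()  # swap in a real folder for actual use
paths = save_vector_copies(fig, "example_figure", out_dir)
```

In practice I would replace out_dir with a project folder and call the helper at the end of whichever cell builds the figure.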
p.s. This is a great link for getting it set up on Mac OS X Lion: http://minrk.posterous.com/install-ipython-qtconsolenotebook-on-osx-lion.
UPDATE: See also this page which I will be updating for my course with install instructions for various operating systems. Also, Fernando Perez (original author of IPython) shared this link about the history of the project and this link about scientific Python.