Software Signals

Posted on January 7th, 2013

This blog post by Sean Taylor generated quite a stir. He discussed the signals one sends by using certain software packages and seems to think that R users are more competent. The reactions ranged from amusement to bashing.

While I don't think this type of post is particularly useful, it is fun (especially the John Myles White line), so I'm writing up my thoughts on the issue.

For better or worse, I think the software one uses sends a signal.

I've heard others apply the same arguments to typesetting programs. LaTeX and Beamer, for example, are said to send a "technically competent" signal compared to Word and PowerPoint. I admit that I am vulnerable to these signals, although I don't use Beamer myself.

R and Stata are the software packages that I run into most often in political science, and I certainly have stereotypes of their users, but the differences are a matter of style rather than competence. (These are just my stereotypes.)

  • R users are more likely to be interested in graphics and simulation (e.g., Gelman and Hill). R users are also more likely to care about statistical programming. This is how I first became an R user. The first paper that I wrote as a graduate student required a lot of simulation and a few custom graphs, and I needed to do a little programming to get these right. I think programming is a powerful tool and that graphics and simulation are really important for communicating results. Because I think R users are more likely to emphasize these things, I update upward slightly on R users. That said, there are plenty of terrible methodologists who use R.
  • Stata users are much more interested in estimating fancier econometric models (e.g., Cameron and Trivedi). Compared with R users, they put less emphasis on model-checking methods such as cross-validation and place more value on complicated models (e.g., bivariate probit with partial observability). Since I like model checking and think complicated models are over-used (or at least over-trusted), I tend to update downward slightly on Stata users. That said, there are plenty of great methodologists who rely on Stata.

I don't run into users of other software much in political science, but I do in the statistics department. (Again, these are just my stereotypes.)

  • Matlab users work on more theoretical problems. By that I mean building and evaluating new estimators and methods, not proving theorems.
  • SAS users care about analyzing data. They work on real-world problems, probably for a drug company.

I use both R and Stata.

I rely mostly on R in my research. I occasionally use Stata for two purposes.

  1. Recoding data. Whenever I work with huge chunks of data (especially survey data), Stata offers a really useful set of commands for cleaning it up.
  2. Maximizing a difficult likelihood. Sometimes I'll have a custom model for which regular optimization algorithms (e.g., BFGS) fail. In this situation, I use a little magic found in Stata's ", difficult" option. I don't quite understand why it works so well, but it is relentless. It is the single best feature of Stata.

I don't update much on users' competence.

While I do update on the methodological style of software users, I don't think I update much (if at all) on their competence. Here are some statements from Taylor's post that I disagree with.

  • "When you don’t have to code your own estimators, you probably won’t understand what you’re doing." I think that many people don't code their own estimators (and couldn't easily start), but still understand what they are doing. I also think that plenty of people who do code their own estimators have no clue.
  • "When operating software doesn't require a lot of training, users of that software are likely to be poorly trained." I'm sure that researchers who don't want to learn statistics are much less likely to want to learn software beyond point-and-click, but I think that most people who are using any software and writing about it to the public are not "poorly trained."
  • "Researchers who care about statistics enough should have gravitated toward R at some point." I spent three years in the statistics department at Florida State. People over there care about statistics and most use something other than R. I've also met plenty of political scientists who care about statistics and use Stata exclusively. I do think that those who care about certain styles of analysis (e.g. graphs, simulations, and programming) are likely to be drawn to R, but I don't think it's universal.