October 05, 2004

Junk Statistics Alert

Kevin Drum is posting on this new paper that shows a link between Bush's approval ratings and Terror Alert warnings. When I clicked on Kevin's link to the paper I was taken to a...press release.

I personally consider the press release a strong indicator of junk science/statistics. It is not a perfect predictor, but when something is so "hot" as to warrant a press release, there is often somebody pushing the research for political reasons.

Looking at the paper's submission history, it appears to have been a rush job.1

  • Submitted: September 4, 2004
  • First Revision: September 28, 2004
  • Second Revision: September 29, 2004
  • Accepted: September 30, 2004
  • Published: September 30, 2004

Another funny thing here. Isn't it the Gallup poll that many Democrats were crying about as having too large a Republican sample? Here is a question: why wasn't this study done using more than one poll? Wouldn't the case have been even stronger if the effect showed up in several polls?

Another cause for concern is the use of regression models. Are standard linear regression models used? If so, I'm not sure that is the right model to be using. Bush's approval rating is a number between 0 and 1 (1 corresponding to 100% approval). Suppose Bush's approval rating is 47% and the model says the terror warnings boost it by 65 percentage points. Are we really to believe that Bush's new approval rating is 112%? A plain old vanilla linear regression model might be methodologically incorrect here.
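To make the arithmetic concrete, here is a minimal sketch of the 47% example. The 0.65 effect is the hypothetical number from the paragraph above, not an estimate from the paper, and applying the same number on the log-odds scale is purely my assumption for illustration. It only shows why a transformed scale cannot produce an impossible prediction.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a log-odds value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


baseline = 0.47  # approval rating from the example above
effect = 0.65    # hypothetical effect size, NOT taken from the paper

# Additive effect on the raw scale: nothing stops it from exceeding 1.
linear_prediction = baseline + effect

# The same number applied on the log-odds scale, then mapped back.
logit_prediction = inv_logit(logit(baseline) + effect)

print(f"linear scale:   {linear_prediction:.2f}")
print(f"log-odds scale: {logit_prediction:.2f}")
```

The first prediction lands above 100%, while the second stays strictly between 0 and 1 no matter how large the effect is.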

Another problem is that this is time series data, and one of the problems with time series data is autocorrelation2. There are tests for autocorrelation and remedial measures for solving the problem of autocorrelation when present. Good standard research, IMO, would be to conduct these tests to see if autocorrelation is present. As far as I can tell this was not done.
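As an illustration of what such a test looks like, here is a minimal sketch of the Durbin-Watson statistic, one standard check for first-order autocorrelation. The paper does not say which test, if any, was considered, so the choice is mine, and the two series below are simulated stand-ins for regression residuals rather than the paper's data. One caveat: Durbin-Watson is itself known to be unreliable when lagged dependent variables are among the regressors, so it is only a starting point.

```python
import random


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 means no first-order autocorrelation,
    well below 2 means positive autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def ar1_series(n, rho, seed):
    """Simulated stand-in for residuals: e_t = rho * e_{t-1} + noise."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series


dw_independent = durbin_watson(ar1_series(500, rho=0.0, seed=1))
dw_correlated = durbin_watson(ar1_series(500, rho=0.8, seed=1))

print(f"independent errors:    DW = {dw_independent:.2f}")
print(f"autocorrelated errors: DW = {dw_correlated:.2f}")
```

With independent errors the statistic sits close to 2; with strongly positively autocorrelated errors it drops well below that.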

So what is the big deal with autocorrelation of the error terms? Normally nothing big: the ordinary least squares estimators are still unbiased, consistent, and asymptotically normal, but they are inefficient (i.e., you'll be more likely to fail to reject the null hypothesis). That isn't horrible, especially if you have a fairly large sample. However, when you have lags of the dependent variable among the regressors, as some of the models in this research do, the ordinary least squares estimators are no longer consistent or unbiased. In that case all you have is asymptotic normality, which is fairly useless in terms of parameter estimation.
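A small simulation shows how bad this can get. This is a sketch under made-up assumptions (a toy model y_t = beta * y_{t-1} + u_t with a true beta of 0.5 and AR(1) errors with rho of 0.5), not a reconstruction of the paper's models. It regresses y on its own lag by ordinary least squares, once with independent errors and once with autocorrelated errors.

```python
import random


def simulate(n, beta, rho, seed):
    """Toy model: y_t = beta * y_{t-1} + u_t, with u_t = rho * u_{t-1} + noise."""
    rng = random.Random(seed)
    y, u = [0.0], 0.0
    for _ in range(n):
        u = rho * u + rng.gauss(0.0, 1.0)
        y.append(beta * y[-1] + u)
    return y


def ols_lag_coefficient(y):
    """OLS slope from regressing y_t on y_{t-1} (no intercept; the process has mean zero)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


TRUE_BETA = 0.5  # hypothetical value chosen for the illustration

estimate_clean = ols_lag_coefficient(simulate(20000, TRUE_BETA, rho=0.0, seed=42))
estimate_autocorr = ols_lag_coefficient(simulate(20000, TRUE_BETA, rho=0.5, seed=42))

print(f"true beta:                       {TRUE_BETA}")
print(f"OLS with independent errors:     {estimate_clean:.3f}")
print(f"OLS with autocorrelated errors:  {estimate_autocorr:.3f}")
```

Even with 20,000 observations the second estimate stays well above the true value, so a bigger sample does not rescue you.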

So why weren't these basic tests done? This looks to me like a rush job. The results were politically pleasing to the people at the journal and they wanted to get it out. I could be wrong, but I'm a bit suspicious right now.

Update: One of Kevin's commenters provides a link to a graph of Bush's poll approval numbers and terror alerts (here).

In looking at the graph, at least three of the orange alerts in 2003 came when Bush's poll numbers were on the upswing. The orange alert in 2002 coincides with a nearly simultaneous increase in Bush's approval ratings. One spike in Bush's approval rating actually occurs prior to the terror alert in 2003. Further, there are notable spikes in Bush's approval ratings when there were no terror alerts, so I am very skeptical of this paper. It is looking more and more like junk.

Another thing that makes me suspicious is the lack of data being available on the journal's website. Can't they pop a simple text flat file up there for others to download and look at the data?
_____
1In looking at other papers accepted by the journal, this appears to be the norm: a few days' turnaround from when the referee comments are sent back to acceptance to publication. This paper is notable in that there was just one day between receiving comments, revising, and resubmitting, then another day for publication. Comparing this to some of the papers in NAJ Economics, another online journal, we see a rather stark difference. The first article on the page linked above was first presented at a seminar in 2001. The second paper started in February of 2002.
2Autocorrelation is when the error terms in the regression equation are correlated over time.

Posted by Steve at October 5, 2004 12:59 PM
Comments

Well, it's obvious that the approval ratings caused the terrorism warnings. Isn't it?

Posted by: Slartibartfast on October 5, 2004 02:10 PM

Would Generalized Least Squares be good enough to deal with the autocorrelation problem? My econometrics is a bit rusty.

Posted by: Timothy on October 5, 2004 04:09 PM

Nope, and even so the paper does not use GLS or Feasible GLS (FGLS), or at least it does not indicate that it does. What is needed is an instrumental variables approach. Again, this is not mentioned in the paper.
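For the curious, here is a rough sketch of the instrumental variables idea on simulated data. Everything in it is an assumption made for illustration: a toy model y_t = beta * y_{t-1} + gamma * x_t + u_t with an exogenous x, AR(1) errors, and the lagged x used as the instrument for the lagged y. It is not the paper's model or data.

```python
import random


def simulate(n, beta, gamma, rho, seed):
    """Toy model: y_t = beta*y_{t-1} + gamma*x_t + u_t, with AR(1) errors u_t."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y, u = [0.0] * n, 0.0
    for t in range(1, n):
        u = rho * u + rng.gauss(0.0, 1.0)
        y[t] = beta * y[t - 1] + gamma * x[t] + u
    return x, y


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ coef = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def estimate(x, y, use_iv):
    """Return [beta_hat, gamma_hat] from solving (Z'X) coef = Z'y.
    Regressors X are (y_{t-1}, x_t). For OLS, Z = X.
    For IV, x_{t-1} replaces y_{t-1} in Z."""
    a, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
    for t in range(1, len(y)):
        regressors = (y[t - 1], x[t])
        instruments = (x[t - 1], x[t]) if use_iv else regressors
        for i in range(2):
            b[i] += instruments[i] * y[t]
            for j in range(2):
                a[i][j] += instruments[i] * regressors[j]
    return solve_2x2(a, b)


TRUE_BETA = 0.5  # hypothetical value chosen for the illustration
x, y = simulate(20000, beta=TRUE_BETA, gamma=1.0, rho=0.5, seed=7)
beta_ols = estimate(x, y, use_iv=False)[0]
beta_iv = estimate(x, y, use_iv=True)[0]

print(f"true beta: {TRUE_BETA}  OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

In this setup OLS overstates the lag coefficient, while the IV estimate lands close to the true value because the lagged x is unrelated to the error term.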

Posted by: Steve on October 5, 2004 04:14 PM

"Another cause for concern is the use of regression models. Are standard linear regression models used? If so, I'm not sure that is the right model to be using."

It's not. They should have gone with Logistic in order to properly constrain the dependent variable between 0 and 1. Looks kind of slapdash...

Posted by: Bernard Guerrero on October 6, 2004 05:03 AM

This paper is pretty terrible.

With respect to Bernard's comment, logistic (and probit) models work when the underlying data structure is binary (or at least discrete, in the case of multinomial logistic models); that could work if the author had individual-level data (which, in this case, he does not). Here, the dependent variable is restricted to be *between* 0 and 1 (the dependent variable is a probability), so it becomes censored at those endpoints. Therefore, the closest canned package model would be a Tobit or some such, I believe.

Given the fact that Bush's ratings were pushing 90%+ for a while, I would imagine this would become quite an issue in his model specification.

But this paper isn't really worth any thoughtful effort. I don't think it's a political hack job; it's just someone mindlessly using OLS to either get tenure or pretend to be active in order to get better job reviews and salary increases. Notably, it is a University of Iowa prof. publishing in a University of Iowa journal, right? Maybe they just needed filler, if you know what I mean.

Although I could argue that blank pages might be more enlightening . . .

Posted by: Victor on October 6, 2004 07:20 AM

OK, the econometrics are pretty bad. Including the lags, one must estimate some sort of IV model to do it properly. Steve already pointed this out.

A bigger problem is omitted variables which are correlated with the residuals (here are more IVs, Steve). Employment growth, the unemployment rate, the Democratic primaries, and the time since 9/11 all have some sort of large effect, the latter particularly since it introduces serious nonstationarity. Using "economic approval" ratings is a cop-out. There are more data out there, not just Gallup.

This is something that I'd expect from a mediocre undergraduate.

Posted by: Chris on October 6, 2004 08:21 AM

Victor,

That was my thought too, a Tobit model. I'm not sure a logit model is quite right either, as that is for discrete dependent variables (dependent variables that take on 0 or 1, for you non-stats geeks).

Chris,

Good point on the omitted variables bias as well.

Posted by: Steve on October 6, 2004 09:28 AM

Unfortunately, this kind of crap impresses tenure committees in the debased universities we have today.

Posted by: Robin Roberts on October 6, 2004 03:44 PM