Monday, March 29, 2010

Slightly missing the point

I am, of course, pleased that Nature Materials ran a news item about our recent paper.  However, they appear to have missed the point a bit.  What we were trying to point out was that plotting data in scaled coordinates (current normalized by temperature to some power vs. voltage normalized by temperature, in this case) can be misleading.  In this particular case, plotting temperature-independent data in this way on a log-log plot can make it look like the data all collapse onto some universal curve (with deep implications).  In fact, the data themselves aren't doing any such thing - the apparent collapse is due to a flawed plotting procedure.  Ross McKenzie got this point immediately.  Ahh well.  Bottom line: be very careful when plotting "scaled" quantities, to make sure that you're not biasing yourself toward a particular conclusion.

6 comments:

Don Monroe said...

I've never much liked scaling plots. They seem very compelling when they work, but they completely bypass or subvert our normal mental habits for evaluating the quality of a fit. When faced with noisy data, we naturally look for the trend that underlies the noise. In a scaling plot, the information that tells you whether it's working is the "noise."

Actually, it's worse, because you have to look carefully at each data set to see if that set on its own really wants to be part of that trend. Presumably the scaling has ensured that the central data are on the trend line, so the only information about the quality of the fit is whether the data sets at the extreme high or low values of the parameter have a systematic tendency to deviate from the trend. Unless some data sets cover a very wide range, it won't be at all obvious.

The worst example I saw was when a submitted paper tried to present a scaling plot in which the different data sets were all represented with the same symbol! This made it completely impossible to evaluate whether individual sets agreed with the trend. Of course the same problem occurs when the symbols are all small and the same color.

Douglas Natelson said...

Hi Don - Your second paragraph really hits the nail on the head. The problem with the paper that we were addressing is that they have data sets taken at different temperatures, but each set extends only over a limited voltage range. Over that limited range, each data set is a decent (nonlinear) power law, and at sufficiently low T, each data set is not changing much at all from temperature to temperature. On a log-log plot, each data set looks like a little snippet of straight line. As T is decreased, because of the scaling variables, successive line segments are shifted up and to the right on the graph. If the scaling is chosen correctly, all the little line segments line up, and it looks like scaling, even if (and this is the main point) the data themselves are not changing with T at all.
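To make the effect concrete, here is a minimal Python sketch under stated assumptions. It takes a single temperature-independent power law, I = C V^beta, sampled over the same limited voltage window at every temperature, and forms the scaled variables x = V/T and y = I/T^beta. Algebraically y = C x^beta for every T, so the segments must line up no matter what. The exponent, prefactor, voltage window, and temperatures are made-up illustrative values, not numbers from the paper, and the function names are hypothetical.

```python
import math

def scaled_segments(temps, v_min=0.1, v_max=1.0, beta=2.0, prefactor=1.0, n=20):
    """Build scaled (V/T, I/T**beta) points for each temperature.

    The underlying I-V curve, I = prefactor * V**beta, is identical at
    every temperature (an illustrative assumption, not measured data).
    """
    segments = {}
    for T in temps:
        points = []
        for k in range(n):
            # Log-spaced voltages over the same limited window at every T.
            V = v_min * (v_max / v_min) ** (k / (n - 1))
            I = prefactor * V ** beta  # no temperature dependence at all
            points.append((V / T, I / T ** beta))
        segments[T] = points
    return segments

def loglog_fit(points):
    """Least-squares line through (log10 x, log10 y).

    Returns the slope and the largest absolute residual, in decades.
    """
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    max_resid = max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
    return slope, max_resid

temps = [1.0, 3.0, 10.0, 30.0, 100.0]  # arbitrary units, illustrative only
segments = scaled_segments(temps)
all_points = [p for pts in segments.values() for p in pts]
slope, max_resid = loglog_fit(all_points)

print("log-log slope of the combined 'collapsed' curve:", slope)
print("largest deviation from a single line (decades):", max_resid)
for T in temps:
    xs = [x for x, _ in segments[T]]
    print("T =", T, "covers V/T from", min(xs), "to", max(xs))
```

Each temperature contributes only one decade of V/T, but the lowest-temperature segment sits furthest to the right and the combined curve spans three decades with essentially zero scatter. The "collapse" here comes entirely from dividing by T, since the input has no temperature dependence at all.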

Anonymous said...

The data set in question is from Heeger's group? Why should anyone be surprised......

DanM said...

Irrespective of such contradictory conclusions, I'll still read your blog, Doug.

Tobias said...

Now, now, anonymous name-calling? That's not good style.

Anyhow, my two cents on this is that scaling or not, you always have to be very careful trying to extract data with a model in mind. For our recent extraction of the phase diagram of the 1D imbalanced Fermi gas, we initially had a model that had enough assumptions about what the final diagrams should look like in it, that you could feed it junk and it would produce something that might not even look too bad.
It's amazing what you can "find" if you know what you're looking for. I had previously not appreciated how important it is, bearing in mind for example the recent post here about CDMS, to have well defined conditions about what you will accept as "events" or data w/o knowing what your data will look like.
I remember seeing someone quoted as saying "The most important discovery in the [medical] sciences in the last century was not Penicillin, but the double blind test". I'm not sure that is that much of an exaggeration.

cccwind said...

sometimes, data fitting and scaling can be very tricky