• ldobe

It sounds to me like this is a problem of characterizing a highly chaotic system. Earlier models were good at finding the maxima and minima in the system, but since we don’t have the ability to measure every known and unknown factor at play with arbitrary precision, the more we hone our models, the greater the impact unknown factors have on them.
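The measurement-precision point shows up in even the simplest chaotic systems. Here’s a minimal sketch using the logistic map (a textbook stand-in, not the models under discussion): two starting conditions that differ by one part in a billion produce completely unrelated “forecasts” within a few dozen steps.

```python
# Logistic map with r=4: a standard example of a chaotic system.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

# Two "measurements" of the same initial state, differing by one part in a billion.
a, b = 0.2, 0.2 + 1e-9

max_gap = 0.0
for step in range(50):
    a, b = logistic(a), logistic(b)
    max_gap = max(max_gap, abs(a - b))

# Both trajectories stay inside [0, 1], but the tiny initial error roughly
# doubles each step until the two predictions are effectively unrelated.
print(max_gap)
```

The error growth is the point: no amount of refining the model helps once the uncertainty in the initial measurement has been amplified to the size of the system itself.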

Sort of like the ladder of pure sciences: sociology is applied psychology, psychology is applied biology, biology is applied chemistry, and chemistry is applied physics. But that doesn’t mean sociological phenomena can be accurately or reliably predicted using fundamental physical theory (yet).

• Keisar Betancourt

also politics is applied sociology, and philosophy is applied politics? something is going wrong here! politics is applied philosophy. where do you think politics and philosophy fit on that scale? nowhere? but you said sociology, so… they’re relevant.

• axlrosen

I think you’re confusing accuracy with precision? http://www.mathsisfun.com/accuracy-precision.html

• ldobe

Actually I think they do mean accuracy (I skimmed the article).

As they build on the old models, they add terms for more factors, making a model that more accurately accounts for the moving parts in the real world. The problem is that they can’t feed the model enough varied training data, and tuning the model to the incomplete but available data seems to reduce the reliability of its predictions.

The models give very precise answers using very accurate modeling, but unfortunately the data isn’t precise enough, accurate enough, or available enough to reliably predict the real world. You end up with models that are hypersensitive to outliers and overspecialized to the data they’re trained on. When the math is redone to make it more general, we can end up with too little predictive power, and models that don’t reconcile the outliers.

I’m reminded of when I first started using graphing calculators in high school. We learned how to do linear regressions on a set of points. Sometimes we were given sets with outliers which skewed the regression. I was unhappy with this, and discovered the calculator could also do quadratic, cubic, quartic, and sine-based regressions. So I’d plug in the misbehaving set and run regressions until I found one that fit the data exactly. Unfortunately, it would never predict new points that made sense given what the data was from. It was a case of the wrong tool for the job: even though the models fit the data, they were still the wrong models to use.
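The calculator story is textbook overfitting, and it’s easy to reproduce. A rough sketch with NumPy (the data points are made up for illustration): a degree-5 polynomial through six points fits them exactly, outlier included, while a straight line doesn’t, but the line extrapolates far more sensibly.

```python
import numpy as np

# Hypothetical data: a roughly linear trend (y ~ x) with one outlier at x=3.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.0, 2.1, 9.0, 4.0, 5.1])

# Degree-1 fit: doesn't pass through every point, but captures the trend.
line = np.polyfit(x, y, 1)
# Degree-5 fit: six points, six coefficients -- it interpolates the data exactly.
quintic = np.polyfit(x, y, 5)

train_err_line = np.max(np.abs(np.polyval(line, x) - y))
train_err_quintic = np.max(np.abs(np.polyval(quintic, x) - y))

# On the training data the quintic "wins" (error ~ 0 vs a few degrees)...
print(train_err_line, train_err_quintic)

# ...but ask both models about a new point just past the data:
print(np.polyval(line, 6.0))     # ~7.6: pulled up by the outlier, but plausible
print(np.polyval(quintic, 6.0))  # 125: the exact-fit model swings wildly
```

Exactly fitting the training set, outlier and all, is what destroys the prediction for new points, which is the same failure mode being described in the article’s models.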

• Boundegar

A very simple illustration would use a bell curve. If you say that tomorrow’s temperature is going to be 43.8 degrees, your chances of nailing it are near zero. If you say it will be in the 40s, your chances are high – maybe 95%. If you want 100% certainty, then all you can say is that it will be between -100 and 212°F.
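For anyone who wants to play with the numbers, here’s a sketch of that tradeoff using a hypothetical forecast distribution – a normal with mean 44°F and standard deviation 3°F, both made up for illustration. Only the Python standard library is needed, since the normal CDF can be written with the error function.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_between(lo, hi, mu, sigma):
    """Probability that a N(mu, sigma) value falls in [lo, hi]."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

mu, sigma = 44.0, 3.0  # hypothetical forecast: mean 44 F, std dev 3 F

# "Exactly 43.8 degrees" is really a sliver of an interval: near-zero probability.
print(prob_between(43.75, 43.85, mu, sigma))   # ~0.013
# "In the 40s": a wide interval, high probability.
print(prob_between(40.0, 50.0, mu, sigma))     # ~0.89
# "Between -100 and 212 F": essentially certain.
print(prob_between(-100.0, 212.0, mu, sigma))  # ~1.0
```

The wider the claim, the higher its probability – the precision/confidence tradeoff in one function call.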

• Luther Blissett

- Ring ring.
- Hello, who’s there?
- It’s the Bell curve. Just rang to say Reis didn’t invent me.
- Gauss, that’s interestin’ news!

• Keisar Betancourt

there’s room here for a fundamental concept of bound sliding scales that should be injected into a basic statistics class for elementary school students.

• Luther Blissett

Just one extra thought: neither the author of the post nor the linked ‘nature’ and JCompAidedMolDes references mention anything about collinearity of the included factors, which is something entirely different from just ‘adding noise’. I wonder why they don’t mention it. Probably because modelers usually just kick out one of the collinear factors?
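For what it’s worth, a quick sketch of why collinearity isn’t just ‘adding noise’ (synthetic data, NumPy): with two nearly identical predictors, the individual least-squares coefficients blow up and become meaningless even though the combined fit is fine, and dropping one column – the standard fix mentioned above – restores sane estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)      # nearly collinear with x1
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)  # truth depends only on x1

# Fit with both collinear columns included.
X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# Individual coefficients explode in opposite directions...
print(coef)
# ...yet their sum still recovers the true combined effect, ~3:
print(coef.sum())

# The fix the comment describes: kick out one of the collinear factors.
coef1, *_ = np.linalg.lstsq(x1.reshape(-1, 1), y, rcond=None)
print(coef1)  # ~3, and stable
```

That instability is the signature of collinearity: the model can trade weight between the two columns almost freely, so tiny changes in the data swing the individual coefficients enormously while the predictions barely move.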