Bad Data, Bad Decisions

The foundation of any kind of analysis is data. If the data are objective and unbiased, you have the opportunity to do quality analysis and gather valuable insight. If the data are substandard, you’d be lucky to find anything useful at all, and unable to trust anything you do see. That’s the bottom line: do it right or go home.

Unfortunately, that’s not how life works.

Once upon a time, there were “gold standards” for quality in the US. They included the US Census and, back in the day, The Gallup Organization, The Roper Organization, and The Harris Poll.

  • Dr. George Gallup essentially founded polling as we know it in the 1930s. He and his sons ran the company until the name was sold in the 1980s, after his death.
  • Elmo Roper was another of the early polling giants. His books are still great reads.
  • Lou Harris was John Kennedy’s pollster in 1960 and had a widely read newspaper column for years after that.

These people were known to the public of their day. Cooperation rates with surveys were quite good, and the quality of data was excellent. Unfortunately, that’s much less true today.

The other trusted source of data was the US Federal government. The Census was considered a gold standard up until relatively recently, when budget cutbacks forced reductions in data collection efforts. Questionnaires were shortened and some residents were omitted.

Baffour and Valente discuss the inevitability of errors in Census data collection.(1) No question: data collection is undertaken by humans and humans make mistakes. (So do machines — errors that humans program into them.)

However, a lot of people rely on Federal data for some fairly important decisions. Users include business strategic planners, government policy makers, economists and sociologists. The decisions based on these data affect everyone.

Which brings us to the latest shortcoming — a small example that illustrates a much larger problem.

The small example deals with mineworker safety. Admittedly, this is a shrinking industry with no perceptible future. Still, the Federal government stepped into worker safety because of actions mine owners refused to take, and it’s impossible to enforce the rules when you don’t know what’s happening.

Questions about under-reporting of worker injuries in Kentucky prompted a study in Illinois.(2) The newly reported results of the Illinois study:

  • There were only 4,141 workers in mining in Illinois in 2015.
  • Between 2003 and 2015, there were 5,653 worker injuries reported to the Illinois Workers’ Compensation Commission.
  • Only 1,923 were reported to the US Mine Safety and Health Administration.
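To put a number on that gap, here is a minimal sketch in Python that uses only the two injury counts quoted above. It assumes the federal reports are a subset of the state Workers’ Compensation reports, which is how the figures read but is my assumption; the function name capture_rate is made up for illustration.

```python
# Figures quoted above from the Illinois study (2003-2015).
state_reported = 5653    # injuries reported to the Illinois Workers' Compensation Commission
federal_reported = 1923  # injuries reported to the federal mine safety agency


def capture_rate(captured: int, reference: int) -> float:
    """Share of the reference count that also shows up in the captured count.

    Assumes the captured cases are a subset of the reference cases.
    """
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return captured / reference


rate = capture_rate(federal_reported, state_reported)
missing = state_reported - federal_reported

print(f"Federal capture rate: {rate:.1%}")                  # Federal capture rate: 34.0%
print(f"Injuries missing from federal data: {missing}")     # Injuries missing from federal data: 3730
```

On these numbers, roughly two out of every three injuries known to the state never appear in the federal data. Since the state count is itself likely to be low, the true capture rate is probably lower still.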

Worse, some miners don’t file claims with Workers’ Compensation for fear of losing their jobs. So even the state statistics undercount actual events.

Obviously, mining is a dangerous line of work. However, the Federal data badly misrepresent actual experience, making the industry look much safer than it is.

We have similar problems with reporting of crime statistics. There are three kinds of errors in these reports:

  • Non-reporting
  • Incomplete reporting
  • Misclassification and inaccurate reporting

In one example, the Los Angeles Police Department misclassified 1,200 violent crimes in 2013 as minor offenses.(3) Sometimes the errors are accidental, but sometimes they may occur for personal or political reasons. In the LA example, the Chief of Police was coming up for reappointment to a new five-year term.

While the FBI likes to assume its data are accurate, published statistics arguably understate the level of crime in the US.

We have similar issues with healthcare. We actually don’t know how many cases of flu occurred in the US in the last three months or how many people died from flu. We know what was reported, but flu deaths can be (and are) attributed to a variety of causes and flu may not be mentioned on the death certificate.

I’ve run into similar issues with data on life expectancy. The chart below used to be available on the CDC website.

The chart is disturbing on several levels. It shows how a combination of lack of quality healthcare, lack of access to healthcare, poverty, obesity and smoking makes certain areas of the US virtual deathtraps for residents. In those areas, many people will die before reaching full retirement age for Social Security benefits.(5)

Life expectancy by county 2014

However, we don’t know if these data are subject to the same under-reporting that we find with other statistics. Where conditions are bad, as in the Deep South and Appalachia, just how bad are they?

Of course, the obvious irony is that the people most in need of Federal assistance are the ones voting against it.

How do you allocate resources to fix problems when you don’t know how big the problems are?

 


Sources:

    1. https://unstats.un.org/unsd/censuskb20/KnowledgebaseArticle10408.aspx
    2. Kirsten S. Almberg, Lee S. Friedman, David Swedler, Robert A. Cohen. Mine safety and health administration’s part 50 program does not fully capture chronic disease and injury in the Illinois mining industry. American Journal of Industrial Medicine, 2018; DOI: 10.1002/ajim.22826
    3. https://www.scpr.org/news/2014/08/11/45925/lapd-admits-errors-in-how-it-reports-crime-statist/
    4. https://www.ncbi.nlm.nih.gov/books/NBK202273/
    5. http://www.healthdata.org/data-visualization/us-health-map

 

 

 

 
