Statistics and “Everybody Lies”

The latter is the title of a book by Seth Stephens-Davidowitz (Dey St., 2017). It’s an interesting piece on how people lie to themselves and to others, and how their true attitudes can be revealed by what they do on the Internet.

He cites some instances in which analysis of Internet searches can be revealing. For example, the search for “Is my husband gay?” is most common in the Bible Belt areas of the US where tolerance toward gay or bisexual behavior is relatively low — notably Kentucky and Louisiana.

One of the interesting arguments is that searches sometimes move in the opposite direction from reported cases. One example is the relation between the economic downturn in 2009 and child abuse. Searches went up, but actual legal complaints went down. His argument is that overworked and understaffed agencies were less able to handle the case load, so that while more children were abused, fewer cases were processed.

Unfortunately, that’s the chicken and egg argument that plagues many discussions of this kind. Is an increase in online searches indicative of increased paranoia or increased actual behavior? The data don’t distinguish. We know there was an increase in searches on child abuse, but we don’t know why. The assertion that the increase was due to increased abuse is an assumption not supported by the data. That’s where the use of Big Data gets into trouble.

I talked before about a trip we made three years ago to visit family in Las Vegas. To this day, I’m still getting promotions from travel agencies and Southwest for gambling junkets in which I have absolutely no interest. I didn’t take that trip to go gambling; I’ve never taken any trip for that reason. All the merchants know is that I went. They don’t know why and chose to make an uninformed guess.

A lot of these vendors are now permanently relegated to my spam folder because they’ve become irritating, meaning that I’ll never see anything they offer. Their loss. Marketing 101: Tick off the customer and they will not buy from you. Guessing is a good way to do that.

Regrettably, the book starts with the tiresome example of the 2016 election. Most of the leading polls showed the election as “too close to call.” That means that the margin of error around the estimated vote included both winning and losing results for both major candidates. However, the media don’t like ambiguity, and both reporters and editors tried to twist the non-forecast into a prediction. Didn’t work. Doesn’t work. Happily, Stephens-Davidowitz doesn’t claim that Google correctly predicted the result.
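For readers who want to see what “too close to call” means in arithmetic, here is a rough sketch in Python. It is not how any particular pollster works: it assumes a simple random sample (real polls use weighting, which widens the interval), the usual 95% multiplier of 1.96, and the textbook standard error for the difference between two shares from the same poll. The function names and the poll numbers are made up for illustration.

```python
import math


def lead_interval(p_a, p_b, n, z=1.96):
    """Approximate confidence interval for candidate A's lead over B.

    Assumes a simple random sample of n respondents (an assumption;
    real polls are weighted). p_a and p_b are the two candidates'
    shares as fractions, e.g. 0.46 and 0.44.
    """
    lead = p_a - p_b
    # Standard error of the difference of two shares from one sample.
    se = math.sqrt((p_a + p_b - lead ** 2) / n)
    margin = z * se
    return lead - margin, lead + margin


def too_close_to_call(p_a, p_b, n, z=1.96):
    """True when the interval for the lead includes zero."""
    low, high = lead_interval(p_a, p_b, n, z)
    return low < 0 < high


if __name__ == "__main__":
    # Hypothetical poll: 46% to 44% among 1,000 respondents.
    low, high = lead_interval(0.46, 0.44, 1000)
    print(f"lead interval: {low:+.3f} to {high:+.3f}")
    print("too close to call:", too_close_to_call(0.46, 0.44, 1000))
```

With those made-up numbers, the interval for the lead runs from roughly 4 points behind to roughly 8 points ahead. Either candidate could be in front, which is all that “too close to call” says.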

As with any data source, Big Data is a tool, but most of the time it is not a complete answer to any question. One needs to know how to use a tool, and how not to use it. Whether it’s Big Data or a chain saw, anything placed into untrained hands can be dangerous.
