"Data is dead."
That's the meme I saw all over social media and the Internet after Donald Trump "defied the polls" and defeated Hillary Clinton in the 2016 U.S. election.
And it’s true that almost every major pollster called it wrong.
They got Florida wrong.
They got Pennsylvania wrong.
They got the Upper Midwest wrong.
They got the Hispanic vote and the women’s vote wrong.
But drawing the simple conclusion that “data is dead” is just as wrong.
In fact, data and data analytics are more alive – more dynamic and more important – than ever before.
What has been mortally wounded, if not killed, is passive data collection and modeling, along with poor data science practice:
- Unchallengeable assumptions and confirmation bias
- The tendency to rely too much on out-of-date statistical and analytical models
How USC Dornsife/LA Times got it right
The one major pollster that seemed to get it right was the USC Dornsife/Los Angeles Times Daybreak Tracking Poll.
An outlier among major national polls, Dornsife consistently reflected Trump’s strong support among white males throughout the summer and early fall. It showed a much closer contest than other pollsters did.
Then, in the closing days of the race, it called a surge among undecided voters for Trump. It predicted that he would win the electoral college, regardless of whether Clinton got more popular votes. (Which she did, with nearly 3 million more votes.)
I’m sure we’ll see plenty of case studies about how Dornsife (which, as an outlier, also predicted more support for Barack Obama in his 2012 reelection than other polls did) got the 2016 election right.
But here are some basic reasons, enumerated in the Los Angeles Times itself two days after the vote:
- It used a more complex (and ultimately better) weighting methodology. This aligned its analysis to the actual diversity of the electorate. I’ll discuss that more below.
- By polling online only, it uncovered latent Trump support that phone polls distorted. It learned that Trump voters were willing to express support for him online and face-to-face. But they tended to disguise that support when telephone pollsters called.
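To make the weighting idea concrete, here is a minimal sketch of basic post-stratification, one common way to align a sample with the makeup of the electorate. It is not Dornsife's actual methodology, which the Times describes as more complex. The function name `poststratify`, the groups "A" and "B", the candidates "X" and "Y", and every number are made up for illustration.

```python
from collections import Counter


def poststratify(responses, population_shares):
    """Reweight a sample so each group's share matches the population.

    responses: list of (group, candidate) tuples (hypothetical sample).
    population_shares: dict mapping group -> share of the electorate.
    Returns a dict mapping candidate -> weighted share of support.
    """
    n = len(responses)
    sample_counts = Counter(group for group, _ in responses)
    # Weight = population share / sample share for the respondent's group.
    weights = {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }
    totals = Counter()
    for group, candidate in responses:
        totals[candidate] += weights[group]
    total_weight = sum(totals.values())
    return {cand: w / total_weight for cand, w in totals.items()}


# Made-up sample: group "A" is 80% of respondents but 50% of the electorate.
sample = (
    [("A", "X")] * 60 + [("A", "Y")] * 20
    + [("B", "X")] * 5 + [("B", "Y")] * 15
)
raw_x = sum(1 for _, cand in sample if cand == "X") / len(sample)
weighted = poststratify(sample, {"A": 0.5, "B": 0.5})
print(f"raw X share: {raw_x:.2f}, weighted X share: {weighted['X']:.2f}")
```

In this toy sample, the unweighted numbers show candidate X comfortably ahead, but once the over-represented group is weighted down the race is tied. The point is only that assumptions about who makes up the electorate can move the headline number a long way.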
“Out of the box” approach and commitment to big data analytics were validated
Dornsife was both more diligent in its data gathering and more willing to challenge assumptions.
It embraced big data and analyzed unstructured information to gain insights that other pollsters didn’t. We don’t find unstructured data in spreadsheets. It comes from “outside the walls” through social media posts, video data, IoT data and more.
For example, Dornsife scrutinized video taken of Trump rallies in Florida. It realized that attendance indeed was being under-reported. It noted the enthusiasm among Trump’s base, white males, and even went on to mention the presence of women and Hispanics in his crowd.
Its analysis of Clinton rallies, on the other hand, showed smaller gatherings as well as fewer Hispanics and women than anticipated. It noted a drop-off in the African-American presence. This perhaps lent extra validity to its weighting algorithms. And Clinton’s loss in Florida was decisive.
A data-first culture is more critical than ever
For CEOs and CIOs, the 2016 election wasn’t a repudiation of big data analytics.
Quite the opposite: it affirmed the need to create a data-first culture and approach to decision-making.
Executives confronted with historically disruptive change know they can no longer trust subjective experience to truly understand their world. They need to make critical strategic decisions based on facts. And those facts are driven by current, comprehensive, unmediated data that they can trust.
Five things we can learn
In 2016, most polling organizations and major media were not working from current, comprehensive data they could trust. There were multiple points of failure. What can we learn?
- Statistical models have short lifespans. Models and methods that worked in 2012, or even during this year’s primary season, failed pollsters on November 8. Disruption in information exchange (and market competition) is accelerating, which makes analytical models obsolete faster than ever. I’d argue that today’s most reliable model may have a lifespan as short as nine months, including development time.
- Avoid the herd mentality. Don’t invest in the results of the data; invest in your commitment to finding out everything you can from it. “Confirmation bias” is a fatal flaw. Both the Clinton campaign and other polling organizations criticized Dornsife’s numbers during the campaign as an outlier.
- Obsess about data quality. Other than Dornsife, the pollsters used faulty data. They misunderstood the composition of the electorate. You must align sampling and data models with the most complete and accurate sources of data.
- Go after unstructured data and more sources of data. It’s finally possible to present all of a company’s relevant data – untouched by human hands – directly to managers making decisions, in real time. Mastering big data gives you better numbers to crunch and analyze. It can also help you build more accurate statistical models. Just see how Dornsife analyzed rally videos to tweak its weighting formula.
- Don’t get complacent. Challenge assumptions. Respect the speed of change and disruption. Constantly test your data models; even try to crash them. Retire them as soon as they falter. Rebuild them based on real learnings. Gain a deeper insight into behavior. Dornsife learned that telephone polling didn’t capture the breadth of Trump’s support. Its weighting system proved superior.
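One simple way to act on "constantly test your data models" and "retire them as soon as they falter" is to track recent forecast errors and flag a model once they drift past a tolerance. The sketch below is only an illustration: the function name `should_retire`, the three-forecast window, the 5-point threshold and the error history are all assumptions, not figures from any pollster.

```python
def should_retire(errors, window=3, threshold=0.05):
    """Flag a model when its recent forecasts have drifted too far.

    errors: forecast errors (predicted minus actual), oldest first.
    Returns True if the mean absolute error over the last `window`
    forecasts exceeds `threshold`; False if there is too little history.
    """
    if len(errors) < window:
        return False
    recent = errors[-window:]
    return sum(abs(e) for e in recent) / window > threshold


# Hypothetical error history: accurate at first, then degrading.
history = [0.01, -0.02, 0.01, 0.06, -0.07, 0.08]

early_flag = should_retire(history[:3])  # only the first three forecasts
late_flag = should_retire(history)       # the three most recent forecasts
print(early_flag, late_flag)
```

A real check would use whatever error metric and tolerance fit the decision at stake. What matters is that the test runs routinely, so a faltering model is caught before a critical decision depends on it.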
Has your organization made the same mistakes as the major pollsters? How can you improve your confidence in data-driven decision-making?
Be sure to check out my other blog post on 3 steps to take to become a data-first organization.