Election polling: a confounding problem we’ve been covering on Full Measure since before the 2016 election when so many got things so wrong. Today, as we head into the midterms, we get a glimpse of the future of polling. Instead of a massaged and adjusted sample of a few thousand people, the sample is the vast internet. Today, economist Elisa Choy explains more. She’s using artificial intelligence and the web to hopefully get more accurate results.
The following is a transcript of a report from "Full Measure with Sharyl Attkisson." Watch the video by clicking the link at the end of the page.
Choy: In simple terms, so the dataset we use to analyze anything in relation to markets is open-source internet content outside of firewall. So websites, blogs, social media, anything outside of paid firewall, we can see. A lot of people can see. You can buy that information wholesale. But what you need is technology like artificial intelligence to process that information. And with artificial intelligence, we're using machine learning, specifically natural language processing. So the computer analyzes the content based on what we ask it to see.
Sharyl: So if it's looking at behavior and emotion and sentiment, not a specific answer to a question, how does it turn that into what those people's voting behavior might be?
Choy: That's a really good question. So when we looked at the 2020 U.S. presidential election, looking at Trump and Biden from multiple angles, looking at content online, based on their leadership, trust, economy, law and order, all those attributes of choice that would've been asked in a direct question in a poll, we could also attack through internet content.
Sharyl: Because somebody is writing about it in a positive way or negative way or commenting on it?
Choy: Yes, correct. So if you think about the content online, anybody can talk about any of these issues. So what we look at is, if we're testing Biden leadership, for example, we'd be pulling all content that is contextualized related to that narrative Biden, with respect to his leadership. Now we can do it in English, and we can do it in any major source language. So in the 2020 election, what was very interesting was when we looked at these attributes for the candidates, English and Spanish, sometimes the results were very, very different, and that fed into our assessment of calling the key battleground states for the election in 2020. We got seven out of nine accurate.
Sharyl: Which is a good record considering what other polling companies predicted.
Choy: Which is a very good record.
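The attribute-level approach Choy describes above (pulling posts that mention a candidate in the context of leadership, the economy and so on, then comparing English and Spanish) can be illustrated with a minimal sketch. This is not Choy's method, which the interview does not detail. It swaps in a toy keyword count where a real system would use a trained natural language processing model. The candidate name "Smith", the word lists and the sample posts are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical keyword lists (assumption); a real system would use a trained model.
ATTRIBUTE_TERMS = {
    "leadership": {"leader", "leadership", "líder", "liderazgo"},
    "economy": {"economy", "jobs", "economía", "empleos"},
}
POSITIVE = {"strong", "good", "trust", "fuerte", "buen", "confianza"}
NEGATIVE = {"weak", "bad", "distrust", "débil", "mal", "desconfianza"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [w.strip(".,!?;:\"'()").lower() for w in text.split()]


def score_posts(posts, candidate):
    """Return mean sentiment per (language, attribute) for posts naming the candidate."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sentiment sum, post count]
    for lang, text in posts:
        words = set(tokenize(text))
        if candidate.lower() not in words:
            continue  # post is not about this candidate
        # Crude sentiment: positive keywords minus negative keywords.
        sentiment = len(words & POSITIVE) - len(words & NEGATIVE)
        for attribute, terms in ATTRIBUTE_TERMS.items():
            if words & terms:
                totals[(lang, attribute)][0] += sentiment
                totals[(lang, attribute)][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Invented example posts as (language, text) pairs; not real data.
POSTS = [
    ("en", "Smith showed strong leadership today."),
    ("en", "Smith is a weak leader, bad on jobs."),
    ("es", "Smith tiene un liderazgo fuerte."),
    ("en", "The weather is good."),
]

scores = score_posts(POSTS, "Smith")
for key, value in sorted(scores.items()):
    print(key, value)
```

In this toy data the English and Spanish leadership scores point in opposite directions, the kind of split Choy says fed into her battleground assessments.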
Sharyl: Are there privacy concerns with taking data that exists online and turning it into something — on the part of the people using the internet?
Choy: That I think is an issue that is very difficult to solve. Privacy — who owns the content that is outside, posted online? Like if I post something, if you post something, it originates from me and you, but once it's out there, who actually owns it? We don't know that true answer. And then you've got people who take it and transform it and translate it, like myself, into something else. Who owns that? I don't know. It's such a murky world. And unfortunately, I don't think anyone can actually answer the question of what is true data and what is not true data.
Sharyl: So a lot of people could do this because the data is out there and, you said, available for purchase. Do you think we’ll see a lot of AI, internet-type polling for the next presidential election?
Choy: I think it's already being done. So people have been using Twitter data. They've been using social media data particularly. So it's nothing new. What we're doing is nothing new. What we're doing is actually bringing together technology that's been around since the 1950s, to be honest. It's improved over time, but you've also had the immense amount of data that's been created in the last 10 years. There's a stat, a very old stat, but still quite poignant: 90% of the world's data has been created in the last two years. And that accounts to 2.5 quintillion bytes a day.
Sharyl: Maybe it's always this way, but I've noticed more in recent years, polling is used to shape public opinion rather than measure public opinion. Maybe that's part of why it's so wrong sometimes. Is the same peril possible using data from the internet for people who want to shape opinion?
Choy: One hundred percent. You can take whatever you want from the internet and craft a story based on your technique and your particular angle or your objective. You give a dataset to 10 different analysts, we'll all have very different outcomes as to what we see, which you can imagine, the world of data that's out there, technology at play, who are you going to trust? I think it goes back down to people. Who's the practitioner that you trust in what they're saying and have they got the experience and consistency over time that demonstrates that they're trustworthy and accurate? You have to be accurate. Otherwise you're not worth it.
Sharyl: Last question. Are you confident that this method at least allows, for those who use it that way, for more accurate polling than what we've been seeing?
Choy: I go by my evidence, and I would say 100%. So we've been looking at elections since 2016. The first one was Clinton and Trump. So we took a look at that and we said, "Oh, I think Trump's got a good chance," and he did. Same thing with Biden. So we've been refining the methodology using elections, because it's the best experiment to test your method. You've got an outcome that's defined, and you've got a lead-up, and we can measure sentiment as it changes over time. It shifts quite dramatically, particularly in elections. So based on our track record, I would say — of course, we're much better than polls. Which is why I'm here.
Sharyl (on-camera): So far, midterm polls have alternately predicted giant Republican gains and a surprisingly strong performance by Democrats.
Watch story here.