We live in a data-rich environment, surrounded by huge amounts of information. But what can we do with it? What lessons can we draw from it?
Most people find data uninteresting. To them it is a massive, never-ending set of recorded facts and figures, made even more mundane by countless tweets about what others had for lunch.
Data collection is ubiquitous in the business world.
As Siegel points out, "Every medical procedure, credit application, Facebook post, movie recommendation, fraudulent act, spammy e-mail, and purchase of any kind—each positive or negative outcome, each successful or failed sales call, each incident, event, and transaction—is encoded as data and warehoused."
We appear to be in the midst of an information gold rush, but where is the gold? It isn't the information. The gold is found within the knowledge of events that have yet to occur.
That is what Eric Siegel's book "Predictive Analytics", and this synopsis of it, are all about.
According to Siegel, the best way to thrive in a predictive society is to understand the aims, methods, and limitations of predictive models. So let us get to know those limitations. I believe the next 10 minutes will be well spent.
Prediction is a powerful tool. By forecasting the future fate and value of individual assets, big companies can gain a competitive edge.
Predictive analytics is how a company learns from the collective experience of its employees and computer systems.
It's all about foresight, as Sheldon from The Big Bang Theory puts it: "The alternative would be to think backward... and that's just remembering."
To use predictive analytics, a firm must act on its forecasts, putting what it has learned and discovered in the data to use.
This leads to Siegel's Prediction Effect: predictive analytics is credible as long as the forecasts are better than guessing.
Each application of predictive analytics is defined by two elements:
1. What's predicted: the kind of behavior (action, event, or occurrence) to anticipate for each individual, stock, or other kind of element.
2. What's done about it: the decisions driven by prediction, meaning the actions the organization takes in response to, or informed by, each forecast.
These are backed up by a predictive model, which takes an individual's characteristics as input, crunches the data, and outputs a predictive score.
The higher the score, the more likely the individual is to behave in the way predicted. This score is then used to inform an organizational decision, directing which action should be taken.
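The flow described above (characteristics in, score out, decision taken) might look roughly like the following Python sketch. The feature names, weights, and threshold are invented for illustration and do not come from Siegel's book; a real model would learn its weights from historical data.

```python
import math

# Hypothetical weights for illustration only; a real model would learn
# these from historical data rather than have them written by hand.
WEIGHTS = {"months_since_last_purchase": -0.30, "purchases_last_year": 0.45}
BIAS = -1.0


def score(individual: dict) -> float:
    """Return a score between 0 and 1; higher means the behavior is more likely."""
    z = BIAS + sum(WEIGHTS[name] * individual[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)


def decide(individual: dict, threshold: float = 0.5) -> str:
    """Turn the score into an organizational action."""
    return "send offer" if score(individual) >= threshold else "do nothing"


# Two made-up individuals described by the same characteristics.
frequent_buyer = {"months_since_last_purchase": 1, "purchases_last_year": 8}
lapsed_buyer = {"months_since_last_purchase": 10, "purchases_last_year": 1}

print(decide(frequent_buyer))  # high score, so the offer is sent
print(decide(lapsed_buyer))    # low score, so no action is taken
```

The threshold is where the "what's done about it" element enters: the same score can drive different actions depending on how costly a wrong decision is.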
The Ethics of PA
We're now getting into the future-crime scenario of Minority Report. How can we safely use a machine that predicts the future? Are civil liberties in jeopardy?
Prediction can pry into our personal lives. This isn't a matter of data mismanagement, leakage, or theft.
Rather, as Siegel puts it, it's about creating new data: the accidental discovery of unspoken facts about individuals.
The same thing that makes data valuable also makes it sensitive. The more information there is, the more power there is. The more powerful the data, the more sensitive it becomes.
To profit effectively from predictive analytics, companies must ensure excellent data governance, deciding:
- What is stored and how long it is kept.
- Which employees, types of employees, or group members may access which data elements.
- Which information may be shared with which internal and external parties.
- Which data elements may be combined, aggregated, or connected.
- How each data element may be used to determine a company's response or other actions.
Add "...under what conditions and for what purpose" to each of these items to make things even more complicated.
Predictive analytics generates new data so powerful that it requires special handling. We live in a new world where systems divine new, powerful data, and that data must be properly controlled.
The Data Effect
What's fascinating about data is not how much of it there is but how rapidly it grows. There is always more today than there was yesterday. Absolute size makes no difference; what matters is the rate of expansion.
What ensures that all of this data churn is useful? The answer is straightforward. Everything is connected to everything else, even if only indirectly, and data reflects this.
Gather some data and, while you can never be sure what you'll uncover, you can be confident that by deciphering the language it speaks, you'll discover useful connections.
In a nutshell, that's The Data Effect.
The Data Effect: Data is always predictive.
However, there is an immediate danger here. Correlation is not the same as causation. Finding a predictive link between A and B does not imply that one causes the other, especially when the relationship is indirect.
We usually don't know, and don't need to care, about causality when doing predictive analytics. The goal is to forecast rather than to explain.
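To make the correlation-versus-causation point concrete, here is a small Python sketch with invented numbers. Ice cream sales and sunburn cases are both driven by temperature, so each predicts the other well even though neither causes the other. The figures and variable names are assumptions made up for this illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up weekly figures; temperature is the hidden common cause of both.
temperature = [12, 15, 19, 23, 27, 31, 34]
ice_cream_sales = [58, 64, 80, 88, 103, 111, 125]
sunburn_cases = [13, 22, 27, 38, 43, 54, 57]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation between ice cream sales and sunburn cases: {r:.2f}")
```

The correlation comes out close to 1, so ice cream sales would be a useful predictor of sunburn cases. That is enough for forecasting, but banning ice cream would not prevent a single sunburn.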
The Ensemble Effect
Netflix offers a view of predictive analytics in action. Why? Reportedly, online recommendations account for 70% of Netflix movie selections.
According to Siegel, the foundation of predictive analytics is the predictor variable: a single value recorded for each individual.
He points to frequency, the number of times an individual has performed the behavior, as one important example.
PA gains strength by combining dozens, if not hundreds, of variables. That is part of why Netflix is so successful. The advice of a single individual may not sway us, but the shared opinion of many people does.
We get a "collective intelligence" effect, like the wisdom of a crowd or of the audience on the TV quiz show "Who Wants to Be a Millionaire?" This compensates for the flaws of any single prediction model, flaws that produce both false positives and false negatives.
Predictive ratings generated by algorithms are flawed, just like human estimates. Some will be too high, while others will be too low. Averaging scores from a variety of models can help to eliminate a lot of inaccuracy.
This is referred to as The Ensemble Effect by Siegel.
By simply joining models together, we can increase the structural complexity of our model while preserving a key feature: resilience against overlearning, that is, fitting the quirks of past data so closely that the model forecasts poorly on new cases.
When predictive models are combined in an ensemble, they compensate for one another's flaws, making the ensemble as a whole more likely to forecast accurately than its individual models.
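The averaging idea can be sketched in a few lines of Python. The ratings and the three "models" below are invented for illustration; they are chosen so that the models err in different directions, which is the situation in which averaging helps most. Averaging is not guaranteed to beat the best single model in every case.

```python
from statistics import mean

# Made-up true ratings for four movies.
actual = [3.0, 4.5, 2.0, 5.0]

# Hypothetical predicted ratings from three imperfect models.
model_predictions = {
    "model_a": [3.6, 4.0, 2.5, 4.4],
    "model_b": [2.5, 5.0, 1.4, 5.5],
    "model_c": [3.2, 4.3, 2.4, 4.8],
}


def mean_absolute_error(predicted, truth):
    """Average size of the prediction errors, ignoring their direction."""
    return mean(abs(p - t) for p, t in zip(predicted, truth))


# The ensemble prediction for each movie is the average of the models' scores.
ensemble = [mean(scores) for scores in zip(*model_predictions.values())]

errors = {
    name: mean_absolute_error(predicted, actual)
    for name, predicted in model_predictions.items()
}
ensemble_error = mean_absolute_error(ensemble, actual)

for name, err in errors.items():
    print(f"{name}: {err:.3f}")
print(f"ensemble: {ensemble_error:.3f}")
```

With these numbers the ensemble's error is smaller than that of every individual model, because scores that are too high and scores that are too low partly cancel out.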
As a result, predictive analytics is best done in groups. A solid forecast needs examples of both positive and negative outcomes, and plenty of them. One small data set, one small analysis, one derived prediction... one large risk of error.
It's not what you ask; it's how you react.
Frequently, a company must determine what to do next. It doesn't only want to know what people will do; it also wants to know what it can do about it.
Consider the following scenario. Your mobile phone company knows your contract will soon expire, so it sends you a brochure with its current deals. That is a huge blunder.
The firm has just reminded you that your contract commitment is coming to an end and that you are free to leave. The mobile phone company is hoping that you will renew your contract.
You, on the other hand, may start considering alternatives now that you've been reminded.
This unintended reaction raises the question of what predictive analytics is being used for in the first place. The cell phone carrier might apply it as follows:
Application: Customer Retention
1. What's predicted: Which customers are expected to leave.
2. What's done about it: At-risk customers are the focus of retention efforts.
What they should actually be looking at is the following:
Application: Marketing Impact
1. What's predicted: How each customer will react to the reminder brochure.
2. What's done about it: Retention efforts focus on the customers the contact is likely to persuade to stay, and skip those it might prompt to leave.
Moving from forecasting a behavior to predicting influence on that behavior changes PA dramatically.
An organization doesn't simply want to know what individuals will do; it also wants to know what it can do about it. Predicting influence therefore promises to increase the value of predictive analytics.
However, as Siegel points out, we can't know everything there is to know about a person. To conclude that someone can be persuaded, we would need to know both of the following facts, and we can never observe both:
- Will Bill buy if we send him a brochure?
- Will Bill buy if we don't send him a brochure?
We can answer the first question by sending him a brochure, and the second by not sending him one. But we can't both contact Bill and not contact him at the same time.
Instead of forecasting a specific action, we need a model that assigns a score based on the chance of influencing an individual's behavior.
We need what Siegel refers to as an Uplift Model: a predictive model that forecasts the impact of one treatment over another on an individual's behavior.
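One very simple way to estimate uplift, sketched below in Python, is to compare purchase rates between people who received a brochure and people who did not, segment by segment. This is a simplification chosen for illustration, not the method from Siegel's book: it scores segments rather than individuals, and the records and segment names are invented. Since we can't observe both outcomes for Bill, the sketch relies on comparing groups of similar people instead.

```python
from collections import defaultdict

# Hypothetical campaign records: (segment, received_brochure, bought)
records = [
    ("new_customer", True, True), ("new_customer", True, True),
    ("new_customer", True, False), ("new_customer", False, False),
    ("new_customer", False, True), ("new_customer", False, False),
    ("expiring_contract", True, False), ("expiring_contract", True, False),
    ("expiring_contract", True, True), ("expiring_contract", False, True),
    ("expiring_contract", False, True), ("expiring_contract", False, False),
]


def uplift_by_segment(rows):
    """Purchase rate with the brochure minus purchase rate without it.

    Assumes every segment has at least one treated and one untreated record.
    """
    # segment -> {received_brochure: [buyers, total]}
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, bought in rows:
        counts[segment][treated][0] += int(bought)
        counts[segment][treated][1] += 1
    return {
        segment: groups[True][0] / groups[True][1]
        - groups[False][0] / groups[False][1]
        for segment, groups in counts.items()
    }


uplift = uplift_by_segment(records)
for segment, value in uplift.items():
    action = "send brochure" if value > 0 else "do not send"
    print(f"{segment}: uplift {value:+.2f} -> {action}")
```

In this made-up data the brochure helps with new customers but backfires with customers whose contracts are expiring, which mirrors the mobile phone example: the contact itself changes behavior, and the score tells us in which direction.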
Companies, governments, law enforcement, charities, hospitals, and colleges all make millions of operational choices to provide services, and predictive analytics has the power to impact them all.
Prediction, according to Siegel, is critical in guiding these decisions, and predictive analytics, applied with an understanding of its limits, is the means by which these activities can be made more efficient.